Natural language can be such an ass headache

It was exciting to see Luminoso’s new product for streaming text analytics, Compass, get an article in Wired. Skimming past the picture of Catherine and me looking ridiculous at SXSW long ago, there’s an image of our “concept cloud” visualizer looking at what people say on Twitter when they’re sick:

Luminoso's concept cloud, showing words, phrases, and emoji people use when they're feeling sick.

Wait a minute. Zoom in. Enhance.

The text "ass headache" appears in the word cloud, near "biggest headache" and "got the worst headache".

The screenshot in the article contains a natural-language glitch that’s already caused a lot of amusement around the office.

Here’s what’s going on. One important thing that Luminoso does is to identify relevant phrases that contain more information than the sum of their parts. When looking at text from people who are feeling sick, the phrase “throat hurts so bad” is much more informative than the words “throat”, “hurts”, “so”, and “bad” in isolation.

Usually, these informative phrases end up being reasonable phrases of natural language, or at least close enough (“headache is killing” is missing the object, but we all get the idea).

One case where this missed slightly is the phrase “ass headache”. This is not an affliction that people would usually complain of. And yet it looks entirely reasonable to the computer, given the source data, which contains many phrases such as:

  • “I got this crazy ass headache”
  • “I have a biggg ass headache”
  • “I gotta mean ass headache bruh”

Statistically, it looks like an “ass headache” is a thing you can have. You can have a crazy one, or a mean one, or simply a biggg one, but lots of people have one.
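To see why the statistics point that way, here is a minimal sketch in Python. It is not Luminoso's actual phrase detection, which this post doesn't describe; it just counts adjacent word pairs (bigrams) in the three example tweets above, as a stand-in for whatever a real system measures. The `bigrams` helper and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

# The three example tweets quoted above.
tweets = [
    "I got this crazy ass headache",
    "I have a biggg ass headache",
    "I gotta mean ass headache bruh",
]

def bigrams(text):
    """Return adjacent word pairs, using naive lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))

# Count every bigram across all tweets.
counts = Counter(bg for tweet in tweets for bg in bigrams(tweet))

# The pair that recurs most often is the one that looks like "a thing".
top_bigram, top_count = counts.most_common(1)[0]
print(top_bigram, top_count)  # ('ass', 'headache') 3
```

Each of “crazy ass”, “biggg ass”, and “mean ass” shows up only once, while “ass headache” shows up in every tweet. A purely frequency-based view has no way to know that the word in the middle belongs with the adjective on its left.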

Because we’re actual speakers of the language, as opposed to computers stumbling through it to the best of their ability, we know how these phrases should really be interpreted. We understand that the word “ass”, for whatever reason, can be a modifier for the adjective before it. (That doesn’t stop us from humorously reinterpreting it as a modifier for the noun after it, as an early XKCD comic encourages us to, which is essentially what Luminoso’s analytics did!)

XKCD #37, by Randall Munroe.

Phrases that come up in our everyday conversation can contain surprising grammatical quirks. And that’s why natural language is such an ass headache.

2 thoughts on “Natural language can be such an ass headache”

  1. Love it! Just another reason why I love working in the quirky, murky world of qual and text. Reminds me of my older teen son, who, whenever he is not feeling well, says he “feels like ass.” 🙂
