Client Guest Blog Post: Text Analytics for Tobacco Control Research – University of Illinois at Chicago’s Experience with Luminoso

The following is a reposted blog post by Hy Tran discussing Luminoso, which UIC used to analyze a collection of tweets about the Centers for Disease Control and Prevention’s (CDC) Tips From Former Smokers (Tips) II campaign.

In 2013 the CDC launched the second round of Tips From Former Smokers, a national media campaign that used real-life stories from former smokers suffering from the health consequences of smoking. A hallmark of Tips is its use of very graphic images to elicit strong emotional responses.

Our challenge was to analyze 140,000 Tips-relevant tweets to see how Twitter users react to Tips messages. Specifically, what emotions do the ads make people feel? Do people accept or reject the messages? To help answer these research questions, we turned to the text analytics software Luminoso.

Luminoso offers several helpful features: it is cloud-based, so any computer with an internet connection can access the analysis; it has a friendly user interface, so users need not be familiar with natural language processing (NLP) to use it; and it comes with a number of handy tools for visualizing the complexity of the corpus. Below is the Concept Cloud, a main part of the user interface. Users can manipulate the cloud directly to extract information and create visualizations.

Figure 1 – Concept Cloud representing associations between concepts found in the CDC Tips tweet corpus. Each concept is a word or phrase that carries a certain meaning. Concepts that appear more often are displayed in larger fonts. Concepts that are strongly correlated are located close to each other. Related concepts are displayed in shades of the same color; e.g., the concepts in light green are about Terrie, the woman with a hole in her throat.

What really sets Luminoso apart is that it comes with a built-in network of “common sense” knowledge, which allows the machine to understand implicit connections in language that we humans take for granted. In Luminoso’s own words, this tool recognizes the difference between “I find your pants disturbing” and “I find your lack of pants disturbing.” To understand that difference, one needs the common sense knowledge that people are expected to wear pants. For humans, this kind of knowledge is taken for granted and never spelled out in day-to-day conversation. Computers, however, do not have such knowledge; someone has to teach it to them first. Any attempt by a machine to analyze human language will be incomplete unless the computer possesses an equivalent set of common sense knowledge.

Luminoso acquires common sense knowledge from ConceptNet, a semantic network built at the MIT Media Lab. It then uses a technique known as AnalogySpace to combine “big picture” knowledge from ConceptNet with specific knowledge learned from the text data at hand. The technical details behind ConceptNet and AnalogySpace are beyond the scope of this blog post. Interested readers are encouraged to start with this paper, which provides a comprehensive introduction to these topics. For further reading, please refer to this publication list by Catherine Havasi, one of the active developers of ConceptNet.

Our challenge with the Tips data is to extract sentiments from text and see which sentiments are strongly related to each other. Sentiments are difficult to analyze: they can be hard to define, and understanding them requires a good deal of implicit knowledge. Luminoso’s ability to incorporate common sense knowledge made it attractive for our purposes.

Table 1 – Correlation among some Sentiment Concepts found in Tips Corpus. Each row and column indicates a concept. The numbers are correlation coefficients between concepts.

For example, Luminoso understands that sentiments such as “depressing” and “sad” are strongly correlated, whereas “sad” and “funny” are not correlated. While this seems obvious to a human reader, it is no simple task to teach such ideas to a machine. This kind of understanding, though simplified, can serve as a good starting point for further development in sentiment analysis.
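
To make the idea of a correlation between two concepts concrete, here is a minimal sketch. It is not Luminoso’s method, which works on concept representations informed by ConceptNet. It assumes a much simpler setup: each tweet is marked 1 or 0 depending on whether it contains a given word, and a Pearson correlation is computed between two such indicator columns. The tweets and the helper names (`indicator`, `pearson`) are made up for illustration.

```python
from math import sqrt

def indicator(tweets, term):
    """Return 1 for each tweet that contains `term` as a whole word, else 0."""
    return [1 if term in tweet.lower().split() else 0 for tweet in tweets]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Toy data, invented for this example (not taken from the Tips corpus).
toy_tweets = [
    "that ad was so sad and depressing",
    "sad and depressing to watch",
    "lol that ad is funny",
    "funny ad lol",
]

sad = indicator(toy_tweets, "sad")
depressing = indicator(toy_tweets, "depressing")
funny = indicator(toy_tweets, "funny")

print("sad vs depressing:", pearson(sad, depressing))
print("sad vs funny:", pearson(sad, funny))
```

In this toy set, “sad” and “depressing” always appear together and never alongside “funny,” so the first correlation is high and the second is negative. Word co-occurrence alone would miss related terms that never share a tweet, which is the gap that background knowledge such as ConceptNet is meant to fill.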


One of our main goals is to understand how people react to the graphic depictions of the consequences of smoking shown in Tips ads. The table below provides correlations between specific smoking-related diseases and sentiment reactions.

Table 2 – Correlation between smoking consequences (found by Search Rule) and sentiments (found by Luminoso).

Here we can see a strong positive correlation between ads showing patients with tobacco-related cancer and sadness. This suggests that talking about cancer elicits sad emotions in the audience, which may be a sign that people are engaged with this message. Likewise, it seems that a stoma can freak people out and creep them out, suggesting a fear response to the effects of disease rather than a humor response. COPD and stroke correlate only weakly with the sentiment concepts, suggesting that ads featuring these conditions provoke a less emotional response.

We also have concepts such as “laugh” or “funny.” Positive correlation with these concepts can indicate that audiences are rejecting the message. In general, correlations with ads are weak for these concepts. The moderately negative correlations between “lol”, “funny” and stoma ads suggest that the seriousness of the disease caused by smoking is relatively well-accepted among audiences.

Another useful feature of Luminoso is the ability to correlate a topic with users’ metadata. Below is an example with topics related to Terrie – one of the former smokers featured in Tips whose story was the most prominent topic of discussion in our corpus.

Figure 2 – Correlations between Topics about Terrie (vertical axis) and Number of Followers of the tweet authors (horizontal axis).

The trend indicates that users with fewer followers are more likely to talk about the Terrie ads than those with many followers. Since users with fewer followers tend to be organic rather than commercial users, this suggests that the Terrie ads managed to engage organic audiences.
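
A rough way to check this kind of relationship outside Luminoso is to group tweets by the author’s follower count and compare how often a topic comes up in each group. The sketch below assumes each record is a (follower count, tweet text) pair, uses a plain keyword match as the topic definition, and picks arbitrary bucket thresholds. All of these, along with the sample records, are assumptions made for illustration and do not come from the study.

```python
from collections import defaultdict

def follower_bucket(n_followers):
    """Assign a follower count to a coarse bucket (thresholds are arbitrary)."""
    if n_followers < 100:
        return "<100"
    if n_followers < 1000:
        return "100-999"
    return "1000+"

def topic_share_by_bucket(records, keyword):
    """Return, per follower bucket, the fraction of tweets mentioning `keyword`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for followers, text in records:
        bucket = follower_bucket(followers)
        totals[bucket] += 1
        if keyword in text.lower():
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}

# Toy records, invented for this example: (follower count, tweet text).
toy_records = [
    (50, "Terrie's ad made me cry"),
    (80, "that Terrie commercial is intense"),
    (500, "watching tv"),
    (700, "Terrie ad again"),
    (20000, "Tips to quit smoking: call the quitline"),
    (50000, "New quit resources available"),
]

shares = topic_share_by_bucket(toy_records, "terrie")
for bucket, share in shares.items():
    print(bucket, share)
```

With real data, a share that falls as the follower bucket grows would be consistent with the trend described above, though a keyword match is a much cruder topic definition than the one Luminoso builds.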

We can also see if people talk about quitting smoking after viewing the ads.

Figure 3 – Correlations between Topics about Not Smoking (vertical axis) and Number of Followers of the tweet authors (horizontal axis).

“Not Start” refers to users telling others not to start smoking. “Stop Smoking” refers to users telling others to stop smoking. “Quit Smoking” refers to discussion about quitting smoking. Note that there is a very large correlation between Quit Smoking talk and users with many followers. However, these are mainly institutional users, such as state public health agencies, giving advice about how to quit. It appears that users with fewer followers (likely to be organic users) are motivated to tell others to stop smoking after seeing these ads.

These insights, drawn from over 140,000 Tweets about the CDC Tips campaign, would not be possible using human coders alone. Thus Luminoso proves to be a promising tool for analyzing sentiment in social media data. These analyses reveal important differences in emotional responses to different kinds of ads, show how people talk about smoking after viewing the ads, and provide further support for running graphic anti-smoking media campaigns in the future.

This blog post was written by Research Assistant Hy Tran, a doctoral candidate in biostatistics.
