Edelman v. Sichuan Garden & Yelp Reviews!

Our own Alice Kaanta, an analytics engineer at Luminoso, offers an interesting take on the high-profile social media eruption over Edelman v. Sichuan Garden.


In recent news, Ben Edelman, an HBS professor, launched a fully loaded verbal assault on Woburn-based restaurant Sichuan Garden over a $4.00 discrepancy in his bill.

You might have noticed the blowback across social media as this story went viral, largely in defense of Sichuan Garden, though Edelman’s argument has its defenders too. Since no one has time to read all of the commentary strewn across social media, we decided to analyze some of the surrounding feedback using Luminoso.

People have taken to Yelp to defend or decry Sichuan Garden, so we loaded 204 of Sichuan Garden’s reviews into our solution.

All Reviews (Color Representation: Positive Sentiment, Negative Sentiment, Sichuan Garden, Edelman, Harvard, Food Service)


It seems that most of the reviews about Sichuan Garden are about the food and the authenticity of the restaurant. However, “Edelman”, “Ben Edelman”, and “Harvard” are a definite presence.

All Five-Star Reviews (Color Representation: Very related to “Edelman”, Moderately related to “Edelman”, Slightly related to “Edelman”, Not related to “Edelman”)


We found that the people who discussed “Edelman” also gave the restaurant five stars. What they had to say about Professor Edelman was not flattering: the top term related to “Edelman” was “bully”.

Recommended Reviews (Color Representation: Positive Sentiment, Negative Sentiment, Sichuan Garden, Edelman, Harvard, Food Service)

Not Recommended Reviews (Color Representation: Positive Sentiment, Negative Sentiment, Sichuan Garden, Edelman, Harvard, Food Service)


If you’re familiar with Yelp, you might have noticed that some reviews are quietly hidden in the “not recommended” reviews section. According to Yelp’s FAQ, this might be because “… the review might have been posted by a less established user, or it may seem like an unhelpful rant or rave.” “Not recommended” reviews do not contribute to a restaurant’s star rating, and they are not easy to find.

We found that all of the reviews mentioning “Edelman” appear to have been marked “not recommended”, which suggests that Yelp understandably does not wish to host a flame war.  It also suggests that well-meaning Sichuan Garden supporters aren’t actually contributing to the restaurant’s star rating or visible positive reviews…


…that is, unless they are avoiding the Edelman commentary entirely, and sticking to talking about the food.

Concept-Based vs. Keyword-Based Text Analytics: What’s the difference and why does it matter?

A parting gift in the form of a blog post, by our MBA intern, Saman Djabbari.

The average person is probably thinking one of two things after reading the title of this post: “What in the world is text analytics?” or “You have my attention; what’s the difference?”

Leading industry analyst Seth Grimes, who has in fact posted on this site before, defines text analytics as “software and transformational processes that uncover business value in ‘unstructured’ text”. Make sense? Not quite? Well, Grimes continues the definition: “text analytics applies statistical, linguistic, machine learning and data analysis and visualization techniques to identify and extract salient information and insights. The goal is to inform decision making and support business optimization.” Put simply, text analytics helps you run your business more effectively, ideally saving you time and uncovering insights and ideas in your data that you might never have been aware of.

Now that you get the basics, let’s look at two existing methodologies for text analytics: keyword-based and concept-based. But before I go into the details of the two, I’ll offer an analogy to make these methodologies as clear as possible.

Think of a common decision people have to make in everyday life: whether to make dinner or order in. If you choose to make dinner, you have to buy ingredients and follow a recipe, which can take a good deal of time. If you choose to order in, you typically tell someone over the phone what you want, or place an order through an app or website. And are you picking up, or do they deliver? For a single meal, the cost of each option is often comparable, which is why this is such a common decision, but the real differentiators and pain relievers are time and convenience.

Now let’s go into the text analytics methodologies.

In keyword-based text analytics, you need to tell the software exactly what to look for. A good example is Boolean logic, which involves typing out a string of words for your software to detect. Have you ever used Boolean logic? It’s terrible. You have to type in each and every word, every possible misspelling, and any jargon you think might exist, and separate them all with operators. There isn’t a greater waste of time than typing in variant after variant and synonym after synonym while separating them with “and” and “or” over and over. Not only must you identify what you’re looking for, but also what you’re not looking for, again wasting valuable human hours that could instead be spent deriving insights from the data. Let’s call this the “cooking” method.
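To make the “cooking” method concrete, here is a minimal sketch of keyword-based matching in Python. It is a hypothetical illustration, not the query syntax of any particular tool: the term lists, the `matches_query` helper, and the sample posts are all invented. Notice that every synonym, misspelling, and exclusion has to be typed in by hand.

```python
import re

# Hypothetical hand-built keyword lists: every synonym and misspelling must be typed in.
INCLUDE_ANY = ["quit", "quitting", "qiut", "stop smoking", "kick the habit"]
EXCLUDE_ANY = ["quit my job", "quit work"]


def matches_query(text, include_any, exclude_any):
    """Boolean match: (term1 OR term2 OR ...) AND NOT (excl1 OR excl2 OR ...)."""
    lowered = text.lower()

    def has(term):
        # Match the term as a whole word or phrase, not inside a longer word.
        return re.search(r"\b" + re.escape(term) + r"\b", lowered) is not None

    return any(has(t) for t in include_any) and not any(has(t) for t in exclude_any)


posts = [
    "Day 3 of trying to quit. Wish me luck!",
    "I finally kicked the habit last year.",
    "Thinking about whether I should quit my job",
]
for post in posts:
    print(matches_query(post, INCLUDE_ANY, EXCLUDE_ANY), "-", post)
```

The second post slips through because “kicked the habit” isn’t on the list, which is exactly the maintenance burden described above.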

Rest assured, there is an easier way. Concept-based text analytics lets you upload your data to the solution, and after a few minutes of processing it begins to derive insights on its own. Sounds much easier, right? Concept-based text analytics saves you valuable time: instead of telling a system what to search for, you let the solution discover new insights for you. Let’s call this the “ordering-in” method.
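The post doesn’t describe how Luminoso’s concept-based analysis works internally, so the following is only a toy illustration of the general idea, using a deliberately simple swapped-in technique: represent each word by the words it shares documents with, then rank other words by cosine similarity. Nobody supplies a keyword list; related terms fall out of the data. The `related_terms` helper and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(docs):
    """Map each word to a Counter of the other words that share a document with it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        words = set(doc.lower().split())
        for w in words:
            for other in words:
                if other != w:
                    vectors[w][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_terms(word, docs, top_n=3):
    """Rank other words by how closely their co-occurrence profile matches `word`'s."""
    vectors = cooccurrence_vectors(docs)
    target = vectors.get(word, Counter())  # .get avoids mutating the dict while iterating
    scores = {w: cosine(target, vec) for w, vec in vectors.items() if w != word}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


docs = [
    "the ad made me sad",
    "the ad made me depressed",
    "so sad and depressed after that ad",
    "that joke was funny",
    "funny joke lol",
]
print(related_terms("sad", docs, top_n=1))  # ['depressed']
```

Real concept-based systems use much larger corpora and richer representations; this sketch only shows why no hand-built term list is needed.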

A perfect example is the work that one of our clients, the Health Media Collaboratory at the University of Illinois at Chicago, performed using Luminoso’s concept-based text analytics. To analyze 140,000 tweets, they simply uploaded the data to the solution, without having to pre-program it. The most important themes were delivered to them: “ordering in” their insights, if you will.

Stop telling your software what to do and stop wasting money. Go with a concept-based text analytics platform. You’ll save your business thousands of dollars, spare yourself time and painful labor hours, and be more effective at finding the insights you’ve always been looking for. Don’t get me wrong, I love to cook sometimes… but you get the point.

Luminoso Software Updates

Another few weeks have passed, which means our development team has made parts of our solution better and added some new features.

Check out our most recent software update here, highlighting improvements to the following:

  • Improvement to upload of subsets
  • Dashboard CSV downloads per subset
  • Enhanced approach to numbers in documents

As always, if you ever have any questions, comments or feedback about your Luminoso experience, or want to know more about our solutions and software updates, please feel free to reach out to us at support@luminoso.com.

Luminoso’s Co-founder & CEO Interviewed on BBC’s Newsday

You might have recently seen an article about us in the Wall Street Journal that discussed our monitoring and analysis of the global social media conversation about the Ebola virus.

Well, the BBC caught wind of our work and asked Catherine Havasi, a co-founder and CEO of Luminoso, to speak on its program, Newsday.

You can hear Catherine in her interview here!

Luminoso Software Updates

As you may know, the development team at Luminoso works tirelessly to make sure that our solutions are doing what they should without any hiccups. When we make improvements and fix things, we want you to be with us every step of the way.

Check out our most recent software update here, highlighting improvements to the following:

  • Retrieving Documents in Bulk via API
  • Improved Collocation
  • Improved API Documentation

If you ever have any questions, comments or feedback about your Luminoso experience, or want to know more about our solutions and software updates, please feel free to reach out to us at support@luminoso.com.

Client Guest Blog Post: Text Analytics for Tobacco Control Research – University of Illinois at Chicago’s Experience with Luminoso

The following is a reposted blog by Hy Tran discussing Luminoso, which UIC used to analyze a collection of tweets about the Centers for Disease Control and Prevention’s (CDC) Tips From Former Smokers (Tips) II campaign.

In 2013 the CDC launched the second round of Tips From Former Smokers, a national media campaign that used the real-life stories of smokers suffering from the health consequences of smoking. A hallmark of Tips is its use of very graphic images to elicit strong emotional responses.

Our challenge was to analyze 140,000 Tips-relevant tweets to see how Twitter users react to Tips messages. Specifically, what emotions do the ads make people feel? Do people accept or reject the messages? To help answer these research questions, we turned to the text analytics software Luminoso.

Luminoso offers several helpful features: it is cloud-based, so any computer with an internet connection can access the analysis; it has a friendly user interface, so users need not be familiar with natural language processing (NLP) to use it; and it comes with a number of handy tools for visualizing the complexity of the corpus. Below is the ConceptCloud, a main part of the user interface. Users can manipulate the cloud directly to extract information and create visualizations.


Figure 1 – Concept Clouds representing associations between concepts found in the CDC Tips Tweets Corpus. Each concept is a word or phrase that has a certain meaning. Concepts that appear more often are displayed in bigger fonts. Concepts that are strongly correlated are located close to each other. Related concepts are displayed in various shades of the same color; e.g., the concepts in light green are about Terrie, the woman with a hole in her throat.

What really sets Luminoso apart is that it comes with a built-in network of “common sense” knowledge, which allows the machine to understand implicit connections in language that we humans take for granted. In Luminoso’s own words, this tool recognizes the difference between “I find your pants disturbing” and “I find your lack of pants disturbing.” To understand the difference, one needs the common sense knowledge that people are expected to wear pants. Humans take this kind of knowledge for granted and never spell it out in day-to-day conversation. Computers, however, do not have such knowledge; someone has to teach it to them first. Any attempt by a machine to analyze human language would be incomplete unless the computer possesses an equivalent body of common sense knowledge.

Luminoso acquires common sense knowledge from ConceptNet, a semantic network built by the MIT Media Lab. It then uses a technique known as AnalogySpace to combine “big picture” knowledge from ConceptNet with specific knowledge learned from the text data at hand. The technical details behind ConceptNet and AnalogySpace are beyond the scope of this blog post. Interested readers are encouraged to start with this paper, which provides a comprehensive introduction to these topics. For further reading, please refer to this publication list by Catherine Havasi, one of the active developers of ConceptNet.

Our challenge with the Tips data was to extract sentiments from text and see which sentiments are strongly related to each other. Sentiments are difficult to analyze: they can be hard to define, and understanding them requires a good deal of implicit knowledge. Luminoso’s ability to incorporate common sense knowledge made it attractive for our purposes.


Table 1 – Correlation among some Sentiment Concepts found in Tips Corpus. Each row and column indicates a concept. The numbers are correlation coefficients between concepts.

For example, Luminoso understands that sentiments such as “depressing” and “sad” are strongly correlated, whereas “sad” and “funny” are not correlated. While this seems obvious to a human reader, it is no simple task to teach such ideas to a machine. This kind of understanding, though simplified, can serve as a good starting point for further development in sentiment analysis.
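Table 1 reports correlation coefficients between concepts. The post doesn’t say exactly how Luminoso computes them, so as a hedged illustration, here is the standard Pearson correlation coefficient applied to invented per-tweet flags (1 if a tweet expresses the sentiment, 0 if not). The data are made up to mirror the pattern described: “sad” tracks “depressing” and is unrelated to “funny”.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one entry per tweet, 1 = sentiment present, 0 = absent.
sad = [1, 1, 0, 1, 0, 0, 1, 0]
depressing = [1, 1, 0, 1, 0, 0, 0, 0]
funny = [1, 0, 0, 1, 1, 0, 0, 1]

print(round(pearson(sad, depressing), 2))  # 0.77
print(round(pearson(sad, funny), 2))  # 0.0
```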


One of our main goals was to understand how people react to the graphic depictions of the consequences of smoking shown in Tips ads. The table below provides correlations between specific smoking-related diseases and sentiment reactions.


Table 2 – Correlation between smoking consequences (found by Search Rule) and sentiments (found by Luminoso).

Here we can see a strong positive correlation between ads showing patients with tobacco-related cancer and sadness. This indicates that talking about cancer elicits sad emotions in the audience, which may be a sign that people are engaged with this message. Likewise, it seems that a stoma can freak people out and creep them out, suggesting a fear response to the effects of disease rather than a humor response. COPD and stroke have weak correlations with sentiments, suggesting that ads featuring these conditions provoke a less emotional response.
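Table 2 pairs smoking consequences found by a Search Rule with sentiments found by Luminoso. The rule format isn’t given here, so this is a hypothetical sketch: each condition gets a hand-written regular expression, each tweet gets a 0/1 flag, and the flags are compared with a sentiment flag using the phi coefficient, which is Pearson’s correlation for two binary variables. The patterns, tweets, and sadness labels are invented stand-ins.

```python
import math
import re

# Hypothetical search rules: one hand-written pattern per smoking consequence.
SEARCH_RULES = {
    "cancer": re.compile(r"\bcancer\b", re.IGNORECASE),
    "stoma": re.compile(r"\b(stoma|hole in (her|his|the) throat)\b", re.IGNORECASE),
}


def phi(flags_a, flags_b):
    """Phi coefficient between two equal-length lists of 0/1 flags."""
    n11 = sum(1 for a, b in zip(flags_a, flags_b) if a and b)
    n10 = sum(1 for a, b in zip(flags_a, flags_b) if a and not b)
    n01 = sum(1 for a, b in zip(flags_a, flags_b) if not a and b)
    n00 = sum(1 for a, b in zip(flags_a, flags_b) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined when either variable is constant; this sketch reports 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


tweets = [
    "That cancer ad was so sad",
    "Watching the cancer story, really sad",
    "The woman with a hole in her throat freaks me out",
    "Stoma ad is creepy",
    "New phone who dis",
]
# Hypothetical sentiment labels (1 = tweet expresses sadness), standing in for Luminoso's output.
sad = [1, 1, 0, 0, 0]

for condition, pattern in SEARCH_RULES.items():
    flags = [1 if pattern.search(t) else 0 for t in tweets]
    print(condition, round(phi(flags, sad), 2))
```

With these five toy tweets the script prints `cancer 1.0` and `stoma -0.67`; the perfect correlation is an artifact of the tiny, hand-picked sample.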

We also have concepts such as “laugh” and “funny.” A positive correlation with these concepts could indicate that audiences are rejecting the message. In general, correlations between the ads and these concepts are weak. The moderately negative correlations between “lol” and “funny” and the stoma ads suggest that the seriousness of the diseases caused by smoking is relatively well accepted among audiences.

Another useful feature of Luminoso is the ability to correlate a topic with users’ metadata. Below is an example with topics related to Terrie – one of the former smokers featured in Tips whose story was the most prominent topic of discussion in our corpus.

Figure 2 – Correlations between Topics about Terrie (vertical axis) and Number of Followers of the tweet authors (horizontal axis).


The trend indicates that people with lower numbers of followers are more likely to talk about Terrie ads than those with many followers. Since people with fewer followers tend to be organic as opposed to commercial users, this indicates that Terrie ads managed to engage organic audiences.
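Figure 2 relates a topic to the follower counts of tweet authors. As a hedged sketch of one way to build this kind of view (not necessarily Luminoso’s method), the code below groups tweets into order-of-magnitude follower buckets and reports the share of tweets in each bucket that mention a topic word. The bucket scheme, the helper names, and the sample tweets are assumptions.

```python
from collections import defaultdict


def follower_bucket(followers):
    """Label a non-negative follower count with its order-of-magnitude bucket."""
    exponent = len(str(followers)) - 1  # 0 for 0-9, 1 for 10-99, 2 for 100-999, ...
    low = 0 if exponent == 0 else 10 ** exponent
    return f"{low}-{10 ** (exponent + 1) - 1}"


def topic_share_by_bucket(tweets, topic_words):
    """For each follower bucket, the fraction of tweets that mention any topic word."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for followers, text in tweets:
        bucket = follower_bucket(followers)
        totals[bucket] += 1
        if set(text.lower().split()) & topic_words:
            hits[bucket] += 1
    return {b: hits[b] / totals[b] for b in totals}


# Hypothetical (follower count, tweet text) pairs.
sample = [
    (12, "terrie ad made me cry"),
    (45, "that terrie commercial though"),
    (80, "going to the gym"),
    (5400, "new quit-smoking resources for our state"),
    (7200, "terrie shares her story in the latest tips ad"),
]
for bucket, share in topic_share_by_bucket(sample, {"terrie"}).items():
    print(bucket, round(share, 2))
```

Order-of-magnitude buckets are a common choice because follower counts span several powers of ten; the real figure may well use a different binning.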

We can also see if people talk about quitting smoking after viewing the ads.


Figure 3 – Correlations between Topics about Not Smoking (vertical axis) and Number of Followers of the tweet authors (horizontal axis).

“Not Start” refers to users telling others not to start smoking. “Stop Smoking” refers to people telling others to stop smoking. “Quit Smoking” refers to discussion about quitting smoking. Note the very large correlation of Quit Smoking talk for users with large follower counts. However, these are mainly institutional users, such as state public health agencies, giving advice about how to quit. It appears that people with fewer followers (likely to be organic users) are motivated to tell others to stop smoking after seeing these ads.

These insights, drawn from over 140,000 Tweets about the CDC Tips campaign, would not be possible using human coders alone. Thus Luminoso proves to be a promising tool for analyzing sentiment in social media data. These analyses reveal important differences in emotional responses to different kinds of ads, show how people talk about smoking after viewing the ads, and provide further support for running graphic anti-smoking media campaigns in the future.

This blog post was written by Research Assistant Hy Tran, a doctoral candidate in biostatistics.

Updates: Timelines, Subsets and Vectors

Here at Luminoso, we are committed to consistently improving the science of our system. Over the weekend, a few upgrades were made live in our Dashboard.

As an example to help show what the new updates look like, we’ve taken a few screenshots from our Amazon Kindle Fire project.

Timeline Accuracy

Topic timelines are now based on all of the documents instead of using a sampling, making them more accurate than ever before.
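The post doesn’t describe how timelines are computed, so here is a hypothetical sketch of why using every document matters: `topic_timeline` computes, for each day, the share of documents mentioning a topic word, and `sampled_topic_timeline` runs the same calculation on a random sample. Both helpers and the sample reviews are invented for illustration.

```python
import random
from collections import defaultdict
from datetime import date


def topic_timeline(docs, topic_word):
    """Per-day share of documents mentioning topic_word, computed over every document."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in docs:
        totals[day] += 1
        if topic_word in text.lower().split():
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


def sampled_topic_timeline(docs, topic_word, k, seed=0):
    """The same calculation on a random sample of k documents, for comparison."""
    sample = random.Random(seed).sample(docs, k)
    return topic_timeline(sample, topic_word)


# Hypothetical (date, review text) pairs.
reviews = [
    (date(2013, 11, 1), "battery life is great"),
    (date(2013, 11, 1), "screen is sharp"),
    (date(2013, 11, 2), "battery died fast"),
    (date(2013, 11, 2), "battery is fine"),
    (date(2013, 11, 3), "love the screen"),
]
print(topic_timeline(reviews, "battery"))
print(sampled_topic_timeline(reviews, "battery", k=2))
```

With only two sampled reviews, at least one of the three days must drop out of the sampled timeline, while the full-data version covers all three.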


The scaling of topic-subset correlations has also been improved, so users can now make direct comparisons to topic timelines, in addition to more accurate comparisons between subsets.

Previous Timelines and Subsets

(Screenshots: old timeline and old subsets)

New Timelines and Subsets

(Screenshots: new timeline and new subsets)


We’ve increased the accuracy of our vectors (the fundamental representation on which our visualizations and statistics are based), resulting in small improvements in the accuracy of correlation values.