One of the most frequent questions that we are asked is whether Workometry does sentiment analysis. The answer is ‘yes, sort of’.
Thinking about measurement
All measurement has some form of measurement error. What measurement you choose should always depend on a balance between the cost (including effort) of the measurement and the benefit (including desired accuracy).
Think of weighing something. You might have access to two different types of scale:
An electronic kitchen scale
A laboratory scale
If your aim is to weigh some butter to bake a cake then you’d likely reach for the former. However, if you are weighing compounds for a medicine you’d likely go straight to the latter.
The question of which tool to use therefore depends on several factors including the level of accuracy you need and what you have available. It’s also worth determining whether one measurement is acceptable or whether you want to take multiple and, for example, take an average.
Sentiment analysis is a set of techniques used to infer an individual’s sentiment from the text that they’ve written. It’s a classification task where the classes are usually positive, negative or neutral. At the simplest it uses a dictionary of positive words and negative words. More sophisticated methods use various machine learning techniques to rate or classify text.
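The dictionary approach can be sketched in a few lines. This is an illustrative toy, not Workometry’s implementation; the word lists and the function name are assumptions, and a real lexicon would be far larger:

```python
# A minimal sketch of the dictionary approach: count positive and negative
# words and compare. The word lists here are tiny illustrative assumptions,
# not a production lexicon.
POSITIVE = {"good", "great", "like", "love", "helpful"}
NEGATIVE = {"bad", "waste", "poor", "hate", "boring"}

def dictionary_sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(dictionary_sentiment("The training was a waste of time"))  # negative
```

Note how crude this is: it ignores negation (“not great”), sarcasm and context entirely, which is why the more sophisticated methods exist.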
As with any form of inference, there is significant inaccuracy in doing this. The inaccuracy will depend on the technique used and the type of text provided. Short text is harder to classify than longer text, and if the text is likely to contain sarcasm the chances are that the inference will be less accurate.
There is some dispute about exactly how accurate the best sentiment analysis techniques are. Because every sentiment analysis algorithm will have been trained on some labelled data set, using it on a different type of data will likely mean lower accuracy. There have been a number of studies on accuracy, and the reported range is mostly at the 65%–80% level.
Whether this will be good enough for you will depend on your use case. If you want to measure a trend over time then the texts misclassified as positive will probably be largely balanced out by those misclassified as negative. If you want to accurately filter the positive sentences from the negative sentences then it might not be good enough.
Is an inaccurate approach the best you have?
If you’re trying to identify sentiment from found data - for example social media text - then using sentiment analysis might be the best you have.
If, however, you are going to ask someone for their opinion you might be able to use a more accurate method. In many surveys this more accurate method is likely to be a well-chosen scale question, such as a Likert question or something like a 0–10 recommendation question.
Sentiment analysis is therefore most useful when there is no other way of assessing the feelings of the individual. If you can ask a scale question, do so.
At what level do you want to do the sentiment analysis?
There are at least three levels to which you can do sentiment analysis. All have their advantages and disadvantages.
You can do sentiment analysis at the document level. This is usually some form of average of the overall sentiment found within the document. In an employee survey this would be at the level of the answer.
Alternatively you can break the answer down into sentences and look at the sentiment of each separately. Finally, you could break the sentences down into clauses or entities and calculate sentiment for each.
Our experience is that as you go down to a lower level the usefulness increases, but unfortunately so does the inaccuracy. This is probably due to context being removed; as always in text analysis, context is highly valuable.
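The difference between document-level and sentence-level scoring can be made concrete with a small sketch. The word lists and `score_words` function below are illustrative assumptions standing in for any sentiment model:

```python
import re

# Sketch of document-level vs sentence-level sentiment scoring.
# The word lists and scorer are illustrative, not a real model.
POSITIVE = {"great", "good", "useful"}
NEGATIVE = {"boring", "waste", "poor"}

def score_words(text: str) -> int:
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

answer = "The content was great. The delivery was boring."

# Document level: one aggregate score for the whole answer.
doc_score = score_words(answer)

# Sentence level: one score per sentence, exposing the mixed opinion
# that the document-level aggregate hides.
sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
sentence_scores = [(s, score_words(s)) for s in sentences]

print(doc_score)        # 0: the positive and negative sentences cancel out
print(sentence_scores)
```

The document-level score of this answer is neutral, while the sentence-level scores reveal one positive and one negative opinion: exactly the extra usefulness (and the extra room for error) that finer granularity brings.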
When you can do both
The astute might be asking themselves ‘What if I can both ask a scale question and then use sentiment analysis on the text?’
This is similar to how we prefer to use sentiment analysis - to help understand additional meaning in texts. We will often use sentiment analysis to flag potentially negative comments in a question which is supposed to draw a positive response (or vice-versa). Our approach is to pass these statements for human review but you could use other ensemble-type methods such as voting.
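The flagging idea can be sketched as a simple filter. Everything here is an illustrative assumption (the word list, the `classify` stand-in and the threshold), not Workometry’s actual pipeline:

```python
# Sketch: flag responses for human review where the text sentiment
# disagrees with a high scale answer. classify() is an illustrative
# stand-in for any sentiment model.
NEGATIVE_WORDS = {"waste", "boring", "poor", "hate"}

def classify(text: str) -> str:
    words = text.lower().split()
    return "negative" if any(w in NEGATIVE_WORDS for w in words) else "positive"

responses = [
    (9, "Loved the new onboarding process"),
    (10, "The training was a waste of time"),   # high score, negative text
    (2, "Poor communication from management"),  # low score, negative text
]

# Flag high scale scores (assumed 0-10) paired with negative-sounding text.
for_review = [(score, text) for score, text in responses
              if score >= 7 and classify(text) == "negative"]
print(for_review)  # [(10, 'The training was a waste of time')]
```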
If you don’t have a scale question it still might make sense to use multiple sentiment analysis approaches to review the same text, using an ensemble approach to increase overall accuracy.
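A simple voting ensemble over several classifiers might look like the following sketch, where ties fall back to neutral (one of several reasonable tie-breaking choices):

```python
from collections import Counter

# Sketch of majority voting over the labels produced by several
# (hypothetical) sentiment classifiers for the same text.
def vote(labels):
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    # Require a strict majority; otherwise fall back to neutral.
    return label if n > len(labels) / 2 else "neutral"

print(vote(["positive", "positive", "negative"]))  # positive
print(vote(["positive", "negative", "neutral"]))   # neutral
```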
Human-level accuracy (inter-annotator agreement)
65%–80% might not seem very accurate for an algorithm, but it’s important to understand what you’re comparing it against. Research has shown that assessing sentiment from a text, especially something short like a tweet, is a task that humans find difficult. In Saif et al. a figure of 65.5% inter-annotator agreement is quoted when using Mechanical Turk for classification, or about the same as algorithmic sentiment analysis.
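Pairwise percentage agreement, the kind of figure quoted in such studies, is straightforward to compute. The labels below are made up for illustration:

```python
# Sketch: simple pairwise percentage agreement between two annotators
# labelling the same five texts (labels are illustrative).
def agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

annotator_1 = ["pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "neg", "pos", "pos", "neu"]
print(agreement(annotator_1, annotator_2))  # 0.6
```

(More careful studies use chance-corrected measures such as Cohen’s kappa, which discount the agreement two annotators would reach by guessing.)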
The advantage of using a system to classify sentiment is likely to be increased consistency of ratings.
Direct vs comparative sentiment
Consider the following two sentences:
“The time management training was a waste of time”
Obviously this sentence would (hopefully) be flagged as having a negative sentiment.
“Today’s time management training was better than the one last week”
Here we know that the training last week was comparatively worse. However, it is harder to determine whether today’s was positive, neutral or even negative. All we know is that it was comparatively better than the previous one.
Explicit vs implicit
We see this distinction frequently in employee texts.
The easiest to understand algorithmically is an explicit statement:
“I really like that our values are used in decision making.”
This, most algorithms would rate positively.
However an implicit statement might be much harder for the algorithm to classify:
“When I started at BIGCORP we were great at using our values in decision making”.
Here it’s obvious that the individual misses how values drove decisions in earlier times. However, an algorithm might identify that ‘great’ was being used with ‘values’ and suggest that the sentiment was positive.
Emotion analysis is very similar to sentiment analysis but is more granular. Hence it should be assumed to be even less accurate than sentiment analysis.
Do you actually need to measure sentiment?
I think that many people who use sentiment analysis do so not because they need it but because it’s available (and often low cost).
If you’re selecting a text analysis approach you should ask yourself whether it is something that you really need. A good way of assessing this is to ask whether you’d have allocated resources for a human to review the texts for sentiment. If the answer is ‘no’, the chances are you don’t need an algorithm to do it. Sentiment analysis is useful, but in most use cases we see with workforce data it is not of great importance.