Analysing Glassdoor Reviews of the major tech firms


We started to focus on analysing employee text feedback five years ago, when we realised that executives were far more likely to act on text feedback than on the analyses of other survey questions. However, interpreting large amounts of text feedback is difficult, both in terms of the time required and in spotting key patterns whilst reducing human biases (e.g. remembering only powerfully written statements, or those reflecting the reader’s world-view).

To be as efficient and objective as possible we need to apply analysis to the text. Historically this was done manually: it takes about 25 hours of an analyst’s time to review 1,000 survey answers.

In the dataset we’ve used there are about 135,000 answers across the two questions we’ve analysed. Analysing this manually would obviously take a prohibitively long time, so we apply computers and algorithms instead.

Our background is building inductive models for clients based on their data and the exact questions that they’ve asked. In this instance, however, we’ve applied our new ‘Workometry Standard’ service to code the data in an automated manner, then applied a set of automated analyses to the results. It’s a fair statement that the greatest marginal time requirement here was writing this article based on the results.

The purpose of conducting this analysis was therefore to show what can be inferred from a relatively quick, automated analysis of feedback. The Glassdoor answers are remarkably similar in terms of length and content to those we’d expect to see from a typical employee survey.

I hope that this analysis and write-up gives you some inspiration on the sort of insight you can infer from your employee feedback. It’s certainly not an all-encompassing analysis nor have we shown all we can extract from this data. 

The dataset

The dataset is relatively well known. It comprises about 67,000 reviews from Amazon, Apple, Facebook, Google, Microsoft and Netflix. We downloaded it from Kaggle. Other people have conducted exploratory analysis of this data so please have a look at these.

To give a sense of the size of the dataset, there are over 1.8 million words across the two questions. The average answer length is about 24 words, with the longest answer being just under 1,000 words long.

It’s worth noting that this dataset is now relatively old, so the themes identified here relate to the information in the dataset, not necessarily the current reality of working for these firms. Glassdoor has seen an accelerating number of reviews over the last few years. The purpose of this article is not to critique working at these firms but instead to show how feedback can be analysed and insight created from a relatively simple set of feedback questions.

The dataset is relatively clean. It only needed a small amount of data-cleansing to process and analyse - e.g. changing ‘none’ to NA. Many of the reviews had a location in free text, so we geocoded this data via the Google Maps API, which also cleaned and standardised these labels. This enables us to map the data and also to analyse comments and topics across geographies if we wished.

Locations of Glassdoor reviewers. Some locations only state country so a central point is provided.

Workometry Standard

Workometry Standard is our new text classification model which fits the two questions most used in employee engagement surveys: ‘What is the best thing about working for this company?’ and ‘What could we do to improve working for this company?’

The model we use is ‘symmetrical’ - that is, we use the same categories for both the positive and negative questions, which provides additional post-classification opportunities. The model currently has 125 different categories. It is not a hierarchical model, as we believe grouping topics depends on a number of factors such as objective and even organizational structure. Instead we typically use a bottom-up clustering approach to provide a hierarchy based on topic usage.

Workometry Standard is suitable for datasets from 500 responses and typically has a misclassification rate of less than 1%.

Text fields

There are 5 fields with text data:

  • Summary

  • Pro

  • Cons

  • Advice to management

  • Job Title

For this analysis I chose to use the ‘Pro’ and ‘Cons’ fields as the data in them maps closely with our new ‘Standard’ employee survey models.

Our strategy for analysing this, and other similar datasets is relatively simple:

  • Clean the data (some spelling correction, translation where necessary, splitting into sentences etc.)

  • Run the prediction models to identify themes in the data. We use a set of binary classification models, one for each theme (“is ‘Training’”, “isn’t ‘Training’”). This is based on an approach which looks at the semantic meaning of the sentence.

  • Use the identified themes as features in a set of other analyses and machine-learning models to reveal patterns between themes in the data.
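
As a rough sketch of the per-theme binary step: the real pipeline uses learned sentence embeddings, but the idea can be illustrated with a toy bag-of-words `embed()` and a nearest-centroid decision rule. The vocabulary, threshold and centroid rule here are illustrative assumptions, not the actual Workometry model.

```python
# Toy sketch of one-vs-rest theme tagging: one binary decision per theme.
# embed() is a stand-in for a real sentence-embedding model; the vocabulary,
# nearest-centroid rule and 0.5 threshold are all illustrative assumptions.
from collections import Counter

VOCAB = ["training", "manager", "salary", "hours", "benefits"]

def embed(sentence):
    """Toy embedding: counts over a tiny fixed vocabulary."""
    counts = Counter(w.strip(".,!?").lower() for w in sentence.split())
    return [float(counts[w]) for w in VOCAB]

def train_centroid(examples):
    """Average embedding of labelled example sentences for one theme."""
    vecs = [embed(s) for s in examples]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def is_theme(sentence, centroid, threshold=0.5):
    """The "is 'Training'" / "isn't 'Training'" style binary decision."""
    v = embed(sentence)
    dot = sum(a * b for a, b in zip(v, centroid))
    norm = sum(a * a for a in v) ** 0.5 * sum(b * b for b in centroid) ** 0.5
    return norm > 0 and dot / norm >= threshold

training_centroid = train_centroid(["great training programme",
                                    "the training was excellent"])
print(is_theme("the training here is superb", training_centroid))  # True
print(is_theme("salary is low", training_centroid))                # False
```

In practice each of the 125 themes gets its own labelled examples and classifier, so one sentence can carry several themes at once.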


How the dataset is classified

The first output from our approach is a simple frequency of topics. Given that we’re looking at two questions, and have different training data for the positive and negative questions to handle the different ways people describe the same topics, we look at the occurrence of themes at a question level.

For the positive question we see the three top themes are:

  • Colleague Quality

  • Benefit Package

  • Salary

Top themes for the ‘Pro’ question

For the ‘cons’ question the top themes are:

  • Managers

  • Hours or shifts

  • Career development

Top themes for the ‘Cons’ question

Overall there is nothing remarkable in either chart. In our experience Managers is always the number one negative theme. If we compare with another sector - this is an analysis of a synthetic dataset built from numerous Pharma industry clients’ ‘What could we improve’ type questions - we see many of the same themes appearing, though sometimes at different proportions or frequencies.

Top themes for a pharma synthetic dataset

This dataset would enable us to compare themes across the different firms - we’ll look at those analyses soon but in a different format. 

Comparing themes across groups

One way of visualising the comparisons between groups is to use a Slopegraph. Here we use the information on whether someone was a current or former employee.

For the positive question we see that Salary, Colleague Quality and Workplace and Location are all more likely to be mentioned as positives by former employees. Career Development, Culture and Work-Life balance are all likely to be mentioned less.

The salary finding might conflict with the sort of information you would collect during exit interviews but, as I’ll explain below, Salary is one of the components typically mentioned as a positive by those who are less engaged.

Differences in mentions between current and former employees - Pro question

On the negative side we see former employees are more likely to mention managers as one of the ‘cons’ than current employees. This is typical of all datasets that we see and maybe supports the view that people join organizations and leave managers. Let’s be clear - at this level and with this data we can’t assign causation without looking at relevant comments.

On the flip side we see that salary and work-life balance are less often mentioned by former employees than current employees. At the same time ‘Hours and Shifts’ is more often mentioned. Why could there be such a difference between hours and work-life balance?

Differences in mentions between current and former employees - Cons question

The key aspect to consider is that not all these firms employ the same types of employees. Amazon has large warehouse and distribution functions. Apple has an extensive retail operation. In both types of operation employees typically have less control over their working hours, and both will typically require employees to work anti-social hours.

Work-life balance as a concept is much more likely to be aligned to roles where employees have, or expect, control of their working day. In this way complaining about work-life balance can be seen much more as a feeling that the reviewer isn’t able to determine the right balance for them.

It is worth noting, though, that work-life balance is less likely to be mentioned as a positive by former employees, but at the same time they’re less likely to say it’s a negative.


Who is saying what?

The following visualisation is one of the two most-anticipated visualisations by our clients when we present their data. (The other being a similar plot but showing topics by change in engagement at the individual level since their last survey)

On the x axis we show the balance between the proportion of mentions in the positive question and the proportion of mentions in the negative question. A bubble to the right indicates that the theme is proportionally mentioned more in the Pro question; a bubble to the left indicates it’s mentioned more in the Cons question.

On the y axis we show the average (mean) overall rating by those individuals mentioning that theme. The horizontal orange line is the average rating for the dataset as a whole.

Plot of themes by question and sentiment
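
As a minimal sketch of how the two coordinates for one bubble could be derived: the exact x-axis definition isn’t spelled out above, so the share-difference used here (the theme’s proportion of Pro mentions minus its proportion of Con mentions) is an assumption, and the counts and ratings are invented.

```python
# Hypothetical computation of one bubble's (x, y). Assumption: x is the
# theme's share of all Pro mentions minus its share of all Con mentions;
# y is the mean overall rating of reviewers mentioning the theme.
def bubble_xy(pro_counts, con_counts, theme, mentioner_ratings):
    x = (pro_counts[theme] / sum(pro_counts.values())
         - con_counts[theme] / sum(con_counts.values()))
    y = sum(mentioner_ratings) / len(mentioner_ratings)
    return x, y

pro = {"Salary": 30, "Culture": 70}   # made-up mention counts
con = {"Salary": 50, "Culture": 50}
x, y = bubble_xy(pro, con, "Salary", [2, 3, 4])
print(round(x, 2), y)  # -0.2 3.0 -> left of centre: mentioned more as a Con
```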

We can see some patterns in the 4 quadrants:

In the top right quadrant we can see themes that those providing a higher overall rating than the average are commenting on. As well as ‘Great Company’ we see themes such as ‘Empowerment’, ‘Making an impact’, ‘Innovation’ and ‘Colleagues that help you’.

Moving to the bottom right quadrant we look at themes mentioned as positives by those providing lower overall ratings. This group focuses on transactional items such as ‘Staff discounts’, ‘Health Insurance’ and ‘Stock Options’.

In the top left we see themes mentioned as Cons by those more positive than the average. The highest ranked theme is ‘Nothing’, which is when someone says ‘there are no cons working here’. As in most datasets, we see themes that can be read as issues getting in the way of someone performing at their best. Examples include ‘Complexity and Simplification’, ‘Bureaucracy’, ‘Change Management’ and ‘Big Company’.

In the bottom left corner we see a range of topics mentioned as cons by those providing lower than average ratings. The bottom theme is ‘Bullying’ but this group also talk about ‘Politics’, ‘Micromanagement’, ‘Performance Management’ and ‘Job Security’.

This type of analysis enables you to review topics in the context of the sentiment of those mentioning them.

Co-occurrence and clustering

Those who have seen previous analyses that we’ve conducted will recognise the following types of plots.

People do not mention themes independently of each other. Therefore we like to understand how themes are used together in the same answers. We do this not by the absolute number of mentions but via the likelihood that there is a strong relationship between the two themes. The thickness of the edges therefore reflects this probability, not the actual frequency with which they are linked. For analysis and understanding purposes we filter edges based on this probability and drop themes with no strong relationships.

After creating the network we use community detection to identify groups of themes that are linked through shared usage. One way of thinking about this is that we’re creating needs- or topic-based segments of the population. It’s also a nice way of doing dimensionality reduction, as our standard 125 themes are too many to understand and remember.
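
The co-occurrence-and-grouping idea can be sketched as follows. The real model filters edges on an association probability and uses proper community detection; in this sketch raw pair counts and connected components stand in for both, and the tagged answers are invented.

```python
# Sketch: count theme pairs per answer, keep strong edges, group the themes.
# The real model filters on an association probability, not raw counts, and
# uses community detection rather than the connected components used here.
from collections import Counter
from itertools import combinations

def strong_edges(tagged_answers, min_count=2):
    """Pairs of themes co-occurring in the same answer, kept if frequent."""
    pairs = Counter()
    for themes in tagged_answers:
        for a, b in combinations(sorted(set(themes)), 2):
            pairs[(a, b)] += 1
    return {e: c for e, c in pairs.items() if c >= min_count}

def components(edges):
    """Group themes linked through shared usage (stand-in for communities)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        groups.append(group)
    return groups

answers = [["Salary", "Benefits"], ["Salary", "Benefits"],
           ["Managers", "Feedback"], ["Managers", "Feedback"], ["Salary"]]
groups = components(strong_edges(answers))
print(groups)  # two clusters: {Salary, Benefits} and {Managers, Feedback}
```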

This is a bottom-up method of creating a hierarchy based on usage. Another way of developing a hierarchy would be to use a top-down approach based on domain knowledge. There are advantages and disadvantages to both, and which you choose will depend on what you’re trying to achieve.

For the Pros concepts you can see we create 10 different groups, with several groups being prominent. In the large dark-pink group we can see a collection of components of the broader benefits package. In addition here we see two interesting links. We see that staff discounts are closely linked to Retail, which in this dataset suggests they’re likely to be relating to those working at Apple. We also see that ‘Good Experience’ is often mentioned alongside ‘Training or Enablement’.

Co-occurrence network of themes in the Pros question

In the large green element we see elements relating to the firm and its beliefs, with Leadership at the heart. We see, through the connection with the peach-coloured segment, that the link between leadership and managers runs through feedback and communication.

Co-occurrence network of themes in the Cons question

On the negative side we can see a few notable patterns. Diversity and Inclusion isn’t being linked to things that relate to Human Resources but instead to Politics. It’s in the same group as Innovation, Decision-Making and Making an Impact.

We also see that Performance Management is not only linked to Feedback & Bonuses, which I suspect we’d expect, but also to Collaboration. Without digging into these comments, I would expect such a link to occur if a performance management system had been designed in a way that encouraged competition between employees, and by implication created incentives that discouraged collaboration. We know that some of these firms used a forced-ranking type approach during this period, well known for these issues.

Who is most likely to be in which group?

Having built each segment, the obvious next question is to identify who is most likely to be in each group.

On the positive network we can see a group comprising “Brand and Reputation”, “IT or Technologies”, “Products and Services” and “Marketing”. Here we can see that the following groups were more likely to be mentioning this theme.

Which groups are most likely to mention cluster 7: showing how many more times than expected each group mentions the theme. Filtered to only show significant groups.

Which topics are most likely to be used by certain groups

Instead of starting with a theme, or cluster of themes, and mapping the other variables against that to find who is most likely to be mentioning a topic, we can start with a group and map which topics this group is more likely to be mentioning.

When the number of groups is 2 the slopegraph above is probably a good choice. When there are more than 2 classifications we tend to use a heatmap to highlight which themes are most and least likely to be mentioned by each group.

These heatmaps currently show the most frequently mentioned themes in the dataset. We use a model similar to the one used to create the co-occurrence networks above to identify the chance that a group and a theme are closely linked. In the heatmap, blue indicates a strong link - the theme is mentioned more than would be expected - and red indicates that the theme is rarer than would be expected.

We can also cluster the themes and groups to bring together those themes which are similar.

Themes used by overall rating

One analysis that we usually advise a client to conduct is to look at which themes are most likely to be mentioned by engaged or disengaged employees. Here we can use the overall rating as a proxy for engagement.

For ordinal (and continuous) variables such as this we tend to create quartiles and then compare the bottom and top quartiles. Given there are only 5 possible levels, the data doesn’t split into perfect quartiles, so we create the next-best split, here with about 30% of reviews in both the top and bottom groups.
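
A sketch of that ‘next-best quartile’ cut: pick the rating level whose cumulative share is closest to the 25% target. The exact rule we use isn’t described above, so this minimum-error version, and the example distribution, are assumptions.

```python
# Illustrative 'next-best quartile' cut on a 5-level rating: choose the
# level whose cumulative share lands closest to the 25% target.
def bottom_cut(ratings, target=0.25):
    n = len(ratings)
    best, best_err = None, float("inf")
    for r in sorted(set(ratings)):
        share = sum(1 for x in ratings if x <= r) / n
        if abs(share - target) < best_err:
            best, best_err = r, abs(share - target)
    return best

# Made-up distribution of 100 overall ratings (1-5 stars)
ratings = [1] * 10 + [2] * 20 + [3] * 25 + [4] * 25 + [5] * 20
print(bottom_cut(ratings))  # 2 -> 'bottom quartile' = 1-2 stars, ~30% of reviews
```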

The positive question themes group in this analysis in a similar manner to the views above - i.e. topics such as salary and benefits are appreciated by those providing the lowest ratings, and less functional or transactional topics such as culture, innovation and leadership are mentioned by those who rate higher.

Here we can see from the top dendrogram (the tree) that the middle and top groups are more similar to each other than they are to the lowest raters, which is a common pattern across most similar questions.

Heatmap showing themes mentioned by overall rating, Pro question

On the negative side there is still a link between the top ‘quartile’ and the middle group, but there is more overlap between the topics mentioned by the middle group and those providing low ratings.

Both “Speed and Pace” and “Big Company” are typical of those giving high ratings. These individuals tend to complain about things that get in the way of them performing. The category ‘nothing’ is used when the individual states that there is nothing negative, so again this is expected. Food and catering feels a bit like reviewers are trying to think of something to say rather than it being a real cause for concern.


Heatmap showing themes mentioned by overall rating, Cons question

By employer

Looking at topics by company provides a great way of seeing the differentiating factors of organizations.

In the positive chart it’s interesting to see that the younger organizations are more similar to each other than they are to the more mature organizations. Culture (and food) features more prominently as a differentiator for these organizations.


Heatmap showing themes mentioned by company, Pro question

On the negative side a key point is that Amazon - with its large number of warehouse staff - and Apple - with its large number of retail staff - are the most closely related. As mentioned above we see topics such as Hours and Shifts appearing here, along with communication and training. I suspect this reflects the very different working environments that these organizations have.

Heatmap showing themes mentioned by company, Cons question



On the other side we see that Microsoft and Google are both more likely to have employees who complain about it being a large company and about the organization structure being an issue. Processes and Politics both seem to be Microsoft issues.

Colleague quality as a negative

There are a lot of differences between this dataset and the ones we usually analyse. One of the more striking ones is that ‘Colleague Quality’ appears so high among the issues.

‘Colleague Quality’ is a theme which we usually see in positive comments. It might be expressed as ‘I get to work with some really smart people’ or ‘my colleagues are wonderful’. In this dataset we see the same thing in the positive comments - “Nice people and managers” or “very inspiring engineers to work with.”

In this dataset we see Colleague Quality being mentioned as a negative, especially at Google, Microsoft and Facebook. Looking at the comments quickly reveals two issues:

  • An issue with people being overqualified for the tasks that they’re being asked to complete

  • A difficulty in getting a promotion as there are so many talented people ‘competing’ for each role

Exploring themes

One goal when we analyse text is to provide as much summarising information about the themes and their usage as possible, without forcing the reader to dive into the raw answers. We therefore typically provide an extensive appendix with key information about each theme and some highlighted examples.

n-grams

Our models are effectively ‘black box’ models: we feed the algorithms training examples, these are converted into embeddings (numerical vector representations) and the ML builds a classification based on these embeddings.

One important way of understanding what is in the sentences that have been coded with a theme is to look at the combinations of words - n-grams - that are specific or typical within these answers.

To do this we use a modified TF-IDF algorithm. From this analysis we can identify key combinations that are common in the text. The following shows how complaints about Colleague Quality are about working with bright people.

nGrams mentioned in sentences coded Colleague Quality (Cons)
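
A bare-bones version of the theme-level n-gram scoring: plain textbook TF-IDF over bigrams rather than our modified version, with each theme’s coded sentences concatenated into one document. The example sentences are invented.

```python
# Plain TF-IDF over bigrams, treating each theme's coded sentences as one
# 'document'. The article uses a modified TF-IDF; this is the vanilla form.
import math
from collections import Counter

def bigrams(text):
    words = text.lower().split()
    return [" ".join(p) for p in zip(words, words[1:])]

def top_bigrams(docs_by_theme, theme, k=3):
    """Rank bigrams in one theme-document against the other theme-documents."""
    tf = Counter(bigrams(docs_by_theme[theme]))
    n_docs = len(docs_by_theme)
    scores = {}
    for g, count in tf.items():
        df = sum(1 for d in docs_by_theme.values() if g in bigrams(d))
        scores[g] = count * math.log(n_docs / df)  # tf * idf
    return [g for g, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

docs = {
    "Colleague Quality (Cons)": "too many smart people so many smart people here",
    "Salary (Cons)": "pay is low and pay is not fair",
}
top = top_bigrams(docs, "Colleague Quality (Cons)")
print(top)  # 'many smart' and 'smart people' score highest
```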

Words by frequency and sentiment

The analysis above looks at words that are specific to the theme. It is also useful to look at all words in these answers.

Instead of removing stop words, as would be typical in text analysis, our first stage is to identify the part of speech associated with each word and then to filter to relevant word classes. We then plot these words by frequency - x axis - against the average rating - y axis - of those mentioning them.

Words mentioned by frequency and sentiment
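
The data behind such a scatter can be computed simply: for each word, the number of answers using it and the mean rating of those reviewers. A real pipeline would POS-tag first (e.g. with spaCy) and keep only the relevant classes; a tiny stop-word set stands in for that step here, and the example answers are invented.

```python
# Per-word scatter data: x = how many answers use the word, y = mean overall
# rating of those reviewers. The stop-word set is a stand-in for POS filtering.
from collections import defaultdict

STOP = {"the", "is", "a", "and", "are"}

def word_points(answers_with_ratings):
    """answers_with_ratings: list of (answer_text, overall_rating) pairs."""
    by_word = defaultdict(list)
    for text, rating in answers_with_ratings:
        for w in set(text.lower().split()) - STOP:
            by_word[w].append(rating)
    return {w: (len(rs), sum(rs) / len(rs)) for w, rs in by_word.items()}

pts = word_points([("the managers are bad", 2),
                   ("managers are great", 4),
                   ("great benefits", 5)])
print(pts["managers"])  # (2, 3.0): common word, middling raters
print(pts["great"])     # (2, 4.5): used by happier reviewers
```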

Who is most likely to be mentioning each theme

Our analysis typically includes an analysis of who is most likely to mention each theme. This analysis is almost identical to the analysis that we use for the clusters above.

Co-occurrence of themes

We provide two visualisations related to co-occurrence of the theme. The first is a simple bar chart of the most common other themes mentioned by those mentioning the theme in question - in this instance, Colleague Quality.

Themes mentioned in answers mentioning Colleague Quality

The second is a version where we show the co-occurrences of themes most different from what would be expected. This uses the same underlying data as the co-occurrence network.

Themes mentioned in answers mentioning Colleague Quality, showing those most different from the number expected

Example sentences

Finally we identify a selection of sentences that represent the range of answers mentioning the theme in question. To do this we use an amended text summarisation algorithm, treating the class as a document. Typically we’ll show around 10 sentences, aiming for 10 different sentences that cover the key usages.
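
One way such a selection could work (the amended algorithm itself isn’t described, so this maximal-marginal-relevance-style rule is an assumption): score each sentence’s similarity to the theme centroid, then greedily pick sentences that are representative but not redundant.

```python
# MMR-style selection over pre-computed sentence vectors: repeatedly pick the
# sentence most similar to the theme centroid, penalised by similarity to the
# sentences already chosen. Illustrative rule, not the actual algorithm.
def mmr_select(vectors, k=2, lam=0.5):
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = sum(a * a for a in u) ** 0.5
        nv = sum(b * b for b in v) ** 0.5
        return dot / (nu * nv) if nu and nv else 0.0

    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    chosen = []
    while len(chosen) < k:
        best, best_score = None, -2.0
        for i, v in enumerate(vectors):
            if i in chosen:
                continue
            redundancy = max((cos(v, vectors[j]) for j in chosen), default=0.0)
            score = lam * cos(v, centroid) - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

# Three toy sentence vectors: the first two are near-duplicates
picked = mmr_select([[1, 0], [1, 0.1], [0, 1]], k=2)
print(picked)  # [1, 2]: the most central sentence, then a diverse one
```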

Conclusion

This analysis was conducted to highlight some of the ways that employee text feedback can be analysed and the sort of ways we can visualise text data. It’s far from complete.

It is worth remembering that we’ve really only looked at two questions. Most companies will have richer data from a typical employee survey, and many of our clients have larger datasets. This is just scratching the surface, but I think it’s a good starting point, which is why we decided to create a cost-effective automated service to enable everyone to have access to at least this level of analysis.

Frequently we hear from clients that analyses like this encourage their teams to collect more text-based feedback and hence make better decisions. I hope that sharing this will encourage People Analytics teams to be more confident asking for employee voice, knowing that making sense of the answers is possible.

If you have any questions or comments on this analysis please get in touch.