Using your employee survey data to prioritise Employee Experience projects

With Employee Experience rapidly ascending the priority lists of organisations and HR teams, it can be daunting to know where to start. The purpose of this article is to provide an overview of a pragmatic approach being used by many of our clients.

In the last article, on mixing quantitative and qualitative approaches, I mentioned that one of the best uses of qualitative data is in an exploratory manner at the beginning of a project. This approach, where you start with an open mind and let the data point you towards a promising direction, is perfectly aligned with identifying where to focus with employee experience.

What you need

A rich store of information that few companies have maximised the value of is the free-text comments in their surveys. Typically firms ask broad, open questions such as ‘What is great about working here?’ and ‘What could we do to improve working here?’. Answers to both will contain rich information about the experiences people are having. You might also have employee-lifecycle text, such as comments from exit or recruitment surveys.

In addition to the text, it’s useful to add individual-level metadata to provide some information about who is providing the content. This shouldn’t be names, but attributes such as location, tenure, business function and grade or level. If you’re able to use data such as gender and age then this can be useful too, not least to ensure that you’re providing unbiased recommendations and covering key groups. It’s important that you don’t just recommend what is important for the largest groups.

One piece of data that is highly valuable, if available, is some form of performance metric. Whilst you don’t necessarily want to focus only on high performers, you might want to ensure that the experience you’re providing to your most valuable employees is premium.

What is being said?

The first task is to classify the texts into themes and sub-themes. Think of this as structuring the data: you’re effectively adding one or more tags that will enable you to sort or filter the data and that provide the basis for a range of further statistical analyses.
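As a purely illustrative sketch of what tagging produces, here is a toy keyword matcher. The themes and keyword lists below are invented, and real projects typically use a trained classifier or a dedicated text-analytics tool rather than keyword lookup:

```python
# Toy theme tagger. THEME_KEYWORDS is a made-up example taxonomy;
# a real project would use a trained model, not keyword matching.
THEME_KEYWORDS = {
    "workload": ["workload", "overworked", "hours", "understaffed"],
    "career": ["promotion", "career", "progression", "development"],
    "management": ["manager", "leadership", "micromanage"],
}

def tag_comment(comment: str) -> list[str]:
    """Return the themes whose keywords appear in the comment."""
    text = comment.lower()
    return [theme for theme, words in THEME_KEYWORDS.items()
            if any(w in text for w in words)]

print(tag_comment("My manager is great but the workload is unsustainable"))
# → ['workload', 'management']
```

However the tags are produced, the result is the same: each comment carries zero or more theme labels that everything downstream can sort, filter and count.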

Whilst every comment in an employee survey arguably relates to an experience, not all comments are equally useful.

Many of the answers will be short: a few words or, in many cases, a list of issues. It’s important to analyse these, as they can indicate the breadth of an issue, but on their own they are unlikely to provide enough richness to explain what the problem is, why it is causing concern, what its implications are and, ultimately, how you’d start addressing it.

How many themes are enough? The trick is to be granular enough that you’re not aggregating topics that may need to be dealt with separately, but not so granular that you become overwhelmed.

One option, which we’ve found works well, is to start with quite a granular structure and then begin joining similar topics. It’s likely that you’ll need different groupings depending on the use case and stakeholder group.

There are a number of approaches to joining similar topics together. One option is to sort the categories by frequency of use and then work from the smallest group upwards, adding each to a larger group if you feel it is very similar. If it is unique, keep it on its own or add it to an ‘other’ group. This exercise works well in a group setting, with each theme written on a separate post-it note.
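The mechanical part of this exercise can be sketched as below (the similarity judgements themselves remain human). The frequency table is made up and the 2% cut-off, below which a theme is folded into ‘other’, is an arbitrary assumption:

```python
from collections import Counter

# Made-up theme frequencies from a hypothetical survey.
theme_counts = Counter({
    "workload": 540, "pay": 410, "career": 300,
    "parking": 12, "canteen": 9,
})

def merge_small_themes(counts: Counter, min_share: float = 0.02) -> Counter:
    """Keep themes above min_share of all mentions; fold the rest into 'other'."""
    total = sum(counts.values())
    merged = Counter()
    for theme, n in counts.most_common():   # largest first
        if n / total >= min_share:
            merged[theme] = n
        else:
            merged["other"] += n            # fold the long tail together
    return merged

print(merge_small_themes(theme_counts))
```

In practice you’d review the ‘other’ bucket by hand: a small theme that is genuinely distinct may deserve to stay on its own, exactly as in the post-it exercise.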

Another technique we really like is co-occurrence analysis: we use a probabilistic approach to find the pairs of themes most likely to be used together, which identifies themes that might be candidates for merging.
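One simple probabilistic option, used here purely for illustration, is pointwise mutual information (PMI), which scores how much more often two themes appear together than chance would predict. The tagged comments below are invented:

```python
from collections import Counter
from itertools import combinations
from math import log

# Each set is the themes tagged on one (made-up) comment.
tagged = [
    {"workload", "staffing"}, {"workload", "staffing"},
    {"workload", "pay"}, {"staffing"}, {"pay"}, {"pay", "career"},
]

n_docs = len(tagged)
single = Counter(t for tags in tagged for t in tags)
pairs = Counter(frozenset(p) for tags in tagged
                for p in combinations(sorted(tags), 2))

def pmi(a: str, b: str) -> float:
    """log of how much more often a and b co-occur than independence predicts."""
    p_ab = pairs[frozenset((a, b))] / n_docs
    return log(p_ab / ((single[a] / n_docs) * (single[b] / n_docs)))

# High-PMI pairs are candidates for merging (or at least joint review).
print(sorted(pairs, key=lambda p: pmi(*p), reverse=True))
```

Note that PMI favours rare pairs, so it is usually read alongside the raw counts rather than on its own.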

Identify which topics are causing the most pain

This is why it’s best to capture a quantitative variable such as eNPS, eSat or an ease-of-use score. With this score it’s possible to use statistical techniques to identify which theme or themes seem most linked with the variation in the scores (i.e. which ones, if mentioned, are associated with particularly good or bad scores).

There are a variety of techniques that can be used, and part of the choice will depend on how you asked the rating question. A good option is to build a random forest model and look at feature importance. Another is to use one of the many relative-importance methods. There is no ‘perfect’ method, so it can be useful to try a few and aggregate the results (or use a voting method).
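As a minimal illustration of the idea, the simplest possible screen compares the mean score of comments that mention a theme with the mean of those that don’t. This ignores interactions between themes, which is exactly what a random forest (e.g. scikit-learn’s `RandomForestRegressor` and its feature importances) would add; the data below is invented:

```python
from statistics import mean

# Made-up responses: (themes mentioned, eNPS-style score 0-10).
responses = [
    ({"workload"}, 3), ({"workload", "pay"}, 2), ({"pay"}, 6),
    ({"career"}, 8), ({"career"}, 9), (set(), 7),
]

def score_gap(theme: str) -> float:
    """Mean score when the theme is mentioned minus mean score when it is not."""
    with_t = [s for tags, s in responses if theme in tags]
    without = [s for tags, s in responses if theme not in tags]
    return mean(with_t) - mean(without)

# Large negative gaps flag the themes causing the most pain.
for theme in ("workload", "pay", "career"):
    print(theme, round(score_gap(theme), 2))
```

In this toy data ‘workload’ has the most negative gap, i.e. mentioning it is associated with markedly worse scores.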

Identify answers which link cause & effect

One class of answers that are particularly useful are ones that explicitly describe both a problem, and the outcomes of that problem. For example:

“The shortage of staff causes everyone else to have to work twice as hard.”

Here we can see an explicit link between staffing levels and workload. These types of sentences are especially useful when you’re building the story of what is happening, as your stakeholders will be able to understand the implications of the issue.

It is interesting to identify which themes are connected in these sentences, and in which direction. In the sentence above, for example, the staffing issues cause the high workload. Though the reverse might also be true, the sentence doesn’t explicitly claim that high workload causes staffing issues.

Determine co-occurrence of themes

Themes in your text will co-occur with other themes, and looking at the patterns of these relationships will help you understand linkages which might not have been stated as explicitly as in the example above.

Co-occurrence in a survey can be measured at several levels. You could look at co-occurrence within a single answer, but you could also combine answers and look at relationships between questions (e.g. people who like the pension scheme tend to have an issue with parental policies).

[Figure: co-occurrence network showing how themes link for a large firm. The data is taken from a question about “what could we do to improve working here”.]


Look at emotion scores

As we’ve mentioned before, we’re not huge fans of sentiment analysis. Whilst it isn’t reliable enough to sort comments into positive and negative, it can be useful for highlighting sentences worth reviewing when you’re looking for powerfully-written quotes. These will often be the ones with more ‘extreme’ sentiment scores.

The downside is that this risks over-emphasising examples written in certain styles. It can therefore help you find the right examples, but it shouldn’t be the only approach used.
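A toy sketch of the idea, using an invented mini-lexicon in place of a real sentiment model (e.g. VADER), to surface the most ‘extreme’ comments for quote-hunting:

```python
# Invented mini-lexicon purely for illustration; real pipelines
# would use a proper sentiment model instead.
LEXICON = {"great": 2, "love": 2, "good": 1,
           "bad": -1, "awful": -2, "exhausted": -2}

def sentiment(comment: str) -> int:
    """Sum the lexicon scores of the words in the comment."""
    return sum(LEXICON.get(w.strip(".,!").lower(), 0)
               for w in comment.split())

comments = [
    "The canteen is good",
    "I love this team, great people, great managers!",
    "Awful rota, everyone is exhausted",
]

# Review the most extreme comments first when hunting for quotes.
for c in sorted(comments, key=lambda c: abs(sentiment(c)), reverse=True):
    print(sentiment(c), c)
```

Sorting by the absolute score puts both strongly positive and strongly negative comments at the top, which matches the quote-hunting use case better than sorting by raw sentiment.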

Build personas

Themes are almost never distributed evenly across an organisation. As the starting point of a design journey, identifying which themes, or groups of themes, are most likely to be mentioned by similar groups of employees can be a great way to start building robust, data-informed personas.

We typically do this analysis in both directions: we look at which groups of employees are most likely to mention each theme, and at which themes key populations are most likely to mention.
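The first direction can be sketched as a lift calculation: a group’s mention rate for a theme divided by the overall rate. Values well above 1 suggest the theme belongs in that group’s persona. The groups and data below are made up:

```python
# Made-up records: (group, did the respondent mention the "workload" theme?).
records = [
    ("engineering", True), ("engineering", True), ("engineering", False),
    ("sales", False), ("sales", True), ("sales", False),
    ("support", True), ("support", True), ("support", True),
]

overall = sum(m for _, m in records) / len(records)

def lift(group: str) -> float:
    """Group's mention rate for the theme relative to the overall rate."""
    rows = [m for g, m in records if g == group]
    return (sum(rows) / len(rows)) / overall

for g in ("engineering", "sales", "support"):
    print(g, round(lift(g), 2))
```

Running the same calculation per theme for a fixed group gives the second direction: the themes a key population over-indexes on.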

Build detailed pictures of your key choices

From this approach you should be able to identify in detail some potential options. For each option you’ll need:

  • A description of the problem (e.g. technical onboarding training for new joiners)

  • Who this problem impacts (New Joiners especially those in area “X”, their managers, their peers)

  • What are the implications of this problem? (It takes longer to get up to speed, peers and managers can’t focus on their own tasks as they’re onboarding new joiners, and it increases stress and workload in the area, which is linked to higher attrition.)

  • Size of problem (We hire ‘y’ people in this area every year, new joiner attrition is ‘z’…)

Estimate the cost / benefit of each option

This part can’t be done just from the survey data. To help prioritise the possible interventions you’ll need to identify the likely cost & benefit associated with each.

Our preferred approach here has two components: a simple calculation that decomposes the benefit (or cost) into its underlying drivers, and a set of assumed values for each driver.

One benefit of using this approach is that, by decomposing the problem in this manner, if someone challenges your calculations you can identify whether the disagreement relates to your calculation (you might have missed something) or to your values. Either way, it’s much easier to rerun the analysis with a different scenario.

The point here is not to produce the most accurate number, but to have a rough figure or range that can be used in the project-prioritisation stage.
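For the onboarding example above, such a decomposition might look like the sketch below. Every number is a placeholder assumption; the value of the structure is that a challenge can be aimed at a specific input and the figure rerun:

```python
# All values are illustrative assumptions, to be replaced with your own.
assumptions = {
    "hires_per_year": 120,
    "weeks_to_productivity_now": 12,
    "weeks_to_productivity_after": 8,     # i.e. the intervention saves 4 weeks
    "weekly_cost_per_hire": 1_500,        # salary plus overhead, rough
}

def annual_benefit(a: dict) -> int:
    """Decomposed benefit: hires x weeks saved x weekly cost."""
    weeks_saved = (a["weeks_to_productivity_now"]
                   - a["weeks_to_productivity_after"])
    return a["hires_per_year"] * weeks_saved * a["weekly_cost_per_hire"]

print(annual_benefit(assumptions))        # → 720000

# Challenged on a value? Swap it in and rerun the same calculation.
assumptions["weeks_to_productivity_after"] = 10
print(annual_benefit(assumptions))        # → 360000
```

Because the calculation and the values are separate, a sceptical stakeholder can disagree with one input without invalidating the whole estimate.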

Identify further questions

Though we love using massive text datasets to inform such prioritisation, you shouldn’t assume that this data will provide all the information that you need.

  • You might want to add more metadata to the post-coding data to explore the issue in more detail. In the example above you might create a flag which indicates the presence or quantity of new joiners within a team and assign it to team members, so you can analyse the text responses with regard to this data. You might also want to look at other data for these teams, for example peer attrition or sickness.

  • You might need to capture quantitative or qualitative data to further understand the issue. The analysis should, however, enable you to refine which groups or individuals you need to collect more data from. For qualitative data collection (e.g. interviews) you should now be able to focus the conversation to a much greater degree.

Can’t replace good analysts

Great text analysis is about sorting and filtering to focus the analysts’ attention on where it is most needed to help build a story. It is unlikely that these stories and presentations will be written automatically anytime soon.

That being said, whilst the interpretation of the information requires human expertise, the technical and analytical ‘infrastructure’ can be standardised. From a project perspective, this type of analysis can dramatically increase the velocity of a project whilst reducing the amount of resource required.

It is quite possible that you’ve been challenged to look at a focus area or ‘journey’ for which you just don’t have sufficient data, for example international mobility, where only a small number of people are impacted. In these cases you should think about capturing data, either by asking those who’ve been through the process recently and/or by building measurement processes to monitor experiences. Using the above approach as a framework for this new data is a good place to start.