Engagement surveys – Part 1, issues with the traditional approach

In these articles I use the term ‘survey’ to mean both a survey, with sampling, and a census where everyone is asked.

There is a shift at the moment from long, infrequent engagement surveys to shorter ‘pulse’ surveys, which are being used either as a replacement for, or a supplement to, the longer survey. In this and the following post I want to discuss some of what we see as the reasons for, and advantages of, this shift. As always I hope to give a data-led perspective.

Some background

Organizations, recently through their HR departments but before that via operations research groups, have been conducting employee surveys for around 100 years. In the 1970s the focus was on organizational commitment and job satisfaction, and responsibility moved from OR to HR. There had been some earlier work by Katz (1964) on Organizational Citizenship Behaviour which discusses organizational commitment.

Engagement was first described by William Kahn in 1990 but was made popular by Gallup’s 1999 book ‘First, Break All the Rules’. Since then most organizations have been conducting some form of engagement survey.

During the same sort of timeframe the technology for completing surveys has changed. In the 1990s and before most surveys were still done on paper. When web technology started to enter the employee survey space we saw surveys which were fundamentally a replication of a paper survey done electronically. This made sense at first as many organizations spent some time doing both in parallel. We still see paper in some environments such as delivery drivers.

About the surveys

Engagement surveys tend to follow a common design – they ask a set of questions to create an engagement index. Mostly this is in the region of 5 questions. They then ask a large number of questions to identify items which are linked to that engagement. Most annual surveys are in the range of 60 – 150 questions long. I would estimate it takes about 20 – 30 seconds for an employee to answer each question.
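To put those figures together, here is a small back-of-the-envelope sketch. The function name is my own, and the inputs are simply the ranges quoted above rather than measured values.

```python
def completion_minutes(n_questions, seconds_per_question):
    """Estimated time to complete a survey, in minutes."""
    return n_questions * seconds_per_question / 60

# Shortest case: 60 questions at 20 seconds each.
shortest = completion_minutes(60, 20)
# Longest case: 150 questions at 30 seconds each.
longest = completion_minutes(150, 30)

print(f"Estimated completion time: {shortest:.0f} to {longest:.0f} minutes")
```

On these assumptions a typical annual survey asks for somewhere between 20 and 75 minutes of each employee's time.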

Surveys also capture demographic data about each participant. We see both self-reported demographics and demographics that are imported from other HR systems. The latter is a more effective way of getting good data, but in some firms there are concerns about privacy.

There is a potentially enormous number of factors that could be associated with engagement. As Kieron Shaw noted in “Employee engagement, how to build a high-performance workforce.”:

“It’s arguably unfeasible to directly measure in the survey all the actions behind engagement,” due to the fact that, “there are potentially thousands of different individual actions, attitudes, and processes that affect engagement.”

Hence, however long the survey is, the designer has to select a subset of the potential factors.

Criticisms of traditional surveys

In a fascinating paper “Measuring Employee Engagement: Off the Pedestal and into the Toolbox” Andrew Graham of Queen’s University notes 9 issues with the traditional survey:

Not frequent enough
Single scoring leads to issue distortion
Aggregation reduces meaning
Does not capture the specifics (context seldom captured)
Lengthy or poor response planning
Managers are busy & have no incentive to implement any actions
Lot of resources & monitoring
Surveys get old
Causality not clear

A tenth issue that we find as analysts is that there is typically an illusion of richness. Many firms think that by asking 80 questions they are capturing 80 independent data points. This is clearly not the case.

Issues with survey data

One of the analyses that we like to do with survey data is to build a correlation graph. This uses each question as a node and the correlation between each pair of questions as an edge. When you visualise survey data in this manner you typically get something like the following:

What we see is a hairball. Each question tends to be highly correlated with many of the others. (In the graph above, Questions 31 – 33 are questions that the HR team wanted to add relating to a process which evidently has little link to engagement.)
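As a rough illustration of how such a graph can be built, here is a minimal sketch. The responses are invented toy data, the 0.5 threshold and the function names are my own assumptions, and Pearson correlation is used only for brevity (a rank correlation would be more appropriate for ordinal answers).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_graph(responses, threshold=0.5):
    """Return {(question_a, question_b): r} for pairs with |r| >= threshold.

    `responses` maps each question to its answers, with respondents in the
    same order for every question.
    """
    edges = {}
    for q1, q2 in combinations(responses, 2):
        r = pearson(responses[q1], responses[q2])
        if abs(r) >= threshold:
            edges[(q1, q2)] = r
    return edges

# Hypothetical answers from six respondents on a 5-point scale.
responses = {
    "Q1": [5, 4, 2, 1, 4, 3],
    "Q2": [5, 5, 2, 1, 4, 3],
    "Q3": [4, 4, 1, 2, 5, 3],
    "Q31": [3, 2, 3, 2, 3, 2],  # stands in for an unrelated process question
}

edges = correlation_graph(responses)
for (q1, q2), r in edges.items():
    print(f"{q1} -- {q2}: r = {r:.2f}")
```

In this toy data Q1, Q2 and Q3 are all connected to each other, while Q31 ends up with no edges at all. With a real survey of 80 or more questions, the connected part is what turns into the hairball.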

We’ve done experiments with survey data where we ‘destroyed’ 80% of all answers at random and then used recommendation algorithms to back-fill what had been removed. In most instances we’re able to accurately replace what has been removed. People answer in patterns (hence that hairball), and if you know some answers you can pretty accurately infer what all the others will be. This means that you could probably ask each employee a different random subset of questions and dramatically shorten the survey without much loss in accuracy.
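The shape of that experiment can be sketched as follows. This is not the algorithm we used: the data is simulated (each hypothetical respondent has one underlying mood plus noise), and the back-fill is a deliberately simple baseline from the recommender literature (overall mean plus a question offset plus a respondent offset). All names and parameters are assumptions for illustration.

```python
import random

def simulate(n_people, n_questions, rng):
    """Simulated 1-5 answers: one underlying mood per person plus noise."""
    rows = []
    for _ in range(n_people):
        mood = rng.uniform(1.5, 4.5)
        rows.append([min(5, max(1, round(mood + rng.gauss(0, 0.5))))
                     for _ in range(n_questions)])
    return rows

def mask_answers(matrix, fraction, rng):
    """Hide `fraction` of all cells at random; return masked copy and hidden cells."""
    cells = [(r, c) for r, row in enumerate(matrix) for c in range(len(row))]
    hidden = set(rng.sample(cells, round(len(cells) * fraction)))
    masked = [[None if (r, c) in hidden else v for c, v in enumerate(row)]
              for r, row in enumerate(matrix)]
    return masked, hidden

def backfill(masked):
    """Fill each None with overall mean + question offset + respondent offset."""
    observed = [v for row in masked for v in row if v is not None]
    overall = sum(observed) / len(observed)
    q_offset = []
    for c in range(len(masked[0])):
        col = [row[c] for row in masked if row[c] is not None]
        q_offset.append(sum(col) / len(col) - overall if col else 0.0)
    filled = []
    for row in masked:
        resid = [v - overall - q_offset[c]
                 for c, v in enumerate(row) if v is not None]
        r_offset = sum(resid) / len(resid) if resid else 0.0
        filled.append([v if v is not None
                       else min(5.0, max(1.0, overall + q_offset[c] + r_offset))
                       for c, v in enumerate(row)])
    return filled, overall

rng = random.Random(0)  # fixed seed so the run is repeatable
truth = simulate(200, 40, rng)
masked, hidden = mask_answers(truth, 0.8, rng)
filled, overall = backfill(masked)

# Mean absolute error on the hidden cells, against a constant-guess baseline.
mae = sum(abs(filled[r][c] - truth[r][c]) for r, c in hidden) / len(hidden)
baseline_mae = sum(abs(overall - truth[r][c]) for r, c in hidden) / len(hidden)
print(f"back-filled MAE: {mae:.2f}, constant-guess MAE: {baseline_mae:.2f}")
```

Because the simulated answers are driven by a single underlying mood, even this crude back-fill should beat guessing the overall average for every hidden cell. Real survey data is messier, which is where more capable recommendation algorithms earn their keep.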

Issues with User Interfaces

This is a bit more contentious. It relates to how questions are asked.

Most employee surveys use Likert-scale questions, mostly 5 points between strongly agree and strongly disagree. One of the reasons for doing this has been that on a paper survey it’s easy to get someone to code the data into a reporting system (it’s easy to see a check in a box). This process, designed for paper, has then been put onto the web with little thought given to adapting the questions to take advantage of the opportunities presented by the new medium.

Employees actually have a true feeling on a continuum between the two end points. When you ask them to answer on a 5 or 7 point scale, what you’re actually doing is asking them to ‘bin’ their true feeling to the nearest compromise point. This adds burden on the survey taker and potentially adds inaccuracy to the data. The resulting data can’t be treated as linear; instead one should use statistical methods appropriate for ordinal data.
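A small sketch makes the binning effect concrete. It assumes, purely for illustration, that a true feeling can be written as a number between 0 and 1 and that the scale points are equally spaced; the function name is hypothetical.

```python
def to_likert(feeling, points=5):
    """Map a feeling in [0, 1] to the nearest of `points` equally spaced options."""
    if not 0.0 <= feeling <= 1.0:
        raise ValueError("feeling must be between 0 and 1")
    return round(feeling * (points - 1)) + 1

# Two clearly different feelings collapse into the same answer...
print(to_likert(0.38), to_likert(0.62))
# ...while two nearly identical feelings land a whole point apart.
print(to_likert(0.62), to_likert(0.63))
```

Here 0.38 and 0.62 both become a 3, whereas 0.62 and 0.63 become a 3 and a 4. The recorded answers tell you about order, not about how far apart two people really are.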

In a 2012 paper in the journal Field Methods, “Why Semantic Differentials in Web-Based Research Should be Made From Visual Analogue Scales and Not From 5-Point Scales”, Funke & Reips present experimental evidence that marking a line between two points – a visual analogue scale – has numerous advantages over traditional 5-point scales. Two of these are better (more accurate) data and less burden on the survey taker.

Whether the answer is a visual analogue scale or something with a large but distinct number of points (the 0-10 scale used by NPS practitioners?) is harder to determine. However I see little evidence that 5 points is the right approach.

Should we even be asking scale-based questions?

Finally, too often what drives executive action from survey data is the responses to a few open text questions. As Graham notes in his fourth issue, survey data rarely provides context. The qualitative nature of open text does provide this opportunity.

Often the initial action from a survey is to do more qualitative research focussing on several key topics. Such research is both time consuming and expensive. (Arguably, acting without understanding the context can be more expensive.)

There are instances where asking a scale question makes sense, most notably if you’re wanting to report a trend. However asking sufficiently broad, open questions will likely capture richer data. The challenge for many firms is how to do this at scale.

If we think about how we’d try to understand an issue in a conversation, we’d ask open questions and then follow up with relevant, probing questions. I firmly believe this will be the future of employee feedback, though it will be a bot-based conversational approach which can be done in multiple languages and at scale.

As an industry we’re not there yet, but the future is coming quickly. In the next article I’ll describe the current state, share our findings from working with clients at the cutting edge, and highlight some approaches taken by other innovators in this market.

Employee Voice, People Analytics | Andrew Marritt | June 9, 2016