In the last post I explored how the opportunities presented by data-mining could enable us to create and use surveys which reduced the number of questions we ask employees whilst potentially expanding the survey’s breadth. This one suggests another use of data-mining – to build predictive models to help understand survey results.
When presenting data we should always identify and use relevant comparators to put the current data in context. With surveys there are three comparisons we typically make:
1) We compare with other internal groups – eg comparing one function’s results with another
2) We compare with the same group in a different time period (to see change)
3) We might compare with an external group, such as another firm in the same industry.
The validity of any comparison depends on how similar the groups in question are. What we’re really trying to do is hold all other potential variables constant and therefore show that what we’re displaying (the results of a survey question or dimension) is due, at least in part, to the differences between comparators.
How good are these comparators at doing that?
If we take the worst first, comparing your employees to those in another firm involves too many differences to produce strong validity. There are too many potential causes of a difference in the variable in question for us to draw any real conclusions. These comparisons are good for massaging senior managers’ egos, but they’re not terribly useful for driving good decisions.
The other two comparisons are much better and we’d recommend using them. However, we add another comparator which can further aid comprehension of what is going on – the predicted value.
Predicted Engagement as a comparator
Let’s consider Engagement, typically a significant question for any employee survey. What we’re really trying to understand when doing a survey is twofold:
- Where do we stand in relation to our chosen comparators?
- What can we do to improve levels of engagement, and hopefully the business outcomes we really care about?
If we take the former question, as mentioned above, what we want to show is the difference in engagement due to the variable we’re using to compare, eg the function. We therefore need to account for the other potential reasons for a difference.
We know engagement is related to both the experience of the employee and certain underlying factors such as tenure. Engagement over tenure can often be shown to form a bathtub curve, where it first falls as the honeymoon period wears off and then increases as disengaged employees select themselves out by leaving the firm. There are other similar relationships either industry-wide or specific to the firm.
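As a toy illustration of that bathtub shape, we can sketch expected engagement as a simple quadratic in tenure – the coefficients below are invented purely for illustration, not fitted to any real data:

```python
# Toy illustration of a bathtub-shaped engagement/tenure relationship.
# Coefficients are invented for illustration only, not fitted to real data.

def expected_engagement(tenure_years):
    """U-shaped ('bathtub') curve: engagement dips as the honeymoon
    period wears off, then recovers as disengaged employees select
    themselves out by leaving the firm."""
    return 0.75 - 0.10 * tenure_years + 0.01 * tenure_years ** 2

# High at hire, lowest around year five, recovering thereafter
for t in [0, 2, 5, 8, 10]:
    print(t, round(expected_engagement(t), 2))
```

In practice the shape would come from the data itself rather than an assumed formula, but the sketch shows why a new hire and a ten-year veteran shouldn’t be expected to report the same baseline engagement.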
Let’s take a simple example. We compare two departments, A and B. Department A has mainly established employees, and therefore a natural level of engagement matching this group. Department B is new and hired from outside. We’d expect Department B to have a higher level of engagement regardless of the experiences employees in either department have had.
How do we make realistic comparisons between these groups? We use a predictive model. This model assigns a probability to each employee of them being engaged based on who they are and their employment history. We then roll these engagement probabilities up to the level of the group being compared – in this instance Departments A and B.
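The roll-up step can be sketched in a few lines. The logistic model and its coefficients here are hypothetical stand-ins for a model fitted to real survey and HR data, and the four employees are invented:

```python
# Minimal sketch: per-employee engagement probabilities rolled up to
# department level. The model and coefficients are hypothetical.
import math

def engagement_probability(tenure_years, is_manager):
    """Hypothetical logistic model: P(engaged) from employee attributes."""
    score = 0.4 - 0.15 * tenure_years + 0.01 * tenure_years ** 2 + 0.5 * is_manager
    return 1 / (1 + math.exp(-score))

employees = [
    {"dept": "A", "tenure": 8, "manager": 1},   # Dept A: established staff
    {"dept": "A", "tenure": 10, "manager": 0},
    {"dept": "B", "tenure": 0, "manager": 0},   # Dept B: recent outside hires
    {"dept": "B", "tenure": 1, "manager": 0},
]

# Roll individual probabilities up to the comparison group (the department)
by_dept = {}
for e in employees:
    by_dept.setdefault(e["dept"], []).append(
        engagement_probability(e["tenure"], e["manager"]))
predicted = {dept: sum(ps) / len(ps) for dept, ps in by_dept.items()}
print(predicted)  # Department B's newer hires yield a higher predicted level
```

The department-level figure is simply the mean of the individual probabilities, which is what lets us say what each department “should” score given who works there.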
With this figure we can visualise the data in two ways:
- We can show the levels of engagement between the two departments with two comparators – the other department and the predicted level for each department. We use variations of bullet graphs to do this.
- We can show the difference between each department’s recorded engagement level and its expected level (ie observed – expected). In so doing we can rank departments by the difference from their expected level.
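The second view is a simple calculation once the expected levels exist. The figures below are invented purely to illustrate it:

```python
# Ranking departments by observed-minus-expected engagement.
# All numbers are invented for illustration.
observed = {"A": 0.58, "B": 0.60, "C": 0.49}   # survey results
expected = {"A": 0.53, "B": 0.62, "C": 0.50}   # from the predictive model

gaps = {dept: observed[dept] - expected[dept] for dept in observed}
ranking = sorted(gaps, key=gaps.get, reverse=True)
print(ranking)
```

Note that Department A tops this ranking despite B recording the higher raw score – B is merely meeting an expectation its newer workforce sets, while A is outperforming its own.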
As with any model it’s not perfect, but we believe that using a predictive model to present survey data in this way reduces the problems of making comparisons between groups and thus increases the likelihood that we can make effective decisions.