In 2015 two data scientists, one with a Google background and the other from Facebook, wrote a brilliant article in the New York Times about the need to incorporate the 'why' into predictive models. They showed that if a Facebook user clicked a link and then immediately returned, they might be able to predict that behaviour in others, but without knowing why the individual returned so quickly they couldn't accurately identify which future articles would trigger the same behaviour. To understand the why, Facebook would ask some of these users why they returned so quickly.
Most People Analytics models are probabilistic. What we're doing when building these models is splitting the overall population into small segments which (hopefully) are far more uniform in their behaviour. The models look for combinations of factors that together create a group of employees whose behaviour we can be much more certain about. It's not that we're going to be 'right'; it's just that with these models we can be 'more right'.
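To make this concrete, here's a minimal sketch of the idea that segment-level rates can be far more informative than the overall base rate. The data, column names and tenure bands are all invented for illustration:

```python
import pandas as pd

# Hypothetical employee data; values are illustrative only.
df = pd.DataFrame({
    "tenure_band": ["0-2y", "0-2y", "0-2y", "2-5y", "2-5y", "5y+", "5y+", "5y+"],
    "left_last_year": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Overall base rate: how often employees leave across the whole population.
print(df["left_last_year"].mean())  # 0.375

# Segment-level rates: within some segments behaviour is far more
# uniform (e.g. nobody in the 5y+ band left), so predictions there
# can be made with much more confidence than the base rate allows.
print(df.groupby("tenure_band")["left_last_year"].mean())
```

The same logic is what a tree-based model automates: it searches for the splits that make each resulting segment as uniform as possible.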
As a manager or an HR person, identifying high-risk groups is useful, but we will only realise economic value if we can make a change that shifts the odds of this group behaving in a certain manner. Here lies the difficulty.
A large percentage of the influential variables in a typical People Analytics model are not easy to manipulate.
The types of variables that we see coming through include age, gender, years of service, home location and family situation. Whilst turning a 25 year old into a 35 year old might reduce risk, it isn't an option available to the manager, at least not in the short term. Even if our findings were able to inform recruitment approaches, changing the makeup of an organization via revised selection and natural attrition is a long process.
If we are to take action we need to understand not only the who and when but also the why. I’m firmly of the opinion that the most effective way of adding value from People Analytics is if our analyses change the processes and policies within an organization.
There are multiple reasons why making individual-based decisions, or communicating individual-level risk scores, bothers me. Firstly, the loss functions we face are often highly asymmetric (and none of your technology will have integrated a loss function!). Secondly, we can't assume that a change to one person can be made without changing the risks for other individuals. Take raising a salary: if you do this for one person you might have to adjust others in their area to maintain differentials.
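As a sketch of what an asymmetric loss looks like in practice, here's a toy decision rule for an attrition case. The cost figures and the function itself are invented for illustration, not taken from any real model:

```python
# Assume a false negative (missing a leaver) costs far more than a
# false positive (wrongly flagging a stayer). Both figures are invented.
COST_FALSE_NEGATIVE = 50_000  # e.g. replacement cost of a lost employee
COST_FALSE_POSITIVE = 2_000   # e.g. cost of an unnecessary retention action

def expected_loss(p_leave: float, intervene: bool) -> float:
    """Expected cost of a decision, given the predicted leave probability."""
    if intervene:
        # We pay the intervention cost only if the person would have stayed.
        return COST_FALSE_POSITIVE * (1 - p_leave)
    # We pay the replacement cost only if the person actually leaves.
    return COST_FALSE_NEGATIVE * p_leave

# With costs this asymmetric, intervening pays off even at modest risk.
p = 0.10
print(expected_loss(p, intervene=True))   # 1800.0
print(expected_loss(p, intervene=False))  # 5000.0
```

The point is that the optimal action depends on the loss function, not just the probability, which is exactly what an off-the-shelf risk score ignores.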
So if we can’t adjust most of the important variables and shouldn’t make changes at an individual level, what can we do to make our models more actionable? We need to include perceptions. We need to ask 'why?' in exactly the same way that Facebook asks users why they shared a particular post.
How to include perceptions in your analysis
How you include perceptions into your models depends on a number of factors, most important of which is what data you already have available.
In an ideal world you’ll have confidential historic employee survey data which can be linked to your other variables at an individual level. Of course you’ll need to speak to your lawyers about what you can do, but in many instances it’s possible to protect confidentiality whilst doing some relatively sophisticated analysis. For example, if you’re using a tree or forest-type model you can set your parameters so that the leaf size is at least your minimum reporting level, thus never creating a group smaller than the one you committed to.
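With scikit-learn, for instance, the `min_samples_leaf` parameter of a tree model enforces exactly this constraint. The reporting threshold of 10 below is an assumed value; use whatever minimum group size you committed to in your survey:

```python
from sklearn.tree import DecisionTreeClassifier

# Assumed minimum reporting group size promised to employees.
MIN_REPORTING_SIZE = 10

model = DecisionTreeClassifier(
    min_samples_leaf=MIN_REPORTING_SIZE,  # no leaf smaller than 10 employees
    max_depth=5,                          # keep the segments interpretable
    random_state=0,
)
# model.fit(X, y)  # X: features incl. survey scores; y: e.g. left/stayed
```

Random forests (`RandomForestClassifier`) accept the same parameter, so the guarantee carries over to forest-type models too.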
If you’re not able to do individual-level linking then linking team-level scores is usually valuable (and may be worth doing even if you can link at the individual level).
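A team-level link is typically just a join of the team's aggregate survey scores onto the individual records. A minimal pandas sketch, with invented data and column names:

```python
import pandas as pd

# Hypothetical individual HR data.
employees = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "team": ["A", "A", "B", "B"],
    "tenure_years": [1.5, 4.0, 2.0, 7.5],
})

# Hypothetical team-level survey averages; no individual responses exposed.
team_scores = pd.DataFrame({
    "team": ["A", "B"],
    "process_score": [3.2, 4.1],  # e.g. "processes get in the way of good work"
})

# Left join: every employee inherits their team's aggregate score.
linked = employees.merge(team_scores, on="team", how="left")
print(linked)
```

Because only aggregates are joined, the linked table can feed a model without ever revealing who answered what.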
What you’ll typically find
In many instances, when building a model which includes both ‘traditional’ employee data and perception data, the perception variables will account for about 40% of the most important variables in the model. The key advantage, however, is that the perception variables will be the easiest to act on and therefore, arguably, the most valuable.
Take an example of some attrition modelling we conducted several years ago. For high performers, one of the most important variables turned out to be the team’s score on a question about whether processes got in the way of doing good work. We identified the bottom teams in this area and were able to target them for performance improvement work. (Our loss function enabled us to identify the expected value of this work.)
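A back-of-envelope sketch of the kind of expected-value calculation a loss function enables. Every number here is invented for illustration, not from the project described:

```python
# Hypothetical bottom-scoring teams: size of the high-performer group
# and their modelled attrition probability.
teams = {
    "Team A": {"high_performers": 12, "p_leave": 0.30},
    "Team B": {"high_performers": 8,  "p_leave": 0.25},
}
REPLACEMENT_COST = 50_000  # assumed cost of losing one high performer
RISK_REDUCTION = 0.25      # assumed relative risk reduction from the work

# Expected saving = people at risk * leave probability * reduction * cost.
savings = {
    name: t["high_performers"] * t["p_leave"] * RISK_REDUCTION * REPLACEMENT_COST
    for name, t in teams.items()
}
for name, saving in savings.items():
    print(f"{name}: expected saving ~ {saving:,.0f}")
```

Ranking teams by expected saving, rather than by raw survey score, is what lets you prioritise the intervention budget.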
Developing a strategy for data capture
When you explore the probable factors that influence your target variable you’ll almost certainly find some for which data collection is needed, frequently via a survey. Capturing these variables as early as possible is critical if you’re going to monitor and improve the target on a continual basis.
The great thing is that you can usually package various variables into a small number of short surveys. Depending on what you’re studying these might be regular surveys or event-triggered surveys.
If you’re just starting out and are still trying to understand what could be driving the behaviour you’re trying to model, we’d strongly recommend using open questions and text analysis. If you’ve got this information already (or have your own tool to collect it) then our Workometry Lite solution was designed specifically for this use case.
Once you’ve identified the key questions you can think about transitioning to scale or choice questions in addition to your open question. As the organization changes you can expect the reasons why to change too.
Finally, if you’re using a survey provider or survey app it’s vitally important to ensure that you have access to individual-level data. We’ve seen too many instances where providers refuse to provide individual data or charge large amounts to do this. If you want to really make an impact with your analysis you need to be in control of your data.