Predictive Analytics – a primer for HR

What are predictive analytics?

Predictive analytics (PA) means different things to different people these days, and the term is increasingly misused, often by the marketing departments of software providers.

Properly used, predictive analytics involves using historical information to predict future events.  Most models provide a score indicating how likely an event is to happen.  For example, a credit-scoring model predicts how likely an applicant is to default on a loan; a marketing model might tell the marketer who is most likely to respond positively to a particular promotion.
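
As a minimal sketch of what this looks like in practice, the Python snippet below fits a scoring model with scikit-learn.  The file and column names (employees.csv, tenure_months, salary, left_within_year) are illustrative assumptions, not a recommended feature set.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("employees.csv")        # historical records (hypothetical file)
    X = df[["tenure_months", "salary"]]      # predictor fields (illustrative)
    y = df["left_within_year"]               # known past outcome, 0 or 1

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)

    # predict_proba returns a score between 0 and 1 for each employee
    scores = model.predict_proba(X_test)[:, 1]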

There are many different techniques for developing a predictive model.  It would be fair to say that this is an area sitting between statistics and computer science.

Are Predictive Analytics new?

No, not really.  Insurance companies, direct marketing organizations and several other industries have used them for some time.

Their increased adoption has been driven by a few factors:

  • More data available that can be used to build models
  • Increasing computing power
  • More educated practitioners
  • Dramatically falling prices, including free software for building models
  • Notable successes driving increased interest

Are these PA models causal models?

Not usually.  The software or algorithm looks for patterns in the data that can be used to predict an outcome.  It does not care whether the factors cause the outcome, only that identifying them lets you make a prediction.
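
The toy simulation below makes the point concrete: umbrella sales predict traffic accidents without causing them, because rain drives both.  All data here is simulated purely for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    rain = rng.random(1000) < 0.3                          # hidden common cause
    umbrella_sales = rain * 100 + rng.normal(0, 10, 1000)
    accidents = (rain & (rng.random(1000) < 0.8)).astype(int)

    # Umbrella sales don't cause accidents, but they still predict them
    model = LogisticRegression().fit(umbrella_sales.reshape(-1, 1), accidents)
    accuracy = model.score(umbrella_sales.reshape(-1, 1), accidents)
    print(f"accuracy: {accuracy:.2f}")  # well above always guessing 'no accident'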

What can they do for HR?

Any event that can be defined as a binary outcome (yes/no, does/doesn’t, etc.) can be explored using predictive analytics methods.

Typically these events need to be tightly defined.  For example, ‘who will leave the firm?’ isn’t much use, since in the end everybody leaves, but ‘who will leave the firm in the next quarter?’ might be.
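
In code, tightening the definition amounts to constructing an explicit binary target.  A minimal sketch, assuming a hypothetical employees.csv with a leave_date column (blank for current employees):

    import pandas as pd

    df = pd.read_csv("employees.csv", parse_dates=["leave_date"])  # NaT if still employed

    quarter_start = pd.Timestamp("2015-01-01")
    quarter_end = pd.Timestamp("2015-03-31")

    # 1 if the employee left within the quarter, 0 otherwise;
    # NaT (still employed) compares as False, so it becomes 0
    df["leaves_next_quarter"] = (
        df["leave_date"].between(quarter_start, quarter_end).astype(int)
    )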

What can’t they do?

Tell you what will happen.  Predictors will never be 100% accurate: some cases will be classified correctly and some will be classified incorrectly.

Does incorrect classification matter?

It might do.  Let’s take a simple example.  For a workforce we want to predict who will leave in the next year.  The model will label every employee as ‘leaves’ or ‘doesn’t leave’.

There are three possible outcomes (counted in the sketch after this list):

  • The model is correct.  It rates employees who leave as ‘leaves’ and those who stay as ‘doesn’t leave’.
  • The model produces false positives.  It rates as ‘leaves’ employees who actually stay.
  • The model produces false negatives.  It rates as ‘doesn’t leave’ employees who actually leave.
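
A minimal counting sketch using scikit-learn’s confusion_matrix; the five example labels are made up for illustration:

    from sklearn.metrics import confusion_matrix

    actual    = ["leaves", "stays", "stays", "leaves", "stays"]
    predicted = ["leaves", "leaves", "stays", "stays", "stays"]

    # Rows are actual, columns are predicted, ordered ["leaves", "stays"]
    cm = confusion_matrix(actual, predicted, labels=["leaves", "stays"])
    tp, fn, fp, tn = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
    print(f"correct: {tp + tn}, false positives: {fp}, false negatives: {fn}")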

Whether this matters depends on how you choose to act on the recommendation.

So should I just act on the basis of a predictor?

To make any decision you need a loss function.  Although it is called a loss function, it is really a benefit-and-loss function: you try to quantify the material benefit or loss of each outcome and optimise for it.  In some instances the cost of a wrong decision will be low; in others, including many HR applications, a wrong prediction will be costly.  In those instances you will probably have a low tolerance for acting on wrong predictions.
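
A minimal sketch of such a loss function for attrition, with entirely illustrative costs and the simplifying assumption that an intervention always retains the employee:

    COST_INTERVENTION = 500        # e.g. a retention conversation or bonus (assumed)
    COST_LOSING_EMPLOYEE = 20000   # e.g. replacement and lost productivity (assumed)

    def expected_cost(p_leave: float, intervene: bool) -> float:
        """Expected cost of intervening (or not), given a predicted probability."""
        if intervene:
            # Simplifying assumption: the intervention always retains the employee
            return COST_INTERVENTION
        # Do nothing: expected cost is the probability of leaving times the loss
        return p_leave * COST_LOSING_EMPLOYEE

    p = 0.10
    print(expected_cost(p, True) < expected_cost(p, False))  # True: 500 < 2000, so intervene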

What else should I know before I act?

Beware of so-called ‘black box’ models: make sure you understand how the prediction is made.  With HR decisions there are often data items or fields that could increase predictor accuracy but can’t be used for legal reasons (e.g. both gender and age will change the likelihood of someone leaving).  Building such a model isn’t the problem; the problem comes when a decision is based on a predictor that uses one of these protected fields.  If you’re going to act on a predictor you need to understand how it was created.
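
One simple safeguard is to drop protected fields before training, as in the sketch below (the file and field names are illustrative).  Note that dropping a field is not a complete fix, since correlated proxies can remain in the data, which is one more reason to understand how the model works.

    import pandas as pd

    df = pd.read_csv("employees.csv")       # hypothetical file
    PROTECTED_FIELDS = ["gender", "age"]    # fields a decision may not be based on

    # Remove protected fields (and the outcome itself) from the predictors;
    # correlated proxies may still remain, so inspect the finished model too
    X = df.drop(columns=PROTECTED_FIELDS + ["left_within_year"])
    y = df["left_within_year"]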

My workforce analytics tool claims to have PA built in.  Will it be any good?

This is the thousand-dollar question.  Whether the model will be any good depends on how it is implemented in the tool.  PA is at least as much art as science, with a good modeller selecting the algorithms used to build the model and, most importantly, preparing the data in advance (data preparation is likely to take 80% of the time needed to build a model).

A good modeller is likely to achieve better results than something pre-built into an application.  Whether either is worth paying for will probably depend on the loss function and on how much the model improves prediction over your current technique.  If at all possible, get someone with experience in PA model building to help select the software and quiz the vendor.
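
When quizzing a vendor, the key comparison is against a naive baseline.  A minimal sketch with scikit-learn, using generated stand-in data rather than real HR records:

    from sklearn.datasets import make_classification
    from sklearn.dummy import DummyClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data

    # Baseline: always predict the most common outcome
    baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y).mean()
    model = cross_val_score(LogisticRegression(), X, y).mean()
    print(f"baseline accuracy {baseline:.2f}, model accuracy {model:.2f}")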


Andrew Marritt