Data mining in HR

Data mining is a process of extracting patterns from data using computer-based algorithms. These patterns can then be used to help predict future events. Marketing teams have used it for some time to refine their approaches. We believe that it has an important role to play within HR, especially for large firms.

At the heart of the approach is a need for large quantities of data, as no algorithm can spot patterns in data that isn’t being captured. There are three main groups of information that we believe are useful for successful pattern identification and for using a model to predict and manage future events:

  • Category-based employee data (gender, age, location, position, etc.)
  • Employee event-based data
  • External economic data

Like most solid approaches, data mining starts with a question that needs answering – a purpose. For HR there is a long list of possible areas, including:

  • Retention – understanding which employees are at risk
  • Performance – understanding factors that are associated with above-average performance
  • Recruitment / re-assigning – matching individuals and roles

A data mining process aims to improve the probability of predicting a result in comparison with a purely random ‘guess’. As such, successful models can be compared with other approaches such as human intuition. Typically you are looking at a binary condition – e.g. employee retained / not retained. The algorithm then uses the data to spot conditions which are more likely to be associated with the condition being met. It is important to note that these are associations, not causes.
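One way to make the comparison with a random ‘guess’ concrete is lift: how much more often the employees picked out by a model meet the condition than employees picked at random. The sketch below is illustrative only. The records, the function names and the idea of a model ‘flag’ are our assumptions for the example, not data or code from a real project.

```python
# Hypothetical records, invented for illustration:
# (flagged_by_model, actually_left)
records = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, False), (False, True), (False, False),
    (False, False), (False, False),
]


def base_rate(records):
    """Share of all employees who left: the hit rate of a purely random pick."""
    return sum(1 for _, left in records if left) / len(records)


def flagged_rate(records):
    """Share of the employees flagged by the model who actually left."""
    outcomes = [left for flagged, left in records if flagged]
    return sum(1 for left in outcomes if left) / len(outcomes)


def lift(records):
    """How many times better the flagged group is than a random pick."""
    return flagged_rate(records) / base_rate(records)


print(f"Random pick hit rate: {base_rate(records):.2f}")
print(f"Flagged group hit rate: {flagged_rate(records):.2f}")
print(f"Lift over random: {lift(records):.2f}")
```

A lift above 1 means the flagged group contains proportionally more leavers than the workforce as a whole. It says nothing about why they leave.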

It’s highly unlikely that an exercise will tell you exactly what will happen. You are most likely to get a set of true positives and false positives, and a decision needs to be made about the balance that you require between the two groups. What the algorithm is trying to do is maximise the true positives whilst minimising the false positives.
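The balance between true and false positives can be sketched with a simple cut-off on a model score. In the example below the scores, outcomes and thresholds are all hypothetical. We assume a model that gives each employee a score between 0 and 1 for the likelihood of leaving, which is only one of several ways such a model could present its results.

```python
# Hypothetical output, invented for illustration:
# (model_score_for_leaving, actually_left)
scored = [
    (0.9, True), (0.8, True), (0.7, False), (0.6, True),
    (0.4, False), (0.3, True), (0.2, False), (0.1, False),
]


def count_positives(scored, threshold):
    """Return (true_positives, false_positives) for employees at or above the threshold."""
    true_positives = sum(1 for score, left in scored if score >= threshold and left)
    false_positives = sum(1 for score, left in scored if score >= threshold and not left)
    return true_positives, false_positives


# Lowering the threshold catches more leavers but also flags more stayers.
for threshold in (0.75, 0.5, 0.25):
    tp, fp = count_positives(scored, threshold)
    print(f"threshold {threshold}: {tp} true positives, {fp} false positives")
```

Where to set the cut-off is a business decision: it depends on how costly it is to act on a false positive compared with missing a true one.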

It’s also worth noting that an algorithm is blind to ethics or legislation. You may be prohibited from treating various groups differently just because they are more likely to qualify.

What are the benefits of this approach?

Successful data mining enables you to act in a more targeted manner, and at lower cost, than you would have been able to without the computer-based prediction. The higher the cost of an action, the greater the benefit of being able to target the recipients most in need.
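A rough piece of arithmetic shows why the benefit grows with the cost of the action. The figures below are hypothetical: we assume that 1 in 8 employees picked at random is genuinely at risk, that half of the employees picked by the model are, and that an action costs the same whoever receives it.

```python
def cost_per_at_risk_reached(cost_per_action, hit_rate):
    """Average spend needed to reach one employee who is genuinely at risk."""
    return cost_per_action / hit_rate


def saving_from_targeting(cost_per_action, random_hit_rate, targeted_hit_rate):
    """Spend saved per at-risk employee reached by targeting instead of picking at random."""
    untargeted = cost_per_at_risk_reached(cost_per_action, random_hit_rate)
    targeted = cost_per_at_risk_reached(cost_per_action, targeted_hit_rate)
    return untargeted - targeted


# Hypothetical hit rates: 12.5% for a random pick, 50% for the model's target group.
RANDOM_HIT_RATE = 0.125
TARGETED_HIT_RATE = 0.5

for cost in (100, 1000):
    saving = saving_from_targeting(cost, RANDOM_HIT_RATE, TARGETED_HIT_RATE)
    print(f"Action costing {cost}: saving of {saving:.0f} per at-risk employee reached")
```

With the same hit rates, a tenfold increase in the cost of the action gives a tenfold increase in the saving from targeting.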

The target groups can also be used to drive qualitative research into why a group is particularly at risk, which further supports decision making. As such, data mining often isn’t the end goal but rather a support for better decision making.


We wrote this post in 2011. For a more recent update of data mining in HR please see our article 'AI in HR – how to understand what is happening'