Harnessing the power of external data in HR analytics

What differentiates good people analysts from average people analysts is their ability to let the question rather than the data drive the analysis.

All too often we see analysts define the questions they try to answer by the data they have available. By definition, they’re reducing their ability to find the right answer by limiting themselves to a reduced set of possibilities.

There is little need for this in the era of Big Data. One of the reasons why data is revolutionising business is that acquiring data, which used to be expensive and difficult, is now increasingly simple and cheap.

In many instances we need to think of what we want to understand as a nested and interrelated set of subjects. Our employees are a subset of their teams and organizations. Their behaviours are influenced by the culture of the organization and by their own cultural norms.

Employees behave in the ways they do because of what they experience in their daily working life but also because of what is happening in wider society, including at other potential employers. For example, most employees only quit when they have a new job. This implies that a competitor must have had an opportunity available. This in turn is influenced by customer demand and the level of market churn. We know that consumer confidenceª is a good predictor of employee resignations. What about unemployment levels by country? By district? Could share price be a good predictor of confidence in the company?

If we think of employees in this wider, system-based view, it’s clear that our models must include data that isn’t likely to be captured in our own internal, transaction-based HR systems. Put simply, it’s impossible to understand what is happening without bringing in data about the external world.

As noted, this is easier than ever before. One of the key reasons is the number of data sources available in an open format on the web. Increasingly these are available via API-type solutions, which means your analysis can request the latest or most accurate data whenever it is needed. It is almost as if the only boundaries are those set by the analyst’s own imagination.

Let’s consider a few typical examples. Your HR database has each employee’s salary and its currency. You might want to value these salaries in a common currency at a defined date, or show daily cost trends as currency rates change. A single line in the analysis code (for us, R) can grab daily FX rates, which makes this trivial. Want to rebase these salaries to Purchasing Power Parity to understand the value of salaries to employees? Again, simple. Organisations like the World Bank or OECD publish such data.
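To make the currency example concrete, here is a minimal sketch of the conversion step. It is written in Python purely for illustration (the analyses described in this post were done in R). The rate table, the employee records and the function names `convert` and `salaries_in_common_currency` are all hypothetical, and the rates are placeholder values rather than real market data. In practice the table would be filled from whichever FX source you use.

```python
# Hypothetical FX table: units of each currency per 1 USD on one valuation date.
# Placeholder values for illustration only, not real market rates.
RATES_PER_USD = {"USD": 1.0, "EUR": 0.8, "INR": 80.0}


def convert(amount, currency, rates_per_usd, target="USD"):
    """Convert an amount into the target currency via the USD cross rate."""
    if currency not in rates_per_usd or target not in rates_per_usd:
        raise KeyError(f"no rate available for {currency!r} or {target!r}")
    amount_in_usd = amount / rates_per_usd[currency]
    return amount_in_usd * rates_per_usd[target]


def salaries_in_common_currency(employees, rates_per_usd, target="USD"):
    """Return {employee_id: salary in target currency, rounded to 2 decimals}."""
    return {
        e["id"]: round(convert(e["salary"], e["currency"], rates_per_usd, target), 2)
        for e in employees
    }


# Made-up employee records standing in for an HR database extract.
EMPLOYEES = [
    {"id": "E1", "salary": 40000, "currency": "EUR"},
    {"id": "E2", "salary": 2000000, "currency": "INR"},
    {"id": "E3", "salary": 60000, "currency": "USD"},
]

if __name__ == "__main__":
    converted = salaries_in_common_currency(EMPLOYEES, RATES_PER_USD)
    for emp_id, value in converted.items():
        print(emp_id, value)
```

The same function can be reused for a Purchasing Power Parity view by passing PPP conversion factors (local currency units per international dollar) in place of market exchange rates. Re-running it with a rate table for each date gives the daily cost trend.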

One of our recent pieces of work looked at the effect of gender on performance reviews. As noted, employees, managers included, don’t leave their cultural norms at the door when they come to work. So if we want to understand whether the difference between the percentages of men and women being highly rated is random or influenced by societal factors, we might want to look at objective data on gender equality.

Fortunately the UN publishes such data. Again, one line of code and we can look at a host of factors such as education, participation in the workforce or equality indexes by country. So let’s look at a simple plot, for a fictional company, of the percentage of women and men rated in the top two levels in each country against that country’s gender equality index:


Plot 1: A positive relationship between the percentage of men receiving an above-average rating and the country’s inequality

Plot 2: A negative relationship between the percentage of women receiving an above-average rating and the country’s inequality
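The comparison behind plots like these comes down to two steps: join an internal, country-level summary to an external index, then measure how strongly the two move together. The sketch below shows one way to do that in Python (again for illustration only; it is not the code used for the plots). The country names, the shares and the index values are invented, and `join_on_country` and `pearson` are hypothetical helper names.

```python
from math import sqrt


def join_on_country(internal, external):
    """Inner join of two {country: value} dicts, returned as sorted tuples."""
    return [(c, internal[c], external[c]) for c in sorted(internal) if c in external]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Invented internal summary: share of women rated in the top two levels, by country.
SHARE_WOMEN_TOP_RATED = {
    "Country A": 0.30, "Country B": 0.25, "Country C": 0.20, "Country D": 0.15,
}

# Invented external index: higher means more inequality. Country D is missing
# and Country E has no employees, so neither survives the join.
INEQUALITY_INDEX = {
    "Country A": 0.1, "Country B": 0.2, "Country C": 0.3, "Country E": 0.5,
}

if __name__ == "__main__":
    rows = join_on_country(SHARE_WOMEN_TOP_RATED, INEQUALITY_INDEX)
    shares = [share for _, share, _ in rows]
    index_values = [idx for _, _, idx in rows]
    print(rows)
    print(pearson(index_values, shares))
```

An inner join is used here so that countries missing from either source drop out explicitly rather than silently becoming zeros. With real data you would want to check how many countries are lost at that step before reading anything into the correlation.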

What to do with such information? Well, it’s probably worth bringing country-level equality measures into our models. We should probably also develop company-specific solutions which acknowledge the importance of country-level traits. The plan for improving gender equality in the Netherlands should probably be different from that in India.

A few posts ago I discussed how data aggregators like ADP are potentially rich sources of data on external trends for HR teams. External open data sources are another, and they are freely available and simple to integrate.

ª International Consumer Confidence data is available from the OECD.