HR Reporting & Analytics

Mixing qualitative and quantitative approaches in People Analytics model building


The following is our approach to running People Analytics projects, especially focussing on the hardest part - ensuring that you have the right question and right data to make a decision. 

What really is the problem?

The most critical part of any project is correctly defining the problem in a way that can be informed by data.

Whilst this might seem an easy problem, in most instances it’s not. 

Let us take a simple example - can we reduce employee turnover?

In itself this might seem like a good question. However, simply trying to minimise turnover implies some assumptions:

  • that all employees are equally valuable - i.e. you have no preference as to which employees you ‘save’

  • that there is no cost associated with ‘saving’ employees, or at least that the cost is uniform for each option

  • that there is no optimal level of attrition apart from zero.

For most firms none of these assumptions would hold. What we probably want to do is to minimise the cost of attrition where the cost function is as complete as possible and will include the cost of whatever change you need to make to reduce the attrition.
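
As a rough illustration of what such a cost function might look like, the sketch below compares the expected cost of doing nothing with the expected cost of intervening, employee by employee. The Employee fields, the probabilities and all of the figures are assumptions made up for the example, not data from a real project.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    replacement_cost: float    # assumed cost to the firm if this person leaves
    p_leave: float             # assumed probability of leaving if we do nothing
    p_leave_if_treated: float  # assumed probability of leaving after the intervention
    intervention_cost: float   # assumed cost of the intervention for this person


def expected_cost(emp: Employee, treat: bool) -> float:
    """Expected attrition cost for one employee, including the cost of acting."""
    if treat:
        return emp.intervention_cost + emp.p_leave_if_treated * emp.replacement_cost
    return emp.p_leave * emp.replacement_cost


def best_plan(employees):
    """Intervene only where doing so lowers the expected cost."""
    return {
        emp.name: expected_cost(emp, True) < expected_cost(emp, False)
        for emp in employees
    }


staff = [
    Employee("A", replacement_cost=100_000, p_leave=0.30,
             p_leave_if_treated=0.10, intervention_cost=5_000),
    Employee("B", replacement_cost=20_000, p_leave=0.30,
             p_leave_if_treated=0.10, intervention_cost=5_000),
]
plan = best_plan(staff)
total = sum(expected_cost(emp, plan[emp.name]) for emp in staff)
print(plan, round(total))
```

With these made-up numbers the same intervention is worth making for A but not for B, which is exactly why "minimise turnover" and "minimise the cost of attrition" can lead to different decisions.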

What could be causing the problem?

For many analysts, when given a problem there is a tendency to want to jump in and start building a model. This is problematic.

In our experience it is always advantageous to conduct a qualitative review first, to ensure that you have identified as many plausible ‘theories’ as you can.

Conducting this research has two key advantages:

  • it ensures that you make a conscious decision about what data and features you need to bring into your modelling process

  • it helps you socialise and gain acceptance for your recommendations. You’ll reduce the risk of a key stakeholder challenging you with ‘have you thought of X?’.

Desk research

It’s highly unlikely that you’re the first person to consider your current problem. Desk research will go a long way to ensure that you build your analysis on the work of others.

One place worth starting is Google Scholar. Many articles will be available even without access to a university library. With time you’ll learn how to sift through the journals and papers to identify causes quickly and efficiently.

Asking stakeholders

It’s very rare that the causes of the issue will be a true surprise to people within the organization. However, it’s quite possible that decision-makers won’t have a complete view of the issues on the ground.

Companies tend to develop myths of what is causing certain issues. Time after time we see a distorted view, especially at senior levels, of what is causing issues on the ground. Not only are leaders often several layers away from the issue in the organization but they are not a representative sample of the people in the organization as a whole. Expecting their ideas to be representative and complete is foolhardy.

Traditionally this work would be done with a series of interviews and workshops. However, especially where speed or access is a constraint (e.g. for problems that are geographically distributed), it’s worth considering using supporting technology.

Using technology to go broad

One method which we see many of the most advanced analytics teams using increasingly often is to ask a few questions to a large population of people to understand what they believe is causing the issue you’re addressing. They’ll do this using a very short topic-specific survey / questionnaire.

The most important questions in these surveys will be open-text, as you’re trying to identify a broad set of potential issues in an exploratory manner. It’s almost always worth asking these sorts of questions in pairs:

  • What could be causing you / others to….?

  • What suggestions do you have that could help us address this?

In addition you might ask one or two scale based questions. Depending on your topic this might be something like:

  • How big of an issue do you perceive X to be?

  • Over the last 6 months do you think this has become better / worse?

It is important to use a survey tool - almost all do this - which enables you to track who provides the answers so that you can link the answer data with various demographic variables. In this way you can analyse the results by various sub-populations. For example if you’re looking at something like attrition it’s likely that you’ll identify different reasons depending on the geography or life-stages of the individuals.
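
A minimal sketch of this linking step, assuming the survey export and the HR extract share an employee identifier. The field names (region, tenure_band), the coded themes and the records themselves are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical HR extract: employee id -> demographic attributes
demographics = {
    "e1": {"region": "North", "tenure_band": "0-2 yrs"},
    "e2": {"region": "North", "tenure_band": "3-5 yrs"},
    "e3": {"region": "South", "tenure_band": "0-2 yrs"},
    "e4": {"region": "South", "tenure_band": "0-2 yrs"},
}

# Hypothetical survey export: (employee id, theme coded from the open-text answer)
responses = [
    ("e1", "commute"),
    ("e2", "pay"),
    ("e3", "manager"),
    ("e4", "manager"),
    ("e9", "pay"),  # no matching HR record, so it cannot be linked
]


def themes_by(attribute, responses, demographics):
    """Count coded themes within each value of one demographic attribute."""
    counts = defaultdict(Counter)
    for employee_id, theme in responses:
        person = demographics.get(employee_id)
        if person is None:
            continue  # skip answers we cannot link to a record
        counts[person[attribute]][theme] += 1
    return {group: dict(counter) for group, counter in counts.items()}


by_region = themes_by("region", responses, demographics)
print(by_region)
```

The same function can be called with any other attribute in the extract (here, "tenure_band") to see whether the reasons differ by life-stage rather than geography.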

When you have the answers it’s important to accurately code the reasons that people provide in the open questions. We believe that the best way of doing this is to use an inductive approach (you learn the themes from the data, not a pre-defined model). When our clients use our Workometry service for this it’s typical that only about 70% of the themes that they find are those they expected. Using an automated inductive approach replicates what the best qualitative researchers would do but at a fraction of the time / cost.

Where is the data to test these ideas?

For each of the ‘causes’ that you’ve identified in the earlier stage it’s worth thinking about how you can get data to test whether the perceived relationship is supported by the evidence. This part can require some creative thought.

Some of this data will be available in your business and HR systems but some certainly won’t. 

All measurement and data-capture has measurement error. At this stage you’re trying to think of ways of acquiring data that balances the amount of uncertainty in the measurement with the cost of bringing it into the model. At this stage you’re not trying to build the most accurate data capture method but instead find a way that is good-enough. If the analysis suggests that there might be something worth investigating you can then invest more resources. Doing an early review makes it easier to build a case to create a more expensive / robust method if needed.

Thinking about Proxies

One of the things that you’ll have to do is to make some reasonable assumptions to identify data that could be a proxy for what you care about.

For example, a few years ago we were helping a client build an attrition model for a national workforce across India. One of our hypotheses was that the attrition rate in any branch was influenced by the vibrancy of the local job market.

At the time we didn’t have good regional data on local job markets. Faced with either a lack of data or expensive data acquisition cost we looked for a proxy.

One idea was that if a city has a buoyant job market then more people will move to that city and its population will grow (and, conversely, that a weak job market would show up as slow growth or a shrinking population). Fortunately this data was available as open data from the Indian census. By creating a variable for population growth at a city level between the last two censuses we built a proxy for job market vibrancy. It proved an important predictor in our model and helped explain why the issue wasn’t uniformly distributed.
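
The proxy itself is simple to compute. The sketch below shows one way it could be done, assuming two dictionaries of city populations and a mapping from branch to city. The city names, branch names and figures are invented for illustration and are not the census data used in the project.

```python
# Hypothetical city populations from two consecutive censuses
census_earlier = {"City A": 1_000_000, "City B": 500_000, "City C": 200_000}
census_later = {"City A": 1_300_000, "City B": 510_000, "City C": 190_000}


def population_growth(earlier, later):
    """Relative population change per city, for cities present in both censuses."""
    growth = {}
    for city, before in earlier.items():
        after = later.get(city)
        if after is None or before <= 0:
            continue  # cannot compute a growth rate for this city
        growth[city] = (after - before) / before
    return growth


growth = population_growth(census_earlier, census_later)

# Hypothetical mapping of branches to cities; branches in cities without
# census data get None so the gap is visible rather than silently filled.
branch_city = {"branch_1": "City A", "branch_2": "City C", "branch_3": "City D"}
branch_proxy = {branch: growth.get(city) for branch, city in branch_city.items()}
print(branch_proxy)
```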

Creating new variables

There is often a big difference between the data that is captured in systems (usually to help run some form of process) and the data you need for modelling.

The process of transformation needed to create the variable of interest from the raw data is one of the most time-consuming parts of the analysis process, but this can be significantly guided by understanding likely issues.

For example you might be capturing events as dates but the important variable might be duration - for example time since last promotion. Alternatively we often found that the rate of change was more important than the absolute value - salary often falls into this category where the salary rise has more predictive power than the absolute salary (apart from at the extremes). 
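
Both transformations are short once you know you want them. The following sketch derives a duration and a rate of change from a single record. The field names and values are hypothetical, and dividing by 365.25 is just one convenient approximation of a year.

```python
from datetime import date


def years_since(event: date, as_of: date) -> float:
    """Duration in (approximate) years between an event date and a reference date."""
    return (as_of - event).days / 365.25


def relative_change(previous: float, current: float) -> float:
    """Rate of change, e.g. the size of a salary rise relative to the old salary."""
    return (current - previous) / previous


# Hypothetical raw record as it might be stored in an HR system
record = {
    "last_promotion": date(2020, 1, 1),
    "salary_prev": 50_000,
    "salary_now": 52_000,
}
as_of = date(2023, 1, 1)

features = {
    "years_since_promotion": years_since(record["last_promotion"], as_of),
    "salary_rise": relative_change(record["salary_prev"], record["salary_now"]),
}
print(features)
```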

This type of feature creation can be a never-ending task, so understanding where to focus efforts is important. Your early qualitative approaches will often pay back through reduced effort at this stage.

New measurement

In many instances you’ll need to capture new data, either because a system has incomplete data or because no records are available.

Sometimes you will be able to ask people to provide data. In other instances you might want to manually create it from a sample before bothering employees. For example a few years ago we had a client where the hypothesis was that the employee’s previous employer (e.g. did they come from a competitor) was a driver. In this instance we took a sample of employee names and spent a few hours looking at their LinkedIn profiles to capture the information. It turned out that it wasn’t likely to be an issue. Hence we avoided creating a burden unnecessarily.

Prioritisation of data acquisition

It’s highly unlikely that you’ll be able to include all potential datapoints in your first iteration of your model. Prioritising which to select is another big advantage of using a broad qualitative questionnaire.

For each potential variable we prioritise based on two characteristics:

  • The cost of acquisition (including the resource-time needed)

  • The likely importance.

If you’ve done a wide-reaching questionnaire you will likely have a good idea of the importance by how frequently it was mentioned. This data might also highlight the groups where it could be problematic, eg an issue might be restricted to a particular geographic area. In this case instead of capturing the data globally you might want to include it in potential ‘hotspots’. It’s always possible (and often advisable) to build an overall model which is an ensemble of local models.
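
One simple way to turn these two characteristics into a ranking is sketched below. It uses the share of respondents mentioning a theme as the importance estimate and divides it by the acquisition cost. The ratio is only one possible scoring rule, and the variables, mention counts and costs are invented for the example.

```python
# Hypothetical candidate variables with how often the related theme was
# mentioned in the questionnaire and a rough acquisition cost in person-days
candidates = [
    {"variable": "commute_time", "mentions": 120, "cost_days": 10},
    {"variable": "manager_tenure", "mentions": 80, "cost_days": 2},
    {"variable": "previous_employer", "mentions": 15, "cost_days": 5},
]


def prioritise(candidates, total_responses):
    """Rank variables by estimated importance per unit of acquisition cost."""
    scored = []
    for candidate in candidates:
        importance = candidate["mentions"] / total_responses
        score = importance / candidate["cost_days"]
        scored.append({**candidate, "importance": importance, "score": score})
    return sorted(scored, key=lambda c: c["score"], reverse=True)


ranked = prioritise(candidates, total_responses=400)
order = [c["variable"] for c in ranked]
print(order)
```

In practice you would probably look at the two dimensions side by side as well, since a frequently mentioned but expensive variable may still deserve a place in a later iteration.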

Models can’t be better than the data they see

To non-analysts it might seem that the way to build good models is to extract as much data as possible from your systems or databases, apply some form of machine learning model and use the results. This is almost always the wrong way to do analysis.

Good model building is always a conscious set of choices from the analyst about what data to include and in what form. Spending more time up-front identifying potential issues, and therefore variables, is almost always a worthwhile investment of time and resources.

As noted, as analysts we should understand that these early qualitative efforts not only increase the chance we’ll build good models, but the process of doing it dramatically increases the chances that our recommendations and models will be accepted by our stakeholders. An analyst who thinks their results will speak for themselves is likely to be an inexperienced analyst.

The best analysts know when to use quantitative approaches and when to use qualitative, exploratory approaches. In almost all instances the best approach is to combine them.


Cost ≠ Value. Issues with the total compensation approach


In several sectors compensation professionals and the HR teams they serve talk about total compensation. When communicating compensation messages with employees and prospective employees they sum the cost of each component and present this as the value of the total package.

Unfortunately what is important to employees is not the fiscal cost of the package, but the perceived value, and they are rarely the same thing. The issue is that, given the money, employees would often prefer to allocate it in a way that gives them more perceived value. Economists describe this as a deadweight loss: a waste of resources that could be averted without making anyone worse off. Think about it another way: if instead of providing non-cash benefits a company paid just cash, then a benefit package would only be efficient if the employee would choose to spend the money in exactly the same way as the employer had.

In a famous 1993 paper Joel Waldfogel attempted to calculate the deadweight loss that giving presents at Christmas generated compared with the alternative of giving the recipient the equivalent cash. He estimated that the deadweight loss was between 10% and a third. It is highly likely your total compensation is being similarly devalued by employees.

Why non-cash benefits can make sense

So the economic theory suggests that the most efficient way of providing compensation – the one that creates the highest perceived value to employees – is just to provide the cash. However this misses part of the picture. Companies can often purchase goods and services at considerably lower prices than employees can. If we take Waldfogel’s 10–33% estimate, then if the company can purchase the goods on behalf of the employee at this sort of discount, the package could be more valuable to the employee than just giving a bigger salary.
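
The break-even logic can be made explicit with a little arithmetic. The sketch below assumes a deliberately simple model: the employee values a benefit at (1 - deadweight loss) of its market price, and the employer pays (1 - discount) of that same market price. The function name and the example discount of 25% are assumptions for illustration.

```python
def perceived_value_per_unit_spend(deadweight_loss: float, employer_discount: float) -> float:
    """Perceived value to the employee for each 1.00 the employer spends.

    Assumes the employee values the benefit at (1 - deadweight_loss) of its
    market price and the employer buys it at (1 - employer_discount) of that
    price. A result above 1.0 means the benefit beats paying the cash.
    """
    if not 0 <= employer_discount < 1:
        raise ValueError("employer_discount must be in [0, 1)")
    return (1 - deadweight_loss) / (1 - employer_discount)


# A 25% employer discount at the two ends of the 10% to 33% range
low_loss = perceived_value_per_unit_spend(0.10, 0.25)
high_loss = perceived_value_per_unit_spend(0.33, 0.25)
print(round(low_loss, 3), round(high_loss, 3))
```

Under this model the benefit only beats cash when the employer's discount is larger than the employee's deadweight loss, so the same benefit can be a good deal for one employee and a poor one for another.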

How to measure what is a valued package

Techniques exist for measuring how various components of a package are perceived but they’re rarely used by employers to measure perceived value of benefits packages. They should be.

The class of problem that you need to understand is termed discrete choice: that is you have a maximum total resource to allocate and having more of one thing means less of something else. The only way of measuring preferences is to replicate this trade-off.

The problem of maximising the perceived value of a package of goods is one faced by marketers daily. The established technique for measuring it is conjoint analysis, which can be applied very effectively to the compensation problem.
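
To give a flavour of the data such an exercise produces, the sketch below runs a simple counts analysis on a handful of choice tasks: for each benefit level, how often was a package containing it chosen when it was shown? This is only a first-pass summary, not a full conjoint estimation (which would typically fit a choice model to the same data), and the attributes, levels and responses are all hypothetical.

```python
from collections import defaultdict

# Hypothetical choice tasks: each shows two packages assumed to cost the
# employer the same, and records which one the respondent picked.
tasks = [
    {"options": [{"pension": "high", "leave": "25 days"},
                 {"pension": "low", "leave": "30 days"}], "chosen": 1},
    {"options": [{"pension": "high", "leave": "30 days"},
                 {"pension": "low", "leave": "25 days"}], "chosen": 0},
    {"options": [{"pension": "low", "leave": "30 days"},
                 {"pension": "high", "leave": "25 days"}], "chosen": 0},
]


def choice_shares(tasks):
    """Share of appearances in which each attribute level was in the chosen package."""
    shown = defaultdict(int)
    chosen = defaultdict(int)
    for task in tasks:
        for index, option in enumerate(task["options"]):
            for attribute, level in option.items():
                shown[(attribute, level)] += 1
                if index == task["chosen"]:
                    chosen[(attribute, level)] += 1
    return {key: chosen[key] / shown[key] for key in shown}


shares = choice_shares(tasks)
for (attribute, level), share in sorted(shares.items()):
    print(attribute, level, round(share, 2))
```

In this toy data the respondent always follows the extra leave, even when it means the lower pension, which is the kind of trade-off a simple rating question would not reveal.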

The importance of choice

The most effective way of maximising the perceived value of a compensation package is to provide a wide menu of options and let the employee choose which they want. This menu should be extensive, and it should be possible to change items on a reasonably regular basis, because as employees’ preferences change so will their ideal basket.

The issue with this approach is that it’s expensive. The companies it arguably works best for are those with a large number of relatively similar employees who have a similar set of preferences. It’s used quite frequently in large professional services firms for example.

If you don’t want to do this, the second-best approach is to provide a series of menu options which bundle benefits in a way that appeals to various segments within your workforce. The great thing about discrete choice experiments such as conjoint analysis is that the results they produce are differentiated by respondent – that is, the data can be used for cluster analysis to find those segments.

Communicating benefits to employees

The difference between the cost and the perceived value of benefits makes communicating benefit value difficult. If you communicate a cost to employees but it is not something that they would have chosen, then there is a real danger that the employee will view it as wasted money. One way of reducing this is to report not only your cost but also the market cost, so that the employee sees that you have got them a bargain. Whichever way you communicate, it is always worth doing material tests to understand the reactions to the wording. If you don’t want to do this directly with employees then it’s relatively straightforward to recruit test participants who are similar to your employees.

Negotiating a salary in a total comp world

If you are a prospective employee and want to negotiate a good package then it’s always worth starting with the reported total compensation value of your current package. It is highly likely that the value to you will not be as high as your current employer is reporting due to deadweight loss. By starting at this level you can then negotiate a package that more effectively matches your perceived value and capture some of the deadweight loss.