In HR analytics, don’t be constrained by your data structure

Data is captured in a particular form for a purpose. For most data coming out of HR systems that purpose was to run the processes or transactions that the software supports.

For analysis purposes there is a decent chance this isn’t the form you need it in. Unless you transform it into the form you need, your analysis will be constrained by the data.

Analysts will often investigate only the questions that their data, and data structure, enable them to answer. Because decisions are based on analysis, the shape of the data can drive decision making. Ineffective data structures can lead to ineffective decisions.

We know that in most organizations teamwork and collaboration drive high performance. However, most firms measure individual performance through their performance management approach. This creates two side effects:

  • the firms unwittingly incentivise some of the wrong things
  • employees become frustrated because recognition doesn’t accurately reflect performance.

We think that a significant reason for this is that we capture and analyse data in a suboptimal way. What we really want to know and how we measure / analyse aren’t aligned.

The vast majority of data, especially HR data, is stored in tables, most often in a relational database. Each row of these tables typically reflects an individual or an event.

This is great for a certain subset of analysis, when that unit is what you want to study. However, there is a large set of questions that this form of data structure makes much harder to study. The result is that those questions don’t get investigated.

The questions that I’m referring to are ones where you want to understand the relationships between entities. In HR analytics, any set of questions where you’re interested in how people work together probably falls into this group.

To do this sort of analysis you need to look at your data in a graph or network structure. As standard you can’t do this easily in Excel (though I believe there is a free add-on called NodeXL to help you get started). We use either the igraph package in R or Gephi. I have yet to see HR data of a size where using a database is essential, but there is a selection of graph databases available.

In a graph each entity is a node. These nodes can have properties. They are joined via edges or relationships, which can also have properties.
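To make the node / edge / property idea concrete, here is a minimal sketch in plain Python that turns table rows into a graph. The `Graph` class, the `employee_rows` data and the `reports_to` edge kind are all assumptions made up for illustration; in practice a package such as igraph does this work for you.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny directed property graph: nodes and edges both carry properties."""
    nodes: dict = field(default_factory=dict)  # node id -> dict of properties
    edges: list = field(default_factory=list)  # (source, target, properties)

    def add_node(self, node_id, **props):
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, source, target, **props):
        self.edges.append((source, target, props))

    def neighbours(self, node_id, kind):
        """Targets of edges of the given kind that start at node_id."""
        return [t for s, t, p in self.edges
                if s == node_id and p.get("kind") == kind]


# Hypothetical rows, one per individual, as a table in an HR system might hold them.
employee_rows = [
    {"id": "jim", "name": "Jim", "grade": 5, "manager_id": None},
    {"id": "ana", "name": "Ana", "grade": 4, "manager_id": "jim"},
    {"id": "raj", "name": "Raj", "grade": 3, "manager_id": "ana"},
]

g = Graph()
for row in employee_rows:
    # Each row becomes a node; the other columns become node properties.
    g.add_node(row["id"], name=row["name"], grade=row["grade"])
for row in employee_rows:
    if row["manager_id"] is not None:
        # The manager column becomes an edge (employee -> manager) with its own property.
        g.add_edge(row["id"], row["manager_id"], kind="reports_to")

print(g.neighbours("raj", "reports_to"))
```

The same rows are still there; the difference is that the relationship hidden in the `manager_id` column is now something you can walk along.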

The following video is a good introduction to graphs and graph analysis:


In HR analytics the simplest form of a graph is the organization chart (often a tree, a specific type of graph). Even this can be highly useful.

For example, when we automate survey reporting we use a graph to identify who should be included in each report. Let’s say I want to create a report covering people who report to Jim, people who report to Jim’s direct reports, and anyone who has worked on a project managed by Jim for at least 6 months in the last 12. This is simple with a graph.
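A rough sketch of the Jim example, using only the Python standard library. The data, the names and the helper functions (`within_levels`, `report_audience`) are hypothetical, and this is one possible way to express the rule rather than the code we actually run.

```python
from collections import defaultdict

# Hypothetical reporting lines as (employee, manager) pairs.
reports_to = [("ana", "jim"), ("ben", "jim"), ("raj", "ana"),
              ("lia", "raj"), ("tom", "sue")]
# Hypothetical project edges: (employee, project manager, months in the last 12).
project_months = [("tom", "jim", 7), ("sue", "jim", 3), ("ana", "jim", 9)]

direct_reports = defaultdict(set)
for employee, manager in reports_to:
    direct_reports[manager].add(employee)


def within_levels(manager, levels):
    """People at most `levels` reporting steps below `manager`."""
    found, frontier = set(), {manager}
    for _ in range(levels):
        # Move one step down the reporting tree from the current frontier.
        frontier = {e for m in frontier for e in direct_reports[m]} - found
        found |= frontier
    return found


def report_audience(manager, levels=2, min_months=6):
    """Union of the hierarchy rule and the project rule."""
    by_hierarchy = within_levels(manager, levels)
    by_project = {e for e, pm, months in project_months
                  if pm == manager and months >= min_months}
    return by_hierarchy | by_project


audience = report_audience("jim")
print(sorted(audience))  # ['ana', 'ben', 'raj', 'tom']
```

Lia sits three levels below Jim and Sue has only 3 months on his project, so neither is included. Changing the rule means changing `levels` or `min_months`, not rewriting a chain of table joins.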

You might also want to compare ‘similar’ teams. For example, show the results of Jim’s team but compare them with the results of all other teams whose manager reports to the same manager as Jim.
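The ‘similar teams’ comparison could be sketched like this. The reporting lines, the survey scores and the helper names are invented for illustration, and a ‘team’ is assumed here to mean a manager’s direct reports only.

```python
# Hypothetical reporting lines: employee -> manager (None at the top of the tree).
manager_of = {
    "kay": None,
    "jim": "kay", "sue": "kay", "max": "kay",
    "ana": "jim", "ben": "jim",
    "tom": "sue", "eve": "sue",
}
# Hypothetical survey score per respondent.
score = {"ana": 4.0, "ben": 3.0, "tom": 5.0, "eve": 4.0}


def team_of(manager):
    """Direct reports of `manager`, in alphabetical order."""
    return sorted(e for e, m in manager_of.items() if m == manager)


def peer_managers(manager):
    """Other managers who report to the same person as `manager`."""
    boss = manager_of[manager]
    if boss is None:
        return []
    # Siblings in the tree who have a team of their own.
    return [p for p in team_of(boss) if p != manager and team_of(p)]


def mean_score(people):
    values = [score[p] for p in people if p in score]
    return sum(values) / len(values)


jim_team = team_of("jim")
peers = peer_managers("jim")
comparison_group = [e for p in peers for e in team_of(p)]

print(peers)                         # ['sue']
print(mean_score(jim_team))          # 3.5
print(mean_score(comparison_group))  # 4.5
```

Max also reports to Kay but has no direct reports, so he is not treated as a peer manager in this sketch.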

On top of these formal networks we can add social network relationships to see how work really gets done. This enables us to look at communication, knowledge sharing or influence. We can see how to redesign an organization while keeping the high-performing teams intact.

Nodes, however, don’t have to be people; they can be anything. At the moment we’re doing job / performance management analysis by building a weighted network of which combinations of competencies people are being assessed on, and their ratings.

Graphs can also be used to analyse text. For example, in a survey comment field, which other topics are people who talk about leadership most likely to comment on?
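One way to approach the leadership question is a weighted co-occurrence network, where topics are nodes and the weight of an edge is the number of comments mentioning both topics. The sketch below assumes the comments have already been tagged with topics by some earlier step; the `comment_topics` data and the `co_mentioned_with` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of a topic-tagging step: one set of topics per comment.
comment_topics = [
    {"leadership", "communication"},
    {"leadership", "communication", "pay"},
    {"leadership", "workload"},
    {"pay", "workload"},
]

# Undirected weighted edges, keyed by an alphabetically ordered topic pair.
edge_weight = Counter()
for topics in comment_topics:
    for a, b in combinations(sorted(topics), 2):
        edge_weight[(a, b)] += 1


def co_mentioned_with(topic):
    """Topics that share an edge with `topic`, heaviest edge first."""
    linked = Counter()
    for (a, b), weight in edge_weight.items():
        if a == topic:
            linked[b] += weight
        elif b == topic:
            linked[a] += weight
    return linked.most_common()


leadership_links = co_mentioned_with("leadership")
print(leadership_links[0])  # ('communication', 2)
```

In this toy data, people who mention leadership most often also mention communication. Raw counts favour topics that are common everywhere, so on real data you would probably want to normalise the weights.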

In the last 12 months I believe all of our projects used graphs at some stage in the analysis.

Like traditional tables, graphs aren’t right for every analysis. But if you want to do HR analytics, having the option of looking at the world as a network significantly increases the number and types of questions you can investigate.