The technology People Analysts need

It’s 2016. You’ve made plans for your HR department and, for the first time, the words ‘People Analytics’ appear on the org chart, probably, if truth be known, for slightly aspirational reasons.

If we use a typical model we’d be thinking Strategy – People – Processes – Technology. As a senior HR manager you’re probably reasonably comfortable with all but one of these. The part you’re worrying about is Technology.

If you read the HR press you’ll find that every technology vendor will sell you a technology to do People Analytics. What do you really need, especially when starting out?

Let me start by stating that technology will be the smallest of your issues. I lead a People Analytics firm, and over the last 6 years we’ve learnt from experience what works, where our spend has been valuable and which tools we turn to when dealing with typical HR data.

At the same time we collect large amounts of data on People Analytics jobs. We can tell a huge amount about a firm’s People Analytics maturity from its job descriptions. What do the advanced firms want people to be using?

Our clients, who tend to be at the higher level of maturity themselves, tend to work with a certain set of technologies. Many, like us, will have learnt by trial and error what they really need.

So, drawing on these three sources (our own experience, job descriptions and our clients), what are the technologies that you should include on the machines of your analysts?

R

R is a statistical language. It’s open source, probably the most widely used analytic toolset in universities and widely used in industry.

One of the key things you’ll want in your analysis is replicability. It’s hard to be replicable with Excel.

Replicable means that if you start at the same place and follow the same steps you’ll always reach the same conclusion. As a result another analyst can follow each of your steps and understand how you got your result. Replicability makes your analysis easy to debug. It also makes it repeatable. Given a new data set, your steps should produce a correct and consistent result.

R makes replicability easy. Analysts write code – scripts of instructions – which means others can follow what they’ve done and hopefully repeat the analysis.

When we started 6 years ago, deciding to use R was probably a brave decision. At the time it was really just used in academia. In hindsight it was inspired.

Such is the acceptance and usage of R since then that it’s probably the default choice for analysts. Want to hire great analysts? They probably know R. Most of the commercial vendors in the market, such as SAS or SPSS, enable you to use R within their tools.

We could stop this post here. When I was writing my presentation on Doing People Analytics at last year’s HR tech London event I almost did. For the last 6 years I’ve probably averaged 3 to 4 hours of R work a day and it’s by far my go-to tool. However, there are some other tools in the toolbag which your analysts might also want.

If you’re not going to use R you could probably make a good case for replacing it with Python. I think it’s harder for the general analyst to start with Python, but if your analysts come from a technology background they may well have good Python skills.
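Whichever language you choose, the idea of a replicable, scripted analysis is the same. As a minimal sketch in Python (the data, column names and function names are all made up for illustration), each step is a small function, so another analyst can rerun the whole chain and get the same answer:

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for an HR extract; in practice this
# would be read from a file exported from your HR system.
RAW = """employee_id,department,tenure_years
1,Sales,2.5
2,Sales,4.0
3,Finance,7.0
4,Finance,
5,HR,1.5
"""


def load(text):
    """Step 1: parse the CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2: drop rows with a missing tenure and convert tenure to a number."""
    return [
        {**row, "tenure_years": float(row["tenure_years"])}
        for row in rows
        if row["tenure_years"].strip()
    ]


def summarise(rows):
    """Step 3: compute the mean tenure per department."""
    by_department = {}
    for row in rows:
        by_department.setdefault(row["department"], []).append(row["tenure_years"])
    return {dept: round(mean(values), 2) for dept, values in sorted(by_department.items())}


def run(text):
    """The whole analysis, start to finish, as one repeatable chain of steps."""
    return summarise(clean(load(text)))


if __name__ == "__main__":
    print(run(RAW))
```

Because every decision (such as dropping the row with no tenure) lives in the script rather than in someone’s memory of which cells they edited, the same steps can be pointed at next month’s extract without change.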

OpenRefine

OpenRefine used to be Google Refine. Google bought it as part of another acquisition and has since released it back to the community.

The Google bit used to put people off but it’s always been a really powerful tool that ran locally on your machine. The Google connection was probably most useful in letting it communicate with their useful set of APIs. Refine doesn’t make this super-simple, but it does make it easier to get and use this data.

Refine is the go-to tool if you want to deal with dirty data. In reality, for me, this means cleaning up nasty Microsoft Excel files. It does one thing, cleaning dirty data sets, and it does it very well.

One of the really useful tools in Refine is a set of algorithms that can spot probable mis-typing errors. It can also spot outliers.
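To give a feel for how this kind of matching can work, here is a minimal Python sketch of one simple approach: reduce each value to a normalised key and group the values that share a key. This is an illustration of the general idea only, not Refine’s own code, and the job titles and function names are hypothetical. It catches differences in case, spacing, punctuation and word order; spotting genuine misspellings needs fuzzier methods.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Reduce a value to a normalised key.

    Trim and lowercase the text, strip punctuation, then sort and
    de-duplicate the words so that word order no longer matters.
    """
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a key; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical job titles as they might appear in a messy spreadsheet.
    titles = [
        "HR Manager",
        "hr manager ",
        "Manager, HR",
        "Finance Analyst",
        "finance analyst.",
        "Recruiter",
    ]
    for group in cluster(titles):
        print(group)
```

Each printed group is a set of spellings that probably mean the same thing, which the analyst can then review and merge into a single value.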

Again, Refine is free. It should be on every analyst’s desktop.

Gephi

One of the best Christmas presents I received last year was a new version of Gephi, the go-to tool to analyse network data. The previous version didn’t play with Macs very well.

I’ve mentioned before how useful network or graph data is. Gephi lets you quickly visualise a network and produce summary statistics. If you’ve seen a network diagram out there it was probably done in Gephi.
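As an illustration of the sort of summary statistics meant here, the following Python sketch computes two of the simplest ones by hand: each person’s degree (their number of connections) and the network’s density (the share of possible connections that actually exist). The names and the edge list are invented, and a dedicated tool will offer far more than this.

```python
from collections import defaultdict

# Hypothetical "who works with whom" edge list.
EDGES = [
    ("Ana", "Ben"),
    ("Ana", "Cara"),
    ("Ben", "Cara"),
    ("Cara", "Dev"),
    ("Dev", "Eli"),
]


def build_graph(edges):
    """Turn an undirected edge list into a node -> set of neighbours mapping."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return dict(neighbours)


def degree(graph):
    """Number of connections per node."""
    return {node: len(linked) for node, linked in graph.items()}


def density(graph):
    """Share of all possible undirected connections that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    m = sum(len(linked) for linked in graph.values()) // 2
    return 2 * m / (n * (n - 1))


if __name__ == "__main__":
    graph = build_graph(EDGES)
    degrees = degree(graph)
    print("Degree per person:", degrees)
    print("Most connected:", max(degrees, key=degrees.get))
    print("Density:", density(graph))
```

In this toy network Cara has the most connections, the kind of observation that a quick plot usually makes obvious at a glance.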

We use the igraph library in R for heavy-duty network analysis, but Gephi is better for exploring your data interactively, and I will still open it to get a first look at some new data. As every analyst knows, just plotting your data is incredibly valuable.

Like R, Gephi can create scripts to make analysis replicable. For teaching networks I use Polinode, a very user-friendly web-based tool with the advantage of a good network survey capability.

RapidMiner

When I teach People Analytics I use RapidMiner to give an introduction to predictive analytics. It has a drag-and-drop, flow-chart type interface which makes getting started a lot less intimidating than diving into code with something like R. Of course, should you want to, you can use R models in your process.

At least 70% of a typical analyst’s job is preparing the data, and RapidMiner can handle this task (as can Refine or R).

RapidMiner has a marketplace where analysts can acquire additional functionality or models.

Other tools

We use only a few other tools. For creating interactive visualisations, especially at the prototyping stage, we use Tableau. We’ve been Tableau users since starting OrganizationView and use both desktop and server (for sharing visualisations with clients). R can be used with Tableau.

When a visualisation design is set we then tend to build it with D3.js and JavaScript. Increasingly there are good libraries in R which can get you up and running quite quickly.

From our experience, technology isn’t the thing that’s holding HR departments back from doing great analytics – it’s a shortage of analysts. Given the budget, we’d invest in talent over technology in most instances.

Andrew Marritt