Net stacked distribution – a better way to visualize Likert data.

January 14, 2011

This is a bit off-topic, though we hope it has relevance to many of our readers.

A few days ago I commented on an article by Ed Halteman called ‘Discovering different ways to report on matrix or table data’. It discussed visualizing a common survey question type – the Likert scale.

A Likert scale offers the respondent a number of options, usually 5 or 7, of the ‘Strongly disagree, Disagree, Neutral…’ type. Likert assigned numbers (1 to 5) to his scale. Even so, these values should probably be treated as ordinal, so the use of means, though common, is best avoided.
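As a quick illustration of why the mean can mislead on ordinal data, here is a small Python sketch using two hypothetical sets of 100 responses. The data are invented for this example and are not the sample data set used in the charts. A completely polarised group and a completely indifferent group both average exactly 3, 'Neutral'.

```python
from statistics import mean

# Hypothetical responses on a 1-5 Likert scale (invented for illustration).
polarised = [1] * 50 + [5] * 50   # half 'Strongly disagree', half 'Strongly agree'
indifferent = [3] * 100           # everyone 'Neutral'

# Both means are 3, although no one in the polarised group is neutral.
print(mean(polarised), mean(indifferent))
```

The two distributions could hardly be more different, yet the mean cannot tell them apart.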

When presented with a distribution, our natural inclination is to show it, and for continuous variables such as time we mostly use box plots. For ordinal data, though, this doesn’t work. We therefore use a different approach, which we call the Net Stacked Distribution. It works for Likert data because the question is symmetrical around a neutral position.

For any analysis it’s important to start by understanding what the viewer wants to know from the results. With a Likert scale we think the viewer is most interested in the balance between positive and negative responses, and in how strong those feelings are. Neutral responses can often be read as a measure of indifference. Whilst this is important, we want to ensure that the understanding the visualization provides isn’t distorted by those who are indifferent.

The best way to describe the Net Stacked Distribution is with a concrete example, so we’ve created a sample data set with 5 questions, each with a different distribution of our choosing.

Here are the sample distributions in histogram form:

[Figure: histograms showing the response distributions for questions A to E]

For (A) the responses are distributed equally across the 5 points on the scale. For (B) we’ve created a symmetrical, bell-type distribution with quite broad tails. (C) is a similar distribution but with a strong central (Neutral) group of respondents. (D) is skewed to the right, towards agreement. (E) is identical to (D) but with more ‘Strongly disagree’ responses at the expense of ‘Disagree’.

The issue with histograms is that they take up a lot of space when you want to compare numerous questions: in a presentation you can probably fit only 3 to 5 on a slide. You can also either align the questions vertically, as here, so that each response value sits above the same value for the other questions, or align them horizontally to make it easier to compare values. An ideal visualization would enable both.

A common way of presenting this data is the stacked distribution:

[Figure: stacked distribution of the responses to questions A to E]

Whilst this has strengths, mainly in comparing the extreme results, the central segments can be difficult to read because each one starts from a different baseline. The Net Stacked Distribution tries to counter this issue.
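To make the idea of a central baseline concrete, the Python sketch below computes where each segment of a single question's bar would sit. It assumes the input is a dict of percentages keyed by response value 1 to 5, drops Neutral (3), and places the stronger responses at the outer ends. The function name `net_stacked_segments` is hypothetical, and this is only an illustration of the layout, not how Tableau positions marks internally.

```python
def net_stacked_segments(percentages):
    """Return {response: (start, end)} x-extents for one question's bar.

    percentages: dict mapping response value (1-5) to its share of respondents.
    Neutral (3) is omitted. Negative responses extend left from 0 and positive
    responses extend right from 0. Assumption: the 'Strongly' responses are
    placed at the outer ends of the bar.
    """
    segments = {}

    # Left side: 'Disagree' (2) touches the baseline, 'Strongly disagree' (1) is outermost.
    left = 0.0
    for response in (2, 1):
        width = percentages.get(response, 0.0)
        segments[response] = (left - width, left)
        left -= width

    # Right side: 'Agree' (4) touches the baseline, 'Strongly agree' (5) is outermost.
    right = 0.0
    for response in (4, 5):
        width = percentages.get(response, 0.0)
        segments[response] = (right, right + width)
        right += width

    return segments


# Hypothetical question shaped like (A): 20% of respondents at each point.
print(net_stacked_segments({1: 20, 2: 20, 3: 20, 4: 20, 5: 20}))
```

Because the 'Disagree' and 'Agree' segments both start at 0, they can be compared directly across questions, which is exactly what the ordinary stacked layout makes hard.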

[Figure: net stacked distribution of Likert response data]

There are a few things which we hope are easy to see (as well as the most obvious: we’re explicitly not showing the neutral responses):

  • It is easy to see the skew between total positive and negative responses, thanks to the central baseline.
  • The total width of each bar shows the percentage of respondents who have non-neutral feelings towards the statement. Conversely, the short bars in C show that a large percentage of respondents are indifferent or have neutral feelings. (If this statement were assessing an experience you cared about, it might show that your customers or employees aren’t interested.)
  • We use depth of colour to highlight the intensity of feeling. Although the balance between total agreers and disagreers is identical in D and E, the colouring quickly highlights E as the case where you might decide action is more important (let’s say these examples show a service where most people have a good experience, but in the case of E, those who have a bad experience perceive it as very bad).

(Note that even though we’ve used red and blue in this example, because the chart relies on colour density you could print it in black and white and lose little of the information.)

We haven’t formally user-tested this approach, but we’d suggest that a viewer could very quickly differentiate the questions on the 3 important factors: skew, non-neutrality and intensity of feeling.
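The three factors can also be put into numbers. The Python sketch below is our own illustrative summary, not a measure defined in this article: skew is the positive share minus the negative share, non-neutrality is their sum (the bar's total width), and intensity is read from the 'Strongly' shares (the darker segments). The data sets `d` and `e` are hypothetical, invented only to mimic the description of D and E.

```python
from collections import Counter


def summarise(responses):
    """Illustrative skew, non-neutrality and intensity, as % of all respondents."""
    counts = Counter(responses)
    n = len(responses)
    pct = {r: 100 * counts[r] / n for r in range(1, 6)}
    negative = pct[1] + pct[2]
    positive = pct[4] + pct[5]
    return {
        "skew": positive - negative,          # > 0 means more agreement than disagreement
        "non_neutral": positive + negative,   # total width of the bar
        "strongly_disagree": pct[1],          # darkest segment on the left
        "strongly_agree": pct[5],             # darkest segment on the right
    }


# Hypothetical data shaped like D and E: same balance, but E has more 'Strongly disagree'.
d = [1] * 5 + [2] * 15 + [3] * 20 + [4] * 40 + [5] * 20
e = [1] * 15 + [2] * 5 + [3] * 20 + [4] * 40 + [5] * 20

for name, data in (("D", d), ("E", e)):
    print(name, summarise(data))
```

D and E come out with the same skew and the same non-neutrality; only the 'Strongly disagree' share differs, which is the difference the colour depth is meant to reveal.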

We look forward to comments.

Technical notes for the production of these graphics.

The graphics were produced in Tableau. A sample dataset comprising a respondent ID and 5 columns for the responses to questions A, B, C, D and E was prepared for analysis in Tableau using Google Refine: we transposed columns A to E, prepending the column title to each value, and then split this new column into a question column and a response column. A table with 100 rows of data and 5 questions thus becomes a table with 500 rows, each with one question and one response. Our responses were captured as the numbers 1 (strongly disagree) to 5 (strongly agree).
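If Google Refine isn't to hand, the same wide-to-long reshaping can be sketched in plain Python. The field names `respondent_id`, `question` and `response` are assumptions for this example, and the 100-row table is hypothetical (every respondent answers 3) since only its shape matters here.

```python
def wide_to_long(rows, questions=("A", "B", "C", "D", "E")):
    """Reshape one row per respondent into one row per (respondent, question)."""
    long_rows = []
    for row in rows:
        for question in questions:
            long_rows.append({
                "respondent_id": row["respondent_id"],
                "question": question,
                "response": row[question],
            })
    return long_rows


# Hypothetical wide table: 100 respondents, each with a response to questions A to E.
wide = [{"respondent_id": i, **{q: 3 for q in "ABCDE"}} for i in range(1, 101)]
long_table = wide_to_long(wide)
print(len(wide), "->", len(long_table))  # 100 -> 500
```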

To produce the negative numbers we created a new calculated field (called Net Response) in Tableau with the following calculation:

IF [response]<3 THEN -1 ELSEIF [response]=3 THEN 0 ELSE 1 END
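For readers working outside Tableau, the calculated field maps directly onto a small Python function (the name `net_response` is just borrowed from the field name): responses 1 and 2 become -1, 3 becomes 0, and 4 and 5 become +1.

```python
def net_response(response: int) -> int:
    """Python equivalent of the Net Response calculated field above."""
    if response < 3:
        return -1   # 'Strongly disagree' or 'Disagree'
    elif response == 3:
        return 0    # 'Neutral'
    return 1        # 'Agree' or 'Strongly agree'


print([net_response(r) for r in range(1, 6)])  # [-1, -1, 0, 1, 1]
```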

To produce the graphic in Tableau we used the following settings:

Row: Question (A, B, C…). Of course, you could instead compare the responses to the same question across different populations, should you wish.
Colour: Set by Response
Column: We used a table calculation based on the standard ‘percentage of total’, changed to:

SUM([Net Response]) / TOTAL(COUNT([Net Response]))
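Outside Tableau, the same quantity can be sketched in Python on the long-format table: for each question and response value, sum the net responses and divide by the number of rows for that question. Neutral rows count towards the denominator (COUNT counts a 0 value), so the bar widths are shares of all respondents. The function name `signed_shares` and the row layout are assumptions carried over from the reshaping step, and the four-row table in the example is hypothetical.

```python
from collections import Counter, defaultdict


def signed_shares(long_rows):
    """SUM(net response) / TOTAL(COUNT) per question, split by response value."""
    # Denominator: every row for the question, neutral responses included.
    totals = Counter(row["question"] for row in long_rows)
    sums = defaultdict(int)
    for row in long_rows:
        r = row["response"]
        net = -1 if r < 3 else (0 if r == 3 else 1)
        sums[(row["question"], r)] += net
    return {key: s / totals[key[0]] for key, s in sums.items()}


# Hypothetical long-format rows for one question.
rows = [
    {"question": "A", "response": 1},
    {"question": "A", "response": 3},
    {"question": "A", "response": 4},
    {"question": "A", "response": 4},
]
print(signed_shares(rows))
```

Negative values end up to the left of zero and positive values to the right, which is what gives the chart its central baseline; the Neutral entry is always 0, so it takes up no width.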

Update

Both Jason P Becker and Statisfactions provide detailed examples of how to create these graphics using R and ggplot2. My thanks to both of them.