Net stacked distribution – a better way to visualize Likert data.

January 14, 2011

This is a bit off-topic, though we hope it has relevance to many of our readers.

A few days ago I commented on an article by Ed Halteman called ‘Discovering different ways to report on matrix or table data’. It discussed visualizing a common survey question type – the Likert scale.

A Likert scale has a number of options for the respondent – usually 5 or 7 – of the ‘Strongly disagree, disagree, neutral…’ type. Likert added numbers (1 to 5) to his scale. Even so, these values should probably be treated as ordinal, so the use of means, though common, is best avoided.

When presented with distributions, our natural inclination is to show that distribution, and for continuous variables like time we mostly use box plots. For ordinal data, though, this doesn’t work. We therefore use a different approach, which we call the Net Stacked Distribution. It works for Likert data because the scale is symmetrical around a neutral position.

For any analysis it’s important to start by understanding what the viewer wants to know from the results. With a Likert scale we think that the viewer is most interested in the balance between positives and negatives and in how strong those feelings are. Neutral responses could often be described as a measure of indifference. Whilst this is important, we want to ensure that the understanding provided by the visualization isn’t distorted by those who are indifferent.

The best way to describe the Net Stacked Distribution is with a real example, so we’ve created a sample data set with 5 questions – all with a different distribution of our choosing.

Here are the sample distributions in histogram form:

histograms showing distributions

For (A) the responses are distributed equally between the 5 points on the scale. For (B) we’ve created a symmetrical bell-type distribution with quite broad tails. (C) is a similar distribution but with a strong central (Neutral) set of respondents. (D) is skewed to the right. (E) is identical to (D) except that more of the negative responses are ‘Strongly disagree’, at the expense of ‘Disagree’.

The issue with histograms arises when you want to compare numerous questions together: they simply take a lot of space. In a presentation you can probably fit between 3 and 5 on a slide. You also have to choose between aligning the questions vertically, as here, so that each response sits above the same response in the other charts, and aligning them horizontally to enable better comparison of values. An ideal visualization would enable both.

A common way of presenting this data is the stacked distribution, as follows:

stacked distribution

Whilst this has strengths, mainly in comparing the extreme results, the central elements can be difficult to read because they all need to be seen from differing bases. The Net Stacked Distribution tries to counter this issue.

Net stacked distribution of Likert response data

There are a few things which we hope are easy to see (as well as the most obvious: we’re explicitly not showing the neutrals):

  • It is easy to see the skew between total positive and negative responses, due to the central base
  • The total width of the bars shows the percentage of respondents who have non-neutral feelings towards the statement. Conversely, the short bars in C show that a large percentage of the respondents are indifferent / have neutral feelings (if this statement was assessing an experience you cared about, it might show that your customers / employees aren’t interested)
  • We use depth of colour to highlight the intensity of feeling. Though the balance between total agree’ers and disagree’ers in D and E is identical, the colouring quickly highlights E, where you might decide action is more important (let’s say these examples show a service where most people have a good experience but, in the case of E, those who have a bad experience perceive it as very bad).

(Note that even though we’ve used red-blue in this example, by using colour density you could print this in black and white and little of the information would be lost.)

We haven’t formally user-tested this approach, but we’d suggest that a viewer could differentiate very quickly based on the 3 important factors: skew, non-neutrality and intensity of feeling.
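
To make the three factors concrete, here is a minimal Python sketch of how each could be computed from the response counts for one question. The function name, the exact definitions of the measures and the counts used for D and E are illustrative assumptions of ours; they are not the sample data behind the charts above.

```python
def summarise(counts):
    """Summarise one question from a dict mapping response (1-5) to a count.

    The measures below are illustrative definitions, not a standard.
    """
    total = sum(counts.values())
    negative = counts[1] + counts[2]
    positive = counts[4] + counts[5]
    return {
        # Skew: net balance of positive over negative responses.
        "skew": (positive - negative) / total,
        # Non-neutrality: the total bar width in the chart.
        "non_neutral": (positive + negative) / total,
        # Intensity: share of respondents choosing each extreme.
        "strongly_negative": counts[1] / total,
        "strongly_positive": counts[5] / total,
    }


# Hypothetical counts in the spirit of questions D and E (100 respondents each).
d = summarise({1: 5, 2: 20, 3: 15, 4: 40, 5: 20})
e = summarise({1: 20, 2: 5, 3: 15, 4: 40, 5: 20})

print(d["skew"] == e["skew"])  # the balance is the same for both
print(d["strongly_negative"], e["strongly_negative"])  # the intensity differs
```

With these made-up counts the skew and non-neutrality of D and E match, and only the share of ‘Strongly disagree’ separates them, which is the difference the colour depth is meant to show.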

We look forward to comments.

Technical notes for the production of these graphics.

The graphics were produced in Tableau. A sample dataset comprising a respondent ID and 5 columns for the responses to questions A, B, C, D and E was prepared for analysis in Tableau using Google Refine: we transposed columns A-E, prepending the column title, and then split this new column into a question column and a response column. A table with 100 rows of data and 5 questions thus becomes a table with 500 rows, each with one question and one response. Our responses were captured as numbers from 1 (strongly disagree) to 5 (strongly agree).
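
For readers without Google Refine, the same reshaping can be sketched in a few lines of Python. This is not the process we used, only an equivalent under assumptions: the field names (respondent_id, question, response) and the two example respondents are hypothetical.

```python
def wide_to_long(rows, questions=("A", "B", "C", "D", "E")):
    """Turn one row per respondent into one row per (respondent, question)."""
    long_rows = []
    for row in rows:
        for question in questions:
            long_rows.append(
                {
                    "respondent_id": row["respondent_id"],
                    "question": question,
                    "response": row[question],
                }
            )
    return long_rows


# Two made-up respondents, each answering questions A to E on the 1-5 scale.
wide = [
    {"respondent_id": 1, "A": 1, "B": 3, "C": 3, "D": 4, "E": 5},
    {"respondent_id": 2, "A": 5, "B": 2, "C": 3, "D": 5, "E": 1},
]

long_rows = wide_to_long(wide)
print(len(long_rows))  # one row per respondent per question
```

Applied to 100 respondents and 5 questions, this gives the 500-row layout described above.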

To produce the negative numbers we created a new calculated field (called Net Response) in Tableau with the following calculation:

IF [response]<3 THEN -1
ELSEIF [response]=3 THEN 0
ELSE 1
END
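
Outside Tableau, the same rule can be sketched in Python. The function name is ours; the coding of responses (1 to 5, with 3 as neutral) follows the description above.

```python
def net_response(response):
    """Map a 1-5 Likert response to -1 (negative), 0 (neutral) or +1 (positive)."""
    if response < 3:
        return -1
    if response == 3:
        return 0
    return 1


# Apply the rule to each point on the scale, from strongly disagree to strongly agree.
print([net_response(r) for r in (1, 2, 3, 4, 5)])
```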

To produce the graphic in Tableau we used the following settings:

Row: Question (A, B, C...). Of course, you could compare the responses to the same question by population should you wish.
Colour: Set by Response.
Column: We used a Table calculation based on the standard ‘percentage of total’. This was changed to:

SUM([Net Response]) / TOTAL(COUNT([Net Response]))
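
As a rough illustration of what this table calculation produces, the Python sketch below computes the signed share for each response level of a single question. It assumes the total is taken over all responses to that question, neutrals included; the function name and the example responses are made up for illustration.

```python
from collections import Counter


def net_stacked_shares(responses):
    """Return the signed share of respondents for each response level (1-5).

    Negative levels get negative shares, neutral gets zero and positive
    levels get positive shares, each as a fraction of all responses.
    """
    total = len(responses)
    counts = Counter(responses)
    shares = {}
    for level in (1, 2, 3, 4, 5):
        sign = -1 if level < 3 else (0 if level == 3 else 1)
        shares[level] = sign * counts[level] / total
    return shares


# Hypothetical question: 100 respondents, 40 of them neutral.
responses = [1] * 10 + [2] * 20 + [3] * 40 + [4] * 20 + [5] * 10
shares = net_stacked_shares(responses)
print(shares)

# The total bar width is the non-neutral share of respondents.
print(sum(abs(share) for share in shares.values()))
```

Because the neutral level contributes zero, the bar segments to the left and right of the axis together add up to the share of respondents who were not neutral.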

Update

Both Jason P Becker and Statisfactions provide detailed examples of how to create these graphics using R / ggplot2. My thanks to both of them.

7 comments

Jacqui Taylor January 21, 2011 at 12:11 pm

Hi Andrew

This is a great piece, thanks for sharing on the Tableau forum. I wondered whether you would like to share this in an informal presentation to the group as part of the next UK Tableau Group. We haven’t yet fixed a date but we are on the lookout for content. Please email me and let me know your thoughts.

Jacqui

Andrew January 21, 2011 at 3:20 pm

Done so. Hopefully I will see you at the next meeting.

Andy Cotgreave January 21, 2011 at 4:38 pm

That’s a great innovation.

You omitted the neutrals in order to highlight those with neg/pos attitudes. However, on first looking at the charts, I didn’t notice that and assumed the total bar width represented total respondents. So, I think you need to educate users when presenting the chart like you do. Or how about you do show the neutral responses, split 50/50 left/right of the vertical axis? That way, all the bars would be the same length and the neutral response answers would also be displayed. The calc field to do this would be a bit more complicated, of course, and might be one of the reasons you chose to exclude it! I’m not suggesting what you have done is wrong, I am just curious to know if adding the neutrals makes it better or worse.

As I said, though – great stuff.

Andrew January 24, 2011 at 4:10 pm

Andy,

My original thought was to add the neutrals in the centre as you suggest but splitting them 50/50 I think is also misleading – it would imply half are positive and half negative whereas we can’t assume that.

Given the example above assumes everyone selects one of the 5 options (no N/A option), the total length of the bars is 100% minus the ‘neutral’ percentage, so the longer the bar, the fewer the neutrals. This does require education, as you suggest.

My thought is that the answer is probably to use a combination of methods depending on what you’re trying to show. It could be worth highlighting the neutral respondents by use of a figure, or a combination of figure and sorting on this value (to show strength of non-neutrality).

Fundamentally, the representation of the data should be based on what action we want to promote. This analysis makes sense if you assume that a neutral response means no action is necessary, and that the question is worded so that the neutral really is a mid point. This may or may not be the case. It is an analysis which increases sensitivity to non-neutral values and is therefore useful if action is needed in these instances.

Andy Cotgreave January 27, 2011 at 2:45 pm

Good response, Andrew. I agree, it does seem better to keep the neutrals out of the chart. I realised this technique is pretty much the same as one I used to visualise results at a golf tournament:
http://www.thedatastudio.co.uk/blog/the-data-studio-blog/andy-cotgreave/golf-tournaments

In those charts, I show players ranked by their final position. The bars show birdies/bogeys, but hide pars. I figured this was a useful way to see who was really on fire and hitting birdies the most.

Andy

Verity May 17, 2012 at 3:25 pm

I am desperately trying to recreate this for my data. But having just downloaded Tableau and never used it before, it doesn’t seem to work for me. Is there anything you did in creating this which you didn’t think to mention? All I am getting is a simple bar chart with percentages on the bottom, but those percentages don’t go below 0.

Andrew May 18, 2012 at 10:49 am

The two important parts are firstly to prepare the data into a format that Tableau likes. From a survey application you typically get each question as a separate field. In most instances it’s best having one column for the question and another for the answers.

The second is to use the formula that creates a -1 for a negative answer and +1 for a positive. If you just plot this you’ll get a similar type bar graph without the highlighting between strongly agree and agree. Use a quick calculation of ‘percentage of total’ to change into a percentage.

If you’re still struggling then I suggest posting something on the Tableau community including your workbook if possible.

Andrew
