To begin this episode I want to introduce you to a couple of historical ideas on best practices in graphical data presentation, or, to use the more modern term, data visualization. (The peculiar title I’ve chosen comes from this history. Read on to see what it means.) Then I’ll step through a redesign of a bar chart to show you how effective graphical simplicity can be.
In the 1980s and 1990s, statistical graphics experts Edward Tufte, William Cleveland, and Howard Wainer promoted fair and clear designs for statistical charts.¹ Nearly seventy years earlier, the American engineer Willard C. Brinton was doing the same thing in his 1914 book, Graphic Methods for Presenting Facts. Here’s a figure from the book:
Source: Brinton (1914), Graphic Methods for Presenting Facts, p. 21
Note that in his figure Brinton advocated for “accuracy of statement.” He did the same in this next figure:
Source: Brinton (1914), Graphic Methods for Presenting Facts, p. 6
Brinton understood the importance of accurate, undistorted, and comprehensible statistical graphs. He also made an observation that Tufte, Cleveland, and Wainer would each repeat decades later: two-dimensional objects (squares, rectangles, circles) are poor ways to present comparative data. In Brinton’s words, these are bad arrangements to place before school children. The relative sizes of the objects are too difficult to interpret accurately, as the caption in the diagram below explains:
Source: Brinton (1914), Graphic Methods for Presenting Facts, p. 22
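Brinton’s objection can be made concrete with a little arithmetic. The sketch below (the helper names are hypothetical, chosen only for this illustration, and are not from Brinton’s book) compares what a reader sees when two values in a 2:1 ratio are drawn as bars, as squares whose sides are proportional to the values, and as circles whose radii are proportional to the values. It also shows the reverse case: if the areas are drawn proportional to the values, the widths differ only by the square root of the ratio.

```python
import math


def bar_length_ratio(a: float, b: float) -> float:
    """Visual ratio when each value sets the LENGTH of a bar: simply a / b."""
    return a / b


def square_area_ratio(a: float, b: float) -> float:
    """Visual (area) ratio when each value sets the SIDE of a square.

    Area is side squared, so the amount of ink differs by (a / b) ** 2.
    """
    return (a / b) ** 2


def circle_area_ratio(a: float, b: float) -> float:
    """Visual (area) ratio when each value sets the RADIUS of a circle."""
    return (math.pi * a**2) / (math.pi * b**2)


def side_ratio_for_true_areas(a: float, b: float) -> float:
    """If AREA is drawn proportional to the value instead, the sides
    (the widths a reader actually compares) differ only by sqrt(a / b)."""
    return math.sqrt(a / b)


if __name__ == "__main__":
    a, b = 2.0, 1.0  # the first value is twice the second
    print(bar_length_ratio(a, b))                     # 2.0
    print(square_area_ratio(a, b))                    # 4.0
    print(round(circle_area_ratio(a, b), 6))          # 4.0
    print(round(side_ratio_for_true_areas(a, b), 3))  # 1.414
```

Only the bar reproduces the 2:1 ratio directly; the squares and circles either exaggerate it fourfold or shrink the visible difference in width to about 41%.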
For clarity and accuracy, basic line and bar charts are usually the best formats for statistical graphs. But even with these formats, we have to make sure that patterns in the data are easy to see. Consider this example from a 2011 American Association of School Librarians (AASL) report, which plots 20 data points:²
Source: AASL, School Libraries Count! 2011, p. 5.
Although the chart accurately reflects the data, it’s difficult to visually compare the figures for a single year. Let me show you how this problem can be solved with a simpler design. First, I re-rendered the AASL chart using Tableau Software with a few minor changes, as seen below. I changed the label ‘50th percentile’ to ‘median’ (the terms are synonymous), and I replaced the horizontal legend with a vertical one. Otherwise, the design is pretty much the same as the original chart.
Step 1. Re-render chart using graphing software.
In the next chart I removed the color-coding for years, listing these on the vertical axis:
Step 2. Remove color; list years on vertical axis.
Now color can instead be used for the categories—median, 75th percentile, 95th percentile, and mean—as seen in the legend of this chart:
Step 3. Use color to denote measure types.
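Steps 1 through 3 change only the encoding, not the data. The sketch below describes each bar as (axis label, color, value) for the same 20 records under two layouts. It assumes, as the repeated year labels mentioned in Step 5 suggest, that after Step 2 each measure group lists its own years on the axis. The function names, the (measure, year, value) record shape, the year labels "Y1" to "Y5", and the zero values are all hypothetical placeholders, not the AASL figures and not how Tableau represents a chart internally.

```python
# Each record is (measure, year, value), i.e. one bar in the chart.
Record = tuple[str, str, float]
FIELDS = {"measure": 0, "year": 1}


def bar_layout(records: list[Record], axis_fields: tuple[str, ...],
               color_field: str) -> list[tuple[tuple[str, ...], str, float]]:
    """Describe each bar as (axis label, color key, value), in record order.

    axis_fields: which fields label the bar on the category axis.
    color_field: which field the bar's color encodes.
    """
    return [
        (tuple(r[FIELDS[f]] for f in axis_fields), r[FIELDS[color_field]], r[2])
        for r in records
    ]


def color_count(bars: list[tuple[tuple[str, ...], str, float]]) -> int:
    """How many distinct colors a reader must decode via the legend."""
    return len({color for _, color, _ in bars})


if __name__ == "__main__":
    measures = ["median", "75th percentile", "95th percentile", "mean"]
    years = ["Y1", "Y2", "Y3", "Y4", "Y5"]  # placeholder year labels
    records = [(m, y, 0.0) for m in measures for y in years]  # placeholder values

    # Step 1 (original layout): measures on the axis, years encoded by color.
    before = bar_layout(records, ("measure",), "year")
    # Steps 2-3: years listed on the axis within each measure; color = measure.
    after = bar_layout(records, ("measure", "year"), "measure")

    print(color_count(before), color_count(after))  # 5 4
    print(len(before) == len(after) == 20)          # True
```

The record list is identical in both layouts; what changes is which field the eye reads from the axis and which it must look up in the legend.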
Then I simply rotated the chart 90 degrees counter-clockwise (the reason for this will be obvious in a moment):
Step 4. Rotate 90 degrees counter-clockwise.
The repetition of years on the horizontal axis can be removed by converting the graph into a line chart:
Step 5. Convert to line chart; add gridlines.
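Converting to a line chart can be thought of as grouping the same records into one polyline per measure on a shared year axis. The minimal sketch below, with hypothetical function names and placeholder years and values (not the AASL data), checks the two properties that matter here: each year label appears once on the axis, and the number of plotted points is unchanged.

```python
from collections import defaultdict

# Each record is (measure, year, value); values here are placeholders.
Record = tuple[str, str, float]


def to_line_series(records: list[Record]) -> dict[str, list[tuple[str, float]]]:
    """Group records into one polyline per measure, with x = year.

    The year axis is shared by all lines, so each year label appears once
    instead of once per measure group.
    """
    series: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for measure, year, value in records:
        series[measure].append((year, value))
    # Assumes year labels sort chronologically as strings (e.g. "2009", "2010").
    return {m: sorted(points) for m, points in series.items()}


def axis_labels(series: dict[str, list[tuple[str, float]]]) -> list[str]:
    """Distinct year labels on the shared horizontal axis."""
    return sorted({year for points in series.values() for year, _ in points})


def point_count(series: dict[str, list[tuple[str, float]]]) -> int:
    """Total number of plotted points across all lines."""
    return sum(len(points) for points in series.values())


if __name__ == "__main__":
    measures = ["median", "75th percentile", "95th percentile", "mean"]
    years = ["Y1", "Y2", "Y3", "Y4", "Y5"]  # placeholder year labels
    records = [(m, y, 0.0) for m in measures for y in years]  # placeholder values

    series = to_line_series(records)
    print(len(series), axis_labels(series))  # 4 ['Y1', 'Y2', 'Y3', 'Y4', 'Y5']
    print(point_count(series))               # 20
```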
Notice that all 20 data points have been preserved and there has been no loss of information. I added horizontal gridlines so that the data labels could be omitted, as they are in this next chart:
Step 6. Remove data labels; decrease gridline intervals.
The chart is cleaner without the data labels, but the exact values are no longer printed, so here I cheated a little by taking advantage of an interactive feature of Tableau Software: it lets you select individual data points to display their values as needed. Also, the gridlines, shown here at smaller intervals, make evaluating the data fairly straightforward.
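Why smaller gridline intervals help can be shown with a small calculation: without a data label, a reader estimates a value from the two gridlines that bracket it, so halving the interval halves the range of uncertainty. The sketch below uses hypothetical helper names and a placeholder axis range and value; it is not how Tableau chooses its axis ticks.

```python
import math


def gridline_positions(max_value: float, interval: float) -> list[float]:
    """Gridlines from 0 up to the first multiple of `interval` >= max_value."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    steps = math.ceil(max_value / interval)
    return [i * interval for i in range(steps + 1)]


def bracketing_gridlines(value: float, interval: float) -> tuple[float, float]:
    """The two gridlines a reader uses to estimate `value` without a label."""
    lower = math.floor(value / interval) * interval
    return lower, lower + interval


if __name__ == "__main__":
    # Placeholder axis range and value; not taken from the AASL chart.
    print(gridline_positions(100, 20))       # [0, 20, 40, 60, 80, 100]
    print(len(gridline_positions(100, 10)))  # 11
    print(bracketing_gridlines(47, 20))      # (40, 60)
    print(bracketing_gridlines(47, 10))      # (40, 50)
```

The trade-off is more lines on the chart, so the interval is a judgment call between reading precision and visual clutter.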
So let’s take a look at the before-and-after views:
Because the original AASL chart is so colorful, it seems more substantial than it actually is. (In glossy reports a multitude of multi-colored charts can be very impressive!) The line chart appears rather sparse by comparison. Nevertheless, from an informational standpoint the line chart is superior. It contains all the necessary information, is less cluttered and easier to read, and makes comparisons a cinch. And it gives readers a better understanding of the overall “lay” of the data.
Plus, the simplicity of the line chart makes it perfect for combining with other charts for comparison, as seen here:
As you can see, the simpler the graphical arrangement, the more informative it can be. You could even call this approach child’s play.
1 A more recent, and quite definitive, book on data visualization best practices is the second edition of Stephen Few’s Show Me the Numbers.
2 The AASL’s 2012 report uses bar charts in the same style, with a new blue color scheme.