Statistical Hearsay

I admit it. I’ve been suffering from a case of statistician’s block. No inspiring ideas for this blog have presented themselves since July. Well, actually, a couple did surface but I resisted them. Very recently, though, the irresistible “infographic” shown here came to my attention. I am therefore pleased to return to my keyboard to discuss this captivating image with you!

Source: ALA, Libraries Connect Communities, 2012.

The infographic appears in the executive summary of the American Library Association’s (ALA) report, Libraries Connect Communities: Public Library Funding & Technology Access Study 2011-2012, published in June. The graphic’s basic message is an ongoing struggle between two sides. On the left the blue silhouetted figures represent public demand for technology services at libraries, with four percentages quantifying levels of use. The lone silhouette on the right side personifies library funding (is he a municipal budget official?), with a single percentage quantifying that. Apparently, the quantities on the left are, using the tug-of-war metaphor, overpowering the right side.

Let’s look a bit closer at the quantitative evidence in this infographic, beginning with the right side. The 57% figure is the share of surveyed libraries reporting flat or decreased budgets. If, as implied in its title, the purpose of the infographic is to contrast overall demand for library technology services (or even demand for all types of services) with overall resources (budgets), then the 57% figure seems to be an understatement. Shouldn’t it also include the 43% of libraries with budget increases? Here’s my thinking: If 43% of all libraries had funding increases and 36% had increases in technology class enrollment, it seems like the 43% with more funding would compensate for the 36% with more demand.1  So, let’s experiment with another way to view the comparison from the ALA infographic:

See also footnote 3 below.

From a bird’s-eye view of all U.S. public libraries, an imbalance or “deficit” between the sides of the ALA infographic could also be represented by the gaps between the horizontal dotted line and the heights of the brown bars in this chart. Except this line of thinking (pun intended!) has a couple of weaknesses in it. First, comparing figures representing changes in total budget resources with changes in specific services that consume only a fraction of those resources is an apples-and-oranges comparison. So the contrasts made in the infographic are not very meaningful.2  (I return to this idea further on.)

Second, ignoring the apples-and-oranges claim I just made, the infographic’s message is unconvincing because the data are only indirect indicators of library use and resources (budgets). In fact, the data are quite remote indicators of these things, closely akin to hearsay. While the percentages look like actual library measures, they are just relative counts of hearsay reports tallied by the researchers.3

Let me illustrate this idea with a hypothetical story: Here in Cleveland there is a new, somewhat controversial downtown casino (whose revenues have been decreasing steadily since it opened). Suppose that I work for the local newspaper and am assigned the task of investigating gambling losses by senior citizens. So, I (magically) select and interview a random sample of this sub-population and learn that 93% of gambling seniors report having, overall, lost money gambling in the new casino. Not wanting to pry, I don’t ask folks to divulge either losses or winnings. On a deadline, I electronically submit my story to my editor with the headline, “Seniors’ Gambling Losses A Staggering 93%!!” Buried in the story is the fact that the 93% is not a dollar amount. These senior citizens didn’t lose 93% of their money set aside for gambling, nor of their monthly pensions, nor anything like that. The 93% is the proportion of losers among all Cleveland seniors who gambled at the new casino.
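To make the casino point concrete, here is a minimal sketch using invented numbers (the dollar figures and the function names are my own assumptions, not data from any survey). It shows two hypothetical groups of 100 seniors with exactly the same share of losers but wildly different dollar losses.

```python
# Hypothetical net results, in dollars, for 100 surveyed seniors.
# Negative values are net losses. All figures are invented for illustration.

def share_of_losers(net_results):
    """Proportion of respondents who report an overall loss."""
    losers = sum(1 for result in net_results if result < 0)
    return losers / len(net_results)

def total_net(net_results):
    """Sum of net results in dollars (negative means money lost overall)."""
    return sum(net_results)

# Scenario A: 93 seniors each lost $5, 7 seniors each won $50.
small_losses = [-5] * 93 + [50] * 7
# Scenario B: 93 seniors each lost $500, 7 seniors each won $50.
large_losses = [-500] * 93 + [50] * 7

for name, results in [("A", small_losses), ("B", large_losses)]:
    print(name, f"{share_of_losers(results):.0%} losers,",
          f"net ${total_net(results)}")
```

Both scenarios would produce the same "93%" headline, even though the dollars lost differ by a factor of several hundred.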

The ALA infographic gives the same mixed signals that my casino news story does. On one hand it discloses the fact that its data are percentages of libraries responding one way or another in the survey. This disclosure informs the reader that, like the 93% senior gambling figure that reveals nothing about actual gambling losses, the infographic figures reveal nothing about actual amounts of change in funding and library use.

At the same time, the infographic’s bold-faced percentages (no pun intended) embellished with up- and down-arrows tell a different tale.4  And the unqualified label “Increased Public Use” adds to the impression that the left-side percentages indicate actual increases. But, again, none of the data represent actual increases or decreases. They are hearsay about increases and decreases.

It will help, now, to spell out what measures of actual library use and budgets are, and how changes in these are expressed. Simply, library use is documented counts of services delivered to users measured in units like visits, encounters, programs, check-outs, downloads, technology class enrollment, computer sessions, and so on. Library funding is money allocated to libraries and, in the U.S., its units of measurement are dollars.

Changes in either of these types of measures would, initially, be collected as current visit counts, downloads, class enrollment, budget dollars, and so on. Then, these would be converted into percentages calculated using last year’s visit counts, downloads, class enrollment, dollars, and such as baselines. Even after this conversion into percentages, the data still refer to visits, downloads, class enrollment, dollars, and the like. As you can see, none of the percentages in the ALA infographic do. Their units of measure are merely respondent libraries.
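The conversion described above is simple arithmetic. As a rough sketch, with counts that are purely hypothetical (one imaginary library, two imaginary years), a percentage change against last year's baseline might be computed like this:

```python
def percent_change(current, baseline):
    """Percentage change of `current` relative to `baseline` (last year)."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return (current - baseline) / baseline * 100

# Hypothetical counts for a single library; not real data.
last_year = {"visits": 200_000, "wifi_sessions": 40_000, "budget_dollars": 1_500_000}
this_year = {"visits": 210_000, "wifi_sessions": 50_000, "budget_dollars": 1_470_000}

for measure in last_year:
    change = percent_change(this_year[measure], last_year[measure])
    # Each percentage still refers to its own unit: visits, sessions, dollars.
    print(f"{measure}: {change:+.1f}%")
```

Each result is a percentage of visits, sessions, or dollars. None of them is a percentage of libraries.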

There’s another peculiar result from showcasing data that resemble actual changes in library use and resources, but are not so. Because the ALA data are one (very substantial) step removed from the data that are really needed, a large range of alternative percentages can be inserted into the infographic without changing its meaning. Put another way, what percentages appearing in the infographic would tell a different story—a story of near balance between the silhouetted contestants, or of a decisive imbalance that causes one side to pull the other into the muddy pond between them?

Answering this question is an interesting exercise, but the answer is not particularly useful. Neither are the data in the ALA graphic. Effectively assessing the extent of the “challenge” of library use compared with resources requires direct measurement of library use and resources. For instance, for the 36% of libraries reporting growing enrollment in technology classes, what is the actual enrollment increase? Is it 1%, 4%, 10%, or what? 36% of libraries reporting an increase of, let’s say, 10% enrollment and the remaining 64% reporting, say, stable enrollment (0% increase) would make the average increase among all libraries 3.6%. This sort of data is much more informative than the sort appearing in the infographic.
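The 3.6% figure above is just a weighted average. Here is a small sketch of that arithmetic, assuming (as the example does) that every library counts equally and that the increases within each group are uniform; the function name and the second scenario are my own additions for illustration.

```python
def average_increase(groups):
    """Average percent increase across all libraries.

    `groups` is a list of (share_of_libraries, percent_increase) pairs
    whose shares must sum to 1.
    """
    total_share = sum(share for share, _ in groups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares of libraries must sum to 1")
    return sum(share * increase for share, increase in groups)

# The scenario from the text: 36% of libraries up 10%, 64% unchanged.
ten_percent = [(0.36, 10.0), (0.64, 0.0)]
# A hypothetical alternative: the same 36% of libraries, but up only 1%.
one_percent = [(0.36, 1.0), (0.64, 0.0)]

print(f"{average_increase(ten_percent):.2f}%")
print(f"{average_increase(one_percent):.2f}%")
```

The same "36% of libraries" is compatible with either average, which is why the share of libraries alone says so little about the size of the change.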

Beyond actual measurement of increased service use/demand, we also need data revealing the additional cost, expressed in dollars, associated with each of these increases. For example, after determining actual levels of increased WiFi use, and how this increase compares to total WiFi capacity, could the 74% of libraries then estimate the additional (marginal) costs associated with these increases? Depending on library capacity, there might be no additional costs. The same might be true for technology classes and also for public computer session counts if, for instance, libraries shorten computer usage times and therefore end up reporting more sessions.
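The point about shortened computer sessions can be shown with a toy calculation. The station-minutes and time limits below are hypothetical, and the sketch assumes every session runs to its full time limit, which real usage would not.

```python
def max_sessions(total_station_minutes, session_limit_minutes):
    """Sessions that fit into a fixed amount of computer time,
    assuming every session uses its full time limit."""
    return total_station_minutes // session_limit_minutes

# Hypothetical: the same 60,000 station-minutes available in both years.
station_minutes = 60_000

sessions_before = max_sessions(station_minutes, 60)  # 60-minute limit
sessions_after = max_sessions(station_minutes, 30)   # limit cut to 30 minutes

print(sessions_before, sessions_after)
```

Under these assumptions the session count doubles while the computers, the hours, and presumably the costs stay exactly the same.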

Granted, collecting these sorts of data nationally is a mammoth task. The ALA infographic uses the best available information to describe the situation. But these data are not good enough. Their relevance is too low. And presented in the format shown, the information is easy to misinterpret. (The main body of Libraries Connect Communities contains much more substantial data than the figures the authors chose for the infographic, which, incidentally, got top billing in the executive summary.)

Fortunately, individual public libraries are not hampered by the constraints that national survey projects contend with. Libraries have the opportunity to collect the necessary data in their local settings. While doing so, I hope no libraries imitate or “clone” the contents of the ALA infographic for a local budget hearing. That would be disastrous! Can you imagine a library director justifying a request for a 10% budget increase based on 70% of her staff reporting the circulation desk to be busier now compared to 60% saying so last year? I think I even heard tell of such a case…

1  This ignores the fact that we don’t know whether funds from the 43% would be allocated to technology classes or any technology services. Nor, conversely, do we know whether funding decreases would automatically reduce class enrollment or technology services as a whole.
2  My previous post describes a similar problem with a chart in the report by the Pew Charitable Trusts Philadelphia Research Initiative. The chart draws faulty conclusions based on comparisons made among percentages with varying baselines, i.e., apples-and-oranges comparisons. In the case of the ALA graphic where budget data are brought into the comparison, the misfit actually worsens. I’ll save that story for sometime later as it deserves attention in its own right.
3  Due to this problem and the oranges-and-apples problem, my bar chart is a failed attempt to improve an argument that can’t really lead anywhere. Oh, well. The chart was still fun to design!
4  Did you notice that the shade of blue of each arrow and the silhouette above it corresponds with the magnitudes of the percentages? This means the strongest contestant, the dark blue WiFi figurine, is out front doing the heavy pulling! Do you think the technology classes guy (36%) is pale enough to denote his weak contribution, which is slightly more than 1/2 the level of the computers (60%) guy?

2 thoughts on “Statistical Hearsay”

  1. I don’t just like this, I love this! This is one reason that we are working hard to provide data that helps libraries explain their needs. The data, as it is currently reported, does not do enough to support an anecdote. PLFTAS is marginally helpful and provides a baseline, but in the eyes of hard-nosed funders it is not taken seriously. Visit us and read our blog – we love your work!
