Evaluation, assessment, and performance measurement are not what you’d call sciences. But these activities do share certain things with science and the scientific method.1 One is the requirement that theories be tested against compiled objective evidence. Another is the idea of replication: carefully repeating a measurement or experiment in order to verify that the initial findings were not an accident or mistake of some sort. Then there’s the more philosophical concept known as falsifiability. A scientific theory must be framed so that there is some way it can be examined and possibly disproved. A credible scientific theory is one that holds up under repeated attempts to prove it wrong.
In everyday terms, there is a lot of transparency and double-checking in science. I bring these ideas up because, as it happens, a claim made in my prior blog entry needs to be rechecked. The claim is:
On the basis of per capita statistics, smaller U.S. public libraries out-perform the largest U.S. public libraries.
After I made the claim, a couple of things made me wonder more about it. First, Tord Høivik found quite the opposite trend in Norwegian public library statistics he analyzed from the KOSTRA municipal-county reporting system.2
Then, when I mentioned my claim to Keith Curry Lance, he said that these differences can be a side-effect of the data ranges that the libraries are sorted into (for instance, expenditure categories such as [a] under $10,000, [b] $10,000 to $49,999, [c] $50,000 to $99,999, and so on). I had forgotten about a chapter that statistician Howard Wainer wrote on this exact topic. In his book Picturing the Uncertain World he explains how it is possible to manipulate the data ranges to create trends that don’t really exist in the underlying data.3
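Wainer’s trick is easy to reproduce. In the sketch below (stdlib Python, with made-up x and y values and bin boundaries chosen purely for illustration), the very same six data points yield flat binned medians under one boundary scheme and an apparent upward trend under another:

```python
from statistics import median

# Hypothetical toy data: x stands in for community size, y for some
# unrelated per capita measure. The y values have no real trend across x.
x = [1, 2, 3, 4, 5, 6]
y = [0, 10, 1, 9, 2, 8]

def binned_medians(pairs, boundaries):
    """Median of y within each half-open bin [lo, hi) defined by boundaries."""
    out = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        vals = [yv for xv, yv in pairs if lo <= xv < hi]
        out.append(median(vals))
    return out

pairs = list(zip(x, y))
print(binned_medians(pairs, [1, 3, 5, 7]))  # three equal bins -> [5, 5, 5]: flat
print(binned_medians(pairs, [1, 4, 7]))     # two bins -> [1, 8]: an apparent "trend"
```

The data never changed; only the category boundaries did. That is the caution Lance and Wainer are raising.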
Revisiting My Claim
This table begins with the 11-category scheme that IMLS happens to use,6 followed by other schemes I created with fewer categories. This involved changes like expanding the highest category to include populations of 600,000 and above, defining the smallest libraries as those serving populations up to 4,000, and so on. For fun, I included a binary scheme: libraries serving communities of less than 200,000 and those serving 200,000 or more.7
As the table shows, with fewer categories the counts of libraries in each category are higher. And, regardless of the category breakdown, the larger population categories consist of very few libraries (88 for 300K-599.9K and 71 for 600K+) compared to the smaller categories (for instance, 2,777 for 10K-49.9K).
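The category counts themselves come from nothing more than assigning each library to the bin containing its service-area population. A minimal stdlib sketch, using hypothetical populations and boundary values (not the actual IMLS scheme):

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical service-area populations for a handful of libraries.
populations = [900, 3500, 12000, 48000, 250000, 750000]

# Two illustrative schemes, each listed as the lower bound of every category
# (values assumed for the sketch, not the real IMLS boundaries).
scheme_fine   = [0, 1_000, 5_000, 10_000, 50_000, 100_000, 300_000, 600_000]
scheme_binary = [0, 200_000]

def count_by_category(pops, lower_bounds):
    """Count libraries in each category defined by its lower population bound."""
    counts = Counter()
    for p in pops:
        # The last lower bound <= p identifies the library's category.
        counts[lower_bounds[bisect_right(lower_bounds, p) - 1]] += 1
    return [counts[lb] for lb in lower_bounds]

print(count_by_category(populations, scheme_fine))    # [1, 1, 0, 2, 0, 1, 0, 1]
print(count_by_category(populations, scheme_binary))  # [4, 2]
```

Collapsing the fine scheme into the binary one pools the same libraries into fewer, fatter bins, which is why the per category counts grow.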
Next, I plotted median values for several per capita measures, like volumes per capita as seen here:
One effect of using fewer and expanded categories is the disappearance of the highest values seen in the upper (11-category) chart. Note in that chart that the 15.7 value for the smallest (<1K) category is the highest of all.
The volumes per capita measure directly supports my prior claim: the smallest libraries out-perform the largest. In the top (11-category) chart, the median value decreases with each larger size group. In the 8-category chart the claim is mostly true, although the 300K-599.9K category ties the 600K+ category at 1.8. In the bottom right chart (< 200K versus 200K+) the claim is true, but the simple binary split makes it less than convincing.
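The computation behind these charts is straightforward: assign each library to a size category, divide the raw measure by service-area population, and take the median within each category. A sketch with invented numbers (the category boundaries and library figures are assumptions, not IMLS data):

```python
from statistics import median
from collections import defaultdict

# Hypothetical (service population, volumes held) pairs.
libraries = [
    (800, 12_000), (3_000, 20_000), (15_000, 60_000),
    (250_000, 500_000), (700_000, 980_000),
]

def size_category(pop):
    """Assign a library to a coarse size category (boundaries assumed)."""
    if pop < 10_000:
        return "<10K"
    if pop < 200_000:
        return "10K-199.9K"
    return "200K+"

by_cat = defaultdict(list)
for pop, volumes in libraries:
    # A per capita measure divides the raw count by service population.
    by_cat[size_category(pop)].append(volumes / pop)

medians = {cat: round(median(vals), 1) for cat, vals in by_cat.items()}
print(medians)  # {'<10K': 10.8, '10K-199.9K': 4.0, '200K+': 1.7}
```

Because the median is taken within each bin, moving the bin boundaries reshuffles which libraries get averaged together, which is exactly how the charts change from one scheme to the next.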
But, let's take a look at visits per capita:
As with the volumes per capita charts, here we see that the highest values in the 11-category (upper) chart are gone in the 8-category chart. But the interesting thing about visits per capita is how jagged and uneven the patterns are in both charts. In the 11-category chart, the 500K-999.9K group out-performs all others, and the smallest (<1K) category places second. Even so, the 1M+ category earns 4th place, matching or exceeding two other smaller categories. In the 8-category scheme, the three smallest categories beat the 600K+ group, but that group still earns 4th place, with the remaining smaller categories lagging behind it.
Data-based Answers Can Be Complicated
The answer to the question of whether smaller libraries out-perform larger ones is more complicated than we might have hoped. We can’t really make a cut-and-dried statement about the relative performance of smaller versus larger libraries, can we?
The basic patterns in the measures shown in the charts appear in the table below. From this summary it appears that, in general, smaller public libraries do out-perform the largest ones. Still, it depends on how the largest and smaller libraries are defined.
Summary of Trends for 10 Library Input & Output Measures
And whether the ten measures that I chose qualify as a reasonable collection of meaningful library measures. In any case, the claim is definitely untrue for operating expenditures and reference transactions. Also, it pertains only to the set of libraries represented in the 2009 IMLS data. The claim doesn’t apply to other years or to other countries, nor to subsets of U.S. libraries, for instance, all libraries west of the Mississippi River or Public Library Data Service (PLDS) libraries.
So here you see something else about scientifically inspired measurement. The interpretation of the data has to stay within the bounds of that data. Because I extrapolated well beyond the actual data I had (see footnote 4), my original claim was a bit flimsy. It is fortunate (for me only) that examining more data didn’t end up disproving the claim completely!
1 Some of the foundational ideas in evaluation, assessment, and especially performance measurement have also been borrowed from the field of financial auditing. See Beryl Radin’s 2006 book, Challenging the Performance Movement: Accountability, Complexity, and Democratic Values and Michael Power’s 1997 book, The Audit Society: Rituals of Verification.
2 See Tord’s comment in my prior post.
3 Wainer, H., 2009, Picturing the Uncertain World: How to Understand, Communicate, and Control Uncertainty, Princeton, NJ: Princeton University Press. See chapter 14, “The Mendel Effect.” Actually, the trick Wainer describes applies to situations where two measures aren’t related at all. Meaning that higher or lower values in one measure are not reflected in the other measure. Even so, spacing the category boundaries just right can make the opposite appear true!
4 These were libraries serving communities from 15,000 to 20,000 and those serving communities of 100,000 and over.
5 If it weren’t so cumbersome, a better way to categorize library size would be considering both service area population and library expenditures together. This was described in a 1998 IMLS publication (NCES-98-310) by Keri Bassman and her colleagues entitled, How Does Your Public Library Compare?
6 See Table 1 in Henderson, E., Miller, K., Craig, T. et al. (2010). Public libraries survey: Fiscal year 2008 (IMLS-2010-PLS-02), Washington, DC: Institute of Museum and Library Services.
7 I borrowed this population threshold from de Rosa, C. and Johnson, J. (2010). From awareness to funding: A study of library support in America, Dublin, OH: Online Computer Library Center. Their study surveyed only libraries in communities with less than 200,000 population.