I beg the reader’s pardon for another long delay since posting here. Unfortunately, my non-library endeavors have caused me to neglect this blog. I will try to make up for this inattention with what I hope will be a worthwhile post. As for the future, I may not be able to post as frequently as before, though I should find time to offer some statistical musings every few months.
If I may, I’d like to introduce the topic I’ve chosen for this post. The title refers to a statistical formula derived in 1730 by the French mathematician Abraham de Moivre.1 I believe this equation may be the key to solving a longstanding puzzle in library statistics pertaining to large urban public libraries in the U.S. The puzzle is, why do the best performance statistics from the largest urban public libraries barely compare with the best statistics found among the smallest public libraries?
In case you are unaware of this curious fact, the highest per capita statistics (visits, circulation, and so on) reported by large U.S. public libraries are typically multiple points lower than those reported by small public libraries. My colleague Keith Curry Lance and I encountered this while designing the LJ Index of Public Library Services rankings in 2008. We saw that the LJ Index ranking scores for large urban libraries (all scores are based on per capita output measures) also fell far behind scores for small and medium libraries.
Before delving into this conundrum I need to advise you that my aim is to continue to stress the importance of quantitative reasoning and statistical thinking for libraries that are working with data. Granted, the library profession has made definite progress in building its assessment capabilities and becoming increasingly data-driven and evidence-based. Now librarians are even enthusiastically enrolling in workshops on research methods. Just this month we see academic libraries so confident about the fruits of their assessment labors that they have announced the jury is in on the value of libraries.
However, I suspect that the profession’s mastery of quantitative reasoning and statistical concepts has not kept pace with its aspirations. Without sound quantitative and statistical skills libraries can unknowingly draw conclusions from their data that turn out to be embarrassingly wrong.
Having to backtrack from published data claims can be a catastrophe. Consider the unfortunate situation at Claremont McKenna College a few years back, when the college’s president publicized inaccurate estimates of how dishonest her school had been in falsifying data reported for the U.S. News & World Report college rankings. Data scientist Kaiser Fung remarks on the president’s poor grasp of basic statistics in his book NumberSense: How to Use Big Data to Your Advantage. Here is the claim made by the Claremont McKenna president:
The collective score averages often were hyped by about 10 to 20 points in sections of the SAT tests…That is not a large increase, considering that the maximum score for each section is 800 points.2
Fung proceeds to debunk the president’s claim beginning with this observation:
This [the 10 to 20 points] is equivalent to boosting the individual scores of about 300 freshman by 10 or 20 points each, totaling 3,000 to 6,000 phantom points!3
He then shows that the comparison of faked points to the 800 maximum score is both inaccurate and irrelevant. It is inaccurate because it overlooks the second SAT section, for which Claremont McKenna also reported fake data. Taking that section into account, the college’s total came closer to 12,000 phantom points. And the comparison is irrelevant because the valid comparison is between annual average scores with and without the faked points. Incidentally, the investigation revealed that the college’s scores were actually inflated by an average of 30 to 60 points.
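Fung’s arithmetic is easy to verify for yourself. Here is a minimal sketch (the approximate class size of 300 freshmen comes from Fung’s quote; the two SAT sections are the ones the college admitted to inflating):

```python
# Back-of-envelope check of the phantom-points arithmetic.
freshmen = 300   # approximate class size, per Fung's quote
sections = 2     # Claremont McKenna reported faked averages for both SAT sections

for boost in (10, 20):  # points of inflation per section average
    phantom_points = freshmen * boost * sections
    print(f"{boost}-point boost -> {phantom_points:,} phantom points")
```

At the 20-point end of the president’s own range, the total lands right at the 12,000 phantom points Fung describes.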
Even without knowing the true number of inflated points we could still discover the error in the president’s claim. All we would need to know is how a simple average works: an average summarizes a multitude of individual cases, so pushing an average upward means pushing lots of individual cases upward.
However, something additional is needed to make this discovery: being astute enough to ask, “Is this quantitative claim reasonable based on the data and the calculations used?” Of course, answering this question takes time, energy, and persistence. A couple of years back I wrote two in-depth posts (here and here) suggesting that the LibValue Project at the University of Tennessee published dubious cost/benefit estimates. I suspect that few people were persistent enough to trudge through the annotated formulas to assess my arguments.
But trudging through the quantitative details is the only way to determine whether research findings (or critiques) do or don’t hold water. When we forgo opportunities to trudge through research reports and analyses, we risk keeping ourselves willfully ignorant. And being duped into believing half-truths.
So, now I invite those so inclined to trudge with me through the details of the statistical puzzle I introduced earlier—the discrepancy between the highest per capita statistics reported by the smallest U.S. public libraries and those reported by the largest libraries. Not a pressing issue, I realize. But it has been on the minds of some library statisticians over the years. It definitely mystified Ernest De Prospo and his colleagues who conducted the classic 1973 American Library Association study, Performance Measurement for Public Libraries. These researchers summarized the discrepancy this way:
Statistical comparison of libraries of different sizes…suggests that small libraries give a greater return per dollar spent [evidenced in part by higher per capita statistics], and that the economy of scale normally expected in larger institutions is not evident.4
As I said, I believe De Moivre’s equation can help solve this puzzle. You may well have already been introduced to De Moivre’s equation in basic statistics courses since the equation is commonly known as the standard error of the mean. Despite these esoteric-sounding names the equation is pretty simple:
Yet, as statistician Howard Wainer points out in his book, Picturing the Uncertain World, the equation is a powerful one! Ignorance of De Moivre’s equation led the Bill & Melinda Gates Foundation to misspend $1.7 billion based on the mistaken belief that small schools routinely out-perform larger school systems. The Foundation ended up scrapping the policy of establishing smaller, more intimate schools when it learned these didn’t deliver the performance gains that were expected.5
The ideas behind De Moivre’s equation are central to statistical sampling and to understanding how statistical variation reveals itself in data. Please take a moment now to read Wainer’s explanation of the equation on pages 6 – 7 in this excerpt.
In case you find the last part of Wainer’s explanation a bit difficult to follow (as I did), let me offer a clarification: On the right side of the equation you can think of the numerator of the fraction (σ) as equal to 100% of the population variation (that is, the standard deviation). Division by √n lessens this 100% σ quantity by some amount depending on the value of n. For example, when n = 4 the (100%) σ quantity ends up being cut by one-half (50%), with one-half (50%) remaining. When n = 25 the 100% is cut by four-fifths (80%), leaving one-fifth (20%) remaining. And n = 100 cuts σ by nine-tenths (90%), down to one-tenth (10%).6
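These cuts follow directly from dividing by √n. A quick sketch in Python, using the same three sample sizes as the examples above:

```python
import math

def remaining_fraction(n):
    """Fraction of the population standard deviation (sigma) that
    survives in the standard error of the mean for sample size n."""
    return 1 / math.sqrt(n)

for n in (4, 25, 100):
    remaining = remaining_fraction(n)
    print(f"n = {n:>3}: sigma cut by {1 - remaining:.0%}, "
          f"leaving {remaining:.0%}")
```

Running this prints cuts of 50%, 80%, and 90% for n = 4, 25, and 100 respectively.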
Remember that these cuts/decreases caused by √n refer to variations in averages from multiple, equal-sized samples hypothetically drawn from the population we’re studying. This is the quantity seen on the left side of the equation and is the standard error of the mean. Thus, the decreases just described do not refer to any sort of changes to the population standard deviation itself. The standard deviation (σ) is a constant number we would calculate after surveying the entire population. Although such a survey is usually impossible to conduct, for our purposes here we suppose that we can, and that we are therefore able to calculate the population standard deviation.
When we know the population’s standard deviation (σ) and the size (n) of multiple random samples we hypothetically draw from the population, De Moivre’s equation tells us how spread out averages from these samples will be. Specifically, the formula tells us that the higher the sample size, the lower the variation of sample averages will be. Again, this variation is referred to as the standard error of the mean.
This is the same as saying that when, for example in a survey of student heights, we draw a large sample from our population (like n = 1,000), the average height from this sample will likely be closer to (less variant from) the true average for all students than averages calculated from smaller samples (like n = 10 or n = 100). Makes sense. Larger samples tend to produce more accurate statistical estimates.
However, this straightforward idea has an interesting twist. Different sample sizes having more or less variation leads to some distinctive patterns in data. Wainer presents one such pattern in this chart from his book:
Scatter Plot Illustrating De Moivre’s Equation7
Notice the spread of points in smaller U.S. counties (to the left near 10,000) and how these taper off for population sizes of 1,000,000 and higher. Note also that some counties with populations from about 200 up to 100,000 had zero reported cases of kidney cancer, seen in the horizontal line of data points level with zero on the vertical axis.
So what the heck does this have to do with per capita library statistics? Well, it is not generally recognized that per capita measures are actually averages: average numbers of borrowed items, visits made, program sessions attended, dollars expended on staff, and so on, among the library’s service area population. Although individual public libraries aren’t randomly drawn samples, the statistical behavior of per capita measures from library communities of, say, 5,000 residents is analogous to that of measures taken from multiple samples of size n = 5,000 drawn from all communities served by public libraries. Likewise, measures from libraries serving 1,000,000 residents are analogous to measures from samples of size n = 1,000,000 drawn from among all U.S. residents served by public libraries.
To reiterate, the size of the samples (library service area populations) has a lot to do with how the averages (per capita statistics) behave. Now, let’s see how this plays out with actual library statistics.
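The analogy can itself be simulated. In the hedged sketch below, every resident of every community borrows at the same underlying rate (an invented mean of 5 items per year), so any difference in per capita extremes between small and large communities is purely a sample-size effect, with no difference in “performance” at all:

```python
import random
import statistics

random.seed(7)
MEAN_ITEMS = 5  # invented underlying borrowing rate, items per resident per year

def per_capita_range(pop_size, communities=200):
    """Lowest and highest per capita circulation across simulated communities."""
    per_capita = [
        statistics.fmean(random.expovariate(1 / MEAN_ITEMS)
                         for _ in range(pop_size))
        for _ in range(communities)
    ]
    return min(per_capita), max(per_capita)

for pop in (100, 5_000):
    low, high = per_capita_range(pop)
    print(f"population {pop:>5}: per capita circulation "
          f"ranges {low:.2f} to {high:.2f}")
```

The small communities produce both the standout highs and the dismal lows, while the large communities cluster tightly around the true rate of 5, exactly the tapering pattern the scatter plots below display.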
The scatter plots below depict two common library statistics, visits per capita and circulation per capita from the Public Libraries in the United States Survey for 2015 (based on 2013 data). The charts shown are non-interactive. You can access a larger interactive Tableau Public graphic by clicking anywhere on the charts. There you can hover your pointer over individual data points to see detailed data. And select an item in the color-coded chart legend to highlight a specific population category.
Scatter Plots of Visits per Capita and Circulation per Capita. Click image for larger interactive charts.
The 2 charts in each row show identical data with the right chart color-coded to indicate population categories (see legend). The 2 charts in the left column show the general lay of data points by community population size (the horizontal axes). Notice that the vertical dispersion of data points at the left ends of both charts is fairly broad, while data points to the right are spread narrowly.
The dispersion just described for the left column (gray) charts is the same in the color-coded (right) charts, since the data in each row are identical. Here, though, we can see low outlying values (outliers) appearing in segments ranging from 25K to 259.9K (blue, purple, and lighter green bands). These are likely due to unusual situations, such as low statistics reported by Puerto Rican libraries or other libraries with zero circulation. Nevertheless, these exceptions are not numerous enough to counter the basic pattern described.
Also, note that each chart contains 9,000+ data points, most of which are concentrated in the center of the plots. To illustrate how dense the data points are, here I show a version of the visits charts with the vertical axis range shortened to omit the highest and lowest outliers. This gives us a closer, more dispersed view of the data:
Closer View of Visits per Capita Scatter Plot. Click image for larger interactive charts.
For the charts discussed here, however, I use the more compact scatter plots since they show more extreme outlier values, which are relevant to the puzzle we’re investigating.
Now let’s see if the pattern De Moivre’s equation predicts holds true for library expenditures, shown in the next graph:
Scatter Plots of Total Operating Expenditures per Capita and Collection Expenditures per Capita. Click image for larger interactive charts.
In these charts the overall shape of the data is very similar to the visits and circulation charts. Smaller population categories have more high and low outlying values, while the largest categories’ highest and lowest values are quite restricted. Multiple population categories, including larger ones, contain several low outliers; however, these same categories lack high outliers.
To check whether these patterns hold for other library measures take a look at the next graph:
Scatter Plots of Public Internet Computer Uses per Capita and Total Program Attendance per Capita. Click image for larger and interactive charts.
Both measures in these charts show the same basic pattern already described. Finally, let’s look at one more pair of measures shown here:
Scatter Plots of Reference Transactions per Capita and Registered Borrowers per Capita. Click image for larger interactive charts.
Notice that reference transactions per capita is much more dispersed than the other measures we’ve looked at. And that extremely low values are spread across several population categories. Still, we see the same basic tapering of variation towards 100,000 and higher population levels. Registered borrowers per capita follows the pattern described for the earlier charts we’ve seen.
Box-and-whisker plots, developed by the father of data visualization, John Tukey, are an alternative way to visualize variation in sets of data. The box-and-whisker plots (a.k.a. box plots) below use the same data that appear in the scatter plots above:
Box-and-Whisker Plots of Selected per Capita Measures. Click image for larger and interactive charts.
The vertical (left) axis of each chart names the measure plotted. The population category axes values are visible at the top and bottom only. (Do you know the name for this style of statistical graph containing multiple individual charts?)
Remember that box plots show the spread/variation of a set of data. However, because the data points (gray circles) are plotted on top of each other, we can’t really tell how many data points fall where. In particular, we cannot tell that the strips of light gray coloring in a given box (the interquartile range) can represent hundreds of moderate data values, or that the bottom blue lines (whiskers) can represent scores of extremely low and zero data points. The visible detail in the scatter plots above solves this problem, especially for low outliers.
Although box plots hide these details, they do make high outlier values obvious, which is what we’re looking for. In the box plots above the smallest population categories show values extending far above the boxes. In the largest categories very few high outliers extend upward, especially with the 1M+ category. For 6 of the measures plotted (first 3 rows of the graph) the height of the top whisker (blue horizontal line) of the 1M+ box plot is shorter than for any of the other categories.
So, what can we conclude? Per capita statistics from U.S. public libraries serving small populations vary a great deal, producing extreme high and low values. On the other hand, statistics from libraries serving large urban populations have very limited highs and lows. This limitation comes solely from the large population sizes. The inability to match the high values found among small public libraries is not a performance failing on the part of large urban libraries. Rather, it is due to the measurement method itself, that is, to the practice of measuring and then comparing libraries using per capita data.
For more than 100 years public libraries have attempted to make fairer comparisons of library statistics across libraries of different sizes by transforming their statistical counts into per capita rates. This adjustment, it turns out, is too severe, at least for the largest and smallest sized libraries. Due to De Moivre’s equation per capita adjustments give the smallest libraries a definite advantage.
De Prospo and his colleagues did not include very small libraries in their 1973 study. Their sample consisted of 180 systematically selected U.S. public libraries with expenditure levels from $100K to $249K (small), $250K to $749K (medium), and $750K to $3.5M (large). They found small libraries did out-perform large libraries, but not by leaps and bounds. Rather, smaller libraries surpassed large libraries by moderate amounts in 17 of 24 measures the researchers analyzed. Still, the higher performance on the part of small libraries definitely stumped the researchers. Rather than conclude that this discrepancy reflected actual library performance levels, De Prospo and his colleagues questioned the measurement system itself. They wrote:
[Library] statistics as now collected appear to have limited value (a) in making valid comparisons; (b) as a basis for setting standards of development or performance; and (c) to establish historical trend lines.8
Knowing about De Moivre’s equation also leads us to question the measurement system. The equation warns that variation alone gives an unfair statistical advantage to small libraries, public schools, charter schools, counties, cities, states, countries, and any other small entities that end up compared with very large counterparts. What appears to be exceedingly high performance, or exceedingly low performance, turns out to be mostly a by-product of statistical variation.
1 See Wainer, H. The Most Dangerous Equation. American Scientist. May-June 2007.
2 Fung, K. 2013. NumberSense: How to Use Big Data to Your Advantage, New York: McGraw-Hill.
3 Fung, K. 2013. NumberSense: How to Use Big Data to Your Advantage.
4 De Prospo, E., Altman, E. & Beasley, K. 1973. Performance Measurement for Public Libraries, American Library Association, p. 20.
5 Wainer, H. 2009. Picturing the Uncertain World, Princeton, NJ: Princeton University Press, p. 13.
6 Sample size does reduce variation, thus increasing the accuracy of sample statistics. The trick, however, is that the standard error of the mean (the variation of the sample averages) does not decrease in direct proportion to sample size n. Rather, it changes based on √n. Thus, variation decreases much more slowly than it would if the population standard deviation (σ) were divided by n rather than √n.
7 Wainer, H. 2009. Picturing the Uncertain World, p. 10.
8 De Prospo, E., Altman, E. & Beasley, K. 1973. p. 22.