In recent years, the Cuyahoga County Public Library in suburban Cleveland, Ohio, embarked on an ambitious building program that ended up alienating some community members. At a public forum last year, one citizen asked how the building campaign could be justified when the library’s own “statistically valid” survey indicated that 90% of patrons were satisfied with the library facilities as they were.1 The library director began her response by saying that “some of it is the way the [survey] questions were asked.” She then explained that the library board’s decision took into account a range of other relevant information beyond patron satisfaction percentages, including systematically gathered community input on building designs.
I cannot say whether the library’s decisions were in the best interests of the local community or not. However, I can comment on the data aspect of this story. So, let me restate that part of the director’s argument more forcefully:
A statistically valid survey finding applied to the wrong research question is logically invalid. The community’s level of satisfaction with current library facilities is not a reliable indicator of its feelings about the sufficiency of those facilities over the longer term. Nor does it reveal whether the community believes it is better to incur large costs maintaining older facilities or to invest in new ones that let the library adapt to changing community needs. In other words, that’s the wrong question.
Contrary to popular belief, data on their own don’t necessarily mean anything. Their meaning comes from how the data are interpreted and what questions they are used to address. Interpreting data with respect to pressing questions is the crux of data analysis. This is why Johns Hopkins biostatistician Jeff Leek begins his new book, Elements of Data Analytic Style, with a chapter on how the type of research question predetermines what analysis is needed.
Data analysis is about connecting the dots. And good data analysis tells us which dots can justifiably be connected and which cannot. Granted, this essential task isn’t nearly as glamorous as new fads like big data and open data. Analysis is barely on the radar of the enthusiastic promoters of 21st century data-diving. Probably they’re so engrossed in stockpiling massive amounts of data that they haven’t had time to consider what’s needed to get the dot-connecting part right.
Leek, who works with troves of human genome data, observed in his blog that big data promoters are mostly technologists rather than statisticians. He lists five major big data conferences where statisticians were absent or sorely underrepresented, including the White House Big Data Partnership. Leek foresees trouble in the big data movement because of its lack of appreciation for statistical thinking and quantitative reasoning.
So does data scientist Kaiser Fung, author of Numbersense: How to Use Big Data to Your Advantage. Fung writes, “Big data means more analyses, and also more bad analyses…Consumers must be extra discerning in this data-rich world.” 2 He predicts that big data won’t make society better informed. Rather, having good and poor analyses mixed together will lead to more confusion and misinformation.
Statistician Howard Wainer has similar fears about open data. In his book Medical Illuminations: Using Evidence, Visualization, and Statistical Thinking to Improve Healthcare he discusses legislation in the state of New York requiring the state public health department to publish maps showing cancer rates at the U.S. census block level. Wainer explains why this is a big mistake. Due to marked population variability among locales, cancer rates can be substantially overstated, leaving New York citizens free to misconnect dots based essentially on phantom patterns caused by random statistical error.3 Wainer wrote:
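Wainer’s point is easy to reproduce with a quick simulation. The sketch below uses made-up block sizes and a made-up disease rate (not New York data): every census block gets the same true rate, yet by chance alone the smallest blocks produce the most extreme observed rates, which is exactly the phantom pattern a map reader would misconnect dots around.

```python
import random

random.seed(42)
TRUE_RATE = 0.005  # the same hypothetical disease rate in every block

# Made-up census blocks: 200 small ones and 200 large ones
populations = [50] * 200 + [5000] * 200

def observed_rate(pop):
    # Each resident independently develops the disease at the true rate
    cases = sum(1 for _ in range(pop) if random.random() < TRUE_RATE)
    return cases / pop

small = [observed_rate(p) for p in populations if p == 50]
large = [observed_rate(p) for p in populations if p == 5000]

# De Moivre's equation: the spread of observed rates shrinks as 1/sqrt(n),
# so small blocks yield far more extreme rates by chance alone.
print("most extreme small-block rate:", max(small))
print("most extreme large-block rate:", max(large))
```

A map shaded by these rates would show alarming hot spots, all of them in the smallest blocks and none of them real.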
Massive data gathering and modern display and analysis technology are important tools that, when used wisely and with knowledge, can be mighty allies in the search for solutions to difficult problems. But such powerful tools should be kept out of the hands of children and adults who don’t understand their stringent limitations.4
I doubt that Wainer is naive enough to expect his warnings will be heeded. To borrow from an old saying, misguided data analyses will always be with us, as will answers to the wrong questions. And many claims based on these missteps will be repeated again and again until they become accepted as facts.
This is definitely the case on the library front. A good example is how the 2013 Digital Inclusion Survey (DIS) report continues to misconnect dots in the same way they were misconnected in multiple annual Public Library Funding and Technology Access Study (PLFTAS) reports. (The PLFTAS surveys were funded by the Institute of Museum & Library Services (IMLS) and conducted by the American Library Association (ALA) and the Information Policy & Access Center (IPAC). The DIS survey was funded by IMLS and conducted by ALA and IPAC in collaboration with the International City/County Management Association.)
The misconnected dots I’m referring to, appearing here and here and illustrated in the infographic below, are claims that the number of public access computers and network bandwidth at U.S. public libraries are insufficient to meet user demand.
[PLFTAS infographic]
The claims from both surveys are mistaken due to the limitations of the survey data presented as well as how those data were interpreted. The way the data were collected made them inaccurate. Even if the data were accurate, they do not measure actual demand for library services. Nor are they satisfactory indicators of how sufficient technology resources are. The claims are also mistaken because they amount to cherry-picking: in this case, pretending statistics from a subset of libraries describe U.S. public libraries in the aggregate, a trick that has become accepted practice in library advocacy. In this article I chronicle another example of this tactic and offer more honorable ways to present library advocacy data.
In this post, though, I’ll focus on the first three problems just mentioned, getting to the heart of the matter by discussing the measurement of sufficiency first. After that I’ll return to the other two.
To begin we need a way to determine whether the supply of public library technology services is sufficient to meet user demand. That is, we have to ask the right question. Following the approach used in the PLFTAS and DIS studies will lead us astray. That approach rests on the fallacy that users having to wait for busy resources means those resources are insufficient. But fully utilized technology capacity can mean the exact opposite: that capacity matches demand fairly closely. Measuring how often users have to wait, as the two surveys did, doesn’t tell us how well supply satisfies demand. The question we need to ask instead is: How much user demand goes unmet?
While their prime mission is meeting community needs, libraries are also obliged to be good stewards of public resources. In this 2010 paper I discussed this obligation using the example of municipal emergency services. Municipalities must have sufficient police and fire personnel and equipment available to respond to local emergencies. And it is understood that this capacity may be idle during times of low demand. However, for non-emergency public services it is fiscally irresponsible to allow equipment to sit routinely idle. In other words, it’s important that the equipment be well utilized, with full use being the optimum situation.
In his earlier book Numbers Rule Your World Fung described how service capacity is handled at Disney World where wait lines are an accepted part of the customer experience. Even if Disney World were to greatly expand the capacities of its attractions, waiting lines would still form because of user traffic patterns. Disney statisticians specializing in queueing theory understood that by random chance alone a surplus of customers could flock to a single ride at the same time and overwhelm its expanded capacity anyway.5
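The Disney lesson can be reproduced with a toy queueing simulation. The sketch below uses hypothetical arrival and service rates (not Disney’s figures): the service is staffed with twice the average capacity needed, yet random clumping of arrivals still forces some customers to wait, which is why waiting by itself doesn’t prove insufficiency.

```python
import random

random.seed(1)

def fraction_who_wait(servers, arrival_rate, service_time, n_customers=10_000):
    """Share of customers who must wait, given random (Poisson) arrivals
    and a fixed service time per customer."""
    free_at = [0.0] * servers   # time when each server next becomes free
    t, waited = 0.0, 0
    for _ in range(n_customers):
        t += random.expovariate(arrival_rate)        # next random arrival
        soonest = min(range(servers), key=free_at.__getitem__)
        start = max(t, free_at[soonest])             # wait if all servers busy
        waited += start > t
        free_at[soonest] = start + service_time
    return waited / n_customers

# Four servers, two arrivals per time unit, one time unit of service each:
# average utilization is only 50%, yet some customers still wait.
print(fraction_who_wait(servers=4, arrival_rate=2.0, service_time=1.0))
```

This is the core insight from queueing theory: at any sensible level of utilization, occasional waits are a sign that capacity roughly matches demand, not that it falls short.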
Acquisitions librarians understand that not all levels of demand can be met all of the time. They aim to order enough copies of popular materials without overstocking the shelves. By the reasoning of the PLFTAS and DIS recommendations, however, acquisitions librarians should immediately order additional copies of any title with holds on it. To technology zealots, demand should never outpace supply, no matter what the costs to libraries and the public might be.
Obviously, this simplistic view is contrary to established library practice and good government. It is a disservice to libraries to urge them to share advocacy materials like the PLFTAS infographic (above) with funders and stakeholders. Business members of boards and budget committees will see through the flawed argument. The PLFTAS, DIS, the Edge initiative, and Impact Survey should be measuring and reporting actual unmet demand. So far none of these large national projects has asked the right question, leaving the library community in the dark about the sufficiency of technology services.6
All right. That’s enough sermonizing. Now we can consider the methodology side of the PLFTAS and DIS surveys, as there are some interesting lessons to be found there. The first I mentioned already: even if the research questions about technology capacity had been the right questions, the accuracy of the measures collected is quite limited because they captured staff impressions (best guesses) rather than actual technology use. Actual measures would be minutes or hours per typical day that library computers were completely busy with users waiting, and minutes or hours per typical day that the library network was congested.
The survey’s multiple-choice questions added to the inaccuracy. Below are the PLFTAS questions about computer availability and network speed:
During a typical day does this library branch have people waiting to use its public Internet Workstations?
1. Yes, there are consistently fewer public Internet workstations than patrons who wish to use them throughout a typical day.
2. Yes, there are fewer public Internet workstations than patrons who wish to use them at different times throughout a typical day.
3. No, there are sufficient public Internet workstations available for patrons who wish to use them during a typical day.
Given the observed uses of this library branch’s public Internet access services by patrons, does the library branch’s public Internet service connection speed meet patron needs?
1. The connection speed is insufficient to meet patron needs most of the time.
2. The connection speed is insufficient to meet patron needs some of the time.
3. The connection speed is sufficient to meet patron needs almost all of the time.
4. Don’t know.7
Ignoring the ambiguity of the network speed question (who’s doing the observing?), respondents would not interpret phrases like consistently, at different times, some of the time, and most of the time uniformly. This leaves us with two sources of inaccuracy: fuzziness from subjective impressions about what really happened in libraries, and added noise from varied interpretations of the questionnaire choices. Results of this noise are apparent in the PLFTAS data shown here:
[Line chart: PLFTAS workstation availability, 2008–2012]
The lines show library outlets reporting insufficient workstation availability (red and violet) compared to sufficient availability (green). Notice that from 2008 to 2012 unavailability trended downward and availability upward, except in 2011, when at-times unavailability (violet) increased by 7% and sufficient availability (green) decreased by 10%. The fact that consistent unavailability (red) continued steadily downward suggests that respondents may have had trouble differentiating at different times from sufficient. If respondents had interpreted these consistently, workstation availability probably would not have dipped in 2011.
How respondents interpreted questionnaire items is definitely a factor in the DIS data. For that survey the computer sufficiency question was revised to this yes-and-no version:
During a typical day, do patrons experience wait times to use this library branch’s public access computers or laptops?
1. Yes 2. No 3. Don’t Know8
Survey anomalies make dash-lined trends fishy. See text to interpret this chart correctly.
Here the solid red line represents the combined red and violet lines from the prior line chart, and the solid green line matches that chart exactly. Data from the DIS question are plotted in the gray area with red and green dashed lines.
Notice that these lines show workstation availability (green) and unavailability (red) changing in 2013 by about 30 percentage points, up and down. Essentially, the 2012 PLFTAS unavailability figure (65.5%) flip-flopped with the 2013 availability figure (58.4%, with 6% Don’t know responses). Having 60+% of library outlets reporting insufficient workstations one year and then nearly the same percentage saying the opposite the next year seems fishy to me. I suspect that the line segments up to and after 2012 do not depict a legitimate trend. The dashed lines and gray column in the chart are meant to signify this.
Very likely the decided shift in 2013 is mostly due to side-effects of the survey instruments. Despite the general increase in availability from 2008 to 2012 described already, data from those years probably overstate the incidence of perceived workstation insufficiency.9 However, the DIS figures are puzzling. The revised question allowed libraries to rate public computers as insufficient if they witnessed capacity in full use at any time during the day, a more lenient criterion than at different times in the PLFTAS question. So you’d think more libraries would have noticed at least one occasion of full capacity use in a typical day. But I guess not.
There is also a third source of inaccuracy in the PLFTAS data: biased survey questions. Listing the consistently fewer and at different times options first, using imprecise adjectives, and omitting a Don’t know option would all steer unsure respondents toward a vote for insufficiency. The point about the lenient revised question is that even that bias was ineffective. So you have to conclude that either the questions were unbiased or the level of perceived workstation sufficiency is even higher than the surveys show.
The shift visible in the chart might also be due, in part, to another survey side-effect, this one related to sample sizes. The 2012 PLFTAS sample had more than twice as many respondents (7,260) as the DIS sample (3,392).10 Attrition of respondents could have contributed to the pronounced changes in 2013.
So, what can be done with inaccurate data answering the wrong question that, when the survey specifics change, paint a completely different picture about that wrong question? Well, the first thing the researchers could have done is advise against over-interpreting the data, that is, against jumping to unequivocal conclusions. The quality of the data doesn’t really justify that kind of certainty. The researchers should have included caveats in the projects’ executive summaries and DayGlo infographics acknowledging that the data are gross estimates subject to lots of error.
I know this irritates public relations and marketing types in our profession whose daily work it is to cajole audiences into believing the implausible. But even in the marketing profession there’s some recognition that survey research methods should not be completely bastardized. You may not be aware that the marketing research wing of that profession established standards for survey research. Here’s one the Digital Inclusion alliance might want to ponder:
Report research results accurately and honestly…[and] present the results understandably and fairly, including any results that may seem contradictory or unfavorable.11
1 The citizen asking the question used the phrase “statistically valid.”
2 Fung, K. 2013. Numbersense: How to Use Big Data to Your Advantage. New York: McGraw-Hill, p. 8. Italics in original.
3 Wainer, H. 2014. Medical Illuminations: Using Evidence, Visualization & Statistical Thinking to Improve Healthcare. New York: Oxford University Press, pp. 16-20. Note: The statistical explanation Wainer gives is de Moivre’s equation which defines standard error in statistics. The equation explains why data variation in sparsely populated locales is always greater than in densely populated locales.
4 Wainer, H. 2014, p. 19.
5 Fung, K. 2010. Numbers Rule Your World: The Hidden Influence of Probability and Statistics on Everything You Do. New York: McGraw-Hill, pp. 7-8.
6 The Edge project has a standard requiring that libraries have “a sufficient number of device hours available on a per capita basis.” And that libraries meet or exceed “the minimum bandwidth capacity necessary to support public user demand.” But the project offers no concrete help in collecting data needed to measure compliance with these standards.
7 Bertot et al. 2012. 2011-2012 Public Library Funding and Technology Access Survey, College Park, MD: Information Policy and Access Center, Appendix A.
8 Bertot et al. 2014. 2013 Digital Inclusion Survey, College Park, MD: Information Policy and Access Center, Appendix C.
9 Regardless of how accurate the data are, the plotted trends would eventually have intersected anyway. But not in 2013. The rate of acceleration of the sufficiency (green) line is exaggerated due to the drop in 2011.
10 Bertot et al. 2012, Fig. M-1, p. 14; Bertot et al. 2014, Fig. 1, p. 15.
11 Marketing Research Association. 2007. The Code of Marketing Research Standards, Washington, DC: Marketing Research Association, Inc., para. 4A.