I’m currently reading statistician David J. Hand’s book, The Improbability Principle: Why Coincidences, Miracles, and Rare Events Happen Every Day, and his insights about probability have been percolating through my head. The thing about probability is that the specific conditions which, in hindsight, led to a given event are more interesting when you also consider the uncertainty at work before the event occurred. Just about anything could have happened. And the weirdest things do.
So now I’m having fun interpreting aspects of daily life as random events. Like the frequency of my posts here, including this one. The factor precipitating my writing this blog entry soon after my last post happens to be good fortune! The good fortune that the International Federation of Library Associations (IFLA) 2016 Congress was held last week here in my home state, Ohio. How convenient attending the Congress turned out to be! Which meant that it was in the stars for me to attend the session organized by the IFLA Statistics and Evaluation Section. Finding cheap on-street parking outside the Columbus Convention Center got me to the session on time. That and the fact that no smartphone tweets or texts distracted me resulted in my hearing an interesting statement made by a panelist from the Bill & Melinda Gates Foundation.
The panelist said that the collection and dissemination of aggregate library outcome data are the future of library advocacy.1 I thought this announcement indicated a subtle but significant policy shift that I had not heard before. Until now I had assumed that, for ongoing library advocacy efforts in the U.S. (like the American Library Association’s State of America’s Libraries reports, the Association of College and Research Libraries’ Value of Academic Libraries initiative, the Edge Initiative, the University of Washington’s Impact Survey, and the Public Library Association’s Project Outcome), aggregate data were used mainly for the purpose of promoting individual libraries. Granted, my thinking has been biased by my library school training, specifically, the tenet that the most relevant library performance and quality expectations are local. And thus, the most relevant measurement of these will be local.
Due to this tenet I have always thought that, to local library stakeholders, national advocacy data were basically probability propositions. I thought stakeholders would naturally translate, for instance, a report that 92% of U.S. public libraries offer Wi-Fi into “My library most likely has Wi-Fi.” And that reports of, say, 23,000 U.S. citizens receiving library job-search assistance during the Great Recession would translate into “People probably got employment help at my local library too.”
Even with the data aggregation my colleague Keith Curry Lance and I do when preparing the annual Library Journal Index of Public Library Service (using 9300+ data records from the IMLS Public Library Survey), we are guided by the Library Journal editors’ wishes to keep individual libraries as the main focus. So, other than national or regional data used to explore multi-year trends, I hadn’t thought much about aggregate data as ends in themselves.
By happenstance, another panelist at the same IFLA Statistics and Evaluation Section session showed a statistical chart of public library cost/benefit data broken down by U.S. region (New England, Mideast, Great Lakes, and so on). This aggregation, however, was not intended for any particular regional stakeholders. Rather, the regional breakdown was the most detailed data the researchers were willing to report. They were uncomfortable reporting library-level data because libraries in the study didn’t want their cost/benefit numbers to be known publicly. For sure, withholding this information was a favor to the libraries. But it was not a favor to the libraries’ constituents, who deserve access to information about the performance of the libraries their taxes pay for.
The Gates Foundation representative didn’t mention any obligation on the part of libraries to report performance data for accountability purposes. He seemed most intent on portraying library data in terms of benefits that would accrue from numerous libraries merging their outcome success stories together. The new format for library advocacy data would be a chorus, it sounded like! I’m not sure whether the Gates Foundation feels that many voices strengthen the message of libraries working diligently to deliver valuable benefits to their constituents, or believes that the generalized nature of the data will seem more scientific and therefore more convincing. (Might there be some immutable Law of Library Effectiveness that only aggregate data can reveal?)
There are two problems with the Gates Foundation plan: variability and credibility. Let’s begin with variability, the easier one to solve. Reporting library success based on aggregate data is like assigning a high school class’s average exam score to every student in the class.2 Under-performing students will be ecstatic, but high performers will not. Outcomes in education, health, behavioral health, rehabilitation, criminal justice, library services, and other public programs vary, both between organizations and program sectors and within them. Sure, aggregate data are valuable for monitoring general trends. But variation around aggregate measures is more informative than the measures alone. (I’ve written about this in prior posts, including this one, beginning just above the 2-column table entitled Distribution of Reported Changes in 2010 Library Funding.)
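To make the exam-score analogy concrete, here is a minimal sketch with invented branch-level numbers (the library systems and placement counts are hypothetical, not from any real dataset). Two systems report the identical aggregate while performing very differently branch to branch:

```python
from statistics import mean, stdev

# Invented annual job-placement counts per branch for two hypothetical systems
system_a = [48, 50, 52, 49, 51]   # consistent performance across branches
system_b = [10, 90, 55, 5, 90]    # same average, wildly uneven branches

for name, data in (("System A", system_a), ("System B", system_b)):
    print(f"{name}: mean = {mean(data):.0f}, st. dev. = {stdev(data):.1f}")
```

Both systems average 50 placements per branch, so an aggregate-only report treats them as identical; the standard deviation is what reveals that one system has branches succeeding and others barely functioning.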
Looking only at aggregate outcome data masks this variation because it ignores key sub-groups within target populations, sub-groups that might be defined geographically or along other relevant dimensions. Some sub-groups may well contain recipients whose outcomes represent significant changes, while recipients in other sub-groups show little or no change. Recipients in some sub-groups may even have experienced change in a negative direction. Also, the timing of short-term, intermediate, and long-term outcomes can differ widely within and among sub-groups.
There will also be variation in outcomes across the different library programs and services delivered, such as those categorized according to PLA’s Project Outcome core service areas. Some service areas may have obvious, measurable short-term and intermediate outcomes, while for other programs initial or intermediate outcomes may be less important. The same sort of variation can occur in the timing of outcome stages (short-term, intermediate, and long-term).
Fortunately, the Gates Foundation can address most of these variation issues fairly easily. Along with aggregate reports of library outcome counts, proportions, medians, averages, and so on, the Foundation can report data describing variation among libraries and relevant sub-groups of recipients of library services.
The credibility problem, on the other hand, is a stickier one. This problem is intrinsic to outcome evaluation, the Holy Grail that has fascinated library opinion leaders in recent years. The Gates Foundation panelist briefly acknowledged the problem, which is that it is practically impossible to prove that observed outcomes have been caused by library programs and services. (Valid measures are too expensive and impractical.)3 And he advised libraries to set this deficiency aside in favor of doing some type of measurement, whatever its limitations.
The presenter also said libraries have every right to claim that they contribute to outcomes (observed and unobserved, I suppose). Even if we do take this leap of faith, concluding that libraries are automatically effective and thus contribute to outcomes, there are still other credibility issues remaining, namely, the amount of credit that libraries can claim for successful outcomes, and dealing with outcome failures.
Starting with successful outcomes, let’s suppose that, after utilizing their local library’s job-search services, 20 job-seekers were employed in the past year. What portion of these outcomes (benefits) should the library take credit for? Say that new jobs mean these library users together now pay $40,000 annually in local, state, and federal taxes. Should the library claim $10,000 of this? $5,000? $1,000? $500? The question is the same regardless of how benefits are defined—taxes paid, wages earned, decreased unemployment expenditures, increased family quality of life, contributions to local economic growth, and so forth. (Outcome studies that report just counts of users who found employment assume that the library deserves 100% credit for these successes, rather than some lesser proportion.)
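As a back-of-the-envelope sketch of how arbitrary the credit question is: using the post’s hypothetical $40,000 figure, the library’s claimed benefit is simply the total multiplied by a credit share, and nothing in the data pins that share down. (The shares below are invented for illustration only.)

```python
# The post's hypothetical: 20 newly employed library users together pay
# $40,000 per year in taxes. What share of that can the library claim?
total_benefit = 40_000

def library_claim(credit_share: float) -> float:
    """Benefit attributable to the library under an assumed credit share."""
    return total_benefit * credit_share

# Each candidate share yields a different, equally defensible claim
for share in (1.00, 0.25, 0.125, 0.025, 0.0125):
    print(f"credit share {share:>7.2%} -> ${library_claim(share):>9,.2f}")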
Then combine these with similarly derived library benefits for the complete range of library programs and services. How will these summarized benefits be shared among the chorus of libraries who have joined their voices to sing along? What bases are there for these allocations, in the aggregate and then for individual libraries? The answers will necessarily be arbitrary. Unfortunately, the lack of an objective way of allocating credit (empirical evidence would be nice) lessens the credibility of library benefit estimates. Stakeholders might suspect that libraries set these allocation levels too high. For example, stakeholders might wonder whether 100% of the improvement in reading scores observed among children who participated in the library’s summer reading program was due solely to the program, or whether a new county-wide summer reading publicity campaign should get part of the credit for the scoring gains.
The other issue, outcome failures, refers to the portion of recipients of library services for whom outcomes were absent or unsatisfactory. Following the same example, these would be job-seekers who used library services in the past year but did not find employment. Let’s say there were 200 of these. How does the library account for these apparent failures? Does it sum up continued unemployment benefits for the 200 and subtract this sum from the $40,000 in benefits calculated for the 20 successful job-seekers?
For these cases, the library might argue that factors beyond its control kept the 200 unemployed. But the Gates Foundation’s premise was that libraries necessarily contribute to observed outcomes. The Foundation wouldn’t mean to argue that the linkage between library services and outcomes applies only to successful outcomes, would it? I wonder whether library funders and stakeholders would agree with this proposition. If not, rather than aggregating positive benefits only, maybe they’d ask libraries to look at rates of success. In the case of job-search services, success rates could be evaluated in comparison with other non-library job-search assistance programs available in the community.
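The arithmetic of counting failures alongside successes can be sketched with the post’s hypothetical figures. The per-person unemployment cost below is invented purely for illustration; the 20 successes, 200 failures, and $40,000 benefit are the numbers from the example above:

```python
# The post's hypothetical job-search example, with failures included
successes, failures = 20, 200
gross_benefit = 40_000            # annual taxes paid by the 20 who found jobs
assumed_cost_per_failure = 300    # invented annual cost per unsuccessful job-seeker

success_rate = successes / (successes + failures)
net_benefit = gross_benefit - failures * assumed_cost_per_failure

print(f"success rate: {success_rate:.1%}")
print(f"net benefit:  ${net_benefit:,}")
```

Under these invented numbers the netted benefit actually turns negative, which is precisely why the choice of accounting rules, and not just the outcome data, drives the headline result.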
The Gates Foundation representative said that library funders and stakeholders are open to assessing library performance based on outcome information. It would be interesting to survey these groups to see if this openness includes the leap of faith already mentioned, believing, without verification, that library services contribute to outcomes. And whether funders and stakeholders would accept arbitrary allocations of benefit levels to libraries, as well as counting only successful outcomes in benefit calculations.
In the meantime, if I were libraries, I wouldn’t throw away my data on utilization of services, user perceptions of service quality, breadth of collections and services, assessments of community needs, usability studies, professional standards on collection and service quality, library best practices, and the like. Having these on hand could prove fortuitous! These old standards might well be music to the stakeholders’ ears!
1 The Gates Foundation panelist used the business terms scale and scaled, meaning, I believe, the ability to expand a technical product or business operation to accommodate more users, customers, sales, production, profits, sites, and so on. In behavioral science research a similar idea is denoted by the terms scope and domain, which refer to the breadth of a study across populations of interest. Incidentally, the idea of aggregating data at regional and national levels can be traced to 18th century Europe, before the term statistics existed. See Alain Desrosières’s book, The Politics of Large Numbers.
2 It is no secret that libraries love collaboration and sharing, including results of their assessment endeavors. But libraries should be careful about generalizing findings from one or more library assessment studies to other non-studied libraries. Here’s an example of how too much togetherness can lead libraries astray.
3 I was pleased to hear the Gates Foundation presenter mention the American Evaluation Association (AEA), an organization the library profession would do well to learn from. This mention was in the context of randomized controlled trials, the gold standard for proving that a program intervention actually caused effects observed in target populations. The presenter’s slides used only the term attribution and didn’t include the terms cause, causality, or causation. Some in the audience may not have understood that attribution refers to the ability to infer causation from the study data and design. Besides randomized controlled trials, there are methods for using non-experimental (observational) data to draw conclusions about causation. See chapter 6, Causal Inferences from Observational Studies, in Howard Wainer’s 2016 book, Truth or Truthiness: Distinguishing Fact from Fiction by Learning How to Think Like a Data Scientist.