I decided to move right on to my first 2014 post without delay. The reason is the knot in my stomach that developed while viewing the Webjunction webinar on the University of Washington iSchool Impact Survey. The webinar, held last fall, presented a new survey tool designed for gathering data about how public library patrons make use of library technology and what benefits this use provides them.
Near the end of the webinar a participant asked whether the Impact Survey uses random sampling and whether results can be considered to be statistically representative. The presenter explained that the survey method is not statistically representative since it uses convenience sampling (a topic covered in my recent post). And she confirmed that the data only represent the respondents themselves. And that libraries will have no way of knowing whether the data provide an accurate description of their patrons or community.
Then she announced that this uncertainty and the whole topic of sampling were non-issues, saying, “It really doesn’t matter.” She urged attendees to set aside any worries they had about using data from unrepresentative samples, saying these samples portray “real people doing these real activities and experiencing real outcomes.” And that the samples provide “information you can put into use.”
As well-meaning as the Impact Survey project staff may be, you have to remember their goal is selling their product, which they just happen to have a time-limited introductory offer for. Right now the real issues of data accuracy and responsible use of survey findings are secondary or tertiary to the project team. They could have chosen the ethical high road by proactively discussing the strengths and weaknesses of the Impact Survey. And instructing attendees about appropriate ways to interpret the findings. And encouraging their customers to go the extra mile to augment the incomplete (biased) survey with data from other sources.
But this is not part of their business model. You won’t read about these topics on their website. Nor were they included in the prepared Webjunction presentation last fall. If the issue of sampling bias comes up, their marketing tactic is to “comfort” (the presenter’s word) anyone worried about how trustworthy the survey data are.
The presenter gave two reasons for libraries to trust data from unrepresentative samples: (1) A preeminent expert in the field of program evaluation said they should; and (2) the University of Washington iSchool’s 2010 national study compared its convenience sample of more than 50,000 respondents with a smaller representative sample and found the two samples to be pretty much equivalent.
Let’s see whether these are good reasons. First, the preeminent expert the presenter cited is Harry P. Hatry, a pioneer in the field of program evaluation.1 She used this quote by Hatry: “Better to be roughly right than to be precisely ignorant.”2 To understand Hatry’s statement we must appreciate the context he was writing about. He was referring mainly to federal program managers who opted to not survey their users at all rather than attempt to meet high survey standards promoted by the U.S. Office of Management and Budget. Hatry was talking about the black-and-white choice of high methodological rigor versus doing nothing at all. The only example of lower versus higher precision survey methods he mentioned is mail rather than telephone surveys. Nowhere in the article does he say convenience sampling is justified.
The Impact Survey team would have you believe that Hatry is fine with public agencies opting for convenient and cheap data collection methods without even considering the alternatives. Nevertheless, an Urban Institute manual which Hatry served as advisor for, Surveying Clients About Outcomes, encourages public agencies to first consider surveying their complete roster of clientele. If that is not feasible, public agencies should then use a sampling method that makes sure findings “can be projected reliably to the full client base.”3 The manual does not discuss convenience sampling as an option.
Data accuracy is a big deal to Hatry. He has a chapter in the Handbook of Practical Program Evaluation about using public agency records in evaluation research. There you can read page after page of steps evaluators should follow to assure the accuracy of the data collected. Hatry would never advise public agencies to collect whatever they can, however they can, and use it however they want regardless of how inaccurate or incomplete it is. But that is exactly the advice of the Impact Survey staff when they counsel libraries that sample representativeness doesn’t really matter.
The Impact Survey staff would like libraries to interpret roughly right to mean essentially right. But these are two very different things. When you have information that is roughly right, that information is also roughly wrong. (Statisticians call this situation uncertainty, and the degree of wrongness, error.) The responsibility of a quantitative analyst here is exactly that of an information professional. She must assess how roughly right/wrong the information is. And then communicate this assessment to users of the information so they can account for this in their decision-making. If they do not consider the degree of error in their data, the analyst and decision-makers are replacing Hatry’s precise ignorance with the more insidious ignorance of over-confidence in unvetted information.4
The second reason the presenter gave for libraries not worrying about convenience samples was an analysis from the 2010 U.S. Impact Public Library Study. She said that study researchers compared their sample of 50,000+ self-selected patrons with another sample they had which they considered to be representative. They found that patterns in the data from the large convenience sample were very similar to those in the small representative sample. She explained, “Once you get enough data you start seeing a convergence between what is thought of as a representative sample…and what happens in a convenience sample.”
So, let me rephrase this. You start by attracting thousands and thousands of self-selected respondents from the population you’re interested in. And you continue getting more and more self-selected respondents added to this. When your total number of respondents gets really large, then the patterns in this giant convenience sample begin to change so that they now match patterns found in a small representative sample drawn from that same population. Therefore, very large convenience samples should be just as good as fairly small representative samples.
Assuming this statistical effect is true, how would this help improve the accuracy of small convenience samples at libraries that sign up for the Impact Survey? Does this statistical effect somehow trickle down to the libraries’ small samples, automatically making them the equivalent of representative samples? I don’t think so. I think that, whatever statistical self-correction occurred in the project’s giant national sample, libraries using this survey tool are still stuck with their small unrepresentative samples.5
While it is certainly intriguing, this convergence idea doesn’t quite jibe with the methodology of the 2010 study. You can read in the study appendix or in my prior post about how the analysis worked in the opposite direction. The researchers took great pains to statistically adjust the data in their convenience sample (web survey) in order to counter its intrinsic slantedness. Using something called propensity scoring they statistically reshaped the giant set of data to align it with the smaller (telephone) sample, which they considered to be representative. All of the findings in the final report were based on these adjusted data. It would be very surprising to learn that they later found propensity scoring to be unnecessary because of some statistical effect that caused the giant sample to self-correct.
As you can see, the Impact Survey staff’s justifications for the use of convenience sampling aren’t convincing. We need to rethink the idea of deploying quick-and-easy survey tools for the sake of library advocacy. As currently conceived, these tools require libraries to sacrifice certain of their fundamental values. Gathering and presenting inaccurate and incomplete data is not something libraries should be involved in.
1 The presenter said Hatry “wrote the book on evaluation.” Hatry is legendary in the field of program evaluation. But the book on evaluation has had numerous co-authors and still does. See Marvin Alkin’s 2013 book, Evaluation Roots.
2 The complete quotation is, “I believe that the operational principle for most programs is that it is better to be roughly right than to be precisely ignorant.” Hatry, H.P. (2002). Performance Measurement: Fashions and Fallacies, Public Performance & Management Review, 25:4, 356.
3 Abravanel, M.D. (2003). Surveying Clients About Outcomes, Urban Institute, Appendix C.
4 Yes, convenience samples produce unvetted information. They share the same weakness that focus groups have. Both data collection methods provide real information from real customers. But you take a big risk assuming these customers speak for the entire target group you hope to reach.
5 As I mentioned in my recent post, there is a known statistical effect that can make a library’s convenience sample perfectly match a representative sample drawn from the population of interest. This effect is known as luck or random chance. Just by the luck of the draw your convenience sample could, indeed, end up exactly matching the data from a random sample. The problem is, without an actual random sample to cross-check this with your library will never know whether this has happened. Nor how lucky the library has been!