During the question-and-answer part of a presentation I gave at the 2010 Library Assessment Conference in Baltimore last week, I couldn’t resist editorializing about how bad convenience sampling is. One audience member spoke up, saying she felt convenience samples are legitimate as long as findings are interpreted as describing only the respondents themselves. Later on I realized she was making a more interesting point, something that Shadish, Cook, and Campbell wrote about validity:
Validity is a property of inferences. It is not a property of [research] designs or methods, for the same design may contribute to more or less valid inferences under different circumstances. So it is wrong to say that a randomized experiment is internally valid or has internal validity—although we may occasionally speak that way for convenience.1
So the commenter in the session is correct. Convenience samples are not invalid per se. To use them we simply need to lower our expectations for the data, that is, rein in our inferences. Since the issue with convenience sampling is low external validity—how well the data can be generalized to the population of interest—we simply decide to equate the two. The sample becomes the population!
A handy solution, except for two issues: First, over time the survey results tend to take on an implied meaning. Intentionally or not, people begin to think of the data as applying to more than just the respondents, even when wording in the reports says otherwise. This leads to invalid inferences from the data. Second, researchers and decision-makers are still in the dark about the general population that initially interested them. Not having information you really need is not a good thing!
Besides, this is a time of raised expectations for the quality of data gathered in library assessment. Libraries are now compelled to produce fair, accurate, and verifiable data about their contributions to student learning, faculty research, institutional missions, community quality of life, lifelong learning, and the like.
This was most apparent in the conference session by Megan Oakleaf, Lisa Hinchliffe, and Mary Ellen Davis about the new ACRL report, The Value of Academic Libraries. ACRL Executive Director Davis emphasized the need for “hard evidence” about library value that arises out of the quality and effectiveness of library services. And the ACRL report refers to “hard data” and “hard numbers” that assessment can provide for decision-making and performance improvement.
The hardest evidence for demonstrating impact or effectiveness (but not necessarily value) comes from experimental research designs that use randomized assignment of participants to treatment and control groups. Randomized assignment is the gold standard for studies in evidence-based medicine; for educational research and evaluation; and for the U.S. Office of Management and Budget, which oversees the federal government’s performance assessment program.
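For readers who like to see the mechanics, simple randomized assignment really is as plain as shuffling the participant list and splitting it in two. The sketch below is purely illustrative (the function name and seed are my own, not from any cited study), and real trials often add blocking or stratification on top of this:

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into treatment and control groups.

    A minimal sketch of simple (unstratified) randomized assignment.
    A fixed seed is used here only to make the example reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical example: 20 study participants identified by number.
treatment, control = randomize(range(20), seed=42)
print(len(treatment), len(control))  # two groups of 10
```

The point of the shuffle is that group membership is determined by chance alone, so pre-existing differences between participants are spread evenly across the two groups on average.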
Randomized assignment is impractical for many reasons, as my prior post noted. But I thought it might be useful to look at how hard this hardest-of-evidence can be. We can turn again to Shadish, Cook, and Campbell to learn about this. As their earlier quote indicates, they differentiate between measures (observations) and their interpretation and use. Demonstrating causality requires both hard evidence—accurate and verifiable measures—and defensible inferences—conclusions that address factors that may undermine the strength of the measures or make them irrelevant. (This is a more formal way to describe the two-step process outlined in the prior post I mention above.)
Unfortunately, the research methods available to us, whether experimental or quasi-experimental designs to assess impact, or economic designs to assess value, are incapable of producing absolute measures and irrefutable inferences. I already discussed the measurement situation in a post last year. As for inferences, Shadish, Cook, and Campbell wrote:
We use the term validity to refer to the approximate truth of an inference. When we say something is valid, we make a judgment about the extent to which relevant evidence supports that inference as being true or correct. Usually, that evidence comes from both empirical findings and the consistency of these findings with other sources of knowledge, including past findings and theories. Assessing validity always entails fallible human judgments. We can never be certain that all of the many inferences drawn from a single experiment are true or even that other inferences have been conclusively falsified. That is why validity judgments are not absolute; various degrees of validity can be invoked. As a result when we use terms such as valid or invalid or true or false in this book, they should always be understood as prefaced by ‘approximately’ or ‘tentatively.’2
So, the pathway to hard evidence and durable inferences is an uphill climb. A lot harder than we may have thought! Time to get out our hiking boots…
1 Shadish, W., Cook, T. and D. Campbell, 2002, Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Houghton Mifflin, p. 34. This book is the methodology bible regarding causality in impact and effectiveness studies. It is the direct descendant of two prior books: Campbell, D. and Stanley, J., 1963, Experimental and Quasi-Experimental Designs for Research, Chicago: Rand McNally; and Cook, T. and Campbell, D., 1979, Quasi-Experimentation: Design and Analysis for Field Settings, Boston: Houghton Mifflin. Incidentally, the Campbell Collaboration in evidence-based practice is named after Donald (T.) Campbell.
2 Shadish, Cook, and Campbell, p. 34.