Last June the final report from an IMLS-funded study of public library summer reading programs, The Dominican Study: Public Library Summer Programs Close the Reading Gap, was published. The reading gap refers to the cumulative loss in proficiency that has been observed in students who struggle with reading. The gap is cumulative because the “summer setbacks” that some students have add up, making them lag further and further behind good readers each year.
Researchers for the study, Susan Roman, Deborah Carran, and Carole Fiore, say that their main research question was, “Do public library summer reading programs impact student achievement?” The answer they delivered was, basically, “Yes.” Yet the study does not actually demonstrate that the summer programs affect reading achievement at all, let alone that they help close any reading gap.
In studies of program effectiveness the idea is to prove, to some reasonable degree, that the program actually caused the desired changes in the clients the program is aimed at. This is a two-step process:
1. Demonstrating that there really were changes in the clients. This entails measurement.
2. Formulating a reasonable argument that the program caused these changes.
Step 2 is the hard part (though measuring the change is no piece of cake). Certain research design elements need to be in place for that step to work. For one thing, the changes must have occurred after the program took place. And some kind of reliable baseline is necessary to verify that observed changes were not caused by outside factors. Usually, this baseline takes the form of a control group, an equivalent group of subjects who do not participate in the program.
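To make the baseline logic concrete, here is a minimal sketch with invented pre- and post-summer averages (none of these numbers come from the study). The program's apparent effect is the participants' gain *beyond* the gain the control group achieved without the program:

```python
# All scores are hypothetical, for illustration only.
pre  = {"participants": 650, "control": 600}   # spring averages
post = {"participants": 670, "control": 615}   # fall averages

gain_participants = post["participants"] - pre["participants"]   # 20
gain_control      = post["control"] - pre["control"]             # 15

# Whatever the control group gained reflects outside factors
# (maturation, schooling, practice effects). Only the excess gain
# can plausibly be credited to the program.
apparent_effect = gain_participants - gain_control
print(apparent_effect)  # 5
```

Without the control group's baseline, the participants' 20-point rise could be mistaken entirely for a program effect when most of it would have happened anyway.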
This study did use a control group, but it was not equivalent to the group of library summer reading program participants. Note in the chart below that the average proficiency score for program participants (Yes-PLSRP) was higher than non-participants (the control group, No-PLSRP) in spring 2008, that is, before the summer programs! While the average score for participants increased at the end of the summer, non-participants also showed score improvement. In fact, they showed much higher gains than the participant group. (This could be because participants were better readers already and had less room for improvement.)
Source: Roman et al. (2010) The Dominican Study: Public Library Summer Reading Programs Close the Reading Gap, Institute of Museum & Library Services.
There’s an interesting twist to these after-summer average scores, though. The study reported that the comparison of these averages (667 versus 615) turned out not to be statistically significant. This means that the averages could be the same in the larger population, despite how the sample data make them appear.1
The researchers sidestepped this issue, choosing to believe that being close to statistical significance is good enough. It isn’t. Statistical significance does not work like horseshoes or hand-grenades. Or as statistician Bruce Thompson puts it:
Don’t say things like ‘my results approached statistical significance.’ This language makes little sense in the context of the statistical significance testing logic. My favorite response to this is offered by a fellow editor who responds, ‘How did you know your results were not trying to avoid being statistically significant?’2
Let me, then, offer these observations based on the data in the chart above: First, students attending summer reading programs were more proficient and more motivated than non-participants to begin with. Their better performance can be explained by that higher initial proficiency alone. Second, despite the apparent difference between the groups’ averages at summer’s end, the more proficient participant group could not be shown to out-perform the non-participant group. That casts doubt on the idea that attending summer programs improves reading skills while not attending does not.
Finally, because the control group showed no reading decline, it is unlikely this group represented struggling, at-risk readers.3 And if this group was not at risk, the participant group certainly was not. If the real goal for public library summer reading programs is to close the reading proficiency gap for at-risk students, then we must wonder whether summer reading programs attract and serve at-risk children. Remember, though, that the study’s sample was not representative of all summer programs nationwide. So, we cannot generalize from this study alone. Research into this issue would need to use a probability sample drawn from all library summer reading programs in the U.S.
Speaking of selection bias, Roman and her colleagues described their research approach as “naturalistic” and “causal comparative,” primarily as a way to justify the research design.4 But, these approaches introduced bias into the findings. As the (often irritating and) Annoyed Librarian already noted, only students receiving parental approval to participate in the study were included. Students with reading deficits whose home situations precluded their involvement got left out. Bias also comes into play in the self-selection of these already unrepresentative (or maybe mis-targeted) subjects into summer programs. The result of this is evident in the chart shown above.
With no controls over the exact content of the libraries’ summer reading programs, more uncertainty is added to the mix. Not standardizing the program content means that, even if the researchers could isolate program effects, it would be difficult to say which specific permutations of the program content caused, or interfered with, which effects. Perhaps some programs pre-selected books for students, while in others parents made the selections, and in others the children did. And so on. (See also my post from last year describing the other edge of this program constancy sword.)
Compared to other researchers, Roman et al. were fairly forthright about the study’s flaws. They acknowledged that limitations of the study design and the realities of the study environment led them down a counterproductive pathway. But this acknowledgement isn’t very clear, and many readers won’t get it. They wrote:
While not definitive in addressing the additive effect of summer library reading programs, this study has been helpful in demonstrating the need for more rigorously controlled research studies.5
The researchers are, nevertheless, culpable for not including this crucial observation, or the specific study weaknesses, in the executive summary. I imagine political expectations and the hoopla of rolling out study results made this so. Their misleading executive summary led to gross misinterpretations, for example, in a post by the Delaware Division of Libraries: “The researchers found that participation in these programs increased children’s scores on standardized reading tests.” That’s totally wrong. Again, the findings did not establish any link between summer reading programs and reading achievement. This is why the title of this study is so disingenuous.
The researchers’ insights about the importance of sound study designs are the most important results from the study. Roman and colleagues are among the few in the library community who can say they understand the value of randomized assignment, control groups, methods for preventing subject attrition, and the like. They acquired this important knowledge from the dear school of experience. And, yes, as researchers we are all fools of a sort due to our blindness, prejudices, misunderstandings, and so on. But we would be more foolish not to take lessons learned to heart. I hope Roman, Carran, and Fiore have. If they have, they should become more vocal about the insights they gained. And they could bring their now valuable experience to the table if a sequel project were to be initiated, perhaps with more expert guidance this time.
Research by Harvard professor James Kim (2004) cited by Roman et al. was not much more informative than their study, by the way. Kim’s research found an association between summer reading and student achievement. But, he acknowledged that his design could not confirm that increased reading actually caused improved achievement.
A later study by Kim6 used random assignment of study subjects to control groups. Granted, this is not an option in the settings in which most evaluations are conducted. When it is, though, it is extremely valuable. Even when it cannot be used, there are other “quasi-experimental” options researchers can apply.
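For readers unfamiliar with the mechanics, random assignment itself is trivial to implement; the hard part is finding a setting that permits it. A minimal sketch, using a hypothetical roster of student IDs:

```python
import random

# Hypothetical roster of 20 student IDs; not from any real study.
students = list(range(1, 21))
random.seed(42)            # fixed seed so the split is reproducible
random.shuffle(students)

treatment = students[:10]  # offered the summer reading program
control   = students[10:]  # not offered it

# Randomization balances the groups on average -- including on traits
# no one measured, like motivation or home support -- so a later
# difference in outcomes can more credibly be attributed to the program.
print(len(treatment), len(control))  # 10 10
```

Quasi-experimental alternatives (matched comparison groups, pre/post designs, regression discontinuity) try to approximate this balance statistically when random assignment is off the table.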
1 It’s possible that in the larger population of all U.S. 3rd graders (the group the researchers were ultimately interested in describing) the average scores for summer reading participants and non-participants are about the same, even though the values from the study (667 and 615) suggest they are not. Due in part to the small number of subjects in this study who took the after-summer reading test (222), the data are not, you might say, strong enough to stand on their own. Statistical significance tests gauge whether this is the case. For these data, the tests tell us that chance variation is the most likely explanation for the difference in the averages. This is not to say that the averages in the larger population actually are identical but, rather, that they could be, and that the study data are not convincing enough, statistically, to prove otherwise.
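A worked example of the kind of test involved: the sketch below computes Welch's t statistic for two small samples I invented so that their averages match the reported 667 and 615 (the spreads and sample sizes are hypothetical; the study's actual test used its own data):

```python
import math
from statistics import mean, stdev

# Invented samples; only the two averages (667 and 615) echo the study.
participants = [600, 740, 650, 720, 625]   # mean = 667
controls     = [550, 680, 590, 660, 595]   # mean = 615

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

t = welch_t(participants, controls)
# With spreads this wide and samples this small, a 52-point gap in
# averages yields a modest t statistic, well below the usual cutoff
# for statistical significance -- i.e., plausibly chance variation.
print(round(t, 2))  # 1.44
```

The point of the example: whether a 52-point difference is "real" depends entirely on the variability and the sample size, not on the size of the difference alone.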
2 Thompson, B., 1994, The concept of statistical significance testing, Practical Assessment, Research & Evaluation, Vol. 4, No. 5.
3 There could have been a few students with unimproved or declining after-summer scores, but these individual details would be hidden in the group’s average.
4 Neither of these research approaches is rigorous enough to establish causation. So, they aren’t really able to address the initial research question stated in the study.
5 Roman, S., Carran, D., and Fiore, C., 2010, The Dominican Study: Public Library Summer Programs Close the Reading Gap, Institute of Museum and Library Services, p. 44.
6 Kim, J. S., 2008, Scaffolding voluntary summer reading for children in grades 3 to 5: An experimental study, Scientific Studies of Reading, Vol. 12, No. 1, pp. 1–23.