Putting the Best Findings Forward

I think I’m getting jaded. I am beginning to wonder whether lobbying for balanced reporting of evaluation and research findings is a waste of time. With voices more influential than mine weighing in on the opposite side, I’m having trouble staying positive. Granted, I do find inspiration in the work of people much wiser than me who have confronted this issue. One such source is my favorite sociologist, Stanislav Andreski, who wrote the following in his book, Social Sciences as Sorcery:

In matters where uncertainty prevails and information is accepted mostly on trust, one is justified in trying to rouse the reading public to a more critical watchfulness by showing that in the study of human affairs evasion and deception are as a rule much more profitable than telling the truth.1

The problem is, wisdom like Andreski’s languishes on dusty library shelves and the dust-free shelves of the Open Library. Much more (dare I call it?) airtime goes to large and prestigious institutions that are comfortable spinning research results to suit their purposes.

Fortunately, I am not so demoralized as to pass up the opportunity to share yet another institution-stretching-the-truth-about-research-data story with you. This one involves an evaluation project funded by the Robert Wood Johnson Foundation and conducted by Mathematica Policy Research and the John W. Gardner Center for Youth and Their Communities at Stanford University. In case you are not aware, the Robert Wood Johnson Foundation is well known for its longstanding commitment to rigorous evaluation research. I have not surveyed their publications or project output, so I cannot say whether the instance I'll be describing here is typical. I hope it isn't.

Before getting to the story details, a little background on research methodology is necessary. Randomized controlled trials (also known as experiments or clinical trials) are the optimal research method for determining whether a given activity or program (an intervention) really works. The method is designed specifically to confirm that the intervention actually produced (i.e., caused) the changes observed in the target population, and it represents the highest level of evidence in evidence-based practice. In evaluation, the approach falls under the more general categories of impact evaluation and effectiveness evaluation.

Now to the specifics of the Robert Wood Johnson Foundation project. This was a randomized controlled study of the effectiveness of a public school recess program, known as Playworks, conducted in 2011 and 2012. The image below is a link to an issue brief published by the foundation. (The smoking gun.) If you would, click on the image and read the headlines and the bullet points appearing at the middle of the page:

Robert Wood Johnson Foundation issue brief (click to view).

Notice that the headline reports that the Playworks study confirmed “widespread benefits” of the program. In the body of the text note also the statement about a “growing body of evidence that…organized recess…has the potential to be a key driver of better behavior and learning.” (Phrases like widespread benefits and key driver are sure signs that public relations or marketing professionals had a hand in the writing. When publicists and marketeers are nearby, evasion and deception cannot be far behind.)

The bullet points specify the main positive impacts the Playworks program had and also address that nagging quantitative question: how much? How much less bullying was there? A 43 percent difference in rating scores (amounting to a 0.4-point difference). How much of an increase in feelings of safety? A 20 percent score difference (0.6 points). How much more vigorous activity? Playworks participants were vigorously active 14 percent of recess time compared with 10 percent for non-participants. How much enhancement to learning readiness? Following recess, Playworks students were ready to learn 34 percent more quickly (a 3-minute-per-day readiness advantage). You can read these and a myriad of other quantitative comparisons between Playworks schools (the treatment group) and other schools (the control group) in the full report by Mathematica and the Gardner Center (see Appendix 2). I'll return to these comparisons further on.

But first, I want to say something about the Less Bullying bullet point. My purpose here is to stress the importance of knowing, in any research study, exactly what got measured. Notice in the brief's non-bold text that what was measured is actually “bullying and exclusionary behaviors.” Now what would those be?

A footnote in Appendix 2 says that researchers measured these concepts by averaging teachers’ responses to seven questionnaire items. Four items were about students or parents reporting students being bossed or bullied. The other three items were about reported incidents of name-calling, pushing and hitting, and students feeling “isolated from their normal peer group.” (Interestingly, elsewhere the report says that 20% of the Playworks students reported they “felt left out at recess” compared to 23% of non-Playworks students.)

The measurement scale was something like this:

Rating scale for reported incidents of bullying and exclusionary behaviors.

In this figure the average score for the Playworks schools was 0.6 and for the non-Playworks schools it was 1.0. So the averages for these two groups were towards the lower end of the scale. Also, since the scale measured two things it’s hard to say what portion of the 0.4 Playworks difference pertained to bullying versus exclusionary behaviors. In any case and very roughly, on average the non-Playworks schools reported one more incident of either of these types of behaviors than the Playworks schools did (I believe this would be per surveyed teacher.)
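As an aside, the gap between a 0.4-point difference and a “43 percent” headline is just arithmetic: the percentage is computed relative to the control group's small mean. Here is a minimal sketch in Python using the rounded means quoted above; the brief's 43 percent presumably comes from unrounded figures.

```python
# Illustrative check of how a small absolute difference becomes a
# large-sounding relative one. The means are the rounded values
# quoted above, not the report's unrounded figures.

def relative_drop(control_mean, treatment_mean):
    """Percent difference relative to the control group's mean."""
    return 100 * (control_mean - treatment_mean) / control_mean

control = 1.0    # non-Playworks schools, low end of the incident scale
treatment = 0.6  # Playworks schools

print(f"absolute difference: {control - treatment:.1f} points")
print(f"relative difference: {relative_drop(control, treatment):.0f}%")
# On this scale, a 0.4-point gap reads as a roughly 40 percent drop.
```

The smaller the control group's mean, the more impressive the same absolute gap sounds when expressed as a percentage.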

Now consider the next bullet point, Increased Feelings of Safety at School. The study found teachers’ ratings of how safe students felt at school to be 20% (0.6 points) higher at the Playworks schools. But what about students’ own feelings of safety? Referring to the Mathematica and Gardner Center full report, the average level of safety that students felt at Playworks schools was 4% (0.1 point) higher than for the other schools. And the average level of feelings of safety at recess was 8% (0.2 point) higher at Playworks schools than at the other schools. Of course, designers of the Playworks study infographic (below) mixed up these findings, stating that students felt as safe as teachers thought they did. But the study shows otherwise.

Playworks study infographic (click for larger image).

As you’d expect, the issue brief reported those outcomes where the greatest impacts were detected.2  Designers of the infographic went overboard, cherry-picking items that teachers agreed with wholeheartedly. (I say “overboard” because the infographic designers decided to deceive readers by hiding an important report finding that disputes two of their 90+ percentages. For the un-cherry-picked version, see the last item in the list of excerpted findings in the next paragraph.)

In both cases the idea is putting the best findings forward. So you have to refer to the full report to get a fix on the real story. If you read the report in detail, you'll see that the brief paints a rather rosy picture (and the infographic is way out in left field). Consider these not-as-rosy findings:

Significant impacts were observed in domains covering school climate, conflict resolution and aggression, learning and academic performance, and recess experience, suggesting that Playworks had positive effects. No significant impacts were detected in the other two domains addressing outcomes related to youth development and student behavior.3 [underlining added]

Playworks had a positive impact on two of the five teacher-reported measures of school climate but had no significant impact on the three student-reported measures of school climate.4 [underlining added]

Teachers in treatment schools reported significantly less bullying and exclusionary behavior [this is the 0.4 point difference graphed above]. However, no significant impacts were found on teacher reports of more general aggressive behavior…student reports of aggressive behavior, students’ beliefs about aggression, or students’ reports on their relationships with other students.5  [underlining added]

There were no significant differences on six additional outcome measures that assessed student engagement with classroom activities and academic performance, homework completion and motivation to succeed academically.6  [underlining added]

Playworks had no significant impact on students’ perceptions of recess, as measured in the student survey. In particular, there was no significant impact on six items that measured the type of recess activities in which students were engaged, such as talking with friends or playing games and sports with adults during recess. There was also no impact on six items that measured student perceptions of recess, such as enjoyment of recess or getting to play the games they wanted to play. In addition, no impact was found on six items that measured student perceptions of how they handle conflict at recess, such as asking an adult to help them solve a conflict or getting into an argument with other students during recess.7  [underlining added]

There were no significant impacts of Playworks on eight measures of youth development. In particular, students in treatment and control schools had similar reports on a six-item scale that measured feelings about adult interactions…In addition, a similar percentage of treatment and control students reported getting along well with other students. There was also no significant difference on a scale that included eight items asking students to indicate their effectiveness at interacting with peers in conflict situations, such as their ability to tell kids to stop teasing a friend. Teachers in treatment and control schools also reported similar perceptions of students’ abilities to regulate their emotions, act responsibly and engage in prosocial and altruistic behavior.8  [underlining added]

Despite the fact that most treatment teachers who responded to the survey felt that Playworks reinforced positive behavior during recess (96 percent) and resulted in fewer students getting into trouble (91 percent) [shown in the infographic above], there were no significant impacts of Playworks on multiple indicators of student behavior. Treatment and control group students who took the student survey reported similar levels of disruptive behavior in class and behavioral problems at school. Teachers in treatment and control schools reported similar amounts of student misbehavior, absences, tardiness, suspensions and detentions among their students.9  [underlining added]

Reads like those warnings and contraindications that accompany pharmacy prescriptions, doesn't it? Nevertheless, these facts are important for understanding the truth about this study: there was a mixture of positive impacts in certain areas and a lack of impacts in several others.

As a quick-and-dirty way to try to make sense of this I tallied results for five “outcome domains” reported in tables in Appendix 2 of the full report:10  school climate (table 3), conflict resolution and aggression (table 5), academic performance (table 7), youth development (table 10), and student behavior (table 12). The chart below shows the point differences11 between the Playworks and non-Playworks schools:

Bar chart of point differences between Playworks and non-Playworks schools (click for larger image).

For 70 percent (19) of the 27 outcome measures, the difference between the Playworks schools' and non-Playworks schools' scores was 1/10th of a point or less. Seventy-eight percent (21) of the measures had differences of 2/10ths of a point or less. On the remaining 22 percent (6) of the measures, treatment and control schools differed by 3/10ths of a point or more. So, on nearly 80 percent of the outcome measures the Playworks schools were almost identical to the control group schools, meaning that for these outcomes the Playworks program made no appreciable difference.
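The tally just described is easy to reproduce. Here is a sketch, with the caveat that the `diffs` list is a hypothetical stand-in matching the distribution of magnitudes reported above, not the actual per-measure values from the report's tables:

```python
# Tally of treatment-vs-control point differences, as described above.
# CAVEAT: `diffs` is a stand-in with the same distribution of magnitudes
# as my tally; the real values come from tables 3, 5, 7, 10, and 12 in
# Appendix 2 of the full report.
diffs = [0.0] * 10 + [0.1] * 9 + [0.2] * 2 + [0.3] * 4 + [0.4, 0.6]

n = len(diffs)  # 27 outcome measures
small = sum(1 for d in diffs if abs(d) <= 0.1)
modest = sum(1 for d in diffs if abs(d) <= 0.2)
large = sum(1 for d in diffs if abs(d) >= 0.3)

print(f"<= 0.1 pt: {small}/{n} ({100 * small / n:.0f}%)")   # 19/27 (70%)
print(f"<= 0.2 pt: {modest}/{n} ({100 * modest / n:.0f}%)")  # 21/27 (78%)
print(f">= 0.3 pt: {large}/{n} ({100 * large / n:.0f}%)")   # 6/27 (22%)
```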

Several factors could have interfered with measuring effects that the Playworks program may have had in reality. For instance, in the full report the researchers noted that the program was not implemented consistently at each school: only half of the treatment group schools followed the Playworks regimen closely (five schools followed it moderately closely). Another potential problem is the multi-item questionnaire scales and their 4-point measurements; reliability and validity issues could have compromised the accuracy of the measurements.

Still, it’s fair to say that the study was methodologically sound (e.g. randomization and sufficient sample sizes) and thorough (e.g. extensive scope of survey questions). It’s a shame that the hard work invested in assuring the quality of the study was diminished by an issue brief containing such casually drawn conclusions. Based on this study alone, the chances that organized recess could perform as a key driver of better [school] behavior and learning appear to be slight. And the idea that the Playworks program had widespread benefits is untrue. As for the infographic, its insubstantial content matches its cartoonish design perfectly!

The point is that the study needs to be summarized in its entirety, which is to say, honestly. Certainly, Playworks, Mathematica, the Gardner Center, and the Robert Wood Johnson Foundation will have done a comprehensive review of the full report in private. But why not share something about this important activity with the public at large? Include a few sentences in published summaries explaining that measurements are approximate and findings can be uncertain, and that researchers often need to sort through mixed signals in the data to draw the soundest conclusions possible.

These are the kinds of messages worth putting forward. Something to improve the truth-to-lie quotient out there.


1    Andreski, S. (1973). Social Sciences as Sorcery, p. 13.
2   The issue brief bullet point More Vigorous Physical Activity is a finding from a separate study. Additional positive impacts of the Playworks program reported in the executive summary of the full study by Mathematica and the Gardner Center were omitted from the issue brief. Namely, Playworks schools scored higher on teachers’ perceptions of: (a) students feeling included during recess; (b) student behavior during recess; (c) student behavior “after sports, games, and play”; and (d) how much students enjoyed adult-organized recess activities.
3   Bleeker, M. et al. 2012. Findings from a Randomized Experiment of Playworks: Selected Results from Cohort 1. Robert Wood Johnson Foundation, p. 10.
4   Bleeker, M. et al. 2012. p. 11.
5   Bleeker, M. et al. 2012. p. 12.
6   Bleeker, M. et al. 2012. p. 13.
7   Bleeker, M. et al. 2012. p. 14.
8   Bleeker, M. et al. 2012. p. 15.
9   Bleeker, M. et al. 2012. p. 16.
10   I included 27 of the 39 outcome measures listed in these tables. I intentionally omitted twelve measures that are percentages of respondents reporting certain outcomes of interest—like percent of teachers reporting student detentions in the last 30 days. Because these twelve measures are indirect indicators, they can be misleading: one teacher in ten (10 percent) having three incidents of detention is not equivalent to three teachers in ten (30 percent) each having one incident of detention.
11   My quick-and-dirty approach intentionally avoids the topics of statistical inference and statistically significant differences, which are sometimes more confusing than helpful (see the heading “The Bane That Is Statistical Significance” in my prior post). A large majority of measured differences between Playworks schools and other schools failed to be statistically significant. Besides, the hurdle of statistical significance was even higher in this study due to the need to account for what is called multiple hypothesis testing. My quick-and-dirty look at tiny measured differences is a simpler way to conceive of what statistical significance would basically tell us. Because of the formulas for statistical significance testing, tiny differences like 1/10th of a point would pass the test only if sample sizes were much higher than they were in the study.
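To put a rough number on that last point, here is a back-of-the-envelope power calculation. The standard deviation of 0.8 points is my assumption for illustration (the report's tables would supply real values); the z constants are the standard ones for a two-sided test at alpha = 0.05 with 80 percent power:

```python
import math

def n_per_group(delta, sigma):
    """Approximate per-group sample size for a two-sample comparison,
    two-sided alpha = 0.05, power = 0.80 (normal approximation)."""
    z_alpha = 1.96  # two-sided critical value, alpha = 0.05
    z_beta = 0.84   # corresponds to 80 percent power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# ASSUMPTION: sigma = 0.8 points on the 4-point scales; illustrative only.
print(n_per_group(delta=0.1, sigma=0.8))  # prints 1004, about 1,000 per group
```

Under that assumed spread, detecting a 1/10th-point difference would take on the order of a thousand respondents per group, far more than this study surveyed.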
