I have implied this in other entries in this blog, but I might as well say it outright: The library and information science profession needs to come to terms with the issue of standards for (i.e., rules of) evidence for performance, statistical, and advocacy research data. There, now I’ve said it.
I recently read the short and enjoyable book Graphic Discovery: A Trout in the Milk and Other Visual Adventures by statistician Howard Wainer (Princeton, NJ: Princeton University Press, 2005). The subtitle of the book comes from something Henry David Thoreau wrote. During a dairy strike in New England in 1850, people began to suspect that dairy owners were watering down the milk supply. This led Thoreau to write in his journal, “Sometimes circumstantial evidence can be quite convincing; like when you find a trout in the milk” (quoted in Wainer, p. 81).
Wainer’s main point, one also made by others such as John Tukey and Edward Tufte, is that well-designed graphical representations are invaluable for exploring and understanding data. Graphical data presentation can lead to revelations about data, and about the underlying phenomena they describe, that would otherwise be missed.
But, alas, Wainer and the others warn that the design of graphs can also serve to mislead readers. (Statistics can lie. So, you have to figure that statistical graphs might fail a few polygraph tests, too.) Now re-sensitized to this possibility, I am, at least in the short term, looking more closely at the graphs I encounter. A graph appearing in an article in the Nov. 9, 2009 issue of Business Week was easy prey for my renewed vigilance. Unfortunately, the electronic versions of this article available from EBSCO, LexisNexis, and other databases omit graphics altogether, an aggravating defect of digitization indeed.
In the Business Week article, “The GDP Mirage,” author Michael Mandel argues that the economic index, Gross Domestic Product (GDP), is incomplete because it does not measure “intangible investments” corporations make.
By overlooking these investments, Mandel claims, the U.S. is “navigating…with fragmentary information” (p. 36). The reader can get the gist of Mandel’s ideas from the article itself. For now, I just want to point out that the aim of the graphic is to illustrate the author’s argument.
Notice that the Business Week graphic consists of three charts. Rather than having an individual title for each chart, a caption at the top forms three surrogate titles: “Reported GDP jumps ahead of jobs [left graph]…but the GDP stats don’t count R&D cuts [center graph]…or lost jobs for knowledge workers [right graph].” The implication is that if the GDP were to include statistics reflecting cuts to research and development and lost jobs, it would be a more valid measure of economic output. (The article doesn’t actually recommend that job loss statistics be included in revised GDP calculations, but we can ignore this inconsistency for our present purposes.)
Its own title notwithstanding, this graphic has a “numbers problem” quite distinct from the GDP measurement challenge that concerns Mandel. The problem with the graphic is this: Two of the three charts report (let’s call these) “actual data” while the third does not. The left and right charts present data obtained from the U.S. Bureau of Labor Statistics, which we can presume were collected using accepted sampling methods. However, the center chart is—depending on how you look at it—either a convenience sample or merely a collection of anecdotes.
The center chart’s heading, “Selected Companies that Have Cut R&D Spending Over the Last Year,” suggests that the selection is some type of non-probability sample. As seen in the chart, cuts for these companies range from roughly 12% to 36%. Nowhere, though, does the chart or the article tell us how the companies were selected or to what extent the percentages pertain to the larger set of U.S. corporations of interest.
What we have is anecdotal information masquerading as data! Even though the chart title is clear,* placing that chart between two other charts that contain actual data is deceptive. Because of its location and bar-graph style, this chart appears to be on a par with the other two when it really is not. The center chart is mostly conjecture; the other two have firmer grounding.
Since the units of measure in that chart are percentages, the population parameters (in this case, percentages of decrease in R&D spending among all U.S. corporations of interest) are likely to be within some reasonable range, probably not ridiculously far from the range seen here.
But this is not the point. The author does not have any conclusive evidence about what this range actually is, and he, or the creator of the charts, ought to say so. This is a case of pretending to have data that you don’t, in fact, have. Or, in Mandel’s words, navigating with fragmentary information. Wainer would not be so forgiving; he would call the center chart “nondata” since that is what it is (p. 57). On that same page Wainer also makes this wonderfully apropos pronouncement:
“The plural of anecdote is not data.”
Sure, for particular purposes, quick-and-dirty selections and pseudo-samples can be justified. But they do not deserve to be graphed. So, if you will permit me, I want to experiment with a possible contribution to the set of standards for evaluating evidence that the library and information science profession might someday establish:
Standard XV.1.c. Since anecdotal information represents only itself, it shall not be portrayed, nor presented graphically, in a way that implies that it describes any phenomena in the aggregate.
Okay, so I can’t think of very good wording. Thankfully, there’s plenty of time for re-working that sentence…
—————————————————
* I don’t mean to say that the chart is clearly titled, but that, once you are able to find it, the title (or is it a subtitle?) has an unambiguous meaning.
Compared to standards of good graphing practice that Howard Wainer, William Cleveland (The Elements of Graphing Data, Murray Hill, NJ: AT&T Laboratories, 1994), and others promote, the Business Week graphs are pretty damned bad! The axis labels are too difficult to find, first because the charts are overpowered by thick, all-black bars and bold-fonted category labels (company names and occupation categories), and second because of small fonts, crowding, and misplacement.
In the left chart, the label “Percent” has the wrong orientation, since it applies to the vertical axis. The chart’s horizontal axis has no label. Thanks to the chart designer’s use of Roman numerals, we can guess that the units must be quarters of the year. Squeezing the legend into the data portion of this chart violates a cardinal graphing design principle: Don’t let clutter make the data more difficult to see. Though less important, the word “Forecast,” a note for the single GDP data point at quarter III of 2009, appears in a larger font than both the axis labels and the tick-mark values. Not good.
In the center and right charts, the labels “Percent Change in R&D Spending” and “Percent Change in Employment” are misplaced. Both should appear along the lower horizontal axes, near the appropriate grid marks. Both labels include asterisked notes that imply the labels are meant to serve a dual purpose as titles (or subtitles). This confusion could be alleviated by creating descriptive chart titles that incorporate the note information (no need to separate it), and then placing fully descriptive labels adjacent to the axes.
Grade this graphic earns: D-






I ran across another one of these in connection with a new initiative on “21st century skills” launched last week by the Institute of Museum and Library Services (IMLS). This campaign presents a 
Like some “minimum recommended daily allowance” of a particular 21st century practice. This makes things much simpler for IMLS because the idea of library or museum standards, itself, is notoriously tricky. Several of the approaches endorsed in their model don’t apply to many institutions. (How can a small rural library or a historic police museum be collaborating with community partners on its new educational programs “over 75% of the time”?)
with the very best of intentions, I am sure. But, let me say that I am convinced that these calculators are a bad idea. Their underlying assumptions are weak and their designs are not well thought out. Eventually, library funders and stakeholders are going to realize that the calculations are superficial and…well…sloppy.
But say that, for practical purposes, we accept as reasonable the idea that value boils down to price. Even so, the retail pricing approach these calculators use has definite problems. The calculators treat retail prices as estimates of the costs patrons would incur if the library’s items and services were, hypothetically, unavailable to the community or institution. The library comes up with a retail price for each type of material and service it offers, and these prices are then translated directly into the value patrons receive from using those materials or services.
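To make the retail-pricing arithmetic concrete, here is a minimal Python sketch of what such a calculator does. The item types, prices, and usage counts are hypothetical placeholders, not figures from any actual library calculator; the point is only that the "value" is each retail price multiplied by a usage count and then summed.

```python
# Hypothetical retail prices (in dollars) a library might assign to each
# type of material or service; real calculators use their own figures.
RETAIL_PRICES = {
    "book_loan": 17.00,
    "dvd_loan": 4.00,
    "reference_question": 15.00,
    "program_attendance": 15.00,
}


def retail_price_value(usage_counts: dict[str, int], prices: dict[str, float]) -> float:
    """Sum price x usage over every item type, as a retail-pricing calculator does.

    Note what is absent: no term for the library's own costs (staff, IT,
    buildings, overhead) and none for patrons' travel, waiting, or
    inconvenience.
    """
    return sum(prices[item] * count for item, count in usage_counts.items())


# Hypothetical annual usage for one patron.
patron_usage = {"book_loan": 12, "dvd_loan": 5, "reference_question": 2}
print(f"${retail_price_value(patron_usage, RETAIL_PRICES):.2f}")  # prints $254.00
```

Because every term in the sum is a price times a count, more transactions always mean more "value," while the library's operating costs and patrons' incidental costs never enter the calculation.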
including expenses for information technology, equipment, building maintenance, utilities, and administrative overhead. These calculators also disregard the incidental costs that patrons may bear, like travel and parking costs, time lost due to item unavailability or poor service, usability difficulties encountered, and so on. In fact, NNLM’s calculator errs in the opposite direction: Assuming that libraries are always convenient, the calculator builds a patron time-savings factor into its formula. (I suppose you could enter negative numbers to register patron lost time and inconvenience.)
completely sidestep the rightful purpose of library evaluation. This purpose is to assess the extent to which the library provides value to the institution or community as a whole, not how each individual fares. This assessment must also confirm that products and services are equitably distributed, that is, equally available and accessible to all who wish or need to use them (see
by Donald Elliot and Glen and Leslie Holt in their book
the original ideas of their founders. Over time TQM began to promote practices that quality gurus like W. Edwards Deming warned against, for instance, bestowing individual rewards for quality objectives accomplished. And sometimes organizations take liberties with the specifics of an innovation. They might
Last year
The practice has no agenda other than to help improve clinical decisions and patient health. Todd’s 2009 article even mentions this point in a description of a “gold standard” for evidence-based education that frowns on the use of advocacy studies because of their inherent biases. Yet the school libraries Manifesto aims to rally support for this use.
“One of the difficulties in evaluating a specific program is that [there is] little basis for knowing which aspects of the program work in
A public library can point to the outcome study as evidence of its local program’s effectiveness only to the extent that its summer reading program matches the content and delivery approach of the programs in the study, and only to the extent that its clientele matches the participants in that study.
In January my brother and I were laying laminate flooring in his house. Each time we needed to trim a plank, we stood reverently by his table saw and incanted the familiar carpenter’s adage, “Measure twice, cut once. (Amen.)” My brother said, “It’s the damnedest thing. You can repeat and repeat a measurement, and then find out it is still wrong.” As an electrical engineer (he’s working on the 3rd edition of his
Rather than looking at measurement as producing facts, it is perhaps better to say it produces impressions. And impressions will vary on dimensions like accuracy (precision), breadth (scope), and validity (relevance).
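My brother’s complaint, that a measurement can be repeated again and again and still be wrong, is the difference between random error and systematic error. The short Python simulation below assumes a hypothetical tape measure that reads 3 mm short on every use, plus a small random jitter. Averaging many readings smooths out the jitter, but it leaves the 3 mm bias untouched.

```python
import random
import statistics

TRUE_LENGTH_MM = 1000.0    # hypothetical true plank length
SYSTEMATIC_BIAS_MM = -3.0  # hypothetical: the tape always reads 3 mm short
JITTER_SD_MM = 1.0         # hypothetical random reading error (standard deviation)


def measure(rng: random.Random) -> float:
    """One reading: true length + constant bias + random noise."""
    return TRUE_LENGTH_MM + SYSTEMATIC_BIAS_MM + rng.gauss(0.0, JITTER_SD_MM)


rng = random.Random(42)  # fixed seed so the run is reproducible
readings = [measure(rng) for _ in range(10_000)]
mean_reading = statistics.mean(readings)

# Repetition makes the average very stable...
print(f"spread of readings (sd): {statistics.stdev(readings):.2f} mm")
# ...but it settles on the wrong value.
print(f"mean reading: {mean_reading:.2f} mm, error: {mean_reading - TRUE_LENGTH_MM:.2f} mm")
```

No number of repetitions reveals the bias from the readings alone; only checking the instrument against something known to be right does.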
attitudes and feelings that form “satisfaction.” Say we are able to create a mind probe! Even with this perfect instrument, other factors can make our measurements inaccurate. In other words, each time we measure something—even with the most proven measurement instruments—extraneous things interfere. A subject we are measuring may be distracted due to a high
I decided to take a listen and chose a
This faith has sustained the men and women who have built and operated American public, as well as university and research, libraries and the men of wealth and political position who have provided for their financial and legal support. It consists of a belief in the virtue of the printed word, especially of the book, the reading of which is held to be good in itself or from its reading flows that which is good.”
So, library marketing and branding devotees are beginning to see the same light that shone more than thirty-five years ago on one of the founders of library evaluation, Richard Orr. His milestone 1973 article “Measuring the Goodness of Library Service” describes evaluation in terms of quality (of collections, services, facilities, etc.) and value (benefits to the community). Within a few years, social program evaluation theorists Michael Scriven and Jane Roth had coined two fundamental terms for their profession, merit and worth, which mean the same as quality and value.
competencies, that is, knowledge or skill levels. Instead, they measure respondents’ opinions about their own knowledge and skills in a dozen or so training topics. So, any baselines WebJunction comes up with will be merely about current opinions which would later be compared to some subsequent set of opinions.
Rather than using some more systematic method (i.e., random sampling), they get respondents where and when it is convenient. This method severely limits the usefulness of study results. Because the respondents are self-selected, their responses will, in all likelihood, differ from the larger population of library
Spending researcher and respondent time on surveys that can only produce questionable results is a poor use of library resources. Also, researchers should never portray findings from poorly designed studies as if they were fair and balanced depictions of the subjects being studied. That would be misinformation, indeed!
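To see why self-selected respondents are such a problem, consider a small Python simulation. It assumes a hypothetical population of 10,000 library staff who each rate their own skill from 1 to 5, and it assumes, also hypothetically, that the more confident someone feels, the more likely they are to answer a convenience survey. A simple random sample recovers the population average; the self-selected group overstates it.

```python
import random
import statistics

rng = random.Random(2009)  # fixed seed for reproducibility

# Hypothetical population: self-rated skill from 1 (low) to 5 (high), equally common.
population = [rng.randint(1, 5) for _ in range(10_000)]

# Simple random sample of 500: everyone has the same chance of selection.
random_sample = rng.sample(population, 500)

# Convenience sample: assume (hypothetically) that a person with rating r
# chooses to respond with probability r / 5, so confident people opt in more.
self_selected = [r for r in population if rng.random() < r / 5]

pop_mean = statistics.mean(population)
print(f"population mean:    {pop_mean:.2f}")
print(f"random sample mean: {statistics.mean(random_sample):.2f}")
print(f"self-selected mean: {statistics.mean(self_selected):.2f}")
```

The size of the gap here is an artifact of the assumed response rule. In a real convenience survey the response rule is unknown, so the survey itself offers no way to tell how far off its results are.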