Lib(rary) Performance Blog

Entries categorized as ‘Research’

Navigating with Fragmentary Information

February 16, 2010

I have implied this in other entries in this blog, but I might as well say it outright: The library and information science profession needs to come to terms with the issue of standards for (i.e., rules of) evidence for performance, statistical, and advocacy research data. There, now I’ve said it.

I recently read the short and enjoyable book Graphic Discovery: A Trout in the Milk and Other Visual Adventures by statistician Howard Wainer (Princeton, NJ: Princeton University Press, 2005). The subtitle of the book comes from something Henry David Thoreau wrote. During a dairy strike in 1850 in New England people began to suspect that dairy owners were watering down the milk supply. This led Thoreau to write in his journal, “Sometimes circumstantial evidence can be quite convincing; like when you find a trout in the milk” (quoted in Wainer, p. 81).

Wainer’s main point, one certainly made also by others like John Tukey and Edward Tufte, is that well-designed graphical representations are invaluable for exploring and understanding data. Graphical data presentation can lead to revelations about data, and the underlying phenomena they describe, that would otherwise be missed.

But, alas, Wainer and the others warn that the design of graphs can serve to mislead readers. (Statistics can lie. So, you have to figure that statistical graphs might fail a few polygraph tests, too.) Now re-sensitized to this possibility, I am (at least in the short term) looking more closely at graphs I encounter. A graph appearing in an article in the Nov. 9, 2009 issue of Business Week was easy prey for my renewed vigilance. Unfortunately, the electronic versions of this article available from EBSCO, LexisNexis, and other databases omit graphics altogether, an aggravating defect of digitization indeed.

[Image: the Nov. 9 Business Week graph]

In the Business Week article, “The GDP Mirage,” author Michael Mandel argues that the economic index, Gross Domestic Product (GDP), is incomplete because it does not measure “intangible investments” corporations make. By overlooking these investments, Mandel claims, the U.S. is “navigating…with fragmentary information” (p. 36). The reader can get the gist of Mandel’s ideas from the article itself. For now, I just want to point out that the aim of the graphic is to illustrate the author’s argument.

Notice that the Business Week graphic consists of three charts. Rather than having an individual title for each chart, a caption at the top forms three surrogate titles: “Reported GDP jumps ahead of jobs [left graph]…but the GDP stats don’t count R&D cuts [center graph]…or lost jobs for knowledge workers [right graph].”  The implication is that if the GDP were to include statistics reflecting cuts to research and development and lost jobs, it would be a more valid measure of economic output. (The article doesn’t actually recommend that job loss statistics be included in revised GDP calculations, but we can ignore this inconsistency for our present purposes.)

Its own title notwithstanding, this graphic has a “numbers problem” quite distinct from the GDP measurement challenge that concerns Mandel. The problem with the graphic is this: Two of the three charts report (let’s call these) “actual data” while the third does not. The left and right charts present data obtained from the U.S. Bureau of Labor Statistics, which we can presume were collected using accepted sampling methods. However, the center chart is—depending on how you look at it—either a convenience sample or merely a collection of anecdotes.

The center chart’s heading, “Selected Companies that Have Cut R&D Spending Over the Last Year,” suggests that the selection is some type of non-probability sample. As seen in the chart, cuts for these companies range from roughly 12% to 36%. Nowhere, though, does the chart or the article tell us how the companies were selected or to what extent the percentages pertain to the larger set of U.S. corporations of interest.

What we have is anecdotal information masquerading as data! Even though the chart title is clear,* placing that chart in the middle of two other charts that contain actual data is deceptive. Due to its location and bar-graph style, this chart appears to be on a par with the other charts when it really is not. The center chart is mostly conjecture; the other two have firmer grounding.

Since the units of measure in that chart are percentages, the population parameters (in this case, percentages of decrease in R&D spending among all U.S. corporations of interest) are likely to be within some reasonable range, probably not ridiculously far from the range seen here.

But this is not the point. The author does not have any conclusive evidence about what this range actually is and he, or the creator of the charts, ought to say so. This is a case of pretending to have data that you don’t, in fact, have. Or, in Mandel’s words, navigating with fragmentary information. Wainer would not be so forgiving; he would call the center chart “nondata” since that is what it is (p. 57). On that same page Wainer also makes this wonderfully apropos pronouncement:

“The plural of anecdote is not data.”

Sure, for particular purposes, quick-and-dirty selections and pseudo-samples can be justified. But, they do not deserve to be graphed. So, if you will permit me, I want to experiment with a possible contribution to the set of standards for evaluating evidence that the library and information science profession might someday establish:

Standard XV.1.c.    Since anecdotal information represents only itself, it shall not be portrayed, nor presented graphically, in a way that implies that it describes any phenomena in the aggregate.

Okay, so I can’t think of very good wording. Thankfully, there’s plenty of time for re-working that sentence…

—————————————————
* I don’t mean to say that the chart is clearly titled, but that, once you are able to find it, the title (or is it a subtitle?) has an unambiguous meaning.

Compared to standards of good graphing practice that Howard Wainer, William Cleveland (The Elements of Graphing Data, Murray Hill, NJ: AT&T Laboratories, 1994) and others promote, the Business Week graphs are pretty damned bad! The axis labels are too difficult to find: first, because the charts are overpowered by thick, all-black bars and bold-fonted category labels (company names and occupation categories), and second, because of small fonts, crowding, and misplacement.

In the left chart, the label “Percent” has the wrong orientation since it applies to the vertical axis. The chart’s horizontal axis has no label. Thanks to the chart designer’s use of Roman numerals we can guess that the units must be quarters on an annual economic calendar. Squeezing the legend into the data portion of this chart violates a cardinal graphing design principle: Don’t let clutter make the data more difficult to see. Though less important, the word “Forecast,” a note for the single GDP data point at quarter III of 2009, appears in a larger font than both axis labels and tickmark values. Not good.

In the center and right charts, the labels, “Percent Change in R&D Spending” and “Percent Change in Employment,” are misplaced. Both should appear on the lower horizontal axes near the appropriate grid marks. Both labels include asterisked notes that imply the labels are meant to serve a dual purpose as titles (or subtitles). This confusion could be alleviated by creating descriptive chart titles that include the notes information (no need to separate it), and then inserting fully descriptive labels adjacent to the axes.

Grade this graphic earns:  D-

Categories: Advocacy · Library assessment · Measurement · Research

The Telephone Game

January 5, 2010

Readers of the, say, older persuasion may recall a time when children actually enjoyed games that required no peripheral devices, infrared sensors, or satellite tracking. There was one party game, simply called (I think) “Telephone,” where one player whispered a message to the next, and that player to the next, until the message was passed all the way around the circle of players. The fun came when everyone heard the amusing distortions that ended up in the final message.

In library advocacy research, though, message distortion is not amusing. I noticed a serious instance of this in a recent IMLS Research Brief which cites an American Library Association (ALA) report finding that patron use of library computers for job-seeking purposes has “greatly increased.” The ALA report is Job-Seeking in U.S. Public Libraries and the statement the IMLS brief cites is this:

“As part of site visits to public libraries in nine states [conducted in three annual studies], the study research team has found greatly increased use of library technology for job-seeking and e-government.” Job-Seeking in U.S. Public Libraries, American Library Association, Oct. 2009, p. 2.
(The emphasis in red is mine).

The ALA report is one of a series of “issue briefs” published by the Office for Research and Statistics that summarize and supplement key results from the multi-year Public Library Funding and Technology Access study. To date this project has issued three annual reports beginning with the 2006/2007 edition. (The project is a collaborative effort connected with the Public Libraries and the Internet longitudinal studies which began in 1994.)

Anyway, I wondered how big this increase actually was and what the level of job-seeking computer use had been before the big increase happened. So I went searching for the numbers in the ALA Public Library Funding annual studies. Turns out none of the studies measured frequency of job-seeking or e-government computer use by patrons. Nor did the studies compare frequencies of any reported computer uses from year to year. The 2006/2007 and 2007/2008 editions merely state that job-seeking and e-government were common uses reported by some patrons, without mentioning increases of any sort. The 2008/2009 edition reports increased patron job-seeking computer use, but does not describe this increase as substantial or “great.”

The quotation from the issue brief (above) says that the researchers detected this increase by means of interviews with staff and patrons during library site visits. These interviews, conducted in a few selected U.S. states each year, included a simple open-ended question to patrons: “What do you use [the library's computers] for?” Job-seeking and e-government made the lists of most frequent responses (and each state’s list apparently differed from the others). But, no frequency counts for these uses show up in the three studies, perhaps because the counts weren’t collected. (Site visits, as well as focus groups which the ALA studies held, belong to the category of qualitative research methods. As the ALA project illustrates, collecting essentially quantitative information using qualitative methods can lead to problems.)

Even if the researchers did tally these uses during the interviews, neither the interviewees nor the states where interviews were conducted were randomly selected. So, we couldn’t say the tallies represent the larger patron population nationally. The 2007/2008 study reports a convenience sample of about 200 patrons who were using library computers at the time (Libraries Connect Communities, American Library Association, 2008, p. 128). (See the NOTE below.) The same study then reports that “Interviews with users confirmed staff observations that much computer use in libraries is job-related…” (p. 131; I added the emphasis). Exactly how much they don’t say. And how often unrepresented patrons—say teenagers who had yet to show up after school—might use computers for job-seeking we can’t tell. (The 2006/2007 ALA study cites a 2006 Baltimore, MD study where library computer use was found to depend on age group. See footnote on p. 169 of the ALA 2006/2007 study.)

Selection bias of a different sort confounds year to year comparisons from the ALA studies. Because the annual interviews were conducted in different states where the job markets and online government services could differ significantly, there are no reliable baselines for comparing job-seeking and e-government computer usage between years. For instance, the 2006/2007 study included a site visit to a library in Nevada where staff reported long lines of patrons using library computers to apply for jobs at a newly opened gambling casino. Relying on such atypically high usage as a baseline for job-seeking computer use could mask actual increases in later study years in different states.

The question remains, what data are these “great increases” based on? None that I can find in the ALA studies. The issue brief does cite other figures that did increase over time, but these figures don’t describe patterns of computer use. The figures are from the survey portion of ALA’s studies, from questionnaire items that elicit library staff opinions. Staff were asked to identify the top five public Internet services that they believed to be “the most critical to the role of the library branch in its local community.” In the 2006/2007 study 44% of responding staff chose provision of job-seeking services as one of their top five priorities. In the 2007/2008 study this proportion was 62.2%, and in 2008/2009 it was 65.9%.

Basically, the share of staff respondents voting for patron job-seeking services (each staff respondent got up to 5 votes) was about 22 percentage points higher in the 2008/2009 survey than in the 2006/2007 one. While these vote tallies may reflect changes in staff perceptions of computer use during this period, the votes are only opinions, and don’t indicate how patrons actually used computers in libraries.
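The arithmetic behind that comparison is worth spelling out, because a change between two percentages can be stated either in percentage points or as a relative change, and the two numbers look very different. Here is a minimal Python sketch using the survey shares quoted above; the function and variable names are my own, chosen only for illustration.

```python
def point_change(before_pct, after_pct):
    """Absolute change between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct, after_pct):
    """Change expressed as a percentage of the starting value."""
    return (after_pct - before_pct) / before_pct * 100


# Share of responding staff who ranked job-seeking services in their top five,
# as quoted above from the 2006/2007 and 2008/2009 ALA studies.
share_2007 = 44.0
share_2009 = 65.9

points = point_change(share_2007, share_2009)
relative = relative_change(share_2007, share_2009)

print(f"Change in percentage points: {points:.1f}")
print(f"Relative change: {relative:.1f}%")
```

Whichever figure is quoted, it still describes a shift in staff opinion, not a measured change in how patrons used library computers.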

The truth is that we don’t have the usage data needed to support the assertion made in the ALA issue brief. Without valid baseline data, we can’t measure increases in patron job-seeking or e-government computer use at all, and we certainly can’t tell whether or not any increases have been great.

———————————-

NOTE:    Convenience sampling is a type of ‘nonprobability sampling.’ With nonprobability sampling, we have no statistical basis for claiming that our study findings describe the larger population that we had hoped our research would apply to. Using nonprobability sampling invites biased information into study results.

Categories: Advocacy · Measurement · Research

Research Literacy

September 10, 2009

In 2008 the Online Computer Library Center (OCLC) published a marketing research report addressing the need for increasing public support for libraries. The study, From Awareness to Funding: A Study of Library Support in America, was funded by a $1.2 million grant awarded by the Bill & Melinda Gates Foundation.

Respondents in the study were divided into four groups according to how supportive of public library funding they were. Somehow (the OCLC researchers don’t say how), certain questionnaire responses qualified respondents for assignment to each group. But the results don’t always make sense. For instance, 20% of the “super-supporter” group said they were not definitely committed to voting in favor of libraries. And 6% of this group was either unsure how they’d vote or said they would vote “No.” Wondering how such uncommitted respondents ended up assigned to the group that is “super” supportive of libraries, I contacted the OCLC researchers. Unfortunately, I never did get a response to this question.

There are other perplexing aspects of the study, like details of the study sample: What was the sampling frame? Were subjects randomly selected? Is the sample geographically representative? Apparently, OCLC’s research is proprietary and not subject to the type of outside verification and review that is the mainstay of academic and scientific research.

So, now to a couple of things clearly evident from the 212-page report. The researchers devised two statistical indexes to describe key attributes of their study respondents: one based on respondents’ library visit frequency, the other on responses to specific questionnaire items. Imaginative charts comparing the two indexes for different respondent groups are spread throughout the first half of the report. An article I wrote examining these indexes appears in the current issue of Public Library Quarterly (PLQ), vol. 28 no. 3 (July-September) 2009.

In the PLQ article I point out that the indexes happen to have no units of measure. In the language of mathematicians, they are “dimensionless.” This makes it really difficult to interpret the repeated comparisons that appear in the study. This annotated graphic from the PLQ article attempts to explain a sample chart from the OCLC report:

[Figure: annotated sample chart from the PLQ article. © 2009 Taylor and Francis Group]

I won’t elaborate on this graphic because I want you to read the complete PLQ article. Suffice it to say that my annotations are the rectangular bubbles, arrows, and parenthetical axis labels. I conclude that, while the indexes appear to be accurate,*   they are more complicated than need be, given that they are based on such simple data. As a result, they aren’t particularly useful.

My main message from the article is, though, that as consumers of advocacy research we need to be research literate, meaning that we have the knowledge necessary for examining research findings to determine whether study data and methodology really do justify the study findings. Certainly, we need our leading library organizations to pursue advocacy research projects like this one. At the same time, we have to look beyond deluxe graphics and polished text and focus on the actual substance of library studies. As information specialists and brokers, we have a special obligation to verify that research done on behalf of libraries is understandable, accurate, objective, transparent and relevant.

* OCLC declined to respond to my repeated requests for information necessary for auditing the data in their report. They did give permission for reproducing their charts in PLQ.

Categories: Advocacy · Research

Thoroughly Modern Museums and Libraries

August 31, 2009

I think I get it now.  I had thought the term assessment meant a systematic and appropriately rigorous measurement of a construct or phenomenon of interest, like program outcomes, community needs, service quality, and so on.  Only now have I come to understand that a self-assessment is a different animal altogether. Who would have thought that the purpose of a self-assessment is not really to assess anything?  The purpose, I now realize, is to inform and educate. All this time I have been applying research methodology standards to tools that are intended to advocate and indoctrinate. No wonder my observations have been so off-base!

When I disapproved of WebJunction’s online competencies assessment questionnaire (see my April 22, 2009 entry), the WebJunction staff explained to me that the true objective for their surveys was to increase awareness of these competencies. I immediately wondered, “Well, how then will WebJunction measure awareness?”  But that is quite an irrelevant question when these questionnaires are actually teaching tools, not measurement instruments. Since the instruments don’t really have to measure anything, we don’t have to obsess about how reliable or valid they are. They can be evaluated (I guess) according to how well they apply proven methods for facilitating adult learning.

The irony of using a research instrument like a survey questionnaire this way will probably escape the majority of librarians (i.e. those who disliked library school research methods class.)  But here’s the story: One of the giant problems in designing behavioral science measures is making sure the measures don’t alter the thing you’re trying to measure. Measures are supposed to be unobtrusive. You would never trust a thermometer if you found that, while measuring the temperature of water, the thermometer also happened to heat the water! The same goes for questionnaires and tests in behavioral science and education.

Worries like this are old hat nowadays. Forget the antiseptic, hands-off approach. So easy and cheap to post online, the new questionnaires are designed to induce change by informing, educating, and motivating respondents. I ran across another one of these in connection with a new initiative on “21st century skills” launched last week by the Institute of Museum and Library Services (IMLS). This campaign presents a thoroughly modern take on the mission of libraries and museums. You can read the details and access the “self-assessment tool” here.

Still stuck in my 20th century research methodology paradigm, I found the IMLS questionnaire technically interesting. It is what I call a “Goldilocks instrument” since it uses a 3-point ordinal scale that amounts to a little, a medium amount, and a lot. The response options are something like this:


  1. The institution rarely practices such-and-such 21st century skills enhancement task or technique
  2. The institution practices the task or technique fairly often, or
  3. The institution almost always practices the task or technique.

In several questions in the survey, this tripartite scale appears as less than 25% of the time, 25% to 75% of the time, and over 75% of the time. But you get the idea—small, medium, large.

Specific questionnaire items address a series of general institutional dimensions like accountability, leadership, partnerships, and so on.  (See the self-assessment tool matrix.)  Then, in each area, the institution is rated as being in one of three developmental stages:  Early, Transitional, or 21st Century. An institution’s Goldilocks responses fall conveniently into these stages (surprise!!).  If you perform a 21st century skill enhancement task less than 25% of the time, you are in the Early (Neolithic?) stage on that one.  If you perform it more than 75% of the time, you are thoroughly modern!
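To see how little information a three-bin scale retains, here is a minimal Python sketch of the mapping as I read it from the questionnaire. This is my own reconstruction for illustration, not IMLS’s actual scoring code; in particular, the function name and the assumption that responses of exactly 25% and 75% land in the middle bin are mine.

```python
def developmental_stage(percent_of_time):
    """Map a 'how often' percentage onto the three stages described above.

    Assumption: the middle bin includes both endpoints (25% and 75%).
    """
    if not 0 <= percent_of_time <= 100:
        raise ValueError("percent_of_time must be between 0 and 100")
    if percent_of_time < 25:
        return "Early"
    if percent_of_time <= 75:
        return "Transitional"
    return "21st Century"


# Institutions at 26% and 74% receive the same rating, while 74% and 76%
# fall into different stages.
for pct in (10, 26, 74, 76):
    print(pct, developmental_stage(pct))
```

Under this reading, two institutions that differ by nearly 50 points can share a stage, while two that differ by 2 points can be rated differently.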

At the completion of the questionnaire, the self-assessment tool simply parrots back an institution’s responses in graphical form. There are “Recommendations” buttons users can click on, but the advice offered is pretty much the same, regardless of an institution’s rating: Use the results “to initiate a dialogue with your institution’s leaders, board, colleagues, and other stakeholders” so you can improve your rating. In Goldilocks measurement terms, having the most 21st century skills possible is always “just right!”

Obviously, the survey is a teaching tool, not an assessment. That’s why there is no need for the instrument to gauge how libraries and museums compare to any independently derived standards, like some “minimum recommended daily allowance” of a particular 21st century practice. This makes things much simpler for IMLS because the idea of library or museum standards, itself, is notoriously tricky.  Several of the approaches endorsed in their model don’t apply to many institutions.  (How can a small rural library or a historic police museum be collaborating with community partners on its new educational programs “over 75% of the time?”)

Fortunately, these types of measurement issues are immaterial.  Remember, this is not assessment.  It is education and proselytizing.  In fact, the IMLS self-assessment tool demonstrates one 21st century skill enhancement technique first-hand. As described in the project report, the tool is clearly interactive audience involvement! Rather than posting the questionnaire merely to measure something, IMLS is modeling the behavior they are seeking from museums and libraries.  I think it’s called “showing by doing.”

Categories: Library assessment · Measurement · Research

Cha-Ching!

August 14, 2009

I noticed that yet another library value calculator has appeared on the scene. This one is offered by the National Network of Libraries of Medicine (NNLM) with the very best of intentions, I am sure. But, let me say that I am convinced that these calculators are a bad idea. Their underlying assumptions are weak and their designs are not well thought out. Eventually, library funders and stakeholders are going to realize that the calculations are superficial and…well…sloppy.

For one thing, sound cost-benefit analysis requires an examination of the full extent of relevant costs and benefits of a given project, program, or service. These quick-and-easy library calculators, however, use average retail prices as proxies for benefits. This oversimplification ignores important sources of library value like contributions to student and life-long learning, scientific and academic research, and public discourse, as well as roles libraries play in imparting cultural and humanitarian values and traditions, promoting literary appreciation and aesthetic values, facilitating community cohesion, and so forth.

But say that, for practical purposes, we accept the idea that value-boils-down-to-price as reasonable. Even so, the retail pricing approach these calculators use has definite problems. The calculators view retail prices as estimates of costs that patrons would incur if the library’s items and services were, hypothetically, unavailable to the community or institution. The library comes up with a retail price for each type of material and service it offers, and then these prices are translated directly into the value patrons receive from utilizing these materials or services.

In many cases, however, the alternative to obtaining an item or service from the library is not an outright purchase at retail prices. A student might purchase a textbook for $125 and then later re-sell it on Amazon.com for $50. Or perhaps she buys the item at a used price or borrows it from a friend for free. Clearly, a variety of alternative patron scenarios are possible, meaning that there is a range of alternative costs (approximate values) associated with each item or service use. The average of these ranges will typically be less than an item’s retail price. Besides, an item borrowed from a library does not include the breadth of rights and conveniences that item ownership does. So, it is a stretch to say that a patron always enjoys the same benefit from a borrowed item as from a purchased one.
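A rough Python sketch makes the point about averaging over alternatives concrete. The $125 retail price and $50 resale price come from the textbook example above; the $60 used price and the share of patrons assumed to follow each scenario are invented purely for illustration.

```python
RETAIL_PRICE = 125.00  # textbook price from the example above

# (scenario, net cost to the patron, assumed share of patrons)
# The $60 used price and all three shares are hypothetical.
scenarios = [
    ("buy new, resell for $50", RETAIL_PRICE - 50.00, 0.50),
    ("buy used", 60.00, 0.30),
    ("borrow from a friend", 0.00, 0.20),
]

# The assumed shares must account for every patron.
total_share = sum(share for _, _, share in scenarios)
assert abs(total_share - 1.0) < 1e-9

# Weighted average of what patrons would actually pay without the library.
average_alternative_cost = sum(cost * share for _, cost, share in scenarios)

print(f"Retail price:             ${RETAIL_PRICE:.2f}")
print(f"Average alternative cost: ${average_alternative_cost:.2f}")
```

With these made-up shares the average alternative cost comes to $55.50, well under half the retail price a calculator would credit. Different assumptions would give a different figure, which is exactly why a single retail price is a shaky proxy.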

Other problems with the calculators make their output suspect. For example, each time a patron renews an item or re-uses it in-house or online, the item’s retail price gets credited—again—to the library’s value totals. (Cha-ching!) On the other hand, when our Amazon.com shopper purchases a book at $75, that book’s value does not increase to $150, then to $225 and beyond each time the owner opens the book, or with each 3-week library loan period that passes.
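The repeated crediting described above can be sketched in a few lines of Python. The function names and the count of three uses (say, one loan plus two renewals) are hypothetical; the $75 price is the one from the example.

```python
def calculator_credit(retail_price, uses):
    """Total the calculators would credit: the full retail price for every use."""
    return retail_price * uses


def owner_outlay(retail_price):
    """What a purchaser pays: the retail price once, however often the book is opened."""
    return retail_price


book_price = 75.00  # the Amazon.com purchase from the example above
uses = 3            # hypothetical: one loan plus two renewals

credited = calculator_credit(book_price, uses)
paid = owner_outlay(book_price)

print(f"Credited by calculator: ${credited:.2f}")
print(f"Paid by a purchaser:    ${paid:.2f}")
print(f"Overstatement factor:   {credited / paid:.1f}x")
```

The overstatement grows linearly with the number of renewals or re-uses, so heavily renewed items inflate the totals the most.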

Because the calculators tally only certain types of transactions, they end up painting a rather rosy picture of library performance. Consider the case of a patron who needs an item or service that is (really) not available from the library, and whose information need ultimately goes unmet. And the case of a service delivered that fails to meet a patron’s need, such as an unproductive reference consultation. The first case won’t be tallied at all by these calculators, and the second case will be tallied but will be significantly over-valued. (It will be considered a complete success.) Yet, the actual value of both of these patron transactions is negative and should be entered into these calculators this way. Unfortunately, the calculators’ designs do not accommodate this.

Given these problems and oversights, it is fairly obvious that these calculators produce exaggerated estimates of the benefits which libraries provide. Perhaps this exaggeration is only moderate or perhaps it is substantial—we cannot really tell for sure.

The calculators also underestimate the cost side of the equations, causing their benefit/cost ratios to be even more over-stated. They ignore several key costs incurred in delivering library materials and services, including expenses for information technology, equipment, building maintenance, utilities, and administrative overhead. These calculators also disregard the incidental costs that patrons may bear, like travel and parking costs, time lost due to item unavailability or poor service, usability difficulties encountered, and so on. In fact, NNLM’s calculator errs in the opposite direction: Assuming that libraries are always convenient, the calculator builds a patron time-savings factor into its formula. (I suppose you could enter in negative numbers to register patron lost time and inconvenience.)

When the calculators do recognize costs, they end up settling for data that are the grossest of estimates. For instance, users can enter estimated percent of total library staff time spent supporting access to materials or services. Creators of the calculators seem unaware that accurate benefit/cost ratios require meticulous collection of operational data, not just convenient guess-timates.

You will be hard-pressed to learn about these shortcomings from the materials that accompany library value calculators. Mostly, libraries receive general guidelines for entering data and encouragement to use the calculators without reservation. The library just keys in its data and—voilà!—receives an exact return-on-investment percentage or benefit/cost ratio right on the spot! Given the casual assumptions the calculations entail and the inexactness of the library’s input data, you’d think the final answer would at least include some type of margin-of-error disclaimer. Maybe something like this:

Your library’s benefit-to-cost ratio = $8.20 per $1.00 cost*

*    Based on our calculations, we are 95% confident that your library’s benefit/cost ratio is between $4.50 and $12.50 (per $1.00 cost). If your data are especially inaccurate, this range will be larger. Note that our single $8.20 estimate may be high due to assumptions our model uses.

Needless to say, this kind of small print doesn’t appear in the instructions that come with library value calculators. As they are, the calculators generate figures that are precise to the penny, with no other explanations to speak of. Libraries confidently report the figures to stakeholders as accurate, authoritative, and nearly approaching Scientific Truth. Of course, the figures are nothing of the sort.

Clones of these library calculators have sprouted up on dozens of library websites, where patrons are invited to enter their custom data to receive their own monthly “value of library services.” Costs are typically not mentioned, so that final value calculations are simple multiplications of counts times arbitrary and often fanciful retail price estimates. Of course, the nifty and optimistic totals will delight library patrons. The totals might even please the population of nonusers who are happy to subsidize library use by others as an overall benefit to the community or institution.

On a few public library websites the calculations are made even more tantalizing by informing patrons about their “individual return-on-investment”—how much value they gain for every tax dollar they contribute. (Don’t you just love democracy!) Unfortunately, this approach casts the wrong light on the public value of libraries. First, the figures are further exaggerations because they use per capita revenue data. Not every public library user pays taxes, a fact that makes the individually quoted return rates artificially high. (Instead of library tax revenue per capita, the calculations should use tax revenue per tax-paying household.)
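To illustrate how the choice of denominator changes the quoted rate, here is a small Python sketch. Every number in it is invented for illustration and none comes from a real library; the variable names are likewise my own.

```python
# All figures below are hypothetical.
library_tax_revenue = 1_000_000.00  # annual library tax revenue
population_served = 50_000          # everyone in the service area
taxpaying_households = 20_000       # households that actually pay the tax
patron_value = 400.00               # one patron's calculated annual "value"

# Cost attributed to the patron under each denominator.
cost_per_capita = library_tax_revenue / population_served
cost_per_household = library_tax_revenue / taxpaying_households

# "Value gained per tax dollar" under each denominator.
return_per_capita = patron_value / cost_per_capita
return_per_household = patron_value / cost_per_household

print(f"Per capita basis:    ${return_per_capita:.2f} per tax dollar")
print(f"Per household basis: ${return_per_household:.2f} per tax dollar")
```

With these made-up inputs the same patron appears to receive $20 per tax dollar on a per capita basis but $8 on a per household basis. The size of the gap depends entirely on the ratio of residents to tax-paying households.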

Second, these seemingly benign “value” calculations actually hide information. The websites fail to provide overall return-on-investment rates for all tax-paying households or for all tuition-paying students. As I have already alluded to, an individual patron’s rate of return is being subsidized by nonusers of library services. For every patron elated with his own personally-calculated rate, there will be several households or students whose net returns are less than $0, meaning they lose money on their library tax or tuition “investments.” (This mix of return rates also applies to using the vanilla versions of the calculators that don’t bother to factor costs in.) Omitting this larger picture from these presentations is slanted and misleading, something that libraries should not be involved in.
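
To make the subsidy concrete, here is a minimal sketch with ten imaginary tax-paying households. The tax amount and the usage values are invented for illustration.

```python
tax_per_household = 75.00

# Hypothetical annual value of library services used by ten tax-paying households.
values_used = [0, 0, 0, 0, 20, 40, 60, 150, 480, 750]

# Net return for each household: value received minus tax paid.
net_returns = [value - tax_per_household for value in values_used]
losers = sum(1 for net in net_returns if net < 0)

# The community-wide figure that the individual calculators leave out.
overall_ratio = sum(values_used) / (tax_per_household * len(values_used))

print(f"Households with a negative net return: {losers} of {len(values_used)}")
print(f"Overall benefit per tax dollar: ${overall_ratio:.2f}")
```

In this invented community the overall figure is a respectable $2.00 per tax dollar, yet seven of the ten households receive less than they pay in. The heaviest user, meanwhile, would see a personal rate of $10.00 per dollar.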

From a public or institutional value perspective, these Library 2.0-inspired patron calculators completely sidestep the rightful purpose of library evaluation. This purpose is to assess the extent to which the library provides value to the institution or community as a whole, not how each individual fares. This assessment must also confirm that products and services are equitably distributed, that is, equally available and accessible to all who wish or need to use them (see Creating Public Value by Mark H. Moore).

In actuality, economic valuation is not so simple as it appears. It involves complicated (and frustrating) concepts like exchange value, use value, contingent value, and others. Even business corporations have misgivings about standard return-on-investment analysis because of how difficult it is to obtain reliable data to input into the formulas.

If we want to use purely monetary estimates of the value of our services, we need more rigorous methods than these makeshift library calculators. This exact advice was offered to us a couple of years ago by Donald Elliott and Glen and Leslie Holt in their book Measuring Your Library’s Value. Their work provides important guidance that we should be heeding, such as the fact that benefit/cost valuations are unique to the communities and institutions from which they come. The figures are really not comparable across communities or for different libraries. This is something that most of us would not have thought about. The central message from their book, though, should already be obvious to us: We can’t just make these benefit/cost numbers up, the way these calculators do. There have to be sound theoretical and empirical bases for our findings.

Sure, quick-and-dirty estimations might be helpful in certain situations, as long as they are recognized for what they are. But the numbers gushing from these library calculators are nonsensical and disingenuous, in many cases. The whole idea has become an impediment to the real work of assessing library value. When the batteries in these little pocket library calculators wear out, I recommend that we just not replace them.

Categories: Library assessment · Measurement · Research

A Preponderance of the Evidence

June 29, 2009 · 3 Comments

It is fairly well known that the field of business management can be susceptible to fads. Organizational scientists have studied the adoption of business approaches like management-by-objectives, total quality management (TQM), business process re-engineering, just-in-time manufacturing, scorecard methods, and others. Their work has led to an interesting body of literature about management innovations and organizational change.

One idea from this literature is that management innovations can morph from the original ideas of their founders. Over time TQM began to promote practices that quality gurus like W. Edwards Deming warned against, for instance, bestowing individual rewards for quality objectives accomplished. And sometimes organizations take liberties with the specifics of an innovation. They might decide to use only the components they’re most comfortable with or add their own idiosyncratic twists.

[Image: Dr. Deming]

Recently, the business profession came up with yet another data-driven bandwagon known as evidence-based management. And some in the library profession have become enamored with this new technique, in the variety I just named or in the form of “evidence-based practice.” Both varieties have been inspired by “evidence-based medicine,” an idea that surfaced in the early 1990s in the field of medicine. (See Trinder, L. and Reynolds, S. (eds.) 2000. Evidence-Based Practice: A Critical Appraisal.)

Evidence-based medicine began with a specific objective: to systematically collect clinical research studies, assess their validity, reliability, and relevance, and synthesize study conclusions for physicians to use in making individual clinical decisions. Because one of the system’s main tenets is the need for objectivity, its practitioners have instituted strict procedures to make sure review summaries are impartial. Preferably, several studies on each medical topic should be reviewed and compared. (This is because single research studies cannot necessarily be trusted. Their findings can easily be subject to measurement, sampling, or design errors. Plus, studies can contradict each other, leaving it to the profession at large to sort things out.)

Despite its complexities, evidence-based medicine is an application of the common sense notion that verifiable data should be a part of any decision-making process. In the arena of management this has simply been thought of as sound decision-making. And the basic managerial idea has been around at least since the late 19th century. (Some of the earliest data-intensive management practices came from the railroad industry, where accurate tracking of cars, freight, rates, and schedules was essential.) Use of valid data for decision-making has been the foundation of scientific management, financial accounting,  managerial control, and performance measurement as taught in business schools for decades.

Last year School Library Journal ran an article with the intriguing title The Evidence-Based Library Manifesto. I am not sure whether the title means that the Manifesto was supported by evidence or that this is a clarion call for evidence as a desirable thing. In any case, the title happens to perfectly express (sorry, I’m getting philosophical here…) a basic paradox that government and not-for-profit organizations face. Evidence is objective, empirical, and systematic while a manifesto is pure subjective belief and opinion. This is the dilemma of rationality versus commitment that Aaron Wildavsky explored in his classic article which I cited in a prior post.

Anyway, the gist of the SLJ article is that school libraries need to mobilize to justify their existences and that the libraries should capitalize on empirical studies that prove their effectiveness. In a 2009 article the same author, Ross Todd, wrote more about this:

At a local school level, evidence based practice of school librarianship seeks to demonstrate the value-added role of a school library to the life and work of a school—outcomes that center on learning, literacy and living…    Todd, R. 2009. School Librarianship and Evidence-Based Practice, Evidence Based Library and Information Practice, 4(2), p. 88.

So let’s see. If we translate this quote into the terms of evidence-based medicine, we get:

At the level of the individual physician/practitioner, evidence based medicine seeks to demonstrate the positive effects of the physician’s overall professional practice on the patient–effects that center on good health, healthy life-styles and living…

To which I respectfully respond (excuse my French), “Au contraire!” The function of evidence-based medicine is not to promote or commend physicians’ decisions, but rather to inform them. The practice has no agenda other than to help improve clinical decisions and patient health. Todd’s 2009 article even mentions this point in a description of a “gold standard” for evidence-based education which frowns on use of advocacy studies due to their inherent biases. Yet, the school libraries Manifesto aims to rally support for this use. WebJunction’s new model of “library management competencies” also urges librarians to “use evidence-based management to demonstrate the value of the library” (Gutsche, B. ed., 2009. Competency Index for the Library Field, p. 2). And the title of an editorial in Evidence Based Library and Information Practice promotes this stance.

I am all for our profession conducting advocacy and action research, outcome studies, and return-on-investment and cost-benefit analyses. But I believe we ought to name these what they really are. This is a truth-in-advertising thing for me—I admit it. Gathering information or devising studies to confirm the benefits, value, or impacts of libraries is not evidence-based practice or management. I don’t think we’re helping things by mis-applying these new buzzwords. Their definitions tend to mutate, sometimes to the point that they are just plain wrong. And this fuzziness can lead libraries to believe they are using one tool when in fact they are using quite another.

Categories: Advocacy · Research

One Size Doesn’t Fit All

May 7, 2009 · Leave a Comment

A basic tenet of public librarianship is the idea that each library and its communities are unique.  While libraries share certain characteristics in common, their products, services, and operations are (in theory) highly customized to fit local conditions. I didn’t realize how strong a tenet this was until I heard this declaration at an Ohio Library Council conference:  “All library excellence is local.”  Wow, pretty unequivocal!  Granted, public libraries do acknowledge that they have certain things in common with other libraries, but it sure sounds like unique characteristics trump everything else.

This contrast between things standard and things tailored (or customized) turns out to be a theme central to evaluation research also.  The idea has been noted, for instance, by Mark Lipsey, co-author of the leading textbook on program evaluation:

“One of the difficulties in evaluating a specific program is that [there is] little basis for knowing which aspects of the program work in relatively predictable ways and which are very distinctive to that particular program situation. A given intervention…may be known to have positive effects when used with some client populations but [not for others].  Similarly, one variation of a service may be effective, but that may not be true of another variation, especially when applied in a different program situation.”      Lipsey, M.W. (2000). Meta-Analysis and the Learning Curve in Evaluation Practice, American Journal of Evaluation 21(2), p. 209.

In Lipsey’s quote just replace “relatively predictable” with “standard” and replace “distinctive” with “custom” or “tailored.”

Here’s the same idea from the Kellogg Foundation’s evaluation handbook:

“All too often, conventional approaches to evaluation focus on examining only the outcomes or the impact of a project without examining the environment in which it operates or the processes involved in the project’s development. Although we agree that assessing short- and long-term outcomes is important and necessary, such an exclusive focus on impacts leads us to overlook equally important aspects of evaluation–including more sophisticated understandings of how and why programs and services work, for whom they work, and in what circumstances.” W.K. Kellogg Foundation Evaluation Handbook, p. 20.

Suppose that our profession produces a rigorously conducted outcome evaluation of, say, summer reading programs and the study affirms the effectiveness of these programs.  Then, what claims can be made about library summer reading programs nationwide?  Can we boast that this effectiveness applies to any and every public library summer reading program and attendee group?  Experts from the field of program evaluation tell us otherwise.

Only to the extent that a library’s summer reading program matches the content and delivery approach of the programs in the outcome study, and the library’s clientele also matches those in the study–only to these extents can a public library point to the outcome study as evidence of its local program’s effectiveness.

Public libraries view their attunement to the nature and needs of unique communities as the foundation for their excellence and effectiveness. This puts the onus on libraries to demonstrate how well their custom practices work for their local clientele. Pretty tall order.

[Image: WPA Poster]

Categories: Library assessment · Research

Objects in Mirror Are Closer Than They Appear

May 1, 2009 · 3 Comments

In January my brother and I were laying laminate flooring in his house.  Each time we needed to trim a plank, we stood reverently by his table saw and incanted the familiar carpenter’s adage, “Measure twice, cut once. (Amen.)”  My brother said, “It’s the damnedest thing. You can repeat and repeat a measurement, and then find out it is still wrong.” As an electrical engineer (he’s working on the 3rd edition of his book on digital signal processing), his observation comes from dozens of real-life technical projects.

In the behavioral sciences as well as in program evaluation and performance assessment we attempt to measure fairly abstract things—like social class, anxiety, customer loyalty, community need, awareness of services, and so on. Measuring these is difficult. But even in the “hard” sciences measurement is a continuous challenge.

So, I want to write about what statisticians call measurement error. And I might as well start right off with a rather advanced idea:  Measurement is about reducing error. We try to be systematic in our measures to increase accuracy, and minimize error in the final measurements. The thing is, we are never 100% successful in this. And, truthfully, we hardly ever know how successful we have been. Our only hope is to keep refining our methods and measures to try to eliminate those sources of error we know about.

Given this, we in librarianship really need to discard the naive idea that we can obtain “hard facts and figures,” an idea bandied about most often in the field of business management.  I suggest that we not look to MBAs and business consultants for advice on this topic (and we should especially avoid accountants and efficiency experts). Instead, I believe we will find the measurement approaches used in physical, natural, behavioral and statistical sciences to be more fruitful.

Rather than looking at measurement as producing facts, it is perhaps better to say it produces impressions. And impressions will vary on dimensions like accuracy (precision), breadth (scope), and validity (relevance).

Let’s look at the last of these, validity: how faithfully a given measure or indicator reflects something we are interested in understanding. Say we want to determine how satisfied customers are. Let’s assume that satisfaction is both an attitude and a feeling customers have. But we cannot actually tap these things directly. We can only get hints (indications, we say) of these. Usually we do this by interviewing customers or having them complete questionnaires. Yet, there will always be a disconnect between how customers answer questions and their real, internal level of satisfaction with our products and services. This disconnect is a form of error.

Our instruments just are not refined enough to get at the real-life phenomenon we are interested in. So we get only a taste of one aspect of the phenomenon. For instance, the field of business uses reported customer intent to recommend products or services to friends as an indicator of satisfaction. (Actually, the field is more attuned to the evil twin satisfaction indicator known as “negative word-of-mouth behavior!”)

But, suppose that, miraculously, we do develop an advanced instrument that perfectly detects the entire range of customer attitudes and feelings that form “satisfaction.” Say we are able to create a mind probe! Even with this perfect instrument, other factors can make our measurements inaccurate. In other words, each time we measure something, even with the most proven measurement instruments, extraneous things interfere. A subject we are measuring may be distracted due to a high caffeine level in his blood. Or an electric voltage spike overnight may have thrown off the sensitivity of the probe. Silly examples, I know, but the point is we don’t know what this myriad of interfering factors might be.

[Image: Spock, captioned “Or mind meld?”]

Statisticians view any given measurement as being a sum of two numbers:  First, a  true, valid number (in units we understand) reflecting what we are interested in. Second, another number that reflects how weird circumstances and other factors have “spun” the final measure to make it slightly, moderately, or even grossly out of whack. This second number is “error.” You can see this idea illustrated here and described further here.
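
The two-number idea can be sketched in a few lines of code. This is only a toy simulation: the “true” satisfaction score, the size of the error, and the assumption that error is random with an average of zero are all mine, chosen for illustration.

```python
import random
import statistics

def measure(true_value, error_sd, rng):
    """One observed measurement = true value + random error."""
    error = rng.gauss(0, error_sd)  # assumption: error is random and averages zero
    return true_value + error

rng = random.Random(42)
true_satisfaction = 7.0  # hypothetical true score on a 1-10 scale

readings = [measure(true_satisfaction, 1.5, rng) for _ in range(1000)]
errors = [reading - true_satisfaction for reading in readings]

mean_reading = statistics.mean(readings)
largest_error = max(abs(e) for e in errors)

print(f"Mean of 1000 readings: {mean_reading:.2f}")
print(f"Largest single error:  {largest_error:.2f}")
```

Any single reading can be well off the mark, while the average of many readings lands near the true value. That comfort holds only for random error, though. If the error leans consistently in one direction, repeating the measurement will not fix it, which is my brother’s table saw problem exactly.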

Obviously, there is much more to this topic. But the idea is that we strive to produce accurate measurements so that our final numbers are mostly true. So repeat after me, “There are no such things as hard facts and figures. There are no such things as hard facts and figures. There are no such things…”

Categories: Measurement · Research

New (or Old?) Paradigm Spurs ‘Fundamental Shift’ in Library Advocacy

April 25, 2009 · Leave a Comment

“Everything Old is New Again” is the title of a 1999 article in American Libraries by Douglas Raber, author of the excellent and eye-opening book, Librarianship and Legitimacy: The Ideology of the Public Library Inquiry. The article suggests that the Inquiry, a comprehensive assessment of public librarianship initiated by ALA in the late 1940s, continues to be relevant to libraries today. While in library school I discovered Raber’s book in the stacks of Cleveland Public Library.  The book was so inspiring that I got ahold of 3 of the 7 volumes of the Inquiry (thank you, CPL!) and read them also.

Now the next piece in my story: My colleague Keith Curry Lance had recommended a podcast series to me. It is called “Longshots” and is broadcast by Sarah Long, Executive Director of the North Suburban Library System outside Chicago. I decided to take a listen and chose a December 2008 interview with Cathy de Rosa and Jenny Johnson, primary authors of the OCLC study, From Funding To Awareness: A Study of Library Support in America. A couple of months ago I studied the first half of this voluminous and highly graphicized report. In case I never got back to the second half, I thought I’d see how the audio book version went.

In the podcast Jenny Johnson reported that:

“We conducted several focus groups that gave us a sense of what the types of messages are that really are likely to move [potential library levy/referendum supporters]. Specifically transformation. I think we have a tendency in our space…to think in terms of focusing on information. And what we learned was it’s really the power through libraries to transform the people we serve and allow them to reach their potential that really resonates with potential funders.”

Ok. Now compare that to this quotation:

“THE LIBRARY FAITH. Throughout the years librarians have transformed their concept of function into a dynamic faith. This faith has sustained the men and women who have built and operated American public, as well as university and research, libraries and the men of wealth and political position who have provided for their financial and legal support. It consists of a belief in the virtue of the printed word, especially of the book, the reading of which is held to be good in itself or from its reading flows that which is good.”     Leigh, R.D. (1950). The Public Library in the United States: The General Report of the Public Library Inquiry. New York: Columbia University Press, p. 12.

“Good” that flows from reading the printed word. “Transformation” that comes from information. Kinda similar, wouldn’t you say? Consider Cathy de Rosa’s elaboration:

“[From our study] there were a couple really surprising things…Maybe not too surprising findings if you really think about them…One of the things we found in the study is that an appreciation of library [sic] really transcends use, if you will, that people see it as a U.S. right for citizens and they want to fund it even if they themselves don’t use it.”

Again, it isn’t just about information access for all. de Rosa reports that:

“…while [information access] is an important service, the funding reason–the reason behind why libraries offer information—is really the compelling reason why they’re willing to fund us. So, while we’ve pushed so hard the information, information, we really need to—as is often the case in marketing—talk about, and what does that mean for my community? And that’s really a fundamental shift in what we have been doing collectively as a community in advocacy. And it’s a critically important finding out of this study… It is very common in marketing that we talk about what we deliver—we have new books today or…new software, if you happen to be a software provider—instead of what value it brings to the community. ”

So, library marketing and branding devotees are beginning to see the same light that shined more than thirty-five years ago on one of the founders of library evaluation, Richard Orr. His milestone 1973 article “Measuring the Goodness of Library Service” describes evaluation in terms of quality (of collections, services, facilities, etc.) and value (benefits to the community). Within a few years, social program evaluation theorists Michael Scriven and Jane Roth had coined two fundamental terms for their profession, merit and worth, which mean the same as quality and value.

Definitely, there have been pioneers clearing these pathways for us. And now 21st century librarianship seems to be reaching a (gasp!) consensus. Merely reporting what information resources libraries make available and how often these are utilized is not enough. We need more convincing ways to portray libraries as valuable and viable public institutions. But, how exactly to do this….?

Categories: Advocacy · Research

Poor WebJunction Survey Design Makes Findings Pretty Much Useless

April 22, 2009 · 1 Comment

This week I noticed that WebJunction is conducting a survey entitled “Technology Competencies Evaluation.”  I think this must be a sequel to a survey I saw there last month about “management core competencies.”  While the surveys are probably marketing research for WebJunction’s e-learning product line, the researchers say they want to use the data to “establish a baseline for the library field.” Thus, they do profess an interest in identifying larger and, we might conclude, non-commercial trends within the library profession.

Whatever their intentions, the surveys won’t produce much reliable information due to poor designs. First, neither questionnaire actually assesses competencies, that is, knowledge or skill levels. Instead, they measure respondents’ opinions about their own knowledge and skills in a dozen or so training topics. So, any baselines WebJunction comes up with will be merely about current opinions which would later be compared to some subsequent set of opinions.

So, what will they learn?  At the most, they can determine whether library staff believe they are more (or less) knowledgeable over time. That type of information, while mildly interesting, seems beside the point. Wouldn’t it make more sense to measure competencies as compared to some minimum acceptable levels, the way that IT certification or professional licensure exams do?  Later, perhaps, it might be useful to compare these over time, but that would not be as significant as a comparison of skills and knowledge to well-thought-out minimum standards.

Second, information from these surveys is compromised by the sampling method that WebJunction researchers have chosen.  They use what is called convenience sampling. Rather than using some more systematic method (e.g., random sampling), they get respondents where and when it is convenient. This method severely limits the usefulness of study results. Because the respondents are self-selected, their responses will, in all likelihood, differ from those of the larger population of library staff the researchers are interested in. That is, the findings will be biased.

[Image: Soil sampling on Mars. (NASA/JPL/Univ. Ariz.)]

Suppose that mostly tech-savvy librarians tend to take the surveys. Then, levels of self-reported competency will be artificially higher than those of the larger population of librarians and library staff overall. Or perhaps the opposite is true, that respondents tend to be mostly tech un-savvy non-librarian staff. Either way, allowing respondents to self-select introduces a troublesome and typically unknown slant, making results biased and misleading. Statisticians describe this situation by saying that “results cannot be generalized to the larger population of interest.”  This research validity issue–called external validity–is a central concern in behavioral and marketing research methods.
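
A quick simulation illustrates the slant. Everything here is hypothetical: the population of self-ratings is invented, and so is the assumption that the chance of volunteering rises with the respondent’s self-rated score.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: self-rated tech competency on a 1-5 scale.
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

def responds(score, rng):
    """Assumption: the more tech-savvy the person, the likelier they take the survey."""
    return rng.random() < 0.1 * score

# Convenience sample: whoever chooses to respond.
volunteers = [score for score in population if responds(score, rng)]

# Random sample of the same size, for comparison.
random_sample = rng.sample(population, len(volunteers))

print(f"Population mean:    {statistics.mean(population):.2f}")
print(f"Self-selected mean: {statistics.mean(volunteers):.2f}")
print(f"Random sample mean: {statistics.mean(random_sample):.2f}")
```

Under these assumptions the self-selected average runs well above the population average, while the random sample tracks it closely. In a real survey, of course, we would not know the direction or size of the slant.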

Convenience sampling also hampers the baseline comparisons WebJunction talks about. Without making sure they have representative samples of the larger population of interest, there is no way to know whether differences between baseline measures from this month’s survey and later surveys are bogus.  Perhaps the original (baseline) respondents were very tech-savvy, and the future (comparison) respondents are not at all. In this case the researchers will be comparing two non-equivalent groups’ opinions. This will lead to incorrect conclusions about apparent changes in opinions of the larger population of library staff over time. It may be that overall library staff opinions have remained unchanged even though the two samples—baseline and later comparison—differ quite a bit.
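
The same kind of toy simulation shows how a bogus “change” can appear. Again, the population and the two response patterns are invented; the key assumption is that the population’s opinions do not change at all between the two surveys.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population of self-ratings (1-5), identical at both survey dates.
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

def convenience_sample(population, weight, rng):
    """Keep each person with a probability given by weight(score)."""
    return [score for score in population if rng.random() < weight(score)]

# Assumption: the baseline survey happens to attract tech-savvy respondents,
# and the later survey happens to attract the opposite crowd.
baseline = convenience_sample(population, lambda s: 0.1 * s, rng)
follow_up = convenience_sample(population, lambda s: 0.1 * (6 - s), rng)

apparent_change = statistics.mean(follow_up) - statistics.mean(baseline)

print("True change in the population: 0.00")
print(f"Apparent change between surveys: {apparent_change:.2f}")
```

The samples suggest that staff opinions dropped by more than a full point on a five-point scale, when in fact nothing changed except who showed up to answer.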

Producing assessment data represents a big investment of time, effort, and expense. Data collection methods need to be designed to produce maximally reliable and valid information in order to justify these costs. Spending researcher and respondent time on surveys that can only produce questionable results is a poor use of library resources. Also, researchers should never portray findings from studies that use poor designs as if they were fair and balanced depictions of the subjects being studied. That would be mis-information, indeed!

Categories: Measurement · Research