Lib(rary) Performance Blog

Entries categorized as ‘Measurement’

Navigating with Fragmentary Information

February 16, 2010

I have implied this in other entries in this blog, but I might as well say it outright: The library and information science profession needs to come to terms with the issue of standards for (i.e., rules of) evidence for performance, statistical, and advocacy research data. There, now I’ve said it.

I recently read the short and enjoyable book Graphic Discovery: A Trout in the Milk and Other Visual Adventures by statistician Howard Wainer (Princeton, NJ: Princeton University Press, 2005). The subtitle of the book comes from something Henry David Thoreau wrote. During a dairy strike in 1850 in New England people began to suspect that dairy owners were watering down the milk supply. This led Thoreau to write in his journal, “Sometimes circumstantial evidence can be quite convincing; like when you find a trout in the milk” (quoted in Wainer, p. 81).

Wainer’s main point, one certainly made also by others like John Tukey and Edward Tufte, is that well designed graphical representations are invaluable for exploring and understanding data. Graphical data presentation can lead to revelations about data, and the underlying phenomena they describe, that would otherwise be missed.

But, alas, Wainer and the others warn that the design of graphs can serve to mislead readers. (Statistics can lie. So, you have to figure that statistical graphs might fail a few polygraph tests, too.) Now re-sensitized to this possibility, I am (at least in the short term) looking more closely at graphs I encounter. A graph appearing in an article in the Nov. 9, 2009 issue of Business Week was easy prey for my renewed vigilance. Unfortunately, the electronic versions of this article available from EBSCO, LexisNexis, and other databases omit graphics altogether, an aggravating defect of digitization indeed.

[Image: the Nov. 9, 2009 Business Week graph]

In the Business Week article, “The GDP Mirage,” author Michael Mandel argues that the economic index, Gross Domestic Product (GDP), is incomplete because it does not measure “intangible investments” corporations make. By overlooking these investments, Mandel claims, the U.S. is “navigating…with fragmentary information” (p. 36). The reader can get the gist of Mandel’s ideas from the article itself. For now, I just want to point out that the aim of the graphic is to illustrate the author’s argument.

Notice that the Business Week graphic consists of three charts. Rather than having an individual title for each chart, a caption at the top forms three surrogate titles: “Reported GDP jumps ahead of jobs [left graph]…but the GDP stats don’t count R&D cuts [center graph]…or lost jobs for knowledge workers [right graph].”  The implication is that if the GDP were to include statistics reflecting cuts to research and development and lost jobs, it would be a more valid measure of economic output. (The article doesn’t actually recommend that job loss statistics be included in revised GDP calculations, but we can ignore this inconsistency for our present purposes.)

Its own title notwithstanding, this graphic has a “numbers problem” quite distinct from the GDP measurement challenge that concerns Mandel. The problem with the graphic is this: Two of the three charts report (let’s call these) “actual data” while the third does not. The left and right charts present data obtained from the U.S. Bureau of Labor Statistics, which we can presume were collected using accepted sampling methods. However, the center chart is—depending on how you look at it—either a convenience sample or merely a collection of anecdotes.

The center chart’s heading, “Selected Companies that Have Cut R&D Spending Over the Last Year,” suggests that the selection is some type of non-probability sample. As seen in the chart, cuts for these companies range from roughly 12% to 36%. Nowhere, though, does the chart or the article tell us how the companies were selected or to what extent the percentages pertain to the larger set of U.S. corporations of interest.

What we have is anecdotal information masquerading as data! Even though the chart title is clear,* placing that chart in the middle of two other charts that contain actual data is deceptive. Due to its location and bar-graph style, this chart appears to be on a par with the other charts when it really is not. The center chart is mostly conjecture; the other two have firmer grounding.

Since the units of measure in that chart are percentages, the population parameters (in this case, percentages of decrease in R&D spending among all U.S. corporations of interest) are likely to be within some reasonable range, probably not ridiculously far from the range seen here.

But this is not the point. The author does not have any conclusive evidence about what this range actually is and he, or the creator of the charts, ought to say so. This is a case of pretending to have data that you don’t, in fact, have. Or, in Mandel’s words, navigating with fragmentary information. Wainer would not be so forgiving; he would call the center chart “nondata” since that is what it is (p. 57). On that same page Wainer also makes this wonderfully apropos pronouncement:

“The plural of anecdote is not data.”

Sure, for particular purposes, quick-and-dirty selections and pseudo-samples can be justified. But, they do not deserve to be graphed. So, if you will permit me, I want to experiment with a possible contribution to the set of standards for evaluating evidence that the library and information science profession might someday establish:

Standard XV.1.c.    Since anecdotal information represents only itself, it shall not be portrayed, nor presented graphically, in a way that implies that it describes any phenomena in the aggregate.

Okay, so I can’t think of very good wording. Thankfully, there’s plenty of time for re-working that sentence…

—————————————————
* I don’t mean to say that the chart is clearly titled, but that, once you are able to find it, the title (or is it a subtitle?) has an unambiguous meaning.

Compared to standards of good graphing practice that Howard Wainer, William Cleveland (The Elements of Graphing Data, Murray Hill, NJ: AT&T Laboratories, 1994) and others promote, the Business Week graphs are pretty damned bad! The axis labels are too difficult to find, first, because the charts are overpowered by thick, all-black bars and bold-fonted category labels (company names and occupation categories), and second, because of small fonts, crowding, and misplacement.

In the left chart, the label “Percent” has the wrong orientation since it applies to the vertical axis. The chart’s horizontal axis has no label. Thanks to the chart designer’s use of Roman numerals we can guess that the units must be quarters on an annual economic calendar. Squeezing the legend into the data portion of this chart violates a cardinal graphing design principle: Don’t let clutter make the data more difficult to see. Though less important, the word “Forecast,” a note for the single GDP data point at quarter III of 2009, appears in a larger font than both axis labels and tickmark values. Not good.

In the center and right charts, the labels, “Percent Change in R&D Spending” and “Percent Change in Employment,” are misplaced. Both should appear on the lower horizontal axes near the appropriate grid marks. Both labels include asterisked notes that imply the labels are meant to serve a dual purpose as titles (or subtitles). This confusion could be alleviated by creating descriptive chart titles that include the information from the notes (no need to separate it), and then inserting fully descriptive labels adjacent to the axes.

Grade this graphic earns:  D-

Categories: Advocacy · Library assessment · Measurement · Research

Sawing with a Dull Saw

January 25, 2010

In spite of their evolution over the last few decades, accelerated most recently due to the Googlization of information, public libraries have been amazingly impervious to change in the arena of performance measurement. I found the following observations about  library measures in the early history of American libraries:

There is no branch of library economy more important, or so little understood by a librarian as helps to himself, as the daily statistics which he can preserve of the growth, loss, and use (both in extent and character) of the collection under his care. The librarian who watches these things closely, and records them, always understands what he is about, and what he accomplishes or fails to accomplish. The patrons to whom he presents these statistics will comprehend better the machinery of the library, and be more indulgent toward its defects.
(Public Libraries in the United States of America, Warren, S.R. and Clark, S.N., Eds., Washington, DC: U.S. Bureau of Education, 1876, p. 714.)

Interesting that use of library statistics for advocacy purposes was recognized in 1876.

Early in the twentieth century our current ideas about performance measurement were already well understood, long before the practices of program evaluation, evidence-based management, and performance scorecarding were formalized. Arthur Bostwick, late director of St. Louis Public Library and Librarian of Brooklyn Public Library, wrote this in his book published in 1917:

No business can be properly carried on without a system of accounts. These may involve only money received and expended, but they may and should extend much further. The collection and tabulation of such [financial and performance] data have come to be regarded as indispensable by shrewd businessmen; and large corporations do not hesitate to spend considerable sums in employing a force of experts and clerks especially to gather data of this kind and to tell what they mean…
(The American Public Library, Bostwick, A.E., New York: D. Appleton & Co., 1917, p. 253.)

Bostwick also dealt with the ideas of accountability, continuous improvement, careful analysis of library statistics, and a tiered approach to evaluation data-collection:

Information of this [financial and performance] kind is gathered with either or both of two different purposes in view—to satisfy the legitimate curiosity of the person managing the business, or of some one who has a right to know how it is going on, whether it is succeeding or failing and just what it is accomplishing; and, secondly, to furnish a basis for improvements or changes, to indicate weak points and points of strength, so that the business may be reënforced along the former and extended along the latter.

…If the latter [purpose is intended], a more detailed and analytical study is made of the data, which are compared and tested in all possible ways to reveal unsuspected facts. When something is thus brought to light that seems to call for further investigation, additional data are collected. (p. 254)

He also preached about the imprecision of library statistics, a topic conveniently overlooked in our profession nowadays (as is the importance of assuring the validity of data presented in library advocacy reports):

It should not be forgotten, either by those who collect and report these statistics, or by those who read them or use them, that they are of various degrees of exactness…In any kind of scientific measurement the limits of probable error are always mentioned to give an idea of the degree of accuracy. The less the probable error, the greater the accuracy. It is never stated that there can be no error and that the accuracy is exact; this would be simply ridiculous. The same holds good in library statistics. In the average report nothing at all is said of accuracy; the reader is left to conclude that all the data are exact, or at least that there is no difference in their degree of exactness. (p. 262)

Finally, the level of interest in this topic among Bostwick’s 20th century peers strikes a familiar chord:

But how much intelligent study of library statistics goes on in librarians’ offices, and how much modification or improvement in library methods and material results from such study, is something that we shall never know. It appears to be certain, however, that large numbers of librarians…look upon their statistics in the light of a necessary evil. They must be collected, because something of the kind is expected in the annual report, but they should be minimized, and, once in print, they should be dismissed from the mind. This attitude reminds one of the rural workman who used a dull saw because the amount of work before him gave him no time to stop and sharpen it… (p. 255)

Categories: Library assessment · Measurement

The Telephone Game

January 5, 2010

Readers of the, say, older persuasion may recall a time when children actually enjoyed games that required no peripheral devices, infrared sensors, or satellite tracking. There was one party game, simply called (I think) “Telephone,” where one player whispered a message to the next, and that player to the next, until the message was passed all the way around the circle of players. The fun came when everyone heard the amusing distortions that ended up in the final message.

In library advocacy research, though, message distortion is not amusing. I noticed a serious instance of this in a recent IMLS Research Brief which cites an American Library Association (ALA) report finding that patron use of library computers for job-seeking purposes has “greatly increased.” The ALA report is Job-Seeking in U.S. Public Libraries and the statement the IMLS brief cites is this:

“As part of site visits to public libraries in nine states [conducted in three annual studies], the study research team has found greatly increased use of library technology for job-seeking and e-government.” Job-Seeking in U.S. Public Libraries, American Library Association, Oct. 2009, p. 2.
(The emphasis in red is mine).

The ALA report is one of a series of “issue briefs” published by the Office for Research and Statistics that summarize and supplement key results from the multi-year Public Library Funding and Technology Access study. To date this project has issued three annual reports beginning with the 2006/2007 edition. (The project is a collaborative effort connected with the Public Libraries and the Internet longitudinal studies which began in 1994.)

Anyway, I wondered how big this increase actually was and what the level of job-seeking computer use had been before the big increase happened. So I went searching for the numbers in the ALA Public Library Funding annual studies. Turns out none of the studies measured frequency of job-seeking or e-government computer use by patrons. Nor did the studies compare frequencies of any reported computer uses from year to year. The 2006/2007 and 2007/2008 editions merely state that job-seeking and e-government were common uses reported by some patrons, without mentioning increases of any sort. The 2008/2009 edition reports increased patron job-seeking computer use, but does not describe this increase as substantial or “great.”

The quotation from the issue brief (above) says that the researchers detected this increase by means of interviews with staff and patrons during library site visits. These interviews, conducted in a few selected U.S. states each year, included a simple open-ended question to patrons: “What do you use [the library's computers] for?” Job-seeking and e-government made the lists of most frequent responses (and each state’s list apparently differed from the others). But, no frequency counts for these uses show up in the three studies, perhaps because the counts weren’t collected. (Site visits, as well as focus groups which the ALA studies held, belong to the category of qualitative research methods. As the ALA project illustrates, collecting essentially quantitative information using qualitative methods can lead to problems.)

Even if the researchers did tally these uses during the interviews, neither the interviewees nor the states where interviews were conducted were randomly selected. So, we couldn’t say the tallies represent the larger patron population nationally. The 2007/2008 study reports a convenience sample of about 200 patrons who were using library computers at the time (Libraries Connect Communities, American Library Association, 2008, p. 128). (See the NOTE below.) The same study then reports that “Interviews with users confirmed staff observations that much computer use in libraries is job-related…” (p. 131; I added the emphasis). Exactly how much they don’t say. And how often unrepresented patrons—say teenagers who had yet to show up after school—might use computers for job-seeking we can’t tell. (The 2006/2007 ALA study cites a 2006 Baltimore, MD study where library computer use was found to depend on age group. See footnote on p. 169 of the ALA 2006/2007 study.)

Selection bias of a different sort confounds year-to-year comparisons from the ALA studies. Because the annual interviews were conducted in different states where the job markets and online government services could differ significantly, there are no reliable baselines for comparing job-seeking and e-government computer usage between years. For instance, the 2006/2007 study included a site visit to a library in Nevada where staff reported long lines of patrons using library computers to apply for jobs at a newly opened gambling casino. Relying on such atypically high usage as a baseline for job-seeking computer use could mask actual increases in later study years in different states.

The question remains, what data are these “great increases” based on? None that I can find in the ALA studies. The issue brief does cite other figures that did increase over time, but these figures don’t describe patterns of computer use. The figures are from the survey portion of ALA’s studies, from questionnaire items that elicit library staff opinions. Staff were asked to identify the top five public Internet services that they believed to be “the most critical to the role of the library branch in its local community.” In the 2006/2007 study 44% of responding staff chose provision of job-seeking services as one of their top five priorities. In the 2007/2008 study this proportion was 62.2%, and in 2008/2009 it was 65.9%.

Basically, the share of staff respondents who gave one of their votes (each respondent got up to 5) to patron job-seeking services was about 22 percentage points higher in the 2008/2009 survey than in the 2006/2007 one. While these vote tallies may reflect changes in staff perceptions of computer use during this period, the votes are only opinions, and don’t indicate how patrons actually used computers in libraries.
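For readers who want to check the arithmetic, here is a minimal sketch using only the 2006/2007 and 2008/2009 shares reported above. It shows that the gap can be stated either as a difference in percentage points or as a relative change, and the two numbers are quite different. The variable names are mine, for illustration only.

```python
# Shares of responding staff who ranked job-seeking services in their top five,
# as reported in the ALA studies cited above.
share_2006_07 = 44.0   # percent
share_2008_09 = 65.9   # percent

# Absolute difference, in percentage points.
point_change = share_2008_09 - share_2006_07

# Relative change, as a percent of the 2006/2007 share.
relative_change = (share_2008_09 - share_2006_07) / share_2006_07 * 100

print(f"Change in percentage points: {point_change:.1f}")
print(f"Relative change: {relative_change:.1f}%")
```

Either way, these are changes in staff opinion, not in patron computer use.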

The truth is that we don’t have the usage data needed to support the assertion made in the ALA issue brief. Without valid baseline data, we can’t measure increases in patron job-seeking or e-government computer use at all, and we certainly can’t tell whether or not any increases have been great.

———————————-

NOTE:    Convenience sampling is a type of ‘nonprobability sampling.’ With nonprobability sampling, we have no statistical basis for claiming that our study findings describe the larger population that we had hoped our research would apply to. Using nonprobability sampling invites biased information into study results.

Categories: Advocacy · Measurement · Research

Thoroughly Modern Museums and Libraries

August 31, 2009

I think I get it now.  I had thought the term assessment meant a systematic and appropriately rigorous measurement of a construct or phenomenon of interest, like program outcomes, community needs, service quality, and so on.  Only now have I come to understand that a self-assessment is a different animal altogether. Who would have thought that the purpose of a self-assessment is not really to assess anything?  The purpose, I now realize, is to inform and educate. All this time I have been applying research methodology standards to tools that are intended to advocate and indoctrinate. No wonder my observations have been so off-base!

When I disapproved of WebJunction’s online competencies assessment questionnaire (see my April 22, 2009 entry), the WebJunction staff explained to me that the true objective for their surveys was to increase awareness of these competencies. I immediately wondered, “Well, how then will WebJunction measure awareness?”  But that is quite an irrelevant question when these questionnaires are actually teaching tools, not measurement instruments. Since the instruments don’t really have to measure anything, we don’t have to obsess about how reliable or valid they are. They can be evaluated (I guess) according to how well they apply proven methods for facilitating adult learning.

The irony of using a research instrument like a survey questionnaire this way will probably escape the majority of librarians (i.e., those who disliked library school research methods class). But here’s the story: One of the giant problems in designing behavioral science measures is making sure the measures don’t alter the thing you’re trying to measure. Measures are supposed to be unobtrusive. You would never trust a thermometer if you found that, while measuring the temperature of water, the thermometer also happened to heat the water! The same goes for questionnaires and tests in behavioral science and education.

Worries like this are old hat nowadays. Forget the antiseptic, hands-off approach. So easy and cheap to post online, the new questionnaires are designed to induce change by informing, educating, and motivating respondents. I ran across another one of these in connection with a new initiative on “21st century skills” launched last week by the Institute of Museum and Library Services (IMLS). This campaign presents a thoroughly modern take on the mission of libraries and museums. You can read the details and access the “self-assessment tool” here.

Still stuck in my 20th century research methodology paradigm, I found the IMLS questionnaire technically interesting. It is what I call a “Goldilocks instrument” since it uses a 3-point ordinal scale that amounts to a little, a medium amount, and a lot. The response options are something like this:


  1. The institution rarely practices such-and-such 21st century skills enhancement task or technique
  2. The institution practices the task or technique fairly often, or
  3. The institution almost always practices the task or technique.

In several questions in the survey, this tripartite scale appears as less than 25% of the time, 25% to 75% of the time, and over 75% of the time. But you get the idea—small, medium, large.

Specific questionnaire items address a series of general institutional dimensions like accountability, leadership, partnerships, and so on.  (See the self-assessment tool matrix.)  Then, in each area, the institution is rated as being in one of three developmental stages:  Early, Transitional, or 21st Century. An institution’s Goldilocks responses fall conveniently into these stages (surprise!!).  If you perform a 21st century skill enhancement task less than 25% of the time, you are in the Early (Neolithic?) stage on that one.  If you perform it more than 75% of the time, you are thoroughly modern!
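To make the mapping concrete, here is a small sketch of how a percent-of-the-time response lands in a stage, assuming the cut-offs quoted above (under 25%, 25% to 75%, over 75%). The function name goldilocks_stage is hypothetical; this is my reading of the scale, not the IMLS tool's actual code.

```python
def goldilocks_stage(percent_of_time: float) -> str:
    """Map how often a practice is performed to a developmental stage.

    Cut-offs follow the scale described in the text; the function itself
    is a hypothetical illustration, not the IMLS implementation.
    """
    if not 0 <= percent_of_time <= 100:
        raise ValueError("percent_of_time must be between 0 and 100")
    if percent_of_time < 25:
        return "Early"
    if percent_of_time <= 75:
        return "Transitional"
    return "21st Century"


for pct in (10, 50, 90):
    print(pct, goldilocks_stage(pct))
```

Note that nothing in the mapping refers to an outside standard; the response simply is the rating.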

At the completion of the questionnaire, the self-assessment tool simply parrots back an institution’s responses in graphical form. There are “Recommendations” buttons users can click on, but the advice offered is pretty much the same, regardless of an institution’s rating: Use the results “to initiate a dialogue with your institution’s leaders, board, colleagues, and other stakeholders” so you can improve your rating. In Goldilocks measurement terms, having the most 21st century skills possible is always “just right!”

Obviously, the survey is a teaching tool, not an assessment. That’s why there is no need for the instrument to gauge how libraries and museums compare to any independently derived standards, like some “minimum recommended daily allowance” of a particular 21st century practice. This makes things much simpler for IMLS because the idea of library or museum standards, itself, is notoriously tricky. Several of the approaches endorsed in their model don’t apply to many institutions. (How can a small rural library or a historic police museum be collaborating with community partners on its new educational programs “over 75% of the time”?)

Fortunately, these types of measurement issues are immaterial.  Remember, this is not assessment.  It is education and proselytizing.  In fact, the IMLS self-assessment tool demonstrates one 21st century skill enhancement technique first-hand. As described in the project report, the tool is clearly interactive audience involvement! Rather than posting the questionnaire merely to measure something, IMLS is modeling the behavior they are seeking from museums and libraries.  I think it’s called “showing by doing.”

Categories: Library assessment · Measurement · Research

Cha-Ching!

August 14, 2009

I noticed that yet another library value calculator has appeared on the scene. This one is offered by the National Network of Libraries of Medicine (NNLM) with the very best of intentions, I am sure. But, let me say that I am convinced that these calculators are a bad idea. Their underlying assumptions are weak and their designs are not well thought out. Eventually, library funders and stakeholders are going to realize that the calculations are superficial and…well…sloppy.

For one thing, sound cost-benefit analysis requires an examination of the full extent of relevant costs and benefits of a given project, program, or service. These quick-and-easy library calculators, however, use average retail prices as proxies for benefits. This oversimplification ignores important sources of library value like contributions to student and life-long learning, scientific and academic research, and public discourse, as well as roles libraries play in imparting cultural and humanitarian values and traditions, promoting literary appreciation and aesthetic values, facilitating community cohesion, and so forth.

But say that, for practical purposes, we accept the idea that value-boils-down-to-price as reasonable. Even so, the retail pricing approach these calculators use has definite problems. The calculators view retail prices as estimates of costs that patrons would incur if the library’s items and services were, hypothetically, unavailable to the community or institution. The library comes up with a retail price for each type of material and service it offers, and then these prices are translated directly into the value patrons receive from utilizing these materials or services.

In many cases, however, the alternative to obtaining an item or service from the library is not an outright purchase at retail prices. A student might purchase a textbook for $125 and then later re-sell it on Amazon.com for $50. Or perhaps she buys the item at a used price or borrows it from a friend for free. Clearly, a variety of alternative patron scenarios are possible, meaning that there is a range of alternative costs (approximate values) associated with each item or service use. The average of these ranges will typically be less than an item’s retail price. Besides, an item borrowed from a library does not include the breadth of rights and conveniences that item ownership does. So, it is a stretch to say that a patron always enjoys the same benefit from a borrowed item as from a purchased one.

Other problems with the calculators make their output suspect. For example, each time a patron renews an item or re-uses it in-house or online, the item’s retail price gets credited—again—to the library’s value totals. (Cha-ching!) On the other hand, when our Amazon.com shopper purchases a book at $75, that book’s value does not increase to $150, then to $225 and beyond each time the owner opens the book, or with each 3-week library loan period that passes.
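The renewal problem is easy to see in a few lines. This is a minimal sketch of the counting rule as I have described it, where every checkout and every renewal is credited at full retail price. The function name and the loan counts are hypothetical; a particular calculator may tally uses differently.

```python
def calculator_value(retail_price: float, checkouts: int, renewals: int) -> float:
    """Value credited under the assumed rule: each checkout and each renewal
    counts as if the patron had bought the item again at retail."""
    return retail_price * (checkouts + renewals)


book_price = 75.0

# One checkout followed by two renewals of the same book.
credited = calculator_value(book_price, checkouts=1, renewals=2)

# What a shopper actually pays to own the book, however often it is opened.
paid_by_owner = book_price

print(f"Credited to the library: ${credited:.2f}")
print(f"Paid by a purchaser:     ${paid_by_owner:.2f}")
```

Three loan periods, three "purchases," and the same single book.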

Because the calculators tally only certain types of transactions, they end up painting a rather rosy picture of library performance. Consider the case of a patron who needs an item or service that is (really) not available from the library, and whose information need ultimately goes unmet. Consider also the case of a service delivered that fails to meet a patron’s need, such as an unproductive reference consultation. The first case won’t be tallied at all by these calculators, and the second case will be tallied but will be significantly over-valued. (It will be considered a complete success.) Yet, the actual value of both of these patron transactions is negative and should be entered into these calculators this way. Unfortunately, the calculators’ designs do not accommodate this.

Given these problems and oversights, it is fairly obvious that these calculators produce exaggerated estimates of the benefits which libraries provide. Perhaps this exaggeration is only moderate or perhaps it is substantial—we cannot really tell for sure.

The calculators also underestimate the cost side of the equations, causing their benefit/cost ratios to be even more over-stated. They ignore several key costs incurred in delivering library materials and services, including expenses for information technology, equipment, building maintenance, utilities, and administrative overhead. These calculators also disregard the incidental costs that patrons may bear, like travel and parking costs, time lost due to item unavailability or poor service, usability difficulties encountered, and so on. In fact, NNLM’s calculator errs in the opposite direction: Assuming that libraries are always convenient, the calculator builds a patron time-savings factor into its formula. (I suppose you could enter in negative numbers to register patron lost time and inconvenience.)

When the calculators do recognize costs, they end up settling for data that are the grossest of estimates. For instance, users can enter estimated percent of total library staff time spent supporting access to materials or services. Creators of the calculators seem unaware that accurate benefit/cost ratios require meticulous collection of operational data, not just convenient guess-timates.

You will be hard-pressed to learn about these shortcomings from the materials that accompany library value calculators. Mostly, libraries receive general guidelines for entering data and encouragement to use the calculators without reservation. The library just keys in its data and—voilà!—receives an exact return-on-investment percentage or benefit/cost ratio right on the spot! Given the casual assumptions the calculations entail and the inexactness of the library’s input data, you’d think the final answer would at least include some type of margin-of-error disclaimer. Maybe something like this:

Your library’s benefit-to-cost ratio = $8.20 per $1.00 cost*

*    Based on our calculations, we are 95% confident that your library’s benefit/cost ratio is between $4.50 and $12.50 (per $1.00 cost). If your data are especially inaccurate, this range will be larger. Note that our single $8.20 estimate may be high due to assumptions our model uses.
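How might a calculator come up with a range like that? One rough approach is to treat each input as a range rather than a single number and simulate many possible ratios. The sketch below does this with made-up input ranges (annual uses, dollar value per use, total cost); every figure and function name in it is an assumption for illustration. The result is a simulation interval that depends entirely on those assumed ranges, not a formal statistical confidence interval.

```python
import random


def simulate_ratio(rng: random.Random) -> float:
    """Draw one benefit/cost ratio from assumed (hypothetical) input ranges."""
    uses = rng.uniform(90_000, 110_000)           # annual item uses; count is uncertain
    value_per_use = rng.uniform(5.0, 17.0)        # dollars; assumed range below retail price
    total_cost = rng.uniform(900_000, 1_300_000)  # dollars; depends on which overhead is counted
    return uses * value_per_use / total_cost


def ratio_interval(n_draws: int = 10_000, seed: int = 2009):
    """Return roughly the 2.5th percentile, median, and 97.5th percentile
    of the simulated ratios."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    ratios = sorted(simulate_ratio(rng) for _ in range(n_draws))
    low = ratios[round(0.025 * n_draws)]
    median = ratios[n_draws // 2]
    high = ratios[round(0.975 * n_draws) - 1]
    return low, median, high


low, median, high = ratio_interval()
print(f"Benefit/cost ratio: about ${median:.2f} per $1.00 "
      f"(range ${low:.2f} to ${high:.2f})")
```

The wider the assumed input ranges, the wider the reported range, which is exactly the information a single to-the-penny figure hides.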

Needless to say, this kind of small print doesn’t appear in the instructions that come with library value calculators. As they are, the calculators generate figures that are precise to the penny, with no other explanations to speak of. Libraries confidently report the figures to stakeholders as accurate, authoritative, and nearly approaching Scientific Truth. Of course, the figures are nothing of the sort.

Clones of these library calculators have sprouted up on dozens of library websites, where patrons are invited to enter their custom data to receive their own monthly “value of library services.” Costs are typically not mentioned, so that final value calculations are simple multiplications of counts times arbitrary and often fanciful retail price estimates. Of course, the nifty and optimistic totals will delight library patrons. The totals might even please the population of nonusers who are happy to subsidize library use by others as an overall benefit to the community or institution.

On a few public library websites the calculations are made even more tantalizing by informing patrons about their “individual return-on-investment”—how much value they gain for every tax dollar they contribute. (Don’t you just love democracy!) Unfortunately, this approach casts the wrong light on the public value of libraries. First, the figures are further exaggerations because they use per capita revenue data. Not every public library user pays taxes, a fact that makes the individually quoted return rates artificially high. (Instead of library tax revenue per capita, the calculations should use tax revenue per tax-paying household.)

Second, these seemingly benign “value” calculations actually hide information. The websites fail to provide overall return-on-investment rates for all tax-paying households or for all tuition-paying students. As I have already alluded to, an individual patron’s rate of return is being subsidized by nonusers of library services. For every patron elated with his own personally-calculated rate, there will be several households or students whose return rates are less than $0, meaning they lose money on their library tax or tuition “investments.” (This mix of return rates also applies to using the vanilla versions of the calculators that don’t bother to factor costs in.) Omitting this larger picture from these presentations is slanted and misleading, something that libraries should not be involved in.
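The arithmetic behind both objections can be sketched in a few lines of Python. Every number below (tax revenue, population, household count, and the yearly value each household receives) is a made-up assumption for illustration, and "net return" is taken here to mean dollar value of services used minus library tax paid.

```python
def net_return(value_received, tax_paid):
    """Dollar value of services used minus library tax paid."""
    return value_received - tax_paid


# Hypothetical community figures.
library_tax_revenue = 2_000_000
population = 50_000
taxpaying_households = 20_000

tax_per_capita = library_tax_revenue / population
tax_per_household = library_tax_revenue / taxpaying_households

# A heavy user who receives $400 of services a year looks far better off
# when the cost side is the per capita figure.
heavy_user_value = 400.0
ratio_per_capita = heavy_user_value / tax_per_capita
ratio_per_household = heavy_user_value / tax_per_household

# Yearly value received by five hypothetical tax-paying households,
# two of which never use the library.
household_values = [400.0, 150.0, 60.0, 0.0, 0.0]
net_returns = [net_return(v, tax_per_household) for v in household_values]
households_losing_money = sum(1 for r in net_returns if r < 0)

print(ratio_per_capita, ratio_per_household)
print(net_returns, households_losing_money)
```

With these assumed inputs the same patron's quoted return drops sharply once the per-household tax is used, and most of the five households come out behind.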

From a public or institutional value perspective, these Library 2.0-inspired patron calculators completely sidestep the rightful purpose of library evaluation. This purpose is to assess the extent to which the library provides value to the institution or community as a whole, not how each individual fares. This assessment must also confirm that products and services are equitably distributed, that is, equally available and accessible to all who wish or need to use them (see Creating Public Value by Mark H. Moore).

In actuality, economic valuation is not so simple as it appears. It involves complicated (and frustrating) concepts like exchange value, use value, contingent value, and others. Even business corporations have misgivings about standard return-on-investment analysis because of how difficult it is to obtain reliable data to input into the formulas.

If we want to use purely monetary estimates of the value of our services, we need more rigorous methods than these makeshift library calculators. This exact advice was offered to us a couple of years ago by Donald Elliott and Glen and Leslie Holt in their book Measuring Your Library’s Value. Their work provides important guidance that we should be heeding. For example, benefit/cost valuations are unique to the communities and institutions they come from. The figures are really not comparable across communities or for different libraries. This is something that most of us would not have thought about. The central message from their book, though, should already be obvious to us: We can’t just make these benefit/cost numbers up, the way these calculators do. There have to be sound theoretical and empirical bases for our findings.

Sure, quick-and-dirty estimations might be helpful in certain situations, as long as they are recognized for what they are. But the numbers gushing from these library calculators are nonsensical and disingenuous, in many cases. The whole idea has become an impediment to the real work of assessing library value. When the batteries in these little pocket library calculators wear out, I recommend that we just not replace them.

Categories: Library assessment · Measurement · Research

Shorter

July 20, 2009 · Leave a Comment

You may not want to spend time reading this blog post.  It’s rather long and drawn out and is likely to be dull.  And it gets kind of complicated. Besides, the graphics are sparse and uninteresting. Plus there’s no video.

Instead, you might appreciate some other informational experience better, one that happens also to be thoroughly cool and engaging. Like Facebook walls or those omnitemporal slice-of-life Twitter tweets.

This post definitely is not slice-of-life. Hardly. It is conceptual, meaning that it is mostly tedious and definitely time-consuming.  It entails plodding through the text to see if any of the ideas make any sense. And even if they do, you have to figure out whether they are at all relevant. Worse, the topic could be one of those god-awfully amorphous ones that have no clear, calculable bottom lines, like conundrums or Zen Buddhist koans.

Well, since you’re reading this paragraph, you must have free time on your hands.  So, I’ll tell you that the title of this post is from a National Public Radio essay by commentator Mark Allen.  Allen recounts how his boss insists that Allen send him only brief, concise email messages.  The boss apparently realizes that life is too short to get bogged down in details.  Or, God forbid, in the subtleties of precision, meaning, and context. Too much information. Shorter. Allen says that for people like his boss who subscribe to the Utopian vision of Life 2.0, “speed and brevity are obviously more important than facts, words, or information.”

Nowadays it is a social faux pas to communicate in long sentences with colleagues, friends, and family.  It’s self-indulgent, counter-productive, and so 20th century!  (Actually, I like to think of it as so 18th century since that’s when expository writing actually sprang up.)

Every so often, though, brevity and simple-minded factoids end up being extremely dangerous. I am thinking of the 2003 Columbia space shuttle accident that killed seven astronauts and crippled the NASA shuttle program.  The (I apologize) details about the role that sound-bite-like thinking played in this tragedy can be seen in the thoughtful work of data-presentation expert Edward Tufte.

Bottom line—the format in which information is presented has a gigantic effect on the information itself.  Marshall McLuhan’s famous quote ‘The medium is the message’ said essentially this.  Bottom lines filter out lots of information and it is never clear what crucial data have gotten omitted.  (Listen to the NPR audio to hear how the print version strips out information that is otherwise embedded in the single spoken word “shorter.”) In the case of textual information, simplified formats lead to simplified information.  Complicated ones enable the presentation of more complex and richer information.

Thankfully, the engineering details of space shuttle systems can be fairly well specified, as Tufte points out.  The task just requires ample formats for text, formulas, performance data, and diagrams that permit the exploration of the information, including its obvious and latent interrelationships.  And, of course, a commitment to studying and analyzing the information systematically.

Putting too many time and space restrictions on information distorts the information. But, as Allen notes, managers on a mission want bottom line answers.  They inhabit the world of perpetual motion and decisive action, not contemplation and analysis.  When a manager is seeking a tree, the forest can only be an aggravation.

Tufte has an almost scriptural response to the temptation to oversimplify important phenomena:  “It’s more complicated than that.” Which I will supplement with this verse: “Woe to the manager who under-contemplates a really important decision.”

All too often Tufte’s adage applies to informational practices in businesses and in public institutions, including libraries.  Short, over-simplified answers typically misrepresent the real situation. And they tend to justify the conduct of business as usual.  Responsible and effective public management (that is, stewardship of the public’s resources), however, requires a commitment to analyzing and digesting operational, performance, and environmental data, recognizing where informational gaps exist, identifying possible connections, looking for underlying logic, structure, and trends, and determining what relevant conclusions or generalizations can justifiably be drawn from these details. 

But all of this is a big hassle when there is more pressing work to be done.  Work like hunkering down to absorb library budget cuts, re-allocate staff, pare down materials costs, pay utility bills, deal with unions, and so on.  When we have more time, we’ll study our data to inform our decisions, and maybe even refine what we collect.  But right now we’re in a time crunch!

Categories: Library assessment · Measurement

Down to Business

June 9, 2009 · 1 Comment

The links between elaborate economic models and reality can be downright mysterious! In 1959 one economist described economic models this way:

“Econometric theory is like an exquisitely balanced French recipe, spelling out precisely with how many turns to mix the sauce, how many carats of spice to add, and for how many milliseconds to bake the mixture at exactly 474 degrees of temperature. But when the statistical cook turns to raw materials, he finds that hearts of cactus fruit are unavailable, so he substitutes chunks of cantaloupe; where the recipe calls for vermicelli he uses shredded wheat; and he substitutes green garment dye for curry, ping-pong balls for turtle’s eggs, and, for Chalifougnac vintage 1883, a can of turpentine.” (Stefan Valavanis, quoted in Kennedy, P., 2008. A Guide to Econometrics, 5th ed., p. 2.)

Valavanis’ main concern is the quality of empirical data that economists introduce into their models—the classic garbage-in/garbage-out problem. But the larger point is that reconciling economic theory with reality has not been a particularly strong suit for the field of economics. Quite a lot of economic theory must be accepted on faith. These leaps-of-faith are basically assumptions—and some of these assumptions are dubious, to put it mildly. (For more details see Debunking Economics: The Naked Emperor of the Social Sciences by Steve Keen and Economics as Religion: From Samuelson to Chicago and Beyond by Robert H. Nelson.)

Even so, in recent years there has been a movement among library, arts and cultural organizations to enhance their public images by advertising purely economic benefits that their organizations ostensibly produce. This practice became especially popular when the theories of economist Richard Florida appeared in his book, The Rise of the Creative Class, and in prestigious periodicals like the Harvard Business Review. Some economists later concluded that his thesis was a bit of a stretch. (See Lang, R. & K. Danielson, eds. 2005. Review Roundtable: Cities and the Creative Class. Journal of the American Planning Association, 71(2), 203-220.) In the meantime, arts and cultural organizations took the bait and began promoting their institutions as mini-economic powerhouses.

This advocacy strategy continues to be in vogue, for instance, in a report issued by the Ohio Arts Council, a state agency that funds and promotes the arts statewide. The report is based on economic modeling software that uses a technique originally developed in the 1960s known as “input/output analysis.” Economists enter various economic data from federal and other sources into the modeling software. And presto! The model formulas produce a detailed breakdown of economic impacts that occur within the region of interest (in this case, Ohio). Extrapolating from whatever data the researchers feed it, the software identifies both “direct” and “indirect” impacts a given industrial sector is likely to have on other sectors. These figures indicate how the invisible hand of the market magically multiplies dollars and spreads them around.

The final product of this whole process is a total amount of economic impact that the model says a given industry produces. So, the Ohio Arts Council report declares confidently that Ohio’s “creative industries contribute more than $25 billion to Ohio’s economy annually” (p.8). Of course, this figure depends on the set of assumptions that the economist(s) and the model make. Other economic cooks using different recipes will come up with different figures.

The Ohio report includes an appendix listing about 500 industrial categories that these economic effects spread to. Here’s a selected list of the categories and dollar impacts that are supposedly produced by Ohio’s arts and culture sector:

[Table image: selected industry categories and dollar impacts from the Ohio Arts Council report]

Maybe it is true that arts and cultural organizations bolster oilseed farming and ornamental metal work manufacturing and keep bowling-aloners occupied. We might well be happy to hear news like this.

But here’s the problem: Other economic sectors may also be capable of producing these same economic effects. Say for the sake of argument that government-sponsored day care centers could have an equivalent economic impact, or that Ohio’s unpopular vehicle emissions testing (E-Check) program could. What incentives are there, then, for Ohio to invest in arts and culture (or in public and academic libraries) rather than these other industries? And what if an industry, say gambling (the Ohio Lottery, race tracks, not-for-profit Bingo and raffles, etc.), produces even greater economic impacts? Shouldn’t Ohio then divert arts and culture dollars to the gambling sector?

When cultural and library organizations are viewed as mere stimulants to the economy, they are no longer distinguishable from rival public or private economic “engines”: daycare programs, E-Check, gambling, amusement parks, tattoo parlors, and so on.

In the short run, funders and supporters of library, arts, and cultural organizations may be pleased by these glowing economic impact reports. However, making the almighty economic-bang-for-the-buck a primary measure of value is a mistake. It completely misrepresents the fruits that these organizations bear, which are by definition societal and cultural and therefore very difficult to quantify. Recognition of the value of libraries and cultural organizations should not be relegated to econometrics and accountancy. Our advocacy campaigns need to move beyond these fanciful and inadequate portrayals of these institutions as “strictly-business.”

Categories: Advocacy · Measurement

Objects in Mirror Are Closer Than They Appear

May 1, 2009 · 3 Comments

In January my brother and I were laying laminate flooring in his house.  Each time we needed to trim a plank, we stood reverently by his table saw and incanted the familiar carpenter’s adage, “Measure twice, cut once. (Amen.)”  My brother said, “It’s the damnedest thing. You can repeat and repeat a measurement, and then find out it is still wrong.” He is an electrical engineer (he’s working on the 3rd edition of his book on digital signal processing), and his observation comes from dozens of real-life technical projects.

In the behavioral sciences as well as in program evaluation and performance assessment we attempt to measure fairly abstract things—like social class, anxiety, customer loyalty, community need, awareness of services, and so on. Measuring these is difficult. But even in the “hard” sciences measurement is a continuous challenge.

So, I want to write about what statisticians call measurement error. And I might as well start right off with a rather advanced idea:  Measurement is about reducing error. We try to be systematic in our measures to increase accuracy, and minimize error in the final measurements. The thing is, we are never 100% successful in this. And, truthfully, we hardly ever know how successful we have been. Our only hope is to keep refining our methods and measures to try to eliminate those sources of error we know about.

Given this, we in librarianship really need to discard the naive idea that we can obtain “hard facts and figures,” an idea bandied about most often in the field of business management.  I suggest that we not look to MBAs and business consultants for advice on this topic (and we should especially avoid accountants and efficiency experts). Instead, I believe we will find the measurement approaches used in physical, natural, behavioral and statistical sciences to be more fruitful.

Rather than looking at measurement as producing facts, it is perhaps better to say it produces impressions. And impressions will vary on dimensions like accuracy (precision), breadth (scope), and validity (relevance).

Let’s look at the last of these, validity: how faithfully a given measure or indicator reflects something we are interested in understanding. Say we want to determine how satisfied customers are. Let’s assume that satisfaction is both an attitude and a feeling customers have. But we cannot actually tap these things directly. We can only get hints (indications, we say) of these. Usually we do this by interviewing customers or having them complete questionnaires. Yet, there will always be a disconnect between how customers answer questions and their real, internal level of satisfaction with our products and services. This disconnect is a form of error.

Our instruments just are not refined enough to get at the real-life phenomenon we are interested in. So we get only a taste of one aspect of the phenomenon. For instance, the field of business uses reported customer intent to recommend products or services to friends as an indicator of satisfaction. (Actually, the field is more attuned to the evil twin satisfaction indicator known as “negative word-of-mouth behavior!”)

But, suppose that, miraculously, we do develop an advanced instrument that perfectly detects the entire range of customer attitudes and feelings that form “satisfaction.” Say we are able to create a mind probe! Even with this perfect instrument, other factors can make our measurements inaccurate. In other words, each time we measure something, even with the most proven measurement instruments, extraneous things interfere. A subject we are measuring may be distracted due to a high caffeine level in his blood. Or an electric voltage spike overnight may have thrown off the sensitivity of the probe. Silly examples, I know, but the point is we don’t know what this myriad of interfering factors might be.

Statisticians view any given measurement as being a sum of two numbers:  First, a true, valid number (in units we understand) reflecting what we are interested in. Second, another number that reflects how weird circumstances and other factors have “spun” the final measure to make it slightly, moderately, or even grossly out of whack. This second number is “error.” You can see this idea illustrated here and described further here.
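The "sum of two numbers" idea can be illustrated with a small simulation. This is only a sketch: the "true" satisfaction score of 7.0, the noise level, and the size of the systematic bias are invented for the example, and in real life the true value is exactly the thing we never get to see.

```python
import random
import statistics


def measure(true_value, rng, noise_sd=1.0, bias=0.0):
    """One observed measurement = true value + error.

    Error here is a constant systematic bias plus random noise.
    """
    error = bias + rng.gauss(0.0, noise_sd)
    return true_value + error


rng = random.Random(42)   # fixed seed so the run is repeatable
true_satisfaction = 7.0   # hypothetical; never observable in practice

# Random error only: repeated measurements scatter around the true value.
unbiased = [measure(true_satisfaction, rng, noise_sd=1.5) for _ in range(1000)]

# Random error plus a systematic bias of +1.0: repeating the measurement
# does not help, because every reading is pushed in the same direction.
biased = [measure(true_satisfaction, rng, noise_sd=1.5, bias=1.0) for _ in range(1000)]

print(round(statistics.mean(unbiased), 2), round(statistics.mean(biased), 2))
```

The second list is the "measure twice and still be wrong" case: averaging many readings reduces random noise but leaves the systematic part of the error untouched.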

Obviously, there is much more to this topic. But the idea is that we strive to produce accurate measurements so that our final numbers are mostly true. So repeat after me, “There are no such things as hard facts and figures. There are no such things as hard facts and figures. There are no such things…”

Categories: Measurement · Research

Poor WebJunction Survey Design Makes Findings Pretty Much Useless

April 22, 2009 · 1 Comment

This week I noticed that WebJunction is conducting a survey entitled “Technology Competencies Evaluation.”  I think this must be a sequel to a survey I saw there last month about “management core competencies.”  While the surveys are probably marketing research for WebJunction’s e-learning product line, the researchers say they want to use the data to “establish a baseline for the library field.” Thus, they do profess an interest in identifying larger and, we might conclude, non-commercial trends within the library profession.

Whatever their intentions, the surveys won’t produce much reliable information due to poor designs. First, neither questionnaire actually assesses competencies, that is, knowledge or skill levels. Instead, they measure respondents’ opinions about their own knowledge and skills in a dozen or so training topics. So, any baselines WebJunction comes up with will be merely about current opinions which would later be compared to some subsequent set of opinions.

So, what will they learn?  At the most, they can determine whether library staff believe they are more (or less) knowledgeable over time. That type of information, while mildly interesting, seems beside the point. Wouldn’t it make more sense to measure competencies as compared to some minimum acceptable levels, the way that IT certification or professional licensure exams do?  Later, perhaps, it might be useful to compare these over time, but that would not be as significant as a comparison of skills and knowledge to well-thought-out minimum standards.

Second, information from these surveys is compromised by the sampling method that WebJunction researchers have chosen.  They use what is called convenience sampling. Rather than using some more systematic method (e.g., random sampling), they get respondents where and when it is convenient. This method severely limits the usefulness of study results. Because the respondents are self-selected, their responses will, in all likelihood, differ from the larger population of library staff the researchers are interested in. That is, the findings will be biased.

Suppose that mostly tech-savvy librarians tend to take the surveys. Then, levels of self-reported competency will be artificially higher than those of the larger population of librarians and library staff overall. Or perhaps the opposite is true, that respondents tend to be mostly tech un-savvy non-librarian staff. Either way, allowing respondents to self-select introduces a troublesome and typically unknown slant, making results biased and misleading. Statisticians describe this situation by saying that “results cannot be generalized to the larger population of interest.”  This research validity issue, called external validity, is a central concern in behavioral and marketing research methods.
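A rough simulation shows how self-selection can tilt results. Everything here is assumed for illustration: a population of 10,000 staff with self-rated skill spread evenly between 1 and 5, and a response rule under which the chance of taking the survey rises with skill. Nothing is known about how WebJunction respondents actually select themselves.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: self-rated tech skill on a 1-to-5 scale.
population = [rng.uniform(1.0, 5.0) for _ in range(10_000)]

# Convenience sample: the more skilled a person is, the more likely
# that person is to respond (from a 4% chance at skill 1 to 20% at skill 5).
volunteers = [skill for skill in population if rng.random() < 0.2 * skill / 5.0]

# Random sample of the same size, drawn without regard to skill.
random_sample = rng.sample(population, len(volunteers))

print(round(statistics.mean(population), 2))
print(round(statistics.mean(volunteers), 2))
print(round(statistics.mean(random_sample), 2))
```

Under this assumed response rule the volunteers' average sits well above the population's, while the random sample's average stays close to it. Reverse the rule and the slant reverses; in a real survey the direction and size of the slant are unknown.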

Convenience sampling also hampers the baseline comparisons WebJunction talks about. Without making sure they have representative samples of the larger population of interest, there is no way to know whether differences between baseline measures from this month’s survey and later surveys are bogus.  Perhaps the original (baseline) respondents were very tech-savvy, and the future (comparison) respondents are not at all. In this case the researchers will be comparing two non-equivalent groups’ opinions. This will lead to incorrect conclusions about apparent changes in opinions of the larger population of library staff over time. It may be that overall library staff opinions have remained unchanged even though the two samples—baseline and later comparison—differ quite a bit.

Producing assessment data represents a big investment of time, effort, and expense. Data collection methods need to be designed to produce maximally reliable and valid information in order to justify these costs. Spending researcher and respondent time on surveys that can only produce questionable results is a poor use of library resources. Also, researchers should never portray findings from studies that use poor designs as if they were fair and balanced depictions of the subjects being studied. That would be mis-information, indeed!

Categories: Measurement · Research

Ain’t Misbehavin’! Uneven LJ Index Score Ranges Are More Informative

April 11, 2009 · Leave a Comment

I want to explain why LJ Index scores are not well-behaved. That is, why they don’t conform to neat and tidy intervals the way HAPLR scores range from about 30 to 930. HAPLR scores fall into a predictable range because they are built on percentiles. Any given library’s score is a sum of 15 percentile rankings, one for each statistical item HAPLR uses (like circulation per visit). As you probably know, percentiles range from 99th down to zero(th).  (Nobody can be in the exact 100th percentile for reasons I’ll skip here.)  If a library ranks at the 99th percentile for all 15 HAPLR items, that library earns a score of 990. If it ranks at the lowest (0th) percentile for all of the items, then it gets a score of zero. In reality, libraries don’t get all high or all low rankings on the 15 HAPLR items; they get a mixture. So scores tend to stay corralled well within the 30 to 930 range, as seen in this chart:

[Chart: Typical Distribution of HAPLR Scores]

Note how translating library statistics into percentiles makes the distribution fairly even, with most libraries congregated towards the middle, and fewer towards the edges of the chart. This is a direct result (an artifact, really) of using percentiles. There are other side effects of using percentiles in ratings, described in the article “Honorable Mention: What Public Library National Ratings Say” by Neal Kaske and me in the Nov/Dec 2008 issue of Public Libraries.

Real library statistical data aren’t as neat and tidy–nor as evenly distributed–as percentiles. Within any given peer comparison group (i.e., expenditure categories that LJ Index uses or population groups HAPLR uses) the data can vary widely. The two charts below show how this works for two statistical indicators, circulation per capita and visits per capita:

[Chart: Distribution of Circulation per Capita for LJ Index $400K Peer Group]

[Chart: Distribution of Visits per Capita for LJ Index $400K Peer Group]

Obviously, most libraries’ statistics cluster towards the lower values at the left edge of the charts. A few libraries have statistics much higher than the rest of the group, as seen by the flattened bars extending rightward on the charts.

LJ Index was designed to reflect real patterns in library statistics. The scores match how the data really behave. The calculation methods we use preserve the real statistical values, no matter how low, medium, or high they are. Before I show this, let me reiterate that percentiles don’t do this. Instead, percentiles lose information about library statistics. Here’s why:  Knowing that a couple placed 1st, 2nd, 3rd, or 4th on Dancing with the Stars does not tell you how many points judges awarded them. One week the top four couples’ final scores might be very close to each other, another week one couple may out-score the rest by several points. With ranks alone we can’t tell the difference between these two weeks. This is because ranks—and percentiles—contain very little information about the actual scores they represent.
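The dance-contest point is easy to check in a few lines of Python. The couples and the judges' scores below are invented for illustration, and the helper `ranks` is a hypothetical name that ignores ties for simplicity.

```python
def ranks(scores):
    """Map each name to its rank, where rank 1 is the highest score (ties ignored)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Week one: the top four couples finish within a few points of each other.
week_close = {"Couple A": 27, "Couple B": 26, "Couple C": 25, "Couple D": 24}

# Week two: one couple runs away with it.
week_runaway = {"Couple A": 30, "Couple B": 21, "Couple C": 19, "Couple D": 15}

# Very different scores, identical rankings.
print(ranks(week_close) == ranks(week_runaway))
print(week_close["Couple A"] - week_close["Couple D"],
      week_runaway["Couple A"] - week_runaway["Couple D"])
```

Both weeks produce the same 1st-through-4th ordering even though the gap between first and last place differs greatly, which is the information a rank or percentile throws away.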

OK, enough bashing percentiles–they have other good uses, but are a big disadvantage when calculating ratings. Since the LJ Index faithfully tracks actual behavior of library statistics, the scores tend to cluster the same way that library statistics do:

[Chart: Distribution of LJ Index Scores for $400K Peer Group]

The LJ Index is more informative than percentile-based rankings. This increased information does help us move a few steps forward. Of course, “all news is not good news.”  And this approach does bring other issues to light. Seeing the data more clearly helps us recognize its weaknesses and potential misbehavior. Especially, there is the challenging problem of the validity of very high per capita statistics (called outliers in statistical jargon). These can occur, for instance, because of very small service area populations, because libraries serve populations well beyond their official service boundaries, because of errors in data collection or reporting, or for other reasons.

This is one of the reasons that the LJ Index team decided to de-emphasize the scores and group the top-rated libraries into “star” categories. As the Library Journal article explains, the scores are not precise and there is a lot of “noise” in the underlying data. Better to take a more bird’s-eye view of how libraries are arranged than to take exact scores too literally…or numerally!

Categories: Measurement · Research