Do No Quantitative Harm

Every measurement and every real-world number is a little bit fuzzy, a little bit uncertain. It is an imperfect reflection of reality. A number is always impure: it is an admixture of truth, error, and uncertainty.
Charles Seife, Proofiness: How You Are Being Fooled by the Numbers

Seife explains that even the most well-conceived measures and carefully collected data are still imperfect. In real-life situations where measurement designs are far from ideal and data collection is messy, the numbers are even more imperfect. The challenge for library assessment and research professionals is making sure our study designs and measures don’t make things any worse than they already are. To the best of our abilities we should strive to do no harm to the data.

Sharpening our skills in quantitative reasoning/numeracy will help make sure our measures aren’t exacerbating the situation. Here I’m continuing a quantitative exercise begun in my Nov. 9 post about a library return-on-investment (ROI) white paper connected with the LibValue project. In the post I explained that the ROI formula substantially exaggerated library benefits, a quantitative side-effect I suspect the researchers weren’t aware of.

A caveat before proceeding: Quantitative reasoning is not for people expecting quick and simple takeaways. Or those seeking confirmation of their pre-conceived notions. Quantitative reasoning is about thoroughness. It involves systematic thinking and lots of it! (That’s why this post is so long.)

This exercise involves one remaining component in the LibValue ROI formula I didn’t get to in my prior post. I can tell you up front that, measurement-wise, this component makes things worse. By that I mean it detracts from a sound estimate of library ROI rather than enhancing it. To see why let’s revisit the entire formula shown here with sample data:


LibValue White Paper ROI Formula with Sample Data.

The expression circled in blue is the component we’ll be looking at. Let’s begin by first considering how this expression works arithmetically. After that we’ll explore the meanings of the specific measures in the expression.

You can see that the expression is a fraction containing multiple terms embedded in the larger fraction that is the entire ROI formula. In formula terms alone the blue-circled fraction looks like this:


Blue-Circled Fraction in Formula Terms

This fraction and its blue-circled version above are rates. A rate is a fraction where the numerator and denominator have different units of measure such as miles per gallon, influenza cases per 1000 adults, or library return per dollar invested. So, for the remainder of this post I’ll use the term formula rate to refer to the fraction shown here. And I’ll use rates for other fractions we encounter that meet the definition just given.

In the larger ROI formula the formula rate serves as an adjustment to the other two terms to its right ($101,596 X 1128 in the blue-circled fraction). As explained in my prior post, these two terms multiplied together are equivalent to a university’s total grant income. Total grant income, in turn, is used in the LibValue ROI formula to reflect library return/earnings (benefits).

Whether the formula rate ends up increasing or decreasing library benefits depends on each university’s data. For now, we can simply note that, based on the 8 universities studied in the white paper, the value of this rate hovers around 0.45. Its median value is 0.443, or roughly 4/9. So, it typically decreases the value of the rest of the formula numerator by about 55%.
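To make the "about 55%" concrete, here is a minimal sketch of the arithmetic. The only input taken from the white paper data is the median rate of 0.443; the variable names are mine.

```python
# Median formula rate for the 8 universities, as reported above.
median_rate = 0.443

# Multiplying by the rate keeps this share of the rest of the numerator,
# so the share removed is simply the complement.
share_removed = 1 - median_rate

print(f"4/9 as a decimal: {4 / 9:.3f}")
print(f"share of the numerator removed: {share_removed:.1%}")
```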

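The formula rate is made up of two products, one in the numerator and one in the denominator, which follow this pattern: a count multiplied by a percentage. The numerator is number of grant awards × % of faculty who say that citations are important to grant awards. The denominator is number of grant proposals × % of proposals that contain citations obtained through the library.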


It also happens that the rate can be re-expressed as the product of two separate rates like this:

Formula Rate Expressed as Two Separate Rates

To identify these let’s call the left rate awards/proposals rate and the right one percentages rate. Using sample data from the blue-circled fraction the equations below demonstrate that the expression consisting of two separate rates (upper equation) is equivalent to the original formula rate (lower equation):


The two-rates (upper) and formula rate (lower) expressions are arithmetically equivalent.
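For readers who want to check the equivalence themselves, here is a minimal sketch using exact fractions. The four input values are hypothetical, chosen only so that the two rates land near the sample values quoted in this post (about .577 and 1.01); they are not the white paper's figures.

```python
from fractions import Fraction

# Hypothetical inputs (not from the white paper).
awards = 15                        # number of grant awards
proposals = 26                     # number of grant proposals
pct_faculty = Fraction(96, 100)    # % of faculty who say citations are important
pct_library = Fraction(95, 100)    # % of proposals with library-obtained citations

# Original formula rate: one fraction with a product on top and on bottom.
formula_rate = (awards * pct_faculty) / (proposals * pct_library)

# The same thing written as two separate rates multiplied together.
awards_proposals_rate = Fraction(awards, proposals)
percentages_rate = pct_faculty / pct_library
two_rates = awards_proposals_rate * percentages_rate

print(formula_rate == two_rates)          # exact comparison, no rounding
print(round(float(formula_rate), 3))
```

Exact fractions are used so the comparison isn't blurred by floating-point rounding. The equality holds for any inputs, since it is just a regrouping of the same four terms.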

Now let’s see how these two separate rates behave for the 8 universities surveyed in the white paper. Take a look at this dot plot:


Dot Plot of Awards/Proposals and Percentages Rates.

Note how the percentages rates (red markers) are high and stay within the range from .95 to 1.11, whereas the awards/proposals rates (green markers) are lower with a broader range, from .006 to .70.

Because the percentages rates hover so close to 1.0 their individual percentages essentially cancel each other out. Recall that multiplying any number by 1.0 yields the number itself. This is what happens when the awards/proposals rates are multiplied by percentages rates so close to 1.0. For example, in the upper calculation shown above the awards/proposals rate (.577 or .58 rounded) is multiplied by 1.01 resulting in .583 (or .58 rounded). This multiplication does almost nothing. So, in this case the formula rate can be simplified to:


When the percentages rate numerator and denominator are close, the formula rate
can be simplified to just the awards/proposals term.
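A minimal sketch of why this simplification works, using the sample awards/proposals rate (.577) and three percentages rates: the sample value (1.01) and the two ends of the range seen in the dot plot (.95 and 1.11). The function name is mine, for illustration only.

```python
def formula_rate(awards_proposals_rate, percentages_rate):
    """Formula rate written as the product of the two separate rates."""
    return awards_proposals_rate * percentages_rate

sample_ap_rate = 0.577  # awards/proposals rate from the sample data

for percentages_rate in (0.95, 1.01, 1.11):
    rate = formula_rate(sample_ap_rate, percentages_rate)
    # Relative change caused by the percentages rate alone.
    relative_change = rate / sample_ap_rate - 1
    print(f"percentages rate {percentages_rate:.2f}: "
          f"formula rate {rate:.3f} ({relative_change:+.1%})")
```

The relative change always equals the percentages rate minus 1, so a percentages rate of 1.01 moves the result by only 1%, while the extremes of the observed range move it by roughly -5% and +11%.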

The near equivalence of the formula rate and the awards/proposals rate alone is evident in these dot plots:


Dot Plots of Formula & Awards/Proposals Rates. Lower chart has decreased vertical axis.

To provide a closer view of the gaps between the dots (markers) the lower plot omits university 3. Decreasing the axis range widens the scale units. In the plots you can see that the two rates stay close together university by university. So close that in the top plot 6 of the 8 marker pairs overlap. We can also gauge the closeness of these two rates by computing the differences between them as I’ve done here:


Comparison of Universities’ Formula and (Simplified) Awards/Proposals Rates.

The differences appear in row 3. The abbreviation Abs indicates that these are absolute numbers. Since we’re interested in any difference between the rates, the sign of the difference doesn’t matter. Row 4 gives the difference in row 3 as a percent of the formula value (row 1).
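The row 3 and row 4 calculations can be sketched as follows. The inputs are the rounded sample rates quoted earlier in this post (.583 and .577), not a row from the table, and the function name is hypothetical.

```python
def rate_gap(formula_rate, awards_proposals_rate):
    """Return (absolute difference, difference as a share of the formula rate)."""
    abs_diff = abs(formula_rate - awards_proposals_rate)   # row 3
    pct_of_formula = abs_diff / formula_rate               # row 4
    return abs_diff, pct_of_formula

abs_diff, pct_gap = rate_gap(0.583, 0.577)
print(f"absolute difference: {abs_diff:.3f}")
print(f"gap as % of formula rate: {pct_gap:.1%}")
```

Because the formula rate is the awards/proposals rate times the percentages rate, the row 4 figure reduces to |percentages rate - 1| / percentages rate. In other words, the size of the gap depends only on how far the percentages rate sits from 1.0.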

These differences (gaps) are also plotted here as percentages, sorted low to high:


Gaps in Universities’ Percentage Rate Terms.

Notice that for 4 of the 8 universities the gaps are 1.5% or less. For these universities, as I said, the percentages rate can be omitted because it’s inconsequential. For the other 4 universities the rate may matter. Still, this raises the question of whether these measures are worth the time and expense to collect.1  Answering this requires a larger and more representative sample than the white paper had.

There’s something else interesting to be seen in the pair of dot plots shown above. For 4 of the universities the formula rate equals or exceeds the awards/proposals rate. For the 4 other universities the opposite is true. (I’ll leave it to the reader to determine whether either of these sets of 4 universities matches the 4 mentioned as having gaps between the rates of 1.5% or lower.) This diagram explains the two scenarios:


How Percentages Rate Affects Value of Formula Rate.

Again, in the LibValue ROI formula the formula rate adjusts total grant income downward, typically by 55% for the sample of 8 universities in the white paper. (We can’t say whether or not this is true for the larger population of universities without a representative sample.) As seen in the dot plots above, the awards/proposals rate primarily determines the final value of the formula rate while the percentages rate plays a minor role.

Still, there’s another problem with the percentages rate that needs to be addressed in spite of its minor role. This has to do with a specific arithmetic behavior of the two rates. The awards/proposals rate can never exceed 1.0 (100%) since this measure was defined by the white paper researchers as a part of a whole: how many proposals succeeded (grants awarded) per grant proposal submitted. On the other hand, the percentages rate can exceed 1.0 and does so as we saw in the diagram above.

The percentages rate’s ability to exceed 1.0 has an unintended effect. All else being roughly equal, universities with a lower % of proposals that contain citations obtained through the library (the percentages rate denominator) earn higher library benefit estimates! That is, for these universities the final adjustment to total income is relatively higher than for others. Assigning higher library benefits to universities that use fewer library resources is inconsistent with the idea that making use of more library resources would produce more benefit.
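Here is a minimal sketch of that effect. Both universities are hypothetical and identical in every input except the % of proposals that contain citations obtained through the library; none of the numbers come from the white paper.

```python
def formula_rate(awards, proposals, pct_faculty, pct_library):
    """(awards x % faculty) / (proposals x % proposals with library citations)."""
    return (awards * pct_faculty) / (proposals * pct_library)

# Hypothetical universities: same awards, proposals, and faculty percentage.
heavy_library_use = formula_rate(50, 100, pct_faculty=0.90, pct_library=0.95)
light_library_use = formula_rate(50, 100, pct_faculty=0.90, pct_library=0.80)

print(f"heavier library use: {heavy_library_use:.3f}")
print(f"lighter library use: {light_library_use:.3f}")
print(light_library_use > heavy_library_use)
```

Since the library percentage sits in the denominator, shrinking it can only push the formula rate up, which is exactly the backwards incentive described above.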

While this unintended effect hinges on the arithmetic behavior of the formula rate, it involves the meanings of the individual measures as well. Since we have yet to address what the ROI formula measures mean, let’s begin this by re-visiting the original formula rate:


Formula Rate.

Looking still at the percentages, it seems the white paper researchers intended that each would modify—lessen, actually—the term to its left. Though they didn’t explain their measurement design decisions in detail, the researchers did say that not all grant proposal successes should automatically be attributed to library resources used. I’m presuming they weren’t comfortable giving libraries credit for grant successes that had nothing to do with use of library resources. So, they sought to dampen the awards and proposals counts some. Thus, % of faculty who say that citations are important to grant awards lessens number of grant awards and % of proposals that contain citations obtained through the library lessens number of grant proposals.

Their decision led to compromises that diminished the validity of the LibValue study findings. This is nothing unusual, of course. Every assessment, evaluation, and research project involves methodological compromises. Researchers just need to recognize them, understand their implications, and inform readers about the implications in the final reports. Unfortunately, this doesn’t always happen. When it doesn’t, it’s up to the astute (and quantitatively skilled!) reader to try to sort things out.

So let’s sort through the compromises the white paper researchers made. First, the measures needed for the formula rate numerator were counts of awards where citations contributed to awards success and total awards counts. Unfortunately, these counts were not available to the researchers. So, they relied on faculty opinions as a substitute. This substitute is the percentages rate measure we’re already familiar with, % of faculty who say that citations are important to grant awards.

But think about it. It would be mere happenstance for the proportion of faculty who believed that citations contribute to grant success to match the proportion of grant awards where citations actually did contribute. The fact that these two measures have different units—faculty members versus grant awards—is the first clue that it’s a stretch to equate them. More important, faculty perceptions are a poor approximation of actual grant awards decisions since faculty perceptions will tend to be stable over time compared to awards decisions. Therefore, the accuracy of this faculty opinions measure is questionable. One of the main tenets of assessment and evaluation is relying upon direct measures rather than upon impressionistic opinions about what these measures might be. Here faculty opinions are indirect and impressionistic measures.

The second compromise concerns the formula rate denominator. Here the measurement needed would be counts of grant proposals containing citations obtained from the library. Again, the researchers did not have access to these counts. So, they devised a completely different measure and labeled it as if it were the counts just described. What they actually collected were faculty opinions about % of citations in grant proposals, grant reports, and/or published articles accessed by faculty via university computer network or via the library.2  Note that the unit of measure for the measure collected is citations whereas for the named measure it is proposals.

An example should help clarify this mismatch: Suppose at one university faculty proposals, reports, and published articles contained a total of 10,000 citations and the faculty estimated that 96% of these were obtained from the library. This means that the 96% refers to 9,600 citations. Now suppose the university also submitted 1,000 grant proposals that year. Applying the 96% to the named measure—% of proposals that contain citations obtained through the library—amounts to 960 proposals containing citations obtained from the library. In any given year or at any given university the proportions of library citations (among all citations) and grant proposals containing library citations (among all proposals) could, by random luck, be the same. But this lucky match would not occur every year or at every university. And, of course, the percentages researchers collected were not based on actual citation counts but on rough faculty estimates.
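The mismatch can be sketched in a few lines. As a simplifying assumption (mine, not the white paper's), suppose all 10,000 citations sit in the 1,000 proposals, 10 per proposal. The two hypothetical distributions below have the same 96% share of library citations but different shares of proposals containing library citations.

```python
def share_of_proposals_with_library_citations(library_citations_per_proposal):
    """Fraction of proposals containing at least one library-obtained citation."""
    counts = library_citations_per_proposal
    return sum(1 for n in counts if n > 0) / len(counts)

TOTAL_CITATIONS = 10_000

# Scenario A: the 400 non-library citations are spread across 400 proposals.
spread = [9] * 400 + [10] * 600
# Scenario B: the 400 non-library citations are concentrated in 40 proposals.
concentrated = [0] * 40 + [10] * 960

for name, counts in (("spread", spread), ("concentrated", concentrated)):
    citation_share = sum(counts) / TOTAL_CITATIONS
    proposal_share = share_of_proposals_with_library_citations(counts)
    print(f"{name}: {citation_share:.0%} of citations, "
          f"{proposal_share:.0%} of proposals")
```

Only in the concentrated scenario do the two percentages happen to coincide, which is the lucky match described above.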

Again, this is a case of collecting impressionistic data rather than direct measurements. Substituting less-than-ideal estimates for unavailable measures may be justified. But relabeling a measure to make it appear to be something that it is not is deceptive.

There’s a third possible compromise apparent in the ROI formula. This has to do with the meaning researchers ascribed to the formula rate. Take a look at this description of the University of Illinois at Urbana-Champaign (UIUC) ROI model appearing in the LibValue white paper:


Depiction of University of Illinois at Urbana-Champaign ROI Model.3

As explained in my prior post, the white paper ROI model is a near clone of the UIUC ROI model with one exception I describe below.4  In the UIUC model shown here note the bold text, Percentage of proposals that are successful and use citations obtained by the library. This text labels the rate (fraction) just below it, which is the formula rate we’ve been discussing all along.

The bold label describes the joint probability that a grant proposal was successful and also contained citations from the library. I’ll leave it to the reader to compare how joint probabilities are calculated with the formula rate calculation. In the meantime, let me just say this: If the label describes what researchers meant to measure, then the formula rate calculation is incorrectly specified. If the researchers purposely substituted an alternative calculation for a joint probability, then their ROI formula is another step removed from a valid measure of library ROI.
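As a starting point for that comparison, here is a rough sketch with made-up counts. It assumes we had proposal-level records (which the researchers did not) and compares the share of proposals that were both successful and contained library citations with what the formula rate produces. Every number is hypothetical.

```python
# Hypothetical proposal-level counts (not from the white paper).
proposals = 100
successful = 50              # proposals that won an award
successful_and_library = 40  # successful AND contained library citations

# Hypothetical survey percentages, as used in the formula rate.
pct_faculty = 0.90           # % of faculty who say citations are important
pct_library = 0.80           # % of proposals with library-obtained citations

success_rate = successful / proposals
joint_probability = successful_and_library / proposals
formula_rate = (successful * pct_faculty) / (proposals * pct_library)

print(f"success rate:      {success_rate:.3f}")
print(f"joint probability: {joint_probability:.3f}")
print(f"formula rate:      {formula_rate:.3f}")
```

A joint probability can never exceed the success rate, because the successful proposals that contain library citations are a subset of all successful proposals. The formula rate has no such ceiling, since it divides by the library percentage instead of counting the overlap.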

Finally, we need to consider the meaning of the left part of the ROI formula rate, the awards/proposals rate. You probably have surmised that this rate is essentially the university’s grant application success rate—the number of grant awards won per grant proposal submitted. As seen already, it is mainly the awards/proposals rate that determines how much total grant income is lessened to obtain the final library benefit estimate.

So, let’s think about how this works. Say two universities each earned $10 million in grant income. According to the ROI formula the library benefit estimate is $10 million with some adjustment (usually) downward based on the formula rate. Suppose the first university submitted 10 grant proposals and 5 of these were funded. And the second submitted 100 proposals and 20 were funded. The success rates for the universities would be 50% and 20% respectively. Thus, the second university’s earnings end up adjusted downward by 80% (which is what a 20% formula rate does) while the first is adjusted downward by 50%.

Now consider a third university that submitted just 1 grant proposal which succeeded in obtaining a single grant award of $10 million. The ROI formula credits this university with library benefits equal to the full $10 million (100%). But why should one university deserve the full $10 million credit while others earn considerably less? Actually, there is no good reason for these gradations. The monetary value of grant income/revenue is whatever it is, and is tracked in university financial accounting systems the same, regardless of how it was or was not earned. Grant success rates have nothing at all to do with accurately tallying grant income. Since the ROI model defines library benefit as total grant income, the benefit equals the total money received.
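The three-university example can be sketched like this. As in the example itself, the percentages rate is assumed to be exactly 1.0, so the formula rate equals the grant success rate; the function and variable names are mine.

```python
GRANT_INCOME = 10_000_000  # each university earned $10 million

# (awards, proposals) for the three universities in the example.
universities = {
    "first": (5, 10),
    "second": (20, 100),
    "third": (1, 1),
}

def library_benefit(income, awards, proposals, percentages_rate=1.0):
    """Grant income adjusted by the formula rate (success rate x percentages rate)."""
    return income * awards / proposals * percentages_rate

benefits = {
    name: library_benefit(GRANT_INCOME, awards, proposals)
    for name, (awards, proposals) in universities.items()
}

for name, benefit in benefits.items():
    print(f"{name}: ${benefit:,.0f} ({benefit / GRANT_INCOME:.0%} of grant income)")
```

Identical grant income yields three different benefit estimates, driven entirely by how many proposals each university happened to submit.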

Despite all of the time we’ve spent here deciphering the formula rate, this rate should never have been part of the ROI formula in the first place.5  Its inclusion made the formula worse rather than better.

Now it may be that accurate measurement requires adjusting gross grant income figures somewhat. But these adjustments should be based on possible causes of inaccuracy such as over- or under-reporting, record-keeping errors, inflation, and so on. Adjustments based upon extraneous characteristics of the universities won’t produce accurate results. These just adulterate the measures and make them difficult to interpret. Which brings up an interesting question: Just what does it mean to give universities with higher success rates more credit for the money earned? What is the meaning of library benefits that have been filtered this way?

The LibValue researchers came close to recognizing the formula rate didn’t belong in the ROI formula. Take a look at the first bold label in the UIUC formula shown above, Percentage of faculty who use citations in grant proposals who are also PI’s. The researchers rejected this rate, writing “The percentage of faculty who are principal investigators [PI’s] has no bearing on the library’s ROI.”6  (Leaving this rate out of their ROI formula is the only alteration to the UIUC ROI model made by the LibValue researchers.) They didn’t realize that their statement applied equally to the next part of the UIUC formula, the formula rate.

Let me finish now by acknowledging the elephant in the library. The biggest weakness in the UIUC and LibValue ROI models is equating library benefits with significant portions of university grant earnings. The quick example I used above involved portions between 20% and 100%. But those did not include cases where the ROI formula exaggerated grant income significantly (see my prior post). If we include exaggerated income estimates with the formula rate adjustment, the library benefits specified by the ROI model range from 40% to 225% of total grant income, with an average benefit estimate of 110% and median of 78% (based on the white paper data).

It must be quite a surprise to university teams who pored and sweated over proposal drafts (principal investigators, faculty collaborators and colleagues, graduate students, grants offices, support staff, and others) to learn that libraries feel justified in claiming something like 80% or 110% credit for the teams’ work, leaving grant creators with a small or negative share.

Obviously, this allotment isn’t reasonable. It is much more likely that a library’s contribution to grant earnings would be single-digit percentages or less. Even a crude ratio of library citation pages to total proposal page counts would be, what, around 5%? If pages are weighted according to content and relevance, citations pages would be greatly outweighed. (Redundant, actually.)

Then there’s the question of credit for relevant content culled from cited articles, books, and other works. Do original creators get the credit? Or do grant authors deserve it for their ability to master, synthesize, and add to knowledge? Whichever way this might go, clearly the library’s role is tertiary. The library is basically the messenger. Of course, a messenger doesn’t deserve to be harmed for being a medium. But he shouldn’t take credit for the message either.

So how shall we gauge the value of libraries being the messenger (medium)? A difficult question, indeed. Its answer needs to be more credible than the white paper and UIUC ROI models.

1   For a measure to be meaningful it has to show some variation. Otherwise, it does not help us make useful distinctions between important subgroups in our population such as small academic, public university, and large private university libraries. A questionnaire item that elicits identical responses from all respondents probably isn’t measuring anything useful.
2   Tenopir et al. 2010. University Investment in the Library Phase II, p. 18. See Table 12. The percentages are estimates based on faculty recall rather than on actual citation counts.
3   Tenopir et al. 2010, p. 7.
4   See my prior post for more information about the UIUC model, including the fact that the model is specified differently in different sections of the article.
5   Studying the formula rate was worthwhile as a quantitative exercise, though. It involved some good ideas such as the importance of analyzing patterns in the data to see how the measure behaves.
6   Tenopir et al. 2010, p. 7.

One thought on “Do No Quantitative Harm”

  1. Yes, you hit the nail on the head several times here, obviously this one – “It must be quite a surprise to university teams who pored and sweated over proposal drafts…to learn that libraries feel justified in claiming from 80% or 110% credit for the teams’ work.”
