I want to explain why LJ Index scores are not well-behaved. That is, why they don’t fall within neat and tidy intervals the way HAPLR scores do, ranging from about 30 to 930. HAPLR scores fall into a predictable range because they are built on percentiles. Any given library’s score combines 15 percentile rankings, one for each statistical item HAPLR uses (like circulation per visit), into a single score on a scale of roughly 0 to 1,000. As you probably know, percentiles range from the 99th down to the zero(th). (Nobody can be in the exact 100th percentile for reasons I’ll skip here.) If a library ranks at the 99th percentile for all 15 HAPLR items, that library earns a score of 990. If it ranks at the lowest (0th) percentile for all of the items, then it gets a score of zero. In reality, libraries don’t get all high or all low rankings on the 15 HAPLR items; they get a mixture. So scores tend to stay corralled well within the 30 to 930 range, as seen in this chart:
Note how translating library statistics into percentiles makes the distribution fairly balanced, with most libraries congregated towards the middle and fewer towards the edges of the chart. This is a direct result (an artifact, really) of using percentiles. Using percentiles in ratings has other side effects, described in the article “Honorable Mention: What Public Library National Ratings Say” by Neal Kaske and me in the Nov/Dec 2008 issue of Public Libraries.
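To see why percentile-based scores behave this way, here is a minimal sketch. It assumes a simple equal-weight composite (the average of the 15 percentile ranks times 10), which reproduces the 990 and 0 endpoints above but is not HAPLR’s actual weighting; `percentile_rank`, `composite_score`, and `simulate_composites` are hypothetical helpers. The simulation also assumes each library’s 15 ranks are independent and uniformly spread, which real data are not.

```python
import random
from bisect import bisect_left

def percentile_rank(values, x):
    """Percent of values strictly below x, truncated to an integer.
    The top value gets (n-1)/n * 100, so no one reaches the 100th percentile."""
    below = bisect_left(sorted(values), x)
    return int(100 * below / len(values))

def composite_score(ranks):
    """Hypothetical equal-weight composite: mean percentile rank times 10,
    so all-99th rankings score 990 and all-0th rankings score 0."""
    return round(10 * sum(ranks) / len(ranks))

def simulate_composites(n_libraries=10_000, n_items=15, seed=1):
    """Composite scores for simulated libraries whose percentile ranks are
    drawn independently and uniformly from 0 to 99 (a simplifying assumption)."""
    rng = random.Random(seed)
    return [composite_score([rng.randint(0, 99) for _ in range(n_items)])
            for _ in range(n_libraries)]

scores = simulate_composites()
middle = sum(250 <= s <= 750 for s in scores) / len(scores)
print(f"share of simulated scores between 250 and 750: {middle:.1%}")
```

Because a mixture of high and low ranks averages out, nearly all simulated scores land in the middle of the scale. Real rankings on different items tend to move together, so actual scores would likely spread somewhat wider, but the pull towards the middle is the same percentile artifact.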
Real library statistical data aren’t as neat and tidy, nor as evenly distributed, as percentiles. Within any given peer comparison group (e.g., the expenditure categories the LJ Index uses or the population groups HAPLR uses), the data can vary widely. The two charts below show how this works for two statistical indicators, circulation per capita and visits per capita:
Obviously, most libraries’ statistics cluster towards the lower values at the left edge of the charts. A few libraries have statistics much higher than the rest of the group, as seen by the flattened bars extending rightward on the charts.
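A quick numerical way to see this lopsided shape is to compare the mean with the median. The figures below are hypothetical, made up for illustration rather than taken from any real peer group, and `skew_summary` is a hypothetical helper.

```python
from statistics import mean, median

# Hypothetical circulation-per-capita figures for one peer group (illustrative only)
circ_per_capita = [3.1, 4.0, 4.4, 5.2, 5.8, 6.1, 6.9, 7.5, 8.8, 31.0]

def skew_summary(values):
    """Return the mean, the median, and the share of values below the mean.
    In right-skewed data a few high values pull the mean above the median."""
    m = mean(values)
    return m, median(values), sum(v < m for v in values) / len(values)

m, med, share_below = skew_summary(circ_per_capita)
print(f"mean {m:.2f}, median {med:.2f}, {share_below:.0%} of libraries below the mean")
```

One high value is enough to drag the mean well above the median, leaving most of the group below "average," which is the pattern the charts show.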
The LJ Index was designed to reflect real patterns in library statistics, so the scores match how the data really behave. The calculation methods we use preserve the actual statistical values, whether they are low, medium, or high. Before I show this, let me reiterate that percentiles don’t do this. Instead, percentiles discard information about library statistics. Here’s why: knowing that a couple placed 1st, 2nd, 3rd, or 4th on Dancing with the Stars does not tell you how many points the judges awarded them. One week the top four couples’ final scores might be very close to each other; another week one couple may out-score the rest by several points. With ranks alone we can’t tell the difference between these two weeks, because ranks (and percentiles) contain very little information about the actual scores they represent.
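The Dancing with the Stars point can be sketched in a few lines. The couples and judges’ points are hypothetical. For contrast, the sketch also computes standard scores (distance from the mean in standard deviations), a common value-preserving technique used here as an assumption; it is not necessarily the exact LJ Index formula. Standard scores are a linear rescaling, so relative gaps between values survive, whereas places do not.

```python
from statistics import mean, pstdev

def places(scores):
    """Place of each couple, 1 = highest score (ties ignored for simplicity)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {couple: place for place, couple in enumerate(ordered, start=1)}

def standard_scores(scores):
    """Each score's distance from the week's mean, in standard deviations."""
    values = list(scores.values())
    m, sd = mean(values), pstdev(values)
    return {couple: (s - m) / sd for couple, s in scores.items()}

week_1 = {"A": 30, "B": 29, "C": 28, "D": 27}  # hypothetical close race
week_2 = {"A": 30, "B": 22, "C": 21, "D": 20}  # hypothetical runaway week

print(places(week_1) == places(week_2))  # the places cannot tell the weeks apart
z1, z2 = standard_scores(week_1), standard_scores(week_2)
print(f"A's lead over B: {z1['A'] - z1['B']:.2f} SD vs {z2['A'] - z2['B']:.2f} SD")
```

Both weeks produce identical places, while the standard scores show that couple A’s lead is more than twice as large in the second week.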
OK, enough bashing percentiles. They have other good uses, but they are a big disadvantage when calculating ratings. Since the LJ Index faithfully tracks the actual behavior of library statistics, the scores tend to cluster the same way library statistics do:
The LJ Index is more informative than percentile-based rankings, and this added information helps us move a few steps forward. Of course, “all news is not good news,” and this approach brings other issues to light. Seeing the data more clearly helps us recognize their weaknesses and potential misbehavior. In particular, there is the challenging problem of whether very high per capita statistics (called outliers in statistical jargon) are valid. These can arise from very small service area populations, from libraries serving people well beyond their official service boundaries, from errors in data collection or reporting, or from other causes.
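As an illustration of how a tiny service population inflates a per capita figure, and of one common way to flag such values, here is a sketch using Tukey’s upper fence (Q3 + 1.5 × IQR). This rule of thumb is swapped in for illustration and is not a method the LJ Index is stated to use; the library names, circulation counts, and populations are hypothetical.

```python
from statistics import quantiles

# Hypothetical peer group: (name, annual circulation, service area population)
libraries = [
    ("Library A", 120_000, 20_000),
    ("Library B", 90_000, 18_000),
    ("Library C", 150_000, 25_000),
    ("Library D", 70_000, 10_000),
    ("Library E", 60_000, 1_500),  # very small official service population
]

def flag_high_per_capita(libraries, k=1.5):
    """Return names whose circulation per capita exceeds Tukey's upper fence,
    Q3 + k * IQR, computed within the peer group."""
    per_capita = {name: circ / pop for name, circ, pop in libraries}
    q1, _, q3 = quantiles(per_capita.values(), n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return [name for name, value in per_capita.items() if value > fence]

print(flag_high_per_capita(libraries))
```

Library E circulates less than any other library in the group, yet its per capita figure is far higher because of its small population. A flag like this only marks a value for review: it cannot say whether the cause is a data error or a library genuinely serving people beyond its boundaries.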
Problems like these are one of the reasons the LJ Index team decided to de-emphasize the scores and group the top-rated libraries into “star” categories. As the Library Journal article explains, the scores are not precise and there is a lot of “noise” in the underlying data. Better to take a bird’s-eye view of how libraries are arranged than to take exact scores too literally…or numerally!
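Grouping into tiers can be sketched as ranking by score and then assigning a star level per block of ranks. The tier size, the 5/4/3 star levels, and the scores below are hypothetical parameters for illustration, not the cutoffs Library Journal actually used, and `assign_stars` is a hypothetical helper.

```python
def assign_stars(scores, tier_size=10, tiers=(5, 4, 3)):
    """Rank libraries by score (highest first) and give the first tier_size
    libraries the first star level, the next tier_size the second, and so on.
    Libraries ranked below the last tier get None (no stars)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    stars = {}
    for position, name in enumerate(ranked):
        tier = position // tier_size
        stars[name] = tiers[tier] if tier < len(tiers) else None
    return stars

scores = {"A": 1210, "B": 1195, "C": 980, "D": 975, "E": 700}  # hypothetical
print(assign_stars(scores, tier_size=2))
```

Libraries A and B land in the same tier despite a 15-point difference, which is the point: small, noisy score differences stop mattering once libraries are grouped.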