I decided to move right on to my first 2014 post without delay. The reason is the knot in my stomach that developed while viewing the WebJunction webinar on the University of Washington iSchool Impact Survey. The webinar, held last fall, presented a new survey tool designed for gathering data about how public library patrons use library technology and what benefits that use provides them.
Near the end of the webinar a participant asked whether the Impact Survey uses random sampling and whether its results can be considered statistically representative. The presenter explained that the survey method is not statistically representative because it uses convenience sampling (a topic covered in my recent post). She confirmed that the data represent only the respondents themselves, and that libraries have no way of knowing whether the data accurately describe their patrons or community.
Then she announced that this uncertainty and the whole topic of sampling were non-issues, saying, “It really doesn’t matter.” She urged attendees to set aside any worries they had about using data from unrepresentative samples… [Read more...]
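To see why this does matter, here is a minimal simulation, with made-up numbers that have nothing to do with the actual Impact Survey: suppose 30% of a library's patrons use a given technology service, but technology-oriented patrons are far more likely to respond to an online convenience survey. The convenience sample then badly overstates the true rate, while a random sample lands near it.

```python
import random

random.seed(42)

# Hypothetical population of 10,000 patrons: 30% use the technology
# service (True), 70% do not (False).
population = [True] * 3000 + [False] * 7000
random.shuffle(population)

# Random sample: every patron is equally likely to be selected.
random_sample = random.sample(population, 500)

# Convenience sample: respondents self-select. Assume technology users
# are 5x as likely to take an online survey about technology.
weights = [5 if uses_tech else 1 for uses_tech in population]
convenience_sample = random.choices(population, weights=weights, k=500)

def pct(sample):
    """Percentage of the sample who use the technology service."""
    return 100 * sum(sample) / len(sample)

print("True population rate: 30.0%")
print(f"Random sample estimate: {pct(random_sample):.1f}%")       # close to 30%
print(f"Convenience sample estimate: {pct(convenience_sample):.1f}%")  # far above 30%
```

The respondents in the convenience sample are described perfectly well by their own answers; the problem is generalizing from them to the whole patron population.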
We all know that the main function of libraries is to make information accessible in ways that satisfy user needs. Following Ranganathan’s Fourth Law of Library Science, library instructions guiding users to information must be clear and simple in order to save the user’s time. This is why library signage avoids exotic fonts, splashy decorations, and any embellishments that can muddle the intended message. Library service that wastes the user’s time is bad service.
So I am baffled by how lenient our profession is when it comes to muddled and unclear presentations of quantitative information in the form of data visualizations. We have yet to realize that the sorts of visualizations that are popular nowadays actually waste the user’s time—bigtime! As appealing as these visualizations may be, from an informational standpoint they violate Ranganathan’s Fourth Law.
Consider the data visualization shown below from the American Library Association’s (ALA) Digital Inclusion study:
ALA Digital Inclusion project national-level dashboard. Click to access original dashboard.
This visualization was designed to keep state data coordinators (staff at U.S. state libraries) informed. The coordinators were called upon… [Read more...]
I want to tell you about a group of U.S. public libraries that are powerhouses when it comes to providing services to the American public. You might suppose that I’m referring to the nation’s large urban and county systems that serve the densest populations with large collections and budgets. These are the libraries you’d expect to dominate national library statistics. However, there’s a group of libraries with modest means serving moderate-size communities that are the unsung heroes of public library service provision. These are libraries with operating expenditures ranging from $1 million to $4.9 million.1 Thanks to their critical mass and their sheer numbers (there are 1,424 of them), these unassuming libraries pack a wallop in the service delivery arena.
Their statistical story is an interesting one. Let me introduce it to you by means of the patchwork graphic below containing six charts known as treemaps.
Click to view larger graphic.
From a data visualization standpoint treemaps (and pie charts also) have certain drawbacks that were identified in my prior post.2 Still, treemaps do have their place when used judiciously. And their novelty and color are refreshing. So, let’s go with them!
1 Based on the Public Libraries in the United States Survey, 2011, Institute of Museum and Library Services.
2 It was Willard Brinton who identified the problem in his 1914 book. In my prior post scroll down to the sepia graphic of squares arranged laterally. There you see Brinton’s words, “The eye cannot fit one square into another on an area basis so as to get the correct ratio.” Bingo. With treemaps this is even more problematic since a single quantity in the data can be represented by different-shaped but equivalent rectangles—stubby ones or more elongated ones. You’ll see in the examples that it is impossible to visually determine which of two similarly-sized rectangles is larger. This difficulty also applies to pie wedges.
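Brinton’s point can be made concrete with a toy calculation (the tile dimensions here are hypothetical, not taken from the charts above): two treemap tiles can encode the same quantity with very different shapes, and a slightly larger tile can still look smaller because the eye judges edges, not areas.

```python
# Three hypothetical treemap tiles as (width, height) pairs.
# Tiles A and B encode the SAME quantity (area 24) in different shapes;
# tile C encodes a larger quantity (area 27) in a stubbier shape.
tiles = {
    "A (elongated)":  (12.0, 2.0),
    "B (square-ish)": (6.0, 4.0),
    "C (stubby)":     (5.4, 5.0),
}

for name, (w, h) in tiles.items():
    print(f"{name}: {w} x {h} -> area {w * h:g}, aspect ratio {w / h:g}")

# A and B have identical areas (24 = 24), yet no edge of B matches an
# edge of A, so the eye cannot "fit one square into another." C is
# 12.5% larger than A but has no edge as long as A's 12-unit edge.
```

The same arithmetic applies to pie wedges: equal angles at different radii, or similar areas in different orientations, defeat visual comparison.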
Even with the promises of big data, open data, and data hacking it is important to remember that having more data does not necessarily mean being more informed. The real value of data, whatever its quantity or scope, comes from the question(s) the data can help answer.
There are various reasons any given set of data might or might not provide reliable answers, the most basic being the data’s accuracy. Clever new technologies that scan, scrape, geocode, or mobilize loads of data aren’t much use if the data are wrong. All we end up with is scanned, scraped, geocoded, and mobilized misinformation. Garbage in, garbage out, as they say.
Getting from data to answers requires understanding the meaning of the data and its relevance to our questions. With statistical data much of this meaning and relevance depends on three particular ideas:
- How the data were selected
- The group/population researchers are trying to learn about
- How these two relate
I am here to tell you that if you master these ideas your statistical knowledge will immediately quadruple! [Read more...]
I think I’m getting jaded. I am beginning to wonder whether lobbying for balanced reporting of evaluation and research findings is a waste of time. With voices more influential than mine weighing in on the opposite side, I’m having trouble staying positive. Granted, I do find inspiration in the work of people much wiser than me who have confronted this issue. One such source is my favorite sociologist, Stanislav Andreski, who wrote the following in his book, Social Sciences as Sorcery:
In matters where uncertainty prevails and information is accepted mostly on trust, one is justified in trying to rouse the reading public to a more critical watchfulness by showing that in the study of human affairs evasion and deception are as a rule much more profitable than telling the truth.1
The problem is, wisdom like Andreski’s languishes on dusty library shelves and the dust-free shelves of the Open Library. Much more (dare I call it?) airtime goes to large and prestigious institutions that are comfortable spinning research results to suit their purposes.
Fortunately, I am not so demoralized as to pass up the opportunity to share yet another institution-stretching-the-truth-about-research-data story with you. This involves an evaluation project funded by the Robert Wood Johnson Foundation and conducted by Mathematica Policy Research and the John W. Gardner Center for Youth and Their Communities at Stanford University. [Read more]
It never hurts to revisit the basics of a method that we’ve chosen to apply to a task we want to accomplish or a problem that needs solving. So, the recent announcement of the Library Edge benchmarks is a good occasion to discuss that particular performance assessment method. In the third edition of his book, Municipal Benchmarks, University of North Carolina professor David Ammons describes three types of benchmarking:1
1. Comparison of performance statistics
2. Visioning initiatives
3. “Best practices” benchmarking
The idea behind item #1 is that the sufficiency of an organization’s performance can be judged by comparing its performance data with other organizations’ or against externally defined standards. Comparing different organizations using only performance data, without any reference to standards, is called comparative performance measurement. An Urban Institute handbook of the same name by Elaine Morley, Scott Bryant, and Harry Hatry gives an in-depth explanation of this method.2 [Read more...]
1 Ammons, D. N. (2012). Municipal benchmarks: Assessing local performance and establishing community standards, Armonk, NY: M.E. Sharpe, p. 15.
2 Morley, E., Bryant, S. P., & Hatry, H. P. (2001). Comparative performance measurement, Washington, DC: The Urban Institute Press.
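As a rough sketch of how comparative performance measurement works in practice (all figures below are invented for illustration): raw totals favor big libraries, so comparisons typically normalize an output measure by community size before ranking.

```python
# Hypothetical annual statistics for three libraries of different sizes.
libraries = {
    "Library A": {"circulation": 820_000, "population": 110_000},
    "Library B": {"circulation": 300_000, "population": 30_000},
    "Library C": {"circulation": 1_500_000, "population": 250_000},
}

# Normalizing to circulation per capita puts differently sized
# libraries on a common scale for comparison.
per_capita = {
    name: d["circulation"] / d["population"] for name, d in libraries.items()
}

# The smallest library by raw totals can lead once the measure is
# normalized.
for name, value in sorted(per_capita.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.1f} circulations per capita")
```

Note that this comparison alone says nothing about whether any of the three meets an external standard; that is what distinguishes comparative performance measurement from standards-based benchmarking.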
To begin this episode I want to introduce you to a couple of historical ideas on best practices in graphical data presentation—or using the more modern term, data visualization. (The peculiar title I’ve chosen comes from this history. Read on to see what it means.) Then I’ll step through a redesign of a bar chart to show you how effective graphical simplicity can be.
In the 1980s and 1990s statistical graphing experts Edward Tufte, William Cleveland, and Howard Wainer were promoting fair and clear designs for statistical charts.1 Nearly seventy years earlier American engineer Willard C. Brinton was doing the same thing in his 1914 book, Graphic Methods for Presenting Facts. Here’s a figure from the book:
Source: Brinton (1914), Graphic Methods for Presenting Facts, p. 21
Note that in his figure Brinton advocated for “accuracy of statement.” He did the same in this next… [Read more]
1 A more recent and quite definitive book on best practices in data visualization is the second edition of Stephen Few’s book, Show Me the Numbers.