Friday, January 09, 2009

Why Reference and Instruction Librarians Hate Federated Searching

Federated searching has often been billed by vendors as the Holy Grail in the age of Google. Perhaps this has raised our collective professional expectations of federated search products to an unreasonably high level, because its reception among reference and instruction librarians can be described as lukewarm at best. In this pool of dissatisfied librarians, I include my former reference and instruction librarian self, my esteemed reference and instruction colleagues at the Auraria Library, and, more scientifically, the respondents of a survey conducted by Lynn Lampert and Katherine Dabbour at California State University Northridge.1 The short answer for the lukewarm reception is that expectations of federated searching are built on expectations of the products that it is designed to federate. The longer answer to this question envelops these technical issues and also the dissonance between what federated searching purports to do and the pedagogical roles of reference and instruction staff in the library.

While much has changed in the realm of federated searching in recent years, there are still a handful of technical shortcomings that are hard to swallow. Note that the following discussion does not target any specific product; my personal experiences with the products and customer support of WebFeat (2003) and Serials Solutions 360 Search (2007) were both very positive, in spite of my initial distaste for federated searching generally. The shortcomings discussed here, whether real or perceived, are more or less endemic to this type of product regardless of brand at this time. These include lack of features, inability to search all databases, speed, and unmet performance expectations.

Date and peer review filters are now standard on most online databases, and these features have, logically, become embedded in reference and instruction routines. While Serials Solutions’ 360 Search and other products are technically capable in and of themselves of applying these limits to searches, limitations in the metadata that database vendors can currently provide render date and peer-reviewed filters useless. Serials Solutions’ 360 Search currently supports peer-reviewed filtering, but technical support recommends avoiding it because filtering a list of results for peer-reviewed articles usually results in zero results. Further, users may include a single specific year in their search terms; however, searching a range of dates is not yet supported for the same reason that the peer-reviewed filter does not work. (To reiterate, these are not flaws in the 360 Search product itself but conditions of the current technology.)

Secondly, in spite of what product literature may claim, no federated searching product can (or in some cases, should) currently search every online resource. Again, this is not necessarily a reflection on the quality of the search product itself. A database is commonly excluded from a federated search for one of three reasons: vendor prohibition, the lack of an existing “translator” for the resource, or a limited number of concurrent users for the resource.

First, notable holdouts among vendors who do not permit their clients to federate some or all of their products include Hoover’s, InfoUSA, and content giant LexisNexis. Currently, libraries may offer only LexisNexis Academic through a federated interface. Libraries sold on the concept of federated search have expressed their dismay about exclusion to holdout vendors, and they have also considered looking for equivalent content in other online resources that will allow inclusion.

Secondly, the nature of the electronic resources market creates a demand for constant creation by federated search vendors of the “translators” that allow a resource to be processed in federated search. Therefore, while it may be technologically possible for the federated search product to work with a given resource, if the vendor has not developed a translator yet, that resource will be effectively wait-listed. If the resource is local or highly specialized, the client library may have to wait until more client libraries request inclusion of a resource to increase the demand for the translator. In the case of the Auraria Library, this meant exclusion of Prospector, the unified catalog for the Colorado Alliance of Research Libraries consortium, which is used heavily for interlibrary loan.

Third, technical support generally recommends that resources for which a library has a limited number of concurrent users be excluded from federated searching. If a subscription to a resource with limited concurrent users is included, it is unlikely that any user would ever be able to successfully access the product. “Use” of the database in this context begins not when a user clicks a link to an item in one of these resources, but when the resource is included in a federated search. In some cases, changing the subscription to the electronic resource to unlimited use may be an option; however, for other resources, this is prohibitively expensive. Some favorite high-quality resources of reference and instruction staff fall into this third category, as purchasing a resource with a low number of concurrent users may be a factor in licensing an expensive product at all.

Deciding how to handle all of the orphaned resources created by vendor exclusion, lack of an existing translator, or a limited number of concurrent users in a customized implementation of a federated search product can be quite difficult. Links to the excluded resources’ native interfaces can be included in an A-Z list within the federated environment or on appropriate subject guides or other pertinent web pages, but even so, they may be easily overlooked by patrons seeking a quicker—that is to say, federated—method of searching, such as selecting a bundle of resources grouped around a subject area, e.g., “Art & Architecture.” From a reference and instruction perspective, it can be difficult to market and encourage use of a new search feature that omits some of the best and most recommended resources.

A further issue is search speed. With the explosion in the number of online resources made available in recent years, many libraries have gradually outgrown their network infrastructure, and federated searching can, in a worst-case scenario, push network capabilities to the breaking point because of the increase in traffic it can cause. However, even in a healthy network environment, the simple fact that a federated search product is doing more work than a search in a database’s native interface also accounts for this extended search time. Waiting a minute or more for a search to grind away can create an awkward lull at the reference desk or in front of an information literacy session, which does little to instill confidence in the minds of the library staff and patrons.

Finally, there are behaviors with federated search products that are simply unexpected. A particular use of the software that sounds brilliant in theory sometimes does not prove effective in practice. For example, reference and instruction staff at Auraria were asked to draw up a list of ten or so resources that would be included in a general-focus “Quick Search” box on the Library’s home page. Eleven databases plus the library catalog were chosen for inclusion, and staff were excited by the potential of offering results to general queries from these resources from a search box on the home page. However, in practice, the result was disappointing. The results returned from the fastest resource were the results on top of the pile, and of the twelve resources chosen, PsycINFO routinely returned results first. Reference and instruction staff rightly felt that this skewed the results for a general query; therefore, the fate of this feature is under discussion. Perhaps expectations such as this are a bit unfair given the nature of the beast; however, anyone considering investigating federated searching would do well to manage expectations with library staff ahead of time by describing the above issues.
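The "fastest resource on top" behavior can be illustrated with a small sketch. This is not how any particular vendor merges results; the response times, result titles, and function names below are invented for illustration, and the round-robin alternative is simply one possible way to keep a single fast resource from dominating the first screen.

```python
from itertools import zip_longest


def merge_by_arrival(responses):
    """Stack results in the order the sources finished responding."""
    merged = []
    # Sort sources by response time, fastest first, then append their results.
    for source, _seconds, titles in sorted(responses, key=lambda r: r[1]):
        merged.extend((source, title) for title in titles)
    return merged


def merge_round_robin(responses):
    """Take one result from each source in turn (a hypothetical alternative)."""
    columns = [[(source, title) for title in titles]
               for source, _seconds, titles in responses]
    # zip_longest pads shorter lists with None, which is filtered out here.
    return [item for row in zip_longest(*columns)
            for item in row if item is not None]


# Hypothetical responses: (source name, seconds to respond, result titles).
responses = [
    ("Library Catalog", 4.0, ["Catalog record 1", "Catalog record 2"]),
    ("PsycINFO", 1.2, ["Psych article 1", "Psych article 2"]),
    ("Academic Search Premier", 2.5, ["General article 1", "General article 2"]),
]

arrival = merge_by_arrival(responses)
interleaved = merge_round_robin(responses)
```

With these made-up timings, the arrival-order list opens with both PsycINFO results, while the interleaved list opens with one result from each of the three sources.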

While these current technical shortcomings are a large part of the dissatisfaction in the reference and instruction department, there are philosophical and pedagogical issues as well. One of the primary concerns of reference and instruction staff is that federated searching dumbs down the research process, which is, of course, antithetical to the very existence of reference and instruction. All of the controlled vocabulary and carefully constructed indexes behind online resources are tossed out the window; the results returned from a certain resource via the federated interface may be of a lesser quality than those returned from a search in that resource’s native interface. In the words of a colleague, federated searching “removes many kinds of academic research drills and routines one or more steps from reality.”

Further, federated searching products bring no content into either the physical or virtual library. Reference and instruction librarians quite understandably crave content with which to fulfill their reference and instruction duties. With product price tags in the tens of thousands and budgets shriveling, buying a tool that does not stand up to staff expectations and brings no more content into the library seems foolish. Contrariwise, consider that libraries spend tens of thousands of dollars on online resources that have terribly unfriendly user interfaces, even for information professionals, yet are the sole online provider for crucial resources.

Finally, in terms of personal use—whether for conducting one’s own research or while assisting patrons—using federated searching feels a bit like putting the training wheels back onto the bicycle. Reference librarians know or can surmise which resources will likely yield good results for a given query, and they proceed to what they know are “the usual suspects” in the lineup of electronic resources. Because of this expertise, it is difficult to use federated searching instinctively in reference and instruction. While I am now a systems librarian and no longer work at the reference desk, an undergraduate asked me in passing recently how to find a “professional article” in a biology journal. I found myself directing her to the “biology” drop down menu in our homegrown directory of databases, ultimately offering her a choice of BioOne, BIOSIS, ScienceDirect, and Web of Science. I added as a footnote that she could search multiple resources simultaneously from the Biology subject guide—but not Web of Science because it was not included in our federated search setup because of our limited number of concurrent users. Not to mention the peer-reviewed filter issue so that she’d get a “professional” article. And there’s the rub: explaining the shortcomings of the federated search box on the Biology guide was more difficult than simply pointing her to a couple of “best bet” choices in the first place. Am I going to take that search box off of the Biology guide? No.

Given the above, why are federated search products still on the market, and why are libraries still contracting with vendors? What has changed my own mind in the last five years, transforming me from a reluctant community college reference librarian fighting it tooth and nail to a web librarian petitioning the university’s budget priorities committee for extra funds to pay for it? Two things: usability testing and developments in federated search products themselves.

Web usability testing, which is rightly becoming ubiquitous in libraries, has shown that patrons have vastly different mental models of the world of information than librarians. Personally, this became painfully clear while watching a graduate student at Georgetown University time out after three minutes while trying to decipher from the library’s home page where to find a scholarly article about Descartes. He vacillated between the links on the library’s home page for 180 agonizing seconds and, in the end, never made a choice.

The primary benefit of federated searching products at this time is their use as a discovery tool. Libraries have historically had difficulty marketing and presenting in an intuitive fashion what is the heart of the virtual library: subscribed electronic content, which accounts for the lion’s share of our annual budgets. While not inexpensive, federated search products are now offered with a number of pricing options, and even if a library chooses to federate as many resources as possible, the annual price tag will still likely be less than one percent of the total annual expenditure for electronic resources, a small price to pay for what can be a large return on a very large investment. Additionally, implementing federated searching as a discovery tool can restore patrons’ faith in their ability to find what they need when they come to their library’s web site versus an open Internet portal such as Google or Yahoo. Libraries will only continue to offer more online content in the coming years, and federated search provides a way to present sensible options for patrons as the number of online resources continues to grow.

The products themselves, including the implementation process, have also evolved quite a bit in recent years. Early vendor offerings were clunky behemoths that required a local server installation and took months, sometimes years, to prepare for patron use. Now, vendors typically offer more lightweight hosted options that are ideal for libraries whose local technology resources are limited or lacking. Serials Solutions’ promised (and, in the case of the Auraria Library, delivered) six-to-eight-week turnaround from contract to launch is a dramatic improvement over the years-long implementation time. Generally, vendor support during implementation is better, with the vendor doing more of the setup work.

In terms of technical improvements, vendors have found effective ways of deduplicating results, which was an early Achilles’ heel. The practice of HTML screen scraping—analyzing the output of a database by “reading” the HTML of the results page—is being replaced by more highly structured XML Gateway technology, which improves the quality of the results returned. Additional features like clustered results, integration with other tools, supported web integration services, and impressive administration and statistics modules are now available among the various products. Notable examples of these features are Serials Solutions’ clustered results feature in 360 Search and Info-Graphic’s administrative module for their AGent product.
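To make the deduplication idea concrete, here is a minimal sketch. It assumes, purely for illustration, that two records are duplicates when their normalized titles and publication years match; the vendors' actual matching rules are not described here and are likely more sophisticated. The record fields and sample data are hypothetical.

```python
import re


def normalize(title):
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize(record["title"]), record["year"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical results returned for one query by two different databases.
records = [
    {"title": "Descartes and the Mind-Body Problem", "year": 2005, "source": "Database A"},
    {"title": "Descartes and the mind body problem.", "year": 2005, "source": "Database B"},
    {"title": "Descartes on Method", "year": 2001, "source": "Database A"},
]

unique = deduplicate(records)
```

In this sample, the first two records differ only in capitalization and punctuation, so only the copy from Database A is kept alongside the third record.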

Further, customization options allow a federated search box to be dropped into the code of any page on a library’s web site. Rather than being placed in parallel in a library web directory with other electronic resources, the technology can be used to overlay an entire web site to provide more immediate access to online resources. In addition to providing a single-search box on the library’s home page, many libraries are using federated search products to enhance the more traditional and usually home-grown A-Z/subject directories of databases as well as library guides and pathfinders. Almost any combination of resources and audience is possible. For example, an “English 101” bundle in an academic library could include EBSCO Academic Search Premier, Gale’s General OneFile, and LexisNexis Academic. Code for this particular search could be embedded on a class guide for English composition courses so that less time could be spent training students on the individual interfaces. More time could then be spent discussing topic selection and refinement, Boolean logic—which is supported by federated search products—and other important research concepts.

Federated searching is not the Holy Grail, at least not yet. It does not automatically make library patrons better searchers or researchers, but thankfully, reference and instruction librarians do. When implemented as a discovery tool, federated searching can successfully connect patrons with subscribed online content. It is another tool that we can put in our patrons’ research toolboxes, even if we still prefer not to use it ourselves until future developments, we hope, improve on what reference and instruction librarians find lacking. Even with all of its current technical shortcomings, federated searching provides a means of presenting our hyperstructured universe, with all of its semi-secret classification schemes and codes, to our customers in a way that they not only understand, but have come to expect.

Notes:
1. Lampert, Lynn D. and Katherine S. Dabbour, “Librarian Perspectives on Teaching Metasearch and Federated Search Technologies,” in Federated Search: Solution or Setback for Online Library Services, ed. Christopher Cox (Binghamton, NY: Haworth, 2007) 253-78.