Better Living Through Taxonomies
Published on February 5, 2008
Large websites and intranets can benefit from improved methods of search and navigation. These include site maps, A-Z indexes, sophisticated search engines, and generally improved navigational design. A well-planned taxonomy can play a role in all of these methods.
What is a taxonomy?
Because there are different kinds of taxonomies and different definitions of the word, it is understandable that people are uncertain about what a taxonomy actually is. When we think of a taxonomy, what often comes to mind is a tree-type hierarchy of terms for classifying things such as plants or animals. The word taxonomy originally meant the science of classifying things, but it has since become a popular term for any hierarchical classification or categorization system. Thus, we no longer speak of taxonomy as a science, but rather of a taxonomy (plural: taxonomies) as a kind of controlled vocabulary that has a hierarchy. The taxonomy terms are arranged or linked so that narrower/more specific/child terms fall under broader/more generic/parent terms. An example of a term hierarchy might be:
- Computers and internet
  - Internet software
    - World Wide Web software
      - HTML editors
A controlled vocabulary, at a minimum, is “a restricted list of words or terms used for indexing or categorizing”. Most controlled vocabularies have the additional feature of cross-references pointing from a non-preferred term (synonym, variant, alternate spelling, etc.) to the preferred term. The objectives of a controlled vocabulary are to ensure consistency in indexing, tagging, or categorizing and to guide the user to the desired information.
Different kinds of taxonomies and controlled vocabularies are known by different names. If a controlled vocabulary is merely a list of concepts with their synonyms, and none of the synonyms is designated as the preferred term, it is called a synonym ring. A more complex controlled vocabulary, with hierarchical (broader-term/narrower-term), related-term, and used-for relationships (and often also term notes) is called a thesaurus. A more complex thesaurus with customized semantic relationships such as “located in”, “utilized for the purpose of”, “is a member of”, “is owned by”, etc., is known as an ontology. A taxonomy that consists of multiple smaller hierarchies, one for each type of term or characteristic (or facet), which can be searched in combination, is called a faceted taxonomy. (Typical facets might be People, Organizations, Topics, Products, Locations, and Activities.)
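The relationships that distinguish a thesaurus from a plain term list can be pictured as a small data structure. The following is only a minimal sketch in Python; the class names, field names, and sample terms are illustrative assumptions, not the format of any particular taxonomy tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One preferred term in a thesaurus-style controlled vocabulary."""
    name: str
    broader: set = field(default_factory=set)    # parent (broader) terms
    narrower: set = field(default_factory=set)   # child (narrower) terms
    related: set = field(default_factory=set)    # associated, non-hierarchical terms
    used_for: set = field(default_factory=set)   # non-preferred terms (synonyms, variants)
    note: str = ""                               # optional term note


class Thesaurus:
    """Hypothetical container that keeps terms and their relationships."""

    def __init__(self) -> None:
        self.terms = {}

    def add(self, name: str) -> Term:
        # Create the term on first use, otherwise return the existing one.
        return self.terms.setdefault(name, Term(name))

    def link_hierarchy(self, broader: str, narrower: str) -> None:
        # Record the broader/narrower pair in both directions.
        self.add(broader).narrower.add(narrower)
        self.add(narrower).broader.add(broader)

    def add_synonym(self, preferred: str, non_preferred: str) -> None:
        # Cross-reference from a non-preferred term to its preferred term.
        self.add(preferred).used_for.add(non_preferred)

    def preferred_for(self, word: str) -> Optional[str]:
        # Resolve a word to its preferred term, or None if it is unknown.
        for term in self.terms.values():
            if word == term.name or word in term.used_for:
                return term.name
        return None
```

A synonym ring would need only the synonym sets, with no term marked as preferred; an ontology would add further, custom relationship types beyond the three shown here.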
Recently the term taxonomy has also become popular as the term for any kind of controlled vocabulary or classification system, whether a simple standard glossary, a highly structured ontology, or anything in between. A taxonomy is thus a hierarchical classification scheme, a controlled vocabulary, or both. Both of these characteristics make a taxonomy useful for improving the navigation and findability of a website.
Hierarchies and information architecture
In print, pages of content are usually arranged in a linear sequential form; in digital format, pages of content are often arranged hierarchically. Obviously a linear format will not be followed in hyperspace, where users can click in any direction they want. But why a hierarchy? People “surf” the Web instead of reading everything in one place; the main ideas, overviews, and executive summary pages are the top level pages in a website, whereas more specific, detailed topics are at lower levels, requiring more clicks. This also occurs (to a limited degree) in print in daily newspapers, where the main headlines are on the front page with stories continued on later pages. The structural design (i.e. information architecture) of websites, therefore, has adopted a hierarchical format with main pages branching from the home page, and lower level pages branching from each of the main pages in an inverted tree-like structure. The structure is reflected in the navigation menu, often with second-level pages in pull-down sub-menus, and in the site map.
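The inverted tree described above can be sketched in a few lines of Python. The page names and the `SITE` mapping are made-up assumptions for illustration; the sketch simply derives an indented site map from the hierarchy and counts how many clicks a page lies from the home page.

```python
from collections import deque

# Hypothetical site structure: each page maps to its child pages.
SITE = {
    "Home": ["Products", "Support"],
    "Products": ["Laptops", "Printers"],
    "Support": ["Downloads"],
}


def site_map(page, tree, depth=0):
    """Return the hierarchy under `page` as indented list lines."""
    lines = ["  " * depth + "- " + page]
    for child in tree.get(page, []):
        lines.extend(site_map(child, tree, depth + 1))
    return lines


def clicks_from_home(target, tree, root="Home"):
    """Breadth-first search for the depth of `target` below `root`.

    Assumes the structure is a true tree (no page links back to an ancestor).
    """
    queue = deque([(root, 0)])
    while queue:
        page, depth = queue.popleft()
        if page == target:
            return depth
        queue.extend((child, depth + 1) for child in tree.get(page, []))
    return None  # page is not in the hierarchy


print("\n".join(site_map("Home", SITE)))
```

The same mapping could drive both the navigation menu and the site map, which is one reason to settle the hierarchy before building either.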
It goes without saying, then, that developing a good hierarchical structure is important for creating a well-designed and easy to navigate website. By understanding the fundamentals and best practices of taxonomy development, web designers and information architects can design better websites. This involves knowing whether concepts or topics are indeed of a broader-narrower (parent-child) relationship and not merely an associated relationship. A concept can be narrower to another concept only if it is a kind of, instance of, or part of the broader concept.
Controlled vocabularies and findability
Another application of taxonomies in the digital medium is to aid findability and search. In addition to the hierarchical nature of the taxonomy, the naming of the terms or concepts must be carefully considered. Terms should be unambiguous and consistent in style. They may also need to be concise to fit within menu labels. Terms used on menu labels should be used consistently in page titles and other areas of a site; this consistency aids in the navigation of a site.
When we speak of “search” we probably think of a search engine. For many of us, how a search engine works is some kind of mathematical mystery; there are algorithms for the frequency, location, and proximity of words and phrases, but often what it comes down to is having the words or phrase entered by the user match words or phrases in the text. If, however, a synonym or different wording is used, such as “car” instead of “automobiles”, relevant pages will be missed. A taxonomy, at least of the synonym ring type, can significantly improve search engine results. Instead of merely going straight from the user-entered search-string to the text, the search engine first searches for the word or phrase on a synonym look-up table, whose terms are in turn matched to the text of the documents. A synonym-ring controlled vocabulary is typically not seen by the user.
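The synonym look-up step can be sketched as follows. This is a simplified illustration in Python, not how any specific search engine implements it; the rings, the sample documents, and the function names are all assumptions.

```python
import re

# Hypothetical synonym rings: each set groups interchangeable words or phrases.
SYNONYM_RINGS = [
    {"car", "cars", "automobile", "automobiles", "auto"},
    {"laptop", "laptops", "notebook computer"},
]


def build_lookup(rings):
    """Map every ring member to the full ring it belongs to."""
    lookup = {}
    for ring in rings:
        members = frozenset(word.lower() for word in ring)
        for word in members:
            lookup[word] = members
    return lookup


def expand_query(query, lookup):
    """Return the query plus all of its synonyms (just the query if unknown)."""
    q = query.strip().lower()
    return set(lookup.get(q, {q}))


def search(query, documents, lookup):
    """Return ids of documents containing the query or any synonym as a whole word."""
    variants = expand_query(query, lookup)
    hits = []
    for doc_id, text in documents.items():
        if any(re.search(rf"\b{re.escape(v)}\b", text, re.IGNORECASE) for v in variants):
            hits.append(doc_id)
    return hits


documents = {
    "a": "Used automobiles for sale",
    "b": "Careful driving tips",
    "c": "Rent a car today",
}
lookup = build_lookup(SYNONYM_RINGS)
print(search("car", documents, lookup))
```

Without the look-up table, a search for "car" would find only document "c"; with it, document "a" is retrieved as well, while "b" is still excluded because "Careful" is not a whole-word match.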
A user-browsable taxonomy used with a search engine is another possibility. If browsed by the user, the hierarchical nature of a taxonomy is then also important.
Higher-end search engines may come with a taxonomy creation and maintenance feature, but, except for a very few customized enterprise search systems, a search engine rarely comes with a pre-built taxonomy. Even the simplest search engines, including freeware, provide the option to configure the search to target certain fields, including the keyword meta tags. If you configure your search engine to search only the keyword tags and not the full text, and you ensure all pages are properly and consistently tagged with keywords, you can reap the benefits of a taxonomy. Note that the keywords need to be selected by the (human) taggers from a controlled vocabulary list to ensure consistency in application and retrieval.
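Consistent tagging is easier to enforce if pages are checked against the controlled vocabulary. The following Python sketch, using only the standard library's `html.parser`, reads the keywords meta tag of a page and reports any keyword that is not on the approved list. The vocabulary, the sample page, and the function names are assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical controlled vocabulary of approved keywords.
CONTROLLED_VOCABULARY = {
    "HTML editors",
    "Internet software",
    "World Wide Web software",
}


class KeywordExtractor(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def unapproved_keywords(html, vocabulary):
    """Return the page keywords that are not in the controlled vocabulary."""
    parser = KeywordExtractor()
    parser.feed(html)
    return [k for k in parser.keywords if k not in vocabulary]


page = '<html><head><meta name="keywords" content="HTML editors, Web tools"></head></html>'
print(unapproved_keywords(page, CONTROLLED_VOCABULARY))
```

Here "Web tools" would be flagged, prompting the tagger to pick the approved term instead.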
A database is not necessary to make use of a taxonomy, but certain kinds of content are better categorized and searched through a database structure. A taxonomy that is based on facets and supports a simultaneous query on multiple facets in combination (such as activity, product, and location) requires that its content be stored in a database.
What kind of sites are best candidates for taxonomies?
The design of any site can benefit from the basics of classification and categorization, although complex taxonomies are not needed for a small site. Large sites, though, such as intranets and web sites of large diversified corporations, government agencies, and universities can definitely benefit from a carefully planned taxonomy to aid both navigation and search. Sites that are collections of articles, images, or other data records can also be more easily searched with a taxonomy.
Ecommerce sites, in which each product is categorized, are excellent candidates for faceted taxonomies. There can be facets for product type, purpose, customer-type, price-range, and other characteristics such as color or size. The user chooses a term/value from each facet and searches for results that meet the requirements of all chosen facets combined.
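A combined query across facets can be sketched with a small database table. The example below uses Python's built-in `sqlite3` module with an in-memory database; the table layout, facet names, and products are invented for illustration and do not reflect any particular ecommerce platform.

```python
import sqlite3

# Hypothetical facet columns; only these may appear in a query.
FACETS = ("product_type", "customer_type", "color")


def make_catalog():
    """Build a tiny in-memory product table with one column per facet."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products "
        "(name TEXT, product_type TEXT, customer_type TEXT, color TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [
            ("Trail jacket", "jacket", "women", "red"),
            ("City jacket", "jacket", "men", "black"),
            ("Trail boots", "boots", "women", "red"),
        ],
    )
    return conn


def faceted_search(conn, **selected):
    """Return product names matching every selected facet value."""
    unknown = set(selected) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    # Column names come from the FACETS whitelist; values are bound as parameters.
    clauses = [f"{facet} = ?" for facet in selected]
    sql = "SELECT name FROM products"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, list(selected.values()))]


conn = make_catalog()
print(faceted_search(conn, product_type="jacket", color="red"))
```

Each additional facet the user selects adds one more condition, so the result set narrows with every choice.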
Consumer-oriented sites that do not include an ecommerce feature might utilize a single hierarchical taxonomy instead, such as the category browse on the Verizon Superpages site.
How are taxonomies created?
While many content management systems and search engines support the integration of a taxonomy, most do not include full capabilities to create and manage a taxonomy within the system. Rather, the taxonomy must be created offline in a different tool and imported into the CMS or search engine.
For a small, simple taxonomy, you can create lists of terms in Excel spreadsheets for importing, but a more sophisticated taxonomy tool is needed if you want to support multiple non-preferred terms (synonyms) for each term, related-term relationships, term notes (definitions), more than one broader-term relationship for the same term (polyhierarchies), and attributes or categories for terms. A dedicated taxonomy tool will also check for duplicate terms, unlinked terms (orphans), and circular relationships, and adjust relationships if a term is re-named or deleted.
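The integrity checks mentioned above (duplicates, orphans, and circular relationships) can be illustrated with a short Python sketch. It is not how any of the dedicated tools work internally; the data layout, in which each term maps to a set of broader terms so that polyhierarchies are allowed, and the function names are assumptions.

```python
def find_duplicates(term_names):
    """Return names that occur more than once, ignoring case and outer spaces."""
    seen, duplicates = set(), set()
    for name in term_names:
        key = name.strip().lower()
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


def find_orphans(terms, broader, top_terms):
    """Return terms that have no broader term and are not designated top terms."""
    return {t for t in terms if not broader.get(t) and t not in top_terms}


def find_cycle_terms(broader):
    """Return terms that can reach themselves by following broader-term links."""
    in_cycle = set()
    for start in broader:
        stack, visited = list(broader.get(start, ())), set()
        while stack:
            current = stack.pop()
            if current == start:
                in_cycle.add(start)
                break
            if current in visited:
                continue
            visited.add(current)
            stack.extend(broader.get(current, ()))
    return in_cycle


# Hypothetical data: "Internet software" wrongly lists a narrower term as broader.
broader = {
    "HTML editors": {"World Wide Web software"},
    "World Wide Web software": {"Internet software"},
    "Internet software": {"HTML editors"},
}
print(sorted(find_cycle_terms(broader)))
```

A spreadsheet will not catch these problems on its own, which is why checks of this kind are worth running before a taxonomy is imported into a CMS or search engine.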
Affordable desktop taxonomy/thesaurus creation tools that can export complex taxonomies as XML or HTML include MultiTes, Webchoir TCS-10, and Term Tree 2000. Higher-end multi-user systems combine distributed taxonomy creation with indexing features; these include Synaptica, WordMap, SchemaLogic, and Data Harmony’s Thesaurus Master. Finally, there are several enterprise search engines that feature auto-categorization in combination with taxonomies.
Use of the software alone does not necessarily result in a usable taxonomy, just as a good HTML editing tool does not guarantee a good website if the user is not skilled in the techniques of website design. A good taxonomy requires additional skills and knowledge in categorization practices. The best qualified people to build taxonomies are those with a background in library and information science, or experience in creating thesauri for indexed article retrieval. Often indexers of books or periodical articles transition well into taxonomy work. Other individuals could make good taxonomists if they have the right analytical skills, take a course or workshop on taxonomy creation, and read up on the field.
If you want to start a taxonomy project for your organization, first check with your corporate librarian, if you have one. If you have a large taxonomy project, especially one that involves more content than just what’s on a website, there are several taxonomy consultancies that can help you. If you have a smaller taxonomy project that you can manage but not execute yourself, you can contract a freelance taxonomist.
Whether your goal is to improve the organizational structure of your website or boost the effectiveness of the search engine on your site, a taxonomy of one kind or another can be of great help.
- Taxonomy Community of Practice Wikispace
- Taxonomies & Controlled Vocabularies SIG
- National Information Standards Organization: ANSI/NISO Standard Z39.19 – Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
- Introductory Tutorial on Thesaurus Construction
- Construction of Controlled Vocabularies: A Primer
- Taxonomy Community of Practice Yahoo group
Heather Hedden is an information taxonomist with Viziant Corporation in Boston, MA. She also does consulting through Hedden Information Management and teaches continuing education workshops through Simmons College Graduate School of Library and Information Science.