The search isn't broken, we're broken - Part II: Intranet Search & Taxonomy

by Toby Ward. In Part I of our two-part look at how people processes and rules can enhance search effectiveness, we examined today’s search challenges and how a taxonomy and meta tagging can improve the search process without scrapping the search engine.


Information Overload

Information overload is eroding employee productivity. Recent studies report that the average corporate employee spends 25-35 per cent of their productive time searching for the information they need to do their day-to-day job. “Our ability to create information has substantially outpaced our ability to retrieve relevant information,” claims a recent Delphi Group report (Taxonomy & Content Classification, 2002). Some estimates put the total at roughly 250 megabytes of information for every human being, and the glut is growing.
The information vacuum once created by ‘iceberg management’ practices has been turned inside out. Where users once struggled to find enough information, they are now drowning in it. Search technology has also created an ironic paradox: regardless of the product or the user’s skill with it, an effective search requires the user to know the right terms before typing them into the search engine. As a result, intranet and Internet users are increasingly frustrated by their inability to find what they need in a timely manner.

Autocategorization & Taxonomy Solutions

The current challenge is sorting, retrieving and filtering this over-abundance of information. In response, a new breed of software has emerged in recent years that searches and indexes content from any number of sources on your corporate intranet (or Internet site) and sorts it into relevant categories. In other words, these products examine your site and automatically build a directory or navigation tree, even creating meta data and inserting it into the documents as they index them.
The indexed information is then organized into a browser-based directory, similar to the Yahoo! directory. As a result, users need not know the exact terms they’re seeking; instead, they can browse a directory organized by connected themes and context. Furthermore, a visual, navigable hierarchy of content can reveal intuitive relationships between information sources that are not evident when using a search engine.
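To make the idea concrete, here is a deliberately simple Python sketch of auto-categorization. It assumes a hypothetical, hand-written set of keywords for each category (CATEGORY_KEYWORDS), assigns each document to the category with the largest keyword overlap, records the result as a "category" meta data field, and groups titles into a directory. Commercial products rely on far more sophisticated statistical and linguistic methods; this is not how any particular vendor works.

```python
import re
from collections import defaultdict

# Hypothetical category rules, hard-coded for illustration only.
# Real auto-categorization products derive categories statistically.
CATEGORY_KEYWORDS = {
    "Human Resources": {"benefits", "vacation", "payroll", "hiring"},
    "Finance": {"budget", "invoice", "expense", "forecast"},
    "IT": {"password", "laptop", "network", "email"},
}


def categorize(text):
    """Return the category whose keywords overlap the text most, or 'Uncategorized'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Uncategorized"


def tag_and_build_directory(docs):
    """Attach a 'category' meta data field to each document and group titles by it."""
    tagged = {}
    directory = defaultdict(list)
    for title, text in docs.items():
        category = categorize(text)
        tagged[title] = {"text": text, "category": category}  # inserted meta data
        directory[category].append(title)                     # directory bucket
    return tagged, dict(directory)


# Hypothetical intranet documents: title -> body text.
SAMPLE_DOCS = {
    "Benefits Guide": "Your benefits and vacation entitlements",
    "Q3 Budget": "Budget forecast and expense summary",
    "Password Help": "How to reset your network password",
    "Company Picnic": "Join us on Friday afternoon",
}

tagged, directory = tag_and_build_directory(SAMPLE_DOCS)
print(directory)
```

Documents that match no keywords land in an "Uncategorized" bucket, which is one place where human review could still be needed.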

Intuitive Relationships

Powered by complex mathematical algorithms, these software tools can build relationships between similar concepts and distinguish the multiple meanings of a single word. For example, the word “link” can mean one of the interlocking units of a steel chain, the verb meaning to connect, or a clickable hyperlink on a Web page. By creating and inserting meta data, these tools also extract greater precision from your corporate search engine, allowing it to return results based on taxonomy-driven meta data.
These taxonomy products replace traditional manual methods, which have several drawbacks. Manual classification, directory building and meta tagging demand significant investments of staff and time, and human classifiers can be inconsistent and do not scale well.
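The “link” example can be illustrated with a simplified Lesk-style approach, one well-known way of choosing a word sense: compare the words surrounding the ambiguous term with a short signature for each candidate sense and pick the sense with the most overlap. The sense names and signature words here are hypothetical and hard-coded for illustration; this is not the algorithm used by the products discussed in this article.

```python
import re

# Hypothetical sense signatures for "link". A real system would learn these
# from a corpus or take them from a lexical database instead of hard-coding them.
LINK_SENSES = {
    "chain_unit": {"chain", "steel", "metal", "interlocking", "ring", "anchor"},
    "connect_verb": {"connect", "join", "associate", "relate", "together"},
    "hyperlink": {"html", "click", "page", "url", "web", "browser", "href"},
}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(context, senses=LINK_SENSES):
    """Simplified Lesk: pick the sense whose signature overlaps the context most.

    Returns None when no signature word appears in the context.
    """
    words = tokenize(context)
    scores = {sense: len(words & signature) for sense, signature in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


for sentence in [
    "Click the link on the Web page",
    "A broken link in the steel chain",
    "We need to link these two systems together",
]:
    print(sentence, "->", disambiguate(sentence))
```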
“Many organizations have spent many years just trying to get it right,” says Paul Whitelam, Product Manager for Endeca Technologies, a Cambridge-based software company that built the Tower Records Website. “Our approach negates this and it’s done automatically as people navigate through the set. Endeca auto-indexes rather than builds from scratch… leveraging content meta data to dynamically generate the taxonomy (directory).”
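Whitelam’s description of generating the taxonomy from existing meta data resembles what is usually called faceted navigation. The following is an assumption-laden illustration, not Endeca’s implementation: it assumes every document already carries meta data fields (the hypothetical department and type fields here), counts the values of each field to form the directory, and recomputes those counts each time the user narrows the set by selecting a value.

```python
from collections import Counter

# Hypothetical documents that already carry meta data fields.
TAGGED_DOCS = [
    {"title": "Expense Policy", "department": "Finance", "type": "Policy"},
    {"title": "Q3 Forecast", "department": "Finance", "type": "Report"},
    {"title": "Vacation Policy", "department": "HR", "type": "Policy"},
    {"title": "Org Chart", "department": "HR", "type": "Reference"},
]


def refine(docs, selections):
    """Keep only documents that match every selected field value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selections.items())]


def facets(docs, fields):
    """Count the values of each meta data field among the current documents.

    These counts form the navigable directory; nothing is built in advance.
    """
    return {f: Counter(d[f] for d in docs if f in d) for f in fields}


# Top-level directory generated from the meta data.
print(facets(TAGGED_DOCS, ["department", "type"]))

# The user clicks "Policy"; the directory is regenerated for the narrower set.
policies = refine(TAGGED_DOCS, {"type": "Policy"})
print([d["title"] for d in policies])
print(facets(policies, ["department"]))
```

Because the directory is recomputed from the meta data on every step, there is no hand-built tree to maintain; the trade-off is that the result is only as good as the meta data already attached to the content.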

Rules First

“Before you meta tag, you need a taxonomy,” says Ramana Venkata, Chief Technology Officer for Stratify (formerly known as Purple Yogi), a taxonomy software company. “So you bring in someone like KPMG and pay them a million dollars, but you still need to figure out how to create and insert the meta data.” In other words, the search process still relies on the meta data behind the content and the taxonomy rules (not to be confused with a taxonomy directory).
Stratify’s product, The Discovery Product, includes a taxonomy builder with auto-categorization that creates meta data and inserts it into all of the content, and then, in a matter of seconds, builds a portal presentation layer that is either web-based or modeled on the MS-Windows Explorer interface.
Taxonomy products such as Stratify’s (and many are now available on the market) work best in tandem with the search engine. Stratify has even formed a partnership with search technology company Inktomi and is looking to expand its search partnerships.
“Search is not very effective today unless you know specifically what you’re looking for,” adds Venkata. “If you’re a little vague, search will give you nothing (or too much).”

No Panacea

Despite their obvious advantages, however, taxonomies and meta data are no panacea. As Cory Doctorow reminded us in Part I, our ability to use and retrieve information still depends largely on people: those doing the searching and those creating the content.
Many users will continue to go straight to the search engine rather than browse a site’s hierarchy to find what they need. Furthermore, if users do not know how to use common search syntax such as Boolean operators (AND, OR) and implied operators (+, -, * and quotation marks), then even the world’s greatest search technology will be of limited value.
But there is hope. Many companies recognize the value of a taxonomy and are prepared to act. Indeed, a recent Delphi study found that a whopping 90% of the companies interviewed plan to invest in a taxonomy strategy within the next 24 months. Such a strategy encompasses taxonomy rules and people processes, both of which will help reduce confusion and make information sites more manageable and navigable.
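To show what implied operators actually do, here is a minimal Python sketch over a toy inverted index. It assumes Lucene-like conventions: bare terms are OR’d together, a leading + marks a required term (the equivalent of AND), a leading - excludes a term, and a trailing * matches any word with that prefix. Quoted phrase search is not handled, since it would require storing word positions. The names build_index, lookup and search are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Inverted index: map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def lookup(index, term):
    """Return the ids matching a term; a trailing * matches any word with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        ids = set()
        for word, postings in index.items():
            if word.startswith(prefix):
                ids |= postings
        return ids
    return set(index.get(term, set()))  # copy, so callers never mutate the index


def search(index, query, all_ids):
    """Evaluate +required (AND), -excluded (NOT) and bare optional (OR) terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)

    if required:
        # Every required term must match: intersect their result sets.
        results = set(all_ids)
        for term in required:
            results &= lookup(index, term)
    else:
        # No required terms: any optional term may match (union).
        results = set()
        for term in optional:
            results |= lookup(index, term)

    for term in excluded:
        results -= lookup(index, term)
    return results


# Hypothetical intranet pages: id -> text.
PAGES = {
    1: "Vacation policy for new employees",
    2: "Benefits and vacation guide",
    3: "Travel expense policy",
    4: "Employee benefits enrolment form",
}

index = build_index(PAGES)
for q in ["+vacation -benefits", "policy guide", "employ*", "+benefit* +vacation"]:
    print(q, "->", sorted(search(index, q, PAGES.keys())))
```

When a query contains required terms, this sketch simply ignores the optional ones; engines that rank their results would typically use optional terms to boost scores instead.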

Toby Ward, a former journalist and a regular e-business columnist and speaker, is the President and Founder of Prescient Digital Media. For more information on Prescient’s CMS Blueprint service, or for a free copy of the white paper “Finding ROI”, please contact us.