Earley & Associates, Inc. | Classification: CONFIDENTIAL USE, NO REPRINTS Copyright © 2011 Earley & Associates, Inc. All Rights Reserved. Taxonomies, Metadata and Search Seth Earley 781-4820-8080 [email protected]

Taxonomies and Search for Chicago SharePoint User Group


Page 1: Taxonomies and Search for Chicago SharePoint User Group

Taxonomies, Metadata and Search

Seth Earley [email protected]

Page 2: Taxonomies and Search for Chicago SharePoint User Group

Starting March 3rd, 2011 (Recordings will be available)

Register at:

www.earley.com/webinars/jumpstarts/sharepoint-2010-architecting-business-value

SharePoint Call Series Architecting for Business Value

Page 3: Taxonomies and Search for Chicago SharePoint User Group

• Session 1: SharePoint 2010 – Best Practices for Creating Business Value, March 3rd, 12:00-1:00 pm

• Session 2: Methods and Tools for Better SharePoint Search, March 10th, 12:00-1:00 pm

• Session 3: Practical Approaches to Developing Rich Information Architectures, March 17th, 12:00-1:00 pm

• Session 4: The Role of Governance in Ensuring Success, March 24th, 12:00-1:00 pm

Jumpstart Series – Architecting SharePoint for Business Value

Page 4: Taxonomies and Search for Chicago SharePoint User Group

Earley & Associates Highlights

Founded 1994

Focus Areas Holistic approach to specific business contexts and goals for:

• Retail

• Manufacturing

• Pharmaceuticals & Life Sciences

• Public Sector

• Media & Entertainment

Personnel Core team of 30 consultants

Locations Stow, MA headquarters, consultants in US, UK & Canada, global projects

Services • Taxonomy & Information Architecture

• Search Strategy for Enterprise & Web

• ECM, DAM & Information Lifecycle

• Program Management & Governance

Page 5: Taxonomies and Search for Chicago SharePoint User Group

• Co-author of Practical Knowledge Management from IBM Press

• 17 years' experience building content and knowledge management systems, 20+ years' experience in technology

• Former Co-Chair, Academy of Motion Picture Arts and Sciences, Science and Technology Council Metadata Project Committee

• Founder of the Boston Knowledge Management Forum

• Former adjunct professor at Northeastern University

• Guest speaker for US Strategic Command briefing on knowledge networks

• Currently working with enterprises to develop knowledge and digital asset management systems, taxonomy and metadata governance strategies

• Founder of Taxonomy Community of Practice – host monthly conference calls of case studies on taxonomy derivation and application. http://finance.groups.yahoo.com/group/TaxoCoP 100+ calls since 2005

• Co-founder of Search Community of Practice: http://tech.groups.yahoo.com/group/SearchCoP

Seth Earley, Founder & President, Earley & Associates

Page 6: Taxonomies and Search for Chicago SharePoint User Group

Session Objective

From Session Abstract

• High level review of basic concepts related to taxonomy, metadata and search

• How are taxonomies integrated with metadata management and standards and

• The relationship between taxonomy and information architecture

• How taxonomy, metadata and IA relate to SharePoint

• Options for creating good information architectures within SharePoint 2010.

• How to leverage taxonomy and metadata to improve navigation and search in your SharePoint portal.

• Techniques for implementation using native SharePoint functionality.

Page 7: Taxonomies and Search for Chicago SharePoint User Group

Agenda

• Change is constant

• Taxonomy definition

• Information and semantic architecture

• The challenge of search

• Five basic truths about search

• The role of metadata

• Taxonomy and navigation

• Case Study

• Conclusion

Page 8: Taxonomies and Search for Chicago SharePoint User Group

Change is constant

• Snapshot versus movie
• Business changes faster than IT can
• Systems grow up to solve specific problems without a view toward integration
• Integrated environments
• Solution to application proliferation…?

Page 9: Taxonomies and Search for Chicago SharePoint User Group

Library

Web site

Healthday

Same Term, Different Expressions…

Cardiology

Cardiac Care

Heart Health

Problems:

• Difficulty finding relevant information

• Federated search configuration is cumbersome

• Inability to view consolidated results

• Limited ability to control shared vocabularies

• Weak governance or demonstrated control

• Costly/cumbersome administrative overhead

Page 10: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy is an enabler…

• Every organization is struggling with findability

• Content management applications, search tools, workflow applications, customer relationship management systems, etc., all strive to create views of information that are in the context of work processes

What is the key component to any of these initiatives?

Having a common language in which to:
• Describe
• Communicate
• Translate
information between applications and between user audiences

Page 11: Taxonomies and Search for Chicago SharePoint User Group

Information architecture versus Semantic architecture

• Information architecture describes the ways in which systems capture, manage, organize and present information
  • Metadata fields describe information about a document or piece of content.
  • Identifiers of various kinds: name, account number, part ID, price, etc.
  • Conditions or status of the content: workflow approval state, date created, review date, etc.

• Semantic architecture is about meaning and nuance
  • Terms can have multiple contexts and meanings.
  • People use different terms to describe the same thing

Page 12: Taxonomies and Search for Chicago SharePoint User Group

A single concept can have different Expressions

Person we do business with:
• Cust_Name
• Cust_ID
• Customer ID
• Customer
• Client

Person who writes a document:
• Contributor
• Author
• Creator

What we buy or sell a product for:
• Price
• Cost

A single expression can represent different Concepts

Pitch:
• the property of sound
• the throwing of a baseball
• a vendor's position (especially on the sidewalk)
• sales talk
• degree of deviation from a horizontal plane
• dark heavy viscid substance
• a high approach shot in golf
• a card game
• abrupt up-and-down motion
• the action of throwing something
• …

Info Architecture

Semantic Architecture

Source: Fred Leise
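
The two directions on this slide (one concept, many expressions; one expression, many concepts) can be sketched as a pair of lookups. This is only an illustration: the dictionary contents are taken from the slide, while the names `CONCEPT_EXPRESSIONS`, `build_expression_index` and `concepts_for` are hypothetical and do not come from any product.

```python
from collections import defaultdict

# Concept -> expressions, using the examples from the slide.
# Only three of the "pitch" senses are included to keep the sketch short.
CONCEPT_EXPRESSIONS = {
    "person we do business with": ["Cust_Name", "Cust_ID", "Customer ID", "Customer", "Client"],
    "person who writes a document": ["Contributor", "Author", "Creator"],
    "what we buy or sell a product for": ["Price", "Cost"],
    "the property of sound": ["Pitch"],
    "the throwing of a baseball": ["Pitch"],
    "sales talk": ["Pitch"],
}


def build_expression_index(concept_expressions):
    """Invert the mapping: expression (lower-cased) -> set of concepts."""
    index = defaultdict(set)
    for concept, expressions in concept_expressions.items():
        for expression in expressions:
            index[expression.lower()].add(concept)
    return index


EXPRESSION_INDEX = build_expression_index(CONCEPT_EXPRESSIONS)


def concepts_for(expression):
    """Return every concept an expression may stand for, sorted by name."""
    return sorted(EXPRESSION_INDEX.get(expression.lower(), set()))
```

Calling `concepts_for("Author")` yields a single concept, while `concepts_for("pitch")` yields several, which is the ambiguity a semantic architecture has to resolve.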

Page 13: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy definition

• Taxonomy is a system for organizing concepts and categorizing content
  • Expresses hierarchical relationships (parent/child)
  • Arranged in a tree-like structure, with top level categories that branch out to reveal sub-categories and terms in varying levels of depth
  • Dictionary of preferred terminology

Products

Games

Card games

Action figures

Board games

Brands

Milton Bradley

Scrabble

Disney

Battleship

Page 14: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy definition

• Taxonomy: system for organizing concepts and categorizing content

• Expresses hierarchical relationships (parent/child)

• Expresses other relationships

Sample taxonomy record

Preferred term: Car
Synonyms (SYN): Automobile, Vehicle
Translations and regional variants: fr-CA: Voiture; en-UK: Auto; es-CO: Carro
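
A record like the sample above could be held in a small data structure. The sketch below is a minimal illustration, assuming a hypothetical `TaxonomyTerm` class; it is not the storage format of SharePoint or of any taxonomy tool, and only the values (Car, Automobile, Vehicle, Voiture, Auto, Carro) come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyTerm:
    """Hypothetical taxonomy record: preferred term, synonyms, regional variants."""
    preferred: str
    synonyms: list = field(default_factory=list)
    variants: dict = field(default_factory=dict)  # locale code -> label

    def matches(self, text):
        """True if text equals any label of this term, ignoring case and padding."""
        labels = [self.preferred, *self.synonyms, *self.variants.values()]
        return text.strip().lower() in {label.lower() for label in labels}

    def label_for(self, locale):
        """Return the regional label, falling back to the preferred term."""
        return self.variants.get(locale, self.preferred)


car = TaxonomyTerm(
    preferred="Car",
    synonyms=["Automobile", "Vehicle"],
    variants={"fr-CA": "Voiture", "en-UK": "Auto", "es-CO": "Carro"},
)
```

With this shape, a query for "automobile" or "Voiture" resolves to the same preferred term, and a display layer can ask for the label that suits a given locale.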

Page 15: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy is a foundation…

• It is a system for classification

• It allows for a means to organize documents and web content

• Helps us fine tune search tools and mechanisms

• Creates a common language for sharing concepts

• Allows for a coherent approach to integrate information sources

• It is a common language for business processes

Page 16: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy as a common business language

Case Example: Motorola’s Global Taxonomy Framework Served Multiple Processes

Browsing & filtering

Compare product

Related documents

Financial reporting

Business intelligence

Program Management

Product Lifecycle Management

Page 17: Taxonomies and Search for Chicago SharePoint User Group

Enterprise taxonomy drivers

Application: Web Content
  Primary driver: Consistency in branding, internal efficiencies
  “Clock speed”: Medium to fast
  Constituencies: Web developers, content managers, content creators
  Technology challenges: Exposing taxonomy to CMS, integration with search

Application: Enterprise data standards
  Primary driver: Cross platform integration, business intelligence, metadata modeling, data warehousing
  “Clock speed”: Very slow to slow
  Constituencies: Data architects, standards boards, data modelers, business intelligence
  Technology challenges: “Source of truth”, difficulty integrating metadata standards

Application: E Commerce
  Primary driver: Web site sales. Need to support customer experience
  “Clock speed”: Very fast
  Constituencies: Merchandisers, e commerce development team, marketing
  Technology challenges: Commerce platforms do not necessarily leverage capabilities. Updates to classification are not a priority

Application: Product development
  Primary driver: Product development efficiencies, speed to market
  “Clock speed”: Fast
  Constituencies: Engineering, Product development, product marketers
  Technology challenges: Product life cycle management systems usually self contained

Application: Intranet development
  Primary driver: Internal efficiencies
  “Clock speed”: Slow to Medium
  Constituencies: Intranet managers, functional managers
  Technology challenges: Difficulty unifying access to multiple repositories, sheer volume of sources

Page 18: Taxonomies and Search for Chicago SharePoint User Group

The Challenge of Search

Five basic truths about search

Page 19: Taxonomies and Search for Chicago SharePoint User Group

Search as Utility

• “search as a utility has become deeply ingrained into people's everyday lives.” – Study by Nielsen/Net Ratings

• “search software, hardware, and support bundle or search appliance has become very popular since being introduced in early 2002” – Goebel Group

These are misleading concepts. Search is used as a utility, but contexts vary so widely that “plugging search in” does not always produce satisfactory results.

Page 20: Taxonomies and Search for Chicago SharePoint User Group

Truth #1.

We have to change our definition of search.

• Search is no longer just a white box.

• Search is an experience.

• Search is about information access & capabilities.

Page 21: Taxonomies and Search for Chicago SharePoint User Group

Truth #2.

Search algorithms are getting better, but they cannot infer human context & intent.

• A search engine doesn’t know if I’m an engineer, an attorney, or a high school student.

• Perspective has an impact on whether a set of search results is useful & appropriate.

Page 22: Taxonomies and Search for Chicago SharePoint User Group

Truth #3.

Taxonomy, metadata and information architecture are key aspects of search.

• Search is fundamentally about metadata

• Some content is structured, some isn’t and needs help

• Advanced search functionalities require taxonomy

Page 23: Taxonomies and Search for Chicago SharePoint User Group

Truth #4.

Search is increasingly looking like navigation.

• What happens when you click on a link?

• Guided navigation & faceted search are really the same thing

Page 24: Taxonomies and Search for Chicago SharePoint User Group

Truth #5.

Search is messy.

• Knowledge is messy, information is messy.

• People find answers through haphazard and chaotic processes.

Page 25: Taxonomies and Search for Chicago SharePoint User Group

“…search terms are short, ambiguous and an approximation of the searcher’s real information need…”

Source: http://research.microsoft.com/~ryenw/papers/WhiteCONTEXT2002.pdf Ryen W. White, Joemon M. Jose and Ian Ruthven

Page 26: Taxonomies and Search for Chicago SharePoint User Group

Rising Expectations Plus Increased Complexity

• Search seems to be a ‘given’ – we expect it to be there

• Most enterprise search is less than optimal – too many results, irrelevant results, missing results

• It was not so long ago that organizations were starved for information

• A puzzling fact: as information environments have grown more complex, users’ expectations that search should be simpler have grown

Page 27: Taxonomies and Search for Chicago SharePoint User Group

Search is complex

Enterprise search is diverse – need to access multiple applications and contexts – both structured and unstructured

Business Intelligence/Analytics

Customer Relationship Mgt

Document repositories

Custom databases and applications

Intranets/web pages

Page 28: Taxonomies and Search for Chicago SharePoint User Group

Search is Heterogeneous

Search/Tagging/Taxonomy Integration Framework

Data Sources

Search Mechanisms

Appliances Federated Search

Auto categorization/Clustering

Entity Extraction

Faceted Search

Semantic Search

Business Intelligence

Customer Relationship Mgt

Document repositories

Custom databases and applications

Intranets/web pages

Page 29: Taxonomies and Search for Chicago SharePoint User Group

What is the right mechanism for accessing information?

• Content can be created in structured or unstructured contexts

• Its value can vary depending on audience, context or process

• Some content is extremely nuanced and requires more precise access (according to audience or task, solution, etc…)

• Search can be based on inherent structure and content of a document (implicit metadata) or on information applied to that content (explicit metadata)

Page 30: Taxonomies and Search for Chicago SharePoint User Group

More Structured

Email

Instant Messages

Wikis

Blogs

Discussions

Collaborative Workspaces

Online Learning

Instructor Led Courses

Content Mgt

Workflow systems

Doc Mgt Systems

Records Mgt Systems

Knowledge Creation Knowledge Access/Reuse

Chaotic Processes Controlled Processes

Different tools are appropriate depending upon degree of collaboration and creation versus structured access

Less Structured

Emergent Value

Page 31: Taxonomies and Search for Chicago SharePoint User Group

Lower Cost Higher Cost

Message text

External News Example deliverables

Discussion postings

Interim deliverables

Content Repositories

Success Stories

Benchmarks

Approved Methods

Best Practices

Unfiltered Reviewed/Vetted/Approved

Lower Value Higher Value

Relative value

Formal Tagging/Organizing Processes

(More difficult to access) (Easier to access)

Social tagging (“folksonomy”)

Structured tagging (taxonomy)

Page 32: Taxonomies and Search for Chicago SharePoint User Group

The Role of Metadata

Metadata drives content processes

Taxonomies provide the organizing principles behind metadata

Page 33: Taxonomies and Search for Chicago SharePoint User Group

What is metadata?

• It is the “is-ness” of a piece of content

• And the “about-ness” of a piece of content

• This is a Product Description

• It is about the Motorola Android

Taxonomies are the organizing principle behind metadata and the values that populate metadata fields

Page 34: Taxonomies and Search for Chicago SharePoint User Group

What is a content model?

• Content is structured with body information and a wrapper that formats and tags that information

• Also called a “content object model”*

Title

Description

Simple content object model

*Content model refers to overall framework. Content object model refers to a specific model for a set of document types.

I.e., an overall “Content Model” includes multiple “Content Object Models”

Page 35: Taxonomies and Search for Chicago SharePoint User Group

Metadata for a product page in a content management system

Title

Date

Author

Features

Product_Name

Category

Doc_ID

Doc_Type

“is-ness”

“about-ness”

FAQ

Product

Press release

Specification

Promotion

Page 36: Taxonomies and Search for Chicago SharePoint User Group

Metadata allows for various views of content

• Web pages are made up of assembled items of content

• These are composed of metadata elements that are assembled together into “content types”

Title

Comp_Features

Date

Author

Features

Product_Name

Category

Promotion_ID

Promo_Type

Related_Products

Doc_ID

Content_ID

Date

Content_ID

Date

Content_ID

Date

Product content type

Promotion content type

Standard Header

Related Products content type
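
One way to picture content types that share a standard header is with inherited record definitions. The sketch below is an assumption-laden illustration: the slide does not state which fields belong to which content type, so the grouping here is a guess, and the class names and the `field_names` helper are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class StandardHeader:
    """Fields assumed to be shared by every content type (Content_ID, Date)."""
    content_id: str
    date: str


@dataclass
class ProductContent(StandardHeader):
    """Assumed product content type; field grouping is illustrative."""
    title: str
    product_name: str
    category: str
    features: str


@dataclass
class PromotionContent(StandardHeader):
    """Assumed promotion content type; field grouping is illustrative."""
    promotion_id: str
    promo_type: str


def field_names(content_type):
    """List the metadata fields of a content type, header fields first."""
    return [f.name for f in fields(content_type)]
```

Because every type inherits the same header, a page can assemble product, promotion and related-product items and still sort or filter all of them by `content_id` and `date`.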

Page 37: Taxonomies and Search for Chicago SharePoint User Group

The User Experience (UX) is at the intersection of taxonomies, metadata and content objects

• Taxonomy: system for organizing and classifying content
• Metadata: information about our content, housekeeping, as well as semantic and structural information
• Content Objects: groups of metadata that are assembled into components that are then assembled into pages or documents

How will taxonomy surface on the front-facing application?

What do the wireframes suggest?

How do people interact with it?

How does the content architecture deliver the front-end design?

Page 38: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy and the User Experience

• Define what the user interface will eventually look like

• Identify how content is laid out on the page

• Faceted Search:

Taxonomy Facets

Document Preview

Best Bets

Synonyms

Misspellings

Results
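
Faceted search over tagged content can be sketched in a few lines: count the values of a metadata field across the current results, then narrow the results when the user picks a value. The documents, field names and the functions `facet_counts` and `refine` below are hypothetical and are not SharePoint search APIs.

```python
from collections import Counter

# Hypothetical tagged documents; field names and values are illustrative only.
DOCUMENTS = [
    {"title": "Phone FAQ", "doc_type": "FAQ", "category": "Phones"},
    {"title": "Phone spec sheet", "doc_type": "Specification", "category": "Phones"},
    {"title": "Headset FAQ", "doc_type": "FAQ", "category": "Accessories"},
    {"title": "Spring promotion", "doc_type": "Promotion", "category": "Phones"},
]


def facet_counts(results, facet):
    """Count how many results carry each value of the given metadata field."""
    return Counter(doc[facet] for doc in results if facet in doc)


def refine(results, **selected):
    """Keep only results whose metadata matches every selected facet value."""
    return [
        doc for doc in results
        if all(doc.get(name) == value for name, value in selected.items())
    ]


phones = refine(DOCUMENTS, category="Phones")
```

After `refine`, calling `facet_counts(phones, "doc_type")` gives the counts to show next to each remaining facet value, which is what makes the search feel like navigation.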

Page 39: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy in a content management application

1. Filtering products / search results

2. Dynamic relationships

3. Tagging & categorization of content

4. Dynamic navigation

5. Feature consistency / compare product


Page 40: Taxonomies and Search for Chicago SharePoint User Group

When is it metadata and when is it taxonomy?

• Taxonomy can be applied as metadata
• Typically this is expressed as a drop down “controlled vocabulary” list (also called “reference data”)
• Some controlled vocabularies are very simple, with a few unambiguous choices
• Some are specific to a particular system or tool and will not change frequently
• There is a tendency to lump all metadata into a technology bucket and assume this is owned and managed by IT
• Not a good approach (since we need business ownership and participation)

Page 41: Taxonomies and Search for Chicago SharePoint User Group

Who owns the taxonomy? A question of governance

• Metadata Management (IT or application owner)
  • Unambiguous
  • Limited number of values
  • Not frequently changing
  • Housekeeping or administration role
  • Specific to an application

• Taxonomy Management (business or functional owner)
  • Ambiguous meaning
  • Subject to frequent changes or updates
  • Common across multiple applications or contexts
  • Requires specific knowledge of field (subject matter expertise)

Page 42: Taxonomies and Search for Chicago SharePoint User Group

Metadata and Search

All search leverages metadata

Explicit versus implicit metadata

Page 43: Taxonomies and Search for Chicago SharePoint User Group

All search leverages metadata…

…but not all metadata is explicit

• Full text search derives metadata about documents

• Creates an index of terms that occur in a document collection

• Associates documents with those index entries

Page 44: Taxonomies and Search for Chicago SharePoint User Group

Explicit metadata versus implicit metadata

Explicit metadata
Content Type = License
Organization = ABC Company, DEF Company
Topic = Support

Document text (used to derive implicit metadata)

ABC shall provide first level technical support to all Licensed Product end users and/or Sublicensed Product customers/users. DEF will provide second level support. DEF shall provide to ABC a primary and a secondary support person to act as the primary interface with ABC’s technical and customer support team. DEF shall provide direct technical support to ABC for all uses of the DEF Software. Support level definitions and responsibilities are set forth in Exhibit C. An “SLA Failure” as defined in Exhibit C shall qualify as a Release Condition sufficient to authorize the Escrow Agent to release the Source Code to ABC pursuant to Section 7 and the Escrow Agreement.

Index terms derived from the text
ABC, customers, customer support, customer support team, DEF, DEF software, end users, escrow agreement, escrow agent, exhibit c, licensed product, release condition, section 7, secondary support, SLA, SLA failure, software, source code, support level, sublicensed product, technical support

Forward Index – Words per document
Inverted Index – Documents per word

Page 45: Taxonomies and Search for Chicago SharePoint User Group

Search index points to document

Documents 1, 2, 3, 4

Forward Index – Words per document

Inverted Index – Documents per word

A search index becomes derived metadata about a collection of documents

Term        Documents
Acme        1, 2, 3, 4
customers   2, 3
escrow      3, 4
exhibit c   2
license     1, 4
…etc        …etc

In which documents do these words occur?
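
The forward and inverted indexes named on this slide can be sketched as follows. The four document texts are invented so that the result matches the term table above (with "exhibit" standing in for the phrase "exhibit c"); the function names are hypothetical, and a real search engine would also handle phrases, stemming and weighting.

```python
from collections import defaultdict

# Hypothetical document texts, chosen to reproduce the term table on the slide.
DOCUMENTS = {
    1: "Acme license",
    2: "Acme customers exhibit",
    3: "Acme customers escrow",
    4: "Acme escrow license",
}


def build_forward_index(documents):
    """Forward index: document id -> list of lower-cased words in that document."""
    return {doc_id: text.lower().split() for doc_id, text in documents.items()}


def build_inverted_index(forward_index):
    """Inverted index: word -> sorted list of document ids containing it."""
    inverted = defaultdict(set)
    for doc_id, words in forward_index.items():
        for word in words:
            inverted[word].add(doc_id)
    return {word: sorted(doc_ids) for word, doc_ids in inverted.items()}


forward = build_forward_index(DOCUMENTS)
inverted = build_inverted_index(forward)
```

Looking up `inverted["escrow"]` answers the slide's question directly: it returns the documents in which that word occurs.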

Page 46: Taxonomies and Search for Chicago SharePoint User Group

…but not all metadata is explicit

• Full text search derives metadata about documents

• Creates an index of terms that occur in a document collection

• Associates documents with those index entries

• Occurrence of certain words in a document and the relative value of those occurrences, including:
  • Weighting
  • Relative positioning
  • Semantic relationships…
…becomes information about the document that is cached in the index and served by the search engine

• Search algorithms vary in how metadata is derived and exposed to users.

All Search Leverages Metadata

Relevance ranking, for example, is additional metadata for a result that is ‘implied’ or derived based on incoming connections to a piece of content.

Page 47: Taxonomies and Search for Chicago SharePoint User Group

Examples of implicit metadata:

• ‘Structure’ and format of content – a piece of content may be ‘unstructured’ and not contain metadata, but it is well organized.
  • Example: a newspaper story contains a headline, subhead, and first paragraph with who, what, where, when, etc.
  • Clear editorial standards

• Context of content – Where did the content come from? If it came from a particular web site, file share, data source or intranet location, the domain of knowledge provides context.
  • How can we disambiguate the term “diamond”?
    • Sports site – baseball diamond
    • Commerce site – diamond ring
  • Sales context for ‘feature’ versus engineering context for ‘feature’
  • “Adapter” – power cord
  • “Adapter” – Bluetooth headset
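
Using the source of a document as implicit metadata can be sketched as a two-step lookup: map the source to a context, then map the term and context to a sense. The host names, the `SOURCE_CONTEXT` and `SENSES` tables and the `disambiguate` function are all hypothetical; only the "diamond" example comes from the slide.

```python
# Hypothetical mapping from where content lives to its domain of knowledge.
SOURCE_CONTEXT = {
    "sports.example.com": "sports",
    "shop.example.com": "commerce",
}

# Sense of an ambiguous term in a given context (example from the slide).
SENSES = {
    ("diamond", "sports"): "baseball diamond",
    ("diamond", "commerce"): "diamond ring",
}


def disambiguate(term, source):
    """Return the context-specific sense of a term, or the term itself if unknown."""
    context = SOURCE_CONTEXT.get(source)
    return SENSES.get((term.lower(), context), term)
```

The same query word resolves differently depending only on the repository it was found in, with no explicit tagging of the document itself.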

Page 48: Taxonomies and Search for Chicago SharePoint User Group

Context as metadata

• If we maintain context of a piece of information in our search results, this is equivalent to having additional metadata on that content

Search results organized by repository

This is a form of “federated” search – a single search term fed to multiple repositories

Example courtesy of Morrison and Foerster

Page 49: Taxonomies and Search for Chicago SharePoint User Group

“We should get Google”…

Page 50: Taxonomies and Search for Chicago SharePoint User Group

Why you will not “just get Google”

• Google leverages linkages on the web that are not typically duplicated internally in the organization

• Search engines cannot infer intent or know what is important to you in the context of your work task

• Information relevance is dependent on who you are and your level of expertise as well as what you are trying to accomplish

• Not all content is equal: Google is fine for broad search results or less precise information, but may not work as well for large numbers of documents that differ only in fine-grained details

Page 51: Taxonomies and Search for Chicago SharePoint User Group

Why doesn’t Google just use Google?

Page 52: Taxonomies and Search for Chicago SharePoint User Group

Why you will not “just get Google”

Page 53: Taxonomies and Search for Chicago SharePoint User Group

More Definitions: Taxonomy, Ontology, Thesaurus…

Page 54: Taxonomies and Search for Chicago SharePoint User Group

“Sound bite” definitions

• A Taxonomy is a list of terms that enable classification of information
  • Method used to organize Subject/Topic metadata
  • Typically expresses hierarchical relationships (parent/child)
  • Emphasizes context

• A Thesaurus is a specialized taxonomy
  • Equivalence relationships (synonyms)
  • Associative relationships (related terms – “see also”)
  • Preferred terms, variant terms

• An Ontology is a collection of taxonomies and thesauri
  • A body of knowledge is represented by multiple lists of categories
  • Categories of various types are conceptually related

Page 55: Taxonomies and Search for Chicago SharePoint User Group

Definitions

• Classification Scheme - A preordained structure of words or symbols used to organize information content

• Index - A list organized in a standardized sequential fashion

Types of indexes may include: back-of-the-book, telephone directory, computerized look-up tables (e.g. b-tree, file system), card catalog, meeting roster of attendees, customer list, to name a few.

An index is a classification scheme

A taxonomy is a classification scheme

But… a classification scheme is not necessarily a taxonomy…

Page 56: Taxonomies and Search for Chicago SharePoint User Group

Classification versus Taxonomy

TAX
  Assets
    Individuals
    Corporations
  Liabilities
    Individuals
    Corporations

TAX ITEMS
  Assets
    Real Estate
    Vehicles
  Liabilities
    Loans
    Debts

TAX PAYERS
  Individuals
    Single
    Married
  Organizations
    Corporations
    Associations

Page 57: Taxonomies and Search for Chicago SharePoint User Group

Types of Term Relationships

Used in thesauri.

Also called “entry types” of terms.

Synonyms.

Things that are related conceptually.

Associative relation types are context and audience specific.

This is how we might relate multiple taxonomies.

Purist definition of a taxonomy – terms have parent/child relationship.

Equivalence Hierarchical Associative

Increasing complexity

Page 58: Taxonomies and Search for Chicago SharePoint User Group

Relationship Types

Relationship Examples

E = Equivalence

H = Hierarchical

A = Associative

Computer Manufacturers

International Business Machines

IBM

Software Group

Big Blue

Hardware Software
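
The three relationship types can be sketched as labelled links between terms. The assignment of relationship types to the term pairs below is an assumption made for illustration (the slide presents them as an exercise and the extracted text does not show the answers), and `RELATIONS` and `related` are hypothetical names.

```python
# (left term, relationship type, right term); E and A are treated as symmetric,
# H is read as "left is the parent of right". The pairings are assumptions.
RELATIONS = [
    ("IBM", "E", "International Business Machines"),
    ("IBM", "E", "Big Blue"),
    ("Computer Manufacturers", "H", "IBM"),
    ("IBM", "H", "Software Group"),
    ("Hardware", "A", "Software"),
]


def related(term, kind):
    """Return terms linked to `term` by the given relationship type.

    For hierarchical links only children are returned; equivalence and
    associative links are followed in both directions.
    """
    found = set()
    for left, rel, right in RELATIONS:
        if rel != kind:
            continue
        if left == term:
            found.add(right)
        elif right == term and kind in ("E", "A"):
            found.add(left)
    return sorted(found)
```

A search layer could expand a query with `related(term, "E")` to improve recall, and offer `related(term, "A")` as "see also" suggestions.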

Page 59: Taxonomies and Search for Chicago SharePoint User Group

Equivalence Terms
• Common misspellings
• Other terms used
• Abbreviations
• Internal names

Associative Terms
• See also
• Related products
• Language spoken
• Products for market
• Available in region
• Risks in region

Page 60: Taxonomies and Search for Chicago SharePoint User Group

The Role of Taxonomy

Page 61: Taxonomies and Search for Chicago SharePoint User Group

Goals of a taxonomy

• Allow for knowledge discovery

• Improve usability of applications as well as learnability of applications

• Reduce the cost of delivering services, developing products and conducting operations

• Improve operational efficiencies by allowing for reuse of information rather than recreation

• Improve search results and applicability (both precision and recall)

Page 62: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy Challenges

• Taxonomy means many things in SharePoint
  • Site organization
  • Content types
  • Controlled vocabularies for tagging documents

• Challenges
  • Typically integration of legacy content requires significant tagging effort
  • Users wanted to leverage hierarchy in search in the form of faceted navigation

Page 63: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy Solutions

• Taxonomy Technology
  - Leveraging hierarchy and taxonomy in both tagging and faceted search
  - True taxonomy management is beyond the scope of SharePoint 2010

• Taxonomy in Context
  - Auto-populate metadata fields with taxonomy values based on the overall architecture of the site and users' roles
  - Reduce the burden on users: allow Locations, Departments and Roles to be filled in automatically

Page 64: Taxonomies and Search for Chicago SharePoint User Group

Recall versus Precision

• The goal of effective search is to pull back lots of relevant results

• This is measured by “recall” and “precision”

• Recall: I am getting the documents that contain my term

• Precision: These results are relevant to me

When trying to improve recall, precision can suffer and vice versa

Precision can also be subjective – based on who we are and what we are doing, in other words, context and task

Page 65: Taxonomies and Search for Chicago SharePoint User Group

Precision and Recall

[Diagram: a database of items in which the set of relevant items overlaps the set of items retrieved]

A = Relevant items retrieved
B = Relevant items not retrieved
C = Irrelevant items retrieved

Recall: ratio of number of relevant items retrieved to total number of relevant items in database

Recall = A / (A + B) × 100%

Precision: ratio of number of relevant items retrieved to total number of irrelevant and relevant items retrieved

Precision = A / (A + C) × 100%

Goal is to improve recall without sacrificing precision
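
The two ratios above can be checked with a few lines of Python. This is only a sketch: the function name `precision_recall` and the document ids are made up for illustration, and the variables `a`, `b` and `c` correspond to the areas A, B and C in the diagram.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as percentages for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)  # A: relevant items retrieved
    b = len(relevant - retrieved)  # B: relevant items not retrieved
    c = len(retrieved - relevant)  # C: irrelevant items retrieved
    precision = 100.0 * a / (a + c) if (a + c) else 0.0
    recall = 100.0 * a / (a + b) if (a + b) else 0.0
    return precision, recall


# Hypothetical data: four relevant documents exist, the engine returns three.
relevant_docs = {"d1", "d2", "d3", "d4"}
retrieved_docs = {"d1", "d2", "d9"}
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.1f}%  recall={recall:.1f}%")
```

Returning more documents can only raise or hold recall, but every irrelevant document added lowers precision, which is the trade-off the slide describes.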

Page 66: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy & search strategies

Six strategies you should know about

• Tuned search
• Relevance ranking
• Faceted search
• Related terms
• Clustering
• Disambiguation

Page 67: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy and Search Strategies

• Pre-Search Processing: the search engine applies a taxonomy or thesaurus to narrow or expand the search before retrieving results
  - Tuned search (“Best Bets”)
  - Relevance ranking
  - Faceted search

• Post-Search Processing: search results are narrowed or organized after they are retrieved
  - Related terms
  - Clustering
  - Disambiguation

Page 68: Taxonomies and Search for Chicago SharePoint User Group

Applying a taxonomy to search

We need a mechanism to improve search

• A Taxonomy can be used to
  - Define search terms and map those terms to specific locations of information (need to integrate with a search engine)
  - Apply terms to a document so that relevant and consistent search results are returned (need to integrate with a content management system)

• A Thesaurus can be used to define term synonyms and related terms in order to improve the recall of information.
  - We may define “proposal”, “statement of work” and “SOW” as meaning the same thing. If I enter SOW, I can pull back documents that are labeled with (or contain) the other terms.
  - This is referred to as “term expansion”
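
Term expansion can be sketched in Python as follows. The synonym ring reuses the slide's “proposal” / “statement of work” / “SOW” example; the names `SYNONYM_RINGS`, `expand_query` and `search`, the sample documents, and the naive substring matching are all assumptions made for illustration, not how any particular search engine does it.

```python
# One synonym ring per concept; the terms come from the slide's example.
SYNONYM_RINGS = [
    {"proposal", "statement of work", "sow"},
]


def expand_query(term):
    """Return the term plus every equivalent term from the thesaurus."""
    key = term.strip().lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return sorted(ring)
    return [key]


def search(term, documents):
    """Naive substring search over the expanded terms (illustrative only)."""
    terms = expand_query(term)
    return [doc_id for doc_id, text in documents.items()
            if any(t in text.lower() for t in terms)]


# Hypothetical documents.
docs = {
    "a": "Draft proposal for Acme",
    "b": "Signed Statement of Work",
    "c": "Travel policy",
}
print(expand_query("SOW"))
print(search("SOW", docs))
```

Neither matching document contains the literal string “SOW”; they are found only because the query was expanded, which is the recall improvement the thesaurus provides.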

Page 69: Taxonomies and Search for Chicago SharePoint User Group

Tuned Search, or “Best Bets”

Page 70: Taxonomies and Search for Chicago SharePoint User Group

Tuned Search

What is Tuned Search?

• Search terms are defined in a taxonomy and mapped back to specific locations of information (i.e., specific web pages).

• E.g., a user searching on a broad term like “cell phones” would first be pointed to a landing page (a “best bet”), or be presented with a box of hand-picked links above the regular search results.

Page 71: Taxonomies and Search for Chicago SharePoint User Group

Best Bets Example – Best Buy

Page 72: Taxonomies and Search for Chicago SharePoint User Group

Tuned Search “Best Bets”

• The same search using just keyword matching could have retrieved a list of pages containing the words “phone” or “cell”, e.g.
  - Home phones
  - Cordless phones
  - 12 cell batteries
  - Etc.

• Reading through pages of possible matches is time consuming and frustrating

Page 73: Taxonomies and Search for Chicago SharePoint User Group

Tuned Search “Best Bets”

How Does a Taxonomy Help?

• Using the taxonomy categories as landing pages assures that users are strategically directed to the content that is most important.

• Best bets are done in conjunction with a taxonomy/thesaurus, not just a list of search terms… Eg. Circuit City
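
A minimal Python sketch of best bets backed by a thesaurus rather than a flat list of search terms. The landing-page URL, the equivalence mappings, and the names `BEST_BETS`, `EQUIVALENTS`, `best_bets` and `results_page` are hypothetical and chosen only to illustrate the idea.

```python
# Preferred taxonomy term -> hand-picked landing pages (hypothetical URL).
BEST_BETS = {
    "cell phones": ["/category/cell-phones"],
}

# Thesaurus: equivalent terms mapped to the preferred term (assumed entries).
EQUIVALENTS = {
    "cell phone": "cell phones",
    "mobile phone": "cell phones",
    "mobile phones": "cell phones",
}


def best_bets(query):
    """Resolve the query to its preferred term, then look up its best bets."""
    key = " ".join(query.lower().split())  # normalize case and whitespace
    preferred = EQUIVALENTS.get(key, key)
    return BEST_BETS.get(preferred, [])


def results_page(query, keyword_hits):
    """Best bets are shown above the regular keyword results."""
    return {"best_bets": best_bets(query), "results": list(keyword_hits)}


print(results_page("Mobile phone", ["Home phones", "Cordless phones"]))
```

Because the lookup goes through the thesaurus, “cell phone” and “mobile phone” lead to the same landing page instead of producing two unrelated result sets.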

Page 74: Taxonomies and Search for Chicago SharePoint User Group

Circuit City Example

• Search on “Cell phone”:

Page 75: Taxonomies and Search for Chicago SharePoint User Group

Circuit City Example

• Search on “Mobile phone”:

Page 76: Taxonomies and Search for Chicago SharePoint User Group

Circuit City Example

• What do these things have to do with mobile phones?

Page 77: Taxonomies and Search for Chicago SharePoint User Group

Relevance Ranking

Page 78: Taxonomies and Search for Chicago SharePoint User Group

Relevance ranking boost

• Can assign more weight to specific metadata fields in the engine’s ranking algorithms

• If the search term matches a metadata field, the hit gets a higher relative weight than a full-text hit, and its rank is boosted

• E.g. Best Buy boosts taxonomy category

• E.g. Motorola could boost the product category

[Figure: content index settings showing a metadata field with Relative Weighting: 45]
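
The boosting idea can be sketched as a toy scoring function in Python. The weight of 45 is the value shown on the slide; the full-text weight of 1, the field names, the sample documents and the additive scoring are assumptions for illustration and do not reflect any specific engine's ranking algorithm.

```python
# Relative weights per field. 45 comes from the slide; the body weight of 1
# is an assumed baseline for a plain full-text hit.
FIELD_WEIGHTS = {"product_category": 45, "body": 1}


def score(query, doc):
    """Sum the weights of every field in which the query text appears."""
    q = query.lower()
    return sum(weight for field, weight in FIELD_WEIGHTS.items()
               if q in doc.get(field, "").lower())


def rank(query, docs):
    """Order documents from highest to lowest score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)


# Hypothetical documents.
docs = [
    {"id": "review", "product_category": "Accessories",
     "body": "Works with most headsets and mobile phones"},
    {"id": "landing", "product_category": "Mobile phones",
     "body": "Browse our range"},
]
print([d["id"] for d in rank("mobile phones", docs)])
```

The page tagged with the matching product category outranks the page that merely mentions the phrase in its body text.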

Page 79: Taxonomies and Search for Chicago SharePoint User Group

Leveraging taxonomy terms as metadata

Faceted search

Page 80: Taxonomies and Search for Chicago SharePoint User Group

Leverage the taxonomy terms as metadata - faceted search

What is Faceted Search?

• Attribute-based search (guided navigation) approach to create precise, targeted search results.
  - Each parameter narrows the search result to the most appropriate content.
  - Also commonly referred to as “advanced searching” or “parametric searching”

• Users think they are browsing, but they are actually searching

• Allows for multiple navigation schemes based on taxonomy

Page 81: Taxonomies and Search for Chicago SharePoint User Group

Navigational taxonomy

Taxonomy can be a hierarchical grouping of navigational nodes on a web site

[Diagram: site hierarchy, listed from top level to bottom level]
Motorola.com
Mobile phones | Modems & gateways | 2-way radios
Unlocked GSM | With service | Accessories
Batteries | Headsets
Bluetooth headsets

Challenge is there is no “one way” to navigate that is correct.

Is this the “correct” way?

Page 82: Taxonomies and Search for Chicago SharePoint User Group

Navigational taxonomy

Or is this one “correct”? Or is this one?

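[Diagram 1: site hierarchy, listed from top level to bottom level]
Motorola.com
Mobile phones | Modems & gateways | 2-way radios
Camera phones | Bluetooth phones | Bluetooth accessories
Sunglasses | Headsets

[Diagram 2: site hierarchy, listed from top level to bottom level]
Motorola.com
Mobile phones | Modems & gateways | 2-way radios
Unlocked GSM | With service | Bluetooth accessories
Sunglasses | Headsets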

Page 83: Taxonomies and Search for Chicago SharePoint User Group

Motorola.com => United States => Government => Portable Radios

Motorola.com => Portable Radios => United States => Government

Motorola.com => Government => Portable Radios => United States

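[Diagram 1: hierarchy by country, then market, then product]
Motorola.com
Canada | United Kingdom | United States
Enterprise | Government | Consumers
Portable radios | Mobile computers

[Diagram 2: hierarchy by product, then country, then market]
Motorola.com
Mobile computers | Mobile radios | Portable radios
United States | Canada | United Kingdom
Government | Enterprise | Consumer

[Diagram 3: hierarchy by market, then product, then country]
Motorola.com
Government | Enterprise | Consumers
Mobile computers | Portable radios
United Kingdom | Canada | United States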

Page 84: Taxonomies and Search for Chicago SharePoint User Group

Navigating with “facets”

• Two way radios
  - Portable
  - Fixed
  - Mobile
  - Motorcycle

• Vertical market
  - Government
  - Manufacturing
  - Wholesale retail

• Country
  - Canada
  - United Kingdom
  - United States

[Diagram: a target document located along three facet axes: Product type, Geographic region and Vertical market]

Target document: P = Portable radio, G = United States, V = Government

“Facet” is a top level category in the taxonomy

Just three nodes with 5 terms each could have 5 to the 3rd power (125) possible combinations
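
A short Python sketch of navigating with facets. The facet values are read from the slide; the facet keys, the sample documents and the `filter_by_facets` helper are assumptions for illustration.

```python
from itertools import product

# Facets and their terms as listed on the slide (keys are made-up identifiers).
FACETS = {
    "product_type": ["Portable", "Fixed", "Mobile", "Motorcycle"],
    "vertical_market": ["Government", "Manufacturing", "Wholesale retail"],
    "country": ["Canada", "United Kingdom", "United States"],
}

# Hypothetical tagged documents; "target" matches the slide's target document.
DOCS = [
    {"id": "target", "product_type": "Portable",
     "vertical_market": "Government", "country": "United States"},
    {"id": "other-1", "product_type": "Portable",
     "vertical_market": "Manufacturing", "country": "Canada"},
    {"id": "other-2", "product_type": "Fixed",
     "vertical_market": "Government", "country": "United States"},
]


def filter_by_facets(docs, **selected):
    """Keep documents whose metadata matches every selected facet value."""
    return [d for d in docs
            if all(d.get(facet) == value for facet, value in selected.items())]


# Choosing one term per facet: 4 * 3 * 3 = 36 combinations for these facets.
# Three facets of five terms each would give 5 ** 3 = 125.
all_combinations = list(product(*FACETS.values()))

# Facets can be applied in any order; each selection narrows the result set.
step1 = filter_by_facets(DOCS, product_type="Portable")
step2 = filter_by_facets(step1, country="United States")
print(len(all_combinations), [d["id"] for d in step2])
```

Unlike a fixed navigation hierarchy, the same target document is reached whichever facet the user picks first.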

Page 85: Taxonomies and Search for Chicago SharePoint User Group

Is it search? Or navigation?

Good example of faceted search using hierarchy

Page 86: Taxonomies and Search for Chicago SharePoint User Group

Faceted search – PC Connection

Each parameter narrows the search result to the most appropriate content.

Page 87: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy and Search

• Post Search Processing- Search results are narrowed after they are retrieved

Related terms

Clustering

Disambiguation

Page 88: Taxonomies and Search for Chicago SharePoint User Group

Related Terms

• Leverages associative relationships in a taxonomy

Page 89: Taxonomies and Search for Chicago SharePoint User Group

Clustering

• Adds context to large result sets

• Clusters are similar to facets but based on derived attributes

• Derived attributes based on concepts contained in result set mapped to taxonomy

Page 90: Taxonomies and Search for Chicago SharePoint User Group

Clustering Example

Page 91: Taxonomies and Search for Chicago SharePoint User Group

Clustering

How do I implement Clustering?

• Build out your taxonomy, then extract entities from content and categorize based on derived metadata (facets)

Page 92: Taxonomies and Search for Chicago SharePoint User Group

Categorizing content

Statistical/linguistic: These documents look similar due to an analysis of word patterns, so let's put them into the same group

Rules-based: These documents look similar based on some rule that we have created (they contain marketing plans and are about the newest widget), so let's put them into the same group
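
The rules-based approach can be sketched in Python using the slide's own example rule (marketing plans about the newest widget). The category name, the rule format and the `categorize` function are hypothetical; a real categorization engine would use much richer rules than substring tests.

```python
# Each category lists phrases that must ALL appear in a document (assumed format).
RULES = {
    "Widget marketing plans": {"all_of": ["marketing plan", "widget"]},
}


def categorize(text, rules=RULES):
    """Return every category whose rule the document text satisfies."""
    lowered = text.lower()
    return [category for category, rule in rules.items()
            if all(phrase in lowered for phrase in rule["all_of"])]


print(categorize("2011 Marketing Plan for the newest Widget"))
print(categorize("Widget repair manual"))
```

Documents that land in the same category are grouped together; a statistical/linguistic categorizer would instead derive the groups from word patterns without hand-written rules.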

Page 93: Taxonomies and Search for Chicago SharePoint User Group

Clustering based on Taxonomy Slice from a set of Search Results:

[Diagram: Tagged Documents, Taxonomy Path Tree Builder, Taxonomy “Slice”]
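
One way to read this diagram is that the taxonomy paths tagged on the documents in a result set are merged into a small tree (the “slice”) with a count per node. The Python below is a sketch under that assumption; `build_taxonomy_slice`, the "/"-separated path format and the sample results are hypothetical and are not the component named on the slide.

```python
from collections import Counter


def build_taxonomy_slice(results):
    """Count the documents under every taxonomy node that occurs in the results."""
    counts = Counter()
    for doc in results:
        nodes = set()  # count each node once per document
        for path in doc["paths"]:
            parts = path.split("/")
            for depth in range(1, len(parts) + 1):
                nodes.add("/".join(parts[:depth]))
        counts.update(nodes)
    return dict(counts)


# Hypothetical search results, each tagged with taxonomy paths.
results = [
    {"id": 1, "paths": ["Phones/Mobile phones"]},
    {"id": 2, "paths": ["Phones/Fixed mobile car phones"]},
    {"id": 3, "paths": ["Two way radios/Mobile radios"]},
]
taxonomy_slice = build_taxonomy_slice(results)
for node in sorted(taxonomy_slice):
    print(node, taxonomy_slice[node])
```

Only the branches that actually occur in the result set appear in the slice, so the clusters shown to the user stay relevant to the query.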

Page 94: Taxonomies and Search for Chicago SharePoint User Group

Disambiguation

Page 95: Taxonomies and Search for Chicago SharePoint User Group

Disambiguation of search results

What is Disambiguation?

• If a user enters a broad term (like “mobile”), the taxonomy can return terms that help the user select a more precise term

• Includes multiple approaches:
  - Term expansion
  - Complex lookups

Page 96: Taxonomies and Search for Chicago SharePoint User Group

Disambiguation methods

• Show related search terms with check boxes in the search results page.

• Show additional search terms as links, perhaps with a prompt - "You might also be interested in:"

• Expand the query and show the expanded words in the search box

• Expand the query invisibly

Page 97: Taxonomies and Search for Chicago SharePoint User Group

Disambiguation of search results

Search term: mobile

Mobile data terminals: Handheld computers
Network Infrastructure: Mobile switches
Phones: Fixed mobile car phones, Mobile phones
Software applications: Mobile applications
Two way radios: Mobile radios
Intelligent video solutions: Mobile video enforcer, Mobile video sharing
MESH Solutions: Multi-radio mobile broadband
Mobile Computing: Mobile application

Presenting term in multiple contexts
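
Presenting a broad term in its multiple contexts can be sketched in Python. The categories and terms below are a subset of those on the slide, grouped on the assumption that each category name is followed by its terms; the `TAXONOMY` structure and the `disambiguate` function are hypothetical.

```python
# Category -> terms, as read from the slide (assumed grouping, partial list).
TAXONOMY = {
    "Network Infrastructure": ["Mobile switches"],
    "Phones": ["Fixed mobile car phones", "Mobile phones"],
    "Software applications": ["Mobile applications"],
    "Two way radios": ["Mobile radios"],
}


def disambiguate(query, taxonomy=TAXONOMY):
    """Return every taxonomy term containing the query, grouped by its category."""
    q = query.lower()
    suggestions = {}
    for category, terms in taxonomy.items():
        matches = [term for term in terms if q in term.lower()]
        if matches:
            suggestions[category] = matches
    return suggestions


for category, terms in disambiguate("mobile").items():
    print(f"{category}: {', '.join(terms)}")
```

The grouped suggestions could be rendered as check boxes or links on the results page so the user can pick the intended sense of the term.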

Page 98: Taxonomies and Search for Chicago SharePoint User Group

From Associative Relationships

Page 99: Taxonomies and Search for Chicago SharePoint User Group

Disambiguation of search results

How Do I Implement Disambiguation Methods?

• Need to integrate thesaurus with search engine

• Can be accomplished through custom frameworks, web services, API calls

• Thesaurus values can live inside of search engine, in taxonomy management tool, in spreadsheets or databases or in public sources

Page 100: Taxonomies and Search for Chicago SharePoint User Group

Disambiguation

• Query: Did Enron executives illegally sell Enron stock?

Source: CognitionSearch.com

Page 101: Taxonomies and Search for Chicago SharePoint User Group

Taxonomy and Navigation*

*Taxonomy is not the same as navigation

Page 102: Taxonomies and Search for Chicago SharePoint User Group

Page 103: Taxonomies and Search for Chicago SharePoint User Group

Applying a taxonomy to navigation

We need to improve navigation for our site

• A Taxonomy can be used to
  - Inform navigation (though it is not the same as navigation)
  - Define metadata and the information architecture of the site.

Page 104: Taxonomies and Search for Chicago SharePoint User Group

Navigation – Sales Node

Sales Tools
  Analyst Reports
  ………
  Case Studies…
  Competition…
  Customer References
  FAQ’s
  Pricing & Licensing
  White Papers
  Presentations

Page 105: Taxonomies and Search for Chicago SharePoint User Group

Navigation – Sales Node

Sales Tools
  Analyst Reports
  ………
  Case Studies…
  Competition…
  Customer References
  FAQ’s
  Pricing & Licensing
  White Papers
  Presentations

Doc Types
• Analyst Reports
• Assessment
• Benchmarks
• Best Practice
• Brochures
• Campaign
• Case studies
• Competition
• Configuration Guide
• Contracts
• Customer References
• Data sheet
• Event
• FAQ
• Guides
• License Agreements
• Migration
• Presentations
• Press Releases
• Price Lists
• Quick Reference Guide
• White papers

Page 106: Taxonomies and Search for Chicago SharePoint User Group

Why you will not just “use a folksonomy”

• All content is not equal

• Higher value content requires more rigor

• Social tagging is still immature

• May be appropriate for some kinds of content

• On systems open to large user groups, esoteric tags that are understood by only a minority of users tend to proliferate, which burdens users and decreases system efficiency

• Core to folksonomies are the flaws that formal classification systems are designed to eliminate, such as redundancy, misspelling, etc.

• Taxonomists/ontologists argue that an agreed-to set of tags enables more efficient indexing and searching of content

Page 107: Taxonomies and Search for Chicago SharePoint User Group

earley

earley & associates

earley & associates inc

earley & associates needham, massachusets

earley & associates taxonomy

earley & associates, inc

earley & associates, inc.

earley & earley associates

earley and associates

earley and associates inc

earley and associates seth

earley and associates taxonomy

earley assoc

earley associates

earley associates address

earley associates boston

earley associates wordmap

earley financial

earley jumpstart

earley taxonomy

earley taxonomy & metadata jumpstart call: managing structured metadata and taxonomies

earley.com

early & associates

early and associates

taxanomic classification of the freycinetia

taxonimic classification of humans

taxonomic and dichotomus

taxonomic classification

taxonomic classification human

taxonomic genus of king cobra

taxonomic implementation

taxonomies of knowledge

taxonomies project roadmap

taxonomist job description

taxonomy metadata

taxonomy & metadata jumpstart - 2007

taxonomy and false drops

taxonomy and classifiation examples of animals

taxonomy and metadata

taxonomy and metadata jumpstart

taxonomy c

taxonomy classification

taxonomy classification charts

taxonomy community of practice

taxonomy consulting

taxonomy creation

taxonomy creation management

taxonomy defined

taxonomy deployment

taxonomy development process

taxonomy implementation

taxonomy iqpc

taxonomy job description

taxonomy maintenance

taxonomy management

taxonomy management job title

taxonomy management tools

taxonomy metadata

taxonomy models for project management

taxonomy of global executives

taxonomy of man

taxonomy search

taxonomy seth early

taxonomy structure business organisation

taxonomy training

taxonomy validation

taxonomy(2007)

taxonomy, mlis

taxonomy/classification.online

Page 108: Taxonomies and Search for Chicago SharePoint User Group

Conclusions

• Search engines, no matter how sophisticated, do not obviate the need for taxonomies

• Content value in the context of a work process will determine the level of required structure

• There is no “one size fits all”

• Taxonomy, content strategy and search all work together to improve the findability of content.

• Google doesn’t always get it right…

Page 109: Taxonomies and Search for Chicago SharePoint User Group

Earley & Associates: #1 on Google for Silver Mining Tools

Page 110: Taxonomies and Search for Chicago SharePoint User Group

Questions?

Jeff Carr
Senior Information Architect & Search [email protected]

Seth [email protected]
Follow me on Twitter: sethearley
Connect with me on LinkedIn: www.linkedin.com/in/sethearley