Taxonomy Governance

Ron Daniel, Jr. & Joseph A. Busch, Taxonomy Strategies LLC. May 16, 2005. Copyright 2005 Taxonomy Strategies LLC. All rights reserved.

Page 1: Taxonomy Governance

May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.

Taxonomy Governance

Ron Daniel, Jr. & Joseph A. Busch

Taxonomy Strategies LLC

Page 2: Taxonomy Governance

Taxonomy Strategies LLC: The business of organized information

Agenda

1:30 Welcome & Introductions

1:45 Exercise: Taxonomy Revisions

2:15 Fundamental Processes

2:30 Governance Team Roles and Structures

3:00 Tools

3:05 Break

3:15 Exercise: Organizational Self-Assessment

3:30 Maturity Model

3:40 Designing and Building Maintainable Taxonomies & Metadata

4:00 Additional Processes

4:20 Q&A

4:30 Adjourn

Page 3: Taxonomy Governance

Who we are: Joseph Busch

Over 25 years in the business of organized information
• Founder, Taxonomy Strategies
• Director, Solutions Architecture, Interwoven
• VP, Infoware, Metacode Technologies
• Program Manager, Getty Foundation
• Manager, Pricewaterhouse

Metadata and taxonomies community leadership
• President, American Society for Information Science & Technology
• Director, Dublin Core Metadata Initiative
• Adviser, National Research Council Computer Science and Telecommunications Board
• Reviewer, National Science Foundation Division of Information and Intelligent Systems
• Founder, Networked Knowledge Organization Systems/Services

Page 4: Taxonomy Governance

Who we are: Ron Daniel, Jr.

Over 15 years in the business of metadata & automatic classification
• Principal, Taxonomy Strategies
• Standards Architect, Interwoven
• Senior Information Scientist, Metacode Technologies
• Technical Staff Member, Los Alamos National Laboratory

Metadata and taxonomies community leadership
• Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
• Acting chair: XML Linking working group
• Member: RDF working groups
• Co-editor: PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports

Page 5: Taxonomy Governance

Recent & current projects

Government
• Commodity Futures Trading Commission
• Defense Intelligence Agency
• ERIC
• Federal Aviation Administration
• Federal Reserve Bank of Atlanta
• Forest Service
• GSA Office of Citizen Services (www.firstgov.gov)
• Head Start
• Infocomm Development Authority of Singapore
• NASA (nasataxonomy.jpl.nasa.gov)
• Small Business Administration
• Social Security Administration
• USDA Economic Research Service
• USDA e-Government Program (www.usda.gov)

Commercial
• Allstate Insurance
• Blue Shield of California
• Debevoise & Plimpton
• Halliburton
• Hewlett Packard
• Motorola
• PeopleSoft
• PricewaterhouseCoopers
• Siderean Software
• Sprint
• Time Inc.

Commercial subcontracts
• Agency.com – Top financial services
• Critical Mass – Fortune 50 retailer
• Deloitte Consulting – Big credit card
• Gistics/OTB – Direct selling giant

NGOs
• CEN
• IDEAlliance
• IMF
• OCLC

Page 6: Taxonomy Governance

Participant Introductions

Who are you?

What do you do?

What brings you here today?

Page 8: Taxonomy Governance

Taxonomy Governance Overview

Is “Taxonomy Governance” synonymous with “Taxonomy Maintenance”?

What kinds of changes can be made, and what are their costs?

What kinds of information are needed to determine the changes?

What kind of group should maintain the taxonomy?

What kinds of rules should the group follow to decide on changes?

What should the group do beyond maintaining the taxonomy?

Page 9: Taxonomy Governance

Exercise: Taxonomy Modifications

Divide into small groups

Review assigned sample taxonomy

Discuss changes you would make

In 10 minutes, a spokesperson will speak for the group and briefly:
• Tell us something good about the taxonomy
• Characterize the short-term changes your group would make
• Characterize the questions your group would like answered before making other changes

Page 10: Taxonomy Governance

Exercise Notes

Team Members:

Something good about the taxonomy:

Short term changes:

Questions for other changes:

Page 11: Taxonomy Governance

Group 1 Sample Taxonomy

Page 12: Taxonomy Governance

Group 2 Sample Taxonomy

Top Level

Random Samples of Detailed Categories

Business / Accounting / Firms / Directories

Business / Biotechnology & Pharmaceuticals / Education & Training

Business / Employment / By Industry

Business / Healthcare / Employment / Regional

Business / Small Business / Finance / Accounting

Reference / Education / Colleges & Universities / North America / United States / Maryland / Columbia Union College / Athletics

Reference / Education / K-12 / Home Schooling / Unschooling / Chats and Forums

Regional / Europe / Ireland / Business & Economy / Employment / Health & Medical

Science / Math / Academic Departments / South America / Colombia

Science / Social Sciences / Linguistics / Translation / Associations

Society / People / Women / Science & Technology / Mathematics

Page 13: Taxonomy Governance

Group 3 Sample Taxonomy

Source: http://householdproducts.nlm.nih.gov/products.htm

Top Level

Detail in Auto Products Category

Page 14: Taxonomy Governance

Predictions

Short-term changes will center on rules of style – ‘&’ vs. ‘and’, capitalization, plurals

Faceted subdivision will only be suggested by experienced practitioners, by groups given low-level details of a taxonomy, or both.

People will critique the UI presentation

Questions for long-term changes will focus, in decreasing order, on:
• Who are the users and what are they doing?
• What is the content and how much is in the various categories?
• …
• What kind of money depends on the taxonomy, and what kind of maintenance expenses are justified?

Anything else people want to cover?

Editorial Rules

Metadata Specification, Design for maintainability

How to put it into action?

User Characterization

Content and Metadata

Maintenance

ROI

Page 16: Taxonomy Governance

Fundamental Processes

What are the two fundamental processes every organization should implement to maintain its metadata and taxonomies?
• Query log / click trail examination
• Tagging error correction

What are the key outlooks a taxonomist should try to instill in their organization?

Page 17: Taxonomy Governance

Fundamental Process #1 – Query Log Examination

How can we characterize users and what they are looking for?

Query Log & Click Trail Examination
• Sophisticated software available, but don’t wait.
• 80/20 Rule – 80% of value from 20% of possible reports.

Greatest value comes from:
• Identifying a person as responsible for search quality
• Starting a “Measure & Improve” mindset

Greatest challenge:
• Getting a person assigned (≥ 10%)
• Getting logs turned back on
• What to do after the obvious fixes have been made

UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary

Click Trail Packages
• iWebTrack
• NetTracker
• OptimalIQ
• SiteCatalyst
• Visitorville
• WebTrends
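
The first three reports listed above can be approximated with very little code, which is one way to act on "don't wait". The sketch below is a minimal illustration, not the output of any package named on the slide. The `QueryEvent` record, its fields, and the sample data are assumptions; a real search log would need its own parser.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    """One search request (hypothetical log record, not a real log format)."""
    query: str
    result_count: int
    clicked: bool


def normalize(query: str) -> str:
    """Lower-case and collapse whitespace so trivial variants count together."""
    return " ".join(query.lower().split())


def summarize(events, top_n=3):
    """Return top queries, zero-result queries, and queries with no click-through."""
    all_queries = Counter()
    no_results = Counter()
    no_click = Counter()
    for event in events:
        q = normalize(event.query)
        all_queries[q] += 1
        if event.result_count == 0:
            no_results[q] += 1
        elif not event.clicked:
            # Results were shown, but the user did not open any of them.
            no_click[q] += 1
    return {
        "top_queries": all_queries.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_click.most_common(top_n),
    }


# Illustrative data only.
SAMPLE_EVENTS = [
    QueryEvent("Travel Policy", 12, True),
    QueryEvent("travel  policy", 12, False),
    QueryEvent("expense form", 0, False),
    QueryEvent("Expense Form", 0, False),
    QueryEvent("org chart", 5, False),
]

if __name__ == "__main__":
    for report_name, rows in summarize(SAMPLE_EVENTS).items():
        print(report_name, rows)
```

Queries with no results often point to missing synonyms or missing content, and queries with no click-through often point to poor labels or ranking, so both lists are natural inputs for the person responsible for search quality.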

Page 18: Taxonomy Governance

Fundamental Process #2 – Tagging Error Correction

For the Taxonomy to be used, its values must be associated with content. We will refer to this as “Tagging”.

Errors will happen, and some will be found. What are you going to do about them?

Define an error correction process. Process will accommodate questions like:
• Is it an error?
• What is the cost to correct or not correct?
• Does the correction need to be scheduled?
• etc.

Once an error is corrected, NEVER lose that fact.
• Manually reviewed pages are vital for training automatic classifiers.
• Has implications for metadata specification and review procedures.

Over time, multiple error detection methods will be defined.
• e.g. Statistical sampling of newly added pages

Gradually, additional error correction processes may be defined to deal with particular types of errors.
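
One way to "never lose" a correction is to record every fix in an append-only history next to the current tags, so that manually reviewed values can later be exported as classifier training data. The sketch below assumes a simple in-memory store. The `TagStore` and `Correction` names, the field layout, and the 10% sampling rate are illustrative assumptions, not part of the original process.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Correction:
    """One manual fix; kept forever so the review effort is never lost."""
    doc_id: str
    field_name: str
    old_value: str
    new_value: str
    reviewer: str


@dataclass
class TagStore:
    tags: dict = field(default_factory=dict)         # (doc_id, field_name) -> value
    corrections: list = field(default_factory=list)  # append-only history

    def correct(self, doc_id, field_name, new_value, reviewer):
        """Apply a fix and log it; the log entry is never overwritten."""
        key = (doc_id, field_name)
        old_value = self.tags.get(key, "")
        self.corrections.append(
            Correction(doc_id, field_name, old_value, new_value, reviewer)
        )
        self.tags[key] = new_value

    def reviewed_examples(self):
        """Manually reviewed (doc_id, field, value) triples; latest fix wins."""
        latest = {}
        for c in self.corrections:
            latest[(c.doc_id, c.field_name)] = c.new_value
        return [(doc, fld, value) for (doc, fld), value in latest.items()]


def sample_for_review(new_doc_ids, rate=0.1, seed=0):
    """Pick a random sample of newly added documents for manual checking."""
    doc_ids = list(new_doc_ids)
    if not doc_ids:
        return []
    sample_size = max(1, round(len(doc_ids) * rate))
    return sorted(random.Random(seed).sample(doc_ids, sample_size))
```

The output of `reviewed_examples()` is the kind of manually verified data the slide calls vital for training automatic classifiers, and `sample_for_review()` is one simple form of statistical sampling for error detection.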

Page 19: Taxonomy Governance

Fundamental Outlooks

How are we going to build and maintain metadata structures and controlled vocabularies? The taxonomy problem

How are we going to populate metadata elements with complete and consistent values? The tagging problem

How are we then going to use metadata in applications and demonstrate benefits? The ROI problem

Taxonomy Governance is a standards process.
• Take tips from other standards efforts
• Team, with comment-handling responsibilities and an appeals process
• Issue Logs
• Announcements
• Release Schedule

Foster a “Measure & Improve” Mindset

Must know this to address other problems!

Page 21: Taxonomy Governance

Taxonomy Business Processes

Taxonomies must change, gradually, over time if they are to remain relevant

Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions

A team will need to maintain the taxonomy on a part-time basis

Taxonomy team reports to some other steering committee

Page 22: Taxonomy Governance

Diagram: Controlled Vocabulary Governance Environment

Definitions about the Controlled Vocabulary Governance Environment

Components shown:
• Syndicated Terminologies: ISO 3166-1, Other External, ERP, Other Internal
• Vocabulary Management System: CVs, Other Controlled Items
• Custodians
• Published CVs and STs
• Consuming Applications: Intranet Search, Intranet Nav., Web CMS, DAM, Archives, ERMS
• Flows: Notifications, Change Requests & Responses

Numbered steps:
1: Syndicated Terminologies change on their own schedule
2: CV Team decides when to update CVs
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of CVs published to consuming applications

Page 23: Taxonomy Governance

Other Controlled Items

Taxonomy Team will have additional items to manage:
• Charter, Goals, Performance Measures
• Editorial rules
• Team processes
• Tagger training materials (manual and automatic)
• Outreach & ROI
  • Communication plan
  • Website
  • Presentations
  • Announcements
• Roadmap

Page 24: Taxonomy Governance

Taxonomy governance | Generic team charter

Taxonomy Team is responsible for maintaining:
• The Taxonomy, a multi-faceted classification scheme
• Associated taxonomy materials, such as:
  • Editorial Style Guide
  • Taxonomy Training Materials
  • Metadata Standard
• Team rules and procedures (subject to CIO review)

Team evaluates costs and benefits of suggested changes

Taxonomy Team will:
• Manage relationship between providers of source vocabularies and consumers of the Taxonomy
• Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
• Promote awareness and use of the Taxonomy

Page 25: Taxonomy Governance

Editorial Rules

To ensure consistent style, rules are needed

Issues commonly addressed in the rules:
• Sources of Terms
• Abbreviations
• Ampersands
• Capitalization
• Continuations (More… or Other…)
• Duplicate Terms
• Hierarchy and Polyhierarchy
• Languages and Character Sets
• Length Limits
• “Other” – Allowed or Forbidden?
• Plural vs. Singular Forms
• Relation Types and Limits
• Scope Notes
• Serial Comma
• Spaces
• Synonyms and Acronyms
• Term Order (Alphabetic or …)
• Term Label Order (Direct vs. Inverted)

Must also address issue of what to do when rules conflict – which are more important?

Rule Name | Editorial Rule
Use Existing Vocabularies | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character ‘&’ is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters | Retain accented characters in Term Labels. Example: España
Serial comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’ which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization | Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”
… | …
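
Several of the sample rules above are mechanical enough to check automatically before a term label reaches human review. The sketch below is one possible checker for the Ampersands, Serial comma, and Capitalization rules. The function name and messages are assumptions, and the capitalization test is a simplification: it would also flag legitimate acronyms, which a real style guide would need to exempt.

```python
import re

# Words exempt from the capitalization check. Articles are exempt per the rule;
# "and" is skipped here because the ampersand rule already reports it.
SKIPPED_WORDS = {"a", "an", "the", "and"}


def check_label(label: str) -> list[str]:
    """Return a list of editorial-rule problems found in a term label."""
    problems = []

    # Ampersands: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("ampersand: use '&' instead of 'and'")

    # Serial comma: the final '&' must not be preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("serial comma: no comma before '&'")

    # Capitalization: title case, with accented letters kept as-is.
    for word in re.findall(r"[^\W\d_]+", label):
        if word.lower() in SKIPPED_WORDS:
            continue
        if not (word[0].isupper() and word[1:] == word[1:].lower()):
            problems.append(f"capitalization: '{word}' is not title case")

    return problems


if __name__ == "__main__":
    for sample in ("Education, Learning & Employment",
                   "Education, Learning, & Employment",
                   "Manuals and Forms"):
        print(sample, "->", check_label(sample))
```

A check like this only covers style. Rules about sources of terms, scope notes, or rule precedence still need editorial judgment.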

Page 26: Taxonomy Governance

Roles in Two Taxonomy Governance Teams

Executive Sponsor
• Advocate for the taxonomy team

Business Lead
• Keeps team on track with larger business objectives
• Balances cost/benefit issues to decide appropriate levels of effort
  • Specialists help in estimating costs
• Obtains needed resources if those in team can’t accomplish a particular task

Technical Specialist
• Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
• Helps obtain data from various systems

Content Specialist
• Team’s liaison to content creators
• Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.

Small-scale Metadata QA Responsibility

Taxonomy Specialist
• Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
• Makes edits to taxonomy, installs into system with aid of IT specialist

Content Owner
• Reality check on process change suggestions

Team structure at a different org.

Business Lead

Custodians
• Responsible for content in a specific CV.

Training Representative
• Develops communications plan, training materials

Work Practices Representative
• Develops processes, monitors adherence

IT Representative
• Backups, admin of CV Tool

Info. Mgmt. Representative
• Provides CV expertise, tie-in with larger IM effort in the organization.

Page 27: Taxonomy Governance

Taxonomy governance | Where changes come from

Diagram: where taxonomy changes come from

Components shown: End User, Firewall, Application UI, Application Logic, Content, Tagging Logic, Tagging UI, Tagging Staff, Taxonomy, Taxonomy Editor, Taxonomy Team

Where changes come from:
• End User experience
• Staff notes ‘missing’ concepts
• Query log analysis
• Requests from other parts of the organization

Team considerations

1. Business goals

2. Changes in user experience

3. Retagging cost

Recommendations by Editor

1. Small taxonomy changes (labels, synonyms)

2. Large taxonomy changes (retagging, application changes)

3. New “best bets” content

Page 28: Taxonomy Governance

Processes

Different organizations will need to consider their own change processes.
• Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
• Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.

Change process MUST also consider cost of implementing the change
• Retagging data
• Reconfiguring auto-classifier
• Retraining staff
• Changes in user expectations

Taxonomy Change Cases

Case 1. Renaming a term

Case 2. Adding a new leaf term

Case 3. Inserting a new term

Case 4. Splitting a term

Case 5. Deleting a leaf term or subtree

Case 6. Deleting a term

Case 7. Moving a subtree

Case 8. Merging terms

Case 9. Adding a CV

Case 10. Deleting a CV
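
The change cases differ mainly in how much tagged content they touch. The sketch below models four of them (Cases 1, 2, 7, and 8) on a toy in-memory taxonomy and returns the number of documents that would need retagging as a rough cost signal. The `Taxonomy` class, its fields, and the use of stable term IDs are assumptions made for illustration, not a description of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    labels: dict = field(default_factory=dict)    # term_id -> display label
    parent: dict = field(default_factory=dict)    # term_id -> parent id (None = root)
    doc_tags: dict = field(default_factory=dict)  # doc_id -> set of term_ids

    def add_leaf(self, term_id, label, parent_id=None):
        """Case 2: a new leaf touches no existing tags."""
        if term_id in self.labels:
            raise ValueError(f"duplicate term id: {term_id}")
        self.labels[term_id] = label
        self.parent[term_id] = parent_id
        return 0

    def rename(self, term_id, new_label):
        """Case 1: with stable IDs, a rename changes the label only."""
        self.labels[term_id] = new_label
        return 0

    def docs_tagged(self, term_id):
        return [d for d, terms in self.doc_tags.items() if term_id in terms]

    def move_subtree(self, term_id, new_parent_id):
        """Case 7: re-parent a term; refuse moves that would create a cycle."""
        node = new_parent_id
        while node is not None:
            if node == term_id:
                raise ValueError("cannot move a term under its own descendant")
            node = self.parent[node]
        self.parent[term_id] = new_parent_id
        return 0

    def merge(self, source_id, target_id):
        """Case 8: fold source into target; returns documents retagged."""
        affected = self.docs_tagged(source_id)
        for doc_id in affected:
            self.doc_tags[doc_id].discard(source_id)
            self.doc_tags[doc_id].add(target_id)
        for child, parent_id in list(self.parent.items()):
            if parent_id == source_id:
                self.parent[child] = target_id
        del self.labels[source_id]
        del self.parent[source_id]
        return len(affected)
```

Retagging is only one of the costs listed above. Reconfiguring an auto-classifier, retraining staff, and changed user expectations are not captured by a document count and still need to be weighed by the team.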

Page 29: Taxonomy Governance

Taxonomy governance | Taxonomy maintenance workflow

Diagram: taxonomy maintenance workflow

Roles: Analyst, Editor, Copywriter, Sys Admin

Steps shown:
• Suggest new name/category
• Review new name
• Problem? (Yes / No)
• Copy edit new name
• Problem? (Yes / No)
• Add to enterprise Taxonomy

Systems: Taxonomy Tool, Taxonomy

Page 31: Taxonomy Governance

Taxonomy editing tools vendors

Quadrant chart: Ability to Execute (low to high) vs. Completeness of Vision

Quadrants labeled: Niche Players, Visionaries

Widely used, cheap, single-user

High functionality, high cost ($100k!)

Most popular taxonomy editor? MS Excel

Immature industry – no vendors in upper-right quadrant!

Page 32: Taxonomy Governance

Sample Taxonomy Editor Functionality

• Standard and Custom Fields
• Standard and Custom Relations
• Data Typing, Restrictions, and Inference
• Flexible Reporting
• Flexible Importing
• Multiple Vocabulary Support
• Inter-Vocabulary Relations
• Unique IDs
  • ISO Codes not sufficient
• Workflow
  • Voting
  • Change Request Management
• Programmability

Screenshot labels: Hierarchy Browser, Term Editing

Page 33: Taxonomy Governance

Where do I put the metadata?

Where can I store metadata?
• In the content – HTML Headers, File properties, etc.
• In a centralized repository – Search index, MDDB, etc.
• In multiple systems – Common case

Where should I store metadata?
• Consultant’s answer – “It depends.”
• If you are moving files through a process, putting it in the file keeps it from getting dropped at system borders.
• If you are doing search across multiple documents, it has to be at least copied out of the files.
• If you make copies of files and modify them, consistent in-file metadata will be impossible.

Real question is not where to STORE the metadata, it is how to MAINTAIN the metadata.
• Web CMS as an example.
• Central Metadata Database is a very advanced practice.

Page 36: Taxonomy Governance

What Processes Should I Try to Institute?

Processes will vary from one organization to another.

Assessing the Organization’s state is the first step.

Determining the ROI and potential resources follows.

Plan on instituting processes over time, beginning with basic ones.

Page 37: Taxonomy Governance

Search and Metadata Self-Assessment Form

Background

1) Rate your organization’s search & metadata maturity from 1 to 10.

2) What was the most recent change to your organization’s search & metadata processes?

3) What is the next step for your organization’s search & metadata processes?

Basic

4) Is there a process in place to examine query logs?

5) Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?

Intermediate

6) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)? If so, describe briefly.

7) Does the search engine index more than 4 repositories around the organization?

8) Are system features and metadata fields added based on cost/benefit analysis, rather than things that are easy to do with the current tools?

9) Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?

10) Are there hiring and training practices especially for metadata and taxonomy positions? If so, describe briefly.

Advanced

11) Are there established qualitative and quantitative measures of metadata quality? If so, describe briefly.

12) Can the CEO explain the ROI for search and metadata?

Optional

13) Your name:

14) Organization:

15) E-mail:

Contact information will not be used for marketing purposes. It will only be used to follow up and clarify issues around the survey.

Page 39: Taxonomy Governance

Metadata Maturity Model

Taxonomy governance processes must fit the organization

As consultants, we notice different levels of maturity in the business processes around Content Management, Taxonomy, and Metadata

Honestly assess your organization’s metadata maturity in order to design appropriate governance processes

We are starting to define a maturity model, similar to the CMMI model in the software world.

Page 40: Taxonomy Governance

Metadata Maturity Model

Table columns: Process Areas | Maturity Levels (Basic, Intermediate, Advanced, Bleeding Edge) | Limiting Processes

Column boundaries are not recoverable; each row lists its entries in slide order.

Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple; Best Bets; Simple Grouping; Intranet Facet Navigation; Improved Ranking

Metadata and taxonomy standards: System MD Stds.; Organization MD Std.; Reuse ERP; Multiple Repos. Comply; Taxonomy Roadmap; Highly Abstract Subject Taxonomies

Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs; Unneeded Capabil.; Tools, then Reqs.

Staff training and hiring: Search Analyst Role; Librarian Expertise; Pre-hire Testing; SME Catalogers

Data creation and QA: CM Introduced; ROT-Elimination; Hybrid Creation Model; Adaptive Qualification; Quality Measures

Project management: Project Plan; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination

Executive support and ROI: External Search ROI; Intranet ROI Model; CEO knows Search ROI; Use it or Lose It Budgets

Shameless Plug: Tomorrow Morning at 9:45

Call for Data: Leave Self-Assessments with us

Page 41: Taxonomy Governance

Purpose of Maturity Model

Estimating the maturity of an organization’s information management processes tells us:
• How involved the taxonomy development and maintenance process should be
  • Overly sophisticated processes will fail
• What to recommend as next steps

Maturity is not a goal, it is a characterization of an organization’s methods for achieving particular goals.

Mature processes have expenses which must be justified by consequent cost savings or revenue gains.

Metadata Maturity may not be core to your business.

Page 43: Taxonomy Governance

Overview of Best Practices in Metadata and Taxonomy

Avoid monolithic ‘subject’ taxonomies
• May have a browsing taxonomy constructed from combined facets.

Use (or map to) Dublin Core for basic information.
• Extend with custom elements for specific facts.

Use pre-existing, standard vocabularies as much as possible.
• Validate author names with LDAP directory
• ISO country codes for locations
• Product & service info from ERP system

Designate a team to manage the taxonomies and related materials
• Taxonomy Editorial Rules, Processes, Training materials, Outreach & ROI

Design a Metadata QC Process
• Start with an error-correction process, then get more formal on error detection.
• In the future, large-scale ontologies like CYC may be valuable in automated error detection.
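
Reusing standard vocabularies makes basic metadata checks straightforward. The sketch below validates a Dublin Core style record against small controlled lists. The record layout, the function name, and the in-memory `KNOWN_AUTHORS` set (standing in for an LDAP lookup) are assumptions for illustration; the country codes and type terms shown are only small subsets of ISO 3166-1 alpha-2 and the DCMI Type Vocabulary.

```python
# Small illustrative subsets; a real deployment would load the full lists.
COUNTRY_CODES = {"CO", "IE", "SG", "US"}                # ISO 3166-1 alpha-2
DC_TYPES = {"Collection", "Dataset", "Image", "Text"}   # DCMI Type terms
KNOWN_AUTHORS = {"jbusch", "rdaniel"}  # hypothetical IDs; stand-in for LDAP

REQUIRED_FIELDS = ("title", "creator", "type")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one metadata record."""
    errors = []

    for name in REQUIRED_FIELDS:
        if not record.get(name):
            errors.append(f"{name}: required field is missing")

    creator = record.get("creator")
    if creator and creator not in KNOWN_AUTHORS:
        errors.append(f"creator: '{creator}' not found in directory")

    doc_type = record.get("type")
    if doc_type and doc_type not in DC_TYPES:
        errors.append(f"type: '{doc_type}' is not in the type vocabulary")

    # 'coverage' holds zero or more country codes for the Location facet.
    for code in record.get("coverage", []):
        if code not in COUNTRY_CODES:
            errors.append(f"coverage: '{code}' is not a known country code")

    return errors


if __name__ == "__main__":
    sample = {"title": "Travel Policy", "creator": "jbusch",
              "type": "Text", "coverage": ["US", "IE"]}
    print(validate_record(sample))
```

A validator like this fits the "start with error correction, then get more formal on error detection" advice: it catches values outside the vocabularies, but not values that are valid yet wrong for the content.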

Page 44: Taxonomy Governance

Factor “Subject” into smaller facets

Size
• DMOZ tries to organize all web content, has more than 600k categories!

Difficulty in navigating, maintaining

Hidden facet structure
• “Classification Schemes” vs. “Taxonomies”

Page 45: Taxonomy Governance

Sources for 8 common vocabularies

Vocabulary | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, your organizational structure, competitors, partners, regulators, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services | Names of products/programs & services. | ERP system, your products and services, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.

Page 46: Taxonomy Governance

Facet Principles

Basic facets with identified items – people, places, projects, instruments, missions, organizations, …
• Note that these are not subjective “subjects”, they are objective “objects”.

Subjective views can be laid on top of the objective facts, but should be in a different namespace so they are clearly distinguishable.
• For example, labels like “Anarchist” or “Prime Minister” can be applied to the same person at different times (e.g. Nelson Mandela).

Page 47: Taxonomy Governance

Iterative Development Vision (More participants and tagged content at each iteration)

Matrix: seven activities (rows) across four development stages (columns)

Stage | Participants
Plan & Prototype | Project Team
Alpha Dev & Test | Stakeholders and SMEs
Beta D&T | Friendly Users
Final D&T | Audiences

Activity | Plan & Prototype
1 Identify Objectives | Interview core team and stakeholders
2 Inventory Content | ID sources, spider assets & extract metadata
3 Specify Metadata | Define fields & purpose
4 Model Content | Define content chunks & XML DTDs
5 Specify Vocabularies | Compile controlled vocabularies
6 Specify Procedures | Start with UI sketches, off-the-shelf rules.
7 Train Staff | Manually tag small sample

Later-stage cells (row and column positions not recoverable), in slide order:
• Review tagged samples, default procedures
• Gather additional sources, if any
• Revise if needed, bake into alpha CMS
• Revise if needed, bake into alpha CMS
• Revise, use in alpha CMS
• alpha workflows in CMS
• Use alpha CMS to tag larger sample
• Interview alpha users
• Modify CMS for beta
• Modify CMS for beta
• Revise, use in beta CMS
• Modify & extend workflows
• Finalize training materials & train staff
• Gather additional sources, if any
• Tailor the default materials
• Use beta CMS to tag larger sample
• Interview beta users
• Modify for 1.0
• Modify for 1.0
• Revise using team procedure
• Finalize procedure materials

Page 48: Taxonomy Governance

Planning for Taxonomy Changes

Error Correction – What to do when end-users and tagging staff notice problems?
• Provide for it in the Error Correction Process
• Add Query Log Analysis to help detect user problems

How to answer questions re. things to add, delete, or rearrange in the taxonomy?
• Keep a visible issue log
• Discuss with SMEs, tag samples, use other testing methods

Per-facet changes:
• Corporate reorganizations, product lineup changes, country splits & merges, … will happen.
• Prepare for them when deploying those facets

Long-term – what facets to create, when, and why
• See Taxonomy Roadmap section

Page 49: Taxonomy Governance

49Taxonomy Strategies LLC The business of organized information

Agenda

1:30 Welcome & Introductions

1:45 Exercise: Taxonomy Revisions

2:15 Fundamental Processes

2:30 Governance Team Roles and Structures

3:00 Tools

3:05 Break

3:15 Exercise: Organizational Self-Assessment

3:30 Maturity Model

3:40 Designing and Building Maintainable Taxonomies & Metadata

4:00 Additional Processes: brief remarks on Measurements, ROI, Training, Roadmap

4:20 Q&A

4:30 Adjourn

Page 50: Taxonomy Governance

Measuring Metadata and Taxonomy Quality

Taxonomy development is an iterative process

Develop an organizational idea, then test it by tagging sample content

Elicit feedback via walk-throughs and card sorting exercises

Use both qualitative and quantitative methods.
  Time, budget, and availability of tagged data will determine which methods are possible.

Page 51: Taxonomy Governance

Taxonomy testing | Qualitative methods

Method | Process | Validation
Walk-throughs | Show and explain | Approach; consistency to rules; accuracy (SME checking); appropriateness to task
Usability Testing | Card sorting, contextual analysis | Repeatability of user classification; tasks are completed successfully; time to complete task is reduced
User Satisfaction | Survey | Reaction to new interface; reaction to search results
Tagging samples | Tag sample content with taxonomy | Content ‘fit’; fills out content inventory; training materials for people & algorithms; basis for quantitative methods

Include sample pages in walk-throughs, not just the hierarchy.

Page 52: Taxonomy Governance

Tagged Samples

The taxonomy must fit the content.

How to verify this? Tag samples!
  Spreadsheets are a convenient tool for this. URLs, drop-down choosers, and text notes are all allowed.

The team can review tagged samples when reviewing the taxonomy.
  More sophisticated teams may test inter-cataloger agreement.

Samples should appear in training materials for tagging staff.
  Show typical and unusual cases.

Samples are used to define training sets for automatic classifiers.

Metadata Element | Metadata Value
URL | sixbits.atl.frb.org/invoke.cfm?objectid=A01B30D1-10C2-11D6-981100508B104751&method=display
Headline | Innovation Awards
Organization | Federal Reserve Bank of Atlanta
Content Type | Honors & Awards
Subject | Salary & Compensation?

Spreadsheet columns: DOCUMENT URL | FACET A | FACET B | FACET C | FACET D | MISSING IDEAS
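
One common way to quantify inter-cataloger agreement is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slides do not name a specific statistic, so the choice of kappa, the function name, and the sample tags below are assumptions made for illustration. The sketch assumes two taggers each assigned exactly one term per document for a single facet.

```python
from collections import Counter

def cohens_kappa(tags_a, tags_b):
    """Cohen's kappa for two taggers who each assigned one term per document."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag lists")
    n = len(tags_a)
    # Fraction of documents where both taggers chose the same term.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Agreement expected by chance, from each tagger's term frequencies.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers used one identical term throughout
    return (observed - expected) / (1 - expected)

# Hypothetical Content Type tags from two taggers for six sample documents.
tagger_a = ["News", "News", "Policy", "Form", "Policy", "News"]
tagger_b = ["News", "Policy", "Policy", "Form", "Policy", "News"]
kappa = cohens_kappa(tagger_a, tagger_b)
print(f"kappa = {kappa:.2f}")
```

Documents where the two taggers disagree are good candidates for the "unusual cases" shown in training materials.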

Page 53: Taxonomy Governance

Quantitative Method | How evenly does it divide the content?

Background:
  Documents do not distribute uniformly across categories.
  Zipf (1/x) distribution is expected behavior.
  80/20 rule in action (actually 70/20 rule).

Methodology:
  Part of alpha test of ‘content type’ for corporate intranet.
  115 URLs selected at random from the search index were manually categorized.
  Inaccessible files and ‘junk’ were removed.

Results:
  Results were slightly more uniform than the Zipf distribution, which is better than expected.

[Chart: Measured and Expected Distribution of Content Types in an Intranet. X-axis: Content Type (People, Groups & Places; News & Events; Manuals & Learning Materials; Operations & Internal Communications; Marketing & Sales; Regulations, Policies, Procedures & …; Papers & Presentations; Other & Unclassified; Programs, Proposals, Plans & Schedules). Y-axis: # Documents (0 to 25). Series: Measured, Expected.]

[Chart: Measured and Expected Distribution of Top 10 Content Types in Library of Congress Database. X-axis: Top 10 Content Types (Congresses; Biography; Periodicals; Maps; Fiction; Exhibitions; Juvenile literature; Bibliography; Statistics). Y-axis: Number of Records (0 to 350,000). Series: Series1, Series2.]
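
The comparison against a Zipf (1/x) distribution can be sketched as follows. The measured counts here are hypothetical (the values behind the charts are not recoverable from the slide), and the function name is an assumption. The expected count for the category at rank k is proportional to 1/k, scaled so the expected counts add up to the measured total.

```python
def zipf_expected(total, n_categories):
    """Expected counts under a 1/rank (Zipf) distribution summing to `total`."""
    weights = [1.0 / rank for rank in range(1, n_categories + 1)]
    norm = sum(weights)
    return [total * w / norm for w in weights]

# Hypothetical document counts for nine content types, largest first.
measured = sorted([24, 17, 14, 11, 10, 8, 7, 5, 4], reverse=True)
expected = zipf_expected(sum(measured), len(measured))

for rank, (m, e) in enumerate(zip(measured, expected), start=1):
    print(f"rank {rank}: measured {m:3d}, expected {e:5.1f}")

# A measured top category smaller than the Zipf expectation means the
# content is spread more evenly across categories than Zipf predicts.
more_uniform = measured[0] < expected[0]
print("more uniform than Zipf:", more_uniform)
```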

Page 54: Taxonomy Governance

Quantitative Method | How intuitive (repeatable) are the categorizations?

Methodology: Closed Card Sort
  For alpha test of a grocery site.
  15 testers put each of 100 best-selling products into one of 10 pre-defined categories.
  Products that fewer than 14 of 15 testers put into the same category were flagged.

Results:

% of Testers | Cumulative % of Products
15/15 | 54%
14/15 | 70%
13/15 | 77%
12/15 | 83%
11/15 | 85%
<11/15 | 100%

In the trade, “Corn Tortillas” are a Dairy item!

“Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.
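
The flagging step of a closed card sort can be sketched as below. The product names echo the slide, but the individual tester choices are hypothetical, and the function names and data layout are assumptions made for illustration.

```python
from collections import Counter

def agreement_levels(sorts):
    """Map each product to the largest number of testers who chose one category.

    `sorts` maps product name -> list of chosen categories, one per tester.
    """
    return {
        product: Counter(choices).most_common(1)[0][1]
        for product, choices in sorts.items()
    }

def flag_low_agreement(sorts, threshold=14):
    """Products where fewer than `threshold` testers agreed on one category."""
    levels = agreement_levels(sorts)
    return sorted(p for p, level in levels.items() if level < threshold)

# Hypothetical choices from 15 testers for three products.
sorts = {
    "Milk": ["Dairy"] * 15,
    "Corn Tortillas": ["Bakery"] * 9 + ["Dairy"] * 4 + ["Grocery"] * 2,
    "Cocoa Drinks - Powder": ["Beverages"] * 8 + ["Grocery"] * 7,
}
flagged = flag_low_agreement(sorts)
print(flagged)
```

Flagged products are the ones to review: either the category labels are unclear, or the product belongs in more than one category.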

Page 55: Taxonomy Governance

Quantitative Method | How does taxonomy “shape” match that of content?

Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4

Source: Courtesy Keith Stubbs, U.S. Dept. of Education

Background:
  Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas.

Methodology:
  25,380 resources tagged with a taxonomy of 179 terms (avg. of 2 terms per resource).
  Counts of terms and documents summed within the taxonomy hierarchy.

Results:
  Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%).
  Mismatches between term % and document % flagged.
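
A minimal sketch of the mismatch check, using the percentages from the table on this slide. The slide does not say what threshold was used, so the factor of 2 and the function name are assumptions chosen for illustration.

```python
# (term group, % of terms, % of documents), copied from the slide's table.
SHAPE = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Counselors", 3.4, 1.4),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 1.1),
    ("News Media", 0.6, 3.1),
    ("Other", 7.3, 2.0),
    ("Parents and Families", 2.8, 6.0),
    ("Policymakers", 4.5, 11.5),
    ("Researchers", 2.2, 3.6),
    ("School Support Staff", 2.2, 0.2),
    ("Student Financial Aid Providers", 1.7, 0.7),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]

def flag_mismatches(rows, factor=2.0):
    """Flag groups whose share of documents differs from their share of
    terms by at least `factor` in either direction (assumed threshold)."""
    flagged = []
    for group, pct_terms, pct_docs in rows:
        ratio = pct_docs / pct_terms
        if ratio >= factor or ratio <= 1.0 / factor:
            flagged.append((group, round(ratio, 2)))
    return flagged

flagged = flag_mismatches(SHAPE)
for group, ratio in flagged:
    print(f"{group}: docs/terms ratio {ratio}")
```

A ratio well above 1 suggests an area with too few terms for its content (for example, Federal Funds Recipients and Applicants); a ratio well below 1 suggests more terms than the content needs (for example, Students).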

Page 56: Taxonomy Governance

Taxonomy ROI

What level of effort in taxonomy creation and maintenance is justified?

Page 57: Taxonomy Governance

Fundamentals of Taxonomy ROI

Building and maintaining a taxonomy, and tagging data with it, are costs, not benefits.

There is no benefit without exposing the tagged data to users in some way that cuts costs or improves revenues.

Putting a new taxonomy into operation requires UI changes and/or backend system changes.

You need to determine those changes, and their costs, as part of the taxonomy ROI.

Page 58: Taxonomy Governance

Common Taxonomy ROI Scenarios

Catalog site: ROI based on increased sales through
  improved product findability
  product cross-sells and up-sells
  customer loyalty

Call center: ROI based on cutting costs through
  fewer customer calls due to improved website self-service
  faster, more accurate CSR responses through better information access

Knowledge worker productivity: ROI based on cutting costs through
  less time searching for things
  less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs

Executive mandate
  No ROI at the start, just someone with a vision and the budget to make it happen.
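
As a worked illustration of the knowledge worker scenario, the arithmetic can be sketched as below. Every number is hypothetical and chosen only to make the calculation easy to follow; the function names are assumptions. Note that the build cost has to include the UI and backend changes needed to expose the tagged data to users.

```python
def annual_time_savings(staff, minutes_saved_per_week, weeks_per_year, hourly_cost):
    """Yearly value of search time saved across all staff."""
    hours_saved = staff * minutes_saved_per_week * weeks_per_year / 60
    return hours_saved * hourly_cost

def simple_roi(annual_benefit, build_cost, annual_maintenance, years):
    """(total benefit - total cost) / total cost over the given period."""
    total_cost = build_cost + annual_maintenance * years
    total_benefit = annual_benefit * years
    return (total_benefit - total_cost) / total_cost

# Hypothetical inputs: 500 staff each save 10 minutes a week.
benefit = annual_time_savings(
    staff=500, minutes_saved_per_week=10, weeks_per_year=48, hourly_cost=50
)
# Hypothetical costs: taxonomy, tagging, and UI/backend changes to build;
# ongoing governance and tagging to maintain.
roi = simple_roi(
    annual_benefit=benefit, build_cost=250_000, annual_maintenance=50_000, years=3
)
print(f"annual benefit: {benefit:,.0f}; three-year ROI: {roi:.0%}")
```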

Page 59: Taxonomy Governance

Tagging and Training

How are we going to populate metadata elements with complete and consistent values?
  The tagging problem

How are we going to get people (and/or software) to assign consistent and accurate metadata to the content?
  The tagger training problem

Page 60: Taxonomy Governance

Taxonomy governance: Workflow-driven metadata tagging

[Workflow diagram. Roles: Analyst, Editor, Copywriter, Sys Admin. Steps shown: Compose in Template; Submit to CMS; Automatically fill in metadata (Tagging Tool); Approve/Edit metadata; Review content (Problem? Yes/No); Copy Edit content (Problem? Yes/No). Outputs: Hard Copy, Web site.]

Tagging process doesn’t stop here!

Page 61: Taxonomy Governance

Training Taxonomy Editors and Tagging Staff

Staff will require training on:
  The structure of the taxonomy
  The UI they use to tag the content
  The rules to follow when deciding what codes to apply
  The end-effect of the codes they apply (have a running prototype or QA environment)

Tagging examples come from samples tagged during taxonomy development.

Hardcopies of the taxonomy, and yellow highlighters, are helpful during training.

Indexing rules

Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe what the asset is about and why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find. Remember that search engines can only operate on explicit information.
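
The specificity rule works because a hierarchy lets software derive the broader terms from a specific one, but not the reverse. A minimal sketch, assuming a hypothetical child-to-parent map for a small Location facet (the terms and the function name are illustrative only):

```python
# Hypothetical child -> parent links for a small Location facet.
PARENT = {
    "Atlanta": "Georgia",
    "Georgia": "United States",
    "United States": "North America",
}

def generalize(term, parent=PARENT):
    """Return the term followed by all of its broader terms, most specific first."""
    chain = [term]
    seen = {term}
    while chain[-1] in parent:
        broader = parent[chain[-1]]
        if broader in seen:  # guard against accidental cycles in the data
            break
        chain.append(broader)
        seen.add(broader)
    return chain

specific = generalize("Atlanta")
generic = generalize("North America")
print(specific)
print(generic)
```

An asset tagged "Atlanta" can be found by a search on any broader term in its chain, while an asset tagged only "North America" carries no information about which narrower place it concerns.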

Indexing UI

Page 62: Taxonomy Governance

Tagging tool example—Interwoven MetaTagger

Manual form fill-in w/ check boxes, pull-down lists, etc.

Auto keyword & summarization

Auto-categorization

Parse & lookup (recognize names)

Rules & pattern matching
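
To make "rules & pattern matching" concrete, here is a generic sketch of rule-based tag suggestion using regular expressions. It is not MetaTagger's API or rule syntax; the rule list, facet values, and function name are all hypothetical.

```python
import re

# Hypothetical rules: (facet, value to suggest, pattern that triggers it).
RULES = [
    ("Content Type", "Honors & Awards", re.compile(r"\baward(s)?\b", re.IGNORECASE)),
    ("Content Type", "Policy", re.compile(r"\bpolic(y|ies)\b", re.IGNORECASE)),
    ("Organization", "Federal Reserve Bank of Atlanta",
     re.compile(r"\bFederal Reserve Bank of Atlanta\b")),
]

def suggest_tags(text, rules=RULES):
    """Return {facet: [suggested values]} for every rule whose pattern matches."""
    suggestions = {}
    for facet, value, pattern in rules:
        if pattern.search(text):
            suggestions.setdefault(facet, []).append(value)
    return suggestions

suggested = suggest_tags("Innovation Awards at the Federal Reserve Bank of Atlanta")
print(suggested)
```

Suggestions like these would normally be shown in the manual form for a person to approve or edit, rather than applied blindly.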

Page 63: Taxonomy Governance

Taxonomy Roadmap

How to plan for long-term taxonomy development projects?

Page 64: Taxonomy Governance

Taxonomy Roadmap

Most organizations require a phased implementation of an Enterprise Taxonomy.

A Taxonomy Roadmap defines the facets to be developed, their timing, and the reasons why.

Factors to consider in prioritizing the facets include:
  Immediacy of application: how will the taxonomy be put into use? A search engine? Portal navigation? Other? How long will that take?
  Impact: how many users will a facet help? How big of a help will it be?
  Ease of development: does the vocabulary exist, can it be bought, or must it be developed? How big and complex will it be? How often will it change? Are there tools to help manage taxonomy changes or must those be acquired too?
  What data must be tagged for that? What are the requirements on the metadata’s density and accuracy? Can those be met with automatic methods, or will more extensive human involvement be needed?
  Staff expertise and team experience.

Page 65: Taxonomy Governance

Roadmap: Dependencies

Roadmap requires that an organization plan its projects well in advance, so that upcoming projects can be influenced by the taxonomy.
  Consequently, this is an advanced practice.

Roadmap prioritizes vocabularies according to benefit, cost, and fit with projects.

Governance Team is responsible for maintaining the Roadmap and the necessary outreach.

Page 66: Taxonomy Governance

Roadmap: Facet Prioritization Matrix

Facet | Description | Impact | Effort to create/maintain CV | Effort to tag
Language* | Languages supported by portal | Medium (High impact for subset) | Done/Low | Low
Format | File format (PDF, doc, html, etc…) | Low | Low/Low | Low
Location* | Geo, region, country, site | Med-High | Done/Low | Medium
Content Type | Also referred to as genre (news, policy, checklist, form, etc…) | Medium | Medium/Low | Medium
Organization | Publishing organization that owns content | Medium | Medium/High | Medium
Subject | Also referred to as topic (benefits, travel, etc…) | High | High/High | Medium
Products & Services | Corporate product and service offerings | Medium | High/High | High
Role (level of responsibility)* | Manager, employee, non-employee | High (In use on portal, but search has limited access to secure content) | Done/Low | High
Access Control | Organization as audience | Low | Medium/High | High

* Facets already in existence in client’s Intranet
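
A matrix like this can be turned into a rough ranking. The sketch below uses the ratings from the matrix, but the numeric scale and the impact-divided-by-effort score are assumptions made for illustration; the slides do not prescribe a formula, and the parenthetical qualifiers in the Impact column are ignored here.

```python
# Assumed numeric scale for the ratings used in the matrix.
LEVEL = {"Done": 0.0, "Low": 1.0, "Medium": 2.0, "Med-High": 2.5, "High": 3.0}

# (facet, impact, effort to create CV, effort to maintain CV, effort to tag)
FACETS = [
    ("Language", "Medium", "Done", "Low", "Low"),
    ("Format", "Low", "Low", "Low", "Low"),
    ("Location", "Med-High", "Done", "Low", "Medium"),
    ("Content Type", "Medium", "Medium", "Low", "Medium"),
    ("Organization", "Medium", "Medium", "High", "Medium"),
    ("Subject", "High", "High", "High", "Medium"),
    ("Products & Services", "Medium", "High", "High", "High"),
    ("Role", "High", "Done", "Low", "High"),
    ("Access Control", "Low", "Medium", "High", "High"),
]

def priority(row):
    """Impact divided by total effort; higher means 'do sooner'."""
    _, impact, create, maintain, tag = row
    effort = LEVEL[create] + LEVEL[maintain] + LEVEL[tag]
    return LEVEL[impact] / max(effort, 1.0)  # avoid dividing by zero effort

ranked = [row[0] for row in sorted(FACETS, key=priority, reverse=True)]
for position, facet in enumerate(ranked, start=1):
    print(position, facet)
```

A score like this is only a starting point for discussion; fit with upcoming projects, which the Roadmap also weighs, is not captured by it.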

Page 67: Taxonomy Governance

Roadmap: Timeline

[Timeline chart spanning FY04Q2, FY04Q3, FY04Q4, FY05Q1, FY05Q2, FY05Q3, FY05Q4. Facets and the projects that use them: Language (Search); Organization (Search & Org Chart UI); Location (Country) (Search? Index); Format (Search); Content Type (Search); Location (Region) (Search); Subject (Search & Portal Nav); Products/Services (Search & Index); Access Control (CM?, CM?, Index); Role (Search?). Related projects: Taxonomy Tool Projects, Auto-Classification Tool.]

Timeline lists the facets to be developed, and when those development efforts start and end.

Timeline shows what projects will make use of the facet, and how long that should take.

Intermediate and related projects are also shown.

Page 68: Taxonomy Governance

Agenda

1:30 Welcome & Introductions

1:45 Exercise: Taxonomy Revisions

2:15 Fundamental Processes

2:30 Governance Team Roles and Structures

3:00 Tools

3:05 Break

3:15 Exercise: Organizational Self-Assessment

3:30 Maturity Model

3:40 Designing and Building Maintainable Taxonomies & Metadata

4:00 Additional Processes

4:20 Q&A

4:30 Adjourn

Page 70: Taxonomy Governance

Contact Info

Ron Daniel, Jr.

925-368-8371

[email protected]

Joseph Busch

415-377-7912

[email protected]