Busch Slides

  • 8/6/2019 Busch Slides

    1/126

    Taxonomy Strategies LLC

    6-15 June 2007. Copyright 2007 Taxonomy Strategies LLC. All rights reserved.

    Taxonomy & metadata strategies for effective content management

    Melbourne, Sydney, Canberra

    Masterclass

    2Taxonomy Strategies LLC The business of organized

    Today's agenda

    9:00-9:10    10 min  Introduction
    9:10-9:15     5 min  Warm-up exercise
    9:15-9:45    30 min  Taxonomy fundamentals: Building taxonomies
    9:45-10:00   15 min  Taxonomy exercise
    10:00-10:30  30 min  Taxonomy fundamentals: Taxonomy business case
    10:30-11:00  30 min  Tea Break
    11:00-12:00  60 min  Taxonomy governance
    12:00-12:30  30 min  Capabilities self-assessment
    12:30-13:30  60 min  Lunch
    13:30-14:30  60 min  Taxonomy benchmarking
    14:30-14:45  15 min  Benchmarking exercise
    14:45-15:15  30 min  Tea Break
    15:15-16:15  60 min  Content tagging
    16:15-16:30  15 min  Tagging exercise
    16:30-17:00  30 min  Q&A

    What we do

    Organize Stuff

    For us, taxonomy work includes:

    • Metadata specification defines the properties needed to describe content so that it can be found & used.
    • Vocabularies are collections of terms that are used to specify some of the metadata properties. Some vocabularies are big and hierarchical, some are small and flat.
    • An application profile specifies what metadata & vocabularies are required, and then represents them formally.
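The three pieces above (metadata properties, vocabularies, application profile) can be sketched as a small data structure. This is only an illustration: the class names, the property names and the sample Content Type terms are assumptions, not something taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vocabulary:
    """A named collection of allowed terms (flat, for simplicity)."""
    name: str
    terms: frozenset


@dataclass(frozen=True)
class Property:
    """One metadata property; vocabulary=None means free text."""
    name: str
    required: bool
    vocabulary: Optional[Vocabulary] = None


@dataclass(frozen=True)
class ApplicationProfile:
    """States which properties and vocabularies are required."""
    properties: tuple

    def validate(self, record: dict) -> list:
        """Return a list of problems found in one metadata record."""
        problems = []
        for prop in self.properties:
            value = record.get(prop.name)
            if value is None:
                if prop.required:
                    problems.append(f"missing required property: {prop.name}")
                continue
            if prop.vocabulary is not None and value not in prop.vocabulary.terms:
                problems.append(
                    f"{prop.name}: {value!r} is not in vocabulary {prop.vocabulary.name}"
                )
        return problems


# Hypothetical vocabulary and profile, for illustration only.
CONTENT_TYPE = Vocabulary("Content Type", frozenset({"Report", "Policy", "Press Release"}))
PROFILE = ApplicationProfile((
    Property("title", required=True),
    Property("content_type", required=True, vocabulary=CONTENT_TYPE),
    Property("audience", required=False),
))
```

Calling `PROFILE.validate(record)` on a tagged item returns an empty list when the record satisfies the profile, which is one way to make "represents them formally" checkable.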

    Recent & current projects:

    http://www.taxonomystrategies.com/html/clients.htm

    Government Commercial

    Not-for-Profit

    Who are you? What sectors do you work in?

    Your Role

    • Administrator
    • Records Manager
    • Content Manager
    • Communications
    • Editor
    • Information Architect
    • Usability Expert
    • Librarian
    • Knowledge Engineer
    • Ontologist
    • Chief Information Officer

    Industrial Sector

    • Agriculture & Processing: Food, Lumber, Pulp & Paper
    • Financial Services: Banking & Insurance
    • Government: Public administration, Public safety
    • High Tech: Computers, Software & Telecommunications
    • Heavy Manufacturing: Steel, Automobiles & Aircraft
    • Manufacturing: Consumer Products
    • Medical & Health Care
    • Mining & Refining: Petrochemicals, Oil & Gas
    • Pharmaceuticals

    Why are you here?

    • What are the key questions that you want answered in today's workshop?
    • Please rank the questions from the most important (5) to the least important (1).
    • Please provide your job title, organization and department; your name is optional.

    Priority (1-5)    Questions

    Your title or role:

    Your org or industry:

    Your dept:

    Your name: (optional)

    The Taxonomy problem: How to pick from > 5,000 faucets?

    By:

    • Category
    • Price
    • Brand
    • Color/Finish
    • # Handles
    • Series Name
    • Water Filter?
    • Faucet Spray
    • Handle Shape
    • Soap Dispenser?

    The main issue: What goes here?

    • When do the things in the list change?
    • How do we maintain the list?
    • What rules do we follow?

    Seven phases of taxonomy development

    Week: 1 2 3 4 5 6 7 8 9 10 11 12

    1 Identify Objectives: Conduct interviews
    2 Inventory Resources: Identify, gather & review resources
    3 Specify Metadata: Define fields & purpose
    4 Model Content: Define content chunks & XML DTDs
    5 Specify Vocabularies: Compile controlled vocabularies
    6 Specify Procedures: Develop workflow, rules & procedures
    7 Test & Train: Manually tag small sample

    Taxonomy design phases need to be iterated

    Phases: 1 Identify Objectives, 2 Inventory Resources, 3 Specify Metadata, 4 Model Content, 5 Specify Vocabularies, 6 Specify Procedures, 7 Test & Train

    Plan & Prototype

    • Interview core team and stakeholders
    • Identify, gather & review resources
    • Define fields & purpose
    • Define content chunks & XML DTDs
    • Compile controlled vocabularies
    • Develop workflow rules & procedures
    • Manually tag small sample

    Alpha Dev & Test

    • Review tagged samples, default procedures
    • Gather additional resources, if any
    • Revise if needed, bake into alpha CMS
    • Revise if needed, bake into alpha CMS
    • Revise, use in alpha CMS
    • alpha workflows in CMS
    • Use alpha CMS to tag larger sample

    Beta D&T

    • Interview alpha users
    • Gather additional sources, if any
    • Modify CMS for beta
    • Modify CMS for beta
    • Revise, use in beta CMS
    • Modify & extend workflows
    • Use beta CMS to tag larger sample

    Final D&T

    • Interview beta users
    • Modify for 1.0
    • Modify for 1.0
    • Revise using team procedure
    • Finalize procedure materials
    • Finalize training materials & train staff

    Licensing an existing taxonomy

    See Factiva's taxonomy www.taxonomywarehouse.com

    • There are usually license fees, but these will be less than the effort to develop an equivalent taxonomy.
    • But pre-existing taxonomies rarely fit an organization's needs and may require extensive customization.

    Recommendation

    • Adopt a faceted approach.
    • Reuse existing (especially internal) vocabularies for as many of the facets as possible.
    • Plan on doing full-custom Content Type and Topic taxonomies.

    Free sources for 8 common taxonomies

    Taxonomy | Definition | Potential Sources
    Organization | Organizational structure. | SP 800-87, U.S. Government Manual, Your organizational structure, etc.
    Content Type | Structured list of the various types of content being managed or used. | Dublin Core Type Vocabulary, AGLS Document Type, Your records management policy, etc.
    Industry | Broad market categories such as lines of business, life events, or industry codes. | SIC, NAICS, Your market segments, etc.
    Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
    Business Activity | Business activities or functions performed to accomplish mission and goals. | Federal Enterprise Architecture Business Reference Model, Enterprise ontology, Your business functions, etc.
    Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, Your research areas, etc.
    Audience | Subset of constituents to whom a piece of content is directed or is intended to be used by. | GEM, ERIC Thesaurus, IEEE LOM, Your psycho-graphics or personas, etc.
    Products & Services | Names of products/programs and services. | ERP system, Your products and services, etc.

    Typical product catalog:

    A-Z, then idiosyncratic categories

    How to analyze existing product catalog categories: Principles and priorities

    Preparing a product catalog for facet browsing (aka Guided Navigation) requires a category hierarchy and additional attributes.

    Principles

    1. Categories and subcategories that could be swapped are candidates for conversion to attributes.
    2. Repeated lists of subcategories signal a possible need for an attribute.
    3. The number of attributes should not exceed six or seven, so not all attribute candidates should be used. Avoid selecting strongly correlated attributes, such as Weight and Shipping Weight.

    Priorities

    1. Choose Categories that apply to many products, over those with few products.
    2. Choose Attributes that apply to many Categories over those that apply only to very few categories.
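Principle 2 (repeated lists of subcategories) and Priority 2 (prefer attributes that apply to many categories) lend themselves to a quick scan of an existing catalog. The sketch below is one possible way to do that; the function name, the `min_parents` threshold and the sample catalog are all hypothetical.

```python
from collections import defaultdict


def attribute_candidates(catalog, min_parents=2):
    """Find subcategory labels that repeat under several categories.

    catalog maps a category name to its list of subcategory labels.
    Labels that recur under at least min_parents categories are
    returned with the categories they appear in.
    """
    parents_by_label = defaultdict(set)
    for category, subcategories in catalog.items():
        for label in subcategories:
            parents_by_label[label].add(category)
    return {
        label: sorted(parents)
        for label, parents in parents_by_label.items()
        if len(parents) >= min_parents
    }


# Hypothetical catalog: finishes repeat under every category, which
# suggests a "Finish" attribute rather than three parallel subcategory lists.
catalog = {
    "Kitchen Faucets": ["Chrome", "Brass", "Stainless Steel", "Pull-Out"],
    "Bath Faucets": ["Chrome", "Brass", "Stainless Steel", "Wall-Mount"],
    "Shower Heads": ["Chrome", "Brass", "Handheld"],
}
candidates = attribute_candidates(catalog)
```

Labels that recur under the most categories are the strongest attribute candidates; deciding which six or seven to keep remains an editorial judgment.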

    Product categories example: Wireless carrier

    Products: Accessories, Content, Phones, Services

    • Accessories: Batteries, Cases, Chargers, Data, Hands-Free, Headsets, Miscellaneous
    • Content: Purchased, Subscription
    • Phones: Versatile Phones, Smart Devices, Basic Phones, Prepaid Phones, International Only Phones, Mobile Broadband Cards
    • Services: Conferencing, Internet / Data, Landline Phone, Network & Roaming, Relay Services, Solutions, Wireless Data

    Product attributes example: Digital cameras in an electronics catalog

    • Types of attributes
      - Generic attributes: Brand/Product Family/Model; Price Range; Usually Ships
      - Merchandising attributes: Usage (E-mail, Internet Browsing, Programming, ...); Segment (Home, Business, Education, Government ...); Region & Country; Most Popular; New; Related Products
      - Specialized attributes: Capacity (Battery; Memory; MB; GB; BPS, ...); Resolution (DPI; Megapixels; XGA, XGA, UXGA, ...); Size (Display; Screen; ...); Standard (a, b, g, n, ...; scsi, ata, sata, eide, ...; dimm, simm, ...); Type (Camera; Battery; Display; Printer; Server; Storage; Switch; ...)

    Resolution: 3 Megapixels (4), 4 Megapixels (5), 5 Megapixels (27), 6-8 Megapixels (21)
    Brand: Canon (15), Fuji (10), Kodak (17), Nikon (8), Olympus (9)
    Type: Point & Shoot (25), Digital SLR (10), Packages (5)
    Price Range: $100-250 (5), $250-500 (16), $500-1000 (19), More than $1000 (3)
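The value-with-count lists above, such as "Canon (15)", are what a guided-navigation interface computes from tagged products. A minimal sketch, assuming each product is a plain dictionary of attribute values; the four sample cameras are invented and do not reproduce the counts on the slide.

```python
from collections import Counter


def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)


def render_facet(items, facet):
    """Format one facet the way the slide shows it: 'Value (count)'."""
    counts = facet_counts(items, facet)
    return [f"{value} ({count})" for value, count in sorted(counts.items())]


# Hypothetical product records.
cameras = [
    {"brand": "Canon", "type": "Point & Shoot", "resolution": "5 Megapixels"},
    {"brand": "Canon", "type": "Digital SLR", "resolution": "6-8 Megapixels"},
    {"brand": "Nikon", "type": "Digital SLR", "resolution": "6-8 Megapixels"},
    {"brand": "Kodak", "type": "Point & Shoot", "resolution": "4 Megapixels"},
]
brand_lines = render_facet(cameras, "brand")
```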

    Faceted taxonomy theory & practice

    • How many terms are needed to provide sufficient granularity? Not as many as you think!
    • Post-coordinate indexing allows several simple controlled vocabularies to be combined, rather than using a single large pre-coordinated vocabulary.
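Post-coordination means each item is tagged independently in each small vocabulary, and the combination happens at query time. One simple way to picture this is an inverted index keyed by (facet, term) whose postings are intersected on demand. The function names and the sample documents below are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each (facet, term) pair to the set of document ids tagged with it."""
    index = defaultdict(set)
    for doc_id, tags in documents.items():
        for facet, term in tags.items():
            index[(facet, term)].add(doc_id)
    return index


def search(index, **criteria):
    """Combine simple vocabularies at query time by intersecting postings."""
    postings = [index.get((facet, term), set()) for facet, term in criteria.items()]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents, each tagged with one term per facet.
documents = {
    "doc1": {"audience": "Kids", "substance": "Pesticide", "health": "Food Safety"},
    "doc2": {"audience": "Industry", "substance": "Pesticide", "health": "Toxicity"},
    "doc3": {"audience": "Kids", "substance": "Ozone", "health": "Sun Protection"},
}
index = build_index(documents)
kids_and_pesticide = search(index, audience="Kids", substance="Pesticide")
```

No "Pesticide information for kids" term ever has to exist in a vocabulary; the pairing is assembled from two small lists when the user asks for it.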

    The power of faceted taxonomy

    • 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
      - Easier to maintain
      - Easier to tag by content authors
      - Can be easier to navigate
    • It's more effective to increase the number of facets than to increase the number of terms per facet.

    Audience: Advocacy, Contractors & Grantees, Environmental Professionals, Federal Facilities, General Public, Industry, Kids, Researchers & Scientists, Small Business, Students
    Health: Advisory, Exposure, Food Safety, Health Assessment, Health Effect, Health Risk, Occupational Health, Pesticide Effects, Sun Protection, Toxicity
    Substance: Allergen, Biological Contaminant, Carcinogen, Chemical, Explosive, Liquid Waste, Microorganism, Ozone, Pesticide, Radioactive Waste
    Industry: Agriculture & Cattle, Automobile Repair, Chemical, Dry Cleaning, Electronics & Computer, Energy, Extractive Industries, Food Processing, Leather Tanning & Finishing, Metal Finishing
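The arithmetic behind the 10^4 claim is a product of facet sizes, while the maintenance burden grows only with their sum. The sketch below also compares two ways of spending ten additional terms; the helper names are illustrative, and the comparison assumes one term per facet is assigned to each item.

```python
from math import prod


def distinct_combinations(facet_sizes):
    """Number of distinct tag combinations when one term is chosen per facet."""
    return prod(facet_sizes)


def terms_to_maintain(facet_sizes):
    """Total number of vocabulary terms someone has to look after."""
    return sum(facet_sizes)


four_facets = [10, 10, 10, 10]        # the slide's case: 40 terms
extra_facet = [10, 10, 10, 10, 10]    # spend 10 more terms on a fifth facet
extra_terms = [12, 12, 13, 13]        # spend 10 more terms inside existing facets

baseline = distinct_combinations(four_facets)
with_new_facet = distinct_combinations(extra_facet)
with_more_terms = distinct_combinations(extra_terms)
```

Under these assumptions, ten extra terms as a new facet multiply the combinations tenfold, while the same ten terms spread over existing facets give roughly a 2.4-fold increase.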

    Automatically created taxonomies

    • Documents can be clustered based on similarities and differences.
    • Problems:
      - Typically only a single hierarchy
      - No overall plan
      - Results hard for people to navigate

    What does North mean on this map?

    Automatic taxonomy construction software

    • Software can scan large quantities of content and extract statistically significant words and phrases.
    • Example: Archive of 10 publications analyzed for topics related to copyright.
    • Software does a poor job of:
      - De-duplication.
      - Turning significant words and phrases into a larger structure.
      - Discriminating between gold and garbage.
    • Software is good for:
      - Getting an understanding of the key noun phrases in a large collection.
      - Providing test cases for evaluating a taxonomy.

    Source: Sample data courtesy of nStein.

    Most popular flickr tags on 20 Feb 2007

    http://www.flickr.com/photos/tags/

    Sort flickr categories into 5 or fewer groups. Then label each group.

    Taxonomy exercise: Facet grouping

    • Universal taxonomy facets
      - By location (spatially)
      - By time (chronologically)
      - By type (genre)
      - By physical properties (size, color, shape, etc.)
      - By subject (topic)

    Richard Saul Wurman. Information Architects (1996)

    Taxonomy exercise: Facet grouping

    Sort flickr categories into 5 or fewer groups. Then label each group.

    Business case and motivations for taxonomies

    • How are we going to use content, metadata, and taxonomies in applications to obtain business benefits?

    What technology analysts have said: Add metadata to search on!

    • Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.
    • Enriching content with structured metadata is critical for supporting search and personalized content delivery.
    • Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.
    • Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.

    Fundamentals of taxonomy ROI

    • Tagging content using a taxonomy is a cost, not a benefit.
    • There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
    • Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
    • You need to determine those changes, and their costs, as part of the ROI.

    Product utilization: Taxonomy compared to search

    • Conversion rate increases.
      - HomeDepot.com: Double digit increase.
      - 1-800-Flowers.com: More than a 10% increase.
      - Otto Group (Kaleidoscope, Freemans, Grattan, and lookagain catalogs): 130% increase.
    • Lift in average order size.

    Product catalog: Taxonomy compared to search

    Benefit: Increased conversion rate & revenue lift

    Web sales net income                          $ 80,000,000
    Increased conversion rate              30%    $ 24,000,000
    Order size lift                        10%    $  8,000,000
    Potential revenue increase per year           $ 32,000,000
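The table's arithmetic can be reproduced in a few lines. The sketch follows the slide in simply adding the two lifts; if the order-size lift were applied on top of the higher conversion rate instead, the combined figure would come out somewhat larger. The function name and the use of whole-number percentages are choices made here, not part of the source.

```python
def revenue_increase(base, conversion_lift_pct, order_size_lift_pct):
    """Return (conversion gain, order-size gain, total), adding the two lifts.

    Percentages are whole numbers so the arithmetic stays in integers.
    """
    conversion_gain = base * conversion_lift_pct // 100
    order_size_gain = base * order_size_lift_pct // 100
    return conversion_gain, order_size_gain, conversion_gain + order_size_gain


# Figures from the slide: $80,000,000 base, 30% conversion lift, 10% order-size lift.
conversion_gain, order_size_gain, total_gain = revenue_increase(80_000_000, 30, 10)
```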

    Usability research: Taxonomy compared to search

    • "We found that users preferred a browsing oriented interface for a browsing task, and a direct search interface when they knew precisely what they wanted."
      Marti Hearst (and others)
    • "The category interface is superior to the list interface in both subjective and objective measures."
      Hao Chen & Susan Dumais

    Usability research: Taxonomy compared to search

    (Bar chart: median search time in seconds, 0 to 140, for the Category and List interfaces, split by whether the target was in the top 20 results or not in the top 20 results. Callouts: "Category is 36% faster" and "Category is 48% faster".)

    Source: Chen & Dumais

    Time saved: Taxonomy compared to search

    1 hour per day searching x 36% faster = 22 minutes each day

    22 minutes x 250 working days per year = 5500 minutes or 92 hours per year
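The same estimate as a small function, so the inputs can be swapped for an organization's own figures. It rounds at the same points the slide does (to whole minutes per day, then to whole hours per year); the function and parameter names are illustrative.

```python
def time_saved(minutes_searching_per_day, speedup_pct, working_days_per_year):
    """Return (minutes saved per day, minutes per year, hours per year)."""
    minutes_per_day = round(minutes_searching_per_day * speedup_pct / 100)
    minutes_per_year = minutes_per_day * working_days_per_year
    hours_per_year = round(minutes_per_year / 60)
    return minutes_per_day, minutes_per_year, hours_per_year


# Slide inputs: 1 hour of searching per day, 36% faster, 250 working days.
per_day, per_year_minutes, per_year_hours = time_saved(60, 36, 250)
```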

    Time saved: Taxonomy compared to search

    Benefit: Increase service efficiency

    Number of call center calls per month                                      50,000
    Average cost per call                                                      $ 20
    Call response costs per month                                              $ 1,000,000
    Total call response costs per year                                         $ 12,000,000
    Percentage of self-serviced calls due to improved information browsing    30%
    Service costs savings per year                                             $ 3,600,000
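A sketch of the call-center calculation, parameterized so that the 30% self-service assumption (the most uncertain input) is easy to vary. Names are illustrative, and the percentage is a whole number to keep the arithmetic in integers.

```python
def call_center_savings(calls_per_month, cost_per_call, self_service_pct):
    """Return (monthly cost, yearly cost, yearly savings from self-service)."""
    monthly_cost = calls_per_month * cost_per_call
    yearly_cost = monthly_cost * 12
    yearly_savings = yearly_cost * self_service_pct // 100
    return monthly_cost, yearly_cost, yearly_savings


# Slide inputs: 50,000 calls per month at $20 each, 30% handled by self-service.
monthly_cost, yearly_cost, yearly_savings = call_center_savings(50_000, 20, 30)
```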

    Trusted advisers: Taxonomy avoids costs

    • "The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs"
      Sue Feldman,
    • "Sun's usability experts calculated that 21,000 employees were wasting an average of six minutes per day due to inconsistent intranet navigation structures. When lost time was multiplied by staff salaries, the estimated productivity loss exceeded $10M per year, about $500 per employee per year."
      Jakob Nielsen, useit.com

    Knowledge workers spend up to 2.5 hours each day looking for information

    But find what they are looking for only 40% of the time.

    (Pie chart segments: Searching, Creating, Communicating)

    Source: Kit Sims Taylor

    Knowledge workers spend more time re-creating existing content than creating new content

    (Pie chart segments: Creating new content, Recreating existing content, Searching, Communicating. Labeled values: 25%, 8%.)

    Source: Kit Sims Taylor (cited by Sue Feldman in her original article)

    Cost saved by not recreating content

    Benefit: Increase in productivity

    Number of employees                                           100
    Average employee salary                                       $ 80,000
    Employee costs per year                                       $ 8,000,000
    Increase in productivity from not re-creating content        25%
    Employee cost savings per year                                $ 2,000,000

    Business case summary

    1. Classifications and classification-like schemes are being used to facilitate information seeking in the workplace, and on the web.
    2. Users take advantage of (and prefer) this type of scheme (faceted navigation) when it is made available in the user interface.
    3. Hierarchical or facet navigation can be guided by the User Interface.
    4. Facet navigation is best combined with keyword searching. E.g., keyword search followed by faceted navigation of results.
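Point 4 describes a two-step interaction: a keyword query produces a result set, and facet selections then narrow it. The sketch below uses naive substring matching on titles as a stand-in for a real search engine; the function names and sample pages are hypothetical.

```python
def keyword_search(items, query):
    """Step 1: keep items whose title contains every query word."""
    words = query.lower().split()
    return [
        item for item in items
        if all(word in item["title"].lower() for word in words)
    ]


def refine(results, facet, value):
    """Step 2: narrow an existing result set by one facet value."""
    return [item for item in results if item.get(facet) == value]


# Hypothetical tagged pages.
pages = [
    {"title": "Pesticide safety for farm workers", "audience": "Industry"},
    {"title": "Pesticide safety at home", "audience": "General Public"},
    {"title": "Sun safety for kids", "audience": "Kids"},
]
hits = keyword_search(pages, "pesticide safety")
narrowed = refine(hits, "audience", "General Public")
```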

    Taxonomy requires a business process

    • Taxonomies must change, gradually, over time if they are to remain relevant.
    • Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions.

    Taxonomy governance can be viewed as a standards process

    • Taxonomy must evolve, but in a predictable way.
    • Team structure, with an appeals process
      - Taxonomy stewardship is a part-time role at most organizations.
      - Team needs to make decisions based on costs and benefits.
    • Documentation and educational materials.
    • Comment-handling responsibilities (part of error-correction process)
    • Issue Logs.
    • Release Schedule.

    Taxonomy governance: Change process overview

    (Diagram: CV Sources feed the Taxonomy Governance Environment, which publishes to CV Consumers.)

    1: External controlled vocabularies (CVs) change on their own schedule
    2: Taxonomy Team decides when to update CV snapshots
    3: Team adds value via definitions, synonyms, classification rules, training materials, etc.
    4: Updated versions of CVs published to consumers

    CV Sources: Internally Created (Subject Codes, Expertise, Other Internal); External Standard
    Taxonomy Governance Environment: Taxonomy Facets, kept as working copies of CVs maintained in the Taxonomy Tool
    CV Consumers: Site Search Tool, Portal, Working Papers, Web CMS, DAM, Tagging Tool, Search UI

    NASA version of the same diagram: the NASA Taxonomy Team decides when to update snapshots of external CVs; sources are Internally Created CVs (Codes, NASA Competencies), CVs from other NASA Sources, and External Standard Vocabularies; consumers are Site Search Tool, Portal, Project Archives, DMS, Metatagging Tool, Search UI.

    CV = Controlled Vocabulary
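Steps 2 to 4 can be pictured as a small versioned store: the team pulls a snapshot when it chooses, enriches the working copy, and consumers only ever see what has been published. This is a sketch under several assumptions; the class and method names are invented, synonyms stand in for all the "added value", and a real taxonomy tool would track far more.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One versioned copy of a controlled vocabulary."""
    source: str
    version: int
    terms: frozenset
    synonyms: dict = field(default_factory=dict)  # synonym -> preferred term


class GovernanceEnvironment:
    def __init__(self):
        self.working = {}    # source name -> working copy
        self.published = {}  # source name -> copy visible to consumers

    def update_snapshot(self, source, external_terms):
        """Step 2: the team decides to pull a fresh copy of a source CV."""
        previous = self.working.get(source)
        version = previous.version + 1 if previous else 1
        # Carry over synonyms whose preferred term still exists upstream.
        kept = {}
        if previous:
            kept = {s: t for s, t in previous.synonyms.items() if t in external_terms}
        self.working[source] = Snapshot(source, version, frozenset(external_terms), kept)
        return self.working[source]

    def add_synonym(self, source, synonym, term):
        """Step 3: the team adds value to the working copy."""
        snapshot = self.working[source]
        if term not in snapshot.terms:
            raise ValueError(f"{term!r} is not a term in {source}")
        snapshot.synonyms[synonym] = term

    def publish(self, source):
        """Step 4: release the working copy to consumers."""
        snapshot = self.working[source]
        released = Snapshot(snapshot.source, snapshot.version,
                            snapshot.terms, dict(snapshot.synonyms))
        self.published[source] = released
        return released
```

Keeping `working` and `published` separate is what lets the external vocabulary change on its own schedule (step 1) without consumers seeing anything until the team releases a new version.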

    Who should build the taxonomy?

    • The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders.
    • The team should plan on maintaining the taxonomy as well as building it.
      - Maintenance will not (usually) be anyone's full-time job.
      - Exact mix of people on team will change.
    • It should be built in an iterative fashion, with more content and broader review for each iteration.

    Taxonomy governance: Generic team charter

    • Taxonomy Team is responsible for maintaining:
      - The Taxonomy, a multi-faceted classification scheme.
      - Associated taxonomy materials, such as: Editorial Style Guides; Taxonomy Training Materials; Metadata Standard.
      - Team rules and procedures for change management.
    • Taxonomy Team will consider costs and benefits of suggested changes.
    • Taxonomy Team will:
      - Manage relationship between providers of source vocabularies and consumers of the Taxonomy.
      - Identify new opportunities for use of the Taxonomy across the enterprise to improve information management practices.
      - Promote awareness and use of the Taxonomy.

    Taxonomy governance team: Generic roles

    • Business Lead
      - Keeps committee on track with larger business objectives.
      - Balances cost/benefit issues to decide appropriate levels of effort.
      - Obtains needed resources if those on committee can't accomplish a particular task.
    • Technical Specialist
      - Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
      - Helps obtain data from various systems.
    • Content Specialist
      - Committee's liaison to content creators.
      - Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
    • Taxonomy Specialist
      - Suggests potential taxonomy changes based on analysis of query logs, indexer feedback.
      - Makes edits to taxonomy, installs into system with aid of IT specialist.
    • Content Owners
      - Reality check on process change suggestions.

    Where taxonomy changes come from

    (Diagram components: End User, Firewall, Application UI, Application Logic, Tagging UI, Tagging Logic, Tagging Staff, Taxonomy, Content, Taxonomy Editor, Taxonomy Team)

    Inputs to the Taxonomy Editor

    • End User experience
    • Staff notes "missing concepts"
    • Query log analysis
    • Requests from other parts of the organization (in the NASA version: other parts of NASA)

    Recommendations by Editor

    1. Small taxonomy changes (labels, synonyms)
    2. Large taxonomy changes (retagging, application changes)
    3. New "best bets" content.

    Team Considerations

    1. Business goals.
    2. Changes in user experience.
    3. Retagging cost.
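Query log analysis is the most mechanical of these inputs. One possible first pass is to count normalized queries and list the frequent ones that match no existing taxonomy label; these become candidates for new terms or synonyms that the editor can bring to the team. The function name, the exact-match rule and the sample log are assumptions for illustration.

```python
from collections import Counter


def missing_concepts(query_log, taxonomy_labels, min_count=2):
    """List frequent queries that match no taxonomy label or synonym.

    Matching here is exact after lowercasing and trimming, which is
    deliberately crude; real logs would need stemming and spell handling.
    """
    known = {label.lower() for label in taxonomy_labels}
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    return [
        (query, count)
        for query, count in counts.most_common()
        if query not in known and count >= min_count
    ]


# Hypothetical log and labels.
query_log = ["pesticide", "mercury", "Mercury", "ozone",
             "mercury ", "asbestos", "asbestos", "radon"]
labels = {"Pesticide", "Ozone"}
candidates = missing_concepts(query_log, labels)
```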

    Taxonomy maintenance processes

    • Different organizations will need to consider their own change processes.
      - Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
      - Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
      - Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
    • Change process MUST also consider cost of implementing the change
      - Retagging data.
      - Reconfiguring auto-classifier.
      - Retraining staff.
      - Changes in user expectations.

    Taxonomy maintenance workflow

    (Flowchart with swimlanes: Analyst, Editor, Copywriter, Sys Admin)

    1. Analyst: Suggest new name/category
    2. Editor: Review new name. Problem? (Yes / No branches)
    3. Copywriter: Copy edit new name. Problem? (Yes / No branches)
    4. Sys Admin: Add to enterprise Taxonomy

    Systems shown: Taxonomy Tool, Taxonomy

    Sample taxonomy editor: Data Harmony

    (Screenshot with callouts: Hierarchy Browser; Standard Term Info)

    Taxonomy editing tools vendors

    (Quadrant chart. Vertical axis: Ability to Execute, low to high. Horizontal axis: Completeness of Vision. Quadrants labeled: Niche Players, Visionaries.)

    • An immature area: No vendors are in upper-right quadrant!
    • Most popular taxonomy editor is MS Excel
    • MultiTes is widely used, cheap with
    • High functionality / high cost products ($100K+)

    http://www.wordmap.com/index.html

    Taxonomy maturity model

    • Taxonomy governance processes must fit the organization.
    • As consultants, we notice different levels of maturity in the business processes around content management, taxonomy, and metadata.
    • Honestly assess your organization's metadata maturity in order to design appropriate governance processes.
    • We are starting to define a maturity model, similar to the Software Capability Maturity Model (CMM)
      - Initial: Ad hoc, each project begins from scratch.
      - Repeatable: Procedures defined and used, but not standardized across organization or are misapplied to projects.
      - Defined: Standard processes are tailored for project needs. Strategic training for long-range goals is in place.
      - Managed: Projects managed using quantitative quality measures. Process itself is measured and controlled.
      - Optimizing: Continual process improvement. Extremely accurate project estimation.

  • 8/6/2019 Busch Slides

    55/126

    55Taxonomy Strategies LLC The business of organized

    Purpose of maturity model

    - Estimating the maturity of an organization's information management processes tells us:
        - How involved the taxonomy development and maintenance process should be. Overly sophisticated processes will fail.
        - What to recommend as first steps.
    - Maturity is not a goal; it is a characterization of an organization's methods for achieving particular goals.
    - Mature processes have expenses which must be justified by consequent cost savings or revenue gains.
    - IT maturity may not be core to your business.

    Taxonomy maturity scorecard

    Levels: Initial | Repeatable | Defined | Managed | Optimizing

    Organizational Structure

    Executive Sponsorship *

    Budgeting *

    Hiring & Training *

    Quality Assurance

    Manual Processes * 1

    Automated Processes *

    Project Management

    Estimating & Scheduling *

    Cost Control *

    Project Methodology * 2

    Design and Execution

    Planning *

    Design Excellence *

    Development Maturity *

    1 X is starting to examine search query logs, which is an important first step in improving search. But this is only an isolated example.

    2 IT has a project methodology they are trying to use across all projects. But not all business units have project methodologies.

    2005 Maturity survey: Search practices

    n=87 | Not current practice | Being developed | In practice | Former practice | NA or Unknown
    Search Box in standard place on all web pages. | 20% (12) | 11% (7) | 62% (38) | 2% (1) | 5% (3)
    Search engine indexes multiple repositories in addition to web sites. | 25% (15) | 21% (13) | 44% (27) | 2% (1) | 8% (5)
    Spell Checking. | 31% (19) | 18% (11) | 38% (23) | 0% (0) | 13% (8)
    Synonym Searching. | 41% (25) | 23% (14) | 30% (18) | 0% (0) | 7% (4)
    Search results grouped by date, location, or other factors in addition to simple relevance score. | 37% (22) | 20% (12) | 37% (22) | 0% (0) | 7% (4)
    Queries are logged and the logs are regularly examined. | 31% (19) | 25% (15) | 31% (19) | 5% (3) | 8% (5)
    Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. (Best Bets) | 46% (28) | 25% (15) | 21% (13) | 0% (0) | 8% (5)
    Advanced computation of relevance based on data in addition to the text of the document. | 43% (26) | 16% (10) | 25% (15) | 0% (0) | 16% (10)
    A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search. | 68% (41) | 7% (4) | 10% (6) | 0% (0) | 15% (9)
    A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal. | 57% (34) | 15% (9) | 17% (10) | 0% (0) | 12% (7)

    2005 Maturity survey: Metadata practices

    n=87 | Not current practice | Being developed | In practice | Former practice | NA or Unknown
    Metadata standards are developed for the needs of each system with no overall attempt to unify them. | 22% (13) | 12% (7) | 37% (22) | 20% (12) | 10% (6)
    An Organization-wide metadata standard exists and new systems consider it during development. | 37% (22) | 37% (22) | 20% (12) | 0% (0) | 7% (4)
    The Organization-wide metadata standard is based on the Dublin Core. | 52% (30) | 16% (9) | 21% (12) | 0% (0) | 12% (7)
    Multiple repositories comply with metadata standard. | 52% (31) | 20% (12) | 17% (10) | 0% (0) | 12% (7)
    Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard. | 48% (29) | 20% (12) | 20% (12) | 0% (0) | 12% (7)
    The Cataloging Policy document is revised periodically. | 48% (29) | 15% (9) | 17% (10) | 0% (0) | 20% (12)
    A centralized metadata repository exists to aggregate and unify metadata from disparate sources. | 57% (34) | 17% (10) | 17% (10) | 0% (0) | 10% (6)
    Metadata is manually entered into web forms. | 15% (9) | 12% (7) | 61% (36) | 3% (2) | 8% (5)
    Metadata is generated automatically by software. | 38% (23) | 18% (11) | 27% (16) | 2% (1) | 15% (9)
    Metadata is generated automatically, then reviewed manually for correction. | 48% (29) | 18% (11) | 17% (10) | 2% (1) | 15% (9)

    2005 Maturity survey: Taxonomy practices

    n=87 | Not current practice | Being developed | In practice | Former practice | NA or Unknown
    Org Chart Taxonomy - One based primarily on the structure of the organization. | 36% (21) | 10% (6) | 34% (20) | 5% (3) | 15% (9)
    Products Taxonomy - One based primarily on the products and/or services offered by the organization. | 37% (22) | 10% (6) | 32% (19) | 5% (3) | 15% (9)
    Content Types Taxonomy - One based primarily on the different types of documents. | 28% (16) | 21% (12) | 40% (23) | 5% (3) | 7% (4)
    Topical Taxonomy - One based primarily on topics of interest to the site users. | 20% (12) | 36% (21) | 34% (20) | 3% (2) | 7% (4)
    Faceted Taxonomy - One which uses several of the approaches above. | 32% (19) | 29% (17) | 34% (20) | 0% (0) | 5% (3)
    The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. | 75% (44) | 3% (2) | 14% (8) | 0% (0) | 8% (5)
    The Taxonomy follows a written 'style guide' to ensure its consistency over time. | 47% (28) | 22% (13) | 20% (12) | 0% (0) | 10% (6)
    The Taxonomy is maintained using a taxonomy editing tool other than MS Excel. | 35% (21) | 17% (10) | 40% (24) | 2% (1) | 7% (4)
    The Taxonomy was validated on a representative sample of content during its development. | 28% (17) | 22% (13) | 33% (20) | 3% (2) | 13% (8)
    Roadmap for the future evolution of the Taxonomy has been developed. | 38% (23) | 40% (24) | 13% (8) | 0% (0) | 8% (5)

    Today's agenda

    9:00-9:10  10 min  Introduction
    9:10-9:15  5 min  Warm-up exercise
    9:15-9:45  30 min  Taxonomy fundamentals: Building taxonomies
    9:45-10:00  15 min  Taxonomy exercise
    10:00-10:30  30 min  Taxonomy fundamentals: Taxonomy business case
    10:30-11:00  30 min  Tea Break
    11:00-12:00  60 min  Taxonomy governance
    12:00-12:30  30 min  Capabilities self-assessment
    12:30-13:30  60 min  Lunch
    13:30-14:30  60 min  Taxonomy benchmarking
    14:30-14:45  15 min  Benchmarking exercise
    14:45-15:15  30 min  Tea Break
    15:15-16:15  60 min  Content tagging
    16:15-16:30  15 min  Tagging exercise
    16:30-17:00  30 min  Q&A

    Taxonomy testing methods

    Method | Process | Who | Requires | Validation
    Walk-thru | Show & explain | Taxonomist; SME Team | Rough taxonomy | Approach; Appropriateness to task
    Walk-thru | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial Rules | Consistent look and feel
    Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & Answers | Tasks are completed successfully; Time to complete task is reduced
    User Satisfaction | Survey | Users | Rough Taxonomy; UI Mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
    Tagging Samples | Tag sample content with taxonomy | Taxonomist; Team; Indexers | Sample content; Rough taxonomy (or better) | Content fit; Fills out content inventory; Training materials for people & algorithms

    Walk-through method

    Show & explain

    ABC Computers.com

    Audience: All; Business; Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time; Experienced; Advanced); Supplier

    Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business

    Region-Country: All; Asia-Pacific; Canada; EMEA; Japan; Latin America & Caribbean; United States

    Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Other Brands

    Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Types

    Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency

    Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petrochemicals; Retail / Wholesale; Technology; Transportation; Other Industries

    Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training

    Walk-through method

    Editorial rules consistency check

    - Abbreviations
    - Ampersands
    - Capitalization
    - General, More, Other
    - Languages & character sets
    - Length limits
    - Multiple parents
    - Plural vs. singular form
    - Scope notes
    - Serial comma
    - Sources of terms
    - Spaces
    - Synonyms & acronyms
    - Term order (Alphabetic or )
    - Term label order (Direct vs. inverted)

    Rule Name | Editorial Rule
    Abbreviations | Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels. Example: Public Information. NOT: Public Info.
    Ampersands | The ampersand [&] character shall be used instead of the word "and". Example: Licensing & Compliance. NOT: Licensing and Compliance
    Capitalization | Title case capitalization shall be used. Example: Customer Service. NOT: CUSTOMER SERVICE. NOT: Customer service. NOT: customer service
    General, More, Other | The term labels "General", "More", and "Other" shall be used for categories which contain content items that are not further classifiable. Example: Other Property; Other Services; General Information; General Audience
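
    Several of the editorial rules above can be checked mechanically before a human walk-through. The sketch below covers only Abbreviations, Ampersands, and Capitalization. The abbreviation pattern, the list of minor words, and the four-letter acronym allowance are assumptions made for illustration, not part of the rules as stated.

```python
import re

# Assumed heuristic: a short token that ends in a period, e.g. "Info."
ABBREVIATION = re.compile(r"\b[A-Za-z]{2,5}\.(?=\s|$)")
# Assumed list of small words that title case leaves in lower case.
MINOR_WORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "of", "on", "or", "the", "to"}


def check_label(label: str) -> list[str]:
    """Return the names of the editorial rules a term label appears to break."""
    problems = []
    if ABBREVIATION.search(label):
        problems.append("Abbreviations")
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands")
    # Ignore tokens such as "&" or "/" that do not start with a letter.
    words = [w for w in label.split() if w[0].isalpha()]
    for i, word in enumerate(words):
        if i > 0 and word.lower() in MINOR_WORDS:
            continue
        # Title case: leading capital, and not all capitals unless the word
        # is short enough to be an acronym (assumed: 4 letters or fewer).
        if not word[0].isupper() or (word.isupper() and len(word) > 4):
            problems.append("Capitalization")
            break
    return problems


for label in ["Public Info.", "Licensing and Compliance", "CUSTOMER SERVICE", "Customer Service"]:
    print(label, "->", check_label(label))
```

    A check like this only narrows the list for review. Colloquial terms, legitimate acronyms, and scope notes still need an editor's judgment.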

    Task-based testing*

    - 15 representative questions were selected
        - Perspective of various organizational units
        - Most frequent website searches
        - Most frequently accessed website content
        - Correct answers to the questions were agreed in advance by team.
    - 15 users were tested
        - Did not work for the organization
        - Represented target audiences
    - Testers were asked where they would look for each answer: under which facet (Topic, Commodity, or Geography)?
        - Then, under which category?
        - Then, under which sub-category?
        - Tester choices were recorded
    - Testers were asked to "think aloud"
        - Notes were taken on what they said
    - Pre- and post questions were asked
        - Tester answers were recorded

    * Based on Donna Maurer's usability work with the Australian government

    Task-based testing

    Representative questions

    1. How much cotton is imported from China?
    2. What are the impacts of "mad cow" disease on U.S. meat production, sales?
    3. What is the average farm income level in your state?
    4. How much of our diet comes from fast food?
    5. How many people receive WIC benefits (Special Supplemental Nutrition Program for Women, Infants, and Children)?
    6. How much acreage is planted to genetically engineered corn?
    7. What is the cost of foodborne illness in the United States?
    8. What part of food costs go to farmers, retailers?
    9. Which States produce the most tobacco?
    10. What percentage of farms in the United States are small farms?
    11. What are the costs and benefits associated with providing more traceability in the U.S. food supply?
    12. How many people in America don't get enough to eat?
    13. What is behind the trade balance (surplus or deficit) in agricultural goods?
    14. What is the extent of conservation compliance? How does that impact farmer's decisions?
    15. What are the impacts of foreign trade restrictions on U.S. farmers, U.S. food prices?

    Task-based testing

    Closed card sorting

    3. What is the average farm income level in your state?

    1. Topics
    2. Commodities
    3. Geographic Coverage

    1. Topics
        1.1 Agricultural Economy
        1.2 Agriculture-Related Policy
        1.3 Diet, Health & Safety
        1.4 Farm Financial Conditions
        1.5 Farm Practices & Management
        1.6 Food & Agricultural Industries
        1.7 Food & Nutrition Assistance
        1.8 Natural Resources & Environment
        1.9 Rural Economy
        1.10 Trade & International Markets

    1.4 Farm Financial Conditions
        1.4.1 Costs of Production
        1.4.2 Commodity Outlook
        1.4.3 Farm Financial Management & Performance
        1.4.4 Farm Income
        1.4.5 Farm Household Financial Well-being
        1.4.6 Lenders & Financial Markets
        1.4.7 Taxes

    Task-based testing

    Card sort results

    - In 80% of the trials, users looked for information under the categories where we expected them to look for it.
    - Breaking up topics into facets makes it easier to find information, especially information related to commodities.

    Task-based testing

    Card sort results

    Test Questions | % Correct | % Agree
    1. Cotton | 91% | 82%
    2. Mad cow | 73% | 64%
    3. Farm income | 100% | 55%
    4. Fast food | 91% | 73%
    5. WIC | 100% | 100%
    6. GE corn | 100% | 100%
    7. Foodborne illness | 82% | 82%
    8. Food costs | 55% | 27%
    9. Tobacco | 100% | 100%
    10. Small farms | 91% | 91%
    11. Traceability | 36% | 18%
    12. Hunger | 100% | 73%
    13. Trade balance | 36% | 64%
    14. Conservation | 91% | 91%
    15. Trade restrictions | 55% | 36%

    Legend: Possible change required. Change required.

    Annotations:
    - Possible error in categorization of this question because 64% thought the answer should be "Commodity Trade".
    - On these trials, only 50% looked in the right category, & only 27-36% agreed on the category.
    - "Policy of Traceability" needs to be clarified. Use quasi-synonyms.
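
    The slides do not define how the two result columns were scored. One reading that is consistent with the figures is that % Correct is judged at the facet level and % Agree is the share of testers who chose the single most common full path. The sketch below tallies recorded choices under that assumption. The function name, the `depth` parameter, and the sample choices are all made up for illustration.

```python
from collections import Counter

Path = tuple[str, ...]  # (facet, category, sub-category) as chosen by one tester


def score_question(choices: list[Path], expected: Path, depth: int = 1) -> tuple[int, int]:
    """Return (% correct, % agree) for one test question, rounded to whole percents."""
    n = len(choices)
    # Assumed: a choice is "correct" when its first `depth` levels match the expected path.
    correct = sum(1 for c in choices if c[:depth] == expected[:depth])
    # Assumed: "agree" is the share of testers who picked the most common full path.
    most_common_count = Counter(choices).most_common(1)[0][1]
    return round(100 * correct / n), round(100 * most_common_count / n)


# Made-up choices from 11 testers for a single question.
choices = (
    [("Topics", "Farm Financial Conditions", "Farm Income")] * 6
    + [("Topics", "Rural Economy")] * 3
    + [("Topics", "Agricultural Economy")] * 2
)
expected = ("Topics", "Farm Financial Conditions", "Farm Income")
print(score_question(choices, expected))  # (100, 55)
```

    Keeping the two measures separate is useful: a question can score well on correctness while low agreement still points to overlapping categories.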

    Task-based testing

    User satisfaction survey

    - Was it easy, medium or difficult to choose the appropriate Topic?
        Easy / Medium / Difficult
    - Was it easy, medium or difficult to choose the appropriate Commodity?
        Easy / Medium / Difficult
    - Was it easy, medium or difficult to choose the appropriate Geographic Coverage?
        Easy / Medium / Difficult

    User satisfaction survey

    Results

    [Chart: average rating per facet (Topic, Commodity, Geography) on a scale from 0 to 2.00 running from Easy to Difficult. Annotations: "Easier", "More Difficult".]

    User interface survey

    http://flamenco.berkeley.edu/index.html

    Which search UI is better?

    - Criteria
        - User satisfaction
        - Success completing tasks
        - Confidence in results
        - Fewer dead ends
    - Methodology
        - Design tasks from specific to general
        - Time performance
        - Calculate success rates
        - Survey subjective criteria
        - Pay attention to survey hygiene:
            - Participant selection
            - Counterbalancing
            - T-scores

    Source: Yee, Swearingen, Li, & Hearst

    User interface survey

    Results (1)

    Which interface would you rather use for these tasks? | Google-like Baseline | Faceted Category
    Find images of roses | 15 | 16
    Find all works from a certain period | 2 | 30
    Find pictures by 2 artists in the same media | 1 | 29

    Overall assessment: | Google-like Baseline | Faceted Category
    More useful for your usual tasks | 4 | 28
    Easiest to use | 8 | 23
    Most flexible | 6 | 24
    More likely to result in dead-ends | 28 | 3
    Helped you learn more | 1 | 31
    Overall preference | 2 | 29

    Source: Yee, Swearingen, Li, & Hearst

    User interface survey

    Results (2)

    [Chart: ratings on a 0 to 9 scale for two interfaces, Faceted Category and Google-like Baseline, across eight attributes: Easy to Use, Simple, Flexible, Tedious, Interesting, Easy to Browse, Enjoyable, Overwhelming. Data labels range from 3.5 to 7.8.]

    Source: Yee, Swearingen, Li, & Hearst

    Tagging samples

    How many items?

    Goal | Number of Items | Criteria
    Illustrate metadata schema | 1-3 | Random (excluding junk)
    Develop training documentation | 10-20 | Show typical & unusual cases
    Qualitative test of small vocabulary (

    Tagging samples

    Spreadsheet for tagging 10s-100s of items

    1) Clickable URLs for sample content

    2) Review small sample and describe

    3) Drop-down for tagging (including "Other" entry for the unexpected)

    4) Flag questions

    Rough bulk tagging

    Facet demo (1)

    - Collections: 4 content sources
        - NTRS, SIRTF, Webb, Lessons Learned
    - Taxonomy
        - Converted MultiTes format into RDF for Seamark
    - Metadata
        - Converted from existing metadata on web pages, or
        - Created using simple automatic classifier (string matching with terms & synonyms)
    - 250k items, ~12 metadata fields, 1.5 weeks effort
    - OOTB Seamark user interface, plus logo
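
    A "simple automatic classifier" of the kind mentioned above can be little more than a lookup of terms and synonyms in the text. The sketch below is one possible version, not the classifier used in the demo. The vocabulary entries and the `classify` function are hypothetical.

```python
import re

# Hypothetical vocabulary: preferred term -> synonyms (not the demo's actual taxonomy).
VOCABULARY = {
    "Space Telescopes": ["space telescope", "infrared telescope", "SIRTF"],
    "Lessons Learned": ["lesson learned", "lessons learned"],
}


def classify(text: str, vocabulary: dict[str, list[str]]) -> list[str]:
    """Return every preferred term whose label or synonyms occur in the text."""
    tags = []
    for term, synonyms in vocabulary.items():
        for phrase in [term, *synonyms]:
            # Whole-word, case-insensitive match, so "art" does not match "start".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text, flags=re.IGNORECASE):
                tags.append(term)
                break  # one hit is enough to assign the preferred term
    return tags


print(classify("SIRTF is an infrared telescope.", VOCABULARY))  # ['Space Telescopes']
```

    String matching is rough, which fits the "rough bulk tagging" goal: it produces enough metadata to demonstrate facets, and the results can be corrected later.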

    Rough bulk tagging

    Facet demo (2)

    http://demo.siderean.com/NASADemoV4/NASA-demoquery1.jsp

    Document distribution

    How evenly does it divide the content?

    - Documents do not distribute uniformly across categories
    - Zipf (1/x) distribution is expected behavior
    - 80/20 rule in action (actually 70/20 rule)

    [Chart: Measured v Expected Distribution of Top 10 Content Types in Library of Congress Database. Vertical axis: Number of Records (0 to 350,000). Horizontal axis: Top 10 Content Types, with labels Congresses, Biography, Periodicals, Maps, Fiction, Exhibitions, Juvenile literature, Bibliography, Statistics. Annotations: "Leading candidate for splitting", "Leading candidates for merging".]
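
    The "expected" curve in a chart like this can be computed directly: under a Zipf (1/x) distribution, the category at rank r receives a share proportional to 1/r. The sketch below is a minimal illustration with made-up counts. The function names are hypothetical and the figures are not the Library of Congress data.

```python
def zipf_expected(total: int, categories: int) -> list[float]:
    """Expected count per rank if counts fall off as 1/rank."""
    weights = [1 / rank for rank in range(1, categories + 1)]
    scale = total / sum(weights)
    return [scale * w for w in weights]


def compare(measured: list[int]) -> list[tuple[int, int, float]]:
    """Pair each measured count (largest first) with its Zipf expectation."""
    ranked = sorted(measured, reverse=True)
    expected = zipf_expected(sum(ranked), len(ranked))
    return [(rank, m, round(e, 1)) for rank, (m, e) in enumerate(zip(ranked, expected), start=1)]


# Made-up counts for three content types.
for rank, measured, expected in compare([200, 600, 300]):
    print(rank, measured, expected)
```

    Categories far above their expected count are natural candidates for splitting, and a run of categories far below it may be candidates for merging, which is how the annotations on the chart read.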

    Document distribution

    How evenly does it divide the content?

    - Methodology: 115 randomly selected URLs from corporate intranet search index were manually categorized. Inaccessible files and junk were removed.
    - Results: Slightly more uniform than Zipf distribution. Above the curve is better than expected.

    [Chart: Measured v Expected Intranet Content Type Distribution. Vertical axis: # Documents (0 to 25). Horizontal axis: Content Type, with labels People, Groups & Places; News & Events; Manuals & Learning Materials; Operations & Internal Communications; Marketing & Sales; Regulations, Policies, Procedures & Templates; Papers & Presentations; Other & Unclassified; Programs, Proposals, Plans & Schedules.]

    Document distribution

    How does taxonomy shape match that of content?

    Background:
    - Hierarchical taxonomies allow comparison of "fit" between content and taxonomy areas

    Methodology:
    - 25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
    - Counts of terms and documents summed within taxonomy hierarchy

    Results:
    - Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
    - Mismatches between term% and document% flagged

    Term Group | % Terms | % Docs
    Administrators | 7.8 | 15.8
    Community Groups | 2.8 | 1.8
    Counselors | 3.4 | 1.4
    Federal Funds Recipients and Applicants | 9.5 | 34.4
    Librarians | 2.8 | 1.1
    News Media | 0.6 | 3.1
    Other | 7.3 | 2.0
    Parents and Families | 2.8 | 6.0
    Policymakers | 4.5 | 11.5
    Researchers | 2.2 | 3.6
    School Support Staff | 2.2 | 0.2
    Student Financial Aid Providers | 1.7 | 0.7
    Students | 27.4 | 7.0
    Teachers | 25.1 | 11.4

    Source: Courtesy Keith Stubbs, US. Dept. of Ed.
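
    The slide does not say what threshold was used to flag mismatches. As a sketch, the comparison can be expressed as a ratio of document share to term share, flagging groups where one is at least three times the other. The factor of 3 and the function name are assumptions chosen for illustration; the figures are taken from the table above.

```python
# (term group, % of terms, % of documents), from the table above.
GROUPS = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Counselors", 3.4, 1.4),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 1.1),
    ("News Media", 0.6, 3.1),
    ("Other", 7.3, 2.0),
    ("Parents and Families", 2.8, 6.0),
    ("Policymakers", 4.5, 11.5),
    ("Researchers", 2.2, 3.6),
    ("School Support Staff", 2.2, 0.2),
    ("Student Financial Aid Providers", 1.7, 0.7),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]


def flag_mismatches(groups, factor=3.0):
    """Return groups whose document share and term share differ by `factor` or more."""
    flagged = []
    for name, pct_terms, pct_docs in groups:
        ratio = pct_docs / pct_terms
        if ratio >= factor:
            flagged.append((name, "more content than terms"))
        elif ratio <= 1 / factor:
            flagged.append((name, "more terms than content"))
    return flagged


for name, reason in flag_mismatches(GROUPS):
    print(f"{name}: {reason}")
```

    With this threshold, heavily used areas with few terms are candidates for more detail, and term-rich areas with little content are candidates for pruning.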

    Usability testing

    How intuitive (repeatable) are the categorizations (1)?

    - Methodology: Closed Card Sort
        - For alpha test of a grocery site
        - 15 testers put each of 71 best-selling product types into one of 10 pre-defined categories
        - Categories where fewer than 14 of 15 testers put product into same category were flagged

    Usability testing

    How intuitive (repeatable) are the categorizations (2)?

    Usability testing

    How intuitive (repeatable) are the categorizations?

    % of Testers | Cumulative % of Products
    15/15 | 54%
    14/15 | 70%
    13/15 | 77%
    12/15 | 83%
    11/15 | 85%
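
    A cumulative agreement table like the one above can be derived from raw closed card sort results. The sketch below assumes each product maps to the list of categories chosen by the testers. The function name and the sample data (3 testers, 4 products) are made up for illustration.

```python
from collections import Counter


def cumulative_agreement(sorts: dict[str, list[str]], testers: int) -> list[tuple[int, float]]:
    """For each agreement level k, return the cumulative percentage of products
    that at least k testers placed in the same category."""
    # Size of the largest group of testers who chose the same category, per product.
    top_counts = [Counter(choices).most_common(1)[0][1] for choices in sorts.values()]
    rows = []
    for k in range(testers, min(top_counts) - 1, -1):
        share = 100 * sum(1 for c in top_counts if c >= k) / len(top_counts)
        rows.append((k, round(share, 1)))
    return rows


# Made-up results: product -> category chosen by each of 3 testers.
sorts = {
    "Milk": ["Dairy", "Dairy", "Dairy"],
    "Eggs": ["Dairy", "Dairy", "Meat"],
    "Salsa": ["Snacks", "Condiments", "Snacks"],
    "Tofu": ["Produce", "Dairy", "Meat"],
}
for k, pct in cumulative_agreement(sorts, testers=3):
    print(f"{k}/3 testers agree: {pct}% of products")
```

    Products that fall below the chosen agreement level are the ones to review for renaming, cross-listing, or moving.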

    The #1 underused source of quantitative information on how to improve your taxonomy?

    Query Logs & Click Trails

    Query log & click trail examination

    Who are the users & what are they looking for?

    - Only 30-40% of organizations regularly examine their logs*.
    - Sophisticated software available, but don't wait.
    - 80% of value comes from basic reports

    Query log & click trail examination

    Query log

    UltraSeek Reporting
    - Top queries
    - Queries with no results
    - Queries with no click-through
    - Most requested documents
    - Query trend analysis
    - Complete server usage summary
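
    The first three reports in this list need very little tooling. The sketch below builds them from a list of log records. The record layout (query text, number of results, whether a result was clicked) is a hypothetical format chosen for illustration, not the UltraSeek log format.

```python
from collections import Counter

# Hypothetical log records: (query text, results returned, any result clicked).
LOG = [
    ("annual report", 12, True),
    ("Annual Report", 12, False),
    ("holiday schedule", 0, False),
    ("expense form", 5, False),
    ("holiday schedule", 0, False),
]


def basic_reports(log, top_n=10):
    """Build top-queries, no-results, and no-click-through reports from a query log."""
    # Normalize case and whitespace so trivially different queries count together.
    normalized = [(q.strip().lower(), hits, clicked) for q, hits, clicked in log]
    top = Counter(q for q, _, _ in normalized).most_common(top_n)
    no_results = Counter(q for q, hits, _ in normalized if hits == 0).most_common(top_n)
    # Queries that returned results but were never followed by a click.
    clicked_queries = {q for q, _, clicked in normalized if clicked}
    no_click = sorted({q for q, hits, _ in normalized if hits > 0} - clicked_queries)
    return {"top_queries": top, "no_results": no_results, "no_click_through": no_click}


for name, rows in basic_reports(LOG).items():
    print(name, rows)
```

    Queries with no results are a ready source of candidate synonyms and missing terms, and queries with no click-through suggest that the results shown did not look relevant.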

    Query log & click trail examination

    Click trail packages

    - iWebTrack
    - NetTracker
    - OptimalIQ
    - SiteCatalyst
    - Visitorville
    - WebTrends

    Benchmarking exercise

    - What are 5 representative questions that your users ask or tasks that your users do when using your application?
    - Is it currently easy, medium or difficult to answer these questions or accomplish these tasks?

    Rating (Easy/Medium/Difficult)

    Questions or Tasks

    Conclusion

    What is a good taxonomy?

    - Incremental, extensible process that identifies and enables owners, and engages stakeholders.
    - Quick implementation that provides measurable results as quickly as possible.
    - A means to an end, and not the end in itself.
    - Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
    - Improved over time, and maintained.

    Today's agenda

    9:00-9:10  10 min  Introduction
    9:10-9:15  5 min  Warm-up exercise
    9:15-9:45  30 min  Taxonomy fundamentals: Building taxonomies
    9:45-10:00  15 min  Taxonomy exercise
    10:00-10:30  30 min  Taxonomy fundamentals: Taxonomy business case
    10:30-11:00  30 min  Tea Break
    11:00-12:00  60 min  Taxonomy governance
    12:00-12:30  30 min  Capabilities self-assessment
    12:30-13:30  60 min  Lunch
    13:30-14:30  60 min  Taxonomy benchmarking
    14:30-14:45  15 min  Benchmarking exercise
    14:45-15:15  30 min  Tea Break
    15:15-16:15  60 min  Content tagging
    16:15-16:30  15 min  Tagging exercise
    16:30-17:00  30 min  Q&A

    Tagging Overview

    - Tagging is better than the words that happen to occur in a piece of content.
    - All tagging is useful
        - End user tagging
        - Tagging by librarians
        - Automated tagging by OS and algorithms
    - Content should be tagged throughout its lifecycle, each time the content is handled and used, so that it accrues value or its significance is diminished.

    MS Office: File Properties

    How many people fill this in?

    Organize

    How many people click on this?

    What is social tagging?

    - End user tagging
    - Easy, intuitive tagging interfaces
    - Almost instantaneous feedback
        - Enables people to tag & re-tag content in response to seeing their tags in context with other tags.
    - Emergent categories
        - Resembles open card sort process in which patterns emerge rather than validating categories using closed card sorts.

    Social tagging innovators

    - flickr founders
        - Caterina Fake
        - Stewart Butterfield
    - del.icio.us founder
        - Joshua Schachter
    - del.icio.us & flickr are now both part of Yahoo!
    - As of April 2006 flickr had 130 million photos posted by 3 million registered users.

    Four tagging rules for end users

    Rule | Description
    Use specific terms | Apply the most specific terms when tagging content. But do not tag every possible topic, just the ones that are most important or best characterize the content as a whole.
    Use multiple terms | Use as many terms as necessary to describe overall What the content is about & Why it is important. Do not over-tag.
    Use appropriate terms | Only fill in the facets & values that make sense. Not all facets apply to all content.
    Consider how content will be used | Anticipate how the content will be searched for in the future, & how to make it easy to find it. Remember that search engines can only operate on explicit information.

    Agenda

    - Content Tagging
    - Tagging Interface

    Requirements for a tagging interface

    - Automated form fill-in (automatically fills in known data)
    - Tagging precedents (see tags already assigned by others)
    - Controlled vocabularies, e.g., with pull-down list
    - Multi-valued tags
    - Geo-tagging
    - Group tagging
    - Clean-up tag tools, e.g., alpha list
    - Batch editing
    - Share/Don't share (Public/Private)
    - Identified owner (who can be emailed)
    - Almost immediate feedback, e.g., tag cloud

    Form fill-in: Automatically filled-in known data

    Form fill-in: Automatically filled-in known data

    Manual form fill-in w/ checkboxes, pull-down lists, etc.

    Auto keyword & summarization

    Form fill-in: Automatically filled-in known data

    Auto-categorization

    Parse & lookup (recognize names)

    Rules & pattern matching

    Tagging precedents: See tags assigned by others

    Multi-valued group tagging

    Group geo-tagging

    Clean up tag tools: Alpha list

    Batch edit

    Share or don't share tagging

    Bulk tagging

    - ID collection of related content items by pattern or context
    - Then, apply same attributes to all content items

    Tag a folder

    - Drag & drop content items into folder
    - Then, content items inherit properties of folder
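
    Folder-based tagging amounts to copying the folder's facet values onto each item as it arrives. The sketch below is one way to model that. The `Item` and `Folder` classes and the sample values are hypothetical, and merging with an item's existing tags (rather than overwriting them) is a design assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    tags: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Folder:
    name: str
    tags: dict[str, set[str]]
    items: list[Item] = field(default_factory=list)

    def add(self, item: Item) -> None:
        """Drop an item into the folder; it inherits the folder's facet values."""
        for facet, values in self.tags.items():
            # Merge rather than overwrite, so tags the item already has are kept.
            item.tags.setdefault(facet, set()).update(values)
        self.items.append(item)


folder = Folder("Press Releases", {"Content Type": {"News & Event"}, "Audience": {"Media"}})
item = Item("q2-results.doc", {"Audience": {"Investor"}})
folder.add(item)
print(item.tags)
```

    The same pattern covers bulk tagging: select the related items by pattern or context, then apply one set of attributes to all of them.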

    Workflow

    - "Approve & improve" mindset

    [Diagram: workflow with steps labeled Create Content, Add Metadata, Review & Improve (twice), and Publish.]

    Interactive rewards

    - Almost instantaneous exposure of tags in simple user interfaces on the web provides positive reinforcement for user tagging that simply did not exist before.
    - For example,
        - Most popular
        - Tag clouds
        - Alerts
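
    A tag cloud is simple to produce once tags are counted, which is part of why the feedback can be nearly instantaneous. The sketch below maps tag frequency to a font size on a log scale. The function name, the size range, and the log scaling are assumptions for illustration.

```python
import math
from collections import Counter


def tag_cloud(tags: list[str], min_size: int = 10, max_size: int = 32) -> dict[str, int]:
    """Map each tag to a font size, scaled by the log of how often it was used."""
    counts = Counter(tags)
    low = math.log(min(counts.values()))
    high = math.log(max(counts.values()))
    spread = (high - low) or 1.0  # avoid dividing by zero when all counts are equal
    return {
        tag: round(min_size + (math.log(n) - low) / spread * (max_size - min_size))
        for tag, n in counts.items()
    }


print(tag_cloud(["taxonomy"] * 8 + ["metadata"]))  # {'taxonomy': 32, 'metadata': 10}
```

    A "most popular" list uses the same counts, sorted in descending order.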

    Most popular

    Another example is "most emailed" from, e.g., the NYTimes.

    Tag cloud

    Alerts

  • 8/6/2019 Busch Slides

    119/126

    119Taxonomy Strategies LLC The business of organized

    • New (content selected by date)

    • Subscriptions (content selected by tags)

    • Interest (content selected by other people)

    • Individual (content selected for you by other people)
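
    The first two alert types reduce to simple filters over tagged content. The sketch below assumes hypothetical item fields (`title`, `published`, `tags`) and covers only "New" and "Subscriptions"; the other two types would need data about other people's selections, which is not modeled here.

```python
from datetime import date


def new_items(items, since):
    """'New' alert: content selected by date."""
    return [item["title"] for item in items if item["published"] >= since]


def subscription_items(items, subscribed_tags):
    """'Subscriptions' alert: content selected by tags."""
    wanted = set(subscribed_tags)
    return [item["title"] for item in items if wanted & set(item["tags"])]


# Hypothetical content items.
items = [
    {"title": "Q1 sales report", "published": date(2007, 4, 2), "tags": ["Sales", "Report"]},
    {"title": "Office move memo", "published": date(2007, 6, 1), "tags": ["Memo"]},
    {"title": "Marketing plan", "published": date(2007, 6, 10), "tags": ["Marketing", "Report"]},
]

recent = new_items(items, since=date(2007, 6, 1))
reports = subscription_items(items, ["Report"])
```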

    Is faceted indexing the future of social tagging?

    Tagging exercise: Blog tagging (a)

    ALA Tech Source. http://www.techsource.ala.org/blog/2007/04/google-buys-oclc-announces-new-products.html

    Tagging exercise: Blog tagging (b)

    HBSP. http://discussionleader.hbsp.com/davenport/2007/04/cause_and_effect_reporting_raw.html#comments

    Tagging exercise: Taxonomy facets (definitions)

    Taxonomy Facets       Descriptions

    Business activity     Use for common business function or activity such as finance, marketing and sales.
    Industry / Product    Use for content that is about or related to an industrial sector or product such as construction equipment.
    Geography             Use for content that is about a region, country or city.
    Organization          Use for named organizations, brands and business entities.
    Person / Role         Use for named people and the roles people have in organizations.
    Content Type          Use for content genres such as letters, memos and reports.
    Audience              Use to indicate the intended audience.
    Topic                 Use for other business and associated topics that the content is about or related to.

    Tagging exercise: Taxonomy facets (values)

    Geography: Africa; Americas; Antarctica; Asia; Europe; Oceania; Global; Historical geography; Oceans & seas; Regions

    Industry / Product: Agriculture; Mining; Utilities; Construction; Manufacturing; Wholesale trade; Retail trade; Transportation & warehousing; Information; Finance & insurance; Real estate; Professional; Management; Administrative support; Education; Health care; Arts, entertainment & recreation; Accommodation & food; Other services; Public administration

    People / Role: Business Leaders; Thought Leaders; Political Leaders; Roles

    Organization / Entity: Business entities; Companies & brands; Government agencies; International NGOs; Organization types

    Content Type: Basic facts & information; Blog; Brochure; Database; E-mail; Letter; Memo; Multimedia; Report; Newsletter; Podcast; Press Release; Research & Analysis; RSS Feed

    Business activity: Accounting; Auditing; Finance; HR management; IT; Marketing; Operations management; Sales

    Audience: Consumer; Employee; Manager; Executive
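
    One way to use facet values like those listed above is to check proposed tags against each facet's controlled vocabulary. The sketch below is illustrative only: `FACET_VALUES` holds a subset of the values from the slide, and the `validate_tags` helper and its return format are assumptions.

```python
# Subset of the facet values from the slide, keyed by facet name.
FACET_VALUES = {
    "Geography": {"Africa", "Americas", "Antarctica", "Asia", "Europe", "Oceania", "Global"},
    "Audience": {"Consumer", "Employee", "Manager", "Executive"},
    "Business activity": {
        "Accounting", "Auditing", "Finance", "HR management",
        "IT", "Marketing", "Operations management", "Sales",
    },
}


def validate_tags(tags):
    """Return (facet, value) pairs that are not in the controlled vocabulary.

    Unknown facets are reported too, since they have no allowed values.
    """
    problems = []
    for facet, values in tags.items():
        allowed = FACET_VALUES.get(facet)
        for value in values:
            if allowed is None or value not in allowed:
                problems.append((facet, value))
    return problems


# Hypothetical tags proposed for one blog post.
proposed = {
    "Geography": ["Asia"],
    "Audience": ["Executive", "Student"],
    "Business activity": ["Marketing"],
}
problems = validate_tags(proposed)
```

    Here "Student" is flagged because it is not among the Audience values; everything else passes.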

    Taxonomy Facets       Tags

    Business activity
    Industry / Product
    Geography
    Organization
    Person / Role
    Content Type
    Audience
    Topic

    Summary

    • There are lessons to be learned from web tagging about how to get good metadata in document and content management applications.

    • Document and content management system tagging must be simple, and it must make it almost instantaneously easier to find relevant work products.

    Questions?

    Joseph A. Busch

    + 415-377-7912

    [email protected]

    http://www.taxonomystrategies.com
