14 Mar 05 1 Exploring Verity K2 through Pilot Applications and Taxonomy Development Gordon Campbell Director, IS Strategic Planning & Innovation

14 Mar 05 1

Exploring Verity K2 through Pilot Applications and Taxonomy Development

Gordon Campbell

Director, IS Strategic Planning & Innovation

14 Mar 05 2

sanofi pasteur: the vaccines business of the sanofi-aventis Group

sanofi-aventis Group
- Formed in 2004 by the merger of Sanofi-Synthélabo and Aventis
- 2004 revenues: 25.4 billion euros
- 100,000 employees
- 3rd largest pharma company in the world
- 1st in Europe

sanofi pasteur
- World leader in vaccines
- 2004 revenues: 1.6 billion euros
- 8,000 employees
- Heritage includes Louis Pasteur (1890s) and other vaccine pioneers (Merieux, Slee)

14 Mar 05 3

Global CIO with Global Functional Heads
- CIOs for N. America and France
- CIO, R&D
- CIO, Industrial Operations
- CIO, Commercial Operations (Sales & Marketing)
- CIO, Business Support (Functions: Finance, HR, etc.)
- Director, Global Infrastructure & Operations
- Director, IS Quality
- Director, IS Strategic Planning & Innovation

Director, IS Strategic Planning & Innovation: responsibilities
- Transversal role: bridging functions & technologies
- Manage the Long Range Planning process
- Manage the Global IS Portfolio
- Verity Champion: formulate the strategy and foster appropriate pilots and applications

sanofi pasteur IS Organization

14 Mar 05 4

Verity Experience at sanofi pasteur

Pre Verity K2 (through 2003)
- Limited applications, primarily intranet

Verity K2 Acquisition
- End of 2003
- Two primary applications targeted:
  - Improve Intranet search results
  - Global Medical Affairs: share common disease / vaccine information

2004 Verity K2: Pilots + Applications
- 4 pilots to explore taxonomies and multi-repository search
- Plus 5 applications
- Developed with two consultancies:
  - Verity Consulting Services, for French pilots & applications
  - Raritan Technologies, Inc., for N. American pilots & applications

14 Mar 05 5

Search 101 – Basic Concepts

Google: a familiar search engine to many
- Easy to use, and results are ranked, often showing the best results near the top of the list (and paid sponsored links on the right).
- Results are ranked based on Google’s proprietary and typically secret algorithms.
- Users often mention Google when describing the type of search they would like to have.

14 Mar 05 6

101 - But there is much more Content to Search than what exists on the open Web

Enterprise generated content is huge …
- Office documents
- eMails
- Database driven web pages
- Not to mention other types of media (voice, video, etc.)

These estimates are from a 2003 study (UC Berkeley)
- With some volumes expected to double in 3 yrs

The increasing dilemma …
- How can I find what I need in a timely fashion?
- Will I be forced to recreate what I can’t find?

Annual Information Volumes (1)

Media Type               Terabytes (2)   Comments
Scholarly publications               6   37,600 titles per year
Searchable Web                     167   Openly accessible sites
Office Documents                 1,397   10.75 billion pages per year
Deep Web                        91,850   DB driven web sites
eMails (originals)             440,606   31 billion emails sent per day
Hard Disk Drives             1,986,000   44 million items per year

(1) Source: How much information 2003?
(2) Terabyte = 1 million million bytes, or approx 50,000 trees made into paper and printed.

14 Mar 05 7

Navigation
- Browsing for information is the most common way to locate content of interest
- Typically, information is organized in a hierarchy of folders
- Taxonomies can provide the structure and logic
  - But content must still be stored in appropriate locations, with meaningful descriptions (file names, abstracts, etc.)
- As complexity increases, finding content by navigation becomes more and more difficult
  - Complexity factors include volume & scope of content, multiple storage repositories, multiple copies of documents, etc.

Search
- Represents the primary alternative to navigation
- Simple text searches are very common in specific content repositories, but they may not produce effective results
- Sophisticated search tools can yield prioritized and comprehensive lists of results, but they require content access, rules and other techniques

101 - Navigation & Search are Complementary

14 Mar 05 8

101 - Requirements of a Good Search Engine:

Access - content must be accessible to the search tools
- First, access must be public or the user must have permission to search the site. (NB: Google cannot search secured or protected web sites.)
- Through a pre-established index of the content produced by a crawler or spider (this is the approach used by Google, producing very fast search results).
- Through bots that scan content at the time of the query. Some workers (bots) make use of local search engines often provided with a set of content.

Search Results - what will produce the best set of results?
- A simple text string, possibly with Boolean operators, may locate only exact matches. Boolean skill may be necessary to enhance results.
- Rules-augmented searches can locate many more items that are missed in a simple search, because they recognize synonyms, associated terms, etc.

Ranking Results - vastly improves the value of the search
- Algorithms are used to score items found by the search and rank-order the results, attempting to place the best matches near the top.
- Ranking scores can take into account many factors, such as where the search term is found (keyword list, title, or only the body of text), how often it appears, proximity to other related key terms, etc.

Bot is common parlance on the Internet for a software program acting as an agent on behalf of a user. Bots interact with network services intended for people, as if they were real people. One typical use of bots is to gather information. The term is derived from the word robot, reflecting the autonomous character in the "virtual robot"-ness of the concept.
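The ranking factors listed on this slide (where the term is found and how often it appears) can be illustrated with a minimal Python sketch. This is not Verity's or Google's algorithm: the field weights, the `Document` class and the function names are assumptions made for illustration only, and proximity scoring is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: a hit in the keyword list counts more than a hit in
# the title, which counts more than a hit in the body text.
FIELD_WEIGHTS = {"keywords": 3.0, "title": 2.0, "body": 1.0}


@dataclass
class Document:
    title: str
    keywords: str
    body: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(doc: Document, term: str) -> float:
    """Sum the weighted occurrence counts of the term over all fields."""
    term = term.lower()
    total = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(getattr(doc, field_name))
        total += weight * tokens.count(term)
    return total


def rank(docs: list[Document], term: str) -> list[tuple[float, Document]]:
    """Return (score, document) pairs for matching documents, best first."""
    scored = [(score(d, term), d) for d in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

Calling `rank(docs, "polio")` on a small list of `Document` objects returns only the documents that contain the term, with those that mention it in the keyword list or title placed ahead of those that mention it only in the body.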

14 Mar 05 9

101 - The Business Value of Good Search Tools

[Diagram labels. Decision flow: Define Business Problem; Identify Key Parameters Impacting Decision; Consider Inputs & Evaluate Alternatives; Make Decision; Business Decision; Other Inputs. Content flow: News, Journals, reSearch, etc.; Classify Content based on Enterprise Rules; Tag Content; Classified Content; Selected & Ranked References. Supporting elements: Create & Maintain Enterprise Taxonomies; sanofi pasteur Rules; Simple Text Search; Parametric Search; Federated or Consolidated Search.]

Business Value = Better Informed Decisions

Taxonomies provide the foundation for vastly improved search results

Key Issues:
• How long to find needed info?
• Quality of results?
• Missing or inaccessible info?

14 Mar 05 10

Verity Pilots & Applications so far …

2004 Pilots
- MeSH* Taxonomy Extension: added depth and granularity on vaccine topics
- Departmental Shared Folder: 2nd taxonomy study
- Process Development: 3rd build to the taxonomy
- Consolidated IS Content Search: combine Verity collections from 3 different sources

Applications
- VaccinePlace.com: public service web site
- Intranet K2 Upgrade: static HTML pages + attachments
- Global Medical Content: internal shared access to common disease & vaccine information
- RPI Newsline: regulatory publications

*MeSH = Medical Subject Heading from the US National Library of Medicine / NIH

14 Mar 05 11

US Vaccine Educational Web Sites

Corpus of Documents
- HTML pages of various vaccine information sites, including:
  - Daptacel.com
  - Influenza.com
  - Meningitisvaccine.com
  - Rabies.com
  - Tetanus.org
  - Travelersvaccines.com
  - VaccineProtection.com

Business Drivers
- Increase consumer access to information on vaccine-preventable diseases
- Consolidate Internet access to several sites focused on vaccine-preventable diseases

Search Approach
- Simple keyword text search

Taxonomy Extensions
- None

14 Mar 05 12

VaccinePlace.com

14 Mar 05 13

Simple Text Search Results help VaccinePlace visitors find information quickly …

14 Mar 05 14

… or visitors can Browse to learn

14 Mar 05 15

But simple text searches are just the beginning – our Model for Improved Search Results includes …

Taxonomies
- Provide the foundation for vastly improved search results
- But public and commercial taxonomies often lack the richness and knowledge available in the enterprise

Our method for developing an enterprise taxonomy included:
- Use an existing taxonomy as a starting point: MeSH*
- + A professional librarian: Hugh McNaught
- + Subject matter experts
- + Verity experts: Raritan Technologies, Inc.
- = Robust Taxonomy + Enterprise Rules

Parametric Search Portal
- Is essential to test the effectiveness of the taxonomy and rules
- Provides enhanced access to the documents in the collection

*MeSH = Medical Subject Heading from the US National Library of Medicine / NIH

14 Mar 05 16

Taxonomy Concepts

Taxonomies: what are they?
- A hierarchical classification of things, or the principles underlying the classification. Almost anything (animate objects, inanimate objects, places, and events) may be classified according to some taxonomic scheme.

Why develop and use taxonomies?
- By developing and applying taxonomies that are specific to the collection(s) of interest, items in the collection(s) can be retrieved faster and more easily. The items retrieved will be more relevant to the query and match it more precisely.

Sources of taxonomies
- MeSH = Medical Subject Heading from the US National Library of Medicine / NIH
- Library of Congress and other public domain sources
- Commercial taxonomies (Factiva, Verity, etc.)
- Internally developed: can enrich public domain / commercial taxonomies with enterprise knowledge

14 Mar 05 17

MeSH* + Reference Manager – 1st Pilot to Launch Development of a sanofi pasteur Taxonomy

Corpus of Documents
- Vaccine related scientific publications
- Abstracts stored in the Reference Manager DB

Business Drivers
- Global Information & Library Sciences’ desire to significantly improve the quality of search results across a broad range of collections
- Recognition that public taxonomies such as MeSH are not as rich in vaccine terms as needed

Search Approach
- Parametric + key word on title, abstract + key words

Taxonomy Extensions
- Vaccine nodes of MeSH* taxonomy
- Products
- Companies
- Geography

*MeSH = Medical Subject Heading

14 Mar 05 18

MeSH nodes Structure

Top Level MeSH D24 Nodes, including Vaccines

14 Mar 05 19

Expanded Vaccine Node: Poliovirus Vaccine Structure

14 Mar 05 20

Verity Intelligent Classifier (VIC) - Provides tools to Enhance the Taxonomy and Create Rules

Taxonomy Pane: to create & modify the users’ navigation structure

Topics Pane: to create & modify the rules (synonyms, concepts, relationships, etc.)

14 Mar 05 21

Poliovirus Vaccine Rules

• This is the set of Rules for the Poliovirus Vaccine

• The Inactivated node is expanded in this example.

• The high level node corresponds with a node in the structure.

• The rules ‘roll up’ to each higher level.

• All nodes contain Terms pertaining to the node, and Products used to treat that Virus

Verity Query Language is the syntax used by VIC to create & modify the rules
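The idea that each node carries its own terms and that rules "roll up" to each higher level can be sketched in Python. This is an illustration of the concept only: it is not Verity Query Language and does not use any VIC API. The `Node` class, the "Oral" node and every term below are assumptions, not taken from the slides, and the substring matching is deliberately naive.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A taxonomy node with its own terms and optional child nodes."""
    name: str
    terms: set[str] = field(default_factory=set)
    children: list["Node"] = field(default_factory=list)

    def rolled_up_terms(self) -> set[str]:
        """Return this node's terms plus the terms of all its descendants."""
        terms = {t.lower() for t in self.terms}
        for child in self.children:
            terms |= child.rolled_up_terms()
        return terms

    def matches(self, text: str) -> bool:
        """True if any rolled-up term occurs in the text (case-insensitive substring)."""
        lowered = text.lower()
        return any(term in lowered for term in self.rolled_up_terms())


# Illustrative tree: the terms and the "Oral" node are assumptions.
inactivated = Node("Inactivated", {"inactivated poliovirus vaccine", "IPV", "Salk vaccine"})
oral = Node("Oral", {"oral poliovirus vaccine", "OPV", "Sabin vaccine"})
polio = Node("Poliovirus Vaccine", {"polio vaccine", "poliovirus vaccine"}, [inactivated, oral])
```

With this tree, a document mentioning only "Salk vaccine" matches both the Inactivated node and the parent Poliovirus Vaccine node, but not the Oral node, which is the roll-up behavior described in the bullets above.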

14 Mar 05 22

MeSH Extension – Product Taxonomy / Flu node

Flu vaccine brand names

14 Mar 05 23

Parametric Search Portal of the Reference Manager DB based on an Expanded MeSH Taxonomy

Company information added to MeSH

Product information added to MeSH

Geography nodes from MeSH

6016 articles on Viruses

14 Mar 05 24

Clicking on a Parameter such as Influenza Vaccine automatically limits results …

5 BCG articles also mentioning Flu Vaccine

Results can then be combined with a text search for more precise selections

Further breakdowns of the specific context for the hits

Titles of the articles meeting the selected parameters are listed here.

455 articles reference the Americas
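The portal behavior described on this slide (selecting a parameter narrows the result set, the remaining hits are broken down by other parameters, and a text search can refine further) can be sketched as follows. The records, tag names and function names are hypothetical; in the pilot the tags would come from the taxonomy rules rather than being typed by hand.

```python
from collections import Counter

# Hypothetical records: each article carries tags assigned by taxonomy rules.
ARTICLES = [
    {"title": "Influenza vaccine uptake in Canada",
     "tags": {"product": {"Influenza Vaccine"}, "geography": {"Americas"}}},
    {"title": "BCG and influenza vaccine co-administration",
     "tags": {"product": {"Influenza Vaccine", "BCG Vaccine"}, "geography": {"Europe"}}},
    {"title": "Tetanus boosters for travelers",
     "tags": {"product": {"Tetanus Vaccine"}, "geography": {"Americas"}}},
]


def select(articles, parameter, value, text=None):
    """Keep articles tagged with value under parameter; optionally also
    require the text to appear in the title (case-insensitive)."""
    hits = [a for a in articles if value in a["tags"].get(parameter, set())]
    if text is not None:
        hits = [a for a in hits if text.lower() in a["title"].lower()]
    return hits


def breakdown(articles, parameter):
    """Count how many of the given articles carry each value of a parameter."""
    counts = Counter()
    for article in articles:
        counts.update(article["tags"].get(parameter, set()))
    return counts
```

For example, `select(ARTICLES, "product", "Influenza Vaccine")` limits the set to the flu vaccine articles, and `breakdown(...)` on that subset gives the per-parameter counts a portal would display next to each remaining parameter value.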

14 Mar 05 25

Verity Search Capabilities Leverage the Rules Incorporated in the Taxonomies

Search Parameters
- Company: sanofi-aventis, GSK, Wyeth, etc.
- Country: N. Am., Europe, China, etc.
- Franchise: Flu, Pediatric, Traveler, Menactra
- Disease: Flu, Tetanus, Polio, etc.

Rules reflect:
- Synonyms
- Concepts
- Relationships

[Diagram labels: Source 1, Source 2, Source n; VIC; sanofi pasteur Taxonomies; sanofi pasteur Rules; Tagged Verity Collections]

- Text / Keyword / Parametric Search
- Federated / Consolidated Multi-source Search
- Focused Reference Set: includes only articles matching the selected parameters
- Targeted Ranked Results: text / key words search across multiple sources, with results consolidated in one view

14 Mar 05 26

Department Shared Folder Application

Corpus of Documents
- Internal documents stored on a shared network drive
- Contents included a variety of Microsoft Office documents and Adobe Acrobat files

Business Drivers
- Need to locate relevant documents without a detailed knowledge of the folder structure & filing system

Search Approach
- Parametric + key word on title, abstract, key words and full text

Taxonomy Extensions
- Franchises: new nodes & rules (ex: Travel Vaccines)
- Companies: building on version 1 MeSH

14 Mar 05 27

Intranet K2 Enhancement

Corpus of Documents
- All static HTML pages on the sanofi pasteur intranet
- Attachments
- But not yet contents of applications accessed through the Intranet

Business Drivers
- Need to greatly improve search results on intranet

Search Approach
- Key word on title + full text
- Results ranked according to standard Verity algorithms
- Not yet available:
  - Benefits from applying taxonomy rules, synonyms, etc.
  - Benefits from federated searches of applications accessed via Intranet

Taxonomy Extensions
- None yet
- Approach for subject areas such as IS, HR, Purchasing, etc.:
  - We could acquire commercial or public domain taxonomies
  - Or, we could develop something internally, similar to what was done for vaccines

14 Mar 05 28

Intranet – K2 Search Results

14 Mar 05 29

IS Content – Consolidated Search Portal

Corpus of Documents
- IS Intranet sites
- IS shared folders (network drives)
- IS Exchange Public folders
- eRooms: not yet included in this pilot

Business Drivers
- Pilot techniques to access content stored in various online repositories

Search Approach
- Key word on content

Taxonomy Extensions
- None yet
- Exploring public domain & commercial options

14 Mar 05 30

Search Access Components – a Summary

Each component is listed with its Description, Tools, Capabilities and Experience To-date.

Verity Collection
- Description: Full text index of a corpus of documents
- Tools: Verity K2
- Capabilities: Std Verity ranked list of results; required for taxonomy appl.
- Experience To-date: Intranet K2 upgrade

Taxonomy
- Description: Hierarchy structure; rules for searching & ranking results
- Tools: VIC
- Capabilities: Structured browsing; synonyms, rules
- Experience To-date: 3 pilots (Ref Man DB, Shared Drive, Process Development)

Gateways
- Description: Access to proprietary repository formats
- Tools: Verity stds
- Capabilities: Access & index contents, while respecting security
- Experience To-date: Documentum (Global Content application)

Workers & Extractors
- Description: Agent using repository’s native search engine
- Tools: Custom Developmt
- Capabilities: Return results; create Verity collection
- Experience To-date: eRoom planned 05

Basic Search Portal
- Description: Simple text search of keywords / contents
- Tools: Verity or Custom Developmt
- Capabilities: Unranked results; possibly limited by a qualifier (date, author)
- Experience To-date: VaccinePlace.com; other internal apps

Parametric Search Portal
- Description: Predefined parameters (ex: product, co. name)
- Tools: Verity or Custom
- Capabilities: Reduced set based on parameters
- Experience To-date: Ref Man DB (MeSH); Shared Dept Drive; Global Content

Federated Search Portal
- Description: Multiple sources, using native search engines
- Tools: Workers
- Capabilities: Combine results from multiple sources
- Experience To-date: None yet

Consolidated Search Portal
- Description: Multiple sources, using extract
- Tools: Custom Developmt
- Capabilities: Combined results + taxonomy ranking
- Experience To-date: IS Pilot (in progress)
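The difference between the federated and consolidated portals summarized above can be sketched in Python. This is a simplified illustration under stated assumptions: each "native search engine" is stood in for by a substring-matching function, and the occurrence count used for ranking is only a placeholder for taxonomy-based ranking. None of the names below are Verity components.

```python
from typing import Callable

# A source is modeled as a function from a query string to a list of titles.
Source = Callable[[str], list[str]]


def make_source(documents: list[str]) -> Source:
    """Build a stand-in 'native search engine' that does substring matching."""
    def search(query: str) -> list[str]:
        return [d for d in documents if query.lower() in d.lower()]
    return search


def federated_search(sources: dict[str, Source], query: str) -> list[tuple[str, str]]:
    """Ask every source's own search and concatenate (source, hit) pairs."""
    results = []
    for name, search in sources.items():
        results.extend((name, hit) for hit in search(query))
    return results


def consolidated_search(extracts: dict[str, list[str]], query: str) -> list[tuple[str, str]]:
    """Pool extracted documents into one collection, then rank all hits
    together by how often the query occurs (placeholder ranking)."""
    pooled = [(name, doc) for name, docs in extracts.items() for doc in docs]
    q = query.lower()
    hits = [(name, doc) for name, doc in pooled if q in doc.lower()]
    return sorted(hits, key=lambda pair: pair[1].lower().count(q), reverse=True)
```

In the federated case the results arrive grouped by source, in whatever order each source returns them; in the consolidated case all extracted content is ranked on one scale, which is why the summary associates taxonomy ranking with the consolidated portal.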

14 Mar 05 31

Extend the sanofi pasteur Enterprise Taxonomy
- Build other non-vaccine nodes (ex: IS, HR, Legal, IO, etc.)
- Apply to other applications, such as the Intranet sites

Add Gateways
- eRoom: worker and extractor to create Verity Collections

Data Discovery: a new application of Verity technology
- Goal: review nature of internal content existing today on network drives, Public Folders, eRooms, etc.
- Identify candidates for archiving / destruction
- Isolate content worth including in Verity collections

R&D Consolidated Search Portal
- Explore needs and develop a business case
- Across a broad array of internal and external sources

2005 Verity Projects – Applying What we have Learned and Extending our Learning

14 Mar 05 32

Questions?

[email protected]

(570) 839-4277

Swiftwater, PA 18370