Introducing ODIE NCBO Seminar Series February 18, 2009

Training Program Update - bioontology.org...NCIT as well as clinically relevant OBO ontologies (e.g. RadLex, FMA) 9 Cancer domains (including hematologic oncology) Progress •ODIE

Example

IE (information extraction) using ontologies

Diagnosis: Malignant Melanoma
Breslow Depth: 0.72 mm
Lateral Margin: Positive
Regression: Probable
Ulceration: Negative
TIL: Focally Brisk
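The slide shows the target of IE: attribute-value pairs filled in from a report. Below is a minimal sketch of attribute value extraction (AVE) with regular expressions, assuming a hypothetical report fragment and a hand-written pattern table (`ATTRIBUTE_PATTERNS`, hypothetical) standing in for ontology concepts. It is not how ODIE implements extraction.

```python
import re

# Hypothetical pattern table standing in for ontology concepts: each attribute
# maps to a regular expression whose first group captures the value.
ATTRIBUTE_PATTERNS = {
    "Breslow Depth": r"breslow depth\s*(?:is|of|:)?\s*(\d+(?:\.\d+)?\s*mm)",
    "Lateral Margin": r"lateral margin\s*(?:is|:)?\s*(positive|negative)",
    "Ulceration": r"ulceration\s*(?:is|:)?\s*(present|absent|positive|negative)",
}

def extract_attribute_values(report: str) -> dict[str, str]:
    """Return the first value found for each attribute in a free-text report."""
    found = {}
    for attribute, pattern in ATTRIBUTE_PATTERNS.items():
        match = re.search(pattern, report, flags=re.IGNORECASE)
        if match:
            found[attribute] = match.group(1)
    return found

# Hypothetical report fragment (not taken from the ODIE document sets).
report = ("Malignant melanoma, Breslow depth 0.72 mm. "
          "Lateral margin positive. Ulceration: negative.")
print(extract_attribute_values(report))
# {'Breslow Depth': '0.72 mm', 'Lateral Margin': 'positive', 'Ulceration': 'negative'}
```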

OE (ontology enrichment) using documents

punch biopsy, junctional component, pagetoid spread, dermal melanocytes, Breslow depth, lymphocytic infiltrates, regression, microscopic satellites, vascular invasion, tumor infiltrating lymphocytes, Spitz nevus, epithelioid nevus
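Candidate terms like these can be surfaced by counting recurring phrases in the documents and discarding those the ontology already contains. The sketch below is a crude illustration of concept discovery under stated assumptions: `candidate_terms`, the stopword list, the example sentences and the flat list of ontology terms are all hypothetical. Real systems would use noun-phrase chunking and statistical filtering rather than raw n-gram counts.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would rely on POS tags or chunking.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "no", "shows", "seen", "with"}

def candidate_terms(documents, ontology_terms, min_count=2):
    """Rank 2- and 3-word phrases that recur in the documents but are
    absent from the ontology's term list."""
    known = {term.lower() for term in ontology_terms}
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for n in (2, 3):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip phrases that begin or end with a function word.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # most_common() keeps first-seen order among equal counts.
    return [(term, count) for term, count in counts.most_common()
            if count >= min_count and term not in known]

documents = [
    "Punch biopsy shows pagetoid spread of atypical melanocytes.",
    "No pagetoid spread is seen in the punch biopsy.",
    "Breslow depth is 0.72 mm.",
]
print(candidate_terms(documents, ["Breslow depth", "biopsy"]))
# [('punch biopsy', 2), ('pagetoid spread', 2)]
```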

Two Tasks ~ One Problem

Ontology Enrichment (Specific Aims 2, 3, 4): uses text as a source of concepts and relationships to enrich and validate the ontology

Information Extraction (Specific Aims 1, 3, 5): uses the ontology as a source of concepts and relationships to extract information from text

Specific Aims

Specific Aim 1: Develop and evaluate methods for information extraction (IE) tasks using existing OBO ontologies, including:

Named Entity Recognition (NER)

Co‐reference Resolution (CR)

Discourse Reasoning (DR)

Attribute Value Extraction (AVE)

Specific Aim 2: Develop and evaluate general methods for clinical‐text mining to assist in ontology development, including:

Concept Discovery (CD)

Concept Clustering (CC)

Taxonomic Positioning (TP)

Specific Aim 3: Develop reusable software for performing information extraction and ontology development leveraging existing NCBO tools and compatible with NCBO architecture.

Specific Aim 4: Enhance the National Cancer Institute Thesaurus (NCIT) ontology using the ODIE toolkit.

Specific Aim 5: Test the ability of the resulting software and ontologies to address important translational research questions in hematologic cancers.

Ontology Enrichment

• Machine assisted

  ‐ Extraction
  ‐ Filtering and organization
  ‐ Visualization
  ‐ Suggestions

• Human decision‐maker (developer, curator)

• Feedback and improvement of OE
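The division of labor above (the machine extracts and suggests, a human developer or curator decides, and decisions feed back into OE) can be sketched as a review loop. This is a minimal illustration, assuming a hypothetical `review_suggestions` helper and a `decide` callback standing in for the curator; it is not ODIE's interface.

```python
from typing import Callable, Iterable

def review_suggestions(suggestions: Iterable[str], ontology: set[str],
                       rejected: set[str], decide: Callable[[str], bool]) -> list[str]:
    """Machine proposes, human decides (hypothetical helper).

    Each new suggestion goes to the curator's `decide` callback. Accepted terms
    are added to the ontology; rejected ones are remembered so they are not
    proposed again, a minimal form of feedback to the enrichment step.
    """
    accepted = []
    for term in suggestions:
        if term in ontology or term in rejected:
            continue  # already known or already rejected: do not ask again
        if decide(term):
            ontology.add(term)
            accepted.append(term)
        else:
            rejected.add(term)
    return accepted

def curator(term: str) -> bool:
    """Stand-in for a human decision: reject fragments starting with 'the'."""
    return not term.startswith("the ")

ontology, rejected = {"breslow depth"}, set()
print(review_suggestions(["punch biopsy", "breslow depth", "the punch"],
                         ontology, rejected, curator))  # ['punch biopsy']
print(review_suggestions(["the punch"], ontology, rejected, lambda t: True))  # []
```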

Project Organization

Concept Discovery: Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell
Study and compare methods for ontology enrichment; design methods for evaluation

Coreference Resolution: Wendy Chapman, Guergana Savova, Melissa Castine
Develop annotation scheme; create Reference Standard; consider and test existing algorithms; design, implement & test new algorithms

ODIE 0.5: Rebecca Crowley, Kevin Mitchell, Girish Chavan, Eugene Tseytlin
Develop and implement architecture and UI; create framework for using results of research; implement work of research groups

Domain

• Will attempt to develop general tools whenever possible

• Priorities for evaluation of components in:

  ‐ Radiology and pathology reports
  ‐ NCIT as well as clinically relevant OBO ontologies (e.g. RadLex, FMA)
  ‐ Cancer domains (including hematologic oncology)

Progress

• ODIE 0.5 pre‐release on NCBO SourceForge

• Annotation software and document sets

• Res Proj #1: LSP annotation project

• Res Proj #2: Coreference resolution annotation

• Starting Res Proj #3: Discourse Reasoning

ODIE Software

• Toolkit for developers of NLP applications and ontologies

• Pre‐released on NCBO SourceForge as ODIE 0.5

• Current release focuses on NER (named entity recognition) and CD (concept discovery)

• Support interaction and experimentation

• Package systems at the conclusion of working with ODIE

• Foster the cycle of enrichment and extraction needed to advance development of NLP systems

• Ontology enrichment as opposed to de novo development

• Human‐machine collaboration as opposed to fully automated learning

ODIE Download/Info

ODIE Installer: http://caties.cabig.upmc.edu/ODIE/odieinstaller.exe

GForge Site: https://bmir-gforge.stanford.edu/gf/project/odie/

User Forums: https://bmir-gforge.stanford.edu/gf/project/odie/forum/

ODIE on NCBO Tools Page: http://bioontology.org/tools/ODIE.html

Users/Workflow

ODIE is intended for:

• users who want to use NCBO ontologies to perform various NLP tasks (possibly adding concepts locally to achieve sufficient performance)

• users who want to enrich ontologies using concepts derived from documents (very early in the process of ontology development)

Plans for ODIE 1.0

Ability to import additional ontologies from BioPortal or from OWL files

Ability to export proposals/enriched ontologies

Ability to add and configure new processing resources (UIMA or GATE based)

Ability to build processing pipelines using processing resources

Will come out of the box with a processing pipeline and processing resources for NER, CD and COREF
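A minimal sketch of what chaining "processing resources" into a "processing pipeline" means, in plain Python. It does not use the UIMA or GATE APIs (both are Java frameworks); `Document`, `dictionary_ner` and `run_pipeline` are hypothetical names chosen for illustration, and the naive dictionary lookup merely stands in for a real NER component.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Document:
    text: str
    # Stand-off annotations as (start, end, label) character offsets.
    annotations: list[tuple[int, int, str]] = field(default_factory=list)

# A processing resource reads a document and adds annotations to it.
ProcessingResource = Callable[[Document], None]

def dictionary_ner(lexicon: dict[str, str]) -> ProcessingResource:
    """Hypothetical NER resource: label case-insensitive exact matches of lexicon terms."""
    def run(doc: Document) -> None:
        text = doc.text.lower()
        for term, label in lexicon.items():
            needle = term.lower()
            start = text.find(needle)
            while start != -1:
                doc.annotations.append((start, start + len(needle), label))
                start = text.find(needle, start + 1)
    return run

def run_pipeline(doc: Document, resources: list[ProcessingResource]) -> Document:
    """Apply the processing resources in order; later ones can use earlier annotations."""
    for resource in resources:
        resource(doc)
    return doc

doc = run_pipeline(
    Document("Malignant melanoma, Breslow depth 0.72 mm."),
    [dictionary_ner({"malignant melanoma": "Diagnosis", "Breslow depth": "Attribute"})],
)
print(doc.annotations)  # [(0, 18, 'Diagnosis'), (20, 33, 'Attribute')]
```

Modeling each resource as a function over a shared document keeps resources independent, so a CD or coreference step could be appended to the list without changing the NER step.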

Research Project 1: Ontology Enrichment

Nearly completed survey of lexical, statistical and hybrid methods for ontology enrichment

Methodology to study “utility” of various approaches (Liu, PhD thesis in progress)

First project underway involves the simplest of the methods to be studied: lexico-syntactic patterns (LSP), i.e. regular expressions over part-of-speech (POS) tags

Concept Discovery: Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell

Study and compare methods for ontology enrichment; design methods for evaluation
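To make "regular expressions over POS" concrete, here is a minimal sketch that maps each token's Penn Treebank tag to one character and matches the pattern "NP such as NP" with an ordinary regex over that character string. The tags are hand-assigned (no tagger is run), NP is approximated as adjectives followed by nouns, and `tag_code` and `such_as_pairs` are hypothetical names; this is an illustration of the idea, not the project's pattern set.

```python
import re

def tag_code(word: str, tag: str) -> str:
    """Map a (word, Penn Treebank tag) pair to a one-character code."""
    if word.lower() == "such":
        return "s"   # cue word
    if word.lower() == "as":
        return "a"   # cue word
    if tag.startswith("JJ"):
        return "J"   # adjective
    if tag.startswith("NN"):
        return "N"   # noun
    if tag == ",":
        return ","
    return "O"       # anything else

# "NP such as NP", with NP approximated as adjectives followed by nouns.
SUCH_AS = re.compile(r"(J*N+),?sa(J*N+)")

def such_as_pairs(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (hypernym, hyponym) candidates for every "NP such as NP" match."""
    codes = "".join(tag_code(word, tag) for word, tag in tagged)
    pairs = []
    for m in SUCH_AS.finditer(codes):
        # One code per token, so match offsets index straight into `tagged`.
        hypernym = " ".join(word for word, _ in tagged[m.start(1):m.end(1)])
        hyponym = " ".join(word for word, _ in tagged[m.start(2):m.end(2)])
        pairs.append((hypernym, hyponym))
    return pairs

# Hand-assigned tags for one pathology sentence used in these slides.
sentence = [("COMPATIBLE", "JJ"), ("WITH", "IN"), ("BENIGN", "JJ"),
            ("ECCRINE", "JJ"), ("NEOPLASIA", "NN"), (",", ","), ("SUCH", "JJ"),
            ("AS", "IN"), ("NODULAR", "JJ"), ("HIDROADENOMA", "NN")]
print(such_as_pairs(sentence))
# [('BENIGN ECCRINE NEOPLASIA', 'NODULAR HIDROADENOMA')]
```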

LSP Patterns

The presence of certain “lexico‐syntactic patterns” can indicate a particular semantic relationship between two nouns

Example:

DIFFERENTIAL DIAGNOSIS INCLUDES, BUT IS NOT LIMITED TO, SPINDLE CELL NEOPLASM OF PERINEURIAL ORIGIN (SUCH AS SCHWANNOMA) AND SPINDLE CELL MALIGNANT MELANOMA

“such as” indicates a hyponym relationship between two noun phrases (here, SCHWANNOMA is a hyponym of SPINDLE CELL NEOPLASM OF PERINEURIAL ORIGIN)

Presenter notes: Here is some background information about the LSP method.
Technique 1 - LSP

PRURIGO NODULE  (aka LICHEN SIMPLEX CHRONICUS)

COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
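The two examples pair a cue phrase with the relationship it usually signals. A minimal surface-level sketch follows, assuming a hypothetical cue-to-relation table (“such as” and “including” as hyponym cues, “aka” as a synonym cue) and approximating each phrase by the text between the cue and the nearest punctuation mark; `CUE_RELATIONS` and `split_on_cue` are illustrative names only.

```python
import re

# Assumed cue-to-relation mapping: "such as"/"including" usually introduce
# hyponyms, "aka" ("also known as") a synonym. Real output is noisier.
CUE_RELATIONS = {"such as": "hyponym", "including": "hyponym", "aka": "synonym"}

def split_on_cue(sentence: str):
    """Return (left phrase, cue, right phrase, relation) for the first cue
    found, or None. Phrases are approximated by the text between the cue and
    the nearest punctuation mark or parenthesis."""
    for cue, relation in CUE_RELATIONS.items():
        match = re.search(rf"\b{cue}\b", sentence, flags=re.IGNORECASE)
        if match is None:
            continue
        before = re.split(r"[,;:()]", sentence[:match.start()])
        after = re.split(r"[,;:()]", sentence[match.end():])
        # Closest non-empty fragment on each side of the cue.
        left = next((p.strip() for p in reversed(before) if p.strip()), "")
        right = next((p.strip() for p in after if p.strip()), "")
        return left, cue, right, relation
    return None

print(split_on_cue("PRURIGO NODULE  (aka LICHEN SIMPLEX CHRONICUS)"))
# ('PRURIGO NODULE', 'aka', 'LICHEN SIMPLEX CHRONICUS', 'synonym')
print(split_on_cue("COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA"))
# ('COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA', 'such as', 'NODULAR HIDROADENOMA', 'hyponym')
```

The left phrase of the second example keeps “COMPATIBLE WITH”, which illustrates why phrase boundaries need noun-phrase chunking or human annotation rather than punctuation alone.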

LSP distribution results

Pathology corpus: 852764 reports, 16157608 sentences
Radiology corpus: 209997 reports, 4057228 sentences

Pattern | Pathology: # sentences | Pathology: unique # of sentences | Radiology: # sentences | Radiology: unique # of sentences
NP especially NP | 14 | 11 | 19 | 10
NP also called NP | 48 | 37 | 29 | 22
NP such as NP | 98 | 95 | 906 | 251
NP's NP | 202 | 45 | 5 | 2
NP in NP | 4851 | 1689 | 106 | 47
NP aka NP | 5396 | 460 | 2 | 2
NP including NP | 6291 | 4952 | 1403 | 747
NP other NP | 6940 | 2251 | 10622 | 1407
NP like NP | 7649 | 2267 | 410 | 235
NP, NP | 8211 | 5351 | 7385 | 3889
NP of NP | 14275 | 4032 | 2906 | 607
NP in the NP | 47124 | 23178 | 64044 | 29285
NP is NP | 92374 | 25024 | 7349 | 2896
NP of the NP | 246798 | 70735 | 173016 | 54895

Number of sentences containing lexico-syntactic patterns
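The table reports, per pattern, both the number of matching sentences and the number of unique ones. A minimal sketch of that count, assuming "unique" means distinct sentence strings (clinical reports often repeat templated sentences verbatim) and using hypothetical surface cues in place of the full "NP cue NP" patterns; the example sentences are also hypothetical.

```python
import re

def pattern_sentence_counts(sentences, patterns):
    """Map pattern name -> (# matching sentences, # distinct matching sentences)."""
    counts = {}
    for name, regex in patterns.items():
        compiled = re.compile(regex, re.IGNORECASE)
        matched = [s for s in sentences if compiled.search(s)]
        counts[name] = (len(matched), len(set(matched)))
    return counts

# Hypothetical surface cues standing in for the full "NP cue NP" patterns.
patterns = {"NP aka NP": r"\baka\b", "NP such as NP": r"\bsuch as\b", "NP of NP": r"\bof\b"}
sentences = [
    "NO EVIDENCE OF MALIGNANCY.",
    "NO EVIDENCE OF MALIGNANCY.",
    "PRURIGO NODULE (AKA LICHEN SIMPLEX CHRONICUS).",
    "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA.",
]
print(pattern_sentence_counts(sentences, patterns))
# {'NP aka NP': (1, 1), 'NP such as NP': (1, 1), 'NP of NP': (2, 1)}
```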

Step 1 - Domain Expert Annotation

• Annotation tasks:
  1. Identify meaningful medical phrases (MMP) that can stand alone, before and after the LSP.
  2. The phrases before and after the LSP have to be related.

• Calculate: total # of MMP, # of MMP per LSP

Annotated examples:

PRURIGO NODULE  (aka LICHEN SIMPLEX CHRONICUS)

COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA

Term1 (before LSP) | Term2 (after LSP)
PRURIGO NODULE | LICHEN SIMPLEX CHRONICUS
BENIGN ECCRINE NEOPLASIA | NODULAR HIDROADENOMA
… | …
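The "# of MMP per LSP" measure divides the MMPs annotated on one side of the pattern by the number of LSP instances, so it can exceed 100% when an instance yields several phrases. A minimal sketch, with `mmp_per_lsp` and the per-instance counts as hypothetical inputs; the same arithmetic gives the 108% reported for “such as” in the pathology reports (27 phrases over 25 instances).

```python
def mmp_per_lsp(phrase_counts: list[int]) -> tuple[int, float]:
    """phrase_counts[i] = number of MMPs annotated on one side (before or
    after) of LSP instance i. Returns (total MMPs, MMPs per LSP in percent).
    A value above 100% means some instances yielded several stand-alone
    phrases, e.g. a list of examples following "such as"."""
    if not phrase_counts:
        return 0, 0.0
    total = sum(phrase_counts)
    return total, 100.0 * total / len(phrase_counts)

# Hypothetical annotation of 25 LSP instances: 23 with one phrase, 2 with two.
print(mmp_per_lsp([1] * 23 + [2] * 2))  # (27, 108.0)
```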

Step 2 - Curator Judgment

For each term:

1. Is the concept in the ontology?

2. If not, should it be added into the ontology?

3. If not, what is the reason?

For each pair of terms:

1. What is the relationship between them?

2. Does this relationship exist in the ontology?

3. If not, should it be added into the ontology?

4. If not, what is the reason?

Term1 | Term2
PRURIGO NODULE | LICHEN SIMPLEX CHRONICUS
BENIGN ECCRINE NEOPLASIA | NODULAR HIDROADENOMA
… | …

New Concept and Relationship Suggestion Rates

New Concept and Relationship Acceptance Rates

First experiment result – concept enrichment

MMP = meaningful medical phrases

Radiology reports
LSP | Preceding the LSP: total # of MMP | Preceding the LSP: # of MMP / # of LSP | Following the LSP: total # of MMP | Following the LSP: # of MMP / # of LSP
such as | 17 | 100% | 31 | 124%
including | 27 | 159% | 66 | 264%

Pathology reports (# of LSP = 25)
LSP | Preceding the LSP: total # of MMP | Preceding the LSP: # of MMP / # of LSP | Following the LSP: total # of MMP | Following the LSP: # of MMP / # of LSP
such as | 27 | 108% | 55 | 220%
including | 24 | 96% | 35 | 233%
aka | 25 | 100% | 28 | 112%

First experiment result – concept enrichment (NCIT)

[Bar chart: suggestion rate and acceptance rate (scale 0% to 50%) for the LSPs “such as”, “including” and “aka”]

First experiment – extracted relationships

[Stacked bar chart: share of relationship types (hyponym, meronym, synonym, other) among pairs extracted with “such as”, “including” and “aka”; scale 0% to 100%; data labels: 36%, 11%, 15%, 11%, 67%, 64%, 75%, 19%]

First experiment – extracted relationships

[Bar chart: percentage (0 to 100) per LSP (“such as”, “including”, “aka”) for two series: “Hyponym relationship is not in the NCIT” and “Hyponym relationship should be added into the NCIT”]

First experiment – Concept Enrichment for RadLex

Position | # of Terms | Not in RadLex | In RadLex | Blank | Should be added to RadLex | Suggestion rate | Acceptance rate
Preceding LSP | 29 | 11 | 16 | 2 | 10 | 38% | 91%
Following LSP | 68 | 24 | 41 | 3 | 10 | 35% | 42%
Total | 97 | 35 | 57 | 5 | 20 | 36% | 57%
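The rates in this table are consistent with suggestion rate = (not in RadLex) / (# of terms) and acceptance rate = (should be added) / (not in RadLex). The definitions are inferred from the table's own numbers rather than stated on the slides; `enrichment_rates` is a hypothetical helper that reproduces every row.

```python
def enrichment_rates(total_terms: int, not_in_ontology: int,
                     should_be_added: int) -> tuple[int, int]:
    """Return (suggestion rate %, acceptance rate %), rounded to whole percent.

    suggestion rate = terms not yet in the ontology / all annotated terms
    acceptance rate = terms the curator would add / terms not yet in the ontology
    """
    suggestion = 100 * not_in_ontology / total_terms
    acceptance = 100 * should_be_added / not_in_ontology
    return round(suggestion), round(acceptance)

# Rows from the RadLex table: (# of terms, not in RadLex, should be added).
for label, row in [("Preceding LSP", (29, 11, 10)),
                   ("Following LSP", (68, 24, 10)),
                   ("Total", (97, 35, 20))]:
    print(label, enrichment_rates(*row))
# Preceding LSP (38, 91)
# Following LSP (35, 42)
# Total (36, 57)
```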

Research Project 2: Coreference Resolution

Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)

Examples of types of anaphoric relations:

• Identity (or coreference)
• Set/subset
• Part/whole

Anaphora resolution is a computational technique for the discovery of anaphoric relations

Coreference Resolution: Wendy Chapman, Guergana Savova, Melissa Castine

Develop annotation scheme; create Reference Standard; consider and test existing algorithms; design, implement & test new algorithms
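To illustrate what an anaphora resolution algorithm computes, here is a deliberately naive identity-coreference baseline: each definite mention ("the ...") is linked to the closest earlier mention with the same head noun, taking the last word as the head. `resolve_definite_nps` and the mentions are hypothetical; this is not one of the algorithms under test in the project.

```python
def resolve_definite_nps(mentions: list[str]) -> list[tuple[int, int]]:
    """Naive identity-coreference baseline (hypothetical, for illustration).

    Links each definite mention ("the ...") to the closest earlier mention
    with the same head noun, taking the last word as the head. Returns
    (anaphor index, antecedent index) pairs.
    """
    links = []
    for i, mention in enumerate(mentions):
        words = mention.lower().split()
        if not words or words[0] != "the":
            continue
        head = words[-1]
        for j in range(i - 1, -1, -1):
            earlier = mentions[j].lower().split()
            if earlier and earlier[-1] == head:
                links.append((i, j))
                break
    return links

# Hypothetical mentions, in textual order, from one radiology report.
mentions = ["a 2 cm mass", "the left breast", "the mass", "the lesion"]
print(resolve_definite_nps(mentions))  # [(2, 0)]
```

“the lesion” stays unresolved even though a reader would likely link it to the mass; catching such cases would need semantic knowledge, for example from an ontology.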

Types of anaphoric relations:

• Identity (or coreference)
• Set/subset
• Part/whole
• Other

Progress

Completed and ongoing:
• Annotation schema development
• Guidelines
• Training of annotators: 4 training sessions
  ‐ Inter-annotator agreement (IAA) after session 1: in the 40s
  ‐ IAA after session 3: in the 60s

Planned:
• Complete Reference Standard (RS)
• Algorithm testing and further development
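The slides do not say which agreement metric was used. One common choice for coreference annotation is the F-measure between two annotators' link sets, which is symmetric and reduces to 2|A ∩ B| / (|A| + |B|). The sketch below assumes that metric; `link_agreement` and the example link sets are hypothetical.

```python
def link_agreement(links_a, links_b) -> float:
    """F-measure between two annotators' coreference links.

    Links are (anaphor, antecedent) pairs. With one annotator as reference,
    precision and recall swap roles, so the F-measure reduces to the
    symmetric 2|A & B| / (|A| + |B|).
    """
    a, b = set(links_a), set(links_b)
    if not a and not b:
        return 1.0  # both annotators marked no links
    return 2 * len(a & b) / (len(a) + len(b))

annotator_1 = {(2, 0), (5, 3), (7, 6)}
annotator_2 = {(2, 0), (5, 4), (7, 6), (9, 8)}
print(round(link_agreement(annotator_1, annotator_2), 2))  # 0.57
```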

Data Sets for RS

• 50 clinical notes (named entities annotated)
• 50 Pathology (disorders, tumors)
• 20 Pathology (conditions)
• 20 Radiology (conditions)
• 20 Discharge summaries (conditions)
• 20 ED (conditions)
• 20 ED (respiratory conditions)

(Mayo, Pitt)

QUESTIONS?

Visualization of document set

NER – viewing concepts

Multiple Ontologies

OE – Concept Suggestion

Ranked Suggestions

Adding Proposals