Introducing ODIE. NCBO Seminar Series February 18, 2009. Example. IE using ontologies. OE using documents. punch biopsy junctional component pagetoid spread dermal melanocytes Breslow depth lymphocytic infiltrates regression microscopic satellites vascular invasion - PowerPoint PPT Presentation
Introducing ODIE
NCBO Seminar Series
February 18, 2009
Example
IE using ontologies
Diagnosis: Malignant Melanoma
Breslow Depth: 0.72 mm
Lateral Margin: Positive
Regression: Probable
Ulceration: Negative
TIL: Focally Brisk
OE using documents
punch biopsy
junctional component
pagetoid spread
dermal melanocytes
Breslow depth
lymphocytic infiltrates
regression
microscopic satellites
vascular invasion
tumor infiltrating lymphocytes
Spitz nevus
epithelioid nevus
Two Tasks ~ One problem
Ontology
Text
Ontology Enrichment: Uses text as a source of concepts and relationships to enrich and validate the ontology (Specific Aims 2, 3, 4)
Information Extraction: Uses the ontology as a source of concepts and relationships to extract information from text (Specific Aims 1, 3, 5)
Specific Aims
Specific Aim 1: Develop and evaluate methods for information extraction (IE) tasks using existing OBO ontologies, including:
Named Entity Recognition (NER)
Co-reference Resolution (CR)
Discourse Reasoning (DR)
Attribute Value Extraction (AVE)
Specific Aim 2: Develop and evaluate general methods for clinical-text mining to assist in ontology development, including:
Concept Discovery (CD)
Concept Clustering (CC)
Taxonomic Positioning (TP)
Specific Aim 3: Develop reusable software for performing information extraction and ontology development leveraging existing NCBO tools and compatible with NCBO architecture.
Specific Aim 4: Enhance National Cancer Institute Thesaurus Ontology using the ODIE toolkit.
Specific Aim 5: Test the ability of the resulting software and ontologies to address important translational research questions in hematologic cancers.
Ontology Enrichment
• Machine assisted
- Extraction
- Filtering and Organization
- Visualization
- Suggestions
• Human decision-maker (developer, curator)
• Feedback and improvement of OE
Project Organization
Concept Discovery
Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell
Study and compare methods for ontology enrichment; design methods for evaluation
Coreference Resolution
Wendy Chapman, Guergana Savova, Melissa Castine
Develop annotation scheme; create Reference Standard; consider and test existing algorithms; design, implement & test new algorithms
ODIE 0.5
Rebecca Crowley, Kevin Mitchell, Girish Chavan, Eugene Tseytlin
Develop and implement architecture and UI; create framework for using results of research; implement work of research groups
Domain
Will attempt to develop general tools whenever possible
• Priorities for evaluation of components:
Radiology and pathology reports
NCIT as well as clinically relevant OBO ontologies (e.g. RadLex, FMA)
Cancer domains (including hematologic oncology)
Progress
• ODIE 0.5 pre-release on NCBO SourceForge
• Annotation software and document sets
• Res Proj #1: LSP annotation project
• Res Proj #2: Coreference resolution annotation
• Starting Res Proj #3: Discourse Reasoning
• Toolkit for developers of NLP applications and ontologies
• Pre-released on NCBO SourceForge as ODIE 0.5
• Current release focuses on NER and CD
• Support interaction and experimentation
• Package systems at the conclusion of working with ODIE
• Foster cycle of enrichment and extraction needed to advance development of NLP systems
• Ontology enrichment as opposed to de novo development
• Human-machine collaboration as opposed to fully automated learning
ODIE Software
ODIE Download/Info
ODIE Installer: http://caties.cabig.upmc.edu/ODIE/odieinstaller.exe
GForge Site: https://bmir-gforge.stanford.edu/gf/project/odie/
User Forums: https://bmir-gforge.stanford.edu/gf/project/odie/forum/
ODIE on NCBO Tools Page: http://bioontology.org/tools/ODIE.html
Users/Workflow
ODIE is intended for:
• users who want to use NCBO ontologies to perform various NLP tasks (+/- may need to add concepts locally to achieve sufficient performance)
• users who want to enrich ontologies using concepts derived from documents (very early in process of ontology development)
Plans for ODIE 1.0
Ability to import additional ontologies from BioPortal or from OWL files
Ability to export proposed/enriched ontologies.
Ability to add and configure new processing resources (UIMA or GATE based)
Ability to build processing pipelines using processing resources
Will come out of the box with a processing pipeline and processing resources for NER, CD and COREF.
Research Project 1: Ontology Enrichment
Nearly completed survey of lexical, statistical and hybrid methods for ontology enrichment
Methodology to study “utility” of various approaches (Liu, PhD Thesis in progress)
First project underway involves the simplest of the methods to be studied – Lexicosyntactic Patterns (LSP) – regular expressions over POS
Concept Discovery
Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell
Study and compare methods for ontology enrichment; design methods for evaluation
LSP Patterns
The presence of certain “lexico-syntactic patterns” can indicate a particular semantic relationship between two nouns
Example:
DIFFERENTIAL DIAGNOSIS INCLUDES, BUT IS NOT LIMITED TO, SPINDLE CELL NEOPLASM OF PERINEURIAL ORIGIN (SUCH AS SCHWANNOMA) AND SPINDLE CELL MALIGNANT MELANOMA
“such as” indicates a hyponym relationship between two noun phrases
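A minimal sketch of how such a pattern might be matched. The deck describes LSPs as regular expressions over POS, so a real system would match over tagger output; this illustration assumes instead that a noun phrase can be approximated by a run of upper-case words, which happens to suit the all-caps report text in the example. The names `NP`, `LSP_REGEXES` and `extract_pairs` are hypothetical and are not part of ODIE.

```python
import re

# Assumption: a "noun phrase" is a run of upper-case words. This stands in
# for a real POS-based noun-phrase matcher, which the deck does not show.
NP = r"[A-Z][A-Z ]*[A-Z]"

# "NP such as NP" and "NP aka NP", allowing a comma or "(" before the cue.
# The cue itself is matched case-insensitively via a scoped inline flag.
LSP_REGEXES = {
    "such as": re.compile(rf"({NP})\s*[,(]?\s*\b(?i:such as)\s+({NP})"),
    "aka": re.compile(rf"({NP})\s*[,(]?\s*\b(?i:aka)\s+({NP})"),
}

def extract_pairs(sentence):
    """Return (cue, phrase_before, phrase_after) triples found in a sentence."""
    pairs = []
    for cue, regex in LSP_REGEXES.items():
        for match in regex.finditer(sentence):
            pairs.append((cue, match.group(1), match.group(2)))
    return pairs

sentence = (
    "DIFFERENTIAL DIAGNOSIS INCLUDES, BUT IS NOT LIMITED TO, SPINDLE CELL "
    "NEOPLASM OF PERINEURIAL ORIGIN (SUCH AS SCHWANNOMA) AND SPINDLE CELL "
    "MALIGNANT MELANOMA"
)
for cue, before, after in extract_pairs(sentence):
    # For "such as", the phrase after the cue is the candidate hyponym.
    print(f"{after!r} --[{cue}]--> {before!r}")
```

Because the noun-phrase approximation is purely character-based, the phrase before the cue can over-capture (for instance, it would include a leading "COMPATIBLE WITH"), which is one reason a human annotation step is still needed.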
Technique 1 - LSP
PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)
COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
LSP distribution result
Pathology Corpus: 852764 reports, 16157608 sentences
Radiology Corpus: 209997 reports, 4057228 sentences

Pattern             Pathology      Pathology unique   Radiology      Radiology unique
                    # sentences    # of sentences     # sentences    # of sentences
NP especially NP    14             11                 19             10
NP also called NP   48             37                 29             22
NP such as NP       98             95                 906            251
NP's NP             202            45                 5              2
NP in NP            4851           1689               106            47
NP aka NP           5396           460                2              2
NP including NP     6291           4952               1403           747
NP other NP         6940           2251               10622          1407
NP like NP          7649           2267               410            235
NP, NP              8211           5351               7385           3889
NP of NP            14275          4032               2906           607
NP in the NP        47124          23178              64044          29285
NP is NP            92374          25024              7349           2896
NP of the NP        246798         70735              173016         54895

Number of sentences containing lexico-syntactic patterns
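As a rough illustration of how counts of this kind could be produced, the sketch below tallies, for each pattern, the number of sentences containing its cue phrase and the number of distinct sentences among them. It rests on two assumptions that the deck does not confirm: that "unique" means distinct sentence strings after whitespace and case normalisation, and that matching the cue phrase alone (without checking the surrounding noun phrases) is an acceptable stand-in. `CUES` and `count_lsp_sentences` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumption: each pattern is detected by its cue phrase only.
CUES = {
    "NP such as NP": re.compile(r"\bsuch as\b", re.IGNORECASE),
    "NP including NP": re.compile(r"\bincluding\b", re.IGNORECASE),
    "NP aka NP": re.compile(r"\baka\b", re.IGNORECASE),
}

def count_lsp_sentences(sentences):
    """Map each pattern name to (number of sentences, number of unique sentences)."""
    total = defaultdict(int)
    unique = defaultdict(set)
    for sentence in sentences:
        # Assumption: two sentences are "the same" if they are equal after
        # collapsing whitespace and upper-casing.
        normalised = " ".join(sentence.split()).upper()
        for name, regex in CUES.items():
            if regex.search(sentence):
                total[name] += 1
                unique[name].add(normalised)
    return {name: (total[name], len(unique[name])) for name in CUES}

reports = [
    "PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)",
    "PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)",
    "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA",
]
for name, (n_total, n_unique) in count_lsp_sentences(reports).items():
    print(name, n_total, n_unique)
```

A large gap between the two counts, as in the "NP aka NP" pathology row, would then indicate heavily repeated boilerplate sentences.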
Step 1 - Domain Expert annotation
• Annotation tasks:
1. Meaningful medical phrases (MMP) that can stand alone before LSP and after LSP.
2. The phrases before and after LSP have to be related
• Before LSP • After LSP • LSP
Term1                      Term2
PRURIGO NODULE             LICHEN SIMPLEX CHRONICUS
BENIGN ECCRINE NEOPLASIA   NODULAR HIDROADENOMA
…..                        …….
• Calculate: total # of MMP, # of MMP per LSP
PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)
COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
Step 2 - Curator Judgment
For each term
1. Is the concept in the ontology?
2. If not, should it be added into the ontology?
3. If not, what is the reason?
For each pair of terms
1. What is the relationship between them?
2. Does this relationship exist in the ontology?
3. If not, should it be added into the ontology?
4. If not, what is the reason?
Term1                      Term2
PRURIGO NODULE             LICHEN SIMPLEX CHRONICUS
BENIGN ECCRINE NEOPLASIA   NODULAR HIDROADENOMA
…..                        …….
New Concept and Relationship Suggestion Rates
New Concept and Relationship Acceptance Rates
First experiment result – concept enrichment
Radiology Reports (MMP = meaningful medical phrase)
LSP         Preceding the LSP                Following the LSP
            Total # of MMP   # MMP / # LSP   Total # of MMP   # MMP / # LSP
such as     17               100%            31               124%
including   27               159%            66               264%

Pathology Reports (MMP = meaningful medical phrase)
LSP         Preceding the LSP                     Following the LSP
            Total # of MMP   # MMP / # LSP (25)   Total # of MMP   # MMP / # LSP (25)
such as     27               108%                 55               220%
including   24               96%                  35               233%
aka         25               100%                 28               112%
First experiment result – concept enrichment (NCIT)
First experiment – extracted relationships
[Figure: bar chart. x-axis: LSPs (such as, including, aka); y-axis: Percentage (0 to 100); series: "Hyponym relationship is not in the NCIT" and "Hyponym relationship should be added into the NCIT".]
First experiment – Concept Enrichment for RadLex
                # of Terms   Not in RadLex   In RadLex   Blank   Should be added to RadLex   Suggestion rate   Acceptance rate
Preceding LSP   29           11              16          2       10                          38%               91%
Following LSP   68           24              41          3       10                          35%               42%
Total           97           35              57          5       20                          36%               57%
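The deck does not spell out how the two rates are defined. The sketch below assumes that the suggestion rate is the share of extracted terms not already in RadLex, and that the acceptance rate is the share of those missing terms that should be added; this reading is inferred only because it reproduces the percentages in the table. The function name `enrichment_rates` is hypothetical.

```python
def enrichment_rates(n_terms, n_not_in_ontology, n_should_be_added):
    """Return (suggestion_rate, acceptance_rate) as fractions.

    Assumed definitions (inferred from the table, not stated in the deck):
      suggestion rate = terms not in the ontology / all extracted terms
      acceptance rate = terms that should be added / terms not in the ontology
    """
    suggestion = n_not_in_ontology / n_terms
    acceptance = n_should_be_added / n_not_in_ontology
    return suggestion, acceptance

# Rows from the RadLex table: (# of terms, not in RadLex, should be added).
rows = {
    "Preceding LSP": (29, 11, 10),
    "Following LSP": (68, 24, 10),
    "Total": (97, 35, 20),
}
for label, counts in rows.items():
    suggestion, acceptance = enrichment_rates(*counts)
    print(f"{label}: suggestion {suggestion:.0%}, acceptance {acceptance:.0%}")
```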
Research Project 2: Coreference Resolution
Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)
Examples of types of anaphoric relations:
Identity (or coreference)
Set/subset
Part/whole
Anaphora resolution is a computational technique for the discovery of anaphoric relations
Coreference Resolution
Wendy Chapman, Guergana Savova, Melissa Castine
Develop annotation scheme; create Reference Standard, consider and test existing algorithms; design, implement & test new algorithms
Definitions
Anaphoric relations are relations between linguistic expressions where the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent)
Types of anaphoric relations:
Identity (or coreference)
Set/subset
Part/whole
Other
Anaphora resolution is a computational technique for the discovery of anaphoric relations
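To make the definitions concrete, here is one possible way to represent annotated anaphoric relations and to group identity links into coreference chains. It is only an illustration of the data involved, not the project's annotation schema or resolution algorithm; every class and function name, and the example mentions, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class RelationType(Enum):
    IDENTITY = "identity (coreference)"
    SET_SUBSET = "set/subset"
    PART_WHOLE = "part/whole"
    OTHER = "other"

@dataclass(frozen=True)
class Mention:
    start: int  # character offset where the expression begins
    end: int    # character offset where it ends
    text: str

@dataclass(frozen=True)
class AnaphoricRelation:
    anaphor: Mention      # expression whose interpretation depends on ...
    antecedent: Mention   # ... this earlier expression
    relation_type: RelationType

def coreference_chains(relations):
    """Group mentions joined by IDENTITY links into chains (union-find)."""
    parent = {}

    def find(mention):
        parent.setdefault(mention, mention)
        while parent[mention] != mention:
            parent[mention] = parent[parent[mention]]  # path halving
            mention = parent[mention]
        return mention

    for rel in relations:
        if rel.relation_type is RelationType.IDENTITY:
            parent[find(rel.anaphor)] = find(rel.antecedent)

    chains = {}
    for mention in list(parent):
        chains.setdefault(find(mention), []).append(mention)
    return [sorted(chain, key=lambda m: m.start) for chain in chains.values()]

# Hypothetical mentions from an invented report sentence.
mass = Mention(0, 6, "a mass")
lesion = Mention(20, 30, "the lesion")
it = Mention(40, 42, "it")
margins = Mention(50, 61, "the margins")
relations = [
    AnaphoricRelation(lesion, mass, RelationType.IDENTITY),
    AnaphoricRelation(it, lesion, RelationType.IDENTITY),
    AnaphoricRelation(margins, mass, RelationType.PART_WHOLE),
]
print([[m.text for m in chain] for chain in coreference_chains(relations)])
```

Only identity relations are merged into chains here; set/subset and part/whole links stay as pairwise relations because they are not transitive in the same way.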
Progress
Completed and Ongoing:
Annotation schema Development
Guidelines
Training of annotators
4 training sessions
IAA: after session 1 – in the 40’s
IAA: after session 3 – in the 60’s
Planned:
Complete Reference Standard (RS)
Algorithm testing and further development
Data Sets for RS
50 clinical notes (named entities annotated)
50 Pathology (disorders, tumors)
20 Pathology (conditions)
20 Radiology (conditions)
20 Discharge summaries (conditions)
20 ED (conditions)
20 ED (respiratory conditions)
• Mayo
• Pitt
QUESTIONS?
Visualization of document set
NER – viewing concepts
Multiple Ontologies
OE – Concept Suggestion
Ranked Suggestions
Adding Proposals