Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Porting a Natural Language Processing Algorithmto Extract Findings from Colonoscopy Pathology
Reports
Jennifer A. Pacheco1; Andrew J. Gawron, MD, PhD2; KennethBorthwick3; Peggy L. Peissig, PhD4; David S Carrell, PhD5; Luke
V. Rasmussen1; Huan Mo, MD6; William K. Thompson, PhD1
1Northwestern University, Chicago, IL; 2University of Utah, Salt Lake City, UT; 3GeisingerHealth System, Danville, PA; 4Marshfield Clinic, Marshfield, WI; 4Group Health Research
Institute, Seattle, WA; 6Vanderbilt University, Nashville, TN
March 22, 2016
Disclosure
I Will Thompson is co-founder of Textractor Technologies LLC
Funding
This work has been supported in part by funding from PhEMA (R01GM105688) and eMERGE (U01 HG006379, U01 HG006378 and U01HG006388).
Electronic Medical Records and Genomics Network
I NHGRI funded project (currentlyin 3rd funding cycle)
I Development of methods andbest practices for using the EHRas a tool for genomic research
I Network members link EHR datato DNA biobanks, pool resultsacross sites
I Several dozen electronicphenotypes completed and inprogress (www.PheKB.org)
Example Phenotypes: Type 2 Diabetes,Peripheral Arterial Disease, QRS Duration,Colon Polyps, Resistant Hypertension
http://www.phekb.org/
Phenotype: Colon Polyps
I Colorectal cancer (CRC) is the 2nd leading cause ofcancer-related mortality in the United States
I Colonoscopy is a widely used CRC screening modality, results inpolyp biopsies with histologies described in pathology notes
I Serrated vs. adenoma cancer pathway
I Goal: EHR-based phenotype and GWAS across eMERGE sites
Developing and Porting the NLP-based Algorithm
[A] NLP pipeline; [B] Document annotator tool; [C] KNIME workflow forexecuting NLP; [D] KNIME workflow for evaluation of results againstgold standard
NLP Modules
Groovy Script UIMA Annotator
Groovy script UIMA annotators are very easily configured, and scriptscan be updated and re-loaded without recompiling the entire NLPlibrary. Built-in functionality for Groovy scripts includes:
I Support for cTAKES common type system
I Selecting and filtering annotations
I Creating annotations
I Matching patterns of annotations and text
Groovy Script Annotator Implementation
@Overridevoid initialize(UimaContext context) {super.initialize(context)config = new CompilerConfiguration()config.setScriptBaseClass("org.northshore.cbri.UIMAUtil")
shell = new GroovyShell(config)// load in script file contentsthis.script = shell.parse(scriptContents)
}
@Overridevoid process(JCas jcas) {
UIMAUtil.setJCas(jcas)this.script.run()
}
Annotation Selection
// select all Sentences containing// an EntityMentionsents = select
type:Sentence,filter:contains(EntityMention)
I Built-in support for cTAKES common type system
I Pre-defined and on-the-fly filters
I Compositional filters w. boolean functions
I First class support for regex strings & operators
Annotation Selection (More Complex Example)
// select all Sentence annotations contained in// "Findings" Segment that also contains an// EntityMention annotation and ends with// text "tubular adenoma"select(type:Segment).grep { seg ->seg.id == "FINDINGS"}.each {select(type:Sentence, filter:(and (coveredBy(seg),{it.coveredText==∼/.+tubular\s+adenoma},contains(EntityMention)))
}}
Annotation Creation
create(type:EntityMention,begin:0, end:10,polarity:1, uncertainty:0,ontologyConcepts:[create(type:UmlsConcept, cui:"C01"),create(type:UmlsConcept, cui:"C02")]
)
I Built-in support for cTAKES common types
I Can nest calls to create on the fly
Annotation Matching
sents = select(type:Sentence)patterns = [∼/(?i)(tubular|villous)\s+adenoma/]
match(sents, patterns,{ create(type:EntityMention,
begin:it.start(1), end:it.end(1),polarity:1, uncertainty:0,ontologyConcepts:[create(type:UmlsConcept, cui:"C01")]
)})
I Patterns can be specified over text and/or annotations
I Specified functions are applied to every match
I Action taken can be anything (create annotation one possibility)
Annotations Matching (More Complex Example)
pat = (∼/(?s)(?@Head)(?=(?@Head)|\Z)/)AnnotationMatcher matcher =pat.matcher(includeText:false)
matcher.each{ Map binding ->create(type:Segment,begin:binding.get("h1").begin,end:(binding.get("h2") ?
binding.get("h2").begin: jcas.documentText.length()))}
Pathology Note Concept Annotator
Abstractor Tool
https://github.com/lrasmus/DocumentAbstraction
https://github.com/lrasmus/DocumentAbstraction
KNIME Execution Workflow
https://www.knime.org
https://www.knime.org
KNIME Integration with NLP Pipeline
KNIME Validation Workflow
Sharing KNIME Workflows
Exporting a KNIME workflow Importing a KNIME workflow
Results
I Using the annotator tool at a single site, a corpus of 200randomly selected pathology notes was created. At the polypfinding level (histology + location match), the algorithm achievedsensitivity of 0.95 and PPV 0.95.
I The algorithm and tools were subsequently shared with threeadditional eMERGE sites
I Porting to new sites required modifications to script filesregulating sectionizing and relation extraction.
I After running the algorithm across all four sites, we were able toextract a genotyped cohort of 5839 patients.
I Our qualitative evaluation indicated that using the toolssignificantly sped up the porting process, and is a promisingapproach to similar tasks involving NLP-based phenotypealgorithms.
Conclusion
I Porting NLP-based phenotype algorithms across sites presentstime-consuming challenges, limiting cross-site collaborationopportunities.
I We developed a set of tools to help make it easier to share,execute, modify, and validate NLP algorithms.
I In future work we will peform a quantitive evaluation of thebenefits of using these tools for porting NLP-based phenotypealgorithms.
I Pathology notes are relatively uniform and lend themselves topattern-based rules; more complex notes or tasks may not lendthemselves as well to this approach.
I The integration of NLP with KNIME can be improved bydeveloping extension nodes that allow for direct editing of scriptfiles without requiring re-generation of NLP libraries.
Thanks!