
Porting a Natural Language Processing Algorithm to Extract Findings from Colonoscopy Pathology Reports

Jennifer A. Pacheco1; Andrew J. Gawron, MD, PhD2; Kenneth Borthwick3; Peggy L. Peissig, PhD4; David S. Carrell, PhD5; Luke V. Rasmussen1; Huan Mo, MD6; William K. Thompson, PhD1

1Northwestern University, Chicago, IL; 2University of Utah, Salt Lake City, UT; 3Geisinger Health System, Danville, PA; 4Marshfield Clinic, Marshfield, WI; 5Group Health Research Institute, Seattle, WA; 6Vanderbilt University, Nashville, TN

March 22, 2016

Disclosure

- Will Thompson is co-founder of Textractor Technologies LLC

Funding

This work has been supported in part by funding from PhEMA (R01 GM105688) and eMERGE (U01 HG006379, U01 HG006378, and U01 HG006388).

Electronic Medical Records and Genomics Network

- NHGRI-funded project (currently in its 3rd funding cycle)
- Development of methods and best practices for using the EHR as a tool for genomic research
- Network members link EHR data to DNA biobanks and pool results across sites
- Several dozen electronic phenotypes completed and in progress (www.PheKB.org)

Example Phenotypes: Type 2 Diabetes, Peripheral Arterial Disease, QRS Duration, Colon Polyps, Resistant Hypertension

http://www.phekb.org/

Phenotype: Colon Polyps

- Colorectal cancer (CRC) is the 2nd leading cause of cancer-related mortality in the United States
- Colonoscopy is a widely used CRC screening modality; it results in polyp biopsies with histologies described in pathology notes
- Serrated vs. adenoma cancer pathway
- Goal: EHR-based phenotype and GWAS across eMERGE sites

Developing and Porting the NLP-based Algorithm

[A] NLP pipeline; [B] Document annotator tool; [C] KNIME workflow for executing NLP; [D] KNIME workflow for evaluation of results against gold standard

NLP Modules

Groovy Script UIMA Annotator

Groovy script UIMA annotators are easily configured, and scripts can be updated and reloaded without recompiling the entire NLP library. Built-in functionality for Groovy scripts includes:

- Support for the cTAKES common type system
- Selecting and filtering annotations
- Creating annotations
- Matching patterns of annotations and text

Groovy Script Annotator Implementation

    @Override
    void initialize(UimaContext context) {
        super.initialize(context)
        config = new CompilerConfiguration()
        config.setScriptBaseClass("org.northshore.cbri.UIMAUtil")
        shell = new GroovyShell(config)
        // load in script file contents
        this.script = shell.parse(scriptContents)
    }

    @Override
    void process(JCas jcas) {
        UIMAUtil.setJCas(jcas)
        this.script.run()
    }
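The parse-once, run-many pattern above can be tried without UIMA. The following is a minimal sketch, assuming a hypothetical base class `DemoUtil` in place of `UIMAUtil`; its `create` helper only records a string and is not the library's `create` function. It shows how setting a script base class makes helper methods callable from the script as if they were built in.

```groovy
import org.codehaus.groovy.control.CompilerConfiguration

// Hypothetical stand-in for the UIMAUtil base class: methods defined here
// become callable from any script compiled with this base class.
abstract class DemoUtil extends Script {
    static List<String> created = []

    // Records a description of the requested annotation (illustration only).
    def create(Map args) {
        created << "${args.type}[${args.begin},${args.end}]".toString()
    }
}

def config = new CompilerConfiguration()
config.setScriptBaseClass(DemoUtil.name)

// Pass the defining class loader so the shell can resolve DemoUtil.
def shell = new GroovyShell(DemoUtil.classLoader, new Binding(), config)

// Parse once (as in initialize), then run per document (as in process).
Script script = shell.parse('create(type: "EntityMention", begin: 0, end: 10)')
script.run()
script.run()

println DemoUtil.created
```

Because the script text is parsed at run time, editing the script file and re-parsing it is enough to change behavior; nothing else needs to be recompiled.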

Annotation Selection

    // select all Sentences containing
    // an EntityMention
    sents = select type:Sentence,
                   filter:contains(EntityMention)

- Built-in support for the cTAKES common type system
- Pre-defined and on-the-fly filters
- Compositional filters with Boolean functions
- First-class support for regex strings and operators
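The idea of compositional filters can be illustrated with plain closures. This is a sketch only: `Span`, `ofType`, `containsType`, `allOf`, and `selectSpans` are hypothetical names invented here, not part of the library shown on the slides, and annotations are reduced to typed character spans.

```groovy
// Hypothetical, simplified annotation: a typed span over the document text.
class Span {
    String type
    int begin
    int end
}

def spans = [
    new Span(type: "Sentence", begin: 0, end: 40),
    new Span(type: "Sentence", begin: 41, end: 80),
    new Span(type: "EntityMention", begin: 50, end: 65),
]

// A filter is a closure from Span to boolean; each factory returns one.
def ofType = { String t ->
    return { Span s -> s.type == t }
}
def containsType = { String t ->
    return { Span s ->
        spans.any { it.type == t && it.begin >= s.begin && it.end <= s.end }
    }
}
// Boolean composition: a span passes only if every filter accepts it.
def allOf = { List<Closure> filters ->
    return { Span s -> filters.every { f -> f(s) } }
}

def selectSpans = { Closure filter -> spans.findAll(filter) }

// Sentences that contain an EntityMention.
def sents = selectSpans(allOf([ofType("Sentence"), containsType("EntityMention")]))
println sents*.begin
```

Since each filter is an ordinary closure, an on-the-fly filter such as `{ Span s -> s.end - s.begin > 30 }` can be mixed into the same `allOf` list.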

Annotation Selection (More Complex Example)

    // select all Sentence annotations contained in
    // "Findings" Segment that also contains an
    // EntityMention annotation and ends with
    // text "tubular adenoma"
    select(type:Segment).grep { seg ->
        seg.id == "FINDINGS"
    }.each { seg ->
        select(type:Sentence, filter:and(
            coveredBy(seg),
            { it.coveredText ==~ /.+tubular\s+adenoma/ },
            contains(EntityMention)))
    }

Annotation Creation

    create(type:EntityMention,
           begin:0, end:10,
           polarity:1, uncertainty:0,
           ontologyConcepts:[
               create(type:UmlsConcept, cui:"C01"),
               create(type:UmlsConcept, cui:"C02")]
    )

- Built-in support for cTAKES common types
- Calls to create can be nested on the fly

Annotation Matching

    sents = select(type:Sentence)
    patterns = [~/(?i)(tubular|villous)\s+adenoma/]

    match(sents, patterns, {
        create(type:EntityMention,
               begin:it.start(1), end:it.end(1),
               polarity:1, uncertainty:0,
               ontologyConcepts:[create(type:UmlsConcept, cui:"C01")]
        )
    })

- Patterns can be specified over text and/or annotations
- The specified function is applied to every match
- The action taken can be anything (creating an annotation is one possibility)
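The text-only case of this matching step can be sketched with the standard Java regex classes. `matchAll` below is a hypothetical helper written for this illustration, not the library's `match` function, and the sentences are invented. It applies a caller-supplied closure to every match and collects whatever the closure returns.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical helper: run every pattern over every text and apply
// `action` to each match, collecting the results in order.
def matchAll(List<String> texts, List<Pattern> patterns, Closure action) {
    def results = []
    texts.each { String text ->
        patterns.each { Pattern p ->
            Matcher m = p.matcher(text)
            while (m.find()) {
                results << action(m)
            }
        }
    }
    results
}

// Invented example sentences.
def sents = ["Sigmoid colon polyp: Tubular adenoma.", "Rectum: hyperplastic polyp."]
def patterns = [~/(?i)(tubular|villous)\s+adenoma/]

// The action receives the Matcher, so group offsets are available;
// here it builds a plain map instead of creating an annotation.
def mentions = matchAll(sents, patterns) { Matcher m ->
    [histology: m.group(1).toLowerCase(), begin: m.start(1), end: m.end(1)]
}
println mentions
```

As in the slide's example, `start(1)` and `end(1)` give the offsets of the first capture group (the histology modifier), not of the whole match.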

Annotation Matching (More Complex Example)

    pat = (~/(?s)(?<h1>@Head)(?=(?<h2>@Head)|\Z)/)
    AnnotationMatcher matcher = pat.matcher(includeText:false)

    matcher.each { Map binding ->
        create(type:Segment,
               begin:binding.get("h1").begin,
               end:(binding.get("h2") ?
                    binding.get("h2").begin :
                    jcas.documentText.length()))
    }
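The segmentation logic in this example (each segment runs from one heading to the next, or to the end of the document) can be sketched over plain text. The note text and heading pattern below are assumptions made for illustration; real section names and layouts vary by site, and this sketch matches headings with a text regex rather than with Head annotations.

```groovy
import java.util.regex.Matcher

// Invented note text and heading pattern (upper-case label followed by a colon).
def text = "DIAGNOSIS:\nTubular adenoma.\nCOMMENT:\nNone.\n"
def heading = ~/(?m)^([A-Z ]+):/

// Collect each heading's name and start offset.
def heads = []
Matcher m = heading.matcher(text)
while (m.find()) {
    heads << [id: m.group(1), begin: m.start()]
}

// A segment ends where the next heading begins, or at the end of the text.
def segments = []
heads.eachWithIndex { h, i ->
    int segEnd = (i + 1 < heads.size()) ? heads[i + 1].begin : text.length()
    segments << [id: h.id, begin: h.begin, end: segEnd]
}
println segments
```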

Pathology Note Concept Annotator

Abstractor Tool

https://github.com/lrasmus/DocumentAbstraction


KNIME Execution Workflow

https://www.knime.org


KNIME Integration with NLP Pipeline

KNIME Validation Workflow

Sharing KNIME Workflows

Exporting a KNIME workflow; importing a KNIME workflow

Results

- Using the annotator tool at a single site, a corpus of 200 randomly selected pathology notes was created. At the polyp finding level (histology + location match), the algorithm achieved a sensitivity of 0.95 and a PPV of 0.95.
- The algorithm and tools were subsequently shared with three additional eMERGE sites.
- Porting to new sites required modifications to the script files regulating sectionizing and relation extraction.
- After running the algorithm across all four sites, we were able to extract a genotyped cohort of 5839 patients.
- Our qualitative evaluation indicated that using the tools significantly sped up the porting process and is a promising approach for similar tasks involving NLP-based phenotype algorithms.
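For readers unfamiliar with finding-level scoring, the sketch below shows one way sensitivity and PPV could be computed when a predicted finding counts as correct only if note, histology, and location all match a gold-standard finding. The findings are invented for illustration and are unrelated to the study's data; the slides report only the resulting metrics, not the underlying counts or the exact matching rules.

```groovy
// Invented gold-standard and predicted findings (illustration only).
def gold = [
    [note: 1, histology: "tubular adenoma", location: "sigmoid"],
    [note: 1, histology: "hyperplastic", location: "rectum"],
    [note: 2, histology: "tubular adenoma", location: "cecum"],
] as Set
def predicted = [
    [note: 1, histology: "tubular adenoma", location: "sigmoid"],
    [note: 2, histology: "tubular adenoma", location: "cecum"],
    [note: 2, histology: "villous adenoma", location: "rectum"],
] as Set

// Exact match on every field decides true positives.
int tp = gold.intersect(predicted).size()
int fp = (predicted - gold).size()   // predicted but not in gold
int fn = (gold - predicted).size()   // in gold but missed

double sensitivity = tp / (tp + fn)  // TP / (TP + FN)
double ppv = tp / (tp + fp)          // TP / (TP + FP)
println "sensitivity=${sensitivity}, ppv=${ppv}"
```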

Conclusion

- Porting NLP-based phenotype algorithms across sites presents time-consuming challenges, limiting cross-site collaboration opportunities.
- We developed a set of tools to make it easier to share, execute, modify, and validate NLP algorithms.
- In future work we will perform a quantitative evaluation of the benefits of using these tools for porting NLP-based phenotype algorithms.
- Pathology notes are relatively uniform and lend themselves to pattern-based rules; more complex notes or tasks may not lend themselves as well to this approach.
- The integration of NLP with KNIME can be improved by developing extension nodes that allow for direct editing of script files without requiring regeneration of NLP libraries.

Thanks!
