Computer as a Doctor? Representation Learning in Medical Documents. Irene Li (1) and Mark Hughes (2). (1) Dublin Institute Technology, Ireland; (2) IBM Watson Health, Ireland


Page 1: Representation Learning in Medical Documents

Computer as a Doctor? Representation Learning in Medical Documents

Irene Li (1) and Mark Hughes (2)

(1) Dublin Institute Technology, Ireland
(2) IBM Watson Health, Ireland

Page 2: Representation Learning in Medical Documents

Motivation

▪ Medicare Domain Dataset: limited, costly
▪ Domain Experts: dependency
▪ Application Requirements (Use Case next page):
• Predictions
• Classification
• Summarization

Page 3: Representation Learning in Medical Documents

Use case: Sentence-Level Note Classification

(A 75-y-o woman) with sudden onset back pain last night while lifting turkey from oven. The pain is worse with movement or deep breath, better with rest. No symptoms in legs, no fever or chills. No chest pain, cough, wheezing, abdominal pain, headache… Married. Two children. No smoking.

Sentence Level Categorization: Watson Smart Notes

Free-written texts/chats:
• Various topics
• Messy
• Irrelevant

Page 4: Representation Learning in Medical Documents

Representation Learning

▪ Under the head of “Deep Learning” or “Feature Learning”
• DL algorithms attempt to learn more complex features: multiple levels of representation
▪ Why?
• Get rid of “hand-designed” features and representations.
• Unsupervised feature learning.
• Everything into the same space.

Example: Lengths of sentences.

Representation Learning Tutorial, Yoshua Bengio, 2012 http://www.iro.umontreal.ca/~bengioy/talks/icml2012-YB-tutorial.pdf

Page 6: Representation Learning in Medical Documents

Related Work: Representation Learning (RL) in NLP

Distributed Representations for words:
• Word2vec [1]: neural word embeddings (each word is a vector)
• Doc2vec [2, 3]: neural document/paragraph/sentence embeddings (each sentence is a vector)

[1] Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., 2013
[2] Distributed Representations of Sentences and Documents, Quoc V. Le et al., 2014
[3] Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
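The idea that "each word is a vector" can be illustrated with a minimal sketch. The 3-d vectors below are invented for illustration and are not taken from the talk; real Word2vec embeddings are learned from a corpus and typically have 100 or more dimensions. Cosine similarity is one common way to compare such vectors.

```python
import math

# Toy vectors, made up for this example (NOT learned embeddings).
toy_vectors = {
    "pain": [0.9, 0.1, 0.0],
    "ache": [0.8, 0.2, 0.1],
    "oven": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Words with related meanings should sit closer together in the space.
sim_related = cosine_similarity(toy_vectors["pain"], toy_vectors["ache"])
sim_unrelated = cosine_similarity(toy_vectors["pain"], toy_vectors["oven"])
print(sim_related > sim_unrelated)
```

With learned embeddings, the same comparison is what makes semantically related words cluster together.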

Page 7: Representation Learning in Medical Documents

Word Clusters: Capture Semantic Meanings

Visualization using t-SNE.

Page 8: Representation Learning in Medical Documents

Visualization using t-SNE.

Page 9: Representation Learning in Medical Documents

Document Clusters

Visualization using t-SNE. Picture from Dai, Andrew M., Christopher Olah, and Quoc V. Le. "Document embedding with paragraph vectors." (2015).

● 4,490,000 Wikipedia English articles

● 915,715 unique words

Page 10: Representation Learning in Medical Documents

Approach (1): Sentence to Image

Sentence ("Conducted to examine different features associated with NPEV...") -> Word Embeddings -> 2-D Image
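The sentence-to-image step can be sketched as stacking one embedding per word into a matrix with one row per word. This is a minimal sketch under stated assumptions: `EMBED_DIM = 100` is a guess based on the "100-d" label in the results, `MAX_WORDS` and the zero-padding scheme are hypothetical, and `toy_embedding` is a pseudo-random stand-in for a learned Word2vec lookup.

```python
import random

EMBED_DIM = 100  # assumed embedding size
MAX_WORDS = 20   # hypothetical fixed number of rows per image


def toy_embedding(word, dim=EMBED_DIM):
    """Deterministic pseudo-random vector standing in for a learned embedding."""
    rng = random.Random(word)  # same word always yields the same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def sentence_to_image(sentence, max_words=MAX_WORDS, dim=EMBED_DIM):
    """Return a max_words x dim matrix: one embedding row per word."""
    words = sentence.lower().split()[:max_words]  # truncate long sentences
    rows = [toy_embedding(w, dim) for w in words]
    # Pad short sentences with all-zero rows so every image has the same shape.
    rows += [[0.0] * dim for _ in range(max_words - len(rows))]
    return rows


image = sentence_to_image(
    "Conducted to examine different features associated with NPEV"
)
print(len(image), len(image[0]))
```

A fixed image shape is what lets a convolutional network treat every sentence the same way.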

Page 11: Representation Learning in Medical Documents

Approach (2): Model

Conv Layers: 64 filters; 5x5
Pooling Layers: 2x2
Hidden Layer: 128 units
Output: 13 units
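To see how these layer sizes interact, the sketch below traces feature-map shapes through the network. Only the filter count, kernel size, pooling window, hidden units and output units come from the slide. The number of conv/pool blocks (two), stride 1, no padding, and the 20 x 100 input size are assumptions made for illustration.

```python
def conv_output(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_output(size, window):
    """Output length of non-overlapping pooling along one axis."""
    return size // window


def trace_shapes(height, width, n_conv_blocks=2, filters=64, kernel=5,
                 pool=2, hidden=128, classes=13):
    """List (layer name, shape) pairs for a conv/pool stack plus dense layers."""
    shapes = [("input", (height, width, 1))]
    h, w = height, width
    for i in range(n_conv_blocks):
        h, w = conv_output(h, kernel), conv_output(w, kernel)
        shapes.append((f"conv{i + 1}", (h, w, filters)))
        h, w = pool_output(h, pool), pool_output(w, pool)
        shapes.append((f"pool{i + 1}", (h, w, filters)))
    shapes.append(("flatten", (h * w * filters,)))
    shapes.append(("hidden", (hidden,)))
    shapes.append(("output", (classes,)))
    return shapes


# Assumed input: 20 words x 100-d embeddings, single channel.
shapes = trace_shapes(20, 100)
for name, shape in shapes:
    print(name, shape)
```

The 13 output units match the 13 classes in the dataset, one score per topic/journal class.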

Page 12: Representation Learning in Medical Documents

Results (1): Dataset

Corpus:
• 3,879 publications from PubMed [1]
• 27.4 million raw words
• 181,550 words in vocabulary
• 13 classes by topic/journal

[1]: US National Library of Medicine National Institutes of Health Search database http://www.ncbi.nlm.nih.gov/pubmed
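The two corpus figures above differ in what they count: raw words are every token occurrence, while the vocabulary is the set of distinct tokens. A minimal sketch of both counts follows; the tokenizer (lowercased alphanumeric runs) and the two sample sentences are assumptions, not the pipeline used for the PubMed corpus.

```python
import re
from collections import Counter


def corpus_stats(documents):
    """Return (raw word count, vocabulary size, per-word counts)."""
    counts = Counter()
    for doc in documents:
        # Assumed tokenization: lowercase, keep runs of letters and digits.
        counts.update(re.findall(r"[a-z0-9]+", doc.lower()))
    return sum(counts.values()), len(counts), counts


docs = [
    "No chest pain, no fever.",
    "Back pain is worse with movement.",
]
raw_words, vocab_size, counts = corpus_stats(docs)
print(raw_words, vocab_size, counts.most_common(2))
```

The per-word counts are also what a word occurrence distribution plot is built from.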

Page 13: Representation Learning in Medical Documents

Results (1): Dataset

Word occurrence distribution (27.4 million words)

Page 14: Representation Learning in Medical Documents

Results (1): Dataset

13 classes by topic/journal

Plot by https://tagul.com/cloud/2

Page 15: Representation Learning in Medical Documents

Results (2): R-Square Scores in Classification

100-d

Page 16: Representation Learning in Medical Documents

Discussions

▪ CNNs: ability to learn distributed representations.
▪ Pre-processing (stop-words, stemming, etc.): accuracy drops because information is lost.
Example: “studying”, “studies” -> “studi”
▪ Training set:
• Arbitrarily chosen by journals: overlaps
• Noisy contents: irrelevant sentences. Example: “We examined a patient who had salad...”
• No “best case”/baselines for the system
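The stemming example above can be reproduced with a toy suffix-stripping function. This is a deliberately simplified sketch, not the Porter algorithm or whichever stemmer was used in the experiments; `toy_stem` and its rules are hypothetical and only cover this example.

```python
def toy_stem(word):
    """Very rough suffix stripping; only meant to illustrate information loss."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "i"          # "studies" -> "studi"
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]                # "studying" -> "study"
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]                # drop a plain plural "s"
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"          # "study" -> "studi"
    return word


stems = {w: toy_stem(w) for w in ["studying", "studies", "study"]}
print(stems)
```

All three forms collapse to one token, so a verb form and a plural noun become indistinguishable to the model, which is one way pre-processing can discard useful information.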

Page 17: Representation Learning in Medical Documents

Future Works

▪ Dataset
• In-domain knowledge: papers, books, etc.
• For specific tasks: well-labeled
▪ Representation
• CNN model: more complex (layers)
• Other models: Long Short-Term Memory (LSTM), etc.
▪ Potential Applications
• Notes classification
• Patient2vec (Use Case next page): representation learning on individual patient

Page 18: Representation Learning in Medical Documents

Patient2Vec: Every patient is a vector

Feature extraction from everything: gender, age, body conditions, treatment history, …
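As a rough illustration of "every patient is a vector", the sketch below encodes a few patient attributes into a fixed-length numeric vector. It is a hand-built input encoding under assumptions, not the proposed Patient2Vec model, which would learn its representation. The category lists, the age scaling and the record fields are all hypothetical.

```python
# Hypothetical category lists; a real system would derive these from data.
GENDERS = ["female", "male", "other"]
CONDITIONS = ["back pain", "fever", "cough"]


def patient_to_vector(patient):
    """Concatenate one-hot gender, scaled age and multi-hot conditions."""
    gender = [1.0 if patient["gender"] == g else 0.0 for g in GENDERS]
    age = [patient["age"] / 100.0]  # assumed scaling to roughly [0, 1]
    conditions = [1.0 if c in patient["conditions"] else 0.0
                  for c in CONDITIONS]
    return gender + age + conditions


vec = patient_to_vector(
    {"gender": "female", "age": 75, "conditions": {"back pain"}}
)
print(vec)
```

A learned model could take a vector like this, or raw notes, as input and produce a dense embedding in which similar patients end up close together.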

Page 19: Representation Learning in Medical Documents

Special thanks to Spyros Kotoulas (1) and Toyotaro Suzumura (2) for support and help.

(1) IBM Watson Health, Dublin, Ireland
(2) IBM T.J. Watson Research Center, New York, USA

Thanks! Q&A
ireneli.eu