Computer as a Doctor? Representation Learning in Medical Documents
Irene Li¹ and Mark Hughes²
¹ Dublin Institute of Technology, Ireland
² IBM Watson Health, Ireland
Motivation
▪ Medicare domain dataset: limited, costly
▪ Domain experts: dependency
▪ Application requirements (use case on next page):
• Predictions
• Classification
• Summarization
Use case: Sentence-Level Note Classification
(A 75-y-o woman) with sudden onset back pain last night while lifting turkey from oven. The pain is worse with movement or deep breath, better with rest. No symptoms in legs, no fever or chills. No chest pain, cough, wheezing, abdominal pain, headache… Married. Two children. No smoking.
Sentence-level categorization: Watson Smart Notes
Free-written texts/chats: various topics, messy, irrelevant
Representation Learning
▪ Under the heading of “Deep Learning” or “Feature Learning”
• DL algorithms attempt to learn more complex features: multiple levels of representation
▪ Why?
• Get rid of “hand-designed” features and representations.
• Unsupervised feature learning.
• Everything into the same space.
Example: Lengths of sentences.
Representation Learning Tutorial, Yoshua Bengio, 2012: http://www.iro.umontreal.ca/~bengioy/talks/icml2012-YB-tutorial.pdf
Related Work: RL in NLP
Distributed representations for words:
• Word2vec [1]: neural word embeddings (each word is a vector)
• Doc2vec [2, 3]: neural document/paragraph/sentence embeddings (each sentence is a vector)
[1] Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., 2013
[2] Distributed Representations of Sentences and Documents, Quoc V. Le et al., 2014
[3] Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
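To make "each word is a vector" concrete, the sketch below compares words by the cosine similarity of their vectors. The 4-d vectors are hand-picked, hypothetical values chosen only for illustration; real Word2vec embeddings are learned from a corpus and have many more dimensions.

```python
import math

# Hypothetical 4-d embeddings, hand-picked for illustration only.
# Real Word2vec vectors are learned from text, not written by hand.
toy_vectors = {
    "pain":   [0.9, 0.1, 0.0, 0.2],
    "ache":   [0.8, 0.2, 0.1, 0.3],
    "turkey": [0.0, 0.9, 0.7, 0.1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Words with related meanings are expected to lie closer together.
sim_close = cosine_similarity(toy_vectors["pain"], toy_vectors["ache"])
sim_far = cosine_similarity(toy_vectors["pain"], toy_vectors["turkey"])
print(sim_close > sim_far)
```

The same comparison applies to Doc2vec, where a sentence or document rather than a word is the unit that receives a vector.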
Word Clusters: Capture Semantic Meanings
Visualization using t-SNE.
Document Clusters
Visualization using t-SNE. Picture from Dai, Andrew M., Christopher Olah, and Quoc V. Le. "Document embedding with paragraph vectors." (2015).
● 4,490,000 Wikipedia English articles
● 915,715 unique words
Approach (1): Sentence to Image
Sentence ("Conducted to examine different features associated with NPEV...") -> Word Embeddings -> 2-D Image
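One possible reading of the sentence-to-image step is to stack one embedding row per word into a fixed-size matrix. The sketch below assumes zero-padding, truncation at `max_len`, and zero rows for unknown words; none of these details are stated on the slide, and the function name and 3-d toy embeddings are hypothetical.

```python
def sentence_to_image(tokens, embeddings, max_len, dim):
    """Stack one embedding per token into a max_len x dim matrix.

    Assumptions (not from the slides): unknown words map to a zero
    row, long sentences are truncated, short ones are zero-padded.
    """
    zero = [0.0] * dim
    rows = [list(embeddings.get(tok, zero)) for tok in tokens[:max_len]]
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows


# Hypothetical 3-d embeddings; the slides report 100-d vectors.
toy_embeddings = {
    "conducted": [0.1, 0.3, -0.2],
    "to":        [0.0, 0.1, 0.0],
    "examine":   [0.4, -0.1, 0.2],
}

image = sentence_to_image(
    ["conducted", "to", "examine", "features"],
    toy_embeddings,
    max_len=6,
    dim=3,
)
for row in image:
    print(row)
```

Each row of the resulting matrix plays the role of one pixel row, so a convolutional network can treat the sentence as a single-channel image.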
Approach (2): Model
• Conv layers: 64 filters; 5x5
• Pooling layers: 2x2
• Hidden layer: 128 units
• Output: 13 units
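The layer sizes above can be turned into a rough shape and parameter count. The sketch below assumes a single conv + pooling block, a single-channel 50 x 100 input (50 words, 100-d embeddings), stride 1, and no padding. The number of blocks, the sentence length, and the padding scheme are not given on the slide, so the figures are illustrative only.

```python
def conv_output(size, kernel):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_output(size, window):
    """Output length of non-overlapping pooling."""
    return size // window


# Assumed input: 50 words x 100-d embeddings, one channel (hypothetical).
height, width, channels = 50, 100, 1
filters, kernel, pool = 64, 5, 2      # from the slide
hidden_units, classes = 128, 13       # from the slide

conv_h, conv_w = conv_output(height, kernel), conv_output(width, kernel)
pool_h, pool_w = pool_output(conv_h, pool), pool_output(conv_w, pool)
flat = pool_h * pool_w * filters

# Weights plus biases for each trainable layer.
conv_params = kernel * kernel * channels * filters + filters
hidden_params = flat * hidden_units + hidden_units
output_params = hidden_units * classes + classes

print((conv_h, conv_w), (pool_h, pool_w), flat)
print(conv_params, hidden_params, output_params)
```

Under these assumptions almost all parameters sit in the hidden layer, which is why the choice of input size matters more than the number of filters.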
Results (1): Dataset
Corpus:
• 3879 publications from PubMed [1]
• 27.4 million raw words
• 181,550 words in vocabulary
• 13 classes by topic/journal
[1]: US National Library of Medicine National Institutes of Health Search database http://www.ncbi.nlm.nih.gov/pubmed
27.4 million word occurrence distribution
Plot by https://tagul.com/cloud/2
13 classes by topic/journal
Results (2): R-Square Scores in Classification
100-d
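For reference, the standard R-square (coefficient of determination) is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below applies it to 0/1 class indicators and predicted scores for a single class. That usage is an assumption, since the slides do not say how the score was computed for the 13-class task, and the numbers are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical indicators (1 = sentence belongs to the class) and
# hypothetical predicted scores; not taken from the reported results.
y_true = [1, 0, 0, 1]
y_pred = [0.9, 0.2, 0.1, 0.8]

score = r_squared(y_true, y_pred)
print(round(score, 3))
```

A score of 1 means the predictions match the targets exactly, and lower values mean a larger share of the variance is left unexplained.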
Discussions
▪ CNNs: ability to learn distributed representations.
▪ Pre-processing (stop-words, stemming, etc.): accuracy drops because information is lost. Example: “studying”, “studies” -> “studi”
▪ Training set:
• Arbitrarily chosen by journals: overlaps
• Noisy contents: irrelevant sentences. Example: “We examined a patient who had salad...”
• No “best case”/baselines for the system
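The stemming example in the discussion (“studying”, “studies” -> “studi”) can be reproduced with a toy suffix stripper. This is a hypothetical, heavily simplified stand-in for a real stemmer such as Porter's, written only to show how distinct word forms collapse onto one token.

```python
def toy_stem(word):
    """Very rough suffix stripping; NOT a full Porter stemmer."""
    w = word.lower()
    if w.endswith("ies"):
        w = w[:-3] + "i"          # studies -> studi
    elif w.endswith("ing") and len(w) > 5:
        w = w[:-3]                # studying -> study
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]                # notes -> note
    if w.endswith("y") and len(w) > 2:
        w = w[:-1] + "i"          # study -> studi
    return w


stems = {word: toy_stem(word) for word in ["studying", "studies", "study"]}
print(stems)
```

After stemming, the verb form and the plural noun share one embedding, so any distinction the CNN could have learned from the surface forms is no longer available to it.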
Future Works
▪ Dataset
• In-domain knowledge: papers, books, etc.
• For specific tasks: well-labeled
▪ Representation
• CNN model: more complex (layers)
• Other models: Long Short-Term Memory (LSTM), etc.
▪ Potential applications
• Notes classification
• Patient2vec (use case on next page): representation learning on individual patients
Patient2Vec: Every patient is a vector
Feature extraction from everything: gender, age, body conditions, treatment history, …
Special thanks to Spyros Kotoulas¹ and Toyotaro Suzumura² for support and help.
¹ IBM Watson Health, Dublin, Ireland
² IBM T.J. Watson Research Center, New York, USA
Thanks! Q&A
ireneli.eu