Upload
lviv-it-arena
View
427
Download
0
Embed Size (px)
Citation preview
When Healthcare Meets Data Science
Anastasiia Kornilova
http://www.slideshare.net/WebCongress/mars-one-bas-lansdorp
http://www.slideshare.net/WebCongress/mars-one-bas-lansdorp
The Medicine of the Future
http://www.healthbizdecoded.com/2013/05/hies-meeting-the-sustainability-challenge/
http://graphics.wsj.com/infectious-diseases-and-vaccines/
«One or two patient died per week in a certain smallish town because of the lack of information flow between the hospital’s emergency room and the nearby mental health clinic»
[«Doing Data Science», O’Neil ]
60% of US doctors still use paper medical records
Let’s create our own EHR standard
Patient gender Code
Male 0
Female 1
Patient gender Code
Male 1
Female 0
Patient gender Code
Male M
Female F
Unknown U
Let’s code gender
Standart A
Standart B
Standart Cx
x
There 5 key data standards
ICD - diagnostic, billing , world-wide
CPT - procedures, billing , US-specific, classification
LOINC - lab tests and observations, world-wide
NDC - medication, US-specific, classification
SNOMED - medicine
… and a lot of custom standards
Even within one data standard:ICD-9
174 malignant neoplasm of female breast
174.1 malignant neoplasm of central portion of female breast
ICD-10
C50 malignant neoplasm of breast
C50.1 malignant neoplasm of central portion of breast
C50.111 malignant neoplasm of central portion of right female breast
C50.111 malignant neoplasm of central portion of left female breast
You have to be a doctor to handle them
Problem summary
Standart 1
Standart 2
Standart N
medicine expertisea lot of (expensive) hours
Knowledge
Standarts are changing
Artificial Intelligence Way
Feed a lot of medical texts to «medical doctor»
Use NLP power
Make it unsupervised
Key idea:
«Semantically similar words occurs in similar contents» Harris, 1954 «You shall know a word by the company it keeps», Firth, 1957
«It was the year when Udacity, Coursera and edX, the three leading MOOC companies, took the education world by storm and promised a lot» [Huffington Post]
«Many places offer MOOCs, and many more will. But Coursera, Udacity and edX are the leading providers.» [NYTimes]
Distributed Vectors Representation
Two layer neural network
Input: text corpus
Output: set of vectors
Group the vectors of similar words together in vector space (detects similarities matematically)
Predict a word using content
All
youneed
love
is
Resulting vectors
All you
need is
love
[0.2, 0.11, 087, 0.9, … , 0.2] [0.1, 0,98, 01, 0.26, …, 0.82] [0.7, 0.22, 0.3, 0.1, …, 0.45]
[0.5, 0.21, 0,67, 0.82,…, 0.49] [0.6, 034, 0.21, 0.45,…, 0.2]
Vectors Relationships
Vectors Relationships
http://nlp.stanford.edu/projects/glove/images/company_ceo.jpg
http://nlp.stanford.edu/projects/glove/images/comparative_superlative.jpg
ICD-9
174 malignant neoplasm of female breast
174.1 malignant neoplasm of central portion of female breast
ICD-10
C50 malignant neoplasm of breast
C50.1 malignant neoplasm of central portion of breast
C50.111 malignant neoplasm of central portion of right female breast
C50.111 malignant neoplasm of central portion of left female breast
Summary
LinksEfficient Estimation of Word Representation in Vector Space (Mikolov)
Distributed representation of words and phrases and their compositionality (Mikolov)
word2vec Parameter Learning Explaining (Rong)
Questions?