KNOWLEDGE-BASED METHOD FOR DETERMINING THE MEANING OF AMBIGUOUS BIOMEDICAL TERMS USING INFORMATION CONTENT MEASURES OF SIMILARITY

Bridget McInnes, Ted Pedersen, Ying Liu, Genevieve B. Melton, Serguei Pakhomov


OBJECTIVE OF THIS WORK

Develop and evaluate a method that can disambiguate terms in biomedical text by exploiting similarity information extrapolated from the Unified Medical Language System (UMLS)

Evaluate the efficacy of information content-based similarity measures over path-based similarity measures for word sense disambiguation (WSD)

WORD SENSE DISAMBIGUATION

Word sense disambiguation is the task of determining the appropriate sense of a term given the context in which it is used.

TERM: tolerance

DrugTolerance

ImmuneTolerance

Example: Buspirone attenuates tolerance to morphine in mice with skin cancer

SENSE INVENTORY: UNIFIED MEDICAL LANGUAGE SYSTEM

Unified Medical Language System (UMLS) components:

Semantic Network

Metathesaurus
  ~1.7 million biomedical and clinical concepts, integrated semi-automatically
  Concepts are identified by CUIs (Concept Unique Identifiers) and linked by relations:
    Hierarchical: PAR/CHD and RB/RN
    Non-hierarchical: SIB, RO
  Sources can be viewed together or independently, e.g. Medical Subject Headings (MSH)

SPECIALIST Lexicon
  Biomedical and clinical terms, including variants

Each possible sense is identified by a Concept Unique Identifier (CUI):

DrugTolerance: C0013220

ImmuneTolerance: C0020963

SENSERELATE ALGORITHM

Each possible sense of a target word is assigned a score: the sum of the similarities between that sense and the terms surrounding the target word

The target word is assigned the sense with the highest score

Proposed by Patwardhan and Pedersen (2003) using WordNet

UMLS::SenseRelate is a modification of this algorithm that uses information from the UMLS

SENSERELATE EXAMPLE

Buspirone attenuates tolerance to morphine in mice with skin cancer

Possible senses of the target word "tolerance":
  DrugTolerance: C0013220
  ImmuneTolerance: C0020963

Concepts of the surrounding terms:
  Buspirone: C0006462
  Morphine: C0026549
  Mice: C0026809
  Skin cancer: C0007114

Similarity between each sense and the four surrounding concepts (values from the figure):
  DrugTolerance: 0.09, 0.09, 0.16, 0.11
  ImmuneTolerance: 0.09, 0.09, 0.05, 0.04

Drug Tolerance Score = 0.09 + 0.09 + 0.16 + 0.11 = 0.45

Immune Tolerance Score = 0.09 + 0.09 + 0.05 + 0.04 = 0.27

DrugTolerance has the higher score, so it is the sense assigned to "tolerance".
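The scoring step in this example can be sketched in a few lines of Python. This is a minimal illustration, not the UMLS::SenseRelate implementation (which is a Perl package): the function name `sense_relate` and the lookup table `TOY_SIM` are hypothetical. The similarity values are the per-term scores shown in the figure (0.09, 0.09, 0.16, 0.11 for DrugTolerance and 0.09, 0.09, 0.05, 0.04 for ImmuneTolerance), but the slides do not say which score belongs to which surrounding term, so the pairing below is an arbitrary assumption that does not affect the sums.

```python
from typing import Callable, Dict, List, Tuple


def sense_relate(
    senses: List[str],
    context: List[str],
    sim: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate sense by summing its similarity to every
    surrounding concept, then return the highest-scoring sense."""
    scores = {s: sum(sim(s, c) for c in context) for s in senses}
    best = max(scores, key=scores.get)
    return best, scores


# Candidate senses of "tolerance" and the CUIs of the surrounding terms.
SENSES = ["C0013220", "C0020963"]  # DrugTolerance, ImmuneTolerance
CONTEXT = ["C0006462", "C0026549", "C0026809", "C0007114"]

# ASSUMPTION: the score-to-term pairing is arbitrary; only the four values
# per sense come from the slides.
TOY_SIM = {
    "C0013220": dict(zip(CONTEXT, [0.09, 0.09, 0.16, 0.11])),
    "C0020963": dict(zip(CONTEXT, [0.09, 0.09, 0.05, 0.04])),
}


def toy_similarity(sense: str, concept: str) -> float:
    """Stand-in for a real similarity measure (hypothetical lookup)."""
    return TOY_SIM[sense][concept]


best, scores = sense_relate(SENSES, CONTEXT, toy_similarity)
print(best, {s: round(v, 2) for s, v in scores.items()})
```

Passing the similarity measure in as a function keeps the scoring loop independent of whether a path-based or an IC-based measure is used.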

SENSERELATE ASSUMPTION

An ambiguous word is often used in the sense that is most similar to the senses of the terms that surround it


SENSERELATE COMPONENTS

Identifying the concepts of surrounding terms

Calculating semantic similarity

IDENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS

Use the SPECIALIST Lexicon to identify the terms in the text, then map each term to a CUI by a string match against the MRCONSO table in the UMLS

Buspirone attenuates tolerance to morphine in mice with skin cancer

SPECIALIST Lexicon entries:
  ...
  skin cancer
  skin grafting
  skin disease
  ...

MRCONSO entries:
  ...
  skin cancer    C0007114
  skin grafting  C0037297
  skin disease   C0037274
  ...
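A rough Python sketch of this two-step lookup follows. Everything in it is an assumption made for illustration: the in-memory `LEXICON` set and `MRCONSO` dictionary stand in for the real SPECIALIST Lexicon and MRCONSO table, the greedy longest-match strategy is one plausible way to pick multi-word terms (the slides do not specify the matching strategy), and "attenuates" is included only to show a term that is recognised but has no CUI in the toy table.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-ins for the SPECIALIST Lexicon and the MRCONSO table (hypothetical).
LEXICON: Set[str] = {
    "skin cancer", "skin grafting", "skin disease",
    "morphine", "mice", "attenuates",
}
MRCONSO: Dict[str, List[str]] = {
    "skin cancer": ["C0007114"],
    "skin grafting": ["C0037297"],
    "skin disease": ["C0037274"],
    "morphine": ["C0026549"],
    "mice": ["C0026809"],
}


def map_terms_to_cuis(
    text: str,
    lexicon: Set[str],
    mrconso: Dict[str, List[str]],
    max_len: int = 3,
) -> List[Tuple[str, List[str]]]:
    """Greedy longest-match term identification followed by a string lookup."""
    tokens = text.lower().split()
    mapped = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                cuis = mrconso.get(candidate, [])
                if cuis:  # terms without a concept are skipped
                    mapped.append((candidate, cuis))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return mapped


sentence = "Buspirone attenuates tolerance to morphine in mice with skin cancer"
mapped = map_terms_to_cuis(sentence, LEXICON, MRCONSO)
print(mapped)
```

The sketch ignores punctuation and lexical variants, both of which a real pipeline would need to handle.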

SEMANTIC SIMILARITY MEASURES

Path-based measures:
  Path
  Wu and Palmer
  Leacock and Chodorow
  Nguyen and Al-Mubaid

Information content (IC)-based measures:
  Resnik
  Lin
  Jiang and Conrath

PATH-BASED SIMILARITY MEASURES

Use only the path information obtained from a taxonomy

Path measure
  sim(c1,c2) = 1 / minpath(c1,c2)
  where minpath is the length of the shortest path between the two concepts

Leacock and Chodorow, 1998
  sim(c1,c2) = -log( minpath(c1,c2) / (2D) )
  where D is the total depth of the taxonomy

Wu and Palmer, 1994
  sim(c1,c2) = (2 * depth(LCS(c1,c2))) / (depth(c1) + depth(c2))
  where LCS is the least common subsumer of the two concepts

Nguyen and Al-Mubaid, 2006
  sim(c1,c2) = log( 2 + (minpath(c1,c2) - 1) * (D - depth(LCS(c1,c2))) )

Example taxonomy fragment (figure; indentation shows parent/child relations):

Disease: C0012634
  Drug Related Disorder: C0277579
    Drug Tolerance: C0013220
  Neoplasm: C1302761
    Neoplastic Disease: C1882062
      Malignant Neoplasm: C0006826
        Skin cancer: C0007114
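To make the path-based formulas concrete, here is a small Python sketch that evaluates them on the taxonomy fragment above. Several things are assumptions rather than facts from the slides: the tree shape (a Disease root with a drug branch and a neoplasm branch) is read off the flattened figure, path length and depth are counted in nodes so that a concept is at distance 1 from itself, and D is taken to be the maximum depth of this toy fragment rather than of a real source. The function names are hypothetical and are not the UMLS::Similarity API.

```python
import math

# Child -> parent links for the fragment (ASSUMED tree shape).
PARENT = {
    "C0012634": None,        # Disease (root of this fragment)
    "C0277579": "C0012634",  # Drug Related Disorder
    "C0013220": "C0277579",  # Drug Tolerance
    "C1302761": "C0012634",  # Neoplasm
    "C1882062": "C1302761",  # Neoplastic Disease
    "C0006826": "C1882062",  # Malignant Neoplasm
    "C0007114": "C0006826",  # Skin cancer
}


def path_to_root(c):
    """Concept followed by its ancestors, ending at the root."""
    path = [c]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(c):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(c))


def lcs(c1, c2):
    """Least common subsumer: the deepest concept that is an ancestor of both."""
    ancestors = set(path_to_root(c1))
    return next(node for node in path_to_root(c2) if node in ancestors)


def minpath(c1, c2):
    """Shortest path through the LCS, counted in nodes."""
    return depth(c1) + depth(c2) - 2 * depth(lcs(c1, c2)) + 1


D = max(depth(c) for c in PARENT)  # depth of this toy taxonomy only


def path_sim(c1, c2):
    return 1 / minpath(c1, c2)


def leacock_chodorow(c1, c2):
    return -math.log(minpath(c1, c2) / (2 * D))


def wu_palmer(c1, c2):
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def nguyen_al_mubaid(c1, c2):
    return math.log(2 + (minpath(c1, c2) - 1) * (D - depth(lcs(c1, c2))))


drug_tolerance, skin_cancer = "C0013220", "C0007114"
for measure in (path_sim, leacock_chodorow, wu_palmer, nguyen_al_mubaid):
    print(measure.__name__, round(measure(drug_tolerance, skin_cancer), 3))
```

In this fragment the only shared ancestor of Drug Tolerance and Skin cancer is Disease, so every measure here is driven by the seven-node path through the root.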

INFORMATION CONTENT-BASED MEASURES

Incorporate the probability of the concepts

IC(concept) = -log(P(concept))

P(concept)
  Calculated by summing the probability of the concept and the probabilities of its descendants
  Probabilities are obtained from an external corpus

Resnik, 1995
  sim(c1,c2) = IC(LCS(c1,c2))

Jiang and Conrath, 1997
  sim(c1,c2) = 1 / (IC(c1) + IC(c2) - 2 * IC(LCS(c1,c2)))

Lin, 1998
  sim(c1,c2) = (2 * IC(LCS(c1,c2))) / (IC(c1) + IC(c2))

IC-BASED SIMILARITY MEASURES

[Figure: the taxonomy fragment from Disease: C0012634 down to Drug Tolerance: C0013220 and Skin cancer: C0007114. IC-based measures combine this path information with the probability of the concepts, estimated from an external corpus.]
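The following Python sketch shows how the probability of a concept, its information content, and the three IC-based measures fit together. It is illustrative only: the frequency counts in `COUNTS` are invented (they are not UMLSonMedline values), the tree shape is an assumption read off the figure, and the function names are hypothetical rather than the UMLS::Similarity API.

```python
import math

# Child -> parent links (ASSUMED tree shape, as in the taxonomy figure).
PARENT = {
    "C0012634": None,        # Disease (root)
    "C0277579": "C0012634",  # Drug Related Disorder
    "C0013220": "C0277579",  # Drug Tolerance
    "C1302761": "C0012634",  # Neoplasm
    "C1882062": "C1302761",  # Neoplastic Disease
    "C0006826": "C1882062",  # Malignant Neoplasm
    "C0007114": "C0006826",  # Skin cancer
}

# HYPOTHETICAL corpus frequencies of each concept on its own.
COUNTS = {
    "C0012634": 10, "C0277579": 5, "C0013220": 20, "C1302761": 15,
    "C1882062": 5, "C0006826": 25, "C0007114": 20,
}


def ancestors_and_self(c):
    while c is not None:
        yield c
        c = PARENT[c]


# A concept's count includes the counts of all of its descendants.
TOTAL = sum(COUNTS.values())
CUMULATIVE = {c: 0 for c in PARENT}
for concept, n in COUNTS.items():
    for a in ancestors_and_self(concept):
        CUMULATIVE[a] += n


def ic(c):
    """IC = -log(P(c)), written as log(1/P(c)) so the root gives 0.0."""
    return math.log(TOTAL / CUMULATIVE[c])


def lcs(c1, c2):
    seen = set(ancestors_and_self(c1))
    return next(a for a in ancestors_and_self(c2) if a in seen)


def resnik(c1, c2):
    return ic(lcs(c1, c2))


def jiang_conrath(c1, c2):
    distance = ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
    return float("inf") if distance == 0 else 1 / distance


def lin(c1, c2):
    denom = ic(c1) + ic(c2)
    return 0.0 if denom == 0 else 2 * ic(lcs(c1, c2)) / denom


pairs = {
    "DrugTolerance vs Skin cancer": ("C0013220", "C0007114"),
    "Malignant Neoplasm vs Skin cancer": ("C0006826", "C0007114"),
}
for label, (a, b) in pairs.items():
    print(label, round(resnik(a, b), 3), round(jiang_conrath(a, b), 3), round(lin(a, b), 3))
```

With these toy counts the root has probability 1 and therefore IC 0, so any pair whose least common subsumer is the root receives a Resnik and Lin similarity of 0. The guards for a zero denominator are a design choice of this sketch.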

EXPERIMENTAL FRAMEWORK

Use the open-source UMLS::Similarity package to obtain the similarity between the surrounding terms and the possible senses in the SenseRelate algorithm

Path information: parent/child relations in the MSH source

Information content: calculated using the UMLSonMedline dataset created by NLM
  Consists of concepts from the 2009AB UMLS and the frequency with which each occurred in Medline, obtained using the Essie search engine (Ide et al., 2007)
  Medline: database of citations of biomedical and clinical articles

EVALUATION DATA: MSH WSD

MSH-WSD dataset (Jimeno-Yepes et al., 2011)

203 target words (ambiguous words) from Medline:
  106 terms, e.g. tolerance
  88 acronyms, e.g. CA (calcium, California)
  9 mixtures, e.g. bat (brown adipose tissue)

Each target word has ~187 instances (Medline abstracts); an abstract is ~500 words

Each instance of a target word is assigned a concept from MSH by exploiting the MSH concepts manually assigned to the abstract

Average of 2.08 possible senses per target word

Majority-sense baseline over all the target words: 54.5%

RESULTS

[Figure: bar chart of disambiguation accuracy for each similarity measure; values as extracted, in legend order]

Measure    Group        Accuracy
baseline   -            0.55
path       Path-based   0.72
lch        Path-based   0.69
wup        Path-based   0.70
nam        Path-based   0.72
res        IC-based     0.73
jcn        IC-based     0.74
lin        IC-based     0.74

COMPARISON ACROSS SUBSETS OF MSH-WSD

[Figure: bar chart of accuracy by subset and method; values as extracted, in legend order]

Method        Terms   Acronyms   Mixture   Overall
Baseline      0.55    0.54       0.53      0.55
SenseRelate   0.67    0.80       0.73      0.74
MRD           0.71    0.87       0.88      0.80
2-MRD         0.67    0.85       0.93      0.78


WINDOW SIZES

Use the terms surrounding the target word within a specified window: 1, 2, 5, 10, 25, 50, 60, 70

Busprione attenuates tolerance to morphine in mice with skin_cancer

WINDOW SIZE = 2
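Selecting the surrounding terms for a given window size can be sketched as below. This assumes that a window of size n means up to n terms on each side of the target word, and that multi-word terms such as skin_cancer have already been joined into single tokens; the slides do not spell out either detail, and the helper name `context_window` is hypothetical.

```python
from typing import List


def context_window(tokens: List[str], target_index: int, window: int) -> List[str]:
    """Return up to `window` tokens on each side of the target token."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left + right


tokens = ["Buspirone", "attenuates", "tolerance", "to", "morphine",
          "in", "mice", "with", "skin_cancer"]
target = tokens.index("tolerance")

print(context_window(tokens, target, 2))
```

Only the tokens in the window that map to a CUI would contribute to the SenseRelate score.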

COMPARISON OF WINDOW SIZES FOR LIN

[Figure: accuracy of the lin measure by window size; values as extracted from the chart]

Window size   0      1      2      5      10     25     50     60     70
Accuracy      0.50   0.53   0.65   0.69   0.71   0.74   0.74   0.74   0.74

SURROUNDING TERMS

Not all terms have a concept in the UMLS; therefore, not all surrounding terms in the window are mapped to CUIs

WINDOW SIZES VERSUS MAPPED TERMS

[Figure: number of mappings by window size for the lin measure; values as extracted from the chart]

Window size          0      1      2      5      10     25     50      60      70
Number of mappings   0      0.27   0.79   1.85   3.49   7.60   12.96   14.28   15.64

FUTURE WORK: MAPPING TERMS

Currently looking at mapping the terms to CUIs using information from the concept mapping system MetaMap

Obtain the terms from MetaMap and do a dictionary look-up in MRCONSO
  Hypothesis: the terms obtained by MetaMap are more accurate than those obtained using the SPECIALIST Lexicon

Obtain the CUIs from MetaMap
  Hypothesis: the CUIs obtained by MetaMap will be more accurate than those from the dictionary look-up

OBJECTIVE #1

Develop and evaluate a method that can disambiguate terms in biomedical text by exploiting similarity information extrapolated from the UMLS

UMLS::SenseRelate obtains a statistically significantly higher disambiguation accuracy than the baseline

On par with previous unsupervised methods for terms

OBJECTIVE #2

Evaluate the efficacy of IC-based similarity measures over path-based measures on a secondary task

There is no statistically significant difference between the accuracies obtained by the IC-based measures

There is a statistically significant difference between the IC-based measures and the path-based measures

TAKE HOME MESSAGE:

An ambiguous word is often used in the sense that is most similar to the concepts of the terms that surround it

RESOURCES

Software:
  UMLS::SenseRelate: http://search.cpan.org/dist/UMLS-SenseRelate/
  UMLS::Similarity: http://search.cpan.org/dist/UMLS-Similarity/

Data:
  MSH-WSD: http://wsd.nlm.nih.gov/collaboration.shtml


THANK YOU


QUESTIONS?