Poster associated with UMLS::Similarity demo at NAACL 2013.
Ted Pedersen, Department of Computer Science, University of Minnesota, Duluth
http://www.d.umn.edu/~tpederse
UMLS::Similarity is freely available open source software that allows a user to measure the semantic similarity or relatedness of biomedical terms found in the Unified Medical Language System (UMLS). It is written in Perl and can be used through a command line interface, an API, or a Web interface.
UMLS::Similarity was modeled after and inspired by WordNet::Similarity (and yes, we've even reused some code). However, it has evolved well beyond a clone and now has its own distinctive identity. The development of UMLS::Similarity was supported in part by an R01 grant from the National Institutes of Health (USA), National Library of Medicine (#1R01LM009623-01A2).
What are we measuring, and why?
Similarity Depends on IS-A hierarchy
Acknowledgments
Using UMLS::Similarity
Abstract
Contact
UMLS::Similarity : Measuring the Relatedness and Similarity of Biomedical Concepts
Bridget T. McInnes & Ying Liu : Minnesota Supercomputing Institute
Ted Pedersen, Genevieve B. Melton & Serguei Pakhomov : University of Minnesota
http://umls-similarity.sourceforge.net
Unified Medical Language System
To be similar is to be alike: how much is X like Y?
• Similar concepts share ancestors in an is-a hierarchy; the deeper the shared ancestor, the more similar the concepts
• LCS : least common subsumer, the deepest ancestor that two concepts share
• Tetanus and strep throat are similar, since both are kinds of bacterial infections
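The LCS idea can be sketched on a tiny hand-built hierarchy. This is a minimal illustration. The concept names, the single-parent tree, and the helper names `path_to_root`, `lcs`, and `depth` are all hypothetical, not UMLS data or the UMLS::Similarity API. It walks each concept up to the root and returns the first shared ancestor, which in a tree is the deepest one.

```python
# Hypothetical toy IS-A tree (child -> parent); names are illustrative, not UMLS CUIs.
PARENT = {
    "infection": "disease",
    "bacterial infection": "infection",
    "viral infection": "infection",
    "tetanus": "bacterial infection",
    "strep throat": "bacterial infection",
    "influenza": "viral infection",
}

def path_to_root(concept):
    """List of concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lcs(c1, c2):
    """Least common subsumer: the first ancestor of c1 that also subsumes c2.
    In a tree this is the deepest shared ancestor."""
    ancestors2 = set(path_to_root(c2))
    for a in path_to_root(c1):
        if a in ancestors2:
            return a
    return None

def depth(concept):
    """Number of edges between the concept and the root ("disease" has depth 0)."""
    return len(path_to_root(concept)) - 1

if __name__ == "__main__":
    for pair in [("tetanus", "strep throat"), ("tetanus", "influenza")]:
        shared = lcs(*pair)
        print(pair, "->", shared, "depth", depth(shared))
```

Tetanus and strep throat meet at "bacterial infection" (depth 2). Tetanus and influenza only meet at the shallower "infection" (depth 1), so the first pair counts as more similar.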
The ability to organize concepts by their similarity or relatedness to each other is a fundamental operation of the human mind, and is central to many problems in Natural Language Processing and Artificial Intelligence.
Relatedness Relies on Definitions
Assign a numeric value that quantifies how similar or related two concepts (senses) are; we measure concepts, not words
• A single word can name several concepts: cold may be a temperature or an illness
To be related is much more general, since there are many ways to be related: is-a, part-of, treats, symptom-of, ...
• Tetanus and deep cuts are related, but they really aren't similar (deep cuts can cause tetanus, though)
• Related concepts are often defined using the same or similar words, so look for overlaps in their definitions
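The definition-overlap idea can be sketched as a simplified Lesk score: count the distinct content words that two definitions share. The definitions and stopword list below are hypothetical, written only for illustration. Real UMLS definitions differ, and the Adapted Lesk measure also rewards multi-word overlaps, which this sketch does not.

```python
import re

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "by", "or", "and", "in", "to", "that", "is", "from", "with", "into"}

# Hypothetical short definitions written for illustration; real UMLS definitions differ.
DEFS = {
    "tetanus": "an acute infection caused by bacteria that enter the body through a deep wound",
    "deep cut": "a deep wound in the skin that may let bacteria enter the body",
    "sunburn": "a burn of the skin caused by too much ultraviolet light",
}

def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def lesk_overlap(c1, c2):
    """Simplified Lesk: number of distinct content words shared by the two definitions."""
    return len(content_words(DEFS[c1]) & content_words(DEFS[c2]))

if __name__ == "__main__":
    print(lesk_overlap("tetanus", "deep cut"))  # shares bacteria, enter, body, deep, wound
    print(lesk_overlap("tetanus", "sunburn"))   # shares only "caused"
```

The tetanus/deep cut pair scores higher than tetanus/sunburn even though neither pair shares an is-a ancestor in this example.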
Web Interface
• Allows all measures to be computed, using a subset of the available UMLS sources
• http://atlas.ahc.umn.edu
• http://maraca.d.umn.edu
Command Line
• Supports all measures and all UMLS sources, plus many additional functions (many from UMLS::Interface), including:
  • GetChildren
  • GetParents
  • GetRelated
  • GetSemanticGroup
  • FindCuiDepth
  • FindPathtoRoot
  • findLeastCommonSubsumer
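To show what hierarchy-navigation functions like these compute, here is a small sketch over a toy graph in which a concept can have more than one parent, as many UMLS concepts do. The Python names only loosely mirror GetParents, GetChildren, FindPathtoRoot, and FindCuiDepth. They are hypothetical and are not the UMLS::Interface API. Taking the longest path as the depth is also an assumption made here.

```python
from collections import defaultdict

# Hypothetical toy IS-A graph (child -> parents); "impetigo" has two parents.
# Names are illustrative, not UMLS CUIs.
PARENTS = {
    "root": [],
    "infection": ["root"],
    "skin disorder": ["root"],
    "bacterial infection": ["infection"],
    "impetigo": ["bacterial infection", "skin disorder"],
}

# Invert the parent links once so children can be looked up directly.
CHILDREN = defaultdict(list)
for child, parents in PARENTS.items():
    for p in parents:
        CHILDREN[p].append(child)

def get_parents(c):
    return list(PARENTS[c])

def get_children(c):
    # .get avoids inserting empty entries into the defaultdict for leaf concepts
    return list(CHILDREN.get(c, []))

def paths_to_root(c):
    """Every path from c up to the root; a concept with two parents has at least two."""
    if not PARENTS[c]:
        return [[c]]
    return [[c] + path for p in PARENTS[c] for path in paths_to_root(p)]

def depth(c):
    """Depth taken here as the longest path to the root, counted in edges
    (a modeling choice; the shortest path is another option)."""
    return max(len(path) for path in paths_to_root(c)) - 1

if __name__ == "__main__":
    for path in paths_to_root("impetigo"):
        print(" -> ".join(path))
    print("depth:", depth("impetigo"))
```

Because "impetigo" sits under two branches, it has two paths to the root. Any depth-based measure has to decide which path to use.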
Semantic Similarity Measures
• Path based: Shortest Path (path, cdist)
• Depth based: Leacock & Chodorow (lch), Zhong et al. (zhong), Nguyen & Al-Mubaid (nam)
• Information Content: Resnik (res), Lin (lin), Jiang & Conrath (jcn)
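Several of the listed measures can be sketched on a toy tree, using the textbook forms:

- path = 1 / (nodes on the shortest path)
- lch = -log(nodes / 2D), where D is the maximum depth
- res = IC(LCS)
- lin = 2·IC(LCS) / (IC(c1) + IC(c2))
- jcn = 1 / (IC(c1) + IC(c2) - 2·IC(LCS))

Here IC(c) = -log P(c). This is a sketch under stated assumptions: the tree and the concept probabilities are hypothetical (the package estimates information content from data), path lengths are counted in nodes, and zhong, nam, and cdist are not shown. Exact conventions in UMLS::Similarity may differ.

```python
import math

# Hypothetical toy IS-A tree (child -> parent) and hypothetical concept probabilities;
# neither comes from the UMLS. P(c) covers c and its descendants, so it never
# decreases when moving up the tree.
PARENT = {
    "infection": "disease",
    "bacterial infection": "infection",
    "viral infection": "infection",
    "tetanus": "bacterial infection",
    "strep throat": "bacterial infection",
    "influenza": "viral infection",
}
PROB = {
    "disease": 1.0, "infection": 0.5, "bacterial infection": 0.2,
    "viral infection": 0.25, "tetanus": 0.05, "strep throat": 0.1, "influenza": 0.15,
}

def path_to_root(c):
    path = [c]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lcs(c1, c2):
    """Deepest shared ancestor; every pair shares the root, so next() always succeeds."""
    up2 = set(path_to_root(c2))
    return next(a for a in path_to_root(c1) if a in up2)

def nodes_between(c1, c2):
    """Nodes on the shortest path c1 -> LCS -> c2, counting both ends."""
    l = lcs(c1, c2)
    return len(path_to_root(c1)) + len(path_to_root(c2)) - 2 * len(path_to_root(l)) + 1

MAX_DEPTH = max(len(path_to_root(c)) for c in PROB)  # deepest concept, counted in nodes

def ic(c):
    return -math.log(PROB[c])  # information content

def path_sim(c1, c2):
    return 1.0 / nodes_between(c1, c2)

def lch(c1, c2):
    return -math.log(nodes_between(c1, c2) / (2.0 * MAX_DEPTH))

def res(c1, c2):
    return ic(lcs(c1, c2))

def lin(c1, c2):
    denom = ic(c1) + ic(c2)
    # denom is 0 only when both concepts are the root (P = 1), i.e. identical
    return 2 * ic(lcs(c1, c2)) / denom if denom > 0 else 1.0

def jcn(c1, c2):
    dist = ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
    # zero distance means identical concepts; report unbounded similarity
    return 1.0 / dist if dist > 0 else float("inf")

if __name__ == "__main__":
    for f in (path_sim, lch, res, lin, jcn):
        print(f.__name__,
              round(f("tetanus", "strep throat"), 3),
              round(f("tetanus", "influenza"), 3))
```

Every measure ranks tetanus/strep throat above tetanus/influenza here, but the scales differ. Only res, lin, and jcn respond to the probability estimates.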
Relatedness Measures
• Path based: Hirst & St-Onge (hso)
• Definition based: Lesk (lesk), Adapted Lesk (lesk)
• Definition + Corpus: Gloss Vector (vector)
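The Gloss Vector idea combines definitions with corpus statistics. Each word in a definition is represented by its co-occurrence vector, the vectors are added to form a vector for the definition, and two concepts are compared by the cosine of their definition vectors. The sketch below shows only that pipeline. The co-occurrence counts and definitions are hypothetical; real counts would come from a large corpus. Plain summation of word vectors is a simplifying assumption.

```python
import math

# Hypothetical co-occurrence vectors (word -> {context word: count});
# in practice these would be counted from a large corpus.
WORD_VECS = {
    "bacteria": {"infection": 4, "wound": 2, "antibiotic": 3},
    "wound":    {"infection": 2, "skin": 3, "bleeding": 4},
    "throat":   {"pain": 3, "swallow": 2, "infection": 1},
    "sore":     {"pain": 4, "throat": 2},
    "skin":     {"wound": 3, "burn": 2, "rash": 3},
}

# Hypothetical definitions, already reduced to content words.
DEFS = {
    "tetanus": ["bacteria", "wound"],
    "deep cut": ["wound", "skin"],
    "pharyngitis": ["sore", "throat"],
}

def gloss_vector(concept):
    """Second-order vector: sum of the co-occurrence vectors of the definition's words."""
    total = {}
    for word in DEFS[concept]:
        for dim, count in WORD_VECS.get(word, {}).items():
            total[dim] = total.get(dim, 0) + count
    return total

def cosine(u, v):
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def vector_relatedness(c1, c2):
    return cosine(gloss_vector(c1), gloss_vector(c2))

if __name__ == "__main__":
    print(round(vector_relatedness("tetanus", "deep cut"), 3))
    print(round(vector_relatedness("tetanus", "pharyngitis"), 3))
```

The definitions of "tetanus" and "deep cut" share only one word, "wound". Their vectors still align on several corpus contexts (infection, skin, bleeding), which is the point of going beyond direct overlaps.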
The UMLS is a data warehouse distributed twice a year by the National Library of Medicine.
It includes more than 100 terminologies, code sets, and ontologies covering many different areas of medical knowledge. A user can access individual sources (examples below) or treat them as one large combined resource through the Metathesaurus.
MeSH – Medical Subject Headings, used for indexing articles in PubMed
FMA – Foundational Model of Anatomy, a very fine grained ontology of human anatomy
OMIM – Online Mendelian Inheritance in Man, a catalog of genes and genetic disorders
SNOMEDCT – Systematized Nomenclature of Medicine – Clinical Terms
Word Sense Disambiguation with UMLS::SenseRelate
We can measure senses, or we can use the measures to identify senses!
http://search.cpan.org/dist/UMLS-SenseRelate
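One common way to turn relatedness scores into a disambiguator works like this: for each candidate sense of the target word, add up its best relatedness to each context word, then pick the sense with the highest total. This sketch is not claimed to be UMLS::SenseRelate's exact algorithm. The sense inventory and scores are hypothetical stand-ins for UMLS concepts and for any of the measures above.

```python
# Hypothetical sense inventory: each word maps to its candidate concepts.
SENSES = {
    "cold": ["cold (temperature)", "cold (illness)"],
    "fever": ["fever (symptom)"],
    "cough": ["cough (symptom)"],
}

# Hypothetical relatedness scores standing in for a UMLS::Similarity measure.
SCORES = {
    frozenset({"cold (illness)", "fever (symptom)"}): 0.8,
    frozenset({"cold (illness)", "cough (symptom)"}): 0.7,
    frozenset({"cold (temperature)", "fever (symptom)"}): 0.2,
    frozenset({"cold (temperature)", "cough (symptom)"}): 0.1,
}

def relatedness(s1, s2):
    """Symmetric lookup; unknown pairs score 0."""
    return SCORES.get(frozenset({s1, s2}), 0.0)

def disambiguate(target, context):
    """Pick the target sense with the highest total relatedness to the context,
    using each context word's best-matching sense."""
    def score(sense):
        return sum(max(relatedness(sense, cs) for cs in SENSES[w]) for w in context)
    return max(SENSES[target], key=score)

if __name__ == "__main__":
    print(disambiguate("cold", ["fever", "cough"]))
```

With "fever" and "cough" as context, the illness sense totals 1.5 against 0.3 for the temperature sense, so "cold" is resolved as an illness.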