Beyond “Bag of Words”: Towards a Framework for Conceptual Retrieval
Jimmy Lin
College of Information Studies, University of Maryland
Thursday, October 4, 2007
IPAM Workshop, UCLA
Beyond “Bag of Words”
IR is fundamentally based on counting words
  Different ways of “bookkeeping”: vector space, probabilistic, LM, DFR, etc.
So…
  Words aren’t enough to capture meaning
  Term statistics aren’t enough to capture meaning
Thus…
  IR systems should go beyond term statistics: concepts, relations, etc.
Hypothesis:
  IR based on concepts, relations, etc. >> IR based on words
However…
  A reasonable hypothesis?
  Where’s the empirical support?
Outline
Previous attempts to go beyond BoW
Slightly different approach
  Start with specialized applications
  Generalize
Case study in the medical domain
  A clinical question answering system in support of evidence-based medicine (EBM)
Broader applicability?
Previous Work
Beyond “bags”
  Indexing phrases, e.g., (Fagan, 1987; Smeaton et al., 1994; etc.)
  Modeling term dependencies, e.g., (Gao et al., 2004; Liu et al., 2004; Metzler and Croft, 2005; Cui et al., 2005; etc.)
Beyond “words”
  Query expansion, e.g., (Voorhees, 1993; 1994)
  Word Sense Disambiguation, e.g., (Sanderson, 1994; Mihalcea and Moldovan, 2000)
Results? Mixed
A Different Approach
Previous work focuses on the general domain
  Broad but (relatively) shallow
  Hampered by commonsense problem
  Difficult to acquire large amounts of knowledge
Our approach:
  Develop a general framework
  Instantiate in domain-specific applications
  Leverage lessons learned to refine the framework
  Rinse, repeat
“Conceptual Retrieval”
[Figure: Questions are mapped to a conceptual representation; a Knowledge Extractor maps the Collection to a conceptual representation; a Semantic Matcher compares the two representations and produces Answers]
What type of knowledge?
Knowledge about the problem structure
  What representations are useful for capturing the information need?
Knowledge about user tasks
  Why is this information needed?
  How will it be further used?
Knowledge about the domain
  What background knowledge is needed to reason about the information need?
K1: Problem Structure
Knowledge representations are important! Helps experts reason about problems Form the basis for tractable computational structures
GO’FAI Frames (Minsky) Scripts (Schank) Semantic networks (attribution less clear)
Knowledge about problem structure | Knowledge about user tasks | Knowledge about the domain
K2: User Tasks
The user is important!
Users are different High school student vs. intelligence analyst
Different types of relevance Topical, situational, etc.
K3: Domain
Why is the sky blue?
Users bring a tremendous amount of knowledge to bear when asking questions Specialized, technical knowledge Commonsense
“To really learn something, you basically have to already know it.”
K4 … Kn?
More types of knowledge needed?
Working hypothesis: {K1, K2, K3} comprise a necessary set
Introductions
Dr. Dr. Dina Demner-Fushman, M.D., Ph.D.
Dr. , Ph.D.
Why the Medical Domain?
Evidence-Based Medicine = A paradigm of medical practice that emphasizes decision-support from high-quality clinical research
  Provides a basis for K1, K2, and K3
Need for retrieval systems is well documented, e.g., (Gorman et al., 1994; Chambliss and Conley, 1996; Cogdill and Moore, 1997; Ely et al., 2005; Sutton et al., 2005)
Clinical QA: “Ready-made” domain for exploring conceptual retrieval
  Availability of corpora, resources, etc.
  Important and potentially high-impact application
K1: Problem Structure
EBM identifies four components of a question Originally developed as a clinical tool Can serve as a knowledge representation
“In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?”
= PICO frame
  Population/Problem: children/acute febrile illness
  Intervention: acetaminophen
  Comparison: ibuprofen
  Outcome: reducing fever
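As an illustration of how a PICO frame can serve as a knowledge representation, here is a minimal Python sketch. The class name `PICOFrame`, its field names, and the `slots` helper are hypothetical choices made for this example; they are not taken from the system described in the talk.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PICOFrame:
    """Hypothetical container for a PICO-structured clinical question."""
    population: str
    problem: str
    intervention: str
    outcome: str
    comparison: Optional[str] = None  # not every question names a comparison
    task: Optional[str] = None        # e.g. "therapy"; assumed optional here

    def slots(self) -> Dict[str, str]:
        """Return the filled slots keyed by their PICO letter."""
        filled = {
            "P": f"{self.population}/{self.problem}",
            "I": self.intervention,
            "C": self.comparison,
            "O": self.outcome,
        }
        return {k: v for k, v in filled.items() if v is not None}


# The example question from the slide, encoded as a frame.
frame = PICOFrame(
    population="children",
    problem="acute febrile illness",
    intervention="acetaminophen",
    comparison="ibuprofen",
    outcome="reducing fever",
    task="therapy",
)
print(frame.slots())
```

Keeping the comparison optional reflects that some clinical questions compare nothing; that choice is an assumption of this sketch.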
K2: User Tasks
Clinical tasks
  Therapy: Selecting effective treatments, taking into account other factors such as risk and cost
  Diagnosis: Selecting and interpreting diagnostic tests, while considering factors such as precision and safety
  Prognosis: Estimating the patient’s likely course over time and anticipating likely complications
  Etiology: Identifying risk factors and the causes for a patient’s disease
Considerations for strength of evidence
  Strength of Recommendations Taxonomy (SORT): three evidence grades
K3: Domain
The Unified Medical Language System (UMLS)
  2004 version: 1+ million biomedical concepts, > 5 million concept names
Software for leveraging this resource:
  MetaMap, SemRep for identifying concepts, relations
[Figure: fragment of the UMLS hierarchy with the nodes ofloxacin, boric acid, Quinolone, Ciclopirox, Borate product, Antibacterial drugs, Mucous membrane antifungal agent, Disinfectants and cleansers, Anti-infective agent, Antifungal]
Re: Conceptual Retrieval
Question: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
Task: therapy
P: children/acute febrile illness
I: acetaminophen
C: ibuprofen
O: reducing fever
MEDLINE: NLM’s authoritative repository of 17 million+ abstracts
Answer: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses.
System Architecture
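[Figure: system architecture. The Question (query frame) goes to the Query Formulator, which sends a search query to PubMed; the returned abstracts pass through the Knowledge Extractors, yielding annotated abstracts; the Semantic Matcher compares these with the query frame and outputs scored citations; the Answer Generator produces the Answers]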
Test Collection
Manually gathered 50 clinical questions from FPIN and the Parkhurst Exchange
  Reflects distribution of real-world questions
  Divided into development and test collections

Task        Questions   Example
Therapy     22          Does quinine reduce leg cramps for young athletes?
Diagnosis   12          How often is coughing the presenting complaint in patients with gastroesophageal reflux disease?
Prognosis   6           What’s the prognosis of lupoid sclerosis?
Etiology    10          What are the causes of hypomagnesemia?
Total       50
Gathering Judgments
Manually formulated PubMed queries
  ~40 minutes per question; gathered top 50 hits
Manually evaluated all retrieved citations
  ~2 hours per question
Question: What is the best treatment for analgesic rebound headaches?
PubMed Query: (((“analgesics”[TIAB] NOT Medline[SB]) OR “analgesics”[MeSH Terms] OR “analgesics”[Pharmacological Action] OR analgesic[Text Word]) AND ((“headache”[TIAB] NOT Medline[SB]) OR “headache”[MeSH Terms] OR headaches[Text Word]) AND (“adverse effects”[Subheading] OR side effects[Text Word])) AND hasabstract[text] AND English[Lang] AND “humans”[MeSH Terms]
Antipyretic efficacy of ibuprofen vs acetaminophen.
OBJECTIVE--To compare the antipyretic efficacy of ibuprofen, placebo, and acetaminophen. DESIGN--Double-dummy, double-blind, randomized, placebo-controlled trial. SETTING--Emergency department and inpatient units of a large, metropolitan, university-based, children's hospital in Michigan. PARTICIPANTS--37 otherwise healthy children aged 2 to 12 years with acute, intercurrent, febrile illness. INTERVENTIONS--Each child was randomly assigned to receive a single dose of acetaminophen (10 mg/kg), ibuprofen (7.5 or 10 mg/kg), or placebo. MEASUREMENTS/MAIN RESULTS--Oral temperature was measured before dosing, 30 minutes after dosing, and hourly thereafter for 8 hours after the dose. Patients were monitored for adverse effects during the study and 24 hours after administration of the assigned drug. All three active treatments produced significant antipyresis compared with placebo. Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. No adverse effects were observed in any treatment group. CONCLUSION--Ibuprofen is a potent antipyretic agent and is a safe alternative for the selected febrile child who may benefit from antipyretic medication but who either cannot take or does not achieve satisfactory antipyresis with acetaminophen.
Am J Dis Child. 1992 May; 146(5):622-5
Knowledge Extraction Example
Population Problem Interventions Outcome
Knowledge Extractors
Population, Problem, Intervention: IE task
  Exploited coverage of medical concepts in UMLS
  Additional candidate ranking based on a few features
Outcome: sentence-level classification task
  “Kitchen sink approach”, ensemble of classifiers
  Features:
• Manually-defined cue words
• N-grams
• Position in abstract
• Presence of certain UMLS concepts
• …
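To make the idea of sentence-level outcome scoring concrete, here is a toy Python sketch that averages two of the listed feature types: cue words and position in the abstract. It is an illustration only, not the ensemble of classifiers used in the system; the cue-word list, the linear position feature, and the equal-weight average are all assumptions made for this example.

```python
import re
from typing import List

# Hypothetical cue words; the actual manually-defined list is not given in the slides.
CUE_WORDS = {"significant", "significantly", "effective", "greater", "superior"}


def cue_word_score(sentence: str) -> float:
    """1.0 if the sentence contains any cue word, else 0.0."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return 1.0 if tokens & CUE_WORDS else 0.0


def position_score(index: int, n_sentences: int) -> float:
    """Assumed feature: later sentences are more likely to state outcomes."""
    if n_sentences <= 1:
        return 1.0
    return index / (n_sentences - 1)


def outcome_scores(sentences: List[str]) -> List[float]:
    """Equal-weight average of the two toy scorers for each sentence."""
    n = len(sentences)
    return [
        (cue_word_score(s) + position_score(i, n)) / 2.0
        for i, s in enumerate(sentences)
    ]


sentences = [
    "To compare the antipyretic efficacy of ibuprofen, placebo, and acetaminophen.",
    "Oral temperature was measured before dosing, 30 minutes after dosing, "
    "and hourly thereafter for 8 hours after the dose.",
    "Ibuprofen provided greater temperature decrement and longer duration of "
    "antipyresis than acetaminophen when the two drugs were administered in "
    "approximately equal doses.",
]
scores = outcome_scores(sentences)
best = sentences[scores.index(max(scores))]
print(best)
```

A real classifier would learn feature weights from annotated abstracts rather than averaging hand-set scores.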
Semantics helps!
Knowledge Extractors
?80% 0% 20%
?90% 5% 5%
?80% 13% 7%
?95% 0% 5%
Outcome   Population   Problem   Intervention
Details: Dina Demner-Fushman and Jimmy Lin. Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Computational Linguistics, 33(1):63-103, 2007
Semantic Matching
Three score components:
  S_EBM = S_PICO + S_SoE + S_MeSH
  S_PICO: Matching PICO frame elements (Problem Structure)
  S_SoE: Strength of evidence considerations (User Tasks)
  S_MeSH: MeSH indicators for each clinical task (User Tasks)
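The additive score can be sketched as a simple reranking function. The dictionary keys `pico`, `soe`, and `mesh` and the component values below are made up for illustration; how each component is actually computed is described in the cited Computational Linguistics paper, not here.

```python
from typing import List


def ebm_score(citation: dict) -> float:
    """S_EBM as the plain sum of its three components."""
    return citation["pico"] + citation["soe"] + citation["mesh"]


def rerank(citations: List[dict]) -> List[dict]:
    """Order citations by descending S_EBM."""
    return sorted(citations, key=ebm_score, reverse=True)


# Made-up component scores for three hypothetical citations.
citations = [
    {"id": "A", "pico": 0.25, "soe": 0.5, "mesh": 0.0},
    {"id": "B", "pico": 0.75, "soe": 0.25, "mesh": 0.25},
    {"id": "C", "pico": 0.5, "soe": 0.0, "mesh": 0.0},
]
ranked_ids = [c["id"] for c in rerank(citations)]
print(ranked_ids)
```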
Semantic Matching: Evaluation
Research Questions
  Does it work?
  What are the relative contributions of each component?
  What is the interaction between knowledge-based and statistical techniques?
Approach
  Reranking experiments with test collection
  Ablation studies
Evaluation: Abstract Reranking
Question: What is the best treatment for analgesic rebound headaches?
(((“analgesics”[TIAB] NOT Medline[SB]) OR “analgesics”[MeSH Terms] OR “analgesics”[Pharmacological Action] OR analgesic[Text Word]) AND ((“headache”[TIAB] NOT Medline[SB]) OR “headache”[MeSH Terms] OR headaches[Text Word]) AND (“adverse effects”[Subheading] OR side effects[Text Word])) AND hasabstract[text] AND English[Lang] AND “humans”[MeSH Terms]
MEDLINE
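[Figure: the Knowledge Extractor produces the clinical task and PICO frame (P, I, C, O) used by the Semantic Matcher to rerank the retrieved abstracts]
Compared vs. original PubMed ordering
Compared vs. Indri baseline (state-of-the-art LM)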
Results: Complete Model
Performance on held-out blind test set:
          Therapy        Diagnosis      Prognosis      Etiology       All
Precision at 10 (P10)
PubMed    .350 (–39%)    .150 (–70%)    .200 (–46%)    .320 (–20%)    .281 (–44%)
Indri     .575           .500           .367           .400           .500
EBM       .783 (+36%)    .583 (+17%)    .467 (+27%)    .660 (+65%)    .677 (+35%)
Mean Average Precision (MAP)
PubMed    .421 (–29%)    .279 (–48%)    .235 (–56%)    .364 (–17%)    .356 (–35%)
Indri     .595           .534           .533           .439           .544
EBM       .765 (+29%)    .637 (+19%)    .722 (+35%)    .701 (+60%)    .718 (+32%)
Percentages are relative to the Indri baseline
Results are statistically significant
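For readers unfamiliar with the two metrics, the sketch below shows their standard definitions, plus the relative-change arithmetic behind the percentages in the table (which match a comparison against the Indri row). The function names are illustrative, and average precision here is normalized by the number of known relevant documents, which is the usual convention and an assumption about the evaluation setup.

```python
from typing import Iterable, List, Set, Tuple


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Iterable[Tuple[List[str], Set[str]]]) -> float:
    """Average of per-question average precision."""
    values = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(values) / len(values) if values else 0.0


def relative_change(score: float, baseline: float) -> float:
    """Relative difference of a score with respect to a baseline."""
    return (score - baseline) / baseline


# Therapy P10 from the table: EBM .783 vs. Indri .575.
print(round(100 * relative_change(0.783, 0.575)))
```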
Details: Jimmy Lin and Dina Demner-Fushman. The Role of Knowledge in Conceptual Retrieval: A Study in the Domain of Clinical Medicine. SIGIR 2006.
Results: Parameter Settings
Tuning each component
  S_EBM = λ1 S_PICO + λ2 S_SoE + (1 - λ1 - λ2) S_MeSH
  No statistically significant difference
Combining EBM + Indri
  S_EBM+Indri = λ S_EBM + (1 - λ) S_Indri
  Better performance, but not statistically significant
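The two interpolation formulas translate directly into code. This is a minimal sketch: the function names and the weight-range check are additions for the example, and the input scores are assumed to be already normalized to comparable ranges.

```python
def s_ebm(pico: float, soe: float, mesh: float, lam1: float, lam2: float) -> float:
    """Weighted S_EBM; the third weight is whatever remains after lam1 and lam2."""
    if lam1 < 0 or lam2 < 0 or lam1 + lam2 > 1:
        raise ValueError("weights must be non-negative and sum to at most 1")
    return lam1 * pico + lam2 * soe + (1 - lam1 - lam2) * mesh


def s_ebm_indri(ebm: float, indri: float, lam: float) -> float:
    """Linear interpolation between the EBM score and the Indri score."""
    if not 0 <= lam <= 1:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ebm + (1 - lam) * indri


# Illustrative values only.
combined = s_ebm(0.8, 0.4, 0.2, lam1=0.5, lam2=0.25)
print(combined, s_ebm_indri(combined, 0.3, lam=0.5))
```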
Results: Contributions
What’s the contribution of each EBM facet?

                                MAP     vs. EBM    vs. Indri
S_PICO (Problem Structure)      .646    –10%**     +19%*
S_SoE + S_MeSH (User Tasks)     .538    –25%**     –1%

                                P10     vs. EBM    vs. Indri
S_PICO (Problem Structure)      .627    –7%        +25%**
S_SoE + S_MeSH (User Tasks)     .485    –28%**     –3%

** = sig. at 99%, * = sig. at 95%

What types of knowledge are important?
  Problem structure (K1) helps a lot
  User tasks (K2) help, but not as much
Results: Partial Models
Can we use limited knowledge to improve term-based methods?

                                                              λ      MAP              P10
S_Indri (Term Statistics)                                            .544             .500
λ S_Indri + (1 - λ) S_PICO (+ Problem Structure)              .46    .668 (+23%)**    .627 (+25%)**
λ S_Indri + (1 - λ)(.5 S_SoE + .5 S_MeSH) (+ User Tasks)      .55    .620 (+14%)**    .565 (+13%)*

** = sig. at 99%, * = sig. at 95%

Any knowledge helps!
Answer Generation
Physicians are most interested in outcomes
Approach: identify outcome sentences
  Generate an answer from each citation: abstract title and three highest scoring outcome sentences
Question: Does combining aspirin and warfarin decrease the risk of stroke for patients with nonvalvular atrial fibrillation?
Answer: Prevention of thromboembolic events in atrial fibrillation: The results from the SPAF III study demonstrated that a combination of mini-intensity warfarin plus aspirin was insufficient for stroke prevention in atrial fibrillation. Other trials now indicate, that oral anticoagulation at INR-values below 2.0 is not effective for stroke prevention in these patients. The present clinical challenge is to ensure effective and safe oral anticoagulation to patients with atrial fibrillation at high risk of stroke.
Answer structure: abstract title, outcome 1, outcome 2, outcome 3
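The answer-generation step described above can be sketched as follows. The function name and signature are hypothetical, the per-sentence outcome scores are assumed to come from an upstream classifier, and presenting the selected sentences in their original abstract order is an assumption of this sketch.

```python
from typing import List


def generate_answer(title: str, sentences: List[str],
                    outcome_scores: List[float], n: int = 3) -> str:
    """Join the abstract title with the n highest-scoring outcome sentences."""
    top = sorted(range(len(sentences)),
                 key=lambda i: outcome_scores[i], reverse=True)[:n]
    # Assumption: show the chosen sentences in the order they appear in the abstract.
    chosen = [sentences[i] for i in sorted(top)]
    return f"{title}: " + " ".join(chosen)


# Placeholder sentences and made-up scores.
answer = generate_answer(
    "Title",
    ["Background.", "Outcome one.", "Outcome two.", "Outcome three."],
    [0.1, 0.9, 0.3, 0.8],
)
print(answer)
```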
Evidence Synthesis
Integrate findings from multiple citations
Question: What is the best treatment for chronic prostatitis?
► anti-microbial
[temafloxacin] Treatment of chronic bacterial prostatitis with temafloxacin. Temafloxacin 400 mg b.i.d. administered orally for 28 days represents a safe and effective treatment for chronic bacterial prostatitis.
[ofloxacin] Ofloxacin in the management of complicated urinary tract infections, including prostatitis. In chronic bacterial prostatitis, results to date suggest that ofloxacin may be more effective clinically and as effective microbiologically as carbenicillin....
► Alpha-adrenergic blocking agent
[terazosine] Terazosin therapy for chronic prostatitis/chronic pelvic pain syndrome: a randomized, placebo controlled trial. CONCLUSIONS: Terazosin proved superior to placebo for patients with chronic prostatitis/chronic pelvic pain syndrome who had not received alpha-blockers previously....
Semantic Clustering
[Figure: relevant citations → Answer Extraction → Semantic Clustering (Cluster 1, Cluster 2, Cluster 3) → Interactive Presentation]
Evaluation: Evidence Synthesis
Question type: What is the best treatment of X?
Compare
  Top three answers from PubMed
  First answer in three largest semantic clusters
Evaluation by a physician:
                      “Good”   “Okay”   “Bad”
PubMed                0.600    0.227    0.173
Semantic Clustering   0.827    0.133    0.040
Details: Dina Demner-Fushman and Jimmy Lin. Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering. ACL 2006.
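The selection rule compared in this evaluation, taking the first answer from each of the three largest semantic clusters, can be sketched as below. The cluster labels are assumed to be supplied upstream (for example, a drug class for each intervention); how the system derives them is not shown here, and the answer strings are placeholders.

```python
from collections import defaultdict
from typing import List, Tuple


def top_cluster_answers(citations: List[Tuple[str, str]],
                        n_clusters: int = 3) -> List[Tuple[str, str]]:
    """Return (label, first answer) for the n largest clusters.

    `citations` is a ranked list of (cluster label, answer) pairs.
    """
    clusters = defaultdict(list)
    for label, answer in citations:
        clusters[label].append(answer)
    # sorted() is stable, so equally sized clusters keep first-seen order.
    largest = sorted(clusters.items(),
                     key=lambda item: len(item[1]), reverse=True)[:n_clusters]
    return [(label, answers[0]) for label, answers in largest]


ranked = [
    ("Alpha-adrenergic blocking agent", "terazosin answer"),
    ("anti-microbial", "temafloxacin answer"),
    ("anti-microbial", "ofloxacin answer"),
]
print(top_cluster_answers(ranked))
```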
Findings
K1 + K2 + K3 → “conceptual retrieval”
Knowledge helps a lot!
But here’s the catch: Limited domain: “narrow but deep” Dependent on availability of existing resources
Beyond “bag of words”: Develop a general framework Instantiate in domain-specific applications Leverage lessons learned to refine the framework Rinse, repeat
Re: Re: Conceptual Retrieval
Question: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
Task: therapy
P: children/acute febrile illness
I: acetaminophen
C: ibuprofen
O: reducing fever
= faceted query! (each frame slot is a facet)
MEDLINE: NLM’s authoritative repository of 17 million+ abstracts
Answer: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses.
Conceptual Retrieval
“Building blocks” strategy in library science Decompose information need into conceptual facets Identify terms that represent those facets Instantiate in a structured query
EBM-based retrieval is a specific case of facet analysis and structured querying!
(A1 OR A2 OR …) AND (B1 OR B2 OR …) AND (C1 OR C2 OR …) AND (D1 OR D2 OR …) …
Facets A, B, C, D correspond to P, I, C, O
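A short sketch of the building-blocks strategy follows, assuming the usual convention that terms within a facet are OR-ed and facets are AND-ed. The function name, the quoting style, and the synonym "child" are illustrative choices, not part of the system described in the talk.

```python
from typing import Dict, List


def build_query(facets: Dict[str, List[str]]) -> str:
    """OR the terms inside each facet, then AND the facets together."""
    groups = []
    for terms in facets.values():
        if terms:  # skip facets with no terms
            groups.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(groups)


query = build_query({
    "P": ["children", "child"],   # "child" is an illustrative synonym
    "I": ["acetaminophen"],
    "C": ["ibuprofen"],
    "O": [],                      # empty facet is dropped
})
print(query)
```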
A General Framework?
For a domain
1. Identify prototypical information needs
2. Develop a frame-based representation
3. Build extractor for frame elements
4. Instantiate semantic matcher
5. Watch performance go up!
The subject of ongoing work…
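One possible reading of steps 2 through 4 is a small pipeline with pluggable parts. Everything in this sketch is hypothetical: the dictionary-based frame, the keyword-lookup extractor, and the slot-overlap matcher are toy stand-ins for domain-specific components.

```python
from typing import Callable, Dict, List

Frame = Dict[str, str]  # step 2: a frame as slot -> value (assumed representation)

# Toy vocabulary standing in for domain knowledge.
TOY_VOCAB = {
    "problem": ["fever", "headache"],
    "intervention": ["ibuprofen", "quinine"],
}


def toy_extract(text: str) -> Frame:
    """Step 3: fill each slot with the first vocabulary term found in the text."""
    lowered = text.lower()
    frame: Frame = {}
    for slot, vocab in TOY_VOCAB.items():
        for term in vocab:
            if term in lowered:
                frame[slot] = term
                break
    return frame


def slot_overlap(question: Frame, document: Frame) -> float:
    """Step 4: fraction of question slots matched exactly by the document."""
    if not question:
        return 0.0
    matches = sum(1 for slot, value in question.items()
                  if document.get(slot) == value)
    return matches / len(question)


def conceptual_rerank(question: Frame, documents: List[str],
                      extract: Callable[[str], Frame] = toy_extract,
                      match: Callable[[Frame, Frame], float] = slot_overlap) -> List[str]:
    """Rank documents by how well their extracted frame matches the question frame."""
    return sorted(documents, key=lambda d: match(question, extract(d)), reverse=True)


docs = ["Quinine for headache",
        "Ibuprofen reduces fever in children",
        "Fever of unknown origin"]
print(conceptual_rerank({"problem": "fever", "intervention": "ibuprofen"}, docs))
```

Passing the extractor and matcher as arguments mirrors the idea that the framework stays fixed while its components are instantiated per domain.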
What comes next?
Retrieval in the biomedical domain
  Information describing the role(s) of a [gene] involved in a [disease]. gene: Interferon-beta; disease: Multiple Sclerosis
  Information describing the role of a [gene] in a specific [biological process]. gene: nucleoside diphosphate kinase (NM23); biological process: tumor progression
Complex question answering
  What evidence is there for transport of [art looted by the Nazis in WWII] from [Germany] to [France]?
  What [familial ties] exist between [Neanderthals] and [humans]?
  What [common interests] exist between [Network Solutions] and [the Internet Corporation for Assigned Names and Numbers (ICANN)]?
Acknowledgments
Dina Demner-Fushman (Ph.D., 2006)
This work was funded in part by NLM
References
Dina Demner-Fushman and Jimmy Lin. Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Computational Linguistics, 33(1):63-103, 2007.
Jimmy Lin and Dina Demner-Fushman. The Role of Knowledge in Conceptual Retrieval: A Study in the Domain of Clinical Medicine. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), 2006, pp. 99-106.
Dina Demner-Fushman and Jimmy Lin. Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006), 2006, pp. 841-848.