Query Processing: Query Formulation

DESCRIPTION

Query Processing: Query Formulation. Ling573, NLP Systems and Applications, April 14, 2011. Roadmap. Motivation: retrieval gaps. Query formulation: question series. Query reformulation: AskMSR patterns; MULDER parse-based formulation. Classic query expansion; semantic resources.

Transcript

Query Processing: Query Formulation
Ling573: NLP Systems and Applications
April 14, 2011

Roadmap
  • Motivation: retrieval gaps
  • Query formulation: question series
  • Query reformulation:
      • AskMSR patterns
      • MULDER parse-based formulation
  • Classic query expansion:
      • Semantic resources
      • Pseudo-relevance feedback

Retrieval Gaps
  • Goal: based on the question, retrieve the documents/passages that best capture the answer
  • Problem: mismatches in lexical choice and sentence structure
      • Q: How tall is Mt. Everest?
        A: The height of Everest is ...
      • Q: When did the first American president take office?
        A: George Washington was inaugurated in ...

Query Formulation
  • Goals:
      • Overcome lexical gaps and structural differences
      • To enhance basic retrieval matching
      • To improve target sentence identification
  • Issues & approaches:
      • Differences in word forms: morphological analysis
      • Differences in lexical choice: query expansion
      • Differences in structure

Query Formulation
  • Convert the question to a form suitable for IR
  • Strategy depends on the document collection
  • Web (or similar large collection):
      • "Stop structure" removal: delete function words, q-words, even low-content verbs
  • Corporate sites (or similar smaller collection):
      • Query expansion
          • Can't count on document diversity to recover word variation
          • Add morphological variants; use WordNet as a thesaurus
      • Reformulate as declarative: rule-based
          • Where is X located -> X is located in
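The two collection-dependent strategies on the Query Formulation slide (stop-structure removal for large collections, rule-based declarative reformulation for small ones) can be sketched roughly as follows. This is a minimal illustration, not the course's implementation: the stop-structure list is a tiny hand-picked assumption, and the names `STOP_STRUCTURE`, `remove_stop_structure`, and `reformulate_declarative` are hypothetical. Only the "Where is X located -> X is located in" rule comes from the slide.

```python
import re
from typing import List, Optional

# Assumed, deliberately small stop-structure list: function words,
# question words, and a few low-content verbs. A real list would be larger.
STOP_STRUCTURE = {
    "a", "an", "the", "of", "in", "on", "to", "for",
    "who", "what", "when", "where", "which", "how",
    "is", "are", "was", "were", "do", "does", "did",
}

def remove_stop_structure(question: str) -> List[str]:
    """Keep only content-bearing tokens (lowercased), as for web-scale search."""
    tokens = re.findall(r"[A-Za-z0-9]+", question.lower())
    return [t for t in tokens if t not in STOP_STRUCTURE]

# The one rewrite rule quoted on the slide: "Where is X located" -> "X is located in".
WHERE_LOCATED = re.compile(r"^where\s+is\s+(.+?)\s+located\s*\??$", re.IGNORECASE)

def reformulate_declarative(question: str) -> Optional[str]:
    """Return the declarative prefix for a matching question, else None."""
    match = WHERE_LOCATED.match(question.strip())
    if match is None:
        return None
    return f"{match.group(1)} is located in"

print(remove_stop_structure("Where is the Louvre Museum located?"))
print(reformulate_declarative("Where is the Louvre Museum located?"))
```

The keyword list suits a collection large enough that some document will phrase the answer in matching words; the declarative prefix instead tries to match the answer sentence directly.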

Question Series
  • TREC 2003-
  • Target: PERS, ORG, ...
  • Assessors create a series of questions about the target
  • Intended to model interactive Q/A, but often stilted
  • Introduces pronouns, anaphora

Handling Question Series
  • Given target and series, how to deal with reference?
  • Shallowest approach: concatenation
      • Add the target to the question
  • Shallow approach: replacement
      • Replace all pronouns with the target
  • Least shallow approach:
      • Heuristic reference resolution

Question Series Results
  • No clear winning strategy
      • All largely about the target
      • So no big win for anaphora resolution
      • If using bag-of-words features in search, works fine
  • Replacement strategy can be problematic
      • E.g. Target = Nirvana:
          • What is their biggest hit?
          • When was the band formed?
              • Wouldn't replace "the band"
  • Most teams concatenate
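The concatenation and replacement strategies, and the Nirvana problem case, can be illustrated with a short sketch. The pronoun lists and the function names `concatenate` and `replace_pronouns` are assumptions made for this example; no team's actual system is being reproduced, and the possessive handling (appending 's) is a crude simplification.

```python
import re

# Assumed, incomplete pronoun lists for illustration only.
POSSESSIVE = {"his", "its", "their"}
PERSONAL = {"he", "she", "it", "they", "him", "them"}

def concatenate(target: str, question: str) -> str:
    """Shallowest approach: add the target to the question."""
    return f"{target} {question}"

def replace_pronouns(target: str, question: str) -> str:
    """Shallow approach: replace every listed pronoun with the target."""
    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        lowered = word.lower()
        if lowered in POSSESSIVE:
            return f"{target}'s"
        if lowered in PERSONAL:
            return target
        return word
    # Match whole alphabetic words so "it" inside "hit" is left alone.
    return re.sub(r"[A-Za-z]+", swap, question)

print(replace_pronouns("Nirvana", "What is their biggest hit?"))
print(replace_pronouns("Nirvana", "When was the band formed?"))
print(concatenate("Nirvana", "When was the band formed?"))
```

The second call returns the question unchanged because "the band" is a definite noun phrase rather than a pronoun, which is the failure the slide points out. Concatenation still puts the target into the query, which is enough for bag-of-words retrieval.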

AskMSR
  • Shallow Processing for QA (Dumais et al. 2002; Lin 2007)

Intuition
  • Redundancy is useful!
      • If similar strings appear in many candidate answers, they are likely to be the solution
      • Even if we can't find obvious answer strings
  • Q: How many times did Bjorn Borg win Wimbledon?
      • Bjorn Borg blah blah blah Wimbledon blah 5 blah Wimbledon blah blah blah Bjorn Borg blah 37 blah. blah Bjorn Borg blah blah 5 blah blah Wimbledon 5 blah blah Wimbledon blah blah Bjorn Borg.
  • Probably 5

Query Reformulation
  • Identify question type:
      • E.g. Who, When, Where, ...
  • Create question-type specific rewrite rules:
      • Hypothesis: wording of the question is similar to the answer
      • For "where" queries, move "is" to all possible positions
          • Where is the Louvre Museum located? =>
              • Is the Louvre Museum located
              • The is Louvre Museum located
              • The Louvre Museum is located, etc.
  • Create type-specific answer type (Person, Date, Loc)
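The "move is to all possible positions" rule for where-questions can be sketched as below. This is a simplified illustration under stated assumptions: whitespace tokenization, a single-word wh-term at the start of the question, and a hypothetical function name `move_is_rewrites`. It is not the AskMSR code.

```python
from typing import List

def move_is_rewrites(question: str) -> List[str]:
    """Drop the leading wh-word, then place "is" at every possible position."""
    tokens = question.strip().rstrip("?").split()
    rest = tokens[1:]  # assume the first token is the wh-word
    positions = [i for i, tok in enumerate(rest) if tok.lower() == "is"]
    if not positions:
        return []  # the rule only applies when the question contains "is"
    del rest[positions[0]]
    return [" ".join(rest[:i] + ["is"] + rest[i:]) for i in range(len(rest) + 1)]

for rewrite in move_is_rewrites("Where is the Louvre Museum located?"):
    print(rewrite)
```

Most of the generated strings are ungrammatical, as on the slide ("The is Louvre Museum located"). The approach accepts this: nonsense rewrites retrieve little or nothing, while the one good rewrite ("the Louvre Museum is located") matches answer sentences directly, and no parser is needed.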

Query Form Generation
  • 3 query forms:
      • Initial baseline query
      • Exact reformulation: weighted 5 times higher
          • Attempts to anticipate location of answer
          • Extract using surface patterns
              • When was the telephone invented?
              • the telephone was invented ?x
          • Generated by ~12 pattern matching rules on terms, POS
              • E.g. wh-word did A verb B -> A verb+ed B ?x (general)
              • Where is A? -> A is located in ?x (specific)
      • Inexact reformulation: bag-of-words
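One way the redundancy intuition and the 5x weight might combine is a weighted vote over candidate strings. The sketch below makes several assumptions that are not on the slides: that a query form's weight carries over to the snippets it retrieves, that the inexact form has weight 1, that the answer type for "How many" is a number, and that the Bjorn Borg text splits into the four snippets shown. The names `vote`, `EXACT_WEIGHT`, and `INEXACT_WEIGHT` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

EXACT_WEIGHT = 5    # from the slide: exact reformulation weighted 5 times higher
INEXACT_WEIGHT = 1  # assumed weight for the bag-of-words form

def vote(snippets: Iterable[Tuple[str, int]]) -> Counter:
    """Score each numeric candidate by the summed weight of snippets containing it."""
    scores: Counter = Counter()
    for text, weight in snippets:
        # Count each candidate once per snippet, however often it repeats there.
        for candidate in set(re.findall(r"\b\d+\b", text)):
            scores[candidate] += weight
    return scores

# Assumed segmentation of the slide's Bjorn Borg example, with assumed weights.
SNIPPETS = [
    ("Bjorn Borg blah blah blah Wimbledon blah 5 blah", EXACT_WEIGHT),
    ("Wimbledon blah blah blah Bjorn Borg blah 37 blah.", INEXACT_WEIGHT),
    ("blah Bjorn Borg blah blah 5 blah blah Wimbledon", INEXACT_WEIGHT),
    ("5 blah blah Wimbledon blah blah Bjorn Borg.", INEXACT_WEIGHT),
]

scores = vote(SNIPPETS)
best, _ = scores.most_common(1)[0]
print(best, dict(scores))
```

Under these assumptions "5" collects weight from three snippets while "37" appears in only one, matching the slide's conclusion of "Probably 5".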

Query Reformulation Examples

Deeper Processing for Query Formulation
  • MULDER (Kwok, Etzioni, & Weld)
  • Converts question to multiple search queries
      • Forms which match target
      • Vary specificity of query
          • Most general: bag of keywords
          • Most specific: partial/full phrases
  • Employs full parsing augmented with morphology

Syntax for Query Formulation
  • Parse-based transformations:
      • Applies transformational grammar rules to questions
  • Example rules:
      • Subject-auxiliary movement:
          • Q: Who was the first American in space?
          • Alt: was the first American; the first American in space was
      • Subject-verb movement:
          • Who shot JFK?