Essentials of Approach
A shift away from deep text analysis and NLP methods toward surface techniques
Use of formulas describing the structure of strings that are likely to carry certain semantic information
Example
FBI Director Louis Freeh
A person represented by his/her first/last names
A person occupies a post in an organization
The formula
A word composed of capital letters
An item from a list of posts in an organization
An item from a list of first names
A capitalized word
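The formula above can be sketched as a single regular expression whose middle elements are drawn from word lists. This is a minimal illustration that assumes small hypothetical lists (`POSTS`, `FIRST_NAMES`) and a hypothetical helper `find_person_post_org`. It is not the original system's implementation.

```python
import re

# Hypothetical word lists; a real system would use much larger curated lists.
POSTS = ["Director", "Chairman", "President", "Secretary"]
FIRST_NAMES = ["Louis", "Janet", "George", "Madeleine"]

def alternation(words):
    # Longest entries first so a longer item wins over a shorter prefix.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))

# CAPS word (organization), post from list, first name from list, capitalized word.
PERSON_POST_ORG = re.compile(
    r"\b(?P<org>[A-Z]{2,})\s+"
    rf"(?P<post>{alternation(POSTS)})\s+"
    rf"(?P<first>{alternation(FIRST_NAMES)})\s+"
    r"(?P<last>[A-Z][a-z]+)\b"
)

def find_person_post_org(text):
    """Return one dict per string matching the formula."""
    return [m.groupdict() for m in PERSON_POST_ORG.finditer(text)]
```

For example, `find_person_post_org("Yesterday FBI Director Louis Freeh said")` yields one match with `org="FBI"`, `post="Director"`, `first="Louis"`, `last="Freeh"`.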
Patterns
Formulas of this kind were called “patterns”
First used in the TREC-10 QA track
Each pattern is characterized by a certain generalized semantics
Steps (Overview)
Identify the question terms (types)
Apply the set of formulas (for a particular question type) to match strings in question-relevant passages
Identify strings corresponding to a formula
Check for expressions negating the semantics of the found strings
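The steps above can be sketched end to end as follows. The sketch:

- picks a question type;
- applies only the patterns for that type;
- collects the matching strings;
- discards matches with a negation cue just before them.

All names here are hypothetical, including the question-type rules, the `PATTERNS` table, the negation word list and the 30-character window.

```python
import re

# Hypothetical question-type rules and per-type patterns.
QUESTION_TYPES = [
    ("year", re.compile(r"^in what year\b", re.I)),
    ("where", re.compile(r"^where\b", re.I)),
]
PATTERNS = {"year": [re.compile(r"\b\d{4}\b")]}
# Hypothetical negation cues, looked for shortly before each match.
NEGATION = re.compile(r"\b(not|never|no longer|denied)\b", re.I)

def question_type(question):
    for qtype, rule in QUESTION_TYPES:
        if rule.search(question.strip()):
            return qtype
    return None

def answer_candidates(question, passages, window=30):
    qtype = question_type(question)                  # identify the question type
    candidates = []
    for passage in passages:
        for pattern in PATTERNS.get(qtype, []):      # formulas for this type only
            for m in pattern.finditer(passage):      # strings matching a formula
                left = passage[max(0, m.start() - window):m.start()]
                if NEGATION.search(left):            # negated: discard
                    continue
                candidates.append(m.group())
    return candidates
```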
A Surface Approach
No need to distinguish linguistic entities
Formulas for strings look like regular expressions
But patterns include elements referring to lists of predefined words/phrases
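The difference from plain regular expressions can be sketched with a small compiler. Each pattern element is either a regex fragment or a reference to a named list, and list references are expanded into alternations. The element format `("re", ...)` / `("list", ...)` and the `LISTS` contents are assumptions made for illustration.

```python
import re

# Hypothetical predefined lists, referenced by name from pattern elements.
LISTS = {
    "POST": ["Director", "Secretary of State", "Chairman"],
    "FIRST_NAME": ["Louis", "Madeleine", "Alan"],
}

def compile_pattern(elements, lists=LISTS):
    """Each element is ("re", fragment) or ("list", list_name)."""
    parts = []
    for kind, value in elements:
        if kind == "re":
            parts.append(f"(?:{value})")
        elif kind == "list":
            # Longest phrases first so multi-word items are not cut short.
            words = sorted(lists[value], key=len, reverse=True)
            parts.append("(?:" + "|".join(re.escape(w) for w in words) + ")")
        else:
            raise ValueError(f"unknown element kind: {kind}")
    # Elements are separated by whitespace; the whole match sits on word boundaries.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")

pattern = compile_pattern([("list", "POST"), ("list", "FIRST_NAME"), ("re", r"[A-Z][a-z]+")])
```

The list elements are where the pattern "knows" that a word is a post or a first name. The regex fragments only describe surface shape.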
Patterns and Question Types
Who is person X? Who occupies post Y in organization Z?
A relationship is established between two or more entities: person, post, organization, etc.
Where-questions suggest geographical items as answers
Construct formulas like: an item from a list of cities/towns/counties or countries/states
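For a where-question, the answer formula can be as simple as "an item from a geographical list." This sketch assumes tiny hypothetical `CITIES` and `COUNTRIES` lists. It also adds one illustrative rule that is not stated in the slides: places already named in the question are skipped.

```python
import re

# Hypothetical gazetteer; a real system would load large lists of
# cities/towns/counties and countries/states.
CITIES = ["Caracas", "Paris", "Springfield"]
COUNTRIES = ["Venezuela", "France", "United States"]

def list_regex(words):
    words = sorted(words, key=len, reverse=True)
    return r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"

WHERE_ANSWER = re.compile(list_regex(CITIES + COUNTRIES))

def where_candidates(question, passage):
    if not re.match(r"\s*where\b", question, re.I):
        return []
    # Assumed heuristic: a place named in the question is not its answer.
    return [m.group() for m in WHERE_ANSWER.finditer(passage)
            if m.group().lower() not in question.lower()]
```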
Examples
“In what year” questions: find strings with a sequence of 4 digits
Questions regarding length, area, weight, speed, etc.: digits plus units of measurement
“What is the area of Venezuela?”: 340,569 square miles (a simple pattern match)
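Both simple patterns can be written directly as regular expressions:

- a standalone four-digit sequence for years;
- a number (optionally with thousands separators) followed by a unit for measurements.

The `UNITS` list is a small hypothetical sample.

```python
import re

# "In what year" questions: exactly four digits, not part of a longer number.
YEAR = re.compile(r"(?<!\d)\d{4}(?!\d)")

# Length/area/weight/speed questions: digits followed by a unit of measurement.
UNITS = ["square miles", "square kilometers", "miles", "kilometers",
         "km/h", "mph", "pounds", "kg", "feet", "meters"]
unit_alt = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
MEASURE = re.compile(rf"\b\d+(?:,\d{{3}})*(?:\.\d+)?\s+(?:{unit_alt})\b")

def year_candidates(text):
    return YEAR.findall(text)

def measurement_candidates(text):
    return [m.group() for m in MEASURE.finditer(text)]
```

Applied to "Venezuela has an area of 340,569 square miles.", `measurement_candidates` returns `["340,569 square miles"]`. Sorting the units longest first lets "square miles" win over "miles".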
Complex Patterns
Strings expressing a relationship between several semantic entities
The more complex a pattern is, the higher its reliability
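One way to act on "more complex means more reliable" is to try patterns in order of decreasing complexity and keep the first match. In this sketch, complexity is approximated by the number of pattern elements. That proxy, the `Pattern` class and `best_match` are all assumptions for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Pattern:
    name: str
    elements: list  # regex fragments, separated by whitespace in the text

    @property
    def complexity(self):
        # Hypothetical reliability proxy: number of pattern elements.
        return len(self.elements)

    def search(self, text):
        return re.search(r"\s+".join(self.elements), text)

def best_match(patterns, text):
    """Try more complex patterns first; return (name, matched string) or None."""
    for pat in sorted(patterns, key=lambda p: p.complexity, reverse=True):
        m = pat.search(text)
        if m:
            return pat.name, m.group()
    return None

simple = Pattern("two-capitalized", [r"[A-Z][a-z]+", r"[A-Z][a-z]+"])
complex_ = Pattern("org-post-name", [r"FBI", r"Director", r"[A-Z][a-z]+", r"[A-Z][a-z]+"])
text = "Speaking in Washington, FBI Director Louis Freeh said"
```

On `text`, the simple pattern alone would return the misleading "Director Louis". Trying the complex pattern first yields "FBI Director Louis Freeh".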
Names and Dates
People Names
Items from a first-name list
Capitalized words
Specific name elements (bin, van, etc.)
Abbreviations like Sr. and Jr.
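The name elements listed above can be combined into a pattern:

- a first name from a list;
- one or more capitalized words, each optionally preceded by a lowercase particle such as "van" or "bin";
- an optional "Sr."/"Jr." suffix.

The `FIRST_NAMES` and `PARTICLES` lists are small hypothetical samples.

```python
import re

# Hypothetical sample lists; real systems use large first-name lists.
FIRST_NAMES = ["Louis", "Ludwig", "Martin", "Vincent"]
PARTICLES = ["bin", "van", "von", "de", "al"]   # specific name elements

first = "|".join(map(re.escape, FIRST_NAMES))
particle = "|".join(map(re.escape, PARTICLES))

PERSON_NAME = re.compile(
    rf"\b(?:{first})"
    # Further name words, each optionally preceded by particles; the
    # lookahead leaves "Sr."/"Jr." for the suffix group below.
    rf"(?:\s+(?:(?:{particle})\s+)*(?!(?:Sr|Jr)\.)[A-Z][a-z]+)+"
    r"(?:,?\s+(?:Sr|Jr)\.)?"
)

def find_names(text):
    return [m.group() for m in PERSON_NAME.finditer(text)]
```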
Dates
Prepositions, articles, digits, month names, commas, dashes, brackets, phrases like “early,” “in the period of,” “years ago,” “B.C.”
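A handful of date sub-patterns can be built from these elements and joined into one alternation. The sub-patterns below are hypothetical examples, one per comment, and are far from exhaustive.

```python
import re

MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")

# Hypothetical date sub-patterns built from the elements listed above.
DATE_PATTERNS = [
    rf"{MONTH}\s+\d{{1,2}},\s*\d{{4}}",              # March 15, 1998
    r"in the period of\s+\d{3,4}\s*[-–]\s*\d{3,4}",  # in the period of 1914-1918
    r"\b(?:early|late|mid)[\s-]+\d{3,4}s?",          # early 1990s, mid-1950s
    r"\d{1,4}\s+B\.C\.",                             # 500 B.C.
    r"\d+\s+years\s+ago",                            # 200 years ago
    r"\(\d{4}\)",                                    # (1998)
]
DATE = re.compile("|".join(f"(?:{p})" for p in DATE_PATTERNS))

def find_dates(text):
    return [m.group() for m in DATE.finditer(text)]
```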
Pattern-Matching Strings and Question Semantics
How question words are located relative to the pattern-matching string (distance, left/right, position relative to other matching strings, etc.)
The simplicity of a pattern's structure is compensated by the complexity of the rules
Without applying heuristic rules, sufficiently reliable results cannot be ensured
Rank assigned to question words/phrases and score assigned to candidate answers
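A heuristic rule of this kind can be sketched as a score for a candidate answer. Each ranked question word found near the match adds its rank, decayed by its token distance. An optional side constraint accepts only words to the left or right of the match. The decay formula, the `max_dist` cutoff and `score_candidate` itself are assumptions, not the rules used in the original system.

```python
import re

TOKEN = re.compile(r"\w+")

def score_candidate(passage, span, term_ranks, max_dist=8, side=None):
    """Score a pattern match at character span (start, end).

    term_ranks maps lower-cased question words to hypothetical ranks.
    side may be "left" or "right" to accept only terms on that side.
    """
    start, end = span
    score = 0.0
    for tok in TOKEN.finditer(passage):
        word = tok.group().lower()
        if word not in term_ranks:
            continue
        if tok.end() <= start:
            where, dist = "left", len(TOKEN.findall(passage[tok.end():start]))
        elif tok.start() >= end:
            where, dist = "right", len(TOKEN.findall(passage[end:tok.start()]))
        else:
            continue  # the question word lies inside the match itself
        if dist > max_dist or (side is not None and where != side):
            continue
        score += term_ranks[word] / (1 + dist)
    return score
```

For example, in "Louis Freeh, director of the FBI, spoke." the match "Louis Freeh" spans characters 0 to 11. "director" is adjacent on the right and "FBI" is three tokens away. With ranks `{"fbi": 2.0, "director": 1.0}` the score is 1.0 + 2.0/4 = 1.5. Requiring `side="left"` gives 0.0.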
QA Process
Define question types for all questions
Order the questions so that those with more reliable patterns come first
Form and rank queries from question terms
Modify queries (if the score is below a threshold)
Identify pattern-matching strings (apply complex patterns first, then simple ones)
Check the correlation between patterns and question semantics
Identify exact answers and calculate their scores
Recommended