AQUAINT R&D Program: “State of the Program” Phase I 12-Month Workshop 2-5 December 2002 Dr. John D. Prange AQUAINT Program Manager jprange@nsa.gov 301-688-7092 http://www.ic-arda.org


AQUAINT R&D Program: “State of the Program”

Phase I 12-Month Workshop, 2-5 December 2002

Dr. John D. Prange, AQUAINT Program Manager

jprange@nsa.gov

http://www.ic-arda.org

Page 2

AQUAINT Ph I 12-Month Wkshp – 2-5 Dec 2002

Outline

• Overview of ARDA

• Overview of Information Exploitation Thrust

• AQUAINT R&D Program

– What it is and is not

– Technical Challenges

– State-of-the-Program

Page 3

Introducing ARDA

A joint Department of Defense / Intelligence Community organization launched in Dec 98

• MISSION:

– Incubate revolutionary R&D for the shared benefit of the Intelligence Community

• MEANS:

– A nimble, cross-community organization

– A modest, yet significant budget

– Small, outward-looking staff working as “honest brokers” and “agents provocateurs”

Page 4

How ARDA Interacts

• Community organizations

– Plans, forecasts, oversight

– Customer champions

• Thrust panels / managers

– R&D problem statements

– Internal peer review

• Industry and academia

– Principal funding recipients

– External peer review and staff

Page 5

Where ARDA Is

National Security Agency, Fort George G. Meade, MD

Room 12A69, NBP#1 Building

301-688-7092

800-276-3747

301-688-7410 (FAX)

http://www.ic-arda.org

[email protected]

Page 6

Current ARDA Programs

Chart labels: Community Participation; R&D Thrusts

• Information Exploitation: Dr. John Prange

• Exploratory Research Programs: Dr. Tim Persons

• Quantum Information Science: Dr. Dean Collins

• Digital Networking: Mr. Greg Puffenbarger

• Novel Intelligence from Massive Data: Dr. Greg Smith, Dr. Lucy Nowell

• Resource Enhancement Program: Ms. Penny Lehtola

Coming in FY2003: Major New R&D Thrust in Advanced IC INFOSEC

Page 7

Outline

• Overview of ARDA

• Overview of Information Exploitation Thrust

• AQUAINT R&D Program

– What it is and is not

– Technical Challenges

– State-of-the-Program

Page 8

Information Exploitation (Info-X)

What Functions Does It Include?

Diagram: IC Analysts at the center, surrounded by the Info-X functions:

• Data Filtering & Selection

• Content Data Mark-up

• Content Data Transformation

• Information Retrieval

• Information Discovery

• Information Understanding

• Analytic Knowledge

• Synthesis and Fusion

• Assessment and Interpretation

• Presentation and Visualization

• Reporting and Dissemination

Info-X is Focused on Informational Content & Its Meaning!

Page 9

Analysis: Turning Raw Data into Reportable Intelligence

Diagram: Raw Data (“Finding the Needles in the Haystack”) flows to IC Analysts, who apply the Info-X functions (Data Filtering & Selection; Content Data Markup; Content Data Transformation; Information Retrieval; Information Discovery; Information Understanding; Analytic Knowledge; Synthesis and Fusion; Assessment and Interpretation; Presentation and Visualization; Reporting and Dissemination) to produce Intelligence Community Products (Reports).

It Remains an Analyst Intensive Activity

With Each Passing Day . . .

• More “Hay”

• Lower No. of “Needles per Volume of Hay”

• Fewer Analysts AND

• Less Time!

We Need To Dramatically Improve Our Ability to Find & Understand Information

“Barriers” to Deep Understanding of Content (wall graphic, labeled “UNDER DESTRUCTION”):

• Multiple Sources

• Lack of Control on Creation

• Variable Topics & Domains

• Limited Reasoning Capabilities

• Natural (vs. Artificial) Language

• Image/Video Understanding

• Missing, Conflicting, Ambiguous Data

• Types, Sources, Quantities of Errors

• Degree of Interpretation & Judgment

• Role of Knowledge

• Formal vs. Informal Conversation

• Automated Information Extraction

• Lack of Automated Learning

• Importance of Time Dimension

• Cross Document Analysis

• Importance of Context

• Multiple & Multi-Media

• Data Integrity / Use of Deception

• Many Foreign Languages / Character Scripts

• Goal / Objective of Originator

• Knowledge Representation

• Depth of Understanding Required

Clearly . . . We MUST Reduce these “Barriers” & Create “Cracks in this Wall”!

But How . . .

Page 10

Current Info-X R&D Programs

Full R&D Programs consisting of Multiple Phases:

• AQUAINT Advanced QUestion & Answering for INTelligence

• NIMD Novel Intelligence from Massive Data

• VACE Video Analysis and Content Extraction

• GI2Vis Geospatial Intelligence Information Visualization

Exploratory R&D Programs consisting of 1-Year + Option Year:

• NDHB Non-Linear Dynamics from Human Behavior

• LEMUR Statistical Language Modeling for Information Retrieval

Page 11

Outline

• Overview of ARDA

• Overview of Information Exploitation Thrust

• AQUAINT R&D Program

– What it is and is not

– Technical Challenges

– State-of-the-Program

Page 12

Open Domain Factoid Question Answering

Diagram (QA processing flow):

• Single, Factoid Question

• Move Closer to the Question, e.g. Question Classification

• System Specific Query; often Tailored to Question Type

• Traditional Information Retrieval over a Single Data Source

• Ranked List of Hopefully “Relevant” Documents

• Move Closer to the Answer, e.g. Passage Retrieval

• Shallow Analysis

• “Answer”
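
The “move closer to the question” step (question classification) can be illustrated with a toy classifier. This is a minimal sketch under stated assumptions, not the method of any system discussed here: the rule table, the answer-type labels, and the function name `classify_question` are all hypothetical.

```python
# Toy question classifier: maps a factoid question to an expected answer type.
# The prefixes and type labels below are illustrative assumptions only.
RULES = [
    ("how many", "NUMBER"),
    ("how much", "QUANTITY"),
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
]


def classify_question(question):
    """Return the expected answer type for a question, or "OTHER"."""
    text = question.strip().lower()
    for prefix, answer_type in RULES:
        if text.startswith(prefix):
            return answer_type
    return "OTHER"


if __name__ == "__main__":
    for q in ["Who is this advisor?", "When was ARDA launched?", "Name the films."]:
        print(q, "->", classify_question(q))
```

A downstream retrieval step could use the predicted type to tailor the query or to filter candidate passages; how real systems do this varies widely.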

Page 13

TREC QA Track Results

• ARDA & DARPA are co-sponsoring the Question Answering Track in the NIST-organized Text Retrieval Conference (TREC) Program (starting with TREC-8 in Nov 1999)

• TREC-10 Results (Nov 2001):

– 500 factual questions; about 50 questions had no answer in the TREC-10 data sources; used “real” questions

– Data source: approx. 3 GByte database of ~980K news stories

– 36 US & international organizations participated; 92 separate runs evaluated

– System output: top 5 regions (50 bytes) in a single story believed to contain the answer to the given question

Chart: QA Track Results, TREC-10 (Nov 2001). Questions (out of 500) with a correct answer in the top 5 responses (50-byte regions), for the top 8 systems:

System 1: 345

System 2: 333

System 3: 308

System 4: 296

System 5: 292

System 6: 281

System 7: 280

System 8: 279

Top System: 70% of the “Answers” found in their top 5 50-byte passages
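
The chart values can be turned into fractions of the 500 questions with a few lines of arithmetic. This is a sketch that assumes the eight chart values map to systems 1 through 8 in order and that every system is scored against all 500 questions; the function name `fraction_answered` is introduced here for illustration.

```python
# Questions with a correct answer in the top 5 responses, as read off the chart.
TOTAL_QUESTIONS = 500
correct_in_top5 = [345, 333, 308, 296, 292, 281, 280, 279]


def fraction_answered(correct, total=TOTAL_QUESTIONS):
    """Fraction of questions for which a correct answer appeared in the top 5."""
    return correct / total


if __name__ == "__main__":
    for rank, correct in enumerate(correct_in_top5, start=1):
        share = fraction_answered(correct)
        print(f"System {rank}: {correct}/{TOTAL_QUESTIONS} = {share:.1%}")
```

For the top system this gives 345/500 = 69%, which is consistent with the roughly 70% quoted on the slide.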

Page 14

“Ask Jeeves” Approach

• Start with Your Question

• Identify Key Words & Classify the Type of Question

• Respond with rephrased “Questions” for which “Ask Jeeves” knows the Answer

• Provide Additional Web Sites as a fall-back position (à la a more traditional web search engine)

Page 15

Structured Knowledge-Base Approach

Deepest QA but Limited to Given Subject Domain

Figure: Rapid Knowledge Formation (RKF). Knowledge base size (10 K, 100 K, 1,000 K axioms) plotted against development time (6 months, 12 months), with HPKB shown for comparison.

– Approach: Direct Knowledge Entry by Domain Experts; Parallel Development by Distributed Teams; Rapid Knowledge Formation; Comprehensive (Million-Axiom) Knowledge Bases

– Knowledge layers: Upper Ontology; Mid-Level Theories; Domain-Specific Theories

– Need to create new knowledge at a rate of 400 axioms per hour (with HPKB technology, a 5-person team can create knowledge at a rate of 40 axioms per hour)

– Biological Weapons (BW) Knowledge Required: basic knowledge of space, time, causality, general physics; biology & biological threats; BW R&D, produce, weaponize; geo-political behavior & terrorism

– Capabilities shown (Crisis Understanding; Commander’s Associate): retrieve facts relevant to a crisis; answer questions about terrain; answer questions about force capabilities; answer cause & effect questions about events; monitor and interpret changing battlefield events; reason about novel battlefield situations; perform vulnerability analyses; generate possible courses of action; monitor and interpret massive data streams; reason about novel crisis situations; uncover connected activities, threats; generate plausible crisis scenarios

• Create comprehensive Knowledge Base(s) or other Structured Data Base(s)

• At the 10K Axiom Level -- Capable of Answering factual questions within domain

• At the 100K Axiom Level -- Answer cause & effect/capability Questions

• At the 1000K Axiom Level -- Answer Novel Questions; ID alternatives
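
The two knowledge-creation rates quoted in the figure (40 axioms per hour with HPKB technology, 400 axioms per hour needed) can be compared against the three axiom levels with simple arithmetic. This is a rough sketch that assumes a constant rate and a knowledge base built from scratch, neither of which the slide states; `hours_to_build` is a name introduced for illustration.

```python
# Rates quoted on the slide, in axioms per hour.
HPKB_RATE = 40       # 5-person team using HPKB technology
REQUIRED_RATE = 400  # rate the slide says is needed


def hours_to_build(axioms, rate_per_hour):
    """Hours needed to author `axioms` axioms at a constant rate (assumption)."""
    return axioms / rate_per_hour


if __name__ == "__main__":
    for size in (10_000, 100_000, 1_000_000):
        at_hpkb = hours_to_build(size, HPKB_RATE)
        at_required = hours_to_build(size, REQUIRED_RATE)
        print(f"{size:>9,} axioms: {at_hpkb:>8,.0f} h at 40/h, {at_required:>7,.0f} h at 400/h")
```

Under these assumptions a million-axiom knowledge base takes 25,000 hours at the HPKB rate and 2,500 hours at the required rate, a tenfold difference.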

Page 16

Other Tailored Approaches To Question Answering

Multiple Commercial/Research Groups are currently pursuing the Application of Question Answering Methods to:

• FAQ (Frequently Asked Questions)

• Help Desks / Customer Service Phone Centers

• Accessing Complex Sets of Technical Maintenance Manuals

• Integrating QA in Knowledge Management and Portals

• Wide variety of Other E-Business Applications

• Integrating QA Technology into post-secondary and lifelong learning strategies – The Learning Federation

• Vulcan, Inc. (Paul Allen’s Company) has established an independent R&D Program (HALO) in Knowledge-based QA – Seeking ultimate commercial applications

Page 17

AQUAINT: Advanced QUestion & Answering for INTelligence

Overarching Context / Operational Requirement

In a foreign news broadcast a team of analysts observe a previously unknown individual conferring with the Foreign Minister. They suspect that he/she is really a new senior advisor.

Intelligence Analysts ask:

• Who is this advisor?

• What do we know about him/her?

• What are his/her views?

• What influence does he/she have on FM?

• Does this signal that other policy changes are coming?

• And still more questions ???

Page 18

AQUAINT: Advanced QUestion & Answering for INTelligence

Interpreting Complex QA Scenario within a Larger Context

Diagram (Advanced QA processing flow):

• Information Analysts pose questions within an Overarching Context / Operational Requirement: Factoid Questions; Why Questions; Interpretive Questions; Judgement Questions; Predictive Questions; Other Questions

• Question Understanding and Interpretation

• System Specific Queries; Fully Tailored to Series of Questions

• Extend Traditional Information Retrieval over Multiple Heterogeneous Data Sources: Text; Voice; Multi-Media; Structured; Other

• Ranked Lists of “Relevant” Data Objects

• Deeper, Automated Understanding; Extract & Analyze Results

• Determine the Answer

• Interpret Results and Formulate the Answer

• Provide Answers in a Form Analysts Want

Page 19

• What it is and What it is not . . .

– Question & Answering Aimed at the “Information Professional” --- Not just the Casual User

– Full Range of Questions --- Not just Factoid Questions

– Rich, Contextually-based Question Scenarios --- Not just Isolated Questions

– Open Domain, Multiple Media, Multiple Languages, Multiple Genres, Structured and Unstructured Data --- Not just a Focused Data Environment

AQUAINT: Advanced QUestion & Answering for INTelligence

Page 20

Outline

• Overview of ARDA

• Overview of Information Exploitation Thrust

• AQUAINT R&D Program

– What it is and is not

– Technical Challenges

– State-of-the-Program

Page 21

Top 10 Challenges

1) Satisfy QA requirements of the “Professional” Information Analyst

2) Pursue QA Scenarios and not just isolated, factually based QA

3) Support a collaborative, multiple analyst environment

4) Sometimes SMALL things really matter and other times BIG things don’t

5) Advanced QA must attack the “Data Chasm”

6) Time is of the Essence

Page 22

Top 10 Challenges

7) Must extract, represent and preserve information uncovered when searching for answers

8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach

9) Expanding requirements for more advanced learning and reasoning methods/approaches

10) Discovering the correct answer will be hard enough; but crafting an appropriate, articulate, succinct, explainable response will be even harder

Page 23

Top 10 Challenges

1) Satisfy QA requirements of the “Professional” Information Analyst

Page 24

Professional Information Analysts: Target Audience for AQUAINT -- Who are They?

• For ARDA they are:

– Government and Military Analysts

• But they could also be:

– Investigative / “CNN-type” Reporters

– Financial Industry Analysts / Investors

– Historians / Biographers

– Lawyers / Law Clerks

– Law Enforcement Detectives

– And Others

Page 25

Professional Information Analysts: What do They have in Common?

• They are far more than just casual users of information

• They work in an information rich environment where they have access to large quantities of heterogeneous data

• They are almost always subject matter experts within their assigned task areas

• They track and follow a given event, scenario, problem, or situation for an extended period of time

• They are focused on their assigned task or mission and will do whatever it takes to accomplish it

• The end product that results from their analysis is often judged against the standards of: Timeliness, Accuracy, Usability, Completeness, Relevance

Page 26

Top 10 Challenges

1) Satisfy QA requirements of the “Professional” Information Analyst

2) Pursue QA Scenarios and not just isolated, factually based QA

Page 27

Implications of QA Scenarios

Diagram: Information Analysts pose Factoid Questions, Why Questions, Interpretive Questions, Judgement Questions, Predictive Questions, and Other Questions within an Overarching Context / Operational Requirement.

• Requires handling a Full Range of Complexity & Continuity of Questions

• Need to understand & track the analysts’ line of reasoning and flow of argument

• QA System requires significantly greater insight into the knowledge, desires, past experiences, likes and dislikes of the “Questioner”

• Place much higher value on recognizing and capturing “background” information

• Questioner/System dialogue is now more than just a means for clarification

Page 28

Top 10 Challenges

1) Satisfy QA requirements of the “Professional” Information Analyst

2) Pursue QA Scenarios and not just isolated, factually based QA

3) Support a collaborative, multiple analyst environment

Page 29

Collaboration within QA

• Standard Collaboration (From an Analyst Perspective)

– Who else is working all or a portion of my task?

– What do they know that I don’t and vice versa?

– Can we share/work together?

• Non-Standard Discovery (From a System Perspective)

– Identify previous QA Scenarios that have “similarity” to current QA Scenario. Compare & Contrast

– Use / Build-on / Update previous results

– Uncover new data sources

– Borrow a successful “line of reasoning” or “argument flow”

– Alert analyst to different interpretations or to overlooked / undervalued data

Diagram: Question Understanding and Interpretation (Focus)

• Inputs: QUESTION????; Natural Statement of Question; Use of Multimedia Examples; Question & Requirement Context; Analyst Background Knowledge

• Processing: Query Assessment, Advisor, Collaboration

• Interactions: Clarification; Other Analysts

• Resources: Knowledge Bases; Technical Databases

Page 30

Top 10 Challenges

1) Satisfy QA requirements of the “Professional” Information Analyst

2) Pursue QA Scenarios and not just isolated, factually based QA

3) Support a collaborative, multiple analyst environment

4) Sometimes SMALL things really matter and other times BIG things don’t

Page 31

“Small & Big” - Can we tell the difference?

• Sometimes SMALL differences can produce significantly different results/interpretations:

– Stop Words: “Books {by; for; about} kids”

– Attachments: “The man saw the woman in the park with the telescope.”

– Co-reference: “John {persuaded; promised} Bill to go. He just left.” and “Mary took the pill from the bottle. She swallowed it.”

• Other times BIG differences can produce the same/similar results:

– “Name the films in which Denzel Washington starred.”

– “Denzel Washington played a leading role in which movies?”

– “In what Hollywood productions did Denzel Washington receive top billing?”
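
The stop-word example can be made concrete with a short sketch. It assumes a hypothetical stop-word list and a simple regular-expression tokenizer (both introduced here for illustration) and shows that dropping stop words makes the three “Books ... kids” queries indistinguishable, even though they ask for different things.

```python
import re

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"a", "about", "by", "for", "in", "the"}


def content_terms(query):
    """Lowercase the query, split it into alphabetic words, drop stop words."""
    words = re.findall(r"[a-z]+", query.lower())
    return [w for w in words if w not in STOP_WORDS]


if __name__ == "__main__":
    for q in ["Books by kids", "Books for kids", "Books about kids"]:
        print(q, "->", content_terms(q))
```

All three queries reduce to the same two content terms, so a retrieval step that discards stop words cannot tell them apart. The opposite problem, recognizing that the three differently worded Denzel Washington questions ask the same thing, needs paraphrase knowledge that term matching alone does not provide.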

Page 32

Top 10 Challenges

1) Satisfy QA requirements of the “Professional” Information Analyst

2) Pursue QA Scenarios and not just isolated, factually based QA

3) Support a collaborative, multiple analyst environment

4) Sometimes SMALL things really matter and other times BIG things don’t

5) Advanced QA must attack the “Data Chasm”

Page 33

Attacking the Data Chasm

Data Chasm:

• Missing Data

• MANY Heterogeneous Data Sources; All Types, Sizes, Locations

• Increasing Volumes (Petabyte & up)

• Synthesis Across Media/“Documents”

• Contradictory Data

Progression of Questions and Answers (Today to Future):

• Today -- Questions: Single, Factual, Isolated Questions. Answers: 50/250 Byte Passage from Single Text Document

• Level I -- Questions: Multi-Valued Factual Questions. Answers: Fixed Templates or Tabular Lists

• Level II -- Questions: Cross Media, Cross Document, Simple Judgement. Answers: Variable Narrative Summary; Multi-Media Presentations; Simple Interpreted Results

• Level III / Future -- Questions: Full Context-Based Question Scenario. Answers: Fully Intersected; Automatically Generated; Variable Structure/Format; Full Context Responses

Page 34

Top 10 Challenges

1) Satisfy QA requirements of the “Professional” Information Analyst

2) Pursue QA Scenarios and not just isolated, factually based QA

3) Support a collaborative, multiple analyst environment

4) Sometimes SMALL things really matter and other times BIG things don’t

5) Advanced QA must attack the “Data Chasm”

6) Time is of the Essence

Page 35

Time: Our Achilles Heel?

• Real Difficulties Exist in:

– Extracting, correctly interpreting time references & then creating manageable timelines

– Estimating & updating changing reliability of information over time

– Processing information in time sequence, e.g. tracking the details of an evolving event over time -- A whole different set of problems

• And of course:

– We can’t forget all of the issues related to the timeliness of the system’s response to our question(s) -- we’ll need at least “near real time responses”

Timeline graphic: March, April, May, June, July, August

Page 36

Top 10 Challenges

7) Must extract, represent and preserve information uncovered when searching for answers

Page 37

QA Scenarios: A Different Paradigm?

• Current Analytic Paradigm:

– Sequentially “Filter Down” to the final result

– Works when QA’s are independent, isolated activities

(Diagram labels: Data; Processing & Analysis; Results)

• A Different Paradigm may be useful when handling QA Scenarios:

– Cast a “wider net” while searching for “golden nuggets” (Answers)

– Automatically Extract, Represent, and Preserve “closely related” background information within context of the QA Scenario

– How Wide to Cast the “Net”? What Info to Retain? In what form? For how long?

(Diagram labels: Space of Data Objects and Sources; Answers; Background; Discarded)

Page 38

Top 10 Challenges

7) Must extract, represent and preserve information uncovered when searching for answers

8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach

Page 39

Complex QA: The Need for Ever Increasing Knowledge -- Of All Types

DIMENSIONS OF THE QUESTION PART OF THE QA PROBLEM: Context; Judgement; Scope (origin: Simple Factual Question)

DIMENSIONS OF THE ANSWER PART OF THE QA PROBLEM: Fusion; Interpretation; Multiple Sources (origin: Simple Answer, Single Source)

(Diagram labels on both parts: Advanced QA R&D Program; Increasing Knowledge Requirements **)

** Knowledge Requirement would be better represented with a whole “quiver of arrows” of different sizes, lengths and types

Page 40

Top 10 Challenges

7) Must extract, represent and preserve information uncovered when searching for answers

8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach

9) Expanding requirements for more advanced learning and reasoning methods/approaches

Page 41

Overarching Context / Operational Requirement

In a foreign news broadcast a team of analysts observe a previously unknown individual conferring with the Foreign Minister. They suspect that he/she is really a new senior advisor.

Information Analysts:

• Who is this advisor?

• What do we know about him/her?

• What are his/her views?

• What influence does he/she have on FM?

• Does this signal that other policy changes are coming?

• And still more questions ???

FOCUS: Improved Reasoning & Learning

Page 42

Improved Reasoning & Learning

Advanced Reasoning:

• Use Multi-level Plans

• Create and evaluate chains of reasoning

• Reason across heterogeneous data sources

• Infer answers from data extracted from multiple sources when the answer is not explicitly stated

• Utilize Link Analysis & Evidence Discovery

• Plus other strategies

Advanced Learning:

• Automatically learn new or modify existing reasoning strategies

(Diagram labels: New Senior Advisor; Associates; Follow-up Leads; Raw “Bio” Information: Education, Past Positions, Family, Travels, Other Activities; Collected Views: TV & Radio Broadcasts, Newspapers & Other Archives; Summarized Results: “Bio”, “Views: Past & Present”; Cross Fertilization)

Page 43

Top 10 Challenges

7) Must extract, represent and preserve information uncovered when searching for answers

8) Rapidly increasing importance of Knowledge of all types -- regardless of the approach

9) Expanding requirements for more advanced learning and reasoning methods/approaches

10) Discovering the correct answer will be hard enough; but crafting an appropriate, articulate, succinct, explainable response will be even harder

Page 44

Difficulties in Generating Answers

• Natural Language Generation continues to be a difficult, open research area.

– Adding the requirement to generate multimedia answers makes this problem even harder.

• Providing the ability to explain and/or justify answers also continues to be a difficult, open research area.

– The more complex the line or chain of reasoning, the more complex the explanation and/or justification

• In addition, QA Scenarios add another level of complexity. The same question asked by different end users within different scenarios could produce substantially different results because of the end users’ different backgrounds, perspectives, needs and desires:

– Different Answer content

– Different Answer format, structure, depth and/or breadth of coverage

– Or even both

Page 45

Outline

• Overview of ARDA

• Overview of Information Exploitation Thrust

• AQUAINT R&D Program

– What it is and is not

– Technical Challenges

– State-of-the-Program

Page 46

AQUAINT: R&D Focused on Three Functional Components

Diagram labels, grouped by component (flow: QUESTION???? in, FINAL ANSWER out):

1) Question Understanding and Interpretation

– Natural Statement of Question; Use of Multimedia Examples

– Question & Requirement Context; Analyst Background Knowledge

– Clarification; Query Assessment, Advisor, Collaboration; Other Analysts

– Supplemental Use of Knowledge Bases; Technical Databases

– Query Refinement based on Analyst Feedback

2) Determine the Answer

– Translate Queries into Source Specific Retrieval Languages; Multiple Source Specific Queries; KB Queries

– Multiple Sources; Multiple Media; Multi-Lingual; Multiple Agencies

– Partially Annotated & Structured Data; Automatic Metadata Creation

– Relevant “Documents”: Multiple Ranked Lists; Single, Merged Ranked List of Relevant “Documents”; Relevant “Knowledge”

– Relevant information extracted and combined where possible

– Accumulation of Knowledge across “Documents”

– Cross “Document” Summaries created

– Language/Media Independent Concept Representation

– Inconsistencies noted

– Proposed Conclusions and Inferences Generated

– Iterative Refinement of Results based on Analyst Feedback

3) Answer Formulation

– Proposed Answer; Answer Context; Question & Answer Context

– Results of Analysis: Formulate Answer for Analyst in form they want

– Multimedia Navigation Tools for Analyst Review
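
The diagram on this slide shows multiple ranked lists of relevant "documents" being combined into a single, merged ranked list, but it does not say how. As one possible illustration, the sketch below uses reciprocal rank fusion, a generic rank-merging technique that is swapped in here purely for demonstration and is not claimed to be what any AQUAINT project used. The function name `merge_ranked_lists`, the document ids and the constant `k=60` are all assumptions.

```python
from collections import defaultdict


def merge_ranked_lists(ranked_lists, k=60):
    """Merge several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) in every list where it appears
    (rank starts at 1); scores are summed across lists. Ties are broken
    by document id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from three source-specific queries.
per_source_results = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d5"],
]
print(merge_ranked_lists(per_source_results))
```

A rank-based merge like this sidesteps the problem that scores from different sources, media or languages are not directly comparable, which is one reason it suits the multi-source setting the slide describes.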

Page 47

Specifically Solicited Research Areas include:

1) Advanced Reasoning for Question Answering

2) Sharable Knowledge Sources

3) Content Representation

4) Interactive Question Answering Sessions

5) Role of Context

6) Role of Knowledge

7) Deep, Human Language Processing and Understanding

AQUAINT: Cross Cutting/Enabling Technologies R&D Areas

Page 48

AQUAINT: Separate, Coordinated Activities

AQUAINT Phase I Solicitation (diagram labels): QUESTION????; Question Understanding and Interpretation; Determine the Answer (Information Retrieval Process; Analysis & Synthesis Process); Answer Formulation; FINAL ANSWER; Cross Cutting/Enabling Technologies Research Issues

Page 49

AQUAINT Program Contractors

Legend: Original (16) + New (7)

Original: Carnegie Mellon Univ.; Univ. of Albany; Univ. of Massachusetts; BBN (2); IBM; Columbia Univ.; Rutgers Univ.; Princeton Univ.; Univ. of Texas-Dallas; Language Computer Corp. (2); Cycorp; SAIC; Univ. of Southern California / Info Science Institute; SRI; Stanford Univ.; Univ. of California-Berkeley; Univ. of Colorado-Boulder

New: HNC Software; New Mexico State University (2); Univ. of Maryland – Baltimore County (UMBC); CoGenTex; Language Computer Corp.; Univ. of Southern California / Info Science Institute; Carnegie Mellon Univ. (2)

Page 50

AQUAINT Phase I Projects (Fall 01 - Fall 03)

Total End-to-End Systems (6)

Organization | Title | Investigator | Topical Focus | Data Dimension | ARDA Agent

BBN Technologies | Answering Questions through Understanding and Analysis (AQUA) | Ralph Weischedel / Scott Miller | Total System | Focused (Text) | NSA

Carnegie Mellon University (Language Technology Institute) | JAVELIN: Justification-based Answer Valuation through Language Interpretation | Eric Nyberg / Jamie Callan / Jaime Carbonell | Total System | Multi-Lingual (Text) | DIA

Columbia Univ. / Univ. of Colorado, Boulder | Integrating Robust Semantics, Event Detection, Information Fusion, and Summarization for Multimedia Question Answering | Vasileios Hatzivassiloglou / Kathleen McKeown // Daniel Jurafsky / Wayne Ward / Jim Martin | Total System | Multi-Media (Text/Voice) | DIA

CyCorp. / IBM T.J. Watson Research Center | QUIRK: Question Answering (QU) = Information Retrieval (IR) + Knowledge (K) | Stefano Bertolo / David Gunning // John Prager | Total System | Structured / Unstructured | CIA

IBM T.J. Watson Research Center / Cycorp | Intelligent Question Answering (IQA) | David Ferrucci // Stefano Bertolo | Total System | Structured / Unstructured | NSA

SUNY/Univ. of Albany / Rutgers Univ. | HITIQA: High-Quality Interactive Question Answering | Tomek Strzalkowski // Paul Kantor | Total System | Focused (Text) | CIA

Page 51

AQUAINT Phase I Projects (Fall 01 - Fall 03)

Emphasis on One or more Advanced QA System Components (6)

Organization | Title | Investigator | Topical Focus | Data Dimension | ARDA Agent

Language Computer Corporation | Advanced Techniques for Answer Extraction and Formulation | Dan Moldovan / Sanda Harabagiu | Components | Focused (Text) | CIA

SAIC / Stanford University (Knowledge Systems Lab) | AQUAINT Question Answering (AQUA) System | Maureen Caudill / Barbara Starr // Richard Fikes | Components | Multiple Genre | CIA

SRI International | From Question-Answering to Information-Seeking Dialogs | Jerry Hobbs | Components | Focused (Text) + TREC Queries + AQUAINT Scenarios | CIA

University of Massachusetts | Relevance Models and Answer Granularity for Question Answering | Bruce Croft / James Allan | Components | Structured / Unstructured | NSA

University of Southern California (Information Science Institute) | TextMap: An Intelligent Question-Answering Assistant | Daniel Marcu / Ed Hovy / Kevin Knight | Components | Focused (Text) | DIA

University of Texas | Computational Implicatures for Advanced Question Answering | Sanda Harabagiu | Component | Focused (Text) | CIA

Page 52

AQUAINT Phase I Projects (Fall 01 - Fall 03)

Focused Effort -- Cross Cutting / Enabling Technologies (4)

Organization | Title | Investigator | Topical Focus | Data Dimension | ARDA Agent

BBN Technologies | Question Answering from Spontaneous Speech Data (Answer Spotting Component) | Herbert Gish / Rukmini Iyer | Enabling Tech: Answer Spotting in Speech | Multi-Lingual (Speech) | NSA

Language Computer Corporation | Just-In-Time Interactive Question Answering (JITIQA) | Sanda Harabagiu / Dan Moldovan | Enabling Tech: Interactive QA | Focused (Text) | CIA

Princeton University | WordNet Enhancements: Toward Version 2.0 | George Miller / Christiane Fellbaum | Enabling Tech: WordNet Enrichment | Not Applicable | NSA

University of California, Berkeley, ICSI, Stanford Univ. | QuASI: Question Answering using Statistics, Semantics, and Inference | Marti Hearst // Jerome Feldman // Chris Manning | Enabling Tech: Adv. Reasoning; Content Rep; Lang. Processing | Variety of Text Collections | CIA

Page 53

AQUAINT Phase I Projects (Summer 02 - Fall 03)

New Projects – June 2002 Starts (7)

Organization | Title | Investigator | Topical Focus | Data Dimension | ARDA Agent

New Mexico State Univ.; Univ. of Maryland-Baltimore County; CoGenTex | Meaning-Oriented Question-Answering with Ontological Semantics | Jim Cowie // Sergei Nirenburg // Tanya Korelsky | Total System | Multi-Lingual (Text) | CIA

University of Southern California (Information Science Institute) | Advanced Generation for Presenting Answers | Kevin Knight | Components | Focused (Text) | NSA

Carnegie Mellon Univ. | Q&A from Errorful Multimedia Information Streams | Howard Wactlar | Components | Multi-Media; Structured / Unstructured | NSA

Language Computer Corporation | Question Answering for the Web | Dan Moldovan | Component | Focused (Text & Web Pages) | CIA

Carnegie Mellon University (Language Technology Institute) | Mining the Web for Multimedia Q&A | Yiming Yang; Jaime Carbonell | Enabling Tech: Multi-media QA | Structured / Unstructured | DIA

HNC Software, Inc. | A New Mathematical Framework for Language Representation, Association, Processing, and Understanding | Robert Means | Enabling Tech: Language Representation, Understanding | Focused (Text) | NSA

New Mexico State Univ. | Aware: Investigating Interactive Question and Answering | Bill Ogden | Enabling Tech: Interactive QA | Multi-Lingual | CIA

Page 54

Gazetteer Exploitation for QA

• Internal Government Lab Project

• PIs: Beth Sundheim & Robert Irie, SPAWAR Systems Center

• Objectives: Provide the basis for advanced placename gazetteer use in NLP applications, particularly question-answering

1. Exploration of use of existing resources and creation of additional resources

2. Exploration of methods for enhancing the value of gazetteers to NLP systems

3. Promotion of community-wide discussion of placename analysis issues and uses for question answering

Page 55

Northeast Regional Research Center

• Conduct 6-8 week workshops on multiple ARDA-related challenge problems

• FY 2002 Workshops (Focus on AQUAINT Problems)

– Two Full Workshops Funded (Temporal Issues & Multiple Perspectives)

– One Mini Workshop to further explore challenge problem planned (Re-Use of Accumulated Knowledge)

• FY 2003 Workshops (Focus on AQUAINT & VACE Problems)

Hosted By MITRE, Bedford, MA

Administered by CIA

Page 56

FY2002 NRRC Wkshp Challenge Problems

1. TERQAS - Time & Event Recognition for Question Answering Systems

– Generate Sequence of events and activities along evolving timeline, resolving multiple levels of time references across series of documents/sources.

– Leader: James Pustejovsky, Brandeis University

NRRC Web Site: http://nrrc.mitre.org

TERQAS Web Site: http://time2002.org

Page 57

TERQAS Workshop Goals

• TimeML:

Define and Design a Metadata Standard for Markup of events, their temporal anchoring, and how they are related to each other in News articles.

• TIMEBANK:

Given the specification of TimeML, create a gold standard corpus of 300 articles marked up for temporal expressions, events, and basic temporal relations.
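
To give a feel for what "markup of events, their temporal anchoring, and how they are related" could look like in practice, here is a small Python sketch that reads an annotated sentence and lists each event with the time it is anchored to. The tag and attribute names (`EVENT`, `TIMEX3`, `TLINK`, `eid`, `tid`, `eventID`, `relatedToTime`, `relType`) are a simplified illustration loosely modeled on TimeML-style annotation; they are assumptions for this example and should not be read as the specification's exact schema. The sample sentence is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample with an illustrative, simplified tag set (not the official schema).
SAMPLE = (
    '<doc>The minister <EVENT eid="e1">resigned</EVENT> on '
    '<TIMEX3 tid="t1" type="DATE" value="2002-05-14">14 May 2002</TIMEX3>.'
    '<TLINK eventID="e1" relatedToTime="t1" relType="IS_INCLUDED"/></doc>'
)


def anchored_events(xml_text):
    """Return (event text, relation, normalized time value) for every link."""
    root = ET.fromstring(xml_text)
    # Index events and time expressions by their ids.
    events = {e.get("eid"): e.text for e in root.iter("EVENT")}
    times = {t.get("tid"): t.get("value") for t in root.iter("TIMEX3")}
    # Each link ties one event to one time expression.
    return [
        (events[link.get("eventID")], link.get("relType"), times[link.get("relatedToTime")])
        for link in root.iter("TLINK")
    ]


print(anchored_events(SAMPLE))
```

Keeping the normalized value (`2002-05-14`) separate from the surface text (`14 May 2002`) is what lets a QA system compare and order events that different documents describe in different words.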

Page 58

FY2002 NRRC Wkshp Challenge Problems

2. MPQA - Multiple Perspectives for Question Answering

– Develop approaches for handling situations where relevant information is obtained from multiple sources on the same topic but generated from different perspectives (e.g. cultural or political differences).

– Leader: Jan Wiebe, University of Pittsburgh

NRRC Web Site: http://nrrc.mitre.org

Page 59

Thinking about Multiple Perspectives

• Multiple Documents and Data Objects whose content was developed, created (either consciously or unconsciously) from a distinguishable perspective. In particular one or more of the following may apply to a document or other data object:

– Contains opinions or less-than-objective positions/views.

– Presents facts that have been filtered/selected in a manner that is intended to support or undermine a particular point of view. That is, facts are presented in a less-than-objective or more-subjective manner.

– Expresses beliefs, strong positions, or emotions that reflect positions grounded in or taken by more broadly identifiable cultural, political, social, economic, religious, ideological, or secular-based groups.

• Overt Examples: Editorials; Op-Ed Articles; Debates; Opinion-based Speeches (Political speeches); etc.

• Less Overt Examples: “News” reports published/produced by State-Run news organizations; Interviews; Press Releases; etc.

Page 60

Why the Interest in Multiple Perspectives?

• Can the analyst/QA system identify when information is being presented from a less-than-objective perspective?

• How does the presence of perspective affect the ability of an analyst/QA system to objectively judge the reliability or interpretability of a given document or data object in all or in part?

• What is the range of perspectives across different interested constituents on a particular topic, event, issue?

• How does the analyst handle, process, interpret nested perspectives?

• Is there a difference between publicly and privately stated perspectives?

• How does the perspective of a person, organization, country on a particular topic change over time?

• Can we detect mismatches between the agent’s stated perspective and the presumed beliefs, opinions, position of the larger group associated with that perspective?

Page 61

AQUAINT: Separate, Coordinated Activities

AQUAINT Phase I Solicitation (diagram labels): QUESTION????; Question Understanding and Interpretation; Determine the Answer (Information Retrieval Process; Analysis & Synthesis Process); Answer Formulation; FINAL ANSWER; Cross Cutting/Enabling Technologies Research Issues

Separate Coordinated Activities:

• Component Integration and System Architecture Issues

• Component Level / End-to-End Testing & Evaluation

• Annotated and ‘Ground Truthed’ Data

Page 62

Supporting Roles

Evaluation

User Testbed

Data / Operational Scenarios

Page 63

• Additional TREC Newspaper/Newswire Collection

– Newly assembled collection of English newswire text that spans the period from June, 1998 through September 2000, inclusive.

– Drawn from available sources: New York Times newswire, Associated Press Wordstream English newswire, and others.

– Collection should include at least 3 gigabytes of data, to be published in compressed form on a set of 2 CD-ROMs

– Collection is available through NIST for AQUAINT Program and TREC Program Evaluation participants and through LDC (Linguistic Data Consortium) for all others

AQUAINT: Newly Acquired Data Resources

Page 64

• Center for Non-Proliferation Studies at the Monterey Institute of International Studies

– Established in 1989 by Dr. William Potter in Monterey, CA

– Staff of 55 full-time specialists + over 60 graduate student research assistants

– “Strives to combat the spread of weapons of mass destruction (WMD) by training next generation of nonproliferation specialists and disseminating timely information and analysis”

– http://cns.miis.edu

• Multiple data collections available through CNS

– Data generally on the topics of non-proliferation and weapons of mass destruction

– Limited public availability through the Nuclear Threat Initiative Website

– Full access obtained for AQUAINT Program participants

AQUAINT: Newly Acquired Data Resources

Page 65

List of 8 Distinct Data Collections From CNS

1. Nuclear Development Abstracts – from 1986 through mid-2001; Database of more than 20,000 abstracts

2. Missile Development Abstracts – from 1990 through mid-2001, Database containing over 12,000 abstracts

3. Country Profiles - Equivalent of some 1,000 pages of text, diagrams and images on each country. The first profile (on North Korea) will be available shortly. Our current target is to complete four country profiles per year.

4. Newly Independent States (NIS) Nuclear Profiles - Organized by country and then by topic, the Profiles Database features information on fissile materials, export controls, nuclear facilities, Material Protection Control and Accounting programs, international nonproliferation regime participation, and nuclear-related government agencies.

Page 66

List of 8 Distinct Data Collections From CNS

5. Newly Independent States (NIS) Nuclear Trafficking Database - Highlights proliferation-significant cases of diversion and features abstracts of all reported instances of trafficking in nuclear and radioactive materials involving the Newly Independent States

6. China Profiles - Database on Chinese arms control and nonproliferation developments; includes hundreds of primary source documents (in both English and Chinese), extensive reference materials, bibliographic information, and comprehensive fact sheets

7. The Chemical and Biological Weapons & WMD Terrorism News Archive - Consists of links to and key excerpts from articles, testimony, newspaper and magazine articles, government reports, speeches and specialized news reporting services

8. The Monterey WMD Terrorism Database - MS Access-based database that records incidents around the world involving the acquisition and/or use by sub-state actors of weapons of mass destruction (WMD); Database includes over 675 incidents, drawn from more than 1,000 open sources covering 1900 to present.

Page 67

Additional Auxiliary Resources

• Extensive Profiler taxonomies from ICB:

– Covering the entire non-proliferation, weapons of mass destruction and terrorism fields (several hundred terms are broken down and cross-referenced at a variety of levels in a ‘topic-tree’ format).

• Complete keyword tree from Education Program

• Auxiliary resources are contained within the datasets outlined above:

– Glossaries of Chinese non-proliferation terms; including English names, Romanization of Chinese names and Chinese characters

– Russian-English Table of Proper Names for NIS Nuclear Enterprises

– Russian-English Table of Acronyms

Page 68

Additional Comments on CNS Data

• AQUAINT Program Participants will have full access to the “raw data” contained in all eight data collections + auxiliary data for the life of the AQUAINT Program

• Special Data Rights associated with Monterey WMD Terrorism Database

• AQUAINT Program has option of paying to obtain future data collection updates – Decision based upon the usefulness of this data source

• CNS has agreed to assist the AQUAINT Program in the development of AQUAINT Scenarios based upon the CNS data collections.

Page 69

AQUAINT Program Evaluation: Three Types of Evaluation

1. TREC QA Track

– Major metrics evaluation of AQUAINT will continue to be the TREC QA track

– Most, but not all, AQUAINT contractors expected to participate

– Main Task expected to become more difficult in future years (lists; context; etc.)

2. AQUAINT “Focused Evaluation Tasks”

– Focus on Question Type, Special Area, AQUAINT Unique Data

– First Year – try a pilot evaluation

– Each Subsequent Year – Conduct full-scale evaluation

– Most, but not all, full-scale evaluations open to non-AQUAINT parties

– Pilots being considered listed on next slide

3. End-to-end “test bed” evaluations that will focus on integration & usability issues (Earliest: FY 2003)
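
The slide does not name the metric behind the "major metrics evaluation". For illustration only, the sketch below computes mean reciprocal rank (MRR), which was used to score ranked answer lists in early TREC QA tracks; whether it applies to any particular AQUAINT evaluation is an assumption. The function name, the judgment callback and the sample questions are hypothetical.

```python
def mean_reciprocal_rank(ranked_answers, is_correct):
    """Average, over questions, of 1 / rank of the first correct answer.

    ranked_answers maps a question id to the system's ranked answer list.
    is_correct(question_id, answer) returns True for an acceptable answer.
    A question with no correct answer in its list contributes 0.
    """
    if not ranked_answers:
        return 0.0
    total = 0.0
    for question_id, answers in ranked_answers.items():
        for rank, answer in enumerate(answers, start=1):
            if is_correct(question_id, answer):
                total += 1.0 / rank
                break
    return total / len(ranked_answers)


# Hypothetical answer key and system output.
answer_key = {"q1": {"Paris"}, "q2": {"1969"}, "q3": {"Everest"}}
system_output = {
    "q1": ["Paris", "Lyon"],   # correct at rank 1
    "q2": ["1968", "1969"],    # correct at rank 2
    "q3": ["K2", "Denali"],    # no correct answer
}
score = mean_reciprocal_rank(system_output, lambda q, a: a in answer_key[q])
print(round(score, 3))
```

A rank-based score like this suits isolated factual questions; the list, context and scenario tasks anticipated on this slide would need different measures.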

Page 70

AQUAINT “Focused Evaluation Tasks”

• FY 2002 AQUAINT Program Pilot Evaluations:

– Dialog for QA

– Definitional questions (who is, what is)

– Questions about Relationships or Cause-and-Effect

• New FY 2003 Pilots

– QA Systems within a Fixed Domain

– Answer Explanation/Justification

– QA Systems Accessing Multi-Lingual Data

• New FY 2004 Pilots

– Questions asking for opinions

– Questions with No Answer or Only a Partial Answer

– QA Systems Accessing Multi-Media Data

Page 71

AQUAINT: User Testbed / System Integration

• Pull together best available system components emerging from AQUAINT Program research efforts

– Couple AQUAINT components with existing GOTS and COTS software

• Develop end-to-end AQUAINT prototype(s) aimed at specific Operational QA environments

• Government-led effort:

– Directly Linked into Sponsoring Agency’s Technology Insertion Organizations

– Close, working relationship with working Analysts

– Provide external system development support

– Mitre/Bedford will lead External System Integration / Testbed efforts

– Plan to also utilize additional external researchers as Consultants / Advisors

Page 72

AQUAINT Program Executive Committee

• AQUAINT Program Executive Committee formed in January 2002; Meets Monthly

• Intelligence Community Members:

– ARDA (John Prange; Paul Matthews-SETA)

– CIA (John Donelon, Steve Maiorano, Jean-Michel Pomarede-SETA)

– DIA (Kelcy Allwein)

– NIMA (Duncan McCarthy, Charlie Kim)

– NSA (Carol Van Ess-Dykema, J.K. Davis, Mike Blair-SETA)

• Other Members/Advisors:

– NIST (Donna Harman, Ellen Voorhees)

– MITRE (Scott Mardis, John Burger)

– Tarragon Consulting (Richard Tong)

Page 73

Visits to AQUAINT Project Sites

• Goal is that each AQUAINT Project will host a Project Review between each Program Workshop; Scheduled by Government Project COTR

• Multiple Project Reviews Grouped by geographic area so that it is cost/time effective for other interested government parties to attend

• My Goal as Program Manager is to visit each Project Site at least once per year; Will attempt to extend this to subcontractor sites as well.

• Status: Each current AQUAINT Project did have a project review visit since December Meeting; Program Manager has visited 15 of 16 original project sites.

Page 74

Coordination with Other Government Sponsors of Information Technology R&D

• Periodic Coordination Meetings

• Program Managers Involved

– DARPA: Charles Wayne, Ted Senator

– NSF: Gary Strong

– ITIC: Art Becker

– ARDA: John Prange, Greg Smith

– Intelligence Community: Steve Dennis; J.K. Davis

• Programs Covered

– DARPA: TIDES, EARS, EELD, Info Awareness + Others

– NSF: Information Technology Research - ITR

– ITIC: KD-D

– ARDA: AQUAINT, NIMD + Others

– IC: ACE + Others

• Some Coordination with DARPA’s DAML and RKF Programs

Other External Visits

• Vulcan, Inc., Seattle, WA

– Visited July 2002

– In 1986, Paul G. Allen founded Vulcan Inc. with Jody Patton and William Savoy to manage his personal and charitable endeavors

– Project HALO

– http://vulcan.com

• Question Generation and Answering Systems R&D for Technology-Enabled Learning Systems Workshop

– Held 4-5 October 2002, Univ. of Memphis, Memphis, TN

– Sponsored by Federation of American Scientists

– http://www.fas.org

– http://www.learningfederation.org/overview.hml

Other External Visits (continued)

• Google, Mountain View, CA

– Visited October 2002

– Seeking increased/improved access to the Google Search Engine, results and other related information

• Center for Non-Proliferation Studies, Monterey, CA

– Visited November 2002

– Small, Focused Workshop between CNS and AQUAINT Program to explore ways of using newly acquired CNS Databases

– http://cns.miis.edu/

– http://cns.miis.edu/dbinfo/

External Question Answering Workshops

• Workshop on New Directions in Question Answering

AAAI Spring Symposium

24-26 March 2003 (Mon-Wed)

Stanford University

Palo Alto, CA

Mark Maybury, MITRE

Workshop Chair

http://www.aaai.org/Symposia/Spring/2003/sss-03.html

Related External Conferences

• HLT (Human Language Technology) Conferences

– Sponsored by NAACL & Government Funders DARPA, NSF, and ARDA; SIGIR and ISCA represented on HLT Advisory Committee

– HLT-2003: Univ. of Alberta, Edmonton, Alberta, Canada, 27 May - 1 June 2003, http://www.hlt03.org

• Text REtrieval Conferences (TREC) Question Answering (QA) Track

– TREC managed by NIST

– Co-Sponsored by DARPA and ARDA

– TREC-2003: NIST, Gaithersburg, MD, 18-21 November 2003, http://trec.nist.gov

Schedule of Future AQUAINT Program Phase I Workshops

• Phase I 18-Month Workshop

10-12 June 2003 (Tues-Thurs) *

Shelter Pointe Hotel & Marina

1551 Shelter Island Drive

San Diego, CA 92106

(619)-221-8000

• Phase I 24-Month Workshop

Week of 1-5 December 2003

NIST, Gaithersburg, MD

* Monday Evening Reception on 9 June

Contact Information

Dr. John Prange, AQUAINT Program Director

• ARDA Web Pages: http://www.ic-arda.org

• Email [email protected] [email protected]

• Phones: 301-688-7092; 800-276-3747; 301-688-7410 (Fax)

• Mailing: ARDA, Room 12A69 NBP#1, STE 6644, 9800 Savage Road, Fort Meade, MD 20755-6644