Slide 2
Overview
• Project Halo is a staged research and development effort towards a digital Aristotle:
  • An application capable of providing user-appropriate answers and justifications for questions in an ever-growing number of domains
Slide 3
Step One: The Halo Pilot
• Goals:
  • To investigate the state of the art in knowledge representation and reasoning (KRR), especially "deep reasoning"
  • To identify leaders in the field and to get to know them well
  • To quickly come up to speed on the algorithmic and technical issues critical for good program management
  • To establish and "test run" an evaluation methodology
  • To determine a roadmap for possible future research
• Complete scientific transparency within the program and ultimately with the entire scientific community
• Limited/tight timeframe (6 months)
Slide 4
Domain/Syllabus Selection
• 50 pages from the AP chemistry syllabus (stoichiometry, reactions in aqueous solutions, acid-base equilibria)
  • Small and self-contained enough to be doable in a short period of time
  • Large enough to create many novel questions
  • Complex "deep" combinations of rules
• Standardized exam with well-understood scores (AP1-AP5)
• Chemistry is an exact science, more "monotonic"
• No undue reliance on graphics (no free-body diagrams)
• Availability of experts for exam generation and grading
Slide 5
Team Selection
• Selective Call-For-Proposals
  • Solid track record of relevant .gov and industrial funding
  • Significant number of man-years invested in existing relevant technology
  • World-class team
  • Responsiveness of proposal to the CFP
  • Bid within guidelines and expectations
  • Ability to work within the Project Halo contractual environment
• Funded teams: Cycorp, SRI and Ontoprise
Slide 6
Climbing a Steep Hill
• Vulcan had little background in question answering prior to Project Halo
• Hundreds of hours were dedicated to three rounds of training:
  • General primer in AI
  • Algorithmic training from each team
  • Tools and admin training from each team
• At the end, Vulcan was capable of encoding questions using each team's formal language
Slide 7
The Challenge
• Each team had four months to develop their chemistry question answering applications
• At the end of this time, the systems were sequestered and the exam was released
• Each team had two weeks to create formal encodings of the 100 questions (169 total subparts) in three sections (MC, DA, FF)
• Formal encodings were evaluated by committee for fidelity against the original English
• Encoded questions were run in batch on the sequestered systems, generating answers and justifications in English
• These answers were distributed to three SMEs for grading
Slide 9
Metrics
• "Coverage": the ability of the system to answer novel questions from the entire specified syllabus
  • What percentage of the question types was the system capable of reliably answering?
• "Justification": the ability to provide concise, user- and domain-appropriate explanations
  • What percentage of the answer justifications was acceptable to domain evaluators?
• "Query encoding": the ability to robustly represent queries
  • Were the question encodings faithful to the original English? How sensitive were the systems to these encodings?
• "Brittleness": the ability to describe, measure, and defeat major sources of brittleness
  • What were the major causes of failure? How can these be remedied?
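As an illustration of how the first of these metrics could be operationalized, here is a minimal sketch; the question-type names below are invented for the example, not taken from the Halo syllabus:

```python
# Hypothetical sketch: "coverage" as the percentage of syllabus question
# types the system can reliably answer. Type names are illustrative only.

def coverage(reliably_answered: set[str], all_types: set[str]) -> float:
    """Percentage of the syllabus question types answered reliably."""
    if not all_types:
        return 0.0
    return 100.0 * len(reliably_answered & all_types) / len(all_types)

all_types = {"stoichiometry", "net-ionic", "acid-base", "limiting-reagent"}
answered = {"stoichiometry", "acid-base"}
print(coverage(answered, all_types))  # 50.0
```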
Slide 10
Examples of Question Encodings
• MC2. When lithium metal is reacted with nitrogen gas, under proper conditions, the product is:
(a) no reaction occurs
(b) LiN
(c) Li2N
(d) Li3N
(e) LiN3
Slide 11
F-logic Encoding
• Encoded question:

m1:Reaction[hasReactants->>{"Li","N"}; enforced->>TRUE].
answer("A") <- exists X m1[hasProducts->>X] and not equal(X,"LiN")
    and not equal(X,"Li2N") and not equal(X,"Li3N") and not equal(X,"LiN3").
answer("B") <- m1[hasProducts->>"LiN"].
answer("C") <- m1[hasProducts->>"Li2N"].
answer("D") <- m1[hasProducts->>"Li3N"].
answer("E") <- m1[hasProducts->>"LiN3"].
FORALL X <- answer(X)
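For readers unfamiliar with F-logic, the same answer-selection logic can be rendered in ordinary Python; the function and data names below are hypothetical, not part of any team's system, and the product set is supplied rather than derived by reasoning:

```python
# Illustrative Python rendering of the F-logic MC2 rules: pick the answer
# choice whose formula appears among the predicted reaction products; if
# none does, fall back to (a) "no reaction occurs".
# A real system would derive the products; here they are passed in directly.

CHOICES = {"B": "LiN", "C": "Li2N", "D": "Li3N", "E": "LiN3"}

def answer(products: set[str]) -> str:
    for letter, formula in CHOICES.items():
        if formula in products:
            return letter
    return "A"  # none of the listed products formed

print(answer({"Li3N"}))  # D
```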
Slide 12
KM Encoding

(every QF2 has
  (context ((a Reaction with
    (raw-material
      ((a Chemical with
         (has-basic-structural-unit
           (((a Metal) & (an instance of
              (the output of (a Compute-Element-from-Name with
                (input ("Lithium")))))))))
       (a Chemical with
         (has-basic-structural-unit
           (((a Molecular-Compound with
              (has-chemical-formula
                ((a Chemical-Formula with
                   (term ((:seq (:pair 2 N)))))))))))
       (state ((a State-Value with (value (*gas)))))))))))
Slide 13
KM Encoding (Cont)

(output ((forall (the atomic-chemical-formula of
            (the has-basic-structural-unit of
              (the result of (the context of Self))))
    (if ((the elements of (the term of It)) = (:set (:pair 1 Li) (:pair 1 N)))
        then "(b) LiN"
    else (if ((the elements of (the term of It)) = (:set (:pair 2 Li) (:pair 1 N)))
        then "(c) Li2N"
    else (if ((the elements of (the term of It)) = (:set (:pair 3 Li) (:pair 1 N)))
        then "(d) Li3N"
    else (if ((the elements of (the term of It)) = (:set (:pair 1 Li) (:pair 3 N)))
        then "(e) LiN3"
    else "(a) no reaction occurs")))))
  (comm [QF2-output-1] Self)))))
Slide 14
CYCL Encoding

(implies
  (and
    (chemicalReactants-TypeType ?REACTION Nitrogen)
    (chemicalReactants-TypeType ?REACTION (ElementalSubstanceFn Lithium))
    (ionicDecomposition ?LI2N LithiumIon 2 NitrideIon 1)
    (ionicDecomposition ?LI3N LithiumIon 3 NitrideIon 1)
    (ionicDecomposition ?LIN LithiumIon 1 NitrideIon 1)
    (ionicDecomposition ?LIN3 LithiumIon 1 NitrideIon 3))
  (thereExists ?COMPOUND
    (thereExists ?LI-NUM
      (thereExists ?N-NUM
        (thereExists ?LI-CHARGE
          (thereExists ?N-CHARGE
            (and
Slide 15
CYCL Encoding (Cont.)

              (relationAllInstance chargeOfObject LithiumIon
                (ElectronicCharge ?LI-CHARGE))
              (relationAllInstance chargeOfObject NitrideIon
                (ElectronicCharge ?N-CHARGE))
              (ionicDecomposition ?COMPOUND LithiumIon ?LI-NUM NitrideIon ?N-NUM)
              (evaluate 0 (PlusFn (TimesFn ?LI-CHARGE ?LI-NUM)
                                  (TimesFn ?N-CHARGE ?N-NUM)))
              (goodChoiceAmongSentences ?ANSWER
                (TheList
                  (not (thereExists ?REACTION-2
                         (and (chemicalReactants-TypeType ?REACTION-2 (GaseousFn Nitrogen))
                              (chemicalReactants-TypeType ?REACTION-2 (ElementalSubstanceFn Lithium)))))
                  (equals ?COMPOUND ?LIN)
                  (equals ?COMPOUND ?LI2N)
                  (equals ?COMPOUND ?LI3N)
                  (equals ?COMPOUND ?LIN3))))))))))
Slide 16
Evaluating Encodings
• High-fidelity encodings do not add or delete relevant chemical knowledge from the original English.
• The encoding committee reviewed all encodings to verify that they were "high fidelity."
• A second criterion was "automatability": the likelihood that encodings could be produced automatically from English, given today's state of the art.
Slide 17
Challenge Results
• All three teams produced challenge results
• All SMEs graded all the results
• Each question part received separate grades for answers and justifications
• The grade range for each question part was 0, 0.5, or 1 for answers, and likewise for justifications
• Graders were given guidelines to be as "AP-like" as possible
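The grading scheme above can be sketched as a simple aggregation; the per-part grades in the example are invented, not actual challenge data:

```python
# Hypothetical sketch of score aggregation: each question part earns
# 0, 0.5, or 1 for its answer and likewise for its justification;
# a section score is the total earned as a percentage of the parts graded.
# The grade lists below are illustrative, not real Halo results.

def section_score(grades: list[float]) -> float:
    assert all(g in (0, 0.5, 1) for g in grades), "grades must be 0, 0.5, or 1"
    return 100.0 * sum(grades) / len(grades)

answer_grades = [1, 0.5, 0, 1]           # per-part answer grades
justification_grades = [0.5, 0.5, 0, 1]  # per-part justification grades
print(section_score(answer_grades))         # 62.5
print(section_score(justification_grades))  # 50.0
```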
Slide 18
Results: MC Section
• Features 50 multiple-choice questions (MC1-MC50)
• MC3. Sodium azide is used in air bags to rapidly produce gas to inflate the bag. The products of the decomposition reaction are:
(a) Na and water
(b) Ammonia and sodium metal
(c) N2 and O2
(d) Sodium and nitrogen gas
(e) Sodium oxide and nitrogen gas
Slide 19
MC Results
[Bar charts: MC Justification Scores (axis 0-60%) and MC Answer Scores (axis 0-80%), per grader (SME1-SME3), for Cycorp, Ontoprise, and SRI]
Slide 20
Results: DA Section
• Features 25 multi-part questions (DA1-DA25)
• DA1. Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination:
(a) C4H10 + O2 → CO2 + H2O
(b) KClO3 → KCl + O2
(c) CH3CH2OH + O2 → CO2 + H2O
(d) P4 + O2 → P2O5
(e) N2O5 + H2O → HNO3
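DA1(a) balances as 2 C4H10 + 13 O2 → 8 CO2 + 10 H2O. A minimal sketch of how such a balance can be verified by atom counting (the coefficients are supplied as assumptions, not solved for by the code):

```python
# Verify a proposed balancing of DA1(a) by counting atoms on each side.
# Formulas are written as element -> count dicts; the coefficients (2, 13,
# 8, 10) are the assumed balanced values, not computed by this sketch.

from collections import Counter

def side_atoms(terms):
    """Total atom counts for one side: terms is [(coefficient, formula-dict)]."""
    total = Counter()
    for coeff, formula in terms:
        for elem, n in formula.items():
            total[elem] += coeff * n
    return total

lhs = [(2, {"C": 4, "H": 10}), (13, {"O": 2})]          # 2 C4H10 + 13 O2
rhs = [(8, {"C": 1, "O": 2}), (10, {"H": 2, "O": 1})]   # 8 CO2 + 10 H2O
print(side_atoms(lhs) == side_atoms(rhs))  # True
```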
Slide 21
DA Results
[Bar charts: DA Justification Scores and DA Answer Scores (axes 0-60%), per grader (SME1-SME3), for Cycorp, Ontoprise, and SRI]
Slide 22
Results: FF Section
• Features 25 multi-part questions (FF1-FF25)
• More qualitative, less computational
• FF2. Pure water is a poor conductor of electricity, yet ordinary tap water is a good conductor. Account for this difference.
Slide 23
FF Results
[Bar charts: FF Justification Scores (axis 0-30%) and FF Answer Scores (axis 0-45%), per grader (SME1-SME3), for Cycorp, Ontoprise, and SRI]
Slide 24
Total Results
[Bar charts: Challenge Justification Scores (axis 0-45%) and Challenge Answer Scores (axis 0-60%), per grader (SME1-SME3), for Cycorp, Ontoprise, and SRI]
Slide 25
Grader Comments
• Organization and brevity were the two major criticisms
  • Some of the justifications were over 16 pages long
  • Many of the arguments were used repetitively
  • Proofs took a long time to "get to the point"
  • In some multiple-choice cases, proofs involved invalidating all wrong answers rather than proving the right one
• Generalized proofs relied on instance-based solutions, reflecting a lack of meta-reasoning capability
• Gaps in the knowledge were evident; e.g., many of the teams had issues with net ionic equations
Slide 26
Brittleness Analysis: SRI
[Pie chart: SRI brittleness by cause — B-MOD-2: 107, B-IMP-1: 63.5, B-ANJ-3: 37.5, B-MTA-1: 26, Other: 261]
Slide 27
Brittleness Analysis: Cycorp

[Pie chart: Cycorp brittleness by cause — B-MOD-1: 19.6, B-MOD-2: 46.1, B-MOD-3: 20, B-MOD-4: 29.6, B-MGT-2: 25.1, B-INF-3: 46.5, B-ANJ-1: 61.5, Other: 408.5]
Slide 28
Brittleness Analysis: Ontoprise
[Pie chart: Ontoprise brittleness by cause — B-MOD-1: 487, B-IMP-1: 5.5, B-INF-2: 6, B-ANJ-1: 7, Other: 143]
Slide 32
Projections for the Next Iteration (3 Months)
• Same domain and scope:
  • AP-5 for multiple choice (~85%)
  • AP-4 for non-multiple-choice (DA & FF) (~65%)
Slide 33
Observations
• Per-page encoding cost was on the order of $10K for the 50 pages
• Encoding took highly expert teams 2 weeks of effort
• SRI relied most heavily on professional chemists and was the most thorough on its assembly process
• The Ontoprise platform was the fastest and most reliable (<2 hours); F-Logic was the most concise formal language
  • SRI took >5 hours and Cycorp >12 hours
• Cycorp's generative explanations were the most ambitious, but needed more domain-expert feedback
• Previously stated metrics, such as the number of concepts and relations, do not provide insight into coverage
Slide 34
Next Steps: Phase II
• Building tools to allow domain experts to encode robust knowledge
• Building tools to allow students to pose questions/problems
• Currently in pre-CFP design
• Required skills:
  • Knowledge Engineering
  • Knowledge Acquisition (against documents)
  • HCI and Human Factors