30
1 USC INFORMATION SCIENCES INSTITUTE Yolanda Gil IKRAFT IKRAFT: Interactive Knowledge Representation and Acquisition from Text Yolanda Gil Varun Ratnakar www.isi.edu/expect/projects/trellis trellis.semanticweb.org USC/Information Sciences Institute [email protected]

IKRAFT: Interactive Knowledge Representation and Acquisition from Text

  • Upload
    burian

  • View
    35

  • Download
    0

Embed Size (px)

DESCRIPTION

IKRAFT: Interactive Knowledge Representation and Acquisition from Text. Yolanda Gil Varun Ratnakar www.isi.edu/expect/projects/trellis trellis.semanticweb.org USC/Information Sciences Institute [email protected]. Motivation: How KBs Are Built Today. Domain Expert. Read/ask /study/listen. - PowerPoint PPT Presentation

Citation preview

Page 1: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

1USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

IKRAFT:Interactive Knowledge Representation and

Acquisition from Text

Yolanda GilVarun Ratnakar

www.isi.edu/expect/projects/trellis

trellis.semanticweb.org

USC/Information Sciences Institute

[email protected]

Page 2: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

2USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Motivation:How KBs Are Built Today

Knowledge Acquisition

Tools

Read/ask /study/listen...

…reason/deduce/solve

…analyze/group/index...

…structure/relate/fit...

KB

Domain Expert

Knowledge Engineer

Page 3: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

3USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Motivation:The Aftermath of Knowledge Base Development

Knowledge Acquisition

Tools…reason/deduce/solve

Read/ask /study/listen...

…analyze/group/index...

…structure/relate/fit...

KB

Domain Expert

Knowledge EngineerTRASH

Page 4: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

4USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Motivation:Capturing the Design of Knowledge Bases

((( )) ())))

Richer representationsMore ambiguous

More versatile

(defconcept bridge ()))

More formalMore concrete

More introspectible

Introductory texts, expert hints, explanations, dialogues, comments, examples, exceptions,...

Info. extraction templates,dialogue segments and pegs,filled-out forms, high-level connections,...

Alternative formalizations (KIF, MELD, RDF,…), alternative views of the same notion (e.g., what is a threat)

Descriptions augmented with prototypical examples & exceptions, problem-solving steps and substeps, ...

WWW

Page 5: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

5USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Claims Knowledge can be reused at any level of (in)formality Knowledge can be extended more easily

• Addt’l documents and semi-formal structures readily available Knowledge can be translated and integrated at any

level to facilitate interoperability• KR languages can be a straitjacket for some kinds of

knowledge Intelligent systems will provide better justifications

• Many users want to know where axioms came from before they trust system’s reasoning

Content providers will not need to be sophisticated programmers/knowledge engineers• May be easier for end users to organize knowledge rather

than formalize it• Good symbiosis of sophisticated and unsophisticated users

Page 6: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

6USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

An Example:Building a Knowledge Base from a Textbook(DARPA Rapid Knowledge Formation -- RKF)“…The first step a cell takes in reading out part of its genetic instructions is to

copy the required portion of the nucleotide sequence of DNA – the gene – into a nucleotide sequence of RNA. The process is called transcription because the information, though copied into another chemical form, is still written in essentially the same language – the language of nucleotides. Like DNA, RNA is a linear polymer made of four different types of nucleotides subunits linked together by phosphodiester bonds. It differs from DNA chemically in two respects: (1) the nucleotides in RNA are ribonucleotides – that is, they contain the sugar ribose (hence the name ribonucleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the bases adenine (A), guanine (G), and cytosine (C), it contains uracil (U) instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen-bonding with A, the base-pairing properties described for DNA also apply to RNA…”

-- Essential Cell Biology, Alberts et al. 1992

Page 7: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

7USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Protein Synthesis in RKF’s SHAKEN Authored by a Biologist [Chaudri et al 2001]

Page 8: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

8USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Step 1: Selecting Relevant Knowledge Fragments

“…The first step a cell takes in reading out part of its genetic instructions is to copy the required portion of the nucleotide sequence of DNA – the gene – into a nucleotide sequence of RNA. The process is called transcription because the information, though copied into another chemical form, is still written in essentially the same language – the language of nucleotides. Like DNA, RNA is a linear polymer made of four different types of nucleotides subunits linked together by phosphodiester bonds. It differs from DNA chemically in two respects: (1) the nucleotides in RNA are ribonucleotides – that is, they contain the sugar ribose (hence the name ribonucleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the bases adenine (A), guanine (G), and cytosine (C), it contains uracil (U) instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen-bonding with A, the base-pairing properties described for DNA also apply to RNA…”

-- Essential Cell Biology, Alberts et al. 1992

Page 9: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

9USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Step 2:Composing Stylized Knowledge Fragments- ribose - it is a kind of sugar, like deoxyribose - it is contained in the nucleotides of RNA - uracil - it is a kind of nucleotide, like adenine and guanine - it can base-pair with adenine - RNA - it is a kind of nucleic acid, like DNA - it contains uracil instead of thymine - it is single-stranded - it folds in complex 3-D shapes - nucleotides are linked with phospohodiester bonds, like DNA - there are many types of RNA - RNA is the template for synthesizing protein - its nucleotides contain the sugar ribose (DNA has deoxyribose) - gene - subsequence of DNA that can be used as a template to create protein - protein synthesis - non-destructive creation process: RNA and protein created from DNA - its speed is regulated by the cell - substeps: (ordered in sequence) 1) RNA transcription - a DNA fragment (a gene) is copied, just like DNA is copied during DNA synthesis - the result is an RNA chain 2) protein translation - RNA is used as a template

Page 10: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

10USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Step 3:Creating Knowledge Base Items

(defconcept uracil :is-primitive nucleotide

:constraints (:the base-pair adenine))

(defconcept RNA

:is (:and nucleic-acid

(:some contains uracil)))

Page 11: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

11USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

IKRAFT: Interactive Knowledge Representation and Acquisition from Text

User starts with documents, extracts a small amount of information from them• Text contains significant portions for

context/reference/recall IKRAFT allows users to annotate text with

statements, expressed in natural language• Highlight portions of original text, annotate statement• Statements tend to be stylized

Statements are parsed, system generates summary of:• Objects• Events/actions

Page 12: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

12USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

IKRAFT: Annotating Manual Information Extraction

Page 13: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

13USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

IKRAFT: Extracting Statements from Complementary/Contradictory Text Sources

Page 14: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

14USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

IKRAFT: Documenting Seismic Hazard in Southern California

Page 15: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

15USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Seismic Hazard Analysis (SHA) for Southern California Earthquake Center (SCEC)

Page 16: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

16USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

DOCKER: Scientist Publishes SHA Models

SCEContologies

AS97

msg

types

AS97 ontology

constrs

docs

User specifies: Types of model parameters Format of input messages Documentation Constraints

User Interface

ConstraintAcquisition

ModelSpecification

DOCKER

Web Browser

WrapperGeneration

(WSDL, PWL)

AS97

Page 17: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

17USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Documenting the Model with IKRAFT

Page 18: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

18USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Documenting Each Constraint

Page 19: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

19USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Formalizing Simple Constraints

Page 20: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

20USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Documentation of Constraints (Some Are Formalized, Some Are Not)

Page 21: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

21USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

DOCKER: Engineer Uses SHA Model

User Interface

Sharedontologies

AS97

msgtypes

AS97 ontology

constrsdocs

ConstraintReasoning

User can: Browse through SHA models Invoke SHA models Get help in selecting

appropriate model

KR&R(Powerloom)

ModelReasoning

PathwayElicitation

DOCKER

Web Browser

AS97

Page 22: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

22USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

DOCKER Detects Constraint Violations

Page 23: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

23USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Should Engineer Override Constraint Specified by Model Developer?

Page 24: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

24USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Engineer Brings Up IKRAFT to Find Reasons for the Constraint

Page 25: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

25USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Engineer Can Check Additional Model Constraints (Not Formalized)

Page 26: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

26USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Constraints Grounded on Model Documentation

Page 27: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

27USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Engineers Makes an Informed Decision on Whether to Override the Constraint

Page 28: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

28USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Discussion

Overhead in capturing the rationale?• Related to motivation and payoff• Rationale here is captured in a very simple process

Related Work:• Documenting design rationale [Shum 96]• Methodologies for knowledge base development

[Schreiber et al 00]• Higher-level languages, e.g., KARL [Fensel et al 98]

Page 29: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

29USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Conclusions and Future Work

IKRAFT helps users document formal expressions• Each formal expression is back up by a concise NL statement

that is linked back to one or more sources Users can understand justification for system’s

reasoning (e.g., SHA) Future work:

• NLP techniques to extract terms from user’s concise statements

• Controlled grammar for formulation of statements• Other documentation: e.g., tables, forms, exceptions

High payoff in capturing the rationale of knowledge bases

Page 30: IKRAFT: Interactive Knowledge Representation and Acquisition from Text

30USC INFORMATION SCIENCES INSTITUTE Yolanda GilIKRAFT

Speculation: Will the (Semantic) Web End Up Looking Like This?

((( )) ())))

Richer representationsMore ambiguous

More versatile

(defconcept bridge ()))

More formalMore concrete

More introspectible

Introductory texts, expert hints, explanations, dialogues, comments, examples, exceptions,...

Info. extraction templates,dialogue segments and pegs,filled-out forms, high-level connections,...

Alternative formalizations (KIF, MELD, RDF,…), alternative views of the same notion (e.g., what is a threat)

Descriptions augmented with prototypical examples & exceptions, problem-solving steps and substeps, ...