Indra: Emergent Ontologies from Text for Feeding Data to Simulations Deborah Duong Augustine Consulting TRAC-Monterey

Indra: Emergent Ontologies from Text for Feeding Data to Simulations

Deborah Duong

Augustine Consulting

TRAC-Monterey

Indra

• Uses Mutual Information to choose parse, assign word sense, and form ontologies based on context

• Iterative feedback finds global consensus on meaning, for accurate role discovery

• Flexible emergent ontologies form, combining data driven with hypothesis driven approaches

• Feedback facilitates data fusion with other modalities
  • A way to feed higher-level information back to lower-level extraction, introducing feedback to data fusion
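
The slides do not give Indra's exact mutual information formula. As a minimal sketch, assuming the standard pointwise mutual information (PMI) between a word and the context it appears in, the score could be computed as below. The function name `pmi_table` and the toy (word, context) pairs are illustrative assumptions, not Indra's data.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Return PMI(word, context) in bits for every observed (word, context) pair.

    PMI = log2( P(word, context) / (P(word) * P(context)) ).
    This is the textbook definition, assumed here for illustration.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    table = {}
    for (w, c), n in pair_counts.items():
        p_joint = n / total
        p_word = word_counts[w] / total
        p_ctx = ctx_counts[c] / total
        table[(w, c)] = math.log2(p_joint / (p_word * p_ctx))
    return table

# Toy observations (hypothetical): each pair is (word, context it was seen in).
pairs = [
    ("fork", "eat_with"), ("fork", "eat_with"), ("knife", "eat_with"),
    ("crouton", "eaten"), ("tomato", "eaten"), ("fork", "eaten"),
]
pmi = pmi_table(pairs)
```

A positive score means the word and context occur together more often than chance would predict; a negative score means less often. Here "fork" scores positive with "eat_with" and negative with "eaten".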

Language is Context Dependent

• Language is deeply context dependent, but most natural language programs run as “pipelines” that complete each stage before the next one starts

• Indra uses a feedback loop to let the parse, word sense assignment, and ontological assignments inform each other

• The result is a flexible data driven ontology that can be aligned with other models

Making “Sense” of Text

• “Word sense” of entities and their actions

• Inter-Document Coreference Resolution
  • Many ways of naming a person
  • Different persons may have the same name

• Link Normalization
  • Many ways of referring to a behavior
  • Different behaviors referred to with the same words

General Roles and Role Relationships

• Indra extracts general Role and Role relationships from text

• These Role and Role relationships are arranged in ontological groupings

• Iterative feedback allows different parts of the ontology to influence each other

• Iterative feedback makes the system deeply adaptive, so outside data can have widespread influence

Global Consensus on Sense

• Grouping of entities and links increases the information with each iteration

• With each iteration, the unsupervised scatter-gather finds the “sense” of named entities, identifying which individuals they are based on their roles

• As information corrects the senses of links and entities, and neighbors correct their neighbors, a global consensus on sense forms

• As links and entities are grouped, an emergent ontology is formed

Iterative Feedback introduced in stages

• Stage 1: Upper-lower feedback *Implemented
  • Larger clusters and smaller clusters influence each other

• Stage 2: Side-to-side feedback *Implemented
  • Node clusters and link clusters influence each other

• Stage 3: More Upper-lower feedback
  • Ontology and parse influence each other

• Stage 4: Feedback with external systems
  • Seed hypotheses from analysts and inference engines have wide influence

Stage 1: Upper-Lower Feedback

• Roles are clustered according to link contexts, and Role relations are clustered according to entity contexts
  • Two separate ontologies form

• Clusters at higher levels split clusters at lower levels
  • Essential for word sense (and “entity sense”)
  • For example, clusters for factories and autotrophs split the word “plant”

• Clustering algorithms are either agglomerative or divisive: “unsupervised scatter-gather” is both
  • Clusters split and divide until convergence
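
The “plant” example can be illustrated with a small sketch. The slides do not describe the unsupervised scatter-gather algorithm in detail, so the code below is not that algorithm. It only shows, under assumed names and toy data, how two higher-level cluster profiles (“factory” and “autotroph”) could split the occurrences of one word by cosine similarity of their contexts.

```python
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def split_by_higher_clusters(occurrences, cluster_profiles):
    """Assign each occurrence (a list of context words) to its closest cluster."""
    senses = {}
    for occ in occurrences:
        vec = Counter(occ)
        best = max(cluster_profiles,
                   key=lambda name: cosine(vec, cluster_profiles[name]))
        senses.setdefault(best, []).append(occ)
    return senses

# Hypothetical context profiles of two higher-level clusters.
cluster_profiles = {
    "factory": {"workers": 3, "built": 2, "production": 2},
    "autotroph": {"leaves": 3, "sunlight": 2, "water": 2},
}
# Hypothetical contexts in which the word "plant" was observed.
plant_occurrences = [
    ["workers", "production"],
    ["sunlight", "leaves"],
    ["built", "workers"],
    ["water", "sunlight"],
]
plant_senses = split_by_higher_clusters(plant_occurrences, cluster_profiles)
```

Each occurrence of “plant” ends up under the higher-level cluster whose contexts it shares, which gives the word two senses.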

Stage 2: Side-to-Side Feedback

• Stage 1 was clustering entities based on links and links based on entities

• Stage 2 is clustering entities based on link *clusters* and links based on entity *clusters*

• The separate Role and Role relationship ontologies of stage 1 become intertwined

• Needed for data smoothing and more consensus
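
A minimal sketch of why clustering on link *clusters* smooths the data, assuming toy entities, links, and a hypothetical link-cluster label `INGEST_WITH` (none of these come from Indra itself). Two entities that share no raw links can still share features once their links are replaced by cluster labels.

```python
from collections import Counter

def to_cluster_features(entity_links, link_cluster):
    """Re-express each entity's links as link-cluster labels.

    Links with no cluster yet keep their own name as the feature.
    """
    return {
        entity: Counter(link_cluster.get(link, link) for link in links)
        for entity, links in entity_links.items()
    }

def shared_features(a, b):
    """Number of features two entities have in common (minimum counts)."""
    return sum((a & b).values())

# Hypothetical raw links per entity (the stage 1 view).
entity_links = {
    "fork": ["eat_with", "stab_with"],
    "knife": ["consume_with", "cut_with"],
}
# Hypothetical link clusters found so far.
link_cluster = {"eat_with": "INGEST_WITH", "consume_with": "INGEST_WITH"}

raw = {e: Counter(links) for e, links in entity_links.items()}
smoothed = to_cluster_features(entity_links, link_cluster)

overlap_before = shared_features(raw["fork"], raw["knife"])
overlap_after = shared_features(smoothed["fork"], smoothed["knife"])
```

With raw links, “fork” and “knife” have nothing in common. With link clusters they share the `INGEST_WITH` feature, so they are more likely to end up in the same entity cluster.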

Stage 3: More Upper-Lower Feedback

• Choose parse based on ontology (parse already influences ontology in feedforward)

• Choose parse based on how common it is for similar words to be attached in that way

• Example:
  • Jane ate the salad with a fork
    • “with” modifies “ate”, because tools such as “forks” and “knives” are typically found to be used to “eat” or “consume”
  • Jane ate the salad with croutons
    • “with” modifies “salad”, because things that are “eaten” or “consumed” are typically foods such as “croutons” or “tomatoes”

• Later, instead of using a rule-based parser, use mutual information to parse (Yuret), making Indra purely statistical
  • Can be used with any language
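
The fork/croutons example can be sketched as a choice between two attachment sites, scored by how often words of the same ontological class were seen attached that way. The cluster labels (`CONSUME`, `TOOL`, `FOOD`), the counts, and the function name `choose_attachment` are all illustrative assumptions; the slides do not specify Indra's scoring.

```python
def choose_attachment(verb, noun, prep, obj, cluster_of, counts):
    """Decide whether the phrase "prep obj" attaches to the verb or the noun.

    Words are mapped to their ontology cluster first, so evidence from
    similar words ("knives", "tomatoes") supports unseen combinations.
    """
    obj_class = cluster_of.get(obj, obj)
    verb_score = counts.get((cluster_of.get(verb, verb), prep, obj_class), 0)
    noun_score = counts.get((cluster_of.get(noun, noun), prep, obj_class), 0)
    return "verb" if verb_score >= noun_score else "noun"

# Hypothetical ontology: word -> cluster label.
cluster_of = {
    "ate": "CONSUME",
    "fork": "TOOL", "knife": "TOOL",
    "salad": "FOOD", "croutons": "FOOD", "tomatoes": "FOOD",
}
# Hypothetical attachment counts: (head cluster, preposition, object cluster).
counts = {
    ("CONSUME", "with", "TOOL"): 12,
    ("CONSUME", "with", "FOOD"): 1,
    ("FOOD", "with", "FOOD"): 9,
}

fork_site = choose_attachment("ate", "salad", "with", "fork", cluster_of, counts)
crouton_site = choose_attachment("ate", "salad", "with", "croutons", cluster_of, counts)
```

Under these toy counts, “with a fork” attaches to the verb “ate” and “with croutons” attaches to the noun “salad”, matching the example above.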

Stage 4: Feedback with External Systems

• Purpose of feedback is deep adaptivity, so external data can influence the ontology and be easily fused

• Hypothesis Driven AND Data Driven Ontologies
  • If an analyst groups concepts:
    • Collocated paths found
    • These help develop analyst’s concept
    • More consonant concepts and paths found
  • RELATIVELY FEW points of correspondence needed
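
One way to picture the analyst loop is as seed expansion. The sketch below is an assumption about how such a loop might look, not Indra's method: starting from an analyst's small group of concepts, it finds paths that at least two group members share, adds concepts that share enough of those paths, and repeats. All names and the toy paths are hypothetical.

```python
from collections import Counter

def expand_concept(seed, concept_paths, min_shared=2):
    """Grow an analyst's concept group from a small seed.

    Returns the expanded group and the paths that characterize it.
    """
    group = set(seed)
    while True:
        # Paths collocated with at least two members of the current group.
        path_counts = Counter(
            p for c in group for p in concept_paths.get(c, ())
        )
        core_paths = {p for p, n in path_counts.items() if n >= 2}
        # Concepts outside the group that share enough of those paths.
        added = {
            c for c, paths in concept_paths.items()
            if c not in group and len(core_paths & set(paths)) >= min_shared
        }
        if not added:
            return group, core_paths
        group |= added

# Hypothetical concept -> paths data.
concept_paths = {
    "tenerife": {"hosts_match", "club_from", "league_table"},
    "zaragoza": {"hosts_match", "club_from"},
    "bilbao": {"hosts_match", "club_from", "league_table"},
    "oviedo": {"hosts_match", "league_table"},
    "shelling": {"reported_in"},
}
group, core_paths = expand_concept({"tenerife", "zaragoza"}, concept_paths)
```

With this toy data, two seed concepts are enough: “bilbao” joins in the first pass, which makes `league_table` a shared path, and that in turn brings in “oviedo” on the second pass. “shelling” is never added.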

Example Cluster

p:35805,n:34540.fes // Morocco city
p:35805,n:37114.tenerife // Spanish city
p:35805,n:37344.zaragoza // Spanish city, with football club
p:35805,n:37548.boavista // Portuguese island, with football club
p:35805,n:38590.maritimo // Portuguese sports club known for football team
p:43243,n:39997.
p:39997,n:29474.saccoh
p:39997,n:29612.spaho // Bosnia small town
p:39997,n:33375.spartak // Moscow football club
p:39997,n:34467.environmentastrit
p:39997,n:34721.haxhi // Albanian football player
p:43243,n:40629.tenerife
p:43243,n:41043.boavista
p:43243,n:42049.maritimo
p:46477,n:44423.bilbao // Basque city
p:46477,n:44563.centreleft // football position
p:49912,n:48979.oviedo // Spanish city

Example Cluster

p:49224,n:50682.tenerife
p:56352,n:53348.
p:53348,n:46799.
p:46799,n:40301.rayo // football club in Madrid
p:46799,n:41027.bilbao
p:53348,n:47751.bilbao
p:56352,n:53354.
p:53354,n:47225.shelling
p:56352,n:53766.shelling
p:56352,n:53814.spartak
p:56352,n:54104.youridjourkaeff
p:56352,n:54108.zaragoza
p:56352,n:54460.colo // Chile football club
p:56352,n:55076.kickoff // football term
p:65663,n:62554.
p:62554,n:60508.youridjourkaeff
p:83660,n:85323.youridjourkaeff
p:86579,n:84114.
p:84114,n:81134.
p:81134,n:75091.
p:75091,n:73692.deportivo // Spanish football club
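
The slides do not document the dump format. Assuming that `p:` is a parent cluster id, `n:` is a node id, the text after the dot is the word at a leaf (empty for an internal cluster), and `//` starts an annotation, the hierarchy could be read back as in this sketch. The names `parse_cluster_dump` and `words_under` are hypothetical; the sample lines are copied from the first example cluster.

```python
import re

# One entry per line: p:<parent>,n:<node>.<word> // optional note
ENTRY = re.compile(r"p:(\d+),n:(\d+)\.(\w*)\s*(?://\s*(.*))?$")

def parse_cluster_dump(text):
    """Return (children, labels): parent id -> child ids, node id -> word."""
    children, labels = {}, {}
    for raw in text.splitlines():
        match = ENTRY.match(raw.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        parent, node, word, _note = match.groups()
        children.setdefault(int(parent), []).append(int(node))
        if word:
            labels[int(node)] = word
    return children, labels

def words_under(node, children, labels):
    """Collect the words of all leaves below a cluster, depth first."""
    words = [labels[node]] if node in labels else []
    for child in children.get(node, []):
        words.extend(words_under(child, children, labels))
    return words

sample = """\
p:43243,n:39997.
p:39997,n:29474.saccoh
p:39997,n:29612.spaho //bosnia small town
p:39997,n:33375.spartak //Moscow football club
p:43243,n:40629.tenerife
"""
children, labels = parse_cluster_dump(sample)
cluster_words = words_under(43243, children, labels)
```

Under this reading, cluster 43243 contains the sub-cluster 39997 (saccoh, spaho, spartak) plus the leaf “tenerife”.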

Ontologies Problematic

• Indra will approximate the most likely (highest mutual information) ontology

• BUT, analysts want their own ontologies
  • Different experts look at the same data
  • Data stored in primitive entities and paths
  • Indra is to make a semantic model on the fly, tailored to the ontology of whoever is looking at it
  • Ontologies tailored towards the ontologies of particular simulation models

Hypothesis Driven AND Data Driven

• Indra can flexibly take in analyst input

• Indra can align its ontology to another with very few points of correspondence

• Indra can fill in the gaps

• Feedback gives Indra an advantage over other systems that generate ontologies:
  • Global consensus
  • Ability to adapt to any amount of user input