Comparing three models of scientific discourse annotation for enhanced knowledge extraction Anita de Waard, Maria Liakata, Paul Thompson, Raheel Nawaz and Sophia Ananiadou

Annotation systems


Comparing three annotation systems for marking up rhetorical moves


Page 6: Annotation systems

Accessing the knowledge in papers

- Papers are ‘Stories that persuade with data’

- So how is this persuasion done? Three ways of annotating key rhetorical moves:
  - Discourse segment types (de Waard, Elsevier/Utrecht)
  - Zones of conceptualisation using Core Scientific Concepts (Liakata, Aberystwyth/EBI)
  - Metaknowledge annotation of BioEvents (Thompson, Ananiadou et al., NACTeM/Manchester)

- Comparison of the three methods on a full-text paper

- What are the overlaps and differences? Can we combine them?

Page 7: Annotation systems

“Scientific articles are stories…”

| The Story of Goldilocks and the Three Bears | Story grammar | Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins |
|---|---|---|
| Once upon a time | Setting: Time | Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged. |
| a little girl named Goldilocks | Setting: Characters | Objects of study: the Drosophila Atx-1 homolog (dAtx-1), which lacks a polyQ tract, |
| She went for a walk in the forest. Pretty soon, she came upon a house. | Setting: Location | Experimental setup: studied and compared in vivo effects and interactions to those of the human protein |
| She knocked and, when no one answered, | Theme: Goal | Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood. |
| she walked right in. | Theme: Attempt | Hypothesis: Atx-1 may play a role in the regulation of gene expression |
| At the table in the kitchen, there were three bowls of porridge. | Episode 1: Name | Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies |
| Goldilocks was hungry. | Episode 1: Subgoal | Subgoal: test the function of the AXH domain |
| She tasted the porridge from the first bowl. | Episode 1: Attempt | Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1. |
| "This porridge is too hot!" she exclaimed. | Episode 1: Outcome | Results: Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells |
| So, she tasted the porridge from the second bowl. | Episode 1: Activity | Data: (data not shown), |
| "This porridge is too cold," she said. | Episode 1: Outcome | Results: both genotypes show many large holes and loss of cell integrity at 28 days |
| So, she tasted the last bowl of porridge. | Episode 1: Activity | Data: (Figures 1B-1D). |
| "Ahhh, this porridge is just right," she said happily and | Episode 1: Outcome | Results: Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles |
| she ate it all up. | Episode 1 | Data: (Figure 1F), |

Page 10: Annotation systems

“...that persuade (reviewers/readers)…”

| Aristotle | Quintilian | Description | Scientific paper |
|---|---|---|---|
| prooimion | Introduction / exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning |
| prothesis | Statement of Facts / narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question |
| | Summary / propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents |
| pistis | Proof / confirmatio | The main body of the speech, where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results |
| | Refutation / refutatio | As the name suggests, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work |
| epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications |

Page 12: Annotation systems


“... with data.”

Page 23: Annotation systems

Annotate: fine-grained models of argumentation
Method 1: Discourse Segment Types

Both seminomas and the EC component of nonseminomas share features with ES cells. To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells, we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), suggesting that miR-371-3 expression is a selective event during tumorigenesis.

The same passage split into clause-level segments:
- Both seminomas and the EC component of nonseminomas share features with ES cells.
- To exclude that
- the detection of miR-371-3 merely reflects its expression pattern in ES cells,
- we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
- In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
- suggesting that
- miR-371-3 expression is a selective event during tumorigenesis.

Segment types: Fact, Hypothesis, Method, Result, Implication, Goal, Reg-Implication

Realms: Conceptual knowledge, Experimental evidence
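As a rough illustration of Method 1, the segmented example can be held as a list of typed clauses and serialised into the same kind of `<segment segID=… section=… segtype=…>` markup that appears later in the deck. This is a minimal sketch, not the annotation tool used in this work: the `Segment` class, the `to_xml` helper and the section code `"R"` passed to it are hypothetical, and the type names follow this deck (with `RegImplication` spelled as in the later XML example).

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr

# Segment types named in this deck (Method 1); "RegImplication" is spelled
# as in the <segment> XML example later on.
SEGMENT_TYPES = {
    "Fact", "Problem", "Hypothesis", "Goal", "Method",
    "Result", "Implication", "RegImplication",
}


@dataclass(frozen=True)
class Segment:
    seg_id: int
    text: str
    segtype: str

    def __post_init__(self) -> None:
        # Reject labels outside the segment-type inventory.
        if self.segtype not in SEGMENT_TYPES:
            raise ValueError(f"unknown segment type: {self.segtype!r}")


# The example passage, one entry per clause-level segment.
EXAMPLE = [
    Segment(1, "Both seminomas and the EC component of nonseminomas share features with ES cells.", "Fact"),
    Segment(2, "To exclude that", "Goal"),
    Segment(3, "the detection of miR-371-3 merely reflects its expression pattern in ES cells,", "Problem"),
    Segment(4, "we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).", "Method"),
    Segment(5, "In many of the miR-371-3 expressing seminomas and nonseminomas, "
               "miR-302a-d was undetectable (Figs S7 and S8),", "Result"),
    Segment(6, "suggesting that", "RegImplication"),
    Segment(7, "miR-371-3 expression is a selective event during tumorigenesis.", "Implication"),
]


def to_xml(segments, section):
    """Serialise segments as consecutive <segment> elements (hypothetical helper)."""
    return "".join(
        f"<segment segID={quoteattr(str(s.seg_id))} section={quoteattr(section)} "
        f"segtype={quoteattr(s.segtype)}>{escape(s.text)}</segment>"
        for s in segments
    )


# "R" is a placeholder section code chosen for illustration only.
print(to_xml(EXAMPLE[:2], section="R"))
```

Validating the label in `__post_init__` keeps typos such as `Reg-Implication` versus `RegImplication` from silently entering a corpus.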

Page 32: Annotation systems

Segment types point to realms of discourse:

(1) Fact: Both seminomas and the EC component of nonseminomas share features with ES cells.
(2) a. Goal: To exclude that
(2) b. Problem: the detection of miR-371-3 merely reflects its expression pattern in ES cells,
(2) c. Method: we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
(3) a. Result: In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
(3) b. Regulatory-Implication: suggesting that
(3) c. Implication: miR-371-3 expression is a selective event during tumorigenesis.

Concepts, models, ‘facts’: present tense
Experiment: past tense
Transitions: present tense
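The tense pattern on this slide suggests a cheap consistency check: once each segment type is assigned to a realm, the clause's tense should match that realm. The sketch below is a toy heuristic under stated assumptions, not part of the published scheme: the `REALM` grouping (in particular placing Goal and Regulatory-Implication under "transition") and the tiny auxiliary-verb list are hypothetical, and the "-ed" rule will misfire on words such as "indeed".

```python
import re

# Hypothetical grouping of segment types into realms, following the slide:
# concepts/models/facts -> present, experiment -> past, transitions -> present.
REALM = {
    "Fact": "concept", "Problem": "concept", "Hypothesis": "concept",
    "Implication": "concept",
    "Method": "experiment", "Result": "experiment",
    "Goal": "transition", "RegImplication": "transition",
}
EXPECTED_TENSE = {"concept": "present", "experiment": "past", "transition": "present"}

# Tiny illustrative lexicon; a real system would use a part-of-speech tagger.
PAST_AUX = {"was", "were", "had", "did"}


def guess_tense(clause: str) -> str:
    """Crude guess: 'past' if a past auxiliary or an -ed word occurs, else 'present'."""
    words = re.findall(r"[a-z]+", clause.lower())
    if any(w in PAST_AUX or (w.endswith("ed") and len(w) > 4) for w in words):
        return "past"
    return "present"


def tense_matches(segtype: str, clause: str) -> bool:
    """True when the guessed tense agrees with the tense expected for the segment's realm."""
    return guess_tense(clause) == EXPECTED_TENSE[REALM[segtype]]


for segtype, clause in [
    ("Fact", "Both seminomas and the EC component of nonseminomas share features with ES cells."),
    ("Method", "we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004)."),
    ("Result", "In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable"),
    ("Implication", "miR-371-3 expression is a selective event during tumorigenesis."),
]:
    print(segtype, guess_tense(clause), tense_matches(segtype, clause))
```

A mismatch flagged by such a check is only a prompt for a human annotator to look again, not evidence that the label is wrong.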

Page 34: Annotation systems

Method 2: Annotate with the Core Scientific Concepts (CoreSC) Annotation Scheme

A three-layer, ontology-motivated scheme for sentence annotation, which views a paper as the human-readable representation of a scientific investigation [Liakata et al. 2010], with 45-page guidelines [Liakata & Soldatova 2008]

1st layer: Core Scientific Concepts (CoreSCs): Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result, Conclusion

2nd layer: Properties of CoreSCs: Novelty (New/Old) and Advantage (Advantage/Disadvantage)

3rd layer: Concept identifiers, linking together sentences that refer to the same instance of a CoreSC
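The three layers can be pictured as one record per annotated sentence. The sketch below is an illustrative data model, not the CoreSC tool's own format: the class and field names are assumptions, layer 2 is modelled as two optional properties, and layer 3 as an optional concept identifier shared by sentences describing the same instance (identifiers such as `Res24` mirror the `conceptID` attribute in the XML example later in the deck; `Met1` and the sentence IDs are made up).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Layer 1: the eleven Core Scientific Concepts listed on the slide.
CORE_SCS = (
    "Hypothesis", "Motivation", "Goal", "Object", "Background", "Method",
    "Experiment", "Model", "Observation", "Result", "Conclusion",
)


@dataclass(frozen=True)
class CoreSCAnnotation:
    sentence_id: str
    concept: str                      # layer 1
    novelty: Optional[str] = None     # layer 2: "New" or "Old"
    advantage: Optional[str] = None   # layer 2: "Advantage" or "Disadvantage"
    concept_id: Optional[str] = None  # layer 3: shared by sentences about the same instance

    def __post_init__(self) -> None:
        if self.concept not in CORE_SCS:
            raise ValueError(f"unknown CoreSC: {self.concept!r}")
        if self.novelty not in (None, "New", "Old"):
            raise ValueError(f"bad novelty value: {self.novelty!r}")
        if self.advantage not in (None, "Advantage", "Disadvantage"):
            raise ValueError(f"bad advantage value: {self.advantage!r}")


def group_by_instance(annotations: List[CoreSCAnnotation]) -> Dict[str, List[str]]:
    """Layer-3 view: map each concept identifier to the sentences that share it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ann in annotations:
        if ann.concept_id is not None:
            groups[ann.concept_id].append(ann.sentence_id)
    return dict(groups)


# Hypothetical sentences s1-s3: s2 and s3 report the same result instance.
anns = [
    CoreSCAnnotation("s1", "Method", novelty="New", concept_id="Met1"),
    CoreSCAnnotation("s2", "Result", concept_id="Res24"),
    CoreSCAnnotation("s3", "Result", concept_id="Res24"),
]
print(group_by_instance(anns))  # {'Met1': ['s1'], 'Res24': ['s2', 's3']}
```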


Page 36: Annotation systems

CoreSC Annotation Scheme (layers 1 & 2)

| Category | Description |
|---|---|
| Hypothesis | A statement not yet confirmed rather than a fact |
| Motivation | The reasons behind an investigation |
| Background | Background knowledge & previous work |
| Goal | A target state of the investigation |
| Object-New | A main product or theme of the investigation |
| Object-New-Advantage | Advantage of an object |
| Object-New-Disadvantage | Disadvantage of an object |
| Method-New | Means by which the goals of the investigation are achieved |
| Method-New-Advantage | Advantage of a method |
| Method-New-Disadvantage | Disadvantage of a method |
| Method-Old | A method pertaining to previous work |
| Method-Old-Disadvantage | Disadvantage of a method in previous work |
| Method-Old-Advantage | Advantage of a method in previous work |
| Experiment | An experimental method |
| Model | Statement about a theoretical model, method or framework |
| Observation | Data/phenomena recorded in an investigation |
| Result | Factual statements about the results of an investigation |
| Conclusion | Statements inferred from observations and results |
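Layers 1 and 2 combine into the eighteen flat labels in the table. A minimal sketch, assuming labels are formed by joining concept, novelty and advantage with hyphens, as the names in the table suggest; `flat_label` and `split_label` are hypothetical helpers. Only the listed combinations are accepted, so `Object-Old` or `Result-New`, for instance, is rejected.

```python
from typing import Optional, Tuple

# The eighteen layer-1/2 labels from the table.
LABELS = frozenset({
    "Hypothesis", "Motivation", "Background", "Goal",
    "Object-New", "Object-New-Advantage", "Object-New-Disadvantage",
    "Method-New", "Method-New-Advantage", "Method-New-Disadvantage",
    "Method-Old", "Method-Old-Advantage", "Method-Old-Disadvantage",
    "Experiment", "Model", "Observation", "Result", "Conclusion",
})


def flat_label(concept: str, novelty: Optional[str] = None,
               advantage: Optional[str] = None) -> str:
    """Join the concept and any layer-2 properties with hyphens; reject unlisted combinations."""
    label = "-".join(p for p in (concept, novelty, advantage) if p)
    if label not in LABELS:
        raise ValueError(f"not a CoreSC label: {label!r}")
    return label


def split_label(label: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Inverse of flat_label: recover (concept, novelty, advantage)."""
    if label not in LABELS:
        raise ValueError(f"not a CoreSC label: {label!r}")
    concept, *props = label.split("-")
    novelty = props[0] if props else None
    advantage = props[1] if len(props) > 1 else None
    return concept, novelty, advantage


print(flat_label("Method", "Old", "Disadvantage"))  # Method-Old-Disadvantage
print(split_label("Object-New-Advantage"))          # ('Object', 'New', 'Advantage')
```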

Page 37: Annotation systems

CoreSC Annotation tool:

Page 43: Annotation systems

Method 3: Bio-event Annotation

- A dynamic biological relationship involving one or more participants

We found that Y activates the expression of X

ID: E1
TRIGGER: expression
TYPE: GENE_EXPRESSION
THEME: X : gene
CAUSE: none (empty)

ID: E2
TRIGGER: activates
TYPE: POSITIVE_REGULATION
THEME: E1 : event
CAUSE: Y : protein
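The nesting in this example, where E2 takes the event E1 as its theme, is what distinguishes event annotation from flat entity tagging. A minimal sketch of that structure follows; the `Entity`/`Event` classes and the `describe` and `depth` helpers are hypothetical, and the type names are copied from the slide rather than taken from any particular event-ontology release.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Entity:
    text: str
    kind: str  # e.g. "gene" or "protein", as on the slide


@dataclass
class Event:
    event_id: str
    trigger: str
    event_type: str
    theme: Union[Entity, "Event"]
    cause: Optional[Union[Entity, "Event"]] = None


Argument = Union[Entity, Event]


def describe(arg: Argument) -> str:
    """Render an entity, or an event with its (possibly nested) arguments, as a compact string."""
    if isinstance(arg, Entity):
        return f"{arg.text}:{arg.kind}"
    cause = describe(arg.cause) if arg.cause is not None else "none"
    return f"{arg.event_type}({describe(arg.theme)}, cause={cause})"


def depth(arg: Argument) -> int:
    """Nesting depth: an entity is 0, an event is one more than its deepest argument."""
    if isinstance(arg, Entity):
        return 0
    children = [arg.theme] + ([arg.cause] if arg.cause is not None else [])
    return 1 + max(depth(c) for c in children)


# "We found that Y activates the expression of X"
X = Entity("X", "gene")
Y = Entity("Y", "protein")
E1 = Event("E1", "expression", "GENE_EXPRESSION", theme=X)
E2 = Event("E2", "activates", "POSITIVE_REGULATION", theme=E1, cause=Y)

print(describe(E2))
# POSITIVE_REGULATION(GENE_EXPRESSION(X:gene, cause=none), cause=Y:protein)
print(depth(E2))  # 2
```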

 

Page 45: Annotation systems

Meta-Knowledge annotation scheme for BioEvents

Bio-Event (centred on an event trigger)
- Class / Type (grounded to an event ontology)
- Participants: Theme(s), Actor(s)
- Knowledge Type: Investigation, Observation, Analysis, General
- Manner: High, Low, Neutral
- Certainty Level: L3, L2, L1
- Polarity: Negative, Positive
- Source: Other, Current

- Currently being applied to the entire GENIA event corpus (1000 MEDLINE abstracts)
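Because every meta-knowledge dimension has a closed set of values, the scheme maps naturally onto enumerations. The sketch below is an illustration under assumptions rather than the NACTeM annotation format: the class and field names are hypothetical, and only the dimension values listed on the slide are encoded.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    INVESTIGATION = "Investigation"
    OBSERVATION = "Observation"
    ANALYSIS = "Analysis"
    GENERAL = "General"


class Manner(Enum):
    HIGH = "High"
    LOW = "Low"
    NEUTRAL = "Neutral"


class Certainty(Enum):
    L3 = "L3"
    L2 = "L2"
    L1 = "L1"


class Polarity(Enum):
    NEGATIVE = "Negative"
    POSITIVE = "Positive"


class Source(Enum):
    OTHER = "Other"
    CURRENT = "Current"


@dataclass(frozen=True)
class MetaKnowledge:
    knowledge_type: KnowledgeType
    certainty: Certainty
    polarity: Polarity
    manner: Manner
    source: Source

    @classmethod
    def from_strings(cls, knowledge_type: str, certainty: str, polarity: str,
                     manner: str, source: str) -> "MetaKnowledge":
        # Enum lookup by value raises ValueError for anything outside the scheme.
        return cls(KnowledgeType(knowledge_type), Certainty(certainty),
                   Polarity(polarity), Manner(manner), Source(source))


mk = MetaKnowledge.from_strings("Analysis", "L2", "Negative", "Neutral", "Current")
print(mk.knowledge_type, mk.certainty)
```

Parsing through the enums means a mistyped value fails loudly at load time instead of producing an annotation outside the scheme.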

Page 47: Annotation systems

BioEvent/MetaKnowledge Annotation

S3 = These results suggest that Y has no effect on expression of X

| Event | Knowledge Type | Certainty Level | Polarity | Manner | Source |
|---|---|---|---|---|---|
| E1 | General | L3 | Positive | Neutral | Current |
| E2 | Analysis | L2 | Negative | Neutral | Current |
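Read against the table, E1 (the expression of X) is a plain general statement, while E2 (Y's effect on it) is an analysis with reduced certainty and negative polarity. A downstream fact extractor might use these values to avoid storing E2 as an asserted interaction. A minimal sketch over the two rows of the table; the flag names and helpers are hypothetical, and treating anything other than L3 as "hedged" assumes L3 is the level for statements with no expressed uncertainty.

```python
FIELDS = ("event", "knowledge_type", "certainty", "polarity", "manner", "source")

# The two rows of the S3 table.
ROWS = [
    ("E1", "General", "L3", "Positive", "Neutral", "Current"),
    ("E2", "Analysis", "L2", "Negative", "Neutral", "Current"),
]


def as_records(rows):
    """Turn table rows into dicts keyed by dimension name."""
    return [dict(zip(FIELDS, row)) for row in rows]


def flags(record):
    """Hypothetical checks a fact-extraction step might run before asserting an event."""
    out = []
    if record["certainty"] != "L3":  # assumed: L3 = no expressed uncertainty
        out.append("hedged")
    if record["polarity"] == "Negative":
        out.append("negated")
    if record["source"] == "Other":  # attributed to other work, not the current study
        out.append("attributed")
    return out


for rec in as_records(ROWS):
    print(rec["event"], flags(rec))
# E1 []
# E2 ['hedged', 'negated']
```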

Page 50: Annotation systems

Comparing 3 annotation systems

| Name | Purpose | Granularity | Manual/Automated |
|---|---|---|---|
| CoreSC | Identify the main components of a scientific investigation, for machine learning | Sentence | Manual corpus, automated annotation tools |
| MetaKnowledge/BioEvents | Enhance information extraction from biomedical texts to enable metadiscourse annotation | Events (intra-sentential): can be several per sentence, or one spanning more than one sentence | Manual corpus; automated annotation in progress |
| Discourse Segment Types | Identify the mechanisms by which (epistemic) knowledge is conveyed in scientific discourse | Clause | Manual |
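One way to make the granularity column concrete is to store every annotation as a character span over the same text and align coarser layers with finer ones by overlap. A minimal sketch under that assumption, using the sentence from the worked example later in the deck; the `Span` type, the scheme abbreviations and the `trigger:regulates` label are hypothetical, and the event is represented only by its trigger span.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    scheme: str
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a: Span, b: Span) -> bool:
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def align(coarse: List[Span], fine: List[Span]) -> List[Tuple[str, List[str]]]:
    """For each coarse annotation, list the labels of the fine annotations it overlaps."""
    return [(c.label, [f.label for f in fine if overlaps(c, f)]) for c in coarse]


SENTENCE = "Here we show that BOB.1/OBF.1 regulates Btk gene expression."
cue_end = len("Here we show that")
claim_start = SENTENCE.index("BOB.1")
trigger_start = SENTENCE.index("regulates")

coresc = [Span("CoreSC", "Res", 0, len(SENTENCE))]                # sentence level
segments = [Span("DST", "RegImplication", 0, cue_end),             # clause level
            Span("DST", "Implication", claim_start, len(SENTENCE))]
events = [Span("BioEvent", "trigger:regulates", trigger_start,    # event level
               trigger_start + len("regulates"))]

print(align(coresc, segments + events))
# [('Res', ['RegImplication', 'Implication', 'trigger:regulates'])]
print(align(segments, events))
# [('RegImplication', []), ('Implication', ['trigger:regulates'])]
```

Half-open offsets keep adjacent segments from counting as overlapping, which matters when one scheme splits a sentence that another labels whole.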

Page 53: Annotation systems

3 Annotation Systems on the same paper:CoreSC:

<annotationART atype="GSC" type="Res" conceptID="Res24" novelty="None" advantage="None">Here we show that BOB.1/OBF.1 regulates Btk gene expression.</annotationART> BioEvent/MetaKnowledge:<sentence id="S6">Here we show that <term id="T13" sem="Protein_family_or_group">

<gene-or-gene-product id="G9">BOB.1</gene-or-gene-product>/<gene-or-gene-product id="G10">OBF.1</gene-or-gene-product>

</term> regulates <term id="T14" sem="Biological_process">

<term id="T15" sem="DNA_domain_or_region"><gene-or-gene-product id="G11">Btk

</gene-or-gene-product> gene</term> expression

</term>. </sentence>

Discourse Segments:<segment segID ="286" section = "D" segtype = "RegImplication">Here we show that</segment><segment segID ="287" section = "D" segtype = "Implication">BOB.1/OBF.1 regulates Btk gene expression.</segment>

Page 54: Annotation systems

3 Annotation Systems on the same paper:

CoreSC:

<annotationART atype="GSC" type="Res" conceptID="Res24" novelty="None" advantage="None">Here we show that BOB.1/OBF.1 regulates Btk gene expression.</annotationART>

BioEvent/MetaKnowledge:

<sentence id="S6">Here we show that <term id="T13" sem="Protein_family_or_group"><gene-or-gene-product id="G9">BOB.1</gene-or-gene-product>/<gene-or-gene-product id="G10">OBF.1</gene-or-gene-product></term> regulates <term id="T14" sem="Biological_process"><term id="T15" sem="DNA_domain_or_region"><gene-or-gene-product id="G11">Btk</gene-or-gene-product> gene</term> expression</term>.</sentence>

<event KT="Gen-Other" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E16"><type class="Gene_expression"/><theme idref="G11"/><clue>Here we show that BOB.1/OBF.1 regulates Btk gene <clueType>expression</clueType>.</clue></event>

<event KT="Analysis" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E17"><type class="Regulation"/><theme idref="E16"/><cause idref="T13"/><clue>Here we <clueKT>show</clueKT> that BOB.1/OBF.1 <clueType>regulates</clueType> Btk gene expression.</clue></event>

Discourse Segments:

<segment segID="286" section="D" segtype="RegImplication">Here we show that</segment>
<segment segID="287" section="D" segtype="Implication">BOB.1/OBF.1 regulates Btk gene expression.</segment>
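The event markup above can be consumed with a standard XML parser. This sketch uses Python's xml.etree.ElementTree on a copy of the two MetaKnowledge events, written with well-formed attribute quoting and wrapped in a hypothetical <events> root element so that it parses as one document. The names read_events and describe are illustrative, not part of any published tool. The sketch shows how E17 (Regulation) nests E16 (Gene_expression of G11, Btk) through its theme idref, and how the trigger and knowledge-type cues are recovered from the clue element.

```python
import xml.etree.ElementTree as ET

# Copy of the S6 events; the <events> wrapper is a hypothetical root element.
EVENTS_XML = """<events>
<event KT="Gen-Other" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E16"><type class="Gene_expression"/><theme idref="G11"/><clue>Here we show that BOB.1/OBF.1 regulates Btk gene <clueType>expression</clueType>.</clue></event>
<event KT="Analysis" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E17"><type class="Regulation"/><theme idref="E16"/><cause idref="T13"/><clue>Here we <clueKT>show</clueKT> that BOB.1/OBF.1 <clueType>regulates</clueType> Btk gene expression.</clue></event>
</events>"""


def read_events(xml_text: str) -> dict[str, dict[str, str]]:
    """Return {event id: summary} with event type, arguments, cues and meta-knowledge."""
    root = ET.fromstring(xml_text)
    events = {}
    for ev in root.iter("event"):
        summary = {
            "type": ev.find("type").get("class"),
            "theme": ev.find("theme").get("idref"),
            "trigger": ev.findtext("clue/clueType"),  # word marked as the event trigger
            "kt": ev.get("KT"),
            "cl": ev.get("CL"),
            "polarity": ev.get("Polarity"),
        }
        cause = ev.find("cause")
        if cause is not None:
            summary["cause"] = cause.get("idref")
        kt_cue = ev.findtext("clue/clueKT")  # lexical cue for the knowledge type, if any
        if kt_cue is not None:
            summary["kt_cue"] = kt_cue
        events[ev.get("id")] = summary
    return events


def describe(event_id: str, events: dict[str, dict[str, str]]) -> str:
    """Render an event, expanding themes that are themselves events."""
    ev = events[event_id]
    theme = ev["theme"]
    args = f"theme={describe(theme, events) if theme in events else theme}"
    if "cause" in ev:
        args += f", cause={ev['cause']}"
    return f"{ev['type']}({args})"


events = read_events(EVENTS_XML)
print(describe("E17", events))
print(events["E17"]["kt"], events["E17"]["kt_cue"])
```

Because the meta-knowledge attributes sit on each event element, the outer Regulation and the inner Gene_expression keep their own KT and CL values even though one is nested in the other.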

Page 56: Annotation systems

CoreSC vs Event Meta-knowledge

- Meta-knowledge event annotation can help to provide a more fine-grained analysis of CoreSC Background.

- Certainty Level and Source can help to refine the CoreSC Result and Conclusion categories.

- More straightforward mappings occur between other categories, e.g. most sentences of the Motivation category contain only events of type Investigation.

- Categories such as Goal and Object are catered for by CoreSCs but not covered by the meta-knowledge scheme.

- Observation_L3_Current events can be refined into the CoreSC categories Obs, Res, Con and Hyp.

Page 57: Annotation systems

CoreSC vs Segments

- In most cases there is a natural mapping between the two schemes:
  - CoreSC Observation maps to Result; CoreSC Res maps to Result and Implication.
  - CoreSC Conclusion maps to Implication and Hypothesis.
  - Implication consists of CoreSC Conclusion and Result.
  - Fact is CoreSC Background and Conclusion.
  - Hypothesis is CoreSC Hypothesis and Conclusion.
  - Problem is CoreSC Motivation.

- Most of CoreSC Bac (Background) maps to Fact and to the other categories, which refine it.

- CoreSC refines the Method and Result segments.
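The CoreSC-to-segment correspondences on this slide form a many-to-many relation. The following sketch encodes them as a Python dictionary (the spelled-out CoreSC names and the dictionary itself are our own illustration, not a published resource) and inverts it, which reproduces the segment-side statements such as "Implication consists of CoreSC Conclusion and Result". It deliberately ignores the partial mapping of Background onto the other categories.

```python
# CoreSC category -> discourse segment types it maps to, as listed on the slide.
# Simplification: Background is mapped only to Fact.
CORESC_TO_SEGMENTS: dict[str, set[str]] = {
    "Observation": {"Result"},
    "Result": {"Result", "Implication"},
    "Conclusion": {"Implication", "Hypothesis", "Fact"},
    "Background": {"Fact"},
    "Hypothesis": {"Hypothesis"},
    "Motivation": {"Problem"},
}


def invert(mapping: dict[str, set[str]]) -> dict[str, set[str]]:
    """Turn a one-to-many mapping around, e.g. segment type -> CoreSC categories."""
    inverse: dict[str, set[str]] = {}
    for source, targets in mapping.items():
        for target in targets:
            inverse.setdefault(target, set()).add(source)
    return inverse


SEGMENTS_TO_CORESC = invert(CORESC_TO_SEGMENTS)

for segtype in sorted(SEGMENTS_TO_CORESC):
    print(segtype, "<-", sorted(SEGMENTS_TO_CORESC[segtype]))
```

Categories with more than one target (here CoreSC Result and Conclusion) are exactly the places where one scheme would need the other's labels to disambiguate.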

Page 58: Annotation systems

Segments vs Event Meta-knowledge

- The schemes can complement each other:
  - Segment types can refine the interpretation of Analysis events into Hypothesis, Implication or Result.
  - Certainty Level can help determine the confidence ascribed to the segments.
  - Likewise, meta-knowledge can help to distinguish Result segments that correspond either to analyses of results or to experimental observations.
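One way to combine the two schemes in practice is to find the segment that contains an event's trigger and, for Analysis events, take that segment's type as the refined interpretation. The sketch below assumes naive whole-word matching of the <clueType> word against segment text (a real alignment would use character offsets) and reuses the BOB.1/OBF.1 sentence from the three-scheme example. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    seg_id: str
    segtype: str
    text: str


@dataclass
class Event:
    event_id: str
    knowledge_type: str   # KT attribute, e.g. "Analysis"
    certainty_level: str  # CL attribute, e.g. "L3"
    trigger: str          # the <clueType> word


# Segment types that, per the slide, an Analysis event can be refined into.
REFINABLE = {"Hypothesis", "Implication", "Result"}


def containing_segment(event: Event, segments: list[Segment]) -> Optional[Segment]:
    """Naive alignment: first segment whose text contains the trigger as a whole word."""
    pattern = re.compile(rf"\b{re.escape(event.trigger)}\b")
    for seg in segments:
        if pattern.search(seg.text):
            return seg
    return None


def refine(event: Event, segments: list[Segment]) -> tuple[str, Optional[str]]:
    """Return (refined knowledge type, id of the segment used for the decision)."""
    seg = containing_segment(event, segments)
    if seg is None:
        return event.knowledge_type, None
    if event.knowledge_type == "Analysis" and seg.segtype in REFINABLE:
        return seg.segtype, seg.seg_id
    return event.knowledge_type, seg.seg_id


segments = [
    Segment("286", "RegImplication", "Here we show that"),
    Segment("287", "Implication", "BOB.1/OBF.1 regulates Btk gene expression."),
]
e16 = Event("E16", "Gen-Other", "L3", "expression")
e17 = Event("E17", "Analysis", "L3", "regulates")

for ev in (e16, e17):
    print(ev.event_id, refine(ev, segments))
```

The certainty level is carried along on each Event, so the same lookup could also be used in the other direction, to attach a confidence value to the segment that an event falls in.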

Page 59: Annotation systems

Conclusions (in detail)

Common categories across the three schemes (CoreSC, meta-knowledge, segments):

- (CoreSC Observation, Observation_L3_Current, Result)
- (CoreSC Hypothesis, Analysis_L2_Current, Hypothesis)
- (CoreSC Motivation, Investigation_L3_Current, Problem)

Categories that need refining in the three schemes:

- CoreSC: Background, Conclusion
- Metaknowledge: Gen_Other_L3_Current, Observation_L3_Current
- Segments: Method and Result

The three schemes have different strengths and offer annotation at different levels:

- CoreSC: complements the other two schemes with more fine-grained Methods, Objectives and Results.

- Metaknowledge: Certainty levels and Source can help to refine the interpretation of certain CoreSC and segment types.

- Segments: Refinement of Background; signals for modality cues
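The three common categories can be treated as an alignment table that is queried from any one scheme. The sketch below is a hypothetical lookup built only from the three triples on this slide; any label not in the table (e.g. CoreSC Background) returns None, mirroring the categories that still need refining. The scheme names coresc, meta_knowledge and segment are our own labels.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    """One row of aligned categories: (CoreSC, meta-knowledge, discourse segment)."""
    coresc: str
    meta_knowledge: str
    segment: str


# The common categories listed on the conclusions slide.
COMMON = [
    Alignment("Observation", "Observation_L3_Current", "Result"),
    Alignment("Hypothesis", "Analysis_L2_Current", "Hypothesis"),
    Alignment("Motivation", "Investigation_L3_Current", "Problem"),
]


def align(scheme: str, label: str) -> Optional[Alignment]:
    """Find the row whose category in `scheme` equals `label`; None if there is no common category."""
    if scheme not in Alignment._fields:
        raise ValueError(f"unknown scheme: {scheme}")
    for row in COMMON:
        if getattr(row, scheme) == label:
            return row
    return None


print(align("meta_knowledge", "Analysis_L2_Current"))
print(align("coresc", "Background"))
```

Passing the scheme explicitly matters because the same name (Hypothesis) occurs in both CoreSC and the segment types.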

Page 60: Annotation systems

Conclusions (general)

- This is a very small example, but it shows that the differences can be overcome. Each scheme has its own advantages:

- Clause-level annotation is the most precise for identifying core claims

- Knowledge Type and Certainty Level are important refinements

- CoreSC refines methods and results and shows the most promise for automated recognition

- So we need to work together!

- We plan to join forces and work on a joint corpus

- Other work to add: KEfED, SWAN, ScholOnto

- Together, develop a ‘claim identifier’ (not a fact extractor), plus standards for modality/evidence scales and types

- Work together towards a claim-evidence network representation! (cf. also Hypotheses, Evidence and Relationships)

Page 61: Annotation systems

The goal of the Workshop on “Models of Scientific Discourse Annotation” is to compare and contrast the motivation behind efforts in the discourse annotation of scientific text and the techniques and principles applied in the various approaches, and to discuss ways in which they can complement each other and collaborate to form standards for an optimal method of annotating appropriate levels of discourse, with enhanced accuracy and usefulness.

We wish to compare, contrast and evaluate different scientific discourse annotation schemes and tools, in order to answer questions such as:

- What motivates a certain level, method or viewpoint for annotating scientific text?
- What is the annotation level for a unit of argumentation: an event, a sentence or a segment? What are the advantages and disadvantages of each?
- How easily can different schemes be applied to texts? Are they easily trainable?
- Which schemes are the most portable? Can they be applied to both full papers and abstracts? Can they be applied to texts in different domains?
- How granular should annotation schemes be? What are the advantages/disadvantages of fine- and coarse-grained annotation categories?
- Can different schemes complement each other to provide different levels of information? Can different schemes be combined to give better results?
- How can we compare annotations, and how do we decide which features, approaches and techniques work best?
- How do we exchange and evaluate each other’s annotations?
- How applicable are these efforts towards improved methods of publishing or summarizing science?

http://msda2011.wordpress.com/

Models of Scientific Discourse Annotation, Portland, OR, June 25

Page 62: Annotation systems

CoreSC References

Liakata, M., Teufel, S., Siddharthan, A. and Batchelor (2010). Corpora for the conceptualisation and zoning of scientific papers. In Proceedings of the 7th International Conference on Language Resources and Evaluation, Malta.

Guo, Y., Korhonen, A., Liakata, M., Silins, I., Sun, L. and Stenius, U. (2010). Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes. In Proceedings of BioNLP 2010, Uppsala, Sweden.

Liakata, M., Q, C. and Soldatova, L.N. (2009). Semantic Annotation of Papers: Interface & Enrichment Tool (SAPIENT). In Proceedings of BioNLP-09, Boulder, Colorado.

Liakata, M. and Soldatova, L.N. (2008). Guidelines for the annotation of General Scientific Concepts. Aberystwyth University, JISC Project Report, http://ie-repository.jisc.ac.uk/88/.

Soldatova, L.N. and Liakata, M. (2007). An ontology methodology and CISP: the proposed Core Information about Scientific Papers. JISC Project Report, http://ie-repository.jisc.ac.uk/137/.

Page 63: Annotation systems

Meta-Annotation References

Ananiadou, S., Thompson, P. and Nawaz, R. (2010). Improving Search Through Event-based Biomedical Text Mining. In Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010).

Nawaz, R., Thompson, P., McNaught, J. and Ananiadou, S. (2010). Meta-Knowledge Annotation of Bio-Events. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), pp. 2498-2505

Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Evaluating a Meta-Knowledge Annotation Scheme for Bio-Events. In Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 69-77

Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Event Interpretation: A Step towards Event-Centred Text Mining. In Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010).

Page 64: Annotation systems

Discourse Segment References

de Waard, A. (2010d). The Story of Science: A syntagmatic/paradigmatic analysis of scientific text. Proceedings of the AMICUS Workshop, Vienna, Austria, October 2010.

de Waard, A. and Pandermaat, H. (2010). A Classification of Research Verbs to Facilitate Discourse Segment Identification in Biological Text. Proceedings of the Interdisciplinary Workshop on Verbs: The Identification and Representation of Verb Features, Pisa, Italy, November 4-5 2010.

de Waard, A. (2010c). The Future of the Journal? Integrating research data with scientific discourse. Logos vol. 21, issues 1-2, January 2011.

de Waard, A. (2010b). From Proteins to Fairytales: Directions in Semantic Publishing. IEEE Intelligent Systems 25(2): 83-88 (2010).

de Waard, A. (2010a). Realm Traversal In Biological Discourse: From Model To Experiment and back again. Workshop on Multidisciplinary Perspectives on Signalling Text Organisation (MAD 2010), March 17-20, 2010, Moissac, France.

de Waard, A. (2009b). Categorizing Epistemic Segment Types in Biology Research Articles. Workshop on Linguistic and Psycholinguistic Approaches to Text Structuring (LPTS 2009), September 21-23 2009. To be published as a chapter in Linguistic and Psycholinguistic Approaches to Text Structuring, Laure Sarda, Shirley Carter Thomas & Benjamin Fagard (eds), John Benjamins (planned for 2010).

de Waard, A., Buckingham Shum, S., Carusi, A., Park, J., Samwald, M. and Sándor, Á. (2009). Hypotheses, Evidence and Relationships: The HypER Approach for Representing Scientific Knowledge Claims. Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse (SWASD 2009), co-located with the 8th International Semantic Web Conference (ISWC-2009).

de Waard, A., Buitelaar, P. and Eigner, T. (2009). Identifying the Epistemic Value of Discourse Segments in Biology Texts. In: Proceedings of the Eighth International Conference on Computational Semantics, Tilburg, The Netherlands, Jan. 7-9 2009.