194
Identifying Biological Knowledge: Three Possible Strategies Anita de Waard Disruptive Technologies Director, Elsevier Labs, Amsterdam Casimir Researcher, UiL-OTS, Utrecht University XRCE, Grenoble, 24 September 2009

Xerox2009

Embed Size (px)

Citation preview

Page 1: Xerox2009

Identifying Biological Knowledge: Three Possible Strategies

Anita de Waard

Disruptive Technologies Director, Elsevier Labs, Amsterdam

Casimir Researcher,

UiL-OTS, Utrecht University

XRCE, Grenoble, 24 September 2009

Page 2: Xerox2009

Overview

Page 3: Xerox2009

Overview

Problem: too much discourse, tools are not yet good enough...

Page 4: Xerox2009

Overview

Problem: too much discourse, tools are not yet good enough...

1. First attempt: allow authors to validate entities

Page 5: Xerox2009

Overview

Problem: too much discourse, tools are not yet good enough...

1. First attempt: allow authors to validate entities

2. Second attempt: discourse analysis

Page 6: Xerox2009

Overview

Problem: too much discourse, tools are not yet good enough...

1. First attempt: allow authors to validate entities

2. Second attempt: discourse analysis

3. Third attempt: collaboration to identify hypotheses

Page 7: Xerox2009

Why Study Biological Discourse?

Page 8: Xerox2009

Why Study Biological Discourse?

- There is too much of it!

Page 9: Xerox2009

Why Study Biological Discourse?

- There is too much of it!

Page 10: Xerox2009

Why Study Biological Discourse?

- There is too much of it!

- Text mining and ‘fact extraction’ techniques are gaining ground to tame thistangle

Page 11: Xerox2009

Why Study Biological Discourse?

- There is too much of it!

- Text mining and ‘fact extraction’ techniques are gaining ground to tame thistangle

- Emerging area of biologicalnatural language processing (BioNLP): subfield of computational linguistics

Page 12: Xerox2009

Why Study Biological Discourse?

- There is too much of it!

- Text mining and ‘fact extraction’ techniques are gaining ground to tame thistangle

- Emerging area of biologicalnatural language processing (BioNLP): subfield of computational linguistics

- Main focus: identifying biological entities (genes, proteins, drugs) and their relationships

Page 13: Xerox2009

Example state of the art: MEDIE

Page 14: Xerox2009

Example state of the art: MEDIE

Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric

Page 15: Xerox2009

Example state of the art: MEDIE

Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.

Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric

Page 16: Xerox2009

Example state of the art: MEDIE

Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.

Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric

Add this knowledge during authoring?

Page 17: Xerox2009

First attempt: allow authors to validate entities

Page 18: Xerox2009

Improve time + quality of knowledgebase entry

Page 19: Xerox2009

Improve time + quality of knowledgebase entry

- For database curators: save time and money

Page 20: Xerox2009

Improve time + quality of knowledgebase entry

- For database curators: save time and money

- For authors: lower the threshold to submitting papers with metadata

Page 21: Xerox2009

Improve time + quality of knowledgebase entry

- For database curators: save time and money

- For authors: lower the threshold to submitting papers with metadata

- Structured Digital Abstract: an editorial experiment to increase the reach of online published articles

Page 22: Xerox2009

Improve time + quality of knowledgebase entry

- For database curators: save time and money

- For authors: lower the threshold to submitting papers with metadata

- Structured Digital Abstract: an editorial experiment to increase the reach of online published articles

- SDA encodes in a schema information contained in the article

Page 23: Xerox2009

D-07-02772Testis-specific poly(A) polymerase (TPAP) is a cytoplasmic poly(A) polymerase that is highly expressed in round spermatids. We identified germ cell-specific gene 1 (GSG1) as a TPAP interaction partner protein using yeast two-hybrid and coimmunoprecipitation assays. Subcellular fractionation analysis showed that GSG1 is exclusively localized in the endoplasmic reticulum (ER) of mouse testis where TPAP is also present. In NIH3T3 cells cotransfected with TPAP and GSG1, both proteins colocalize in the ER. Moreover, expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER.

MINT-6168263:Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)

MINT-6168204, MINT-6168178:Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)

MINT-6167930:Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:0218) with TPAP (uniprotkb:Q9WVP6) by two-hybrid (MI:0018)

Page 24: Xerox2009

D-07-02772Testis-specific poly(A) polymerase (TPAP) is a cytoplasmic poly(A) polymerase that is highly expressed in round spermatids. We identified germ cell-specific gene 1 (GSG1) as a TPAP interaction partner protein using yeast two-hybrid and coimmunoprecipitation assays. Subcellular fractionation analysis showed that GSG1 is exclusively localized in the endoplasmic reticulum (ER) of mouse testis where TPAP is also present. In NIH3T3 cells cotransfected with TPAP and GSG1, both proteins colocalize in the ER. Moreover, expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER.

MINT-6168263:Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)

MINT-6168204, MINT-6168178:Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)

MINT-6167930:Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:0218) with TPAP (uniprotkb:Q9WVP6) by two-hybrid (MI:0018)

Page 25: Xerox2009

D-07-02772Testis-specific poly(A) polymerase (TPAP) is a cytoplasmic poly(A) polymerase that is highly expressed in round spermatids. We identified germ cell-specific gene 1 (GSG1) as a TPAP interaction partner protein using yeast two-hybrid and coimmunoprecipitation assays. Subcellular fractionation analysis showed that GSG1 is exclusively localized in the endoplasmic reticulum (ER) of mouse testis where TPAP is also present. In NIH3T3 cells cotransfected with TPAP and GSG1, both proteins colocalize in the ER. Moreover, expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER.

MINT-6168263:Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)

MINT-6168204, MINT-6168178:Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)

MINT-6167930:Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:0218) with TPAP (uniprotkb:Q9WVP6) by two-hybrid (MI:0018)

Page 26: Xerox2009

D-07-02772Testis-specific poly(A) polymerase (TPAP) is a cytoplasmic poly(A) polymerase that is highly expressed in round spermatids. We identified germ cell-specific gene 1 (GSG1) as a TPAP interaction partner protein using yeast two-hybrid and coimmunoprecipitation assays. Subcellular fractionation analysis showed that GSG1 is exclusively localized in the endoplasmic reticulum (ER) of mouse testis where TPAP is also present. In NIH3T3 cells cotransfected with TPAP and GSG1, both proteins colocalize in the ER. Moreover, expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER.

MINT-6168263:Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)

MINT-6168204, MINT-6168178:Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)

MINT-6167930:Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:0218) with TPAP (uniprotkb:Q9WVP6) by two-hybrid (MI:0018)

Page 27: Xerox2009

D-07-02772Testis-specific poly(A) polymerase (TPAP) is a cytoplasmic poly(A) polymerase that is highly expressed in round spermatids. We identified germ cell-specific gene 1 (GSG1) as a TPAP interaction partner protein using yeast two-hybrid and coimmunoprecipitation assays. Subcellular fractionation analysis showed that GSG1 is exclusively localized in the endoplasmic reticulum (ER) of mouse testis where TPAP is also present. In NIH3T3 cells cotransfected with TPAP and GSG1, both proteins colocalize in the ER. Moreover, expression of GSG1 stimulates TPAP targeting to the ER, suggesting that interactions between the two proteins lead to the redistribution of TPAP from the cytosol to the ER.

MINT-6168263:Gsg1 (uniprotkb:Q8R1W2), TPAP (uniprotkb:Q9WVP6) and Calmegin (uniprotkb:P52194) colocalize (MI:0403) by cosedimentation (MI:0027)

MINT-6168204, MINT-6168178:Gsg1 (uniprotkb:Q8R1W2) and TPAP (uniprotkb:Q9WVP6) colocalize (MI:0403) by fluorescence microscopy (MI:0416)

MINT-6167930:Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:0218) with TPAP (uniprotkb:Q9WVP6) by two-hybrid (MI:0018)

Page 28: Xerox2009

How? Word Plugin

5

Page 29: Xerox2009

How? Word Plugin

- Okkam4MsW: a Microsoft Word plugin interact with Web Services performing NLP and semantic technologies to detect entities and contextual information

- The OKKAM repository is queried to get the right OKKAM id and alternative ids (UniProt in this case)

5

Page 30: Xerox2009

How? Word Plugin

- Okkam4MsW: a Microsoft Word plugin interact with Web Services performing NLP and semantic technologies to detect entities and contextual information

- The OKKAM repository is queried to get the right OKKAM id and alternative ids (UniProt in this case)

5

Page 31: Xerox2009

OKKAM Entity Editor in MS Word

Page 32: Xerox2009

OKKAM Entity Editor in MS Word

Page 33: Xerox2009

OKKAM Entity Editor in MS Word

Page 34: Xerox2009

OKKAM Entity Editor in MS Word

Page 35: Xerox2009

http://sig.ma

Page 36: Xerox2009

Second attempt:

discourse analysis

Page 37: Xerox2009

What else is wrong with MEDIE?

Page 38: Xerox2009

What else is wrong with MEDIE?

Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric

Page 39: Xerox2009

What else is wrong with MEDIE?

Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.

Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric

Page 40: Xerox2009

What else is wrong with MEDIE?

Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.

Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric

without some idea of the status of the sentence, it cannot be interpreted!

Page 41: Xerox2009

Discourse Analysis

Page 42: Xerox2009

Discourse AnalysisUnderlying model of text mining systems:

- Scientific paper is ‘statement of pertinent facts’

- So: finding entities and relationships will give you a summary of the knowledge within the paper

- However, information extracted this way is not very useful....

Page 43: Xerox2009

Discourse AnalysisUnderlying model of text mining systems:

- Scientific paper is ‘statement of pertinent facts’

- So: finding entities and relationships will give you a summary of the knowledge within the paper

- However, information extracted this way is not very useful....

Proposed approach: treat scientific paper as a persuasive text: specific genre, with genre characteristics and allowed persuasive techniques:

- ‘these results suggest’ (depersonification)

- ‘as fig. 2a shows’ (evidence is in the data)

- ‘oncogenes produce a stress response [Serrano, 2003]’

References and data form a “folded array of successive defense lines, behind which scientists ensconce themselves” (Latour, 1986)

Page 44: Xerox2009

Overall Research Questions

Page 45: Xerox2009

Overall Research Questions

i. How can we model the discourse/suasive moves in a biological paper?

Page 46: Xerox2009

Overall Research Questions

i. How can we model the discourse/suasive moves in a biological paper?

ii. Can this model help enable automated epistemic markup?

Page 47: Xerox2009

Overall Research Questions

i. How can we model the discourse/suasive moves in a biological paper?

ii. Can this model help enable automated epistemic markup?

iii. Can it improve knowledge representations of collections of papers?

Page 48: Xerox2009

Discourse analysis

Page 49: Xerox2009

Discourse analysis

Segmentation and classification:

Page 50: Xerox2009

Discourse analysis

Segmentation and classification:

1. Parse text into discourse segments (edu’s) containing a single rhetorical move (if possible...)

Page 51: Xerox2009

Discourse analysis

Segmentation and classification:

1. Parse text into discourse segments (edu’s) containing a single rhetorical move (if possible...)

2. Determine categories or types of discourse segments that have similar semantic/pragmatic properties

Page 52: Xerox2009

Discourse analysis

Segmentation and classification:

1. Parse text into discourse segments (edu’s) containing a single rhetorical move (if possible...)

2. Determine categories or types of discourse segments that have similar semantic/pragmatic properties

3. Look at a number of linguistic characteristics and see if these segment types share those characteristics.

Page 53: Xerox2009

Segmentation

Page 54: Xerox2009

Segmentation

Goal: ‘one new thought per segment’:

Page 55: Xerox2009

Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.

Segmentation

Goal: ‘one new thought per segment’:

Page 56: Xerox2009

a. Figure 4a shows thatb. following RASV12 stimulationc. p53 was stabilized and activatedd. and the target gene, p21cip1, was induced in all cases,e. indicating an intact p53 pathway in these cells.

Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.

Segmentation

Goal: ‘one new thought per segment’:

Page 57: Xerox2009

a. Figure 4a shows thatb. following RASV12 stimulationc. p53 was stabilized and activatedd. and the target gene, p21cip1, was induced in all cases,e. indicating an intact p53 pathway in these cells.

Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.

IntratextualMethod

Result Result

Implication

Segmentation

Goal: ‘one new thought per segment’:

Page 58: Xerox2009

a. Figure 4a shows thatb. following RASV12 stimulationc. p53 was stabilized and activatedd. and the target gene, p21cip1, was induced in all cases,e. indicating an intact p53 pathway in these cells.

Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.

IntratextualMethod

Result Result

Implication

Segmentation

Goal: ‘one new thought per segment’:

Page 59: Xerox2009

Segment Types

Page 60: Xerox2009

Segment TypesSegment Description Example

Fact a known fact, generally without explicit citation

mature miR-373 is a homolog of miR-372

Hypothesis a proposed idea, not supported by evidence

This could for instance be a result of high mdm2 levels

Problem unresolved, contradictory, or unclear issue

However, further investigation is required to demonstrate the exact mechanism of LATS2 actionGoal research goal To identify novel functions of miRNAs,

Method experimental method Using fluorescence microscopy and luciferase assays,

Result a restatement of the outcome of an experiment

all constructs yielded high expression levels of mature miRNAs

Implication an interpretation of the results, in light of earlier hypotheses and facts

our procedure is sensitive enough to detect mild growth differences

Page 61: Xerox2009

Segment TypesSegment Description Example

Fact a known fact, generally without explicit citation

mature miR-373 is a homolog of miR-372

Hypothesis a proposed idea, not supported by evidence

This could for instance be a result of high mdm2 levels

Problem unresolved, contradictory, or unclear issue

However, further investigation is required to demonstrate the exact mechanism of LATS2 actionGoal research goal To identify novel functions of miRNAs,

Method experimental method Using fluorescence microscopy and luciferase assays,

Result a restatement of the outcome of an experiment

all constructs yielded high expression levels of mature miRNAs

Implication an interpretation of the results, in light of earlier hypotheses and facts

our procedure is sensitive enough to detect mild growth differences

‘Other-segments’, related to (referenced) other work:

Page 62: Xerox2009

Segment TypesSegment Description Example

Fact a known fact, generally without explicit citation

mature miR-373 is a homolog of miR-372

Hypothesis a proposed idea, not supported by evidence

This could for instance be a result of high mdm2 levels

Problem unresolved, contradictory, or unclear issue

However, further investigation is required to demonstrate the exact mechanism of LATS2 actionGoal research goal To identify novel functions of miRNAs,

Method experimental method Using fluorescence microscopy and luciferase assays,

Result a restatement of the outcome of an experiment

all constructs yielded high expression levels of mature miRNAs

Implication an interpretation of the results, in light of earlier hypotheses and facts

our procedure is sensitive enough to detect mild growth differences

‘Other-segments’, related to (referenced) other work:

Regulatory segments, acting as matrix sentences framing other segments:

Page 63: Xerox2009

Linguistic and structural properties

Page 64: Xerox2009

Linguistic and structural properties

Page 65: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second/third part of sentence

Page 66: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second/third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class: Thing (increase), Thing-Thing (inhibit), Person-Thing (examine, observe, operate, implicate), Person: Report

- Lexicon

Page 67: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second/third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class: Thing (increase), Thing-Thing (inhibit), Person-Thing (examine, observe, operate, implicate), Person: Report

- Lexicon

3. Metadiscourse markers [Hyland, 2003]:

- Connectives

- Endophorics, Evidentials

- Hedges, Boosters

- Person markers

Page 68: Xerox2009

Results: Section and Sequence

Page 69: Xerox2009

Results: Section and Sequence

1. Voorhoeve, 2006: Cell - 427 segments

Page 70: Xerox2009

Results: Section and Sequence

1. Voorhoeve, 2006: Cell - 427 segments

2. Louiseau, 2008: European Neuropsychopharmacology - 281 segments

Page 71: Xerox2009

Results: Section and Sequence

1. Voorhoeve, 2006: Cell - 427 segments

2. Louiseau, 2008: European Neuropsychopharmacology - 281 segments

- Introduction (90): Other-Result (24), Other-Implication (11), Problem (9), Fact (8)

Page 72: Xerox2009

Results: Section and Sequence

1. Voorhoeve, 2006: Cell - 427 segments

2. Louiseau, 2008: European Neuropsychopharmacology - 281 segments

- Introduction (90): Other-Result (24), Other-Implication (11), Problem (9), Fact (8)

- Result (334): Goal (26) -> Method (68) -> Result (105) -> Reg-Implication (23) ->Implication (50)

Page 73: Xerox2009

Results: Section and Sequence

1. Voorhoeve, 2006: Cell - 427 segments

2. Louiseau, 2008: European Neuropsychopharmacology - 281 segments

- Introduction (90): Other-Result (24), Other-Implication (11), Problem (9), Fact (8)

- Result (334): Goal (26) -> Method (68) -> Result (105) -> Reg-Implication (23) ->Implication (50)

- Discussion (187):Implication (27), Result (21), Other-Result (24), Hypothesis (19), Problem (17)

Page 74: Xerox2009

Results: Verb Tense

Page 75: Xerox2009

Results: Verb Tense

- Realm of the Present: Fact (82%), Hypothesis (71%), Implication (62%)

Page 76: Xerox2009

Results: Verb Tense

- Realm of the Present: Fact (82%), Hypothesis (71%), Implication (62%)

- Realm of the Past: Result (82%), Method (76%) - 50% Passive, of Method 50% Past Perfect

Page 77: Xerox2009

Results: Verb Tense

- Realm of the Present: Fact (82%), Hypothesis (71%), Implication (62%)

- Realm of the Past: Result (82%), Method (76%) - 50% Passive, of Method 50% Past Perfect

- Realm of the Modal: 44% in Hypothesis

Page 78: Xerox2009

Results: Verb Tense

- Realm of the Present: Fact (82%), Hypothesis (71%), Implication (62%)

- Realm of the Past: Result (82%), Method (76%) - 50% Passive, of Method 50% Past Perfect

- Realm of the Modal: 44% in Hypothesis

- Realm of the To-Infinitive: 50% is Goal, 75% of Goal is to-infinitive (Purpose Clause)

Page 79: Xerox2009

Results: Verb Type

Page 80: Xerox2009

Results: Verb Type

- Thing - Thing: high in experimental (Method, Result) and conceptual (Problem, Hypothesis, Fact, Implication) segments:

‣ Need to differentiate between ‘concept’ things and ‘experimental’ things!

Page 81: Xerox2009

Results: Verb Type

- Thing - Thing: high in experimental (Method, Result) and conceptual (Problem, Hypothesis, Fact, Implication) segments:

‣ Need to differentiate between ‘concept’ things and ‘experimental’ things!

- Person - Implicate: high in Hypothesis, Implication, Problem

Page 82: Xerox2009

Results: Verb Type

- Thing - Thing: high in experimental (Method, Result) and conceptual (Problem, Hypothesis, Fact, Implication) segments:

‣ Need to differentiate between ‘concept’ things and ‘experimental’ things!

- Person - Implicate: high in Hypothesis, Implication, Problem

- Person - Operate: high in Methods (90%)

Page 83: Xerox2009

Results: Verb Type

- Thing - Thing: high in experimental (Method, Result) and conceptual (Problem, Hypothesis, Fact, Implication) segments:

‣ Need to differentiate between ‘concept’ things and ‘experimental’ things!

- Person - Implicate: high in Hypothesis, Implication, Problem

- Person - Operate: high in Methods (90%)

- Person - Examine: high in Goal (87%)

Page 84: Xerox2009

Results: Metadiscourse Markers

Page 85: Xerox2009

Results: Metadiscourse Markers

- Causitive: high in Implications (therefore, thus),

- Comparison: high in Results (whereas, in contrast),

- Temporality: high in Methods (next, subsequently)

- Person markers: high in Methods (50%) and Results

- Boosters: high in Results (indeed, surprisingly, interestingly)

- Hedges: high in Implication, Reg-Implication (raises the possibility that, explains at least in part)

- but modals and ‘suggest’ verbs are left out

Page 86: Xerox2009

Discourse as a Fact-oryhypothesis

fact fact fact

problem

i. How can we model the discourse moves in a biological paper?

Page 87: Xerox2009

goalto

Discourse as a Fact-oryhypothesis

fact fact fact

problem

i. How can we model the discourse moves in a biological paper?

Page 88: Xerox2009

goalto

Discourse as a Fact-oryhypothesis

fact fact fact

problem

i. How can we model the discourse moves in a biological paper?

Page 89: Xerox2009

goalto

wemethod

resultresulting in

Discourse as a Fact-oryhypothesis

fact fact fact

problem

i. How can we model the discourse moves in a biological paper?

Page 90: Xerox2009

goalto

wemethod

resultresulting in

Discourse as a Fact-oryhypothesis

fact fact fact

problem

i. How can we model the discourse moves in a biological paper?

Page 91: Xerox2009

goalto

wemethod

resultresulting in

Discourse as a Fact-ory

suggests that

implication

discussion

hypothesis

fact fact fact

problem

i. How can we model the discourse moves in a biological paper?

Page 92: Xerox2009

goalto

wemethod

resultresulting in

Discourse as a Fact-ory

suggests that

implication

discussion

hypothesis

fact fact fact

problem

i. How can we model the discourse moves in a biological paper?

Page 93: Xerox2009

goalto

wemethod

resultresulting in

Discourse as a Fact-ory

suggests that

implication

discussion

hypothesis

fact fact fact

problem

introduction

i. How can we model the discourse moves in a biological paper?

Page 94: Xerox2009

goalto

wemethod

resultresulting in

Discourse as a Fact-ory

suggests that

implication

discussion

hypothesis

fact fact fact

problem

introduction

results

i. How can we model the discourse moves in a biological paper?

Page 95: Xerox2009

goalto

wemethod

resultresulting in

Discourse as a Fact-ory

suggests that

implication

discussion

hypothesis

fact fact fact

problem

introduction

results

discussion

i. How can we model the discourse moves in a biological paper?

Page 96: Xerox2009

goalto

wemethod

resultresulting in

Discourse as a Fact-ory

suggests that

implication

discussion

Own viewShared view

hypothesis

fact fact fact

problem

introduction

results

discussion

i. How can we model the discourse moves in a biological paper?

Page 97: Xerox2009

goalto

hypothetical realm: (might, would)

realm of activity: (to test, to see)

realm of models: present

realm of experience:

past

wemethod

resultresulting in

Discourse as a Fact-ory

suggests that

implication

discussion

Own viewShared view

hypothesis

fact fact fact

problem

introduction

results

discussion

i. How can we model the discourse moves in a biological paper?

Page 98: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

Page 99: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:

Page 100: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:

6> It is thus emerging that A_1-42-induced memory deficits may involve subtler neuronal alternations leading to synaptic deficits, prior to frank neurodegeneration in AD brains.

Page 101: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:

6> It is thus emerging that A_1-42-induced memory deficits may involve subtler neuronal alternations leading to synaptic deficits, prior to frank neurodegeneration in AD brains.TRIPLET(that A_1_GENE:+ - 42 - induced memory deficits,involve,subtler neuronal alternations)

Page 102: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:

6> It is thus emerging that A_1-42-induced memory deficits may involve subtler neuronal alternations leading to synaptic deficits, prior to frank neurodegeneration in AD brains.TRIPLET(that A_1_GENE:+ - 42 - induced memory deficits,involve,subtler neuronal alternations)

Page 103: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:

6> It is thus emerging that A_1-42-induced memory deficits may involve subtler neuronal alternations leading to synaptic deficits, prior to frank neurodegeneration in AD brains.TRIPLET(that A_1_GENE:+ - 42 - induced memory deficits,involve,subtler neuronal alternations)

‣ issue: segment parsing is difficult!

Page 104: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:

6> It is thus emerging that A_1-42-induced memory deficits may involve subtler neuronal alternations leading to synaptic deficits, prior to frank neurodegeneration in AD brains.TRIPLET(that A_1_GENE:+ - 42 - induced memory deficits,involve,subtler neuronal alternations)

‣ issue: segment parsing is difficult!

‣ issue: verb tense is not always accessible

Page 105: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:

6> It is thus emerging that A_1-42-induced memory deficits may involve subtler neuronal alternations leading to synaptic deficits, prior to frank neurodegeneration in AD brains.TRIPLET(that A_1_GENE:+ - 42 - induced memory deficits,involve,subtler neuronal alternations)

‣ issue: segment parsing is difficult!

‣ issue: verb tense is not always accessible

‣ bionlp: not that much work on full text, since commercial publishers are difficult :-)!

Page 106: Xerox2009

ii. Is this useful for enabling automated epistemic markup?

✓ first efforts seem promising: simple markers (‘suggest’ verbs, connectives, etc.) already help:

6> It is thus emerging that A_1-42-induced memory deficits may involve subtler neuronal alternations leading to synaptic deficits, prior to frank neurodegeneration in AD brains.TRIPLET(that A_1_GENE:+ - 42 - induced memory deficits,involve,subtler neuronal alternations)

‣ issue: segment parsing is difficult!

‣ issue: verb tense is not always accessible

‣ bionlp: not that much work on full text, since commercial publishers are difficult :-)!

‣ possible challenge at biolink 2011: watch this space...

Page 107: Xerox2009
Page 108: Xerox2009

Concepts

KnownFact KnownFact

Page 109: Xerox2009

Concepts

KnownFact KnownFact

Hypothesis

To investigate the possibility that miR-372 and miR-373 suppress the

expression of LATS2, we...

Page 110: Xerox2009

Concepts

KnownFact KnownFact

Experiment 1

Goal

Result

Data

Method

Hypothesis

To investigate the possibility that miR-372 and miR-373 suppress the

expression of LATS2, we...

Page 111: Xerox2009

Concepts

KnownFact KnownFact

Experiment 1

Goal

Result

Data

Method

Hypothesis

To investigate the possibility that miR-372 and miR-373 suppress the

expression of LATS2, we...

Implication

Therefore, these results point toLATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,

Page 112: Xerox2009

Concepts

KnownFact KnownFact

Experiment 1

Goal

Result

Data

Method

Hypothesis

To investigate the possibility that miR-372 and miR-373 suppress the

expression of LATS2, we...

Implication

Therefore, these results point toLATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,

Voorhoeve, 2006

Page 113: Xerox2009

Concepts

KnownFact KnownFact

Experiment 1

Goal

Result

Data

Method

Goal

Experiment 2

Data

Method Result

Hypothesis

To investigate the possibility that miR-372 and miR-373 suppress the

expression of LATS2, we...

Implication

Therefore, these results point toLATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,

Voorhoeve, 2006

Page 114: Xerox2009

Concepts

KnownFact KnownFact

Experiment 1

Goal

Result

Data

Method

Goal

Experiment 2

Data

Method Result

Hypothesis

To investigate the possibility that miR-372 and miR-373 suppress the

expression of LATS2, we...

Implication

Therefore, these results point toLATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,

Fact

two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell

tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor

suppressor (Voorhoeve et al., 2006).

Raver-Shapira et.al, JMolCell 2007

Voorhoeve, 2006

Page 115: Xerox2009

Concepts

KnownFact KnownFact

Experiment 1

Goal

Result

Data

Method

Goal

Experiment 2

Data

Method Result

Hypothesis

To investigate the possibility that miR-372 and miR-373 suppress the

expression of LATS2, we...

Implication

Therefore, these results point toLATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,

Fact

two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell

tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor

suppressor (Voorhoeve et al., 2006).

Raver-Shapira et.al, JMolCell 2007

miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)

Yabuta, JBioChem 2007

Voorhoeve, 2006

Page 116: Xerox2009

Fact creation vs. Latour (1986)

Page 117: Xerox2009

Fact creation vs. Latour (1986)

Page 118: Xerox2009

Future research:

Page 119: Xerox2009

Future research:

‣ Need co-annotators to verify semantic types

Page 120: Xerox2009

Future research:

‣ Need co-annotators to verify semantic types

‣ Need to scale up with more (types of) texts!

Page 121: Xerox2009

Future research:

‣ Need co-annotators to verify semantic types

‣ Need to scale up with more (types of) texts!

I. How is a scientific fact created, as it moves from a hedged claim to a throughout successive citations?

Page 122: Xerox2009

Future research:

‣ Need co-annotators to verify semantic types

‣ Need to scale up with more (types of) texts!

I. How is a scientific fact created, as it moves from a hedged claim to a throughout successive citations?

II. Can we identify a rhetorically successful text, using these segments and characteristics?

Page 123: Xerox2009

Future research:

‣ Need co-annotators to verify semantic types

‣ Need to scale up with more (types of) texts!

I. How is a scientific fact created, as it moves from a hedged claim to a throughout successive citations?

II. Can we identify a rhetorically successful text, using these segments and characteristics?

III. Can we help authors create such texts (guidelines, tools?

Page 124: Xerox2009

Third attempt:

collaboration!

Page 125: Xerox2009

Improve ‘what is claimed about an entity’

insulin ::: GB000841

maintaining glucose homeostasis

... diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ...improve glucose

homeostasis... in T2D is able to increase insulin secretion and improve glucose homeostasis.

improves glucose homeostasis

... SIRT1, whose administration to insulin-resistant animals improves glucose homeostasis.

is capable glucose homeostasis

S15511 is a novel insulin sensitizer that is capable of improving glucose homeostasis in nondiabetic rats.

maintains glucose homeostasis

Pancreatic beta-cells possess a well-regulated insulin secretory property that maintains systemic glucose homeostasis.

may be involved

glucose homeostasis

... similar way to those of insulin, PANDER may be involved in glucose homeostasis.

participates glucose homeostasis

Fine-tuning of insulin secretion from pancreatic beta-cells participates in blood glucose homeostasis.

Page 126: Xerox2009

Improve ‘what is claimed about an entity’

insulin ::: GB000841

maintaining glucose homeostasis

... diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ...improve glucose

homeostasis... in T2D is able to increase insulin secretion and improve glucose homeostasis.

improves glucose homeostasis

... SIRT1, whose administration to insulin-resistant animals improves glucose homeostasis.

is capable glucose homeostasis

S15511 is a novel insulin sensitizer that is capable of improving glucose homeostasis in nondiabetic rats.

maintains glucose homeostasis

Pancreatic beta-cells possess a well-regulated insulin secretory property that maintains systemic glucose homeostasis.

may be involved

glucose homeostasis

... similar way to those of insulin, PANDER may be involved in glucose homeostasis.

participates glucose homeostasis

Fine-tuning of insulin secretion from pancreatic beta-cells participates in blood glucose homeostasis.

When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues. Insulin resistance and glucose intolerance has been well recognized in patients with advanced chronic kidney diseases (CKD).

.. Incretin metabolism is abnormal in T2D, evidenced by a decreased incretin effect, reduction in nutrient-mediated secretion of GIP and GLP-1 in T2D, and resistance to GIP. GLP-1, on the other hand, when administered intravenously in T2D is able to increase insulin secretion and improve glucose homeostasis.

SIRT1, a NAD(+)-dependent protein deacetylase that regulates transcription factors involved in key cellular processes, has been implicated as a mediator of the beneficial effects of calorie restriction. In a recent issue of Nature, Milne et al. (2007) describe novel potent activators of SIRT1, whose administration to insulin-resistant animals improves glucose homeostasis.

S15511 is a novel insulin sensitizer that is capable of improving glucose homeostasis in nondiabetic rats.... However, the mechanisms behind the insulin-sensitizing effect of S15511 are unknown. The aim of our study was to explore whether S15511 improves insulin sensitivity in skeletal muscles. S15511 treatment was associated with an increase in insulin-stimulated glucose transport in type IIb fibers, while type I fibers were unaffected. Pancreatic beta-cells possess a well-regulated insulin secretory property that maintains systemic glucose homeostasis. Although it has long been thought that differentiated beta-cells are nearly static, recent studies have shown that beta-cell mass dynamically changes throughout the lifetime. In this article, recent progress of regenerative medicine of the pancreas is reviewed.... Our results showed that glucose up-regulated PANDER mRNA and protein levels in a time- and dose-dependent manner in MIN6 cells and pancreatic islets. ...Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.

Fine-tuning of insulin secretion from pancreatic beta-cells participates in blood glucose homeostasis. ... Our data identify miR124a and miR96 as novel regulators of the expression of proteins playing a critical role in insulin exocytosis and in the release of other hormones and neurotransmitters.

Page 127: Xerox2009

Improve ‘what is claimed about an entity’

insulin ::: GB000841

maintaining glucose homeostasis

... diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ...improve glucose

homeostasis... in T2D is able to increase insulin secretion and improve glucose homeostasis.

improves glucose homeostasis

... SIRT1, whose administration to insulin-resistant animals improves glucose homeostasis.

is capable glucose homeostasis

S15511 is a novel insulin sensitizer that is capable of improving glucose homeostasis in nondiabetic rats.

maintains glucose homeostasis

Pancreatic beta-cells possess a well-regulated insulin secretory property that maintains systemic glucose homeostasis.

may be involved

glucose homeostasis

... similar way to those of insulin, PANDER may be involved in glucose homeostasis.

participates glucose homeostasis

Fine-tuning of insulin secretion from pancreatic beta-cells participates in blood glucose homeostasis.

When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues. Insulin resistance and glucose intolerance has been well recognized in patients with advanced chronic kidney diseases (CKD).

.. Incretin metabolism is abnormal in T2D, evidenced by a decreased incretin effect, reduction in nutrient-mediated secretion of GIP and GLP-1 in T2D, and resistance to GIP. GLP-1, on the other hand, when administered intravenously in T2D is able to increase insulin secretion and improve glucose homeostasis.

SIRT1, a NAD(+)-dependent protein deacetylase that regulates transcription factors involved in key cellular processes, has been implicated as a mediator of the beneficial effects of calorie restriction. In a recent issue of Nature, Milne et al. (2007) describe novel potent activators of SIRT1, whose administration to insulin-resistant animals improves glucose homeostasis.

S15511 is a novel insulin sensitizer that is capable of improving glucose homeostasis in nondiabetic rats.... However, the mechanisms behind the insulin-sensitizing effect of S15511 are unknown. The aim of our study was to explore whether S15511 improves insulin sensitivity in skeletal muscles. S15511 treatment was associated with an increase in insulin-stimulated glucose transport in type IIb fibers, while type I fibers were unaffected. Pancreatic beta-cells possess a well-regulated insulin secretory property that maintains systemic glucose homeostasis. Although it has long been thought that differentiated beta-cells are nearly static, recent studies have shown that beta-cell mass dynamically changes throughout the lifetime. In this article, recent progress of regenerative medicine of the pancreas is reviewed.... Our results showed that glucose up-regulated PANDER mRNA and protein levels in a time- and dose-dependent manner in MIN6 cells and pancreatic islets. ...Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.

Fine-tuning of insulin secretion from pancreatic beta-cells participates in blood glucose homeostasis. ... Our data identify miR124a and miR96 as novel regulators of the expression of proteins playing a critical role in insulin exocytosis and in the release of other hormones and neurotransmitters.

Page 128: Xerox2009

30

A network of hypotheses and evidence

Page 129: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Page 130: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Paper A:implication

results

method

goal

fact

fact

Page 131: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Paper A:implication

results

method

goal

fact

fact

data 1

data 2 data 3

Page 132: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Paper A:implication

results

method

goal

fact

fact

Paper B:

data 4

data 5 data 6

implication

results

method

goal

fact

fact

data 1

data 2 data 3

Page 133: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Paper A:implication

results

method

goal

fact

fact

Paper B:

data 4

data 5 data 6

implication

results

method

goal

fact

fact

data 1

data 2 data 3

Page 134: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Paper A:implication

results

method

goal

fact

fact

Paper B:

data 4

data 5 data 6

implication

results

method

goal

fact

fact

data 1

data 2 data 3

Page 135: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Paper A:implication

results

method

goal

fact

fact

Paper B:

data 4

data 5 data 6

implication

results

method

goal

fact

fact

underpinning

data 1

data 2 data 3

Page 136: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Paper A:implication

results

method

goal

fact

fact

Paper B:

data 4

data 5 data 6

implication

results

method

goal

fact

fact

data 1

data 2 data 3

Page 137: Xerox2009

30

PHC Growth arrestundergo

A network of hypotheses and evidence

Paper A:implication

results

method

goal

fact

fact

Paper B:

data 4

data 5 data 6

implication

results

method

goal

fact

fact

data 1

data 2 data 3

method link

Page 138: Xerox2009

For Example: SWAN

Page 139: Xerox2009

For Example: SWAN

Page 140: Xerox2009

For Example: SWAN

Page 141: Xerox2009

For Example: SWAN

Page 142: Xerox2009

HypER Working Group:

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

- Harvard/MGH: SWAN, ARF

- Open University: Cohere

- Oxford University: CiTO, eLearning/Rhetoric

- DERI: SALT, aTags

- University of Trento: LiquidPub

- Xerox Research: XIP hypothesis identifier

- U Tilburg: ML for Science

- Elsevier, UUtrecht: Discourse analysis of biology

Page 143: Xerox2009

HypER Working Group:

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

- Harvard/MGH: SWAN, ARF

- Open University: Cohere

- Oxford University: CiTO, eLearning/Rhetoric

- DERI: SALT, aTags

- University of Trento: LiquidPub

- Xerox Research: XIP hypothesis identifier

- U Tilburg: ML for Science

- Elsevier, UUtrecht: Discourse analysis of biology

Page 144: Xerox2009

HypER Working Group:

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

- Harvard/MGH: SWAN, ARF

- Open University: Cohere

- Oxford University: CiTO, eLearning/Rhetoric

- DERI: SALT, aTags

- University of Trento: LiquidPub

- Xerox Research: XIP hypothesis identifier

- U Tilburg: ML for Science

- Elsevier, UUtrecht: Discourse analysis of biology

Page 145: Xerox2009

HypER Working Group:

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

- Harvard/MGH: SWAN, ARF

- Open University: Cohere

- Oxford University: CiTO, eLearning/Rhetoric

- DERI: SALT, aTags

- University of Trento: LiquidPub

- Xerox Research: XIP hypothesis identifier

- U Tilburg: ML for Science

- Elsevier, UUtrecht: Discourse analysis of biology

Page 146: Xerox2009

HypER Working Group:

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

- Harvard/MGH: SWAN, ARF

- Open University: Cohere

- Oxford University: CiTO, eLearning/Rhetoric

- DERI: SALT, aTags

- University of Trento: LiquidPub

- Xerox Research: XIP hypothesis identifier

- U Tilburg: ML for Science

- Elsevier, UUtrecht: Discourse analysis of biology

Page 147: Xerox2009

HypER Working Group:

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

- Harvard/MGH: SWAN, ARF

- Open University: Cohere

- Oxford University: CiTO, eLearning/Rhetoric

- DERI: SALT, aTags

- University of Trento: LiquidPub

- Xerox Research: XIP hypothesis identifier

- U Tilburg: ML for Science

- Elsevier, UUtrecht: Discourse analysis of biology

Page 148: Xerox2009

HypER Working Group:

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

- Harvard/MGH: SWAN, ARF

- Open University: Cohere

- Oxford University: CiTO, eLearning/Rhetoric

- DERI: SALT, aTags

- University of Trento: LiquidPub

- Xerox Research: XIP hypothesis identifier

- U Tilburg: ML for Science

- Elsevier, UUtrecht: Discourse analysis of biology

Hypothesis 22: Intramembrenous Aβ dimer may be toxic.

Derived from: POSTAT_CONTRIBUTION(This essay explores the possibility that a fraction of these Abeta peptides never leave the membrane lipid bilayer after they are generated, but instead exert their toxic effects by competing with and compromising the functions of intramembranous segments of membrane-bound proteins that serve many critical functions.

Page 149: Xerox2009

HypER Activities: http://hyper.wik.is

Page 150: Xerox2009

HypER Activities: http://hyper.wik.is

Current activities:

- Aligning discourse ontologies: joint task with W3C HCLSSig

- Aligning architectures to exchange hypotheses + evidence

- Format for a rhetorical conference paper (SALT + abcde)

- Parser test of hypothesis identification tools on pharmacology corpus

Page 151: Xerox2009

HypER Activities: http://hyper.wik.is

Current activities:

- Aligning discourse ontologies: joint task with W3C HCLSSig

- Aligning architectures to exchange hypotheses + evidence

- Format for a rhetorical conference paper (SALT + abcde)

- Parser test of hypothesis identification tools on pharmacology corpus

Further interests:

- Better structure of evidence: MyExperiment, KeFeD, ...

- Granularity of annotation/access: entity, hypothesis, discussion?

Page 152: Xerox2009

Conclusion

Page 153: Xerox2009

Conclusion

Problem: too much discourse, tools are not yet good enough...

Page 154: Xerox2009

Conclusion

Problem: too much discourse, tools are not yet good enough...

1. First attempt: allow authors to validate entities - pursue

Page 155: Xerox2009

Conclusion

Problem: too much discourse, tools are not yet good enough...

1. First attempt: allow authors to validate entities - pursue

2. Second attempt: discourse analysis - any help is great!

Page 156: Xerox2009

Conclusion

Problem: too much discourse, tools are not yet good enough...

1. First attempt: allow authors to validate entities - pursue

2. Second attempt: discourse analysis - any help is great!

3. Third attempt: collaboration to identify hypotheses: do join!

Page 158: Xerox2009

References

Hyland, K. (2004). Disciplinary Discourses: Social Interactions in Academic Writing, Addison Wesley Publishing Company, 2004.

Latour, B., and Woolgar, S. (1986). Laboratory Life: The Construction of Scientific Facts. 2nd ed. Princeton, NJ: Princeton University Press, 1986. ISBN: 9780691028323.

Latour, B. (1987). Science in Action, How to Follow Scientists and Engineers through Society, (Cambridge, Ma.: Harvard University Press, 1987)

Page 159: Xerox2009

Segmentation Criteria (summary)

Finite/Non-finite

Grammatical role Segment? Example

Finite/Non-finite Subject N The extent to which miRNAs specifically affect metastasis

Finite/Non-finite Direct Object Y these miRNAs are potential novel oncogenes

Nonfinite Phrase-level adjunct (restrictive and non-restrictive)

N spanning a given miRNA genomic region

Nonfinite Clause-level adjunct Y by cloning eight miR-Vec plasmids

Finite Non-restrictive Phrase-level adjunct Y which is only active when tamoxifen is added (De Vita et al, 2005) […]

Finite Restrictive Phrase-level adjunct N that we examined

Finite Clause-level adjunct Ywhich correlates with the reported ES-cell expression pattern of the miR-371-3 cluster (Suh et al, 2004)

Page 160: Xerox2009

Basic Segment TypesSegment Description Example

Fact a known fact, generally without explicit citation

mature miR-373 is a homolog of miR-372

Hypothesis a proposed idea, not supported by evidence This could for instance be a result of high mdm2 levels

Problem unresolved, contradictory, or unclear issue

However, further investigation is required to demonstrate the exact mechanism of LATS2 action

Goal research goal To identify novel functions of miRNAs,

Method experimental method Using fluorescence microscopy and luciferase assays,

Result a restatement of the outcome of an experiment

all constructs yielded high expression levels of mature miRNAs

Implication an interpretation of the results, in light of earlier

hypotheses and facts

our procedure is sensitive enough to detect mild growth differences

Page 161: Xerox2009

Two Types of Derived Segment Types

Page 162: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

Page 163: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

- other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990]’

Page 164: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

- other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990]’

- other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’

Page 165: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

- other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990]’

- other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’

- other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’

Page 166: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

- other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990]’

- other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’

- other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’

Regulatory segments, acting as matrix sentences framing other segments:

Page 167: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

- other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990]’

- other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’

- other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’

Regulatory segments, acting as matrix sentences framing other segments:

- reg-hypothesis: ‘we hypothesized that ’

Page 168: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

- other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990]’

- other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’

- other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’

Regulatory segments, acting as matrix sentences framing other segments:

- reg-hypothesis: ‘we hypothesized that ’

- reg-implication: ‘These observations suggest that’

Page 169: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

- other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990]’

- other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’

- other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’

Regulatory segments, acting as matrix sentences framing other segments:

- reg-hypothesis: ‘we hypothesized that ’

- reg-implication: ‘These observations suggest that’

- intratextual: ‘Fig 4 shows that’

Page 170: Xerox2009

Two Types of Derived Segment Types

‘Other-segments’, related to (referenced) other work:

- other-result: ‘they are also found in the FCX and other cortical structures ([Sokoloff et al., 1990]’

- other-goal: ‘the role of D3 receptors in the control of motivation and affect has been intensively studied [Heidbreder et al., 2005]’

- other-implication: ‘D1 or, more likely, D5, receptors have been implicated in mechanisms underlying long-term spatial memory [Hersi et al., 1995]’

Regulatory segments, acting as matrix sentences framing other segments:

- reg-hypothesis: ‘we hypothesized that ’

- reg-implication: ‘These observations suggest that’

- intratextual: ‘Fig 4 shows that’

- intertextual: ‘reviewed in (Serrano, 1997)’

Page 171: Xerox2009

My categories vs. Latour (1979)

Page 172: Xerox2009

Linguistic and structural properties

Page 173: Xerox2009

Linguistic and structural properties

Page 174: Xerox2009

Linguistic and structural properties1. Position in text

Page 175: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

Page 176: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

Page 177: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

Page 178: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

Page 179: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

- Tense, aspect, voice

Page 180: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class (idiosyncratic)

Page 181: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class (idiosyncratic)

- Lexicon

Page 182: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class (idiosyncratic)

- Lexicon

3. Metadiscourse markers [Hyland, 2003]:

Page 183: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class (idiosyncratic)

- Lexicon

3. Metadiscourse markers [Hyland, 2003]:

- Connectives

Page 184: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class (idiosyncratic)

- Lexicon

3. Metadiscourse markers [Hyland, 2003]:

- Connectives

- Endophorics, Evidentials

Page 185: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class (idiosyncratic)

- Lexicon

3. Metadiscourse markers [Hyland, 2003]:

- Connectives

- Endophorics, Evidentials

- Hedges, Boosters

Page 186: Xerox2009

Linguistic and structural properties1. Position in text

- Section of the paper (Introduction, Results, Discussion)

- Beginning/middle/end of section

- First/second third part of sentence

2. Verb:

- Tense, aspect, voice

- Verb class (idiosyncratic)

- Lexicon

3. Metadiscourse markers [Hyland, 2003]:

- Connectives

- Endophorics, Evidentials

- Hedges, Boosters

- Person markers

Page 187: Xerox2009

Verb class

Page 188: Xerox2009

Verb classTwo types of entities interact in biology texts:

- Thing:

- Thing -> Increase, die, etc

- Thing-thing: affect, stimulate etc.

- People:

- People -> Thing:

- Examine (Goal)

- Operate (Method)

- Observe (Result)

- Implicate (Implication)

- People - people: Report

Page 189: Xerox2009

Interpretation: 3 Realms of Science:

Experimental realm

Data realm

Conceptual realm

Page 190: Xerox2009

Interpretation: 3 Realms of Science:

Experimental realm

Data realm

Conceptual realm

(1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-

-Galactosidase.

(4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.

(2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA- -Galactosidase

(3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.

(4a) Altogether, these data show that

(3a) Consistent with the cell growth assay,

(Figures)

(2a) Indeed,

(2c) (Figures 2G and 2H).

Page 191: Xerox2009

Interpretation: 3 Realms of Science:

Experimental realm

Data realm

Conceptual realm

(1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-

-Galactosidase.

(4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.

(2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA- -Galactosidase

(3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.

(4a) Altogether, these data show that

(3a) Consistent with the cell growth assay,

(Figures)

(2a) Indeed,

(2c) (Figures 2G and 2H).

Page 192: Xerox2009

Tense 1: Concepts vs. Experiment

(1) Oncogene-induced senescence is characterized by the appearance of cells with a flat morphology that express senescence associated (SA)-

-Galactosidase.

(4b) transduction with either miR-Vec-371&2 or miR-Vec-373 prevents RASV12-induced growth arrest in primary human cells.

(2b) control RASV12-arrested cells showed relatively high abundance of flat cells expressing SA- -Galactosidase

(3b) very few cells showed senescent morphology when transduced with either miR-Vec-371&2, miR-Vec-373, or control p53kd.

(4a) Altogether, these data show that

(3a) Consistent with the cell growth assay,

(Figures)

(2a) Indeed,

Expe

r ime n

t al r

ealm

( p

ers o

nal,

past

) D

a ta

real

m

(non

tver

bal)

(2c) (Figures 2G and 2H).

Con

c ep t

real

m

Page 193: Xerox2009

Tense 2: Referral

Introduction

Current work (= Results section)

After current work: past

Before current work: present

Discussion

present past future

othe

r pap

ers

own

pap e

r

Other Work

After other work: past

Page 194: Xerox2009

Tense 1+ 2 = 3:

Reading time

Expe

rient

ial

Con

cept

ual

past present future

Claim, fact

Experiment