Talk given at Detecting Structure in Scholarly Discourse workshop, ACL 2012 - http://www.nactem.ac.uk/dssd/programme.php
Epistemic Modality and Knowledge Attribution: Types and Features
Anita de Waard, Elsevier Labs; Henk Pander Maat, UiL-OTS, Utrecht University
July 12, 2012, DSSD-2012, ACL, Jeju
Epistemic Modality and Knowledge Attribution:
Introduction:
– Why is epistemic modality interesting?
– Research questions
– Some related work in genre studies, linguistics, CL
Methods and Results:
– A taxonomy of types and markers
– In defense of the clause as a unit of thought
– A small corpus study
Conclusions and Applications:
– Connecting formal representations to text
– A corpus of citations
– Did this answer our research questions?
Latour, 1987: “[Y]ou can transform a fact into fiction or a fiction into fact just by adding or subtracting references”
Introduction | Methods and Results | Conclusions and Applications
How a claim becomes a fact:
• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
Research Questions:
1. Can we find a model for epistemic evaluation and knowledge attribution to describe all biological statements in a straightforward way?
2. If yes: can we detect this evaluation, manually and automatically?
3. Is this model useful for examining the mechanism of ‘hedging erosion’: does it show how a claim becomes validated after being cited?
Related work: Genre Studies
• Why do authors hedge?
– Make a claim ‘pending […] acceptance in the community’ (Myers, 1989)
– ‘Create A Research Space’ – hedging allows authors to insert themselves into the discourse in a community (Swales, 1990)
– ‘the strongest claim a careful researcher can make’ (Salager-Meyer, 1994)
– Types: writer-oriented, accuracy-oriented and reader-oriented hedges (Hyland, 1994)
Related work: Linguistics
• How do authors hedge?
– ‘Modifiers of Propositional Content’ - kind, degree and source (Hengeveld/Mackenzie, 2008)
– Type of hypotaxis: projection vs. embedding/expanding (e.g. Halliday & Matthiessen, 2004)
– Cognitive linguistics: ‘grounding elements […] establish an epistemic relationship between the ground and the profiled thing…’ (Langacker, 2008)
– E.g. finite complements make ‘The subject become(s) the object’ (Verhagen, 2007), foregrounding the author: ‘we hypothesized that nuclear proteins bind to exon 1’
Related work: CL
• How do we find hedges?
– Hedging cues, speculative language, modality/negation (very small selection – see many more, e.g. by Teufel, Morante, Sporleder, others!):
• (Light et al, 2004): finding speculative language
• (Wilbur et al, 2006): focus, polarity, certainty, evidence, and directionality
• (Thompson et al, 2008): level of speculation, type/source of the evidence and level of certainty
– Sentiment detection (e.g. Kim and Hovy, 2004 a.m.o.):
• Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
• S(P) has different attributes: strength, polarity, source, etc.
Proposal: taxonomy of epistemic evaluation/knowledge attribution
For a Proposition P, an epistemically marked clause E is an Evaluation of P, $E_{V,B,S}(P)$, with:
V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
B = Basis: Reasoning, Data
S = Source:
A = speaker is author A, explicit
IA = speaker is author A, implicit
N = other author N, explicit
NN = other author NN, implicit
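The taxonomy above can be written down as a small data structure. The following is only a minimal sketch: the class and field names (`Evaluation`, `Basis`, `Source`, `label`) are hypothetical, the `"0"` codes for an unidentified basis or source come from the examples slide, and reading the sample sentence as an explicit author attribution is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    REASONING = "R"
    DATA = "D"
    UNIDENTIFIED = "0"


class Source(Enum):
    AUTHOR_EXPLICIT = "A"
    AUTHOR_IMPLICIT = "IA"
    NAMED_OTHER = "N"
    NAMELESS_OTHER = "NN"
    NONE = "0"


# Value scale from the slide: 3 = assumed true ... -3 = assumed untrue.
VALUE_LABELS = {
    3: "assumed true", 2: "probable", 1: "possible", 0: "unknown",
    -1: "possibly untrue", -2: "probably untrue", -3: "assumed untrue",
}


@dataclass(frozen=True)
class Evaluation:
    """An epistemic evaluation E_{V,B,S}(P) of a proposition P."""
    proposition: str
    value: int
    basis: Basis
    source: Source

    def __post_init__(self):
        # Reject values outside the seven-point scale.
        if self.value not in VALUE_LABELS:
            raise ValueError(f"value must be in -3..3, got {self.value}")

    def label(self) -> str:
        """Compact tag such as E[V=2,B=D,S=A]."""
        return f"E[V={self.value},B={self.basis.value},S={self.source.value}]"


# Statement and markup (Probable / Author / Data) taken from the application slide.
example = Evaluation(
    proposition="miR-373 is a potential oncogene",
    value=2,
    basis=Basis.DATA,
    source=Source.AUTHOR_EXPLICIT,  # assumption: "Our findings reveal" read as explicit
)
print(example.label(), "->", VALUE_LABELS[example.value])
```

Keeping the value as a signed integer preserves the ordering of the scale, so evaluations of the same proposition can be compared across citing papers.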
Some examples:
| Concept | Value | Example |
| --- | --- | --- |
| Value | 0 – Lack of knowledge | Thus, it remains to be determined if... |
| | 1 – Hypothetical: low certainty | GATA-1 binding to exon 1 may affect transcription start site function |
| | 2 – Dubitative: higher likelihood but short of complete certainty | suggesting the presence of lineage-specific elements. |
| | 3 – Doxastic: complete certainty, accepted/known/proven fact | the 1.6 kb 5' flanking region of CCR3 has promoter activity in vivo. |
| Basis | R – Reasoning | Therefore, one can argue… |
| | D – Data | These results suggest… |
| | 0 – Unidentified | Studies report that… |
| Source | A – Author: explicit mention of author/current paper as source | We hypothesize that… Fig 2a shows that… |
| | N – Named external source, either explicitly or as a reference | …several reports have documented this expression [11-16,42]. |
| | IA – Implicit attribution to the author | Electrophoretic mobility shift analysis revealed that… |
| | NN – Nameless external source | no eosinophil-specific transcription factors have been reported… |
| | 0 – No source of knowledge | transcription factors are the final common pathway driving differentiation |
Epistemic Markers
• Modal auxiliary verbs (e.g. can, could, might)
• Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
• References, either external (e.g. ‘[Voorhoeve et al., 2006]’) or internal (e.g. ‘See fig. 2a’).
• Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
– either within the clause: ‘These results suggest that...’
– or in a subordinate clause governed by a reporting-verb matrix clause: ‘{These results suggest that} indeed, this represents the true endogenous activity.’
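As a rough illustration of the four marker classes, the sketch below looks them up in a clause by plain word matching. It is not the annotation procedure used in the study: the word lists contain only the examples named on the slide (plus inflected forms of the four verbs), and the reference pattern and the function name `find_markers` are assumptions made for this example.

```python
import re

MODAL_AUX = {"can", "could", "might"}
QUALIFIERS = {"interestingly", "possibly", "likely", "potential", "somewhat",
              "slightly", "powerful", "unknown", "undefined"}
REPORTING_VERBS = {
    "suggest", "suggests", "suggested", "suggesting",
    "imply", "implies", "implied", "implying",
    "indicate", "indicates", "indicated", "indicating",
    "show", "shows", "showed", "shown", "showing",
}
# Assumed surface forms: [11-16,42], (Author et al., 2006), fig. 2a / Figure 4A.
REFERENCE = re.compile(
    r"\[[^\]]+\]|\([^()]*\b(?:19|20)\d{2}\)|\bfig(?:ure|\.)?\s*\d+[a-z]?",
    re.IGNORECASE,
)


def find_markers(clause: str) -> dict:
    """Return the epistemic markers found in a clause, grouped by class."""
    words = re.findall(r"[a-z]+", clause.lower())
    return {
        "modal_aux": [w for w in words if w in MODAL_AUX],
        "qualifier": [w for w in words if w in QUALIFIERS],
        "reporting_verb": [w for w in words if w in REPORTING_VERBS],
        "reference": [m.group(0) for m in REFERENCE.finditer(clause)],
    }


print(find_markers(
    "These results suggest that this might represent the true activity [11-16,42]."
))
```

A word list like this cannot tell a reporting verb in the matrix clause from one inside the proposition itself, which is one reason the talk argues for clause-level segmentation first.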
In defense of the clause as a unit of thought:
• Argumentative zoning: several sentences
• Bio-events: supra- to sub-sentential
• CORE-SC: sentence
• My discourse segments: clause – Elementary Discourse Units (EDU)
Voorhoeve et al., (2006):
1. Importantly, our results so far indicate that the expression of miR-372&3 did not reduce the activity of RASV12, as these cells were still growing faster than normal cells and were tumorigenic, for which RAS activity is indispensable (Hahn et al, 1999 and Kolfschoten et al, 2005).
2. To shed more light on this aspect, we examined the effect of miR-372&3 expression on p53 activation in response to oncogenic stimulation.
3. We used for this experiment BJ/ET cells containing p14ARFkd because, following RASV12 treatment, in those cells p53 is still activated but more clearly stabilized than in parental BJ/ET cells (Voorhoeve and Agami, 2003), resulting in a sensitized system for slight alterations in p53 in response to RASV12.
4. Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.
• More than one ‘thought unit’ per sentence.
• Verb tense changes within sentence (several times).
• Attribution, actions/states, and prepositions all contained within a sentence.
Head: premise, motivation, attribution (matrix clause)
Middle: main biological statement
End: interpretation, elaboration, attribution (reference)
Clause types: Regulatory clause, Fact, Goal, Method, Result, Implication
Small corpus study:
• Markup of clauses with modality types and markers for one full-text biology paper, 640 clauses (Zimmermann, 2005)
Comments on small corpus study
• Very preliminary: one paper and one annotator!
• The value is not always completely clear:
– ‘report’ vs. ‘demonstrate’?
– ‘indicate’ vs. ‘show’?
• Some clauses don’t have a modal evaluation,
– e.g. Goal: ‘In order to determine if this region had promoter activity in vivo…’
– Method: ‘Nuclear extracts from AML14.3D10 cells were incubated with the radiolabelled full-length CCR3 exon 1 probe…’
• Sometimes modality changes within a sentence:
– ‘It has been reported that (value = 2) the 5' untranslated exons may contain sequences that facilitate transcription of the gene. (value = 1)’
– In this case, identify at a clausal level
Small corpus exploration, result:
| Value | Modal Aux | Reporting Verb | Ruled by RV | Adverbs/Adjectives | References | None | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Total Value = 3 | 1 (0.5%) | 81 (40%) | 24 (12%) | 7 (4%) | 41 (20%) | 47 (24%) | 201 (100%) |
| Total Value = 2 | | 29 (51%) | 23 (40%) | 1 (2%) | 4 (7%) | | 57 (100%) |
| Total Value = 1 | 9 (27%) | 11 (33%) | 11 (33%) | 1 (3%) | 1 (3%) | | 33 (100%) |
| Total Value = 0 | | 9 (64%) | 3 (21%) | 1 (7%) | 1 (7%) | | 14 (100%) |
| Total No Modality | | 16 (37%) | 3 (7%) | 0 | 3 (7%) | 22 (50%) | 44 (100%) |
| Overall Total | 10 (2%) | 146 (23%) | 64 (10%) | 10 (2%) | 50 (8%) | 69 (11%) | 640 (100%) |
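The counts in the result table can be checked arithmetically. In the extracted slide the sparse rows (Value = 2, 1, 0 and No Modality) list fewer numbers than there are columns, so the column placement used below is an assumption: it is the one assignment that keeps the numbers in their printed order and reproduces every reported column total. The variable names are hypothetical.

```python
COLUMNS = ["modal_aux", "reporting_verb", "ruled_by_rv",
           "adv_adj", "references", "none"]

# Counts per row; zeros stand for cells that are empty on the slide.
ROWS = {
    "value=3":     [1, 81, 24, 7, 41, 47],
    "value=2":     [0, 29, 23, 1, 4, 0],
    "value=1":     [9, 11, 11, 1, 1, 0],
    "value=0":     [0, 9, 3, 1, 1, 0],
    "no modality": [0, 16, 3, 0, 3, 22],
}

# Totals as printed on the slide.
REPORTED_ROW_TOTALS = {"value=3": 201, "value=2": 57, "value=1": 33,
                       "value=0": 14, "no modality": 44}
REPORTED_COLUMN_TOTALS = [10, 146, 64, 10, 50, 69]

row_sums = {name: sum(counts) for name, counts in ROWS.items()}
column_sums = [sum(col) for col in zip(*ROWS.values())]

print("rows match:   ", row_sums == REPORTED_ROW_TOTALS)
print("columns match:", column_sums == REPORTED_COLUMN_TOTALS)
print("clauses covered by the listed rows:", sum(column_sums))
```

Under this placement both checks come out true. The listed rows cover 349 clauses, while the slide reports 640 clauses overall and computes the bottom-row percentages against 640; the slide does not say how the remaining clauses are categorised.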
Reporting verbs vs. epistemic value:
Value = 0 (unknown)
establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
Value = 1 (hypothetical)
be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
Value = 2 (probable)
appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
Value = 3 (presumed true)
be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results - 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
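The verb lists above suggest a simple lexicon lookup from reporting verb to candidate epistemic value. The sketch below uses only a subset of the single-word verbs on the slide; the names `VERBS_BY_VALUE` and `candidate_values` are hypothetical, and no lemmatisation is attempted. Because some verbs (e.g. "report") are listed under more than one value, the function returns all candidates rather than a single answer.

```python
# Subset of the single-word reporting verbs listed per value on the slide.
VERBS_BY_VALUE = {
    0: {"establish", "describe", "report"},
    1: {"consider", "expect", "hypothesize", "suspect", "think"},
    2: {"appear", "believe", "implicate", "imply", "indicate",
        "represent", "suggest", "validate"},
    3: {"confirm", "demonstrate", "detect", "discover", "find", "identify",
        "observe", "report", "reveal", "show"},
}


def candidate_values(verb: str) -> list:
    """Return every epistemic value under which the verb is listed."""
    lemma = verb.lower()
    return sorted(v for v, verbs in VERBS_BY_VALUE.items() if lemma in verbs)


for verb in ("suggest", "report", "show", "incubate"):
    print(verb, candidate_values(verb))
```

An ambiguous result such as the one for "report" corresponds to the open question raised in the comments on the corpus study (‘report’ vs. ‘demonstrate’); context, not the verb alone, would have to decide.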
Most prevalent clause type: “These results suggest that...”
Adverb/Connective: thus, therefore, together, recently, in summary
Determiner/Pronoun: it, this, these, we/our
Adjective: previous, future, better
Noun phrase: data, report, study, result(s); method or reference
Modal: form of ‘to be’, may, remain
Adverb: often, recently, generally
Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
Preposition: that, to
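The slot structure of this clause type can be approximated with a regular expression. This is only a sketch under stated assumptions: it covers the connective, determiner/pronoun, adjective, noun and verb slots with the words listed on the slide, leaves out the modal and adverb slots, handles only the `-s`/`-d`/`-ed` verb endings, and the function name `split_matrix_clause` is hypothetical.

```python
import re

CONNECTIVE = r"(?:thus|therefore|together|recently|in summary)"
DEICTIC = r"(?:it|this|these|we|our)"
ADJECTIVE = r"(?:previous|future|better)"
NOUN = r"(?:data|reports?|study|studies|results?)"
VERB = (r"(?:show|obtain|consider|view|reveal|suggest|hypothesize|"
        r"indicate|believe)(?:s|d|ed)?")

# (connective,) deictic (adjective) (noun) verb "that"
MATRIX = re.compile(
    rf"^(?:{CONNECTIVE},?\s+)?{DEICTIC}\s+(?:{ADJECTIVE}\s+)?"
    rf"(?:{NOUN}\s+)?{VERB}\s+that\b",
    re.IGNORECASE,
)


def split_matrix_clause(sentence: str):
    """Split a sentence into (matrix clause, governed clause), or return None."""
    text = sentence.strip()
    match = MATRIX.match(text)
    if match is None:
        return None
    return text[:match.end()], text[match.end():].strip()


print(split_matrix_clause(
    "These results suggest that indeed, this represents the true endogenous activity."
))
print(split_matrix_clause("Nuclear extracts were incubated with the probe."))
```

Matrix clauses whose subject is a figure or a method name ("Figure 4A shows that…") fall under the "method or reference" noun-phrase slot and are not matched by this pattern.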
Application: connecting text to formal representations
• Add knowledge value/basis/source attribute to a bio-event, e.g.:
Biological statement with epistemic markup, followed by its epistemic evaluation:
• Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression.
– Value = Probable; Source = Author; Basis = Data
• Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39].
– Value = Presumed true; Source = Reference; Basis = Data
• Moreover, the mechanisms by which tumor suppressor genes are inhibited may vary between tumors.
– Value = Possible; Source = Unknown; Basis = Unknown
E.g. to augment Medscan (Ariadne)
Biological statement with Medscan/epistemic markup:
• Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).
MedScan Analysis:
• IL-6 → NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1
Epistemic evaluation:
• Value = Probable; Source = Author; Basis = Data
Or BEL (Biological Expression Language):
Biological statement with BEL/epistemic markup:
• These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
BEL representation:
• Increased abundance of miR-372 decreases: increased activity of TP53 decreases activity of CDK protein family
– r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
• Increased abundance of miR-372 decreases abundance of LATS2
– r(MIR:miR-372) -| r(HUGO:LATS2)
Epistemic evaluation:
• Value = Possible; Source = Unknown; Basis = Unknown
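One way to picture the link between a formal statement and its epistemic evaluation is a plain record that carries both. The sketch below treats the BEL statements from the slide as opaque strings and does not parse or validate them; the record layout and the helper name `annotate` are assumptions for illustration, not part of BEL or of any BEL tooling.

```python
import json


def annotate(statement: str, value: str, source: str, basis: str) -> dict:
    """Pair a formal statement (kept as a string) with an epistemic evaluation."""
    return {
        "statement": statement,
        "epistemic": {"value": value, "source": source, "basis": basis},
    }


# BEL statements and evaluation copied from the slide.
bel_statements = [
    'r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))',
    "r(MIR:miR-372) -| r(HUGO:LATS2)",
]
records = [annotate(s, "Possible", "Unknown", "Unknown") for s in bel_statements]

print(json.dumps(records, indent=2))
```

Keeping the evaluation next to the statement, rather than inside it, would let the same statement appear with different values when it is extracted from different citing papers.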
Implementation: can we find this in text?
• Work on Claimed Knowledge Updates was a first attempt…
• Probably:
– Need better clause taggers (e.g. Feng and Hirst, 2012)
– Need better verb form detection
– Need more appropriate semantic verb classes
• Hope to piggyback on bio-event detection.
Following a claim as it becomes a fact?
• TAC Challenge 2013: find most appropriate cited ‘zones’ in reference papers, given the reference
• With NIST and U Colorado: create a gold standard: 20 papers in biology with 10 citing papers each
• Perhaps we can trace a trail of 3 ‘generations’ of citations?
• Will allow a first answer to the manifestation of fact creation
Revisiting our Research Questions:
1. Can we find a model for epistemic evaluation and knowledge attribution to describe all biological statements in a straightforward way?
– This seems to work & agree with previous models
2. If yes: can we detect this evaluation
– manually? Seems to be the case, need more annotators
– and automatically? First experiments seem promising but no conclusions
3. Is this model useful for examining the mechanism of ‘hedging erosion’?
– Hopefully, TAC Corpus work will help answer this question? Other corpora?
In summary:
• Epistemic modality marking and knowledge attribution:
– are critical features of scientific text;
– are manifestations of the objectification of (scientific) subjective experiences;
– can be described by our three-part taxonomy and set of markers;
– are instantiated largely through a small set of markers, most prominently in matrix clauses: ‘(deictic marker) + (reporting verb) + that’.
• This model can link formal representations of biological statements to the text, and improve knowledge network models with epistemic values.
Acknowledgements
• Thanks to NWO in the Netherlands for the initial research funding
• Thanks to Bradley Allen at Elsevier Labs for supporting my research throughout
• Thanks to Eduard Hovy for helping develop a model of epistemic modality as a mathematical function
• Thanks to Lucy Vanderwende for work on the TAC Corpus concept
• Thanks to Dexter Pratt for work on the BEL representation
• Thanks to Agnes Sandor for the work on CKUs (stay tuned..)
References
• De Waard, A., Pander Maat, H. (2009). Categorizing Epistemic Segment Types in Biology Research Articles. Workshop on Linguistic and Psycholinguistic Approaches to Text Structuring (LPTS 2009), September 21-23, 2009.
• Feng, Vanessa Wei and Hirst, Graeme (2012). Text-level discourse parsing with rich linguistic features. 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), July, Jeju, Korea.
• Hengeveld, K. & Mackenzie, J. L. (2008). Functional Discourse Grammar: A Typologically-Based Theory of Language Structure. Oxford Univ. Press, 2008.
• Hyland, K. (2005). Stance and engagement: a model of interaction in academic discourse. Discourse Studies, Vol 7(2): 173–192.
• Kim, S-M., Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
• Latour, B., Woolgar, S. (1979). Laboratory Life: The Social Construction of Scientific Facts. Beverly Hills: Sage Publications. ISBN 0-80-390993-4.
• Light M., Qiu X.Y., Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
• Medlock B., Briscoe T. (2007). Weakly supervised learning for hedge classification in scientific literature. ACL 2007:992-999.
• Myers, G. (1992). ‘In this paper we report’: Speech acts and scientific facts. Jnl of Pragmatics 17 (1992) 295-313.
• Salager-Meyer, F. (1994). Hedges and Textual Communicative Function in Medical English Written Discourse. English for Specific Purposes, Vol. 13, No. 2, pp. 149-170, 1994.
• Sándor, Á. and de Waard, A. (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles. Workshop on Detecting Structure in Scholarly Discourse at ACL 2012 (this workshop).
• Thompson P., Venturi G., McNaught J., Montemagni S., Ananiadou S. (2008). Categorising modality in biomedical texts. LREC 2008: Building and Evaluating Resources for Biomedical Text Mining 2008.
• Verhagen, A. (2007). Constructions of Intersubjectivity. Oxford University Press, 2007.
• Vincze, V., Szarvas, Farkas, Móra and Csirik (2008). The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics 2008, 9 (Suppl 11):S9.
• Wilbur W.J., Rzhetsky A., Shatkay H. (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.