Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2379–2389, Copenhagen, Denmark, September 7–11, 2017. © 2017 Association for Computational Linguistics

The Impact of Modeling Overall Argumentation with Tree Kernels

Henning Wachsmuth
Bauhaus-Universität Weimar, Faculty of Media, Webis Group
[email protected]

Giovanni Da San Martino
Qatar Computing Research Institute, Arabic Language Technologies
[email protected]

Dora Kiesel
Bauhaus-Universität Weimar, Faculty of Media, VR Group
[email protected]

Benno Stein
Bauhaus-Universität Weimar, Faculty of Media, Webis Group
[email protected]

Abstract

Several approaches have been proposed to model either the explicit sequential structure of an argumentative text or its implicit hierarchical structure. So far, the adequacy of these models of overall argumentation remains unclear. This paper asks what type of structure is actually important to tackle downstream tasks in computational argumentation. We analyze patterns in the overall argumentation of texts from three corpora. Then, we adapt the idea of positional tree kernels in order to capture sequential and hierarchical argumentative structure together for the first time. In systematic experiments for three text classification tasks, we find strong evidence for the impact of both types of structure. Our results suggest that either of them is necessary while their combination may be beneficial.

1 Introduction

Argumentation theory has established a number of major argument models focusing on different aspects, such as the roles of an argument's units (Toulmin, 1958), the inference scheme of an argument (Walton et al., 2008), or the support and attack relations between arguments (Freeman, 2011). The common ground of these models is that they conceptualize an argument as a conclusion (in terms of a claim) inferred from a set of pro and con premises (reasons), which in turn may be the conclusions of other arguments. For the overall argumentation of a monological argumentative text such as the one in Figure 1(a), this results in an implicit hierarchical structure with the text's main claim at the lowest depth. In addition, the text has an explicit linguistic structure that can be seen as a regulated sequence of speech acts (van Eemeren and Grootendorst, 2004).

[Figure 1 appears here. Panel (a) shows a monological argumentative text with five argument units:
[1] The death penalty is a legal means that as such is not practicable in Germany. [2] For one thing, inviolable human dignity is anchored in our constitution, [3] and furthermore no one may have the right to adjudicate upon the death of another human being. [4] Even if many people think that a murderer has already decided on the life or death of another person, [5] this is precisely the crime that we should not repay with the same.
Panel (b) visualizes the overall argumentation of this text as a graph: the horizontal axis is the sequential structure (ordering of the units in the text), the vertical axis is the hierarchical structure (argumentative depth, with the main claim at depth 0); pro argument units support their parent node, con argument units attack it.]

Figure 1: (a) Example text with five argument units, taken from the Arg-Microtexts corpus introduced in Section 3. (b) Graph visualization of the sequential and hierarchical overall argumentation of the text.

Figure 1(b) illustrates the interplay of the two types of overall structure in the form of a tree-like graph.

Natural language processing research has largely adopted the outlined hierarchical models for mining arguments from text (Stab and Gurevych, 2014; Habernal and Gurevych, 2015; Peldszus and Stede, 2016). However, the adequacy of the resulting overall structure for downstream analysis tasks of computational argumentation has rarely been evaluated (see Section 2 for details). In fact, a computational approach that can capture patterns in hierarchical overall argumentation is missing so far. Even more, our previous work indicates that a sequential model of overall structure is preferable for analysis tasks such as stance classification or quality assessment (Wachsmuth and Stein, 2017).

In this paper, we ask and investigate what model of (monological) overall argumentation is important to tackle argumentation-related analysis tasks. To this end, we consider three corpora with fully annotated argument structure (Section 3).


Each corpus allows studying one text classification task, two of which we hypothesize to benefit from modeling argumentation (myside bias, stance), the third not (genre). An empirical analysis of the corpora reveals class-specific patterns of how people argue (Section 4). In order to combine the explicit sequential and the implicit hierarchical structure of an argumentative text for the first time, we then adapt the approach of route kernels (Aiolli et al., 2009), modeling overall argumentation in the form of a positional tree (Section 5).

On this basis, we design an experiment to evaluate the impact of the different types of argumentative structure (Section 6). In particular, we decompose our approach into four complementary modeling steps, both for a general model of overall argumentation and for the specific models of the given corpora. Using the structure annotated in the corpora, we systematically compare the effectiveness of all eight resulting models and two standard baselines in the three classification tasks.

Our results provide strong evidence that both sequential and hierarchical structure are important. As indicated by related work, sequential structure nearly competes with hierarchical structure, at least based on the specific argument models. Even more impressively, modeling hierarchical structure practically solves the task of identifying argumentation with myside bias, achieving an outstanding accuracy of 97.1%. For stance classification, the combination captured by positional trees works best. In contrast, all types of structure fail in distinguishing genres, suggesting that they indeed capture properties of argumentation. We conclude that the impact of modeling overall structure on downstream analysis tasks is high, while the required type may vary.

Contributions To summarize, the main contributions of this paper are the following:

1. Empirical insights into how people structure argumentative texts in overall terms.

2. The first approach to model and analyze the sequential and hierarchical overall structure of argumentative texts in combination.

3. Evidence that modeling overall structure impacts argumentation-related analysis tasks.

2 Related Work

The study of overall argumentation traces back to Aristotle (2007), who outlined the impact of the sequential arrangement of the different parts of a speech. Conceptually, theory agrees that a monological argumentative text has an implicit tree-like hierarchical structure: Toulmin (1958) defines an argument as a claim supported by data that is reasoned by a warrant, which in turn may have a backing. In addition, a rebuttal may be given showing exceptions to the claim. The role of support and attack relations is investigated by Freeman (2011), who models dialectical arguments that discuss both a proponent's and an opponent's view on the main claim argued for. Walton et al. (2008) put the focus on the inference scheme that describes how an argument's conclusion follows from its premises, which may themselves be conclusions of arguments.

In natural language processing, argumentation research deals with the mining of argument units and their relations from text (Mochales and Moens, 2011). Several corpora with annotated argument structure have been published in recent years. Many of the corpora adapt the hierarchical models from theory (Reed and Rowe, 2004; Habernal and Gurevych, 2015; Peldszus and Stede, 2016) or propose comparable models (Stab and Gurevych, 2014). Since we target monological overall argumentation, we use those that capture the complete structure of texts, as detailed in Section 3. Corpora focusing on dialogical argumentation (Walker et al., 2012), topic-related arguments (Rinott et al., 2015), or sequential structure (Wachsmuth et al., 2014b; Al Khatib et al., 2016) are out of scope.

We do not mine the structure of argumentative texts, but we exploit the previously mined structure to tackle downstream tasks of computational argumentation, namely, to classify the myside bias and stance of texts. For myside bias, Stab and Gurevych (2016) use features derived from discourse structure, whereas Faulkner (2014) and Sobhani et al. (2015) model arguments to classify stance. Ong et al. (2014) and we ourselves (Wachsmuth et al., 2016) proceed similarly to assess the quality of persuasive essays, and Beigman Klebanov et al. (2016) examine how an essay's content and structure influence quality. Other works predict the outcome of legal cases based on the applied types of reasoning (Brüninghaus and Ashley, 2003) or analyze inference schemes for given arguments (Feng and Hirst, 2011). In contrast to the local structure of single arguments employed by all these approaches, we study the impact of the global overall structure of complete monological argumentative texts.


In (Wachsmuth et al., 2017), we point out that the argumentation quality of a text is affected by interactions of its content at different levels of granularity, from single argument units over arguments to overall argumentation. Stede (2016) explores how different depths of overall argumentation can be identified, observing differences across genres. Unlike in our experiments, however, the genres considered there reflect diverging types of argumentation. Related to argumentation, Feng et al. (2014) build upon rhetorical structure theory (Mann and Thompson, 1988) to assess the coherence of texts, while Persing et al. (2010) score the organization of persuasive essays based on sequences of sentence and paragraph functions.

We introduced the first explicit computational model of overall argumentation in (Wachsmuth et al., 2014a). There, we compared the flow of local sentiment in a review to a set of learned flow patterns in order to classify global sentiment. Recently, we generalized the model in order to make flows applicable to any type of information relevant for argumentation-related analysis tasks (Wachsmuth and Stein, 2017). However, flows capture only sequential structure, whereas here we also model the hierarchical structure of overall argumentation. To this end, we make use of kernel methods.

Kernel methods are a popular approach for learning on structured data, with several applications in natural language processing (Moschitti, 2006b) including argument mining (Rooney et al., 2012). They employ a similarity function defined between any two input objects that are represented in a task-specific implicit feature space. The evaluation of such a kernel function relies on the common features of the input objects (Cristianini and Shawe-Taylor, 2000). The kernel function encodes knowledge of the task in the form of these features.

Several kernel functions have been defined for structured data. To assess the impact of sequential argumentation, we refer to the function of Mooney and Bunescu (2006), which computes common subsequences of two input sequences. For trees, most existing approaches count common subtrees of a certain type (Collins and Duffy, 2001; Moschitti, 2006a; Kimura and Kashima, 2012), but they do not take the ordering of the nodes in the subtrees into account. In contrast, Aiolli et al. (2009) developed a kernel that combines the content of substructures with the relative positions inside trees, called the route kernel. Similarly, the tree kernel of Daumé III and Marcu (2004) includes positional information for document compression. For overall argumentation, we decided to use the route kernel in Section 5, as it makes the modeling of the sequential positions of an argument unit in a text straightforward. This allows us to capture both the sequential and the hierarchical structure at the same time. To our knowledge, no work has done this before.1

Neural networks denote an alternative for learning on structured data. They become particularly effective when little prior knowledge about what is important to address a task at hand is available, because they can learn any feature representation in principle (Goodfellow et al., 2016). Due to this flexibility, however, large amounts of data are required for training an effective model, making neural networks inadequate for the small datasets that allow studying overall argumentation.

3 Tasks and Datasets

We seek to study the impact of modeling overall argumentation on downstream tasks without the noise from argument mining errors. To this end, we rely on three ground-truth argument corpora. Each corpus is suitable for evaluating one text classification task and comes with a specific model of overall argumentation, as detailed in the following.

Myside Bias on AAE-v2 The Argument Annotated Essays corpus was originally presented by Stab and Gurevych (2014). We use version 2 of the corpus (available on the website of the authors), which consists of 402 persuasive student essays. In each essay, all main claims, claims, and premises are annotated as such. Each claim has a pro or con stance towards each instance of the main claim, whereas each premise supports or attacks a claim or another premise. Thereby, argumentation is modeled as one tree structure for each major claim.

Stab and Gurevych (2016) added a myside bias class to each essay, reflecting whether its argumentation is one-sided considering only arguments for the own stance (251 cases) or not (151 cases).

Stance on Arg-Microtexts The Arg-Microtexts corpus of Peldszus and Stede (2016) contains 112 short argumentative texts. They cover 18 different controversial topics and are annotated according to Freeman (2011): Each argument unit takes the role of the proponent or opponent of a main claim.

1 While extensions of the route kernel idea have been published later on (Aiolli et al., 2011, 2015), we resort to the original version in this paper for simplicity.


                  AAE-v2   Arg-Microtexts   Web Discourse
Argument units      6089              576            1149
Avg. units/text     15.1              5.1             3.4
Min. units/text        7                3               0
Max. units/text       28               10              16
Arguments           5687              443             560
Avg. depth           2.8              2.0             0.6
Min. depth             2                1               0
Max. depth             5                4               1
Texts                402              112             340

Table 1: Statistics of the argument units and arguments in the three corpora analyzed in this paper.

What the main claim is follows from a tree-like overall structure emerging from four types of relations: normal or example support from one unit to another, a rebuttal of units by other units, and undercutters where a relation is attacked by another unit.

For 88 texts, the stance towards a specified topic is labeled as pro (46) or con (42). We use these labels for classification, but we do not access the topic. This way, stance needs to be identified only based on a text itself — a very challenging task.2

Genre on Web Discourse Finally, we consider the Argument Annotated User-Generated Web Discourse corpus of Habernal and Gurevych (2015). There, 340 texts are annotated according to a modified version of the specific model of Toulmin (1958) where claims are supported by premises or attacked by rebuttals. Rebuttals in turn may be attacked by refutations. Besides, emotional units not participating in the actual arguments are marked as pathos. The support and attack relations build up the overall argumentation of a text.

The corpus comprises argumentative texts of four genres, namely, 5 articles, 216 comments to articles, 46 blog posts, and 73 forum posts. The genre is specified in the form of a label for each text. Due to their low number, we ignore the articles below.

To give an idea of the sequential and hierarchical overall structure in each corpus, Table 1 presents statistics of the argument units, the arguments (in terms of relations between two or more units), and the depth of the resulting argumentation.

While the size of the given corpora and the variety of tasks are limited, the only other available corpus with fully annotated argument structure that we are aware of is AraucariaDB (Reed and Rowe, 2004). No downstream task can be tackled on AraucariaDB besides inference scheme classification (Feng and Hirst, 2011). As all schemes comprise a conclusion and a set of premises (without more specific roles), analyzing overall structure hardly makes sense, which is why we omit the corpus.

2 We do not include the topic, in order not to conflate the impact of modeling argumentation with the influence of the topic. The corpus is too small to analyze topic differences.

4 Insights into Overall Argumentation

Before we approach overall argumentation computationally, this section analyzes the three given corpora empirically to provide insights into how people argue in overall terms. For this, we unify the specific corpus models of overall argumentation outlined above in one general model.

4.1 A Unified View of Overall Argumentation

The texts in all corpora are segmented into argument units, partly with non-argumentative spans in between that we ignore here for lack of relevance. To capture the sequential ordering of the segmentation, we assign a global index to each unit.

As described in Section 3, the specific models of all three corpora in the end consider an argument as a composition of one unit serving as the conclusion with one or more units that support or attack the conclusion (the premises). This composition is defined through multiple relations from one premise to one conclusion each. There is one exception, namely, the undercutter relations in the Arg-Microtexts corpus have a relation as their target. To obtain a unified form in the general model, we modify the undercutters such that they target the premise of the undercut relation.

In all corpora, a premise may be the conclusion of another argument, while no argument unit serves as a premise in multiple arguments. This leads to a tree structure for each main claim of the associated text. A main claim corresponds to a unit that is not a premise. In AAE-v2 and in Web Discourse, more than one such unit may exist per text.

Depending on the corpus, the distinction of support and attack is encoded through a specified relation type, a unit's stance, or both. We unify these alternatives by modeling the stance of each unit towards its parent in the associated tree. This stance can be derived in all corpora.3 All other unit and relation types from the specific models are ignored, since there is no clear mapping between them.

3 Alternatively, the stance towards the main claim could be modeled. We decided against this alternative to avoid possibly wrong reinterpretations, e.g., it is unclear whether a unit that attacks its parent always supports a unit attacked by the parent.


[Figure 2 appears here: average graphs for (a) AAE-v2 (all texts; texts with myside bias; texts without myside bias), (b) Arg-Microtexts (all texts; texts with pro stance; texts with con stance), and (c) Web Discourse (all texts; comments; blog posts; forum posts). In each graph, the horizontal axis is the position in the text and the vertical axis is the depth in the graph.]

Figure 2: Visualization of the overall argumentation in the three considered corpora based on the introduced general model, averaged over all texts as well as over all those texts that belong to a particular class. Left to right: Position in the text. Top to bottom: Depth in graph. Brightness: Inverse relative frequency of each position/depth combination in the corpus. Light red to gray: Proportion of argument units with con stance.

General Model As a result, we model the overall argumentation of an argumentative text as a forest of trees. Each node in a tree corresponds to an argument unit. It has an assigned stance (pro or con) as well as a global index that defines its position in the text. Each edge defines a relation from a premise (the child node) to a conclusion (the parent node). Each main claim defines the root of a tree.
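To make the general model concrete, the following is a minimal sketch in Python of how it could be represented; all class, field, and function names are our own illustration and do not stem from the paper or its released implementation. Each node carries a stance label and its global position in the text, and each tree root corresponds to a main claim.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArgumentUnit:
    """One argument unit in the general model (hypothetical field names)."""
    index: int    # global position of the unit in the text (1-based)
    stance: str   # "pro" or "con" towards the parent unit
    parent: Optional["ArgumentUnit"] = None
    children: List["ArgumentUnit"] = field(default_factory=list)

    def attach_to(self, conclusion: "ArgumentUnit") -> None:
        """Make this unit a premise of the given conclusion."""
        self.parent = conclusion
        conclusion.children.append(self)

    @property
    def depth(self) -> int:
        """Argumentative depth; a main claim (tree root) has depth 0."""
        return 0 if self.parent is None else self.parent.depth + 1


def main_claims(units: List["ArgumentUnit"]) -> List["ArgumentUnit"]:
    """The roots of the forest: units that do not serve as a premise."""
    return [u for u in units if u.parent is None]
```

The text of Figure 1, for instance, would then be a list of five such units whose attach_to calls mirror the edges shown in Figure 1(b).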

Figure 1(b) has already illustrated an instance of the general model. The general model is slightly less expressive than the specific models. We evaluate in Section 6 to what extent this reduces its use for tackling argumentation-related analysis tasks. The advantage of the general model is that it allows a comparison of patterns of overall argumentation across corpora, as we do in the following.4

4 Besides, although not in the focus here, we also assume stance to be easier to detect in practice than fine-grained roles.

4.2 Visualization of Argumentation Patterns

Based on the general model, we empirically analyze class-specific patterns of overall argumentation on the three corpora. To this end, we compute one “average graph” for all texts in each complete corpus and one such graph for all texts with a particular class (e.g., for all “no myside bias” texts in case of AAE-v2). In an average graph, each node is labeled with the relative frequency of the associated combination of position and depth in all texts (edges accordingly). We align positions of different texts based on their start node, due to our observation that the first argument unit over-proportionally often represents the main claim.5 In addition to the relative frequency, we determine the proportion of con to pro stance for each node.

5 We also considered using the main claim as the fixed point, but the resulting graphs would be much wider than the longest argumentation, which may be misleading.

As we aim to provide intuitive insights into how people argue in overall terms, we discuss the graphs in an informal visual way instead of listing exact numbers.6 In the visualizations in Figure 2, brightness captures (inverse) frequency, so darker nodes represent more frequent argument units. The diameter of the inner light-red part of each node reflects its proportion of con stance. Nodes with a relative frequency below 0.3% and/or an absolute frequency below 3 are pruned, along with all their associated edges.

6 We provide files with the exact frequencies of all nodes and edges at: http://www.arguana.com/software.html
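As an illustration of how such an average graph can be computed, the sketch below reuses the ArgumentUnit representation from Section 4.1 above and counts, for every combination of position and depth, how often it occurs across the texts of a class and how often it occurs with con stance. The normalization by the number of texts is our assumption; the paper leaves the exact normalization implicit.

```python
from collections import Counter
from typing import Dict, List, Tuple


def average_graph(texts: List[List[ArgumentUnit]]
                  ) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Map each (position, depth) node to (relative frequency, con proportion)."""
    freq = Counter()  # occurrences of each (position, depth) node over all texts
    con = Counter()   # occurrences of that node with con stance
    for units in texts:
        for unit in units:
            node = (unit.index, unit.depth)
            freq[node] += 1
            if unit.stance == "con":
                con[node] += 1
    # frequency per text (assumed normalization) and con proportion per node
    return {node: (freq[node] / len(texts), con[node] / freq[node])
            for node in freq}
```

Nodes whose frequency falls below the thresholds mentioned above would then simply be filtered from the returned dictionary before visualization.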

AAE-v2 Figure 2(a) stresses that most students state the main claim (depth 0, position 1) in a persuasive essay first. When the first argument unit is a premise of the main claim instead, it often attacks the main claim, as the large light-red proportion of the node at depth 1 and position 1 conveys. While, on average, texts with myside bias do not differ in length from those without, the latter show more con stance, especially at depth 1.


Also, argumentation without myside bias shows more variance, as indicated, for instance, by the nodes at depth 0 and positions 12 and 13, respectively. In contrast, clear patterns in the sequential ordering of pro and con stance are not recognizable in AAE-v2.

Arg-Microtexts According to the graphs in Figure 2(b), the position of the main claim varies in the microtexts. While the proportion of con stance seems rather similar between pro and con texts, our visualization reveals that their overall structure is “mirror-inverted” to a limited extent: Most pro texts start with the main claim (depth 0, position 1), discuss con stance later (red proportions increase to the right), and deepen the argumentation in a top-down fashion (most edges from top left to bottom right). Vice versa, con texts more often present the main claim later, attack it earlier, and seem to argue more bottom-up. This suggests that both sequential and hierarchical structure play a role here.

Web Discourse The web discourse texts, finally, comprise rather shallow argumentation across all genres. Slight structural differences can be seen; especially, the comments appear a little shorter and richer in pro stance on average. Besides, the blog posts have more con stance later. Still, the darker and thus more frequent nodes are at similar positions in all graphs. So, if at all, differences may be reflected in a sequential model of argumentation, which implicitly covers length. In terms of the hierarchical structure of the frequent nodes, the graphs of all genres are rather indistinguishable.

Altogether, the visualizations give first support for the impact of modeling overall argumentation. In particular, we hypothesize that hierarchical overall structure is decisive for myside bias, whereas a combination of sequential and hierarchical structure helps to distinguish pro-stance from con-stance texts. In contrast, we expect that the impact on classifying genres in the Web Discourse corpus is low.

5 Modeling Overall Argumentation

This section presents our kernel-based approaches for argumentation-related analysis tasks. They rely on a tree representation of overall argumentation.

5.1 Representation of Overall Argumentation

We model the overall structure of an argumentative text in the form of a positional tree T = (V, E) that, in principle, equals those exemplified in Figure 1 and analyzed above. Each node v ∈ V represents an argument unit and each edge e = (v1, v2) ∈ E a relation between two units. Technically, we therefore map the forest of trees representing a text (see Section 4) to a single tree by adding a “virtual” root node v0 to V that is the parent of all tree roots.

In analysis tasks, we seek to compare sequential and hierarchical structures irrespective of the actual texts and the size of the associated trees. To this end, we represent labels and positions as follows:

Labels The tree kernel approaches in natural language processing discussed in Section 2 include text (usually words) in the leaf nodes. In contrast, we label each node v ∈ V with the type of the associated argument unit only. Thereby, we almost fully abstract from the content of texts, which benefits the identification of common structures. In case of the general model, the only two labels are pro and con. In case of the specific models, we combine the role of a unit with the type of the relation the unit is the source of (if any). On Arg-Microtexts, for instance, this creates labels such as opponent-support or opponent-undercutter.

Positions As we adapt the route kernels of Aiolli et al. (2009) below, we follow their representation of sequential structure with one exception. In particular, the authors assigned an index to each edge that numbers the child nodes of each node ascending from 1. Thereby, they encoded the relative positions of sibling nodes to each other. To capture the ordering of argument units in a text from left to right, we also model positions as indices of the edges in E. Unlike Aiolli et al. (2009), however, we use indices decreasing from -1 in the left direction of the parent node and ascending from 1 to the right (derived from the nodes' global indices). While such a simple relabeling allows us to reuse their algorithm for computing kernels, it makes a decisive difference, namely, it encodes the relative positions of child nodes to their parent. This in turn implies the sequential structure of the whole tree.

Figure 3(a) exemplifies the tree representation for the argument unit types of the general model, omitting the virtual root v0 for simplicity. Analogously, the types of the specific models of the three considered corpora could be used.
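A possible way to derive this positional tree from the general model of Section 4 is sketched below, reusing the ArgumentUnit and main_claims definitions from the earlier sketch. All names are again ours, and the numbering of the virtual root's children from 1 in text order is our assumption, as the paper does not spell out this detail. Each node keeps only its unit type as a label, and each edge receives a signed index relative to the parent, negative to the left and positive to the right, derived from the global indices.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PositionalNode:
    label: str  # unit type, e.g. "pro" or "con" in the general model
    # each child is stored together with the signed index of its edge
    children: List[Tuple[int, "PositionalNode"]] = field(default_factory=list)


def to_positional_tree(units: List[ArgumentUnit]) -> PositionalNode:
    """Map the forest of a text to a single positional tree with a virtual root."""

    def convert(unit: ArgumentUnit) -> PositionalNode:
        node = PositionalNode(label=unit.stance)
        left = sorted((c for c in unit.children if c.index < unit.index),
                      key=lambda c: c.index, reverse=True)
        right = sorted((c for c in unit.children if c.index > unit.index),
                       key=lambda c: c.index)
        # edge indices decrease from -1 to the left of the parent ...
        for offset, child in enumerate(left, start=1):
            node.children.append((-offset, convert(child)))
        # ... and ascend from 1 to the right of the parent
        for offset, child in enumerate(right, start=1):
            node.children.append((offset, convert(child)))
        return node

    root = PositionalNode(label="v0")  # virtual root above all main claims
    claims = sorted(main_claims(units), key=lambda c: c.index)
    for offset, claim in enumerate(claims, start=1):  # assumption: all to the right
        root.children.append((offset, convert(claim)))
    return root
```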

5.2 Kernel-based Modeling Approaches

Based on the tree representation, we now introduce four approaches for modeling overall argumentation. Figure 3(b) illustrates the kernel representations of each approach. As discussed in Section 2, the associated kernel function compares the representations of the trees T and T′ of any two texts.


[Figure 3 appears here: (a) an example tree representation with pro and con node labels and signed edge indices, and (b) the kernel representations derived from it for the four approaches a1–a4 (label frequencies, label sequences, label tree paths, and positional tree paths).]

Figure 3: (a) Tree representation exemplified for the general model; node labels are unit types, edge indices relative positions of child nodes. (b) Kernel representations of the tree for all four approaches.

Label Frequencies (a1) Our simplest model of overall argumentation does not encode structure at all. Instead, it compares only the frequencies of each node label in T and T′. We represent the model with a linear kernel, which in the end corresponds to a standard feature representation.
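A direct way to realize a1, sketched under the same assumptions as before (our PositionalNode representation rather than the paper's KeLP-based implementation), is to count the node labels of each tree and take the dot product of the two count vectors:

```python
from collections import Counter


def label_counts(tree: PositionalNode) -> Counter:
    """Frequency of each node label in the tree; the virtual root is skipped."""
    counts = Counter()
    for _, child in tree.children:
        counts[child.label] += 1
        counts += label_counts(child)
    return counts


def label_frequency_kernel(t1: PositionalNode, t2: PositionalNode) -> float:
    """a1: linear kernel, i.e., dot product of the label frequency vectors."""
    c1, c2 = label_counts(t1), label_counts(t2)
    return float(sum(c1[label] * c2[label] for label in c1 if label in c2))
```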

Label Sequences (a2) To encode sequential overall structure, we refer to the kernel of Mooney and Bunescu (2006), representing the sequential ordering of node labels in a tree by all contiguous subsequences. The similarity of two trees T and T′ follows from the proportion of common subsequences, but longer subsequences are penalized by a decay factor. This approach can be seen as an imitation of our flow model (Wachsmuth and Stein, 2017).7

7 We use a sequence kernel instead of flows in order to obtain a uniform setting. In Wachsmuth and Stein (2017), we also analyze flow abstractions (e.g., collapsing sequences of the same label). Here, we resort only to the original sequence.
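One plausible reading of this description, as a sketch: extract the labels in text order and sum a decayed weight for every common contiguous subsequence of the two label sequences. The exact formulation of Mooney and Bunescu (2006) and of the paper's implementation may differ in details such as the normalization; all function names here are ours.

```python
from typing import List


def label_sequence(units: List[ArgumentUnit]) -> List[str]:
    """Node labels in the order in which the argument units appear in the text."""
    return [u.stance for u in sorted(units, key=lambda u: u.index)]


def label_sequence_kernel(s1: List[str], s2: List[str],
                          decay: float = 0.5) -> float:
    """a2 (sketch): every common contiguous subsequence of length n contributes
    decay**n; the sum is normalized by the product of the sequence lengths."""
    score = 0.0
    for i in range(len(s1)):
        for j in range(len(s2)):
            n = 0
            while (i + n < len(s1) and j + n < len(s2)
                   and s1[i + n] == s2[j + n]):
                n += 1
                score += decay ** n
    return score / max(1, len(s1) * len(s2))
```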

Label Tree Paths (a3) We capture hierarchical overall structure by adapting the non-positional part of the route kernel of Aiolli et al. (2009), label paths. A label path ξ(vi, vj) denotes the sequence of labels of the nodes in the shortest path between vi and vj in a tree (including vi and vj). Following Aiolli et al. (2009), we consider only label paths starting at the root vi = v0, abbreviated here as ξ(vj). Implicitly, other paths may still be considered through the use of polynomial kernels with degree d > 1. As the authors do, we compare any two paths with a function δ whose value is 1 when the paths are identical and 0 otherwise. Given two trees T = (V, E) and T′ = (V′, E′), we then define a normalized polynomial kernel Kξ(T, T′) over all label paths as:

  Kξ(T, T′) = ( Σ_{v ∈ V} Σ_{v′ ∈ V′} δ(ξ(v), ξ(v′)) / (|V| · |V′|) )^d

Positional Tree Paths (a4) In addition to label paths, Aiolli et al. (2009) define a route π(vi, vj) as the sequence of edge indices on the shortest path between any two nodes vi and vj in a tree, i.e., the sequence of local positions. As above, they restrict their view to routes starting at the root, which we denote as π(vj), and compare them using δ. To combine positional information with label information, the authors build the product of a kernel based on the label paths and a kernel based on routes. As a result, sequential and hierarchical overall structure are compared at the same time. For overall argumentation, we define the resulting normalized polynomial product kernel Kξπ(T, T′) as:

  Kξπ(T, T′) = ( Σ_{v ∈ V} Σ_{v′ ∈ V′} δ(ξ(v), ξ(v′)) · δ(π(v), π(v′)) / (|V| · |V′|)² )^d

Each approach, a1–a4, can be seen as representing one particular step of modeling overall argumentation; a4 combines the complementary steps of a2 and a3, both of which implicitly include a1.
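Both path kernels can be computed directly from the sets of root-to-node paths of the two trees. The sketch below enumerates, for every node v, its label path ξ(v) and its route π(v) from the virtual root and then evaluates Kξ and Kξπ as defined above. It builds on the PositionalNode sketch from Section 5.1 and is, again, our illustration rather than the paper's KeLP-based implementation.

```python
from typing import List, Tuple

Path = Tuple[Tuple[str, ...], Tuple[int, ...]]  # (label path xi(v), route pi(v))


def paths_from_root(tree: PositionalNode) -> List[Path]:
    """For every node v of the tree, the pair (xi(v), pi(v)) from the root."""
    paths: List[Path] = []

    def walk(node: PositionalNode, labels: Tuple[str, ...],
             route: Tuple[int, ...]) -> None:
        paths.append((labels, route))
        for index, child in node.children:
            walk(child, labels + (child.label,), route + (index,))

    walk(tree, (tree.label,), ())
    return paths


def label_path_kernel(t1: PositionalNode, t2: PositionalNode, d: int = 2) -> float:
    """a3: normalized polynomial kernel over identical label paths."""
    p1, p2 = paths_from_root(t1), paths_from_root(t2)
    matches = sum(1 for xi1, _ in p1 for xi2, _ in p2 if xi1 == xi2)
    return (matches / (len(p1) * len(p2))) ** d


def positional_path_kernel(t1: PositionalNode, t2: PositionalNode,
                           d: int = 2) -> float:
    """a4: normalized polynomial kernel over identical label paths and routes."""
    p1, p2 = paths_from_root(t1), paths_from_root(t2)
    matches = sum(1 for xi1, pi1 in p1 for xi2, pi2 in p2
                  if xi1 == xi2 and pi1 == pi2)
    return (matches / (len(p1) * len(p2)) ** 2) ** d
```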

6 Evaluation

Finally, we evaluate all four approaches to model overall argumentation from Section 5 on the three tasks associated with the corpora from Section 3.8

6.1 Experimental Set-up

Our goal is to assess the theoretical impact of each introduced step of modeling overall argumentation as far as possible. To this end, we conduct a systematic experiment where we use the ground-truth argument structure in each corpus for the associated downstream task based on the following set-up:

8 The Java source code for reproducing the experiment results is available at: http://www.arguana.com/software.html


Myside Bias on AAE-v2
#   Approach                General model       Specific model
b1  POS n-grams             63.3                63.3
b2  Token n-grams           70.5 (95% >b1)      70.5 (95% >b1)
a1  Label frequencies       83.4 (99.9% >b2)    85.7 (99.9% >b2)
a2  Label sequences         87.9 (99.9% >b2)    94.7 (99.9% >a1)
a3  Label tree paths        97.1 (99.9% >a2)    97.1 (95% >a2)
a4  Positional tree paths   95.8 (99.9% >a2)    95.6 (99.9% >a1)
ba  Best bi + Best aj       97.1 (99.9% >a2)    97.1 (95% >a2)
    Majority baseline       62.4                62.4

Stance on Arg-Microtexts
#   Approach                General model       Specific model
b1  POS n-grams             58.8                58.8
b2  Token n-grams           65.2 (99% >a2)      65.2
a1  Label frequencies       49.7                54.4
a2  Label sequences         52.2                62.3
a3  Label tree paths        59.8 (95% >a1)      61.9
a4  Positional tree paths   66.7 (99% >a2)      67.8 (95% >a1)
ba  Best bi + Best aj       69.8 (99.9% >a2)    71.0 (95% >a1)
    Majority baseline       52.3                52.3

Genre on Web Discourse
#   Approach                General model       Specific model
b1  POS n-grams             74.0 (99.9% >a2)    74.0 (99.9% >a2)
b2  Token n-grams           75.6 (99.9% >a2)    75.6 (99.9% >a2)
a1  Label frequencies       62.6 (95% >a4)      61.4
a2  Label sequences         64.5 (95% >a3)      64.5 (99.9% >a3)
a3  Label tree paths        58.1                55.5
a4  Positional tree paths   53.4                55.2
ba  Best bi + Best aj       75.7 (99.9% >a2)    75.9 (99.9% >a2)
    Majority baseline       64.5                64.5

Table 2: Accuracy in 10-fold cross-validation (10 repetitions, fairness in training) of all evaluated approaches on each of the three task/corpus combinations, both based on a general model of arguments and based on the specific model of the respective corpus. The highest value on each corpus is marked in bold; the best bi and aj in each column are italicized. In parentheses: the confidence level in percent at which the respective approach is significantly better than the specified approach and all worse approaches.

Approaches The modeling steps are reflected by the approaches a1–a4 from Section 5. For each task, we measure the accuracy of all four approaches. We do this once for our general model of overall argumentation from Section 4 and once for the specific model annotated in the respective corpus, in order to assess the loss of resorting to our always applicable general model.

Baselines As a basic task-intrinsic measure, we compare a1–a4 to the majority baseline that always predicts the majority class in the given corpus. In addition, we employ two standard feature types and combine them with a1–a4, in order to roughly assess the need for modeling argumentation:

b1 POS n-grams. The frequency of each part-of-speech 1- to 3-gram found in ≥ 5% of all texts. This style feature has been effective in argumentation-related analysis tasks (Persing and Ng, 2015; Wachsmuth et al., 2016).

b2 Token n-grams. The frequency of each token 1- to 3-gram found in ≥ 5% of all texts. This content feature is strong in many text analysis tasks (Joachims, 1998; Pang et al., 2002).

Of the tackled tasks, only myside bias has been approached on the given datasets in previous work. While we mention the respective results for completeness below, a comparison is in fact unfair due to our resort to ground-truth argument structure.

Experiments The evaluation of all approaches and baselines was done using the kernel-based machine learning platform KeLP (Filice et al., 2015), performing classification with the available implementation of LibSVM (Chang and Lin, 2011). As we target the theoretically possible impact of modeling overall argumentation, we tested a number of hyperparameter configurations.9 We performed 10-fold cross-validation on the complete corpora and repeated each experiment 10 times, with instance shuffling in between. Then, we averaged the accuracy of each configuration over all folds and repetitions. To prevent the classifiers from using knowledge about the class distributions, we used fairness during training, i.e., each class was given an equal weight (Filice et al., 2014). Thus, the majority baseline is not a trivial competitor.
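A comparable evaluation loop can be reproduced with any learner that accepts a precomputed kernel matrix. The following sketch uses scikit-learn instead of KeLP/LibSVM and our own function names, so it only approximates the published set-up: the Gram matrix is computed with one of the tree kernels sketched above, repeated shuffled 10-fold cross-validation is run on it, and balanced class weights stand in for the fairness-in-training setting.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def gram_matrix(trees, kernel) -> np.ndarray:
    """Precomputed kernel matrix for a list of positional trees."""
    n = len(trees)
    gram = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            gram[i, j] = gram[j, i] = kernel(trees[i], trees[j])
    return gram


def evaluate(trees, labels, kernel, C=1.0, repetitions=10) -> float:
    """Average accuracy over repeated, shuffled 10-fold cross-validation."""
    gram = gram_matrix(trees, kernel)
    labels = np.asarray(labels)
    scores = []
    for seed in range(repetitions):
        clf = SVC(kernel="precomputed", C=C, class_weight="balanced")
        folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        scores.extend(cross_val_score(clf, gram, labels, cv=folds,
                                      scoring="accuracy"))
    return float(np.mean(scores))
```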

6.2 Results

Table 2 presents the best obtained results of each evaluated approach for each task/corpus combination. To clarify the reliability of the differences between the results, the table includes the confidence level (starting at 95%) at which each approach is significantly better than all weaker approaches according to a two-tailed paired Student's t-test.10

Myside Bias on AAE-v2 The highest accuracy reported for classifying myside bias is 77.0 (Stab and Gurevych, 2016). While the comparability is limited (see above), we see that label frequencies (a1) already achieve 83.4 and 85.7 for the general and specific model respectively, outperforming all baselines with 99.9% confidence. Matching the insights from Section 4, the sole proportion of attacks thus seems a good predictor of myside bias.

9 SVM C parameter: 0.01, 0.1, 1, 10, 100; sequence kernel decay factor: 0, 0.5, 1; polynomial tree kernel degree: 1, 2, 3.

10 While selecting the best result a posteriori gives an upper bound on the true effectiveness, we do this to assess to what extent each approach captures task-relevant information.


Label sequences (a2) further improve over a1, which underlines that the sequential position of con stance and attack relations also has an impact. a2 is particularly strong under the specific model (94.7). Unlike the general model, this model reflects some hierarchical information via the roles of argument units, such as premise. a2 performs only slightly worse than the label tree paths (a3), indicating that an adequate sequential model can compete with a hierarchical model, as we hypothesized in previous work (Wachsmuth and Stein, 2017).

Nevertheless, a3 turns out best on AAE-v2, most likely due to its capability to capture the depth at which con stance occurs. Considering that no corpus annotation is perfect, the outstanding accuracy of 97.1 conveys an important finding: Modeling the tree structure of an argumentation basically solves the myside bias task without requiring other features. Neither the positional tree paths (a4) nor the combination with token n-grams (ba) can add to that. Also, there is no difference between the general and the specific model, underlining that the unit roles in AAE-v2 are implicitly covered by the hierarchical structure in the general model.

Stance on Arg-Microtexts The accuracy results for the given challenging variant of stance classification (see Section 3) are much lower. Under the general model, the label frequencies (49.7) do not even compete with the majority baseline (52.3). Notable gains are achieved by the label sequences under the specific model (62.3), slightly beating the label tree paths (61.9). Putting them together in the positional tree paths (a4) yields an accuracy of 66.7 and 67.8 respectively; more than the token n-grams (b2, 65.2). Combining a4 and b2 in ba in turn results in the best observed accuracy value (71.0 on the specific model).

We conclude that both sequential and hierarchical overall structure are important for the distinction of pro from con argumentation, supporting our hypothesis from Section 4. They complement content-oriented approaches, such as b2. Moreover, the fine-grained unit and relation types of the specific model annotated in Arg-Microtexts seem useful, consistently obtaining higher accuracy than the general model. Notice, though, that due to the small size of the corpus, only a few of the reported gains are statistically significant, as shown in Table 2.

Genre on Web Discourse Although Section 4 has made minor structural differences in Web Discourse visible, Table 2 shows that a1–a4 all fail in genre classification: None of them beats the majority baseline (64.5), suggesting that no decisive discriminative patterns are learned. Both POS and token n-grams (b1–b2) significantly outperform a1–a4 at 99.9% confidence. While combining b2 with a2 (ba) minimally increases accuracy from 75.6 to 75.9, the results reveal that overall argumentation hardly impacts genre — as hypothesized.

7 Conclusion

This paper provides answers to the question of how the overall structure of a monological argumentative text should be modeled in order to tackle downstream tasks of computational argumentation. We have adopted the idea of including positional information in tree kernels in order to capture the explicit sequential and the implicit hierarchical overall structure of the text at the same time. In systematic experiments, we have demonstrated the strong impact of modeling overall argumentation. Most impressively, we have found that hierarchical structure alone decides about myside bias, while the combination of sequential and hierarchical structure has turned out beneficial for classifying stance. The missing impact on genre supports that the presented approaches actually capture argumentation-related properties of a text.

So far, however, we have restricted our view to ground-truth argument structure, leaving the integration of computational argument mining approaches to future work. While the noise from mining errors might qualify some of our findings, we also expect that larger corpora will allow us to discover more reliable and discriminative patterns. After all, our results underline the general importance of modeling overall argumentation.

References

Fabio Aiolli, Giovanni Da San Martino, and Alessandro Sperduti. 2009. Route kernels for trees. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 17–24.

Fabio Aiolli, Giovanni Da San Martino, and Alessandro Sperduti. 2011. Extending tree kernels with topological information. In Proceedings of the 21st International Conference on Artificial Neural Networks - Volume Part I, pages 142–149.

Fabio Aiolli, Giovanni Da San Martino, and Alessandro Sperduti. 2015. An efficient topological distance-based tree kernel. IEEE Transactions on Neural Networks and Learning Systems, 26(5):1115–1120.

Khalid Al Khatib, Henning Wachsmuth, Johannes Kiesel, Matthias Hagen, and Benno Stein. 2016. A news editorial corpus for mining argumentation strategies. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3433–3443. The COLING 2016 Organizing Committee.

Aristotle. 2007. On Rhetoric: A Theory of Civic Discourse (George A. Kennedy, translator). Clarendon Aristotle Series. Oxford University Press.

Beata Beigman Klebanov, Christian Stab, Jill Burstein, Yi Song, Binod Gyawali, and Iryna Gurevych. 2016. Argumentation: Content, structure, and relationship with essay quality. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 70–75. Association for Computational Linguistics.

Stefanie Brüninghaus and Kevin D. Ashley. 2003. Predicting outcomes of case based legal arguments. In Proceedings of the 9th International Conference on Artificial Intelligence and Law, pages 233–242.

Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27.

Michael Collins and Nigel Duffy. 2001. Convolution kernels for natural language. In Advances in Neural Information Processing Systems 14, pages 625–632. MIT Press.

Nello Cristianini and John Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 1st edition. Cambridge University Press, New York, NY, USA.

Hal Daumé III and Daniel Marcu. 2004. A tree-position kernel for document compression. In Proceedings of the Fourth Document Understanding Conference.

Frans H. van Eemeren and Rob Grootendorst. 2004. A Systematic Theory of Argumentation: The Pragma-Dialectical Approach. Cambridge University Press, Cambridge, UK.

Adam Robert Faulkner. 2014. Automated Classification of Argument Stance in Student Essays: A Linguistically Motivated Approach with an Application for Supporting Argument Summarization. Dissertation, City University of New York.

Vanessa Wei Feng and Graeme Hirst. 2011. Classifying arguments by scheme. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 987–996. Association for Computational Linguistics.

Vanessa Wei Feng, Ziheng Lin, and Graeme Hirst. 2014. The impact of deep hierarchical discourse structures in the evaluation of text coherence. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 940–949. Dublin City University and Association for Computational Linguistics.

Simone Filice, Giuseppe Castellucci, Danilo Croce, and Roberto Basili. 2014. Effective kernelized online learning in language processing tasks. In Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval - Volume 8416, pages 347–358.

Simone Filice, Giuseppe Castellucci, Danilo Croce, and Roberto Basili. 2015. KeLP: A kernel-based learning platform for natural language processing. In Proceedings of ACL-IJCNLP 2015 System Demonstrations, pages 19–24. Association for Computational Linguistics and The Asian Federation of Natural Language Processing.

James B. Freeman. 2011. Argument Structure: Representation and Theory. Springer.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.

Ivan Habernal and Iryna Gurevych. 2015. Exploiting debate portals for semi-supervised argumentation mining in user-generated web discourse. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2127–2137. Association for Computational Linguistics.

Thorsten Joachims. 1998. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning, pages 137–142.

Daisuke Kimura and Hisashi Kashima. 2012. Fast computation of subpath kernel for trees. In Proceedings of the 29th International Conference on Machine Learning.

William C. Mann and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281.

Raquel Mochales and Marie-Francine Moens. 2011. Argumentation mining. Artificial Intelligence and Law, 19(1):1–22.

Raymond J. Mooney and Razvan C. Bunescu. 2006. Subsequence kernels for relation extraction. In Advances in Neural Information Processing Systems 18, pages 171–178. MIT Press.

Alessandro Moschitti. 2006a. Efficient convolution kernels for dependency and constituent syntactic trees. In Proceedings of the 17th European Conference on Machine Learning, pages 318–329.

Alessandro Moschitti. 2006b. Making tree kernels practical for natural language learning. In 11th Conference of the European Chapter of the Association for Computational Linguistics.

Nathan Ong, Diane Litman, and Alexandra Brusilovsky. 2014. Ontology-based argument mining and automatic essay scoring. In Proceedings of the First Workshop on Argumentation Mining, pages 24–28. Association for Computational Linguistics.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).

Andreas Peldszus and Manfred Stede. 2016. An annotated corpus of argumentative microtexts. In Argumentation and Reasoned Action: 1st European Conference on Argumentation. College Publications.

Isaac Persing, Alan Davis, and Vincent Ng. 2010. Modeling organization in student essays. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 229–239. Association for Computational Linguistics.

Isaac Persing and Vincent Ng. 2015. Modeling argument strength in student essays. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 543–552. Association for Computational Linguistics.

Chris Reed and Glenn Rowe. 2004. Araucaria: Software for argument analysis, diagramming and representation. International Journal of AI Tools, 14:961–980.

Ruty Rinott, Lena Dankin, Carlos Alzate Perez, M. Mitesh Khapra, Ehud Aharoni, and Noam Slonim. 2015. Show me your evidence — An automatic method for context dependent evidence detection. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 440–450. Association for Computational Linguistics.

Niall Rooney, Hui Wang, and Fiona Browne. 2012. Applying kernel methods to argumentation mining. In Proceedings of the 25th International FLAIRS Conference, pages 272–275.

Parinaz Sobhani, Diana Inkpen, and Stan Matwin. 2015. From argumentation mining to stance classification. In Proceedings of the 2nd Workshop on Argumentation Mining, pages 67–77. Association for Computational Linguistics.

Christian Stab and Iryna Gurevych. 2014. Annotating argument components and relations in persuasive essays. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1501–1510. Dublin City University and Association for Computational Linguistics.

Christian Stab and Iryna Gurevych. 2016. Recognizing the absence of opposing arguments in persuasive essays. In Proceedings of the Third Workshop on Argument Mining (ArgMining2016), pages 113–118. Association for Computational Linguistics.

Manfred Stede. 2016. Towards assessing depth of argumentation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3308–3317. The COLING 2016 Organizing Committee.

Stephen E. Toulmin. 1958. The Uses of Argument. Cambridge University Press.

Henning Wachsmuth, Khalid Al Khatib, and Benno Stein. 2016. Using argument mining to assess the argumentation quality of essays. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1680–1691. The COLING 2016 Organizing Committee.

Henning Wachsmuth, Nona Naderi, Yufang Hou, Yonatan Bilu, Vinodkumar Prabhakaran, Alberdingk Tim Thijm, Graeme Hirst, and Benno Stein. 2017. Computational argumentation quality assessment in natural language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 176–187. Association for Computational Linguistics.

Henning Wachsmuth and Benno Stein. 2017. A universal model for discourse-level argumentation analysis. Special Section of the ACM Transactions on Internet Technology: Argumentation in Social Media, 17(3):28:1–28:24.

Henning Wachsmuth, Martin Trenkmann, Benno Stein, and Gregor Engels. 2014a. Modeling review argumentation for robust sentiment analysis. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 553–564. Dublin City University and Association for Computational Linguistics.

Henning Wachsmuth, Martin Trenkmann, Benno Stein, Gregor Engels, and Tsvetomira Palakarska. 2014b. A review corpus for argumentation analysis. In Proceedings of the 15th International Conference on Intelligent Text Processing and Computational Linguistics, Part II, pages 115–127.

Marilyn Walker, Jean Fox Tree, Pranav Anand, Rob Abbott, and Joseph King. 2012. A corpus for research on deliberation and debate. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 812–817. European Language Resources Association (ELRA).

Douglas Walton, Christopher Reed, and Fabrizio Macagno. 2008. Argumentation Schemes. Cambridge University Press.