Interpreting Hierarchical Linguistic Interactions in DNNs

Die Zhang, Huilin Zhou, Xiaoyi Bao, Da Huo, Ruizhao Chen, Xu Cheng, Hao Zhang, Mengyue Wu, Quanshi Zhang
Shanghai Jiao Tong University
zizhan52,zhouhuilin116,zjbaoxiaoyi,sjtuhuoda,stelledge,xcheng8,1603023-zh,mengyuewu,
Abstract
This paper proposes a method to disentangle and quantify interactions among words that are encoded inside a DNN for natural language processing. We construct a tree to encode salient interactions extracted by the DNN. Six metrics are proposed to analyze properties of interactions between constituents in a sentence. The interaction is defined based on Shapley values of words, which are considered as an unbiased estimation of word contributions to the network prediction. Our method is used to quantify word interactions encoded inside the BERT, ELMo, LSTM, CNN, and Transformer networks. Experimental results have provided a new perspective to understand these DNNs, and have demonstrated the effectiveness of our method.
1 Introduction
Deep neural networks (DNNs) have shown promise in various tasks of natural language processing (NLP), but a DNN is usually considered as a black-box model. In recent years, explaining features encoded inside a DNN has become an emerging direction. Based on the inherent hierarchical structure of natural language, many methods use latent tree structures of language to guide the DNN to learn interpretable feature representations [4, 8, 29, 30, 31, 36, 42, 45]. However, the interpretability usually conflicts with the discrimination power [1]. There is a considerable gap between pursuing the interpretability of features and pursuing superior performance.
Therefore, in this study, we aim to explain a trained black-box DNN in a post-hoc manner, so that the explanation of the DNN does not affect its performance. This is essentially different from previous studies of designing new network architectures or losses to learn interpretable features, e.g. physically embedding tree structures into a DNN.
Given a trained DNN, in this paper, we propose to analyze interactions among input words, which are used by the DNN to make a prediction. Our method generates a tree structure to objectively reflect interactions among words. Mathematically, the interaction of several words is quantified as the difference of the contribution when these words contribute jointly to the prediction w.r.t. when each individual word contributes independently to the prediction. The interaction between words may bring either positive or negative effects on the prediction. For example, the word "green" and the word "hand" in the sentence "he is a green hand" have a strong and positive interaction, because the words "green" and "hand" contribute to the person's identity jointly, rather than independently.
Preprint. Under review.
arXiv:2007.04298v1 [cs.CL] 29 Jun 2020
Figure 1: A tree to represent interactions among words. The tree is built to explain a trained DNN. Each leaf node (blue) represents an input word in the sentence. Each non-leaf node encodes the significance of interactions within a constituent.
The core challenge in this study is to guarantee the objectiveness of the explanation. I.e. the tree needs to reflect true interactions among words without significant bias. We notice that the Shapley value is widely considered as a unique unbiased estimation of the word contribution [21], which satisfies four desirable properties (linearity, dummy, symmetry and efficiency) [10]. Thus, we define the interaction benefit among words based on the Shapley value. Let us consider a constituent with m words. φ_1, φ_2, ..., φ_m denote the numerical contributions of each word to the prediction of a DNN, respectively. φ_all represents the numerical contribution of the entire constituent to the prediction. Hence, B = φ_all − ∑_{i=1}^{m} φ_i measures the interaction benefit of this constituent. If B > 0, interactions among these m words have positive effects on the prediction; otherwise, negative effects. Here, φ_1, ..., φ_m, φ_all can be computed as Shapley values.
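As a toy numerical illustration of this definition (the contribution values below are invented, not measured from any model):

```python
# Hypothetical Shapley-style contributions for the constituent "green hand";
# all numbers here are illustrative assumptions.
phi_green, phi_hand = 0.2, 0.3   # each word's independent contribution
phi_all = 1.1                    # joint contribution of "green hand"

B = phi_all - (phi_green + phi_hand)
assert B > 0  # positive interaction benefit: the two words cooperate
```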
Given a trained DNN and an input sentence with n words, Figure 1 shows the tree structure that reflects word interactions encoded inside the DNN. In the tree, n leaf nodes represent the n input words. Each non-leaf node corresponds to a constituent of the input sentence. A parent node connects two child nodes with significant interaction benefits. We use the parent node to encode interactions among its child sub-constituents.
More specifically, there are two types of interactions among words, i.e. (1) interactions within a constituent and (2) interactions between constituents.
• Interactions within a constituent exist among any two or more words in the constituent. For the sentence "the sun is shining in the sky," interactions within the constituent "in the sky" consist of interactions among all combinations of words, including interactions (1) between (in, the), (2) between (the, sky), (3) between (in, sky) and (4) among (in, the, sky).
• Interactions between constituents. In the aforementioned sentence, interactions between the constituent "the sun" and its adjacent constituent "is shining" are composed of all potential interactions among all combinations of words from the two constituents, including interactions (1) between "the" and "is"; (2) between "the" and "shining"; (3) between "sun" and "is"; (4) between "sun" and "shining"; (5) between "the" and "is shining"; (6) between "sun" and "is shining"; (7) between "the sun" and "is"; (8) between "the sun" and "shining"; (9) between "the sun" and "is shining."
We use a tree structure to select and encode the most salient interactions among words, in order to reveal the signal processing in a DNN. We further propose additional metrics to diagnose interactions among words, e.g. the quantification of interactions within a constituent, the quantification of interactions between two adjacent constituents, and ratios of interactions that are modeled and unmodeled by the tree.
Theoretically, our method can be used as a generic tool to analyze DNNs with different architectures for various tasks, including the BERT [7], ELMo [25], LSTM [12], CNN [16] and Transformer [39]. Experimental results have demonstrated the effectiveness of our method.
Contributions of this paper can be summarized as follows. (1) We propose a method to extract and quantify interactions among words. (2) A tree structure is automatically generated to represent salient interactions encoded in a DNN. (3) We further design six metrics to analyze interactions, which provide new perspectives to understand DNNs.
2 Related Work
• Hierarchical representations of natural language. Many studies integrated hierarchical structures of natural language into DNNs for better representations [9, 36, 41, 42]. Other studies learned syntactic parsers [8, 13, 18, 19, 20, 23], although these methods pursued a high parsing accuracy, instead of explaining the DNN. Essentially, the learning of the syntactic parser aimed to make the parser fit syntactic structures defined by people. In contrast, the post-hoc explanation of a DNN is proposed to objectively explain the signal processing in a DNN. In this way, we hope to provide a generic tool to analyze DNNs in a post-hoc manner, without being affected by the subjective bias in human annotations.
Learning interpretable DNNs: Some studies designed specific network architectures to learn interpretable feature representations, which reflected hierarchical structures of natural language. Chung et al. [5] revised an RNN to generate a hierarchical structure. Shen et al. [30] designed a novel network to automatically capture the latent tree structure of an input sentence.
Post-hoc explanation of DNNs: Another important direction was to explain DNNs using hierarchical structures. Yogatama et al. [46] evaluated the ability of various RNNs for natural language to capture syntactic dependencies. Murdoch et al. [24] estimated contributions of input words to the prediction of an LSTM, as well as inter-word relationships.¹ Singh et al. [33] generated a tree structure to explain the predictions of a DNN. Reif et al. [27] found that the attention matrices in BERT contained syntactic representations. Raganato and Tiedemann [26] exploited attention weights of the Transformer to analyze what kind of linguistic information was learned by the encoder of the model. Jin et al. [15] provided hierarchical explanations by quantifying the importance of each word or phrase. Voita et al. [40] studied the evolution of token representations across layers in the Transformer under different learning objectives. Lundberg and Lee [21] proposed the SHAP value to assign each feature an importance value for a prediction. Simonyan et al. [32] visualized saliency maps for the class prediction to understand deep CNNs.
Unlike the above studies of estimating the importance/attribution/contribution/saliency of inputs, we focus on interactions among words encoded inside DNNs. Chen and Jordan [3] used a "predefined" syntactic constituency structure to assign an importance score to each word in a sentence, and to quantify interactions² between sibling nodes on a parse tree, instead of learning the linguistic structure. Janizek et al. [14] explained pairwise feature interactions by extending the Integrated Gradients explanation method. Lundberg et al. [22] defined the SHAP interaction values to quantify interaction effects between two features. Cui et al. [6] estimated global pairwise interactions from a trained Bayesian neural network. Tsang et al. [37] detected statistical interactions from the weights of feedforward neural networks. An ensemble tree-based method [35] was proposed to detect variable interactions; it compared the predictive performance of two regression trees, one with interactions between two variables of interest, and the other without those interactions. The neural interaction transparency framework [38] was presented to separate feature interactions by way of regularization, and could only be applied to fully connected vanilla multi-layer perceptrons. Greenside et al. [11] identified interactions between all pairs of discrete features in an input DNA sequence. However, these studies mainly focus on interactions between two variables [6, 11, 14, 22] or are limited to specific network architectures [35, 37, 38]. Instead, we aim to quantify interactions among multiple variables in DNNs with arbitrary architectures, without any prior linguistic structure. More specifically, our method uses a tree to organize the extracted interactions hierarchically.
• Shapley values. The Shapley value [28] was first introduced in game theory. Given a game with multiple players, each player is supposed to pursue a high score/award. Sometimes, some players may form a coalition in order to pursue more awards. Since each player contributes differently to the coalition, the final award distributed to each player should be unequal. The Shapley value is widely considered as a unique unbiased approach to fairly allocating the total award to each player, which satisfies four desirable properties, namely the linearity, dummy, symmetry and efficiency properties. Please see the supplementary material for details of these properties.
Given a game v^N with n players, let N = {1, 2, ..., n} represent the set of n players. The superscript N indicates the set of players participating in the game. Let 2^N denote all the potential subsets of N. For example, suppose there are three players a, b and c in a game. Hence, N = {a, b, c} and 2^N = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}. v^N is a set function mapping each subset to a real number (i.e. v^N : 2^N → ℝ). For any subset of players S ⊆ N, where S can be regarded as a coalition, v^N(S) represents the award of the coalition. Considering that the player a is not in the coalition S (i.e. a ∉ S), if player a joins the coalition S, the overall award of the coalition would be v^N(S ∪ {a}). v^N(S ∪ {a}) − v^N(S) is considered as the marginal award of player a. The Shapley value φ_v(a) is an unbiased contribution estimation of player a in the game. φ_v(a) is formulated as the weighted sum of marginal awards of player a brought to all possible
¹Although Murdoch et al. [24] called the inter-word relationships interactions, such interactions had an essential difference from the interaction defined in this paper.
²The interaction was defined as the deviation of composition from linearity.
coalitions S ⊆ N \ {a}:

    φ_v(a) = ∑_{S ⊆ N\{a}} [ (|N| − |S| − 1)! |S|! / |N|! ] · ( v^N(S ∪ {a}) − v^N(S) )    (1)
Due to the exponential number of coalitions, the computation of Shapley values is NP-hard. A sampling-based method [2] can be used to approximate Shapley values.
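The sampling procedure of [2] draws random orderings of the players and averages each player's marginal award over these orderings. A minimal sketch, not the authors' implementation; `value_fn` stands for any award function v mapping a coalition to a real number:

```python
import random

def shapley_sampling(players, value_fn, num_samples=1000, seed=0):
    """Approximate Shapley values by sampling random player orderings."""
    rng = random.Random(seed)
    phi = {a: 0.0 for a in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition, prev = frozenset(), value_fn(frozenset())
        for a in order:
            coalition = coalition | {a}
            cur = value_fn(coalition)
            phi[a] += cur - prev  # marginal award of a under this ordering
            prev = cur
    return {a: total / num_samples for a, total in phi.items()}
```

Each sampled ordering contributes one marginal award per player, so the estimate converges to Equation (1) as the number of samples grows.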
3 Algorithm
3.1 Interactions
Interactions between two players. In game theory, some players may form a coalition to compete with other players and win an award. Considering that the Shapley value is an unbiased estimation of each player's contribution [21], we quantify interactions based on the Shapley value. Suppose that there are n players N = {1, 2, ..., n} in a game v. Without loss of generality, we randomly select a pair of players a, b ∈ N. Shapley values of players a and b are denoted by φ_v(a) and φ_v(b), respectively. If players a and b cooperate to form a coalition S_ab = {a, b}, we can consider the coalition as a new singleton player, which is represented using brackets, [S_ab]. In this way, the game can be considered to have n − 1 players, one of which is the singleton [S_ab]; i.e. a and b always appear together in the game. The interaction benefit between a and b is defined as

    B([S_ab]) = φ_{v^{(N\{a,b}) ∪ {[S_ab]}}}([S_ab]) − ( φ_{v^{N\{b}}}(a) + φ_{v^{N\{a}}}(b) )

where (N \ {a, b}) ∪ {[S_ab]} represents the set of players in N excluding a and b, with the new singleton player [S_ab] added. The absolute value of the interaction benefit, |B([S_ab])|, represents the significance of the interaction. B([S_ab]) > 0 indicates a cooperative relationship between a and b, whereas B([S_ab]) < 0 indicates an adversarial relationship between a and b.
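For small games, this definition can be checked directly with exact Shapley values. The sketch below is an illustration only; the game `v` (with a built-in award of 5 when a and b appear together) is a made-up example:

```python
from itertools import permutations
from math import factorial

def exact_shapley(players, v, target):
    """Exact Shapley value of `target`: average marginal award over all orderings."""
    total = 0.0
    for order in permutations(players):
        pre = frozenset(order[:order.index(target)])
        total += v(pre | {target}) - v(pre)
    return total / factorial(len(players))

def pairwise_interaction(players, v, a, b):
    """B([S_ab]): Shapley value of the singleton [S_ab] in the reduced game,
    minus the Shapley values of a (computed without b) and b (without a)."""
    ab = ('coalition', a, b)  # the singleton player [S_ab]
    def v_reduced(T):
        flat = set()  # replace [S_ab] by its members before querying v
        for p in T:
            flat |= {a, b} if p == ab else {p}
        return v(frozenset(flat))
    rest = [p for p in players if p not in (a, b)]
    phi_ab = exact_shapley(rest + [ab], v_reduced, ab)
    phi_a = exact_shapley([p for p in players if p != b], v, a)
    phi_b = exact_shapley([p for p in players if p != a], v, b)
    return phi_ab - (phi_a + phi_b)

# A cooperative toy game: award 5 only when a and b are both present.
v = lambda T: (5.0 if {'a', 'b'} <= T else 0.0) + len(T)
```

With this game, pairwise_interaction(['a', 'b', 'c'], v, 'a', 'b') returns 5.0, a cooperative relationship; replacing the bonus with a penalty yields a negative B, an adversarial one.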
Extension to interactions among multiple players. We extend the two-player interaction to interactions among multiple players. When the game has n players, let us consider a subset of players S ⊊ N as a coalition, which is regarded as a new singleton player [S]. The interaction benefit of the coalition S is defined as follows (please see the supplementary material for more discussions).
    B([S]) = φ_{v^{(N\S) ∪ {[S]}}}([S]) − ∑_{a∈S} φ_{v^{(N\S) ∪ {a}}}(a)    (2)
In this way, the interaction benefit measures the additional award brought by the singleton player [S] w.r.t. the individual contribution of each player computed in Equation (1), without requiring all players in S to appear together. The Shapley value φ_{v^{(N\S) ∪ {[S]}}}([S]) is computed considering only the set of players obtained by removing all players in S from N and adding the new singleton player [S] to the game. Similarly, φ_{v^{(N\S) ∪ {a}}}(a) is computed considering only the set of players obtained by removing all players in S from N and adding the player a. If B([S]) is greater/less than 0, interactions of players in S have positive/negative effects, revealing a cooperative/adversarial relationship among the players.
Furthermore, players in S can be divided into two disjoint subsets S1, S2 (i.e. S1 ∩ S2 = ∅, S1 ∪ S2 = S). Accordingly, the interaction benefit can be decomposed into three terms:

    B([S]) = B([S1]) + B([S2]) + B_between(S1, S2)    (3)

The first and second terms B([S1]) and B([S2]) indicate interaction benefits among players within S1 and S2, respectively. The third term B_between(S1, S2) indicates interaction benefits among players selected from both S1 and S2. B_between(S1, S2) will be introduced in detail in Section 3.2.
Properties of interaction benefits. Theoretically, the overall interaction benefit B([S]), S ⊆ N, can be decomposed into elementary interaction components I_{v^N}(S). The elementary interaction component was originally proposed in [10] (please see the supplementary material for details). The elementary interaction component I_{v^N}(S) measures the marginal benefit received from the coalition [S], from which the benefits of all potential smaller coalitions S′ ⊊ S are removed. For example, let S = {a, b, c}. Then, I_{v^N}(S) measures interactions caused by [S] = (a, b, c), and ignores all potential interactions caused by coalitions of (a, b), (a, c), (b, c), (a), (b), (c). Therefore, the elementary interaction component is formulated as follows.

    I_{v^N}(S) = I_{v^{(N\S) ∪ {[S]}}}([S]) − ∑_{S′⊊S, S′≠∅} I_{v^{(N\S) ∪ S′}}(S′)    (4)
In particular, for any singleton player [S], we have I_{v^{(N\S) ∪ {[S]}}}([S]) = φ_{v^{(N\S) ∪ {[S]}}}([S]). Thus, we can compute I_{v^N}(S) via dynamic programming. Therefore, B([S]) can be decomposed into elementary interaction components as follows (please see the supplementary material for the proof).

    B([S]) = ∑_{S′⊆S, |S′|>1} I_{v^{(N\S) ∪ S′}}(S′)    (5)
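Equation (5) can be verified numerically on a toy game. The sketch below computes B([S]) by Equation (2) and the elementary components by the recursion of Equation (4), using brute-force Shapley values (feasible only for a handful of players); the game `v` in the test is an arbitrary invented example:

```python
from itertools import permutations, combinations
from math import factorial

def shapley(players, v, target):
    """Exact Shapley value of `target` in game v (brute force over orderings)."""
    total = 0.0
    for order in permutations(players):
        pre = frozenset(order[:order.index(target)])
        total += v(pre | {target}) - v(pre)
    return total / factorial(len(players))

def as_singleton(v, S):
    """Lift v to a game where the coalition S acts as one singleton player [S]."""
    tag = ('[S]', S)
    def v2(T):
        flat = set()  # replace the singleton by its members before querying v
        for p in T:
            flat |= set(S) if p == tag else {p}
        return v(frozenset(flat))
    return v2, tag

def B(players, v, S):
    """Interaction benefit of coalition S, Equation (2)."""
    rest = [p for p in players if p not in S]
    v2, tag = as_singleton(v, S)
    phi_S = shapley(rest + [tag], v2, tag)
    return phi_S - sum(shapley(rest + [a], v, a) for a in S)

def elementary(rest, v, S):
    """Elementary interaction component of S over the players rest ∪ S, Eq. (4)."""
    v2, tag = as_singleton(v, S)
    total = shapley(list(rest) + [tag], v2, tag)
    for k in range(1, len(S)):
        for sub in combinations(sorted(S), k):
            total -= elementary(rest, v, frozenset(sub))
    return total
```

Summing the elementary components of all subsets of S with more than one player then reproduces B([S]), which is exactly the statement of Equation (5).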
3.2 Fine-grained analysis of interactions between two sets of players
Interactions between two sets of players, B_between(S1, S2), can be further decomposed into three parts: ψ_inter, ψ_intra1, ψ_intra2. Please see the supplementary material for the proof.

    B_between(S1, S2) = ψ_inter + ψ_intra1 + ψ_intra2    (6)
where
    ψ_inter = ∑_{L⊆S, L⊄S1, L⊄S2, |L|>1} I_{v^{(N\S) ∪ L}}(L)    (7)

    ψ_intra1 = ∑_{L⊆S1, |L|>1} I_{v^{(N\S) ∪ L}}(L) − ∑_{L⊆S1, |L|>1} I_{v^{(N\S1) ∪ L}}(L) = B([S1])|_{N′=(N\S2)} − B([S1])    (8)

    ψ_intra2 = ∑_{L⊆S2, |L|>1} I_{v^{(N\S) ∪ L}}(L) − ∑_{L⊆S2, |L|>1} I_{v^{(N\S2) ∪ L}}(L) = B([S2])|_{N′=(N\S1)} − B([S2])    (9)
ψ_inter represents all potential interaction benefits caused by coalitions whose elements are selected from both S1 and S2. B([S1])|_{N′=(N\S2)} denotes the interaction benefit of the singleton [S1] when the set of players in the game is N′ = (N \ S2). ψ_intra1 indicates the difference of internal interactions among players in the set S1 in the absence and presence of players in the set S2.
3.3 Interactions encoded inside a DNN
We aim to analyze interactions among words, which are encoded inside a trained DNN. Given an input sentence, we construct a tree to disentangle and quantify interactions among the input words.
Given an input sentence with n words, we first introduce the Shapley value of input words w.r.t. the prediction of the DNN. Here, we consider each word as a player, and the scalar output of a DNN as the aforementioned award in the game. If a DNN has a scalar output, we can take the scalar output as the award v. If the DNN outputs a vector for multi-category classification, we select the score before the softmax layer corresponding to the true class as the award score. To compute v(S), we mask the words in N \ S in the input sentence, and feed the modified input into the DNN. The embedding of a masked word is set to a dummy vector, which refers to a padding of the input to the DNN. Then, the Shapley value of each word a is approximated using a sampling-based method [2].
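A sketch of how the award v(S) can be computed by masking; the `model` interface below (a list of embeddings in, pre-softmax scores out) is a simplifying assumption for illustration:

```python
def award(model, embeddings, kept_words, pad_vec, true_class):
    """v(S): replace the embeddings of words outside S with the padding vector,
    feed the masked input to the model, and return the pre-softmax score
    of the true class."""
    masked = [e if i in kept_words else pad_vec
              for i, e in enumerate(embeddings)]
    return model(masked)[true_class]
```

Each evaluation of v(S) is one forward pass through the network, which is why the sampling-based approximation of Shapley values matters in practice.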
As Figure 1 shows, we construct a binary tree with n leaf nodes. Each leaf node represents a word, while each non-leaf node represents a constituent. Two adjacent nodes with strong interactions are merged into a node in the next layer. For each sub-structure of a parent node S with two child nodes Sl and Sr, we can obtain the following equation according to Equation (3).
    B([S]) = B([Sl]) + B([Sr]) + B_between(Sl, Sr)
           = B([Sll]) + B([Slr]) + B([Srl]) + B([Srr]) + B_between(Sll, Slr) + B_between(Srl, Srr) + B_between(Sl, Sr)
           = ∑_{H ∈ non-leaf nodes} B_between(Hl, Hr)    (10)
B([S]) can be recursively decomposed into the sum of interaction benefits between the two child nodes of all non-leaf nodes. Please see the supplementary material for the proof.
3.4 Metrics for interactions and the construction of a tree
Metrics for interactions. Besides B([Sl]), B([Sr]) and B_between(Sl, Sr), we define three additional metrics to provide insightful analysis of interactions among words. Let us consider a sub-structure of
Figure 2: The instability of sampling-based Shapley values (a), and errors of the estimated interaction benefits (b), along with the number of sampling times (results for BERT, ELMo, Transformer, CNN and LSTM on the SST-2 and CoLA datasets).
Figure 3: Interaction benefits between constituents. The interaction benefit B_ab is more significant than B_a′a and B_bb′, so the tree merges a and b to form a coalition c.
Figure 4: A model in the AND-OR dataset. Each leaf node is a binary variable.
a parent node c (corresponding to the constituent S) and two child nodes a and b (corresponding to sub-constituents Sl and Sr). As Figure 3 shows, a′ is the left adjacent node of a, and b′ is the right adjacent node of b. We propose the metric density of modeled interactions for a candidate coalition such as {a, b}, denoted by r(a, b). This metric measures the ratio of the interaction benefit between the two adjacent nodes a and b to the total interaction benefits related to a and b. The density of modeled interactions is approximated as follows.

    r(a, b) = (interaction benefits between a and b) / (total interaction benefits related to a and b) ≈ |B_ab| / ( |B_ab| + |B_a′a| + |B_bb′| + |φ_a| + |φ_b| )    (11)
where B_ab = B_between(S_a, S_b); φ_a and φ_b can be approximated as φ_{v^{(N\S_a) ∪ {[S_a]}}}([S_a]) and φ_{v^{(N\S_b) ∪ {[S_b]}}}([S_b]), respectively. To measure interaction benefits that are not represented by the tree, a metric called density of unmodeled interactions, denoted by s(a, b), is given.

    s(a, b) = (unmodeled interaction benefits) / (total interaction benefits related to a and b) ≈ ( |B_a′a| + |B_bb′| ) / ( |B_ab| + |B_a′a| + |B_bb′| + |φ_a| + |φ_b| )    (12)
Note that neither r(a, b) nor s(a, b) is an exact estimation of the ratio of interactions. If two constituents are far away from each other (e.g. not adjacent), their interaction benefits are usually small and sometimes can be neglected. Therefore, we only consider interaction benefits between adjacent nodes (i.e. B_a′a, B_ab, B_bb′). We have demonstrated the very small effects of such neglect in Table 1. In addition, according to Equation (6), we have B_between(Sl, Sr) = ψ_inter + ψ_intra_l + ψ_intra_r. Therefore, we define the following metric to measure the ratio of inter-constituent interactions.

    t = |ψ_inter| / ( |ψ_inter| + |ψ_intra_l + ψ_intra_r| )    (13)
Construction of a tree. We now introduce the method to construct a tree structure. We use the metric r(a, b) in Equation (11) to quantify the significance of interactions between two adjacent constituents, and to guide the construction of the tree. We are given a trained DNN and an input sentence. The DNN can be trained for various tasks, such as sentiment classification or the estimation of linguistic acceptability. We construct the tree in a bottom-up manner. Let Ω denote the set of current candidate nodes to merge. In the beginning, each word ai of the input sentence is initialized as a leaf node, Ω = {a1, a2, ..., an}. In each step, we compute the value r(ai, ai+1) for each pair of adjacent nodes. Then, we select and merge the two adjacent nodes with the largest value of r(ai, ai+1). In this way, we use a greedy strategy to build up the tree, so that salient interactions among words are represented.
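The greedy procedure can be sketched as follows, with the Shapley-based metric abstracted into a callable `density` standing in for r(·, ·) (computing the real metric requires the interaction benefits defined above):

```python
def build_tree(words, density):
    """Greedily merge the adjacent pair with the highest interaction density.
    Leaves are words; each non-leaf node is a (left, right) tuple."""
    nodes = list(words)
    while len(nodes) > 1:
        i = max(range(len(nodes) - 1),
                key=lambda k: density(nodes[k], nodes[k + 1]))
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]  # merge nodes i and i+1
    return nodes[0]
```

For instance, a density that favors merging "the" with "sun" first produces the tree (("the", "sun"), "rises") for the sentence "the sun rises".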
4 Experiments
• Instability and accuracy of Shapley values. According to Equation (1), the accurate computation of Shapley values is NP-hard. Castro et al. [2] proposed a sampling-based method to approximate
Table 1: The rate of incorrect extractions of word interactions, which verifies the assumption that effects of non-adjacent nodes can be neglected, on the SST-2 dataset (see the supplementary material for more results).

# of merges   BERT   ELMo   CNN    LSTM
1             0.00   0.02   0.01   0.06
2             0.00   0.06   0.02   0.13
3             0.00   0.12   0.02   0.19
4             0.03   0.15   0.07   0.15
5             0.03   0.16   0.07   0.14
Table 2: Fitness (the unlabeled F1) between the extracted trees from NLP models and syntactic trees, which demonstrates that interactions encoded in a DNN are not closely related to the syntactic structure.

Dataset   BERT      ELMo      CNN       LSTM
CoLA      39.85%    17.08%    16.69%    14.07%
SST-2     19.58%    18.65%    12.82%    32.68%

Dataset   Transformer   Random    LB        RB
CoLA      3.79%         15.18%    2.68%     60.46%
SST-2     26.19%        19.95%    12.27%    47.35%
Table 3: Comparison of the correctness of the extracted interactions on the AND-OR dataset.

Method             F1      Recall
Our method         45.1%   96.8%
SHAP interaction   38.6%   80.9%
Random             13.2%   27.6%
LB                 8.4%    18.1%
RB                 4.3%    10.0%
Figure 5: Examples of the phenomenon that constituents with distinct emotional attitudes have strong interactions and are extracted in the first three steps, for BERT learned on the SST-2 dataset.
Shapley values with polynomial computational complexity. In order to evaluate the instability of B([S]), we quantified the change of the instability of Shapley values along with the increase of the number of sampling times. Let us compute the Shapley value φ_v(a) for each word by sampling T times. We repeated such a procedure of computing Shapley values two times. Then, the instability of the computation of Shapley values was measured as 2‖φ − φ′‖ / (‖φ‖ + ‖φ′‖), where φ and φ′ denote the two vectors of word-wise Shapley values computed in these two runs. The overall instability of Shapley values was reported as the average instability over all sentences. Figure 2(a) shows the change of the instability of Shapley values along with the number of sampling times T. We found that when T ≥ 1000, we obtained stable Shapley values.
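The instability measure is a normalized distance between the two runs; a plain-Python sketch:

```python
def instability(phi, phi_prime):
    """2*||phi - phi'|| / (||phi|| + ||phi'||) for two vectors of word-wise
    Shapley values obtained from independent sampling runs."""
    norm = lambda v: sum(x * x for x in v) ** 0.5
    diff = [x - y for x, y in zip(phi, phi_prime)]
    return 2 * norm(diff) / (norm(phi) + norm(phi_prime))
```

Identical runs give 0, and the measure is scale-invariant, so Shapley vectors from models with differently scaled outputs can be compared.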
In addition, we also evaluated the accuracy of the estimation of interaction benefits B([S]). The problem was that the ground-truth value of B([S]) had to be computed using the NP-hard brute-force method in Equation (1). Considering the NP-hard computational cost, we only conducted such evaluations on sentences with no more than 10 words. The average absolute difference (i.e. the error) between the estimated B([S]) and its ground-truth value over all sentences is reported in Figure 2(b). We found that the estimated interaction benefits were accurate enough when the number of sampling times was greater than 1000.
• Effects of non-adjacent nodes. To compute r(a, b), we only considered interaction benefits between two adjacent nodes, and assumed that interactions of non-adjacent nodes were much less significant than those of adjacent nodes. To verify this assumption, we defined the following metric to quantify the interaction benefit r′(a, c) between two non-adjacent nodes a and c, and evaluated whether the most salient interaction between adjacent nodes a, b detected by our method was more significant than interactions between all potential non-adjacent nodes. We used r′(a, c) = |B_ac| / ( |B_ac| + |B_a′a| + |B_aa′′| + |B_c′c| + |B_cc′′| + |φ_a| + |φ_c| ) to quantify the interaction density between non-adjacent nodes a and c, where a′ and a′′ were the left and right adjacent nodes of a, and c′ and c′′ were the left and right adjacent nodes of c. If the interaction density r(a, b) estimated by our method was higher than that between all potential non-adjacent nodes, we considered this a correct extraction of word interactions. Table 1 reports the rate of incorrect extractions of word interactions over all sentences during the construction of the tree (please see the supplementary material for more results). Under this assumption, our method performed correctly in most cases.
• Correctness of the extracted interaction. We aimed to evaluate whether the extracted interaction objectively reflected the true interaction in the model, but the core challenge was that it was impossible
Figure 6: Trees extracted from BERT trained on the SST-2 dataset (left) and the CoLA dataset (right), respectively. Metrics are shown around each non-leaf node. Please see the supplementary material for more results of different models.
to annotate ground-truth interactions between words. It was because the human's understanding of word interactions was not necessarily equivalent to the objective interactions encoded in a DNN. In this way, we constructed a dataset with ground-truth interactions between the inputs, as follows.
We constructed a dataset with 2048 models. Each model was implemented as a boolean function, whose input was 11 binary variables a1, a2, ..., a11 ∈ {0, 1}. The output of the model was a binary variable, which consisted of AND and OR operations in a tree structure (e.g. the tree in Figure 4). We evaluated whether the extracted interaction could reflect the true AND, OR constituents in the input.
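One such model might look like the following; the particular tree here is an invented example in the style of Figure 4, not one of the 2048 models from the dataset:

```python
def and_or_model(a):
    """A boolean function over 11 binary inputs a[0..10] (i.e. a1..a11),
    structured as an AND-OR tree; the grouping below is illustrative."""
    left = a[0] and a[1] and a[2]    # AND over {a1, a2, a3}
    mid = a[3] and a[4]              # AND over {a4, a5}
    right = a[5] and a[6] and a[7]   # AND over {a6, a7, a8}
    return int((left or mid or right) and (a[8] or a[9] or a[10]))
```

The argument sets of the AND/OR nodes serve as ground-truth constituents, against which the spans merged by the extracted binary tree are scored.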
The unlabeled F1 and unlabeled recall were used to evaluate the correctness of the extracted interaction. We compared our method with four baselines. The first baseline was [22], which defined a type of two-player interaction (i.e. the SHAP interaction); we extended this technique to construct a tree, i.e. we merged the two adjacent nodes with the largest absolute SHAP interaction value. Since there was no other method to construct a tree for interactions, the other three baselines, Random, left-branching (LB) and right-branching (RB) trees (used in [29]), were selected to show the performance of trivial solutions. As Table 3 shows, our method outperformed all baselines. Note that, theoretically, a 100% F1 score was not attainable, because the extracted binary tree was naturally different from the ground-truth n-ary tree.
• Analysis of DNNs based on interactions. We learned DNNs for binary sentiment classification on the SST-2 dataset [34], and learned DNNs to predict whether a sentence was linguistically acceptable on the CoLA dataset [43]. For each task, we learned five DNNs, including the BERT [7], the ELMo [25], the CNN proposed in [16], the two-layer unidirectional LSTM [12], and the Transformer [39].
We used our method to extract tree structures that encoded interactions among words inside the various trained DNNs. Figure 6 illustrates trees extracted from BERT on different tasks. (1) For the linguistic acceptability task, BERT usually combined noun phrases first, while the subject was combined almost at last. ELMo and LSTM were prone to construct a tree with a "subject + verb-phrase + noun/adjective-phrase" structure. CNN usually extracted small constituents including a preposition or an article, e.g. "afraid of," "fix the." Transformer tended to encode interactions among adjacent constituents sequentially. (2) For the sentiment analysis task, as Figure 5 shows, the trees of these DNNs usually extracted constituents with distinct positive/negative emotional attitudes in early stages (please see the supplementary material for more results of different models).
Comparison of the fitness between the extracted trees and syntactic trees: Furthermore, we compared the fitness between the automatically extracted tree and the syntactic tree of the sentence. To this end, given an input sentence, we used the Berkeley Neural Parser [17] to generate the syntactic tree as the ground truth.³ We used the unlabeled F1 to evaluate the fitness. Experimental results are reported in Table 2, which demonstrates that the logic of interactions modeled by the DNN was significantly different from human knowledge.
3The parser’s performance was good enough to take its parsing results as ground-truth.
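For reference, the unlabeled F1 score compares the sets of constituent spans of two trees, ignoring labels. A minimal sketch (binary trees as nested tuples with string leaves; conventions such as whether the trivial full-sentence span counts vary across evaluation tools, so treat this as illustrative):

```python
def spans(tree, start=0):
    """Return (end, set of (start, end) spans) covering all multi-word
    constituents of a binary tree given as nested (left, right) tuples."""
    if isinstance(tree, str):          # a leaf covers one token
        return start + 1, set()
    mid, left = spans(tree[0], start)
    end, right = spans(tree[1], mid)
    return end, left | right | {(start, end)}

def unlabeled_f1(pred_tree, gold_spans):
    """Unlabeled F1 between the predicted tree's spans and a gold span set."""
    _, pred = spans(pred_tree)
    tp = len(pred & gold_spans)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold_spans)
    return 2 * p * r / (p + r)
```

For example, the tree `(("a", "b"), "c")` has spans {(0, 2), (0, 3)}; against gold spans {(1, 3), (0, 3)} it scores an unlabeled F1 of 0.5.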
In addition, our method can also be applied to build a tree for interactions w.r.t. the computation of features in an intermediate layer. Please see the supplementary material for details of these experiments.
5 Conclusions
In this paper, we have defined and extracted interaction benefits among words encoded in a DNN, and have used a tree structure to organize word interactions hierarchically. Besides, six metrics are defined to disentangle and quantify interactions among words. Our method can be regarded as a generic tool to objectively diagnose various DNNs for NLP tasks, providing new insights into these DNNs.
References
[1] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6541–6549, 2017.
[2] Javier Castro, Daniel Gómez, and Juan Tejada. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5):1726–1730, 2009.
[3] Jianbo Chen and Michael I Jordan. LS-Tree: Model interpretation when the data are linguistic. arXiv preprint arXiv:1902.04187, 2019.
[4] Jihun Choi, Kang Min Yoo, and Sang-goo Lee. Learning to compose task-specific tree structures. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[5] Junyoung Chung, Sungjin Ahn, and Yoshua Bengio. Hierarchical multiscale recurrent neural networks. CoRR, abs/1609.01704, 2016. URL http://arxiv.org/abs/1609.01704.
[6] Tianyu Cui, Pekka Marttinen, and Samuel Kaski. Learning global pairwise interactions with Bayesian neural networks. arXiv: Learning, 2019.
[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[8] Andrew Drozdov, Patrick Verga, Mohit Yadav, Mohit Iyyer, and Andrew McCallum. Unsupervised latent tree induction with deep inside-outside recursive auto-encoders. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1129–1141, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1116. URL https://www.aclweb.org/anthology/N19-1116.
[9] Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. Recurrent neural network grammars. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 199–209, San Diego, California, June 2016. Association for Computational Linguistics. doi: 10.18653/v1/N16-1024. URL https://www.aclweb.org/anthology/N16-1024.
[10] Michel Grabisch and Marc Roubens. An axiomatic approach to the concept of interaction among players in cooperative games. International Journal of Game Theory, 28:547–565, 1999.
[11] Peyton Greenside, Tyler Shimko, Polly Fordyce, and Anshul Kundaje. Discovering epistatic feature interactions from neural network models of regulatory DNA sequences. Bioinformatics, 34(17):i629–i637, 2018. ISSN 1367-4803. doi: 10.1093/bioinformatics/bty575. URL https://doi.org/10.1093/bioinformatics/bty575.
[12] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[13] Phu Mon Htut, Kyunghyun Cho, and Samuel R Bowman. Inducing constituency trees through neural machine translation. arXiv preprint arXiv:1909.10056, 2019.
[14] Joseph D Janizek, Pascal Sturmfels, and Su-In Lee. Explaining explanations: Axiomatic feature interactions for deep networks. arXiv preprint arXiv:2002.04138, 2020.
[15] Xisen Jin, Zhongyu Wei, Junyi Du, Xiangyang Xue, and Xiang Ren. Towards hierarchical importance attribution: Explaining compositional semantics for neural sequence models. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=BkxRRkSKwr.
[16] Yoon Kim. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, Doha, Qatar, October 2014. Association for Computational Linguistics. doi: 10.3115/v1/D14-1181. URL https://www.aclweb.org/anthology/D14-1181.
[17] Nikita Kitaev and Dan Klein. Constituency parsing with a self-attentive encoder. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, July 2018. Association for Computational Linguistics.
[18] Nikita Kitaev, Steven Cao, and Dan Klein. Multilingual constituency parsing with self-attention and pre-training. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3499–3505, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1340. URL https://www.aclweb.org/anthology/P19-1340.
[19] Bowen Li, Lili Mou, and Frank Keller. An imitation learning approach to unsupervised parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3485–3492, Florence, Italy, July 2019. Association for Computational Linguistics. doi: 10.18653/v1/P19-1338. URL https://www.aclweb.org/anthology/P19-1338.
[20] Xiang Lisa Li and Jason Eisner. Specializing word embeddings (for parsing) by information bottleneck. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2744–2754, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1276. URL https://www.aclweb.org/anthology/D19-1276.
[21] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.
[22] Scott M Lundberg, Gabriel G Erion, and Su-In Lee. Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888, 2018.
[23] Khalil Mrini, Franck Dernoncourt, Trung Bui, Walter Chang, and Ndapa Nakashole. Rethinking self-attention: An interpretable self-attentive encoder-decoder parser. arXiv preprint arXiv:1911.03875, 2019.
[24] W. James Murdoch, Peter J. Liu, and Bin Yu. Beyond word importance: Contextual decomposition to extract interactions from LSTMs. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rkRwGg-0Z.
[25] Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1202. URL https://www.aclweb.org/anthology/N18-1202.
[26] Alessandro Raganato and Jörg Tiedemann. An analysis of encoder representations in transformer-based machine translation. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 287–297, Brussels, Belgium, November 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-5431. URL https://www.aclweb.org/anthology/W18-5431.
[27] Emily Reif, Ann Yuan, Martin Wattenberg, Fernanda B Viegas, Andy Coenen, Adam Pearce, and Been Kim. Visualizing and measuring the geometry of BERT. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8594–8603. Curran Associates, Inc., 2019. URL http://papers.nips.cc/paper/9065-visualizing-and-measuring-the-geometry-of-bert.pdf.
[28] Lloyd S Shapley. A value for n-person games. Contributions to the Theory of Games, 2(28):307–317, 1953.
[29] Yikang Shen, Zhouhan Lin, Chin-Wei Huang, and Aaron Courville. Neural language modeling by jointly learning syntax and lexicon. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rkgOLb-0W.
[30] Yikang Shen, Shawn Tan, Alessandro Sordoni, and Aaron Courville. Ordered neurons: Integrating tree structures into recurrent neural networks. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=B1l6qiR5F7.
[31] Haoyue Shi, Hao Zhou, Jiaze Chen, and Lei Li. On tree-based neural sentence modeling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018.
[32] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2014.
[33] Chandan Singh, W. James Murdoch, and Bin Yu. Hierarchical interpretations for neural network predictions. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=SkEqro0ctQ.
[34] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, 2013.
[35] Daria Sorokina, Rich Caruana, Mirek Riedewald, and Daniel Fink. Detecting statistical interactions with additive groves of trees. In Proceedings of the 25th International Conference on Machine Learning, pages 1000–1007, 2008.
[36] Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1556–1566, Beijing, China, July 2015. Association for Computational Linguistics. doi: 10.3115/v1/P15-1150. URL https://www.aclweb.org/anthology/P15-1150.
[37] Michael Tsang, Dehua Cheng, and Yan Liu. Detecting statistical interactions from neural network weights. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=ByOfBggRZ.
[38] Michael Tsang, Hanpeng Liu, Sanjay Purushotham, Pavankumar Murali, and Yan Liu. Neural interaction transparency (NIT): Disentangling learned interactions for improved interpretability. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 5804–5813. Curran Associates, Inc., 2018.
[39] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[40] Elena Voita, Rico Sennrich, and Ivan Titov. The bottom-up evolution of representations in the transformer: A study with machine translation and language modeling objectives. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4396–4406, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1448. URL https://www.aclweb.org/anthology/D19-1448.
[41] Xing Wang, Zhaopeng Tu, Longyue Wang, and Shuming Shi. Self-attention with structural position representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1403–1409, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1145. URL https://www.aclweb.org/anthology/D19-1145.
[42] Yaushian Wang, Hung-Yi Lee, and Yun-Nung Chen. Tree transformer: Integrating tree structures into self-attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1061–1070, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1098. URL https://www.aclweb.org/anthology/D19-1098.
[43] Alex Warstadt, Amanpreet Singh, and Samuel R Bowman. Neural network acceptability judgments. arXiv preprint arXiv:1805.12471, 2018.
[44] Robert J Weber. Probabilistic values for games. In The Shapley Value: Essays in Honor of Lloyd S. Shapley, pages 101–119, 1988.
[45] Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, and Wang Ling. Learning to compose words into sentences with reinforcement learning. arXiv preprint arXiv:1611.09100, 2016.
[46] Dani Yogatama, Yishu Miao, Gabor Melis, Wang Ling, Adhiguna Kuncoro, Chris Dyer, and Phil Blunsom. Memory architectures in recurrent neural network language models. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SkFqf0lAZ.
A Properties of Shapley values
In this section, we discuss four desirable properties of Shapley values, which are mentioned in Line 118 of the paper.
In game theory, the Shapley value is the unique value function that satisfies all of the following axioms [44]:
• Linearity axiom: When two games v and w are combined into a single game v + w, their Shapley values add up, i.e. φ_{v+w}(i) = φ_v(i) + φ_w(i) for each player i ∈ N. Similarly, for any c ∈ ℝ and i ∈ N, we have φ_{cv}(i) = c·φ_v(i).
• Dummy axiom: A player i ∈ N is referred to as a dummy player if v(S ∪ {i}) = v(S) + v({i}) for each subset S ⊆ N \ {i}. If i ∈ N is a dummy player, then φ_v(i) = v({i}), which indicates that player i has no interaction with any coalition.
• Symmetry axiom: Given two players i, j ∈ N, if v(S ∪ {i}) = v(S ∪ {j}) for each subset S ⊆ N \ {i, j}, then φ_v(i) = φ_v(j). In other words, if two players have the same interactions with all other players in the game, they have the same Shapley value.
• Efficiency axiom: The sum of the Shapley values of all players in N is equal to the award of all players in N, i.e. ∑_{i∈N} φ_v(i) = v(N). This axiom guarantees that the overall award can be distributed to all players in the game.
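The efficiency and symmetry axioms can be checked numerically on a toy game. Below is a minimal exact Shapley computation that averages each player's marginal contribution over all orderings of the players; this is feasible only for tiny games (in practice, sampling-based estimators such as [2] are used), and the game v here is a hypothetical example:

```python
from itertools import permutations

def shapley(players, v):
    """Exact Shapley values: average each player's marginal contribution
    v(S ∪ {i}) − v(S) over all orderings of the players."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / len(orders) for p, total in phi.items()}

# toy superadditive 3-player game: award = squared coalition size
v = lambda S: float(len(S) ** 2)
phi = shapley([1, 2, 3], v)
# efficiency: the Shapley values sum to v(N) = 9
# symmetry: the three players are interchangeable, so all values are equal
```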
B Interactions among multiple players
In this section, we discuss how to extend the interaction between two players to interactions among multiple players, which is mentioned in Line 151 of the paper.
Given a game v with n players, N = {1, 2, ..., n} is the set of players. If player a and player b form a coalition S_ab = {a, b}, we regard the coalition as a new singleton player [S_ab], and define the interaction benefit between players a and b as B([S_ab]).
$$B([S_{ab}]) = \phi_{v_{(N\setminus\{a,b\})\cup\{[S_{ab}]\}}}([S_{ab}]) - \left(\phi_{v_{N\setminus\{b\}}}(a) + \phi_{v_{N\setminus\{a\}}}(b)\right) \tag{14}$$

where $(N\setminus\{a,b\})\cup\{[S_{ab}]\}$ represents the set of players obtained by removing a and b from N and adding the new singleton player $[S_{ab}]$.
Then, we extend the interaction between two players to interactions among multiple players. If a set of players S forms a coalition, which is regarded as a new singleton player [S], the interaction benefit among the players in the coalition is defined as follows (also see Equation (2) of the paper).

$$B([S]) = \phi_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \sum_{a\in S} \phi_{v_{(N\setminus S)\cup\{a\}}}(a) \tag{15}$$
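Definition (15) can be made concrete on a tiny hypothetical game: in a 3-player game whose only synergy is between players 1 and 2, the interaction benefit of the coalition {1, 2} recovers exactly that synergy. The sketch below computes Shapley values exactly by enumerating orderings; all player names and award values are illustrative:

```python
from itertools import permutations

def shapley(players, v):
    """Exact Shapley values over all orderings (tiny games only)."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        S = frozenset()
        for p in order:
            phi[p] += v(S | {p}) - v(S)
            S = S | {p}
    return {p: x / len(orders) for p, x in phi.items()}

# hypothetical game: additive awards plus a synergy of 2 between 1 and 2
def v(S):
    return float(len(S)) + (2.0 if {1, 2} <= set(S) else 0.0)

# merged game: the coalition [S_ab] = {1, 2} acts as one singleton player
def v_merged(S):
    real = set()
    for p in S:
        real |= {1, 2} if p == "ab" else {p}
    return v(real)

B = (shapley(["ab", 3], v_merged)["ab"]   # phi of [S_ab] in the merged game
     - shapley([1, 3], v)[1]              # phi of 1 in the game without 2
     - shapley([2, 3], v)[2])             # phi of 2 in the game without 1
# B equals the synergy between players 1 and 2, i.e. 2.0
```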
C Elementary interaction components
In this section, we introduce the elementary interaction component in more detail, which is mentioned in Line 166 of the paper.
In a game v, the elementary interaction component of the players in a coalition S ⊆ N is denoted by I_v(S). The definition of the elementary interaction component [10] is given as follows.
$$\forall S\subseteq N,\quad I_v(S) = \sum_{T\subseteq N\setminus S} \frac{(n-t-s)!\,t!}{(n-s+1)!} \sum_{L\subseteq S} (-1)^{s-l}\, v(L\cup T) \tag{16}$$

where n, t, s, and l are the sizes of the corresponding sets N, T, S, and L, respectively. Note that for a singleton player, the elementary interaction component corresponds to the Shapley value, i.e. $I_v(\{a\}) = \phi_v(a)$ where a is a singleton player.
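Equation (16) can be implemented directly by enumeration (exponential in n, so only viable for toy games). The sketch below also checks two properties stated above: for a singleton, the index reduces to the Shapley value, and for a hypothetical 3-player game with a pure synergy of 2 between players 1 and 2, the pairwise index recovers that synergy:

```python
from itertools import chain, combinations
from math import factorial

def subsets(s):
    """All subsets of s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def interaction_index(S, N, v):
    """Shapley interaction index I_v(S) of Grabisch & Roubens, computed
    directly from Equation (16) by enumerating T ⊆ N \\ S and L ⊆ S."""
    S, N = frozenset(S), frozenset(N)
    n, s = len(N), len(S)
    total = 0.0
    for T in map(frozenset, subsets(N - S)):
        t = len(T)
        weight = factorial(n - t - s) * factorial(t) / factorial(n - s + 1)
        delta = sum((-1) ** (s - len(L)) * v(T | frozenset(L))
                    for L in subsets(S))
        total += weight * delta
    return total

N = {1, 2, 3}
v_sq = lambda S: float(len(S) ** 2)                            # symmetric game
v_syn = lambda S: len(S) + (2.0 if {1, 2} <= set(S) else 0.0)  # synergy game
# I_{v_sq}({1}) equals the Shapley value of player 1 (3.0 here);
# I_{v_syn}({1, 2}) recovers the synergy between players 1 and 2 (2.0)
```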
If a set of players S forms a coalition, we regard the coalition as a singleton player [S]. Let us take two players a and b as an example. If players a and b form a coalition S = {a, b}, which can be considered as a new singleton player [S], the interaction benefit between players a and b is as follows.

$$\begin{aligned} I_v(\{a,b\}) &= I_{v_{(N\setminus\{a,b\})\cup\{[a,b]\}}}([a,b]) - I_{v_{N\setminus\{b\}}}(\{a\}) - I_{v_{N\setminus\{a\}}}(\{b\}) \\ &= \phi_{v_{(N\setminus\{a,b\})\cup\{[a,b]\}}}([a,b]) - \phi_{v_{N\setminus\{b\}}}(a) - \phi_{v_{N\setminus\{a\}}}(b) \end{aligned} \tag{17}$$
Therefore, if the marginal award of the coalition, $\phi_{v_{(N\setminus\{a,b\})\cup\{[a,b]\}}}([a,b])$, is larger than the sum of the marginal awards of players a and b, i.e. $\phi_{v_{N\setminus\{b\}}}(a) + \phi_{v_{N\setminus\{a\}}}(b)$, then players a and b are likely to cooperate in the game v. In other words, a positive (or negative) value of $I_v(\{a,b\})$ indicates a positive (or negative) interaction between players a and b.
Besides, $I_v(S)$ satisfies the following recursive axiom for each $S\subseteq N$ with $|S|>1$, where K is a non-empty proper subset of S.

$$\begin{aligned} I_v(S) &= I_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \sum_{\substack{K\subsetneq S \\ K\neq\emptyset}} I_{v_{N\setminus K}}(S\setminus K) \\ &= I_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \sum_{\substack{S'\subsetneq S \\ S'\neq\emptyset}} I_{v_{(N\setminus S)\cup S'}}(S') \end{aligned} \tag{18}$$
D Proof of the relationship between interaction benefits and elementary interaction components
In this section, we prove the relationship between interaction benefits and elementary interaction components, which is mentioned in Line 174 of the paper (also see Equation (5) of the paper).
According to Equation (15) and Equation (18), we can establish the relationship between the interaction benefit and the elementary interaction component.
$$\begin{aligned} B([S]) &= \phi_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \sum_{a\in S} \phi_{v_{(N\setminus S)\cup\{a\}}}(a) \\ &= I_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \sum_{a\in S} I_{v_{(N\setminus S)\cup\{a\}}}(\{a\}) \\ &= \sum_{S'\subseteq S,\,|S'|>1} I_{v_{(N\setminus S)\cup S'}}(S') \end{aligned} \tag{19}$$
E Proof of interactions between two sets of players Bbetween(S1, S2)
In this section, we prove the fine-grained analysis of interactions between two sets of players, which is mentioned in Line 177 of the paper (also see Equation (6) of the paper).
Given a set of players S, we can split S into two subsets S1 and S2 with S1 ∩ S2 = ∅ and S1 ∪ S2 = S. According to Equation (19), we have:
$$B([S]) = \sum_{L\subseteq S,\,|L|>1} I_{v_{(N\setminus S)\cup L}}(L) \tag{20}$$

$$B([S_1]) = \sum_{L\subseteq S_1,\,|L|>1} I_{v_{(N\setminus S_1)\cup L}}(L) \tag{21}$$

$$B([S_2]) = \sum_{L\subseteq S_2,\,|L|>1} I_{v_{(N\setminus S_2)\cup L}}(L) \tag{22}$$
Therefore, we derive the following equation.

$$\begin{aligned} B([S]) &= B([S_1]) + B([S_2]) + \sum_{L\subseteq S,\,|L|>1} I_{v_{(N\setminus S)\cup L}}(L) - \sum_{L\subseteq S_1,\,|L|>1} I_{v_{(N\setminus S_1)\cup L}}(L) - \sum_{L\subseteq S_2,\,|L|>1} I_{v_{(N\setminus S_2)\cup L}}(L) \\ &= B([S_1]) + B([S_2]) + B_{\text{between}}(S_1, S_2) \end{aligned} \tag{23}$$
$$\begin{aligned} B_{\text{between}}(S_1, S_2) &= \sum_{L\subseteq S,\,|L|>1} I_{v_{(N\setminus S)\cup L}}(L) - \sum_{L\subseteq S_1,\,|L|>1} I_{v_{(N\setminus S_1)\cup L}}(L) - \sum_{L\subseteq S_2,\,|L|>1} I_{v_{(N\setminus S_2)\cup L}}(L) \\ &= \sum_{\substack{L\subseteq S,\,L\not\subseteq S_1,\\ L\not\subseteq S_2,\,|L|>1}} I_{v_{(N\setminus S)\cup L}}(L) \\ &\quad + \sum_{L\subseteq S_1,\,|L|>1} I_{v_{(N\setminus S)\cup L}}(L) - \sum_{L\subseteq S_1,\,|L|>1} I_{v_{(N\setminus S_1)\cup L}}(L) \\ &\quad + \sum_{L\subseteq S_2,\,|L|>1} I_{v_{(N\setminus S)\cup L}}(L) - \sum_{L\subseteq S_2,\,|L|>1} I_{v_{(N\setminus S_2)\cup L}}(L) \\ &= \psi^{\text{inter}} + \psi^{\text{intra}}_1 + \psi^{\text{intra}}_2 \end{aligned} \tag{24}$$

where

$$\psi^{\text{inter}} = \sum_{\substack{L\subseteq S,\,L\not\subseteq S_1,\\ L\not\subseteq S_2,\,|L|>1}} I_{v_{(N\setminus S)\cup L}}(L) \tag{25}$$

$$\psi^{\text{intra}}_1 = \sum_{L\subseteq S_1,\,|L|>1} I_{v_{(N\setminus S)\cup L}}(L) - \sum_{L\subseteq S_1,\,|L|>1} I_{v_{(N\setminus S_1)\cup L}}(L) = B([S_1])\big|_{N'=N\setminus S_2} - B([S_1]) \tag{26}$$

$$\psi^{\text{intra}}_2 = \sum_{L\subseteq S_2,\,|L|>1} I_{v_{(N\setminus S)\cup L}}(L) - \sum_{L\subseteq S_2,\,|L|>1} I_{v_{(N\setminus S_2)\cup L}}(L) = B([S_2])\big|_{N'=N\setminus S_1} - B([S_2]) \tag{27}$$
Bbetween(S1, S2) reflects all interactions across players from S1 and S2.
F Proof of the decomposition of B([S])
In this section, we prove the decomposition of the interaction benefit B([S]), which is mentioned in Line 199 of the paper (also see Equation (10) of the paper).
$$\begin{aligned} B([S]) &= B([S_l]) + B([S_r]) + B_{\text{between}}(S_l, S_r) \\ &= B([S_{ll}]) + B([S_{lr}]) + B([S_{rl}]) + B([S_{rr}]) \\ &\quad + B_{\text{between}}(S_{ll}, S_{lr}) + B_{\text{between}}(S_{rl}, S_{rr}) + B_{\text{between}}(S_l, S_r) \\ &= \sum_{H\in\text{non-leaf nodes}} B_{\text{between}}(H_l, H_r) \end{aligned} \tag{28}$$

Note that the interaction benefit of a leaf node is zero, so only the interaction benefits between the two child nodes of non-leaf nodes remain at the end of the recursion in Equation (28).
G Experimental Results
We provided more results of the "Effects of non-adjacent nodes" (Line 246) and "Analysis of DNNs based on interactions" (Line 277) experiments in Section 4 (Line 227) of the paper, as well as further experiments that analyze interactions encoded in intermediate layers, which were mentioned in Line 297 of the paper.
• Effects of non-adjacent nodes. Here, we provided more results of this experiment. Specifically, as Table 4 shows, we reported the rate of incorrect extractions of word interactions over all sentences during the construction of the tree on the SST-2 dataset and the CoLA dataset, respectively. Note that Table 4 is a complement to Table 1 of the paper.
• Analysis of DNNs based on interactions. (1) For the linguistic acceptability task, we provided more results of trees extracted from different NLP models on the SST-2 dataset and the CoLA dataset, respectively (see Figures 9–18, which are complements to Figure 6 of the paper). We found that BERT usually combined noun phrases first, while the subject was combined almost last. ELMo and the LSTM were prone to construct a tree with a "subject + verb-phrase + noun/adjective-phrase" structure. The CNN usually extracted small constituents containing a preposition or an article. The Transformer tended to encode interactions among adjacent constituents sequentially. (2) For the sentiment analysis task, we found that most trees of these DNNs extracted constituents with distinct positive/negative emotional attitudes in the early stages of the construction of the tree. More examples of this phenomenon are given in Tables 5–8, which are complements to Figure 5 of the paper.
Table 4: The rate of incorrect extractions of word interactions, which verifies the assumption that effects of non-adjacent nodes can be neglected, on the SST-2 dataset (left) and the CoLA dataset (right).
SST-2:
# of merges   BERT   ELMo   CNN    LSTM
1             0.00   0.02   0.01   0.06
2             0.00   0.06   0.02   0.13
3             0.00   0.12   0.02   0.19
4             0.03   0.15   0.07   0.15
5             0.03   0.16   0.07   0.14

CoLA:
# of merges   BERT   ELMo   CNN    LSTM
1             0.02   0.02   0.01   0.06
2             0.04   0.06   0.02   0.13
3             0.01   0.12   0.02   0.19
4             0.03   0.15   0.07   0.15
5             0.02   0.16   0.07   0.14
Table 5: Constituents extracted from ELMo in the first three steps during the construction of the tree.

sentence | 1st merge | 2nd merge | 3rd merge
I just loved every minute of this film . | just loved | this film | of this film
But it could have been worse . | it could | But it could | been worse
Too much of humor falls flat . | falls flat | falls flat . | humor falls flat .
There is no pleasure in watching a child suffer . | no pleasure | no pleasure in | no pleasure in watching
It all adds up to good fun . | good fun | good fun . | to good fun .
Table 6: Constituents extracted from CNN in the first three steps during the construction of the tree.

sentence | 1st merge | 2nd merge | 3rd merge
A deep and meaningful film . | A deep | meaningful film | meaningful film .
Dense with characters and contains some thrilling moments . | thrilling moments | Dense with | and contains
It treats women like idiots . | like idiots | treats women | It treats women
Just embarrassment and a vague sense of shame . | sense of | shame . | embarrassment and
Just one bad idea after another . | one bad | idea after | idea after another
Table 7: Constituents extracted from LSTM in the first three steps during the construction of the tree.

sentence | 1st merge | 2nd merge | 3rd merge
Just one bad idea after another . | bad idea | one bad idea | after another
But it could have been worse . | been worse | But it | could have
There is no pleasure in watching a child suffer . | no pleasure | is no pleasure | child suffer
Too slow , too long and too little happens . | too slow | too long | too slow ,
It treats women like idiots . | like idiots | like idiots . | It treats
Table 8: Constituents extracted from Transformer in the first three steps during the construction of the tree.

sentence | 1st merge | 2nd merge | 3rd merge
No way i can believe this load of junk . | this load | junk . | i can
Just one bad idea after another . | bad idea | one bad idea | Just one bad idea
There is no pleasure in watching a child suffer . | no pleasure | no pleasure in | is no pleasure in
I just loved every minute of this film . | loved every | just loved every | just loved every minute
But it could have been worse . | been worse | it could | have been worse
G.1 Interactions encoded in intermediate layers.
Besides interactions w.r.t. the network output, we used our method to analyze interactions w.r.t. the computation of an intermediate-layer feature f. More specifically, we used f_N and f_S to represent the intermediate-layer features when the input of the network was the set of words N and S in the sentence, respectively. Since the intermediate-layer features f_N and f_S were high-dimensional vectors, we used the scalar ⟨f_N, f_S⟩/‖f_N‖ to represent the award v(S), where ‖f_N‖ was used for normalization. In this way, we evaluated interactions encoded in different layers of BERT. The trees extracted from different intermediate layers of BERT are shown in Figure 7 (for the BERT learned on the SST-2 dataset) and Figure 8 (for the BERT learned on the CoLA dataset).
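A minimal sketch of this award, with hypothetical feature vectors standing in for the intermediate-layer activations:

```python
from math import sqrt

def layer_award(f_N, f_S):
    """Scalar award v(S) for intermediate-layer analysis: the inner
    product <f_N, f_S> normalized by ||f_N||, i.e. the projection of the
    masked-input feature f_S onto the full-input feature direction."""
    dot = sum(a * b for a, b in zip(f_N, f_S))
    norm = sqrt(sum(a * a for a in f_N))
    return dot / norm

# hypothetical features: with all words kept (f_S = f_N), v(N) = ||f_N||
f_N = [3.0, 4.0]
award = layer_award(f_N, f_N)  # -> 5.0
```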
[Figure 7 graphic omitted: tree visualizations for the sentence "a deep and meaningful film ." from the 4th and 8th intermediate layers of BERT on SST-2, annotated with per-node B([S]), Bbetween, r, s, and t values.]
Figure 7: The extracted trees of interactions encoded in different intermediate layers of BERT learned on the SST-2 dataset.
[Figure 8 graphic omitted: tree visualizations for the sentence "John tried to be a good boy ." from the 4th and 8th intermediate layers of BERT on CoLA, annotated with per-node B([S]), Bbetween, r, s, and t values.]
Figure 8: The extracted trees of interactions encoded in different intermediate layers of BERT learned on the CoLA dataset.
[Figure 9 graphic omitted: trees extracted by BERT, ELMo, CNN, LSTM, and Transformer on SST-2 for the sentence "But it could have been worse .", annotated with per-node B([S]), Bbetween, r, s, and t values.]
Figure 9: Extracted trees of different NLP models trained on the SST-2 dataset.
[Figure 10 graphic omitted: trees extracted by BERT, ELMo, CNN, LSTM, and Transformer on SST-2 for the sentence "It all adds up to good fun .", annotated with per-node B([S]), Bbetween, r, s, and t values.]
Figure 10: Extracted trees of different NLP models trained on the SST-2 dataset.
[Figure 11 graphic omitted: trees extracted by BERT, ELMo, CNN, LSTM, and Transformer on SST-2 for the sentence "Just one bad idea after another .", annotated with per-node B([S]), Bbetween, r, s, and t values.]
Figure 11: Extracted trees of different NLP models trained on the SST-2 dataset.
21
[Figure residue omitted: per-node tree annotations r, s, t, Bbetween, and B([s]) for the sentence "Too much of the humor falls flat ." across the LSTM, CNN, Transformer, BERT, and ELMo panels.]
Figure 12: Extracted trees of different NLP models trained on the SST-2 dataset.
[Figure residue omitted: per-node tree annotations r, s, t, Bbetween, and B([s]) for the sentence "This is so bad ." across the LSTM, CNN, Transformer, BERT, and ELMo panels.]
Figure 13: Extracted trees of different NLP models trained on the SST-2 dataset.
[Figure residue omitted: per-node tree annotations r, s, t, Bbetween, and B([s]) for the sentence "He can not have been working ." across the LSTM, CNN, Transformer, BERT, and ELMo panels.]
Figure 14: Extracted trees of different NLP models trained on the CoLA dataset.
[Figure residue omitted: per-node tree annotations r, s, t, Bbetween, and B([s]) for the sentence "The farmer loaded the cart with apples ." across the LSTM, CNN, Transformer, BERT, and ELMo panels.]
Figure 15: Extracted trees of different NLP models trained on the CoLA dataset.
[Figure residue omitted: per-node tree annotations r, s, t, Bbetween, and B([s]) for the sentence "Carmen bought Mary a dress ." across the LSTM, CNN, Transformer, BERT, and ELMo panels.]
Figure 16: Extracted trees of different NLP models trained on the CoLA dataset.
[Figure residue omitted: per-node tree annotations r, s, t, Bbetween, and B([s]) for the sentence "Anson believed himself to be handsome ." across the LSTM, CNN, Transformer, BERT, and ELMo panels.]
Figure 17: Extracted trees of different NLP models trained on the CoLA dataset.
[Figure residue omitted: per-node tree annotations r, s, t, Bbetween, and B([s]) for the sentence "Mike talked to my friends about politics yesterday ." across the LSTM, CNN, Transformer, BERT, and ELMo panels.]
Figure 18: Extracted trees of different NLP models trained on the CoLA dataset.