

Corpus-level and Concept-based Explanations for Interpretable Document Classification

TIAN SHI, Virginia Tech, USA
XUCHAO ZHANG, Virginia Tech, USA
PING WANG, Virginia Tech, USA
CHANDAN K. REDDY, Virginia Tech, USA

Using attention weights to identify information that is important for models' decision making is a popular approach to interpret attention-based neural networks. This is commonly realized in practice through the generation of a heat-map for every single document based on attention weights. However, this interpretation method is fragile and it is easy to find contradictory examples. In this paper, we propose a corpus-level explanation approach, which aims to capture causal relationships between keywords and model predictions via learning the importance of keywords for predicted labels across a training corpus based on attention weights. Based on this idea, we further propose a concept-based explanation method that can automatically learn higher-level concepts and their importance to model prediction tasks. Our concept-based explanation method is built upon a novel Abstraction-Aggregation Network, which can automatically cluster important keywords during an end-to-end training process. We apply these methods to the document classification task and show that they are powerful in extracting semantically meaningful keywords and concepts. Our consistency analysis results based on an attention-based Naïve Bayes classifier also demonstrate that these keywords and concepts are important for model predictions.

CCS Concepts: • Information systems → Sentiment analysis; Clustering and classification; Information extraction.

Additional Key Words and Phrases: Attention mechanism, model interpretation, document classification, sentiment classification, concept-based explanation.

ACM Reference Format:
Tian Shi, Xuchao Zhang, Ping Wang, and Chandan K. Reddy. 2021. Corpus-level and Concept-based Explanations for Interpretable Document Classification. ACM Trans. Knowl. Discov. Data. 1, 1, Article 1 (January 2021), 17 pages. https://doi.org/10.1145/3477539

1 INTRODUCTION
Attention mechanisms [2] have boosted the performance of deep learning models in a variety of natural language processing (NLP) tasks, such as sentiment analysis [35, 47], semantic parsing [46], machine translation [30], reading comprehension [9, 17], and others. Attention-based deep learning models have been widely investigated not only because they achieve state-of-the-art performance, but also because they can be interpreted by identifying important input information via visualizing heat-maps of attention weights [11, 40, 42, 45], namely attention visualization. Therefore, attention mechanisms help end-users to understand models and diagnose the trustworthiness of their decision making.

Authors' addresses: Tian Shi, [email protected], Virginia Tech, USA; Xuchao Zhang, [email protected], Virginia Tech, USA; Ping Wang, [email protected], Virginia Tech, USA; Chandan K. Reddy, [email protected], Virginia Tech, USA.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2021 Association for Computing Machinery.
1556-4681/2021/1-ART1 $15.00
https://doi.org/10.1145/3477539



However, the attention visualization approach still suffers from several drawbacks: 1) The fragility of attention weights makes it easy for end-users to find contradicting examples, especially on noisy data and in cross-domain applications. For example, a model may attend to punctuation or stop-words. 2) Attention visualization cannot automatically extract the high-level concepts that are important for model predictions. For example, when a model assigns news articles to Sports, relevant keywords may be player, basketball, coach, nhl, golf, and nba. Obviously, we can build three concepts/clusters for this example, i.e., roles (player, coach), games (basketball, golf), and leagues (nba, nhl). 3) Attention visualization still relies on human experts to decide whether the keywords attended to by models are important to model predictions.

There have been some studies that attempt to solve these problems. For example, Jain and Wallace [20] and Serrano and Smith [38] studied whether attention can be used to interpret a model; however, there are still problems in their experimental designs [48]. Yeh et al. [53] tried to apply a generic concept-based explanation method to interpret BERT models in the text classification task; however, they did not obtain semantically meaningful concepts for model predictions. Antognini and Faltings [1] introduced a concept explanation method that first extracts a set of text snippets as concepts and infers which ones are described in the document, and then explains the sentiment predictions with a linear aggregation of concepts. In this paper, we propose a general-purpose corpus-level explanation method and a concept-based explanation method based on a novel Abstraction-Aggregation Network (AAN) to tackle the aforementioned drawbacks of attention visualization. We summarize the primary contributions of this paper as follows:
• To solve the first problem, we propose a corpus-level explanation method, which aims to discover causal relationships between keywords and model predictions. The importance of keywords is learned across the training corpus based on attention weights. Thus, it can provide more robust explanations than attention-visualization case studies. The discovered keywords are semantically meaningful for model predictions.

• To solve the second problem, we propose a concept-based explanation method (case-level and corpus-level) that can automatically learn semantically meaningful concepts and their importance to model predictions. The concept-based explanation method is based on an AAN that can automatically cluster keywords that are important to model predictions during the end-to-end training for the main task. Compared with basic attention mechanisms, models with AAN neither compromise on classification performance nor introduce a significant number of new parameters.

• To solve the third problem, we build a Naïve Bayes Classifier (NBC), which is based on an attention-based bag-of-words document representation technique and the causal relationships discovered by the corpus-level explanation method. By matching predictions from the model and the NBC, i.e., consistency analysis, we can verify whether the discovered keywords are important to model predictions. This provides an automatic verification pipeline for the results of the corpus-level and concept-based explanation methods.

The rest of this paper is organized as follows: In Section 2, we review related work on feature-based and concept-based explanation. In Section 3, we first present the details of our proposed Abstraction-Aggregation Network (AAN), and then discuss the corpus-level and concept-based explanation methods. In Section 4, we evaluate different self-attention and AAN based models on three different datasets. We also show how corpus-level and concept-based explanations can help in the interpretation of attention-based classification models and can potentially provide a better understanding of the training corpus. Section 5 concludes our discussion.


2 RELATED WORK
Increasing the interpretability of machine learning models has become an important research topic in recent years. Most prior work [14, 27, 29, 41] focuses on interpreting models via feature-based explanations, which alter individual features, such as pixels and word-vectors, in the form of either deletion [36] or perturbation [43]. However, these methods usually suffer from reliability issues under adversarial perturbations [12] or even simple shifts [23] of the input data. Moreover, feature-based approaches explain the model behavior locally [36] for each data sample without a global explanation [13, 21] of how the models make their decisions. In addition, feature-based explanation is not necessarily the most effective way to aid human understanding.

To alleviate the issues of feature-based explanation models, some researchers have focused on explaining model results in the form of high-level human concepts [4, 6, 8, 44, 50, 54, 56]. Instead of assigning importance scores to individual features, concept-based methods use corpus-level concepts as the interpretable unit. For instance, the concept "wheels" can be used for detecting vehicle images and the concept "Olympic Games" for identifying sports documents. However, most of the existing concept-based approaches require human supervision in the form of hand-labeled examples of concepts, which is labor intensive and can introduce human bias into the explanation process [19, 21, 39]. Recently, automated concept-based explanation methods [5, 53] have been proposed to automatically identify higher-level concepts that are meaningful to humans. However, they have not shown semantically meaningful concepts on text data. In the text classification area, most existing approaches focus on improving classification performance but ignore the interpretability of model behavior [52]. Liu and Avci [27] utilize a feature attribution method to help users interpret model behavior. Bouchacourt and Denoyer [5] propose a self-interpretable model through unsupervised concept extraction; however, it requires another unsupervised model to extract concepts. Different from these studies, our corpus-level explanation method can be generally applied to self-attention mechanisms, and our concept-based explanation method, which is based on a hierarchical attention network, can automatically learn concepts during end-to-end training.

3 PROPOSED WORK
In this section, we first introduce the classification framework and our Abstraction-Aggregation Network (AAN). Then, we discuss the corpus-level explanation, concept-based explanation, and attention-based Naïve Bayes Classifier.

3.1 The Proposed Model

3.1.1 Basic Framework. A typical document classification model is equipped with three components, i.e., an encoder, an attention or pooling layer, and a classifier. 1) Encoder: An encoder reads a document, denoted by $d = (w_1, w_2, ..., w_T)$, and transforms it into a sequence of hidden states $H = (h_1, h_2, ..., h_T)$. Here, $w_t$ is the one-hot representation of token $t$ in the document, and $h_t$ is also known as a word-in-context representation. Traditionally, the encoder consists of a word embedding layer followed by an LSTM [18] sequence encoder. Recently, pre-trained language models [9, 34, 51] have emerged as an important component for achieving superior performance on a variety of NLP tasks, including text classification. Our model is adaptable to any of these encoders. 2) Attention/Pooling: The attention or pooling (average- or max-pooling) layer is used to construct a high-level document representation, denoted by $v_{\mathrm{doc}}$. In attention networks, the attention weights show the contributions of words to the representations [26, 52]. Compared with pooling, attention operations can be well interpreted by visualizing attention weights [52]. 3) Classifier: The document representation is passed into a classifier to get the probability distribution over different class labels.


[Figure 1: architecture overview. An encoder network reads the tokens $w_1, w_2, ..., w_T$; the Abstraction-Aggregation Network (AAN) applies abstraction attention (with a diversity penalty) followed by aggregation attention (with weights dropout); an FC network produces the prediction. The interpretation side covers the corpus-level explanation $p_\theta(w|y=l)$, the case-level and corpus-level concept-based explanations via $p_\theta(c_k|y=l)$, $p_\theta(w|c_k, y=l)$, $p_\theta(w, \xi|c_k, y=l)$, and $p_\theta(c_k, \xi|y=l)$, the attention-based BOW representation, the Naïve Bayes classifier, and consistency analysis for causal-relationship verification.]

Fig. 1. The proposed Abstraction-Aggregation Network and different interpretation methods.

The classifier can be a multi-layer feed-forward network with an activation layer followed by a softmax layer, i.e., $\hat{y} = \mathrm{softmax}(W_2 \cdot \mathrm{ReLU}(W_1 \cdot v_{\mathrm{doc}} + b_1) + b_2)$, where $W_1$, $W_2$, $b_1$, and $b_2$ are model parameters.

To infer the parameters, we minimize the averaged cross-entropy error between predicted and ground-truth labels. The loss function is defined as $\mathcal{L}_\theta = -\sum_{l=1}^{L} y_l \log(\hat{y}_l)$, where $y$ represents the ground-truth label and $L$ is the number of class labels. The model is trained in an end-to-end manner using back-propagation.
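For concreteness, the following is a minimal PyTorch sketch of this classifier head and loss; the hidden size, number of labels, and variable names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Minimal sketch of the two-layer classifier head described above.
class ClassifierHead(nn.Module):
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_dim, hidden_dim)   # W_1, b_1
        self.fc2 = nn.Linear(hidden_dim, num_labels)   # W_2, b_2

    def forward(self, v_doc: torch.Tensor) -> torch.Tensor:
        # softmax(W_2 . ReLU(W_1 . v_doc + b_1) + b_2); we return logits and
        # let nn.CrossEntropyLoss apply log-softmax internally.
        return self.fc2(torch.relu(self.fc1(v_doc)))

head = ClassifierHead(hidden_dim=768, num_labels=5)
logits = head(torch.randn(8, 768))                     # a batch of v_doc vectors
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 5, (8,)))
```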

3.1.2 Abstraction-Aggregation Network. In order to use different explanation methods, especially concept-based explanation, to interpret deep neural networks, we propose a novel AAN for the Attention/Pooling layer, which first captures keywords for different concepts from a document, and then aggregates all concepts to construct the document representation (see Fig. 1).

AAN has two stacked attention layers, namely, the abstraction-attention (abs) and aggregation-attention (agg) layers. In the abs layer, for each attention unit $k$, we calculate the alignment score $u^{\mathrm{abs}}_{k,t}$ and attention weight $\alpha^{\mathrm{abs}}_{k,t}$ as follows:

$$u^{\mathrm{abs}}_{k,t} = (g^{\mathrm{abs}}_k)^\top h_t, \qquad \alpha^{\mathrm{abs}}_{k,t} = \frac{\exp(u^{\mathrm{abs}}_{k,t})}{\sum_{\tau=1}^{T} \exp(u^{\mathrm{abs}}_{k,\tau})}, \qquad (1)$$

where $g^{\mathrm{abs}}_k$ are model parameters. Here, we do not apply a linear transformation and tanh activation when calculating alignment scores, for two reasons: 1) Better intuition: Calculating attention between $g^{\mathrm{abs}}_k$ and $h_t$ in Eq. (1) is the same as calculating a normalized similarity between them. Therefore, abstraction-attention can also be viewed as a clustering process, where $g^{\mathrm{abs}}_k$ determines the centroid of each cluster. In our model, concepts are related to the clusters discovered by AAN. 2) Fewer parameters: Without the linear transformation layer, the abstraction-attention layer only introduces $K \times |h_t|$ new parameters, where $|h_t|$ is the dimension of $h_t$ and $K \ll |h_t|$.


The $k$-th abstraction representation is obtained by $v^{\mathrm{abs}}_k = \sum_{t=1}^{T} \alpha^{\mathrm{abs}}_{k,t} h_t$. We use $K$ to denote the total number of attention units.

In the agg layer, there is only one attention unit. The alignment score $u^{\mathrm{agg}}_k$ and attention weight $\alpha^{\mathrm{agg}}_k$ are obtained by

$$u^{\mathrm{agg}}_k = (g^{\mathrm{agg}})^\top \tanh(W^{\mathrm{agg}} v^{\mathrm{abs}}_k + b^{\mathrm{agg}}), \qquad \alpha^{\mathrm{agg}}_k = \frac{\exp(u^{\mathrm{agg}}_k)}{\sum_{\kappa=1}^{K} \exp(u^{\mathrm{agg}}_\kappa)},$$

where $W^{\mathrm{agg}}$, $b^{\mathrm{agg}}$, and $g^{\mathrm{agg}}$ are model parameters. The final document representation is obtained by $v_{\mathrm{doc}} = \sum_{k=1}^{K} \alpha^{\mathrm{agg}}_k v^{\mathrm{abs}}_k$. It should be noted that AAN is different from hierarchical attention [52], which aims to get a better representation, whereas AAN is used to automatically capture concepts/clusters.

We have also applied two important techniques to obtain semantically meaningful concepts.

1) Diversity penalty for abstraction-attention weights: To encourage the diversity of concepts, we introduce a new penalization term on the abstraction-attention weights $A = [\vec{\alpha}^{\mathrm{abs}}_1, \vec{\alpha}^{\mathrm{abs}}_2, ..., \vec{\alpha}^{\mathrm{abs}}_K] \in \mathbb{R}^{T \times K}$, where $\vec{\alpha}^{\mathrm{abs}}_k = (\alpha^{\mathrm{abs}}_{k,1}, \alpha^{\mathrm{abs}}_{k,2}, ..., \alpha^{\mathrm{abs}}_{k,T})^\top$. We define the penalty function as

$$\mathcal{L}_{\mathrm{div}} = \frac{1}{K} \| A^\top A - I \|_F, \qquad (2)$$

where $\| \cdot \|_F$ represents the Frobenius norm of a matrix. Hence, the overall loss function is expressed as $\mathcal{L} = \mathcal{L}_\theta + \mathcal{L}_{\mathrm{div}}$.
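The penalty in Eq. (2) can be computed directly from the abstraction-attention weights. The sketch below assumes a batched (B, K, T) weight tensor, as returned by the AAN sketch above, and averaging the per-document penalty over the batch is our own assumption.

```python
import torch

def diversity_penalty(alpha_abs: torch.Tensor) -> torch.Tensor:
    # alpha_abs: (batch, K, T); per document, A = alpha_abs^T has shape (T, K).
    A = alpha_abs.transpose(1, 2)                              # (batch, T, K)
    K = A.size(-1)
    eye = torch.eye(K, device=A.device).expand(A.size(0), K, K)
    # L_div = (1/K) * ||A^T A - I||_F, averaged over the batch.
    gram = A.transpose(1, 2) @ A                               # (batch, K, K)
    return torch.norm(gram - eye, p="fro", dim=(1, 2)).mean() / K
```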

2) Dropout of aggregation-attention weights: In the aggregation-attention layer, it is possible that $\alpha^{\mathrm{agg}}_k \approx 1$ for some $k$, while the other attention weights tend to 0. To alleviate this problem, we apply dropout with a small dropout rate to the aggregation-attention weights $(\alpha^{\mathrm{agg}}_1, \alpha^{\mathrm{agg}}_2, ..., \alpha^{\mathrm{agg}}_K)$, namely attention weights dropout. It should be noted that a large dropout rate has a negative impact on the explanation, since it discourages the diversity of concepts. More specifically, the model will try to capture the keywords of the dropped abstraction-attention units with the other units.
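A sketch of this step is shown below; whether the weights should be renormalized after dropout is not specified in the paper, so plain dropout with a small rate is used.

```python
import torch
import torch.nn.functional as F

def aggregate_with_dropout(alpha_agg: torch.Tensor, v_abs: torch.Tensor,
                           rate: float = 0.02, training: bool = True) -> torch.Tensor:
    # Randomly zero some aggregation-attention weights during training so that
    # no single concept dominates the document representation.
    alpha_agg = F.dropout(alpha_agg, p=rate, training=training)
    return torch.einsum("bk,bkd->bd", alpha_agg, v_abs)  # v_doc
```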

3.2 Explanation
In this section, we discuss corpus-level and concept-based explanations. Given a corpus $\mathcal{C}$ with $|\mathcal{C}|$ documents, we use $d$ or $\xi$ to represent a document. Let us also use $\theta$ to denote all parameters of a model and $\mathcal{V}$ to represent the vocabulary, where $|\mathcal{V}|$ is the size of $\mathcal{V}$. Throughout this paper, we will assume that both the prior document probability $p(d)$ and the prior label probability $p_\theta(y = l)$ are constants. For example, in a label-balanced dataset, $p_\theta(y = l) \approx 1/L$.

We first apply the attention-weights visualization technique to the proposed AAN model.

Here, the document representation can be directly expressed in terms of the hidden states, i.e.,

$$v_{\mathrm{doc}} = \sum_{t=1}^{T} \left( \sum_{k=1}^{K} \alpha^{\mathrm{agg}}_k \alpha^{\mathrm{abs}}_{k,t} \right) h_t, \qquad \text{where} \qquad \alpha^d_t = \sum_{k=1}^{K} \alpha^{\mathrm{agg}}_k \alpha^{\mathrm{abs}}_{k,t} \qquad (3)$$

gives the contribution of word $w_t$ to the document representation. Therefore, we can interpret any single example by visualizing the combined weights $\alpha^d_t$.
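In code, the combined weights of Eq. (3) follow directly from the tensors returned by the AAN sketch in Section 3.1.2:

```python
import torch

def combined_token_weights(alpha_agg: torch.Tensor,
                           alpha_abs: torch.Tensor) -> torch.Tensor:
    # alpha_agg: (batch, K); alpha_abs: (batch, K, T).
    # Returns alpha^d_t = sum_k alpha^agg_k * alpha^abs_{k,t}, shape (batch, T).
    return torch.einsum("bk,bkt->bt", alpha_agg, alpha_abs)
```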


3.2.1 Corpus-Level Explanation. Different from case-level explanation (attention-weights visualization), which focuses on per-sample features, corpus-level explanation aims to find causal relationships between keywords captured by the attention mechanism and model predictions, which can provide a robust explanation for the model. To achieve this goal, we learn distributions of keywords for different predicted labels on the training corpus based on attention weights.

Formally, for a given word $w \in \mathcal{V}$ and a label $l$ predicted by a model $\theta$,¹ the importance of the word to the label can be estimated by the probability $p_\theta(w \mid y = l)$ across the training corpus $\mathcal{C}_{\mathrm{train}}$, since the model is trained on it. $p_\theta(w \mid y = l)$ can be expanded as follows:

$$p_\theta(w \mid y = l) = \sum_{\xi \in \mathcal{C}^l_{\mathrm{train}}} p_\theta(w, \xi \mid y = l), \qquad (4)$$

where $\mathcal{C}^l_{\mathrm{train}} \subset \mathcal{C}_{\mathrm{train}}$ consists of the documents with model-predicted label $l$. For each document $\xi \in \mathcal{C}^l_{\mathrm{train}}$, the probability $p_\theta(w, \xi \mid y = l)$ represents the importance of word $w$ to label $l$, which can be defined using attention weights, i.e.,

$$p_\theta(w, \xi \mid y = l) := \frac{\sum_{t=1}^{T} \alpha^{\xi}_t \cdot \delta(w_t, w)}{\sum_{\xi' \in \mathcal{C}_{\mathrm{train}}} f_{\xi'}(w) + \gamma}, \qquad (5)$$

where $f_{\xi'}(w)$ is the frequency of $w$ in document $\xi'$, $\gamma$ is a smoothing factor, and $\delta(w_t, w)$ is a delta function that equals 1 if $w_t = w$ and 0 otherwise. The denominator is applied to reduce noise from stop-words and punctuation. For the sake of simplicity, we will use $p_\theta(w, l, \mathcal{C})$ to denote $p_\theta(w \mid y = l)$, where $\mathcal{C}$ corresponds to the corpus in Eq. (4) and can be different from $\mathcal{C}_{\mathrm{train}}$ in our applications. The denominator in Eq. (5) is always determined by the training corpus.

¹Here, the label is the model's prediction, not the ground-truth label, because our goal is to explain the model.

With respect to the applications: 1) Since Eq. (4) captures the importance of words to model-predicted labels, we can use it as a criterion for finding their causal relationships. In experiments, we collect the top-ranked keywords for each label $l$ for further analysis. 2) We can also use corpus-level explanation to measure the difference between two corpora (i.e., $\mathcal{C}_{\mathrm{test1}}$ and $\mathcal{C}_{\mathrm{test2}}$). Formally, we can compare $\frac{|\mathcal{C}_{\mathrm{train}}|}{|\mathcal{C}_{\mathrm{test1}}|} \cdot p_\theta(w, l, \mathcal{C}_{\mathrm{test1}})$ with $\frac{|\mathcal{C}_{\mathrm{train}}|}{|\mathcal{C}_{\mathrm{test2}}|} \cdot p_\theta(w, l, \mathcal{C}_{\mathrm{test2}})$ across different words and class labels. The difference can be evaluated by the Kullback-Leibler divergence [25]. In addition, we can identify mutual keywords shared across different domains based on these distributions.

It should be noted that the corpus-level explanation discussed in this section can be applied to interpret different attention-based networks.
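As an illustration, the following Python sketch accumulates Eqs. (4) and (5) over a corpus. The input layout, a list of (tokens, per-token combined weights $\alpha^d_t$, predicted label) triples, is an assumption of ours.

```python
from collections import Counter, defaultdict

def corpus_level_scores(docs, gamma=1000.0):
    """Sketch of Eqs. (4)-(5): docs is a list of (tokens, alpha_doc, label)."""
    attn_mass = defaultdict(Counter)   # label -> word -> summed attention mass
    freq = Counter()                   # word -> frequency over the training corpus
    for tokens, alpha, label in docs:
        freq.update(tokens)
        for w, a in zip(tokens, alpha):
            attn_mass[label][w] += a   # numerator of Eq. (5), summed over Eq. (4)
    # Normalize by corpus word frequency plus the smoothing factor gamma.
    return {label: {w: mass / (freq[w] + gamma) for w, mass in counter.items()}
            for label, counter in attn_mass.items()}

# Top-ranked keywords for a predicted label, e.g.:
# sorted(scores["Sports"].items(), key=lambda x: -x[1])[:20]
```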

3.2.2 Concept-Based Explanation. The corpus-level explanation still suffers from the drawback that it cannot automatically obtain higher-level concepts/clusters for the important keywords. To alleviate this problem, we propose a concept-based explanation for our AAN model. In AAN, each abstraction-attention unit can capture one concept/cluster. Here, we take the distribution of concepts into consideration. Formally, we express $p_\theta(w \mid y = l)$ as follows:

$$p_\theta(w \mid y = l) = \sum_{k=1}^{K} p_\theta(w \mid c_k, y = l)\, p_\theta(c_k \mid y = l),$$

where $p_\theta(w \mid c_k, y = l)$ captures the distribution of $w$ across $\mathcal{C}_{\mathrm{train}}$ for the $k$-th concept and label $l$, while $p_\theta(c_k \mid y = l)$ captures the distribution of the concept $c_k$ across $\mathcal{C}_{\mathrm{train}}$ for label $l$. They can be computed using the following equations:


$$p_\theta(w \mid c_k, y = l) = \sum_{\xi \in \mathcal{C}^l_{\mathrm{train}}} p_\theta(w, \xi \mid c_k, y = l), \qquad p_\theta(c_k \mid y = l) = \sum_{\xi \in \mathcal{C}^l_{\mathrm{train}}} p_\theta(c_k, \xi \mid y = l), \qquad (6)$$

where we define

$$p_\theta(w, \xi \mid c_k, y = l) := \frac{\sum_{t=1}^{T} \alpha^{\mathrm{abs},\xi}_{k,t} \cdot \delta(w_t, w)}{\sum_{\xi' \in \mathcal{C}_{\mathrm{train}}} f_{\xi'}(w) + \gamma} \qquad (7)$$

and

$$p_\theta(c_k, \xi \mid y = l) := \frac{\alpha^{\mathrm{agg},\xi}_k}{|\mathcal{C}_{\mathrm{train}}|}, \qquad (8)$$

where $\alpha^{\mathrm{abs},\xi}_{k,t}$ represents $\alpha^{\mathrm{abs}}_{k,t}$ for document $\xi$. Based on Eq. (6), we are able to obtain the scores (importance) and the most relevant keywords of different concepts for a given label $l$.
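A sketch of Eqs. (6)-(8) is given below, extending the corpus-level accumulation to per-concept weights; the input layout (tokens, per-concept abstraction weights, aggregation weights, predicted label) is again our own assumption.

```python
from collections import Counter, defaultdict

def concept_level_scores(docs, num_concepts, gamma=1000.0):
    """docs: list of (tokens, alpha_abs (K x T), alpha_agg (K,), label)."""
    word_mass = defaultdict(lambda: defaultdict(Counter))    # label -> k -> word
    concept_mass = defaultdict(lambda: [0.0] * num_concepts) # label -> k
    freq = Counter()
    n_docs = len(docs)                                       # |C_train|
    for tokens, alpha_abs, alpha_agg, label in docs:
        freq.update(tokens)
        for k in range(num_concepts):
            concept_mass[label][k] += alpha_agg[k] / n_docs  # Eqs. (6) and (8)
            for w, a in zip(tokens, alpha_abs[k]):
                word_mass[label][k][w] += a                  # numerator of Eq. (7)
    p_w = {l: {k: {w: m / (freq[w] + gamma) for w, m in c.items()}
               for k, c in per_k.items()}
           for l, per_k in word_mass.items()}
    return p_w, concept_mass   # p(w | c_k, y=l) and p(c_k | y=l)
```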

3.2.3 Consistency Analysis. With the corpus-level and concept-based explanations, we have obtained causal relationships between keywords and predictions, i.e., $p_\theta(w \mid y = l)$. However, we have not verified whether these keywords are really important to the predictions. To achieve this goal, we build a Naïve Bayes classifier (NBC) [10] based on these causal relationships. Formally, for each testing document $d$, the probability of getting label $l$ is approximated as follows:

$$p_\theta(y = l \mid d) = \frac{p_\theta(d \mid y = l)\, p_\theta(y = l)}{p(d)} \propto p_\theta(d \mid y = l) = \prod_{t=1}^{T} p_\theta(w_t \mid y = l), \qquad (9)$$

where $p_\theta(w_t \mid y = l)$ is obtained by Eq. (4) or Eq. (6) on the training corpus. We further approximate Eq. (9) with

$$p_\theta(y = l \mid d) = \prod_{w \in d'} \left( p_\theta(w \mid y = l) + \lambda \right), \qquad (10)$$

where $d' \subset d$ is an attention-based bag-of-words representation of document $d$, consisting of the most important keywords according to attention weights, and $\lambda$ is a smoothing factor. We can then conduct consistency analysis by comparing the labels obtained by the model and the NBC, which may also help estimate the uncertainty of a model [55].
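A sketch of the NBC decision rule in Eq. (10) follows; we compute the product in log-space for numerical stability, which leaves the arg-max over labels unchanged. The `scores` argument is the output of the hypothetical `corpus_level_scores` helper above.

```python
import math

def nbc_predict(d_prime, scores, labels, lam=1.0):
    """d_prime: attention-based bag of words (e.g., the five highest-attention
    words of a document); scores: label -> word -> p(w | y=l); lam: lambda."""
    def log_score(label):
        return sum(math.log(scores[label].get(w, 0.0) + lam) for w in d_prime)
    return max(labels, key=log_score)
```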

4 EXPERIMENTS

4.1 Datasets
We conducted experiments on three publicly available datasets. Newsroom is used for news categorization, while IMDB and Beauty are used for sentiment analysis. The details of the three datasets are as follows: 1) Newsroom [15]: The original dataset, which consists of 1.3 million news articles, was proposed for text summarization. In our experiments, we first determined the category of each article based on its URL, and then randomly sampled 10,000 articles for each of the five categories: business, entertainment, sports, health, and technology. 2) IMDB [31]: This dataset contains 50,000 movie reviews from the IMDB website with binary (positive or negative) labels. 3) Beauty [16]: This dataset contains product reviews in the beauty category from Amazon. We converted the original ratings (1-5) to binary (positive or negative) labels and sampled 20,000 reviews for each label. For all three datasets, we tokenized the documents using the BERT tokenizer [49] and randomly split them into train/development/test sets with a proportion of 8/1/1. Statistics of the datasets are summarized in Table 1.


Table 1. Statistics of the datasets used.

Dataset    #docs    Avg. Length  Scale
Newsroom   50,000   827          1-5
IMDB       50,000   292          1-2
Beauty     40,000   91           1-2

Table 2. Averaged accuracy of different models on the Newsroom, IMDB, and Beauty testing sets.

Model            Newsroom  IMDB   Beauty
CNN              90.18     88.56  88.42
LSTM-SAN         91.26     90.68  92.00
BERT-SAN         92.28     92.60  93.72
DistilBERT-SAN   92.66     92.52  92.82
RoBERTa-SAN      91.16     92.76  93.40
Longformer-SAN   92.04     93.74  94.50


4.2 Models and Implementation Details
We compare different classification models, including several baselines, variants of our AAN model, and Naïve Bayes classifiers driven by a basic self-attention network (SAN) [38] and AAN.
• CNN [22]: This model extracts key features from a document by applying convolution and max-over-time pooling operations [7] over the shared word embedding layer.
• LSTM-SAN, BERT-SAN, DistilBERT-SAN, RoBERTa-SAN, and Longformer-SAN: All these models are based on the SAN framework. In LSTM-SAN, the encoder consists of a word embedding layer and a Bi-LSTM encoding layer, where the embeddings are pre-loaded with 300-dimensional GloVe vectors [33] and fixed during training. BERT [49], DistilBERT [37], RoBERTa [28], and Longformer [3] leverage different pre-trained language models, which have 110M, 66M, 125M, and 125M parameters, respectively.
• AAN + C(c) + Drop(r): These are variants of AAN, where C(c) and Drop(r) denote the number of concepts and the dropout rate, respectively.

We implemented all deep learning models using PyTorch [32], and the best set of parameters was selected based on the development set. For CNN-based models, the filter sizes are chosen to be 3, 4, and 5, and the number of filters is set to 100 for each size. For LSTM-based models, the dimension of the hidden states is set to 300 and the number of layers is 2. All parameters are trained with the ADAM optimizer [24] with a learning rate of 0.0002. Dropout with a rate of 0.1 is also applied in the classification layer. For all explanation tasks, we set the number of concepts to 10 and the dropout rate to 0.02. Our code and datasets are available at https://github.com/tshi04/ACCE.

4.3 Performance Results
We use accuracy as the evaluation metric to measure the performance of the different models. All quantitative results are summarized in Tables 2 and 3, where we use bold font to highlight the highest accuracy on the testing sets in Table 2. Comparing LSTM-SAN with BERT-SAN, DistilBERT-SAN, RoBERTa-SAN, and Longformer-SAN, we first find that encoders based on pre-trained language models outperform the conventional LSTM encoder with pre-trained word embeddings. In Table 3, we replace the self-attention layer on top of the pre-trained language models with the abstraction-aggregation network (AAN).


Table 3. Averaged accuracy of BERT- and Longformer-based AAN models on the Newsroom, IMDB, and Beauty testing sets.

                          Newsroom            IMDB                Beauty
Model                     BERT    Longformer  BERT    Longformer  BERT    Longformer
SAN Framework             92.28   92.04       92.60   93.74       93.72   94.50
AAN + C(10) + Drop(0.01)  92.54   91.72       92.22   92.96       93.38   93.42
AAN + C(10) + Drop(0.02)  92.14   91.64       92.14   92.86       93.58   93.75
AAN + C(10) + Drop(0.05)  92.14   91.60       91.82   92.66       93.05   93.80
AAN + C(10) + Drop(0.10)  92.30   91.48       91.50   92.12       93.25   93.60
AAN + C(20) + Drop(0.01)  92.02   91.98       91.64   92.78       93.70   93.48
AAN + C(20) + Drop(0.02)  92.44   91.84       91.80   93.04       93.55   93.88
AAN + C(20) + Drop(0.05)  92.54   91.86       91.92   93.14       93.68   93.42
AAN + C(20) + Drop(0.10)  92.52   91.98       92.10   92.96       93.72   93.88

Table 4. Case-level concept-based explanation. Each ID is associated with a concept, i.e., an abstraction-attention unit. Scores and weights (following each keyword) are calculated with Eqs. (7) and (8). '-' represents special characters.

ID  Score  Keywords
8   0.180  com(0.27), boston(0.26), boston(0.16), boston(0.1), m(0.02)
6   0.162  marketing(0.28), ad(0.06), ##fs(0.05), investors(0.03), said(0.03)
1   0.148  campaign(0.14), firm(0.14), money(0.06), brand(0.04), economist(0.03)
2   0.122  economist(0.2), said(0.16), professional(0.16), investors(0.08), agency(0.06)
9   0.116  boston(0.89), boston(0.11)
7   0.108  bloomberg(0.16), cn(0.11), global(0.09), money(0.06), cable(0.05)
5   0.103  -(0.96), -(0.03), s(0.01)
4   0.047  investment(0.76), money(0.14), investment(0.06), investment(0.02), investors(0.01)
3   0.016  ,(0.64), -(0.36)
10  0.000  .(0.93), -(0.07)

We observe that the different AAN variants do not significantly lower the classification accuracy, which indicates that we can use AAN for the concept-based explanation task without sacrificing overall performance. Note that the strategy of dropping aggregation-attention weights is necessary when training AAN models: in Table 9, we show that AAN models trained without randomly dropping aggregation-attention weights attain poor interpretability in concept-based explanation.

4.4 Heat-maps and Case-level Concept-based Explanation
First, we investigate whether AAN attends to relevant keywords when it is making predictions, which can be accomplished by visualizing attention weights (see Fig. 2). The example is a Business news article from Newsroom, and we observe that the most relevant keyword that AAN detects is boston. Other important keywords include investment, economist, marketing, and com. Compared with Fig. 2, our case-level concept-based explanation provides more informative results. From Table 4, we observe that AAN makes the prediction based on several different aspects, such as corporations (e.g., com), occupations (e.g., economist), terminology (e.g., marketing), and so on. Moreover, boston may be related to a corporation (e.g., bostonglobe or gerritsen of boston) or a city; thus, it appears in both concepts 8 (corporations) and 9 (locations).


Fig. 2. Attention-weight visualization for an interpretable attention-based classification model.

Table 5. The 20 most important keywords for model predictions on different training sets. Keywords are ordered by their scores. For IMDB, we use bold to highlight the common keywords shared with Beauty.

Dataset   Label          Keywords
IMDB      Negative       worst, awful, terrible, bad, disappointed, boring, disappointing, waste, horrible, sucks, fails, disappointment, lame, dull, poorly, poor, worse, mess, dreadful, pointless
IMDB      Positive       8, 7, excellent, loved, 9, enjoyable, superb, enjoyed, highly, wonderful, entertaining, best, beautifully, good, great, brilliant, terrific, funny, hilarious, fine
Beauty    Negative       disappointed, nothing, unfortunately, made, not, waste, disappointing, terrible, worst, horrible, makes, no, sadly, disappointment, t, awful, sad, bad, never, started
Beauty    Positive       great, love, highly, amazing, pleased, perfect, works, best, happy, awesome, makes, recommend, excellent, wonderful, definitely, good, glad, well, fantastic, very
Newsroom  Business       inc, corp, boston, massachusetts, economic, cambridge, financial, economy, banking, auto, automotive, startup, company, mr, finance, biotechnology, somerville, retailer, business, airline
Newsroom  Entertainment  singer, actress, actor, star, fox, comedian, hollywood, sunday, rapper, fashion, celebrity, contestant, filmmaker, bachelor, insider, porn, oscar, rocker, host, monday
Newsroom  Sports         quarterback, coach, basketball, baseball, soccer, nba, sports, striker, tennis, hockey, nfl, nhl, football, olympic, midfielder, golf, player, manager, outfielder, nascar
Newsroom  Health         dr, health, pediatric, obesity, cardiovascular, scientists, researcher, medicine, psychologist, diabetes, medical, psychiatry, aids, fitness, healthcare, autism, psychology, neuroscience, fox, tobacco
Newsroom  Technology     tech, cyber, electronics, wireless, lifestyle, silicon, gaming, culture, telecommunications, scientist, company, google, smartphone, technology, francisco, broadband, privacy, internet, twitter



[Figure 3 omitted: three pairs of bar plots of keyword scores (y-axis: scores; x-axis: tokens; series: Train, Dev, Test). (a) Newsroom. Left: Business. Right: Sports. (b) IMDB. Left: Negative. Right: Positive. (c) IMDB-to-Beauty. Left: Negative. Right: Positive.]

Fig. 3. Distribution of keywords on the training, development, and testing sets. Scores are calculated by $p_\theta(w, l, \mathcal{C})$. The orders of tokens are the same as those in Table 5.

4.5 Corpus-Level Explanation
Corpus-level explanation aims to find the keywords that are important for the predictions. In Table 5, we show the 20 most important keywords for each predicted label, and we assume these keywords determine the predictions. In the last section, we demonstrate this assumption through consistency analysis. The scores of the keywords are shown in Fig. 3.

In addition to causal relationships, we can also use these keywords to check whether our model and datasets are biased. For example, boston and massachusetts play an important role in predicting business, which indicates that the training set is biased. By checking our data, we find that many business news articles come from The Boston Globe. Another obvious example of bias is that the numbers 8, 7, and 9 are important keywords for IMDB sentiment analysis. This is because the original ratings scale from 1 to 10, and many reviews mention phrases like "rate this movie 8 out of 10".

Moreover, from Figs. 3 (a) and (b), we find that for a randomly split corpus, the distributions of keywords across the training/development/test sets are similar to each other, which helps explain why the model achieves outstanding performance on the testing sets. If we apply a model trained on IMDB to Beauty (see Fig. 3 (c)), it can only leverage the cross-domain common keywords (e.g., disappointed and loved) to make predictions. Nevertheless, it achieves 71% accuracy, which is much better than random predictions. In Table 5, we use bold font to highlight these common keywords.


Table 6. Concept-based explanation (Business). Scores are calculated using Eq. (6).

ID  Score  Keywords
8   0.173  inc, corp, massachusetts, boston, mr, ms, jr, ltd, mit, q
1   0.168  economy, retailer, company, startup, ##maker, airline, chain, bank, utility, billionaire
7   0.151  biotechnology, banking, tech, startup, pharmaceuticals, mortgage, financial, auto, commerce, economic
6   0.124  economic, health, banking, finance, insurance, healthcare, economy, housing, safety, commerce
4   0.107  financial, economic, banking, auto, automotive, securities, housing, finance, monetary, biotechnology
9   0.086  boston, massachusetts, cambridge, washington, detroit, frankfurt, harvard, tokyo, providence, paris
5   0.056  -, ##as, -, -, itunes, inc, corp, northeast, -, llc
2   0.054  economist, executive, spokesman, analyst, economists, ##gist, ceo, director, analysts, president
3   0.026  -, -, -, ), ##tem, ##sp, the, =, t, ob
10  0.000  -, comment, ), insurance, search, ', tesla, graphic, guitarist, ,

Table 7. Concept-based explanation (Sports). Scores are calculated using Eq. (6).

ID  Score  Keywords
1   0.176  quarterback, player, striker, champion, pitcher, midfielder, outfielder, athlete, goaltender, forward
7   0.165  nhl, mets, soccer, nets, yankees, nascar, mls, reuters, doping, twitter
6   0.147  tennis, sports, soccer, golf, doping, hockey, athletic, athletics, injuries, basketball
4   0.139  baseball, basketball, nba, nfl, sports, football, tennis, olympic, hockey, golf
8   0.119  jr, ", n, j, fox, espn, nl, u, boston, ca
2   0.100  coach, manager, commissioner, boss, gm, trainer, spokesman, umpire, coordinator, referee
9   0.060  philadelphia, indianapolis, boston, tampa, louisville, buffalo, melbourne, manchester, baltimore, atlanta
5   0.055  ', ', ##as, –, ##a, ‘, sides, newcomers, chelsea, jaguars
3   0.022  ', ', ), ,, ##kus, ##gre, the, whole, lever, ##wa
10  0.000  ., ), finishes, bel, gymnastics, ', ##ditional, becomes, tu, united


4.6 Corpus-level Concept-based Explanation
Corpus-level concept-based explanation further improves corpus-level explanation by introducing clustering structures over the keywords. In this section, we again use the AAN trained on Newsroom as an example for this task. Table 6 shows the concepts and relevant keywords for AAN when it assigns an article to Business. Here, we observe that the first-tier salient concepts consist of concepts 8 (corporations) and 1 (business terminology in general). The second-tier concepts 7, 6, and 4 are related to economy, finance, mortgage, and banking, which are domain-specific terminology; they share many keywords. Concepts 9 and 2 are associated with locations and occupations, respectively, and receive relatively lower scores. Concepts 5, 3, and 10 are not quite meaningful. We also show the results for Newsroom Sports in Table 7, where we find that concepts 1 (sports terminology) and 7 (leagues and teams) are the first-tier salient concepts. The second-tier salient concepts 6 and 4 are about games and campaigns; concepts 7, 6, and 4 also share many keywords. Concepts 8 (corporations and channels), 2 (occupations and roles), and 9 (locations) are the third-tier salient concepts, while concepts 5, 3, and 10 are again meaningless. From these tables, we summarize some commonalities: 1) Domain-specific terminology (i.e., concepts 1, 7, 6, and 4) plays an important role in predictions. 2) Locations (i.e., concept 9) and occupations/roles (i.e., concept 2) are less important. 3) Meaningless concepts (i.e., concepts 5, 3, and 10), such as punctuation, have the least influence.


Table 8. Consistency between the model and the NBC. CS represents the consistency score; CP/NCP denote the percentage of incorrect predictions when NBC predictions are consistent/not consistent with model predictions.

            Newsroom             IMDB                 Beauty
Model       CS     NCP    CP     CS     NCP    CP     CS     NCP    CP
BERT-SAN    83.96  21.59  4.72   86.02  17.17  5.81   85.45  16.30  4.56
BERT-AAN    84.36  20.20  5.57   85.46  21.18  5.05   84.72  16.04  4.51

Table 9. Concept-based explanation (Sports) for AAN without applying dropout to attention weights.

CID  Weight  Keywords
1    0.8195  quarterback, athletic, olympic, basketball, athletics, qb, hockey, outfielder, sports
7    0.0865  nascar, celtics, motorsports, nba, boston, augusta, nhl, tennis, leafs, zurich
4    0.0370  mets, knicks, yankees, players, pitchers, lakers, hosts, coaches, forwards, swimmers
3    0.0164  offensive, eli, bird, doping, nba, jay, rod, hurdle, afc, peyton
2    0.0098  premier, american, mets, nl, field, yankee, national, aaron, nba, olympic
10   0.0083  games, seasons, tries, defeats, baskets, players, season, contests, points, throws
5    0.0015  dustin, antonio, rookie, dante, dale, dylan, lineman, ty, launch, luther
8    0.0010  2016, 2014, college, tribune, card, press, s, -, this, leadership
9    0.0004  men, -, grand, 9, s, usa, state, west, world, major
6    0.0000  the, -, -, year, whole, vie, very, tr, too, to

4.7 Consistency Analysis
In this section, we leverage the method proposed in Section 3.2.3 to build an NBC for BERT-SAN and for BERT-AAN on the training set. We then apply them to the testing set to check whether the NBC predictions and the model predictions are consistent with each other. We approximate the numerator of Eq. (5) with the five words (repetitions allowed) with the highest attention weights in each document. The smoothing factor $\gamma$ in Eq. (5) is set to 1000. In Eq. (10), we set $\lambda = 1.2$ for text categorization and $\lambda = 1.0$ for sentiment analysis, and $d'$ consists of the five words with the highest attention weights.

We use the accuracy (consistency score) between the labels predicted by the NBC and by the original model to evaluate consistency. Table 8 shows that around 85% of predictions are consistent. This demonstrates that the keywords obtained by the corpus-level and concept-based explanation methods are important to the predictions, and that they can be used to interpret attention-based models. Moreover, from the CP and NCP scores, we observe a significantly higher probability that the model makes an incorrect prediction when it is inconsistent with the NBC prediction. This finding suggests using the consistency score as one criterion for uncertainty estimation.

4.8 Dropout of Aggregation-Attention Weights
For AAN, we apply dropout to the aggregation-attention weights during training. In Table 9, we show an example trained without the attention-weights dropout mechanism. We observe that the weight for concept 1 is much higher than those of the other concepts. In addition, the keywords within each concept are not semantically coherent.

5 CONCLUSION
In this paper, we proposed a general-purpose corpus-level explanation approach to interpret attention-based networks. It captures causal relationships between keywords and model predictions by learning the importance of keywords for predicted labels across the training corpus based on attention weights. Experimental results show that the keywords are semantically meaningful for the predicted labels.


We further proposed a concept-based explanation method to identify important concepts for model predictions. This method is based on a novel Abstraction-Aggregation Network (AAN), which can automatically extract concepts, i.e., clusters of keywords, during the end-to-end training for the main task. Our experimental results also demonstrate that this method effectively captures semantically meaningful concepts/clusters and provides the relative importance of each concept to model predictions. To verify these results, we also built a Naïve Bayes Classifier based on an attention-based bag-of-words document representation technique and the discovered causal relationships. Our consistency analysis results demonstrate that the discovered keywords are important to the predictions.

ACKNOWLEDGMENTS
This work was supported in part by the US National Science Foundation grants IIS-1707498 and IIS-1838730, and by Amazon AWS credits.

REFERENCES
[1] Diego Antognini and Boi Faltings. 2021. Rationalization Through Concepts. arXiv preprint arXiv:2105.04837 (2021).
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[3] Iz Beltagy, Matthew E Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020).
[4] Francesco Bodria, Fosca Giannotti, Riccardo Guidotti, Francesca Naretto, Dino Pedreschi, and Salvatore Rinzivillo. 2021. Benchmarking and survey of explanation methods for black box models. arXiv preprint arXiv:2102.13076 (2021).
[5] Diane Bouchacourt and Ludovic Denoyer. 2019. EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction. arXiv preprint arXiv:1905.11852 (2019).
[6] Zhi Chen, Yijie Bei, and Cynthia Rudin. 2020. Concept whitening for interpretable image recognition. Nature Machine Intelligence 2, 12 (01 Dec 2020), 772–782. https://doi.org/10.1038/s42256-020-00265-z
[7] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural Language Processing (Almost) from Scratch. The Journal of Machine Learning Research 12 (Nov. 2011), 2493–2537.
[8] Arun Das and Paul Rad. 2020. Opportunities and challenges in explainable artificial intelligence (XAI): A survey. arXiv preprint arXiv:2006.11371 (2020).
[9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
[10] Nir Friedman, Dan Geiger, and Moises Goldszmidt. 1997. Bayesian Network Classifiers. Machine Learning 29, 2 (01 Nov 1997), 131–163. https://doi.org/10.1023/A:1007465528199
[11] Reza Ghaeini, Xiaoli Fern, and Prasad Tadepalli. 2018. Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 4952–4957. https://doi.org/10.18653/v1/D18-1537
[12] Amirata Ghorbani, Abubakar Abid, and James Zou. 2019. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 3681–3688.
[13] Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. 2019. Towards Automatic Concept-based Explanations. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2019/file/77d2afcb31f6493e350fca61764efb9a-Paper.pdf
[14] Leilani H. Gilpin, David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. Explaining Explanations: An Overview of Interpretability of Machine Learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). 80–89. https://doi.org/10.1109/DSAA.2018.00018
[15] Max Grusky, Mor Naaman, and Yoav Artzi. 2018. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, 708–719. https://doi.org/10.18653/v1/N18-1065
[16] Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web (Montréal, Québec, Canada) (WWW '16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 507–517. https://doi.org/10.1145/2872427.2883037
[17] Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching Machines to Read and Comprehend. In Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (Eds.), Vol. 28. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2015/file/afdec7005cc9f14302cd0474fd0f3c96-Paper.pdf
[18] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (Nov. 1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
[19] Ting Hua, Chang-Tien Lu, Jaegul Choo, and Chandan K Reddy. 2020. Probabilistic topic modeling for comparative analysis of document collections. ACM Transactions on Knowledge Discovery from Data (TKDD) 14, 2 (2020), 1–27.
[20] Sarthak Jain and Byron C. Wallace. 2019. Attention is not Explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 3543–3556. https://doi.org/10.18653/v1/N19-1357
[21] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. 2018. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80), Jennifer Dy and Andreas Krause (Eds.). PMLR, 2668–2677. http://proceedings.mlr.press/v80/kim18d.html
[22] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 1746–1751. https://doi.org/10.3115/v1/D14-1181
[23] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim. 2019. The (Un)reliability of Saliency Methods. Springer International Publishing, Cham, 267–280. https://doi.org/10.1007/978-3-030-28954-6_14
[24] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[25] Solomon Kullback. 1997. Information theory and statistics. Courier Corporation.
[26] Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130 (2017).
[27] Frederick Liu and Besim Avci. 2019. Incorporating Priors with Feature Attribution on Text Classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 6274–6283. https://doi.org/10.18653/v1/P19-1631
[28] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[29] Scott M. Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 4768–4777.
[30] Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, 1412–1421. https://doi.org/10.18653/v1/D15-1166
[31] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning Word Vectors for Sentiment Analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Portland, Oregon, USA, 142–150. https://www.aclweb.org/anthology/P11-1015
[32] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic Differentiation in PyTorch. In NeurIPS Autodiff Workshop.
[33] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 1532–1543. https://doi.org/10.3115/v1/D14-1162
[34] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, 2227–2237. https://doi.org/10.18653/v1/N18-1202
[35] Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, Véronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeniy Kotelnikov, Nuria Bel, Salud María Jiménez-Zafra, and Gülşen Eryiğit. 2016. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). Association for Computational Linguistics, San Diego, California, 19–30. https://doi.org/10.18653/v1/S16-1002
[36] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386 (2016).
[37] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).
[38] Sofia Serrano and Noah A. Smith. 2019. Is Attention Interpretable?. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 2931–2951. https://doi.org/10.18653/v1/P19-1282
[39] Tian Shi, Kyeongpil Kang, Jaegul Choo, and Chandan K. Reddy. 2018. Short-Text Topic Modeling via Non-Negative Matrix Factorization Enriched with Local Word-Context Correlations. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW '18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1105–1114. https://doi.org/10.1145/3178876.3186009
[40] Tian Shi, Liuqing Li, Ping Wang, and Chandan K Reddy. 2021. A Simple and Effective Self-Supervised Contrastive Learning Framework for Aspect Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 13815–13824.
[41] Tian Shi, Ping Wang, and Chandan K Reddy. 2019. LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 66–71.
[42] Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, and Alexander M. Rush. 2019. Seq2seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models. IEEE Transactions on Visualization and Computer Graphics 25, 1 (2019), 353–363. https://doi.org/10.1109/TVCG.2018.2865044
[43] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia) (ICML'17). JMLR.org, 3319–3328.
[44] Erico Tjoa and Cuntai Guan. 2020. A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI. IEEE Transactions on Neural Networks and Learning Systems (October 2020). https://doi.org/10.1109/tnnls.2020.3027314
[45] Jesse Vig. 2019. A Multiscale Visualization of Attention in the Transformer Model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Association for Computational Linguistics, Florence, Italy, 37–42. https://doi.org/10.18653/v1/P19-3007
[46] Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015. Grammar as a Foreign Language. In Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (Eds.), Vol. 28. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2015/file/277281aada22045c03945dcb2ca6f2ec-Paper.pdf
[47] Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. 2016. Attention-based LSTM for Aspect-level Sentiment Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 606–615. https://doi.org/10.18653/v1/D16-1058
[48] Sarah Wiegreffe and Yuval Pinter. 2019. Attention is not not Explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 11–20.
[49] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, 38–45. https://doi.org/10.18653/v1/2020.emnlp-demos.6
[50] Weibin Wu, Yuxin Su, Xixian Chen, Shenglin Zhao, Irwin King, Michael R. Lyu, and Yu-Wing Tai. 2020. Towards Global Explanations of Convolutional Neural Networks With Concept Attribution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv preprint arXiv:1906.08237 (2019).
[52] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, 1480–1489. https://doi.org/10.18653/v1/N16-1174
[53] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 2020. On Completeness-aware Concept-Based Explanations in Deep Neural Networks. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 20554–20565. https://proceedings.neurips.cc/paper/2020/file/ecb287ff763c169694f682af52c1f309-Paper.pdf
[54] Mohammad Nokhbeh Zaeem and Majid Komeili. 2021. Cause and Effect: Concept-based Explanation of Neural Networks. arXiv preprint arXiv:2105.07033 (2021).
[55] Xuchao Zhang, Fanglan Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2019. Mitigating Uncertainty in Document Classification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 3126–3136. https://doi.org/10.18653/v1/N19-1316
[56] Bolei Zhou, Yiyou Sun, David Bau, and Antonio Torralba. 2018. Interpretable Basis Decomposition for Visual Explanation. In Proceedings of the European Conference on Computer Vision (ECCV).