23
A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour, 1 Jurgen P. Schulze, 1 Yi Yao, 2 Avi Ziskind, 2 and Giedrius Burachas 2 1 UC San Diego 2 SRI International

A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

A Study on Multimodal and Interactive Explanations for Visual Question Answering

Kamran Alipour,1 Jurgen P. Schulze,1 Yi Yao,2 Avi Ziskind,2 and Giedrius Burachas 2

1UC San Diego2SRI International

Page 2: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Introduction

Visual Question Answering

○ Answering a question in natural language regarding a given image

○ Attention-based models on image features influenced by the question to produce answers

What is the animal in this picture?

Zebra.

Page 3: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Introduction

eXplainable AI (XAI)

○ Used for years in medicine, robotics, ...

○ Mainly attention-based

○ Various formats: visual, textual, ...What is the animal

in this picture?

Zebra.

Because:

Page 4: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Contributions

○ Generated multiple explanation modes from attention features and annotations

○ Introduced an interactive explanation mode

○ Prediction task: A novel assessment of the efficacy of explanations

○ A user study to evaluate our XAI system

Page 5: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

eXplainable Visual Question Answering (XVQA)

FC

FeaturesImage

FeaturesQuestion Layer

Attention Answer

ResNetFrisbee

LSTMWhat is in the dog's mouth?

Page 6: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

User Study

Input

- Images and Questions

- Explanations

- VQA correctness and confidence

TASK:

- Rank explanations

- Predict if VQA answers correctly based on explanations.

Page 7: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

User StudyGroup

NE Control group

SP Spatial attention

SA Active (steerable) attention

SE Semantic

OA Object attention

AL All explanations

Subjects: 90

Trials: 10513

Page 8: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Explanation modes

Spatial attention

Shows the parts of the image the model is focused on while preparing the answer.

QUESTION: Where is this place?

ANSWER: Airport.

Page 9: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Explanation modes

Active (streerable) attention

FeaturesImage

FeaturesQuestion

LayerAttention

ResNet

LSTMWhat covers the ground?

Attention map drawn on the image by the user

Multiplied to image features

FC Answer

Snow

Page 10: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Explanation modes

Semantic:

Scene graphBounding box

Page 11: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Explanation modes

Semantic:Textual Explanation [Ghosh et al. 2019]

QUESTION: What is this sport?

ANSWER: Soccer.

Page 12: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Explanation modes

Object attention [Ray et al. 2019]

QUESTION: What food is he eating?

ANSWER: Sandwich.

Page 13: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Results

Prediction accuracies in different study groups

Page 14: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Results

Prediction accuracy vs. explanation ratings when system accurate

Page 15: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Results

Prediction accuracy vs. explanation ratings when system inaccurate

Page 16: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Results

Prediction accuracy progress

Page 17: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Results

User confidence progress: Active attention vs. other explanations

Page 18: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Discussion

○ Interactive explanations improved users confidence compared to other explanations

○ When AI is inaccurate, explanations are more helpful

○ Getting exposed to multiple explanation modes can be conflicting/overwhelming and reduce prediction accuracy

Page 19: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Conclusion and Future Work

○ Interactive experiment to probe explanation effectiveness

○ Explanations help accuracy prediction

○ Users confidence improved when exposed to explanations

Page 20: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

Thank You!

Page 21: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

User Study StatisticsGroup Subjects Trials

NE Control group 15 4124

SP Spatial attention 15 1826

SA Active (steerable) attention 15 1021

SE Semantic 15 1261

OA Object attention 15 1435

AL All explanations 15 846

Total : 90 10513

Page 22: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

User Study Structure

Practice Block

ExpTrial

Control BlockExplanation Block

No-ExpTrial

Practice Block

Control Block

No-ExpTrial

Control Group

Explanation Groups No-Exp

TrialExpTrial5 x No-Exp

Trial5 x

Control BlockExplanation Block

ExpTrial5 x No-Exp

Trial5 x

No-ExpTrial

No-ExpTrial

No-ExpTrial

No-ExpTrial

No-ExpTrial

No-ExpTrial

No-ExpTrial

No-ExpTrial

No-ExpTrial

No-ExpTrial

Page 23: A Study on Multimodal and Interactive Explanations for ...€¦ · A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour,1 Jurgen P. Schulze,1

User Study Structure

Input

- Images and Questions

- Explanations

- VQA correctness and confidence

TASK:

- Rank explanations

- Predict if VQA answers correctly based on explanations.