
BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

NeurIPS 2019

Andreas Kirsch*, Joost van Amersfoort*, Yarin Gal
OATML - University of Oxford

Email: {andreas.kirsch, joost.van.amersfoort, yarin}@cs.ox.ac.uk


Introduction to Active Learning

● A key problem in deep learning is data efficiency
● It is expensive to obtain a large labelled dataset a priori
● In active learning, we instead iteratively acquire labels for only the most interesting data points

Goal: achieve a target accuracy (e.g. 95%) with the fewest labelled data points


Active Learning

[Figure: the active learning loop: train the model on the labelled set, score the unlabelled pool with the acquisition function, have an expert label the selected points, and repeat. Goal: 95% accuracy.]
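A minimal sketch of this loop in Python, with hypothetical `train`, `evaluate`, and `oracle_label` helpers standing in for model training, held-out evaluation, and the human expert:

```python
def active_learning_loop(model, labelled, pool, acquire, target_acc=0.95):
    """Pool-based active learning: train on the labelled set, score the
    unlabelled pool, label the selected points, and repeat."""
    while True:
        train(model, labelled)                 # hypothetical training routine
        if evaluate(model) >= target_acc:      # hypothetical held-out evaluation
            return model, labelled
        for x in acquire(model, pool):         # e.g. BALD / BatchBALD (below)
            pool.remove(x)
            labelled.append((x, oracle_label(x)))  # ask the expert for a label
```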


Acquisition Function

Instead of acquiring a random data point, we want to pick an informative point.

We acquire points using an acquisition function $a$, which takes as input candidate data points and the distribution over model parameters:

$$x^* = \operatorname*{arg\,max}_{x \in \mathcal{D}_{\text{pool}}} a\big(x,\, p(\omega \mid \mathcal{D}_{\text{train}})\big)$$

where $p(\omega \mid \mathcal{D}_{\text{train}})$ is the distribution over model parameters.


Uncertainty in Deterministic Networks

We have the following two approaches:

1. Variation ratio (one minus the max softmax probability)
2. Entropy of the softmax output

Problem: uncertainty isn’t reliable. Standard deep models extrapolate arbitrarily!
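For concreteness, a sketch of these two scores on a batch of logits (PyTorch; the shapes are assumptions):

```python
import torch

def variation_ratio(logits):
    # 1 - max softmax probability: 0 when the model is fully confident
    probs = torch.softmax(logits, dim=-1)
    return 1.0 - probs.max(dim=-1).values

def predictive_entropy(logits, eps=1e-12):
    # entropy of the softmax output, in nats
    probs = torch.softmax(logits, dim=-1)
    return -(probs * (probs + eps).log()).sum(dim=-1)
```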


Active Learning - Bayesian Approaches

Alternative: use a Bayesian Neural Network

● Dropout as the approximate inference method
● Scales well and is easy to implement and train
● Enables more reliable uncertainty quantification (see the sketch below):
  ○ Entropy of the marginalised softmax output
  ○ Mutual information between parameters and the true label (BALD)
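A minimal sketch of MC dropout predictions, assuming a PyTorch model whose dropout layers are kept active at inference time:

```python
import torch

def mc_dropout_probs(model, x, k=20):
    """Draw k stochastic forward passes with dropout kept active.

    Returns a [k, batch, classes] tensor of softmax outputs, one per
    sampled dropout mask (i.e. per draw of the model parameters).
    """
    model.train()  # keeps dropout stochastic (note: also affects batchnorm;
                   # a simplification for this sketch)
    with torch.no_grad():
        return torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(k)]
        )
```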


BALD: “Bayesian Active Learning by Disagreement”

$$a_{\text{BALD}}(x) = \mathbb{I}[y; \omega \mid x, \mathcal{D}_{\text{train}}] = \mathbb{H}[y \mid x, \mathcal{D}_{\text{train}}] - \mathbb{E}_{p(\omega \mid \mathcal{D}_{\text{train}})}\big[\mathbb{H}[y \mid x, \omega]\big]$$

Intuitively:
1. The first term captures the overall uncertainty of the model
2. The second term captures the uncertainty of a given draw of the model parameters

The score is high when the model is uncertain overall (high predictive entropy) while each individual parameter draw is confident (low expected per-draw entropy), i.e. when the draws disagree.
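A sketch of this score computed from the MC dropout samples above:

```python
import torch

def bald_scores(probs, eps=1e-12):
    """BALD = H[mean predictive] - E_w[H[per-draw predictive]].

    probs: [k, batch, classes] softmax outputs from k dropout samples,
    e.g. the output of mc_dropout_probs above.
    """
    mean_probs = probs.mean(dim=0)                                    # [batch, classes]
    entropy_of_mean = -(mean_probs * (mean_probs + eps).log()).sum(-1)
    mean_of_entropy = -(probs * (probs + eps).log()).sum(-1).mean(0)
    return entropy_of_mean - mean_of_entropy                          # [batch]
```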


Problems in Practice

● Re-training a deep model is computationally expensive

● Mobilising an expert is time-consuming

→ In practice: acquire a batch of data by selecting the top-k scoring points
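A naive top-k batch acquisition, reusing the bald_scores sketch above:

```python
# naive batch acquisition: take the 10 highest individual BALD scores
scores = bald_scores(probs)        # probs: [k, pool_size, classes]
top_k_indices = scores.topk(10).indices
```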


Problems in Practice

Naively acquiring the top-k scoring points can lead to redundant acquisitions: similar points receive similar scores, so the batch fills up with near-duplicates!


BALD vs BatchBALD

Lack of diversity: BALD performs worse than random acquisition on Repeated MNIST.


BatchBALD I - Introduction

→ Score batches of points jointly:

$$a_{\text{BatchBALD}} = \mathbb{I}[y_1, \ldots, y_b; \omega \mid x_{1:b}, \mathcal{D}_{\text{train}}] = \mathbb{H}[y_1, \ldots, y_b \mid x_{1:b}, \mathcal{D}_{\text{train}}] - \mathbb{E}_{p(\omega \mid \mathcal{D}_{\text{train}})}\big[\mathbb{H}[y_1, \ldots, y_b \mid x_{1:b}, \omega]\big]$$

Joint entropy (first term): hard to compute.

Expectation over entropies (second term): easy to compute, since given $\omega$ the $y_i$ are independent and the joint conditional entropy factorises into a sum of per-point entropies.
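A sketch of the easy second term, assuming per-point softmax outputs under k parameter draws:

```python
import torch

def expected_conditional_entropy(probs, eps=1e-12):
    """Second BatchBALD term: E_w[H[y_1:b | x_1:b, w]].

    Given w, the y_i are independent, so the joint conditional entropy
    factorises into a sum over the b points.
    probs: [b, k, classes] softmax outputs for each point under each draw.
    """
    per_point_per_draw = -(probs * (probs + eps).log()).sum(-1)  # [b, k]
    return per_point_per_draw.mean(dim=1).sum()  # avg over draws, sum over points
```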


BALD BatchBALD

[Figure: Venn diagrams of the mutual information terms. With BALD, the dark overlapping areas are counted double; BatchBALD uses the joint mutual information, so there is no double-counting.]


BatchBALD II - MC estimator

We derive an MC estimator for the first term (implicitly conditioned on $x_{1:b}$):

$$\mathbb{H}[y_1, \ldots, y_b] \approx - \sum_{\hat{y}_{1:b}} \Big( \frac{1}{k} \sum_{j=1}^{k} p(\hat{y}_{1:b} \mid \omega_j) \Big) \log \Big( \frac{1}{k} \sum_{j=1}^{k} p(\hat{y}_{1:b} \mid \omega_j) \Big)$$

● $k$: the number of samples from the parameter distribution
● $\hat{y}_{1:b}$: one configuration of labels for the batch
● The number of configurations $\hat{y}_{1:b}$ grows exponentially with the batch size $b$.
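A sketch of this estimator for small batches, enumerating all $c^b$ label configurations (probs shaped as in the previous snippet):

```python
import itertools
import torch

def joint_entropy_mc(probs, eps=1e-12):
    """MC estimate of H[y_1:b] from k parameter samples, enumerating all
    label configurations; only feasible for small batch sizes b.

    probs: [b, k, classes] softmax outputs per point under each draw.
    """
    b, k, c = probs.shape
    entropy = torch.tensor(0.0)
    for config in itertools.product(range(c), repeat=b):  # c**b configurations
        # p(y_1:b = config | w_j) = prod_i p(y_i = config_i | w_j)
        p_given_w = torch.ones(k)
        for i, label in enumerate(config):
            p_given_w = p_given_w * probs[i, :, label]
        p = p_given_w.mean()  # average over the k parameter samples
        entropy = entropy - p * (p + eps).log()
    return entropy
```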


BatchBALD III - Greedy Approximation

To avoid this combinatorial explosion, we grow the acquisition batch greedily, one point at a time.
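A sketch of the greedy selection, with a hypothetical `score_fn` returning the joint BatchBALD score of a candidate batch (e.g. joint_entropy_mc minus expected_conditional_entropy above):

```python
def greedy_batchbald(candidate_ids, score_fn, batch_size):
    """Grow the acquisition batch one point at a time, each step adding
    the candidate that maximises the joint score of the enlarged batch."""
    batch, remaining = [], set(candidate_ids)
    for _ in range(batch_size):
        best = max(remaining, key=lambda i: score_fn(batch + [i]))
        batch.append(best)
        remaining.remove(best)
    return batch
```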


Greedy Approximation and Submodularity

● We can compute the greedy joint entropy exactly for the first 4 points; afterwards we use the Monte Carlo estimator

● We prove in the paper that BatchBALD is submodular, so the greedy approximation is guaranteed to achieve at least a 1 - 1/e (≈ 63%) fraction of the optimal batch score.


Consistent dropout

Dropout uncertainty estimates have a lot of variance. By using the same dropout masks across the entire unlabelled set, we reduce this variance.
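One way to implement this, as a sketch: a dropout module that samples a single mask, broadcasts it over the batch dimension, and reuses it until explicitly reset (the module name and reset mechanism here are illustrative, not the paper's exact implementation):

```python
import torch
import torch.nn as nn

class ConsistentDropout(nn.Module):
    """Dropout that samples ONE mask, shares it across the batch
    dimension, and reuses it on every forward pass until reset_mask()
    is called, so the whole pool is scored under the same mask."""

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p
        self.mask = None

    def reset_mask(self):
        self.mask = None  # the next forward pass samples a fresh mask

    def forward(self, x):
        if self.mask is None or self.mask.shape != x.shape[1:]:
            keep = torch.full(x.shape[1:], 1 - self.p, device=x.device)
            self.mask = torch.bernoulli(keep) / (1 - self.p)  # inverted dropout
        return x * self.mask  # broadcasts the shared mask over the batch
```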


Results MNIST


Results EMNIST


Diversity


Results CINIC-10 (CIFAR-10 + downsampled ImageNet)


Conclusion

BatchBALD improves over BALD:

1. Performs as well with acquisition size 10 as BALD does with acquisition size 1
2. Much lower total experiment time than BALD
3. Performs well on datasets with repetition
4. Performs well when the number of classes is high


Future Work

● Estimating the joint entropy with dropout samples is noisy

● Training a deep model on a small dataset with dropout is difficult
  ○ The uncertainty of a deterministic ResNet is (surprisingly) good in practice

● In active learning, the labelled set does not follow the original data distribution
  ○ This leads to difficulties with imbalanced datasets
  ○ Random acquisition doesn't have this problem


Thanks!

Andreas Kirsch*, Joost van Amersfoort*, Yarin Gal

Email: {andreas.kirsch, joost.van.amersfoort, yarin}@cs.ox.ac.uk


References

Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning, 2016.

Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. "Deep Bayesian active learning with image data." International Conference on Machine Learning, 2017.

Houlsby, Neil, et al. "Bayesian active learning for classification and preference learning." arXiv preprint arXiv:1112.5745, 2011.