EDiT: Interpreting Ensemble Models via Compact Soft Decision Trees
ICDM 2019
Jaemin Yoo
Seoul National University
Seoul, South Korea

Lee Sael
Ajou University
Suwon, South Korea
Outline
- Introduction
- Proposed Method
- Experiments
- Conclusion
Black Box Models
- Most ML models are black boxes
  - Learned structures are random and complex
  - Their decisions are not explainable
Image from https://www.investopedia.com/terms/b/blackbox.asp
Interpretable ML
- Research on interpreting a model's decisions
  - Important when each decision is irreversible
- Two types of interpretable models:
  - Linear models
  - Decision trees
- However, their accuracy is often poor
Ensemble Models
- Ensemble models
  - Combine the predictions of weak models
  - Produce robust and accurate predictions
- However, they have low interpretability
  - Decisions are made by hundreds of learners
Image from https://dsc-spidal.github.io/harp/docs/examples/rf/
Problem Definition
- Given a trained ensemble model 𝑀
- Train an interpretable classifier 𝑆 such that:
  - 𝑆 achieves similar accuracy to 𝑀
  - 𝑆 contains fewer parameters than 𝑀 (a toy sketch of this setup follows)
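As a rough, hypothetical illustration of this setup (not the paper's code), the snippet below trains a scikit-learn random forest as the teacher 𝑀 and a shallow decision tree as a stand-in for the student 𝑆, then compares test accuracy against tree-node counts as a crude proxy for parameter count; the toy data, model choices, and the `n_nodes` helper are all assumptions.

```python
# Hypothetical sketch of the problem setup: a trained ensemble M (teacher)
# versus an interpretable student S with far fewer parameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for a real dataset (assumption).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

M = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
S = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

def n_nodes(model):
    # Tree-node count as a crude proxy for the number of parameters.
    trees = getattr(model, "estimators_", [model])
    return sum(t.tree_.node_count for t in trees)

print("teacher M: acc=%.3f, nodes=%d" % (M.score(X_te, y_te), n_nodes(M)))
print("student S: acc=%.3f, nodes=%d" % (S.score(X_te, y_te), n_nodes(S)))
```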
Outline
- Introduction
- Proposed Method
- Experiments
- Conclusion
Proposed Method
- Ensemble to Distilled Tree (EDiT)
  - Given an ensemble model
  - Trains a compact soft decision tree
- Interpretable & more efficient than SDTs
- EDiT is based on three main ideas:
  - Idea 1: Knowledge distillation
  - Idea 2: Weight sparsification
  - Idea 3: Tree pruning
Preliminary: SDTs
- SDTs are interpretable tree-based models (see the sketch below)
  - Each internal node is a linear classifier
  - Each leaf node learns a probability distribution
Image from “Rule-Extraction from Soft Decision Trees” (L. Huang, M. Hsieh, and M. Rajati, BDAI 2019)
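To make the structure concrete, here is a minimal PyTorch sketch of a depth-1 soft decision tree in the usual SDT formulation (a linear gate at the internal node, a learned class distribution at each leaf); the `TinySDT` class and its names are illustrative assumptions, not EDiT's actual code.

```python
# Minimal sketch of a depth-1 soft decision tree: one internal node that is
# a linear classifier gating left/right, and two leaves that each hold a
# learned class distribution.
import torch
import torch.nn as nn

class TinySDT(nn.Module):
    def __init__(self, n_features, n_classes):
        super().__init__()
        self.gate = nn.Linear(n_features, 1)                   # internal node
        self.leaves = nn.Parameter(torch.zeros(2, n_classes))  # leaf logits

    def forward(self, x):
        p_right = torch.sigmoid(self.gate(x))          # probability of going right
        leaf_dist = torch.softmax(self.leaves, dim=1)  # per-leaf class distributions
        # Expected class distribution over both root-to-leaf paths.
        return (1 - p_right) * leaf_dist[0] + p_right * leaf_dist[1]

model = TinySDT(n_features=20, n_classes=2)
probs = model(torch.randn(4, 20))   # (4, 2) class probabilities
```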
Idea 1: Distillation
- Knowledge distillation
  - Transfers the knowledge of a teacher to a student
- Replace the labels 𝐲 in the training data 𝒟 as

  𝐲ᵢ ← (𝑀(𝐱ᵢ) + 𝐲ᵢ) / 2   for each (𝐱ᵢ, 𝐲ᵢ) in 𝒟

  - 𝐱ᵢ is the feature vector that corresponds to 𝐲ᵢ
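A minimal sketch of this label-replacement step, assuming the teacher exposes `predict_proba` as scikit-learn ensembles do; `distill_labels` is a hypothetical helper name, not from the paper.

```python
# Sketch of the label-replacement step: average the teacher's predicted
# distribution M(x_i) with the original (one-hot) label y_i.
import numpy as np

def distill_labels(teacher, X, y_onehot):
    """Soft targets y_i <- (M(x_i) + y_i) / 2 for every training example."""
    teacher_probs = teacher.predict_proba(X)   # M(x_i), shape (n, n_classes)
    return (teacher_probs + y_onehot) / 2.0

# Example with the forest M from the earlier sketch:
# y_soft = distill_labels(M, X_tr, np.eye(2)[y_tr])
```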
Idea 2: Sparse Weights (1)
- Weight sparsification
  - Improves efficiency by making the weights sparse
- We propose three different approaches (all three are sketched below):
  - 1) L1 regularization: adds an L1 regularizer to the loss function
  - 2) Weight masking: randomly deactivates some of the weights
  - 3) Weight pruning: prunes weights whose learned values are small
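The sketch below illustrates the three options on a single linear layer, which stands in for an SDT's internal-node weights; the keep ratio, threshold, and 𝜆 value are arbitrary assumptions, not the paper's settings.

```python
# Sketch of the three sparsification options on one linear layer.
import torch
import torch.nn as nn

layer = nn.Linear(20, 1)

# 1) L1 regularization: add lambda * ||W||_1 to the training loss.
lam = 1e-3
l1_penalty = lam * layer.weight.abs().sum()

# 2) Weight masking: randomly deactivate weights with a fixed binary mask.
mask = (torch.rand_like(layer.weight) > 0.8).float()   # keep ~20% of weights
masked_weight = layer.weight * mask

# 3) Weight pruning: zero out weights whose learned magnitudes are small.
threshold = layer.weight.abs().flatten().quantile(0.8)  # prune smallest 80%
with torch.no_grad():
    layer.weight[layer.weight.abs() < threshold] = 0.0
```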
Idea 2: Sparse Weights (2)
- Weight masking: 2 steps
  - Random masking → Training
- Weight pruning: 3 steps
  - Pre-training → Pruning → Fine-tuning
- (both pipelines are sketched below)
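As a rough sketch of the two pipelines (assumptions: a toy linear model, and a single gradient step standing in for each full training phase):

```python
# Sketch of the two sparsification pipelines on a toy linear model.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(model, X, y):                 # stand-in for a training phase
    loss = F.binary_cross_entropy_with_logits(model(X).squeeze(1), y)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= 0.1 * p.grad
            p.grad = None

X, y = torch.randn(64, 20), torch.randint(0, 2, (64,)).float()

# Weight masking: (1) random masking, then (2) training with the mask fixed.
model = nn.Linear(20, 1)
mask = (torch.rand_like(model.weight) > 0.8).float()
with torch.no_grad():
    model.weight *= mask
train_step(model, X, y)
with torch.no_grad():
    model.weight *= mask        # keep masked weights at zero after the update

# Weight pruning: (1) pre-training, (2) pruning, then (3) fine-tuning.
model = nn.Linear(20, 1)
train_step(model, X, y)                                  # pre-training
with torch.no_grad():
    thr = model.weight.abs().flatten().quantile(0.8)
    model.weight[model.weight.abs() < thr] = 0.0         # pruning
train_step(model, X, y)                                  # fine-tuning
```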
Idea 3: Tree Pruning
- Tree pruning (sketched below)
  - Removes nodes with small arrival probabilities
  - Enables a larger tree depth to be used
- Tree pruning vs. weight pruning:
  - Weight pruning removes redundant weights
  - Tree pruning removes redundant tree nodes
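A small sketch of the idea, under the assumption of a full binary tree whose nodes are indexed level by level: each node's arrival probability is the product of the routing probabilities along its path from the root, and subtrees falling below a threshold are dropped. The function name and toy values are illustrative, not the paper's code.

```python
# Sketch of tree pruning by arrival probability on a full binary tree
# (root = node 0, children of node n are 2n+1 and 2n+2).
def arrival_probabilities(p_right, depth):
    """p_right[n] = average probability of routing right at internal node n."""
    arrival = {0: 1.0}                    # every example reaches the root
    for node in range(2 ** depth - 1):    # internal nodes of the full tree
        arrival[2 * node + 1] = arrival[node] * (1 - p_right[node])  # left
        arrival[2 * node + 2] = arrival[node] * p_right[node]        # right
    return arrival

p_right = {n: 0.9 if n % 2 == 0 else 0.1 for n in range(7)}  # toy routing probs
arrival = arrival_probabilities(p_right, depth=3)
pruned = {n for n, p in arrival.items() if p < 0.05}  # subtrees to remove
```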
Summary
- Result of applying our three ideas to an SDT:
  - Sparse weights from weight sparsification
  - Narrow tree structure from tree pruning
Outline
- Introduction
- Proposed Method
- Experiments
- Conclusion
Accuracy & Complexity
- Does EDiT outperform the baselines?
- EDiT shows the best balance in all cases
  - High accuracy with only a few parameters
Sparsification Methods
- Which is the best sparsification method?
- Weight pruning generally works best
  - L1 regularization fails even with a large 𝜆
Outline
- Introduction
- Proposed Method
- Experiments
- Conclusion
Conclusion
- Ensemble to Distilled Tree (EDiT)
  - Our approach to interpreting ensemble models
- Idea 1: Knowledge distillation
- Idea 2: Weight sparsification
- Idea 3: Tree pruning
- EDiT gives the most efficient predictions
  - Accuracy: DT ≪ RF ≈ SDT ≈ EDiT
  - Parameters: DT ≈ EDiT ≪ SDT < RF
Thank you!
GitHub: https://github.com/leesael/EDiT