19
Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer 1 , Luca Costabello 2 , Chan Le Van 2 , Sumit Pai 2 , Christophe Gueret 2 , Georgiana Ifrim 1 , Freddy Lecue 2 16.09.19 1 Insight Centre for Data Analytics University College Dublin 2 Accenture Labs Dublin

Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Centre for Data Analytics

Background Knowledge Injection for Interpretable Sequence Classification

Severin Gsponer1, Luca Costabello2, Chan Le Van2, Sumit Pai2, Christophe Gueret2, Georgiana Ifrim1, Freddy Lecue2

16.09.19

1 Insight Centre for Data AnalyticsUniversity College Dublin

2 Accenture LabsDublin

Page 2: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Contributions

Groups:Regex-like expansion of traditional k-mers

Emb-SEQL:Injection of Background Knowledge into Sequence Learning Algorithm

Semantic Fidelity:Metric to quantify interpretability

2Insight Centre for Data Analytics

Page 3: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Symbolic Sequence Classification

Sequence Class

ABBCBAABCBABCBABBBCBABBABBCB +1CCBBABACAABBABBAAABBBCCBBABA -1ACBBCACCCBAABCBABCCABCAABCCA +1

BABACCBABCTABCBABBCAABCBBBCA ?

Σ = {A, B, C}

3Insight Centre for Data Analytics

k-mers: Consecutive sequences k of symbols2-mer: AB5-mer: BCTCB

Page 4: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

SEQL - Sequence Learner

● Integrated approach

● Learns sparse k-mer based linear models

● Feature space of all possible k-mers

● Gauss-Southwell coordinate descent

● Iteratively add best k-mer to model

● Exploits structure in feature space

4Insight Centre for Data Analytics

Page 5: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

SEQL - Sequence Learner with Groups

● Exploit structure in symbol space

● Use groups to gain more flexibility

● Groups are built by combining basic symbols with OR

● Groups predefined by user orautomatically generated

5Insight Centre for Data Analytics

Page 6: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Automatic Group Generation

6Insight Centre for Data Analytics

Page 7: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Emb-SEQL Pipeline

7Insight Centre for Data Analytics

Page 8: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Interpretability

● Interpretability is crucial in some problems

● Accuracy-interpretability trade-off

● Measuring interpretability of models is an open question

Semantic Fidelity intuition:

● Positive features should be “close” to target class

● Negative features should be “close” to non-target class

Functional grounded protocol as proxy measurement

8Insight Centre for Data Analytics

Page 9: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

k-mer - Target Class Distance

Weighted Target Class Distance

Semantic Fidelity

Semantic Fidelity

9Insight Centre for Data Analytics

Page 10: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Experiment

Opportunity - Human Composite Activity Recognition (HAR) [1]:- Predict Composite Activity- Multiclass classification problem- Combinations of 5 low level features categories (|Σ| > 1400 symbols)

PhosphoELM - Protein Classification [2]:- Binary classification problem - Predict Kinase group- Amino acid sequences (|Σ| = 21 symbols; 438 sequences)

10Insight Centre for Data Analytics

Page 11: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Predict composite activity based on sequence of low-level activity

Composite Activity Recognition

11Insight Centre for Data Analytics

<stand, open, drawer, none, none><stand, none, none, grab, milk><stand, none, none, put, milk><stand, close, drawer, none, none> ...<sit, move, plate, move, cup>

Coffee timeEarly morningCleanupSandwich timeRelaxing

?

Seq

uenc

e

Page 12: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Composite Activity Recognition

12Insight Centre for Data Analytics

Sandwich time model

Atomic symbol embeddings

Page 13: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Results - Semantic Fidelity

13Insight Centre for Data Analytics

Page 14: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

PCA Visualization

14Insight Centre for Data Analytics

Target: Coffee timeEmbedding: WordNet

Page 15: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Results - Classification Quality

15Insight Centre for Data Analytics

SCIS_MA and HMM results from [2]

Page 16: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Achievements & Conclusion

● Introduction of Groups, regex like k-mer symbols

● Generation of Groups from background knowledge sources

● Emb-SEQL, a method to learn sparse linear models

● Semantic Fidelity a way to measure interpretability

● Background knowledge injection improves interpretability measured by

Semantic Fidelity without hurting accuracy of learned model

16Insight Centre for Data Analytics

Page 17: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

Limitations & Future Work

● High memory demand of Emb-SEQL for large Groups

● Clustering method and Group size is crucial

● Background knowledge source is needed

● Semantic Fidelity for non-linear models

● Human-based evaluation of Semantic Fidelity

17Insight Centre for Data Analytics

Page 18: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

18Insight Centre for Data Analytics

Thank you!

Please email [email protected] if you have further questions

This work was funded by ScienceFoundation Ireland (SFI) under Insight Centre for Data Analytics (grant 12/RC/2289)

Page 19: Background Knowledge Injection for Interpretable Sequence ... · Centre for Data Analytics Background Knowledge Injection for Interpretable Sequence Classification Severin Gsponer1,

References

[1] D. Roggen, A. Calatroni, M. Rossi, T. Holleczek, K. Förster, G. Tröster, P. Lukowicz, D. Bannach, G. Pirkl, A. Ferscha, J. Doppler, C. Holzmann, M. Kurz, G. Holl,R. Chavarriaga, H. Sagha, H. Bayati, M. Creatura, and J. d. R. Millàn. Collecting complex activity datasets in highly rich networked sensor environments. In 7th International Conference on Networked Sensing Systems (INSS)

[2] C. Zhou, B. Cule, and B. Goethals. Pattern based sequence classification. IEEE Transactions on Knowledge and Data Engineering, 28(5):1285–1298, 2016.

19Insight Centre for Data Analytics