View
12
Download
0
Category
Preview:
Citation preview
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep
Neural Networks
Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi
1
University of Virginia, Department of Computer Science
deepmotif.org
2University of Virginia
“Dog”
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
3University of Virginia
“Dog”
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
4
“Dog”
ATGCGATCAAGTCTG “Protein Binding Site”
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
“Dog”
ATGCGATCAAGTCTG “Protein Binding Site”
5
ATGCGATCAAGTCTG “Protein Binding Site”
6DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
???
ATGCGATCAAGTCTG “Protein Binding Site”
Deep Motif Dashboard: Opening the black box for deep-learning based genomic sequence classifications
7DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
8DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
Deep Motif Dashboard: Opening the black box for deep-learning based genomic sequence classifications
ATGCGATCAAGTCTG “Protein Binding Site”
IntroductionTFBS Classification TaskNeural ModelsVisualization MethodsEvaluation and Results
9DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
GCGACGAATCG AACGATATGCT CATATCATTTC TGTCAAG CTCGAGTC TATCAAGCTG
Transcription Factor Binding Sites (TFBSs)
TF1 TF1TF2 TF3
10
TF2TF3
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
AACGATATGCT
TGTCAAGCAAG
ATATCGATATA
AGCATATGCGA
TF1 ( )
TFBS Classification Datasets
11
GCGACGAATCG
CTCGAGTCTCA
CGATATGCTTC
AAGAAGCATTA
CATATCATTTC
TATCAAGCTGG
CGAATGCATAC
ACGACGATTAT
TF2 ( )
TF3 ( )
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
Deep Motif (DeMo) Dashboard Approach
GAAGCTTGTACGCTATGGACTCGATCGAATCGCATGTCATGAGATCATGCTTCATCTCTCGATCGAATCGCATATGTGTCAACTATGCTCTCGAA
1.TFBS
NO TFBSTFBSTFBS
NO TFBS
12DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
Deep Motif (DeMo) Dashboard Approach
GAAGCTTGTACGCTATGGACTCGATCGAATCGCATGTCATGAGATCATGCTTCATCTCTCGATCGAATCGCATATGTGTCAACTATGCTCTCGAA
1.
2.
TFBSNO TFBS
TFBSTFBS
NO TFBS
13DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
IntroductionTFBS Classification TaskNeural ModelsVisualization MethodsEvaluation and Results
14DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
1. Convolutional (CNN)2. Recurrent (RNN)
3. Convolutional-Recurrent (CNN-RNN)
Neural Network Models
15DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
3 Neural Models
16
Input SequenceProbability ofBinding Site
1. Convolutional (CNN) (short local patterns, or motifs)
3. Convolutional- Recurrent (CNN-RNN)
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
2. Recurrent (RNN) (long term dependencies)
(long term dependencies among motifs)
IntroductionTFBS Classification TaskNeural ModelsVisualization MethodsEvaluation and Results
17DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
1. Saliency Maps2. Temporal Output Values
3. Class Optimization
Visualization Methods
18DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
1. Saliency Map
Which nucleotides are most important for classification?
positive binding siteX S+
19DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
1. Saliency Map
positive binding siteX S+
20
= “saliency map”
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
1. Saliency Map
Positive sentimentX S+
21
This movie has one of the best plots I have seen
= important for classification
This movie has one of the best plots I have seen
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
1. Saliency Map
Positive Test Sequence
Saliency Map
= important nucleotide for prediction
positive binding siteX S+
22DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
2. Temporal Output Values
What are the model’s predictions at each timestep of the DNA sequence?
positive binding siteX S+
23DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
2. Temporal Output Values
Check the RNN’s prediction scores when we vary the input of the RNN starting from the beginning to the end of a sequence.
positive binding siteX S+
24DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
I don’t like the actors, but I really enjoyed this movie
2. Temporal Output Values
positive sentimentXS+
I don’t like the actors, but I really enjoyed this movie
= negative sentiment = positive sentiment25
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
2. Temporal Output Values
positive binding siteX S+
Positive Test Sequence
RNN Forward Output
RNN Backward Output
= negative binding site prediction = positive binding site prediction 26DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
1. Saliency Maps2. Temporal Output Values
3. Class Optimization
Visualization Methods
27
SequenceSpecific
TF Specific
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
3. Class Optimization
For a particular TF, what does the optimal binding site sequence look like?
? positive binding site for TF “CBX3”
28DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
3. Class Optimization
positive binding site for TF “CBX3”
Where X is the input sequence and the score S+ is probability of sequence X being a positive binding site
29DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
Optimal binding site for TF “CBX3”
3. Class Optimization
positive binding site for TF “CBX3”
30DeMo Dashboard - Lanchantin, Singh, Wang, & Qi
IntroductionTFBS Classification TaskNeural ModelsVisualization MethodsEvaluation and Results
31DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
Experimental Setup
Dataset● Alipanahi et al. “Predicting the sequence specificities of DNA- and
RNA-binding proteins by deep learning”. Nature Biotechnology 2015.● 108 cancer cell TFs (train separate model for each TF)● Each sequence is 101-length centered around ChIP-seq peak
Models● Test several variations of 3 different models (CNN, RNN, CNN-RNN)
Evaluation● Compare models using AUC scores on test set● Evaluate visualization methods manually and by motif matching
32DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
Model Accuracy (AUC Scores)
Our Models33
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
1. Saliency Maps
34DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
= important nucleotide for prediction
GATA1
2. Temporal Output Values
35DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
= positive binding site prediction
= negative binding site prediction
NFYB
Saliency Map AND Temporal Output Values
36DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
NFYB
3. Class Optimization
37DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
GATA1
DeMo Dashboard
38DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
Deep Motif (DeMo) Dashboard Contributions and Results
1. Comparative analysis of 3 different neural models on TFBS task
● CNN-RNNs perform the best
2. Presented 3 different visualization techniques to understand the predictions of neural models
● Although TFBSs are influenced by motifs, the interactions among motifs are also important
39DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
Thank You!
40
UVA Machine Learning and Biomedicine Group
Ritambhara Singh Beilun Wang Dr. Yanjun Qi
code available at: deepmotif.org
DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia
Recommended