Heuristic Principal Component Analysis Based Unsupervised Feature Extraction and Its Application to Gene Expression Analysis of Amyotrophic Lateral Sclerosis
Y-h. Taguchi (Dept. Phys.), Mitsuo Iwadate (Dept. Bio. Sci.), Hideaki Umeyama (Dept. Bio. Sci.)
Chuo University, Japan
What is PCA based unsupervised FE?
Input: N × M matrix X (numerical values), with N features and M samples belonging to categorical multiclasses.
In contrast to the usual usage of PCA, not samples but features are embedded into a Q-dimensional space.
[Figure: PCA of matrix X; features (+) plotted on PC1 and PC2, samples on PC1. No distinction between classes.]
Synthetic example
Data: 10 samples + 10 samples; 90 features + 10 features.
Distributions: N(0), N(), [N()+N(0)]/2
[Figure: features plotted on PC1 and PC2. +: top 10 outliers]
Thus, extracting outliers selects features that are distinct between the two classes in an unsupervised way.
Accuracy (100 trials): 89.5%
Accuracy (100 trials): 52.6%
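The synthetic setting can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the distribution parameters did not survive in the slide, so the unit variance, the mean shift `shift = 5.0`, the per-sample centering, and the use of an unstandardized squared distance in the PC1-PC2 plane as the outlier measure are all hypothetical choices. The function names are likewise illustrative.

```python
import numpy as np


def pca_feature_scores(X, n_components=2):
    """Embed the features (rows of the N x M matrix X) into PC space.

    Assumption of this sketch: each sample (column) is centered over features.
    Returns PC scores (one row per feature), PC loadings (one row per sample),
    and the contribution (fraction of variance) of each retained PC.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # N x Q
    loadings = Vt[:n_components].T                   # M x Q
    contributions = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, contributions


def top_outliers(scores, k=10):
    """Indices of the k features farthest from the origin in PC space."""
    dist2 = np.sum(scores ** 2, axis=1)
    return np.argsort(dist2)[::-1][:k]


rng = np.random.default_rng(0)
n_null, n_signal, n_per_class = 90, 10, 10
shift = 5.0  # hypothetical class difference; not taken from the slide

# 90 features without class dependence, 10 features shifted in the first class.
X = rng.normal(0.0, 1.0, size=(n_null + n_signal, 2 * n_per_class))
X[n_null:, :n_per_class] += shift

scores, loadings, contributions = pca_feature_scores(X)
selected = top_outliers(scores, k=n_signal)
print(sorted(selected.tolist()))
```

No class labels enter the computation; the labels are only needed afterwards, to check whether the selected features are the planted ones. With a weaker shift the selection becomes imperfect, which is the regime the accuracy figures on the slide refer to.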
Application to ALS
Data: AOA2 patients (○) vs. healthy controls (△), fibroblast cell lines (2 vs. 2).
Fogel, B. L., Cho, E., Wahnich, A. et al. Mutation of senataxin alters disease-specific transcriptional networks in patients with ataxia with oculomotor apraxia type 2. Hum. Mol. Genet. 23, 4758–4769 (2014).
[Figure: PC loadings. Contributions: PC1 99%, PC2 0.4%, PC3 0.2%, PC4 0.1%]
PC1: 99% contribution, with no distinction between patients and healthy controls.
PC2, PC3, PC4: < 1% contribution each, but with distinctions between patients and healthy controls.
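Identification of outliers
Outlier extraction using the PC2, PC3 and PC4 scores attributed to genes:

$$Z_i = \left(\frac{PC2_i}{\sigma_{PC2}}\right)^2 + \left(\frac{PC3_i}{\sigma_{PC3}}\right)^2 + \left(\frac{PC4_i}{\sigma_{PC4}}\right)^2$$

Here $PC2_i$, $PC3_i$ and $PC4_i$ are the $i$th gene's PC scores.
→ $P_i$ is attributed to $Z_i$ assuming a $\chi^2$ distribution, then adjusted by Benjamini-Hochberg.
Adjusted $P_i < 0.01$: 708 genes are extracted.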
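The outlier criterion can be sketched with the standard library alone. This is a hedged illustration, not the authors' implementation. It assumes that σ is the standard deviation of each PC's scores over all genes, that the χ² distribution has as many degrees of freedom as PCs used, and that P_i is the upper-tail probability. The names `chi2_sf`, `bh_adjust` and `select_outliers` are hypothetical.

```python
import math
from statistics import pstdev


def chi2_sf(x, df):
    """Upper-tail probability P(chi2 > x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then the recurrence
    # sf(df) = sf(df - 2) + half**(df/2 - 1) * exp(-half) / Gamma(df/2).
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(half)), 1
    else:
        sf, k = math.exp(-half), 2
    while k < df:
        k += 2
        sf += half ** (k / 2 - 1) * math.exp(-half) / math.gamma(k / 2)
    return sf


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_outliers(scores, alpha=0.01):
    """scores[i] holds the i-th gene's scores on the chosen PCs.

    Returns the indices of genes whose adjusted P-value is below alpha.
    """
    n_pcs = len(scores[0])
    sigmas = [pstdev([row[k] for row in scores]) for k in range(n_pcs)]
    z = [sum((row[k] / sigmas[k]) ** 2 for k in range(n_pcs)) for row in scores]
    adjusted = bh_adjust([chi2_sf(zi, n_pcs) for zi in z])
    return [i for i, p in enumerate(adjusted) if p < alpha]
```

Passing three scores per gene (PC2, PC3, PC4) corresponds to the formula on this slide; passing two scores per gene gives the two-PC variant.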
Biological validation 1: KEGG pathway analysis (DAVID): Alzheimer's, Parkinson's and Huntington's diseases.
Other analyses identify biological terms, too.
[Venn diagram: selected genes (708) vs. ALS related genes by Gendoo (211); overlap 13; total 〜20,000 genes; P = 4 × 10⁻⁴]
The 708 genes overlap significantly with ALS related genes.
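One common way to assess such an overlap is a one-sided hypergeometric test, sketched below. The slide does not state which test or which gene universe was used, so this is an assumption. The sketch also assumes that 708 and 211 are set sizes, 13 is their intersection, and the background is 20,000 genes, so the value it produces need not match the P-value reported on the slide.

```python
from math import comb


def overlap_p_value(n_overlap, n_selected, n_related, n_total):
    """P(X >= n_overlap) when n_selected genes are drawn at random from
    n_total genes, of which n_related belong to the reference set."""
    upper = min(n_selected, n_related)
    favorable = sum(
        comb(n_related, x) * comb(n_total - n_related, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_total, n_selected)


# Numbers as read from the Venn diagram; the background size is approximate.
p = overlap_p_value(n_overlap=13, n_selected=708, n_related=211, n_total=20000)
print(p)
```

Exact integer arithmetic is used for the binomial coefficients, so no external statistics library is needed.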
ALS related genes: No. 2
Transfection of ALS related mutated genes into cell lines (4 cell lines vs. 3 mutated genes)
[Figure: PC loadings. Contributions: PC2 0.7%, PC3 0.4%]
○△+: 3 mutated genes – normal controls → aberrant gene expression independent of mutated genes
Fogel, B. L., Cho, E., Wahnich, A. et al. Mutation of senataxin alters disease-specific transcriptional networks in patients with ataxia with oculomotor apraxia type 2. Hum. Mol. Genet. 23, 4758–4769 (2014).
715 genes are extracted as outliers using PC2 and PC3 (BH-adjusted P < 0.01).
Biological validation 2: KEGG pathway analysis (DAVID): Alzheimer's, Parkinson's and Huntington's diseases.
Other analyses identify biological terms, too.
[Venn diagram: selected genes 2 (715) vs. ALS related genes by Gendoo (211); overlap 14; total 〜20,000 genes; P = 2 × 10⁻³]
[Venn diagram: selected genes (708) vs. selected genes 2 (715); overlap 393; total 〜20,000 genes]
PPI analysis
Count(degree) ∝ degree^a
Count(betweenness) ∝ betweenness^a
[Figure: log-log plots of the degree and betweenness distributions. △: 708 genes, ○: 715 genes]
Real, regulatory networks (708 genes, 715 genes)
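A power-law relation appears as a straight line on a log-log plot, so the exponent a can be estimated from the slope. The sketch below assumes an ordinary least-squares fit on log-transformed values and a PPI network given as an edge list; the slide does not say how the exponent was obtained, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Return sorted (degree, number of nodes with that degree) pairs
    for an undirected network given as an iterable of (u, v) edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(Counter(degree.values()).items())


def fit_power_law_exponent(values, counts):
    """Least-squares slope of log(count) against log(value)."""
    pts = [(math.log(v), math.log(c))
           for v, c in zip(values, counts) if v > 0 and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Usage with a toy star network: one hub of degree 3 and three leaves.
pairs = degree_counts([("a", "b"), ("a", "c"), ("a", "d")])
exponent = fit_power_law_exponent([d for d, _ in pairs], [c for _, c in pairs])
```

The same fit applies to betweenness once the betweenness values have been binned into counts.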
Identification of especially critical genes ← intersection (product set) of the top 100 genes
[Venn diagram: top 100 in selected genes (708) vs. top 100 in selected genes 2 (715); overlap 29]
The 29 genes include many ALS related genes.
[Figure: network composed of a part of the 29 genes]
In silico drug discovery (with InSilico Sciences Inc.)
Target: CCR6
Facts:
T helper type 17 (Th17) cells: known inflammatory factor. Increased in ALS patients' blood.
Regulatory T (Treg) cells: known anti-inflammatory factor. Decreased in ALS patients' blood.
Experimental autoimmune encephalomyelitis (EAE): CCL20-associated CCR6 induced, but CCL20-nonassociated CCR6 not induced.
→ Activation/inhibition of CCR6 may be a therapeutic target for ALS.
Cf. CCL20-associated CCR6 was once targeted for rheumatoid arthritis therapy.
Methodology:
FAMS: inference of protein structure from amino acid sequence (homology modeling)
ChooseLD: comparative docking of candidate drug compounds (ca. 1000 compounds screened from AkosSamples, which holds more than a million candidates, by Tanimoto index with known agonists/antagonists)
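The Tanimoto pre-screening step can be illustrated as follows. This is a sketch only. It assumes each compound is represented by the set of "on" bits of a binary fingerprint, and the `threshold` value and the function names are hypothetical; the slide does not specify the fingerprint type or the cutoff.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index |A ∩ B| / |A ∪ B| of two fingerprint bit sets."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(library, known_ligands, threshold=0.5):
    """Keep library compounds whose best similarity to any known
    agonist/antagonist reaches the threshold, most similar first."""
    hits = []
    for name, fp in library.items():
        best = max(tanimoto(fp, ref) for ref in known_ligands.values())
        if best >= threshold:
            hits.append((name, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy usage with made-up fingerprints.
library = {"cand1": {1, 2, 3}, "cand2": {9}}
known = {"ref": {2, 3, 4}}
print(screen(library, known))
```

Compounds that pass such a filter would then go on to the docking step.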
[Figure: agonist candidates and known agonist; antagonist candidates and known antagonist]
Conclusion
1. PCA based unsupervised FE was applied to two experiments using ALS fibroblast cell lines.
2. The genes extracted from the two experiments coincided and were biologically highly plausible.
3. Among the identified genes, CCR6 was selected as a therapeutic target, and in silico drug discovery was performed.