Heuristic Principal Component Analysis Based unsupervised Feature Extraction and its Application to Gene Expression Analysis of Amyotrophic Lateral Sclerosis Y-h. Taguchi, Dept. Phys., Mitsuo Iwadate, Dept. Bio. Sci., Hideaki Umeyama, Dept. Bio. Sci., Chuo University, Japan


Page 1: Heuristic Principal Component Analysis Based unsupervised Feature Extraction and its Application to Gene Expression Analysis of Amyotrophic Lateral Sclerosis

Y-h. Taguchi, Dept. Phys., Mitsuo Iwadate, Dept. Bio. Sci., Hideaki Umeyama, Dept. Bio. Sci., Chuo University, Japan

Page 2

What is PCA based unsupervised FE?

Input: N × M matrix X (numerical values), with N features and M samples. The samples belong to categorical multiclasses.

In contrast to the usual usage of PCA, not samples but features are embedded into a Q dimensional space.

[Figure: PCA applied to X; the features (+) are plotted on the PC1-PC2 plane.]

No distinction between classes
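The embedding described on this slide can be sketched as follows. This is a minimal illustration assuming NumPy and random toy data; the function name `embed_features`, the choice to center each sample over the features, and the matrix sizes are assumptions made for the example and are not stated on the slide.

```python
import numpy as np

def embed_features(X, Q):
    """Embed the N features (rows of X) into a Q dimensional space.

    Returns (scores, loadings):
      scores   -- N x Q array, one point per feature
      loadings -- M x Q array, one weight per sample for each PC
    Assumption: each sample (column) is centered over the features.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :Q] * s[:Q]      # coordinates of the features
    loadings = Vt[:Q].T            # contribution of each sample to each PC
    return scores, loadings

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))     # toy data: N = 100 features, M = 20 samples
scores, loadings = embed_features(X, Q=2)
print(scores.shape, loadings.shape)
```

No class labels enter the computation, so the embedding is fully unsupervised.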

Page 3

Synthetic example

[Figure: synthetic data set with two classes of 10 samples each; 90 features and 10 features, labelled N(0), N() and [N()+N(0)]/2. Features are plotted on the PC1-PC2 plane. +: Top 10 outliers]

Thus, extracting outliers selects features distinct between two classes in an unsupervised way.

Accuracy (100 trials): 89.5% ( 52.6% (
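A toy version of this experiment can be sketched as below. The mean shift `mu`, the unit noise level, the way the 10 informative features differ between the two classes, and the use of distance from the origin in the PC1-PC2 plane as the outlier measure are all assumptions made for illustration; the sketch is not meant to reproduce the accuracy figures quoted on the slide.

```python
import numpy as np

def top_outliers(X, k):
    """Indices of the k features farthest from the origin in the PC1-PC2 plane."""
    Xc = X - X.mean(axis=0, keepdims=True)          # center each sample
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :2] * s[:2]                       # feature coordinates on PC1, PC2
    distance = np.hypot(scores[:, 0], scores[:, 1])
    return np.argsort(distance)[::-1][:k]

rng = np.random.default_rng(0)
mu = 4.0                                 # assumed class difference
X = rng.normal(size=(100, 20))           # 100 features, 10 + 10 samples
X[:10, 10:] += mu                        # first 10 features differ in class 2
picked = top_outliers(X, k=10)
accuracy = np.mean(picked < 10)          # fraction of true features recovered
print(accuracy)
```

Smaller values of `mu` make the two groups of features overlap in the plane and lower the accuracy.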

Page 4

Application to ALS

AOA2 patients (○) vs Healthy Controls (△): Fibroblast cell line (2 vs 2)

Fogel, B. L., Cho, E., Wahnich, A. et al. Mutation of senataxin alters disease-specific transcriptional networks in patients with ataxia with oculomotor apraxia type 2. Hum. Mol. Genet. 23, 4758–4769 (2014).

[Figure: PC loadings for PC1 (99%), PC2 (0.4%), PC3 (0.2%) and PC4 (0.1%).]

PC1: 99% contribution, with no distinction between patients and healthy controls.
PC2, PC3, PC4: < 1% contributions, with distinctions.

Outliers extraction using PC2, PC3, PC4 scores attributed to genes
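The percentages quoted for each PC can be computed from the singular values of the data matrix. The sketch below takes "contribution" to mean the share of the total sum of squares and skips centering, both of which are assumptions; the toy matrix (a per-gene baseline plus small noise over 4 samples) is hypothetical and only illustrates how a first PC that reflects overall expression level can dominate.

```python
import numpy as np

def pc_contributions(X):
    """Share of the total sum of squares carried by each PC (no centering)."""
    s = np.linalg.svd(X, compute_uv=False)
    return s**2 / np.sum(s**2)

rng = np.random.default_rng(1)
baseline = rng.uniform(5.0, 15.0, size=(1000, 1))       # hypothetical per-gene level
X = baseline + rng.normal(scale=0.3, size=(1000, 4))    # 4 samples (2 vs 2)
contrib = pc_contributions(X)
print(np.round(contrib, 4))
```

In such data the remaining PCs carry little of the total, yet they are the ones that can separate sample groups.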

Page 5

Identification of outliers

$$Z_i = \left(\frac{PC2_i}{\sigma_{PC2}}\right)^2 + \left(\frac{PC3_i}{\sigma_{PC3}}\right)^2 + \left(\frac{PC4_i}{\sigma_{PC4}}\right)^2$$

where $PC2_i$, $PC3_i$ and $PC4_i$ are the ith gene's PC scores.

→ $P_i$: the probability, under the $\chi^2$ distribution, of a value larger than $Z_i$

Adjusted by Benjamini-Hochberg

$P_i < 0.01$: 708 genes are extracted
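The procedure on this slide can be sketched with the Python standard library alone. Several details are assumptions: the degrees of freedom are taken to equal the number of PCs used, sigma is taken as the population standard deviation of each PC's scores over all genes, and the function names and toy data are hypothetical.

```python
import math
import random
import statistics

def chi2_sf(z, dof):
    """Upper-tail probability P(chi^2 > z) for an integer number of dof."""
    half = z / 2.0
    if dof % 2 == 0:
        terms = sum(half ** k / math.factorial(k) for k in range(dof // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (dof - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):               # from largest P to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def select_outliers(pc_scores, threshold=0.01):
    """pc_scores: one list per selected PC, each holding every gene's score."""
    sigmas = [statistics.pstdev(col) for col in pc_scores]
    n_genes = len(pc_scores[0])
    z = [sum((col[i] / s) ** 2 for col, s in zip(pc_scores, sigmas))
         for i in range(n_genes)]
    p = [chi2_sf(zi, len(pc_scores)) for zi in z]
    adjusted = benjamini_hochberg(p)
    return [i for i, q in enumerate(adjusted) if q < threshold]

random.seed(0)
n_genes = 1000
pc_scores = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(3)]
for col in pc_scores:
    col[0] = 10.0                              # plant one obvious outlier (gene 0)
selected = select_outliers(pc_scores)
print(selected)
```

Passing two lists instead of three gives the two-PC variant of the same test.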

Page 6

Biological validation 1: KEGG pathway analysis (DAVID)

Alzheimer's, Parkinson's and Huntington's diseases

Other analyses identify biological terms, too.

Page 7

[Venn diagram: 708 selected genes and 211 ALS related genes (by Gendoo) share 13 genes, out of about 20,000 genes.]

P = 4 × 10^-4

708 genes have significant overlaps with ALS related genes
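The slide does not say which test or background gene set produced the reported P value. One common way to assess such an overlap is the upper tail of the hypergeometric distribution, sketched below with the counts from the slide; the choice of test, the function name `overlap_p_value`, and the use of exactly 20,000 as the background size are assumptions, so the result is not expected to match the slide's figure.

```python
from math import comb

def overlap_p_value(n_total, n_selected, n_known, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_total genes, n_known of which belong to the reference set."""
    upper = min(n_selected, n_known)
    tail = sum(comb(n_known, k) * comb(n_total - n_known, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / comb(n_total, n_selected)

p = overlap_p_value(n_total=20000, n_selected=708, n_known=211, n_overlap=13)
print(p)
```

The result depends strongly on the background size, so the approximate total should be replaced by the actual number of genes measured.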

Page 8

ALS related genes: No. 2

Transfection of ALS related mutated genes to cell lines: 4 cell lines vs 3 mutated genes

[Figure: PC2 (0.7%) and PC3 (0.4%).]

○△+: 3 mutated genes – normal controls → aberrant gene expression independent of mutated genes

Fogel, B. L., Cho, E., Wahnich, A. et al. Mutation of senataxin alters disease-specific transcriptional networks in patients with ataxia with oculomotor apraxia type 2. Hum. Mol. Genet. 23, 4758–4769 (2014).

Page 9

715 genes extracted as outliers using PC2 and PC3 (BH criterion adjusted P < 0.01)

Biological validation 2: KEGG pathway analysis (DAVID)

Alzheimer's, Parkinson's and Huntington's diseases

Other analyses identify biological terms, too.

Page 10

[Venn diagram: 715 selected genes (Selected genes 2) and 211 ALS related genes (by Gendoo) share 14 genes, out of about 20,000 genes.]

P = 2 × 10^-3

Page 11

[Venn diagram: Selected genes (708) and Selected genes 2 (715) share 393 genes, out of about 20,000 genes.]

Page 12

PPI analysis

$$\mathrm{Count}(\mathrm{degree}) \propto \mathrm{degree}^{-a}$$

$$\mathrm{Count}(\mathrm{betweenness}) \propto \mathrm{betweenness}^{-a}$$

[Figure: log-log plots of degree and betweenness for the 708 genes (△) and the 715 genes (O).]

Real, regulatory networks
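The exponent a in these power laws corresponds to the negative slope of a straight line on a log-log plot. The sketch below estimates it by ordinary least squares on the logarithms; the fitting method, the function name, and the toy degree list are assumptions, since the slide does not describe how the plots were analysed. Continuous quantities such as betweenness would need to be binned first.

```python
import math
from collections import Counter

def power_law_exponent(values):
    """Estimate a in Count(v) ∝ v^(-a) from a list of positive integer values."""
    counts = Counter(v for v in values if v > 0)
    xs = [math.log(v) for v in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Toy degree list built so that Count(d) = 1024 * d^(-2)
degrees = [1] * 1024 + [2] * 256 + [4] * 64 + [8] * 16
a = power_law_exponent(degrees)
print(round(a, 6))
```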

Page 13

Identification of especially critical genes ← product set of top 100 genes

[Venn diagram: the top 100 genes among the 708 (Selected genes) and the top 100 genes among the 715 (Selected genes 2) share 29 genes.]

Many ALS related genes
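The product set (intersection) of the two top-100 lists can be sketched as below. The slide does not state how genes were ranked; ranking by ascending score (for example an adjusted P value) is an assumption, and the gene names and scores in the example are hypothetical.

```python
def top_k(scores, k):
    """Names of the k genes with the smallest score."""
    return set(sorted(scores, key=scores.get)[:k])

def critical_genes(scores_a, scores_b, k=100):
    """Genes ranked in the top k of both selections."""
    return top_k(scores_a, k) & top_k(scores_b, k)

# Hypothetical scores for two selections, using k = 2 for brevity
scores_a = {"g1": 0.001, "g2": 0.002, "g3": 0.5}
scores_b = {"g2": 0.001, "g3": 0.003, "g1": 0.9}
shared = critical_genes(scores_a, scores_b, k=2)
print(shared)
```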

Page 14

[Figure: network composed of a part of the 29 genes]

Page 15

In silico drug discovery (with In-Silico Sciences Inc.)

Target: CCR6

Facts:
T helper type 17 (Th17) cells: known inflammatory factor. Increased in ALS patients' blood.
Regulatory T (Treg) cells: known anti-inflammatory factor. Decreased in ALS patients' blood.

Experimental autoimmune encephalomyelitis (EAE): CCL20 associated CCR6 induced, but CCL20 non-associated CCR6 not induced.
→ Activation/inhibition of CCR6 may be a therapy target for ALS

Cf. CCL20 associated CCR6 was once targeted for rheumatoid arthritis therapy

Page 16

Methodology:

FAMS: Inference of protein structure from amino acid sequence (homology modeling)

ChooseLD: comparative docking of drug compound candidates (ca. 1000 compounds screened from AkosSamples, which holds more than a million candidates, by Tanimoto index with known agonists/antagonists)
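The Tanimoto index used for this pre-screening is the size of the intersection of two fingerprints divided by the size of their union. The sketch below represents fingerprints as sets of "on" bit positions; the fingerprint type, the similarity threshold, and the compound names are assumptions, as the slide gives none of them.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def screen(library, references, threshold):
    """Names of compounds whose best similarity to any reference
    (a non-empty list of fingerprints) reaches the threshold."""
    return [name for name, fp in library.items()
            if max(tanimoto(fp, ref) for ref in references) >= threshold]

# Hypothetical fingerprints
library = {"c1": {1, 2, 3}, "c2": {7, 8}}
known_ligands = [{1, 2, 3, 4}]
hits = screen(library, known_ligands, threshold=0.7)
print(hits)
```

Screening against known agonists and known antagonists separately would yield the two candidate lists that are then passed to docking.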

Page 17

Agonist candidates

Known agonist

Page 18

Antagonist candidates

Known antagonist

Page 19

Conclusion

1. PCA based unsupervised FE was applied to two experiments using ALS fibroblast cell lines.

2. The genes extracted from the two experiments were coincident as well as biologically highly plausible.

3. Among those identified genes, CCR6 was selected as a therapy target and in silico drug discovery was performed.