Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications
Andrei Zinovyev, Institut des Hautes Études Scientifiques, France


Plan of the talk
• Object of study
• Definition of principal manifold (PM)
• Constructing PMs: elastic maps
• Examples of biomedical applications

Principal manifolds: the elastic maps framework

[Diagram: principal manifolds and the elastic maps framework among non-linear data-mining methods. Labels: regression, approximation; supervised classification; SVM; clustering; K-means; SOM; visualization; multidimensional scaling; PCA; factor analysis; LLE; ISOMAP.]

Finite set of objects in $\mathbb{R}^N$: $X_i$, $i = 1, \dots, m$

IRIS database

Sepal length  Sepal width  Petal length  Petal width  Species
4.9           3            1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.3          Iris-setosa
4.6           3.1          1.5           0.2          Iris-setosa
7             3.2          4.7           1.4          Iris-versicolor
6.4           3.2          4.5           1.5          Iris-versicolor
6.9           3.1          4.9           1.5          Iris-versicolor
6.3           3.3          6             2.5          Iris-virginica
5.8           2.7          X             1.9          Iris-virginica
7.1           3            5.9           2.1          Iris-virginica
6.3           2.9          5.6           1.8          Iris-virginica

Mean point: $\sum_{i=1}^{m} \|X_i - \bar{X}\|^2 \to \min$, giving $\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$
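A minimal plain-Python sketch (no external libraries assumed) of the mean point, checked on the first three Iris-setosa rows of the IRIS table: moving away from $\bar{X}$ increases the sum of squared distances. The function names are illustrative.

```python
from typing import List, Sequence


def mean_point(points: Sequence[Sequence[float]]) -> List[float]:
    """Mean point: (1/m) * sum of X_i over the m points."""
    m = len(points)
    return [sum(coords) / m for coords in zip(*points)]


def sum_sq_dist(points: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Sum over i of ||X_i - y||^2, the quantity minimized by the mean point."""
    return sum(sum((pk - yk) ** 2 for pk, yk in zip(p, y)) for p in points)


# First three Iris-setosa rows (sepal length, sepal width, petal length, petal width)
setosa = [[4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.3], [4.6, 3.1, 1.5, 0.2]]
xbar = mean_point(setosa)
shifted = [xbar[0] + 0.1] + xbar[1:]
print(sum_sq_dist(setosa, xbar) < sum_sq_dist(setosa, shifted))  # True
```

Shifting by $\delta$ in any direction adds exactly $m\|\delta\|^2$ to the sum, which is why the mean is the unique minimizer.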

K-means clustering: $\sum_{i=1}^{m} \|X_i - Y_{\mathrm{closest}(X_i)}\|^2 \to \min$, where $Y_1, \dots, Y_k$ are the cluster centres
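The K-means functional is usually minimized with Lloyd's algorithm. The sketch below is a plain-Python illustration of that standard technique. The random initialization from k datapoints, the `seed` parameter and the toy points are illustrative choices rather than anything taken from the talk.

```python
import random
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int], float]:
    """Lloyd's algorithm for sum_i ||X_i - Y_closest(X_i)||^2 -> min.

    Alternates (1) assigning each point to its closest centre and (2) moving
    each centre to the mean of its points; neither step increases the sum.
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(list(points), k)]  # k distinct datapoints
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its old centre
            new_centres.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    energy = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, energy


pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centres, labels, energy = kmeans(pts, 2)
```

On these two well-separated groups the algorithm puts the first three and the last three points in different clusters, with energy $8/3$.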

Principal "object": $\sum_{i=1}^{m} \|X_i - P(X_i)\|^2 \to \min$, where $P(X_i)$ is the point of the object closest to $X_i$

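Principal Component Analysis: the object is a line (or plane), $\sum_{i=1}^{m} \|X_i - P(X_i)\|^2 \to \min$; equivalently, the projections $P(X_i)$ have maximal dispersion.

[Figure: maximal dispersion; 1st principal axis, 2nd principal axis]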
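A small sketch of the first principal axis, computed as the leading eigenvector of the covariance matrix. Power iteration is an illustrative choice made here to keep the code dependency-free; SVD-based routines are the usual tool. `dispersion` measures the mean squared projection length, which the first axis maximizes.

```python
from typing import List, Sequence, Tuple


def first_principal_axis(points: Sequence[Sequence[float]],
                         iters: int = 500) -> Tuple[List[float], List[float]]:
    """Return (mean point, unit vector of the 1st principal axis) via power iteration."""
    m, n = len(points), len(points[0])
    mean = [sum(c) / m for c in zip(*points)]
    centred = [[p[k] - mean[k] for k in range(n)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centred) / m for b in range(n)] for a in range(n)]
    # Non-symmetric start vector, so it is unlikely to be orthogonal to the answer
    v = [1.0 / (k + 1) for k in range(n)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v


def dispersion(points: Sequence[Sequence[float]], mean: Sequence[float],
               direction: Sequence[float]) -> float:
    """Mean squared length of the projections of (X_i - mean) onto a unit direction."""
    n = len(mean)
    return sum(sum((p[k] - mean[k]) * direction[k] for k in range(n)) ** 2
               for p in points) / len(points)


mean, axis = first_principal_axis([[0, 0], [1, 2], [2, 4], [3, 6]])
```

For points on the line $y = 2x$, the axis is $(1, 2)/\sqrt{5}$. Because the residual $\|X_i - P(X_i)\|^2$ and the squared projection add up to $\|X_i - \bar{X}\|^2$, maximizing dispersion and minimizing residuals give the same axis.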

Principal manifold

What do we want?
• Non-linear surface (1D, 2D, 3D, …)
• Smooth and not twisted
• The data model is unknown
• Speed (computation time linear in $N \cdot m$)
• Uniqueness
• Fast way to project datapoints

Metaphor of elasticity

[Figure: datapoints attached by springs to the graph nodes; energy terms $U^{(Y)}$, $U^{(E)}$, $U^{(R)}$]

Constructing elastic nets

[Figure: a node $y$; an edge with ends $E^{(0)}$, $E^{(1)}$; a rib with central node $R^{(0)}$ and ends $R^{(1)}$, $R^{(2)}$]
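To make the net structure concrete, here is a sketch that builds the edges and ribs of a rectangular 2D net. It assumes ribs run along grid rows and columns, with the central node in the middle of each triple. This is one common layout; the net topology is a modelling choice, and `rectangular_net` is a hypothetical helper.

```python
from typing import List, Tuple


def rectangular_net(rows: int, cols: int) -> Tuple[List[Tuple[int, int]],
                                                   List[Tuple[int, int, int]]]:
    """Edges and ribs of a rows x cols rectangular elastic net.

    Node (r, c) gets index r * cols + c. An edge joins two neighbouring nodes
    (E(0), E(1)); a rib is three consecutive nodes along a grid line, stored
    as (R(1), R(0), R(2)) with the central node R(0) in the middle.
    """
    def idx(r: int, c: int) -> int:
        return r * cols + c

    edges: List[Tuple[int, int]] = []
    ribs: List[Tuple[int, int, int]] = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append((idx(r, c), idx(r, c + 1)))
            if r + 1 < rows:
                edges.append((idx(r, c), idx(r + 1, c)))
            if 0 < c < cols - 1:
                ribs.append((idx(r, c - 1), idx(r, c), idx(r, c + 1)))
            if 0 < r < rows - 1:
                ribs.append((idx(r - 1, c), idx(r, c), idx(r + 1, c)))
    return edges, ribs


edges, ribs = rectangular_net(3, 3)  # 12 edges, 6 ribs
```

A 1D net (an elastic curve) is the special case `rectangular_net(1, p)`.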

Definition of elastic energy

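$U = U^{(Y)} + U^{(E)} + U^{(R)}$

$U^{(Y)} = \frac{1}{m} \sum_{j=1}^{p} \sum_{X_i \in K_j} \|X_i - y^{(j)}\|^2$, where $K_j$ is the set of datapoints whose closest node is $y^{(j)}$

$U^{(E)} = \sum_{i=1}^{s} \lambda_i \|E^{(i)}(1) - E^{(i)}(0)\|^2$

$U^{(R)} = \sum_{i=1}^{r} \mu_i \|R^{(i)}(1) + R^{(i)}(2) - 2R^{(i)}(0)\|^2$

Here $p$, $s$ and $r$ are the numbers of nodes, edges and ribs, and $\lambda_i = \lambda_0$, $\mu_i = \mu_0$.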
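A direct transcription of the elastic energy on this slide, with uniform coefficients $\lambda_i = \lambda_0$ and $\mu_i = \mu_0$. Edges are index pairs and ribs are index triples stored as $(R^{(1)}, R^{(0)}, R^{(2)})$; that storage order is an assumption of this sketch.

```python
from typing import Sequence, Tuple

Vec = Sequence[float]


def sq_norm(v: Vec) -> float:
    return sum(t * t for t in v)


def elastic_energy(data: Sequence[Vec], nodes: Sequence[Vec],
                   edges: Sequence[Tuple[int, int]],
                   ribs: Sequence[Tuple[int, int, int]],
                   lam0: float, mu0: float) -> Tuple[float, float, float]:
    """Return (U_Y, U_E, U_R) with lambda_i = lam0 and mu_i = mu0.

    Ribs are stored as (R(1), R(0), R(2)), with R(0) the central node.
    """
    m = len(data)
    # U_Y: each point belongs to the taxon K_j of its closest node
    u_y = sum(min(sq_norm([a - b for a, b in zip(x, y)]) for y in nodes) for x in data) / m
    # U_E: stretching of every edge
    u_e = lam0 * sum(sq_norm([a - b for a, b in zip(nodes[j], nodes[i])]) for i, j in edges)
    # U_R: bending of every rib, R(1) + R(2) - 2 R(0)
    u_r = mu0 * sum(sq_norm([a + c - 2 * b for a, b, c in zip(nodes[i], nodes[k], nodes[j])])
                    for i, k, j in ribs)
    return u_y, u_e, u_r


chain_edges, chain_ribs = [(0, 1), (1, 2)], [(0, 1, 2)]
data = [[0.0, 0.0], [2.0, 0.0]]
straight = elastic_energy(data, [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]], chain_edges, chain_ribs, 1.0, 1.0)
bent = elastic_energy(data, [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], chain_edges, chain_ribs, 1.0, 1.0)
```

Bending the middle node of a three-node chain leaves $U^{(Y)} = 0$ but raises $U^{(E)}$ from 2 to 4 and $U^{(R)}$ from 0 to 4. The rib term is what keeps the net smooth.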

Elastic manifold

Global minimum and softening: $\lambda_0, \mu_0 = 10^{3}$, then $10^{2}$, $10^{1}$, $10^{-1}$ (the net is fitted while stiff and then progressively softened)
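The slide only names the softening idea, so the following is a sketch under stated assumptions. It fits a 1D chain with the usual splitting scheme: with the taxons $K_j$ fixed, $U$ is quadratic in the nodes and is minimized by solving a linear system. Softening then reruns the fit while decreasing $\lambda_0 = \mu_0$ through the slide's values. Setting $\lambda_0 = \mu_0$ and scaling them against the $1/m$ data term are our assumptions, so the numbers are only illustrative. `fit_chain`, `soften` and the initial straight net are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def solve(a: List[Vec], b: List[Vec]) -> List[Vec]:
    """Solve A Y = B (A: p x p, B: p x d) by Gaussian elimination with partial pivoting."""
    p, d = len(a), len(b[0])
    a, b = [row[:] for row in a], [row[:] for row in b]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
            b[r] = [x - f * y for x, y in zip(b[r], b[col])]
    y = [[0.0] * d for _ in range(p)]
    for r in range(p - 1, -1, -1):
        for k in range(d):
            s = b[r][k] - sum(a[r][c] * y[c][k] for c in range(r + 1, p))
            y[r][k] = s / a[r][r]
    return y


def closest(x: Sequence[float], nodes: Sequence[Vec]) -> int:
    return min(range(len(nodes)), key=lambda j: sum((u - v) ** 2 for u, v in zip(x, nodes[j])))


def fit_chain(data, nodes, lam0, mu0, max_iter=100) -> Tuple[List[Vec], List[int]]:
    """Splitting algorithm for a chain: fix the taxons, then solve A Y = B, where
    A = diag(n_j / m) + lam0 * (edge Laplacian) + mu0 * (rib terms)."""
    m, p, d = len(data), len(nodes), len(data[0])
    taxons: List[int] = []
    for _ in range(max_iter):
        new_taxons = [closest(x, nodes) for x in data]
        if new_taxons == taxons:
            break
        taxons = new_taxons
        a = [[0.0] * p for _ in range(p)]
        b = [[0.0] * d for _ in range(p)]
        for x, j in zip(data, taxons):  # data term U_Y
            a[j][j] += 1.0 / m
            b[j] = [bk + xk / m for bk, xk in zip(b[j], x)]
        for j in range(p - 1):  # edge (j, j + 1)
            a[j][j] += lam0
            a[j + 1][j + 1] += lam0
            a[j][j + 1] -= lam0
            a[j + 1][j] -= lam0
        for j in range(1, p - 1):  # rib (j - 1, j, j + 1) with centre j
            coef = {j - 1: 1.0, j: -2.0, j + 1: 1.0}
            for r, cr in coef.items():
                for s, cs in coef.items():
                    a[r][s] += mu0 * cr * cs
        nodes = solve(a, b)
    return nodes, taxons


def soften(data, nodes, schedule=(1e3, 1e2, 1e1, 1e-1)):
    """Refit while decreasing lam0 = mu0 through the schedule (stiff first)."""
    taxons: List[int] = []
    for coef in schedule:
        nodes, taxons = fit_chain(data, nodes, coef, coef)
    return nodes, taxons


data = [[float(t), (t - 4.5) ** 2 / 5.0] for t in range(10)]
nodes, taxons = soften(data, [[9.0 * j / 4, 0.0] for j in range(5)])
```

Because edge and rib terms sum to zero over the nodes, each solve keeps the taxon-weighted mean of the nodes equal to the data mean. This invariant is a quick sanity check on the assembled matrix.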

Adaptive algorithms
• Growing net
• Adaptive net
• Refining net
• Idea of scaling

Projection onto the manifold
• Closest node of the net
• Closest point of the manifold
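The two projection options can be sketched for a 1D net treated as a piecewise-linear curve. The closest-node projection is the cheapest option. The closest-point projection searches every segment and returns an internal coordinate $j + t$ along the curve, which can serve as a 1D map coordinate. Both helpers are hypothetical, and the polyline version needs at least two nodes.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def project_to_node(x: Vec, nodes: Sequence[Vec]) -> int:
    """Index of the closest node of the net."""
    return min(range(len(nodes)), key=lambda j: sq_dist(x, nodes[j]))


def project_to_polyline(x: Vec, nodes: Sequence[Vec]) -> Tuple[float, List[float]]:
    """Closest point on the piecewise-linear curve through the nodes.

    Returns (internal coordinate j + t along the curve, projected point).
    """
    best = None
    for j in range(len(nodes) - 1):
        a, b = nodes[j], nodes[j + 1]
        ab = [v - u for u, v in zip(a, b)]
        denom = sum(t * t for t in ab)
        t = 0.0 if denom == 0 else sum((xi - ai) * di for xi, ai, di in zip(x, a, ab)) / denom
        t = min(1.0, max(0.0, t))  # stay on the segment
        q = [ai + t * di for ai, di in zip(a, ab)]
        d = sq_dist(x, q)
        if best is None or d < best[0]:
            best = (d, j + t, q)
    return best[1], best[2]


net = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
node = project_to_node([0.4, 1.0], net)        # node 0
coord, point = project_to_polyline([0.4, 1.0], net)  # 0.4, [0.4, 0.0]
```

Snapping to nodes discretizes the map. Projecting onto segments gives a continuous coordinate at the cost of checking every segment.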

Colorings: visualize any function

Density visualization

Example: different topologies (maps from $\mathbb{R}^N$ to $\mathbb{R}^2$)

VIDAExpert tool and elmap C++ package

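Regression and principal manifolds

Regression: $\sum_i (y_i - F(x_i))^2 \to \min$ (vertical distances to the graph of $F$)

Principal component: $\sum_i \|x_i - P x_i\|^2 \to \min$ (orthogonal distances, $P$ the orthogonal projection)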

[Figure: data, generating curve, grid]
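To make the regression/principal-component contrast concrete, the sketch fits both lines to the same 2D points. Least-squares regression minimizes vertical distances. The first principal component minimizes orthogonal distances and uses the closed-form 2D angle $\theta = \tfrac{1}{2}\operatorname{atan2}(2 s_{xy}, s_{xx} - s_{yy})$. The function names and toy points are illustrative.

```python
import math
from typing import Sequence, Tuple

Pt = Tuple[float, float]


def centred_moments(pts: Sequence[Pt]):
    m = len(pts)
    mx = sum(p[0] for p in pts) / m
    my = sum(p[1] for p in pts) / m
    sxx = sum((p[0] - mx) ** 2 for p in pts) / m
    syy = sum((p[1] - my) ** 2 for p in pts) / m
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / m
    return mx, my, sxx, syy, sxy


def regression_direction(pts: Sequence[Pt]) -> Pt:
    """Least-squares line y = F(x): unit direction (1, slope) / norm."""
    _, _, sxx, _, sxy = centred_moments(pts)
    slope = sxy / sxx
    norm = math.hypot(1.0, slope)
    return (1.0 / norm, slope / norm)


def principal_direction(pts: Sequence[Pt]) -> Pt:
    """1st principal component in 2D (closed-form leading eigenvector angle)."""
    _, _, sxx, syy, sxy = centred_moments(pts)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def orthogonal_residual(pts: Sequence[Pt], direction: Pt) -> float:
    """Sum of squared orthogonal distances to the line through the mean point."""
    mx, my, *_ = centred_moments(pts)
    ux, uy = direction
    return sum(((p[0] - mx) * uy - (p[1] - my) * ux) ** 2 for p in pts)


def vertical_residual(pts: Sequence[Pt], direction: Pt) -> float:
    """Sum of squared vertical distances to the same line (needs ux != 0)."""
    mx, my, *_ = centred_moments(pts)
    ux, uy = direction
    return sum((p[1] - my - uy / ux * (p[0] - mx)) ** 2 for p in pts)


pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 3.0)]
reg, pc = regression_direction(pts), principal_direction(pts)
```

Each line wins on its own criterion: the regression line (slope 0.9 here) has the smaller vertical residual, and the principal direction has the smaller orthogonal residual. Principal manifolds generalize the second criterion to curves and surfaces.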

Image skeletonization or clustering around curves

Approximation of molecular surfaces

Application: economical data

[Map colorings: gross output, density, profit, growth rate]

Medical table: 1700 patients with myocardial infarction, 128 indicators

[Maps: patients map and density; lethal cases; colorings by age, number of infarctions in anamnesis, and stenocardia functional class]

Codon usage in all genes of one genome

[Maps for Escherichia coli and Bacillus subtilis; gene groups: majority of genes, highly expressed genes, "foreign" genes, "hydrophobic" genes]

Golub's leukemia dataset: 3051 genes, 38 samples (ALL/B-cell, ALL/T-cell, AML)

[Figure: ALL sample, AML sample]

Map of genes: vote for ALL, vote for AML, used by T. Golub, used by W. Lie

Golub's leukemia dataset, map of samples: AML, ALL/B-cell, ALL/T-cell; density

[Map colorings: Cystatin C; Retinoblastoma binding protein P48; CA2 (carbonic anhydrase II); X-linked helicase II]

Thank you for your attention!

Questions?