
Chapter 5: Microarray Techniques


Prof. Yechiam Yemini (YY)
Computer Science Department, Columbia University

5.3 Classification & Machine Learning Techniques

Overview
• Principal components analysis (PCA)
• Linear classifiers; perceptrons; neural nets
• SVM classifiers

Microarray Heat Map
• Microarray measurements may be organized in a heat-map matrix X
• Rows represent genes (g1,…,gm); columns represent tests/experiments/samples (T1,…,Tn)
• Xij = expression level of gene gi under test Tj
• Expression levels are visualized via colors: green = under-expressed (down-regulated), red = over-expressed (up-regulated)

Heat Map Provides Expression Profiles
• The row for gene gi is its gene expression profile across the tests T1,…,Tn
• The column for test Tj is its test expression profile across the genes g1,…,gm
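As a minimal illustration (with made-up numbers), a short numpy sketch of the heat-map matrix X and of how gene and test profiles are read off as its rows and columns:

    import numpy as np

    # Toy heat-map matrix X: rows = genes g1..g4, columns = tests T1..T3.
    # X[i, j] = expression level of gene i under test j (made-up values).
    X = np.array([
        [ 2.0, -1.5,  0.3],
        [ 0.1,  0.2,  0.0],
        [-2.2,  1.8, -0.4],
        [ 1.1,  1.0,  0.9],
    ])

    gene_profile_g1 = X[0, :]   # expression of g1 across all tests (a row)
    test_profile_T2 = X[:, 1]   # expression of every gene under T2 (a column)

    print("g1 profile:", gene_profile_g1)
    print("T2 profile:", test_profile_T2)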

Microarray Experiments
There are two typical experiments:
• Differentiation: compare expression levels under different conditions. A test Tj represents the expression levels of a condition, e.g., a cancer or drug-treated cell vs. a normal cell.
• Temporal expression: explore the temporal evolution of expression levels. A test Tj represents the expression levels at a given time, e.g., to study the cell response to heat shock or starvation.

Some Basic Geometry
• Genes/tests may be modeled as n-dimensional vectors
• Define tj = (0,0,…,0,1,0,…,0), with the 1 in position j; then gi = Σj Xij tj
• Equivalently, each test is a vector in gene coordinates, e.g., t1 = (X11, X21, X31), t2 = (X12, X22, X32)
• Differentiation example: t1 and t2 have similar expression on g2 and g3 (X31 = X32, X21 ~ X22), but g1 provides good differentiation of t1 and t2

Principal Component Analysis

Goal: Reduce Dimensionality
• In the example, g2 and g3 do not differentiate t1 from t2; g1 maximizes differentiation
• Differentiation ~ variance

Key Idea: Change Coordinates
• Change coordinates to "virtual genes" p1, p2, p3 that maximize differentiation (variance)
• Reduce non-informative dimensions (noise)

Maximizing Variance
• Given n samples {xk} of a random m-dimensional vector x; assume, for simplicity, that E[x] = 0
• Which direction w (||w|| = 1) maximizes the variance VAR[wTx]?
• Define an m x n data matrix X whose columns are the samples {xk}
• VAR[wTx] = E[(wTx)(xTw)] = E[wT x xT w] = wT E[x xT] w ≈ (1/n) wT [Σk xk xkT] w = (1/n) wT XXT w
• Therefore, maximizing the variance is equivalent to Max{wTCw | wTw = 1}, where C = XXT is the auto-covariance matrix of x

How do principal eigenvectors arise?
• Use a Lagrangian to solve the constrained quadratic optimization: L(w,λ) = wTCw - λwTw; the solution must satisfy 0 = gradw L = Cw - λw
• Therefore Cw = λw, i.e., w is an eigenvector; furthermore wTCw = λwTw = λ, so the variance-maximizing direction is the eigenvector with the largest eigenvalue
• Geometrically, {wTCw | wTw = 1} is an ellipsoid whose axes are the eigenvectors of C
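To make the eigenvector argument concrete, a small numpy check on synthetic zero-mean data that no random unit direction beats the leading eigenvector of C = XXT in variance (the data and the number of trial directions are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic zero-mean data: columns of X are n samples of an m-dimensional x.
    m, n = 3, 500
    X = rng.normal(size=(m, m)) @ rng.normal(size=(m, n))
    X -= X.mean(axis=1, keepdims=True)            # enforce E[x] = 0 empirically

    C = X @ X.T                                    # (unnormalized) auto-covariance

    eigvals, eigvecs = np.linalg.eigh(C)           # eigenvalues in ascending order
    w_star = eigvecs[:, -1]                        # eigenvector with largest eigenvalue

    def variance(w):
        return np.var(w @ X)                       # sample variance of w^T x

    # No random unit direction should beat the leading eigenvector.
    trials = rng.normal(size=(100, m))
    trials /= np.linalg.norm(trials, axis=1, keepdims=True)
    print("variance along top eigenvector:", variance(w_star))
    print("best variance over 100 random w:", max(variance(w) for w in trials))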

Principal Components Analysis
• Represent the data in the eigenvector space
• Compute the autocovariance XTX; the eigenvectors of XTX are the principal coordinates
• Principal coordinates maximize residual variance; the eigenvalues correspond to the maximal residual variance
• Use Singular Value Decomposition (SVD) to compute PCA: compute the factorization XTX = U Λ UT
• The transformation to principal coordinates is y = UTx; this PCA coordinate change is also called the Karhunen-Loeve transform
• Eliminate eigenvectors with small eigenvalues: project the data onto a subspace with maximal residual variance; this reduces dimensionality while maximizing discrimination
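A minimal sketch of this PCA computation via SVD on a synthetic genes x arrays matrix; the right singular vectors of X play the role of the slide's U (the eigenarrays), and the cut-off k is an arbitrary choice:

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic data: rows = genes, columns = arrays (tests).
    genes, arrays = 200, 14
    X = rng.normal(size=(genes, arrays))
    X -= X.mean(axis=0, keepdims=True)          # center each array

    # SVD: X = U_svd * diag(s) * Vt, hence XTX = Vt.T * diag(s**2) * Vt.
    # The rows of Vt are the eigenarrays (the slide's U corresponds to V here).
    U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenvalues = s ** 2

    # Keep the k components with the largest eigenvalues and project the genes.
    k = 3
    Y = X @ Vt[:k].T                            # each gene in top-k principal coordinates
    print("fraction of variance retained:", eigenvalues[:k].sum() / eigenvalues.sum())
    print("reduced data shape:", Y.shape)       # (200, 3)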

PCA Through Example
(Orly Alter et al., PNAS, 2000, 97(18):10101-10106)
• Cell-cycle analysis: expression of genes (rows) across time points (arrays)

PCA Analysis
• Data: 5,981 genes x 14 arrays
• PCA is used to reduce dimensionality; SVD is applied in array space
• From the genes x arrays matrix X, form the arrays x arrays covariance XTX, then decompose XTX = U Λ UT to obtain the eigenvalues and eigenarrays

Results
Fig. 7. SVD of the normalized and sorted elutriation data. Raster display of data with overexpression (red), no change in expression (black), and underexpression (green), showing a linear transformation of the data from the 5,981-genes × 14-arrays space to the reduced diagonalized 14-eigenarrays × 14-eigengenes space, using the 5,981-genes × 14-eigenarrays and 14-eigengenes × 14-arrays basis sets.
(Alter, Brown & Botstein, PNAS, August 29, 2000, vol. 97, no. 18, 10101-10106, Appendix A1-A6)

Notes On PCA
• Effective in reducing dimensionality; more predictable and analyzable than clustering
• Intuitive interpretation: an eigengene is a linear combination of gene profiles maximizing variance
• Let Pk be the projection on the subspace Uk = Span{u1,u2,…,uk}; then uk+1 maximizes the residual variance of the projections {(I - Pk)gi}
• SVD is often simpler to compute in array space; results may be applied and interpreted in gene space

Linear Classifiers

Basic Classification Concepts
• Given: sample data {Xk} and class associations Yk ∈ {-1,1}
• Goal: find a "good" function f(X) such that Y = sgn[f(X)]
• There are numerous classification techniques (classical statistics, machine learning, …); we consider only the basics
• Supervised learning: input {Xk,Yk}, output f(X); avoid over-fitting

Linear Classifier
• The classifier is a hyperplane: f(x) = wTx - b = w1x1 + w2x2 - b
• y = sgn(wTx - b) classifies x: f(x) > 0 on one side of the hyperplane, f(x) < 0 on the other
• Simplify this: y = sgn(wx), with w = (w1,w2,b) and x = (x1,x2,-1)

Neural Networks (NN) Background
History:
• McCulloch-Pitts [1943]: synaptic connection as a linear classifier
• Hebb [1949]: reinforcement learning
• Rosenblatt [1957]: the perceptron training algorithm
• Minsky & Papert [1969]: what can a perceptron compute?
• Starting in the mid 80's, a large explosion of NN models and applications

Rosenblatt's model: retina → associative units (fixed weights) → response unit (variable weights, step activation function)
• Firing rule: x1w1 + x2w2 + b > 0

The Perceptron Training Rule
• Training data {x(k),y(k)}
• Weight update rule: w(t+1) ← w(t) + ηε(t)x(t)
• ε(t) = y(t) - sgn[w(t)x(t)] is the classification error, which is 0, -2, or +2
• η is a constant (some variants use η = η(t), e.g., η(t) = 1/t²)
• Geometrically, each update adds ηε(t)x(t) to w(t), i.e., adds or subtracts a scaled copy of the misclassified point

Training Geometry
• Initialize: w(0) = (1,-1), η = 1/2
• Iterate: w(t+1) ← w(t) + ηε(t)x(t); ε(t) = y(t) - sgn[w(t)x(t)]
• x(1) = (0,2) is misclassified by w(0) = (1,-1): w(1) = (1,-1) + (1/2)·2·(0,2) = (1,1)
• x(2) = (2,-1) is misclassified by w(1) = (1,1): w(2) = (1,1) - (2,-1) = (-1,2)
• In general, updates add/subtract misclassified points: w(t+1) = w(t) ± x(t)

Training (continued)
• w(t+1) ← w(t) + ηε(t)x(t); ε(t) = y(t) - sgn[w(t)x(t)]
• x(3) = (2,0) is misclassified by w(2) = (-1,2): w(3) = (-1,2) + (2,0) = (1,2)
• Overall, w(3) = w(0) + x(1) - x(2) + x(3) = (1,-1) + (0,2) - (2,-1) + (2,0) = (1,2)
• Classifier found

More Generally
• A linear classifier: y = sgn[f(x)] = sgn[wTx + w0]
• The classifier may be represented with an augmented vector: f(x) = (wT, w0) · (x, 1)
• The perceptron training problem: given a training sample S = {x(k),y(k)}, compute w such that y = sgn[wTx] is consistent with S

Training To Classify Temporal Expression
• Classify temporal gene profile curves as convex (~ +1) or concave (~ -1)
• Note: co-expression is revealed through monotonicity & convexity

Training patterns:
X(1) = (0,2,3,1,0), y(1) = +1
X(2) = (0,1,2,2,0), y(2) = +1
X(3) = (1,3,2,1,0), y(3) = +1
X(4) = (3,2,1,2,3), y(4) = -1
X(5) = (2,1,1,2,3), y(5) = -1

• Initialize weights w(0) = (1,1,1,1,1,0), η = 1/2 (each pattern is augmented with a constant 1)
• It takes 3 iterations to converge (see table) to the classifier f(X) = sgn[(-2,1,3,0,-2)·X]

Iterations of w(t+1) ← w(t) + ηε(t)x(t), ε(t) = y(t) - sgn[w(t)x(t)]:

t   w(t)                  x(t)             ε(t)
0   (1,1,1,1,1,0)         (0,2,3,1,0,1)     0
1   (1,1,1,1,1,0)         (3,2,1,2,3,1)    -2
2   (-2,-1,0,-1,-2,-1)    (0,2,3,1,0,1)    +2
3   (-2,1,3,0,-2,0)       no errors (stop)
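A short sketch of the perceptron rule applied to the five augmented patterns above; assuming a plain cyclic pass over the samples, it reproduces the classifier (-2, 1, 3, 0, -2, 0) found in the table:

    import numpy as np

    # Training patterns from the slide, augmented with a constant 1 for the bias.
    X = np.array([
        [0, 2, 3, 1, 0, 1],    # X(1), convex,  y = +1
        [0, 1, 2, 2, 0, 1],    # X(2), convex,  y = +1
        [1, 3, 2, 1, 0, 1],    # X(3), convex,  y = +1
        [3, 2, 1, 2, 3, 1],    # X(4), concave, y = -1
        [2, 1, 1, 2, 3, 1],    # X(5), concave, y = -1
    ], dtype=float)
    y = np.array([1, 1, 1, -1, -1])

    w = np.array([1, 1, 1, 1, 1, 0], dtype=float)   # w(0) from the slide
    eta = 0.5

    # Cycle through the samples until a full sweep produces no errors.
    converged = False
    while not converged:
        converged = True
        for x_t, y_t in zip(X, y):
            err = y_t - np.sign(w @ x_t)            # 0 or +/-2 (or +/-1 if w.x = 0)
            if err != 0:
                w += eta * err * x_t
                converged = False

    print("learned weights:", w)                    # -> [-2.  1.  3.  0. -2.  0.]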

Example Notes
• Does f(X) = sgn[(-2,1,3,0,-2)·X] discriminate convex from concave?
• Try convex X = (2,4,2,1,0): f(X) = +1; concave X = (2,0,1,3,4): f(X) = -1
• But for X = (5,6,1,0,0): f(X) = -1, and for X = (0,3,4,2,0): f(X) = +1; both are classification errors
• What did the perceptron learn from the training samples? It assigns heavily negative weights to the extremes and positive weights to the middle
• Concave curves are higher at the extremes; convex ones are higher at the middle points
• Training discovers weights distinguishing class features; classification errors occur when patterns are mismatched with these features

Linear Classifiers May Be Generalized
• Linear classifiers are limited and sensitive to noise; we would like to generalize them to admit non-linearity & noise

Generalize to multilayer neural networks (NN):
• Can handle non-linearity, but have limited noise resiliency and are difficult to scale, train, or interpret
• Convergence problems of gradient learning (e.g., local minima); unclear how to avoid overfitting
• Have seen limited use in handling microarray data

Generalize to Support Vector Machines (SVM):
• Retain simplicity while offering new capabilities

Support Vector Machines (SVM)

Generalizing The Perceptron
• The perceptron rule may be rewritten as: if y(t)w(t)x(t) < 0 then w(t+1) ← w(t) + ηy(t)x(t)
• This means that w = Σα(t)y(t)x(t), where α(t) > 0
• Learning: compute α(t) from the sample data {y(t),x(t)}
• There could be many separating hyperplanes; SVM finds the "best" hyperplane, the one with maximal margins (determined by the support vectors)
• This leads to a quadratic optimization problem

The Dual Learning Rule
• Rewrite the classifier: f(x) = <w,x> + b = Σα(t)y(t)<x(t),x> + b = Σα(t)y(t)K(x(t),x) + b
• K(x,z) = x·z is the kernel; it measures the correlation between x(t) and x
• Dual learning rule: if y(i)[Σα(t)y(t)K(x(t),x(i)) + b] < 0 then α(i) ← α(i) + η

Training A Linear SVM
• Compute the kernel matrix
• Iterate the SVM learning rule: if y(i)[Σα(t)y(t)K(t,i) + b] < 0 then α(i) ← α(i) + η
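A sketch of this dual (kernel) training rule as a reusable function; it assumes the bias is folded into an augmented coordinate (as in the earlier perceptron example), treats a zero margin as an error, and updates one violated sample per sweep; these are choices made here rather than part of the slide's statement:

    import numpy as np

    def train_dual(K, y, eta=0.5, alpha0=None, max_sweeps=1000):
        """Dual (kernel) perceptron: while some sample i has
        y(i) * sum_t alpha(t) y(t) K(t, i) <= 0, increase alpha(i) by eta."""
        y = np.asarray(y, dtype=float)
        alpha = np.zeros(len(y)) if alpha0 is None else np.array(alpha0, dtype=float)
        for _ in range(max_sweeps):
            margins = y * (K @ (alpha * y))      # y(i) * sum_t alpha(t) y(t) K(t, i)
            violated = np.where(margins <= 0)[0]
            if violated.size == 0:
                break                            # every training sample classified correctly
            alpha[violated[0]] += eta            # update the first violated sample
        return alpha

    def predict(K_new, alpha, y):
        """Classify new points given K_new[t, j] = K(x(t), x_new_j)."""
        return np.sign(K_new.T @ (alpha * y))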

An SVM Training Example
• Consider again the convex/concave classification with the training patterns X(1) = (0,2,3,1,0), X(2) = (0,1,2,2,0), X(3) = (1,3,2,1,0) (convex, y = +1) and X(4) = (3,2,1,2,3), X(5) = (2,1,1,2,3) (concave, y = -1), each augmented with a constant 1
• Compute a linear kernel: with the augmented x(1),…,x(5) as the columns of X, K = XTX:

K =
15 11 14 10  8
11 10 10  9  8
14 10 16 14 10
10  9 14 28 23
 8  8 10 23 20

Training
• Initialize: η = 0.5, α = (-1,0,1,0,0); labels y = (1,1,1,-1,-1)
• Iterate: if y(j)[Σα(t)y(t)K(t,j)] < 0 then α(j) ← α(j) + η

Iterations of α, with the values of y(j)[Σα(t)y(t)K(t,j)] for j = 1,…,5:

α                        values for j = 1..5       update
(-1,   0,   1, 0,   0)    (-1, -1, 2, -4, -2)       α(1) += 0.5
(-0.5, 0,   1, 0,   0)    (6.5, 4.5, 9, -9, -6)     α(4) += 0.5
(-0.5, 0,   1, 0.5, 0)    (1.5, 0, 2, 5, 5.5)       α(2) += 0.5
(-0.5, 0.5, 1, 0.5, 0)    (7, 5, 7, 0.5, 1.5)       all > 0: terminate

(In the third sweep the value for j = 2 is 0; it is treated as a misclassification, since sgn(0) ≠ y(2).)

SVM Example Continued
• Training result: α = (-0.5, 0.5, 1, 0.5, 0) = 0.5·(-1, 1, 2, 1, 0)
• Computing the SVM classifier: f(x) = <Σα(t)y(t)x(t), x> = <w,x>, where
  w = Σα(t)y(t)x(t) = 0.5·(-x(1) + x(2) + 2x(3) - x(4)) = 0.5·(-1, 3, 2, 1, -3, 1)
• Classifier: f(x) = -x1 + 3x2 + 2x3 + x4 - 3x5 + 1
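Putting the worked example together: a sketch that builds the linear kernel from the augmented patterns, iterates the dual rule from the slide's initial α (updating the first violated sample in each sweep, an assumption made to match the iteration table), and recovers the classifier weights:

    import numpy as np

    # Augmented training patterns (the last coordinate 1 plays the role of the bias).
    X = np.array([
        [0, 2, 3, 1, 0, 1],
        [0, 1, 2, 2, 0, 1],
        [1, 3, 2, 1, 0, 1],
        [3, 2, 1, 2, 3, 1],
        [2, 1, 1, 2, 3, 1],
    ], dtype=float)
    y = np.array([1, 1, 1, -1, -1], dtype=float)

    K = X @ X.T                                       # linear kernel matrix from the slide
    alpha = np.array([-1, 0, 1, 0, 0], dtype=float)   # initial alpha from the slide
    eta = 0.5

    while True:
        margins = y * (K @ (alpha * y))               # y(j) * sum_t alpha(t) y(t) K(t, j)
        violated = np.where(margins <= 0)[0]
        if violated.size == 0:
            break
        alpha[violated[0]] += eta                     # one update per sweep

    w = (alpha * y) @ X                               # w = sum_t alpha(t) y(t) x(t)
    print("alpha:", alpha)                            # -> [-0.5  0.5  1.   0.5  0. ]
    print("w    :", w)                                # -> 0.5 * (-1, 3, 2, 1, -3, 1)
    print("signs:", np.sign(X @ w))                   # -> [ 1.  1.  1. -1. -1.]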

Notes On SVM Training
• What did the SVM classifier learn about convexity?
• f(x) = -x1 + 3x2 + 2x3 + x4 - 3x5 + 1, much like the perceptron, assigns negative weights to the extremes and positive weights to the middle
• Consider the samples misclassified by the perceptron: X1 = (5,6,1,0,0), X2 = (0,3,4,2,0)
• The SVM classifier classifies X1 correctly but errs in classifying X2 (What is the source of the error? What training samples could improve this?)

Handling Non-Separable Data
• When the data is not separable, use soft margins
• Optimize the margins given a relative cost of error

Generalization To Kernel Machines
• A simplified approach to non-linear classification
• Map the data to a feature space F via a non-linear transformation Ф: x → Ф(x)
• Use linear classification in feature space
• F may have a different dimension than X (e.g., to reduce dimensionality)

Linear Classification In Feature Space
• Consider the classification in feature space: f(x) = Σα(t)y(t)<φ(x(t)),φ(x)> + b
• Define the kernel of the transformation: K(u,v) = <φ(u),φ(v)>
• The kernel specifies the "feature space" classifier: f(x) = Σα(t)y(t)K(x(t),x) + b
• Example kernel functions:
  1) Polynomial: K(xi,xj) = (xi·xj + 1)^d
  2) Gaussian: K(xi,xj) = exp(-||xi - xj||²/σ²)

Example: Polynomial Kernel
K(xi,xj) = (1 + xiTxj)²
         = 1 + xi1²xj1² + 2·xi1xj1·xi2xj2 + xi2²xj2² + 2·xi1xj1 + 2·xi2xj2
         = [1, xi1², √2·xi1xi2, xi2², √2·xi1, √2·xi2]T [1, xj1², √2·xj1xj2, xj2², √2·xj1, √2·xj2]
         = φ(xi)Tφ(xj), where φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2]

The transformation Φ: x → φ(x) maps the data so that a linear decision surface wTφ(x) = constant in feature space corresponds to a non-linear one in input space.
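A quick numerical check, on two arbitrary 2-D points, that the degree-2 polynomial kernel equals the inner product of the expanded feature vectors φ(x):

    import numpy as np

    def poly_kernel(xi, xj):
        return (1.0 + xi @ xj) ** 2

    def phi(x):
        x1, x2 = x
        return np.array([1.0, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                         np.sqrt(2) * x1, np.sqrt(2) * x2])

    xi = np.array([0.7, -1.2])
    xj = np.array([2.0, 0.5])

    print(poly_kernel(xi, xj))      # direct evaluation in input space -> 3.24
    print(phi(xi) @ phi(xj))        # the same number via the explicit feature map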

The Kernel "Trick"
• Consider the classifier f(x) = Σα(t)y(t)K(x(t),x) + b and the training rule: if y(i)[Σα(t)y(t)K(x(t),x(i)) + b] < 0 then α(i) ← α(i) + η
• An SVM classifier may be computed from the kernel alone; there is no need to know the underlying mapping φ(x)
• We just need to know that the kernel is appropriate: K(u,v) = <φ(u),φ(v)> for some φ
• Mercer: any symmetric positive semi-definite matrix is a kernel

Example: SVM Classification Of Microarrays
• Rows of the heat map are genes, columns are tests; each test is represented as a vector of gene expression levels
• The kernel provides a measure of similarity between tests, e.g., a normalized inner product of their expression profiles: K(X,Y) = <X,Y> / √(<X,X><Y,Y>)

Applying SVM
• Represent the biological question as a classification problem
• Represent the data as vectors
• Establish a kernel matrix to represent similarity
• Train an SVM classifier: if y(i)[Σα(t)y(t)K(x(t),x(i)) + b] < 0 then α(i) ← α(i) + η
• Evaluate the performance of the classifier f(x) = Σα(t)y(t)K(x(t),x) + b
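As one way to illustrate this workflow, a hedged scikit-learn sketch on synthetic expression data and labels (a real study would use one of the microarray datasets discussed below, a carefully chosen kernel, and proper nested cross-validation):

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)

    # Steps 1-2: pose the question as classification and represent each test as a
    # vector (rows = samples/tests, columns = genes); labels are +1/-1 (synthetic).
    n_samples, n_genes = 60, 500
    X = rng.normal(size=(n_samples, n_genes))
    y = np.where(X[:, :10].sum(axis=1) > 0, 1, -1)   # depends on 10 "informative" genes

    # Step 3: the kernel; here the built-in linear kernel. A precomputed similarity
    # matrix could be passed instead with SVC(kernel="precomputed").
    clf = SVC(kernel="linear", C=1.0)

    # Steps 4-5: train and evaluate the classifier with cross-validation.
    scores = cross_val_score(clf, X, y, cv=5)
    print("cross-validated accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))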

Cancer Classification With SVM
(A. Zhang, DIMACS, 2007)

Cancer Classification Study (A. Zhang, DIMACS, 2007)
• Microarrays provide a small sample of high-dimensional data; the key challenge is overfitting
• Comparative study of classifiers over microarray DBs
• Uses SVM with an improved kernel (maximizing discrimination)
• Compares with kNN and linear discriminant classifiers
• http://dimacs.rutgers.edu/Workshops/MLTechniques/slides/z...

Cancer Classification
• Results are shown for the Lung, ALL-AML, and Prostate datasets

But…
(www.sci.usq.edu.au/research/seminars/files//seminar135/ausdm1.ppt)
• Comparative results for decision-tree and boosting classifiers

Cancer Studies
(A. Statnikov, C. F. Aliferis, I. Tsamardinos, Vanderbilt University, MEDINFO 2004)

Microarray Datasets

Dataset name     Samples  Genes  Categories  Reference
11_Tumors          174    12533      11      Su, 2001
14_Tumors          308    15009      26      Ramaswamy, 2001
9_Tumors            60     5726       9      Staunton, 2001
Brain_Tumor1        90     5920       5      Pomeroy, 2002
Brain_Tumor2        50    10367       4      Nutt, 2003
Leukemia1           72     5327       3      Golub, 1999
Leukemia2           72    11225       3      Armstrong, 2002
Lung_Cancer        203    12600       5      Bhattacherjee, 2001
SRBCT               83     2308       4      Khan, 2001
Prostate_Tumor     102    10509       2      Singh, 2002
DLBCL               77     5469       2      Shipp, 2002

Total: ~1300 samples, 74 diagnostic categories, 41 cancer types and 12 normal tissue types.

Classifiers
• Instance-based: K-Nearest Neighbors (KNN)
• Neural networks: Backpropagation Neural Networks (NN), Probabilistic Neural Networks (PNN)
• Kernel-based, multi-class SVM: One-Versus-Rest (OVR), One-Versus-One (OVO), DAGSVM, Weston & Watkins (WW), Crammer & Singer (CS)
• Weighted voting: One-Versus-Rest, One-Versus-One
• Decision trees: CART

Without Gene Selection
• Accuracy (%) of the MC-SVM methods (OVR, OVO, DAGSVM, WW, CS) vs. KNN, NN, and PNN on each dataset: 9_Tumors, 14_Tumors, Brain_Tumor1, Brain_Tumor2, 11_Tumors, Leukemia1, Leukemia2, Lung_Cancer, SRBCT, Prostate_Tumor, DLBCL

Gene Selection
Genes are ranked from uninformative to highly discriminatory using one of:
1. Signal-to-noise (S2N) ratio in one-versus-rest (OVR) fashion
2. Signal-to-noise (S2N) ratio in one-versus-one (OVO) fashion
3. Kruskal-Wallis nonparametric one-way ANOVA (KW)
4. Ratio of genes' between-categories to within-category sum of squares (BW)
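A small sketch of the first of these scores, the one-versus-rest signal-to-noise ratio, computed as (mean_class - mean_rest)/(std_class + std_rest) per gene on synthetic data; the exact S2N variant used in the study may differ:

    import numpy as np

    def s2n_ovr(X, y, target):
        """Signal-to-noise score of each gene (column of X), target class vs. rest."""
        in_c, rest = X[y == target], X[y != target]
        return (in_c.mean(axis=0) - rest.mean(axis=0)) / (in_c.std(axis=0) + rest.std(axis=0))

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 2000))       # 100 samples x 2000 genes (synthetic)
    y = rng.integers(0, 3, size=100)       # 3 diagnostic categories

    # Rank genes by their best one-versus-rest score and keep the top 100.
    scores = np.max([np.abs(s2n_ovr(X, y, c)) for c in np.unique(y)], axis=0)
    top_genes = np.argsort(scores)[::-1][:100]
    X_reduced = X[:, top_genes]
    print("reduced from", X.shape[1], "to", X_reduced.shape[1], "genes")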

With Gene Selection
• Improvement in accuracy (%) from gene selection for each classifier (OVR, OVO, DAGSVM, WW, CS, KNN, NN, PNN), averaged over the four datasets 9_Tumors, 14_Tumors, Brain_Tumor1, Brain_Tumor2
• Diagnostic performance (accuracy, %) before and after gene selection, SVM vs. non-SVM methods
• Average reduction in the number of genes is 10-30 times

Protein Classification
(Based on W. S. Noble, U. Washington)

Classifying Transmembrane Proteins
• Available data: mRNA expression data, protein-protein interaction data, sequence data
• Challenge: build a classification model for such diverse data

Key Idea
• Represent the classification data in terms of SVM kernels
• Combine kernels to best apply all discriminating data
• A protein is classified by sgn(Σ λi·SVi), combining the support-vector terms SV1, SV2, …, SVn with weights λ1, …, λn

Classification Based On Sequence

>ICYA_MANSE
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVNLVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH

>LACB_BOVIN
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI

• How do we model string similarity in terms of a kernel matrix?
• Define the sequence kernel in terms of a similarity score

Pairwise Comparison Kernel
• The kernel is built from pairwise similarity scores between sequences (e.g., alignment scores)

Classification By Interaction Profile
• How do we build a classification model from a protein interaction graph?
• Interaction kernel: number of common neighbors
• Each protein is represented by a row of the protein-protein incidence matrix:

1 0 0 1 0 1 0 1
1 0 1 0 1 1 0 1
0 0 0 0 1 1 0 0
0 0 1 0 1 1 0 1
0 0 1 0 1 0 0 1
1 0 0 0 0 0 0 1
0 0 1 0 1 0 0 0

• The kernel value for a pair of proteins is the inner product of their rows (e.g., 3 common neighbors for the first two proteins)
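A sketch of this interaction kernel: with rows of the incidence matrix as interaction profiles, the number of common neighbors of two proteins is the inner product of their rows, so the full kernel matrix is simply A·AT:

    import numpy as np

    # Incidence matrix from the slide: A[i, j] = 1 if protein i interacts with protein j.
    A = np.array([
        [1, 0, 0, 1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1, 1, 0, 1],
        [0, 0, 0, 0, 1, 1, 0, 0],
        [0, 0, 1, 0, 1, 1, 0, 1],
        [0, 0, 1, 0, 1, 0, 0, 1],
        [1, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 1, 0, 1, 0, 0, 0],
    ])

    K = A @ A.T      # K[i, k] = number of interaction partners shared by proteins i and k
    print(K[0, 1])   # common neighbors of the first two proteins -> 3, as on the slide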

Diffusion Kernel
• A general metric of similarity between graph nodes, based upon a random walk
• Kernel ~ average time for a random walk starting at x to first visit y (counts the paths connecting two nodes, weighted by their lengths)
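One standard construction in this spirit is the Kondor-Lafferty diffusion kernel, the matrix exponential of (minus) the graph Laplacian; the slide's description in terms of random-walk visit times is closely related but not identical, so treat this as an illustrative variant:

    import numpy as np
    from scipy.linalg import expm

    def diffusion_kernel(adj, beta=1.0):
        """K = exp(-beta * L) with L = D - A: similarity decays with graph distance,
        summing over all paths between two nodes, weighted by their lengths."""
        laplacian = np.diag(adj.sum(axis=1)) - adj
        return expm(-beta * laplacian)

    # Tiny example graph: a path 0 - 1 - 2 - 3.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    K = diffusion_kernel(A, beta=0.5)
    print(np.round(K, 3))   # nearby nodes get larger kernel values than distant ones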

Hydrophobicity Kernel
• Transmembrane regions are typically hydrophobic
• The hydrophobicity profile is conserved
• Represent the hydropathy data in terms of a kernel (e.g., via an FFT of the profile, as in the table below)

Combining Kernel Machines

Kernel  Data                  Similarity measure
KSW     protein sequence      Smith-Waterman
KB      protein sequence      BLAST
KHMM    protein sequence      Pfam HMM
KFFT    hydropathy profile    FFT
KLI     protein interactions  linear kernel
KD      protein interactions  diffusion kernel
KE      gene expression       radial basis kernel

• Given kernels Ki(u,v) = <φi(u),φi(v)>, define a combined kernel K(u,v) = Σ λiKi(u,v), with Σ λi = 1
• This corresponds to the mapping φ(u) = (φ1(u), φ2(u), …, φn(u)) and the weighted inner product <φ(u),φ(v)> = Σ λi<φi(u),φi(v)>
• Extend training to the combined SVM
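A sketch of the kernel combination: given precomputed kernel matrices Ki and weights λi summing to 1, the combined kernel is their weighted sum, which can then be used like any single kernel (the toy matrices here are random placeholders for KSW, KD, KE, and the others):

    import numpy as np

    def combine_kernels(kernels, weights):
        """K = sum_i lambda_i * K_i, with the lambdas normalized to sum to 1."""
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
        return sum(w * K for w, K in zip(weights, kernels))

    # Toy example: three kernel matrices over the same 4 proteins.
    rng = np.random.default_rng(4)
    def random_psd(n):
        B = rng.normal(size=(n, n))
        return B @ B.T                      # guarantees a valid (PSD) kernel matrix

    K_list = [random_psd(4) for _ in range(3)]
    K_combined = combine_kernels(K_list, weights=[0.5, 0.3, 0.2])
    print(K_combined.shape)                 # (4, 4)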

Membrane Proteins
• Recognition performance (%TP at 1% FP) is compared for simple rules from the hydrophobicity profile, TMHMM, and the combined SVM

Cytoplasmic Ribosomal Proteins
• What can the classification errors teach us?

SVM Final Notes
Kernel machines provide powerful classifiers:
• Kernels admit flexible modeling of similarity
• Simple and general training procedure
• Multiple classifiers may be combined to improve results
But…
• Choosing a good kernel is an art
• Training results may be sensitive to the training sample
Other classification ideas:
• Boosting
• Decision trees
• …

Conclusions

Microarray Analysis
• Microarrays provide rich information on gene expression: identify variance between cell behaviors, determine co-expression patterns of genes, analyze the temporal behavior of the genome, …
• Low-level analysis improves data quality: normalization, noise reduction, …
• High-level analysis improves data interpretation: study correlations of gene expressions; clustering determines similarity; PCA analyzes variance; classifiers analyze features