44
A Dendrogram 3/15/05 1 Bioinformatics (Lec 17)

Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

A Dendrogram

3/15/05 1Bioinformatics (Lec 17)

Page 2: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Hierarchical Clustering [Johnson, SC, 1967]• Given n points in Rd, compute the distance

between every pair of points• While (not done)

– Pick closest pair of points si and sj and make them part of the same cluster.

– Replace the pair by an average of the two sij

Try the applet at:http://www.cs.mcgill.ca/~papou/#applet

3/15/05 2Bioinformatics (Lec 17)

Page 3: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Distance Metrics• For clustering, define a distance function:

– Euclidean distance metrics

– Pearson correlation coefficient

k=2: Euclidean Distancekd

i

kiik YXYXD

/1

1)(),( ⎥⎦

⎤⎢⎣

⎡−= ∑

=

⎟⎟⎠

⎞⎜⎜⎝

⎛ −⎟⎟⎠

⎞⎜⎜⎝

⎛ −= ∑

= y

i

x

id

ixy

YYXXd σσ

ρ1

1-1 ≤ ρxy ≥ 1

3/15/05 3Bioinformatics (Lec 17)

Page 4: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Start

End

3/15/05 4Bioinformatics (Lec 17)

Page 5: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

K-Means Clustering [McQueen ’67]Repeat– Start with randomly chosen cluster centers– Assign points to give greatest increase in score– Recompute cluster centers– Reassign pointsuntil (no changes)

Try the applet at: http://www.cs.mcgill.ca/~bonnef/project.html

3/15/05 5Bioinformatics (Lec 17)

Page 6: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Self-Organizing Maps [Kohonen]• Kind of neural network.• Clusters data and find complex relationships

between clusters.• Helps reduce the dimensionality of the data.• Map of 1 or 2 dimensions produced.• Unsupervised Clustering• Like K-Means, except for visualization

3/15/05 6Bioinformatics (Lec 17)

Page 7: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

SOM Algorithm• Select SOM architecture, and initialize

weight vectors and other parameters.• While (stopping condition not satisfied) do

for each input point x– winning node q has weight vector closest to x.– Update weight vector of q and its neighbors.– Reduce neighborhood size and learning rate.

3/15/05 7Bioinformatics (Lec 17)

Page 8: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

SOM Algorithm Details

• Distance between x and weight vector:• Winning node: • Weight update function (for neighbors):

• Learning rate:

iwx −

)]()()[,,()()1( kwkxixkkwkw iii −+=+ µ

ii

wxxq −= min)(

⎟⎟

⎜⎜

⎛ −−= 2

2)(

exp)(),,( 0

σηµ

xqi rrkixk

3/15/05 8Bioinformatics (Lec 17)

Page 9: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 9Bioinformatics (Lec 17)

World Poverty SOM

Page 10: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 10Bioinformatics (Lec 17)

World Poverty Map

Page 11: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Neural Networks

ΣInput X

SynapticWeights W

ƒ(•)

Bias θ

Output y

3/15/05 11Bioinformatics (Lec 17)

Page 12: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Learning NN

×

×

× Σ

Σ

Adaptive Algorithm

Input X

1

Weights W

−+

Desired Response

Error

3/15/05 12Bioinformatics (Lec 17)

Page 13: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Types of NNs• Recurrent NN• Feed-forward NN• Layered

Other issues• Hidden layers possible• Different activation functions possible

3/15/05 13Bioinformatics (Lec 17)

Page 14: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Application: Secondary Structure Prediction

3/15/05 14Bioinformatics (Lec 17)

Page 15: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Support Vector Machines• Supervised Statistical Learning Method for:

– Classification– Regression

• Simplest Version:– Training: Present series of labeled examples

(e.g., gene expressions of tumor vs. normal cells)– Prediction: Predict labels of new examples.

3/15/05 15Bioinformatics (Lec 17)

Page 16: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 16Bioinformatics (Lec 17)

Learning Problems

AA

A

B

ABB

B

Page 17: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

SVM – Binary Classification• Partition feature space with a surface.• Surface is implied by a subset of the

training points (vectors) near it. These vectors are referred to as Support Vectors.

• Efficient with high-dimensional data. • Solid statistical theory• Subsume several other methods.

3/15/05 17Bioinformatics (Lec 17)

Page 18: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Learning Problems• Binary Classification• Multi-class classification• Regression

3/15/05 18Bioinformatics (Lec 17)

Page 19: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 19Bioinformatics (Lec 17)

Page 20: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 20Bioinformatics (Lec 17)

Page 21: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 21Bioinformatics (Lec 17)

Page 22: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

SVM – General Principles• SVMs perform binary classification by

partitioning the feature space with a surface implied by a subset of the training points (vectors) near the separating surface. These vectors are referred to as Support Vectors.

• Efficient with high-dimensional data. • Solid statistical theory• Subsume several other methods.

3/15/05 22Bioinformatics (Lec 17)

Page 23: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

SVM Example (Radial Basis Function)

3/15/05 23Bioinformatics (Lec 17)

Page 24: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

SVM Ingredients• Support Vectors• Mapping from Input Space to Feature Space• Dot Product – Kernel function• Weights

3/15/05 24Bioinformatics (Lec 17)

Page 25: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Classification of 2-D (Separable) data

3/15/05 25Bioinformatics (Lec 17)

Page 26: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 26Bioinformatics (Lec 17)

Classification of(Separable) 2-D data

Page 27: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 27Bioinformatics (Lec 17)

Classification of (Separable) 2-D data

+1

-1

•Margin of a point•Margin of a point set

Page 28: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 28Bioinformatics (Lec 17)

Classification using the Separator

x

Separatorw•x + b = 0

w•xi + b > 0

w•xj + b < 0

x

Page 29: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 29Bioinformatics (Lec 17)

Perceptron Algorithm (Primal)

Given separable training set S and learning rate η>0 w0 = 0; // Weightb0 = 0; // Biask = 0; R = max 7xi7repeat

for i = 1 to N if yi (wk•xi + bk) ≤ 0 then

wk+1 = wk + ηyixibk+1 = bk + ηyiR2

k = k + 1Until no mistakes made within loopReturn k, and (wk, bk) where k = # of mistakes

Rosenblatt, 1956

w = Σ aiyixi

Page 30: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Theorem: If margin m of S is positive, then

i.e., the algorithm will always converge, and will converge quickly.

Performance for Separable Data

k ≤ (2R/m)2

3/15/05 30Bioinformatics (Lec 17)

Page 31: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Perceptron Algorithm (Dual)Given a separable training set S a = 0; b0 = 0; R = max 7xi7repeat

for i = 1 to N if yi (Σaj yj xi•xj + b) ≤ 0 then

ai = ai + 1b = b + yiR2

endifUntil no mistakes made within loopReturn (a, b)

3/15/05 31Bioinformatics (Lec 17)

Page 32: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Non-linear Separators

3/15/05 32Bioinformatics (Lec 17)

Page 33: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Main idea: Map into feature space

3/15/05 33Bioinformatics (Lec 17)

Page 34: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Non-linear Separators

X F

3/15/05 34Bioinformatics (Lec 17)

Page 35: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Useful URLs• http://www.support-vector.net

3/15/05 35Bioinformatics (Lec 17)

Page 36: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Perceptron Algorithm (Dual)Given a separable training set S a = 0; b0 = 0; R = max 7xi7repeat

for i = 1 to N if yi (Σaj yj k(xi ,xj) + b) ≤ 0 then

ai = ai + 1b = b + yiR2

Until no mistakes made within loopReturn (a, b)

k(xi ,xj) = Φ(xi)• Φ(xj)

3/15/05 36Bioinformatics (Lec 17)

Page 37: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Different Kernel Functions• Polynomial kernel

• Radial Basis Kernel

• Sigmoid Kernel

dYXYX )(),( •=κ

⎟⎟

⎜⎜

⎛ −−= 2

2

2exp),(

σκ

YXYX

))(tanh(),( θωκ +•= YXYX

3/15/05 37Bioinformatics (Lec 17)

Page 38: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

SVM Ingredients• Support Vectors• Mapping from Input Space to Feature Space• Dot Product – Kernel function

3/15/05 38Bioinformatics (Lec 17)

Page 39: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Generalizations• How to deal with more than 2 classes?

Idea: Associate weight and bias for each class.• How to deal with non-linear separator?

Idea: Support Vector Machines.• How to deal with linear regression?• How to deal with non-separable data?

3/15/05 39Bioinformatics (Lec 17)

Page 40: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

Applications• Text Categorization & Information Filtering

– 12,902 Reuters Stories, 118 categories (91% !!)• Image Recognition

– Face Detection, tumor anomalies, defective parts in assembly line, etc.

• Gene Expression Analysis• Protein Homology Detection

3/15/05 40Bioinformatics (Lec 17)

Page 41: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 41Bioinformatics (Lec 17)

Page 42: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 42Bioinformatics (Lec 17)

Page 43: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

3/15/05 43Bioinformatics (Lec 17)

Page 44: Bioinformatics (Lec 17) 3/15/05 1giri/teach/Bioinf/S05/Lecx7.pdf · Self-Organizing Maps [Kohonen] • Kind of neural network. • Clusters data and find complex relationships between

SVM Example (Radial Basis Function)

3/15/05 44Bioinformatics (Lec 17)