Tutorial 8 Clustering 1


Page 1: Tutorial 8


Tutorial 8

Clustering


Clustering
• General Methods
  – Unsupervised Clustering
    • Hierarchical clustering
    • K-means clustering
• Expression data
  – GEO
  – UCSC
  – ArrayExpress
• Tools
  – EPCLUST
  – MeV


Microarray - Reminder


Expression Data Matrix

• Each column represents all the gene expression levels from a single experiment.

• Each row represents the expression of a gene across all experiments.

       Exp1 Exp2 Exp3 Exp4 Exp5 Exp6
Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1
Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3
Gene 5 0.1 2.6 2.2 2.7 -2.1
Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9


Each element is a log ratio, log2(T/R), where:

T - the gene expression level in the test sample

R - the gene expression level in the reference sample
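A minimal Python sketch of the log ratio defined above. The intensity values are made up for illustration. It shows why the log scale is convenient: a 4-fold increase and a 4-fold decrease give +2 and -2, symmetric around 0 (no change).

```python
import math

def log_ratio(test_level: float, reference_level: float) -> float:
    """Return log2(T/R) for one gene in one experiment."""
    if test_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test_level / reference_level)

# Hypothetical intensities (not taken from the tutorial)
print(log_ratio(400.0, 100.0))  # T is 4x R
print(log_ratio(100.0, 400.0))  # R is 4x T
print(log_ratio(250.0, 250.0))  # T equals R
```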



Microarray Data Matrix

Black indicates a log ratio of about zero, i.e. T ≈ R.

Green indicates a negative log ratio, i.e. T < R.

Red indicates a positive log ratio, i.e. T > R.

Grey indicates missing data.
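The color legend can be sketched as a small Python function. The `tolerance` for treating a ratio as "about zero" is an assumed value; in practice heat-map viewers typically also scale brightness with the magnitude of the ratio, which this sketch does not do. Missing values are represented as `None` here, which is also an assumption.

```python
from typing import Optional

def cell_color(log_ratio: Optional[float], tolerance: float = 0.1) -> str:
    """Map one matrix cell to the black/green/red/grey convention.

    `tolerance` is a hypothetical threshold for "T is roughly equal to R".
    """
    if log_ratio is None:             # missing measurement
        return "grey"
    if abs(log_ratio) <= tolerance:   # T roughly equal to R
        return "black"
    return "red" if log_ratio > 0 else "green"  # T > R red, T < R green

row = [-1.2, -2.1, -3, -1.5, 1.8, 2.9]  # Gene 1 from the expression matrix
print([cell_color(x) for x in row])
print(cell_color(None), cell_color(0.05))
```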

Microarray Data: Different representations

[Figure: expression profiles plotted as log ratio (y axis, -4 to 4) against experiment (x axis, 1 to 6), in two panels; values above zero indicate T>R, values below zero indicate T<R.]


A real example

~500 genes, 3 knockdown conditions

Too complicated to analyze without “help”


Microarray Data: Clusters


How do we determine the similarity between two genes (for clustering)?

Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499 - 1501 (2005) , http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
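Two common ways to compare expression profiles are Euclidean distance, which compares absolute levels, and Pearson correlation, which compares the shape of the profiles. The Python sketch below computes both for rows of the expression matrix; turning correlation into a distance as 1 - r is one common convention, assumed here.

```python
import math

def euclidean(a, b):
    """Euclidean distance: compares absolute expression levels."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation: +1 for the same pattern, -1 for a mirror-image pattern."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlation_distance(a, b):
    """Correlation as a distance: 0 (same pattern) up to 2 (opposite pattern)."""
    return 1.0 - pearson(a, b)

# Rows taken from the expression data matrix
gene1 = [-1.2, -2.1, -3, -1.5, 1.8, 2.9]
gene4 = [2.9, 2.6, 2.5, -2.3, -0.1, -2.3]
gene6 = [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9]

for name, other in [("Gene 4", gene4), ("Gene 6", gene6)]:
    print(f"Gene 1 vs {name}: euclidean={euclidean(gene1, other):.2f}, "
          f"r={pearson(gene1, other):.2f}")
```

Gene 1 is positively correlated with Gene 6 and negatively correlated with Gene 4, so a correlation-based distance places Gene 1 closer to Gene 6.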


Unsupervised Clustering

Hierarchical Clustering


Genes with similar expression patterns are grouped together and connected by a series of branches (a dendrogram).

[Figure: dendrograms over six leaves (shapes), labelled 1 to 6.]

Leaves (shapes, in our case) represent genes, and the lengths of the paths between leaves represent the distances between genes.

Hierarchical Clustering


To obtain a given number of clusters, we cut the tree at the level that yields that number of branches (in this case, four).

Hierarchical clustering finds an entire hierarchy of clusters.
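A minimal agglomerative (bottom-up) sketch in Python: every gene starts as its own leaf, and the two closest clusters are merged repeatedly. Stopping once k clusters remain gives the same groups as cutting the full tree where it has k branches. Euclidean distance and average linkage (mean pairwise distance between two clusters) are assumed choices; other linkages exist and can give different trees. Gene 5 is left out because one of its six values is missing in the table.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    return sum(euclidean(data[i], data[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

def agglomerate(data, k):
    """Merge the two closest clusters until k remain.

    Returns (clusters, merges); merges records each join in order with its
    distance, i.e. the branch heights of the dendrogram.
    """
    clusters = [[i] for i in range(len(data))]  # each gene starts as a leaf
    merges = []
    while len(clusters) > k:
        # pick the pair of clusters with the smallest linkage distance
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], data))
        d = average_linkage(clusters[a], clusters[b], data)
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters, merges

genes = {
    "Gene 1": [-1.2, -2.1, -3, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    "Gene 3": [-2.5, 1.5, -0.1, -1.1, -1, 0.1],
    "Gene 4": [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
    "Gene 6": [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9],
}
names = list(genes)
clusters, merges = agglomerate(list(genes.values()), k=2)
for c in clusters:
    print([names[i] for i in c])
```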


Hierarchical clustering result

Five clusters


An algorithm that partitions the data into K groups.

K=4

K-means Clustering


How does it work?

The algorithm iteratively divides the genes into K groups and computes the center of each group. The result is a partition into K clusters in which every gene is assigned to its nearest center; this is a local optimum that depends on the initial choice of centers.

1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).

2. k clusters are created by associating every observation with the nearest mean.

3. The centroid of each of the k clusters becomes the new mean.

4. Steps 2 and 3 are repeated until convergence is reached.
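The four steps above can be sketched in Python as follows (this is the classic Lloyd formulation of k-means). Euclidean distance is assumed, and `max_iter` and `seed` are hypothetical parameters added for the sketch. Because the result depends on the random initial means, it is common to run k-means several times with different seeds.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k observations at random as the initial means
    means = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every observation to its nearest mean
        new_labels = [min(range(k), key=lambda j: euclidean(p, means[j])) for p in data]
        if new_labels == labels:  # Step 4: converged, assignments stopped changing
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mean if a cluster ends up empty
                means[j] = centroid(members)
    return labels, means

genes = [
    [-1.2, -2.1, -3, -1.5, 1.8, 2.9],   # Gene 1
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],  # Gene 2
    [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],  # Gene 4
    [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9],  # Gene 6
]
labels, means = kmeans(genes, k=2)
print(labels)
```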


Different types of clustering – different results


How to search for expression profiles

• GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/

• UCSC Human genome browser: http://genome.ucsc.edu/

• ArrayExpress: http://www.ebi.ac.uk/arrayexpress/


Datasets - suitable for analysis with GEO tools

Expression profiles by gene

Microarray experiments

Probe sets

Groups of related microarray experiments

Searching for expression profiles in GEO


Download dataset

Clustering

Statistical analysis


Clustering analysis



The expression distribution for different lines in the cluster


Searching for expression profiles in the UCSC Human Genome Browser


Keratin 10 is highly expressed in skin


ArrayExpress: http://www.ebi.ac.uk/arrayexpress/


What can we do with all the expression profiles?

Clusters!

How?

EPCLUST: http://www.bioinf.ebc.ee/EP/EP/EPCLUST/


Edit the input matrix: Transpose, Normalize, Randomize

Hierarchical clustering

K-means clustering

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
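The Transpose and Normalize edits can be illustrated with a short Python sketch. Transposing turns the genes-as-rows matrix from the start of the tutorial into the genes-as-columns layout described here. The z-score normalization (mean 0, standard deviation 1 per gene) is an assumed scheme for illustration; EPCLUST's own Normalize option may use a different one.

```python
import math

def transpose(matrix):
    """Swap rows and columns (genes-as-rows <-> genes-as-columns)."""
    return [list(col) for col in zip(*matrix)]

def zscore(values):
    """Rescale one gene's profile to mean 0 and standard deviation 1."""
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd for v in values] if sd > 0 else [0.0] * n

# Genes as rows, as in the expression data matrix (Gene 1 and Gene 2)
by_gene = [
    [-1.2, -2.1, -3, -1.5, 1.8, 2.9],
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
]
normalized = [zscore(row) for row in by_gene]  # normalize each gene's profile
by_experiment = transpose(normalized)          # rows = experiments, columns = genes
print(len(by_experiment), "rows x", len(by_experiment[0]), "columns")
```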


Clusters

Data



Graphical representation of the cluster

Samples found in cluster


10 clusters, as requested


MeV (MultiExperiment Viewer): http://www.tm4.org/mev/