Tutorial 7: Gene expression analysis

Page 1: Tutorial 7

Tutorial 7

Gene expression analysis

Page 2: Tutorial 7

Gene expression analysis

• Expression data
  – GEO
  – UCSC
  – ArrayExpress
• General clustering methods
  – Unsupervised clustering
    • Hierarchical clustering
    • K-means clustering
• Tools for clustering
  – EPCLUST
  – MeV
• Functional analysis
  – GO annotation

Page 3: Tutorial 7

Gene expression data sources

• Microarrays
• RNA-seq experiments

Page 4: Tutorial 7

Expression Data Matrix

• Each column represents all the gene expression levels from a single experiment.

• Each row represents the expression of a gene across all experiments.

Exp1 Exp2 Exp3 Exp4 Exp5 Exp6
Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1
Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3
Gene 5 0.1 2.6 2.2 2.7 -2.1
Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9

Page 5: Tutorial 7

Expression Data Matrix

Each element is a log ratio: log2(T/R).

• T - the gene expression level in the test sample
• R - the gene expression level in the reference sample
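The log ratio above can be computed directly. The following is a minimal sketch, assuming raw expression levels are positive numbers; the function name `log_ratio` and the example values are illustrative and do not come from the tutorial.

```python
import math

def log_ratio(test_level, reference_level):
    """Return log2(T/R) for one gene in one experiment."""
    if test_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test_level / reference_level)

# Hypothetical expression levels, chosen only to show the sign convention.
print(log_ratio(400.0, 100.0))  # 2.0  (T is four times R)
print(log_ratio(100.0, 100.0))  # 0.0  (T equals R)
print(log_ratio(50.0, 100.0))   # -1.0 (T is half of R)
```

A twofold increase and a twofold decrease give +1 and -1, so up- and down-regulation are symmetric around zero on this scale.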

Page 6: Tutorial 7

Expression Data Matrix

• Black indicates a log ratio of zero, i.e. T ≈ R
• Green indicates a negative log ratio, i.e. T < R
• Red indicates a positive log ratio, i.e. T > R
• Grey indicates missing data
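The colour convention can be written as a small lookup. This is only a sketch: the function name `heatmap_color`, the use of `None` for a missing value, and the `tolerance` that decides what counts as "approximately zero" are assumptions made for illustration.

```python
def heatmap_color(log_ratio, tolerance=0.1):
    """Map one log ratio to the colour used in the expression matrix."""
    if log_ratio is None:            # missing data
        return "grey"
    if abs(log_ratio) <= tolerance:  # T is approximately equal to R
        return "black"
    return "red" if log_ratio > 0 else "green"

row = [-1.2, 0.0, 2.9, None]
print([heatmap_color(v) for v in row])
# ['green', 'black', 'red', 'grey']
```

Real heat maps usually shade the colour by the magnitude of the log ratio; the three fixed colours here only capture the sign.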

Page 7: Tutorial 7

Microarray data: different representations

[Figure: two panels with axes labelled Exp and Log ratio; regions labelled T<R and T>R]

Page 8: Tutorial 7

How to search for expression profiles

• GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/
• Human genome browser: http://genome.ucsc.edu/
• ArrayExpress: http://www.ebi.ac.uk/arrayexpress/

Page 10: Tutorial 7

Searching for expression profiles in GEO

• Datasets - suitable for analysis with GEO tools
• Expression profiles by gene
• Microarray experiments
• Probe sets
• Groups of related microarray experiments

Page 11: Tutorial 7

• Download dataset
• Clustering
• Statistical analysis

Page 12: Tutorial 7

Clustering analysis

Page 13: Tutorial 7

• Download dataset
• Clustering
• Statistical analysis

Page 14: Tutorial 7

The expression distribution for different lines in the cluster

Page 15: Tutorial 7

Searching for expression profiles in the Human Genome browser.

Page 16: Tutorial 7

Keratin 10 is highly expressed in skin

Page 17: Tutorial 7

ArrayExpress

http://www.ebi.ac.uk/arrayexpress/

Page 22: Tutorial 7

How to analyze gene expression data

Page 23: Tutorial 7

Unsupervised Clustering - Hierarchical Clustering

Page 24: Tutorial 7

Hierarchical clustering

Genes with similar expression patterns are grouped together and are connected by a series of branches (a dendrogram).

Leaves (shapes in our case) represent genes, and the length of the path between two leaves represents the distance between those genes.

[Figure: dendrogram with leaves labelled 1 to 6]

Page 25: Tutorial 7

How to determine the similarity between two genes? (for clustering)

Patrik D'haeseleer, How does gene expression clustering work?, Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
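Two measures that are commonly used to compare expression profiles are Euclidean distance and Pearson correlation. The slide does not say which measure the tutorial has in mind, so the sketch below is an illustration under that assumption; the function names and the three toy profiles are hypothetical.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Correlation of two profiles: +1 same shape, -1 opposite shape."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same trend as gene_a, twice the amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite trend

print(euclidean_distance(gene_a, gene_b))
print(pearson_correlation(gene_a, gene_b))
print(pearson_correlation(gene_a, gene_c))
```

Euclidean distance treats gene_a and gene_b as far apart because their magnitudes differ, while Pearson correlation treats them as maximally similar because their shapes match. Which behaviour is wanted depends on the biological question.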

Page 26: Tutorial 7

Hierarchical clustering finds an entire hierarchy of clusters.

If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).
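As a rough sketch of the idea, the code below builds the hierarchy bottom-up by repeatedly merging the two closest clusters and stops once the requested number of clusters remains, which has the same effect as cutting the finished tree at that level. Average linkage with Euclidean distance is an assumption made here; the tutorial does not state which linkage its tools use, and all names and data are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]  # every gene starts alone
    merges = []  # (members_a, members_b, distance): the dendrogram branches
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Six toy profiles measured in two experiments.
profiles = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0],
            [5.0, 5.3], [10.0, 0.0], [10.0, 0.5]]
clusters, merges = hierarchical_clusters(profiles, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
# [[0, 1], [2, 3], [4, 5]]
```

Running the loop down to a single cluster records every merge, and the recorded distances correspond to the branch heights of the dendrogram.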

Page 27: Tutorial 7

Hierarchical clustering result

Five clusters

Page 28: Tutorial 7

Unsupervised Clustering - K-means clustering

An algorithm that partitions the data into K groups.

[Figure: example with K=4]

Page 29: Tutorial 7

How does it work?

The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a set of K groups in which each gene is assigned to its nearest center; this is a local optimum and can depend on the initial centers.

1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
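The four steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used by any of the tools in this tutorial: Euclidean distance is assumed, and the optional `initial_means` argument is added here only so that the example run is reproducible.

```python
import math
import random

def nearest(point, means):
    """Index of the mean closest to point (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))

def centroid(points):
    """Coordinate-wise average of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, k, initial_means=None, max_iter=100):
    # Step 1: pick k initial means at random from the data unless given.
    means = list(initial_means) if initial_means else random.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign every observation to its nearest mean.
        labels = [nearest(p, means) for p in points]
        # Step 3: the centroid of each cluster becomes the new mean.
        new_means = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            new_means.append(centroid(members) if members else means[i])
        # Step 4: repeat until the means stop changing.
        if new_means == means:
            break
        means = new_means
    return labels, means

points = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0],
          [5.0, 5.3], [10.0, 0.0], [10.0, 0.5]]
labels, means = k_means(points, k=3,
                        initial_means=[points[0], points[2], points[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

With random starting means the cluster numbering, and sometimes the grouping itself, can change between runs, which is why k-means is often run several times.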

Page 30: Tutorial 7

How should we determine K?

• Trial and error
• Take K as the square root of the number of genes
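The square-root rule is simple arithmetic. The helper below is a hypothetical sketch of it; rounding to the nearest integer and the lower bound of 1 are choices made here, and the result is only a starting point for trial and error.

```python
import math

def suggested_k(n_genes):
    """Rule-of-thumb starting value: K is about the square root of the gene count."""
    return max(1, round(math.sqrt(n_genes)))

print(suggested_k(100))  # 10
print(suggested_k(6))    # 2
```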

Page 31: Tutorial 7

Tools for clustering - EPCLUST

http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

Page 38: Tutorial 7

Edit the input matrix: Transpose, Normalize, Randomize

Hierarchical clustering

K-means clustering

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
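If the data are stored with genes as rows, as in the expression data matrix shown earlier, they have to be transposed to match this layout. The sketch below shows a transpose and one possible normalization. What the tool's own Normalize option computes is not stated in the tutorial, so scaling each gene to mean 0 and standard deviation 1 is an assumption made here, and the function names are hypothetical.

```python
import statistics

def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]

def normalize_rows(matrix):
    """Scale each row to mean 0 and standard deviation 1 (z-scores)."""
    result = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        result.append([(v - mean) / sd if sd else 0.0 for v in row])
    return result

# Rows are genes, columns are experiments (first two rows of the example matrix).
genes_by_experiments = [
    [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],  # Gene 1
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],   # Gene 2
]
normalized = normalize_rows(genes_by_experiments)
experiments_by_genes = transpose(normalized)
print(len(experiments_by_genes), len(experiments_by_genes[0]))  # 6 2
```

Normalizing before transposing keeps the scaling per gene, so genes with large and small absolute changes contribute comparably to the distance calculation.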

Page 39: Tutorial 7

Hierarchical clustering

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).

Page 40: Tutorial 7

Clusters

Data

Page 41: Tutorial 7

K-means clustering

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).

Page 42: Tutorial 7

Graphical representation of the cluster

Samples found in cluster

Page 43: Tutorial 7

10 clusters, as requested

Page 44: Tutorial 7

Tools for clustering - MeV

http://www.tm4.org/mev/

Page 45: Tutorial 7

Gene expression function analysis

What can we learn from clusters?

1007_s_at, 1053_at, 117_at, 121_at, 1255_g_at, 1294_at, 1316_at, 1320_at, 1405_i_at, 1431_at, 1438_at, 1487_at, 1494_f_at, 1598_g_at

Page 46: Tutorial 7

Gene Ontology (GO)

http://www.geneontology.org/

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

Page 47: Tutorial 7

Gene Ontology (GO)

• Cellular Component (CC) - the parts of a cell or its extracellular environment.
• Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.
• Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

Page 48: Tutorial 7

The GO tree

Page 49: Tutorial 7

GO sources

• ISS - Inferred from Sequence/Structural Similarity
• IDA - Inferred from Direct Assay
• IPI - Inferred from Physical Interaction
• TAS - Traceable Author Statement
• NAS - Non-traceable Author Statement
• IMP - Inferred from Mutant Phenotype
• IGI - Inferred from Genetic Interaction
• IEP - Inferred from Expression Pattern
• IC - Inferred by Curator
• ND - No Data available
• IEA - Inferred from Electronic Annotation
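When working with annotation files it can help to keep these codes in a lookup table, for example to set aside purely electronic annotations (IEA). The sketch below is illustrative only: the gene and term labels are hypothetical placeholders, and filtering on IEA is one possible choice rather than a recommendation from the tutorial.

```python
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

def describe(code):
    """Full name of an evidence code, or a fallback for unknown codes."""
    return EVIDENCE_CODES.get(code, "unknown evidence code")

def drop_electronic(annotations):
    """Keep annotations whose evidence code is not IEA."""
    return [a for a in annotations if a[2] != "IEA"]

# (gene, GO term, evidence code): placeholder labels, not real annotations.
annotations = [
    ("gene_A", "term_1", "IDA"),
    ("gene_A", "term_2", "IEA"),
    ("gene_B", "term_1", "TAS"),
]
for gene, term, code in drop_electronic(annotations):
    print(gene, term, code, "-", describe(code))
```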

Page 50: Tutorial 7

Search by AmiGO

Page 51: Tutorial 7

Results for alpha-synuclein

Page 52: Tutorial 7

DAVID

Functional Annotation Bioinformatics Microarray Analysis

• Identify enriched biological themes, particularly GO terms
• Discover enriched functional-related gene/protein groups
• Cluster redundant annotation terms
• Explore gene names in batch

http://david.abcc.ncifcrf.gov/
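To give a feel for what "enriched" means, the sketch below computes a plain hypergeometric tail probability: the chance of seeing at least the observed number of genes with a given GO term in a cluster if the cluster had been drawn at random. This is an illustrative stand-in, not DAVID's own implementation, and its scoring may differ; the function name and all numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes from `population`,
    of which `annotated` carry the GO term (hypergeometric tail)."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total

# Hypothetical numbers: 20 genes on the array, 5 carry the term,
# the cluster holds 4 genes and 3 of them carry the term.
p = enrichment_p_value(population=20, annotated=5, sample=4, hits=3)
print(round(p, 4))  # 0.032
```

A small probability suggests the term is over-represented in the cluster. When many terms are tested at once, the p-values need a multiple-testing correction before they are interpreted.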

Page 53: Tutorial 7

ID conversion

annotation

classification

Page 54: Tutorial 7

Functional annotation

Upload

Annotation options

Page 57: Tutorial 7

Gene expression analysis

• Expression data
  – GEO
  – UCSC
  – ArrayExpress
• General clustering methods
  – Unsupervised clustering
    • Hierarchical clustering
    • K-means clustering
• Tools for clustering
  – EPCLUST
  – MeV
• Functional analysis
  – GO annotation