Tutorial 7
Gene expression analysis
• Expression data
  – GEO
  – UCSC
  – ArrayExpress
• General clustering methods
  – Unsupervised clustering
    • Hierarchical clustering
    • K-means clustering
• Tools for clustering
  – EPCLUST
  – MeV
• Functional analysis
  – GO annotation
Gene expression data sources
Microarrays and RNA-seq experiments
Expression Data Matrix
• Each column represents all the gene expression levels from a single experiment.
• Each row represents the expression of a gene across all experiments.
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6
Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1
Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3
Gene 5 0.1 2.6 2.2 2.7 -2.1
Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9
Each element is a log ratio, log2(T/R):
• T - the gene expression level in the test sample
• R - the gene expression level in the reference sample
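As a rough illustration of the log ratio defined above, the sketch below computes log2(T/R) for a few intensity pairs. The function name `log_ratio` and the intensity values are hypothetical, chosen only to show how the sign follows from whether T is larger or smaller than R; returning None for missing or non-positive measurements is an assumption, not something the slides specify.

```python
import math


def log_ratio(test_level, reference_level):
    """Return log2(T/R), or None when a measurement is missing or non-positive."""
    if test_level is None or reference_level is None:
        return None
    if test_level <= 0 or reference_level <= 0:
        return None
    return math.log2(test_level / reference_level)


# Hypothetical intensities, used only to illustrate the sign of the ratio.
print(log_ratio(800.0, 100.0))  # T > R gives a positive value: 3.0
print(log_ratio(100.0, 100.0))  # T = R gives 0.0
print(log_ratio(100.0, 400.0))  # T < R gives a negative value: -2.0
print(log_ratio(None, 400.0))   # missing measurement: None
```

A log ratio of +1 therefore means a two-fold increase relative to the reference, and -1 a two-fold decrease.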
Expression Data Matrix
• Black indicates a log ratio of zero, i.e. T ≈ R
• Green indicates a negative log ratio, i.e. T < R
• Red indicates a positive log ratio, i.e. T > R
• Grey indicates missing data
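The colour convention above can be sketched as a simple lookup. This is a simplification: real heat maps shade red and green by magnitude, while the hypothetical function `heatmap_colour` below only reports the colour family. The `tolerance` parameter and the example row are assumptions made for illustration.

```python
def heatmap_colour(log_ratio, tolerance=0.0):
    """Map one log ratio to the colour family used in the heat map."""
    if log_ratio is None:           # missing data
        return "grey"
    if abs(log_ratio) <= tolerance:  # T is (about) equal to R
        return "black"
    return "red" if log_ratio > 0 else "green"


# Hypothetical row of log ratios; None marks a missing measurement.
row = [-1.2, 0.0, 2.9, None]
print([heatmap_colour(value) for value in row])
# ['green', 'black', 'red', 'grey']
```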
Microarray Data: Different representations
[Figure: two plots with axes labelled Exp and Log ratio; regions labelled T<R and T>R]
How to search for expression profiles
• GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/
• Human genome browser: http://genome.ucsc.edu/
• ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
Datasets - suitable for analysis with GEO tools
Expression profiles by gene
Microarray experiments
Probe sets
Groups of related microarray experiments
Searching for expression profiles in GEO
Download dataset
Clustering
Statistical analysis
Clustering analysis
The expression distribution for different lines in the cluster
Searching for expression profiles in the Human Genome Browser
Keratin 10 is highly expressed in skin
ArrayExpress
http://www.ebi.ac.uk/arrayexpress/
How to analyze gene expression data
Unsupervised Clustering - Hierarchical Clustering
Genes with similar expression patterns are grouped together and connected by a series of branches (a dendrogram).
Hierarchical Clustering
Leaves (shapes in our case) represent genes, and the length of the path between two leaves represents the distance between those genes.
How do we determine the similarity between two genes (for clustering)?
Patrik D'haeseleer, "How does gene expression clustering work?", Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
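The slides pose the similarity question without naming a measure. Two commonly used choices are the Euclidean distance and a Pearson-correlation distance (1 - r). The sketch below is one possible illustration, not necessarily the measure used in the tutorial. The function names are hypothetical, and the two profiles are the Gene 1 and Gene 2 rows of the expression matrix shown earlier.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape.

    Assumes neither profile is constant (a constant profile has zero variance).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    variance_a = sum((x - mean_a) ** 2 for x in a)
    variance_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - covariance / math.sqrt(variance_a * variance_b)


# Gene 1 and Gene 2 rows from the expression matrix (log ratios).
gene_1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene_2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]

print("Euclidean:", euclidean_distance(gene_1, gene_2))
print("Pearson distance:", pearson_distance(gene_1, gene_2))
```

Euclidean distance is sensitive to the magnitude of the log ratios, while the correlation-based distance compares only the shape of the two profiles, so the two measures can group genes differently.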
Hierarchical clustering finds an entire hierarchy of clusters.
If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).
Hierarchical clustering result
Five clusters
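A minimal sketch of the idea, assuming Euclidean distance and average linkage (the slides do not say which linkage the tools use): clusters are merged bottom-up, and stopping when k clusters remain gives the same grouping as cutting the finished tree at the level that yields k clusters. The function names and the data points are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise Euclidean distance between the members of two clusters."""
    distances = [math.dist(profiles[i], profiles[j])
                 for i in cluster_a for j in cluster_b]
    return sum(distances) / len(distances)


def agglomerative_clusters(profiles, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[index] for index in range(len(profiles))]  # one leaf per gene
    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], profiles),
        )
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    return clusters


# Hypothetical profiles: five "genes" measured in two experiments.
profiles = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
print(agglomerative_clusters(profiles, 3))
# [[0, 1], [2, 3], [4]]
```

Recording the distance at which each merge happens would give the branch heights of the dendrogram; this sketch keeps only the final grouping.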
Unsupervised Clustering – K-means clustering
An algorithm that partitions the data into K groups.
K=4
How does it work?
The algorithm iteratively assigns the genes to K groups and recalculates the center of each group. The result is a partition in which every gene belongs to the cluster with the nearest center; it depends on the initial centers and is not guaranteed to be the global optimum.
1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
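The four steps can be sketched in a few lines of plain Python. This is an illustrative sketch, not the implementation used by EPCLUST or MeV. The function names, the `seed` and `max_iterations` parameters, and the data points are assumptions, and Euclidean distance is assumed as the distance measure.

```python
import math
import random


def nearest_mean(point, means):
    """Index of the mean closest to the point (Euclidean distance)."""
    return min(range(len(means)), key=lambda c: math.dist(point, means[c]))


def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Step 1: pick k initial means at random from the data set.
    means = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iterations):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [nearest_mean(p, means) for p in points]
        # Step 4: stop once the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes its new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means


# Hypothetical profiles: six "genes" measured in two experiments.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

Because the initial means are random, different seeds can give different partitions on less clearly separated data, which is why the algorithm is often run several times.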
How should we determine K?
• Trial and error
• Take K as the square root of the number of genes
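The square-root rule of thumb is simple arithmetic. The sketch below uses a hypothetical helper, `rule_of_thumb_k`; rounding to the nearest integer and enforcing a minimum of one cluster are assumptions, since the slides give only the rule itself.

```python
import math


def rule_of_thumb_k(number_of_genes):
    """Square root of the gene count, rounded, and never less than 1."""
    return max(1, round(math.sqrt(number_of_genes)))


print(rule_of_thumb_k(100))  # 10
print(rule_of_thumb_k(400))  # 20
```

The value is only a starting point for the trial-and-error approach: nearby values of K are then compared by inspecting the resulting clusters.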
Tools for clustering - EPCLUST
http://www.bioinf.ebc.ee/EP/EP/EPCLUST/
Edit the input matrix: Transpose, Normalize, Randomize
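To give a feel for what the three matrix edits might do, here is a sketch in plain Python. The slides do not describe how EPCLUST implements them, so the choices below are assumptions: normalization is taken to be a per-row z-score, and randomization a shuffle of the values within each row. All function names are hypothetical. The data are the Gene 1 and Gene 2 rows of the expression matrix shown earlier.

```python
import random
import statistics


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def normalize_rows(matrix):
    """Assumed normalization: z-score each row (mean 0, standard deviation 1)."""
    result = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        # A constant row has sd == 0; map it to zeros instead of dividing.
        result.append([(v - mean) / sd if sd else 0.0 for v in row])
    return result


def randomize_rows(matrix, seed=0):
    """Assumed randomization: shuffle the values within each row."""
    rng = random.Random(seed)
    return [rng.sample(row, len(row)) for row in matrix]


# Gene 1 and Gene 2 rows of the expression matrix (rows = genes).
genes_by_experiment = [
    [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
]

experiments_by_gene = transpose(genes_by_experiment)  # rows = experiments
print(len(experiments_by_gene), "rows x", len(experiments_by_gene[0]), "columns")
print(normalize_rows(genes_by_experiment)[0])
print(randomize_rows(genes_by_experiment)[0])
```

Clustering a randomized copy of the matrix is one way to check whether the clusters found in the real data are more structured than chance would produce.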
Hierarchical clustering
K-means clustering
In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
Hierarchical clustering
Clusters
Data
K-means clustering
Graphical representation of the cluster
Samples found in cluster
10 clusters, as requested
Tools for clustering - MeV
http://www.tm4.org/mev/
Probe set IDs: 1007_s_at, 1053_at, 117_at, 121_at, 1255_g_at, 1294_at, 1316_at, 1320_at, 1405_i_at, 1431_at, 1438_at, 1487_at, 1494_f_at, 1598_g_at
What can we learn from clusters?
Gene expression function analysis
Gene Ontology (GO)
http://www.geneontology.org/
The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:
• Cellular Component (CC) - the parts of a cell or its extracellular environment.
• Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.
• Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.
Gene Ontology (GO)
The GO tree
GO sources
ISS - Inferred from Sequence/Structural Similarity
IDA - Inferred from Direct Assay
IPI - Inferred from Physical Interaction
TAS - Traceable Author Statement
NAS - Non-traceable Author Statement
IMP - Inferred from Mutant Phenotype
IGI - Inferred from Genetic Interaction
IEP - Inferred from Expression Pattern
IC - Inferred by Curator
ND - No Data available
IEA - Inferred from Electronic Annotation
Search by AmiGO
Results for alpha-synuclein
DAVID
Functional Annotation Bioinformatics Microarray Analysis
• Identify enriched biological themes, particularly GO terms
• Discover enriched functionally related gene/protein groups
• Cluster redundant annotation terms
• Explore gene names in batch
http://david.abcc.ncifcrf.gov/
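"Enriched" means that a GO term appears in a gene cluster more often than expected by chance. One standard way to quantify this is the upper tail of the hypergeometric distribution, sketched below. DAVID's own statistic may differ (its documentation describes a modified Fisher exact test, the EASE score), so this is an illustration of the idea rather than DAVID's method. The function name and all the counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population_size, population_hits, sample_size, sample_hits):
    """P(X >= sample_hits) when sample_size genes are drawn without replacement
    from population_size genes, of which population_hits carry the GO term."""
    total = comb(population_size, sample_size)
    largest_possible = min(population_hits, sample_size)
    tail = sum(
        comb(population_hits, x)
        * comb(population_size - population_hits, sample_size - x)
        for x in range(sample_hits, largest_possible + 1)
    )
    return tail / total


# Hypothetical counts: 20000 genes on the array, 200 annotated with the term,
# a cluster of 50 genes of which 5 carry the term.
print(enrichment_p_value(20000, 200, 50, 5))
```

When many GO terms are tested for the same cluster, the resulting p-values should also be corrected for multiple testing.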
ID conversion
annotation
classification
Functional annotation
Upload
Annotation options