Lab 5: Unsupervised and supervised clustering
Feb 22nd, 2012
Daniel Fernandez, Alejandro Quiroz




Outline

• Unsupervised
  – Hierarchical clustering
  – Principal component analysis
• Supervised
  – LIMMA package
    • Linear models for microarray data


Before any high-level analysis…

• Download the data set used in lab 4
  – Go to and download GSE10940
• Load the .CEL files and use the custom CDF annotation file used in lab 4: "drosophila2dmrefseqcdf"
• Perform RMA normalization and store the expression intensities in a matrix
  – Obtain the genes that are up- or down-regulated with a fold change of at least 2.
    • Store the gene IDs in: X.top
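The fold-change filter above can be sketched in base R. This is a minimal illustration on simulated data, not the lab's solution: the matrix `X`, the `control`/`mutant` layout in `group`, and the name `top.ids` are assumptions made for the example. Because RMA reports log2 intensities, a fold change of 2 corresponds to a difference of 1 between group means. The later slides pass `X.top` to `dist()`, so the sketch treats `X.top` as the expression matrix restricted to the selected genes (also an assumption).

```r
set.seed(42)

# Simulated stand-in for the RMA output: 100 genes x 6 arrays, log2 scale
X <- matrix(rnorm(600, mean = 8, sd = 0.2), nrow = 100,
            dimnames = list(paste0("gene", 1:100), paste0("array", 1:6)))
group <- rep(c("control", "mutant"), each = 3)   # hypothetical sample layout

# Plant 5 up-regulated and 5 down-regulated genes in the mutant arrays
X[1:5,  group == "mutant"] <- X[1:5,  group == "mutant"] + 2
X[6:10, group == "mutant"] <- X[6:10, group == "mutant"] - 2

# On the log2 scale a fold change of 2 is a difference of 1
logFC   <- rowMeans(X[, group == "mutant"]) - rowMeans(X[, group == "control"])
top.ids <- names(logFC)[abs(logFC) >= 1]

# Expression matrix restricted to the selected genes
X.top <- X[top.ids, , drop = FALSE]
```

With real data, `X` would be the matrix of RMA expression values and `group` would follow the actual experimental design.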


The data set

• Secretory and transmembrane proteins traverse the endoplasmic reticulum (ER) and Golgi compartments for final maturation prior to reaching their functional destinations.
• Members of the p24 protein family function in trafficking some secretory proteins in yeast and higher eukaryotes.
• Yeast p24 mutants have minor secretory defects and induce an ER stress response that likely results from accumulation of proteins in the ER due to disrupted trafficking.
• Test the hypothesis that loss of Drosophila melanogaster p24 protein function causes a transcriptional response characteristic of ER stress activation.


Supervised Method: LIMMA

• Linear Models for MicroArray data
  – A package for differential expression analysis of microarray data.
  – Makes use of linear models to describe the expression of each gene.
  – Uses empirical Bayes and other shrinkage methods to borrow information across genes, making the analyses stable even for experiments with a small number of arrays.


• LIMMA uses linear models to analyze microarray data.
  – The approach requires the definition of 2 matrices
• Design matrix
  – Represents how the different factors are distributed in the data
  – A linear model is assumed: $E[y_j] = X\alpha_j$
  – Where $y_j$ contains the expression for gene $j$ and $X$ is the design matrix
  – The estimates of $\alpha_j$ are provided by lmFit()
• Contrast matrix
  – Allows the definition of the comparisons between factors of interest
  – If the parameters $\beta_j = C^T\alpha_j$ are of interest
    » $C$ is the contrast matrix
  – These parameters are estimated by contrasts.fit()
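To make the two matrices concrete, here is a base-R sketch of what the design and contrast matrices encode. It is not limma itself: the least-squares step stands in for what lmFit() does per gene, and the final multiplication for what contrasts.fit() does. The two-group layout, the simulated matrix `Y`, and the contrast name `mutant_vs_control` are assumptions made for illustration.

```r
set.seed(1)

# Hypothetical layout: 3 control and 3 mutant arrays
group  <- factor(rep(c("control", "mutant"), each = 3))
design <- model.matrix(~ 0 + group)        # design matrix X, one column per group
colnames(design) <- levels(group)

# Simulated log2 expression, 5 genes x 6 arrays; gene 1 is shifted up in mutants
Y <- matrix(rnorm(30, mean = 8, sd = 0.2), nrow = 5)
Y[1, 4:6] <- Y[1, 4:6] + 2

# Least-squares fit of E[y_j] = X alpha_j for all genes at once
alpha <- t(solve(crossprod(design), crossprod(design, t(Y))))  # genes x coefficients
colnames(alpha) <- colnames(design)

# Contrast matrix C: mutant minus control
C <- matrix(c(-1, 1), ncol = 1,
            dimnames = list(colnames(design), "mutant_vs_control"))

# beta_j = C^T alpha_j, one row per gene
beta <- alpha %*% C
```

With this group-means design, each row of `alpha` holds the two group means for a gene, and `beta` is their difference, i.e. the log2 fold change.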


• Given the large number of linear model fits arising from a microarray experiment, there is a pressing need to take advantage of the parallel structure whereby the same model is fitted to each gene
• Using a hierarchical framework, a moderated t-statistic is computed
  – Standard errors are shrunk towards a common value using a Bayesian model
    • This borrows information for the inference of individual genes
    • The degrees of freedom are increased
      – Reflects the greater reliability of the smoothed standard errors
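The shrinkage idea can be illustrated with a short base-R sketch. It assumes a two-group comparison on simulated data and fixes the prior degrees of freedom `d0` and prior variance `s02` by hand; limma estimates both from the full set of genes, so the numbers here are illustrative only.

```r
set.seed(2)
n_genes <- 200
n1 <- 3; n2 <- 3                           # arrays per group (hypothetical)
Y  <- matrix(rnorm(n_genes * (n1 + n2), mean = 8, sd = 0.3), nrow = n_genes)
g1 <- 1:n1
g2 <- (n1 + 1):(n1 + n2)

diff_means <- rowMeans(Y[, g2]) - rowMeans(Y[, g1])

# Pooled residual variance per gene, with d residual degrees of freedom
d  <- n1 + n2 - 2
s2 <- (apply(Y[, g1], 1, var) * (n1 - 1) +
       apply(Y[, g2], 1, var) * (n2 - 1)) / d
v  <- 1 / n1 + 1 / n2

# Assumed prior values, chosen by hand for this illustration
d0  <- 4
s02 <- mean(s2)

# Each gene's variance is pulled towards the common value s02
s2_post <- (d0 * s02 + d * s2) / (d0 + d)

t_ord <- diff_means / sqrt(s2 * v)         # ordinary t, d degrees of freedom
t_mod <- diff_means / sqrt(s2_post * v)    # moderated t, d0 + d degrees of freedom
p_mod <- 2 * pt(-abs(t_mod), df = d0 + d)
```

The shrunken variances are less spread out than the per-gene variances, and the moderated statistic is referred to a t-distribution with `d0 + d` rather than `d` degrees of freedom.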


Unsupervised Method: Hierarchical clustering

• Hierarchical clustering
  – First, calculate all the pairwise distances between samples
    • D=dist(t(X.top))
  – Then perform the hierarchical clustering
    • H1=hclust(D,method="single")
    • H2=hclust(D,method="complete")
    • H3=hclust(D,method="average")
    • plot(Hi), for i = 1, 2, 3
• Is there anything odd about the clustering?
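The clustering steps can be run end to end on a simulated stand-in for `X.top`. The matrix dimensions, the sample names, and the use of `cutree()` to extract two clusters are assumptions made for the sketch; with the lab data, `X.top` would hold the selected genes from the RMA output.

```r
set.seed(3)

# Simulated stand-in for X.top: 20 genes x 6 arrays, mutants shifted by 2
X.top <- matrix(rnorm(120, mean = 8, sd = 0.3), nrow = 20,
                dimnames = list(paste0("gene", 1:20),
                                c(paste0("control", 1:3), paste0("mutant", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 2

# dist() works on rows, so transpose to get distances between arrays
D <- dist(t(X.top))

# One tree per linkage method
H <- lapply(c(single = "single", complete = "complete", average = "average"),
            function(m) hclust(D, method = m))

# Draw the three dendrograms side by side
op <- par(mfrow = c(1, 3))
for (m in names(H)) plot(H[[m]], main = m)
par(op)

# Cut the complete-linkage tree into two groups of arrays
clusters <- cutree(H$complete, k = 2)
```

Comparing the three dendrograms shows how much the grouping of arrays depends on the linkage method.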


Unsupervised Method: MDS

• Multidimensional scaling (MDS) is a set of related statistical techniques to explore similarities in data*.

*Source: Wikipedia.
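One common variant, classical MDS, is available in base R as `cmdscale()`. The sketch below applies it to distances between arrays; the slide does not say which MDS variant or function the lab uses, so the choice of `cmdscale()` and the simulated `X.top` are assumptions.

```r
set.seed(4)

# Simulated stand-in for X.top: 20 genes x 6 arrays, mutants shifted by 2
X.top <- matrix(rnorm(120, mean = 8, sd = 0.3), nrow = 20,
                dimnames = list(paste0("gene", 1:20),
                                c(paste0("control", 1:3), paste0("mutant", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 2

# Distances between arrays, then a 2-dimensional classical MDS embedding
D   <- dist(t(X.top))
mds <- cmdscale(D, k = 2)

# Plot the arrays by name in the reduced space
plot(mds, type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2")
text(mds, labels = rownames(mds))
```

Arrays that are close in the original distance matrix end up close in the two-dimensional plot.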


Unsupervised Method: Principal component analysis

• In R, the function prcomp performs principal component analysis
• In our context, the idea is to visualize the impact of possible dimension reduction in GENES
  – Important: Remember that in prcomp, the genes have to be columns and the samples rows.
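A minimal sketch of the prcomp call, assuming `X.top` is a genes-by-arrays matrix as in the clustering slide, so it has to be transposed first. The simulated data and the choice to center without scaling are assumptions for the example.

```r
set.seed(5)

# Simulated stand-in for X.top: 20 genes x 6 arrays, mutants shifted by 2
X.top <- matrix(rnorm(120, mean = 8, sd = 0.3), nrow = 20,
                dimnames = list(paste0("gene", 1:20),
                                c(paste0("control", 1:3), paste0("mutant", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 2

# prcomp expects samples in rows and genes in columns, hence t()
pca <- prcomp(t(X.top), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Arrays plotted on the first two principal components
plot(pca$x[, 1], pca$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x))
```

`pca$x` has one row per array, and `var_explained` indicates how many components are needed to capture most of the variation across genes.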
