A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data
A.L. Tarca, J.E.K. Cooke and J. MacKay
Presented by Dana Mohamed
Microarrays
Importance of Microarrays (and that the data is correct)
• Assumption: microarray data linearly reflects the amount of mRNA present in the cell
  – In turn, this reflects gene expression levels
• If the data is incorrect,
  – so is our interpretation of gene expression
• And therefore all the science built on that interpretation is also incorrect
Where the error is
• Intensity of fluorescence
  – Overall imbalance of dye intensity
    • 2 dyes: Cy5 (R) and Cy3 (G)
    • If R and G are expressed at equal levels, R/G = 1
• Space
  – Intensities vary with spot coordinates
    • Can be “dirty” on the sides of the microarray
Previous Methods
• Many address intensity bias
• Few address spatial bias
• Most rely on M* = M - m
  – M* is the normalized value
  – M is the raw log-ratio, M = log2(R/G)
  – m is the estimate of the bias
Important Variables
• M = log2(R/G)
  – The log-ratio converts multiplicative error to additive error
• A = (1/2) log2(RG)
  – Average of the log-intensities
• Minus-add (MA) plots
  – M vs. A
  – Useful for assessing systematic bias
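As a small illustration of the two variables above, the sketch below computes M and A for a single spot from its red (Cy5) and green (Cy3) intensities. The function name `ma_values` and the sample intensities are hypothetical, chosen only for this example.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from its red (Cy5) and green (Cy3) intensities."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    m = math.log2(red / green)        # log-ratio: M = log2(R/G)
    a = 0.5 * math.log2(red * green)  # average log-intensity: A = (1/2) log2(RG)
    return m, a


# Equal channels give R/G = 1, so M is 0.
print(ma_values(1024.0, 1024.0))
# Red four times as bright as green gives M = 2 at the same A.
print(ma_values(2048.0, 512.0))
```

Plotting M against A for every spot on a slide gives the MA plot; an unbiased slide would be expected to scatter around M = 0 across the whole range of A.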
Calculating m in other methods
• gMed – global median normalization
  – m = median(Mi)
  – Mi are all the values of M
• pLo – print tip loess
  – m = ci(A)
  – ci(A) is the loess estimate of M using A inside the ith print tip
• pLoGS – found in GeneSight (biodiscovery.com)
  – Local group median (3x3 square regions) + print tip loess
• cPLo2D – print tip loess + pure 2D normalization
  – BioConductor (bioconductor.org)
  – m = α ci(A) + β ci(SpotRow, SpotCol)
  – ci(SpotRow, SpotCol) is the loess estimate of M using spot row and column coordinates inside the ith print tip
• gLoMedF – global loess normalization + spatial median filter
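The simplest of these methods, gMed, can be sketched in a few lines: the bias estimate m is the median of all raw log-ratios, and the normalized values follow from M* = M - m. The function name `normalize_global_median` and the sample values are hypothetical. The loess-based methods are not shown here, since they need a local regression fit in place of the median.

```python
from statistics import median


def normalize_global_median(m_values):
    """Apply M* = M - m, where m is the median of all raw log-ratios."""
    bias = median(m_values)  # m = median(Mi)
    return [m - bias for m in m_values]


raw_log_ratios = [0.5, 0.75, 0.25, 1.5, 0.5]  # made-up values for illustration
print(normalize_global_median(raw_log_ratios))
```

Because one constant is subtracted from every spot, this corrects only an overall dye imbalance; it cannot remove bias that changes with A or with position on the slide.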
Robust Neural Networks Technique
pNN2DA – print tip robust neural nets 2D and A
  – Attempts to find the best fit of M using A and the 2-D space coordinates of the spots: m = ci(A, X, Y)
• Instead of using individual print tips, use 3x3 “bins” of them
  – X and Y
  – Accounts for spatial bias
Neural Nets Terminology
• Uses multi-layer feedforward network
• Sigmoid Function
Neural Networks
• Uses multi-layer feedforward network
  – x is the vector (X, Y, A, 1)
  – I = 3
  – w are the weights
  – σ1 represents the hidden neurons, which are sigmoid functions
  – σ2 is the single neuron in the output layer, which is also sigmoid
  – the (J+1)th σ1 term accounts for the second-layer bias
  – J represents the number of neurons in the hidden layer of the network
Multi-layered Feedforward
Usually, J = 3, to take care of outliers while also avoiding over-fitting
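The forward pass of such a network can be sketched as follows, using the structure described on the slides: an input vector (X, Y, A, 1), J sigmoid hidden neurons, a constant (J+1)th hidden term carrying the second-layer bias, and one sigmoid output neuron. This is a minimal sketch, not the authors' implementation. The function names and weight layout are assumptions made for the example, and the robust fitting of the weights is not shown.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used for both the hidden and the output neurons."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, y, a, hidden_weights, output_weights):
    """Evaluate the network for one spot.

    hidden_weights: J rows of 4 weights each, for the inputs (X, Y, A, 1).
    output_weights: J + 1 weights; the last one multiplies the constant
                    term that carries the second-layer bias.
    (This weight layout is an assumption made for this sketch.)
    """
    inputs = (x, y, a, 1.0)
    hidden = [
        sigmoid(sum(w * v for w, v in zip(row, inputs)))
        for row in hidden_weights
    ]
    hidden.append(1.0)  # constant (J+1)th term for the second-layer bias
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


# J = 3 hidden neurons with all-zero weights: the output is sigmoid(0).
J = 3
zero_hidden = [[0.0] * 4 for _ in range(J)]
zero_output = [0.0] * (J + 1)
print(forward(0.2, 0.4, 0.6, zero_hidden, zero_output))
```

Since a sigmoid output always lies strictly between 0 and 1, the target M would presumably need to be rescaled into that range before fitting. The slides do not say how this is done.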
Criteria & Datasets
Criteria:
a) reduce variability of log-ratios between replicated slides and within slides
b) ability to distinguish truly regulated genes from the other genes
Datasets:
1) Apo AI: a,b
2) Swirl Zebra Fish: a
3) Poplar experiment: a
4) Perturbed Apo AI: b
Classic Neural Nets vs. Robust NNets
Criteria refresher
• The ability to reduce the variability of log-ratios between replicated slides and within slides
• The ability to distinguish truly regulated genes from the other genes
Impact on Variability
Cont. – 3 Data Sets
Downregulated Gene Sorting – Apo AI set
DRGS – Perturbed Apo AI set
Spatial Uniformity of M values distribution
Results Table
Strengths/Weaknesses
• Seems promising
• Uses multiple tests to determine efficacy
• Doesn’t use enough datasets
• Uses patterned perturbed dataset
  – But no “real” perturbed dataset
Future Work
• More datasets
• When should this normalization technique be used over other techniques?
• Should this technique be combined with elements of other techniques to further improve it?
References
• Tarca, A.L., J.E.K. Cooke, and J. MacKay. “A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data.” Bioinformatics, Jun 2005; 21: 2674-2683.
• Haykin, Simon. Neural Networks: A Comprehensive Foundation. New Jersey: Prentice Hall, 1999.
• Mount, David W. Bioinformatics: Sequence and Genome Analysis. New York: Cold Spring Harbor Laboratory Press, 2001.