Genomic Signal Processing: Ensemble Dependence Model for
Classification and Prediction of Cancer Based on Gene Expression Data
Joseph DePasquale
Engineering Frontiers
26 Apr 07
Overview
• Motivation
• Background
  – Genes, cancer, DNA microarrays
• Ensemble Dependence Model
  – Basic structure
  – Inclusion in a classification system
• Results
• Conclusions
Motivation
• Estimated 1.4 million new cancer cases in the U.S. in 2006 [5]
  – Roughly 550,000 expected deaths from the disease
• New Jersey: 43,910 new cases and 17,720 deaths
• NIH estimated the overall cost of cancer in 2005 at about $210 billion
Background
• What is cancer?
  – Uncontrolled division of damaged cells
• Apoptosis (programmed cell death)
  – Risk increases with age
• Cause of unregulated cell growth
Background
• What is a gene?
  – Components
  – Functionality
• Why are proteins important?
  – Essential to all living things
  – Participate in virtually every process within cells
• What is the significance of gene products?
DNA Microarrays
• Expression profiling
  – Captures the simultaneous activity of thousands of individual genes
• Publicly available data
  – Its complexity has created a need to standardize the experimental setup
    • MIAME (Minimum Information About a Microarray Experiment)
    • MAQC (MicroArray Quality Control project)
Taken from: http://en.wikipedia.org/wiki/DNA_microarray
Ensemble Dependence Model
• Genes with similar expression profiles are grouped into clusters
  – The expression profile of each cluster is the average profile of all genes in that cluster
Taken from: P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
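The averaging step on this slide can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: it assumes each gene has already been assigned an integer cluster label 0..K-1 (the slides use a Gaussian mixture model for that step), and the function name `cluster_profiles` and the toy data are hypothetical.

```python
import numpy as np

def cluster_profiles(expr: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Average the expression profiles of the genes in each cluster.

    expr   : (n_genes, n_samples) expression matrix, one row per gene
    labels : (n_genes,) integer cluster label 0..K-1 for each gene
             (assumed contiguous and non-empty)
    returns: (K, n_samples) matrix; row k is the mean profile of cluster k
    """
    n_clusters = int(labels.max()) + 1
    profiles = np.zeros((n_clusters, expr.shape[1]))
    for k in range(n_clusters):
        members = expr[labels == k]          # rows (genes) in cluster k
        profiles[k] = members.mean(axis=0)   # average over genes, per sample
    return profiles

# Hypothetical toy data: 4 genes, 3 samples, 2 clusters
expr = np.array([[1.0, 2.0, 3.0],
                 [3.0, 2.0, 1.0],
                 [2.0, 2.0, 2.0],
                 [4.0, 4.0, 4.0]])
labels = np.array([0, 0, 1, 1])
print(cluster_profiles(expr, labels))
# [[2. 2. 2.]
#  [3. 3. 3.]]
```

Each resulting row is one cluster-level variable x_i; the dependence model on the next slide works on these K rows instead of on thousands of individual genes.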
Ensemble Dependence Model
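$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix}
0 & a_{12} & a_{13} & a_{14} \\
a_{21} & 0 & a_{23} & a_{24} \\
a_{31} & a_{32} & 0 & a_{34} \\
a_{41} & a_{42} & a_{43} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
+
\begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \end{bmatrix},
\qquad X = AX + N
$$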
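The slide does not say how A is estimated. One plausible sketch, assumed here rather than taken from the paper, fits each row of A by ordinary least squares: cluster i is regressed on all the other clusters, so a_ii = 0 by construction. The per-cluster intercept M and the residual covariance V are also assumptions, added so the fitted model lines up with the Gaussian likelihood used in the classification step; `estimate_edm` is a hypothetical name.

```python
import numpy as np

def estimate_edm(X: np.ndarray):
    """Least-squares sketch of fitting X = A X + M + N (assumed estimator).

    X : (K, n) matrix, K cluster profiles observed over n samples.
    Returns A (K, K) with zero diagonal, M (K,) intercepts, and V (K, K),
    the sample covariance of the residuals N.
    """
    K, n = X.shape
    A = np.zeros((K, K))
    M = np.zeros(K)
    for i in range(K):
        others = [j for j in range(K) if j != i]
        # Design matrix: the other clusters plus a column of ones (intercept)
        D = np.column_stack([X[others].T, np.ones(n)])
        coef, *_ = np.linalg.lstsq(D, X[i], rcond=None)
        A[i, others] = coef[:-1]   # a_ij for j != i; a_ii stays 0
        M[i] = coef[-1]
    residuals = X - A @ X - M[:, None]
    V = np.cov(residuals)          # rows are variables, columns are samples
    return A, M, V

# Hypothetical usage: 4 clusters observed over 30 random samples
rng = np.random.default_rng(0)
A, M, V = estimate_edm(rng.normal(size=(4, 30)))
print(np.round(A, 2))
```

Fitting one regression per row keeps the zero-diagonal constraint trivial to enforce; the slides fit one such model for the cancer samples and one for the normal samples.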
Ensemble Dependence Model
• Model-driven method
  – Feature selection
    • Not all genes are relevant
    • T-test
  – Gene clustering
    • Number of clusters
    • Gaussian mixture model
  – Model learning/classification
    • Dependence matrices generated for two cases (cancer and normal)
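The feature-selection step can be illustrated with a two-sample t-statistic computed per gene. The slide says only "T-test", so Welch's unequal-variance form, the fixed number of retained genes `n_keep`, and the helper names `welch_t` and `select_genes` are all assumptions in this sketch.

```python
import numpy as np

def welch_t(expr: np.ndarray, is_cancer: np.ndarray) -> np.ndarray:
    """Welch two-sample t-statistic for every gene (row) of expr.

    expr      : (n_genes, n_samples) expression matrix
    is_cancer : (n_samples,) boolean mask, True for cancer samples
    """
    a = expr[:, is_cancer]     # cancer samples
    b = expr[:, ~is_cancer]    # normal samples
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] +
                 b.var(axis=1, ddof=1) / b.shape[1])
    return mean_diff / se

def select_genes(expr, is_cancer, n_keep):
    """Indices of the n_keep genes with the largest |t| (hypothetical rule)."""
    t = welch_t(expr, is_cancer)
    return np.argsort(-np.abs(t))[:n_keep]

# Hypothetical toy data: 3 genes, first 3 samples cancer, last 3 normal
expr = np.array([[1.0, 1.1, 0.9, 1.0, 1.2, 0.8],
                 [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],
                 [2.0, 2.5, 1.5, 2.1, 2.4, 1.6]])
is_cancer = np.array([True, True, True, False, False, False])
print(select_genes(expr, is_cancer, 1))  # [1]
```

Only gene 1 differs strongly between the two groups, so it is the one kept; in practice the retained genes would then be clustered.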
Classification
• Maximum likelihood rule
  – Binary hypothesis-testing problem (cancer vs. normal)
  – Tests the fit of an unknown sample to each model and assigns it to the model with the larger likelihood
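Cancer case:
$$\log \Pr(X \mid H_1) = -0.5 \log\left((2\pi)^k |V_C|\right) - 0.5\,(X - A_C X - M_C)^T V_C^{-1} (X - A_C X - M_C)$$
Normal case:
$$\log \Pr(X \mid H_0) = -0.5 \log\left((2\pi)^k |V_N|\right) - 0.5\,(X - A_N X - M_N)^T V_N^{-1} (X - A_N X - M_N)$$
where k is the number of clusters and A, M, and V are the dependence matrix, mean vector, and noise covariance of the corresponding model.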
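Both formulas are Gaussian log-likelihoods of the residual X - AX - M, so the maximum likelihood rule can be sketched as below. The function names are hypothetical, and each model's parameters (A, M, V) are assumed to have been learned beforehand; `slogdet` and `solve` replace the explicit determinant and inverse for numerical stability.

```python
import numpy as np

def edm_log_likelihood(x, A, M, V):
    """log Pr(x | H) = -0.5 log((2*pi)^k |V|) - 0.5 r^T V^{-1} r, r = x - A x - M."""
    k = x.shape[0]
    r = x - A @ x - M
    _, logdet = np.linalg.slogdet(V)    # log|V|, numerically stable
    quad = r @ np.linalg.solve(V, r)    # r^T V^{-1} r without forming V^{-1}
    return -0.5 * (k * np.log(2 * np.pi) + logdet) - 0.5 * quad

def classify(x, cancer_model, normal_model):
    """Maximum likelihood rule: pick the hypothesis whose model fits x better."""
    ll_c = edm_log_likelihood(x, *cancer_model)
    ll_n = edm_log_likelihood(x, *normal_model)
    return "cancer" if ll_c > ll_n else "normal"

# Hypothetical 2-cluster models that differ only in their mean vectors
I2 = np.eye(2)
cancer = (np.zeros((2, 2)), np.array([1.0, 1.0]), I2)
normal = (np.zeros((2, 2)), np.zeros(2), I2)
print(classify(np.array([0.9, 1.2]), cancer, normal))  # cancer
```

Because the normalizing term depends only on |V|, two models with equal noise covariance are compared purely on how small each leaves the residual.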
EDM-Based Cancer Classification
Taken from: P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
Results
Taken from: P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
Results
Here, 200 different subsets of the gastric data are used to calculate 200 different dependence matrices; the eigenvalues of these matrices are plotted.
Taken from: P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
$X = AX + N$
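The resampling experiment described on this slide (fit a dependence matrix on each of 200 subsets, then collect its eigenvalues) can be sketched as follows. The slide does not specify the subset size, the sampling scheme, or the estimator, so half the samples drawn without replacement, row centering, and a zero-diagonal least-squares fit are all assumptions here; the function names are hypothetical.

```python
import numpy as np

def dependence_matrix(X):
    """Zero-diagonal least-squares dependence matrix for K x n data X (assumed estimator)."""
    K = X.shape[0]
    A = np.zeros((K, K))
    for i in range(K):
        others = [j for j in range(K) if j != i]
        # Regress cluster i on all other clusters (no intercept; data are centered)
        coef, *_ = np.linalg.lstsq(X[others].T, X[i], rcond=None)
        A[i, others] = coef
    return A

def subset_eigenvalues(X, n_subsets=200, subset_size=None, seed=0):
    """Eigenvalues of dependence matrices fit on random subsets of the samples."""
    rng = np.random.default_rng(seed)
    K, n = X.shape
    size = subset_size or n // 2              # assumed default: half the samples
    eigs = np.empty((n_subsets, K), dtype=complex)
    for s in range(n_subsets):
        cols = rng.choice(n, size=size, replace=False)
        Xs = X[:, cols]
        Xs = Xs - Xs.mean(axis=1, keepdims=True)   # center each cluster profile
        eigs[s] = np.linalg.eigvals(dependence_matrix(Xs))  # A need not be symmetric
    return eigs

# Hypothetical usage on random data with 4 clusters and 60 samples
X = np.random.default_rng(1).normal(size=(4, 60))
print(subset_eigenvalues(X).shape)  # (200, 4)
```

Since the fitted A is generally not symmetric, `eigvals` may return complex values; a scatter of real and imaginary parts over all subsets would give a plot of the kind this slide describes.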
Results
Ideal dependence matrix $A_{\text{ideal}}$ for $X = AX + N$ (the matrix itself is not recoverable from the slide): eigenvalues = {1, 1, 1, -3}
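The matrix on this slide could not be recovered, but the stated spectrum can be checked on an example. As an illustration only, one 4 x 4 matrix with eigenvalues exactly {1, 1, 1, -3} has zeros on the diagonal and -1 everywhere else; it describes four clusters that always sum to zero, since then each x_i equals minus the sum of the other three. Whether this is the A_ideal used in the talk is an assumption.

```python
import numpy as np

K = 4
# Hypothetical ideal dependence matrix: zero diagonal, -1 off the diagonal.
# It encodes x_i = -(sum of the other clusters), i.e. x1 + x2 + x3 + x4 = 0.
A_ideal = -(np.ones((K, K)) - np.eye(K))

# A_ideal is symmetric, so eigvalsh applies; it returns eigenvalues in ascending order.
eigs = np.linalg.eigvalsh(A_ideal)
print(eigs)  # approximately [-3, 1, 1, 1]
```

The all-ones matrix has eigenvalues {4, 0, 0, 0}; subtracting the identity and negating maps these to {-3, 1, 1, 1}.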
Results
Taken from: P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
In Summary
Taken from: P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
Conclusions
• EDM is a model-based system for cancer classification and prediction using publicly available gene expression data
  – Models the dependence of each cluster on the other clusters
• Classification results are comparable to those of widely accepted machine-learning algorithms
• Eigenvalues of the dependence matrix could be a valuable cancer prediction tool
References
[1] P. Qiu, Z. J. Wang, and K. J. R. Liu. “Genomic Processing for Cancer Classification and Prediction,” IEEE Signal Processing Magazine, vol. 24, no. 1, pp. 100-110, Jan. 2007.
[2] P. Qiu, Z. J. Wang, and K. J. R. Liu. “Ensemble dependence model for classification and prediction of cancer and normal gene expression data,” Bioinformatics, vol. 21, no. 14, pp. 3114-3121, May 2005.
[3] D. Anastassiou. “Genomic Signal Processing,” IEEE Signal Processing Magazine, vol. 18, no. 4, pp. 8-20, July 2001.
[4] J. Astola, I. Tabus, I. Shmulevich, and E. Dougherty. “Genomic Signal Processing,” Signal Processing (Elsevier), vol. 83, pp. 691-694, 2003.
[5] American Cancer Society. “Cancer Facts and Figures 2006,” ACS :: Statistics for 2006 [Online]. Available: http://www.cancer.org/downloads/STT/CAFF2006PWSecured.pdf
[6] http://en.wikipedia.org/wiki/Gene
[7] http://en.wikipedia.org/wiki/Gene_expression
[8] http://en.wikipedia.org/wiki/Protein
[9] http://en.wikipedia.org/wiki/DNA_microarray
[10] M. Karnick. “Genomic Signal Processing,” Engineering Frontiers presentation (the talk directly preceding this one), Apr. 2007.