Microarray Analysis: Image processing and Filter design Instructors: Dr.Ravi Sankar Dr.Wei Qian Student: Kun Li Nov 2006


Introduction to Microarray Analysis

Microarrays are a recent technology in molecular biology research and an excellent tool for monitoring the transcription of thousands of genes at a time. The first step of the technique involves spotting known sequences on a substrate, in most cases a glass slide or nylon membrane. Next, mRNA isolated from the biological subjects under study is reverse-transcribed into cDNA. During reverse transcription, the control and experimental materials are differentially labeled; they are then pooled and hybridized to the array. cDNA strands in this pool compete to hybridize to complementary sequences on the array. The relative abundance of the corresponding mRNA from the two sources is assessed from the measured signal.


The objectives of microarray experiments are to reveal unknown genes and new gene functions as a result of experimental treatments, to find new gene expression patterns and use them as a basis for classification of physiological or pathological processes.


Microarray Image Processing

There are many differences between the microarray images of patients and those of healthy people. Analysis of microarray images can help in cancer detection and diagnosis and, more importantly, can help identify cancer-related genes. Much research on the recognition and comparison of gene expression patterns has already been done.

Literature Review

Microarray analysis attracts a lot of interest from researchers, and the literature is extensive. I found 137 papers published in 2006 alone. Here I list 20 of them:

SE Ahnert, K Willbrand, FCS Brown, TMA Fink (2006), "Unbiased pattern detection in microarray data series", Bioinformatics, 22(12):1471-1476.

David B Allison, Xiangqin Cui, Grier P Page, Mahyar Sabripour (2006), "Microarray data analysis: from disarray to consolidation and consensus", Nature Reviews Genetics, 7:55-65.

Claes R Andersson, Anders Isaksson, Mats G Gustafsson (2006), "Bayesian detection of periodic mRNA time profiles without use of training examples", BMC Bioinformatics, 7:63.

Richard P Auburn, Roslin R Russell, Bettina Fischer, Lisa A Meadows, Santiago Sevillano Matilla, Steven Russell (2006), "SimArray: a user-friendly and user-configurable microarray design tool", BMC Bioinformatics, 7:102.

Simon Barkow, Stefan Bleuler, Amela Prelic, Philip Zimmermann, Eckart Zitzler (2006), "BicAT: a biclustering analysis toolbox", Bioinformatics, 22(10):1282-1283.

Anders Bengtsson, Henrik Bengtsson (2006), "Microarray image analysis: background estimation using quantile and morphological filters", BMC Bioinformatics, 7:96.

Henrik Bengtsson, Ola Hossjer (2006), "Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method", BMC Bioinformatics, 7:100.

Daniel Berrar, Ian Bradbury, Werner Dubitzky (2006), "Avoiding model selection bias in small-sample genomic data sets", Bioinformatics, 22(10):1245-1250.

Daniel Berrar, Ian Bradbury, Werner Dubitzky (2006), "Instance-based concept learning from multiclass DNA microarray data", BMC Bioinformatics, 7:73.

Ghislain Bidaut, Karsten Suhre, Jean-Michel Claverie, Michael F Ochs (2006), "Determination of strongly overlapping signaling activity from microarray data", BMC Bioinformatics, 7:99.

Jonathon Blake, Christian Schwager, Misha Kapushesky, Alvis Brazma (2006), "ChroCoLoc: an application for calculating the probability of co-localization of microarray gene expression", Bioinformatics, 22:765-767.

Marta Blangiardo, Simona Toti, Betti Giusti, Rosanna Abbate, Alberto Magi, Filippo Poggi, Luciana Rossi, Francesca Torricelli, Annibale Biggeri (2006), "Using a calibration experiment to assess gene-specific information: full Bayesian and empirical Bayesian models for two-channel microarray data", Bioinformatics, 22:50-57.

Philippe Broët, Vladimir A. Kuznetsov, Jonas Bergh, Edison T. Liu, Lance D. Miller (2006), "Identifying gene expression changes in breast cancer that distinguish early and late relapse among uncured patients", Bioinformatics, 22(12):1477-1485.

Ljubomir J. Buturovic (2006), "PCP: a program for supervised classification of gene expression profiles", Bioinformatics, 22:245-247.

Roger D Canales, Yuling Luo, James C Willey, Bradley Austermiller, Catalin C Barbacioru, Cecilie Boysen, Kathryn Hunkapiller, Roderick V Jensen, Charles R Knight, Kathleen Y Lee, Yunqing Ma, Botoul Maqsodi, Adam Papallo, Elizabeth Herness Peters, Karen Poulter, Patricia L Ruppel, Raymond R Samaha, Leming Shi, Wen Yang, Lu Zhang, Federico M Goodsaid (2006), "Evaluation of DNA microarray results with quantitative gene expression platforms", Nature Biotechnology, 24:1115-1122.

Pedro Carmona-Saez, Monica Chagoyen, Andres Rodriguez, Oswaldo Trelles, Jose M Carazo, Alberto Pascual-Montano (2006), "Integrated analysis of gene expression by association rules discovery", BMC Bioinformatics, 7:54.

Pedro Carmona-Saez, Roberto D Pascual-Marqui, Francisco Tirado, Jose M Carazo, Alberto Pascual-Montano (2006), "Biclustering of gene expression data by non-smooth non-negative matrix factorization", BMC Bioinformatics, 7:78.

Yian A Chen, Cheng-Chung Chou, Xinghua Lu, Elizabeth H Slate, Konan Peck, Wenying Xu, Eberhard O Voit, Jonas S Almeida (2006), "A multivariate prediction model for microarray cross-hybridization", BMC Bioinformatics, 7:101.

H Chipman, R Tibshirani (2006), "Hybrid hierarchical clustering with applications to microarray data", Biostatistics, 7(2):286-301.

A Choudhary, M Brun, J Hua, J Lowey, E Suh, ER Dougherty (2006), "Genetic test bed for feature selection", Bioinformatics, 22(7):837-842.


Most of these studies focus on pattern recognition using neural networks and support vector machines, on gene identification, and on improving image processing methods, for example by optimizing normalization and noise reduction.

Our research is different in that it combines image processing and signal processing and focuses on mapping the relations between genes associated with breast cancer.

Thinking About Our New Method

The most important aspect of cancer detection, diagnosis and treatment is to detect the cancer and identify its type at an early stage, before any obvious symptoms detectable by traditional methods have developed.

Given a new microarray image, how can we detect its cancer development “potential”?


We believe the image pattern will give us some “hints” for cancer detection.

In fact, cancer development involves many genes, which means that before a cancer gene is expressed, the expression levels of many other genes have already changed. So if we can find the “implicit” relations between cancer-related genes, the problem is solved.

We are planning to design filters that can be applied to a microarray image to generate specific “signatures” for cancer and normal.

It’s important to emphasize the early stage here. Why? Because detecting cancer after people have it is not as meaningful as predicting the probability of developing cancer “before” they get it. The figures below show small parts of normal and cancer microarray images.

[Figure panels: Normal, Cancer]

This mid-stage (developing/early) is critical: if we knew the gene expression patterns of the mid-stage, we could accurately predict cancer development. However, we cannot obtain these patterns directly, because we have to use other methods to detect cancer before we can decide whether a pattern belongs to cancer or normal. Once cancer has been detected by traditional methods, it is no longer the mid-stage we want.

[Figure: Normal → In developing process (?) → Cancer]

How to resolve the mid-stage problem

We can assume there is a cycle as below:

[Figure: cycle diagram. Normal → In developing process (?) → Cancer → After treatment (?) → Normal]

In the cycle on the previous page, we can assume that the two question marks have some similarities. Therefore, we can use the gene expression pattern of “after treatment” as a kind of control for the gene expression pattern of “developing”. (Although they are similar, they won’t be exactly the same, so we can only use the “after treatment” pattern as a reference.)

Assumption and Hypothesis

Assumption: we assume the gene expression patterns of “after treatment” and “developing” have some similarities, and the pattern of “after treatment” can be used as a reference.

Hypothesis I: There are differences between the gene expression patterns of the “normal”, “developing”, “cancer” and “after treatment” stages. These differences can be distinguished using computational methods. The gene expression pattern of the “developing” stage can be derived from the other three stages with relatively high reliability.

Hypothesis II: We can design filters and apply them to microarray images of the four stages to generate “signatures” of those stages.

Hypothesis III: The “signatures” from different stages can be used to predict the probability of developing cancer.

Material

All images processed in this project are aCGH tumor images provided by Jonathan Pollack at Stanford University. Thanks to him!!!

Method (we use an example to explain our method)

This is a small portion of a microarray image containing 4800 spots.

Red = Cancer

Green = Control

Yellow = Mixed

The first thing to do is to separate the red and green layers.
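A minimal sketch of the layer separation, assuming the image is held as nested Python lists of (r, g, b) tuples. The storage format, the function name `split_channels` and the pixel values are our assumptions for illustration; the slides do not say how the image was loaded.

```python
def split_channels(rgb_image):
    """Split an image stored as rows of (r, g, b) tuples into red and green layers."""
    red = [[pixel[0] for pixel in row] for row in rgb_image]
    green = [[pixel[1] for pixel in row] for row in rgb_image]
    return red, green


# Hypothetical 2x2 image: a red spot, a green spot, a yellow (mixed) spot, background.
image = [
    [(200, 10, 0), (10, 180, 0)],
    [(150, 150, 0), (5, 5, 0)],
]
sample_layer, control_layer = split_channels(image)
```

A yellow (mixed) spot shows up with similar intensity in both layers.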

Sample and Control Layer

[Figure panels: Sample Layer, Control Layer]

Convert RGB Image To Grayscale Image

To find the spots, we need to convert the RGB image to a grayscale image.
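One common way to do this conversion is a weighted sum of the three channels. The sketch below uses the BT.601 luma weights (0.299, 0.587, 0.114); that choice is an assumption on our part, since the slides do not state which conversion was used.

```python
def to_grayscale(rgb_image):
    """Weighted sum of the R, G, B channels (BT.601 luma weights, an assumed choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


# Hypothetical one-row image with a red pixel and a green pixel.
gray = to_grayscale([[(200, 10, 0), (10, 180, 0)]])
```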

Compute The Mean Intensity of The Image

To set up a regular grid, we compute the mean intensity of each column of the image. This helps us identify the centers of the spots and the gaps between them.
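The column-mean profile can be sketched as follows, assuming the grayscale image is a list of equal-length rows (the function name and the toy values are hypothetical):

```python
def column_mean_profile(gray):
    """Mean intensity of each image column; gray is a list of equal-length rows."""
    n_rows = len(gray)
    return [sum(column) / n_rows for column in zip(*gray)]


gray = [
    [0, 9, 0, 6, 0],
    [0, 7, 0, 8, 0],
]
# High values mark columns that pass through spots, low values mark gaps.
profile = column_mean_profile(gray)
```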

Mean Intensity Profile

Use Autocorrelation to Enhance the Result

Ideally the spots would be periodically spaced, but in practice they differ in shape, size and intensity, so the mean profile looks irregular. We can use autocorrelation to enhance the result.
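As a rough illustration of why autocorrelation helps, the sketch below computes the unnormalized autocorrelation of a mean-removed profile and reads the spot spacing off the strongest non-zero lag. The profile values are made up, and this is only one plausible way to apply autocorrelation here, not necessarily the exact procedure behind the figures.

```python
def autocorrelation(profile):
    """Unnormalized autocorrelation of the mean-removed profile at lags 0..n-1."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [value - mean for value in profile]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag))
        for lag in range(n)
    ]


# Hypothetical profile: peaks every 4 pixels, but with irregular heights.
profile = [0, 0, 9, 0, 0, 0, 7, 0, 0, 0, 10, 0, 0, 0, 8, 0]
acf = autocorrelation(profile)

# The strongest non-zero lag is an estimate of the spot spacing in pixels.
period = max(range(1, len(acf)), key=lambda lag: acf[lag])
```

Even though the peak heights differ, the autocorrelation is dominated by the regular spacing, which is what the grid needs.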

Peak Segmentation

Remove the background noise and set a threshold to segment the peaks.
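A minimal sketch of threshold-based peak segmentation: every run of consecutive samples above the threshold becomes one peak. The threshold value and the helper name `segment_peaks` are assumptions; the slides do not say how the threshold was chosen.

```python
def segment_peaks(profile, threshold):
    """Return (start, end) index pairs (end inclusive) of runs above the threshold."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                        # a new peak begins
        elif value <= threshold and start is not None:
            segments.append((start, i - 1))  # the current peak ends
            start = None
    if start is not None:                    # peak that runs to the end of the profile
        segments.append((start, len(profile) - 1))
    return segments


peaks = segment_peaks([1, 2, 8, 9, 2, 1, 7, 8, 9, 1], threshold=5)
```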

Enhanced Mean Intensity Profile

Peak Segmentation

Grid Point Locating

Each grid point should be the midpoint between two adjacent peaks.
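In code, this could look like the sketch below. It assumes each peak is given as a (start, end) index pair and takes the peak center as the middle of that pair; both are our assumptions.

```python
def grid_positions(peak_segments):
    """Grid lines halfway between the centers of adjacent peaks."""
    centers = [(start + end) / 2 for start, end in peak_segments]
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


# Hypothetical peak segments along the column-mean profile.
grid = grid_positions([(2, 3), (6, 8), (12, 13)])
```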

Red Lines show the grid location

Transpose and Repeat

We have now built the vertical grid. To build the horizontal grid, simply transpose the image and repeat the process described above.

Set Up Bounding Boxes

Now we can form bounding-box regions that address each spot individually by using pairs of neighboring grid points.
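A small sketch of how neighboring grid lines can be paired into boxes. The (top, bottom, left, right) tuple layout is a convention we picked for illustration.

```python
def bounding_boxes(row_grid, col_grid):
    """Pair neighboring grid lines into (top, bottom, left, right) boxes."""
    boxes = []
    for top, bottom in zip(row_grid, row_grid[1:]):
        for left, right in zip(col_grid, col_grid[1:]):
            boxes.append((top, bottom, left, right))
    return boxes


# Hypothetical grid lines: 3 horizontal and 3 vertical lines give 2 x 2 boxes.
boxes = bounding_boxes([0, 10, 20], [0, 8, 16])
```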

Segment Spots From Background

Apply a logarithmic transformation and then a global threshold.
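The sketch below shows the idea: compress the intensity range with a logarithm, then compare every pixel against one threshold. Using log(1 + v) and taking the mean log-intensity as the threshold are both assumptions; the slides do not give the actual threshold rule.

```python
import math


def log_global_mask(gray):
    """Foreground mask from a log transform followed by one global threshold."""
    logged = [[math.log1p(value) for value in row] for row in gray]
    flat = [value for row in logged for value in row]
    threshold = sum(flat) / len(flat)  # assumption: mean log-intensity
    return [[value > threshold for value in row] for row in logged]


# Hypothetical image: background (0), one weak spot (20), two strong spots.
mask = log_global_mask([[0, 0, 200], [0, 20, 255]])
```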

Global Threshold


Since we already have the bounding boxes, we can try a local threshold.
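A sketch of a per-box threshold, assuming boxes are (top, bottom, left, right) tuples with exclusive bottom/right bounds and using the box mean as the local threshold (both are our assumptions):

```python
def local_threshold_mask(gray, boxes):
    """Threshold each bounding box against its own mean intensity."""
    mask = [[False] * len(row) for row in gray]
    for top, bottom, left, right in boxes:
        values = [gray[r][c] for r in range(top, bottom) for c in range(left, right)]
        threshold = sum(values) / len(values)  # assumption: box mean
        for r in range(top, bottom):
            for c in range(left, right):
                mask[r][c] = gray[r][c] > threshold
    return mask


# Hypothetical image with a weak spot (left box) and a strong spot (right box).
gray = [
    [1, 9, 100, 250],
    [1, 1, 100, 100],
]
mask = local_threshold_mask(gray, [(0, 2, 0, 2), (0, 2, 2, 4)])
```

In this toy case the weak spot (value 9) is found, while most of the strong spot (the 100s) falls below its own box mean and is lost.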

Local Threshold


Advantages and disadvantages of the two methods described above:

The logarithmic global threshold works well, but it misses some weak spots. The local threshold reveals those weak spots, but spots with strong intensity are segmented poorly.

Combine Logarithmic and Local Threshold

It is reasonable to combine these two methods. The result is better.
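The slides do not say how the two results are combined. One simple possibility, shown here purely as an assumption, is a pixel-wise union: a pixel counts as foreground if either mask marks it.

```python
def combine_masks(mask_a, mask_b):
    """Pixel-wise union (logical OR) of two boolean masks of the same shape."""
    return [[a or b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


# Hypothetical masks: the global mask finds a strong spot, the local mask a weak one.
global_mask = [[True, False], [False, False]]
local_mask = [[False, False], [False, True]]
combined = combine_masks(global_mask, local_mask)
```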

Combined Threshold

What We Get Now…

[Figure panels: Sample, Control]

Spot segmentation and intensity computation

Final results:

[Figure panels: Cancer, Control]

The number in each bounding box shows the intensity of each spot.
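One way to get a single number per box is to average the foreground pixels inside it. The sketch below assumes a grayscale image, a boolean foreground mask and (top, bottom, left, right) boxes; using the mean (rather than, say, the median or the sum) is our assumption.

```python
def spot_intensity(gray, mask, box):
    """Mean intensity of the foreground pixels inside one bounding box."""
    top, bottom, left, right = box
    values = [
        gray[r][c]
        for r in range(top, bottom)
        for c in range(left, right)
        if mask[r][c]
    ]
    return sum(values) / len(values) if values else 0.0


# Hypothetical 2x4 image split into two 2x2 boxes.
gray = [[0, 10, 0, 0], [0, 20, 0, 40]]
mask = [[False, True, False, False], [False, True, False, True]]
intensities = [spot_intensity(gray, mask, box) for box in [(0, 2, 0, 2), (0, 2, 2, 4)]]
```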

Breast Cancer Analysis

Now we are ready to inspect the “implicit” relations between genes “hiding” in a microarray image. Our idea is to design filters that can be applied to a microarray image to generate breast cancer “signatures”.

Here’s an Example…

We use a 6*6 matrix from a breast cancer microarray image as an example. We use each row of the intensity matrix of the normal control to filter the control and cancer microarray images. Here are some results (red = cancer, green = control):
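The slides do not spell out the filtering operation. One possible reading, sketched here as an assumption, is to take one row of the control intensity matrix as a 1-D kernel and convolve it with every row of the image being filtered. The small 2x2 matrices stand in for the 6*6 example.

```python
def convolve(signal, kernel):
    """Full 1-D discrete convolution of two sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def row_filter(control, target, row_index):
    """Filter every row of target with one row of the control matrix as the kernel."""
    kernel = control[row_index]
    return [convolve(row, kernel) for row in target]


# Hypothetical intensity matrices.
control = [[1, 0], [0, 2]]
cancer = [[3, 1], [2, 2]]
filtered_cancer = row_filter(control, cancer, row_index=1)
filtered_control = row_filter(control, control, row_index=1)
```

Plotting `filtered_cancer` against `filtered_control` for the same row index gives one red and one green curve per filter.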

Row Filter 1


Row Filter 2

Row Filter 3


Row Filter 4

Row Filter 5


Randomly choose spots to design the filter:

Random Filter


Choose specific (cancer related) spots to design filter:

Specific Filter


The good news is that in the processed results, the cancer and the normal control differ significantly, so it is easy for us to detect breast cancer.

The bad news is that we already know the image is from a breast cancer patient. How can we reasonably detect early-stage breast cancer?


More bad news: on our small data set, the processed results do not appear to converge to any standard.

Red = Breast Cancer Sample 1; Blue = Breast Cancer Sample 2; Green = Control

Optimization

Taking the first or second derivative of the processed results might (or might not; I think it depends on the results themselves) help optimize the results.

An Example

First/Second Derivatives

Another Example

First/Second Derivatives

Discussion

Taking the first or second derivative makes us focus on the essential differences between normal and cancer. However, some expression-level information is lost.
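For sampled results, the derivatives can be approximated with finite differences. A minimal sketch with made-up values:

```python
def diff(values):
    """First-order finite difference between consecutive samples."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical processed result for one sample.
result = [1, 4, 9, 16]
first_derivative = diff(result)
second_derivative = diff(first_derivative)
```

Any constant offset in `result` disappears after the first difference, which is one way to see why absolute expression-level information is lost.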

Why don’t the results converge?

There are many reasons why the results do not converge to a standard: the “gene map” depends strongly on the individual person; different researchers have different habits; researchers use different equipment and reagents; etc.

Let’s try another method to design the filter…

We’ll identify genes that are strongly related to cancer and design the filter according to them.

From the cancer (red) and normal (green) images shown before, we can construct two intensity matrices; let’s call them Ic and In. Subtracting In from Ic gives another matrix that shows the differences between Ic and In; let’s call it Id.

For each image we get a specific Id, but for the filter design we need a standard Id. So we average those Ids to get the standard Id.

Id = (Id1 + Id2 + Id3 + … + Idn)/n

Now we can use the large values in Id to design the filter.

Note: Id may contain negative values, because the expression level of some genes may be higher in normal cells than in cancer cells.
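The construction of Id and of the standard Id can be sketched as follows, with matrices stored as lists of rows. The 2x2 values are made up for illustration.

```python
def subtract(ic, in_):
    """Element-wise difference Ic - In of two matrices of the same shape."""
    return [[c - n for c, n in zip(row_c, row_n)] for row_c, row_n in zip(ic, in_)]


def average(matrices):
    """Element-wise mean of a list of same-shaped matrices."""
    n = len(matrices)
    return [[sum(values) / n for values in zip(*rows)] for rows in zip(*matrices)]


# Hypothetical intensity matrices for two image pairs.
id1 = subtract([[10, 5], [3, 8]], [[4, 6], [3, 2]])
id2 = subtract([[7, 3], [9, 1]], [[5, 2], [5, 1]])
id_standard = average([id1, id2])
```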

Here’s an example: the largest positive value in Id is the 61st value, 150. We keep this value and set all other values to 0, and call the new matrix F1. Convolving F1 with Id1 gives a result that shows the relation between the 61st gene and all other genes.

The largest negative value in Id is the 32nd value, -50. We keep this value and set all other values to 0, and call the new matrix F2. Convolving F2 with Id1 gives a result that shows the relation between the 32nd gene and all other genes.
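A sketch of this filter design on a flattened (1-D) version of Id. Treating the matrices as flat sequences and using a full 1-D convolution are assumptions; the short vectors below stand in for the real data, with 150 and -50 playing the roles of the largest positive and largest negative values.

```python
def convolve(signal, kernel):
    """Full 1-D discrete convolution of two sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def single_value_filter(values, index):
    """Keep the entry at index and set every other entry to 0."""
    return [v if i == index else 0 for i, v in enumerate(values)]


id_standard = [3, -50, 150, 7]   # hypothetical flattened standard Id
id1 = [1, 2, 0, -1]              # hypothetical flattened Id of one image

largest_pos = max(range(len(id_standard)), key=lambda i: id_standard[i])
largest_neg = min(range(len(id_standard)), key=lambda i: id_standard[i])
f1 = single_value_filter(id_standard, largest_pos)
f2 = single_value_filter(id_standard, largest_neg)
r1 = convolve(id1, f1)
r2 = convolve(id1, f2)
```

As a sanity check, convolving with a kernel that has a single non-zero entry returns a shifted copy of the input scaled by that entry.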

Take square root…

Take log…

Another example: linearly add the results from different filters…

This time, design the filters according to the 35th and 42nd values in the Id matrix. Set them as F1 and F2 respectively.

R1 = conv(Id1, F1)
R2 = conv(Id1, F2)
R3 = a*R1 + b*R2 (a, b are constants)

Not good enough, try another idea…

The filter design depends on the location of the selected value in the standard Id matrix, which is tedious and inconvenient.

Each spot in the microarray image represents a specific gene. How can we encode this identity? Our idea is to bind a specific frequency to each gene. For example, bind Gene1 to sin(wt), Gene2 to sin(2wt), and so on.

The elements of Id then look like this:
[Value1*sin(wt) Value2*sin(2wt) Value3*sin(3wt) …]

Now we convert the intensity matrix to the frequency domain.

Why do we do this?

Advantage 1: sin(iwt) is an orthogonal series. Over one period T = 2π/w:

$$\int_0^{T} \sin(iwt)\,\sin(jwt)\,dt = 0 \quad \text{for } i \neq j$$

So we can design a feature extraction array and put all genes associated with cancer in it. For example, the array may look like this: E = [a*sin(3wt) b*sin(17wt) c*sin(45wt) …]. Since we bind the frequency information to the intensity, the result no longer depends on the location of the values.

We can use this feature extraction matrix E to “scan” Id and select the critical genes.
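A small numerical sketch of the “scan”: each gene value is bound to its own harmonic, the harmonics are summed into one signal over a single period, and projecting onto sin(iwt) recovers the value bound to harmonic i. The gene values, the number of samples and the 2/T normalization are choices made for this illustration.

```python
import math


def project(signal, harmonic, n_samples):
    """Approximate (2/T) * integral over one period of signal(t) * sin(harmonic*w*t)."""
    # Samples are taken at t_k = k*T/n, so w*t_k = 2*pi*k/n.
    total = sum(
        signal[k] * math.sin(2 * math.pi * harmonic * k / n_samples)
        for k in range(n_samples)
    )
    return 2 * total / n_samples


values = {1: 5.0, 2: -3.0, 3: 8.0}   # hypothetical Id entries for genes 1..3
n = 64
encoded = [
    sum(v * math.sin(2 * math.pi * gene * k / n) for gene, v in values.items())
    for k in range(n)
]

value_gene3 = project(encoded, 3, n)   # recovers the value bound to sin(3wt)
value_gene4 = project(encoded, 4, n)   # no gene is bound to sin(4wt)
```

Because the harmonics are orthogonal, the projection for a given gene is unaffected by the values bound to the other harmonics.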

Advantage 2: we can apply an inverse Fourier transform to transfer the intensity matrix to the “time” domain. I think the physical meaning is the expression level of all genes at some specific time.

Advantage 3: Maybe we can design band-pass or band-stop filters based on this.

Disadvantage:

Since sin(iwt) is an orthogonal series, the process described above selects only the specific frequencies and wipes out all others, so we can’t see the relations between a specific gene and the other genes.

Future work and Challenge

Although the processed results don’t converge to a standard, we can build a database that stores as many breast cancer “signatures” as possible. Then, when we get a new microarray image signature, we can first try to match it against the database, or compute the “similarities” between the sample and cancer, or between the sample and control, to predict the probability of developing cancer.
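The slides do not define the “similarity” measure. As one possible choice, shown here purely as an assumption, cosine similarity between signature vectors could be used to compare a new sample against stored cancer and control signatures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two signature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical signatures.
sample = [1.0, 2.0, 3.0]
cancer_signature = [2.0, 4.0, 6.0]
control_signature = [3.0, 0.0, -1.0]

closer_to_cancer = (
    cosine_similarity(sample, cancer_signature)
    > cosine_similarity(sample, control_signature)
)
```

Note that cosine similarity ignores overall scale, so two signatures that differ only by a constant factor count as identical; whether that is desirable here would need to be checked on real data.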

Finish the frequency and “time” domain analysis.

Optimize the filter design.


The big problem is that, since the gene map depends strongly on the individual person, it might not be a good idea to use a normal person to “measure” another person. We need microarray images from the same person before and after he/she got cancer and after he/she received treatment. Such images are very hard for us to get. We could use images of normal and abnormal tissue from the same person instead, but we lack these images as well.

Items needed

We need 50 or more microarray images of the “normal”, “cancer” and “after treatment” stages from the same person (or from normal and cancer tissues).

Appendix: Software we can use

Although there is a lot of software we could use, the packages listed below are free:

F-scan
P-scan
ScanAlyze 2
TIGR Spotfinder
UCSF Spot

Thank you!!!
