Prof. Yechiam Yemini (YY)
Computer Science Department, Columbia University
Chapter 5: Microarray Techniques
5.3 Classification & Machine Learning Techniques
Overview
- Principal components analysis (PCA)
- Linear classifiers: perceptrons, neural nets
- SVM classifiers
Microarray Heat Map
[Figure: heat-map matrix X; rows are genes g1…gm, columns are tests/experiments/samples T1…Tn]
Microarray measurements may be organized in a heat-map matrix:
- Rows represent genes
- Columns represent tests
- Xij = expression level of gene gi under test Tj
- Expression levels are visualized via colors: green = under-expressed (down-regulated); red = over-expressed (up-regulated)
Heat Map Provides Expression Profiles
[Figure: heat-map matrix X with rows g1…gm (genes) and columns T1…Tn (tests/experiments/samples/conditions); column Tj is a test expression profile, row gi is a gene expression profile, entry Xij]
Microarray Experiments
There are two typical experiments:
- Differentiation: compare expression levels under different conditions. A test Tj represents the expression levels of a condition, e.g., a cancer or drug-treated cell vs. a normal cell.
- Temporal expression: explore the temporal evolution of expression levels. A test Tj represents the expression levels at a given time, e.g., to study cell response to heat shock or starvation.
Some Basic Geometry
Genes/tests may be modeled as n-dimensional vectors:
Define tj = (0,0,…,0,1,0,…,0) (1 in position j); then gi = Σj Xij tj (j = 1…n).
Differentiation example: t1 and t2 have similar expression for g2 and g3 (X31 = X32, X21 ~ X22), but g1 provides good differentiation of t1 and t2.
[Figure: t1 = (X11, X21, X31) and t2 = (X12, X22, X32) plotted in the (g1, g2, g3) coordinate system]
Principal Component Analysis
Goal: Reduce Dimensionality
[Figure: tests t1, t2 plotted in gene space; after rotating coordinates, g2 = g3 = 0 and all differentiation lies along g1]
- g2, g3 do not differentiate the tests
- g1 maximizes differentiation
- differentiation ~ variance
Key Idea: Change Coordinates
Change coordinates to "virtual genes" p1, p2, p3 maximizing differentiation (variance).
Reduce non-informative dimensions (noise).
[Figure: the same data in gene coordinates (g1, g2, g3) and in principal coordinates (p1, p2, p3); most of the variance lies along p1]
Maximizing Variance
Given n samples {xk} of a random m-dimensional vector x:
- Assume, for simplicity, that E[x] = 0.
- Which direction w (||w|| = 1) maximizes the variance VAR[wTx]?
Define an m×n data matrix X whose columns are the samples {xk}. Then
VAR[wTx] = E[(wTx)(xTw)] = wT E[xxT] w ≈ (1/n) wT [Σk xk xkT] w = (1/n) wT XXT w.
Therefore, maximizing variance is equivalent to:
max { wTCw | wTw = 1 }, where C = XXT is the auto-covariance matrix of x.

How do principal eigenvectors arise? Use a Lagrangian to solve the constrained quadratic optimization:
L(w,λ) = wTCw - λwTw; the solution must satisfy 0 = ∇wL = 2(Cw - λw).
Therefore Cw = λw, i.e., w is an eigenvector. Furthermore, wTCw = λwTw = λ, so the variance-maximizing direction is the eigenvector with the largest eigenvalue.
[Figure: the quadric {wTCw | wTw = 1} is an ellipsoid whose axes are the eigenvectors of C]
Principal Components Analysis
Represent the data in the eigenvector space:
- Compute the auto-covariance matrix XTX
- Eigenvectors of XTX are the principal coordinates
- Principal coordinates maximize residual variance; the eigenvalues give the maximal residual variance
Use Singular Value Decomposition (SVD) to compute PCA:
- Compute the factorization XTX = U Λ UT
- The transformation to principal coordinates is y = UTx; this PCA coordinate change is also called the Karhunen-Loeve transform
Eliminate eigenvectors with small eigenvalues:
- Project the data onto a subspace with maximal residual variance
- This reduces dimensionality while maximizing discrimination
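As a concrete illustration of this recipe (a minimal numpy sketch, not code from the lecture; the function name and toy data are made up): center the data, form the auto-covariance matrix, take its top eigenvectors, and project.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (samples) onto the k principal axes of largest variance."""
    Xc = X - X.mean(axis=0)             # center: the derivation assumed E[x] = 0
    C = Xc.T @ Xc                       # auto-covariance matrix (up to a 1/n factor)
    vals, vecs = np.linalg.eigh(C)      # eigh: C is symmetric
    order = np.argsort(vals)[::-1]      # largest eigenvalue = largest variance first
    U = vecs[:, order[:k]]              # principal coordinates u1..uk
    return Xc @ U                       # y = U^T x for every sample x

# Toy data: 4 samples of 3 "genes"; almost all variance lies along the first gene
X = np.array([[5., 1.0, 0.9],
              [1., 1.1, 1.0],
              [9., 0.9, 1.1],
              [3., 1.0, 1.0]])
print(pca(X, 1))    # 1-D representation retaining maximal variance
```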
PCA Through Example
(Orly Alter et al., PNAS, 2000, 97(18), 10101-10106)
[Figure: cell-cycle analysis heat map, genes × time]
PCA Analysis
[Figure: data matrix X (genes × arrays); covariance XTX (arrays × arrays); SVD decomposition XTX = U Λ UT into eigenvalues Λ and eigenarrays U]
Data: (5981 genes) × (14 arrays). PCA: reduce dimensionality; SVD for array-space.
Results
Fig. 7. SVD of the normalized and sorted elutriation data. Raster display of data with overexpression (red), no change in expression (black), and underexpression (green), showing a linear transformation of the data from the 5,981-genes × 14-arrays space to the reduced diagonalized 14-eigenarrays × 14-eigengenes space using the 5,981-genes × 14-eigenarrays and 14-eigengenes × 14-arrays basis sets.
Alter, Brown & Botstein, PNAS, August 29, 2000, vol. 97, no. 18, 10101-10106, Appendix A1-A6
Notes On PCA
- Effective in reducing dimensionality
- More predictable and analyzable than clustering
- Intuitive interpretation: an eigengene is a linear combination of gene profiles maximizing variance
- Let Pk be the projection on the subspace Uk = Span{u1, u2, …, uk}; then uk+1 maximizes the residual variance of the projections {(I - Pk)gi}
- SVD is often simpler to compute in array-space; the results may be applied and interpreted in gene-space
Linear Classifiers
Basic Classification Concepts
Given: sample data {Xk} and class associations Yk ∈ {-1, 1}.
Goal: find a "good" function f(X) such that Y = sgn[f(X)].
There are numerous classification techniques (classical statistics, machine learning, …); we consider only the basics.
Supervised learning: input {Xk, Yk}; output f(X). Avoid over-fitting.
[Figure: labeled samples in the (X1, X2) plane]
Linear Classifier
The classifier is a hyperplane: f(x) = wTx - b = w1x1 + w2x2 - b, and y = sgn(wTx - b) classifies x.
Simplify this to y = sgn(wx) by augmenting: w = (w1, w2, b), x = (x1, x2, -1).
[Figure: the hyperplane f(x) = 0 with normal w and offset b, separating the regions f(x) > 0 and f(x) < 0]
Neural Networks (NN) Background
History:
- McCulloch-Pitts [1943]: synaptic connection as a linear classifier
- Hebb [1949]: reinforcement learning
- Rosenblatt [1957]: the perceptron training algorithm
- Minsky & Papert [1969]: what can a perceptron compute?
- Starting in the mid-80's, a large explosion of NN models and applications
Rosenblatt's model:
[Figure: a retina feeds associative units through fixed weights; a response unit Y combines inputs X1, X2 through variable weights w1, w2 and bias b with a step activation function; firing rule: x1w1 + x2w2 + b > 0]
The Perceptron Training Rule
Training data: {x(k), y(k)}.
Weight update rule: w(t+1) ← w(t) + ηε(t)x(t), where
- ε(t) = y(t) - sgn[w(t)x(t)] is the classification error, which is 0, -2, or 2
- η is a constant (some variants use η = η(t), e.g., η(t) = 1/t²)
[Figure: a misclassified point x(t) is added to (or subtracted from) w(t), rotating the separating hyperplane toward correct classification]
Training Geometry
Initialize: w(0) = (1,-1), η = 1/2.
Iterate: w(t+1) ← w(t) + ηε(t)x(t); ε(t) = y(t) - sgn[w(t)x(t)].
- x(1) = (0,2) is misclassified by w(0) = (1,-1), so w(1) = (1,-1) + (1/2)·2·(0,2) = (1,1).
- x(2) = (2,-1) is misclassified by w(1) = (1,1), so w(2) = (1,1) - (2,-1) = (-1,2).
Geometrically, each update adds or subtracts the misclassified point ±x(t) to obtain w(t+1).
Training (continued)
w(t+1) ← w(t) + ηε(t)x(t); ε(t) = y(t) - sgn[w(t)x(t)]
- x(3) = (2,0) is misclassified by w(2) = (-1,2), so w(3) = (-1,2) + (2,0) = (1,2).
- Note that w(3) = w(0) + x(1) - x(2) + x(3) = (1,-1) + (0,2) - (2,-1) + (2,0).
Classifier found.
More Generally
A linear classifier: y = sgn[f(x)] = sgn[wTx + w0]. The classifier may be represented as f(x) = (wT, w0)·(x, 1).
The perceptron training problem: given a training sample S = {x(k), y(k)}, compute w such that y = sgn[wTx] is consistent with S.
Training To Classify Temporal Expression
Classify temporal gene-profile curves as convex (~ 1) or concave (~ -1).
Note: co-expression is revealed through monotonicity & convexity.
Training patterns:
X(1) = (0,2,3,1,0), y(1) = 1
X(2) = (0,1,2,2,0), y(2) = 1
X(3) = (1,3,2,1,0), y(3) = 1
X(4) = (3,2,1,2,3), y(4) = -1
X(5) = (2,1,1,2,3), y(5) = -1
Initialize weights w(0) = (1,1,1,1,1,0) (the last component is the bias) and η = 1/2. It takes 3 iterations to converge (see the table) to the classifier f(X) = sgn[(-2,1,3,0,-2)X].

Training trace, with each sample augmented by a constant 1 and the rule w(t+1) ← w(t) + ηε(t)x(t), ε(t) = y(t) - sgn[w(t)x(t)]:

t | w(t)               | x(t)          | ε(t)
0 | (1,1,1,1,1,0)      | (0,2,3,1,0,1) | 0
1 | (1,1,1,1,1,0)      | (3,2,1,2,3,1) | -2
2 | (-2,-1,0,-1,-2,-1) | (0,2,3,1,0,1) | 2
3 | (-2,1,3,0,-2,0)    | no errors (stop)
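A minimal numpy sketch of this training run (illustrative, not the lecture's code; the cyclic presentation order is assumed): cycling through the augmented samples with the update rule reproduces the trace above.

```python
import numpy as np

def sgn(z):                  # sign convention with sgn(0) = -1
    return 1.0 if z > 0 else -1.0

# Training patterns, each augmented with a constant 1 for the bias weight
X = np.array([[0,2,3,1,0,1], [0,1,2,2,0,1], [1,3,2,1,0,1],
              [3,2,1,2,3,1], [2,1,1,2,3,1]], dtype=float)
y = np.array([1, 1, 1, -1, -1], dtype=float)

w = np.array([1., 1., 1., 1., 1., 0.])   # w(0) from the slide
eta = 0.5

updated = True
while updated:                           # repeat full passes until no update occurs
    updated = False
    for x_t, y_t in zip(X, y):
        eps = y_t - sgn(w @ x_t)         # classification error: 0, -2, or +2
        if eps != 0:
            w += eta * eps * x_t         # perceptron rule: w <- w + eta*eps*x
            updated = True

print(w)   # [-2.  1.  3.  0. -2.  0.]  ->  f(X) = sgn[(-2,1,3,0,-2)X]
```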
Example Notes
Does f(X) = sgn[(-2,1,3,0,-2)X] discriminate convex from concave?
- Try convex X = (2,4,2,1,0): f(X) = 1; concave X = (2,0,1,3,4): f(X) = -1.
- But for X = (5,6,1,0,0), f(X) = -1, and for X = (0,3,4,2,0), f(X) = 1; both are classification errors.
What did the perceptron learn from the training samples?
- It assigns heavily negative weights to the extremes and positive weights to the middle.
- Concave curves are higher at the extremes; convex ones are higher at the middle points.
- Training discovers weights distinguishing class features; classification errors occur when patterns are mismatched with these features.
Linear Classifiers May Be Generalized
Linear classifiers are limited and sensitive to noise; we would like to generalize them to admit non-linearity and noise.
Generalize to multilayer neural networks:
- Can handle non-linearity, but have limited noise resiliency and are difficult to scale, train, or interpret
- Convergence problems of gradient learning (e.g., local minima)
- Unclear how to avoid overfitting
- Limited use in handling microarray data
Generalize to Support Vector Machines (SVM):
- Retain simplicity while offering new capabilities
[Figure: a multilayer NN]
Support Vector Machines (SVM)
Generalizing The Perceptron
The perceptron rule may be rewritten as:
if y(t)w(t)x(t) < 0 then w(t+1) ← w(t) + ηy(t)x(t)
This means that w = Σα(t)y(t)x(t), where α(t) > 0.
Learning: compute α(t) from the sample data {y(t), x(t)}.
There could be many separating hyperplanes. SVM: find the "best" hyperplane, the one with maximal margins; this leads to a quadratic optimization problem.
[Figure: several separating lines in the (X1, X2) plane vs. the max-margin hyperplane determined by the support vectors]
The Dual Learning Rule
Rewrite the classifier: f(x) = <w,x> + b = Σα(t)y(t)<x(t),x> + b, i.e.,
f(x) = Σα(t)y(t)K(x(t),x) + b
where K(x,z) = x·z, the kernel, measures the correlation between x(t) and x.
Dual learning rule: whenever the real classification y(i) disagrees with the predicted classification, increase α(i):
if y(i)[Σα(t)y(t)K(x(t),x(i)) + b] < 0 then α(i) ← α(i) + η
Training A Linear SVM
Compute the kernel matrix, then iterate the SVM learning rule:
if y(i)[Σα(t)y(t)K(t,i) + b] < 0 then α(i) ← α(i) + η
An SVM Training Example
Consider again the convex/concave classification, with training patterns:
X(1) = (0,2,3,1,0), y(1) = 1
X(2) = (0,1,2,2,0), y(2) = 1
X(3) = (1,3,2,1,0), y(3) = 1
X(4) = (3,2,1,2,3), y(4) = -1
X(5) = (2,1,1,2,3), y(5) = -1
Compute a linear kernel K(i,j) = x(i)·x(j) over the augmented samples x(1)…x(5) (each pattern with a constant 1 appended):

X =
0 2 3 1 0 1
0 1 2 2 0 1
1 3 2 1 0 1
3 2 1 2 3 1
2 1 1 2 3 1

K =
15 11 14 10  8
11 10 10  9  8
14 10 16 14 10
10  9 14 28 23
 8  8 10 23 20
Training
Initialize: η = 0.5, α = (-1, 0, 1, 0, 0); y = (1, 1, 1, -1, -1).
Iterate, using the kernel matrix K above:
if y(j)[Σα(t)y(t)K(t,j)] < 0 then α(j) ← α(j) + η

Iteration trace (each step updates the first violated sample; a margin of 0 is treated as an error):

α                      | margins y(j)·Σα(t)y(t)K(t,j), j = 1…5 | update
(-1,   0,   1, 0,   0) | (-1, -1, 2, -4, -2)                   | α(1) += 0.5
(-0.5, 0,   1, 0,   0) | (6.5, 4.5, 9, -9, -6)                 | α(4) += 0.5
(-0.5, 0,   1, 0.5, 0) | (1.5, 0, 2, 5, 5.5)                   | α(2) += 0.5
(-0.5, 0.5, 1, 0.5, 0) | (7, 5, 7, 0.5, 1.5)                   | all > 0: terminate
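The trace can be reproduced with a short numpy sketch of the dual rule (illustrative, not the lecture's code; the first-violation update order and the treatment of a zero margin as an error are assumptions read off the table):

```python
import numpy as np

# Augmented training vectors (bias 1 appended) and labels, as on the slides
X = np.array([[0,2,3,1,0,1], [0,1,2,2,0,1], [1,3,2,1,0,1],
              [3,2,1,2,3,1], [2,1,1,2,3,1]], dtype=float)
y = np.array([1, 1, 1, -1, -1], dtype=float)

K = X @ X.T                                # the 5x5 linear kernel matrix above
alpha = np.array([-1., 0., 1., 0., 0.])    # the slide's initialization
eta = 0.5

while True:
    margins = y * (K @ (alpha * y))   # y(j) * sum_t alpha(t) y(t) K(t,j), all j at once
    bad = np.where(margins <= 0)[0]   # a zero margin counts as an error, as in the trace
    if bad.size == 0:
        break                         # every sample is classified correctly
    alpha[bad[0]] += eta              # update the first violated sample

print(alpha)                    # [-0.5  0.5  1.   0.5  0. ]
print(2 * (X.T @ (alpha * y)))  # [-1.  3.  2.  1. -3.  1.] -> f(x) = -x1+3x2+2x3+x4-3x5+1
```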
SVM Example Continued
Training result: α = (-0.5, 0.5, 1, 0.5, 0) = 0.5·(-1, 1, 2, 1, 0).
Computing the SVM classifier: f(x) = <Σα(t)y(t)x(t), x> = <w,x>, where
w = Σα(t)y(t)x(t) = 0.5(-x(1) + x(2) + 2x(3) - x(4)) = 0.5(-1, 3, 2, 1, -3, 1).
Classifier: f(x) = -x1 + 3x2 + 2x3 + x4 - 3x5 + 1.
Notes On SVM Training
What did the SVM classifier learn about convexity?
f(x) = -x1 + 3x2 + 2x3 + x4 - 3x5 + 1, much like the perceptron, assigns negative weights to the extremes and positive weights to the middle.
Consider the samples misclassified by the perceptron: X1 = (5,6,1,0,0) and X2 = (0,3,4,2,0).
The SVM classifier classifies X1 correctly but errs in classifying X2. (What is the source of the error? What training samples could improve this?)
Handling Non-Separable Data
When data is not separable, use soft margins: optimize the margins given a relative cost of error.
[Figure: separable vs. non-separable data in the (X1, X2) plane; soft margins tolerate points on the wrong side]
Generalization To Kernel Machines
A simplified approach to non-linear classification:
- Map the data to a feature space F via a non-linear transformation φ
- Use linear classification in the feature space
- F may have a different dimension than X (e.g., to reduce dimensionality)
[Figure: x's and o's separated by a curved boundary in input space X become linearly separable after the map φ: X → F]
Linear Classification in Feature Space
Consider the classification in feature space:
f(x) = Σα(t)y(t)<φ(x(t)), φ(x)> + b
Define the kernel of the transformation: K(u,v) = <φ(u), φ(v)>.
The kernel specifies the "feature space" classifier:
f(x) = Σα(t)y(t)K(x(t),x) + b
Example kernel functions:
1) Polynomial: K(xi,xj) = (xi·xj + 1)^d
2) Gaussian: K(xi,xj) = exp(-||xi - xj||²/σ²)
Example: Polynomial Kernel
K(xi,xj) = (1 + xiTxj)²
  = 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
  = [1, xi1², √2·xi1xi2, xi2², √2·xi1, √2·xi2]T [1, xj1², √2·xj1xj2, xj2², √2·xj1, √2·xj2]
Hence K(xi,xj) = φ(xi)Tφ(xj), where φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2].
[Figure: the map Φ: x → φ(x) turns a quadratic boundary into the hyperplane wTφ(x) = constant]
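A quick numeric check of this identity (a hypothetical two-dimensional example):

```python
import numpy as np

def poly_kernel(x, z):
    return (1.0 + x @ z) ** 2

def phi(x):
    # explicit feature map for the degree-2 polynomial kernel in two dimensions
    return np.array([1.0, x[0]**2, np.sqrt(2)*x[0]*x[1], x[1]**2,
                     np.sqrt(2)*x[0], np.sqrt(2)*x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(poly_kernel(x, z), phi(x) @ phi(z))   # both print 4.0
```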
The Kernel "Trick"
Consider the classifier f(x) = Σα(t)y(t)K(x(t),x) + b and the training algorithm:
if y(i)[Σα(t)y(t)K(x(t),x(i)) + b] < 0 then α(i) ← α(i) + η
An SVM classifier may be computed from the kernel alone; there is no need to know the underlying mapping φ(x).
We only need to know that the kernel is appropriate, i.e., K(u,v) = <φ(u),φ(v)> for some φ.
Mercer: any symmetric positive definite matrix is such a kernel.
Example: SVM Classification of Microarrays
[Figure: heat-map matrix, genes × tests; each test is a column vector]
K(X,Y) = <X,Y> / √(<X,X>·<Y,Y>)
The kernel provides a measure of test similarity.
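Assuming the normalized-inner-product reading of the formula above, a minimal sketch that builds this kernel matrix over the test columns of a heat map (function name is made up):

```python
import numpy as np

def test_kernel(X):
    """Normalized inner products between tests (the columns of heat-map matrix X)."""
    G = X.T @ X                         # raw inner products <Ti, Tj>
    norms = np.sqrt(np.diag(G))         # ||Ti|| for every test
    return G / np.outer(norms, norms)   # K(Ti,Tj) = <Ti,Tj> / (||Ti|| ||Tj||)
```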
Applying SVM
- Represent the biological question as a classification problem
- Represent the data as vectors
- Establish a kernel matrix to represent similarity
- Train an SVM classifier: if y(i)[Σα(t)y(t)K(x(t),x(i)) + b] < 0 then α(i) ← α(i) + η
- Evaluate the performance of the classifier f(x) = Σα(t)y(t)K(x(t),x) + b
Cancer Classification With SVM (A. Zhang, DIMACS, 2007)
Cancer Classification Study (A. Zhang, DIMACS, 2007)
Microarrays provide a small sample of high-dimensional data; the key challenge is overfitting.
A comparative study of classifiers over microarray DBs:
- Use SVM with an improved kernel (max discrimination)
- Compare with kNN and linear discriminant classifiers
http://dimacs.rutgers.edu/Workshops/MLTechniques/slides/z...
Cancer Classification
[Figure: classification results for the Lung, ALL-AML, and Prostate datasets]
But…
[Figure: results for decision trees and boosting; source: www.sci.usq.edu.au/research/seminars/files//seminar135/ausdm1.ppt]
Cancer Studies
A. Statnikov, C. F. Aliferis, I. Tsamardinos, Vanderbilt University, MEDINFO 2004
Microarray Datasets

Dataset name   | Samples | Variables (genes) | Categories | Reference
11_Tumors      | 174     | 12533             | 11         | Su, 2001
14_Tumors      | 308     | 15009             | 26         | Ramaswamy, 2001
9_Tumors       | 60      | 5726              | 9          | Staunton, 2001
Brain_Tumor1   | 90      | 5920              | 5          | Pomeroy, 2002
Brain_Tumor2   | 50      | 10367             | 4          | Nutt, 2003
Leukemia1      | 72      | 5327              | 3          | Golub, 1999
Leukemia2      | 72      | 11225             | 3          | Armstrong, 2002
Lung_Cancer    | 203     | 12600             | 5          | Bhattacherjee, 2001
SRBCT          | 83      | 2308              | 4          | Khan, 2001
Prostate_Tumor | 102     | 10509             | 2          | Singh, 2002
DLBCL          | 77      | 5469              | 2          | Shipp, 2002

Total: ~1300 samples; 74 diagnostic categories; 41 cancer types and 12 normal tissue types.
Classifiers
- K-Nearest Neighbors (KNN) [instance-based]
- Backpropagation Neural Networks (NN) [neural networks]
- Probabilistic Neural Networks (PNN) [neural networks]
- Multi-Class SVM: One-Versus-Rest (OVR) [kernel-based]
- Multi-Class SVM: One-Versus-One (OVO) [kernel-based]
- Multi-Class SVM: DAGSVM [kernel-based]
- Multi-Class SVM by Weston & Watkins (WW) [kernel-based]
- Multi-Class SVM by Crammer & Singer (CS) [kernel-based]
- Weighted Voting: One-Versus-Rest [voting]
- Weighted Voting: One-Versus-One [voting]
- Decision Trees: CART [decision trees]
Without Gene Selection
[Figure: bar chart of accuracy (%) for the MC-SVM methods (OVR, OVO, DAGSVM, WW, CS) and for KNN, NN, PNN on 9_Tumors, 14_Tumors, Brain_Tumor1, Brain_Tumor2, 11_Tumors, Leukemia1, Leukemia2, Lung_Cancer, SRBCT, Prostate_Tumor, and DLBCL; accuracy axis 0-100%]
Gene Selection
[Figure: genes ranked from uninformative to highly discriminatory]
1. Signal-to-noise (S2N) ratio in one-versus-rest (OVR) fashion
2. Signal-to-noise (S2N) ratio in one-versus-one (OVO) fashion
3. Kruskal-Wallis nonparametric one-way ANOVA (KW)
4. Ratio of between-categories to within-category sum of squares (BW)
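For method 1, a minimal sketch of the standard (Golub-style) signal-to-noise statistic, assuming a genes × samples matrix and binary labels; the function name is made up:

```python
import numpy as np

def s2n_scores(X, y):
    """Golub-style signal-to-noise ratio per gene.

    X: genes x samples expression matrix; y: labels in {+1, -1} per sample.
    A larger |score| indicates a more discriminatory gene.
    """
    pos, neg = X[:, y == 1], X[:, y == -1]
    return (pos.mean(axis=1) - neg.mean(axis=1)) / (pos.std(axis=1) + neg.std(axis=1))

# rank genes, most discriminatory first (OVR would apply this once per class):
# top50 = np.argsort(-np.abs(s2n_scores(X, y)))[:50]
```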
With Gene Selection
[Figure: diagnostic performance (accuracy, %) before and after gene selection for SVM and non-SVM classifiers on 9_Tumors, 14_Tumors, Brain_Tumor1, and Brain_Tumor2; and improvement in accuracy (%) by gene selection, averaged over the four datasets, for OVR, OVO, DAGSVM, WW, CS, KNN, NN, PNN]
The average reduction in the number of genes is 10-30 times.
Protein Classification
(Based on W. S. Noble, U. Washington)
Classifying Transmembrane Proteins
Challenge: build a classification model for diverse data:
- mRNA expression data
- protein-protein interaction data
- sequence data
Key Idea
- Represent classification data in terms of SVM kernels
- Combine kernels to best apply all discriminating data
[Figure: a protein is compared against support vectors SV1, SV2, SV3, …, SVn; the classification is sgn(Σ λi ·) over the weighted scores]
Classification Based on Sequence
How do we model string similarity in terms of a kernel matrix? Define a sequence kernel in terms of a similarity score. Example sequences:
>ICYA_MANSE
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKYDGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVNLVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH
>LACB_BOVIN
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI
Pairwise Comparison Kernel
[Figure: each protein is represented by its vector of pairwise similarity scores against a set of reference sequences; the kernel is an inner product of score vectors]
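The alignment scorer itself (Smith-Waterman or BLAST) is out of scope here; as a self-contained stand-in, the sketch below implements a simpler but still valid string kernel, the k-mer spectrum kernel, which is plainly a different construction than the pairwise comparison kernel:

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """k-mer spectrum kernel: the inner product of k-mer count vectors.

    A simple, valid string kernel; it is NOT the alignment-based pairwise
    comparison kernel of the slide, which scores sequences (e.g., with
    Smith-Waterman) against a set of reference proteins.
    """
    cs = Counter(s[i:i+k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i+k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs)

print(spectrum_kernel("GDIFYPGYCPDV", "MKCLLLYPGYAL"))   # 2: shared 3-mers YPG, PGY
```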
Classification By Interaction Profile
How do we build a classification model from a protein interaction graph?
Interaction kernel: K(u,v) = the number of common neighbors of u and v.
Protein × protein incidence matrix (each row lists one protein's interactions):
1 0 0 1 0 1 0 1
1 0 1 0 1 1 0 1
0 0 0 0 1 1 0 0
0 0 1 0 1 1 0 1
0 0 1 0 1 0 0 1
1 0 0 0 0 0 0 1
0 0 1 0 1 0 0 0
E.g., the first two proteins share 3 common neighbors (columns 1, 6, and 8), so their kernel value is 3.
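Counting common neighbors for all pairs is a single matrix product; a minimal sketch over the incidence matrix above:

```python
import numpy as np

A = np.array([[1,0,0,1,0,1,0,1],
              [1,0,1,0,1,1,0,1],
              [0,0,0,0,1,1,0,0],
              [0,0,1,0,1,1,0,1],
              [0,0,1,0,1,0,0,1],
              [1,0,0,0,0,0,0,1],
              [0,0,1,0,1,0,0,0]])

K = A @ A.T        # K[u,v] = number of common neighbors of proteins u and v
print(K[0, 1])     # 3, matching the example above
```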
Diffusion Kernel
- A general metric of similarity between graph nodes
- Based upon a random walk
- Kernel ~ average time for a random walk starting at x to first visit y (the number of paths connecting the two nodes, weighted by path lengths)
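The slide does not pin down a formula; one standard construction with these properties is the Kondor-Lafferty diffusion kernel, sketched below under the assumption of a symmetric adjacency matrix:

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A, beta=1.0):
    """Kondor-Lafferty diffusion kernel K = exp(beta * H) on a graph.

    A: symmetric 0/1 adjacency matrix. H = A - D is the negative Laplacian;
    the matrix exponential sums walks between nodes, discounted by length.
    """
    H = A - np.diag(A.sum(axis=1))
    return expm(beta * H)
```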
Hydrophobicity Kernel
- Transmembrane regions are typically hydrophobic
- The hydrophobicity profile is conserved
- Represent the data in terms of a kernel
Combining Kernel Machines
[Figure: a protein is scored by kernels K1, K2, K3, …, Kn, combined as sgn(Σ λi ·)]

Kernel | Data                 | Similarity measure
KE     | gene expression      | radial basis kernel
KD     | protein interactions | diffusion kernel
KLI    | protein interactions | linear kernel
KFFT   | hydropathy profile   | FFT
KHMM   | protein sequence     | Pfam HMM
KB     | protein sequence     | BLAST
KSW    | protein sequence     | Smith-Waterman

Given kernels Ki(u,v) = <φi(u), φi(v)>, define a combined kernel K(u,v) = Σ λi Ki(u,v) (with Σ λi = 1). This corresponds to the mapping φ(u) = (φ1(u), φ2(u), …, φn(u)) and the weighted inner product <φ(u),φ(v)> = Σ λi <φi(u),φi(v)>. Training extends to the combined SVM.
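A minimal sketch of this convex combination of precomputed kernel matrices (function name and weights are illustrative):

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination K = sum_i lambda_i * K_i of precomputed kernel matrices."""
    lam = np.asarray(weights, dtype=float)
    lam = lam / lam.sum()                        # enforce sum(lambda_i) = 1
    return sum(l * K for l, K in zip(lam, kernels))

# e.g., an equal-weight blend of an interaction kernel and a sequence kernel:
# K = combine_kernels([K_interaction, K_sequence], [0.5, 0.5])
```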
Membrane Proteins
[Figure: %TP at 1% FP for simple rules from the hydrophobicity profile, for TMHMM, and for the combined SVM]
Cytoplasmic Ribosomal Proteins
What Can Errors Teach?
SVM Final Notes
Kernel machines provide powerful classifiers:
- Kernels admit flexible modeling of similarity
- Simple and general training procedure
- Multiple classifiers may be combined to improve results
But:
- Choosing a good kernel is an art
- Training results may be sensitive to the training sample
Other classification ideas: boosting, decision trees, …
Conclusions
Microarray Analysis
Microarrays provide rich information on gene expression:
- Identify variance between cell behaviors
- Determine co-expression patterns of genes
- Analyze temporal behavior of the genome
- …
Low-level analysis improves data quality: normalization, noise reduction, …
High-level analysis improves data interpretation:
- Study correlations of gene expressions
- Clustering determines similarity
- PCA analyzes variance
- Classifiers analyze features