+Get Rich and Cure Cancer
with Support Vector Machines
(Your Summer Projects)
+Kernel Trick
https://www.youtube.com/watch?v=3liCbRZPrZA
+This is achieved with a polynomial kernel
Feature map (degree-2 example in 2-D): φ(x) = (x1², √2·x1x2, x2²)
Kernel: K(x, z) = (x·z)² = φ(x)·φ(z)
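A quick numpy sanity check, with made-up vectors, that the degree-2 polynomial kernel K(x, z) = (x·z)² reproduces the inner product of the explicit feature map φ(x) = (x1², √2·x1x2, x2²):

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_val = poly_kernel(x, z)        # kernel, no feature map needed
ip_val = phi(x) @ phi(z)         # inner product in transformed space
# both equal 1.0 for these vectors
```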
+Optimization of transformed problem: only the kernel matters
Dual Lagrangian for transformed problem: L_D = Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ)
Optimal weight vector: w = Σᵢ αᵢ yᵢ φ(xᵢ)
Thus, optimal hyperplane: f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b = 0
+Kernel Trick
We can choose the kernel without first defining a feature map.
How to get a feature map from a kernel?
Define Φ(x) = K(·, x),
i.e. map vectors in the original feature space to functions.
Inner product on transformed space: ⟨Φ(x), Φ(z)⟩ = K(x, z)
+Get rich off of support vectors
+Making 5-day forecasts of financial futures
Given data on the returns for 5 days
Predict the return on the next day
To achieve this, we need to figure out which 5-day stretches tend to predict good returns on the 6th day, and which predict not-so-good returns
A training data set is used for this purpose
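The windowing step above can be sketched in a few lines of numpy: turn a 1-D series of daily returns into rows of 5 consecutive returns, each labeled with the following day's return (the toy series here is made up):

```python
import numpy as np

def make_windows(returns, window=5):
    """Build (X, y) from a 1-D return series: each row of X holds
    `window` consecutive returns, y is the following day's return."""
    returns = np.asarray(returns, dtype=float)
    X = np.lib.stride_tricks.sliding_window_view(returns[:-1], window)
    y = returns[window:]
    return X, y

r = np.arange(8) / 10.0          # toy series: 0.0, 0.1, ..., 0.7
X, y = make_windows(r)           # X[0] = [0.0, 0.1, 0.2, 0.3, 0.4], y[0] = 0.5
```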
+Making 5-day forecasts of financial futures
Day 1   Day 2   Day 3   Day 4   Day 5   | Day 6
x11     x12     x13     x14     x15     | y1
x21     x22     x23     x24     x25     | y2
x31     x32     x33     x34     x35     | y3
x41     x42     x43     x44     x45     | y4
…       …       …       …       …       | …
5-dimensional feature space; the return on the 6th day supplies the class label for each data point.
The routine learns how to classify 5-day-return data points by working with a training data set of 500 days. It constructs a dividing hypersurface and uses it to decide what the 6th-day return should be for new data points.
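A minimal end-to-end sketch of this pipeline, assuming scikit-learn is available; the return series and the planted rule (6th-day class equals the sign of the 5-day mean) are invented for the demo and carry no claim about real markets:

```python
import numpy as np
from sklearn.svm import SVC   # assumes scikit-learn is installed

rng = np.random.default_rng(0)
returns = rng.normal(size=400)   # synthetic daily returns

# Each row of X is a 5-day stretch of returns.
X = np.lib.stride_tricks.sliding_window_view(returns[:-1], 5)

# Planted rule (an assumption for the demo): the 6th-day class is the
# sign of the 5-day mean return.
y = np.sign(X.mean(axis=1))

clf = SVC(kernel="rbf").fit(X, y)   # the dividing hypersurface
train_acc = clf.score(X, y)
```

On this clean synthetic rule the classifier fits the training set easily; real return data is far noisier, which is exactly why the linked R experiment is worth reproducing.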
+Good results – you can try it yourself!
Complete with R code: http://www.r-bloggers.com/trading-with-support-vector-machines-svm/
+Another example: gene expression in normal and cancerous tissue
Gene = unit of heredity
Human genome contains about 21,000 genes
Public domain image from Wikipedia
+Another example: gene expression in normal and cancerous tissue
DNA transcribes to RNA which translates to proteins
This is the process whereby the “genetic code” is made manifest as biological characteristics (genotype gives rise to phenotype)
Wikimedia Commons image by Madeleine Price Ball
+Big question: Which genes are responsible for which outcomes?
In various tissues (e.g. tumor versus normal), which genes are active, hyperactive, and silent?
Can use DNA microarrays to measure gene expression levels.
+DNA Microarray
https://www.youtube.com/watch?v=_6ZMEZK-alM
Source: National Human Genome Research Institute
+Using support vector machines to determine which genes are important for cancer classification
+Data
Data points: Patients
Features: Gene expression coefficients (activity level of a given gene)
Feature space will have a huge number of dimensions! Need a way to reduce.
Could examine all possible subspaces of feature space, but if the dimension N of the feature space represents thousands of genes, the number of n-dimensional coordinate subspaces is C(N, n) = N! / (n! (N − n)!)
Too large for practical examination of each subspace
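To make the scale concrete, here is the count for picking a 10-gene subspace out of roughly 21,000 genes (the subspace size 10 is an arbitrary illustration):

```python
import math

N = 21000            # approximate number of human genes
n = 10               # size of one candidate gene subspace (arbitrary choice)
count = math.comb(N, n)   # C(21000, 10), roughly 4.6e36 subspaces
```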
+Generate ranking of features
A ranking of features allows us to make a nested sequence of subspaces of feature space F
and then determine the optimum subspace to work with
One possibility for ranking: work with each gene individually and compute its correlation coefficient with the classifier (i.e. find the correlation of the gene's expression level with the classification of tissue into tumor vs. normal, or into two different types of cancer).
Note: ranking by correlation coefficient assumes all the features are independent of one another.
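A minimal numpy sketch of correlation-based ranking; the "patients", "genes", and the rule that gene 2 alone drives the class are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))        # 300 patients, 8 gene expression levels
y = np.where(X[:, 2] > 0, 1, -1)     # class driven entirely by gene 2

# Pearson correlation of each gene's expression with the class label.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Rank genes by absolute correlation, most correlated first.
ranking = np.argsort(-np.abs(corrs))
```

Because each gene is scored in isolation, this ranking ignores interactions between genes, which is exactly the independence caveat noted above.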
+Generate ranking of features
Another possible way to generate a ranking of features: sensitivity analysis.
Have training data set, already classified into two classes (cancerous v. non, or cancer type 1 v. cancer type 2)
Construct a cost function to estimate error in classification
Sensitivity of cost function to removal of a feature measures the importance of that feature and allows the construction of a ranking.
+Ranking by Support Vector Machines: Recursive Feature Elimination
Idea of how to use SVM to identify important features: Consider a cartoon scenario.
[Figure: 2-D cartoon with axes x1 and x2; the two classes are separated by a hyperplane perpendicular to the x2 axis.]
This indicates that the x1 direction is completely superfluous for classification.
+Ranking by Support Vector Machines
This suggests the following recursive algorithm for ranking features:
Find weight vector, using all features
Identify the least important feature as the one with the smallest (in absolute value) component of the weight vector
List that feature as least important and eliminate it from the data
Iterate the procedure with that feature thrown out
End result: Ranked list of features!
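The recursion above can be sketched in pure numpy. The subgradient-descent trainer below is a bare-bones stand-in for a real SVM solver, and the synthetic data (where only features 0 and 1 determine the class) is invented for the demo:

```python
import numpy as np

def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Minimal linear SVM: full-batch subgradient descent on the
    L2-regularized hinge loss. A sketch, not a production solver."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1                 # points violating the margin
        grad = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
    return w

def svm_rfe(X, y):
    """Recursive Feature Elimination: repeatedly train on the surviving
    features, then drop the one with the smallest |w_j|."""
    remaining = list(range(X.shape[1]))
    eliminated = []                          # least important first
    while remaining:
        w = fit_linear_svm(X[:, remaining], y)
        j = int(np.argmin(np.abs(w)))
        eliminated.append(remaining.pop(j))
    return eliminated[::-1]                  # most important first

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # only features 0 and 1 matter
ranking = svm_rfe(X, y)                      # 0 and 1 should rank on top
```

Retraining after every elimination is what lets RFE account for correlated features, unlike the one-gene-at-a-time correlation ranking.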
+Try this at home!
Data is available online!
http://www.broadinstitute.org/software/cprg/?q=node/55
Classify two types of leukemia.