Computational Classification Techniques for Neuroimaging
A Machine Learning Based Approach
Adrian Smith – Undergraduate, Computer Science Department
Sonoma State University
Fundamentals
• Understanding the human brain has been a central theme of human history
• By growing our understanding of the brain, we improve our ability to treat diseases (Gur2002)
• Understanding the brain helps us be aware of its limitations
Artist’s Depiction of Neurons, UCI Research
Courtesy of OSA Student Chapter at UCI Art in Science Contest. Photo by: Ardy Rahman
fMRI Scanning
• Functional Magnetic Resonance Imaging (fMRI) allows us to measure localized brain activity
• This allows one to find relationships between cognition and brain activity
• Blood oxygen level is used as a measure of activity (BOLD imaging)
• This technique produces rich data, but the data contain high levels of noise
CSRB (Keck MRI Center)
Data Collection
• One major advantage of researching fMRI data is its availability from a variety of online sources
• We worked with 1452 total brain scans, each corresponding to one of 9 categories
• The categories refer to the image a subject was observing
Analysis Goals
• Our goal was, given the fMRI scan of a subject, to predict what image they were observing
• This means differentiating scans based on the image the subject is observing
• What is the relationship?
Haxby2001 Stimulus Images
Machine Learning Techniques
• Machine learning is an information processing technique
• The field of machine learning is at the heart of understanding “Big Data”
• We aimed to use modern machine learning techniques to help classify fMRI data.
How does Machine Learning Work?
• Machine learning classification focuses on designing algorithms which are trained to categorize objects
• Training pairs each object's defining characteristics (its features) with a label
• The algorithm trains on one set of data, and is then tested on separate data to see how accurately it can predict each label
• What is the data?
By Antti Ajanki AnAj (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0
(http://creativecommons.org/licenses/by-sa/3.0/)
Which is active before processing?
Unprocessed Active Unprocessed Rest
Which is active after processing?
Processed Active Processed Rest
Preprocessing
• We applied masks that came with the dataset in order to focus on the ventral temporal cortex, our region of interest
• We then applied a polynomial detrender, which removes systematic trends such as signal increase as the machine warms up
• This was followed by a key step, z-scoring against the rest condition
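The detrending and z-scoring steps above can be sketched in plain NumPy. This is a minimal illustration on synthetic data, not the PyMVPA calls used in the project: the function names, the linear polynomial order, and the choice of which volumes count as rest are all assumptions made for the example, and it treats the data as a single run.

```python
import numpy as np

def detrend_polynomial(data, order=1):
    """Remove a fitted polynomial trend from each voxel's time course.

    data: array of shape (n_volumes, n_voxels) for a single run.
    """
    n_volumes = data.shape[0]
    t = np.linspace(-1.0, 1.0, n_volumes)
    # Design matrix with columns 1, t, t^2, ... up to the requested order.
    design = np.vander(t, order + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ coeffs

def zscore_against_rest(data, is_rest):
    """Z-score each voxel using mean and std estimated from rest volumes only."""
    rest = data[is_rest]
    mean = rest.mean(axis=0)
    std = rest.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant voxels
    return (data - mean) / std

# Synthetic stand-in for one masked run: 60 volumes, 5 voxels, linear drift.
rng = np.random.default_rng(0)
n_volumes, n_voxels = 60, 5
drift = np.linspace(0.0, 3.0, n_volumes)[:, None]
data = rng.normal(size=(n_volumes, n_voxels)) + drift
is_rest = np.arange(n_volumes) % 2 == 0  # hypothetical: every other volume is rest

detrended = detrend_polynomial(data, order=1)
zscored = zscore_against_rest(detrended, is_rest)
```

After this step the rest volumes have zero mean and unit variance per voxel, so values in the active volumes read as deviations from the resting baseline.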
Graph of Normal Distribution (Public Domain)
Classification
• We now had to decide how to process the image data
• This meant choosing features that best represented the data we sought
• We also tested a variety of classification algorithms which would label images based on the chosen feature
Features
• We started with our preprocessed values, and then looked at a variety of transforms
• We chose the full vector and the PCA-reduced version as our main features of interest
• Principal Component Analysis (PCA) is a tool to reduce the dimensionality of a dataset
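As a rough illustration of PCA as a dimensionality-reduction tool, the sketch below applies scikit-learn's `PCA` to synthetic data. The data, the 576-feature width, and the 90% variance threshold are assumptions chosen for the example, not the project's actual settings.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 samples with 576 features driven by 10 latent factors.
latent = rng.normal(size=(200, 10))
mixing = rng.normal(size=(10, 576))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 576))

# A float n_components keeps the smallest number of components whose
# cumulative explained variance reaches that fraction (0.90 is an assumption).
pca = PCA(n_components=0.90, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

Because the synthetic features are mixtures of only a few underlying factors, a handful of components captures most of the variance, which is the situation in which PCA shortens the feature vector without discarding much signal.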
[Figure: one volume as a vector of 576 values, [0.5, .01, -.02, 1.5, 2.0, … -3.0], with candidate features: PCA, full vector (samples), 50 highest values, histogram]
Experimental Design
• Data was split evenly and randomly into training and test sets
• We used several feature vectors to test each classifier
• We primarily focused on k Nearest Neighbor (kNN) and Support Vector Machine (SVM) classifiers
• Tests were repeated 15 times and scores averaged
[Diagram: training features and training labels are fed to the classifier to produce a trained classifier; the trained classifier maps testing features to predicted labels; comparing predicted labels with testing labels yields the accuracy and confusion matrix]
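The experimental design on this slide (random even split, 15 repetitions, averaged scores) can be sketched with scikit-learn as follows. The helper `repeated_split_accuracy`, the synthetic dataset, the linear SVM kernel, and `n_neighbors=5` are assumptions for illustration; the slides do not state the classifier settings that were used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def repeated_split_accuracy(make_classifier, X, y, n_repeats=15, seed=0):
    """Mean and std of test accuracy over repeated random 50/50 splits."""
    scores = []
    for i in range(n_repeats):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.5, random_state=seed + i
        )
        clf = make_classifier()  # fresh, untrained classifier per repetition
        clf.fit(X_train, y_train)
        scores.append(clf.score(X_test, y_test))
    return float(np.mean(scores)), float(np.std(scores))

# Synthetic stand-in for the feature vectors: 9 classes, 100 features.
X, y = make_classification(
    n_samples=900, n_features=100, n_informative=20,
    n_classes=9, n_clusters_per_class=1, random_state=0,
)

for name, factory in [
    ("kNN", lambda: KNeighborsClassifier(n_neighbors=5)),
    ("SVM", lambda: SVC(kernel="linear")),
]:
    mean, std = repeated_split_accuracy(factory, X, y)
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

Passing a factory rather than a fitted estimator guarantees each repetition starts from an untrained classifier, so the averaged score reflects only the variation introduced by the random split.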
kNN vs. SVM
• SVM performs better than kNN
• The difference in accuracy is likely due to the weakness of kNN when dealing with high dimensionality
SVM on samples, 90.9% accuracy
kNN on samples, 75.6% accuracy
Samples vs. PCA
• We applied PCA to the processed data
• This produced a vector over half the size of our original
• This smaller vector produces more accurate results
PCA (SVM), 92.1% accuracy
SVM on samples, 90.9% accuracy
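One way to combine PCA with an SVM is a scikit-learn pipeline, sketched below on synthetic data. Fitting PCA on the training half only is a design choice of this example and not necessarily how the project applied it; the 90% variance threshold and the linear kernel are likewise assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the feature vectors: 9 classes, 100 features.
X, y = make_classification(
    n_samples=900, n_features=100, n_informative=20,
    n_classes=9, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# PCA is fitted on the training half only and then applied to the test half,
# so no information from the test set leaks into the projection.
model = make_pipeline(
    PCA(n_components=0.90, svd_solver="full"),
    SVC(kernel="linear"),
)
model.fit(X_train, y_train)

n_kept = model.named_steps["pca"].n_components_
accuracy = model.score(X_test, y_test)
print(f"components kept: {n_kept} of {X.shape[1]}")
print(f"test accuracy: {accuracy:.3f}")
```

Wrapping both steps in one estimator means the same projection learned during training is reused at prediction time, which keeps the comparison against the full-vector classifier fair.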
Classification Results: Accuracy
• PCA and SVM in combination gave the best results after repeated testing
• We achieved on average 92.1% accuracy among 9 labels, with a 2.0% standard deviation
• Our classification methods are effective and repeatable
• We also gained a variety of insights about the nature of the data
Classification Results: Insights
• We saw several labels which were repeatedly misclassified, and saw accuracy improve as they were removed
• One area of further study is investigating whether these patterns exist across multiple subjects, and why
PCA (SVM), 92.1% accuracy
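Labels that are repeatedly misclassified can be located by reading a confusion matrix. The sketch below uses a tiny hand-made example with hypothetical labels "A", "B", and "C" rather than the project's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["A", "B", "C"]  # hypothetical label names
y_true = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"]
y_pred = ["A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "B"]

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Fraction of each true label that was predicted correctly.
per_label_accuracy = cm.diagonal() / cm.sum(axis=1)
worst = labels[int(np.argmin(per_label_accuracy))]

# Largest off-diagonal entry: the most frequent confusion between two labels.
off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0)
i, j = np.unravel_index(np.argmax(off_diagonal), off_diagonal.shape)

print(cm)
print("lowest per-label accuracy:", worst)
print(f"most common confusion: {labels[i]} predicted as {labels[j]}")
```

In this toy example label "B" is the weakest and is most often predicted as "C"; the same reading applied to real results would point to the labels worth removing or studying further.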
Future Exploration
• We intend to move towards classifying across multiple subjects
• This is of utmost importance to clinical applications of fMRI data
• Multisubject comparison presents challenges due to the variation in brain structure
• We intend to build upon previous work on feature detection and scaling maps (Gill2014)
Sources
• Gur, R. E., McGrath, C., Chan, R. M., Schroeder, L., Turner, T., Turetsky, B. I., ... & Gur, R. C. (2002). An fMRI study of facial emotion processing in patients with schizophrenia. American Journal of Psychiatry.
• Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425-2430.
• Gill, G., Bauer, C., & Beichel, R. R. (2014). A method for avoiding overlap of left and right lungs in shape model guided segmentation of lungs in CT volumes. Medical Physics, 41(10), 101908.
• Dataset: This data was obtained from the OpenfMRI database. Its accession number is ds000105. The original authors of Haxby et al. (2001) hold the copyright of this dataset and made it available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.
Acknowledgments
• Dr. Gurman Gill – Mentor
• OpenfMRI – Source of all data, and an amazing example of open data in science
• PyMVPA – Python toolkit used in preprocessing
• Scikit-learn – Python toolkit used in classification
• Dr. Yaroslav Halchenko – Researcher who provided extensive aid in understanding and dealing with fMRI data
Questions?
Extra Graphics
SVM on top 400 values, 30.9% accuracy
SVM on 90% PCA, 92.2% accuracy