Computational Classification Techniques for Neuroimaging A Machine Learning Based Approach Adrian Smith – Undergraduate Computer Science Department Sonoma State University


Page 1: CSU_comp

Computational Classification Techniques for Neuroimaging

A Machine Learning Based Approach

Adrian Smith – Undergraduate

Computer Science Department

Sonoma State University

Page 2: CSU_comp

Fundamentals

• Understanding the human brain has been a central theme of human history

• By growing our understanding of the brain, we improve our ability to treat diseases (Gur2002)

• Understanding the brain helps us be aware of its limitations

Artist’s Depiction of Neurons, UCI Research

Courtesy of OSA Student Chapter at UCI Art in Science Contest. Photo by: Ardy Rahman

Page 3: CSU_comp

fMRI Scanning

• Functional Magnetic Resonance Imaging (fMRI) allows us to measure localized brain activity

• This allows one to find relationships between cognition and brain activity

• Blood oxygenation is used as an indirect measure of activity (blood-oxygen-level-dependent, or BOLD, imaging)

• This technique produces rich data, but the data contain high levels of noise

CSRB (Keck MRI Center)

Page 4: CSU_comp

Data Collection

• One major advantage of researching fMRI data is its availability from a variety of online sources

• We worked with 1452 total brain scans, each corresponding to one of 9 categories

• The categories refer to the image a subject was observing

Page 5: CSU_comp

Analysis Goals

• Our goal was, given the fMRI scan of a subject, to predict what image they were observing

• This means differentiating scans based on the image the subject is observing

• What is the relationship?

Haxby2001 Stimulus Images

Page 6: CSU_comp

Machine Learning Techniques

• Machine learning is an information processing technique

• The field of machine learning is at the heart of understanding “Big Data”

• We aimed to use modern machine learning techniques to help classify fMRI data.

Page 7: CSU_comp

How does Machine Learning Work?

• Machine learning classification focuses on designing algorithms which are trained to categorize objects

• Each training example pairs some defining characteristics (features) with a label

• The algorithm trains on one set of data, and then is tested to see how accurately it can predict the label of some piece of data.

• What is the data?

By Antti Ajanki AnAj (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)
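As a minimal sketch of the train-then-test workflow on this slide, the following uses a hand-written 1-nearest-neighbour rule on synthetic two-class data. The data, the cluster positions, and the helper name `predict_1nn` are illustrative assumptions, not the data or code used in this project.

```python
import numpy as np

def predict_1nn(train_X, train_y, test_X):
    """Label each test row with the label of its closest training row."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diffs = test_X[:, None, :] - train_X[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    return train_y[dists.argmin(axis=1)]

rng = np.random.default_rng(0)

# Synthetic features: two well-separated clusters, 40 examples each.
X = np.vstack([rng.normal(0.0, 1.0, (40, 5)), rng.normal(6.0, 1.0, (40, 5))])
y = np.array([0] * 40 + [1] * 40)

# Shuffle, then train on one half and test on the other.
order = rng.permutation(len(y))
train_idx, test_idx = order[:40], order[40:]

predicted = predict_1nn(X[train_idx], y[train_idx], X[test_idx])
accuracy = (predicted == y[test_idx]).mean()
print(f"accuracy: {accuracy:.2f}")
```

The classifier only ever sees the labels of the training half; the labels of the test half are used solely to score its predictions.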

Page 8: CSU_comp

Which is active before processing?

Unprocessed Active Unprocessed Rest

Page 9: CSU_comp

Which is active after processing?

Processed Active Processed Rest

Page 10: CSU_comp

Preprocessing

• We applied masks that came with the dataset in order to focus on the Ventral Temporal cortex, our region of interest

• We then applied a polynomial detrender, which eliminates systematic trends, such as signal increase as the machine warms up

• This was followed by a key step: z-scoring against the rest condition

Graph of Normal Distribution, Public Domain
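The detrending and z-scoring steps can be sketched in plain NumPy as below. This is an illustration under stated assumptions, not the project's code: the toolkit routines actually used are not shown in the slides, and the function names, the linear drift, and the alternating rest/task design are all hypothetical.

```python
import numpy as np

def poly_detrend(data, order=1):
    """Remove a fitted polynomial trend from each voxel's time series.

    data: array of shape (n_volumes, n_voxels).
    """
    t = np.linspace(-1.0, 1.0, data.shape[0])
    # Columns are t**0, t**1, ..., t**order.
    design = np.vander(t, order + 1, increasing=True)
    coefs, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ coefs

def zscore_against_rest(data, is_rest):
    """Standardize every voxel using the mean and std of rest volumes only."""
    rest = data[is_rest]
    return (data - rest.mean(axis=0)) / rest.std(axis=0)

rng = np.random.default_rng(0)
n_volumes, n_voxels = 120, 8

# Synthetic run: noise plus a linear drift shared by all voxels.
drift = np.linspace(0.0, 5.0, n_volumes)[:, None]
raw = rng.normal(100.0, 2.0, (n_volumes, n_voxels)) + drift

# Hypothetical design: alternating blocks of 10 rest and 10 task volumes.
is_rest = (np.arange(n_volumes) // 10) % 2 == 0

detrended = poly_detrend(raw, order=1)
clean = zscore_against_rest(detrended, is_rest)
```

After this step each voxel has zero mean and unit variance over the rest volumes, so values in task volumes read as deviations from the resting baseline.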

Page 11: CSU_comp

Classification

• We now had to decide how to process the image data

• This meant choosing features that best represented the data we sought

• We also tested a variety of classification algorithms which would label images based on the chosen feature

Page 12: CSU_comp

Features

• We started with our preprocessed values, and then looked at a variety of transforms

• We chose the full vector and the PCA-reduced version as our main features of interest

• Principal Component Analysis (PCA) is a tool to reduce the dimensionality of a dataset

[Figure: One Volume, [0.5, .01, -.02, 1.5, 2.0, … -3.0] (576), mapped to candidate features: Full Vector (Samples), PCA, 50 Highest Values, Histogram]
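A minimal sketch of PCA-based dimensionality reduction with scikit-learn follows. The data are synthetic, the 576-column width simply mirrors the number shown beside the example vector on the slide (assumed here to be the vector length), and the 90% variance threshold is an illustrative choice rather than a confirmed project setting.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for preprocessed scans: 200 volumes x 576 values,
# generated from 20 latent sources so the data has low-dimensional structure.
latent = rng.normal(size=(200, 20))
mixing = rng.normal(size=(20, 576))
volumes = latent @ mixing + 0.1 * rng.normal(size=(200, 576))

# A float n_components keeps the fewest components whose cumulative
# explained variance exceeds that fraction (here 90%).
pca = PCA(n_components=0.90, svd_solver="full")
reduced = pca.fit_transform(volumes)

print(volumes.shape, "->", reduced.shape)
print(f"variance kept: {pca.explained_variance_ratio_.sum():.3f}")
```

In a real train/test experiment the PCA would be fitted on the training volumes only and then applied to the test volumes with `pca.transform`, so that no information from the test set leaks into the features.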

Page 13: CSU_comp

Experimental Design

• Data was split evenly and randomly into training and test sets

• We used several feature vectors to test each classifier

• We primarily focused on k Nearest Neighbor (kNN) and Support Vector Machine (SVM) classifiers

• Tests were repeated 15 times and scores averaged

[Diagram: Train Feature and Training Label → Classifier → Trained Classifier; Testing Feature → Trained Classifier → Predicted Label; Predicted Label and Testing Label → Comparison → Accuracy and Confusion Matrix]
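The design on this slide can be sketched with scikit-learn as below. This is a hedged illustration, not the project's script: the data come from `make_classification` rather than fMRI scans, and the hyperparameters (5 neighbours, a linear SVM kernel) are assumptions the slides do not state.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data: 450 samples, 9 classes.
X, y = make_classification(
    n_samples=450, n_features=60, n_informative=20,
    n_classes=9, n_clusters_per_class=1, random_state=0,
)

# Factories so that every repeat starts from an untrained classifier.
classifiers = {
    "kNN": lambda: KNeighborsClassifier(n_neighbors=5),
    "SVM": lambda: SVC(kernel="linear"),
}

n_repeats = 15
scores = {name: [] for name in classifiers}
confusion = {name: np.zeros((9, 9), dtype=int) for name in classifiers}

for seed in range(n_repeats):
    # Even, random split into training and test halves.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=seed
    )
    for name, make in classifiers.items():
        predicted = make().fit(X_train, y_train).predict(X_test)
        scores[name].append(accuracy_score(y_test, predicted))
        confusion[name] += confusion_matrix(
            y_test, predicted, labels=np.arange(9)
        )

for name, values in scores.items():
    print(f"{name}: mean={np.mean(values):.3f} std={np.std(values):.3f}")
```

Averaging over repeated random splits gives both a mean accuracy and a standard deviation, which is the form in which the results are reported later in the deck.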

Page 14: CSU_comp

kNN vs. SVM

• SVM performs better than kNN

• The difference in accuracy is likely due to the weakness of kNN when dealing with high dimensionality

SVM on samples, 90.9% accuracy

kNN on samples, 75.6% accuracy
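One general effect behind the high-dimensionality explanation is that distances between random points become nearly equal as the number of dimensions grows, which makes "nearest" neighbours less informative. The sketch below illustrates this with uniformly random points; it is not an analysis of the fMRI data, and the function name and point counts are assumptions.

```python
import numpy as np

def distance_contrast(n_dims, n_points=500, seed=0):
    """Return (farthest - nearest) / nearest distance from one query point
    to a cloud of uniformly random points in n_dims dimensions."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for n_dims in (2, 20, 200, 2000):
    print(n_dims, round(distance_contrast(n_dims), 3))
```

The printed contrast shrinks as the dimension increases: the farthest point ends up only slightly farther away than the nearest one.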

Page 15: CSU_comp

Samples vs. PCA

• We applied PCA to the processed data

• This produced a vector over half the size of our original

• This smaller vector produces more accurate results

PCA (SVM), 92.1% accuracy

SVM on samples, 90.9% accuracy

Page 16: CSU_comp

Classification Results: Accuracy

• PCA and SVM in combination gave the best results after repeated testing

• We achieved on average 92.1% accuracy among 9 labels, with a 2.0% standard deviation

• Our classification methods are effective and repeatable

• We also gained a variety of insights about the nature of the data

Page 17: CSU_comp

Classification Results: Insights

• We saw several labels which were repeatedly misclassified, and saw accuracy improve as they were removed

• One area of further study is investigating whether these patterns exist across multiple subjects, and why

PCA (SVM), 92.1% accuracy
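A confusion matrix makes repeatedly misclassified labels easy to spot. The sketch below uses a made-up 4-class matrix (not the project's results) and hypothetical helper names. Note that it only re-scores an existing matrix with one label excluded; retraining the classifier without that label, which may be what was done here, could give different numbers.

```python
import numpy as np

def most_confused_pair(cm):
    """Return (true_label, predicted_label) of the largest off-diagonal count."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)
    i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
    return int(i), int(j)

def accuracy_without(cm, label):
    """Accuracy over the test samples whose true and predicted labels
    both differ from `label`."""
    keep = np.arange(cm.shape[0]) != label
    sub = cm[np.ix_(keep, keep)]
    return np.trace(sub) / sub.sum()

# Made-up 4-class confusion matrix (rows: true label, columns: predicted).
cm = np.array([
    [20,  0,  0,  0],
    [ 0, 12,  8,  0],
    [ 0,  5, 15,  0],
    [ 0,  0,  1, 19],
])

overall = np.trace(cm) / cm.sum()
true_label, predicted_label = most_confused_pair(cm)
reduced_accuracy = accuracy_without(cm, true_label)

print(f"overall accuracy: {overall:.3f}")
print(f"most confused: true {true_label} predicted as {predicted_label}")
print(f"accuracy without label {true_label}: {reduced_accuracy:.3f}")
```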

Page 18: CSU_comp

Future Exploration

• We intend to move towards classifying across multiple subjects

• This is of utmost importance to clinical applications of fMRI data

• Multisubject comparison presents challenges due to the variation in brain structure

• We intend to build upon previous work on feature detection and scaling maps (Gill2014)

Page 19: CSU_comp

Sources

• Gur, R. E., McGrath, C., Chan, R. M., Schroeder, L., Turner, T., Turetsky, B. I., ... & Gur, R. C. (2002). An fMRI study of facial emotion processing in patients with schizophrenia. American Journal of Psychiatry.

• Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425-2430.

• Gill, G., Bauer, C., & Beichel, R. R. (2014). A method for avoiding overlap of left and right lungs in shape model guided segmentation of lungs in CT volumes. Medical Physics, 41(10), 101908.

• Dataset: This data was obtained from the OpenfMRI database. Its accession number is ds000105. The original authors of Haxby et al. (2001) hold the copyright of this dataset and made it available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

Page 20: CSU_comp

Acknowledgments

• Dr. Gurman Gill – Mentor

• OpenfMRI – Source of all data, and an amazing example of open data in science

• PyMVPA – Python toolkit used in preprocessing

• scikit-learn – Python toolkit used in classification

• Dr. Yaroslav Halchenko – Researcher who provided extensive aid in understanding and dealing with fMRI data

Page 21: CSU_comp

Questions?

Page 22: CSU_comp

Extra Graphics

SVM on top 400 values, 30.9% accuracy

SVM on 90% PCA, 92.2% accuracy