CSU_comp

Computational Classification Techniques for Neuroimaging

A Machine Learning Based Approach

Adrian Smith – Undergraduate, Computer Science Department

Sonoma State University

Fundamentals

• Understanding the human brain has been a central theme of human history

• By growing our understanding of the brain, we improve our ability to treat diseases (Gur2002)

• Understanding the brain helps us be aware of its limitations

Artist’s Depiction of Neurons, UCI Research

Courtesy of OSA Student Chapter at UCI Art in Science Contest. Photo by: Ardy Rahman

https://www.flickr.com/photos/126293865@N04/14953538130

fMRI Scanning

• Functional Magnetic Resonance Imaging (fMRI) allows us to measure localized brain activity

• This allows one to find relationships between cognition and brain activity

• Blood oxygenation is used as a proxy for activity (blood-oxygen-level-dependent, or BOLD, imaging)

• This technique produces rich data, but the data contain high levels of noise

CSRB (Keck MRI Center)

https://picasaweb.google.com/lh/photo/0oRjZFmjCItJKeoeJpusgA

Data Collection

• One major advantage of researching fMRI data is its availability from a variety of online sources

• We worked with 1452 brain scans in total, each corresponding to one of 9 categories

• Each category refers to the image the subject was observing

Analysis Goals

• Our goal was to predict, given the fMRI scan of a subject, what image they were observing

• This means differentiating scans based on the image the subject is observing

• What is the relationship?

Haxby2001 Stimulus Images

Machine Learning Techniques

• Machine learning is an information processing technique

• The field of machine learning is at the heart of understanding “Big Data”

• We aimed to use modern machine learning techniques to help classify fMRI data.

How does Machine Learning Work?

• Machine learning classification focuses on designing algorithms that are trained to categorize objects

• Each training example pairs some defining characteristics (features) with a label

• The algorithm trains on one set of data and is then tested on how accurately it predicts the labels of data it has not seen

• What is the data?
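The train-then-test idea can be sketched with a deliberately simple classifier. This is an illustration only: the nearest-centroid rule, the toy feature values, and the label names below are assumptions chosen for brevity, not the classifiers or data used in this project.

```python
import numpy as np

def fit_centroids(features, labels):
    """Training: store the mean feature vector of each label."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, features):
    """Testing: give each sample the label of its closest centroid."""
    # distances has shape (n_samples, n_classes)
    distances = np.linalg.norm(
        features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]

# Toy data (made up): two features per sample, two hypothetical labels.
train_X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]])
train_y = np.array(["face", "face", "house", "house"])
test_X = np.array([[0.1, 0.2], [1.1, 1.0]])
test_y = np.array(["face", "house"])

classes, centroids = fit_centroids(train_X, train_y)
predicted = predict(classes, centroids, test_X)

# Accuracy: fraction of test samples whose predicted label matches the true one.
accuracy = float((predicted == test_y).mean())
```

The test labels are used only for scoring, never for training, which is what makes the accuracy a fair estimate.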

By Antti Ajanki AnAj (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)

Which is active before processing?

Unprocessed Active Unprocessed Rest

Which is active after processing?

Processed Active Processed Rest

Preprocessing

• We applied masks that came with the dataset in order to focus on the ventral temporal cortex, our region of interest

• We then applied a polynomial detrender, which eliminates systematic trends, such as signal increase as the machine warms up

• This was followed by a key step: z-scoring against the rest condition
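The three preprocessing steps can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not the pyMVPA calls used in the project: the function names, the random toy volumes, the cubic mask, and the alternating rest pattern are all hypothetical, and the sketch treats the whole recording as one run.

```python
import numpy as np

def apply_mask(volumes, mask):
    """Keep only voxels inside the region of interest.
    volumes: (n_volumes, x, y, z); mask: boolean array of shape (x, y, z)."""
    return volumes[:, mask]

def poly_detrend(samples, order=1):
    """Subtract a least-squares polynomial trend over time from every voxel."""
    t = np.linspace(-1.0, 1.0, samples.shape[0])
    design = np.vander(t, order + 1)  # columns: t^order, ..., t^0
    coeffs, *_ = np.linalg.lstsq(design, samples, rcond=None)
    return samples - design @ coeffs

def zscore_against_rest(samples, is_rest):
    """Scale each voxel by the mean and standard deviation of its rest volumes."""
    mean = samples[is_rest].mean(axis=0)
    std = samples[is_rest].std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant voxels
    return (samples - mean) / std

# Toy data (made up): 40 small volumes with an added linear drift.
rng = np.random.default_rng(0)
n_volumes = 40
volumes = rng.normal(size=(n_volumes, 4, 4, 4))
volumes += np.linspace(0.0, 5.0, n_volumes)[:, None, None, None]

mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True                # hypothetical region of interest
is_rest = np.arange(n_volumes) % 2 == 0   # hypothetical rest volumes

samples = apply_mask(volumes, mask)       # one row per volume
detrended = poly_detrend(samples, order=1)
zscored = zscore_against_rest(detrended, is_rest)
```

After z-scoring, every value is expressed in units of rest-state variability, so activity during a stimulus stands out relative to the resting baseline.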

Graph of Normal Distribution, Public Domain

Classification

• We now had to decide how to process the image data

• This meant choosing features that best represented the data we sought

• We also tested a variety of classification algorithms which would label images based on the chosen feature

Features

• We started with our preprocessed values, and then looked at a variety of transforms

• We chose the full vector and the PCA reduced version as our main features of interest

• Principal Component Analysis (PCA) is a tool to reduce the dimensionality of a dataset
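How PCA shrinks a feature vector can be sketched with a singular value decomposition. This is a sketch under assumptions: the helper names, the 90% variance threshold, and the synthetic low-rank data are illustrative choices, and the 576 columns only mirror the vector length the slide appears to show for one volume.

```python
import numpy as np

def pca_fit(train, variance_to_keep=0.90):
    """Learn the mean and the leading principal directions from training data."""
    mean = train.mean(axis=0)
    _, singular_values, components = np.linalg.svd(
        train - mean, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    # Smallest number of components whose cumulative variance reaches the target.
    n_keep = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    n_keep = min(n_keep, components.shape[0])
    return mean, components[:n_keep]

def pca_transform(samples, mean, components):
    """Project samples onto the kept principal directions."""
    return (samples - mean) @ components.T

# Synthetic stand-in data: 100 samples, 576 features, driven by 5 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(100, 5))
loadings = rng.normal(size=(5, 576))
data = factors @ loadings + 0.01 * rng.normal(size=(100, 576))

mean, components = pca_fit(data, variance_to_keep=0.90)
reduced = pca_transform(data, mean, components)
```

The mean and components should be learned from training data only and then reused to transform test data, so that no information leaks from the test set.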

Figure: One Volume as a vector of length 576, [0.5, .01, -.02, 1.5, 2.0, … -3.0], and the candidate features derived from it: Full Vector (Samples), PCA, 50 Highest Values, Histogram

Experimental Design

• Data was split evenly and randomly into training and test sets

• We used several feature vectors to test each classifier

• We primarily focused on k Nearest Neighbor (kNN) and Support Vector Machine (SVM) classifiers

• Tests were repeated 15 times and scores averaged

Diagram: Training Feature and Training Label → Classifier → Trained Classifier; Testing Feature → Trained Classifier → Predicted Label; Predicted Label and Testing Label → Comparison → Accuracy and Confusion Matrix
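The design described above can be sketched with scikit-learn. The synthetic data, the linear SVM kernel, the choice of 5 neighbors, and the `evaluate` helper are assumptions made for illustration. Only the even random split, the 15 repetitions, and the two classifier families come from the slides.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def evaluate(make_classifier, X, y, repeats=15):
    """Average test accuracy over repeated random 50/50 splits."""
    scores = []
    for seed in range(repeats):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.5, random_state=seed)
        classifier = make_classifier().fit(X_train, y_train)
        scores.append(accuracy_score(y_test, classifier.predict(X_test)))
    return float(np.mean(scores)), float(np.std(scores))

# Synthetic stand-in data (not fMRI): 9 labels, 20 samples each, 50 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(9), 20)
centers = rng.normal(scale=3.0, size=(9, 50))
X = centers[y] + rng.normal(size=(180, 50))

svm_mean, svm_std = evaluate(lambda: SVC(kernel="linear"), X, y)
knn_mean, knn_std = evaluate(lambda: KNeighborsClassifier(n_neighbors=5), X, y)

# Confusion matrix for one split: rows are true labels, columns are predictions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
predicted = SVC(kernel="linear").fit(X_train, y_train).predict(X_test)
matrix = confusion_matrix(y_test, predicted, labels=np.arange(9))
```

A fresh classifier is built for every split so that nothing learned in one repetition carries over to the next.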

kNN vs. SVM

• SVM performs better than kNN

• The difference in accuracy is likely due to the weakness of kNN when dealing with high dimensionality

SVM on samples, 90.9% accuracy

kNN on samples, 75.6% accuracy
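One well-known reason distance-based methods such as kNN struggle in high dimensions is that the nearest and farthest points become almost equally far away. The sketch below illustrates that general effect on random Gaussian points. It is an assumption-laden illustration, not an analysis of the fMRI data, and the function name and dimensions are chosen only for the example.

```python
import numpy as np

def distance_contrast(n_dims, n_points=500, seed=0):
    """Ratio of nearest to farthest distance from a random query point.
    Values near 1 mean all points look about equally far away."""
    rng = np.random.default_rng(seed)
    points = rng.normal(size=(n_points, n_dims))
    query = rng.normal(size=n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return float(distances.min() / distances.max())

# Compare a low, a medium, and a high dimensional space.
contrast = {d: distance_contrast(d) for d in (2, 20, 576)}
```

As the ratio approaches 1, the "nearest" neighbors carry less information about which label a sample belongs to.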

Samples vs. PCA

• We applied PCA to the processed data

• This produced a vector over half the size of our original

• This smaller vector produced more accurate results

PCA (SVM), 92.1% accuracy

SVM on samples, 90.9% accuracy

Classification Results: Accuracy

• PCA and SVM in combination gave the best results after repeated testing

• We achieved on average 92.1% accuracy among 9 labels, with a 2.0% standard deviation

• Our classification methods are effective and repeatable

• We also gained a variety of insights about the nature of the data

Classification Results: Insights

• We saw several labels that were repeatedly misclassified, and accuracy improved when they were removed

• One area of further study is investigating whether these patterns exist across multiple subjects, and why

PCA (SVM), 92.1% accuracy
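Labels that are repeatedly misclassified can be spotted from a confusion matrix. The sketch below uses made-up integer labels and predictions, and the helper names are hypothetical. It shows the bookkeeping only, not the project's actual results.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_labels):
    """Count how often each true label (row) received each prediction (column)."""
    matrix = np.zeros((n_labels, n_labels), dtype=int)
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

def per_label_accuracy(matrix):
    """Fraction of each true label's samples that were predicted correctly."""
    return matrix.diagonal() / matrix.sum(axis=1)

# Toy predictions (made up) for three labels, three test samples each.
true_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
predicted_labels = np.array([0, 0, 0, 1, 2, 1, 2, 1, 1])

matrix = confusion_matrix(true_labels, predicted_labels, n_labels=3)
label_accuracy = per_label_accuracy(matrix)
weakest_label = int(label_accuracy.argmin())
```

Off-diagonal cells show which pairs of labels are confused with each other, which is a natural starting point for asking why.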

Future Exploration

• We intend to move towards classifying across multiple subjects

• This is of utmost importance to clinical applications of fMRI data

• Multisubject comparison presents challenges due to the variation in brain structure

• We intend to build upon previous work on feature detection and scaling maps (Gill2014)

Sources

• Gur, R. E., McGrath, C., Chan, R. M., Schroeder, L., Turner, T., Turetsky, B. I., ... & Gur, R. C. (2002). An fMRI study of facial emotion processing in patients with schizophrenia. American Journal of Psychiatry.

• Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425-2430.

• Gill, G., Bauer, C., & Beichel, R. R. (2014). A method for avoiding overlap of left and right lungs in shape model guided segmentation of lungs in CT volumes. Medical Physics, 41(10), 101908.

• Dataset: This data was obtained from the OpenfMRI database. Its accession number is ds000105. The original authors of Haxby et al. (2001) hold the copyright of this dataset and made it available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

Acknowledgments

• Dr. Gurman Gill – Mentor

• OpenfMRI – Source of all data, and amazing example of open data in science

• pyMVPA – Python toolkit used in preprocessing

• Scikit-learn – Python toolkit used in classification

• Dr. Yaroslav Halchenko – Researcher who provided extensive aid in understanding and dealing with fMRI data

Questions?

Extra Graphics

SVM on top 400 values, 30.9% accuracy

SVM on 90% PCA, 92.2% accuracy
