Dr. Mahout: Analyzing clinical data using scalable and distributed computing
Shannon Quinn
[email protected] | [email protected]
November 10, 2011
1/29
Punchline
Cloud computing for biological and clinical data analysis
Problem: high-dimensional, noisy!
Image credits: heart tissue: biomedcentral; fMRI: wikipedia; segmentation: biodynamics UCSD
tech2date.com
2/29
Disclaimer
3/29
Biology jargon
Academic jargon
My Background
2nd-year Ph.D. student in the CPCB program
Research in bioimage informatics
4/29
My Background
Other
5/29
http://collegefootballbelt.com/Logos/
http://s3.amazonaws.com/data.tumblr.com/
Computational biology and… the cloud?
Biological data
• is BIG
• requires repetitive analysis in chunks
• modeling involves linear algebra and statistics
6/29
Use case 1: protein behavior
[Figure: timescale of relevant motions, with axis ticks at 10^-15, 10^-12, 10^-9, 10^-6, 10^-3, 10^0; labeled motions: bond vibration, side-chain rotation, domain shifts / max. catalysis, protein folding, global conformational shifts]
Detail vs. sampling: a common tradeoff…
7/29
Molecular dynamics
8/29
“The curse of [MD] dimensionality”
MD := for every atom, for every t …
F = ma
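The "for every atom, for every t" loop can be sketched as follows. This is only an illustration in plain Python, not the simulation code behind the talk: the velocity Verlet integrator, the `simulate` function, and the two-atom harmonic `spring_forces` force field are all assumptions chosen to keep the example small.

```python
def simulate(positions, velocities, masses, force_fn, dt, n_steps):
    """Advance every atom at every time step using F = ma (velocity Verlet)."""
    x = list(positions)
    v = list(velocities)
    forces = force_fn(x)
    trajectory = [list(x)]
    for _ in range(n_steps):                 # for every t ...
        for i in range(len(x)):              # ... for every atom
            a = forces[i] / masses[i]        # a = F / m
            x[i] += v[i] * dt + 0.5 * a * dt * dt
        new_forces = force_fn(x)
        for i in range(len(x)):
            a_old = forces[i] / masses[i]
            a_new = new_forces[i] / masses[i]
            v[i] += 0.5 * (a_old + a_new) * dt
        forces = new_forces
        trajectory.append(list(x))
    return trajectory


def spring_forces(x, k=1.0, rest_length=1.0):
    """Hypothetical 1-D force field: two atoms joined by a harmonic bond."""
    stretch = (x[1] - x[0]) - rest_length
    return [k * stretch, -k * stretch]


# Two atoms, bond initially stretched by 0.5; one coordinate per atom.
trajectory = simulate([0.0, 1.5], [0.0, 0.0], [1.0, 1.0],
                      spring_forces, dt=0.01, n_steps=1000)
```

Even in this toy setting the output grows as (number of atoms) x (number of time steps), which is the dimensionality problem the slide points at.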
9/29
http://icanhascheezburger.files.wordpress.com/
http://www.pdb.org/pdb/explore/explore.do?structureId=3fxi
Pipeline for MD trajectory analysis
Find a “surface” of protein shapes
1. MD output
2. Define surface (graph!)
3. Partition surface
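Steps 2 and 3 can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the Gaussian similarity, the `sigma` parameter, the toy 2-D "frames", and the sign-based two-way split via power iteration. It is not Mahout's implementation, which runs as distributed jobs and clusters the eigenvectors with k-means.

```python
import math
import random


def affinity_matrix(frames, sigma=1.0):
    """Step 2: define the graph. Edge weight = Gaussian similarity of two frames."""
    n = len(frames)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(frames[i], frames[j]))
            w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def spectral_bipartition(w, n_iter=200, seed=0):
    """Step 3: split the graph using the second eigenvector of D^-1/2 W D^-1/2."""
    n = len(w)
    degree = [sum(row) for row in w]
    m = [[w[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector is known in closed form: sqrt(degree), normalised.
    norm = math.sqrt(sum(degree))
    top = [math.sqrt(d) / norm for d in degree]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        # Deflate: remove the component along the leading eigenvector.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        # Power iteration step, then renormalise.
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    # The sign of the second eigenvector assigns each frame to one of two groups.
    return [0 if a < 0 else 1 for a in v]


# Step 1 stand-in: six toy "conformations" forming two obvious groups.
frames = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = spectral_bipartition(affinity_matrix(frames))
```

Which group receives label 0 and which receives label 1 is arbitrary; only the grouping itself is meaningful.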
10/29
http://www.dillgroup.ucsf.edu/
Mahout implementation
Defining surface/graph:
MatrixMultiplicationJob (matrixmult)
TransposeJob (transpose)
DistributedLanczosSolver (svd)
StochasticSVD (ssvd)
Partitioning surface/graph:
SpectralKMeans (spectralkmeans)
Eigencuts (eigencuts)
Kmeans (kmeans)
. . .
11/29
MD in Mahout conclusion
MD simulations (x@Home projects)
Existing Mahout functionality
Additional algorithms
http://folding.stanford.edu/
12/29
Use case 2: diseases affecting cilia
What are cilia?
• Hairlike structures
• Keep things moving
• Diseased cilia = [image]
13/29
http://fc06.deviantart.net/fs71/f/2010/177/d/5/Sad_Panda_by_jinxii24.jpg
Importance of correct diagnoses
Symptoms look familiar
Consequences do not
14/29
Beat pattern of cilia tells a lot!
Clinicians look at cilia motion in making their diagnoses
1. What is the motion called?
2. Can we create a database of motions?
15/29
Clinicians’ ultimate goal
Category 1    Category 2    Category 3
?    ?    ?
16/29
Cilia as dynamic textures
Computer vision
Saisan et al. 2001
Properties
17/29
The [proposed] pipeline
Step 1
• Clinician captures video and uploads it
http://googolplex.dyndns.org/cilia/
18/29
The [proposed] pipeline
Step 2
• Mahout job: autoregressive modeling
Appearance model: $y_t \sim C x_t$
Dynamic model: $x_t \sim A_1 x_{t-1} + \dots$
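As a rough sketch of the dynamic model, the transition matrix of a first-order version, $x_t \sim A x_{t-1}$, can be estimated by least squares from a sequence of hidden states. The helper names (`fit_transition`, `inverse_2x2`), the 2-D state, and the noiseless toy data are assumptions for illustration. In the dynamic-texture literature the states are commonly obtained from an SVD of the stacked video frames; here they are simply taken as given.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def fit_transition(states):
    """Least-squares estimate of A in x_t ~ A x_{t-1} for 2-D states."""
    cross = [[0.0, 0.0], [0.0, 0.0]]   # sum over t of x_t x_{t-1}^T
    gram = [[0.0, 0.0], [0.0, 0.0]]    # sum over t of x_{t-1} x_{t-1}^T
    for prev, cur in zip(states, states[1:]):
        for i in range(2):
            for j in range(2):
                cross[i][j] += cur[i] * prev[j]
                gram[i][j] += prev[i] * prev[j]
    return matmul(cross, inverse_2x2(gram))


# Toy data: states generated by a known (hypothetical) decaying rotation.
true_a = [[0.9, -0.2], [0.2, 0.9]]
states = [[1.0, 0.0]]
for _ in range(49):
    prev = states[-1]
    states.append([sum(true_a[i][k] * prev[k] for k in range(2))
                   for i in range(2)])

estimated_a = fit_transition(states)
```

The estimate only needs matrix products, a transpose-style outer product, and a small inverse, which is why the existing linear algebra jobs cover most of the work.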
http://web.media.mit.edu/~tristan/phd/dissertation/figures/manifold2.jpg
19/29
The [proposed] pipeline
Step 3
• Add the transition matrices to cloud library
A = [matrix shown as an image on the slide]
20/29
The [proposed] pipeline
Step 4
• Recompute network with added videos
[Figure: plot with axes “Axis 1” and “Axis 2” and a “?” marker]
21/29
One more thing…
What’s really cool about AR models:
• Can you spot the fake?
[Figure: two videos side by side, labeled “Synthetic” and “Original”]
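A synthetic sequence comes from running a fitted model forward: step the hidden state with the transition matrix, then map each state to pixels with the appearance matrix. The sketch below assumes a first-order model; the `synthesize` function, the optional Gaussian noise term, and the tiny 3-pixel example are illustrative choices, not the talk's code.

```python
import random


def synthesize(a, c, x0, n_frames, noise_scale=0.0, seed=0):
    """Generate frames y_t = C x_t while stepping x_t = A x_{t-1} + noise."""
    rng = random.Random(seed)
    x = list(x0)
    frames = []
    for _ in range(n_frames):
        # Appearance model: project the hidden state to pixel intensities.
        frames.append([sum(c_row[k] * x[k] for k in range(len(x)))
                       for c_row in c])
        # Dynamic model: advance the hidden state by one step.
        x = [sum(a[i][k] * x[k] for k in range(len(x)))
             + noise_scale * rng.gauss(0.0, 1.0)
             for i in range(len(x))]
    return frames


# Hypothetical model: a 90-degree rotation driving three "pixels".
a = [[0.0, -1.0], [1.0, 0.0]]
c = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = synthesize(a, c, x0=[1.0, 0.0], n_frames=5)
```

With the noise switched off, this rotation repeats every four steps, so the fifth frame matches the first.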
22/29
Mahout implementation
Learning autoregressive models:
MatrixMultiplicationJob (matrixmult)
TransposeJob (transpose)
DistributedLanczosSolver (svd)
StochasticSVD (ssvd)
Comparing autoregressive parameters:
SpectralKMeans (spectralkmeans)
Eigencuts (eigencuts)
Frobenius norm
Tensors
? ? ?
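One of the comparison options listed above, the Frobenius norm, can be sketched directly: treat each video's transition matrix as a point and look up the nearest library entry. The library contents, the video names, and the query matrix are made-up values for illustration only.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of (a - b) for two equally sized nested-list matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


# Hypothetical library of transition matrices keyed by video name.
library = {
    "video_a": [[0.9, -0.2], [0.2, 0.9]],
    "video_b": [[0.5, 0.0], [0.0, 0.5]],
}
query = [[0.88, -0.21], [0.19, 0.9]]

# Nearest library entry to the newly uploaded video's transition matrix.
nearest = min(library, key=lambda name: frobenius_distance(library[name], query))
```

A plain element-wise distance like this ignores that different state bases can describe the same motion, which may be one reason the slide leaves the choice of comparison open.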
23/29
Cilia on Mahout conclusions
Autoregressive modeling uses linear algebra that is already implemented
Maintaining AR library requires new functionality
Mahout framework gives us elbow room
24/29
Final Thoughts
Biological / biomedical data is large, high-dimensional, and noisy
We extend Mahout’s current linear algebra framework (spectral clustering, autoregressive models)
We provide a cloud framework!
25/29
Research Group
University of Pittsburgh
• Dr. Chakra Chennubhotla Lab (advisor)
CMU@Qatar
• Dr. Majd Sakr Lab (collaborator)
University of Pittsburgh Medical Center
• Dr. Cecilia Lo Lab (collaborator)
26/29
Sources
Resources
• Apache Mahout
• Spectrally Clustered
Links
• Categorizing ciliary motion defects (BSEC 2011)
• Eigencuts spectral clustering algorithm
Technical report (coming soon!)
27/29
Contact
Shannon Quinn
• [email protected] | [email protected]
• http://www.magsolweb.net/
28/29
Thank you!
29/29
http://icanhascheezburger.files.wordpress.com/