Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Seeing the Big Picture Segmenting Images to Create Data
15.071x – The Analytics Edge
Image Segmentation
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 1
• Divide up digital images to salient regions/clusters corresponding to individual surfaces, objects, or natural parts of objects
• Clusters should be uniform and homogenous with respect to certain characteristics (color, intensity, texture)
• Goal: Useful and analyzable image representation
Wide Applications
2
• Medical Imaging • Locate tissue classes, organs, pathologies and tumors • Measure tissue/tumor volume
• Object Detection • Detect facial features in photos • Detect pedestrians in footages of surveillance videos
• Recognition tasks • Fingerprint/Iris recognition
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
Various Methods
3
• Clustering methods • Partition image to clusters based on differences in pixel
colors, intensity or texture
• Edge detection • Based on the detection of discontinuity, such as an abrupt
change in the gray level in gray-scale images
• Region-growing methods • Divides image into regions, then sequentially merges
sufficiently similar regions
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
In this Recitation…
4
• Review hierarchical and k-means clustering in R
• Restrict ourselves to gray-scale images • Simple example of a flower image (flower.csv) • Medical imaging application with examples of transverse
MRI images of the brain (healthy.csv and tumor.csv)
• Compare the use, pros and cons of all analytics methods we have seen so far
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
Grayscale Images
• Image is represented as a matrix of pixel intensity values ranging from 0 (black) to 1 (white)
• For 8 bits/pixel (bpp), 256 color levels
0 .2 .5 0 0 0 .2
.2 0 0 .5 .5 .2 0
0 .5 .7 0 0 0 .5
.5 0 1 .7 .7 .5 0
0 .5 .7 0 0 0 .5
.2 0 0 .5 .5 .2 0
0 .2 .5 0 0 0 .2
1 15.071x – Seeing the Big Picture: Segmenting Images to Create Data
7 pixels
7 pixels
Grayscale Image Segmentation
• Cluster pixels according to their intensity values
2
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
0 .2 .5 0 0 0 .2
.2 0 0 .5 .5 .2 0
0 .5 .7 0 0 0 .5
.5 0 1 .7 .7 .5 0
0 .5 .7 0 0 0 .5
.2 0 0 .5 .5 .2 0
0 .2 .5 0 0 0 .2
0
.2
0
.5
0
.2
0
.7
1
.7
.5
0
.2
0 Intensity Vector of size n = 7x7
7 pixels Pairwise distances n(n-1)/2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
Dendrogram Example
1
ABCDEF
A B C F
BCDEF
BC
DEF
DE
D E
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
Level 1
Level 2
Level 3
Level 4
Level 5
Data Observations
Cluster
Distance between clusters BCDEF and BC
Dendrogram Example
2
ABCDEF
A B C F
BCDEF
BC
DEF
DE
D E
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
Level 1
Level 2
Level 3
Level 4
Level 5
A, BC, DE, F 4 clusters
A, BC, DEF 3 clusters
A, BCDEF 2 clusters
CO
AR
SER
Dendrogram Example
3
ABCDEF
A B C F
BCDEF
BC
DEF
DE
D E
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
Level 1
Level 2
Level 3
Level 4
Level 5
Flower Dendrogram
15.071x – Predictive Diagnosis: Discovering Patterns for Disease Detection 4
4 clusters
3 clusters
2 clusters
k-Means Clustering
1
• The k-means clustering aims at partitioning the data into k clusters in which each data point belongs to the cluster whose mean is the nearest
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
4. Re-assign each point to the closest cluster centroid
5. Re-compute cluster centroids
6. Repeat 4 and 5 until no improvement is made
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
k-Means Clustering
2
k-Means Clustering Algorithm
1. Specify desired number of clusters k
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
k = 2
k-Means Clustering
3
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
k = 2
k-Means Clustering
4
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
k = 2
k-Means Clustering
5
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
k = 2
k-Means Clustering
6
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
k = 2 k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
4. Re-assign each point to the closest cluster centroid
k-Means Clustering
7
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
4. Re-assign each point to the closest cluster centroid
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
k = 2
k-Means Clustering
8
6. Repeat 4 and 5 until no improvement is made
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
4. Re-assign each point to the closest cluster centroid
5. Re-compute cluster centroids
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
k = 2
Segmented MRI Images
15.071x – Predictive Diagnosis: Discovering Patterns for Disease Detection 1
T2 Weighted MRI Images
15.071x – Predictive Diagnosis: Discovering Patterns for Disease Detection 2
First Taste of a Fascinating Field
3
• MRI image segmentation is subject of ongoing research
• k-means is a good starting point, but not enough • Advanced clustering techniques such as the modified
fuzzy k-means (MFCM) clustering technique • Packages in R specialized for medical image analysis http://cran.r-project.org/web/views/MedicalImaging.html
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
3D Reconstruction
4
• 3D reconstruction from 2D cross sectional MRI images
• Important in the medical field for diagnosis, surgical planning and biological research
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
Comparison of Methods
1
Used For Pros Cons
Predicting a continuous outcome (salary, price, number of votes, etc.)
• Simple, well recognized
• Works on small and large datasets
• Assumes a linear relationship
Lin
ear
Reg
ress
ion
Predicting a categorical outcome (Yes/No, Sell/Buy, Accept/Reject, etc.)
• Computes probabilities that can be used to assess confidence of the prediction
• Assumes a linear relationship
Log
isti
c R
egre
ssio
n
15.071x – Seeing the Big Picture: Segmenting Images to Create Data
Y = a log(X)+b
Comparison of Methods C
AR
T
Same as CART • Can improve accuracy over CART
• Many parameters to adjust
• Not as easy to explain as CART R
ando
m
For
ests
Used For Pros Cons
Predicting a categorical outcome (quality rating 1--5, Buy/Sell/Hold) or a continuous outcome (salary, price, etc.)
• Can handle datasets without a linear relationship
• Easy to explain and interpret
• May not work well with small datasets
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 2
Comparison of Methods
3
Hie
rarc
hica
l C
lust
erin
g
Same as Hierarchical Clustering
• Works with any dataset size
• Need to select number of clusters before algorithm k-
mea
ns
Clu
ster
ing
Used For Pros Cons
• Finding similar groups
• Clustering into smaller groups and applying predictive methods on groups
• No need to select number of clusters a priori
• Visualize with a dendrogram
• Hard to use with large datasets
15.071x – Seeing the Big Picture: Segmenting Images to Create Data