The University of Ontario
CS 4487/9587
Algorithms for Image Analysis
Basic Image Segmentation
Segmentation examples
• unsupervised: background subtraction, recognition
• supervised: Photoshop, medical image analysis
Segmentation features and "naïve" methods
• intensities, colors ← thresholding, likelihood ratio test
• contrast edges ← region growing, watersheds
Clustering techniques and segmentation
• parametric methods: K-means, GMM
• non-parametric: mean-shift
• RGB and RGB+XY spaces
Other readings: Sonka et al., Ch. 5; Gonzalez and Woods, Ch. 10; Szeliski, Sec. 5.2 and 5.3
Segmentation
Goal: find coherent "blobs" or specific "objects"
• lower-level tasks (e.g. "superpixels")
• higher-level tasks (e.g. cars, humans, or organs)
• a large grey area in-between
Accurate boundary delineation is often required.
Coherent "blobs"
The simplest way to define blob coherence is as similarity in brightness or color:
the tools become blobs; the house, grass, and sky make different blobs.
Why is this useful?
AIBO RoboSoccer (Veloso Lab)
Ideal Segmentation
can recognize objects with known simple color models
Result of a segmentation method, even with known simple color models
(first learn how to get this, then how to get better results)
Basic ideas
basic features ← basic ("naïve") methods
• intensities, colors ← thresholding, likelihood ratio test
• contrast edges ← region growing, watersheds
Basic ideas
intensities, colors ← thresholding, likelihood ratio test
Thresholding (segmentation ← intensities/colors)
Basic segmentation operation:
mask(x,y) = 1 if im(x,y) > T
mask(x,y) = 0 if im(x,y) ≤ T
T is the threshold
• user-defined
• or automatic
Same as histogram partitioning:
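A minimal NumPy sketch of this operation (a sketch, not the course's reference code): `im` is an assumed grayscale image array, and Otsu's method from scikit-image stands in as one standard "automatic" choice of T.

```python
import numpy as np
from skimage.filters import threshold_otsu  # one standard automatic choice

def threshold_mask(im, T=None):
    """Binary segmentation by a single global threshold T."""
    if T is None:                      # "automatic": Otsu picks T that maximizes
        T = threshold_otsu(im)         # the between-class variance of the histogram
    return (im > T).astype(np.uint8)   # 1 = object, 0 = background
```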
Sometimes works well… Virtual colonoscopy, bronchoscopy, etc.
From a real device to a non-invasive virtual test:
a) threshold the CT volume → binary mask
b) extract a surface mesh from the binary mask (marching cubes method)
Sometimes works well…
Thresholding can be derived as a statistical decision: the likelihood ratio test

$$r_p := \log \frac{P_1(I_p)}{P_2(I_p)}$$

• $r_p \geq 0$ → pixel p is object
• $r_p < 0$ → pixel p is background

where $P_1$ and $P_2$ are known color models for object and background.
Example: assume known Gaussian distributions $P_1 = \mathcal{N}(\mu_1, \sigma_1)$ and $P_2 = \mathcal{N}(\mu_2, \sigma_2)$. With equal variances $\sigma_1 = \sigma_2$, the test $r_p \geq 0$ reduces to comparing $I_p$ with the threshold $T = \frac{\mu_1 + \mu_2}{2}$ between the two means.
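A hedged sketch of this decision rule for known Gaussian models; the means and standard deviations are assumed inputs, as on the slide.

```python
import numpy as np
from scipy.stats import norm

def likelihood_ratio_mask(im, mu1, sigma1, mu2, sigma2):
    """Label pixel p as object iff r_p = log P1(I_p) - log P2(I_p) >= 0."""
    r = norm.logpdf(im, mu1, sigma1) - norm.logpdf(im, mu2, sigma2)
    return (r >= 0).astype(np.uint8)
    # with sigma1 == sigma2 this reproduces plain thresholding at T = (mu1+mu2)/2
```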
Sometimes works well… background subtraction
$\Delta I = I_{obj} - I_{bkg}$
Model the difference image: $P_1 = \mathcal{N}(0, \sigma)$ (background pixels differ only by noise) and $P_2 = U[0, 255]$ (object pixels may take any value); the likelihood ratio test then gives a threshold T on $\Delta I$.
Sometimes works well…? background subtraction
Thresholding differences below T runs into problems when the color models have overlapping support: near the threshold, object and background pixels cannot be reliably distinguished.
…but more often not
Adaptive thresholding
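One common form of adaptive thresholding compares each pixel to a local mean rather than a single global T; a minimal sketch, where the window size `w` and offset `c` are illustrative parameters, not values from the slides.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_threshold(im, w=15, c=5):
    """Foreground where a pixel exceeds its local w-by-w mean by at least c."""
    local_mean = uniform_filter(im.astype(float), size=w)  # per-pixel local T
    return (im > local_mean + c).astype(np.uint8)
```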
Basic ideas
contrast edges ← region growing, watersheds
Region growing (segmentation ← contrast edges)
The method stops at contrast edges:
• Start with an initial set of pixels K (initial seed(s))
• Add to K the neighbors q of pixels p ∈ K if |I_p − I_q| < T
• Repeat until nothing changes
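A minimal breadth-first-search sketch of this procedure, assuming 4-connected neighbors, a list of (row, col) seed tuples, and a contrast threshold T.

```python
import numpy as np
from collections import deque

def region_grow(im, seeds, T):
    """Grow a region from seeds, adding neighbor q of p whenever |I_p - I_q| < T."""
    H, W = im.shape
    mask = np.zeros((H, W), dtype=bool)
    queue = deque(seeds)
    for s in seeds:
        mask[s] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connectivity
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and not mask[ny, nx] \
                    and abs(float(im[y, x]) - float(im[ny, nx])) < T:
                mask[ny, nx] = True          # growth stops at contrast edges,
                queue.append((ny, nx))       # where |I_p - I_q| >= T
    return mask
```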
What can go wrong with region growing?
Region growth may "leak" through a single weak spot in the boundary.
Region growing
On the difference image $\Delta I = I_{obj} - I_{bkg}$: breadth-first search from the seeds, adding pixels p with $|\Delta I_p| < T$.
Region growing
The region leaks into the sky due to a weak boundary between them.
Watersheds
1. compute gradient magnitudes of the image
2. find catchment basins
3. copy the basins over the original image
Need tricks to build dams closing gaps in boundaries.
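A hedged usage sketch of this pipeline with scikit-image; the marker generation here is illustrative (real pipelines need the "dam-building" tricks mentioned above, e.g. markers from h-minima or user seeds), and overlaying the labels on the original image is left to the caller.

```python
import numpy as np
from skimage.filters import sobel
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def watershed_segment(im, min_distance=20):
    gradient = sobel(im)                          # 1. gradient magnitudes
    # illustrative markers: well-separated minima of the gradient surface
    coords = peak_local_max(-gradient, min_distance=min_distance)
    markers = np.zeros(im.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    return watershed(gradient, markers)           # 2. flood catchment basins
```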
Motivating example (due to Pedro Felzenszwalb and Daniel Huttenlocher)
This image has three perceptually distinct regions A, B, C: the difference along the border between A and B is less than the differences within C.
Q: Where would image thresholding fail?
A: Region A would be divided in two, and region C would be split into a large number of arbitrarily small subsets.
Q: Where would region growing fail?
A: Either A and B are merged, or region C is split into many small subsets; also, B and C are merged.
Towards better segmentation
Combine color and boundary edge information. How?
• Formulate a segmentation quality function (objective or energy): E(S) = C(S) + B(S) + …
• Optimize: find the best solution S (soon)
First, more complex decisions in color space
How to move away from manual decision boundaries (i.e. a user-set threshold) in color space? Some standard solutions:
1. use known probability appearance models, if available; this leads to likelihood ratio tests, as shown earlier:
$$\log \frac{P_1(I_p)}{P_0(I_p)} \gtrless 0$$
2. automatic clustering methods, which yield complex decision boundaries:
   a. K-means, GMM (parametric)
   b. mean shift, medoid-shift (non-parametric)
   c. kernel K-means, normalized cuts (non-parametric)
Decision boundaries in feature spaces
Color quantization and superpixels should break colors (in RGB or LUV space) into multiple clusters.
General Grouping or Clustering (a.k.a. unsupervised learning)
• Have data points (samples, also called feature vectors, examples, etc.) x1, …, xn
• Cluster similar points into groups
• points are not pre-labeled
• think of clustering as 'discovering' labels
(example: clustering movies into horror, documentaries, sci-fi)
slides from Olga Veksler
How does this Relate to Image Segmentation?
• Represent image pixels as feature vectors x1, …, xn
• For example, each pixel can be represented as
  • intensity: one-dimensional feature vectors
  • color: three-dimensional feature vectors
  • color + coordinates: five-dimensional feature vectors
• Cluster them into k clusters, i.e. k segments
input image → feature vectors for clustering based on color:
[9 4 2] [7 3 1] [8 6 8]
[8 2 4] [5 8 5] [3 7 2]
[9 4 5] [2 9 3] [1 4 4]
RGB (or LUV) space clustering
How does this Relate to Image Segmentation?
input image → feature vectors for clustering based on color and image coordinates:
[9 4 2 0 0] [7 3 1 0 1] [8 6 8 0 2]
[8 2 4 1 0] [5 8 5 1 1] [3 7 2 1 2]
[9 4 5 2 0] [2 9 3 2 1] [1 4 4 2 2]
RGBXY (or LUVXY) space clustering
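A minimal sketch of building these feature vectors from an H×W×3 image; the relative weight `w` on the XY part (used later in HW 1) is an assumed parameter.

```python
import numpy as np

def pixel_features(im, use_xy=False, w=1.0):
    """Stack each pixel into a feature vector: [R,G,B] or [R,G,B,wX,wY]."""
    H, W, _ = im.shape
    rgb = im.reshape(-1, 3).astype(float)
    if not use_xy:
        return rgb                      # 3D features -> color quantization
    ys, xs = np.mgrid[0:H, 0:W]         # pixel coordinates
    xy = w * np.stack([xs.ravel(), ys.ravel()], axis=1)
    return np.hstack([rgb, xy])         # 5D features -> superpixels
```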
K-means Clustering: Objective Function
• Probably the most popular clustering algorithm
• assumes the number of clusters k is given
• optimizes (approximately) the following objective function over variables $D_i$ and $\mu_i$:
$$E_k = SSE = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2$$
the sum of squared errors from the cluster centers $\mu_i$
K-means Clustering: Objective Function
Good (tight) clustering → smaller value of SSE; bad (loose) clustering → larger value of SSE.
K-means Clustering: Algorithm
• Initialization step
1. pick k cluster centers randomly
2. assign each sample to the closest center
• Iteration steps
1. compute the mean in each cluster: $\mu_i = \frac{1}{|D_i|} \sum_{x \in D_i} x$
2. re-assign each sample to the closest mean
• Iterate until clusters stop changing
• This procedure decreases the value of the objective function
$$E(D, \mu) = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2, \qquad D = (D_1, \ldots, D_k), \quad \mu = (\mu_1, \ldots, \mu_k)$$
over the optimization variables D and μ. This is block-coordinate descent: step 1 optimizes μ, step 2 optimizes D.
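A compact NumPy sketch of this block-coordinate descent (Lloyd's algorithm), with random-sample initialization; `X` is the n×d feature matrix, e.g. from `pixel_features` above.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate the two steps until the cluster assignments stop changing."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # k random centers
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # re-assign each sample to the closest mean (optimizes D for fixed mu)
        dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # clusters stopped changing
        labels = new_labels
        # compute the mean in each cluster (optimizes mu for fixed D)
        for i in range(k):
            if np.any(labels == i):
                mu[i] = X[labels == i].mean(axis=0)
    return labels, mu
```

Each of the two steps can only decrease $E(D, \mu)$, so the loop terminates, though possibly at a local minimum.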
K-means: Approximate Optimization
• K-means is fast and often works well in practice
• But can get stuck in a local minimum of the objective $E_k$
• not surprising, since the problem is NP-hard
(figure: one initialization converges to the global minimum, another to a local minimum)
K-means clustering examples: Segmentation
In this case K-means (k = 2) automatically finds a good threshold T between the two clusters (means μ1, μ2). K-means finds compact clusters.
K-means clustering examples: Segmentation?
Results for k = 3, k = 5, and k = 10 (random colors are used to better show segments/clusters).
K-means clustering examples: Color Quantization
NOTE: bias to equal-size clusters.
K-means clustering examples: Adding XY features
• RGB features → color quantization
• RGBXY features → superpixels
• XY features only → Voronoi cells
(related to HW 1)
K-means clustering examples: Superpixels
Apply K-means to RGBXY features [SLIC superpixels, Achanta et al., PAMI 2011].
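For comparison, a hedged usage sketch of SLIC via scikit-image; the parameter values are illustrative, and `compactness` plays the role of the relative XY weight.

```python
from skimage.data import astronaut           # sample image shipped with scikit-image
from skimage.segmentation import slic

img = astronaut()
# n_segments ~ desired number of superpixels;
# compactness weighs XY proximity against color similarity (the "w" of RGBXY)
segments = slic(img, n_segments=200, compactness=10.0)
```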
K-means clustering examples: Texture Analysis (slide from Naotoshi Seo)
Represent each pixel p by its responses to a bank F of N filters (convolutions), e.g. Gabor filters: $f_p = (f_p^1, \ldots, f_p^N)$. Then run K-means for points in $\mathbb{R}^N$ instead of $\mathbb{R}^3$.
If the filters F are orthonormal, $f_p$ gives the coordinates of the patch $P_p$ at p w.r.t. the basis vectors: $P_p \approx \sum_i f_p^i F_i$.
Q: what is $\mu_i$? A texton $\mu_i = (\mu_i^1, \ldots, \mu_i^N)$: a weighted combination of filters [Zhu et al. ECCV'02].
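A hedged sketch of building such filter-response features with a small Gabor bank; the frequencies and orientations are illustrative choices, not values from the slides.

```python
import numpy as np
from skimage.filters import gabor

def texture_features(im, frequencies=(0.1, 0.3), n_orient=4):
    """Per-pixel responses f_p to an N-filter Gabor bank, as points in R^N."""
    responses = []
    for f in frequencies:
        for theta in np.linspace(0, np.pi, n_orient, endpoint=False):
            real, _ = gabor(im, frequency=f, theta=theta)  # convolution with filter
            responses.append(real.ravel())
    return np.stack(responses, axis=1)     # shape (num_pixels, N)
```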
K-means Properties
• Works best when clusters are spherical (blob-like)
• Fails for elongated clusters: SSE is not an appropriate objective function in this case
• Sensitive to outliers
K-means as parametric clustering
Maximum likelihood (ML) fitting of the parameters $\mu_i$ (means) of Gaussian distributions:
$$E_k = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2 \;\sim\; \tilde{E} = -\sum_{i=1}^{k} \sum_{x \in D_i} \log P(x \mid \mu_i) + const$$
for the Gaussian distribution $P(x \mid \mu_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{\|x - \mu_i\|^2}{2\sigma^2}\right)$; the two are equivalent (easy to check).
K-means as non-parametric clustering
$$E_k = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2 \;=\; \sum_{i=1}^{k} \frac{1}{2|D_i|} \sum_{x, y \in D_i} \|x - y\|^2$$
equivalent (easy to check): just plug in the expression $\mu_i = \frac{1}{|D_i|} \sum_{y \in D_i} y$ and use the sample variance
$$\mathrm{var}(D_i) = \frac{1}{|D_i|} \sum_{x \in D_i} \|x - \mu_i\|^2 = \frac{1}{2|D_i|^2} \sum_{x, y \in D_i} \|x - y\|^2$$
K-means as a variance clustering criterion
Using the same sample variance identity, both formulas above can be written as
$$E_k = \sum_{i=1}^{k} |D_i| \cdot \mathrm{var}(D_i)$$
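The "easy to check" equivalence can also be verified numerically; a minimal sketch on random stand-in data.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(50, 3))                     # one cluster of 50 points in R^3
mu = D.mean(axis=0)

lhs = ((D - mu) ** 2).sum()                      # sum of squared errors to the mean
pair = ((D[:, None, :] - D[None, :, :]) ** 2).sum()
rhs = pair / (2 * len(D))                        # pairwise form: (1/2|D|) sum ||x-y||^2
assert np.isclose(lhs, rhs)
```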
K-means Summary
• Advantages
  • Principled (objective function) approach to clustering
  • Simple to implement (the approximate iterative optimization)
  • Fast
• Disadvantages
  • Only a local minimum is found (sensitive to initialization)
  • May fail for non-blob-like clusters (quadratic errors mean K-means fits Gaussian models)
  • Sensitive to outliers
  • Sensitive to the choice of k
Can add a sparsity term and make k an additional variable:
$$E = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2 + \lambda \cdot k$$
as in the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC).
K-means – common extensions
• Parametric methods with arbitrary likelihoods $P(\cdot \mid \theta)$ (probabilistic K-means):
$$E = -\sum_{i=1}^{k} \sum_{x \in D_i} \log P(x \mid \theta_i)$$
Examples of $P(\cdot \mid \theta)$: Gaussian, gamma, exponential, Gibbs, etc.
• Parametric methods with an arbitrary distortion measure $\|\cdot\|_d$ (distortion clustering):
$$E = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \theta_i\|_d$$
Examples of $\|\cdot\|_d$: quadratic (K-means), absolute (K-medians), truncated (K-modes)
• Non-parametric methods: pairwise clustering with an arbitrary distortion measure (kernel K-means, normalized cuts, average association, average distortion):
$$E = \sum_{i=1}^{k} \frac{1}{|D_i|} \sum_{x, y \in D_i} \|x - y\|_d$$
K-means – common extensions (optional material)
• Soft clustering using a Gaussian Mixture Model (GMM)
GMM distribution: $P_{GMM}(x \mid \theta) = \sum_{i=1}^{k} \pi_i \, N(x \mid \mu_i, \sigma_i)$
  - no "hard" assignments of points to distinct (Gaussian) clusters $D_i$
  - all points are used to estimate the parameters of one complex multi-modal distribution (GMM)
Parameters $\theta_{GMM} = (\pi_i, \mu_i, \sigma_i \mid i = 1, \ldots, k)$: mixing coefficients, means and variances of the Gaussian modes.
Simple 1D example: three Gaussian modes (k = 3) of the mixture $P_{GMM}(x \mid \theta)$.
GMMs estimate "true" data distributions (a continuous-density analog of histograms).
Maximum likelihood (ML) objective: $E(\theta_{GMM}) = -\sum_x \log P_{GMM}(x \mid \theta)$
Approximate optimization via the EM algorithm (see Szeliski, Sec. 5.3.1, or Christopher Bishop, "Pattern Recognition and Machine Learning", Ch. 9). Like K-means, it is sensitive to local minima.
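A hedged usage sketch with scikit-learn's EM implementation; the random matrix is a stand-in for pixel features (e.g. from `pixel_features` above) and the parameter values are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(1000, 3))  # stand-in for pixel features
gmm = GaussianMixture(n_components=3, covariance_type='full', n_init=5)
gmm.fit(X)                     # EM; multiple restarts mitigate local minima
soft = gmm.predict_proba(X)    # soft assignments (responsibilities)
hard = gmm.predict(X)          # hard labels, if discrete segments are needed
```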
(figure: GMM fitted to pixel colors in (r, g, b) space)
Ontario
GMM(elliptic) K-meanscolor indicates locally strongest mode color indicates assigned cluster
Gaussian clusters/modes in: K-means vs. GMM
optionalmaterial
k=6 k=6
hard assignment to clusters- separates data points into multiple
Gaussian blobs
only estimates means μi
soft mode searching- estimates data distribution with
multiple Gaussian modes
estimates both mean μi and (co)variance Σi for each mode- Σi can also be added as a cluster
parameter (elliptic K-means)
Gaussian clusters/modes in: K-means vs. GMM (optional material, k = 4)
Hard clustering may not work well when clusters overlap (generally not a problem in segmentation, since objects do not "overlap" in RGBXY). While the GMM shown is optimal, it is hard to find with the standard EM algorithm due to local minima.
Gaussian clusters/modes in: K-means vs. GMM (optional material)
K-means: computationally cheap (block-coordinate descent); sensitive to local minima; scales to higher dimensions (kernel K-means).
GMM: more expensive (EM algorithm, Szeliski Sec. 5.3.1); sensitive to local minima; does not scale to higher dimensions.
A simple non-parametric alternative: mean-shift clustering
• Formulates clustering as histogram partitioning
• Also looks for modes in data histograms
• Does not assume the number of clusters is known
(figure: data points, the data histogram and its modes, the resulting clustering)
Finding Modes in a Histogram
How many modes are there? Easy to see, hard to compute.
Mean Shift [Fukunaga and Hostetler 1975, Cheng 1995, Comaniciu & Meer 2002]
Iterative mode search:
1. Initialize a random seed and a fixed window
2. Calculate the center of gravity of the window (the "mean")
3. Translate the search window to the mean
4. Repeat from Step 2 until convergence (a mode)
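A minimal sketch of this mode-search loop with a flat (fixed-radius) window; the bandwidth `h`, tolerance, and iteration cap are assumed parameters.

```python
import numpy as np

def mean_shift_mode(X, seed, h, tol=1e-3, max_iter=100):
    """Follow one seed uphill to a density mode of the points X."""
    x = X[seed].astype(float)
    for _ in range(max_iter):
        in_win = np.linalg.norm(X - x, axis=1) < h   # points inside the window
        if not in_win.any():
            return x                     # empty window: stop
        mean = X[in_win].mean(axis=0)    # center of gravity (the "mean")
        if np.linalg.norm(mean - x) < tol:
            return mean                  # converged to a mode
        x = mean                         # translate the window to the mean
    return x
```

Running this from every data point and merging nearby endpoints yields the clusters, without fixing k in advance.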
Mean-shift results for segmentation
RGB+XY clustering [Comaniciu & Meer 2002]: 5D features; adding XY helps "coherence" in the image domain.
Mean-shift works well for color quantization [Comaniciu & Meer 2002].
Issues:
• Window size (kernel bandwidth) selection is critical
  - cannot be too small or too large
  - indirectly controls the number of clusters (k)
  - different widths are needed in the RGB and XY parts of the space
• Color may not be sufficient (e.g. color overlap between object and background)
• Integrating detailed boundary cues:
  - contrast edges
  - explicit shape priors (smoothness, curvature, convexity, atlas)
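In practice a library implementation is often used; a hedged sketch with scikit-learn, where the bandwidth (and hence, indirectly, the number of clusters) is estimated from the data. The random matrix is a stand-in for RGBXY features.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

X = np.random.default_rng(0).normal(size=(500, 5))  # stand-in for RGBXY features
h = estimate_bandwidth(X, quantile=0.2)             # data-driven bandwidth choice
labels = MeanShift(bandwidth=h, bin_seeding=True).fit_predict(X)
print(len(np.unique(labels)))                       # number of clusters found
```

Note that this implementation uses a single isotropic bandwidth; different effective widths in the RGB and XY parts can be emulated by rescaling the XY columns before clustering.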
Some terminology
Clustering / Partitioning / Segmentation
Q: differences between these terms? One answer: not really. But "segmentation" often implies image features (pixels) and the importance of the geometric component (XY).
OntarioHW assignment 1 RGB and RGB+XY clustering
• relative weights w for XY part:• from color quantization to superpixels
Parametric methods (K-means)• fixed K (estimate k for extra credit)• sensitivity to initialization (use seeds to initialize)• weighted errors (different weight in RGB and XY)
Non-parameric methods (mean- or medoid-shift)• extra credit for undergrads, required for grads• sensitivity to bandwidth
],,,,[ , ppppp YwXwBGR