Upload
fleta
View
53
Download
0
Embed Size (px)
DESCRIPTION
MURI Update 7/26/12. UC Berkeley Prof. Trevor Darrell (UCB) Prof. Michael Jordan (UCB) Prof. Brian Kulis ( fmr UCB postdoc, now Asst. Prof. OSU) Dr. Stefanie Jegelka (UCB postdoc ). Recent Effort. NPB inference of visual structures - Objects - Trajectories - PowerPoint PPT Presentation
Citation preview
MURI Update 7/26/12UC Berkeley
Prof. Trevor Darrell (UCB)Prof. Michael Jordan (UCB)
Prof. Brian Kulis (fmr UCB postdoc, now Asst. Prof. OSU)
Dr. Stefanie Jegelka (UCB postdoc)
Recent Effort
• NPB inference of visual structures- Objects- Trajectories
• Develop richer representations (appearance, shape)
• Efficient implementation in constrained environments
• Distributed and decentralized variants
NPB Trajectory Models• Extend NPB models to
trajectory domains:– find structure in motion trajectories– identify anomalies– Considering Bluegrass data and
possibly ARGUS track data via LLNL collaboration
• HDP and hard clustering variants
• Decentralized / distributed implementation
topics topics
New Algorithms for Hard Clustering via Bayesian Nonparametrics
Mixture ofGaussians
??k-means
Fix Covariances,
Take Limit
DP Mixture
Make Bayesian, Take Limit
Generalized to Hard HDP as well; [Kulis and Jordan ICML2012]
• Kulis Presentation
Small Variance Asymptotics, Bayesian Nonparametrics,
and k-meansBrian Kulis
Ohio State University
Joint work with Michael Jordan (Berkeley), Ke Jiang (OSU), and Tamara Broderick
(Berkeley)
Generalizing k-means
Mixture ofGaussians
????k-means
Fix Covariances,
Take Limit
DP Mixture
Make Bayesian, Take Limit
[Kulis and Jordan, ICML 2012]
Why Should We Care?•For “k-means people”•Bayesian techniques permit
great flexibility•Extensions to multiple data sets
would not be obvious without this connection
•For Bayesians•Hard clustering methods scale
better•Connections to graph cuts and
spectral methods•k-means just works
Gaussian Mixture Models
•No closed-form for optimizing likelihood
• Typically resort to EM algorithm
EM AlgorithmE-Step
M-Step
k-means
Mixture ofGaussians
k-means
Fix Covariances,
Take Limit
In the limit as sigma goes to 0, this value is 1 if centroid c is the
closest, and 0 otherwise
Linear-Gaussian Latent Feature Models
For mixture of Gaussians:
Latent Variables:
Data:
Likelihood:
Log-Likelihood Asymptotics
Other Linear-Gaussian Models
(Essentially) probabilistic PCA
Let Z have a continuous distribution
As sigma goes to 0, get standard PCA
The Polya Urn
•Imagine an urn with theta balls for each of k colors
•Pick a ball from the urn, replace the ball and add another ball of that color to the urn
•Repeat n times•Induces a distribution over Z
The Dirichlet-Multinomial Distribution
(If theta an integer, gamma functions are factorials.)
Small-Variance Asymptotics
Nothing interesting happens unless:
In this case, we obtain:
The Chinese Restaurant Process
•Customers sequentially enter the restaurant
•First customer sits at first table•Subsequent customers sit at
occupied table with probability proportional to number of occupants
•Start a new table with probability propotional to theta
...1 2
3 45 6
The Chinese Restaurant Process
The exchangeable partition probability
function (EPPF):
Select theta as before, yields
asymptotically:
Algorithmic Perspective: Gibbs Sampling
•Suppose we want to sample from p(x), where
•Repeatedly sample as follows
Collapsed Gibbs Sampling for CRP/DP
Mixtures
•Want to sample from the posterior:
•Need the following to do Gibbs sampling:
Asymptotics of the Gibbs Sampler
•Now, would like to see what happens when sigma goes to 0
•Need 1 additional thing• must be a function of
sigma:????
DP Mixture
Asymptotics of the Gibbs Sampler
Existing Clusters Start a New Cluster
In the limit:
DP-means
Underlying Objective Function
Theorem: The DP-means algorithm monotonically minimizes this objective
until local convergence.
3 Gaussians
Clustering with Multiple Data Sets
Want to cluster each data set, but also want to share cluster structure
Use the Hierarchical Dirichlet Process (HDP)!
The Hard Gaussian HDP: Objective
Theorem: The Hard HDP algorithm monotonically minimizes this objective
until local convergence.
Exponential Families
•What happens when we replace the Gaussian likelihood with an arbitrary exponential family distribution?
•Do asymptotics where the mean is fixed but the covariance goes to 0:
Exponential Families•Use the standard EF conjugate prior
•Utilize Bregman divergences:
•Asymptotics of Gibbs require Laplace’s method on the marginal likelihood
•End up with same result as before, with Bregman divergence replacing sq. Euclidean
Illustration: Hard Topic Models
Overlapping Clusters / Binary Feature Models
So far, have considered single
assignment models
What if each point can be assigned to multiple
clusters?
Overlapping ClustersSingle Assignment Multi Assignment
Non-Bayesian
BayesianBayesian
Nonparametric
Can perform analogous small-variance asymptotics using the above multi-assignment distributions
Indian Buffet Process•First customer samples
dishes•Subsequent customer i samples
existing dishes with probability equal to fraction of previous customers sampling that dish
•Also samples new dishes
Indian Buffet Process Small-Variance Asymptotics
Number of points possessing feature c
Number of new dishes sampled by
customer i
Indian Buffet Process Small-Variance Asymptotics
(Reduces to DP-means obj. for
single assignments)
Spectral Clustering Connections
•Standard spectral relaxation•Write as
•Relax Y to be an arbitrary orthogonal matrix
•Take matrix of top k eigenvectors of K
Spectral Relaxation for DP-means
•How do we extend this?•Write as
•Relax Y to be an arbitrary orthogonal matrix
•Take matrix of eigenvectors of K whose eigenvalues are greater than lambda
Graph Clustering•Can further take advantage of
connections between k-means and graph clustering
•Generalize hard DP objective to weighted and kernel forms•Special cases lead to graph
clustering objectives that penalize the number of clusters in the graph
Conclusions and Open Questions•Focus was on obtaining non-
probabilistic models from probabilistic ones
•Small-variance asymptotics for Bayesian nonparametrics yields models which regularize / penalize•Number of clusters•Number of features•Number of topics
•Also yields algorithmic insights
Conclusions and Open Questions•Spectral or semidefinite relaxation
for the hard HDP?•Better algorithms?•Local Search•Multilevel Methods•Split/Merge•Incorporate ideas from other
inference schemes•Applications?•Other models?
Thanks!
2) Decentralized models• Distributed Hard Topic Models• Jegelka presentation
Decentralized k-means algorithms
• Data distributed• Clusters shared globally
– Cluster assignment– Update centers
CommunicationWhere store centers?
Decentralized k-means
• Local copies, gossip [Datta et al,…]– Local assignment & update– Sharing & averaging with neighbor(s)
Decentralized k-means
• Local copies, gossip [Datta et al,…]• Single copy of each mean
– Summaries & pruning for restricted comparisons [Papatreou et al.]
New questions
• Decentralized cluster creation?– Sequential vs. parallel
• Naturally hierarchical• Exploit structure to reduce communication
– Partial sharing, e.g. locality– Integration into model?
• Generalization to IBP, PYP, …
Decentralized clustering• Observe trajectories in
a scene• Cluster locations by
local traffic behavior: HDP
anomalies, traffic prediction, …
• Clusters are not omnipresent:partial sharing
3) Trajectory Data• Scalable Topic Models for Trajectories• Apply current results on scalable topic modeling to
problems in vision– find structure in motion trajectories– identify anomalies– Considering Bluegrass data and possibly ARGUS track
data via LLNL collaboration
Trajectory Video
50
Experimental Results
Road NetworkTrack Fragment Vocabulary (Colored by Speed)
51
Experimental Results
Road networkQuery trajectory
Query trajectory with vocabulary
below
Highlights the vocabulary assigned to the query trajectory
52
Querying Results
Note this track “word”
Northbound Trajectory stops at intersection, turns left, continues East
53
Querying Results
Eastbound Trajectory
54
Querying Results
Eastbound Trajectory, Stops at Traffic Light
55
Querying Results
Westbound Trajectory, Stops at Traffic Light
Right-Hand Turn
K-Nearest NeighborsTrajectories on Map
QUERY
Trajectory segments are 10 seconds in length, sampled every 2.5 seconds (75% overlap).
1
2
3
4
5
10
20
U-Turn
K-Nearest NeighborsTrajectories on Map
QUERY
Trajectory segments are 10 seconds in length, sampled every 2.5 seconds (75% overlap).
1
2
3
4
5
10
20
Coda: Timely Detection
[Karayev and Darrell, in review]