FOCUS: Clustering Crowdsourced Videos by Line-of-Sight
Puneet Jain, Justin Manweiler, Arup Acharya, and Kirk Beaty
Clustered by shared subject
CHALLENGES
CAN IMAGE PROCESSING SOLVE THIS PROBLEM?
[Figure labels: Camera 1, Camera 2, Camera 3, Camera 4]
LOGICAL similarity does not imply VISUAL similarity
VISUAL similarity does not imply LOGICAL similarity
CAN SMARTPHONE SENSING SOLVE THIS PROBLEM?
Sensors are noisy, hard to distinguish subjects…
Why not triangulate?
GPS-COMPASS Line-of-Sight
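A minimal sketch of what sensor-only triangulation could look like, assuming each camera reports a position in a local flat (east, north) frame in meters and a compass bearing in degrees clockwise from north. The function name and coordinate frame are illustrative assumptions, not part of FOCUS. It also shows how a modest compass error moves the estimated subject position.

```python
import math

def intersect_lines_of_sight(p1, bearing1_deg, p2, bearing2_deg):
    """Return the (east, north) point where two lines of sight cross.

    Returns None if the rays are parallel or diverge (no crossing ahead
    of both cameras). Hypothetical helper for illustration only.
    """
    # Compass bearing -> unit direction vector (east, north).
    d1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    d2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product of directions
    if abs(denom) < 1e-9:
        return None  # parallel lines of sight
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom  # distance along ray 1
    t2 = (dx * d1[1] - dy * d1[0]) / denom  # distance along ray 2
    if t1 < 0 or t2 < 0:
        return None  # crossing lies behind one of the cameras
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

if __name__ == "__main__":
    exact = intersect_lines_of_sight((0.0, 0.0), 45.0, (100.0, 0.0), 315.0)
    # Same geometry, but camera 1's compass is off by 10 degrees.
    noisy = intersect_lines_of_sight((0.0, 0.0), 55.0, (100.0, 0.0), 315.0)
    shift = math.hypot(noisy[0] - exact[0], noisy[1] - exact[1])
    print("exact:", exact, "noisy:", noisy, "shift (m):", shift)
```

The shift grows with the distance to the subject, which is one way to see why noisy GPS and compass readings alone struggle to separate nearby subjects.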
INSIGHT
Don’t need to visually identify the actual SUBJECT; can use the background as a PROXY
hard to identify
easy to identify
Simplifying Insight 1
same basic structure persists
Simplifying Insight 2
Don’t need to directly match videos; can compare all to a predefined visual MODEL
Simplifying Insight 3
Line-of-sight (triangulation) is almost enough, just not via sensing (alone)
FOCUS: Fast Optical Clustering of live User Streams
Sensing
Cloud
Vision
Hadoop/HDFS: failover, elasticity
Image processing, computer vision
Video Streams (Android, iOS, etc.)
Clustered Videos
FOCUS Cloud Video Analytics
Video Extraction
Watching Live (home: 2, away: 1)
Users Select & Watch Organized Streams
Change Angle
Change Focus
pre-defined reference “model”
Model construction technique based on Photo Tourism: Exploring Photo Collections in 3D, Snavely et al., SIGGRAPH 2006
• multi-view reconstruction
• keypoint extraction
estimates camera POSE and content in field-of-view
Multi-view Stereo Reconstruction
Visualizing Camera Pose
~1 second at the 90th percentile
~18 seconds at the 90th percentile
• multi-view reconstruction
• keypoint extraction
• frame-by-frame video to model alignment
• sensory inputs
• Given a pre-defined 3D model, align incoming video frames to the model
• Also known as camera pose estimation
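The slides do not state which formulation FOCUS uses for this alignment. As a reference point only, the standard pinhole-camera statement of camera pose estimation is sketched below. All symbols are assumptions introduced here: K is the camera intrinsics matrix, R and t are the rotation and translation (the pose), X_i are 3D model points matched to 2D frame keypoints x_i, and π is perspective division.

```latex
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = K \, [\, R \mid t \,] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
(R^{*}, t^{*}) = \arg\min_{R,\,t} \sum_{i}
  \left\| x_i - \pi\!\left( K (R X_i + t) \right) \right\|^{2}
```

The first equation projects a model point (X, Y, Z) to pixel (u, v) up to a scale λ. The second picks the pose that minimizes reprojection error over the matched keypoints.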
• multi-view reconstruction
• keypoint extraction
• integration of sensory inputs
Gyroscope provides a “diff” from the initial position obtained by vision
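A rough sketch of the gyroscope "diff" idea, assuming vision supplies an initial orientation as three per-axis angles in radians and the gyroscope supplies timestamped angular rates in rad/s. Summing angles per axis is a small-rotation approximation chosen for brevity (a full treatment would compose rotations with quaternions or rotation matrices). The function name and data layout are hypothetical.

```python
def integrate_gyro(initial_angles, samples):
    """Propagate a vision-derived orientation using gyroscope readings.

    initial_angles: (x, y, z) rotation angles in radians from vision.
    samples: list of (t_seconds, (wx, wy, wz)) angular rates in rad/s,
             sorted by time.
    Returns the approximate (x, y, z) angles at the last sample time.
    """
    angles = list(initial_angles)
    # Trapezoidal integration between consecutive gyroscope samples.
    for (t0, w0), (t1, w1) in zip(samples, samples[1:]):
        dt = t1 - t0
        for axis in range(3):
            angles[axis] += 0.5 * (w0[axis] + w1[axis]) * dt
    return tuple(angles)

if __name__ == "__main__":
    readings = [
        (0.0, (0.0, 0.0, 0.1)),
        (0.5, (0.0, 0.0, 0.1)),
        (1.0, (0.0, 0.0, 0.1)),
    ]
    print(integrate_gyro((0.0, 0.0, 0.0), readings))
```

Because integration error accumulates, a design like this would periodically reset to a fresh vision-derived pose rather than rely on the gyroscope indefinitely.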
[Figure: frame timeline (0, 1, 2, 3, 4, ..., t - 2, t - 1) with labels: Sampled Frame, Gyroscope]
Filesize ≈ 1/Blur
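One way to read the Filesize ≈ 1/Blur heuristic: a blurrier frame compresses to fewer bytes, so the largest encoded frame in a window is likely the sharpest. The sketch below assumes per-frame encoded sizes are already available and that frames are sampled in fixed-size windows. Both the window scheme and the function name are assumptions for illustration.

```python
def sample_sharpest_frames(frame_sizes, window):
    """Pick one frame index per window: the frame with the largest encoded size.

    frame_sizes: list of encoded frame sizes in bytes, in frame order.
    window: number of consecutive frames per sampling window (must be > 0).
    Returns the list of chosen frame indices.
    """
    picks = []
    for start in range(0, len(frame_sizes), window):
        chunk = frame_sizes[start:start + window]
        # Larger file size is used as a proxy for less blur.
        best = max(range(len(chunk)), key=lambda i: chunk[i])
        picks.append(start + best)
    return picks

if __name__ == "__main__":
    sizes = [10, 30, 20, 5, 7, 6, 50]
    print(sample_sharpest_frames(sizes, 3))
```

Only the sampled frames would then need the expensive vision alignment, with sensor readings covering the frames in between.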
Field-of-view
Using POSE + model POINT CLOUD, FOCUS geometrically identifies the set of model points in the background of the view
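A simplified sketch of this geometric step, assuming the pose is given as a camera position plus a viewing direction and the field of view is approximated by a cone with a fixed half-angle. A real camera sees a rectangular frustum and points can be occluded; both are ignored here. All names are hypothetical.

```python
import math

def points_in_view(camera_pos, view_dir, half_angle_deg, point_cloud):
    """Return the ids of model points inside a viewing cone.

    camera_pos: (x, y, z) camera position in model coordinates.
    view_dir: (x, y, z) viewing direction (need not be normalized).
    half_angle_deg: half of the assumed field-of-view angle.
    point_cloud: dict mapping point id -> (x, y, z).
    """
    norm = math.sqrt(sum(c * c for c in view_dir))
    d = [c / norm for c in view_dir]
    cos_limit = math.cos(math.radians(half_angle_deg))
    visible = set()
    for pid, p in point_cloud.items():
        v = [p[i] - camera_pos[i] for i in range(3)]
        dist = math.sqrt(sum(c * c for c in v))
        if dist == 0:
            continue  # point coincides with the camera
        # Angle test: cosine between view direction and direction to point.
        cos_angle = sum(v[i] * d[i] for i in range(3)) / dist
        if cos_angle >= cos_limit:
            visible.add(pid)
    return visible

if __name__ == "__main__":
    cloud = {1: (10, 0, 0), 2: (10, 2, 0), 3: (0, 10, 0), 4: (-5, 0, 0)}
    print(points_in_view((0, 0, 0), (1, 0, 0), 30, cloud))
```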
• multi-view reconstruction
• keypoint extraction
• pairwise model image analysis
Similarity between image 1 & 2 = 18
Similarity between image 1 & 3 = 13
Similarity across videos is computed as the size of the intersection of their point cloud sets
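The set-intersection score can be sketched in a few lines. The point ids below are made up, chosen only so that the overlaps reproduce the scores of 18 and 13 quoted on the slide; the function name is hypothetical.

```python
from itertools import combinations

def pairwise_similarity(visible_points):
    """Score every pair of views by the number of model points they share.

    visible_points: dict mapping view id -> set of visible model point ids.
    Returns a dict mapping (view_a, view_b) -> size of the set intersection.
    """
    return {
        (a, b): len(visible_points[a] & visible_points[b])
        for a, b in combinations(sorted(visible_points), 2)
    }

if __name__ == "__main__":
    views = {
        1: set(range(0, 30)),
        2: set(range(12, 40)),
        3: set(range(0, 13)) | set(range(100, 110)),
    }
    for pair, score in pairwise_similarity(views).items():
        print(pair, score)
```

A raw intersection count favors views that see many points overall; whether FOCUS normalizes the score is not stated in the slides.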
Clustering “similar” videos
[Figure: graph of videos 1, 2, 3 with edges weighted by Similarity Score]
Application of Modularity Maximization
High modularity implies:
• high correlation among the members of a cluster
• minor correlation with the members of other clusters
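To make the modularity criterion concrete, the sketch below computes the standard weighted (Newman) modularity score Q for a given assignment of videos to clusters. It only scores candidate partitions; it is not a maximization algorithm, and the slides do not say which one FOCUS applies. The graph, weights, and names are invented for illustration.

```python
def modularity(weights, community):
    """Weighted modularity Q of a partition of an undirected graph.

    weights: dict mapping (node_a, node_b) -> similarity score (each
             undirected edge listed once).
    community: dict mapping node -> cluster label.
    """
    strength = {}  # sum of edge weights touching each node
    total = 0.0    # m: total edge weight
    for (a, b), w in weights.items():
        strength[a] = strength.get(a, 0.0) + w
        strength[b] = strength.get(b, 0.0) + w
        total += w
    if total == 0:
        return 0.0
    two_m = 2.0 * total
    q = 0.0
    # Observed weight inside clusters (each edge counts in both directions).
    for (a, b), w in weights.items():
        if community[a] == community[b]:
            q += 2.0 * w
    # Expected weight inside clusters under the random null model.
    cluster_strength = {}
    for node, s in strength.items():
        label = community[node]
        cluster_strength[label] = cluster_strength.get(label, 0.0) + s
    for s in cluster_strength.values():
        q -= s * s / two_m
    return q / two_m

if __name__ == "__main__":
    sims = {("A", "B"): 18, ("C", "D"): 15, ("A", "C"): 1, ("B", "D"): 1}
    good = {"A": 0, "B": 0, "C": 1, "D": 1}
    bad = {"A": 0, "C": 0, "B": 1, "D": 1}
    print("good:", modularity(sims, good), "bad:", modularity(sims, bad))
```

Grouping the strongly similar pairs together yields the higher score, which matches the two bullet points above: strong ties inside a cluster, weak ties across clusters.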
RESULTS
Collegiate Football Stadium
• Stadium: 33K seats, 56K maximum attendance
• Model: 190K points, 412 images (2896 x 1944 resolution)
• Android app on Samsung Galaxy Nexus, S3
• 325 videos captured, 15-30 seconds each
Line-of-Sight Accuracy (visual)
Line-of-Sight Accuracy
In >80% of the cases, FOCUS line-of-sight estimation is off by < 40 meters
For the same percentage of cases, GPS/Compass LOS estimation is off by < 260 meters
FOCUS Performance
75% true positives
Trigger GPS/Compass failover techniques
Natural Questions
• What if a 3D model is not available? – Online model generation from the first few uploads
• Stadiums look very different on a game day? – Rigid structures in the background persist
• Where won’t it work? – Natural or dynamic environments are hard
Conclusion
• Computer vision and image processing are often computation-hungry, restricting real-time deployment
• Mobile sensing provides powerful metadata that can often reduce the computation burden
• Computer vision + mobile sensing + geometry, along with the right set of big data tools, can enable many real-time applications
• FOCUS demonstrates one such fusion, a ripe area for further research
Thank You
http://cs.duke.edu/~puneet