
FOCUS: Clustering Crowdsourced Videos by Line-of-Sight

Puneet Jain, Justin Manweiler, Arup Acharya, and Kirk Beaty

Clustered by shared subject

CHALLENGES

CAN IMAGE PROCESSING SOLVE THIS PROBLEM?


[Figure: views from Camera 1, Camera 2, Camera 3, and Camera 4]

LOGICAL similarity does not imply VISUAL similarity


VISUAL similarity does not imply LOGICAL similarity

CAN SMARTPHONE SENSING SOLVE THIS PROBLEM?

Sensors are noisy, hard to distinguish subjects…

Why not triangulate?

GPS-COMPASS Line-of-Sight
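
To make the triangulation question concrete, here is a minimal sketch of intersecting two GPS/compass lines of sight on a flat 2D plane. The function name `intersect_rays`, the local meter coordinates, and the 10-degree compass error are illustrative assumptions, not values from the talk.

```python
import math

def intersect_rays(p1, bearing1_deg, p2, bearing2_deg):
    """Intersect two 2D lines of sight.

    p1, p2: (x, y) camera positions in meters (assumed local flat coordinates).
    Bearings are compass degrees, clockwise from north (+y).
    Returns None if the rays are parallel or cross behind either camera.
    """
    d1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    d2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product d1 x d2
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom  # distance along ray 1
    t2 = (dx * d1[1] - dy * d1[0]) / denom  # distance along ray 2
    if t1 < 0 or t2 < 0:
        return None
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Two cameras 100 m apart, both looking at a subject at (50, 50).
clean = intersect_rays((0.0, 0.0), 45.0, (100.0, 0.0), 315.0)

# Same setup, but camera 1's compass is off by a hypothetical 10 degrees.
noisy = intersect_rays((0.0, 0.0), 55.0, (100.0, 0.0), 315.0)
shift_m = math.hypot(noisy[0] - clean[0], noisy[1] - clean[1])
```

In this toy setup a single compass error moves the triangulated subject by more than ten meters, which is one way to read the slide's point that sensors alone are too noisy to distinguish subjects.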

INSIGHT

Don’t need to visually identify actual SUBJECT, can use background as PROXY

hard to identify

easy to identify

Simplifying Insight 1

same basic structure persists


Don’t need to directly match videos, can compare all to a predefined visual MODEL


Line-of-sight (triangulation) is almost enough, just not via sensing (alone)

FOCUS: Fast Optical Clustering of live User Streams

Sensing

Cloud

Vision

Hadoop/HDFS: failover, elasticity

Image processing, computer vision

Video Streams

(Android, iOS, etc.)

Clustered Videos

FOCUS Cloud Video Analytics

Video Extraction

Watching Live (home: 2, away: 1)

Users Select & Watch Organized Streams

Change Angle

Change Focus

pre-defined reference “model”

Model construction technique based on "Photo Tourism: Exploring Photo Collections in 3D," Snavely et al., SIGGRAPH 2006

• multi-view reconstruction
• keypoint extraction

estimates camera POSE and content in field-of-view

Multi-view Stereo Reconstruction

Visualizing Camera Pose


~1 second at the 90th percentile

~18 seconds at the 90th percentile


Frame-by-frame video to model alignment

• Given a pre-defined 3D model, align incoming video frames to the model

• Also known as camera pose estimation
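
As a toy illustration of what "aligning a frame to the model" means, pose estimation can be framed as finding the pose that minimizes reprojection error. The sketch below is deliberately reduced to a 2D world with a known camera position and a single unknown (yaw), solved by grid search. The names `project`, `reprojection_error`, `estimate_yaw`, the focal length, and the sample points are all assumptions for illustration; this is not the alignment method used by FOCUS.

```python
import math

def project(point, cam_xy, yaw_deg, focal_px=1000.0):
    """Project a 2D world point onto a 1D image line (pinhole model).

    yaw_deg is the compass heading of the optical axis (clockwise from +y).
    Returns None when the point is behind the camera.
    """
    yaw = math.radians(yaw_deg)
    dx, dy = point[0] - cam_xy[0], point[1] - cam_xy[1]
    forward = dx * math.sin(yaw) + dy * math.cos(yaw)  # along the optical axis
    right = dx * math.cos(yaw) - dy * math.sin(yaw)    # across the optical axis
    if forward <= 0:
        return None
    return focal_px * right / forward

def reprojection_error(model_points, observed_px, cam_xy, yaw_deg):
    """Mean squared pixel error between observed and predicted projections."""
    total = 0.0
    for point, u_obs in zip(model_points, observed_px):
        u = project(point, cam_xy, yaw_deg)
        if u is None:
            return float("inf")
        total += (u - u_obs) ** 2
    return total / len(model_points)

def estimate_yaw(model_points, observed_px, cam_xy, step_deg=0.25):
    """Grid-search the yaw that best explains the observations."""
    candidates = [i * step_deg for i in range(int(360 / step_deg))]
    return min(
        candidates,
        key=lambda yaw: reprojection_error(model_points, observed_px, cam_xy, yaw),
    )

# Hypothetical model points and a camera at the origin with a true yaw of 10 degrees.
model_points = [(10.0, 50.0), (0.0, 60.0), (-15.0, 40.0)]
camera = (0.0, 0.0)
observed = [project(p, camera, 10.0) for p in model_points]
estimated_yaw = estimate_yaw(model_points, observed, camera)
```

A full 6-degree-of-freedom solver works on the same principle, with 3D points, 2D keypoints, and a proper optimizer in place of the grid search.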



Integration of sensory inputs

Gyroscope provides the “diff” from the initial position obtained by vision
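
One way to picture the gyroscope "diff" is dead reckoning on top of the last vision-derived orientation. The sketch below integrates yaw rate only; the function name `propagate_yaw`, the sample format, and the numbers are assumptions for illustration, and a real system would track full 3D rotation and correct for drift.

```python
def propagate_yaw(vision_yaw_deg, gyro_samples):
    """Apply gyroscope rotation on top of the last vision-derived yaw.

    gyro_samples: iterable of (dt_seconds, yaw_rate_deg_per_s) recorded
    since the frame that vision aligned to the model.
    """
    yaw = vision_yaw_deg
    for dt, rate in gyro_samples:
        yaw += rate * dt  # simple rectangular integration
    return yaw % 360.0

# Vision placed the camera at 350 degrees; the user then pans
# at 20 deg/s for one second (ten samples of 0.1 s each).
samples = [(0.1, 20.0)] * 10
current_yaw = propagate_yaw(350.0, samples)
```

Between vision fixes, only these cheap sensor updates are needed, which is the sense in which the gyroscope supplies a difference from the initial position.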

[Figure: frame timeline (0, 1, 2, 3, 4, t - 1, t - 2) showing sampled frames alongside gyroscope readings]

Filesize ≈ 1/Blur
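
The Filesize ≈ 1/Blur relation suggests a cheap way to choose which frames to process: within each window of frames, keep the one with the largest compressed size, on the reading that blurred frames compress smaller. The following sketch assumes that interpretation; the function name `sample_frames`, the window length, and the byte counts are hypothetical.

```python
def sample_frames(frame_sizes, window):
    """Pick one frame index per window: the frame with the largest size.

    frame_sizes: compressed size in bytes of each frame, in capture order.
    window: number of consecutive frames to choose from at a time.
    """
    picks = []
    for start in range(0, len(frame_sizes), window):
        chunk = frame_sizes[start:start + window]
        best_offset = max(range(len(chunk)), key=chunk.__getitem__)
        picks.append(start + best_offset)
    return picks

# Hypothetical per-frame sizes in bytes for six consecutive frames.
sizes = [41_200, 38_900, 52_300, 47_100, 30_500, 49_800]
sampled = sample_frames(sizes, window=3)
```

Using file size as the proxy avoids running any image analysis just to decide which frame is worth analyzing.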


Field-of-view

Using POSE + model POINT CLOUD, FOCUS geometrically identifies the set of model points in the background of the view
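
A minimal geometric sketch of this step follows: given a camera position and viewing direction, collect the model points that fall inside a viewing cone. The cone approximation, the function name `points_in_view`, and the sample coordinates are assumptions; a real field of view is a rectangular frustum, and occlusion is ignored here.

```python
import math

def points_in_view(point_cloud, cam_pos, view_dir, half_angle_deg, max_range):
    """Return the ids of model points inside a viewing cone.

    point_cloud: dict mapping point id -> (x, y, z) in model coordinates.
    view_dir: viewing direction from the estimated pose (need not be unit length).
    """
    norm = math.sqrt(sum(c * c for c in view_dir))
    axis = tuple(c / norm for c in view_dir)
    cos_limit = math.cos(math.radians(half_angle_deg))
    visible = set()
    for pid, p in point_cloud.items():
        offset = tuple(pc - cc for pc, cc in zip(p, cam_pos))
        dist = math.sqrt(sum(c * c for c in offset))
        if dist == 0 or dist > max_range:
            continue
        # Cosine of the angle between the viewing axis and the point.
        cos_angle = sum(o * a for o, a in zip(offset, axis)) / dist
        if cos_angle >= cos_limit:
            visible.add(pid)
    return visible

# Hypothetical model points (meters) and a camera at the origin looking along +y.
cloud = {
    "a": (0.0, 100.0, 5.0),
    "b": (30.0, 100.0, 5.0),
    "c": (-200.0, 10.0, 0.0),
    "d": (0.0, -50.0, 0.0),
}
in_view = points_in_view(cloud, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), 30.0, 300.0)
```

The resulting set of point ids acts as a compact signature of what the camera is looking at.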


Pairwise model image analysis

[Figure: images 1, 2, and 3]

Similarity between image 1 & 2 = 18

Similarity between image 1 & 3 = 13


Similarity across videos is computed as the size of the intersection of their point cloud sets
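
The set-intersection score can be sketched in a few lines. The point ids and video names below are hypothetical; only the scoring rule (size of the intersection) comes from the slide.

```python
from itertools import combinations

def similarity(points_a, points_b):
    """Number of model points visible in both videos."""
    return len(set(points_a) & set(points_b))

# Hypothetical sets of model point ids seen in the background of each video.
visible = {
    "video1": {1, 2, 3, 4, 5},
    "video2": {3, 4, 5, 6},
    "video3": {5, 9},
}

# Pairwise scores for every unordered pair of videos.
scores = {
    (a, b): similarity(visible[a], visible[b])
    for a, b in combinations(sorted(visible), 2)
}
```

These pairwise scores are what a clustering step can then treat as edge weights between videos.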


Pairwise model image analysis

Clustering “similar” videos

[Figure: videos 1, 2, and 3 linked by similarity score]

Application of Modularity Maximization

High modularity implies:
• high correlation among the members of a cluster
• minor correlation with the members of other clusters
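
To show what modularity measures, the sketch below computes the standard weighted (Newman) modularity of a partition and, for a handful of videos, finds the best partition by exhaustive search. The function names, the video labels, and the similarity weights are hypothetical, and the exhaustive search is only a stand-in: it is not claimed to be the optimization FOCUS uses, and it does not scale beyond a few nodes.

```python
from itertools import product

def modularity(weights, labels):
    """Weighted Newman modularity of a partition.

    weights: dict {(i, j): w} with i < j, one entry per undirected edge.
    labels: dict mapping node -> cluster id.
    """
    nodes = sorted(labels)
    strength = {n: 0.0 for n in nodes}
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
    two_m = sum(strength.values())  # twice the total edge weight
    q = 0.0
    for i in nodes:
        for j in nodes:
            if labels[i] != labels[j]:
                continue
            a_ij = 0.0 if i == j else weights.get((min(i, j), max(i, j)), 0.0)
            q += a_ij - strength[i] * strength[j] / two_m
    return q / two_m

def best_partition(nodes, weights):
    """Try every label assignment; only viable for a handful of nodes."""
    best_labels, best_q = None, float("-inf")
    for assignment in product(range(len(nodes)), repeat=len(nodes)):
        labels = dict(zip(nodes, assignment))
        q = modularity(weights, labels)
        if q > best_q:
            best_labels, best_q = labels, q
    return best_labels, best_q

# Hypothetical pairwise similarity scores between five videos.
weights = {
    ("a", "b"): 18, ("a", "c"): 13, ("b", "c"): 15,
    ("d", "e"): 20, ("c", "d"): 2,
}
labels, q = best_partition(["a", "b", "c", "d", "e"], weights)
```

With these weights the search groups a, b, and c together and d and e together: strong scores inside each cluster and only a weak link between them, matching the two properties listed above.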

RESULTS


Collegiate Football Stadium

• Stadium: 33K seats, 56K maximum attendance

• Model: 190K points, 412 images (2896 x 1944 resolution)

• Android app on Samsung Galaxy Nexus, S3

• 325 videos captured, 15-30 seconds each


Line-of-Sight Accuracy (visual)


Line-of-Sight Accuracy

In >80% of cases, the line-of-sight estimate is off by < 40 meters

GPS/compass line-of-sight estimation is off by < 260 meters for the same percentage of cases


FOCUS Performance

75% true positives

Trigger GPS/Compass failover techniques

Natural Questions

• What if a 3D model is not available?
  – Online model generation from the first few uploads

• Do stadiums look very different on game day?
  – Rigid structures in the background persist

• Where won’t it work?
  – Natural or dynamic environments are hard

Conclusion

• Computer vision and image processing are often computation-hungry, restricting real-time deployment

• Mobile sensing provides powerful metadata that can often reduce the computational burden

• Computer vision + mobile sensing + geometry, along with the right set of Big Data tools, can enable many real-time applications

• FOCUS demonstrates one such fusion, a ripe area for further research

Thank You

http://cs.duke.edu/~puneet
