2016-1-12 E-V: Efficient Visual Surveillance with Electronic Footprints Jin Teng, Junda Zhu, Boying Zhang, Dong Xuan and Yuan F. Zheng IEEE Infocom 2012

23/4/21

E-V: Efficient Visual Surveillance with Electronic Footprints

Jin Teng, Junda Zhu, Boying Zhang, Dong Xuan and Yuan F. Zheng

IEEE Infocom 2012

Outline

Deficiency of Visual Surveillance Systems
A Brief of Our E-V System
A Case Study
A Broader View of Our E-V System
Final Remarks

Visual Surveillance

Failure Examples

Chicago police installed 10,000 surveillance cameras in the city, yet only 1 in 200 crimes is captured by visual surveillance [2]!

One of the bombers in the London bombing (July 2005) was not identified by the surveillance system and escaped [3]!

Why fail?

Large volume of video data
Temporal: 2.07 × 10^6 frames per camera per day
Spatial: a very large number of surveillance cameras in a city, e.g. New York has 4176 video cameras in the lower Manhattan area [1].

Monitored objects may be visually occluded or have multiple inconsistent appearances

Visual technologies alone are not efficient or accurate enough for automatic localization and tracking, so a lot of human effort is needed!
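The data volume quoted on this slide (roughly 2.07 million frames per camera per day, 4176 cameras in lower Manhattan) can be checked with a few lines of arithmetic. This is only a sketch: the 24 frames-per-second rate is an assumption that happens to reproduce the daily total, since the slide does not state the frame rate.

```python
# Assumption: a 24 frames-per-second camera; the slide gives only the daily total.
FRAMES_PER_SECOND = 24
SECONDS_PER_DAY = 24 * 60 * 60

# Temporal volume: frames produced by one camera in one day.
frames_per_camera_per_day = FRAMES_PER_SECOND * SECONDS_PER_DAY

# Spatial volume: camera count for lower Manhattan quoted on the slide [1].
CAMERAS_LOWER_MANHATTAN = 4176
frames_per_day_lower_manhattan = frames_per_camera_per_day * CAMERAS_LOWER_MANHATTAN

print(frames_per_camera_per_day)
print(frames_per_day_lower_manhattan)
```

Under that assumption a single day of lower Manhattan footage already runs to billions of frames, which is the scale the slide is pointing at.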

Our Methodology: E-V Integration

Combining electronic and visual signals for efficient surveillance

E-V Integration makes it possible to efficiently and accurately localize and identify objects in a large volume of video data

            Indexing & Sorting    Localization Accuracy
E-Signal    Easy                  Low
V-Signal    Hard                  High

Electronic Signals

Name        Distance        Frequency                               Data Rate (down)
GSM         35 km           850, 900, 1800, 1900 MHz                80 kb/s (GPRS), 236 kb/s (EDGE)
LTE         30 km–100 km    700 MHz–2.6 GHz                         >100 Mb/s
WiFi        100 m           2.4 GHz (802.11b/g), 5 GHz (802.11a)    54 Mb/s
                            2.4 GHz, 5 GHz                          450 Mb/s
Bluetooth   10 m            2.4 GHz, Frequency Hopping              2.1 Mb/s (up to 24 Mb/s)
NFC         < 4 cm          13.56 MHz                               106 kb/s–424 kb/s

Wireless channels expose identifying information: wireless addresses (such as a WiFi MAC address), content, etc.

Pervasiveness of Electronic Signals

Electronic signals are emitted by many mobile devices
Mobile devices are increasingly popular
Smartphones as an example: 302.6 million shipped in 2010

Our E-V System: A Bird’s Eye View

Our E-V System: Layers

Specific Applications: Surveillance, Health, Training
Technologies: Localization, Identification, Other Technologies
Sensing Methods: Electronic, Visual, Other Signals

Related Work on E-V Integration

Fuse multiple sensors for tracking [4]
Visual camera + RFID for monitoring [5]
Existing work cannot achieve accuracy and efficiency for visual surveillance at the same time!

A Typical Surveillance Scenario

Find a specific person given some vague visual information, i.e., retrieve his appearances in videos covering a long period of time

If we depend on videos alone, we may need to:
Extract all human figures in each frame, which may number in the thousands, and compare them with a designated vague picture.
Spend a large amount of human effort watching the videos, which may last several hours or even days and come from a number of cameras.

With E-V integration, what can we do?

Problem Formulation: Notations

V-sensing: V-ID and V Frame
V-ID: visual identity, such as a human figure
VID*: our target V-ID
V Frame: a set of V-IDs with some background, captured by visual sensors (cameras) in a certain area and time

E-sensing: E-ID and E Frame
E-ID: electronic identity, such as a MAC address
EID*: our target E-ID
E Frame: a set of E-IDs captured by electronic sensors in a certain area and time

Vagueness and completeness
Vagueness: reflects how clearly a V-ID/E-ID can be identified
Completeness: reflects whether the V-IDs/E-IDs in a V/E frame are complete
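One way to picture the notation is as two small record types, one per sensing modality. The sketch below is illustrative only: the class names, the field names (`area`, `time_slot`, `eids`, `vids`) and the `corresponding` helper are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class EFrame:
    """E-IDs captured by electronic sensors in one area and time window."""
    area: str
    time_slot: int
    eids: FrozenSet[str]


@dataclass(frozen=True)
class VFrame:
    """V-IDs (e.g. detected human figures) captured by a camera in one area and time window."""
    area: str
    time_slot: int
    vids: FrozenSet[str]


def corresponding(e: EFrame, v: VFrame) -> bool:
    """An E frame and a V frame correspond when they cover the same area and time."""
    return (e.area, e.time_slot) == (v.area, v.time_slot)


# Hypothetical frames for illustration.
e1 = EFrame("gym", 0, frozenset({"EID*", "EID1"}))
v1 = VFrame("gym", 0, frozenset({"VID1", "VID2"}))
v2 = VFrame("library", 0, frozenset({"VID3"}))
print(corresponding(e1, v1), corresponding(e1, v2))
```

Pairing frames by area and time is what lets a decision made on the E side select which V frames to look at.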

Problem Formulation: Cases

                            Input Target      Input Frames
                            EID*    VID*      EIDs    VIDs
Vagueness     Clear         √                 √
              Vague                 √                 √
Completeness  Complete                        √       √
              Incomplete

Baseline case (√):
Input: clear EID* (and vague VID*), and a set of E frames with clear and complete EIDs and V frames with vague and complete VIDs
Output: VID* in video frames (VID* may be different from the given vague VID*)

General case:
Input: EID* (and VID*), and a set of E frames and corresponding V frames
Output: VID* in video frames

A Naïve Solution to the Baseline Case

Two steps:
Step 1: Find all E frames that include EID*
Step 2: Identify VID* in their corresponding V frames

Comments: Fewer V frames to process, because V frames without VID* are filtered out, but many V frames may still remain

Example: suppose we have three E/V frames. We go through them one by one.
E frame 1: EID*, EID1
E frame 2: EID*, EID2, EID3
E frame 3: EID*, EID2
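Step 1 of the naïve solution amounts to a membership filter over the E frames. The following is a minimal sketch using the three-frame example from this slide; the frame `e4`, which does not contain EID*, is a hypothetical addition so that the filter has something to discard, and the function name is illustrative.

```python
TARGET = "EID*"

# E frames from the slide example, plus one hypothetical frame (e4) without the target.
e_frames = {
    "e1": {"EID*", "EID1"},
    "e2": {"EID*", "EID2", "EID3"},
    "e3": {"EID*", "EID2"},
    "e4": {"EID1", "EID3"},  # hypothetical, not on the slide
}


def frames_containing(target, frames):
    """Step 1: keep only the E frames in which the target E-ID was sensed."""
    return [name for name, eids in frames.items() if target in eids]


candidates = frames_containing(TARGET, e_frames)
print(candidates)
```

Every frame that contains EID* survives, so all of their corresponding V frames still have to be processed in Step 2. That is the weakness noted in the comments on the slide.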

Our Solution

E-Filtering
Find the minimum number of E frames whose intersection is the given E-ID, i.e. EID*
Far fewer frames for further V-side processing
We will formulate it as the Element Distinguishing Problem (EDP)

V-Retrieval
Retrieve the V-ID from the filtered frames through intersection to determine VID*
We will formulate it as the n-partite Best Matching Problem (nBM)

E-filtering Overview

E frame 1: EID*, EID1
E frame 2: EID*, EID2, EID3
E frame 3: EID*, EID2

Two E frames are enough to identify EID* through intersection: E frame 1 and E frame 2 share only EID*.

Nature of E-Filtering

Finding the minimum number of frames whose intersection is EID*

NP-complete: equivalent to the set cover problem
Whether each E-ID appears in each E frame is summarized in a matrix, with 1 meaning ‘appears’ and 0 meaning ‘does not appear’.
There must be at least one 0 in each non-EID* column
Use these 0s to ‘cover’ all non-EID* columns

      EID*  EID1  EID2  EID3
e1    1     1     0     0
e2    1     0     1     1
e3    1     0     1     0
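To make the problem statement concrete, the sketch below finds a smallest set of E frames whose intersection is exactly EID* by exhaustive search. It is not the method proposed on the slides: it tries subsets in order of size, so its cost grows exponentially with the number of frames, which is what one would expect for an NP-complete problem. The function name is hypothetical.

```python
from itertools import combinations

# The appearance matrix from the slide, stored as one set of E-IDs per frame.
e_frames = {
    "e1": {"EID*", "EID1"},
    "e2": {"EID*", "EID2", "EID3"},
    "e3": {"EID*", "EID2"},
}


def min_distinguishing_frames(target, frames):
    """Return a smallest tuple of frame names whose intersection is {target}, or None."""
    names = list(frames)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            common = set.intersection(*(set(frames[n]) for n in subset))
            if common == {target}:
                return subset
    return None


best = min_distinguishing_frames("EID*", e_frames)
print(best)
```

On this example no single frame isolates EID*, while the pair e1 and e2 does, matching the observation that two E frames are enough.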

Solution: EDP Algorithm

Element Distinguishing Problem (EDP)
The element to be distinguished is EID*

Greedily select the E frame in which the largest number of E-IDs can be told apart from EID*
In the example, the greedy algorithm will select e1 or e3 first, because either one tells us that two E-IDs are not EID*
Repeat the greedy selection until EID* is distinguishable

      EID*  EID1  EID2  EID3
e1    1     1     0     0
e2    1     0     1     1
e3    1     0     1     0

EDP (cont’d)
An approximate solution can be obtained with the greedy heuristic algorithm for the set cover problem
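The greedy selection described above can be sketched as follows. This is an illustrative reading of the slide rather than the authors' implementation: each frame "covers" the E-IDs that are absent from it, and frames are picked until no E-ID other than EID* remains confusable. Ties are broken by dictionary order, which is an arbitrary choice, and the function name is hypothetical. The greedy set cover heuristic is known to stay within a logarithmic factor of the optimum.

```python
e_frames = {
    "e1": {"EID*", "EID1"},
    "e2": {"EID*", "EID2", "EID3"},
    "e3": {"EID*", "EID2"},
}


def greedy_edp(target, frames):
    """Greedily pick frames until only the target survives their intersection."""
    # Only frames that contain the target can take part in the intersection.
    usable = {name: set(ids) for name, ids in frames.items() if target in ids}
    if not usable:
        return None
    # E-IDs that still cannot be told apart from the target.
    confusable = set().union(*usable.values()) - {target}
    chosen = []
    while confusable:
        # Pick the frame from which the most confusable E-IDs are absent.
        best = max(usable, key=lambda name: len(confusable - usable[name]))
        newly_excluded = confusable - usable[best]
        if not newly_excluded:
            return None  # some E-ID co-occurs with the target in every frame
        chosen.append(best)
        confusable -= newly_excluded
    # If the target was never accompanied by another E-ID, any one frame suffices.
    return chosen or [next(iter(usable))]


selected = greedy_edp("EID*", e_frames)
print(selected)
```

With these frames the first pick excludes two E-IDs (e1 rules out EID2 and EID3), and one more frame is needed to rule out EID1.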

V-Retrieval

General Problem
Find the corresponding VID* from the frames selected by E-Filtering
VID* is the only V-ID that should appear in all the frames after E-Filtering, so an intersection operation can give VID*.

Largest Challenge
Indistinct V-IDs: we do not know for sure which person is which in different frames

Solution
nBM algorithm: find the VID with the largest probability of appearing in all V frames.

The nBM Algorithm

n-partite Best Match Problem (nBM)
Find the VID* that best matches the visual appearance of EID*
Put all VIDs in different frames in n different circles, forming an n-partite graph
Determine whether a VID appears in each V frame based on similarity scores
Use the Maximum Likelihood criterion to choose the VID whose appearance/disappearance agrees best with that of EID*

[Figure: n-partite graph over V frames v1, v2 and v3, with VID1 in v1, VID2–VID5 in v2 and VID6–VID9 in v3. Edges marked “Similar?” connect VIDs in different frames. A dummy VID1 node in a frame indicates that VID1 is not similar to any VID in that frame.]
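The maximum likelihood selection can be sketched under some simplifying assumptions that are not stated on the slide: each candidate VID already has a per-frame probability of appearing (for instance derived from its best similarity score in that frame), frames are treated as independent, and the likelihood multiplies p when EID* was sensed in the matching E frame and 1 − p when it was not. All names and numbers below are hypothetical.

```python
import math


def log_likelihood(appear_probs, eid_present):
    """Log-likelihood that a candidate's appearance pattern matches that of EID*."""
    eps = 1e-9  # guards against log(0)
    total = 0.0
    for p, present in zip(appear_probs, eid_present):
        agreement = p if present else 1.0 - p
        total += math.log(max(agreement, eps))
    return total


def best_match(candidates, eid_present):
    """Pick the candidate VID whose appearance/disappearance agrees best with EID*."""
    return max(candidates, key=lambda vid: log_likelihood(candidates[vid], eid_present))


# Hypothetical data: EID* was sensed in frames 1 and 2 but not in frame 3.
eid_present = [True, True, False]
candidates = {
    "VID1": [0.9, 0.8, 0.1],
    "VID2": [0.7, 0.2, 0.6],
    "VID3": [0.3, 0.9, 0.9],
}
print(best_match(candidates, eid_present))
```

Working with sums of logarithms rather than a raw product avoids numerical underflow when many frames are involved.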

Practical Considerations

In the baseline case, we assumed that the information on E-IDs and V-IDs is complete. However, in realistic cases, we may have
Ghost V-IDs or missing V-IDs
Missing E-IDs

√ The baseline case we have studied
□ The practical case of our focus, solved

                            Input Target      Input Frames
                            EID*    VID*      EIDs    VIDs
Vagueness     Clear         √ □               √ □
              Vague                 √ □               √ □
Completeness  Complete                        √       √
              Incomplete                      □       □

Solutions to Practical Problems

Careful Deployment
Make sure that the coverage of the camera and that of the wireless detectors are roughly the same
nBM is probability-based, so it is naturally resistant to noise
Select an appropriate threshold in nBM for a better tradeoff between noise resistance and performance

Generalized EDP (GEDP)
Handles missing/ghost E-IDs
Introduces fuzzy logic to improve the robustness of EDP
Uses RSSI for estimation and smoothing

       EID*   EID1   EID2   EID3   EID4
e1'    0.98   0.95   0.1    0.01   0.06
e2'    0.9    0.01   1      0.94   0.04
e3'    0.88   0.99   0.03   0.1    0.12
e4'    0.99   0.02   0.89   0.27   0.23

[Figure: the 0/1 detections of an EIDi over time, before and after smoothing.]
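With fuzzy membership values in place of 0/1 entries, the intersection of E frames can no longer be computed as a plain set intersection. The sketch below uses the minimum as the fuzzy AND, which is a common textbook choice; the slides do not say which operator GEDP actually uses, so this is an assumption made for illustration only, and the function name is hypothetical.

```python
# Membership matrix from the slide: how strongly each E-ID is believed present in each frame.
memberships = {
    "e1'": {"EID*": 0.98, "EID1": 0.95, "EID2": 0.10, "EID3": 0.01, "EID4": 0.06},
    "e2'": {"EID*": 0.90, "EID1": 0.01, "EID2": 1.00, "EID3": 0.94, "EID4": 0.04},
    "e3'": {"EID*": 0.88, "EID1": 0.99, "EID2": 0.03, "EID3": 0.10, "EID4": 0.12},
    "e4'": {"EID*": 0.99, "EID1": 0.02, "EID2": 0.89, "EID3": 0.27, "EID4": 0.23},
}


def fuzzy_intersection(frame_names):
    """Membership of each E-ID in the intersection, using min as the fuzzy AND (assumed)."""
    eids = memberships[frame_names[0]].keys()
    return {eid: min(memberships[f][eid] for f in frame_names) for eid in eids}


result = fuzzy_intersection(["e1'", "e2'"])
print(result)
```

Intersecting e1' and e2' leaves EID* with a membership of 0.9 while every other E-ID falls to 0.1 or below, so the target still stands out even though no entry is exactly 0 or 1.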

A Quick Recap of Our Solutions

                        ID Complete    ID Incomplete
E-Filtering on EIDs     EDP            GEDP
V-Retrieval on VIDs     nBM            nBM + Deployment

Implementation

Real-world implementation
One camera viewing from above to collect V frames
1–3 laptops around, sniffing the WiFi traffic to collect E frames
Tested on campus (Gymnasium, Library)

Experimental Evaluations

Real-world experiments
Successfully found the VID*
Minimum number of frames needed for Scenario 1 is 3, and we achieve 3
Minimum number of frames needed for Scenario 2 is 3, and we achieve 4

Scenario 1: Gymnasium, 6 people, 28 frames
Scenario 2: Library, 8 people, 40 frames

Large-Scale Simulation-Based Evaluations

Evaluation settings
Networks of cameras and wireless detectors at three locations
~120 people moving randomly

Far fewer video frames to process (left)
High accuracy (right)

E-V Surveillance: Problem Space

[Figure: problem space spanned by Tracking (Onsite, Offline) and Cooperative vs. Uncooperative.]

Final Remarks

Existing visual surveillance systems are not efficient
Our E-V system
Integrates E signals and V signals for efficient visual surveillance
Has been implemented in the real world
Many open issues remain; there is still a long way to go

References

[1] Big Apple is Watching You: http://www.slate.com/articles/news_and_politics/explainer/2010/05/big_apple_is_watching_you.html

[2] http://articles.chicagotribune.com/2010-05-06/news/ct-oped-0506-chapman-20100506_1_surveillance-cameras-vandalism-effect-on-violent-crime

[3] http://news.bbc.co.uk/2/hi/4659093.stm

[4] D. Smith et al., “Approaches to Multisensor Data Fusion in Target Tracking: A Survey”, IEEE Transactions on Knowledge and Data Engineering, 2006.

[5] S. Cho et al., “Association and Identification in Heterogeneous Sensors Environment with Coverage Uncertainty”, IEEE Advanced Video and Signal Based Surveillance, 2009.

Backup Slides

A Case Study

A typical surveillance scenario
Problem formulation in E-V integration
Our solution
Implementation and evaluations

GEDP Algorithm

Clearly NP-hard
We can reduce EDP to GEDP
Heuristic algorithm based on the subset sum approximation algorithm

The nBM Algorithm

n-partite Best Match Problem (nBM)
Find the VID* that best matches the visual appearance of EID*
Put all VIDs in different frames in n different circles, forming an n-partite graph
Build a similarity matrix for all V-IDs that have appeared

[Figure: n-partite graph over V frames v1, v2 and v3, with VID1 in v1, VID2–VID5 in v2 and VID6–VID9 in v3.]

nBM (cont’d)

Maximum Likelihood matching
Given the observed VID1 … VIDm, which VID is the best candidate?
Calculate the probability of each VIDi across all V frames
Select the VID with the largest probability

[Figure: V frames v1 and v2, with VID1 in v1 and VID2–VID5 in v2, illustrating two hypotheses: VID1 is not in v2, or VID1 is in v2 and appears as VID2.]
