Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh University of Southern California Oytun Ulutan University of California, Santa Barbara B.S. Manjunath University of California, Santa Barbara Kevin Chan ARL Ramesh Govindan University of Southern California

Caesar: Cross-Camera Complex Activity Recognition · 2020. 1. 13. · Caesar: Cross-Camera Complex Activity Recognition Xiaochen Liu University of Southern California Pradipta Ghosh


Caesar: Cross-Camera Complex Activity Recognition

Xiaochen Liu (University of Southern California), Pradipta Ghosh (University of Southern California), Oytun Ulutan (University of California, Santa Barbara), B.S. Manjunath (University of California, Santa Barbara), Kevin Chan (ARL), Ramesh Govindan (University of Southern California)


Cameras are the Most Powerful and Ubiquitous Sensors Today


Cameras Can be Used to Detect Various Activities


Public safety

Surveillance footage that captured the Las Vegas mass shooter in the hotel

Smart shopping

Traffic monitoring


Today’s Activity Analytics is Highly Manual


Ideal: Automated Activity Analytics


Monitoring System

Detected frames and objects

User-defined activity

“A person gets into a car then walks with bag”

Camera Cluster


Caesar for Complex Activity Detection


User-defined activity

“A person gets into a car then walks with bag”

No prior work on complex activity detection

But there is work on atomic activity detection

Caesar fills the gap

- Gets into a car
- Walk
- With bag


Atomic Activity & Complex Activity


- Use Phone (Camera 1)
- Take Bag (Camera 1)
- Give Bag (Camera 2)

Atomic Activities

Camera 1 Camera 2

Complex Activity: “A person uses phone then takes bag and gives bag”


Prior Work on Atomic Activity Detection


Bounding Boxes

Object Detector

Object Detection: Detect people/objects as bounding boxes

Prior Work: YOLO (CVPR’17), SSD (ECCV’16), ResNet (CVPR’16)


Tubes

Tracker

Bounding Boxes

Object Detector

Tracking: Track each bounding box’s movement as a tube

Prior Work: DeepSORT (WACV’18), RPP (ECCV’18), DG-Net (CVPR’19)

Prior Work on Atomic Activity Detection


Action Detector

Tubes Results

Tracker

Bounding Boxes

Object Detector

Action Detection: Detect atomic actions in a tube using a DNN or rules

Prior Work: ACAM (WACV’20), RPNN (ICCV’19), ODAS (ECCV’18)

Prior Work on Atomic Activity Detection


Tube clips with labels

DNN-Based Action Detection


Result: Use Phone

Training

Use Phone

Talking

Reading

… …

Input

Images from AVA dataset


Pros and Cons of DNN for Action Detection


Can detect various atomic activities with high accuracy

No training data exists for complex activities, which span multiple tubes and long durations

Even with data, training a separate DNN for each complex activity is overkill and does not generalize


Rule-Based Activity Detection


Person 2 Tube

Result: Two people getting close

Simple Rule: The pixel distance between two tubes keeps decreasing

Person 1 Tube
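The simple rule on this slide can be sketched in a few lines. This is a minimal illustration, not Caesar's implementation: it assumes a tube is a dict mapping frame ID to an (x1, y1, x2, y2) pixel box, measures distance between box centers, and uses hypothetical names (`is_approaching`, `min_frames`) that do not come from the slides.

```python
import math


def center(box):
    """Center (x, y) of a bounding box given as (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def pixel_distances(tube_a, tube_b):
    """Center-to-center distance for every frame in which both tubes have a box."""
    distances = []
    for frame_id in sorted(set(tube_a) & set(tube_b)):
        (ax, ay), (bx, by) = center(tube_a[frame_id]), center(tube_b[frame_id])
        distances.append(math.hypot(ax - bx, ay - by))
    return distances


def is_approaching(tube_a, tube_b, min_frames=3):
    """True if the distance between the two tubes strictly decreases over time.

    min_frames is an assumed guard against deciding from too few frames.
    """
    distances = pixel_distances(tube_a, tube_b)
    if len(distances) < min_frames:
        return False
    return all(later < earlier for earlier, later in zip(distances, distances[1:]))


# Person 1 stands still; person 2 walks toward person 1.
person_1 = {f: (0, 0, 10, 10) for f in range(4)}
person_2 = {f: (100 - 20 * f, 0, 110 - 20 * f, 10) for f in range(4)}
approaching = is_approaching(person_1, person_2)
```

A real deployment would likely need some tolerance for jitter in the detected boxes; the strict comparison here is only the literal reading of "keeps decreasing".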


Intuitive and simple input (just bounding box)

Generalizable to describe a wide range of interactions between tubes

Has no knowledge of the content inside the box, so it can detect only a very limited set of tube actions

Pros and Cons of Simple Rule-Based Action Detection


Caesar: A Framework with DNNs + Rules


Caesar’s Rule: “A person skateboarding moves with a person riding a bike”

Person 1 Tube

Person 2 Tube

From DNNs

From simple rules

Define many more complex activities than before



Flexibility: Detects user-defined complex activities correctly

Scalability: Supports concurrent video streams in real time

Efficiency: Saves energy and wireless data on mobile platforms

Design Goals & Challenges


How to specify and detect complex activities?

How to optimize the hardware resource usage?

How can the server and the mobile devices collaborate efficiently?


Flexibility: Specify complex activities as spatio-temporal relationships over an extensible vocabulary

Caesar’s Contributions

Efficiency: Lazily retrieve frames from mobile devices

Scalability: Use lazy graph matching to reduce DNN usage on GPUs


Represent Complex Activities as Graphs


Atomic Actions: From Camera X, Move/Stop, Talking, Use Phone, ...

Caesar uses atomic actions produced by different approaches as graph elements:

Cross-camera person re-identification DNN

Single-camera action detection DNN

Single-camera rule-based action detection


Logic Relation: Then, And, Not, ...

Spatial Relation: Close, Near, Approach, Leave, ...

Caesar also uses pre-defined spatial and logic relations to express rules


An Example of Activity Graph

[Figure: Person: (Use Phone And Not With Bag) Then (Move And With Bag)]


Workflow: Rule Definition and Parsing


(1) User inputs action definition

Action Name: use_phone_then_move_with_bag

Subjects: Person p

Action Definition: (p use_phone) and (not p with_bag) then (p move) and (p with_bag)

(2) Parse definition into a graph

[Figure: activity graph with nodes Use Phone, Not With Bag, Move, and With Bag, joined by And and Then]
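To make the parsing step concrete, here is a minimal sketch that turns the action definition string from this slide into an ordered list of stages. It is not Caesar's parser. It assumes `then` separates stages, `and` joins clauses within a stage, and every clause has the single-subject form `(subject action)` or `(not subject action)`; the function name `parse_definition` and the dict layout are hypothetical.

```python
import re

# One clause: "(p use_phone)" or "(not p with_bag)". Assumed grammar, single subject only.
CLAUSE = re.compile(r"\(\s*(not\s+)?(\w+)\s+(\w+)\s*\)")


def parse_definition(text):
    """Parse a definition into stages; each stage is a list of clause dicts.

    Stages are ordered in time ("then"); clauses in a stage must hold together ("and").
    """
    stages = []
    for part in re.split(r"\s+then\s+", text.strip()):
        clauses = []
        for chunk in re.split(r"\s+and\s+", part):
            match = CLAUSE.fullmatch(chunk.strip())
            if match is None:
                raise ValueError(f"cannot parse clause: {chunk!r}")
            negated, subject, action = match.groups()
            clauses.append(
                {"subject": subject, "action": action, "negated": negated is not None}
            )
        stages.append(clauses)
    return stages


definition = "(p use_phone) and (not p with_bag) then (p move) and (p with_bag)"
stages = parse_definition(definition)
for index, stage in enumerate(stages):
    print(index, stage)
```

Two-subject spatial relations such as Near or Approach would need a richer clause pattern than this sketch provides.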


Workflow: Graph Matching


(3) Matching the graph to the tubes and actions

[Figure: the activity graph (Use Phone And Not With Bag, Then Move And With Bag) shown twice, on either side of a THEN label]


The Action DNN is the Major Scalability Bottleneck


DNN                 FPS
Object Detection    20~30
Tracking/ReID       30~45
Action Detection    50~60 (per tube)

Too slow when
- 10 people: 5~6 FPS
- 20 people: 2~3 FPS

Due to the nature of per-tube detection, the action DNN becomes the bottleneck in the processing pipeline
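The numbers on this slide follow from simple division, shown here as a small worked example. It assumes the action DNN processes one tube at a time with no batching, so the per-frame rate is the per-tube rate divided by the number of tubes; the function name `effective_fps` is hypothetical.

```python
def effective_fps(per_tube_fps, num_tubes):
    """Frame rate of the action DNN when every tube in a frame must be processed."""
    return per_tube_fps / max(num_tubes, 1)


for people in (10, 20):
    low = effective_fps(50, people)
    high = effective_fps(60, people)
    print(f"{people} people: {low:.1f}~{high:.1f} FPS")
# prints:
# 10 people: 5.0~6.0 FPS
# 20 people: 2.5~3.0 FPS
```

Both ranges fall below the object detection and tracking rates in the table, which is why the action DNN bounds the pipeline's throughput.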


Lazy Graph Matching


Match spatiotemporal actions in a graph before any DNN action matching

Simple-rule actions:
● stop, move, close, disappear, ...

DNN actions:
● talk, use phone, sit, hug, ...

Skip heavy nodes and try to match the rest of the graph

Complete matching
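The idea of skipping heavy nodes can be sketched as follows. This is a simplified illustration under stated assumptions, not Caesar's matcher: it ignores temporal ordering, treats a graph as a flat list of action names for one tube, and uses hypothetical callbacks (`rule_check`, `dnn_check`) and made-up tube data. The two action sets are the examples listed on this slide.

```python
SIMPLE_RULE_ACTIONS = {"stop", "move", "close", "disappear"}  # cheap, from bounding boxes
DNN_ACTIONS = {"talk", "use_phone", "sit", "hug"}             # heavy, need the action DNN


def lazy_match(nodes, tube_id, rule_check, dnn_check):
    """Return True if every node holds for the tube, testing cheap nodes first.

    The DNN is consulted only when all simple-rule nodes already match.
    """
    cheap = [n for n in nodes if n in SIMPLE_RULE_ACTIONS]
    heavy = [n for n in nodes if n not in SIMPLE_RULE_ACTIONS]
    if not all(rule_check(tube_id, n) for n in cheap):
        return False  # rejected without running the DNN on this tube
    return all(dnn_check(tube_id, n) for n in heavy)


# Made-up observations: what the rules and the DNN would report for each tube.
tubes = {
    "t1": {"rule": {"move"}, "dnn": {"use_phone"}},
    "t2": {"rule": {"stop"}, "dnn": {"use_phone"}},
    "t3": {"rule": {"move"}, "dnn": {"talk"}},
}
dnn_calls = []


def rule_check(tube_id, action):
    return action in tubes[tube_id]["rule"]


def dnn_check(tube_id, action):
    dnn_calls.append((tube_id, action))  # record each expensive call
    return action in tubes[tube_id]["dnn"]


nodes = ["use_phone", "move"]
matches = [t for t in tubes if lazy_match(nodes, t, rule_check, dnn_check)]
print(matches, len(dnn_calls))
```

In this toy run the DNN is invoked for two of the three tubes, because the tube that fails the cheap `move` test is rejected before any DNN call.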


An Example of Lazy Graph Matching


[Figure: the activity graph (Use Phone, Not With Bag, Move, With Bag, joined by And and Then) shown at three steps of lazy matching]

Go back and run DNN

Action DNN Usage (with lazy matching)

Action DNN Usage (without lazy matching)


Different modules have different data requirements:

● The spatial-temporal action detection

○ Needs bounding box positions in every frame

● The DNN action detection and tracking/ReID

○ Needs bounding box images of some tubes

● Visualization and video storage

○ Needs full frames

On-Demand Data Fetching for Mobile Devices

Data Usage


Caesar Mobile uploads bounding boxes by default, and uploads bounding box images or full frames only when the server requests them

Data Fetching Logic on Caesar Mobile

Caesar Mobile

Input Frames

Object Detection

Image Cache

Boxes Boxes Frames

Regular Upload:
- Obj BBox
- Obj Type
- Frame ID
- Device ID
- ...

On-Demand Upload: BBox Images / Full Frames

On-Demand Query from Server:
- Start Time
- End Time
- Data Type
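The fetching logic above can be illustrated with a small sketch. It is an assumption-laden stand-in, not the Caesar Mobile code: the class `MobileCache`, its methods, the fixed-size eviction policy, and the data type strings `"bbox_image"` and `"full_frame"` are all hypothetical. Frames are plain lists of pixel rows so the example needs no image library.

```python
from collections import OrderedDict


def crop(frame, bbox):
    """Cut a bounding box (x1, y1, x2, y2) out of a frame stored as a list of rows."""
    x1, y1, x2, y2 = bbox
    return [row[x1:x2] for row in frame[y1:y2]]


class MobileCache:
    """Hypothetical stand-in for the image cache on a mobile device."""

    def __init__(self, device_id, capacity=300):
        self.device_id = device_id
        self.capacity = capacity      # assumed: keep only the most recent frames
        self.entries = OrderedDict()  # frame_id -> (timestamp, frame, boxes)

    def ingest(self, frame_id, timestamp, frame, boxes):
        """Cache the frame locally; return only box metadata for the regular upload."""
        self.entries[frame_id] = (timestamp, frame, boxes)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest cached frame
        return [
            {"device_id": self.device_id, "frame_id": frame_id,
             "obj_type": box["obj_type"], "bbox": box["bbox"]}
            for box in boxes
        ]

    def handle_query(self, start_time, end_time, data_type):
        """Answer an on-demand server query for a time window and data type."""
        if data_type not in ("bbox_image", "full_frame"):
            raise ValueError(f"unknown data type: {data_type}")
        results = []
        for frame_id, (timestamp, frame, boxes) in self.entries.items():
            if not start_time <= timestamp <= end_time:
                continue
            if data_type == "full_frame":
                results.append((frame_id, frame))
            else:
                results.extend((frame_id, crop(frame, box["bbox"])) for box in boxes)
        return results


cache = MobileCache(device_id="device-01")
frame = [[row * 4 + col for col in range(4)] for row in range(4)]  # toy 4x4 "image"
metadata = cache.ingest(0, 10.0, frame, [{"obj_type": "person", "bbox": (1, 1, 3, 3)}])
crops = cache.handle_query(9.0, 11.0, "bbox_image")
```

The regular upload carries only a few fields per object, while pixels leave the device only in response to a query.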


System Architecture

Edge Server

Caesar Server


Caesar Mobile

Fixed Camera

Caesar Mobile


Evaluation Setup


Caesar Mobile: Eight Nvidia TX1/TX2 boards, connected with WiFi

Caesar Server: Equipped with three Nvidia 2080 GPUs

Dataset: 30-min video clips from eight cameras in DukeMTMC dataset

- We manually labeled these in the videos:

- 1289 atomic activities

- 149 complex activities (73.8% are cross-camera)

Accuracy Metric: A ground-truth activity counts as matched only when both the person ID and all atomic actions are matched
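One way to read this metric is sketched below. The slides do not spell out the scoring procedure, so this is an assumption: each detection can match at most one ground-truth activity, a match requires an identical person ID and an identical sequence of atomic actions, and the names `is_match` and `precision_recall` plus the record layout are hypothetical. The sample data is made up and unrelated to the reported results.

```python
def is_match(detection, truth):
    """Strict match: same person ID and the same atomic actions in the same order."""
    return (
        detection["person_id"] == truth["person_id"]
        and detection["atomic_actions"] == truth["atomic_actions"]
    )


def precision_recall(detections, ground_truth):
    """Precision and recall with one-to-one matching (an assumed scoring rule)."""
    unmatched = list(ground_truth)
    true_positives = 0
    for detection in detections:
        for truth in unmatched:
            if is_match(detection, truth):
                unmatched.remove(truth)  # each ground-truth item is used once
                true_positives += 1
                break
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


ground_truth = [
    {"person_id": 7, "atomic_actions": ("use_phone", "take_bag", "give_bag")},
    {"person_id": 9, "atomic_actions": ("use_phone", "move")},
]
detections = [
    {"person_id": 7, "atomic_actions": ("use_phone", "take_bag", "give_bag")},
    {"person_id": 8, "atomic_actions": ("use_phone", "move")},  # wrong person ID
]
precision, recall = precision_recall(detections, ground_truth)
```

The second detection has the right actions but the wrong person ID, so it counts as a miss under this strict rule.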


- Caesar achieves 61.0% recall and 59.5% precision

- All errors come from imperfect DNNs:

- Better action and ReID DNNs could improve the accuracy to 100%

Caesar’s Accuracy & Analysis


Scalability Optimization


- Strawman: without lazy graph matching and on-demand data fetching
- For spatial-only activities (only non-DNN nodes), Caesar has FPS > 17
- For mixed activities (contain some DNN nodes), Caesar keeps FPS > 15


Data Usage & Energy Consumption on Caesar Mobile

On-demand data fetching reduces:
- wireless data usage by 3x for mixed activities and by 15x for spatial-only activities
- energy consumption by 6x for mixed activities and by 10x for spatial-only activities


Demo Video


● Cross-camera complex activity recognition is essential in many fields

● Caesar combines DNNs and rule-based metrics to detect customized complex activities across cameras

● Caesar leverages lazy matching and on-demand data retrieval to improve efficiency and scalability

● Caesar reduces data usage and detection latency by >3X, up to 10X

Takeaway


Caesar: Cross-Camera Complex Activity Recognition

Xiaochen Liu (University of Southern California), Pradipta Ghosh (University of Southern California), Oytun Ulutan (University of California, Santa Barbara), B.S. Manjunath (University of California, Santa Barbara), Kevin Chan (ARL), Ramesh Govindan (University of Southern California)

https://github.com/USC-NSL/Caesar