14
TRECVID-2016 Concept Localization : Overview George Awad NIST Dakota Consulting, Inc 2 TRECVID 2016

TRECVID 2016 : Concept Localization

Embed Size (px)

Citation preview

Page 1: TRECVID 2016 : Concept Localization

TRECVID-2016

Concept Localization : Overview

George Awad

NIST

Dakota Consulting, Inc

2TRECVID 2016

Page 2: TRECVID 2016 : Concept Localization

• Goal

• Make concept detection more precise in time and space than

current shot-level evaluation.

• Encourage more reusable concept detectors design that is

independent from the context.

• Task

• For each of the 10 new test concepts, NIST provided set of ~1000

shots.

• Any shot may or may not contain the target concept.

• For each I-Frame within the shot that contains the target, return

the x,y coordinates of the (UL,LR) vertices of a bounding

rectangle containing all of the target concept and as little more as

possible.

• Systems were allowed to submit more than 1 bounding box per I-

frame but only the one with maximum f-score were scored.

3TRECVID 2016

Page 3: TRECVID 2016 : Concept Localization

• Animal

• Bicycling*

• Boy

• Dancing*

• Explosion_fire*

• Instrumental_musician*

• Running*

• Sitting_down*

• Skier*

• Baby

4

10 New evaluated concepts

* dynamic/action concepts

TRECVID 2016

Page 4: TRECVID 2016 : Concept Localization

NIST Evaluation framework

• Testing data• IACC.2.A-C (600 h, used between 2013-2015 in SIN task)

• About 1000 shots per concept were sampled from the G.T (with TPclips of max = 300, avg = 178, min = 12)

• Total of 9,587shots and 2,205,140 i-frames were distributed to systems

• Human assessors were given all the i-frames (total of 55,789 images) of all TP shots to create the ground truth (drawing bounding box around the concept if it exists).

• Human assessors had to watch the video clips of the images to verify the concepts.

5TRECVID 2016

Page 5: TRECVID 2016 : Concept Localization

Evaluation metrics

• Temporal localization: precision, recall and f-score

based on the judged I-frames.

• Spatial localization: precision, recall and f-score

based on the located pixels representing the

concept.

• An average of precision, recall and f-score for

temporal and spatial localization across all I-frames

for each concept and for each run.

6TRECVID 2016

Page 6: TRECVID 2016 : Concept Localization

Participants (Finishers: 3 out of 21)

• 3 teams submitted 11 runs

• TokyoTech Tokyo Institute of Technology

• NII_Hitachi_UIT Nat’l. Inst of Info.; Hitachi, Ltd; Univ. of Info. Tech

• UTS_CMU_D2DCRC Univ. of Technology, Sydney; CMU; D2DCRC

7TRECVID 2016

Page 7: TRECVID 2016 : Concept Localization

Temporal localization results by run (sorted by F-score)

00.10.20.30.40.50.60.70.80.9

1M

ean

per

run

acro

ss a

ll c

on

cep

ts

I-frame F-score

I-frame Precision

I-frame Recall

8TRECVID 2016

Page 8: TRECVID 2016 : Concept Localization

TRECVID 2016 9

0

0.2

0.4

0.6

0.8

1

Me

an

per

run

acro

ss a

ll c

on

ce

pts

2013

0

0.2

0.4

0.6

0.8

1

Mean

per

run

acro

ss a

ll c

on

cep

ts

2014

00.10.20.30.40.50.60.70.80.9

1

CC

NY

_su

b1

.re

sult.t

xt

CC

NY

_su

b2

.re

sult.t

xt

CC

NY

_su

b3

.re

sult.t

xt

CC

NY

_su

b4

.re

sult.t

xt

insig

htd

cu.D

CU

_L

o…

Me

dia

Mill

_Q

ua

lco…

Me

dia

Mill

_Q

ua

lco…

Me

dia

Mill

_Q

ua

lco…

Me

dia

Mill

_Q

ua

lco…

Pic

SO

M.P

icS

OM

_…

Pic

SO

M.P

icS

OM

_…

Pic

SO

M.P

icS

OM

_…

Pic

SO

M.P

icS

OM

_…

To

kyoT

ech

.run

_to

k…

To

kyoT

ech

.run

_to

k…

To

kyoT

ech

.run

_to

k…

To

kyoT

ech

.run

_to

k…

Tri

mp

s_1

.txt

Tri

mp

s_2

_N

EG

_04

Tri

mp

s_3

_N

EG

_N

Tri

mp

s_3

_N

OC

_0

1…

Me

an

per

run

acro

ss a

ll c

on

ce

pts

2015 2016 (mainly action) >> 2013 & 2014

(mainly objects)

ONLY TP shots were given

to systems to localize.

Temporal Localization results

Page 9: TRECVID 2016 : Concept Localization

Spatial Localization results by run (sorted by F-score)

00.10.20.30.40.50.60.70.80.9

1M

ean

per

run

acro

ss a

ll c

on

cep

ts

Mean Pixel F-score

Mean Pixel Precision

Mean Pixel Recall

10

Harder than

temporal

localization

TRECVID 2016

Page 10: TRECVID 2016 : Concept Localization

TRECVID 2016 11

00.20.40.60.8

1M

ea

n p

er

run

ac

ros

s a

ll c

on

ce

pts

2013

0

0.2

0.4

0.6

0.8

1

Me

an

pe

r ru

n a

cro

ss

all

co

nce

pts

2014

0

0.2

0.4

0.6

0.8

1

CC

NY

_su

b1

.re

sult.t

xt

CC

NY

_su

b2

.re

sult.t

xt

CC

NY

_su

b3

.re

sult.t

xt

CC

NY

_su

b4

.re

sult.t

xt

insig

htd

cu.D

CU

_L

o…

Me

dia

Mill

_Q

ua

lco…

Me

dia

Mill

_Q

ua

lco…

Me

dia

Mill

_Q

ua

lco…

Me

dia

Mill

_Q

ua

lco…

Pic

SO

M.P

icS

OM

_L…

Pic

SO

M.P

icS

OM

_L…

Pic

SO

M.P

icS

OM

_L…

Pic

SO

M.P

icS

OM

_L…

To

kyoT

ech

.run

_to

k…

To

kyoT

ech

.run

_to

k…

To

kyoT

ech

.run

_to

k…

To

kyoT

ech

.run

_to

k…

Tri

mp

s_1

.txt

Tri

mp

s_2

_N

EG

_04

Tri

mp

s_3

_N

EG

_N

Tri

mp

s_3

_N

OC

_0

1…

Me

an

pe

r ru

n a

cro

ss

all

co

nce

pts

20152016 (actions) > 2013 (objects)

2016 (actions) ~ 2014 (objects)

ONLY TP shots were given

to systems to localize.

Spatial Localization results

Page 11: TRECVID 2016 : Concept Localization

Results per concept

top 10 runs

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

F-s

co

re

Median

10

9

8

7

6

5

4

3

2

1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Me

an

F-s

co

re

Median

10

9

8

7

6

5

4

3

2

1

Temporal localization Spatial localization

Most concepts perform better in temporal compared to spatial localization

A lot of resemblance between same concepts

12TRECVID 2016

Page 12: TRECVID 2016 : Concept Localization

Results per concept across all runs

00.10.20.30.40.50.60.70.80.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recall

Precision

Temporal localization

00.10.20.30.40.50.60.70.80.9

1

0 0.10.20.30.40.50.60.70.80.9 1

Me

an

Rec

all

Mean precision

Spatial localization

13

submitted bounding boxes

approximate G.T boxes in size with

some overlap. Many systems are

good in finding the real box sizes.

Many systems submitted a lot of non-target I-frames, while few found a good balance.

TRECVID 2016

baby

Inst_musi

bicycling

Page 13: TRECVID 2016 : Concept Localization

General Observations• Consistent observations in the last 4 years

Temporal localization is easier than spatial localization.

Systems report approximate G.T box sizes.

• Performance of action/dynamic concepts are higher

than object concepts tested in 2013-2014.

• Assessment of action/dynamic concepts proved to be

challenging in many cases to the human assessors.

• Lower finishing% of teams compared to signups

14TRECVID 2016

Page 14: TRECVID 2016 : Concept Localization

Next team talks

• TokyoTech

• UTS_CMU_D2DCRC

TRECVID 2016 15