TRECVID 2016 : Concept Localization

  • View
    41

  • Download
    1

Embed Size (px)

Text of TRECVID 2016 : Concept Localization

  • TRECVID-2016

    Concept Localization : Overview

    George Awad

    NIST

    Dakota Consulting, Inc

    2TRECVID 2016

  • Goal

    Make concept detection more precise in time and space than

    current shot-level evaluation.

    Encourage more reusable concept detectors design that is

    independent from the context.

    Task

    For each of the 10 new test concepts, NIST provided set of ~1000

    shots.

    Any shot may or may not contain the target concept.

    For each I-Frame within the shot that contains the target, return

    the x,y coordinates of the (UL,LR) vertices of a bounding

    rectangle containing all of the target concept and as little more as

    possible.

    Systems were allowed to submit more than 1 bounding box per I-

    frame but only the one with maximum f-score were scored.

    3TRECVID 2016

  • Animal

    Bicycling*

    Boy

    Dancing*

    Explosion_fire*

    Instrumental_musician*

    Running*

    Sitting_down*

    Skier*

    Baby

    4

    10 New evaluated concepts

    * dynamic/action concepts

    TRECVID 2016

  • NIST Evaluation framework

    Testing data IACC.2.A-C (600 h, used between 2013-2015 in SIN task)

    About 1000 shots per concept were sampled from the G.T (with TPclips of max = 300, avg = 178, min = 12)

    Total of 9,587shots and 2,205,140 i-frames were distributed to systems

    Human assessors were given all the i-frames (total of 55,789 images) of all TP shots to create the ground truth (drawing bounding box around the concept if it exists).

    Human assessors had to watch the video clips of the images to verify the concepts.

    5TRECVID 2016

  • Evaluation metrics

    Temporal localization: precision, recall and f-score

    based on the judged I-frames.

    Spatial localization: precision, recall and f-score

    based on the located pixels representing the

    concept.

    An average of precision, recall and f-score for

    temporal and spatial localization across all I-frames

    for each concept and for each run.

    6TRECVID 2016

  • Participants (Finishers: 3 out of 21)

    3 teams submitted 11 runs

    TokyoTech Tokyo Institute of Technology

    NII_Hitachi_UIT Natl. Inst of Info.; Hitachi, Ltd; Univ. of Info. Tech

    UTS_CMU_D2DCRC Univ. of Technology, Sydney; CMU; D2DCRC

    7TRECVID 2016

  • Temporal localization results by run (sorted by F-score)

    00.10.20.30.40.50.60.70.80.9

    1M

    ean

    per

    run

    acro

    ss a

    ll c

    on

    cep

    ts

    I-frame F-score

    I-frame Precision

    I-frame Recall

    8TRECVID 2016

  • TRECVID 2016 9

    0

    0.2

    0.4

    0.6

    0.8

    1

    Me

    an

    per

    run

    acro

    ss a

    ll c

    on

    ce

    pts

    2013

    0

    0.2

    0.4

    0.6

    0.8

    1

    Mean

    per

    run

    acro

    ss a

    ll c

    on

    cep

    ts

    2014

    00.10.20.30.40.50.60.70.80.9

    1

    CC

    NY

    _su

    b1

    .re

    sult.t

    xt

    CC

    NY

    _su

    b2

    .re

    sult.t

    xt

    CC

    NY

    _su

    b3

    .re

    sult.t

    xt

    CC

    NY

    _su

    b4

    .re

    sult.t

    xt

    insig

    htd

    cu.D

    CU

    _L

    o

    Me

    dia

    Mill

    _Q

    ua

    lco

    Me

    dia

    Mill

    _Q

    ua

    lco

    Me

    dia

    Mill

    _Q

    ua

    lco

    Me

    dia

    Mill

    _Q

    ua

    lco

    Pic

    SO

    M.P

    icS

    OM

    _

    Pic

    SO

    M.P

    icS

    OM

    _

    Pic

    SO

    M.P

    icS

    OM

    _

    Pic

    SO

    M.P

    icS

    OM

    _

    To

    kyoT

    ech

    .run

    _to

    k

    To

    kyoT

    ech

    .run

    _to

    k

    To

    kyoT

    ech

    .run

    _to

    k

    To

    kyoT

    ech

    .run

    _to

    k

    Tri

    mp

    s_1

    .txt

    Tri

    mp

    s_2

    _N

    EG

    _04

    Tri

    mp

    s_3

    _N

    EG

    _N

    Tri

    mp

    s_3

    _N

    OC

    _0

    1

    Me

    an

    per

    run

    acro

    ss a

    ll c

    on

    ce

    pts

    2015 2016 (mainly action) >> 2013 & 2014

    (mainly objects)

    ONLY TP shots were given

    to systems to localize.

    Temporal Localization results

  • Spatial Localization results by run (sorted by F-score)

    00.10.20.30.40.50.60.70.80.9

    1M

    ean

    per

    run

    acro

    ss a

    ll c

    on

    cep

    ts

    Mean Pixel F-score

    Mean Pixel Precision

    Mean Pixel Recall

    10

    Harder than

    temporal

    localization

    TRECVID 2016

  • TRECVID 2016 11

    00.20.40.60.8

    1M

    ea

    n p

    er

    run

    ac

    ros

    s a

    ll c

    on

    ce

    pts

    2013

    0

    0.2

    0.4

    0.6

    0.8

    1

    Me

    an

    pe

    r ru

    n a

    cro

    ss

    all

    co

    nce

    pts

    2014

    0

    0.2

    0.4

    0.6

    0.8

    1

    CC

    NY

    _su

    b1

    .re

    sult.t

    xt

    CC

    NY

    _su

    b2

    .re

    sult.t

    xt

    CC

    NY

    _su

    b3

    .re

    sult.t

    xt

    CC

    NY

    _su

    b4

    .re

    sult.t

    xt

    insig

    htd

    cu.D

    CU

    _L

    o

    Me

    dia

    Mill

    _Q

    ua

    lco

    Me

    dia

    Mill

    _Q

    ua

    lco

    Me

    dia

    Mill

    _Q

    ua

    lco

    Me

    dia

    Mill

    _Q

    ua

    lco

    Pic

    SO

    M.P

    icS

    OM

    _L

    Pic

    SO

    M.P

    icS

    OM

    _L

    Pic

    SO

    M.P

    icS

    OM

    _L

    Pic

    SO

    M.P

    icS

    OM

    _L

    To

    kyoT

    ech

    .run

    _to

    k

    To

    kyoT

    ech

    .run

    _to

    k

    To

    kyoT

    ech

    .run

    _to

    k

    To

    kyoT

    ech

    .run

    _to

    k

    Tri

    mp

    s_1

    .txt

    Tri

    mp

    s_2

    _N

    EG

    _04

    Tri

    mp

    s_3

    _N

    EG

    _N

    Tri

    mp

    s_3

    _N

    OC

    _0

    1

    Me

    an

    pe

    r ru

    n a

    cro

    ss

    all

    co

    nce

    pts

    20152016 (actions) > 2013 (objects)

    2016 (actions) ~ 2014 (objects)

    ONLY TP shots were given

    to systems to localize.

    Spatial Localization results

  • Results per concept

    top 10 runs

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    F-s

    co

    re

    Median

    10

    9

    8

    7

    6

    5

    4

    3

    2

    1

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Me

    an

    F-s

    co

    re

    Median

    10

    9

    8

    7

    6

    5

    4

    3

    2

    1

    Temporal localization Spatial localization

    Most concepts perform better in temporal compared to spatial localization

    A lot of resemblance between same concepts

    12TRECVID 2016

  • Results per concept across all runs

    00.10.20.30.40.50.60.70.80.9

    1

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

    Recall

    Precision

    Temporal localization

    00.10.20.30.40.50.60.70.80.9

    1

    0 0.10.20.30.40.50.60.70.80.9 1

    Me

    an

    Rec

    all

    Mean precision

    Spatial localization

    13

    submitted bounding boxes

    approximate G.T boxes in size with

    some overlap. Many systems are

    good in finding the real box sizes.

    Many systems submitted a lot of non-target I-frames, while few found a good balance.

    TRECVID 2016

    baby

    Inst_musi

    bicycling

  • General Observations Consistent observations in the last 4 years

    Temporal localization is easier than spatial localization.

    Systems report approximate G.T box sizes.

    Performance of action/dynamic concepts are higher

    than object concepts tested in 2013-2014.

    Assessment of action/dynamic concepts proved to be

    challenging in many cases to the human assessors.

    Lower finishing% of teams compared to signups

    14TRECVID 2016

  • Next team talks

    TokyoTech

    UTS_CMU_D2DCRC

    TRECVID 2016 15