28
Challenges for Deep Scene Understanding Bolei Zhou MIT Hang Zhao Sanja Fidler (UToronto) Adela Barriuso Antonio Torralba Aditya Khosla Aude Oliva Xavier Puig Bolei Zhou

Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

  • Upload
    lykhanh

  • View
    225

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

Challenges for Deep Scene Understanding

Bolei Zhou

MIT

Hang

ZhaoSanja

Fidler(UToronto)

Adela

BarriusoAntonio

Torralba

Aditya

Khosla

Aude

Oliva

Xavier

Puig

Bolei

Zhou

Page 2: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

ObjectsintheSceneContext

Page 3: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

Challenge 1: Scene Classification

Top1: street

Top2: residential neighborhood

Top3: crosswalk

Top4: apartment building

Top5: office building

objects

Challenge 2: Scene Parsingstuff

Deep s

cene u

nders

tandin

g

Page 4: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

• 8milliontrainingimagesfrom365categoriesofPlacesDatabase

• Testset:900imagespercategory

Webpage:http://places2.csail.mit.edu

Page 5: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

Constructing Places Database

2.Queryanddownload

images

696adjectives+scenenames

~90million rawimagesdownloaded

1. Collectscenenames

fromdictionary

~1000scenenames

3.Annotatethrough

AmazonMechanicalTurk

Threeroundsofannotations

Page 6: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

amusement park

arch

corral

windmill train station platform

Urban

tower

swimming pool street

soccer field

elevator door

bar

cafeteria

veterinarians office

bedroom

conference center

Indoor

staircase

shoe shop

field road

fishpond

watering hole

Nature

rainforest

Page 7: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

Results

92 validsubmissionsfrom27 teams(eachteamallowstosubmit

atmost5submissions).

6.00%

8.00%

10.00%

12.00%

14.00%

16.00%

18.00%

20.00%

0 10 20 30 40 50 60 70 80 90 100

Top-5errorsofallthe92submission(sorted)

Baseline

singleResNet152:14.9%

Page 8: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

Results

Team Name Top-5Error

Hikvision 9.01%

MW 10.19%

Trimps-Soushen 10.30%

SIAT_MMLAB 10.43%

NTU-SC 10.85%

Hikvision

Qiaoyong Zhong, ChaoLi,Yingying

Zhang,Haiming Sun,Shicai Yang,Di

Xie,Shiliang Pu.

Hikvision ResearchInstitute

MW

GangSunandJie Hu

ChineseAcademyofSciencesand

PekingUniversity

Trimps-Soushen

Jie Shao,Xiaoteng Zhang,Zhengyan

Ding,Yixin Zhao,Yanjun Chen,Jianying

Zhou,Wenfei Wang,LinMei,

Chuanping Hu

TheThirdResearchInstituteofthe

MinistryofPublicSecurity,China

92 validsubmissionsfrom27 teams.

ResNet152 14.93%

VGG16 14.99%

AlexNet 17.25%

Singlemodel

baselines

Page 9: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

Ambiguouspredictions

1)Unusualactivityinascene 2)Multiplesceneparts

top-1: restaurant

top-2: ice cream parlor

top-3: coffee shop

top-4: pizzeria

top-5: cafeteria

aquarium

top-1: campsite

top-2: sandbox

top-3: beer garden

top-4: market outdoor

top-5: flea market indoor

junkyard

top-1: balcony interior

top-2: beach house

top-3: boardwalk

top-4: roof garden

top-5: restaurant patio

lagoon

top-1: martial arts gym

top-2: stable

top-3: boxing ring

top-4: locker room

top-5: basketball court

construction site

Page 10: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

• Newchallengethisyear

• Eachpixeloftheimageisclassifiedintosomeclass

class labelsemantic mask

Scene

parsing

Page 11: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

• 22,000imagesfortrainingandvalidation,3,000imagesfortesting

• 150classesofobjects(car,person,table,etc)andstuff(sky,road,ceiling,etc)

00.020.040.060.080.1

0.120.140.160.18

Pixelfrequencyinthetrainingset

Page 12: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

• 22,000imagesfortrainingandvalidation,3,000imagesfortesting

• 150classesofobjects(car,person,table,etc)andstuff(sky,road,ceiling,etc)

Page 13: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

ConstructingADEDataset

• Annotatingeachobjectinstancesinascene

• Singleexpertannotatorforafewyearsofwork

http://groups.csail.mit.edu/vision/datasets/ADE20K/

Labelme AnnotationTool

Ms. Adela Barriuso

Page 14: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja
Page 15: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja
Page 16: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja
Page 17: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja
Page 18: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja
Page 19: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja
Page 20: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja
Page 21: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja
Page 22: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

Results75 validsubmissionsfrom22 teams

0.3

0.35

0.4

0.45

0.5

0.55

0.6

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75

Finalscore=(meanIoU +pixelaccuracy)/2forallthe75

submissions

Baseline(VGG-based):0.4567

Page 23: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

Results

Team Name FinalScore

SenseCUSceneParsing 0.5721

Adelaide 0.5674

360+MCG-ICT-CAS_SP 0.5556

SegModel 0.5465

CASIA_IVA 0.5433

FinalScore=(meanIoU +pixelaccuracy)/2

SenseCUSceneParsing

Hengshuang Zhao,Jianping Shi,

Xiaojuan Qi,Xiaogang Wang,Tong

Xiao,Jiaya Jia

Sensetime andCUHK,HongKong

Adelaide

Zifeng Wu,Chunhua Shen,Antonvan

enHengel

UniversityofAdelaide,Australia

360+MCG-ICT-CAS_SP

YuLi,ShengTang,MinLin,Rui Zhang,

YunPeng Chen,YongDong Zhang,

JinTao Li,YuGang Han,ShuiCheng Yan

Qihoo 360,ChineseAcademyof

Sciences,NationalUniversityof

Singapore,ChinaandSingapore

75 validsubmissionsfrom22 teams

DilatedNet 0.4567

FCN-8s 0.4480

SegNet 0.4079

Singlemodel

baselines

Page 24: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

DataConsistencyandHumanPerformance

• 61imagesfromval setarere-annotatedafter6months.

82.4%pixelsgotthesamelabel.

64.03% 64.77% 65.41%

73.67% 74.49% 74.73%

82.40%

50.00%

55.00%

60.00%

65.00%

70.00%

75.00%

80.00%

85.00%

PixelAccuracy

Page 25: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

DataConsistencyandHumanPerformance

• 61imagesfromval setarere-annotatedafter6months.

82.4%pixelsgotthesamelabel.

64.03% 64.77% 65.41%

73.67% 74.49% 74.73%

82.40%

50.00%

55.00%

60.00%

65.00%

70.00%

75.00%

80.00%

85.00%

PixelAccuracy

floor

sky

wall

building

20.30%pixelaccuracy

averageimage annotationmode

Page 26: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

96.55% 96.36% 96.27% 91.96% 91.04%

94.14% 95.63% 96.55% 94.19% 89.59%

93.71% 93.51% 94.84% 94.89% 94.54%

92.73% 91.88% 93.02% 92.62% 91.47%

92.25% 85.66% 90.47% 79.81% 84.85%

Image Ground-truth SenseCU… Adelaide 360+MCG… SegModel CASIA_IVA

Page 27: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

62.89% 51.23% 50.25% 45.66% 67.91%

56.19% 42.86% 55.82% 81.53% 48.69%

grandstand

54.08% 52.81% 53.70% 50.49% 38.97%

41.49% 40.72% 49.07% 42.28% 45.04%

boat

car

booth

Image Ground-truth SenseCU… Adelaide 360+MCG… SegModel CASIAIVA

Page 28: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja

ThanksalltheParticipantsandAudiences!

Hang

ZhaoSanja

Fidler(UToronto)

Adela

BarriusoAntonio

Torralba

Aditya

Khosla

Aude

Oliva

Xavier

Puig

Bolei

Zhou

http://places2.csail.mit.edu

http://sceneparsing.csail.mit.edu