Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
2D Human Pose Estimation: New Benchmark and State of the Art AnalysisMykhaylo Andriluka1,3, Leonid Pishchulin1, Peter Gehler2 and Bernt Schiele1
1Max Planck Institute for Informatics, 2Max Planck Institute for Intelligent Systems, 3Stanford University,Germany Germany USA
Overview
We analyse the state of the art in articulated human poseestimation using a new large-scale benchmark dataset.
• Our MPII Human Pose benchmark includes morethan 40,000 images systematically collected using anestablished taxonomy of human activities [1]. Thecollected images cover 410 human activities in total.
• Our analysis is based on the rich annotations thatinclude body pose, body part occlusion, torso andhead viewpoint, and person activity.
• Dataset and web-based evaluation toolkit areavailable at human-pose.mpi-inf.mpg.de.
Related Datasets
Dataset #training #test img. type
Full body pose datasets
Parse [Ramanan, NIPS’06] 100 205 diverse
LSP [Johnson&Everingham, BMVC’10] 1,000 1,000 sports (8 types)
PASCAL Person Layout [Everingham et al., IJCV’10] 850 849 everyday
Sport [Wang et al., CVPR’11] 649 650 sports
UIUC people [Wang et al., CVPR’11] 346 247 sports (2 types)
LSP extended [Johnson&Everingham, CVPR’11] 10,000 - sports (3 types)
FashionPose [Dantone et al., CVPR’13] 6,530 775 fashion blogs
J-HMDB [Jhuang et al., ICCV’13] 31,838 - diverse (21 act.)
Upper body pose datasets
Buffy Stickmen [Ferrari et al, CVPR’08] 472 276 TV show (Buffy)
ETHZ PASCAL Stickmen [Eichner&Ferrari, BMVC’09] - 549 PASCAL VOC
Human Obj. Int. (HOI) [Yao&Fei-Fei, CVPR’10] 180 120 sports (6 types)
We Are Family [Eichner&Ferrari, ECCV’10] 350 imgs. 175 imgs. group photos
Video Pose 2 [Sapp et al., CVPR’11] 766 519 TV show (Friends)
FLIC [Sapp&Taskar, CVPR’13] 6,543 1,016 feature movies
Sync. Activities [Eichner&Ferrari, PAMI’12] - 357 imgs. dance / aerobics
Armlets [Gkioxari et al., CVPR’13] 9,593 2,996 PASCAL VOC/Flickr
MPII Human Pose (this paper) 28,821 11,701 diverse (491 act.)
(a) (b) (c)
Visualization of the upper body pose variability. (a) color coding ofthe body parts, (b) annotations from the Armlets dataset [Gkioxariet al. CVPR’13], and (c) annotations from our MPII Human Pose
dataset.
sports occupation water activities home activities condition. exerc. fishing and hunt. religious activ. winter activ. walking dancing lawn and garden self care bicycling miscellaneous home repair running music playing transportation
hockey, ice bakery skindiving, scuba child care home exercise trapping game sitting in church skiing, downhill descending stairs Anishinaabe Jingle planting trees grooming, washing unicycling chess game, sitting home repair running flute, sitting pushing plane
rope skipping typing, electric swimming, synchr. polishing floors ski machine hunting, birds serving food skiing, cross-countr. backpacking general dancing raking roof with snow taking medication bicycling, racing board game playing put on and remov. jogging accordion, sitting riding in a car
trampoline locksmith swimming, breaststr. kitchen activity stair-treadmill hunting kneeling in church skiing, climbing up walking, for exerc. aerobic, step watering lawn or gard. showering, toweling bicycling, BMX copying documents caulking, chinking running, training conducting orchestra motor scooter, motor
rock climbing masonry, concrete tubing, floating playing with child. elliptical trainer fishing, ice sitting, playing skating, ice dancing pushing a wheelchair ballet, modern driving tractor hairstyling bicycling standing, talking repairing appliances running, stairs, up piano, sitting riding in a bus
cricket, batting fire fighter, haul. windsurfing carrying groceries slimnastics, jazzer. hunting, bow and arr. standing, talking dog sledding climbing hills caribbean dance clearing brush eating, sitting bicycling, mountain sitting, arts and cr. carpentry, general running, marathon cello, sitting driving automobile
Randomly chosen activities and images from 18 top level categories of our MPII Human Pose dataset. One image per activity is shown. The full dataset is available at human-pose.mpi-inf.mpg.de.
Analysis of the state of the art
• Performance of the upper-body and full-bodystate-of-the-art approaches:
Setting Torso Upper Lower Upper Fore- Head Upper Fullleg leg arm arm body body
Gkioxari et al. [2] 51.3 - - 28.0 12.4 - 26.4 -Sapp&Taskar [5] 51.3 - - 27.4 16.3 - 27.8 -Yang&Ramanan [6] 61.0 36.6 36.5 34.8 17.4 70.2 33.1 38.3Pishchulin et al. [3] 63.8 39.6 37.3 39.0 26.8 70.7 39.1 42.3
Gkioxari et al. [2] + loc 65.1 - - 33.7 14.9 - 32.4 -Sapp&Taskar [5] + loc 65.1 - - 32.6 19.2 - 33.7 -Yang&Ramanan [6] + loc 67.2 39.7 39.4 37.4 18.6 75.7 35.8 41.4Pishchulin et al. [3] + loc 66.6 40.5 38.2 40.4 27.7 74.5 40.6 43.9
Yang&Ramanan [6] retrained 69.3 39.5 38.8 43.4 27.7 74.6 42.3 44.7Pishchulin et al. [3] retrained 68.4 42.7 42.8 42.0 29.2 76.3 42.1 46.1
⇒ PS and FMP models improve significantly afterretraining.⇒ Full-body approaches perform better than
upper-body ones.
• We define complexity measures with respect to bodypose, viewpoint of the torso, body part length, occlusion,and truncation, and analyze sensitivity of thestate-of-the-art approaches to each of these factors:
0 1000 2000 3000 4000 5000 6000 7000
30
40
50
60
70
80
Number of images
PCPm
, %
OcclusionPart lengthPoseVP torsoTruncation
0 1000 2000 3000 4000 5000 6000 7000
30
40
50
60
70
80
Number of images
PCPm
, %
OcclusionPart lengthPoseVP torsoTruncation
Pishchulin et al. [3] Yang&Ramanan [6]
0 1000 2000 3000 4000 5000 6000 7000
30
40
50
60
70
80
Number of images
PCPm
, %
OcclusionPart lengthPoseVP torsoTruncation
0 1000 2000 3000 4000 5000 6000 7000
30
40
50
60
70
80
Number of images
PCPm
, %
OcclusionPart lengthPoseVP torsoTruncation
Sapp&Taskar [5] Gkioxari et al. [2]
Complexity measures:
Occlusion: number of occluded bodypartsmocc(L) =∑N
i=1ρi
Part length: average deviation from themean part lengthm f (L) =∑N
i=1 |d(li)−mi|/mi
Pose: deviation from the mean pose rep-resented via PS prior distributionmpose(L) =∏
(i, j)∈E pps(li|l j)
VP torso: deviation of the torso from thefrontal viewpointmv(L) =∑3
i=1αi
Truncation: number of truncated bodypartsmt(L) =∑N
i=1τi
⇒Body pose has a significantly more profound effecton performance than occlusion and truncation.
• Detailed performance analysis for activity, viewpointand body poses types:
20
25
30
35
40
45
50
55
60
65
activity categories
PCPm
, %
sport
s
occu
patio
n
cond
itionin
g exe
rcise
danc
ing
water a
ctivitie
s
home r
epair
home a
ctivitie
s
lawn a
nd ga
rden
fishing
and h
untin
g
music p
laying
winter
activi
ties
walking
miscella
neou
s
bicycl
ing
runnin
g
inactiv
ity qu
iet/lig
ht
trans
porta
tion
self c
are
religio
us ac
tivitie
s
volun
teer a
ctivitie
s
Pishchulin et al.Yang&RamananSapp&TaskarGkioxari et al.
20
30
40
50
60
70
activity categories
PCPm
, %
sport
s
occu
patio
n
cond
itionin
g exe
rcise
danc
ing
water a
ctivitie
s
home r
epair
home a
ctivitie
s
lawn a
nd ga
rden
fishing
and h
untin
g
music p
laying
winter
activi
ties
walking
miscella
neou
s
bicycl
ing
runnin
g
inactiv
ity qu
iet/lig
ht
trans
porta
tion
self c
are
religio
us ac
tivitie
s
volun
teer a
ctivitie
s
Pishchulin et al.Pishchulin et al. retrainedYang&RamananYang&Ramanan retrained
⇒ Striking performance differences for all approachesacross activities and viewpoints even after retraining.⇒Results on sports activities are the best, even though
they have been previously considered among themost difficult for pose estimation.⇒Detailed analysis reveals differences between FMP
and PS, even though overall performance iscomparable.
0
10
20
30
40
50
60
70
80
PCPm
, %
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
Pishchulin et al.Yang&RamananSapp&TaskarGkioxari et al.
108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161
162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215
CV
PR
#2022C
VP
R#2022
CV
PR
2014S
ubmission
#2022.C
ON
FIDE
NTIA
LR
EV
IEW
CO
PY.D
ON
OT
DIS
TRIB
UTE
.
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
180−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
180−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
−150 −100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
180−150 −100 −50 0 50 100 150
0
50
100
150
200
−100 −50 0 50 100 150
−20
0
20
40
60
80
100
120
140
160
−100 −80 −60 −40 −20 0 20 40 60 80
−20
0
20
40
60
80
100
120
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
−80 −60 −40 −20 0 20 40 60 80
−20
0
20
40
60
80
100
−60 −40 −20 0 20 40 60 80 100 120
−20
0
20
40
60
80
100
120
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
180−80 −60 −40 −20 0 20 40 60
−20
0
20
40
60
80
−100 −50 0 50 100
−50
0
50
100
150
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
−60 −40 −20 0 20 40 60
−20
−10
0
10
20
30
40
50
60
70
−100 −80 −60 −40 −20 0 20 40 60 80 100
−20
0
20
40
60
80
100
120
140
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
180−80 −60 −40 −20 0 20 40 60 80
−20
0
20
40
60
80
100
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
−150 −100 −50 0 50 100
−40
−20
0
20
40
60
80
100
120
140
160
−150 −100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
180
−80 −60 −40 −20 0 20 40 60
−20
0
20
40
60
80
100
−80 −60 −40 −20 0 20 40 60 80 100
−20
0
20
40
60
80
100
120
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
180−100 −80 −60 −40 −20 0 20 40 60 80 100
−20
0
20
40
60
80
100
120
140−100 −80 −60 −40 −20 0 20 40 60 80
−20
0
20
40
60
80
100
120
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160−150 −100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
180
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
−80 −60 −40 −20 0 20 40 60 80 100 120
−20
0
20
40
60
80
100
120
140
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
−80 −60 −40 −20 0 20 40 60 80
0
20
40
60
80
100
120−100 −80 −60 −40 −20 0 20 40 60 80 100
−20
0
20
40
60
80
100
120
140
−80 −60 −40 −20 0 20 40 60 80
−20
0
20
40
60
80
100
120−50 0 50 100
−20
0
20
40
60
80
100
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160−100 −80 −60 −40 −20 0 20 40 60
−20
0
20
40
60
80
−80 −60 −40 −20 0 20 40 60 80 100
0
20
40
60
80
100
120
140−120 −100 −80 −60 −40 −20 0 20 40
−20
0
20
40
60
80
−80 −60 −40 −20 0 20 40 60 80
−20
0
20
40
60
80
100
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
−80 −60 −40 −20 0 20 40 60 80
−20
0
20
40
60
80
100
−100 −80 −60 −40 −20 0 20 40 60
−20
0
20
40
60
80
100−60 −40 −20 0 20 40 60
−20
0
20
40
60
80−100 −50 0 50 100 150
−20
0
20
40
60
80
100
120
140
160
180
−100 −50 0 50 100
−20
0
20
40
60
80
100
120
140
160
2
Performance (mPCP) for different pose types.
Activity recognition
We evaluate the performance of the state-of-the-artactivity recognition methods on our benchmark in [4].
1 50 100 150 200 250 300 350 4000
50
100
150
200
250
activity
# ex
ampl
es/a
ctiv
ity
traintest
1 50 100 150 200 250 300 350 4000
5
10
15
20
25
30
# activities
mA
P
Dense trajectories (DT)GT single pose (GT)GT single pose + track (GT−T)PS single pose + track (PS−T)PS multi−pose (PS−M)PS−M + DTPS−M filter DT
(a) (b)
(a) number of examples per activity. (b) Comparison of the activityrecognition performance of [Wang et al., ICCV’13] (DT) andvariants of posed-based approach of [Jhuang et al.,ICCV’13]. Theplot shows performance (mAP) as a function of the number ofactivities. Activities are sorted by the number of training examples.
yoga, bicycling, skiing, cooking or skate- rope softball, forestrypower mountaindownhill food prep. boarding skipping general
Dense trajectories (DT) 10.6 14.5 51.9 0.5 11.4 36.0 12.7 8.4PS multi-pose (PS-M) 18.3 34.0 27.3 2.6 17.2 90.5 3.0 5.2PS-M + DT 19.6 40.7 32.9 2.2 19.5 88.7 3.9 7.2PS-M filter DT 16.1 20.4 52.2 0.8 13.5 55.7 4.2 10.6
carpentry, bicycling, golf rock ballet, aerobic resistance totalgeneral racing climbing modern step training
Dense trajectories (DT) 5.5 5.5 33.0 41.5 12.7 24.5 16.5 19.0PS multi-pose (PS-M) 3.4 8.6 47.9 4.7 22.9 10.4 7.2 20.2PS-M + DT 5.0 12.1 51.9 14.4 23.7 17.1 14.4 23.5PS-M filter DT 6.1 15.5 15.9 38.6 7.1 25.8 9.6 19.5
Activity recognition results (mAP) on the 15 largest classes.
References[1] B. Ainsworth, W. Haskell, S. Herrmann, N. Meckes, D. Bassett, C. Tudor-Locke, J. Greer,
J. Vezina, M. Whitt-Glover, and A. Leon. 2011 compendium of physical activities: a secondupdate of codes and MET values. MSSE’11.
[2] G. Gkioxari, P. Arbelaez, L. Bourdev, and J. Malik. Articulated pose estimation usingdiscriminative armlet classifiers. In CVPR’13.
[3] L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele. Strong appearance and expressivespatial models for human pose estimation. In ICCV’13.
[4] L. Pishchulin, M. Andriluka, and B. Schiele. Fine-grained activity recognition with holistic andpose based features. arXiv:1406.1881, 2014.
[5] B. Sapp and B. Taskar. Multimodal decomposable models for human pose estimation. InCVPR’13.
[6] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts.PAMI’13.