Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia*1, Mengyun Shi*1,4, Mikhail Sirotenko*3, Yin Cui*3, Claire Cardie1, Bharath Hariharan1, Hartwig Adam3, Serge Belongie1,2
1Cornell University 2Cornell Tech 3Google Research 4Hearst Magazines
Abstract. In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes). The proposed task requires both localizing an object and describing its properties. To illustrate the various aspects of this task, we focus on the domain of fashion and introduce Fashionpedia as a step toward mapping out the visual aspects of the fashion world. Fashionpedia consists of two parts: (1) an ontology built by fashion experts containing 27 main apparel categories, 19 apparel parts, 294 fine-grained attributes and their relationships; (2) a dataset with everyday and celebrity event fashion images annotated with segmentation masks and their associated per-mask fine-grained attributes, built upon the Fashionpedia ontology. In order to solve this challenging task, we propose a novel Attribute-Mask R-CNN model to jointly perform instance segmentation and localized attribute recognition, and provide a novel evaluation metric for the task. Fashionpedia is available at: https://fashionpedia.github.io/home/.
Keywords: Dataset, Ontology, Instance Segmentation, Fine-Grained, Attribute, Fashion
1 Introduction
Recent progress in the field of computer vision has advanced machines' ability to recognize and understand our visual world, showing significant impact in fields including autonomous driving [52], product recognition [32,14], etc. These real-world applications are fueled by various visual understanding tasks with the goals of naming, describing (attribute recognition), or localizing objects within an image.
Naming and localizing objects is formulated as an object detection task (Figure 1(a-c)). As a hallmark of computer recognition, this task is to identify and indicate the boundaries of objects in the form of bounding boxes or segmentation masks [12,37,17]. Attribute recognition [6,29,9,36] (Figure 1(d)) instead focuses
* Equal contribution.
arXiv:2004.12276v2 [cs.CV] 18 Jul 2020
[Figure 1 graphic omitted: an annotated example image and a visualization of the Fashionpedia ontology, in which garment category and garment part nodes (e.g., Jacket, Dress, Collar, Sleeve) are linked to fine-grained attribute nodes (e.g., Symmetrical, A-Line, Set-in sleeve) by HAS_ATTRIBUTE edges.]
Fig. 1: An illustration of the Fashionpedia dataset and ontology: (a) main garment masks; (b) garment part masks; (c) both main garment and garment part masks; (d) fine-grained apparel attributes; (e) an exploded view of the annotation diagram: the image is annotated with both instance segmentation masks (white boxes) and per-mask fine-grained attributes (black boxes); (f) visualization of the Fashionpedia ontology: we created the Fashionpedia ontology and separate the concepts of categories (yellow nodes) and attributes [38] (blue nodes) in fashion. It covers the pre-defined garment categories used by both DeepFashion2 [11] and ModaNet [54]. The mapping with DeepFashion2 also shows the versatility of using attributes and categories: we are able to represent all 13 garment classes in DeepFashion2 with 11 main garment categories, 1 garment part, and 7 attributes
on describing and comparing objects, since an object also has many other properties or attributes in addition to its category. Attributes not only provide a compact and scalable way to represent objects in the world; as pointed out by Ferrari and Zisserman [9], attribute learning also enables the transfer of existing knowledge to novel classes. This is particularly useful for fine-grained visual recognition, whose goal is to distinguish subordinate visual categories such as birds [46] or natural species [43].
In the spirit of mapping the visual world, we propose a new task, instance segmentation with attribute localization, which unifies object detection and fine-grained attribute recognition. As illustrated in Figure 1(e), this task offers a structured representation of an image. Automatic recognition of a rich set of attributes for each segmented object instance complements category-level object detection and therefore advances the complexity of the images and scenes we can make understandable to machines. In this work, we focus on the fashion domain as an example to illustrate this task. Fashion comes with rich and
complex apparel with attributes, influences many aspects of modern society, and has a strong financial and cultural impact. We anticipate that the proposed task is also suitable for other man-made product domains such as automobiles and home interiors.
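The structured representation this task targets can be made concrete with a small sketch. The record layout below is illustrative only (the field names are our own, not the dataset's actual JSON schema): each instance pairs one category label with a segmentation mask and a variable-length list of localized attributes.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of one annotation record in the proposed task:
# each detected instance carries a segmentation mask, a single category
# label, and a *set* of localized fine-grained attributes.
@dataclass
class InstanceAnnotation:
    category: str                       # e.g. "jacket", one of the 46 apparel objects
    polygon: List[List[float]]          # segmentation mask as polygon vertices
    attributes: List[str] = field(default_factory=list)  # zero or more per mask

jacket = InstanceAnnotation(
    category="jacket",
    polygon=[[10.0, 20.0], [110.0, 20.0], [110.0, 180.0], [10.0, 180.0]],
    attributes=["symmetrical", "double breasted", "regular (fit)"],
)

# Plain object detection would stop at `category`; the proposed task
# additionally asks the model to predict `attributes` for this mask.
print(jacket.category, len(jacket.attributes))  # jacket 3
```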
Structured representations of images often rely on structured vocabularies [28]. With this in mind, we construct the Fashionpedia ontology (Figure 1(f)) and image dataset (Figure 1(a-e)), annotating fashion images with detailed segmentation masks for apparel categories, parts, and their attributes. Our proposed ontology provides a rich schema for the interpretation and organization of individuals' garments, styles, or fashion collections [24]. For example, we can create a knowledge graph (see supplementary material for more details) by aggregating structured information within each image and exploiting relationships between garments and garment parts, categories, and attributes in the Fashionpedia ontology. Our insight is that a large-scale fashion segmentation and attribute localization dataset built with a fashion ontology can help computer vision models achieve better performance on fine-grained image understanding and reasoning tasks.
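As a toy illustration of the aggregation step just described (using made-up annotations, not the released ontology files), per-image (garment, attribute) pairs can be folded into HAS_ATTRIBUTE edges of a simple graph:

```python
from collections import defaultdict

# Toy per-image annotations: (garment or part, attribute) pairs.
annotations = [
    ("dress", "symmetrical"),
    ("dress", "maxi (length)"),
    ("sleeve", "wrist-length"),
]

# node -> set of attribute nodes; each entry is a HAS_ATTRIBUTE edge.
graph = defaultdict(set)
for garment, attribute in annotations:
    graph[garment].add(attribute)

# Querying the graph: which attributes were observed on "dress"?
print(sorted(graph["dress"]))  # ['maxi (length)', 'symmetrical']
```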
The contributions of this work are as follows:

A novel task of fine-grained instance segmentation with attribute localization. The proposed task unifies instance segmentation and visual attribute recognition, which is an important step toward structured understanding of visual content in real-world applications.
A unified fashion ontology informed by product descriptions from the internet and built by fashion experts. Our ontology captures the complex structure of fashion objects and the ambiguity of descriptions obtained from the web, containing 46 apparel objects (27 main apparels and 19 apparel parts) and 294 fine-grained attributes (spanning 9 super-categories) in total. To facilitate the development of related efforts, we also provide a mapping to categories from existing fashion segmentation datasets; see Figure 1(f).
A dataset with a total of 48,825 clothing images covering daily life, street style, celebrity events, runway, and online shopping, annotated by crowd workers for segmentation masks and by fashion experts for localized attributes, with the goal of developing and benchmarking computer vision models for comprehensive understanding of fashion.
A new model, Attribute-Mask R-CNN, proposed with the aim of jointly performing instance segmentation and localized attribute recognition; a novel evaluation metric for this task is also provided.
2 Related Work
The combined task of fine-grained instance segmentation and attribute localization has not received a great deal of attention in the literature. On one hand, COCO [31] and LVIS [15] represent the benchmarks of object detection for common objects. Panoptic segmentation was proposed to unify semantic and instance segmentation, addressing both stuff and thing classes [27]. In
4 M. Jia et al.
Table 1: Comparison of fashion-related datasets (Cls. = Classification, Segm. = Segmentation, MG = Main Garment, GP = Garment Part, A = Accessory, S = Style, FGC = Fine-Grained Categorization). Cls., BBox, and Segm. describe the category annotation type; Unlocalized, Localized, and FGC describe the attribute annotation type. To the best of our knowledge, we include all fashion-related datasets focusing on visual recognition

Name | Cls. | BBox | Segm. | Unlocalized | Localized | FGC
Clothing Parsing [50] | MG, A | - | - | - | - | -
Chic or Social [49] | MG, A | - | - | - | - | -
Hipster [26] | MG, A, S | - | - | - | - | -
Ups and Downs [19] | MG | - | - | - | - | -
Fashion550k [23] | MG, A | - | - | - | - | -
Fashion-MNIST [48] | MG | - | - | - | - | -
Runway2Realway [44] | - | - | MG, A | - | - | -
ModaNet [54] | - | MG, A | MG, A | - | - | -
Deepfashion2 [11] | - | MG | MG | - | - | -
Fashion144k [40] | MG, A | - | - | X | - | -
Fashion Style-128 Floats [41] | S | - | - | X | - | -
UT Zappos50K [51] | A | - | - | X | - | -
Fashion200K [16] | MG | - | - | X | - | -
FashionStyle14 [42] | S | - | - | X | - | -
Main Product Detection [39] | - | MG | - | X | - | -
StreetStyle-27K [35] | - | - | - | X | - | X
UT-latent look [21] | MG, S | - | - | X | - | X
FashionAI [7] | MG, GP, A | - | - | X | - | X
iMat-Fashion Attribute [14] | MG, GP, A, S | - | - | X | - | X
Apparel classification-Style [4] | - | MG | - | X | - | X
DARN [22] | - | MG | - | X | - | X
WTBI [25] | - | MG, A | - | X | - | X
Deepfashion [32] | S | MG | - | X | - | X
Fashionpedia | - | MG, GP, A | MG, GP, A | - | X | X
spite of the domain differences, Fashionpedia has comparable mask quality with LVIS and a similar total number of segmentation masks as COCO. On the other hand, we have also observed an increasing effort to curate datasets for fine-grained visual recognition, evolving from CUB-200 Birds [46] to the recent iNaturalist dataset [43]. The goal of this line of work is to advance the state of the art in automatic image classification for large numbers of real-world, fine-grained categories. What remains largely unexplored in these datasets, however, is a structured representation of an image. Visual Genome [28] provides dense annotations of object bounding boxes, attributes, and relationships in the general domain, enabling a structured representation of the image. In our work, we instead focus on fine-grained attributes and provide segmentation masks in the fashion domain to advance the clothing recognition task.
Clothing recognition has recently received increasing attention in the computer vision community. A number of works provide valuable apparel-related datasets [50,4,49,26,44,40,22,25,19,41,32,23,48,51,16,42,39,35,21,54,7,11]. These pioneering works enabled several recent advances in clothing-related recognition
and knowledge discovery [10,34]. Table 1 summarizes the comparison among different fashion datasets regarding annotation types of clothing categories and attributes. Our dataset distinguishes itself in the following three aspects.
Exhaustive annotation of segmentation masks: Existing fashion datasets [44,54,11] offer segmentation masks for the main garment categories (e.g., jacket, coat, dress) and the accessory categories (e.g., bag, shoe). Smaller garment objects such as collars and pockets are not annotated. However, these small objects can be valuable for real-world applications (e.g., searching for a specific collar shape during online shopping). Our dataset is annotated with segmentation masks not only for a total of 27 main garment and accessory categories but also for 19 garment parts (e.g., collar, sleeve, pocket, zipper, embroidery).
Localized attributes: The fine-grained attributes in existing datasets [22,32,39,14] tend to be noisy, mainly because most of the annotations are collected by crawling attribute-level product descriptions directly from large online shopping websites. Unlike these datasets, the fine-grained attributes in our dataset are annotated manually by fashion experts. To the best of our knowledge, ours is the only dataset to annotate localized attributes: fashion experts are asked to annotate attributes associated with the segmentation masks labeled by crowd workers. Localized attributes could potentially help computational models detect and understand attributes more accurately.
Fine-grained categorization: Previous studies on fine-grained attribute categorization suffer from several issues, including: (1) repeated attributes belonging to the same category (e.g., zip, zipped, and zipper) [32,21]; (2) basic-level categorization only (object recognition) and a lack of fine-grained categorization [50,4,49,26,25,44,41,42,23,16,48,54,11]; (3) a lack of fashion taxonomies matching the needs of real-world applications in the fashion industry, possibly due to the research gap between fashion design and computer vision; (4) diverse taxonomy structures from different sources in the fashion domain. To facilitate research at the intersection of fashion and computer vision, our proposed ontology is built and verified by fashion experts based on their own design experience and informed by the following four sources: (1) world-leading e-commerce fashion websites (e.g., ZARA, H&M, Gap, Uniqlo, Forever21); (2) luxury fashion brands (e.g., Prada, Chanel, Gucci); (3) trend forecasting companies (e.g., WGSN); (4) academic resources [8,3].
3 Dataset Specification and Collection
3.1 Ontology specification
We propose a unified fashion ontology (Figure 1(f)), a structured vocabulary that utilizes basic-level categories and fine-grained attributes [38]. The Fashionpedia ontology relies on definitions of objects and attributes similar to those of previous well-known image datasets. For example, a Fashionpedia object is similar to an "item" in Wikidata [45], or an "object" in COCO [31] and Visual Genome [28]. In the context of Fashionpedia, objects represent common items in apparel (e.g.,
Fig. 2: Image examples with annotated segmentation masks (a-f) and fine-grained attributes (g-i)
jacket, shirt, dress). In this section, we break down each component of the Fashionpedia ontology and illustrate the construction process. With this ontology and our image dataset, a large-scale fashion knowledge graph can be built as an extended application of our dataset (more details can be found in the supplementary material).
Apparel categories. In the Fashionpedia dataset, all images are annotated with one or multiple main garments. Each main garment is also annotated with its garment parts. For example, general garment types such as jacket, dress, and pants are considered main garments. These garments also consist of several garment parts such as collars, sleeves, pockets, buttons, and embroideries. Main garments are divided into three main categories: outerwear, intimates, and accessories. Garment parts also have different types: garment main parts (e.g., collars, sleeves), bra parts, closures (e.g., button, zipper), and decorations (e.g., embroidery, ruffle). On average, each image contains 1 person, 3 main garments, 3 accessories, and 12 garment parts, each delineated by a tight segmentation mask (Figure 1(a-c)). Furthermore, each object is assigned a synset ID in our Fashionpedia ontology.
Fine-grained attributes. Main garments and garment parts can be associated with apparel attributes (Figure 1(e)). For example, "button" is part of the main garment "jacket"; "jacket" can be linked with the silhouette attribute "symmetrical"; the garment part "button" could carry the attribute "metal" with a relationship of material. The Fashionpedia ontology provides attributes for 13 main outerwear garment categories and for 5 out of the 19 garment parts ("sleeve", "neckline", "pocket", "lapel", and "collar"). Each image has 16.7 attributes on average (max 57 attributes). As with the main garments and garment parts, we canonicalize all attributes to our Fashionpedia ontology.
Relationships. Relationships can be formed between categories and attributes. There are three main types of relationships (Figure 1(e)): (1) outfits to main garments, and main garments to garment parts: meronymy (part-of) relationships; (2) main garments to attributes, or garment parts to attributes: these relationship types can be garment silhouette (e.g., peplum), collar nickname (e.g., peter pan collar), garment length (e.g., knee-length), textile finishing (e.g., distressed), or textile-fabric pattern (e.g., paisley), etc.; (3) within garments, garment parts, or attributes: there is a maximum of four levels of hyponymy (is-an-instance-of) relationships. For example, weft knit is an instance of knit fabric, and fleece is an instance of weft knit.
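The three relationship types above can be represented as simple (subject, relation, object) triples, which is one natural way to assemble the knowledge graph mentioned earlier. The sketch below is illustrative only: the entity strings and relation names are hypothetical, not the dataset's canonical synset IDs.

```python
# Minimal sketch of the three Fashionpedia relationship types as triples.
# Entity and relation names are illustrative, not canonical synset IDs.
ontology_triples = [
    # (1) meronymy (part-of): garment parts belong to main garments
    ("button", "part_of", "jacket"),
    # (2) category-to-attribute links, typed by attribute super-category
    ("jacket", "has_silhouette", "symmetrical"),
    ("button", "has_material", "metal"),
    # (3) hyponymy (is-an-instance-of), up to four levels deep
    ("weft knit", "instance_of", "knit fabric"),
    ("fleece", "instance_of", "weft knit"),
]

def hypernym_chain(entity, triples):
    """Follow instance_of edges upward from an entity to its root."""
    chain = [entity]
    while True:
        parents = [o for s, r, o in triples
                   if s == chain[-1] and r == "instance_of"]
        if not parents:
            return chain
        chain.append(parents[0])

print(hypernym_chain("fleece", ontology_triples))
# ['fleece', 'weft knit', 'knit fabric']
```

Traversing the hyponymy edges recovers the hierarchy described in the text (fleece → weft knit → knit fabric), while the part-of and attribute edges aggregate per-image annotations into a graph.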
3.2 Image Collection and Annotation Pipeline
Image collection. A total of 50,527 images were harvested from Flickr and free-license photo websites, including Unsplash, Burst by Shopify, Freestocks, Kaboompics, and Pexels. Two fashion experts manually verified the quality of the collected images. Specifically, the experts checked the diversity of scenes and made sure clothing items were visible in the images. Fdupes [33] was used to remove duplicate images. After filtering, 48,825 images were left and used to build our Fashionpedia dataset.
Annotation Pipeline. Expert annotation is often a time-consuming process. In order to accelerate it, we decoupled the work between crowd workers and experts and divided the annotation process into the following two phases.
First, segmentation masks of apparel objects were annotated by 28 crowd workers, who were trained for 10 days before the annotation process (with prepared annotation tutorials for each apparel object; see supplementary material for details). We collected high-quality annotations by having the annotators follow the contours of garments in the image as closely as possible (see Section 4.2 for annotation analysis). This polygon annotation process was monitored daily and verified weekly by a supervisor and by the authors.
Second, 15 fashion experts (graduate students in the apparel domain) were recruited to annotate the fine-grained attributes for the annotated segmentation masks. Annotators were given one mask and one attribute super-category (e.g., "textile pattern", "garment silhouette") at a time. Two additional options, "not sure" and "not on the list", were available during annotation. The option "not on the list" indicates that the expert found an attribute that is not in the proposed ontology. The option "not sure" means the expert could not identify the attribute of a mask; common reasons include occlusion of the masks and the viewing angle of the image (for example, a top underneath a closed jacket). More details can be found in Figure 2. Each attribute super-category was assigned to one or two fashion experts, depending on the number of masks. The annotations were also checked by another expert annotator before delivery.
We split the data into training, validation, and test sets with 45,623, 1,158, and 2,044 images, respectively. More details of the dataset creation can be found in the supplementary material.
4 Dataset Analysis
This section presents a detailed analysis of our dataset using the training images. We begin with general image statistics, followed by an analysis of segmentation masks, categories, and attributes. We compare Fashionpedia with four other segmentation datasets, including two recent fashion datasets, DeepFashion2 [11] and ModaNet [54], and two general-domain datasets, COCO [31] and LVIS [15].
4.1 Image Analysis
We chose images with high resolutions during the curation process, since the Fashionpedia ontology includes diverse fine-grained attributes for both garments and garment parts. The Fashionpedia training images have an average dimension of 1710 (width) × 2151 (height). High-resolution images show apparel objects in detail, leading to more accurate and faster annotations for both segmentation masks and attributes. These high-resolution images can also benefit downstream tasks such as detection and image generation. Examples of detailed annotations can be found in Figure 2.
4.2 Mask Analysis
We define a "mask" as one apparel instance, which may have more than one separate component (e.g., the jacket in Figure 1), and a "polygon" as one disjoint area.
Mask quantity. On average, there are 7.3 masks per image (median 7, max 74) in the Fashionpedia training set. Figure 3(a) shows that Fashionpedia has the largest median value among the 5 datasets used for comparison. Compared with ModaNet and DeepFashion2, Fashionpedia has the widest range of mask counts among the three fashion datasets, and a range comparable with COCO, a general-domain dataset. COCO and LVIS maintain a wider distribution than Fashionpedia owing to the more common objects in their datasets. Figure 3(d) illustrates the distribution within the Fashionpedia dataset: one image usually contains more garment parts and accessories than outerwear.
Mask sizes. Figures 3(b) and 3(e) compare relative mask sizes within Fashionpedia and against other datasets. Ours has a distribution similar to COCO and LVIS, except for a lack of larger masks (area > 0.95). DeepFashion2 has a heavier tail, meaning it contains a larger portion of garments with a zoomed-in view; unlike DeepFashion2, our images mainly focus on the whole ensemble of clothing. Since ModaNet focuses on outerwear and accessories, it has more masks with relative area between 0.2 and 0.4, whereas ours has an additional 19 apparel-part categories. As illustrated in Figure 3(e), garment parts and accessories are relatively small compared with outerwear (e.g., "dress", "coat").
Mask quality. Apparel categories also tend to have complex silhouettes.Table 2 shows that the Fashionpedia masks have the most complex boundaries
[Figure 3 plots: (a) mask count per image across datasets (DeepFashion2 [2|8], ModaNet [5|16], COCO2017 [4|93], LVIS [6|556], Fashionpedia [7|74]); (b) relative mask size across datasets; (c) category diversity per image across datasets; (d) mask count per image in Fashionpedia (Outerwear [1|6], Garment Parts [3|69], Accessories [2|18]); (e) relative mask size in Fashionpedia; (f) categories and attributes distribution in Fashionpedia]
Fig. 3: Dataset statistics: the first row presents comparisons among datasets; the second row presents comparisons within Fashionpedia. Y-axes are in log scale. Relative segmentation mask size was calculated in the same way as [15] and rounded to a precision of 2. For the mask count per image comparisons (Figures 3(a) and 3(d)), legends follow the [median | max] format. X-axis values in Figure 3(a) were discretized for better visual effect
amongst the five datasets (according to the measurement used in [15]). This suggests that our masks represent the complex silhouettes of apparel categories more accurately than those of ModaNet and DeepFashion2. We also report the number of vertices per polygon, a measure of the granularity of the produced masks. Table 2 shows that we have the second-highest average number of vertices among the five datasets, after LVIS.
4.3 Category and Attributes Analysis
There are 46 apparel categories and 294 attributes in the Fashionpedia dataset. On average, each image is annotated with 7.3 instances, 5.4 categories, and 16.7 attributes. Among all the masks with categories and attributes, each mask has 3.7 attributes on average (max 14 attributes). Fashionpedia has the most diverse set of categories within one image among the three fashion datasets, and is comparable to COCO (Figure 3(c)), since we provide a comprehensive ontology for the annotation. In addition, Figure 3(f) shows the distributions of categories and attributes in the training set and highlights the long-tailed nature of our data.
During the fine-grained attribute annotation process, we also asked the experts to choose "not sure" if they were uncertain, or "not on the list" if they found an attribute that is not provided. The majority of "not sure" selections come from three attribute super-classes, namely "Opening Type", "Waistline",
Table 2: Comparison of segmentation mask complexity among segmentation datasets in both the fashion and general domains (COCO 2017 instance training data was used). Each statistic (mean and median) represents a bootstrapped 95% confidence interval, following [15]. Boundary complexity was calculated according to [15,2]. The reported mask boundary complexity for COCO and LVIS differs from [15] due to different image resolutions and image sets. The number of vertices per polygon counts the vertices in one polygon, where a polygon is defined as one disjoint area. Masks (and polygons) with zero area were ignored
Dataset | Boundary complexity (mean / median) | Vertices per polygon (mean / median) | Image count
COCO [31] | 6.65 - 6.66 / 6.07 - 6.08 | 21.14 - 21.21 / 15.96 - 16.04 | 118,287
LVIS [15] | 6.78 - 6.80 / 5.89 - 5.91 | 35.77 - 35.95 / 22.91 - 23.09 | 57,263
ModaNet [54] | 5.87 - 5.89 / 5.26 - 5.27 | 22.50 - 22.60 / 18.95 - 19.05 | 52,377
Deepfashion2 [11] | 4.63 - 4.64 / 4.45 - 4.46 | 14.68 - 14.75 / 8.96 - 9.04 | 191,960
Fashionpedia | 8.36 - 8.39 / 7.35 - 7.37 | 31.82 - 32.01 / 20.90 - 21.10 | 45,623
and "Length". Since some masks show only a limited portion of an apparel item (e.g., a top inside a jacket), the annotators were unable to identify those attributes due to occlusion or viewpoint discrepancies. Less than 15% of the masks for each attribute super-class are marked "not on the list", which illustrates the comprehensiveness of our proposed ontology (see supplementary material for more details of the extra dataset analysis).
5 Evaluation Protocol and Baselines
5.1 Evaluation metric
In object detection, a true positive (TP) for each category c is defined as a single detected object that matches a ground-truth object with an Intersection over Union (IoU) above a threshold τ_IoU. COCO's main evaluation metric uses average precision averaged across all 10 IoU thresholds τ_IoU ∈ [0.5 : 0.05 : 0.95] and all 80 categories. We denote this metric as AP_IoU.
In the case of instance segmentation with attribute localization, we extend the standard COCO metric by adding one more constraint: the macro F1 score for the predicted attributes of a single detected object with category c (see supplementary material for the choice of F1 averaging). We denote the F1 threshold as τ_F1, with the same range as τ_IoU (τ_F1 ∈ [0.5 : 0.05 : 0.95]). The main metric AP_IoU+F1 reports the precision averaged across all 10 IoU thresholds, all 10 macro F1 thresholds, and all categories. Our evaluation API, code, and trained models are available at: https://fashionpedia.github.io/home/Model_and_API.html
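The combined true-positive criterion can be sketched as follows. This is a minimal illustration, not the official evaluation API: in particular, the helpers `macro_f1` and `is_true_positive` are hypothetical names, and we assume attributes absent from both prediction and ground truth are skipped when averaging per-attribute F1.

```python
def macro_f1(pred_attrs, gt_attrs):
    """Macro F1 over the attributes present in either set.

    Assumption: for a single detection, each attribute's F1 is 1 if it
    appears in both sets, 0 if it appears in only one, and is skipped
    if it appears in neither (its F1 is undefined).
    """
    union = set(pred_attrs) | set(gt_attrs)
    if not union:
        return 1.0  # vacuously perfect when no attributes apply
    hits = [1.0 if a in pred_attrs and a in gt_attrs else 0.0 for a in union]
    return sum(hits) / len(hits)

def is_true_positive(iou, pred_attrs, gt_attrs, tau_iou, tau_f1):
    """A detection is a TP only if it passes BOTH thresholds."""
    return iou >= tau_iou and macro_f1(pred_attrs, gt_attrs) >= tau_f1

# AP_IoU+F1 then averages precision over a 10x10 grid of thresholds:
thresholds = [0.5 + 0.05 * i for i in range(10)]  # 0.5, 0.55, ..., 0.95
```

For example, a detection with IoU 0.8 that predicts {plain, washed} against ground truth {plain, dot} has macro F1 of 1/3 over the three attributes involved, so it only counts as a TP at the most permissive F1 thresholds.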
Fig. 4: Attribute-Mask R-CNN adds a multi-label attribute prediction head uponMask R-CNN for instance segmentation with attribute localization
5.2 Attribute-Mask R-CNN
We perform two tasks on Fashionpedia: (1) apparel instance segmentation (ignoring attributes); (2) instance segmentation with attribute localization. To better facilitate research on Fashionpedia, we design a strong baseline model named Attribute-Mask R-CNN, built upon Mask R-CNN [17]. As illustrated in Figure 4, we extend the Mask R-CNN heads with an additional multi-label attribute prediction head, trained with a sigmoid cross-entropy loss. Attribute-Mask R-CNN can be trained end-to-end to jointly perform instance segmentation and localized attribute recognition.

We leverage different backbones, including ResNet-50/101 (R50/101) [18] with a feature pyramid network (FPN) [30] and SpineNet-49/96/143 [5]. Input images are resized so that the longer edge is 1024 pixels for all networks except SpineNet-143, which instead uses an input size of 1280. During training, in addition to standard random horizontal flipping and cropping, we use large-scale jittering that resizes an image to a random ratio between (0.5, 2.0) of the target input size, following Zoph et al. [55]. We use an open-sourced TensorFlow [1] codebase (footnote 1) for implementation, and all models are trained with a batch size of 256. We follow the standard training schedules of 1× (5625 iterations), 2× (11250 iterations), 3× (16875 iterations), and 6× (33750 iterations) in Detectron2 [47], with the linear learning rate scaling suggested by Goyal et al. [13].
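The multi-label attribute head reduces to an independent sigmoid cross-entropy per attribute. Below is a numerically stable NumPy sketch of that loss, not the paper's TensorFlow implementation; the function name and shapes are illustrative.

```python
import numpy as np

def sigmoid_cross_entropy(logits, targets):
    """Numerically stable element-wise sigmoid cross-entropy, averaged.

    logits:  (num_rois, num_attributes) raw attribute-head outputs
    targets: (num_rois, num_attributes) multi-hot attribute labels
    Equivalent to -t*log(sigmoid(x)) - (1-t)*log(1-sigmoid(x)), computed
    as max(x, 0) - x*t + log(1 + exp(-|x|)) to avoid overflow.
    """
    x = np.asarray(logits, dtype=float)
    t = np.asarray(targets, dtype=float)
    per_element = np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x)))
    return float(per_element.mean())

# One RoI, three attributes: with maximally uncertain logits (all zero),
# the loss is log(2) per attribute regardless of the labels.
loss = sigmoid_cross_entropy([[0.0, 0.0, 0.0]], [[1.0, 0.0, 1.0]])
```

Because each attribute gets its own sigmoid rather than a shared softmax, a single detected garment can carry several attributes at once (e.g., "plain" and "washed"), which is exactly what the localized-attribute task requires.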
5.3 Results Discussion
Attribute-Mask R-CNN. From the results in Table 3, we have the following observations: (1) our baseline models achieve promising performance on the challenging Fashionpedia dataset; (2) there is a significant drop (e.g., from 48.7 to 35.7 for SpineNet-143) in box AP if we add τ_F1 as another constraint for a true positive. This is further verified by the per-super-category mask results in Table 4. It suggests that joint instance segmentation and attribute localization is a significantly more difficult task than instance segmentation alone, leaving much room for future improvement.
Main apparel detection analysis. We also provide an in-depth detector analysis following the COCO detection challenge evaluation [31], inspired by Hoiem et al. [20]. Figure 5 illustrates a detailed breakdown of the bounding-box false positives produced by the detectors.
1 https://github.com/tensorflow/tpu/tree/master/models/official/detection
Table 3: Baseline results of Mask R-CNN and Attribute-Mask R-CNN on Fashionpedia. The big performance gap between AP_IoU and AP_IoU+F1 suggests the challenging nature of our proposed task

Backbone | Schedule | FLOPs (B) | Params (M) | AP_box (IoU / IoU+F1) | AP_mask (IoU / IoU+F1)
R50-FPN | 1× | 296.7 | 46.4 | 38.7 / 26.6 | 34.3 / 25.5
R50-FPN | 2× | 296.7 | 46.4 | 41.6 / 29.3 | 38.1 / 28.5
R50-FPN | 3× | 296.7 | 46.4 | 43.4 / 30.7 | 39.2 / 29.5
R50-FPN | 6× | 296.7 | 46.4 | 42.9 / 31.2 | 38.9 / 30.2
R101-FPN | 1× | 374.3 | 65.4 | 41.0 / 28.6 | 36.7 / 27.6
R101-FPN | 2× | 374.3 | 65.4 | 43.5 / 31.0 | 39.2 / 29.8
R101-FPN | 3× | 374.3 | 65.4 | 44.9 / 32.8 | 40.7 / 31.4
R101-FPN | 6× | 374.3 | 65.4 | 44.3 / 32.9 | 39.7 / 31.3
SpineNet-49 | 6× | 267.2 | 40.8 | 43.7 / 32.4 | 39.6 / 31.4
SpineNet-96 | 6× | 314.0 | 55.2 | 46.4 / 34.0 | 41.2 / 31.8
SpineNet-143 | 6× | 498.0 | 79.2 | 48.7 / 35.7 | 43.1 / 33.3
Table 4: Per super-category results (for masks) using Attribute-Mask R-CNN with the SpineNet-143 backbone. We follow the same COCO sub-metrics for overall and the three apparel-object super-categories. Result format follows [AP_IoU / AP_IoU+F1] or [AR_IoU / AR_IoU+F1] (see supplementary material for per-class results)

Category | AP | AP50 | AP75 | APl | APm | APs
overall | 43.1 / 33.3 | 60.3 / 42.3 | 47.6 / 37.6 | 50.0 / 35.4 | 40.2 / 27.0 | 17.3 / 9.4
outerwear | 64.1 / 40.7 | 77.4 / 49.0 | 72.9 / 46.2 | 67.1 / 43.0 | 44.4 / 29.3 | 19.0 / 4.4
parts | 19.3 / 13.4 | 35.5 / 20.8 | 18.4 / 14.4 | 28.3 / 14.5 | 23.9 / 16.4 | 12.5 / 9.8
accessory | 56.1 / - | 77.9 / - | 63.9 / - | 57.5 / - | 60.5 / - | 25.0 / -
Figures 5(a) and 5(b) compare two detectors trained on Fashionpedia and COCO. Errors of the COCO detector are dominated by imperfect localization (AP increases by 28.3 over the overall AP at τ_IoU = 0.75) and background confusion (+15.7) (Figure 5(b)). Unlike the COCO detector, no single error type dominates for the Fashionpedia detector: Figure 5(a) shows errors from localization (+13.4), classification (+6.9), and background confusion (+6.6). Due to space constraints, we leave the super-category analysis to the supplementary material.
Prediction visualization. Baseline outputs (with both segmentation masks and localized attributes) are visualized in Figure 6. Our Attribute-Mask R-CNN achieves good results even for small objects like shoes and glasses. On one hand, the model can correctly predict fine-grained attributes for some masks (e.g., the second image in the top row of Figure 6); on the other hand, it also predicts the wrong nickname (welt) for a pocket (e.g., the third image in the top row of Figure 6). These results show that there is headroom remaining for the future development of more advanced computer vision models on this task (see supplementary material for more details of the baseline analysis).
[Figure 5 plots, with the area under each curve in brackets: (a) Fashionpedia: C75 [.537], C50 [.630], Loc [.671], Sim [.720], Oth [.740], BG [.806], FN [1.00]; (b) COCO (2015 challenge winner): C75 [.399], C50 [.589], Loc [.682], Sim [.695], Oth [.713], BG [.870], FN [1.00]]
Fig. 5: Main apparel detector analysis. Each plot shows 7 precision-recall curves, where each evaluation setting is more permissive than the previous one. Specifically, C75: strict IoU (τ_IoU = 0.75); C50: PASCAL IoU (τ_IoU = 0.5); Loc: localization errors ignored (τ_IoU = 0.1); Sim: super-category false positives (FPs) removed; Oth: category FPs removed; BG: background (and class confusion) FPs removed; FN: false negatives removed. The two plots compare detectors trained on Fashionpedia and COCO, respectively. Results are averaged over all categories. Legends also present the area under each curve (corresponding to the AP metric) in brackets
6 Conclusion
In this work, we focus on a new task that unifies instance segmentation and attribute recognition. To solve the challenging problems entailed in this task, we introduced the Fashionpedia ontology and dataset. To the best of our knowledge, Fashionpedia is the first dataset that combines part-level segmentation masks with fine-grained attributes. We presented Attribute-Mask R-CNN, a novel model for this task, along with a novel evaluation metric. We expect that models trained on Fashionpedia can be applied to many applications, including better product recommendation in online shopping, enhanced visual search results, and resolving ambiguous fashion-related words in text queries. We hope Fashionpedia will contribute to advances in fine-grained image understanding in the fashion domain.
7 Acknowledgements
This research was partially supported by a Google Faculty Research Award. We thank Kavita Bala, Carla Gomes, Dustin Hwang, Rohun Tripathi, Omid Poursaeed, Hector Liu, Nayanathara Palanivel, and Konstantin Lopuhin for their helpful feedback and discussion in the development of the Fashionpedia dataset. We also thank Zeqi Gu, Fisher Yu, Wenqi Xian, Chao Suo, Junwen Bai, Paul Upchurch, Anmol Kabra, and Brendan Rappazzo for their help developing the fine-grained attribute annotation tool.
Fig. 6: Baseline prediction outputs with both segmentation masks and localized attributes
A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,B*(;7(
6'( &,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,/)#+$2&,"9':);$2);2*(,,,0/)#+$21
!"#$% &,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,./-,0'()#*#+1,%*/2'7)$$)&,5)+7/"5,0.*$1,
&,#)8</*#),$-()&,5'7#=,0#)8<1,%*/2'7)$$)&,%-33)$5*8"/,#)8</*#),$-()&,':"/,0#)8<1
&,%*/2'7)$$)&,%-33)$5*8"/,%*/2'7)$$)&,$*+2$,0.*$1,/)#+$2&,3"4*,0/)#+$21,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,#*8<#"3)&,/)++*#+%
0"1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,091,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,081,,,,,,,,,,,,,,,,,,,,,,,, 0=1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0)1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.1,,,,,,
!"#$%
6'(C)8</*#)
!'8<)$>"+
D2'),E
D2'),F
D2'),ED2'),E
D2'),E D2'),ED2'),FD2'),F
D2'),F
D2'),F
G/"%%)%
!'8<)$,E
D/)):),
@'//"5
C)8</*#)
>)/$
H"8<)$
6'(
!"#$% >78</)
C)8</*#) D/)):),,FD/)):),,E
A5)%%
>"+>"+
D/)):),,E
D/)):),,F
H"8<)$
@'"$
A5)%%
D/)):),,E D/)):),,FA5)%%
!"#$%
@'"$
D/)):),,I
C)8</*#)
6*+2$%,E 6*+2$%,F
>)/$
@'//"5
!'8<)$
>"+A5)%%
@'"$
!'8<)$,F
6*+2$%,E6*+2$%,E
6*+2$%,F
6*+2$%
C)8</*#)
C)8</*#)
C)8</*#)
C)8</*#)
!'8<)$
!'8<)$,E
!'8<)$
D/)):),
D/)):),E,
D/)):),F,
D/)):),E,
D/)):),F,
D/)):),E,
D/)):),F,
H"8<)$
H"8<)$
6*+2$%
&,#*8<#"3)&,/)++*#+%%*/2'7)$$)&,$*+2$,0.*$1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1
6*+2$%,E
&,#*8<#"3)&,/)++*#+%%*/2'7)$$)&,$*+2$,0.*$1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1
6*+2$%,F
&,,#*8<#"3)&,("$82,0('8<)$1#*8<#"3)&,?)/$,0('8<)$1,,,#*8<#"3)&,%/"%2,0('8<)$1,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1
!'8<)$,F
&,%*/2'7)$$)&,$*+2$,0.*$1,/)#+$2&,3"4*,0/)#+$21,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,#*8<#"3)&,/)++*#+%'()#*#+,$-()&,./-,,,0'()#*#+1
6*+2$%
Fig. 6: Attribute-Mask R-CNN results on the Fashionpedia validation set. Masks, bounding boxes, and apparel categories (category score > 0.6) are shown. The localized attributes from the top 5 masks (that contain attributes) on each image are also shown. Correctly predicted categories and localized attributes are bolded
8 Supplementary Material
In our work we presented the new task of instance segmentation with attribute localization. We introduced a new ontology and dataset, Fashionpedia, to further describe the various aspects of this task. We also proposed a novel evaluation metric and Attribute-Mask R-CNN model for this task. In the supplemental material, we provide the following items that shed further insight on these contributions:
– More comprehensive experimental results (§ 8.1)
– An extended discussion of the Fashionpedia ontology and potential knowledge graph applications (§ 8.2)
– More details of dataset analysis (§ 8.3)
– Additional information on the annotation process (§ 8.4)
– Discussion (§ 8.5)
8.1 Attribute-Mask R-CNN
Per-class evaluation. Fig. 7 presents detailed evaluation results per supercategory and per category. In Fig. 7, we follow the same metrics as the COCO leaderboard (AP, AP50, AP75, APl, APm, APs, AR@1, AR@10, AR@100, ARs@100, ARm@100, ARl@100), with τIoU and τF1 if possible. Fig. 7 shows that metrics considering both constraints τIoU and τF1 are always lower than those using τIoU alone, across all supercategories and categories. This further demonstrates the challenging nature of our proposed task.
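The dual-constraint rule behind these numbers can be sketched in a few lines. This is an illustrative restatement of the matching criterion, not the authors' evaluation code, and the IoU and F1 values used are toy numbers.

```python
# Sketch of the dual-constraint matching rule: a predicted mask counts
# as a true positive only if it clears BOTH the mask-overlap threshold
# (tau_iou) and the per-mask attribute-F1 threshold (tau_f1).
# Illustrative only; not the authors' evaluation code.

def is_true_positive(iou, f1, tau_iou=0.5, tau_f1=0.5):
    return iou >= tau_iou and f1 >= tau_f1

# A well-localized mask with wrong attributes passes an IoU-only check
# but fails the joint one, which is why AP under IoU+F1 can never
# exceed AP under IoU alone.
joint = is_true_positive(iou=0.8, f1=0.2)               # fails joint check
iou_only = is_true_positive(iou=0.8, f1=0.2, tau_f1=0.0)  # passes IoU-only
```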
In general, categories belonging to “garment parts” have lower AP and AR compared with “outerwear” and “accessories”.
A detailed breakdown of detection errors is presented in Fig. 8 for supercategories and three main categories. In terms of supercategories in Fashionpedia, “outerwear” errors are dominated by within-supercategory class confusions (Fig. 8(a)). Within this supercategory, ignoring localization errors would only raise AP slightly, from 77.5 to 79.1 (+1.6). A similar trend can be observed for the class “skirt”, which belongs to “outerwear” (Fig. 8(d)). Detection errors for “part” (Fig. 8(b), 8(e)) and “accessory” (Fig. 8(c), 8(f)), on the other hand, are dominated by both background confusion and localization. “Part” also has a lower AP in general compared with the other two supercategories. A possible reason is that objects belonging to “part” usually have smaller sizes and lower counts.
F1 score calculation. Since we measure the F1 score between predicted attributes and ground-truth attributes per mask, we consider both options: multi-label multi-class classification with 294 classes for one instance, and binary classification over 294 instances. Multi-label multi-class classification is a straightforward setup, as it is a common setting for most fine-grained classification tasks. In the binary classification scenario, we treat the 1s and 0s of the multi-hot encodings of both predictions and ground-truth labels as the positive and negative classes, respectively. There are also two averaging choices: “micro” and “macro”. “Micro” averaging calculates the score globally by counting the total true positives, false negatives, and false positives. “Macro” averaging calculates the metric for each attribute class and reports the unweighted mean. In sum, there are four F1-score averaging options: 1) “micro”, 2) “macro”, 3) “binary-micro”, 4) “binary-macro”.
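The four averaging options can be sketched as follows for a single mask's multi-hot attribute vectors. This is an interpretation of the description above (in particular, micro averaging over both binary classes reduces to accuracy), not the authors' evaluation code.

```python
# Sketch of the four per-mask F1 averaging options described in the
# text. y_true / y_pred are multi-hot vectors over the 294 attribute
# classes. Interpretation of the description, not the authors' code.

def _f1(tp, fp, fn):
    """F1 from counts; defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def per_mask_f1(y_true, y_pred, average):
    t = [bool(x) for x in y_true]
    p = [bool(x) for x in y_pred]
    tp = sum(a and b for a, b in zip(t, p))
    fp = sum(b and not a for a, b in zip(t, p))
    fn = sum(a and not b for a, b in zip(t, p))
    tn = len(t) - tp - fp - fn
    if average == "micro":         # global TP/FP/FN over attribute classes
        return _f1(tp, fp, fn)
    if average == "macro":         # per-class F1 (one instance each), mean
        return sum(_f1(int(a and b), int(b and not a), int(a and not b))
                   for a, b in zip(t, p)) / len(t)
    if average == "binary-micro":  # 294 binary decisions, both classes
        return (tp + tn) / len(t)  # micro over {0, 1} reduces to accuracy
    if average == "binary-macro":  # mean of F1(positive) and F1(negative)
        return 0.5 * (_f1(tp, fp, fn) + _f1(tn, fn, fp))
    raise ValueError(average)
```

With roughly 3.7 positives out of 294 per mask, “binary-micro” is dominated by true negatives, while “macro” is dragged down by the many attribute classes absent from both prediction and ground truth, consistent with the trends in Fig. 9.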
As shown in Fig. 9, we present AP^{F1=τF1}_{IoU+F1}, with τIoU averaged over the range [0.5 : 0.05 : 0.95]. τF1 is increased from 0.0 to 1.0 with an increment of 0.01. Fig. 9
Fig. 7: Detailed results (for masks) using Mask R-CNN with a SpineNet-143 backbone. We present the same metrics as the COCO leaderboard for overall categories, three supercategories for apparel objects, and 46 fine-grained apparel categories. We use both constraints (for example, AP_{IoU} and AP_{IoU+F1}) if possible. For categories without attributes, the value represents AP_{IoU} or AR_{IoU}. “top” is short for “top, t-shirt, sweatshirt”. “head acc” is short for “headband, head covering, hair accessory”
[Fig. 8 precision–recall panels: (a) Super category: outerwear; (b) Super category: parts; (c) Super category: accessory; (d) Outerwear: skirt; (e) Parts: pocket; (f) Accessory: tights, stockings]
Fig. 8: Main apparel detectors analysis. Each plot shows 7 precision–recall curves, where each evaluation setting is more permissive than the previous. Specifically, C75: strict IoU (τIoU = 0.75); C50: PASCAL IoU (τIoU = 0.5); Loc: localization errors ignored (τIoU = 0.1); Sim: supercategory False Positives (FPs) removed; Oth: category FPs removed; BG: background (and class confusion) FPs removed; FN: False Negatives removed. The first row (overall-[supercategory]-[size]) contains results for the three supercategories in Fashionpedia; the second row ([supercategory]-[category]-[size]) shows results for three fine-grained categories (one per supercategory). Legends present the area under each curve (corresponding to the AP metric) in brackets as well
Fig. 9: AP^{F1=τF1}_{IoU+F1} score with different τF1. The values presented are averaged over τIoU ∈ [0.5 : 0.05 : 0.95]. We use “binary-macro” as our main metric
illustrates that as the value of τF1 increases, AP^{F1=τF1}_{IoU+F1} decreases at different rates given different choices of F1 score calculation. There are 294 attributes in total, and an average of 3.7 attributes per mask in the Fashionpedia training data. It is not surprising that “binary-micro” produces high F1 scores in general (higher than 0.97), as the AP_{IoU+F1} score only decreases once τF1 ≥ 0.97. On the other hand, “macro” averaging in the multi-label multi-class classification scenario gives extremely low F1 scores (0.01–0.03). This further demonstrates the room for improvement on the localized attribute classification task. We use “binary-macro” as our main metric.
Result visualization. Figure 10 shows that our simple baseline model can detect most of the apparel categories correctly. However, it also produces false positives sometimes. For example, it segments legs as tights and stockings (Figure 10(f)). A possible reason is that both objects have the same shape and stockings are worn on the legs.
Predicting fine-grained attributes, on the other hand, is a more challenging problem for the baseline model. We summarize several issues: (1) predicting more attributes than needed (Figure 10(a), (b), (c)); (2) failing to distinguish among fine-grained attributes: for example, dropped-shoulder sleeve (ground truth) vs. set-in sleeve (predicted) (Figure 10(e)); (3) other false positives: Figure 10(e) has a double-breasted opening, yet the model predicted it as a zip opening.
These results further show that there is room for improvement and for future development of more advanced computer vision models on this instance segmentation with attribute localization task.
Result visualization on other datasets. Other fashion datasets such as ModaNet and DeepFashion2 also contain instance segmentation masks. Aside from the results on Fashionpedia presented in the main paper (see Table 3 of the main paper), we present a qualitative analysis of the segmentation masks generated on Fashionpedia (Fig. 10), ModaNet (Fig. 11(a-f)), and DeepFashion2 (Fig. 11(g-l)). Photos in the first row of Fig. 11 are from ModaNet. They show that the quality of the generated masks on ModaNet is fairly good and generally comparable to Fashionpedia (Fig. 11(a)). We also have a couple of observations about the failure cases: (1) failure to detect apparel objects: for example, the shoe in Fig. 11(c) is not detected, and parts of the pants (Fig. 11(c)) and coat (Fig. 11(d)) are not detected; (2) failure to detect some categories: Fig. 11(e) shows that the shoes on the shoe rack and on the right foot are not detected, possibly due to a lack of such instances in the ModaNet training dataset. Similar to Fashionpedia, ModaNet mostly consists of street-style images (see Fig. 12(b) for example predictions from a model trained on Fashionpedia); (3) close-up images: ModaNet contains mostly full-body images, which is a possible reason for the decreased quality of predicted masks on close-up shots like Fig. 11(f).
For DeepFashion2 (Fig. 11(g,h,k)), the generated segmentation masks tend not to tightly follow the contours of garments in the images. The main reason is possibly that the average number of vertices per polygon is 14.7 for DeepFashion2, which is lower than for Fashionpedia and ModaNet (see Table 2 in the main text). Our qualitative analysis also shows that: 1) the model generates segmentation masks for pants (Fig. 11(i)) and tops (Fig. 11(j)) that are not visible in the images; both are covered by a jacket, and we find that in DeepFashion2, some parts of garments covered by other objects are indeed annotated with segmentation masks; 2) performance is better on objects that are not on a human body (Fig. 11(l)): DeepFashion2 contains many commercial-customer image pairs (images both with and without a human body) in the training dataset, whereas both Fashionpedia and ModaNet contain more images with a human body than without in their training sets.
!"#$% & '()#*#+,$-()&,./-,,0'()#*#+1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,0("$$)5#1
6'( &,%*/2'7)$$)&,%-33)$5*8"/,,/)#+$2&,"9':);$2);2*(,0/)#+$21,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,
&,#)8</*#),$-()&,5'7#=,,,0#)8<1,#)8</*#),$-()&,':"/,0#)8<1,#)8</*#),$-()&,:;#)8<
&,,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,%/"%2,0('8<)$1,,,,,,,,#*8<#"3)&,("$82,0('8<)$1
>"+ &,%*/2'7)$$)&,%-33)$5*8"/,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,
&,,#*8<#"3)&,("$82,0('8<)$1#*8<#"3)&,?)/$,0('8<)$1,,,#*8<#"3)&,%/"%2,0('8<)$1,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1
&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2
@'//"5 &,#*8<#"3)&,%2*5$,08'//"51,,#*8<#"3)&,5)+7/"5,08'//"51
&,#)8</*#),$-()&,5'7#=,0#)8<1#)8</*#),$-()&,':"/,0#)8<1
>)/$ &,%*/2'7)$$)&,%-33)$5*8"/,,%*/2'7)$$)&,5)+7/"5,0.*$1$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1
&,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,,0("$$)5#1,'()#*#+,$-()&,%*#+/),95)"%$)=
&,,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,%2'5$,0/)#+$21,
&,#)8</*#),$-()&,5'7#=,0#)8<1#)8</*#),$-()&,:;#)8<#)8</*#),$-()&,':"/,0#)8<1,#)8</*#),$-()&,%8''(,,0#)8<1
&,/)#+$2&,%2'5$,0/)#+$21,#*8<#"3)&,%)$;*#,%/)):),
A5)%% &,,%*/2'7)$$)&,%-33)$5*8"/,'()#*#+,$-()&,B*(;7(,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1
&,,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,
A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,,0("$$)5#1,
&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,
@'"$ &,,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,
>)/$ &,,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1
@'//"5 &,#*8<#"3)&,%2*5$,08'//"51,#*8<#"3)&,5)+7/"5,08'//"51,
&,#*8<#"3)&,("$82,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1
>"+ &,%*/2'7)$$)&,%-33)$5*8"/,?"*%$/*#)&,2*+2,?"*%$,?"*%$/*#)&,#'53"/,?"*%$,/)#+$2&,3*#*,0/)#+$21,'()#*#+,$-()&,B*(;7(,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1
A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,B*(;7(,
>"+ &,,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1
&,,/)#+$2&,?5*%$;/)#+$2,#*8<#"3)&,%)$;*#,%/)):),
&,#*8<#"3)&,%)$;*#,%/)):),/)#+$2&,?5*%$;/)#+$2,
&,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,!"#!$%"&'$($)*$(+,&-.(/'.0!/1$(+&!"0*($2/")3&%$($(+
A5)%% &,%*/2'7)$$)&,%-33)$5*8"/,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,B*(;7(
6'( &,%*/2'7)$$)&,%-33)$5*8"/,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,/)#+$2&,"9':);$2);2*(,,,0/)#+$21
!"#$% &,%*/2'7)$$)&,%-33)$5*8"/,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,'()#*#+,$-()&,./-,0'()#*#+1,%*/2'7)$$)&,5)+7/"5,0.*$1,
&,#)8</*#),$-()&,5'7#=,0#)8<1,%*/2'7)$$)&,%-33)$5*8"/,#)8</*#),$-()&,':"/,0#)8<1
&,%*/2'7)$$)&,%-33)$5*8"/,%*/2'7)$$)&,$*+2$,0.*$1,/)#+$2&,3"4*,0/)#+$21,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,#*8<#"3)&,/)++*#+%
0"1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,091,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,081,,,,,,,,,,,,,,,,,,,,,,,, 0=1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0)1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.1,,,,,,
!"#$%
6'(C)8</*#)
!'8<)$>"+
D2'),E
D2'),F
D2'),ED2'),E
D2'),E D2'),ED2'),FD2'),F
D2'),F
D2'),F
G/"%%)%
!'8<)$,E
D/)):),
@'//"5
C)8</*#)
>)/$
H"8<)$
6'(
!"#$% >78</)
C)8</*#) D/)):),,FD/)):),,E
A5)%%
>"+>"+
D/)):),,E
D/)):),,F
H"8<)$
@'"$
A5)%%
D/)):),,E D/)):),,FA5)%%
!"#$%
@'"$
D/)):),,I
C)8</*#)
6*+2$%,E 6*+2$%,F
>)/$
@'//"5
!'8<)$
>"+A5)%%
@'"$
!'8<)$,F
6*+2$%,E6*+2$%,E
6*+2$%,F
6*+2$%
C)8</*#)
C)8</*#)
C)8</*#)
C)8</*#)
!'8<)$
!'8<)$,E
!'8<)$
D/)):),
D/)):),E,
D/)):),F,
D/)):),E,
D/)):),F,
D/)):),E,
D/)):),F,
H"8<)$
H"8<)$
6*+2$%
&,#*8<#"3)&,/)++*#+%%*/2'7)$$)&,$*+2$,0.*$1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1
6*+2$%,E
&,#*8<#"3)&,/)++*#+%%*/2'7)$$)&,$*+2$,0.*$1/)#+$2&,3"4*,0/)#+$21$)4$*/),("$$)5#&,(/"*#,,,0("$$)5#1
6*+2$%,F
&,,#*8<#"3)&,("$82,0('8<)$1#*8<#"3)&,?)/$,0('8<)$1,,,#*8<#"3)&,%/"%2,0('8<)$1,#*8<#"3)&,875:)=,0('8<)$1,#*8<#"3)&,./"(,0('8<)$1
!'8<)$,F
&,%*/2'7)$$)&,$*+2$,0.*$1,/)#+$2&,3"4*,0/)#+$21,,,$)4$*/),("$$)5#&,(/"*#,0("$$)5#1,#*8<#"3)&,/)++*#+%'()#*#+,$-()&,./-,,,0'()#*#+1
6*+2$%
Fig. 10: Baseline results on the Fashionpedia validation set. Masks, bounding boxes, and apparel categories (category score > 0.6) are shown. Attributes from the top 10 masks (that contain attributes) in each image are also shown. Correct predictions of objects and attributes are bolded
Generalization to other image domains. For Fashionpedia, we also ran inference on images found on online shopping websites, which usually display a single apparel category, with or without a fashion model. We found that the learned model works reasonably well if the apparel item is worn by a model (Fig. 12).
8.2 Fashionpedia Ontology and Knowledge Graph
Fig. 13 presents our Fashionpedia ontology in detail. Figs. 14 and 15 display the training-data mask counts per category and per attribute. Utilizing the proposed ontology and the image dataset, a large-scale fashion knowledge graph can be constructed to represent the fashion world at the product level. Fig. 16 illustrates a subset of the Fashionpedia knowledge graph.
Apparel graphs. Integrating the main garments, garment parts, attributes, and relationships presented in one outfit ensemble, we can create an apparel graph representation for each outfit in an image. Each apparel graph is a structured representation
!"#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!%#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!&#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ !'#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!(#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!)#$$$$$$
!*#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!+#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!,#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!-#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$!.#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ !/#$$$$$$
!"#
$%&'
(&&)
*$+,-"./
Fig. 11: Baseline results on the ModaNet and DeepFashion2 validation sets
Fig. 12: Generated masks on online-shopping images [53]. (a) and (b) show the same type of shoes in different settings. Our model correctly detects and categorizes the pair of shoes worn by a fashion model, yet mistakenly detects the shoes as a jacket and a bag in (b)
(a) Categories (b) Attributes (partially)
Fig. 13: Apparel categories (a) and fine-grained attributes (b) hierarchy in Fashionpedia
Fig. 14: Mask counts per apparel category in training data. “head acc” is short for “headband, head covering, hair accessory”
of an outfit ensemble, containing certain types of garments. Nodes in the graph represent the main garments, garment parts, and attributes. Main garments and garment parts are linked to their respective attributes through different types of relationships. Figure 17 shows more image examples with apparel graphs.
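As a concrete illustration, such an apparel graph can be stored as (subject, relation, object) triples. The relation names CONTAIN and HAS_ATTRIBUTE follow Fig. 16, while the specific garment and attribute nodes below are hypothetical examples.

```python
# Minimal apparel-graph sketch as (subject, relation, object) triples.
# Relation names follow the Fashionpedia knowledge-graph figure; the
# particular garment and attribute nodes are illustrative.
apparel_graph = [
    ("jacket", "CONTAIN",       "sleeve"),
    ("jacket", "CONTAIN",       "pocket"),
    ("jacket", "HAS_ATTRIBUTE", "symmetrical"),
    ("sleeve", "HAS_ATTRIBUTE", "set-in sleeve"),
    ("sleeve", "HAS_ATTRIBUTE", "wrist-length"),
    ("pocket", "HAS_ATTRIBUTE", "flap (pocket)"),
]

def attributes_of(graph, node):
    """All attribute nodes linked to a main garment or garment part."""
    return [o for s, r, o in graph if s == node and r == "HAS_ATTRIBUTE"]

def parts_of(graph, garment):
    """Garment parts contained in a main garment."""
    return [o for s, r, o in graph if s == garment and r == "CONTAIN"]
```

Querying `parts_of(apparel_graph, "jacket")` yields the sleeve and pocket nodes; taking the union of many such per-outfit graphs yields a single dataset-wide graph.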
Fashionpedia knowledge graph. While apparel graphs are localized representations of certain outfit ensembles in fashion images, we can also create a single Fashionpedia knowledge graph (Figure 16). The Fashionpedia knowledge graph is the union of all apparel graphs and includes all main garments, garment parts, attributes, and relationships in the dataset. In this way, we are able to represent and understand fashion images in a more structured way.
We expect our Fashionpedia knowledge graph and database to be applicable to extending existing knowledge graphs (such as WikiData [45]) with novel domain-specific knowledge, improving underlying fashion product recommendation systems, enhancing search engines' results for fashion visual search, resolving ambiguous fashion-related words in text search, and more.
8.3 Dataset Analysis
Fig. 17 shows more annotation examples, represented as exploded views of annotation diagrams. Table 5 displays details about the “not sure” and “not on the list” results during the attribute annotation process. We present the results per supercategory of attributes. The label “not sure” means the expert annotator is uncertain about the choice given the segmentation mask. “Not on the list” means the annotator is certain that the given mask presents another attribute that is not present in the Fashionpedia ontology. Other than “nicknames” (the specific name for a certain apparel category), less than 6% of the total masks fall into the “not on the list” category.
Figs. 18 and 19 also compare Fashionpedia with other image datasets in terms of image size and vertices per polygon.
[Fig. 15 panel: (a) Apparel attributes (nickname)]
Fig. 15: Mask counts per attribute in training data, grouped by supercategory. Best viewed digitally
[Fig. 15 panels (cont.): (a) length; (b) neckline type; (c) opening type; (d) waistline; (e) textile finishing, manufacturing techniques; (f) textile pattern; (g) silhouette; (h) non-textile material type]
Fig. 15: Mask counts per attribute in training data, grouped by supercategory (cont.). Best viewed digitally
[Fig. 16 content: a subset of the Fashionpedia knowledge graph, with product nodes, category nodes (e.g., Jacket/Blazer, Coat, Shorts, Skirt, Dress, Tops/T-Shirt/Sweatshirt), and attribute nodes linked by CONTAIN and HAS_ATTRIBUTE relationships]
E
HAS_ATTRIBUTE
HAS_ATTRI…
HAS_ATTRIBUTE
HAS_ATTRIBUTE
HAS_ATTRIBUTE
Shorts
Symmetr -ical
Regular (fit)
Washed
Distressed /Ripped
Fly (Opening)
Mini(length)
CONTAINCONTAIN
CONTAIN
CONTAIN CONTAIN
CONT
AIN
CONTAIN
HAS_
ATTR
IBUT
E
HAS_ATTRIBUTE
HAS_ATTR
IBUTE
HAS_ATTRIBUTE
CoatCollarless
Set-in sleeve
Symmet -rical
Wrist-length
Fur
Hip (length)
Apparel CategoriesFine-grained Attributes
Fig. 16: Fashionpedia Knowledge Graph: we present a subset of the Fashionpediaknowledge graph by aggregating 20 annotated products. The knowledge graphcan be used as a tool for generating structural information
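A graph of this kind can be stored as (subject, relation, object) triples. The sketch below is a minimal, hypothetical representation (not the authors' release format), using a handful of toy nodes from the figure:

```python
# Hypothetical triple store for a Fashionpedia-style knowledge graph.
# Relations follow the figure: CONTAIN links a garment to its parts,
# HAS_ATTRIBUTE links a garment or garment part to an attribute.
from collections import defaultdict

triples = [
    ("jacket", "CONTAIN", "sleeve"),
    ("jacket", "CONTAIN", "pocket"),
    ("jacket", "HAS_ATTRIBUTE", "zip-up"),
    ("jacket", "HAS_ATTRIBUTE", "biker (jacket)"),
    ("sleeve", "HAS_ATTRIBUTE", "set-in sleeve"),
]

# Index triples by (subject, relation) for constant-time lookups.
index = defaultdict(list)
for subj, rel, obj in triples:
    index[(subj, rel)].append(obj)

def attributes_of(node):
    """All attributes attached to a node via HAS_ATTRIBUTE."""
    return index[(node, "HAS_ATTRIBUTE")]

def parts_of(garment):
    """All parts a garment CONTAINs."""
    return index[(garment, "CONTAIN")]

print(parts_of("jacket"))       # ['sleeve', 'pocket']
print(attributes_of("sleeve"))  # ['set-in sleeve']
```

Aggregating such triples over many annotated products yields exactly the structural summary shown in the figure.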
We compare image resolutions between Fashionpedia and four other segmentation datasets (COCO and LVIS share the same images). Fig. 18 shows that images in Fashionpedia have the most diverse widths and heights, while ModaNet has the most consistent resolutions. Note that high-resolution images burden the data loading process during training. With that in mind, we will release our dataset in both the resized and the original versions.
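The comparison above only needs the per-image dimensions, which COCO-style annotation files (the format Fashionpedia also adopts) record directly. A small sketch, with a toy dict standing in for a loaded annotation JSON:

```python
# Sketch: collect (width, height) pairs from a COCO-style annotation dict.
# In practice the dict would come from json.load() on an annotation file.
def image_sizes(coco):
    """Return a list of (width, height) tuples, one per image."""
    return [(img["width"], img["height"]) for img in coco["images"]]

# Toy stand-in for a real annotation file:
coco = {"images": [{"id": 1, "width": 682, "height": 1024},
                   {"id": 2, "width": 1024, "height": 683}]}
sizes = image_sizes(coco)
widths, heights = zip(*sizes)
print(min(widths), max(heights))  # 682 1024
```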
We also report the distribution of the number of vertices per polygon in Fig. 19, which measures the effort of mask annotation. Fashionpedia has the second-widest range, next to LVIS.
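In COCO-style annotations each polygon is a flat list [x1, y1, x2, y2, …], so the vertex count is half the list length. A minimal sketch of the statistic, on toy annotations:

```python
# Sketch: count vertices per polygon in COCO-style annotations, where
# "segmentation" holds one or more flat [x1, y1, x2, y2, ...] lists.
def vertices_per_polygon(annotations):
    counts = []
    for ann in annotations:
        for poly in ann["segmentation"]:  # polygon format only, not RLE
            counts.append(len(poly) // 2)
    return counts

# Toy annotations: a quadrilateral, a triangle, and another quadrilateral.
anns = [{"segmentation": [[0, 0, 10, 0, 10, 10, 0, 10]]},
        {"segmentation": [[0, 0, 5, 0, 5, 5], [1, 1, 2, 1, 2, 2, 1, 2]]}]
print(vertices_per_polygon(anns))  # [4, 3, 4]
```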
Table 5: Percentage of attributes in Fashionpedia broken down by super-class. "Tex finish, manu-tech." is short for "Textile finishing, Manufacturing techniques". Summaries of "not sure" and "not on the list" responses during attribute annotation are also presented, calculated as the counts divided by the total number of masks with attributes. "Not sure" is mainly due to occlusion in the images, which makes some super-classes (such as waistline, opening type, and length) unidentifiable. The percentage of "not on the list" is less than 15%, which demonstrates the comprehensiveness of our Fashionpedia ontology.
Super-category          class count   not sure   not on the list
Length                  15            12.79%     0.01%
Nickname                153           9.15%      12.76%
Opening Type            10            32.69%     3.90%
Silhouettes             25            2.90%      0.27%
Tex finish, manu-tech   21            4.47%      1.34%
Textile Pattern         24            2.18%      5.30%
None-Textile Type       14            4.90%      4.07%
Neckline                25            9.57%      3.38%
Waistline               7             30.46%     0.17%
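The percentage computation described in the caption (counts divided by the total number of masks with attributes) can be sketched as follows; the counts here are toy numbers chosen to reproduce two table cells, not the dataset's actual tallies:

```python
# Sketch of the caption's computation: per super-class, the share of
# masks marked "not sure" relative to all masks with attributes.
def not_sure_pct(not_sure_counts, total_masks_with_attributes):
    return {cls: 100.0 * n / total_masks_with_attributes
            for cls, n in not_sure_counts.items()}

# Hypothetical counts picked so the output matches the table's cells.
pct = not_sure_pct({"Waistline": 3046, "Length": 1279}, 10000)
print(pct)  # {'Waistline': 30.46, 'Length': 12.79}
```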
Fashionpedia 25
Fig. 17: Example images and annotations from our dataset: the images are annotated with both instance segmentation masks and fine-grained attributes (black boxes)
26 M. Jia et al.
Fig. 18: Image size comparison among Fashionpedia, ModaNet, DeepFashion2, COCO2017, and LVIS. Only training images are shown. The Fashionpedia images have the most diverse resolutions. Note that COCO2017 and LVIS use higher-resolution images for annotation; the distributions presented here are from the publicly available photos
Fig. 19: The number of vertices per polygon. This represents the quality of the masks and the effort of the annotators. Values on the x-axis (number of vertices per polygon) were discretized for better visual effect; the y-axis (percent of polygons) is on a log scale. Fashionpedia has the second-widest range, next to LVIS
8.4 Fashionpedia dataset creation details
Image collection. To avoid photo bias, all images were randomly collected from Flickr and free-license stock photo websites (Unsplash, Burst by Shopify, Freestocks, Kaboompics, and Pexels). The collected images were further verified manually by two fashion experts. Specifically, they checked the diversity of scenes and made sure the clothing items were visible and annotatable in the images. The estimated image type breakdown is as follows: street style images (30% of the full dataset), celebrity event images (30%), runway show images (30%), and online shopping product images (10%). For gender distribution, 80% of the images depict women and 20% depict men.
Fig. 20: Examples of different types of images (people position/gesture, full/half shot, occlusion, scenes, garment types, etc.) in the Fashionpedia dataset
We also aimed to address photographic bias in the image collection process. Our dataset includes images that are not centered, are not full shots, and contain occlusion (see examples in Fig. 20). Furthermore, our focus during image collection was to identify clothing items, not to identify people.
Crowd workers and 10-day training for mask annotation. So that all annotators share the same apparel vocabulary, we prepared a detailed tutorial (with text descriptions and image examples) for each category and attribute in the Fashionpedia ontology (see Fig. 21 for an example). Before the official annotation process, we spent 10 days training the 28 crowd workers, for the following three main reasons.
First, some apparel categories are commonly referred to by other names. For example, "top" is a general term covering "shirt", "sweater", "t-shirt", and "sweatshirt", so an annotator could mistakenly annotate a "shirt" as a "top". We trained the workers so that they shared the same understanding of the proposed Fashionpedia ontology.
Fig. 21: Annotation tutorial example for shirt and top
Using the prepared tutorials (see Fig. 21 for an example), we trained annotators to distinguish among different apparel categories.
Second, there are fine-grained differences among apparel categories. For example, we observed that some workers initially had difficulty understanding the difference between certain garment parts, such as tassel and fringe. To help them understand these objects, we asked them to practice on and identify additional sample images before the annotation process. Fig. 22 shows our tutorials for these two categories, with both correct and incorrect examples of annotations.
Third, we emphasized annotation quality. In particular, we asked the annotators to follow the contours of garments in the images as closely as possible. The polygon annotation process was monitored and verified for a few days before the workers started the actual annotation.
Quality control of debatable apparel categories. During the annotation process, we allowed annotators to ask questions about uncertain categories. Two fashion
Fig. 22: Annotation tutorial for fringe and tassel
experts monitored the annotation process by answering questions, checking the anno-tation quality, and providing weekly feedback to annotators.
Instead of asking annotators to rate their confidence level for each segmentation mask, we asked them to send all uncertain masks back to us during annotation. The same two fashion experts made the final judgment and gave feedback to the workers on these debatable or unsure fashion categories. Some examples of debatable or fuzzy fashion items that we have documented can be found in Fig. 23.
8.5 Discussion
Does this dataset include the images or labels of previous datasets? We include previous datasets only for comparison. Our dataset doesn't intentionally use any images or labels from previous datasets; all images and labels in Fashionpedia are newly collected and annotated.
Who were the fashion experts annotating localized attributes in the Fashionpedia dataset? The fashion experts are 15 fashion graduate students that we recruited from one of the top fashion design institutes. Due to the double-blind policy, we cannot name the university here, but we will release its name and the names of our collaborators there in the final version of this paper.
Instance segmentation vs. semantic segmentation. We did not conduct semantic segmentation experiments on our dataset for two reasons: 1) Although semantic segmentation is a useful task, we believe instance segmentation is more meaningful for fashion images. For example, to distinguish the different shoe styles in a fashion image containing three pairs of shoes, instance segmentation (Fig. 24(a)) can separate each shoe, whereas semantic segmentation (Fig. 24(b)) mixes all the shoe instances together. 2) Semantic segmentation is a sub-problem of instance segmentation. If we merge the same detected
Fig. 23: Examples of debatable fashion items in the Fashionpedia dataset. The questions were asked by the crowd workers; the answers were provided by two fashion experts
Fig. 24: Instance segmentation (left) and semantic segmentation (right)
object class from our instance segmentation results, it yields the result for semantic segmentation.
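The merge described above can be sketched in a few lines: write each instance's class id into every pixel of its mask, so instances of the same class collapse into one undifferentiated region. A pure-Python toy grid keeps the sketch dependency-free:

```python
# Sketch: collapse per-instance masks into a single semantic map by
# writing each instance's class id into its pixels (0 = background).
def instances_to_semantic(height, width, instances):
    """instances: list of (class_id, set of (row, col) pixels)."""
    semantic = [[0] * width for _ in range(height)]
    for class_id, pixels in instances:
        for r, c in pixels:
            semantic[r][c] = class_id
    return semantic

# Two distinct "shoe" instances (class 7) become one merged region,
# exactly the information loss that motivates instance segmentation.
sem = instances_to_semantic(2, 4, [(7, {(0, 0), (0, 1)}),
                                   (7, {(1, 2), (1, 3)})])
print(sem)  # [[7, 7, 0, 0], [0, 0, 7, 7]]
```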
Which image has the most annotated masks? In the Fashionpedia dataset, the maximum number of segmentation masks in a single image is 74 (Fig. 25). We find that most of these masks belong to "rivets" (garment parts).
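Finding such an image amounts to counting annotations per image id in a COCO-style annotation list; a minimal sketch on toy data:

```python
# Sketch: find the image with the most annotated masks by counting
# COCO-style annotations per image_id.
from collections import Counter

def busiest_image(annotations):
    counts = Counter(ann["image_id"] for ann in annotations)
    return counts.most_common(1)[0]  # (image_id, mask count)

anns = [{"image_id": 1}, {"image_id": 2}, {"image_id": 2}]
print(busiest_image(anns))  # (2, 2)
```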
What's the difference between Fashionpedia and other fine-grained datasets like CUB-200? We propose to localize fine-grained attributes within segmentation masks of images; to the best of our knowledge, this is a novel task with real-world applications. The differences between Fashionpedia and CUB are as follows: 1) CUB uses keypoints as annotations to indicate different locations on birds, while Fashionpedia has segmentation masks of garments, garment parts, and accessories; 2) Fashionpedia attributes are associated with garment or garment-part instances in images, whereas CUB provides global attributes not associated with any specific keypoints.
iMat-Fashion Kaggle challenges. To advance the state of the art in visual analysis of clothing, we hosted two Kaggle challenges (iMaterialist-Fashion) in 2019 (https://www.kaggle.com/c/imaterialist-fashion-2019-FGVC6) and 2020 (https://www.kaggle.com/c/imaterialist-fashion-2020-fgvc7), respectively.
Fig. 25: The image with 74 masks in the Fashionpedia dataset: the original image, the original image with masks, and detailed info on each mask with its associated localized attributes
References

1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: A system for large-scale machine learning. In: OSDI (2016)
2. Attneave, F., Arnoult, M.D.: The quantitative study of shape and pattern perception. Psychological Bulletin (1956)
3. Bloomsbury.com: Fashion photography archive, retrieved May 9, 2019 from https://www.bloomsbury.com/dr/digital-resources/products/fashion-photography-archive/
4. Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., Van Gool, L.: Apparel classification with style. In: ACCV (2012)
5. Du, X., Lin, T.Y., Jin, P., Ghiasi, G., Tan, M., Cui, Y., Le, Q.V., Song, X.: SpineNet: Learning scale-permuted backbone for recognition and localization. In: CVPR (2020)
6. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: CVPR (2009)
7. FashionAI: Retrieved May 9, 2019 from http://fashionai.alibaba.com/
8. Fashionary.org: Fashionpedia - the visual dictionary of fashion design, retrieved May 9, 2019 from https://fashionary.org/products/fashionpedia
9. Ferrari, V., Zisserman, A.: Learning visual attributes. In: Advances in Neural Information Processing Systems (2008)
10. Fu, C.Y., Berg, T.L., Berg, A.C.: IMP: Instance mask projection for high accuracy semantic segmentation of things. arXiv preprint arXiv:1906.06597 (2019)
11. Ge, Y., Zhang, R., Wang, X., Tang, X., Luo, P.: DeepFashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In: CVPR (2019)
12. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
13. Goyal, P., Dollar, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., He, K.: Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)
14. Guo, S., Huang, W., Zhang, X., Srikhanta, P., Cui, Y., Li, Y., Adam, H., Scott, M.R., Belongie, S.: The iMaterialist fashion attribute dataset. In: ICCV Workshops (2019)
15. Gupta, A., Dollar, P., Girshick, R.: LVIS: A dataset for large vocabulary instance segmentation. In: CVPR (2019)
16. Han, X., Wu, Z., Huang, P.X., Zhang, X., Zhu, M., Li, Y., Zhao, Y., Davis, L.S.: Automatic spatially-aware fashion concept discovery. In: ICCV (2017)
17. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
19. He, R., McAuley, J.: Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In: WWW (2016)
20. Hoiem, D., Chodpathumwan, Y., Dai, Q.: Diagnosing error in object detectors. In: ECCV (2012)
21. Hsiao, W.L., Grauman, K.: Learning the latent look: Unsupervised discovery of a style-coherent embedding from fashion images. In: ICCV (2017)
22. Huang, J., Feris, R., Chen, Q., Yan, S.: Cross-domain image retrieval with a dual attribute-aware ranking network. In: ICCV (2015)
23. Inoue, N., Simo-Serra, E., Yamasaki, T., Ishikawa, H.: Multi-label fashion image classification with minimal human supervision. In: ICCV (2017)
24. Kendall, E.F., McGuinness, D.L.: Ontology engineering. Synthesis Lectures on The Semantic Web: Theory and Technology (2019)
25. Kiapour, M.H., Han, X., Lazebnik, S., Berg, A.C., Berg, T.L.: Where to buy it: Matching street clothing photos in online shops. In: ICCV (2015)
26. Kiapour, M.H., Yamaguchi, K., Berg, A.C., Berg, T.L.: Hipster wars: Discovering elements of fashion styles. In: ECCV (2014)
27. Kirillov, A., He, K., Girshick, R., Rother, C., Dollar, P.: Panoptic segmentation. In: CVPR (2019)
28. Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.J., Shamma, D.A., et al.: Visual Genome: Connecting language and vision using crowdsourced dense image annotations. IJCV (2017)
29. Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: ICCV (2009)
30. Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)
31. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: ECCV (2014)
32. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: CVPR (2016)
33. Lopez, A.: fdupes: a program for identifying or deleting duplicate files residing within specified directories, retrieved May 9, 2019 from https://github.com/adrianlopezroche/fdupes
34. Mall, U., Matzen, K., Hariharan, B., Snavely, N., Bala, K.: GeoStyle: Discovering fashion trends and events. In: ICCV (2019)
35. Matzen, K., Bala, K., Snavely, N.: StreetStyle: Exploring world-wide clothing styles from millions of photos. arXiv preprint arXiv:1706.01869 (2017)
36. Parikh, D., Grauman, K.: Relative attributes. In: ICCV (2011)
37. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (2015)
38. Rosch, E.: Cognitive representations of semantic categories. Journal of Experimental Psychology: General (1975)
39. Rubio, A., Yu, L., Simo-Serra, E., Moreno-Noguer, F.: Multi-modal embedding for main product detection in fashion. In: ICCV (2017)
40. Simo-Serra, E., Fidler, S., Moreno-Noguer, F., Urtasun, R.: Neuroaesthetics in fashion: Modeling the perception of fashionability. In: CVPR (2015)
41. Simo-Serra, E., Ishikawa, H.: Fashion style in 128 floats: Joint ranking and classification using weak data for feature extraction. In: CVPR (2016)
42. Takagi, M., Simo-Serra, E., Iizuka, S., Ishikawa, H.: What makes a style: Experimental analysis of fashion prediction. In: ICCV (2017)
43. Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., Belongie, S.: The iNaturalist species classification and detection dataset. In: CVPR (2018)
44. Vittayakorn, S., Yamaguchi, K., Berg, A.C., Berg, T.L.: Runway to realway: Visual analysis of fashion. In: WACV (2015)
45. Vrandecic, D., Krotzsch, M.: Wikidata: a free collaborative knowledge base (2014)
46. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset. Tech. Rep. CNS-TR-2011-001, California Institute of Technology (2011)
47. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2. https://github.com/facebookresearch/detectron2 (2019)
48. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
49. Yamaguchi, K., Berg, T.L., Ortiz, L.E.: Chic or social: Visual popularity analysis in online fashion networks. In: ACM MM (2014)
50. Yamaguchi, K., Kiapour, M.H., Ortiz, L.E., Berg, T.L.: Parsing clothing in fashion photographs. In: CVPR (2012)
51. Yu, A., Grauman, K.: Semantic jitter: Dense supervision for visual comparisons via synthetic images. In: ICCV (2017)
52. Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., Darrell, T.: BDD100K: A diverse driving dataset for heterogeneous multitask learning. In: CVPR (2020)
53. ZARA.com: Zara white leather flat ankle boots with top stitching size 5 BNWT, retrieved May 9, 2019 from https://www.zara.com
54. Zheng, S., Yang, F., Kiapour, M.H., Piramuthu, R.: ModaNet: A large-scale street fashion dataset with polygon annotations. In: ACM MM (2018)
55. Zoph, B., Ghiasi, G., Lin, T.Y., Cui, Y., Liu, H., Cubuk, E.D., Le, Q.V.: Rethinking pre-training and self-training. arXiv preprint arXiv:2006.06882 (2020)