Patricia Harpring, Managing Editor, Getty Vocabulary Program. Discussion of issues and resolutions regarding the Getty Vocabularies entering the LOD cloud, scheduled in increments 2013-2015. Presented at American Art Collaborative meeting, April 2013
Patricia Harpring, Managing Editor
Getty Vocabulary Program
American Art Collaborative meeting
April 2013
© 2013 J. Paul Getty Trust, author: Patricia Harpring .For educational purposes only. Do not distribute.
The Getty Vocabularies are constructed to allow their use in linked data, but to date little linking was done
All four Getty vocabularies are scheduled to be released as LOD in the coming months
CONA is the first Getty vocabulary to actually be linked to the other three Getty vocabularies
Issues in linking
Getty vocabularies comply with national and international standards for vocabulary construction, ISO and NISO
CCO (Cataloging Cultural Objects) and CDWA (Categories for the Description of Works of Art) standards for art information
Map to RDA and DACS (Library and Archives standards) and other standards
Growth of Getty vocabularies relies upon contributions from the expert user community
Getty vocabularies are “social” (contributors are the community) yet “authoritative”
Qualified contributors = repositories of art works, visual resources, art libraries, other experts
Contributions are made in bulk via prescribed XML format
Released in XML and Relational Tables, as annual full releases; updated versions every two weeks via Web Services
We plan to continue the XML and Rels releases even when we have LOD, based on feedback from the existing user community (300-plus license holders)
• Scope includes generic terms for work types, roles, materials, styles, cultures, techniques, attributes, abstract concepts
• Current totals: 36,114 records; 244,665 terms
Recent activity:
• Translations in Spanish, Dutch, Chinese, German, French, Italian, Portuguese
• Contributions from the conservation community organized by Getty Conservation Institute (GCI)
AAT is increasingly multilingual:
• Full translation in Spanish from Centro de Documentación de Bienes Patrimoniales, Chile
• Full translation in Dutch from the Rijksbureau voor Kunsthistorische Documentatie
• Chinese translation by the TELDAP (Taiwan E-Learning and Digital Archives Program) is underway (8,000 terms)
• German translation is being undertaken by the Institut für Museumsforschung in Berlin
• A Portuguese translation will begin soon
• 3,000 French terms from CHIN have been fully integrated; European full French translation is planned
• 3,000 Italian terms from ICCD
www.getty.edu/research/tools/vocabularies/ write to us: [email protected]
• Scope includes cities, nations, empires, archaeological sites, physical features
• 1,241,020 records; 1,799,859 names
Recent activity:
• Contributions from the National Geospatial-Intelligence Agency (NGA, formerly NIMA) and archaeological sites
• Greece, Italy, United Kingdom, India, Mexico, Chile, Egypt, New Zealand, the Netherlands
• Scope includes artists, architects, firms, studios, patrons, sitters; named and anonymous
• 222,851 records; 581,525 names
[Historical note: ULAN was conceived of by, and initiated under the leadership of, Eleanor Fink, today’s moderator. TGN was also born, and the three vocabs were brought together under one roof under her leadership.]
Recent activity:
• Processing contributions (Grove, ARTstor, others)
• ULAN contribution to the Virtual International Authority File. VIAF is a joint project with the Library of Congress and numerous international libraries to combine name authority files into a single name authority service
• Scope includes movable works (e.g., museum objects) and architecture
• CONA is accepting contributions, will grow over time
• The pilot release contains sample records
• 1,011 records; 1,887 titles/names
• These are the basic fields included in most museum records: Catalog Level (item, group, etc.); Object/Work Type; Title or Name; Creator; Creation Date; Measurements; Materials and Techniques; Depicted Subject; Current Location; Repository number for movable works; Sources
• Compliant with CCO and CDWA, standards for best practice
• An OCLC survey of 9 North American museums for CCO compliance (i.e., CONA) discovered that all participating museums collected all of these fields, except subject (collected by only 2). But users strongly wish to retrieve by subject. How to remedy this? Contributing to CONA hopefully can improve this situation.
Default values are available for missing required information. E.g., “unavailable” for measurements.
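As a rough illustration of the default-value idea, the sketch below fills in a default where one is defined and reports the fields that remain empty. The snake_case field keys, the function name, and the choice to treat every listed field as required are assumptions made for the example; only the "unavailable" default for measurements comes from the slide.

```python
# Core fields named on the slide; the snake_case keys are hypothetical.
REQUIRED_FIELDS = [
    "catalog_level", "work_type", "title", "creator", "creation_date",
    "measurements", "materials_techniques", "depicted_subject",
    "current_location", "repository_number", "sources",
]

# Only the "unavailable" default for measurements is taken from the slide.
DEFAULTS = {"measurements": "unavailable"}


def apply_defaults(record):
    """Return (completed_record, still_missing) for one contributed record."""
    completed = dict(record)  # copy so the incoming record is not modified
    still_missing = []
    for field in REQUIRED_FIELDS:
        value = completed.get(field)
        if value is None or str(value).strip() == "":
            if field in DEFAULTS:
                completed[field] = DEFAULTS[field]
            else:
                still_missing.append(field)
    return completed, still_missing


completed, still_missing = apply_defaults({"title": "Drawing", "measurements": ""})
print(completed["measurements"])  # unavailable
print(still_missing)
```

Fields without a defined default are reported rather than guessed, so an editor can decide how to handle them.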
Simple entity relationship diagram
We are linking vocabularies to each other
[Diagram entities: ULAN, TGN, CONA Records, Source Records, AAT, Iconography Authority]
A critical feature that makes the vocabularies useful as authorities is that each vocabulary record is identified by a unique, persistent numeric ID
Terms and controlled lists also each have unique numeric IDs
[Screenshot callouts: subject_id=500013247; term_id=1500207490; nat_code=905040; role_id=31261; TGN subject_id=7006827; subject_id=500115332; rel_type_code=1553]
Another critical feature that makes the vocabularies useful in linking is their existing relationships
Thesaural relationships (AAT is the prototypical thesaurus, but all Getty vocabs are thesauri. The examples here are from ULAN.)
• Equivalence
▪ Sèvres Porcelain Manufactory = Manufacture nationale de Sèvres
• Hierarchical
▪ Sèvres Porcelain Manufactory is broader context for Eloy Brichard company
• Associative
▪ Sèvres Porcelain Manufactory was directed by Robert, Louis-Rémy 1832-1879
Relationships beyond thesaural: Nationality/Culture/Ethnicity; Role; Geographic places; published sources; contributors
Examples are from ULAN. Thesaural and other relationships also exist in TGN, AAT, and CONA
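The three thesaural relationship types can be pictured as typed links between records. The sketch below is only an illustration using the ULAN examples from the slide; the `Relationship` tuple and the `related` helper are hypothetical names, and in the actual data the links are made between numeric IDs rather than display names.

```python
from collections import namedtuple

# One typed link: source record, relationship kind, display phrase, target.
Relationship = namedtuple("Relationship", "source kind phrase target")

SEVRES = "Sèvres Porcelain Manufactory"

RELATIONSHIPS = [
    Relationship(SEVRES, "equivalence", "=", "Manufacture nationale de Sèvres"),
    Relationship(SEVRES, "hierarchical", "is broader context for", "Eloy Brichard company"),
    Relationship(SEVRES, "associative", "was directed by", "Robert, Louis-Rémy"),
]


def related(source, kind):
    """Return the targets linked to `source` by relationships of type `kind`."""
    return [r.target for r in RELATIONSHIPS if r.source == source and r.kind == kind]


print(related(SEVRES, "associative"))
```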
[Joan Cobb, software architect, was unable to attend today]
We plan to publish all Getty vocabularies to the LOD cloud
Implementation project begins July 2013
First phase will focus on publishing vocabulary data as linked data
Subsequent phases will focus on how we use the data (e.g., using it on our own Web sites, collaboration with external sites, harvesting, visualization, etc.)
Current plan: the data will be published as SKOS-extended format under the ODC-BY 1.0 license
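To give a feel for what SKOS-based linked data looks like, here is a minimal sketch that writes a concept as N-Triples lines using standard SKOS properties. The base URI, the concept numbers, and the labels are placeholders invented for the example, not Getty URIs or vocabulary data, and the planned Getty extensions to SKOS are not modelled.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # hypothetical base URI, not a Getty URI


def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept_id, pref_labels, broader_id=None):
    """Return N-Triples lines for one concept with multilingual labels."""
    subject = f"<{BASE}{concept_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    for lang, text in sorted(pref_labels.items()):
        lines.append(f"{subject} <{SKOS}prefLabel> {literal(text, lang)} .")
    if broader_id is not None:
        lines.append(f"{subject} <{SKOS}broader> <{BASE}{broader_id}> .")
    return lines


# Placeholder labels in two languages, with concept 1 as the broader concept.
for line in concept_triples(2, {"en": "example term", "es": "término de ejemplo"}, broader_id=1):
    print(line)
```

The persistent numeric record ID becomes the tail of the concept URI, which is one reason stable IDs matter for linking.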
The majority of the work will be done by our in-house team, but we intend to establish an open community and welcome collaboration
Challenges exist: solutions will be included in the release because these are among the critical features that make our thesauri unique
Multilingual data: we already have terms in over 110 different languages and the list is growing
Sources and contributors at the subject (=record), term, and note levels
We will begin with AAT released as LOD and then move on to TGN, ULAN, and finally CONA, from late 2013 through 2015
The sequence was chosen to take advantage of the way the data is connected: AAT is linked to itself; TGN pulls from AAT; ULAN from AAT and TGN; CONA from all three
We intend to publish our lookup lists (e.g., languages, roles, nationalities, place types, sources) as linked data
Our ontology for AAT is based on SKOS and SKOS-XL
We worked with Marcia Zeng to define the mapping and Pedro Szekely from ISI to develop the ontology
TGN and ULAN will use the same core approach
CONA ontology must be in synch with the other vocabs, but must also be aligned with other projects such as the American Art Collaborative, Europeana, and Arches (= a project of collaboration between GCI & World Monuments Fund to develop an open source system to inventory immovable cultural heritage)
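The release sequence follows from the dependencies between the vocabularies, which can be checked with a topological sort. This is a small sketch using Python's standard `graphlib` module (Python 3.9 or later); the `DEPENDS_ON` table restates the slide, and AAT's link to itself is left out because it does not constrain the order.

```python
from graphlib import TopologicalSorter

# Each vocabulary maps to the set of vocabularies it pulls from.
DEPENDS_ON = {
    "AAT": set(),
    "TGN": {"AAT"},
    "ULAN": {"AAT", "TGN"},
    "CONA": {"AAT", "TGN", "ULAN"},
}


def release_order(depends_on):
    """Return an order in which every vocabulary follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


print(release_order(DEPENDS_ON))  # ['AAT', 'TGN', 'ULAN', 'CONA']
```

With these dependencies the order is forced: each vocabulary depends on all of the ones released before it.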
Many links cannot be made automatically
• Nationality/Culture/Race/Ethnicity in ULAN should be linked to AAT
• Nat list was never actually linked to AAT
• Project to match encounters issues, e.g., no match, ambiguous match
• Must be resolved by hand
[Screenshot: Matching ULAN Nat table to AAT. Callouts: no match; apparent match, but wrong (hessian is a type of burlap); ambiguous match]
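The matching outcomes named on this slide can be sketched as a simple string lookup that sorts each nationality term into one of three buckets. The index below is illustrative only and does not reproduce actual AAT records; the function name and the two "Georgian" senses are invented for the example. The hessian case shows why even a single string match still needs a human check.

```python
# Illustrative index only: lowercase term -> candidate senses (not actual AAT records).
SAMPLE_AAT_INDEX = {
    "hessian": ["hessian (a type of burlap)"],
    "georgian": ["Georgian (hypothetical sense 1)", "Georgian (hypothetical sense 2)"],
}


def classify_match(nationality, aat_index):
    """Classify a nationality term by how many candidate senses share its string."""
    candidates = aat_index.get(nationality.strip().lower(), [])
    if not candidates:
        return "no match", []
    if len(candidates) > 1:
        return "ambiguous match", candidates
    # A single string match can still be the wrong sense, so it is not auto-linked.
    return "apparent match, verify by hand", candidates


for term in ("Hessian", "Georgian", "Atlantean"):
    print(term, classify_match(term, SAMPLE_AAT_INDEX))
```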
Issues in linking CONA
Some matches should be clear and done automatically, e.g., Pierre Koenig as artist on this drawing record, and the corresponding ULAN record.
Example below is display; we actually match on controlled fields
[Screenshot: CONA Editorial System]
Koenig, Pierre (American architect, 1925-2004) 500086520
Since CONA is linked to the other vocabularies, it is necessary to match incoming values to the AAT, ULAN, TGN, and CONA Iconography Authority when CONA records are processed for loading
The CVA/Processor was developed for editors to use if auto-links are not possible
Contribution Validation Application (CVA), software architect Gregg Garcia
• Unresolved auto-matches to other vocabularies are vetted in the CVA/Processor
• Editor may search other vocabularies in CONA CVA
• CVA presents editor with choices for linking
• E.g. below: CONA record has too little info for artist identification to allow an auto-link to Jan Smit. Which one is he? Or maybe none of these, and needs to be added on the spot as a stub in ULAN.
• New stub records for AAT, ULAN, TGN, or IA may be added and linked on the spot, filled in later by editors in other vocabs
• For links with a pattern, editor may write a ‘rule’ for CVA
Issues in linking CONA
• Editor could write a rule in CVA if there is a pattern
• In this case, for this particular contribution of European prints, when the incoming “Place of Publication” contains the value “Amsterdam” we can assume they always mean Amsterdam in the Netherlands, not Amsterdam, Ohio, or any of the other dozens of Amsterdams in the world.
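A rule of this kind can be thought of as a small function scoped to one contribution. The sketch below is a hypothetical illustration and does not reflect how the CVA stores or applies rules; the field key, the `make_rule` helper, and the placeholder target identifier are all assumptions.

```python
# Placeholder for the TGN record of Amsterdam in the Netherlands (not a real ID).
AMSTERDAM_NL = "tgn-id-for-amsterdam-netherlands"


def make_rule(field, contains_value, target_id):
    """Build a rule: if `field` contains `contains_value`, link to `target_id`."""
    needle = contains_value.lower()

    def rule(record):
        value = str(record.get(field) or "")
        return target_id if needle in value.lower() else None

    return rule


# Applies only to this one contribution of European prints.
prints_rule = make_rule("place_of_publication", "Amsterdam", AMSTERDAM_NL)

print(prints_rule({"place_of_publication": "Amsterdam"}))
print(prints_rule({"place_of_publication": "Paris"}))
```

Keeping the rule tied to a single contribution matters: the same assumption applied to another data set could link records to the wrong Amsterdam.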
Caveats in linking data
Observations re. linked data include the following:
(various authors)
In any discipline, LOD (Linked Open Data) is not one cloud, but many, with imperfect links (each cloud has dense internal connections but typically only sparse connections between clouds)
Impact of linkage error is underestimated by developers
The reason why it is hard to avoid linkage error is that humans are needed to manually create linkage or to proof auto links
Homogeneity is required to make accurate links, but such homogeneity does not occur
Even when terminology is standardized, differences in values between corresponding variables cause linking errors (underlying causes: errors created during collection of the data, during the data entry, or there are true changes of meaning or application of a particular value)
Caveats in linking data
Observations re. linked data include the following:
(our observations)
• In linking with CONA contributions, we link automatically where possible
• But for uncertain matches, we link by hand
• Even then, mistakes may be made when the incoming data has incorrect references or links
Caveats in linking data
Auto matches or by hand?
Data is not an exact match
Better to err on side of caution than make an incorrect link
These are the same person, but conflicting data means a human must confirm
Alvarez Algeciras, Germán (Spanish painter, exhibited 1871-1878) 500035166
birth: 1831 death: 1878
Álvarez de Algeciras y Jimenez, German (Spanish artist, 1848-; fl. bef. 1878) 500298289
birth: 1848 death: 1878
- Names not exact match to algorithm
- Matching based on fielded data, here display for ease of illustration
- Painter not = artist, tables allow match if all else matches
- Birth date not match, one estimated other is exact
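The cautious approach described here can be sketched as a field-by-field comparison that auto-links only when nothing conflicts. The dictionary layout, the role-equivalence table, and the function names are assumptions for illustration; the two records use the values shown on the slide.

```python
# Assumed table of role pairs that are allowed to count as a match.
ROLE_EQUIVALENTS = {frozenset({"painter", "artist"})}


def roles_compatible(a, b):
    """Roles match if identical or listed as equivalent in the table."""
    return a == b or frozenset({a, b}) in ROLE_EQUIVALENTS


def compare(incoming, candidate):
    """Return (decision, conflicting_fields) for two fielded person records."""
    conflicts = []
    if incoming["name"] != candidate["name"]:
        conflicts.append("name")
    if not roles_compatible(incoming["role"], candidate["role"]):
        conflicts.append("role")
    for field in ("nationality", "birth", "death"):
        if incoming[field] != candidate[field]:
            conflicts.append(field)
    decision = "auto-link" if not conflicts else "human review"
    return decision, conflicts


record_a = {"name": "Alvarez Algeciras, Germán", "nationality": "Spanish",
            "role": "painter", "birth": 1831, "death": 1878}
record_b = {"name": "Álvarez de Algeciras y Jimenez, German", "nationality": "Spanish",
            "role": "artist", "birth": 1848, "death": 1878}

print(compare(record_a, record_b))  # ('human review', ['name', 'birth'])
```

Any conflict sends the pair to an editor, which trades some manual work for fewer incorrect links.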
• Errors in linking that are in the incoming data may be caught, but in general we must rely upon contributors’ accuracy
• To which person should the link be made in ULAN?
[Screenshot: ULAN record]
• Contributed record was linked to LOC record for travel writer “Lazowski”
• But should be the French revolutionary
Inscribed title: Inauguration du buste de Marat au tombeau qui été élevé pour sa gloire et celle de Lazowski, place de la Réunion a Paris, l'an 2 de la Re p. Franc. une et indivisible /
Introduction to Controlled Vocabularies
Ebook or paperback available at www.getty.edu
Added a section on LOD in the revised edition
Patricia Harpring
Managing Editor
Getty Vocabulary Program
1200 Getty Center Drive
Los Angeles, CA 90049
310/440 6353
[email protected]