D15.1: Report on Thesauri and Taxonomies
Authors: Douglas Tudhope, USW Ceri Binding, USW
Ariadne is funded by the European Commission’s 7th Framework Programme.
The views and opinions expressed in this report are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.
ARIADNE D15.1 Report on Thesauri and Taxonomies (Public)
Version: 1.5 (final), July 2016
Authors: Douglas Tudhope and Ceri Binding (USW)
Contributing partners: Holly Wright (ADS),
Florence Laino (AIAC, L-P Archaeology),
Philipp Gerth, Francesco Mambrini (DAI),
Federico Nurra, Emmanuelle Bryas, Nouvel Blandine, Evelyne Sinigaglia (INRAP, FRANTIQ),
Paul Boon, Hella Hollander, Peter Brewer (KNAW-DANS, University of Arizona),
Evie Monaghan, Louise Kennedy, Anthony Corns (Discovery),
Sara Di Giorgio, Tiziana Scarselli (MIBACT-ICCU),
Esther Jansma (RCE),
with additional contributions from all partners
Quality review
Holly Wright (ADS - Archaeology Data Service)
ARIADNE – Deliverable 15.1: Report on Thesauri and Taxonomies, July 2016
Table of contents
Executive Summary
1 Introduction
1.1 Controlled vocabularies
1.2 ARIADNE partner vocabularies
2 Mapping between thesauri
2.1 Brief description of thesaurus mapping
2.2 Mappings in ARIADNE to support cross search
2.3 Getty Art and Architecture Thesaurus
2.4 Prototype experiment with AAT as hub vocabulary
2.5 Prototype experiment with AAT hierarchical expansion in Elasticsearch
3 Creating mappings for ARIADNE
3.1 Overview of mappings
3.2 Description and reflections on mapping exercise
4 Mappings in the ARIADNE infrastructure
4.1 Mapping enrichment process
4.2 Mappings within the ARIADNE portal
5 Conclusion
6 References
7 Appendix A
8 Appendix B
9 Appendix C
Executive Summary
This deliverable reports on the work of ARIADNE WP15, Task 1: SKOS thesauri and taxonomies. This includes vocabularies, such as thesauri and term lists in different languages used by partners for subject indexing. When searching free text with uncontrolled terms, significant differences can arise from trivial variations in search statements and from differing conceptualisations of a search by users. Different people use different words for the same concept, or employ slightly different concepts. As such, this was a key issue to be addressed within the ARIADNE project, and is a key focus of this report. The issues posed for interoperability and cross search by ARIADNE's multilingual collection of datasets and reports are discussed, along with the use of a controlled vocabulary to reduce ambiguity between terms by various features. The vocabularies most relevant for ARIADNE are also listed and described.
Mapping between vocabularies is a key aspect of semantic interoperability in heterogeneous environments. Mapping between native partner vocabularies can provide a useful mediation platform for ARIADNE cross search, particularly as subject metadata are in different languages. However, the creation of links directly between the items from different vocabularies can quickly become unmanageable as the number of vocabularies increases. Therefore, a hub architecture was adopted, using an intermediate structure onto which the concepts from local vocabularies were mapped. The work on producing mappings is described, together with the incorporation of mappings in the ARIADNE infrastructure, and their use to date in the emerging ARIADNE Portal.
The Getty Art and Architecture Thesaurus (AAT) was chosen as an appropriate hub vocabulary, following a prototype mapping and retrieval exercise involving five ARIADNE vocabularies in three different languages. In another prototype experiment, the implementation of hierarchical expansion techniques was investigated using the Elasticsearch infrastructure adopted for the ARIADNE Portal.
A large scale pilot exercise with one ARIADNE partner was conducted, in order to allow for refinement of the methodology and mapping guidelines after reviewing the results. The first complete mapping exercise was successfully performed by ADS, using a custom linked data vocabulary matching tool developed for the ARIADNE project. Analysis of results from this pilot mapping informed an iteration of the mapping guidelines and the matching tool user interface. Following the review of the pilot mapping exercise, an additional, basic spreadsheet based utility was developed for recording mappings made manually in situations where the source vocabularies were not available as Linked Data. Mappings were conducted by the various content partners from their native vocabularies to the AAT. A summary of mappings with statistics on the SKOS match types employed by the various content partners is discussed. This shows that in almost all cases mappings were successfully established to the AAT. About half were exactMatch, with the other half mostly closeMatch and broadMatch. As expected, only a small number were narrower matches; most partner vocabularies were considered to be reasonably congruent or were more specialized than the AAT. Reflections by partners on the mapping exercise are discussed.
The output from the partner mappings from their source vocabularies to the AAT is transformed to the required format for further processing by the relevant MoRe enrichment services used by the ARIADNE Registry. The enrichment process augments the data imported to the Registry with mapped AAT concepts. These derived subjects in turn make possible concept based search and browsing in the ARIADNE Portal. While the Portal is still evolving at the time of writing, a query on the Portal illustrates how the mappings make possible concept based search across subject metadata in different languages.
1 Introduction
This document is a deliverable (D15.1) of the project ARIADNE - Advanced Research Infrastructure for Archaeological Dataset Networking in Europe that has been funded under the European Community’s Seventh Framework Programme. This deliverable reports on the work of ARIADNE WP15, Task 1: SKOS thesauri and taxonomies. This includes thesauri and term lists in different languages used by partners for subject indexing. Following on from the survey of vocabularies described in D3.3, those most relevant for ARIADNE are identified and augmented by a small number of additional vocabularies. The issues posed for interoperability and cross search by ARIADNE's multilingual collection of datasets and reports are discussed. Linking between vocabularies, following standard mapping relationships, is considered the best practice approach towards multilingual functionality. The work on producing mappings is described, together with the incorporation of mappings in the ARIADNE infrastructure and their use to date in the emerging ARIADNE Portal.
1.1 Controlled vocabularies
Vocabularies are used for control of subject metadata. Other types of metadata can also benefit from vocabulary control, including place names, time periods and personal names. Vocabulary control aims to reduce the ambiguity of natural language (free text) when indexing and retrieving items while searching for information (Svenonius 2000; Tudhope et al. 2006).
Controlled vocabularies consist of terms, that is, words from natural language selected for retrieval purposes. A term can consist of one or more words. In a controlled vocabulary, such as a thesaurus, a term is used to represent a concept (which can have several terms associated with it).
Two features (synonyms and ambiguity) in natural language pose potential problems for retrieval:
a) Different terms (synonyms) can represent the same concept.
b) The same term (homographs) can represent different concepts. This can be a major problem in a mono-lingual system and becomes a significant problem in a multi-lingual collection, such as ARIADNE.
A controlled vocabulary can attempt to reduce ambiguity between terms by various features:
• Defining the scope of terms - how they are to be used within a particular vocabulary.
• Providing a set of synonyms (or effective synonyms for retrieval purposes) for each concept.
• Restricting scope so that terms only have one meaning (and relate to only one concept).
Not all vocabularies provide all three features above. Some are just simple lists of authorised terms (term lists). Controlled vocabularies also provide vocabulary for Knowledge Organization Systems (KOS), which additionally structure their concepts via different types of semantic relationship (such as broader and narrower concepts).
Controlled vocabularies are sometimes contrasted with free text searching, assisted by statistical techniques in automatic indexing and ranking. These are not however exclusive options and different combinations of the two approaches are possible. Controlled vocabularies can be used to augment free text search.
When searching free text with uncontrolled terms, significant differences can arise from trivial variations in search statements and from differing conceptualisations of a search by searchers. Different people use different words for the same concept or employ slightly different concepts. This may not be a problem in casual search. However, in systematic research on a specialized topic, it is undesirable to miss relevant resources.
At the simplest level, a controlled list of terms ensures consistency in searching and indexing, helping to reduce problems arising from synonym and homograph mismatches. At a more complex level, the presentation of concepts in hierarchies and other semantic structures helps the indexer and searcher choose the most appropriate concept for their purposes. Browse-based user interfaces become possible.
A KOS can assist both precision (by allowing specific searching) and recall (by retrieving items described by related concepts or equivalent terms). It also provides potential pathways (for human and machine) that connect a searcher and indexer’s choice of terminology. The more formal specification of logical semantic relationships within an ontology can assist applications where rules are specified about the relationships and logic-based inferencing is appropriate.
The information retrieval thesaurus is designed for retrieval purposes and has a restricted set of relationships (Tudhope and Binding 2016). These relationships are Equivalence (connects a concept to terms that act as effective synonyms), Hierarchical (broader / narrower concepts) and Associative (more loosely related, ‘see also’ concepts). These are defined by an international standard (the recently approved ISO 25964). The equivalence relationship connects a concept with a set of equivalent terms, treated as synonyms for the retrieval situations envisaged by the designers. Either mono or poly hierarchical structures may be employed. Thesauri are usually employed for descriptive indexing purposes and the corresponding search systems. Thesauri can also be used as a query expansion resource or as the basis for auto-complete suggestions in a search user interface, as in the ARIADNE Portal.
1.2 ARIADNE partner vocabularies
The vocabularies themselves vary from a small number of keywords in a pick list for a particular dataset to standard national vocabularies with a large number of concepts. ARIADNE Deliverable 3.1 (Initial report on standards and on the project registry) listed some archaeology-related subject vocabularies (terminology resources) and more details can be found there.
These included elements of the following vocabularies, considered particularly relevant for WP15 purposes:
• Art and Architecture Thesaurus (Getty Research Institute) – a thesaurus used for describing items of art, architecture and material culture
• Pactols Thesaurus (Frantiq) – six multilingual thesauri for describing items on antiquity and archaeology
• Thesaurus of Monument Types (FISH) – thesaurus of monument types by function
• Archaeological Objects Thesaurus (FISH) – thesaurus for recording of archaeological objects in Britain and Ireland over all archaeological periods
• Building Materials Thesaurus (FISH) – thesaurus of materials used in archaeological monuments
• PICO (MiBACT) – cultural heritage thesaurus covering Who/What/Where/When for use in Culturaitalia portal
• ICCD (MiBACT) – pictorial thesaurus for describing archaeological finds
• Referentienetwerk Erfgoed / ABR (RCE - Cultural Heritage Agency of the Netherlands) – contains the structured set of concepts of cultural heritage in the Netherlands
• ARKAS term list (ZRC-SASU) – a list of terms for the definition of archaeological sites in Slovenia
• FEDOLG-R term list (MNM-NÖK) – a list of terms for describing archaeological finds in Hungary
• Museums vocabularies (DAI) – a group of vocabularies for describing museum objects and concepts
• Archaeological Dictionary (DAI) – a multilingual dictionary for archaeological concepts under development

And additionally considered for WP15:
• FASTI term list (AIAC) – set of terms for describing monument types in FASTI Online
• Irish Monuments Vocabulary (NMS) – for describing monument types in Ireland
• Archaeological term list (SND) – a set of terms for describing archaeological objects and monument types in Sweden drawing on national standards
Some of these vocabularies are available online or published as Linked Data in SKOS representation. This allows programmatic access to the vocabulary elements and the use of vocabularies as linking hubs in the web of data. This is further described in the forthcoming D15.2.
2 Mapping between thesauri
2.1 Brief description of thesaurus mapping
Mapping between vocabularies is a key aspect of semantic interoperability in heterogeneous environments, and is particularly important to multi-lingual collections (Tudhope et al. 2006). It can improve both recall (in different languages) and precision (false results may arise from literal string search).
Significant effort is required, however, for useful results; detailed mapping work at the concept level is necessary, requiring a combination of intellectual work and automated assistance. Zeng and Chan (2004) review different methodological approaches to mapping:
a) Derivation/Modeling of a specialised or simpler vocabulary from an existing vocabulary.
b) Translation/Adaptation from an existing vocabulary in a different language.
c) Satellite and Leaf Node Linking of a specialised thesaurus to a large, general thesaurus.
d) Direct Mapping between concepts in different controlled vocabularies, usually with an intellectual review.
e) Co-occurrence mapping between two vocabularies based on their mutual occurrences within the indexing of items within a collection. Co-occurrence mappings are considered looser than direct mapping made by experts.
f) Switching language used as an intermediary. It can be a new system created for the purpose or an existing system.
A switching language is one of the most frequently used approaches. This is the approach adopted by ARIADNE, as described below, where the switching language is described as a “hub” for the ARIADNE metadata connections. See also the discussion in the recent thesaurus standard, ISO 25964-2:2013 section 6 “Structural models for mapping across vocabularies”.
There are also variants and combinations of these mapping approaches in practice. Effective mapping requires some degree of overlap and congruence of purpose in the vocabularies being mapped. Some prominent examples of mapping work are mentioned briefly. OCLC, providers of the Dewey Decimal Classification (DDC), developed various mappings between major vocabularies (both intellectual and statistical co-occurrence mappings) making them available as terminology web services (Vizine-Goetz et al. 2003). The OAI protocol was used to provide access to a vocabulary with mappings, via a browser to human users and through the OAI-PMH web service mechanisms to machines. Both direct mappings and co-occurrence mappings were provided, depending on the situation. The DDC was employed as a switching language in the Renardus FP5 project to support a cross-browsing service for a European academic subject gateway service (Koch et al. 2003).
More recently, the United Nations’ Food and Agriculture Organization (FAO) has devoted considerable resources to its AGROVOC thesaurus, which is a significant element of the VocBench collaborative vocabulary editing and publishing platform and the associated AIMS (Agricultural Information Management
Standards) portal. This has been expressed as Linked Data and there is an extensive mapping programme with (SKOS) mappings established for 13 vocabularies including LCSH (Library of Congress Subject Headings), GEMET (General Multilingual Environmental Thesaurus) and STW (Standard Thesaurus for Economics / Standard Thesaurus für Wirtschaft) (Caracciolo et al., 2013). Mapping services have been a longstanding focus of the German bilingual STW Thesaurus, a structured vocabulary for subject indexing and retrieval of economics literature. This is now based upon a Linked Data architecture (Kempf and Neubert, 2016).
Tim Berners-Lee, creator of the World Wide Web and the concept of Linked Data, has proposed a five star deployment scheme for grading Linked Open Data, which stresses linking to external Linked Open Data resources to achieve full potential. In the context described here, these links take the form of machine readable mappings to a common reference vocabulary.
★ Data made openly available on the web in any format
★★ As above, but in a machine readable structured data format (e.g. Excel)
★★★ As above, but in a non-proprietary structured data format (e.g. XML)
★★★★ As above, but using W3C open standards (e.g. URIs, RDF & SPARQL)
★★★★★ As above, and also linking out to other external LOD
Figure 1: The 5 star deployment scheme for Linked Open Data
Part 2 of the International Thesaurus Standard (ISO 25964-II) aims to facilitate high quality information retrieval across networked resources indexed with different types of vocabularies. It explains how to set up mappings between the concepts in such vocabularies and includes a discussion of the impact of mapping on retrieval. This is an important consideration, particularly when no exact equivalent concept exists, and it is necessary to map to a broader or narrower concept, a partially overlapping concept, or to a (Boolean) combination of concepts. Section 14 of ISO 25964-II discusses techniques for identifying candidate mappings.
Mapping between native partner vocabularies could provide a useful mediation platform for ARIADNE cross search, particularly as subject metadata are in different languages. However, the creation of links directly between the items from different vocabularies can quickly become unmanageable as the number of vocabularies increases. Mapping between more than three vocabularies would be more efficient and scalable using the hub architecture (i.e. switching language), using an intermediate structure onto which the concepts from each local vocabulary may be mapped. A search on a concept originating from one vocabulary can then utilise this mediating structure to route through to concepts originating from other vocabularies, possibly expressed in other languages.
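The scaling argument can be illustrated with a little arithmetic. The following is a minimal sketch, assuming that one "mapping set" is counted per pair of vocabularies (regardless of direction) and that the hub itself is not counted among the local vocabularies; the function names are hypothetical and used only for illustration.

```python
def direct_mapping_sets(vocabularies: int) -> int:
    """Mapping sets needed when every pair of vocabularies is linked directly."""
    return vocabularies * (vocabularies - 1) // 2


def hub_mapping_sets(vocabularies: int) -> int:
    """Mapping sets needed when each local vocabulary is mapped once to the hub."""
    return vocabularies


# Compare the two architectures for a growing number of local vocabularies.
for n in (3, 4, 5, 10):
    print(n, direct_mapping_sets(n), hub_mapping_sets(n))
```

Under these assumptions both approaches need three mapping sets for three vocabularies; beyond that the direct approach grows quadratically while the hub approach grows linearly.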
2.2 Mappings in ARIADNE to support cross search
For subject access, the ACDM ArchaeologicalResource class has two kinds of subject property. The property, native-subject, associates the resource with one or more items from a controlled vocabulary used by the data provider to index the data. However, there are a large number of partner vocabularies in several different languages. Cross search and semantic interoperability is rendered difficult, as there are no semantic links or mappings between the various local vocabularies. Standard ontologies for metadata schemas, such as the CIDOC-CRM, do not cover particular subject vocabularies but expect the ontology to be complemented with the terminology contained in the relevant subject vocabularies for an application domain. Spelling variations or different synonyms for the same concept can result in failure to find relevant results. This problem is exacerbated when subject metadata may be in different languages, which is clearly the case when providing an infrastructure for European archaeology. Not only may useful resources be
missed when searching in a different language from the subject metadata but there is also the problem of false results arising from homographs where the same term has different meanings in different languages. For example, “vessel” has different archaeological meanings in the English language, while “coin” is French for corner, “boot” is German for boat and “monster” is Dutch for sample (very different from the English language meanings of these words).
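The homograph problem can be made concrete with a small sketch. It assumes a toy term index keyed by language and term; the "ex:" concept identifiers are invented for illustration and do not come from any ARIADNE vocabulary.

```python
# Hypothetical term index: (language, term) -> concept identifier.
# The "ex:" identifiers are placeholders, not real vocabulary concepts.
TERM_INDEX = {
    ("en", "coin"): "ex:coin-money",
    ("fr", "coin"): "ex:corner",
    ("en", "boot"): "ex:footwear",
    ("de", "boot"): "ex:boat",
    ("en", "monster"): "ex:monster",
    ("nl", "monster"): "ex:sample",
}


def literal_matches(term):
    """Concepts hit by a plain string search that ignores language."""
    return sorted({c for (_, t), c in TERM_INDEX.items() if t == term.lower()})


def concept_for(term, language):
    """Concept selected when the term is interpreted within one language."""
    return TERM_INDEX.get((language, term.lower()))


print(literal_matches("coin"))    # two unrelated concepts share the string
print(concept_for("coin", "fr"))  # one concept once the language is known
```

A literal string search on "coin" mixes results for two unrelated concepts, whereas a search on the concept identifier does not.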
2.3 Getty Art and Architecture Thesaurus
The Getty Art and Architecture Thesaurus (AAT) is an influential and longstanding, multi-lingual thesaurus used world-wide, with over 40,000 concepts and over 350,000 terms (Harpring, 2016). The AAT has 7 facets (and 33 hierarchies as subdivisions): Associated concepts, Physical attributes, Styles and periods, Agents, Activities, Materials, Objects and optional facets for time and place. The AAT’s scope is broader than archaeology, encompassing fine art, built works, decorative arts, other material culture, visual surrogates, archival materials, archaeology, and conservation. However, it contains much useful, high level archaeological content, particularly in the Built Environment, Materials and Objects hierarchies.
The AAT has a faceted poly-hierarchical structure, containing generic concepts, with labels in multiple languages. It appears to have a good breadth of archaeological coverage to map local vocabularies to, together with clear scope notes defining the scope of usage for each concept. The AAT has recently been made available as Linked Open Data by the Getty Research Institute (Getty Research Institute, 2016b), which fits well with ARIADNE’s strategy for semantic interoperability.
2.4 Prototype experiment with AAT as hub vocabulary
The AAT was chosen as an appropriate hub vocabulary, following a prototype mapping and retrieval exercise involving five ARIADNE vocabularies in three different languages. This is discussed in more detail in Binding and Tudhope (2015). Briefly, a small extract from the published AAT linked data was used as a hub, together with a set of intellectual mappings via consulting the Getty Vocabularies search facility (http://vocab.getty.edu/). For this exercise, the skos:closeMatch relationship was used rather than skos:exactMatch. Mappings were created manually (by USW) for the set of concepts employed in the pilot study. In some cases, partner vocabularies contained more specialised concepts than contained in the AAT. However, it was considered that the skos:broadMatch relationship should be appropriate in these situations, since the use case was cross-search in the ARIADNE Portal, rather than fine grained semantic processing.
In addition, the possibility of query expansion based upon the AAT's hierarchical structure (semantic expansion over the thesaurus hierarchical relationships) was noted. This would open up the possibility in retrieval of matching on terms associated with narrower concepts when querying at a more general level. This would have the potential of improving recall without loss of precision. As part of the pilot, a freely available desktop RDF search facility (SparqlGui, 2016) was employed to query the extract of AAT concepts, combined with the mappings produced for the pilot exercise. Using the query tool, a SPARQL 1.1 query on the AIAC concept fasti:cemetery (see Figure 2) returns results from five different vocabularies with terms in different languages via the AAT semantic structure (see Table 1). This search makes use of the mappings and also the hierarchical query expansion. The results from the pilot exercise were presented and discussed at the ARIADNE session in the Research Infrastructures on Cultural Heritage conference, co-organized in Rome by the ARIADNE project and the Italian Ministry of Culture (MIBAC) in November 2014 (and published in an accompanying ARIADNE booklet). It was decided that they held sufficient promise to proceed with a full mapping exercise, in order to deliver some degree of multilingual capability for the ARIADNE search system in the forthcoming Portal.
Concept identifier                            Concept label
iccd:catacomba                                catacomba
tmt:91386                                     catacomb (funerary)
fasti:catacomb                                Catacomb
iccd:colombario                               colombario
fasti:columbarium                             Columbarium
dai:3736                                      Kolumbarium
dans:6a7482e5-2fd5-48fb-baf4-66ad3d4ed95e     kerkhof
dai:1947                                      Gräberfeld
iccd:necropoli                                necropoli
dai:2485                                      Nekropole
tmt:70053                                     cemetery
tmt:70053                                     necropolis
# SPARQL 1.1 to locate concepts related via AAT to FASTI “cemetery” concept
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX aat: <http://vocab.getty.edu/aat/>
PREFIX fasti: <http://fastionline.org/monumenttype/>
PREFIX iccd: <http://www.iccd.beniculturali.it/monuments/>
PREFIX tmt: <http://purl.org/heritagedata/schemes/eh_tmt2/concepts/>
PREFIX dans: <http://www.rnaproject.org/data/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dai: <http://archwort.dainst.org/thesaurus/de/vocab/?tema=>
SELECT DISTINCT ?concept ?label WHERE {
fasti:cemetery (skos:exactMatch | skos:broadMatch | skos:closeMatch) ?aatconcept .
?aatdescendant gvp:broader+ ?aatconcept .
{
{?concept (skos:exactMatch | skos:broadMatch | skos:closeMatch) ?aatdescendant}
UNION
{?concept (skos:exactMatch | skos:broadMatch | skos:closeMatch) ?aatconcept}
}
OPTIONAL {?concept skos:prefLabel ?label}
}
Figure 2: SPARQL 1.1 query on the semantic framework of AAT plus local vocabulary mappings.
dans:be95a643-da30-40b9-b509-eadfb00610c4     christelijk/joodse begraafplaats
dans:b935f9a9-7456-4669-91d0-2e9c0ff7d664     vlakgrafveld
iccd:cimitero                                 cimitero
dans:abb41cf1-30dc-4d55-8c18-d599ebba1bc2     rijengrafveld
Table 1: Sample extract of the results from the query in Figure 2
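The logic of the Figure 2 query (follow a mapping to the hub, expand to narrower hub concepts, then follow mappings back out to local vocabularies) can be sketched without an RDF store. This is an illustration only: the local identifiers are taken from Table 1, but the "hub:" identifiers, the hub hierarchy and the assignment of each local concept to a hub concept are assumptions made for the example, not the actual AAT concepts or pilot mappings.

```python
# Hypothetical hub hierarchy: narrower hub concept -> broader hub concepts.
BROADER = {
    "hub:catacombs": ["hub:cemeteries"],
    "hub:columbaria": ["hub:cemeteries"],
    "hub:necropolises": ["hub:cemeteries"],
}

# Assumed mappings from local concepts (identifiers as in Table 1) to the hub.
MAPPINGS = {
    "fasti:cemetery": "hub:cemeteries",
    "iccd:cimitero": "hub:cemeteries",
    "tmt:70053": "hub:cemeteries",
    "fasti:catacomb": "hub:catacombs",
    "iccd:catacomba": "hub:catacombs",
    "dai:3736": "hub:columbaria",
    "dai:2485": "hub:necropolises",
}


def descendants(concept):
    """All hub concepts below `concept` (the transitive broader+ step)."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parents in BROADER.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def related_concepts(local_concept):
    """Local concepts mapped to the same hub concept or to any descendant."""
    hub = MAPPINGS[local_concept]
    targets = {hub} | descendants(hub)
    return sorted(c for c, h in MAPPINGS.items() if h in targets)


print(related_concepts("fasti:cemetery"))
print(related_concepts("fasti:catacomb"))
```

As in the SPARQL version, a search starting from the general concept reaches concepts mapped to narrower hub concepts, while a search starting from a specific concept stays specific.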
2.5 Prototype experiment with AAT hierarchical expansion in Elasticsearch
Following an ARIADNE Joint Technical Meeting, it was decided to investigate further how to implement the hierarchical expansion techniques at scale in the context of the Elasticsearch infrastructure adopted for the ARIADNE Portal. Therefore a second prototype experiment with the AAT was conducted using the Elasticsearch platform.
Hierarchical semantic expansion makes use of broader generic (“IS-A”) relationships between concepts in a hierarchically structured knowledge organization system, allowing a search on a particular subject indexing concept to also retrieve any items indexed using concepts that are positioned below that concept within the hierarchical structure.
• aat:300264092 Objects Facet
  • aat:300264551 Furnishings and Equipment (hierarchy name)
    • aat:300036743 Weapons and Ammunition (hierarchy name)
      • aat:300036926 weapons
        • aat:300036973 edged weapons
          • aat:300036982 axes (weapons)
            • aat:300036983 battle axes
Figure 3: full hierarchical ancestry of AAT concept ID 300036983 (battle axes)
Figure 3 illustrates the full hierarchical ancestry for an example AAT concept aat:300036983 (battle axes). Using hierarchical semantic expansion a query on concept aat:300036926 (weapons) should therefore also retrieve items indexed as edged weapons, axes (weapons), battle axes etc.
The prototype experiment demonstrated hierarchical semantic expansion using SPARQL against RDF resources. The Elasticsearch infrastructure used in ARIADNE has functionality referred to as genre expansion (Gormley and Tong, 2015) which should be able to achieve similar results to the SPARQL prototype described in section 2.4. The object of this exercise was therefore to again use the existing poly-hierarchical structure of the AAT, this time to produce configuration data in the format required to implement Elasticsearch genre expansion. We first extracted the AAT broader generic relationships by running the SPARQL query in Figure 4 against the Getty Vocabulary Program SPARQL endpoint (Getty Research Institute, 2016c).
# Extract the poly-hierarchical structure of the AAT
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX aat: <http://vocab.getty.edu/aat/>
CONSTRUCT { ?s gvp:broaderGeneric ?o }
WHERE { ?s skos:inScheme aat: ; gvp:broaderGeneric ?o }
Figure 4: SPARQL query to extract the poly-hierarchical structure of the AAT
The results of this query were downloaded in N-Triples RDF format to produce a local file containing 45,443 RDF triples. The configuration of Elasticsearch genre expansion requires the full ancestry chain of identifiers for each concept to be expressed as textual “rules” containing a comma separated list of identifiers, formatted as shown in Figure 5 (note the full AAT concept URIs have been shortened for illustration purposes):
aat:300264551=>aat:300264551,aat:300264092
aat:300036743=>aat:300036743,aat:300264551,aat:300264092
aat:300036926=>aat:300036926,aat:300036743,aat:300264551,aat:300264092
aat:300036973=>aat:300036973,aat:300036926,aat:300036743,aat:300264551,aat:300264092
(etc.)
Figure 5: Elasticsearch genre expansion rules expressed
The extracted RDF data file resulting from the query in Figure 4 was imported to SparqlGui (SparqlGui, 2016), a desktop tool for performing experimental SPARQL queries on RDF data. The SPARQL query shown in Figure 6 then retrieved the expansion rules data in the format shown in Figure 5, producing a 17MB file, consisting of 41,866 lines of text.
# Produce the ancestry chains required for Elasticsearch genre expansion
PREFIX gvp: <http://vocab.getty.edu/ontology#>
SELECT (concat(str(?uri), "=>", str(?uri), ",", group_concat(?broader; separator=",")) AS ?ancestry)
WHERE {
  ?uri gvp:broaderGeneric+ ?broader .
}
GROUP BY ?uri
Figure 6: SPARQL query to produce the ancestry chains required for Elasticsearch genre expansion rules
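For readers less familiar with SPARQL property paths, the same rule generation can be sketched in plain Python. This is not the tool used in the experiment; it is a minimal illustration that assumes the broader relationships are available as (narrower, broader) pairs, here read off the single chain in Figure 3. The function name is hypothetical.

```python
from collections import defaultdict

# (narrower, broader) pairs read off the ancestry chain in Figure 3.
BROADER_GENERIC = [
    ("aat:300264551", "aat:300264092"),
    ("aat:300036743", "aat:300264551"),
    ("aat:300036926", "aat:300036743"),
    ("aat:300036973", "aat:300036926"),
    ("aat:300036982", "aat:300036973"),
    ("aat:300036983", "aat:300036982"),
]


def ancestry_rules(pairs):
    """Build one 'concept=>concept,ancestor,...' rule per concept with ancestors."""
    parents = defaultdict(list)
    for child, parent in pairs:
        parents[child].append(parent)

    def ancestors(concept):
        # Breadth-first walk upwards; duplicates are skipped so that
        # poly-hierarchies (several broader concepts) are handled once.
        ordered = []
        queue = list(parents.get(concept, []))
        while queue:
            node = queue.pop(0)
            if node not in ordered:
                ordered.append(node)
                queue.extend(parents.get(node, []))
        return ordered

    return [f"{c}=>{','.join([c] + ancestors(c))}" for c in parents]


for rule in ancestry_rules(BROADER_GENERIC):
    print(rule)
```

Concepts without any broader concept (here the Objects Facet) produce no rule, which mirrors the behaviour of the property path in the Figure 6 query.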
Note: This process was split (i.e. extracting a subset of AAT data then querying the extract) only to alleviate potential performance issues, as this is a fairly demanding query. In practice it was found that the Getty SPARQL endpoint does actually support running the Figure 6 query directly, so in hindsight this would simplify the overall process.
The next stage was to incorporate the extracted and formatted data into Elasticsearch and test the genre expansion functionality. A local desktop copy of Elasticsearch was used in conjunction with the “Marvel – Sense” dashboard used for configuring and populating indexes and running experimental queries. The file of AAT genre expansion rules was copied to the /config folder of the Elasticsearch installation, and was then referenced in a synonym filter for a custom analyzer when specifying the settings for initially creating an index, as illustrated in Figure 7.
Figure 7: specifying settings for the AAT genre expansion analyzer and synonym filter
A mapping was then created specifying how to handle values in the dct:subject subject indexing field (note: this was for demonstration and testing purposes; the actual naming of this field would have to be in accordance with the ARIADNE Elasticsearch index structure, as implemented). Note that genre expansion was configured during initial creation of the index and not at query time (see the index_analyzer / search_analyzer configuration settings in Figure 8); otherwise the expansion would run in both broader and narrower directions, leading to incorrect and potentially misleading results. This means that (by default) genre expansion of AAT concept identifiers would always be enabled in search, though possibly some method could be devised to override it within the search parameters and the associated user interface, if that was deemed necessary.
Figure 8: Adding a mapping specifying how to handle the subject indexing field
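A field mapping of the kind described could be sketched as below. The type name `item` and the analyzer name are hypothetical, and the `index_analyzer` / `search_analyzer` parameters follow the Elasticsearch 1.x mapping syntax that the text refers to; later Elasticsearch releases changed these parameter names, so this is a sketch under that assumption.

```python
import json

# Assumed names: document type "item", analyzer "aat_expansion_analyzer".
item_mapping = {
    "item": {
        "properties": {
            "dct:subject": {
                "type": "string",
                # Expansion to broader concepts happens when items are indexed
                "index_analyzer": "aat_expansion_analyzer",
                # Queries are matched verbatim, with no expansion
                "search_analyzer": "keyword",
            }
        }
    }
}

mapping_body = json.dumps(item_mapping, indent=2)
print(mapping_body)
```

Using different analyzers for indexing and searching is what keeps the expansion one-directional: a query on a broad concept matches the expanded tokens of narrower items, but a query on a narrow concept is not itself broadened.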
Some sample data items indexed using the dct:subject field (with various AAT URI identifiers from the example in Figure 2) were created for testing purposes and added to the experimental index, using the commands shown in Figure 9.
Figure 9: Adding some sample items to the index for testing
Testing the item index
Testing was achieved by querying for the items indexed using specific AAT concept URIs. The example query shown in Figure 10 is searching for items indexed using a dct:subject field value of aat:300036926 (weapons). A number of queries were run using different dct:subject values.
Figure 10: Testing the genre expansion by querying for items indexed using specific AAT concepts
The results shown in Table 2 illustrate the effects of genre expansion. A search on aat:300036983 (battle axes) retrieved only the single item indexed using that concept identifier, but a search on aat:300036973 (edged weapons) retrieved items indexed using that concept AND items indexed using any of the descendant concepts, in accordance with the AAT hierarchical structure example in Figure 3.
dct:subject search on AAT concept identifier | ID(s) of the items retrieved
aat:300036983 battle axes | 10
aat:300036982 axes (weapons) | 10 & 11
aat:300036973 edged weapons | 10, 11 & 12
aat:300036926 weapons | 10, 11, 12, 13 & 14
Table 2: Results of searching for specific dct:subject values
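The behaviour in the table can be reproduced with a small in-memory simulation of index-time expansion, sketched below. The ancestry chain is simplified to the four concepts in the table, and the concepts assigned to items 13 and 14 are an assumption (the text does not state how they were indexed); the point is only to show why a broad query retrieves narrowly indexed items but not the reverse.

```python
# Simplified ancestry: each concept lists its broader concepts.
ANCESTORS = {
    "aat:300036983": ["aat:300036982", "aat:300036973", "aat:300036926"],  # battle axes
    "aat:300036982": ["aat:300036973", "aat:300036926"],                   # axes (weapons)
    "aat:300036973": ["aat:300036926"],                                    # edged weapons
    "aat:300036926": [],                                                   # weapons
}

# Item id -> concept used for indexing (items 13 and 14 are assumed).
ITEMS = {
    10: "aat:300036983",
    11: "aat:300036982",
    12: "aat:300036973",
    13: "aat:300036926",
    14: "aat:300036926",
}

def build_index(items):
    """Index each item under its own concept and every broader concept."""
    index = {}
    for item_id, concept in items.items():
        for token in [concept] + ANCESTORS[concept]:
            index.setdefault(token, set()).add(item_id)
    return index

def search(index, concept):
    """Exact lookup with no query-time expansion."""
    return sorted(index.get(concept, set()))

index = build_index(ITEMS)
for concept in ANCESTORS:
    print(concept, search(index, concept))
```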
Use of vocabulary resources
The previous documentation discusses genre expansion directly applied to registry items. A similar approach can be taken to indexing and expanding the ARIADNE vocabulary concept resources themselves. Using the same test index as previously (ariadnedata) and the same analysers, some sample vocabulary concept resources were indexed. First, a new mapping specifying how to handle the concept metadata fields was added (Figure 11).
Figure 11: Adding a mapping specifying how to handle the concept metadata fields
Some sample concept metadata was then created for testing purposes and manually added to the experimental index, using the commands shown in Figure 12. A bulk import process would have to be adopted for importing the actual Getty AAT concept metadata, as it is a large dataset.
Figure 12: Inserting the metadata for some example concepts
Testing the concept index
As the genre expansion analyzer had already been created and configured, we could now perform semantic genre expansion queries directly on the vocabulary concept resources themselves. Note how the query shown in Figure 13 is quite similar to that shown in Figure 10, but this time we are searching the resources under /ariadnedata/concept for a specified dct:identifier value, which in this case is the AAT concept representing “weapons” (see Figure 3).
Figure 13: Query to perform genre expansion on AAT concept 300036926 (“weapons”)
The results of this query are shown in Figure 14. The results include the specified concept AND all hierarchically descendant concepts in accordance with the AAT hierarchical structure (from Figure 3).
Figure 14: Results of Elasticsearch genre expansion query on AAT concept 300036926 ("weapons")
This demonstrates one possible method of implementing the hierarchical semantic expansion of AAT concepts in Elasticsearch. The technique can improve the recall measure of query results without sacrificing precision. The full AAT “expansion rules” data file as produced could be reused in other projects, and the same approach can be easily adapted to other hierarchically structured knowledge organization resources, such as the Getty Thesaurus of Geographic Names.
The two prototype experiments also show the potential of working with the URI identifiers of AAT concepts rather than the ambiguous strings of term labels. Using the URI identifier for the concept avoids the problem of ambiguity, common in multilingual datasets, of terms that are homographs in different languages. Working at the concept level also makes possible hierarchical semantic expansion, making use of the broader generic (“IS-A”) relationships between concepts in a hierarchically structured knowledge organization system, such as the AAT. Thus a search expressed at a general level can (if desired) return results indexed at a more specific level. For example, a search on settlements might also return monastic centres.
3 Creating mappings for ARIADNE
Following the prototype experiment, the next step was to produce the mappings from the subject vocabularies employed to index the various datasets selected for the ARIADNE Catalogue. It was decided to proceed with a large scale pilot exercise with one ARIADNE partner, in order to allow for refinement of the methodology and mapping guidelines after reviewing the results.
The first complete mapping exercise was performed by ADS on SKOSified national heritage vocabularies for England, Scotland and Wales, using a custom linked data vocabulary matching tool developed by USW for the ARIADNE project. For details of the mapping exercise and the tool, see Binding and Tudhope (2015); the forthcoming D15.3 will discuss tools in more detail. Analysis of results from this pilot mapping informed an iteration of the mapping guidelines and the matching tool user interface. For example, it was decided that mapping to AAT Guide Terms (not normally used for indexing) was undesirable for ARIADNE purposes. Also, multiple mappings from the same source concept were only considered useful in certain circumstances. A complete set of mappings was then produced for the subject metadata used in the ADS data imported by the ARIADNE Registry. Examples of mappings from the ADS mapping exercise are shown in Table 3. These were reviewed by a senior archaeologist and the final mappings (after minor fine tuning) were communicated to the ATHENA DCU Registry team as RDF/JSON statements (see section 4). This exercise, together with the guidelines, was reviewed by the USW team. Revisions to the mapping guidelines included recommendations on the appropriate SKOS mapping relationship to employ in different contexts, and on when it is appropriate to specify more than one mapping for a given concept.
Source concept | matchURI | Target concept
DITCHED ENCLOSURE http://purl.org/heritagedata/schemes/eh_tmt2/concepts/70361 | skos:broadMatch | agricultural settlements http://vocab.getty.edu/aat/300008420
CROFT http://purl.org/heritagedata/schemes/eh_tmt2/concepts/68617 | skos:closeMatch | smallholdings http://vocab.getty.edu/aat/300000211
Table 3: Examples from the ADS mapping exercise
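As an illustration of how mappings such as the two ADS examples can be expressed as RDF statements, the sketch below serialises them as N-Triples using the standard SKOS namespace. The serialisation format and the function name are assumptions for illustration; the format actually exchanged within the project is the JSON described in section 4.

```python
# Standard SKOS namespace
SKOS = "http://www.w3.org/2004/02/skos/core#"

# (source concept URI, SKOS mapping property, target AAT concept URI)
mappings = [
    ("http://purl.org/heritagedata/schemes/eh_tmt2/concepts/70361",
     "broadMatch",
     "http://vocab.getty.edu/aat/300008420"),
    ("http://purl.org/heritagedata/schemes/eh_tmt2/concepts/68617",
     "closeMatch",
     "http://vocab.getty.edu/aat/300000211"),
]

def to_ntriples(rows):
    """Serialise each mapping as one N-Triples statement."""
    return "\n".join(f"<{s}> <{SKOS}{p}> <{o}> ." for s, p, o in rows)

triples = to_ntriples(mappings)
print(triples)
```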
The revised guidelines were employed in the mappings of vocabularies from the other partners (see also Appendix C). Following the review of the pilot mapping exercise, an additional, basic spreadsheet based utility was developed for recording mappings made manually in situations where the source vocabularies were not available as Linked Data (see D15.3, forthcoming).
3.1 Overview of mappings
Table 4: Summary of mappings with statistics on match type (as of June 2016). Note – for ADS, ICCU and INRAP the mappings are based on a subset of the source thesaurus terms
Table 4 gives a summary of mappings completed at time of writing. The vocabularies are described in Section 1 and reflections by partners on the mapping exercise are given in Section 3.2.
From the overall statistics we can see that in almost all cases mappings were established to the AAT. Some 50% were skos:exactMatch, with 18% skos:closeMatch and 27% skos:broadMatch. As expected only a small number (5%) were narrower matches – most partner vocabularies were considered to be reasonably congruent or were more specialized than the AAT. However, there were a few exceptions where the AAT was more specialized. The mapping guidelines steered partners away from using skos:relatedMatch but that was found useful by DANS in a very small number of cases when it was considered appropriate to make more than one mapping, perhaps additionally to a related activity (see discussion in Section 3.2). Considering the mapping choices made by individual partners, we can see some difference in the mapping relationships chosen, e.g. a higher proportion of skos:closeMatch in mappings for ADS, DANS (EASY), MNM-NOK, SND, ZRC-SASU. This could variously reflect the nature of the vocabularies involved or the style of the person doing the mapping (e.g. when thought appropriate to assert an exact match). Another factor could be the amount of contextual information available in the form of scope notes etc. – if no information other than the preferred term label is available, then that might be considered a reason to assert a skos:closeMatch relationship rather than skos:exactMatch.
3.2 Description and reflections on mapping exercise
A selection of example reflections on their respective mapping exercises is given below by ARIADNE content provider partners.
ADS
ADS carried out initial evaluations regarding the suitability of the AAT to describe archaeological subjects, to determine whether it was an appropriate thesaurus for the multi-lingual mappings necessary for ARIADNE. Results were very positive during tests using the UK national vocabularies, and it was felt that the AAT was sufficient, although there were some odd areas of extreme detail (e.g. knives) and other areas where there was nothing directly comparable (e.g. human or animal remains). However, there was felt to be a sufficient range of SKOS mapping types available to handle these situations. There was also understood to be a certain amount of subjectivity in mapping choices, even for domain experts, and it was deemed good practice in future to have mappings done by multiple people (essentially creating an authoritative mapping by attribution, or “expert crowdsourcing”).
ADS also carried out the initial mapping exercise to test the matching tool developed by USW, create the mapping to the UK thesauri, and provide an exemplar for other partners. It was determined to be impractical to do complete mappings of every term in the UK thesauri, so all the distinct terms in use by the ADS were mapped instead. This still represented around 1000 terms to be mapped, the majority of which were derived from the English Monument and Type thesaurus. ADS was able to achieve comprehensive coverage of their distinct terms mapped to the AAT. Inevitably there were some broad matches in cases where the granularity of the AAT does not match the more fine-grained detail of the archaeology domain, but it was confirmed that the AAT does give sufficient breadth and depth of domain coverage for some very good matches on all the terms used, despite these being quite diverse – including maritime craft, organic and inorganic materials, objects and monument types. The mapping exercise also clearly showed that purely automated string matching would indeed have been insufficient, and that expert input was necessary (e.g. Alan Williams Turret => field fortifications, lynchet => agricultural land, etc.).
AIAC
Some 130 mappings from the Fasti monument thesaurus to the AAT were provided by USW to Fasti. These were imported via a script, and an interface developed to edit them on the Fasti Admin page. Several more mappings were added with this interface and some minor corrections were made to the mappings from USW. These are now available on the Fasti website as part of the published Fasti concepts via http://www.fastionline.org/concept/attributetype/monument.
By providing a URI it is possible to refer to these thesaurus items in a controlled way, with an explicit reference to the AAT and to the translations to many languages that are available in Fasti. The concepts use both an English ‘human readable’ URI and a numeric URI using identifiers from the Fasti database, to create language independent identifiers in a manner reflecting the AAT URIs. These mappings were used to make sure that the terms in the OAI-PMH XML match the terms used in the ARIADNE Portal for ingestion. It is planned that before the end of the ARIADNE project, these mappings will be made available to the public throughout the Fasti interface so that the concepts are usefully defined. The process of issuing URIs for the concepts used in Fasti will add meaning to the presented data, by linking to existing enriched thesauri.
DAI
The IT infrastructure of the German Archaeological Institute (DAI) contains many different subject specific information systems, e.g. for excavations and surveys (iDAI.field), objects and for publication of data (Arachne), bibliographical information (Zenon) and digitized books (iDAI.bookbrowser). While the places are already centrally structured within the iDAI.gazetteer (http://gazetteer.dainst.org/) and all information systems refer to the gazetteer, each of the systems has its own vocabulary for describing the stored objects. At the moment work is ongoing to harmonize the different DAI thesauri to one common standard in iDAI.vocab (http://archwort.dainst.org/).
For the mapping activities in ARIADNE, the relevant vocabulary categories of the object database Arachne were chosen, as Arachne, in contrast to iDAI.field, contains more than 3.6 million datasets, a large amount of which is openly available. The vocabulary of the following categories was mapped to Getty AAT:
• Topographie (eng. Topography, http://arachne.dainst.org/category/?c=topographie): Arachne’s most granular object unit, which is the superior context for all related classes, which includes landscapes, sites, and part of sites. It is mapped to the ACDM class “sites and monuments” and contains 55 values mapped to Getty AAT from two different value lists.
• Bauwerke (eng. Buildings): This class comprises buildings and monuments, which form a context for single object records and could be part of a larger site. It is mapped to the ACDM class “sites and monuments” and contains 176 values mapped to the Getty AAT from four different value lists.
• Mehrteilige Denkmäler (eng. Multipart monuments): All kinds of groups, which are not buildings or topographic units, are subsumed into multipart monuments, e.g. groups of statues, graveyards, hoards. This class is mapped to the ACDM classes “sites and monuments” or “burials”, depending on the object type, and contains 108 values mapped to the Getty AAT from six different value lists.
• Sammlungen (eng. Collections): Private and museum collections belong to this class. It is mapped to the ACDM class “diverse” and contains 11 values mapped to the Getty AAT from two different value lists.
• Bücher (eng. Books): Digital reproduction, characterization and context of classical study prints from the 16th to 19th century. It is mapped to the ACDM class “textual documents” and contains 17 values mapped to the Getty AAT from three different value lists.
• Inschriften (eng. Inscriptions): This class contains inscriptions and epigraphs depicted on objects. It is mapped to the ACDM class “textual documents” and contains 19 values mapped to Getty AAT from one value list.
DANS
DANS translated the ABR terms into English as a first step towards mapping the DANS EASY Complextypen to the AAT. As DANS discovered, translating a term and understanding the concept it stands for go hand in hand. Associated with this work, DANS translated the terms not only to English, but also to German, French, Italian, Spanish and Czech, with the help of colleagues and volunteers, many of whom had no archaeological background. The process of trying to find translations in different languages helped in better understanding and “pinning down” the concept and thus finding an optimal AAT mapping for it. Besides the websites of the AAT (Getty and the Dutch RKD) and the site of the ABRplus (RCE), DANS also used Wikipedia. Even if a term was not a Wikipedia lemma, DANS could sometimes find it mentioned in a description of an evidently related lemma. Most of the matches found were either skos:closeMatch or skos:broadMatch. Finding the mappings was far from easy, however. Firstly, it was difficult to understand the archaeological concept behind the ABR term when only the term and the hierarchical context were available (without scope notes). Secondly, it was sometimes difficult to understand AAT concepts when they reflected a perspective not specifically archaeological. For example, in some cases, the DANS (EASY) ABR essentially captured the notion of a place where an activity occurred and this had no exact match in the AAT. In these situations, a skos:broadMatch was sometimes generated plus an additional skos:relatedMatch to a corresponding activity, material or object. Future work will consider steps for making use of the Dutch translations in the ARIADNE Portal and in archaeological terminology resources more generally.
The Tree Ring Data Standard (TRiDaS)
TRiDaS (Jansma et al., 2010) was designed collaboratively by dendrochronologists and computer scientists to accurately describe the wealth of data and metadata used in dendrochronological research. The standard supports information produced by all sub-disciplines of dendrochronology, not just archaeological and historical research, facilitating the exchange of data within and between sub-disciplines. Controlled vocabularies within TRiDaS are a key aspect enabling this exchange of data.
TRiDaS provides two mechanisms for describing vocabulary entries. For concepts with limited (<20 terms), relatively static vocabularies there are 'normalTridas' term lists defined within the TRiDaS schema. Examples of this include: dating type; timber shape; measurement method; and measurement variable (see Table 5). These simple lists of terms were devised during the design of the standard itself, with the potential to extend them if necessary when the standard is revised.
normalTridas vocabulary | Description
Dating type | Typically dating in dendrochronology is absolute, however, there are circumstances where this isn't the case. The dating type allows the user to define if the dating is relative or dated with uncertainty, typically using radiocarbon.
Location type | The type of location recorded for dendrochronological samples can be extremely important when interpreting results. For example dendrochronological data can be used for palaeoenvironmental reconstructions, but for these analyses to be valid the growth location of the tree is required. Samples can be taken from trees in their growth location, from items (such as ships) that are inherently mobile, or from items (such as buildings) that are static.
Measuring method | There are a number of methods used for recording dendrochronological measurements depending on the circumstances, each with their pros and cons.
Remark | Observations about individual tree rings can be an extremely useful indicator of environmental change. The TRiDaS remark vocabulary standardises the most common features such as: false rings; missing rings; and frost damage.
Shape | This vocabulary standardises the description of the shape of timbers.
Unit | The unit vocabulary standardises the units for both ring-width measurements and measurements of timbers.
Variable | The typical measurement variable in dendrochronology is the ring-width; however researchers may also record sub-annual measurements (early/late wood), various density metrics, and vessel size. This vocabulary is likely to be revised as novel approaches are developed.
Table 5: Summary of the 'normalTridas' vocabularies used in TRiDaS. These short, simple term lists are defined within the TRiDaS schema and are relatively static
The second and more typical style of vocabulary in TRiDaS is the 'controlledVoc' datatype. This enables users to define links to external vocabularies with a standardised term and identifier. This mechanism was designed into TRiDaS, recognising the rapid development of standard vocabularies that are suitable for use in dendrochronological research.
While the TRiDaS development team intended for the standard to largely use external vocabularies as they become available, they also acknowledged the short-term needs of the dendrochronological community. As such, a vocabulary was developed for use primarily by members of the Digital Collaboratory for Cultural Dendrochronology (DCCD – Jansma et al, 2012) project describing the object/element types used in dendrochronological research. These range from the obvious “tree”, to many items found in the archaeological and cultural record e.g. buildings, barrels, ships, doors, musical instruments, paintings etc.
The object/element vocabulary was written as a multilingual (English, Dutch, French and German) flat-table containing no hierarchical relationships. Terms in one or more languages with no direct translations caused confusion and overlapping concepts. Many of the terms have exact matches with the AAT. However, substantial proportions are specialist terms (especially nautical terms) that have only very generic matches.
During the course of the ARIADNE project the DCCD object / element vocabulary has been substantially reworked. Using bespoke scripts, the redundancy within the flat table has been removed, and basic hierarchical relationships defined. The simple terms list has been converted to a true concepts-based vocabulary with redundant terms assigned as alternate labels. Links to the AAT have been established for all concepts (either exact or broader relationships) and scope notes added.
The majority of the effort required to rework the vocabulary came from content specialists. Combining the specialist knowledge for all subject areas across four languages was painstaking work. Attempts to locate existing software aimed at content rather than informatics specialists were unsuccessful. Such a tool is sorely needed to fully leverage the knowledge of content specialists.
The enhanced vocabulary is currently in the process of being incorporated back into the DCCD repository. The ambiguous nature of the original term list means work is required to cross-map existing records to the new vocabulary, and in some cases this unfortunately requires consultation with the original data providers.
The second substantial vocabulary used in TRiDaS/DCCD is the species taxonomic dictionary. The basis of this vocabulary is the Species 2000 and ITIS Catalogue of Life (http://catalogueoflife.org/). The Catalogue of Life (CoL) forms the taxonomic backbone for many major projects including the Global Biodiversity Information Facility (GBIF), the Encyclopedia of Life (EoL) and the IUCN red list of endangered plants and animals. Annual editions of the CoL have been produced since 2000 with the most recent edition including over 1.6 million species from 158 contributing databases. While the CoL is an incredible resource, it suffers from the drawback that there is no linkage between concepts in each edition. While efforts are underway to produce a true SKOS mapping, this is not yet available. In the interim, TRiDaS/DCCD is using a static subset with the intention of migrating to the dynamic CoL SKOS once released.
ICCD/RA Thesaurus
The issue of multilingualism is a matter that needs to be taken into account, not only because of the variety of national thesauri that are going to be integrated by the ARIADNE initiative, but also for the future creation of common and transnational terminological tools. Linguistic issues often make the direct mapping of a concept via the skos:exactMatch property to the AAT concept difficult. However, other mapping relationships are available. The conceptual mapping between the ICCD RA Thesaurus and AAT has been completed and revised; for this purpose it was decided to manually construct a mapping from the various terms and functions (if any), following in sequence the three main categories of the RA Thesaurus. The work pattern was based on an Excel representation of the thesaurus to which additional columns were added in order to specify:
• The targetLabel and the identifier (targetURI) of the corresponding concept selected in AAT
• The matchURI, which was one of the SKOS mapping properties (skos:closeMatch; skos:exactMatch; skos:broadMatch)
• The name of the institution in charge of the definition of each specific mapping (creator)
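A spreadsheet row with these additional columns could be read and checked along the lines of the sketch below, which uses CSV text in place of the Excel file. The `term` column name, the `ICCU` creator value and the validation rule are illustrative assumptions; the example row uses the cisium mapping from the ICCD/RA examples, and the allowed set also includes skos:narrowMatch because that match type appears in the reported results.

```python
import csv
import io

# Match types accepted by this sketch
ALLOWED = {
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
}

def read_mappings(text):
    """Parse mapping rows and reject any unexpected match type."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        if row["matchURI"] not in ALLOWED:
            raise ValueError(f"unexpected match type: {row['matchURI']}")
    return rows

# Hypothetical export of the spreadsheet: header line plus one mapping
sheet = (
    "term,targetLabel,targetURI,matchURI,creator\n"
    "cisium,two-wheeled carriages,"
    "http://vocab.getty.edu/aat/300215685,skos:broadMatch,ICCU\n"
)

rows = read_mappings(sheet)
print(rows[0]["term"], "->", rows[0]["matchURI"], "->", rows[0]["targetLabel"])
```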
Only a subset of the RA Thesaurus was taken into account to demonstrate the feasibility of these operations. The subset includes 1191 terms related to 10 major categories (highlighted in the original source as "livello_1_categoria") relating to:
● CLOTHING AND ACCESSORIES
● FURNISHING
● TRANSPORTATION
● CONSTRUCTION INDUSTRY
● PAINTING
● ARCHAEOBOTANICAL FINDINGS
● ARCHAEOZOOLOGICAL FINDINGS
● SCULPTURE
● INSTRUMENTS - TOOLS AND OBJECTS OF USE
● GENERAL TERMS
The analysis for finding the corresponding entries in the AAT thesaurus took into account the information provided by scope notes and images accompanying each concept; extensive web searches were performed to find the most appropriate matching term between Italian and English; and terminological research was carried out using different resources to identify synonyms to make the associated targetLabel as unique and as precise as possible. The mapping work also includes a further 113 terms and the COINS category (derived from the "dc:title" element of XML files uploaded to CulturaItalia and delivered to the ARIADNE Portal). In total, the thesaurus includes 11 categories and 1304 terms. The mapping work has identified the following SKOS match types:
• 642 skos:exactMatch;
• 94 skos:closeMatch;
• 310 skos:broadMatch;
• 258 skos:narrowMatch.
The mapping methodology adopted is based on the following three examples of association provided in the table:
Categoria livello 1 | livello 2 | livello 3 | livello 4 termine | targetLabel | AAT ID | matchLabel
Mezzi di trasporto | Terrestri | A trazione animale | cisium | two-wheeled carriages | 300215685 | broad match
Strumenti - Utensili e Oggetti d’uso | Armi e Armature | Armi da difesa | farsetto da armare | arming doublets | 300226824 | close match
Scultura | | | imago clipeata | clipei (portraits) | 300178246 | exact match
Table 6: Examples of mappings between ICCD/RA terms and Getty AAT concepts
On reflection, the most significant activity, from the scientific-methodological point of view, has been the review of the whole process. Having started as a point-by-point check of the “1:1” correspondence between the terms of the two terminology tools (the ICCD/RA thesaurus and the AAT), this review expanded to map the terminological categories of individual entries to the codes referring to the AAT facet and hierarchy. This has made possible:
1. Disambiguating and correcting matches previously selected that were often lexically correct but decontextualised from their original domain;
2. Providing the basis for future matching between different categories of multilingual thesauri.
It is worth emphasising that the focus of the mapping work is the concept of individual terms meant as records entered in a complete hierarchical structure of related terms and notes. Among the results achieved, and highlighted through the mapping between classes, is the high level of correspondence between the ICCD/RA thesaurus entries and the AAT thesaurus record types. Out of 1,191 basic records, 1,164 are linked to “concept” and only 27 to “guide term”. According to the AAT Thesaurus guidelines:
• Concept: Refers to records in the AAT that represent concepts; records for concepts include terms, a note, and bibliography.
• Guide term: Refers to records that serve as place savers to create a level in the hierarchy under which the AAT can collocate related concepts. Guide terms are not used for indexing or cataloguing.
INRAP (FRANTIQ)
DOLIA is the catalogue of the archaeological reports at the French National Institute for Preventive Archaeological Research (Inrap). The DOLIA catalogue was developed with Flora 3.1.0 software, created by Everteam (© Everteam 2015) http://dolia.inrap.fr:8080/flora/jsp/index.jsp. The reports, stored in pdf format, are indexed with native subjects inherited from the Pactols “Sujets / Subjects” thesaurus.
The DOLIA catalogue currently has 1,573 subject metadata terms (5,149 occurrences) drawn from the Pactols thesaurus. The current mapping concerns only the indexed terms from the DOLIA catalogue used in ARIADNE.
The alignment has been done between those terms and the AAT thesaurus by using a source term from Pactols, a source URI, a target term from AAT and a target URI, specifying the SKOS match.
E.g. Pactols: Archéologie http://ark.frantiq.fr/ark:/26678/pcrty05M9SVnLu skos:exactMatch AAT: archaeology http://vocab.getty.edu/aat/300054328
or Pactols: amphore gauloise http://ark.frantiq.fr/ark:/26678/pcrtiUhJYvi7PG skos:broadMatch AAT: amphorae (storage vessels) http://vocab.getty.edu/aat/300148696
Match type | Mappings | Proportion
skos:exactMatch | 1161 | 71%
skos:closeMatch | 121 | 7%
skos:broadMatch | 346 | 21%
skos:narrowMatch | 6 |
Table 7: Result of the alignment
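The proportions in the table follow directly from the mapping counts, as the short calculation below shows. The variable names are illustrative; the counts are those reported in the table, and the narrower matches, for which no proportion is listed, work out at well under 1%.

```python
# Mapping counts per SKOS match type, as reported for the INRAP alignment
counts = {
    "skos:exactMatch": 1161,
    "skos:closeMatch": 121,
    "skos:broadMatch": 346,
    "skos:narrowMatch": 6,
}

total = sum(counts.values())

# Percentage share of each match type
shares = {match: 100 * n / total for match, n in counts.items()}

for match, share in shares.items():
    print(f"{match}: {counts[match]} of {total} ({share:.1f}%)")
```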
A complete mapping of the Pactols Subjects is planned in the next few months.
Irish Monuments Vocabulary
The most detailed classification system available for Irish Monument types is the class list developed by the National Monuments Service (NMS). This is a flat / simple hierarchical list which was used in the classification of sites and monuments that formed part of the Archaeological Survey of Ireland, which was established to compile an inventory of the known archaeological monuments in the State. The information is stored on a database and in a series of paper files that collectively form the ASI Sites and Monuments Record (SMR). Each site / monument has a unique SMR number which greatly facilitates the creation of Linked Data, and each site / monument is given a classification based on the NMS class list. The development of the list was an organic and evolving process and the list is subject to review with amendments being made on an on-going basis.
Irish Monuments Mapping was undertaken by the Discovery Programme in order to map the subject classifications in the NMS list to the Getty AAT. This was done for each term by comparing the scope notes of the NMS class list to the notes field of the AAT Online. This automatically introduces a level of subjectivity which was countered by using an appropriate SKOS mapping property when linking to the target vocabulary (AAT). Where there was any ambiguity about the term, broader mapping properties were always used.
In certain cases where mappings were difficult and could be more closely related to the UK FISH Thesaurus of Monument Types, the Vocabulary Matching Tool developed by USW was first used to identify matching terms, which were in turn mapped to the AAT (i.e. a two stage mapping process).
The nature of the classification list of the NMS presented occasional difficulties:
• Some classifications contained highly detailed elements e.g. object terms were refined at term level by their present location [Cist (present location)] or were developed in order to classify idiosyncratic sites [turf stand; watchman’s hut - burial ground].
• There was greater congruence between the FISH Monument Type vocabulary and the Irish subject terms, enabling greater possibilities to find an exact or close match. In some cases terms had clearly been based on the FISH vocabulary. This was to be expected due to geographical / historical contiguity. For example bullaun stone, for which there are over 1000 currently documented in the ASI, relates more closely to a ‘cup-marked stone’ in FISH but can only be satisfactorily mapped using two (or more) terms in the Getty AAT [ceremonial objects; mortaria].
• Some terms are not clearly defined in the NMS class list [e.g. settlement platform: ‘A raised area, often surrounded by waterlogged or boggy land, which has evidence of former human habitation’] which made mapping, even at a high level, difficult.
• Subject definitions often included broad period classifications within the scope note; it was decided not to take this into consideration as period terms could be covered by the Irish Periods Vocabulary. Occasionally terms contained period terms in their term name (e.g. House - 16th century; House - 16th/17th century) as well as a refining subject element (e.g. House - fortified house). This necessitates the use of the Irish Periods Vocabulary and/or additional terms from the AAT.
• Some classifications were subdivided (but not hierarchically) into more specific elements (e.g. Ringfort - cashel; Ringfort - rath; Ringfort - unclassified). The granularity of the terms was conserved by using the appropriate mapping property, in some cases by mapping terms to multiple terms in the target vocabulary e.g.
§ Ringfort - cashel -> [skos:broadMatch] -> raths
§ Ringfort - cashel -> [skos:broadMatch] -> dry walls (masonry)
The mapping process attempted to balance the pressing need to implement Linked Data with the reality that the available vocabulary was rich in detail, but lacked a structure that was easily reconciled with standard concepts of controlled vocabularies and indexing. This was largely achieved by multiple mappings to the target vocabulary, as well as by utilising an intermediate vocabulary which more closely reflected the particular nuances of Irish monument types.
4 Mappings in the ARIADNE infrastructure
The ARIADNE Catalogue Data Model (ACDM) specifies the metadata schema that underpins the ARIADNE infrastructure (see D12.2: Infrastructure Design). The ACDM is based on the DCAT vocabulary, adding classes and properties needed for describing ARIADNE assets. The ARIADNE Catalogue aggregates metadata, such as descriptions for datasets, metadata schemas, vocabularies, etc. provided by the project partners through metadata file uploads, or the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).1 The metadata and object repository aggregator (MORe)2 (Isaac et al., 2013) has been customized for ARIADNE purposes and is driven by the ACDM. MORe includes a set of micro-services, including various metadata enrichment services. For ARIADNE purposes, a bespoke derived AAT subject enrichment service has been developed by ATHENA DCU that applies the partner vocabulary mappings (in JSON format) to the partner subject metadata and derives an AAT concept (both preferred label and URI) to augment the subject metadata, both in the Registry and also supplied to the ARIADNE Portal.
Forsubjectaccess,theACDMArchaeologicalResourceclasshastwokindsofsubjectproperty.Theproperty,native-subject, associates the resourcewith oneormore items froma controlled vocabulary usedby thedataproviderto indexthedata.Howeverasdiscussed inSection2.2, therearea largenumberofpartnervocabulariesinseveraldifferentlanguages,andcrosssearchisrendereddifficult,astherearenosemanticlinks or mappings between the various local vocabularies. The established solution to this problem is toemploy mapping between the concepts in the different vocabularies. However, as discussed above, thecreationoflinksdirectlybetweentheitemsfromdifferentvocabulariescanquicklybecomeunmanageableasthenumberofvocabulariesincreases.Ascalablesolutiontothismappingproblemistoemploythehubarchitecture, an intermediate structure where concepts from the ARIADNE data provider sourcevocabulariescanbemapped(ISO2013).Intheportal,retrievalbasedonaconceptfromonevocabulary(inasearch or browsing operation) can use the hub to connect to subjectmetadata from other vocabularies,possiblyexpressed inother languages. In theACDM,ariadne-subject isused forsharedconcepts fromthehubvocabulary(theAAT),whichhavebeenderivedviathevariousmappingsfromsourcevocabularies.ThisunderpinstheMOReenrichmentservicesaugmentingthedata importedtotheRegistrywithmappedhubconcepts.ThesederivedsubjectsinturnmakepossibleconceptbasedsearchandbrowsingintheARIADNEPortal. It isthusanticipatedthatthemappingscanformoneofthesteppingstonestowardsamultilingualcapabilityinthePortal.
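The scaling argument for the hub architecture can be made concrete with a simple count. As a sketch, assume n source vocabularies and count one mapping set for each pair of vocabularies that must be linked; the symbols n, M_direct and M_hub are introduced here for illustration only and do not appear in the ACDM or ISO 25964.

```latex
\[
M_{\text{direct}}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
M_{\text{hub}}(n) = n .
\]
```

Under this assumption, ten partner vocabularies would need 45 pairwise mapping sets if linked directly, but only 10 mapping sets when each vocabulary is mapped once to the hub.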
4.1 Mapping enrichment process
The AAT Linked Open Data that forms the basis of the ARIADNE mapping hub vocabulary is expressed in a combination of ontological models including SKOS. The appropriate representation for the mappings is via SKOS mapping properties (see SKOS Mapping Properties). The output from the mapping tools of the partner mappings from their source vocabularies to the AAT is transformed to the required JSON format by USW for communication to the Registry team at DCU, where it is processed by the relevant MORe enrichment services. A brief example of this JSON format is given in Appendix B.
The information from the mapping tool is passed to MORe, which associates it with the provider of the vocabulary. It updates the property derived-subject using the AAT mappings and enriches an ACDM record (see Figure 15), adding a broader term, or a skos:altLabel to correlate a term using the ‘use for’ relationship, or adds multilingual labels (skos:prefLabel and skos:altLabel) in order to facilitate multilingual search.
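The enrichment step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the MORe implementation: the mapping fields follow the JSON format of Appendix B, while the record layout, the keys native_subject and derived_subject, and the function name enrich are assumptions made for the example.

```python
# Hypothetical sketch of subject enrichment; NOT the MORe service itself.
# Mapping fields (sourceURI, targetURI, targetLabel) follow Appendix B;
# the record layout and key names are assumptions for illustration.

def enrich(record, mappings):
    """Return a copy of record with AAT subjects derived from its native subjects."""
    # Index the mappings by source concept URI for direct lookup.
    by_source = {}
    for m in mappings:
        by_source.setdefault(m["sourceURI"], []).append(m)

    derived = []
    for uri in record.get("native_subject", []):
        for m in by_source.get(uri, []):
            entry = {"uri": m["targetURI"], "label": m["targetLabel"]}
            if entry not in derived:  # avoid duplicate hub concepts
                derived.append(entry)

    enriched = dict(record)  # shallow copy; the input record is left unchanged
    enriched["derived_subject"] = derived
    return enriched


mappings = [{
    "sourceURI": "http://www.fastionline.org/concept/attribute/abbey",
    "sourceLabel": "Abbey",
    "matchURI": "http://www.w3.org/2004/02/skos/core#closeMatch",
    "targetURI": "http://vocab.getty.edu/aat/300000642",
    "targetLabel": "abbeys (monasteries)",
}]
record = {
    "title": "Example record",
    "native_subject": ["http://www.fastionline.org/concept/attribute/abbey"],
}
enriched = enrich(record, mappings)
print(enriched["derived_subject"])
```

Keeping both the URI and the preferred label of the hub concept mirrors the description above, where the derived AAT concept is supplied as label and URI.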
1 http://www.openarchives.org/pmh/
2 http://more.dcu.gr/
Figure 15: MORe enrichment
4.2 Mappings within the ARIADNE portal
At the time of writing, development of the ARIADNE Portal and the search functionality is still ongoing with mappings still being imported from some partners. However, it is possible to have a preview at this stage. Figure 16 shows a query on the Portal making use of the mappings. On the main Results screen, a set of filters is available for refining a search following the faceted search paradigm. One filter, currently named Derived Subject, is populated by the MORe enrichment process described in section 4.1; effectively the Derived Subjects are AAT concepts, which have been mapped to the native vocabulary concepts that form the subject metadata of the data resources in the Portal. Figure 16 shows that a simple query on the single AAT (mapped) concept, churches (buildings), is able to retrieve results in multiple languages from AIAC (Fasti), DAI and DANS ARIADNE content providers. Results from ADS are also returned though not shown in this screen dump, which only shows a small number of the overall total results.
Figure 16: Portal Query on AAT mapped subject: churches (buildings) showing results from AIAC (Fasti), DAI and DANS, with multiple languages (June 2016)
In future work, making the mappings (and mapping services) fully available as outcomes in their own right, with appropriate metadata for the mappings would be desirable, as more than one mapping may be produced for large vocabularies. The mappings may also serve to underpin a multilingual capability in an initial string search, by augmenting the language coverage of the AAT.
Full technical documentation about the mappings presented in this report is available for download from the ARIADNE website.
5 Conclusion
This report has reviewed the key vocabularies considered relevant to the ARIADNE project. Mapping between vocabularies has been shown to be a key aspect for concept based search, avoiding the ambiguities posed by literal string search and making possible a multi-lingual search capability. The Getty AAT was selected as a mapping hub vocabulary and partner native vocabularies have been mapped to it using SKOS mapping relationships and bespoke mapping utilities. The mappings have been incorporated into the Registry enrichment process so that partner subject metadata has been augmented by AAT concepts.
The two prototype experiments also show the potential of working with the URI identifiers of AAT concepts rather than the ambiguous strings of term labels. Using the URI identifier for the concept avoids the problem of ambiguity, common in multilingual datasets, for terms that are homographs in different languages. Working at the concept level also makes possible hierarchical semantic expansion, making use of the broader generic (“IS-A”) relationships between concepts in a hierarchically structured knowledge organization system, such as the AAT. Thus a search expressed at a general level can (if desired) return results indexed at a more specific level. For example, a search on settlements might also return monastic centres.
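Hierarchical semantic expansion can be illustrated with a small Python sketch. It is a toy example under stated assumptions: plain labels stand in for concept URIs, and the narrower links in NARROWER and the documents in INDEX are invented for illustration rather than extracted from the AAT or the Portal.

```python
# Toy illustration of hierarchical expansion over "IS-A" links.
# Labels stand in for concept URIs; the hierarchy and the index are
# illustrative assumptions, not an extract of the AAT.

NARROWER = {
    "settlements": ["monastic centres", "villages"],
    "monastic centres": ["abbeys"],
}

INDEX = {
    "doc1": ["monastic centres"],
    "doc2": ["villages"],
    "doc3": ["cemeteries"],
}


def expand(concept, narrower=NARROWER):
    """Return the concept plus all of its transitive narrower concepts."""
    seen = []
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(narrower.get(current, []))
    return seen


def search(concept, index=INDEX):
    """Return documents indexed with the concept or any narrower concept."""
    wanted = set(expand(concept))
    return sorted(doc for doc, subjects in index.items() if wanted & set(subjects))


results = search("settlements")
print(results)  # ['doc1', 'doc2']
```

A query on the general concept retrieves documents indexed only at the more specific level, while an unrelated concept such as cemeteries is not matched.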
An example from the ARIADNE Portal has illustrated the potential for the mappings to assist a query in retrieving results in multiple languages. The mappings have potential to underpin various options in the search functionality and user interface, offering a cost effective route towards different forms of multilingual functionality.
6 References
Aitchison, J., Gilchrist, A., Bawden, D. (2000). Thesaurus construction and use: a practical manual (4th edition). ASLIB, London.
ARIADNE project (2016). Available at: http://www.ariadne-infrastructure.eu/ [Accessed 15 Jun. 2016].
ARIADNE Catalog Data Model (ACDM)
Art and Architecture Thesaurus. J. Paul Getty Trust. http://www.getty.edu/research/conducting_research/vocabularies/aat/index.html [Accessed 15 Jun. 2016]
Berners-Lee, T. Linked Data. Available at: http://www.w3.org/DesignIssues/LinkedData.html
Binding C., Tudhope D. (2016). Improving Interoperability using Vocabulary Linked Data. International Journal on Digital Libraries, 17(1), 5-21. Springer.
Bizer, C., Heath, T., Berners-Lee, T. (2009). Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3): 1–22.
Caracciolo, C., Stellato, A., Rajbahndari, S., Morshed, A., Johannsen, G., Jaques, Y. and Keizer, J. (2012). Thesaurus maintenance, alignment and publication as linked data: the AGROVOC use case. International Journal of Metadata, Semantics and Ontologies, 7(1): 65-75. Inderscience.
Caracciolo, C., Stellato, A., Morshed, A., Johannsen, G., Rajbahndari, S., Jaques, Y. and Keizer, J. (2013). The AGROVOC Linked Dataset. Semantic Web, 4(3): 341-348. IOS Press
Charles, V., Devarenne, C. (2014). Europeana enriches its data with the AAT. EDM case study. Available at: http://pro.europeana.eu/page/europeana-aat [accessed 30/11/2015]
Data Catalog Vocabulary (DCAT) Available at: http://www.w3.org/TR/vocab/dcat/
Getty Research Institute (2016a). Getty Vocabularies [online] Available at: http://www.getty.edu/research/tools/vocabularies/ [Accessed 15 Jun. 2016].
Getty Research Institute (2016b). Getty Vocabularies as Linked Open Data. [online] Available at: http://www.getty.edu/research/tools/vocabularies/lod/ [Accessed 15 Jun. 2016].
Getty Research Institute (2016c). Getty Vocabularies SPARQL endpoint. [online] Available at: http://vocab.getty.edu/sparql/ [Accessed 15 Jun. 2016].
Gormley, C., Tong, Z. (2015). Elasticsearch – The Definitive Guide. Genre Expansion [online] Available at: https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html#synonyms-genres
Harpring, P. (2016). Art and Architecture Thesaurus: Introduction and Overview. http://www.getty.edu/research/tools/vocabularies/aat_in_depth.pdf [Accessed 15 Jun. 2016].
Heritagedata.org. (2016). Linked Data Vocabularies for Cultural Heritage [online] Available at: http://www.heritagedata.org/ [Accessed 15 Jun. 2016].
Isaac, A., Charles, V., Fernie, K., Dallas, C., Gavrilis, D. and Angelis, S. (2013). Achieving Interoperability between the CARARE schema for Monuments and Sites and the Europeana Data Model, in Proceedings of the International Conference on Dublin Core and Metadata Applications, DC-2013. Lisbon, Portugal, 115–125.
ISO 25964-1:2011. Information and documentation - Thesauri and interoperability with other vocabularies - Part 1: Thesauri for information retrieval. Available at: http://www.niso.org/schemas/iso25964/#part1 [accessed 30/11/2015]
ISO 25964-2:2013. Information and documentation - Thesauri and interoperability with other vocabularies - Part 2: Interoperability with other vocabularies. Available at: http://www.niso.org/schemas/iso25964/#part2 [accessed 30/11/2015]
Jansma, E., Brewer, P. and Zandhuis, I. (2010). TRiDaS 1.1: The tree-ring data standard. Dendrochronologia, 28(2), pp. 99-130. Available at: http://dx.doi.org/10.1016/j.dendro.2009.06.009 [accessed 15/06/2016]
Jansma, E., van Lanen, R., Brewer, P. and Kramer, R. (2012). The DCCD: A digital data infrastructure for tree-ring research. Dendrochronologia, 30(4), pp. 249-251. Available at: http://dx.doi.org/10.1016/j.dendro.2011.12.002 [accessed 15/06/2016]
Kempf, A., Neubert, J. (2016). The Role of Thesauri in an Open Web: A Case Study of the STW Thesaurus for Economics. Knowledge Organization, 43(3), 160-173. Ergon Verlag.
Koch, T., Neuroth, H. and Day, M. (2003). Renardus: Cross-browsing European subject gateways via a common classification system (DDC). In: McIlwaine, I.C. (ed.) Subject retrieval in a networked world: proceedings of the IFLA Satellite Meeting held in Dublin, OH, 14-16 August 2001. (UBCIM Publications, New Series, Vol. 25). München: K.G. Saur, 25-33.
SKOS Mapping Properties. Available at: http://www.w3.org/TR/skos-reference/#L4138
SparqlGui (2016) - desktop RDF querying tool. Available at: https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/UserGuide/Tools/SparqlGui
STW Thesaurus for Economics and associated web services. Leibniz Information Centre for Economics. Available at: http://zbw.eu/stw/ [accessed 30/11/2015]
Tudhope D., Koch T., Heery R. (2006). Terminology Services and Technology: JISC state of the art review. Available at: http://www.jisc.ac.uk/media/documents/programmes/capital/terminology_services_and_technology_review_sep_06.pdf [accessed 15/06/2016]
Tudhope, D., Binding, C. (2016). Still Quite Popular After all Those Years - The Continued Relevance of the Information Retrieval Thesaurus. Knowledge Organization, 43(3), 174-179. Ergon Verlag.
Vizine-Goetz, D., Hickey, C., Houghton, A., Thompson, R. (2003). Vocabulary Mapping for Terminology Services. Journal of Digital Information, 4(4), Article No. 272, 2004-03-11. Available at: https://journals.tdl.org/jodi/index.php/jodi/article/view/114/113 [accessed 15/06/2016]
Zeng, M., Chan, L. (2004). Trends and issues in establishing interoperability among knowledge organization systems. Journal of American Society for Information Science and Technology, 55(5): 377-395. Wiley.
7 Appendix A
Concept mappings used for the prototype mapping exercise (Turtle RDF format):
# namespace prefixes
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix aat: <http://vocab.getty.edu/aat/> .
@prefix fasti: <http://fastionline.org/monumenttype/> .
@prefix iccd: <http://www.iccd.beniculturali.it/monuments/> .
@prefix dans: <http://www.rnaproject.org/data/> .
@prefix tmt: <http://purl.org/heritagedata/schemes/eh_tmt2/concepts/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix gvp: <http://vocab.getty.edu/ontology#> .
@prefix dai: <http://archwort.dainst.org/thesaurus/de/vocab/?tema=> .
# ICCD concepts
iccd:catacomba skos:prefLabel "catacomba"@it .
iccd:cenotafio skos:prefLabel "cenotafio"@it .
iccd:cimitero skos:prefLabel "cimitero"@it .
iccd:colombario skos:prefLabel "colombario"@it .
iccd:dolmen skos:prefLabel "dolmen"@it .
iccd:mausoleo skos:prefLabel "mausoleo"@it .
iccd:menhir skos:prefLabel "menhir"@it .
iccd:monumento-funerario skos:prefLabel "monumento funerario"@it .
iccd:necropoli skos:prefLabel "necropoli"@it .
iccd:sepolcreto-rupestre skos:prefLabel "sepolcreto rupestre"@it .
iccd:tomba skos:prefLabel "tomba"@it .
# ICCD -> AAT mappings
iccd:catacomba skos:closeMatch aat:300000367 .
iccd:cenotafio skos:closeMatch aat:300007027 .
iccd:cimitero skos:closeMatch aat:300266755 .
iccd:colombario skos:closeMatch aat:300000370 .
iccd:dolmen skos:closeMatch aat:300005934 .
iccd:mausoleo skos:closeMatch aat:300005891 .
iccd:menhir skos:closeMatch aat:300006985 .
iccd:necropoli skos:closeMatch aat:300000372 .
iccd:sepolcreto-rupestre skos:closeMatch aat:300387008 .
iccd:tomba skos:closeMatch aat:300005926 .
# DANS concepts
dans:8f14ae7e-3d66-4e85-b77c-454a261150e9 skos:prefLabel "begraving"@nl .
dans:e98c8cf0-aa0d-4fcd-99a2-db76cd1d827d skos:prefLabel "begraving, onbepaald"@nl .
dans:87a2f9e9-8e40-4c97-b17b-82275d54c78d skos:prefLabel "brandheuvelveld"@nl .
dans:be95a643-da30-40b9-b509-eadfb00610c4 skos:prefLabel "christelijk/joodse begraafplaats"@nl .
dans:77130cff-58e0-4c6d-b608-33fadc946283 skos:prefLabel "dierengraf"@nl .
dans:df17ef8a-1a58-4c58-ab6f-2e127c90c571 skos:prefLabel "grafheuvel"@nl .
dans:9a729782-ca06-47e1-aa50-87561f36a8ee skos:prefLabel "grafheuvelveld"@nl .
dans:6a7482e5-2fd5-48fb-baf4-66ad3d4ed95e skos:prefLabel "kerkhof"@nl .
dans:e1f67762-c405-42a5-b073-88c13043aab0 skos:prefLabel "megalietgraf"@nl .
dans:abb41cf1-30dc-4d55-8c18-d599ebba1bc2 skos:prefLabel "rijengrafveld"@nl .
dans:74899123-2b00-4e12-83f2-f37bc4f129ff skos:prefLabel "terechtstellingsplaats/galgenberg"@nl .
dans:b98f1315-91c5-411e-b91b-9693e5dfc5c2 skos:prefLabel "urnenveld"@nl .
dans:a156e09c-b40c-45a9-8487-d7b68f8dbae7 skos:prefLabel "vlakgraf"@nl .
dans:b935f9a9-7456-4669-91d0-2e9c0ff7d664 skos:prefLabel "vlakgrafveld"@nl .
# DANS -> AAT mappings
dans:8f14ae7e-3d66-4e85-b77c-454a261150e9 skos:closeMatch aat:300387004 .
dans:e98c8cf0-aa0d-4fcd-99a2-db76cd1d827d skos:closeMatch aat:300387004 .
dans:be95a643-da30-40b9-b509-eadfb00610c4 skos:broadMatch aat:300266755 .
dans:6a7482e5-2fd5-48fb-baf4-66ad3d4ed95e skos:closeMatch aat:300000360 .
dans:abb41cf1-30dc-4d55-8c18-d599ebba1bc2 skos:closeMatch aat:300266755 .
dans:b935f9a9-7456-4669-91d0-2e9c0ff7d664 skos:broadMatch aat:300266755 .
# EH-TMT concepts
tmt:70053 skos:prefLabel "cemetery"@en .
tmt:100531 skos:prefLabel "walled cemetery"@en .
tmt:92672 skos:prefLabel "mixed cemetery"@en .
tmt:70060 skos:prefLabel "inhumation cemetery"@en .
tmt:70056 skos:prefLabel "cremation cemetery"@en .
tmt:70055 skos:prefLabel "cairn cemetery"@en .
tmt:70054 skos:prefLabel "barrow cemetery"@en .
tmt:91386 skos:prefLabel "catacomb (funerary)"@en .
tmt:70053 skos:prefLabel "necropolis"@en .
# EH-TMT -> AAT mappings
tmt:70053 skos:closeMatch aat:300266755 .
tmt:100531 skos:broadMatch aat:300266755 .
tmt:92672 skos:broadMatch aat:300266755 .
tmt:70060 skos:broadMatch aat:300266755 .
tmt:70056 skos:broadMatch aat:300266755 .
tmt:70055 skos:broadMatch aat:300266755 .
tmt:70054 skos:broadMatch aat:300266755 .
tmt:91386 skos:closeMatch aat:300000367 .
tmt:70053 skos:closeMatch aat:300000372 .
# FASTI concepts
fasti:burial skos:prefLabel "Burial"@en .
fasti:catacomb skos:prefLabel "Catacomb"@en .
fasti:cemetery skos:prefLabel "Cemetery"@en .
fasti:columbarium skos:prefLabel "Columbarium"@en .
fasti:mausoleum skos:prefLabel "Mausoleum"@en .
# FASTI -> AAT mappings
fasti:burial skos:closeMatch aat:300387004 .
fasti:catacomb skos:closeMatch aat:300000367 .
fasti:cemetery skos:closeMatch aat:300266755 .
fasti:columbarium skos:closeMatch aat:300000370 .
fasti:mausoleum skos:closeMatch aat:300005891, aat:300263068 .
# DAI concepts
dai:1819 skos:prefLabel "Friedhof"@de . # cemetery
dai:1947 skos:prefLabel "Gräberfeld"@de . # graveyard
dai:3736 skos:prefLabel "Kolumbarium"@de . # columbarium
dai:2485 skos:prefLabel "Nekropole"@de . # necropolis
# DAI -> AAT mappings
dai:1819 skos:closeMatch aat:300266755 .
dai:1947 skos:closeMatch aat:300000360 .
dai:3736 skos:closeMatch aat:300000370 .
dai:2485 skos:closeMatch aat:300000372 .
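To show how mappings of this kind support cross search through the hub, the following Python sketch groups a handful of the statements above by their AAT target and looks up concepts from other vocabularies that share a hub concept. The triples are copied from this appendix as prefixed names; the function names and the treatment of close and broad matches as equally usable links are simplifying assumptions, not the prototype's actual logic.

```python
from collections import defaultdict

# A small subset of the Appendix A mappings as (source, match type, AAT target).
MAPPINGS = [
    ("iccd:cimitero", "skos:closeMatch", "aat:300266755"),
    ("tmt:70053", "skos:closeMatch", "aat:300266755"),
    ("tmt:70060", "skos:broadMatch", "aat:300266755"),
    ("fasti:cemetery", "skos:closeMatch", "aat:300266755"),
    ("dai:1819", "skos:closeMatch", "aat:300266755"),
    ("iccd:catacomba", "skos:closeMatch", "aat:300000367"),
    ("fasti:catacomb", "skos:closeMatch", "aat:300000367"),
]


def index_by_hub(mappings):
    """Group source concepts under the hub (AAT) concept they map to."""
    hub = defaultdict(list)
    for source, _match, target in mappings:
        hub[target].append(source)
    return dict(hub)


def equivalents(source, mappings):
    """Concepts from other vocabularies that share a hub concept with source."""
    hub = index_by_hub(mappings)
    own_prefix = source.split(":")[0]
    found = []
    for s, _match, target in mappings:
        if s != source:
            continue
        for other in hub[target]:
            if other.split(":")[0] != own_prefix and other not in found:
                found.append(other)
    return found


linked = equivalents("dai:1819", MAPPINGS)
print(linked)  # ['iccd:cimitero', 'tmt:70053', 'tmt:70060', 'fasti:cemetery']
```

Starting from the German concept dai:1819 (Friedhof), the hub concept aat:300266755 connects to Italian and English concepts without any direct mapping between the source vocabularies.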
8 Appendix B
Example of the JSON exchange format for communicating the mappings to the ARIADNE Registry team, using the mappings of three FASTI (AIAC) concepts to the AAT
[
  {
    "created": "2015-11-20T15:27:13.342Z",
    "sourceURI": "http://www.fastionline.org/concept/attribute/abbey",
    "sourceLabel": "Abbey",
    "matchURI": "http://www.w3.org/2004/02/skos/core#closeMatch",
    "targetURI": "http://vocab.getty.edu/aat/300000642",
    "targetLabel": "abbeys (monasteries)"
  },
  {
    "created": "2015-11-20T15:27:13.342Z",
    "sourceURI": "http://www.fastionline.org/concept/attribute/amphitheatre",
    "sourceLabel": "Amphitheatre",
    "matchURI": "http://www.w3.org/2004/02/skos/core#exactMatch",
    "targetURI": "http://vocab.getty.edu/aat/300007128",
    "targetLabel": "amphitheaters (built works)"
  },
  {
    "created": "2015-11-20T15:27:13.342Z",
    "sourceURI": "http://www.fastionline.org/concept/attribute/ancient_beach",
    "sourceLabel": "Ancient beach",
    "matchURI": "http://www.w3.org/2004/02/skos/core#broadMatch",
    "targetURI": "http://vocab.getty.edu/aat/300008816",
    "targetLabel": "beaches"
  }
]
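Because each record carries full URIs for source, match type and target, it corresponds directly to one RDF statement. The Python sketch below illustrates this by turning records in the above format into N-Triples lines; it is an assumed, simplified conversion for illustration (the function name to_ntriples is hypothetical) and not the transformation actually used by USW or the Registry team.

```python
import json

# One record in the Appendix B exchange format.
EXAMPLE = """
[
  {
    "created": "2015-11-20T15:27:13.342Z",
    "sourceURI": "http://www.fastionline.org/concept/attribute/amphitheatre",
    "sourceLabel": "Amphitheatre",
    "matchURI": "http://www.w3.org/2004/02/skos/core#exactMatch",
    "targetURI": "http://vocab.getty.edu/aat/300007128",
    "targetLabel": "amphitheaters (built works)"
  }
]
"""


def to_ntriples(json_text):
    """Convert mapping records to N-Triples lines: <source> <match> <target> ."""
    lines = []
    for m in json.loads(json_text):
        lines.append("<{}> <{}> <{}> .".format(
            m["sourceURI"], m["matchURI"], m["targetURI"]))
    return lines


triples = to_ntriples(EXAMPLE)
for line in triples:
    print(line)
```

The labels and creation date are not needed for the mapping statement itself; they remain useful for human inspection and for the enrichment service, which supplies the target label alongside the URI.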
9 Appendix C
Extract from mapping guidelines
This document should be read in conjunction with Mapping-Template.xlsx. This document describes the columns in the spreadsheet template used for mapping partner source vocabularies (thesauri) to the Getty AAT (Art and Architecture Thesaurus), as part of the Subject access strategy for ARIADNE. The mappings will inform cross search for resource discovery in the ARIADNE Portal.
The mapping exercise matches concepts in a Partner vocabulary with concepts in the AAT using SKOS mapping relations (e.g. skos:broadMatch, http://www.w3.org/TR/skos-reference/#mapping). The document also contains guidelines for making the mappings.
The Mapping Template is an alternative to the (USW) Vocabulary Matching Tool, which requires that the source vocabulary be available as Linked Data. The Mapping Template allows mappings to be made by partners' own methods (e.g. using AAT and source vocabulary webpages, or some other tool) and represented in a spreadsheet. A separate spreadsheet should be produced for each partner vocabulary mapped to the AAT. The standard column names in the Mapping Template should be followed. This will allow a subsequent automatic transformation by USW to the RDF statements, employed by the Registry and Portal.
The first tab in a partner mapping spreadsheet (Mapping-Template-partner-source.xlsx) should contain metadata and any necessary description of the mapping exercise. This can inform a subsequent VoID metadata description of the mapping. The metadata should include the following items using the first and second columns (please substitute the Name of Source Vocabulary for XXX):-
dcterms:creator        Name of organisation doing the mapping
dcterms:created        Date of creation (one date representing a complete mapping exercise)
dcterms:modified       Date of last modification
dcterms:title          SKOS Mapping between concepts in source (XXX) and target (AAT) vocabularies using SKOS mapping properties.
void:subjectsTarget    URI of source vocabulary if known (e.g. http://purl.org/heritagedata/schemes/eh_tmt2)
void:objectsTarget     URI of target vocabulary (for ARIADNE this will be http://vocab.getty.edu/aat/)
dcterms:description    An intellectual matching made for ARIADNE from the source vocabulary XXX to the Getty AAT for resource discovery cross search purposes. Include here any details of method (hopefully with expert review)
dcterms:license        The Rights appropriate for Partner and ARIADNE, e.g. perhaps CC0 or CC_BY/3.0
The second tab should hold the mapping using the column names below (one spreadsheet for each different source vocabulary). A different mapping is specified in each row. The following column names in bold are mandatory (necessary for expressing the resulting RDF statements).
sourceLabel    (the preferred term or label for the concept)
sourceURI      (use URI if it exists, otherwise unique concept ID, otherwise prefLabel again)
matchURI       (skos:closeMatch | skos:exactMatch | skos:broadMatch)
targetLabel    AAT label for concept (e.g. smallholdings)
targetURI      AAT URI for concept (e.g. http://vocab.getty.edu/aat/300000211)
Additional optional columns may be useful while creating the mappings or for human inspection of the mapping spreadsheet by partners but are not required. Examples of optional columns from partner mapping work to date include:
Source-Hierarchy          (hierarchy or category the source concept belongs to)
Source-ScopeNote          (scope note or definition of concept - this may be particularly useful)
Source-En                 (an English language translation, or other languages if desired)
Comment                   (if desired, any comment on this mapping, e.g. a rationale)
Other-Target-prefLabel    (if useful to partner to also include mappings to other thesauri)
Other-Target-URI          (if useful to partner to also include mappings to other thesauri)
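Before a spreadsheet is sent for transformation, each row can be checked against the mandatory columns described above. The Python sketch below is one possible check, offered as an illustration only: the function name check_row, the representation of a row as a dictionary keyed by column name, and the wording of the messages are assumptions, and the set of accepted match types simply follows the three values listed for the matchURI column.

```python
# Hypothetical row check for the mapping template; not part of the USW tooling.
MANDATORY = ("sourceLabel", "sourceURI", "matchURI", "targetLabel", "targetURI")
ALLOWED_MATCHES = {"skos:closeMatch", "skos:exactMatch", "skos:broadMatch"}
AAT_PREFIX = "http://vocab.getty.edu/aat/"


def check_row(row):
    """Return a list of problems found in one mapping row (empty if none)."""
    problems = []
    for column in MANDATORY:
        if not str(row.get(column, "")).strip():
            problems.append("missing " + column)

    match = str(row.get("matchURI", "")).strip()
    if match and match not in ALLOWED_MATCHES:
        problems.append("unexpected matchURI " + match)

    target = str(row.get("targetURI", "")).strip()
    if target and not target.startswith(AAT_PREFIX):
        problems.append("targetURI is not an AAT URI")
    return problems


good = {
    "sourceLabel": "Abbey",
    "sourceURI": "http://www.fastionline.org/concept/attribute/abbey",
    "matchURI": "skos:closeMatch",
    "targetLabel": "abbeys (monasteries)",
    "targetURI": "http://vocab.getty.edu/aat/300000642",
}
bad = dict(good, sourceURI="", matchURI="skos:relatedMatch")

print(check_row(good))  # []
print(check_row(bad))   # ['missing sourceURI', 'unexpected matchURI skos:relatedMatch']
```

Rows read from the spreadsheet (for example after export to CSV) could be passed through such a check one at a time; optional columns are ignored.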
Mapping guidelines
The aim of the mapping exercise is to identify subject mappings to AAT for concepts that are likely to be useful to assist browsing and search of the portal (time and space are being handled separately).
If any existing mappings to AAT are known they may be useful to build on. The AAT can also be searched and browsed manually via the Getty website – http://www.getty.edu/research/tools/vocabularies/aat/
Probably the AAT Objects hierarchy is the most relevant hierarchy.
If resources are limited, a sensible strategy is to start with the most useful concepts in the first instance for the datasets/reports partners have provided to the Registry. These would probably include the top (say 2) levels of relevant partner thesauri (e.g. Objects and Monument types) and also concepts used to index the data provided to the registry. It will also include controlled keyword lists used by partners to index the data.
Match types for the mapping
If the mapping is approximate then skos:closeMatch is probably the best match type. If it is a very good match then skos:exactMatch is appropriate. In general, do not make use of skos:relatedMatch for ARIADNE purposes (unless perhaps as an additional mapping for a given concept). The idea is to make the most appropriate match for each concept in the Partner vocabulary.
Usually you will just make one match (the best one) to AAT for any given concept - there is usually no need to express multiple relationships to AAT concepts as this is provided gratis via the AAT’s semantic structure. Thus if you make a match from a given partner concept to an AAT concept then there is no need to also make mappings to narrower AAT concepts for that given partner concept. The only exception is if the partner concept has two genuinely quite different expressions in the AAT (that are not immediate parent or child concepts). In this case one or two additional mappings are possible but that should be very much the exception. Normally you would work through a hierarchy making a mapping for each concept, giving complete coverage of that hierarchy.
If a partner concept is much more specific than any AAT concept then you can make a skos:broadMatch to the AAT concept. This is useful for cases when a partner vocabulary has detailed archaeological concepts. It is not expected that you would need to make much use of skos:narrowMatch for ARIADNE vocabularies.
Matches should be made to AAT concepts rather than guide-terms (inside <>). If an AAT guide term appears as a match in the tool, consider a narrower or broader concept in the AAT. For example, instead of mapping
to <containers by form>, it is better to map to containers (receptacles) even if the mapping relationship needs to be skos:broadMatch.
Where top level partner concepts are too high level or general (e.g. perhaps ‘society’, ‘religion’) to map easily, it is probably best to consider the next level down. If any partner concepts prove particularly problematic then just set them aside and discuss with USW later.
Optional matching tool
When vocabularies are already available as Linked Data via the Registry or via Heritage Data then the USW Vocabulary Matching Tool may be helpful.
http://heritagedata.org/vocabularyMatchingTool/
When using the Vocabulary Matching Tool, remember to Save the data before ending a session (data is saved in JSON format). This allows you to subsequently Load the JSON file into the tool and make revisions or further mappings. When sending the final results of the matching exercise, please send us the JSON format file.