Abstracts DHBenelux 2017 conference, Tuesday 4 July 2017

Session A

1. Coin Production in the Low Countries, fourteenth century to the present
Rombert Stapel¹, Jaco Zuijderduijn², Jan Lucassen¹, Kerim Meijer¹

¹ International Institute for Social History, Amsterdam, Netherlands
² Lund University, Lund, Sweden

This project collects, combines and makes available data on mint production in the Low Countries (Netherlands, Belgium, Luxembourg) and has developed a web application to query and visualize the data, which is also linked to a digital map of (changing) historical boundaries in the Low Countries from 1100 to the present (available in Linked Open Data). It provides scholars with a user-friendly approach to large datasets, and allows them access to such variables as regional production figures and coin denominations.

Introduction

Monetization is a key concept in economics and in economic history. Throughout history currencies were a crucial element of economic exchange: first in the form of metal coins, which made up the lion’s share of currencies and were widely used in everyday transactions. Only much later did paper money emerge: before the First World War very few ordinary people would ever have seen paper money. Finally, non-material book money has nowadays become much more important than currencies, and with the onset of mobile banking and virtual currencies such as Bitcoin, it seems future generations will see far less currency than people in the past.

Historical societies depended as much on media of exchange as we do today: coins and paper money helped a great deal in realizing everyday transactions, as did various forms of credit. Coin production figures are of crucial importance for understanding development in the long run.¹ The study of coinage, its quantity, denominations and use (e.g. in wage payments), and of monetary policy in general, provides important insight into economic and social history, and this project provides historians with a firm quantitative basis for their research.

In this paper, we will present the project and its goals, give an overview of the process of data collection and the web application we built to query and visualize the data (including geospatial visualizations), and provide some of the results for historical research that stem from our dataset.

Project

Coin Production in the Low Countries, fourteenth century to the present provides an overview of coin production figures covering many centuries. Of course we deal with omissions: not all mint accounts go back to the fourteenth century, and not all administration has survived. The website allows for an overview of the mint house data we have at our disposal at the moment, and visualizes the missing data. Coin production in the Low Countries, fourteenth century to the present also does not pretend to be the final dataset: like any other dataset it reflects the data that has been collected and made available up until now. Although we are confident we cover the vast majority of the coins minted in the Low Countries, some new or overlooked sources may emerge in the future; we are likely to make additions in time. The dataset represents the data we presently have, and is a tool to be used by scholars looking for variables related to coin production.

¹ Jan Lucassen and Jaco Zuijderduijn, ‘Coins, currencies, and credit instruments. Media of exchange in economic and social history’, Tijdschrift voor sociale en economische geschiedenis 11 (2014) 1-13.

Our goal for this project was to take the aforementioned datasets, check the validity of the collected data, select and/or (re)calculate the relevant variables for our project, combine the different datasets, and present our selected variables in a web application which allows the user to query and visualize the data.

Manual web application

Figure 1. Number of coins minted in Flanders between 1334 and 1700, organised per alloy (status: November 2016).

In the web application,² the user can query the data and create (and export) their own subsets. Different queries and selections can be made at the top left. This includes the possibility to display the Value in denier groot, a common coin used as money of account, in hourly wages. The query starts by clicking ‘Run’. There are three tabs: ‘Table’, ‘Chart’, and ‘Map’. The variables in the table and chart can be adjusted freely on the right. The map needs some further introduction. At the moment, the map is used to give a rough indication of how complete our dataset is for particular mint houses and authorities in time. We have turned to the works by Hugo Vanhoudt and H. Enno van Gelder, supplemented with data from our own datasets, to determine the years of activities of mint houses and authorities.³

² https://datasets.socialhistory.org/dataverse/coinproduction/search/.

The colour of a region that minted coins (e.g. Duchy of Brabant) will be dependent on the number of years (in a particular query) we know that region was minting coins and for which of those years we have actual production figures in our dataset. This also applies to the mint houses, where we have used pie charts. For this purpose, we have created a GIS map of all major authorities in the Low Countries in time.⁴ This means that borders will change with time and mint houses will pop up and disappear.⁵ A slider on the top left corner of the map allows the user to change the years. On the right of the application, different options regarding the map can be selected, choosing whether the colours and pie charts should change instantaneously with the slider or not.
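The availability measure behind the map colours can be illustrated with a minimal sketch. It assumes, hypothetically, that each region is described by two sets of years: those in which it is known to have minted and those for which production figures exist. The class name, function name and sample years are illustrative only and are not taken from the project's application or dataset.

```python
from dataclasses import dataclass


@dataclass
class MintRecord:
    """Hypothetical record for one region or mint house."""
    name: str
    active_years: set       # years in which minting is known to have occurred
    documented_years: set   # years with production figures in the dataset


def coverage(record, start, end):
    """Share of known minting years within [start, end] that have production figures.

    Returns None when no minting activity is known in the queried period.
    """
    window = set(range(start, end + 1))
    active = record.active_years & window
    if not active:
        return None
    documented = record.documented_years & active
    return len(documented) / len(active)


# Invented example values, not real data.
example = MintRecord(
    name="Example region",
    active_years=set(range(1400, 1410)),
    documented_years={1400, 1401, 1402, 1405, 1406},
)
print(coverage(example, 1400, 1409))
```

A ratio of this kind could be mapped onto a colour scale for regions or onto the slices of a pie chart for individual mint houses; how the application actually does this is not described in the abstract.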

Figure 2. Map of the Low Countries (1432) with percentage of data availability in that year (status: November 2016).

³ H. Vanhoudt, Atlas der munten van België van de Kelten tot heden (Heverlee 2007, 2nd edition); H.E. van Gelder, De Nederlandse munten (Utrecht 2002, 8th edition).

⁴ For some important disclaimers regarding these GIS maps, see the introduction at http://hdl.handle.net/10622/HPIC74/.

⁵ This process was visualized in a movie of the period 1100-2016, where each frame is a year: http://hdl.handle.net/10622/5KGG1T.


2. Mapping the Place: “De Krook Quarter”
Piraye Hacıgüzeller, Sally Chambers, Christophe Verbruggen and Hans Blomme
Ghent Centre for Digital Humanities, Ghent University

The presentation will elaborate on a new project that the Ghent Centre for Digital Humanities (GhentCDH) is starting to carry out, “Mapping the Place: ‘De Krook Quarter’”, which involves “deep mapping” of a historical district in Ghent. In the presentation, the context, framework, workflow and impact of the project will be described and discussed.

The objective of the “Mapping the Place” project is to harness the well-demonstrated power of cartography as a participatory tool (Perkins 2007). Specifically, the project aims to contribute to the participatory governance of cultural heritage in Europe through “deep mapping” a district in Ghent (Belgium) that embodies place-based heritage such as Vooruit (a people’s palace established in 1913 that has been turned into a vibrant international contemporary arts centre), the Minard Theatre, De Krook (the newly built city library and digital innovation centre) and adjoining former Wintercircus, and the surrounding streets (Kuiperskaai) that used to connect a Latin Quarter and red light district. In collaboration with the heritage institutions responsible for management of these places, GhentCDH will employ a variety of participatory mapping tools and methodologies in order to involve a range of communities in a deep mapping project.

Deep maps are “thick spatial descriptions” of places that break away from the Cartesian paradigm in cartography. The latter, known also as “Western scientific mapping” (Pickles 2004; see Turnbull 1996), limits both the content and the methods of mapping, as it traditionally aims to map only empirically observable phenomena, which are considered to constitute reality exclusively. Deep maps, on the other hand, inspired by the concept of “thick description” coined by anthropologist Clifford Geertz (Bodenhamer et al. 2015), are based on a much more flexible and fruitful definition of what can constitute a map and what constitutes places, aiming to bring together a large and rich array of spatial qualities. Deep mapping is even more promising today as digital cartography opens up many possibilities to collect and crowdsource new types of geospatial information and to visualise, integrate and analyse it in novel ways with the help of technologies such as geographical information systems, virtual and augmented reality, and real-time mapping.

The participatory deep map of Ghent, displayed in De Krook and Vooruit, will be an innovative, open-ended, multi-vocal and largely digital cartographic process that will bring together geographical information, sensual experiences, memories, oral histories, creative narratives, emotions, knowledges, imaginations, practices and events. The map is planned to be produced through the following five types of activity: a) playful community mapping exercises (Pinder 1996; 2005) will be organised for diverse groups in order to carry out a certain cartographic task (e.g. mapping an area), and their knowledge and experiences of the places will be revealed in the process through their interaction (e.g. Grasseni 2004); b) a digital online crowdsourcing platform for heritage places will be created where people can enter cartographic information (see Perkins 2013); c) geospatial data on people’s emotions (http://biomapping.net/), movement, sound and smell will be collected in real time and converted into data sculptures or paintings by artists (see, e.g., www.refikanadol.com/); d) multi-layered geographic information systems and three-dimensional virtual reality displays will be installed in De Krook, allowing diverse groups of visitors to annotate their experiences and knowledge about the heritage places on which the deep mapping project focuses; e) (non-)digital map-based or map-aided games (e.g. geocaching) will be designed, developed and/or employed in order to facilitate conversation about the heritage places in question between diverse groups of people, as well as to inform them about and engage them with these places. The layers of the participatory deep map will be distributed across many locales in De Krook, comprising a geographical information systems component, virtual reality room, game room, exhibition room, digital sculpture and painting rooms, screens for real-time mapping, and computers with access to the digital crowdsourcing platform.

References

Bodenhamer, D.J., Corrigan, J. & Harris, T.M. eds., 2015. Deep maps and spatial narratives, Indiana: Indiana University Press.

Grasseni, C., 2004. Skilled landscapes: mapping practices of locality. Environment and Planning D: Society and Space, 22, pp. 699–717.

Perkins, C., 2007. Community mapping. The Cartographic Journal, 44(2), pp. 127–137.

Perkins, C., 2013. Plotting practices and politics: (Im)mutable narratives in OpenStreetMap. Transactions of the Institute of British Geographers, 39(2), pp. 304–317.

Pickles, J., 2004. A history of spaces: Cartographic reason, mapping and the geo-coded world, London & New York: Routledge.

Pinder, D., 1996. Subverting cartography: the situationists and maps of the city. Environment and Planning A, 28, pp. 405–427.

Pinder, D., 2005. Arts of urban exploration. Cultural Geographies, 12(4), pp. 383–411.

Turnbull, D., 1996. Cartography and science in early modern Europe: mapping the construction of knowledge spaces. Imago Mundi, 48, pp. 5–24.

3. Cinemas on the Move: A geospatial analysis of the role of traveling cinemas in the Dutch cinema landscape
Jolanda Visser, Julia Noordegraaf and Ivan Kisjes
University of Amsterdam

The emergence of the cinema as a new cultural industry at the dawn of the twentieth century has had a significant impact on the social, cultural and economic infrastructures of modernizing societies. Cinema’s technological and cultural innovation, combined with economic competition, significantly reconfigured the role and place of entertainment culture in public life. Besides being an economic factor of importance, it also has literally “taken place” in urban and rural infrastructures, transforming the organization and experience of modern public space.

The ways in which cinema has taken place in Dutch public space have been the subject of a number of studies. Some focus on the history of specific cinema theatres and the urban context in which they function (Visser 2012; Noordegraaf et al. 2016). Others have investigated national and local cinema networks and focused on the organization and economics of the industry (Dibbets 1980 & 2006; Oort 2016). Yet other studies focused on the ways in which movies reached their audiences and how this correlates with specific religious and ideological orientations (Boter and Pafort-Overduin 2009), or studied the popularity of certain genres or stars (Van Beusekom 2013). In addition, a comprehensive database has been created that facilitates data-driven research on national Dutch film culture.⁶

At the same time, though, the study of the role of cinema in modern public life has focused primarily on urban contexts. When plotting the locations of cinemas from the Cinema Context database on a map, it appears that the majority of cinemas are located in urbanized areas. In fact, there were cinema screenings in less urbanized areas as well; those areas were frequented by traveling cinemas. At present, the role and impact of these traveling cinemas in Dutch cinema culture remain entirely unknown. In this paper, we present the results of the very first study of the impact of traveling cinemas on Dutch film culture. Using a combination of network and geospatial analysis software, the paper contributes: 1. new insights into the way cinema as a leisure industry contributed to the shaping of modern Dutch identity; and 2. a reflection on the affordances and limitations of GIS and network analysis tools for (cinema) historical research.

⁶ www.cinemacontext.nl

Central Question

Our research aims to establish the role and place of traveling cinemas in the Dutch, post-WWII cinema landscape. What was the relation between the permanent and traveling cinemas, in terms of geographical distribution, market share, and distribution and exhibition practices? In order to answer this question, we approach the Dutch cinema landscape as a network with socio-economic (distribution, consumption) and cultural (programming) dimensions. In order to analyse this network, we combine a geospatial analysis of the network of permanent and traveling cinemas and owners/exhibitors in The Netherlands in 1949 with an in-depth case study of one particular section of this market. This combination allows us to combine a macrosocial analysis of the role of traveling cinemas in the national cinema market with an analysis of the contextual features that explain causality in one specific case (Ragin 1987).

Method

For the research, we adopted a two-tiered approach. First, we extended the data on the location of permanent cinemas and their owners in the Cinema Context database with newly assembled data on the places frequented by traveling cinemas. Then, we mapped these cinemas according to their typologies, distinguishing between permanent theatres, theatres with occasional screenings and traveling cinemas in QGIS. This resulted in a geospatial analysis of the organization of the Dutch industry that, for the first time, includes data on traveling cinemas.

Second, the networks of cinema exhibitors of permanent and traveling cinemas have been analyzed by processing the data on theatres and owners/exhibitors in Gephi. The resulting graph allowed us to acknowledge the influence of cinema chains as well as individual, non-networked entrepreneurs. By projecting these data on historical maps in QGIS, we could compare the geographical distribution of different types of cinemas with the network of cinema owners/exhibitors. We identified a number of clusters where permanent cinemas and mobile cinemas were related and used this analysis to select one case for further, in-depth analysis of film flows within a cinema chain with a traveling department. The selected case study tracks the film flows of the cinema chain of Joh. Miedema and his competitors in the Northern province of Friesland in 1949.

Results

Some of the data sets used already existed (Cinema Context database), some had to be digitized partly (census data) and some had to be created (film programming, traveling cinema locations and screenings). In the first phase of the project the data of the cinemas and the networks of cinemas were combined. The first results showed the geographical networks of Dutch permanent cinemas in relation to the network of owners/exhibitors. In general, as also shown by Dibbets (1980), one can conclude that half of the cinemas belonged to a cinema chain, leaving the other half as isolates.
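The distinction between chain cinemas and isolates can be sketched in a few lines of code. The sketch below is not the Gephi workflow used in the study; it assumes, for simplicity, that ownership is available as (owner, cinema) pairs, that each cinema has a single owner, and that a "chain" is any owner with more than one cinema. The owner and cinema labels are invented.

```python
from collections import defaultdict


def split_chains_and_isolates(ownership):
    """Split cinemas into those owned as part of a chain and isolates.

    ownership: iterable of (owner, cinema) pairs (hypothetical input format).
    Returns two sets: cinemas whose owner runs more than one cinema,
    and cinemas whose owner runs exactly one.
    """
    cinemas_by_owner = defaultdict(set)
    for owner, cinema in ownership:
        cinemas_by_owner[owner].add(cinema)

    chain_cinemas, isolate_cinemas = set(), set()
    for cinemas in cinemas_by_owner.values():
        target = chain_cinemas if len(cinemas) > 1 else isolate_cinemas
        target.update(cinemas)
    return chain_cinemas, isolate_cinemas


# Invented example data, not taken from the Cinema Context database.
pairs = [
    ("Owner A", "Cinema 1"),
    ("Owner A", "Cinema 2"),
    ("Owner B", "Cinema 3"),
    ("Owner C", "Cinema 4"),
]
chains, isolates = split_chains_and_isolates(pairs)
share_in_chains = len(chains) / (len(chains) + len(isolates))
print(sorted(chains), sorted(isolates), share_in_chains)
```

In graph terms, the isolates correspond to owner nodes connected to a single cinema node; a network tool makes the same split visible without any code.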

After adding the mobile cinema networks, we identified a clear geographical distribution for exhibitors of a cinema chain with a traveling department, among others in the provinces Friesland and Drenthe. The selected case study focused on the network of Joh. Miedema in Friesland, which comprised 10 permanent cinemas surrounded by places he claimed for his mobile department. It appears he used these mobile screening locations to construct a buffer zone around the permanent cinemas in his chain, to ward off competition from other owners in the region. Reconstructing film programming practices within that network and comparing them to those of his competitors in the province of Friesland in 1949 provides new insights into the economics of a cinema chain with a traveling department, the socio-economic and cultural context of the various sites visited, and patterns of taste. Based on the first results of this research, the benefits and pitfalls of the combined use of Gephi and QGIS will also be evaluated.

References

Beusekom, Ansje van. “Distributing, programming and recycling Asta Nielsen films in the Netherlands, 1911-1920.” In Importing Asta Nielsen: The international film star in the making 1910-1914, edited by Martin Loiperdinger & Uli Jung, 259-272. New Barnet, Herts UK: John Libbey/KINtop, 2013.

Boter, Jaap, and Clara Pafort-Overduin. “Compartementalisation and Its Influence on Film Distribution and Exhibition in The Netherlands, 1934-1936.” In Digital Tools in Media Studies: Analysis and Research: An Overview, edited by Michael Ross, Manfred Grauer, and Bernhard Freisleben, 55–68. Bielefeld: Transcript Verlag, 2009.

Dibbets, Karel. “Bioscoopketens in Nederland: Economische concentratie en geografische spreiding van een bedrijfstak, 1928-1977.” Doctoraalscriptie, Universiteit van Amsterdam, 1980. Online: http://kd.home.xs4all.nl/home/Karel%20Dibbets%20%20Bioscoopketens%20in%20Nederland%201980.pdf

Dibbets, Karel. “Het Taboe van de Nederlandse Filmcultuur: Neutraal in Een Verzuild Land.” Tijdschrift Voor Mediageschiedenis 9, no. 2 (2006): 46–64.

Hallam, Julia, and Les Roberts, eds. Locating the Moving Image: New Approaches to Film and Place, 2014.

Horak, Laura. “Using Digital Maps to Investigate Cinema History.” In The Arclight Guidebook to Media History and the Digital Humanities, edited by Charles R. Acland and Eric Hoyt, 65–102. Falmer: Reframe Books, 2016.

Noordegraaf, Julia, Opgenhaffen, Loes, & Bakker, Norbert. “Cinema Parisien 3D: 3D Visualisation as a Tool for the History of Cinemagoing”. Alphaville, 11 (2016): 45-61.

Oort, Thunnis van. “Industrial Organization of Film Exhibitors in the Low Countries: Comparing the Netherlands and Belgium, 1945–1960.” Historical Journal of Film, Radio and Television (March 17, 2016): 1–24. Online first: http://dx.doi.org/10.1080/01439685.2016.1157294

Oort, Thunnis van. “‘Coming up This Weekend’: Ambulant Film Exhibition in the Netherlands”. (Forthcoming).

Ragin, Charles C. The Comparative Method: Moving beyond Qualitative and Quantitative Strategies. Berkeley, CA: University of California Press, 1987.

Visser, Jolanda, Samen naar The Movies – 100 jaar Bioscoop op de Haarlemmerdijk 161, The Movies Art House Cinemas and Film Distribution Amsterdam: 2012.


Session B

1. Soft skills in hard places: the changing face of DH training in European research infrastructures
Jennifer Edmond, Trinity College Dublin
Vicky Garnett, Trinity College Dublin

Research infrastructures are becoming an increasingly distinct presence in the landscape of the digital humanities, creating unique research ecosystems that interact with, but remain distinct from, the traditional university-based ones. It is a research sector still very much in the process of defining itself, however, in particular in the arts and humanities, not only in terms of how exactly infrastructures support research but also in terms of how a word with such “hard” connotations (conjuring up images of roads and bridges) can encompass the many “soft” resources and skills, from data to know-how, that we now recognise as a part of infrastructural provision for research in Europe. This tension is already apparent in how research infrastructure is defined, with some camps preferring to fall back on long lists of elements infrastructure may or may not comprise, such as data, services and tools, while others remain more theoretical, placing them in the role of “mediating” (Badenoch and Fickers, 2010) or “below the level of the work” (Edwards et al., 2012). Regardless of how we conceptualise it, however, infrastructure is undeniable as a rising presence, with a growing impact on how research is conceptualised and carried out, how research results are communicated and shared, and how the potential scale of a humanities project can be conceptualised.

There is one element in this landscape of change that has steadfastly remained based within the universities, however: that is the manner in which new generations of researchers are formed, through training and education. Some of the reasons for this lie in the need for specialised procedures, staff, resources and expertise to deliver formal educational programmes, a layer of provision that research infrastructures seldom have. Indeed, it is the lack of this layer that most distinctly differentiates activities of the research infrastructure from those of the more familiar academic context. As we continue to develop our understanding of what it means to ‘teach’ the digital humanities (e.g. Fyfe, 2011, Hirsch, ed., 2012, or Bellamy, 2012), however, we need also to reconsider the utility, responsibility and potential contributions of other actors than universities in this process, and how we integrate them into recognised learning pathways. It is not that infrastructures do not offer training opportunities, just that the paradigm informing much of this training has historically been founded upon a more narrow conceptualisation of the added value of the infrastructural space for creating and sharing unique knowledge. As such, projects and platforms would traditionally create materials to assist users approaching specific tools developed or hosted by the infrastructure, serving a very narrow conceptualisation of the user and his or her needs.

There has been an increasing number of examples of the infrastructural community expanding their activities to fill spaces less easily addressed by traditional, formal, course- and institution-based training contexts, however. Hands-on training with specific collections or objects, or using transnational access to build skills, for example, are mechanisms that have been developed to great effect by infrastructures, as has the model of partnering with other organisations to deliver credit-bearing programmes. These are mechanisms that have arisen in part because of the opportunities that exist, for example, when researchers work in close proximity to specific scientific instruments, as in the fields of cultural heritage and preservation, but have also arisen as accidents of design. Many research infrastructure funding schemes include fixed elements drawn directly from the longer tradition of infrastructure development in the fields of science and technology, mechanisms that do not necessarily fit humanities modes of work or interaction.


Even as such programmes remained largely ad hoc extensions of the originating user-support model of training, they exposed the potential of research infrastructures not only as places that support research, but where unique knowledge was being created, and where this knowledge could and should be shared. The development of a theoretical understanding of the strengths of the research infrastructure, what knowledge they contribute to digital humanities, and how this knowledge could be more systematically shared has been a primary goal of the training programme of the PARTHENOS (Pooling Activities, Resources and Tools for Heritage e-Research, Optimization and Synergies, http://www.parthenos-project.eu/) cluster project, itself a collaboration between a number of research infrastructures and their affiliated projects.

As an infrastructure cluster, PARTHENOS is charged with deepening understanding of what infrastructure is and how common activities can be better aligned for maximal benefit to researchers between the communities that have built landmark research infrastructures at European level. The PARTHENOS training framework seeks first and foremost to make a distinction between research work that does and that does not engage with data and service infrastructures such as the PARTHENOS partners represent. At the next level, the framework seeks to address the digital humanities not only as a set of domains, but also as a set of roles and actors, following up on the work of the DigCurv project (http://www.digcurv.gla.ac.uk/). By reconceptualising a didactic system from the first principles of who might need digital infrastructure and what they might need to know or be able to do, PARTHENOS has been able to create bespoke training materials that draw on the unique experiences within research infrastructures and the unique knowledge they create. The materials exist within a simple but evolving framework, addressing experience levels from the novice (for example: “What is an Infrastructure”), to the intermediate (for example: “Management Challenges in Research Infrastructures”) and advanced (for example: “Introduction to Infrastructures as Collaborations”) levels. Modules are designed to build bridges between potential users and the entire context of research infrastructures, ranging from answers to basic questions about what resources are available and how they operate, through to much more fundamental explorations of the opportunities and challenges that exist in this environment, issues that even expert practitioners struggle to define and address.

The paper will embed a presentation of PARTHENOS’s work in a theoretical discussion of the role of research infrastructures in the development of skills and careers in the digital humanities. It will give an overview of some of the practical interventions the project has made to address the thorny issues of developing training and education programmes outside of the academy, including awareness raising, foresight work, embedding in higher education, partnerships and accreditation. Working in concert with its constituent partners (the DARIAH, CLARIN and E-RIHS Research Infrastructures, as well as their partner projects, such as CENDARI, EHRI, ARIADNE, and IPERION CH), the PARTHENOS team is testing the potential for infrastructural knowledge, for its transmission as materials for self-directed use by independent learners and trainers, and for its capacity to be integrated in the programmes of universities and professional organisations alike. Through this programme of engagement PARTHENOS will bring an extended horizon for training not only to research infrastructures and their users, but to all of digital humanities.

References

Badenoch, A., and A. Fickers, Materializing Europe: Transnational Infrastructures and the Project of Europe (Palgrave Macmillan, 2010)

Bellamy, Craig, ‘The Sound of Many Hands Clapping: Teaching the Digital Humanities through Virtual Research Environment (VREs)’, Digital Humanities Quarterly, 6 (2012)

Edwards, Paul N., Knobel, Cory P., Jackson, Steven J., and Bowker, Geoffrey C., Understanding Infrastructure: Dynamics, Tensions, and Design <http://hdl.handle.net/2027.42/49353> [accessed 16 November 2012]

Fyfe, Paul, ‘Digital Pedagogy Unplugged’, Digital Humanities Quarterly, 5 (2011)

Hirsch, Brett D., Digital Humanities Pedagogy: Practices, Principles and Politics (Cambridge: Open Book Publishers, 2012) <http://www.openbookpublishers.com/product/161/digital-humanities-pedagogy--practices--principles-and-politics> [accessed 7 April 2017]

2. Ranke.2 - How to Get Digital Source Criticism on the Teaching Agenda
Stefania Scagliola, C2DH – Centre for Contemporary and Digital History, University of Luxemburg

Abstract

The term Ranke.2 refers to the need to reassess Leopold von Ranke’s method for historical source criticism, in the light of the impact of digitization and the world wide web on the position of the archive and the craft of the historian. It is also the proposed title of a platform for lessons on digital source criticism, a project that is being developed at the Centre for Contemporary and Digital History at the University of Luxemburg.

While a number of scholars have successfully addressed various theoretical and epistemological implications of the digital turn for the historical craft, little is known about how this subject is dealt with in the realm of teaching. This paper pleads for an assessment of the concept of Digital Source Criticism from the perspective of Digital Humanities Pedagogy. It starts off with some reflections on why and how Ranke’s concept has to be reconsidered. Then it discusses whether source criticism can still be regarded as a specific historical method. The third section of the paper is an account of a small-scale exploration among humanities scholars involved in teaching at the humanities faculty of the University of Luxemburg. They were asked to share their understanding of how digital source criticism should be taught. The paper concludes with a plea for integrating small-scale DH interventions into the traditional historical curriculum.

‘Everything has changed and everything has stayed the same’

With the arrival of digitally-based ‘fake news’ and the inability of sections of the public to distinguish it from the ‘real thing’, the vital importance of digital source criticism should be evident. What is less evident is how it affects the craft of the historian. Historians educated in the 21st century are witnessing the consolidation of the ‘digital turn’ with profound consequences for the historical profession. The German scholar Leopold von Ranke was responsible for an earlier radical change in scholarly practice in the 19th century: he introduced the so-called ‘archival turn’. He also introduced the concept of the ‘seminar’ and encouraged a new generation of aspiring scholars to visit numerous archives, scrutinize and compare documents, and trace back the identity and motives of the author and the circumstances under which a document came into existence. Ranke made a distinction between ‘external’ source criticism, which focuses on the creation, appearance and alleged or real authenticity of a source, and ‘internal’ source criticism, which evaluates the evidential value that can be attributed to a particular source. This new approach became widespread and problematized the tradition of ‘universal histories’, based on broad philosophical concepts and ideas about the evolution of mankind. Rigorous fact-checking came in place of myth-making. Ranke’s innovation in the second half of the 19th century coincided with the period of modern state formation and the creation of national archives. It gradually became the backbone of professional history, with a strong orientation towards the archive as the guardian of authenticity and historical relevance (Risbjerg Eskildsen 2008).


We now live in a globalized world with cultural and disciplinary boundaries that are blurred, with digital technology that has permeated academic research practice, and with the opportunity to copy, alter and remix data with relative ease. It therefore comes as no surprise that concern about the origin, authenticity and value of historical sources in digital form is increasing (Jones and Hafner, 2012). How this has affected the historical profession and what changes need to be introduced has been discussed by several scholars (Fickers 2012, Sternfeld 2014, Zaagsma 2014, Föhr 2015). They plead for a critical reflection on the nature of sources in digital form and for an investment in digital skills to enable students and practitioners to apply digital tools in a professional manner and understand their potential, bias and limits. Critical reading and thinking are no longer enough in terms of safeguards, but have to be complemented with a more technical and mathematical understanding of digital phenomena (Scagliola 2016).

In addition to the traditional Rankian inquiry into the context in which a historical source came into existence, two additional processes of creation and possible manipulation need to be scrutinized. The first involves identifying alterations and loss of context that occur during the transformation from analog source to digital object (Fickers 2012, Treleani 2013). Transparency should be the norm as to who was involved in the chain of digitization, what choices were made and what tools were used. If this is absent, the scholar must have enough contextual and technical knowledge to be able to identify this gap, reconstruct the missing information to the extent possible, and evaluate how it may influence the historical interpretation of the object.

The second process relates to a better understanding of the algorithm-based selection bias of search engines, since these increasingly determine our reference frame and have also penetrated academic library systems (Van Dijk 2010, Vaidhyanathan 2009). It looks as if our earlier dependency on the policy of the national archive with regard to granting access to documents based on national security and other concerns has been replaced by a dependency on one of the biggest stakeholders in search technology: Google. The merits and perils of algorithm-based search technologies have been the object of academic debates and have led to reflections on the epistemology of the digital environment (Wouters et al. 2013, Liu 2014). However, these remain limited discussions between the ‘usual suspects within the community of DH scholars’. They do not seem to matter enough to push for reforming, if not revolutionizing, the curriculum.

Crap-Detection or Digital Philology?

The question we face is how to go about adjusting and adapting the classical humanities curriculum to the requirements of 21st century academic research. Where do we start? Should we make a distinction between general academic digital skills and those that are calibrated for specific fields of research such as history?

When observing the learning subject ‘methods of research’, which is often taught in the first year of a humanities bachelor curriculum, one gains the impression that with the ‘Googlization of knowledge’ and the more general digitization of information (Vaidhyanathan 2009), topics that in the past belonged to distinctive (sub-)fields of research such as critical media studies, information science, literacy studies and education studies are now more and more alike. This calls for a renegotiation of boundaries and specification of what is distinctive about history.

When we look at the realm of education, the call for training young people in assessing the trustworthiness of what they consult and of what they engage with through social media is a recurrent feature. There are many initiatives aiming at making the use of digital media less dangerous for the novice in the field (Scanlon 2014, Cartelli 2013, Bellanca 2010). The writer Howard Rheingold has re-introduced Hemingway’s journalistic principles for ‘crap-detection’, and points to the importance of web resources that give advice on how to detect false information (Rheingold 2013).


However, when students enter academia with the intent to explore the world of historical narratives, philosophical concepts and general cultural heritage, will the possession of general critical media literacy be enough to avoid pitfalls? It seems that some special skills are needed. In addition to being able to distinguish fake from real, they should also be able to trace back the history of the various versions of a document. This philological inquiry in a digital environment requires understanding the back end of a digital document and sometimes requires applying forensic software to detect the trail of binary digits that each manipulation has left. Moreover, Web 2.0 and forthcoming Web 3.0 technology also require students and academics to be able to express their thoughts and insights in ways other than writing a text in the form of an essay. Therefore, digital source criticism, when applied to history, involves more than a mere critical reading of digital sources and writing of articles that are published online. It entails the active application of tools to trace and detect changes, and to create digital content. It is not just one more method as part of a wider repertoire of the historian’s craft; it is a new concept of conducting historical research. This has serious implications for what needs to be put in practice and consequences for its relationship to the existing curriculum. That curriculum has an established status with entrenched social practices, and the lecturers involved have put effort into it. Changing these practices requires patience and diplomacy.
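The idea of tracing changes between versions of a digital document can be illustrated with a minimal sketch. It is not forensic software and does not reveal what was changed or by whom; it only shows, using the SHA-256 function from Python's standard hashlib module, how a single altered byte produces a different fingerprint. The function names and the sample versions are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a document's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def changed_versions(versions):
    """List the labels of versions whose content differs from the preceding one.

    versions: ordered list of (label, bytes) pairs (hypothetical input format).
    """
    changed = []
    previous = None
    for label, data in versions:
        digest = fingerprint(data)
        if previous is not None and digest != previous:
            changed.append(label)
        previous = digest
    return changed


# Invented example: an original, an identical copy, and an edited copy.
history = [
    ("original scan", b"Letter dated 4 July 1917"),
    ("archive copy", b"Letter dated 4 July 1917"),
    ("web version", b"Letter dated 4 July 1971"),
]
print(changed_versions(history))
```

A comparison like this answers only whether two files are byte-for-byte identical; establishing how and why a version was altered remains a matter of source criticism in the sense discussed above.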

On the Verge of Transformation

Passive if not active resistance among lecturers to the introduction of digital methods in the humanities is not uncommon. This is often seen as an instinctive reaction to protect established positions of power and expertise (Scanlon 2013, De Jong et al. 2011). Fear of new technologies and distrust of rosy promises about what such technologies can do also play a role. Another obstructive element can be the rigid organizational structure of traditional academic teaching, which is based on lectures of just one or two hours. This hardly leaves space for learning new skills, let alone experimenting (Henderson and Romeo 2013).

To explore the space for the subject of Digital Source Criticism at the Faculty of Humanities of the University of Luxembourg, a small-scale user study was conducted.7 The Faculty is a salient environment for testing interest in Digital Source Criticism, as it is experiencing considerable institutional changes. As of October 2016, the new Centre for Contemporary and Digital History has been established, which will take up innovative research and teaching in close collaboration with its former basis, the Institute of History.

The first part of the user study consisted of a presentation of the envisioned format for lessons on DSC during the main meeting of the Institute of History, followed by a survey.

The plan is to create an appealing video essay around a particular data type in which the digital version is problematized and compared to its analog version. Subsequently students have to read literature and conduct research, and finally create a digital publication or object with a similar type of data with the help of digital tools. The survey to collect feedback on this format was sent out to 40 colleague historians, a mix of professors, lecturers and Ph.D. students. This yielded nine benevolent responses, which all stressed the importance of the topic, but also the existing limitations to integrating it into their lessons, due to lack of expertise and time, and of space within the limits of the prescribed ICTS.

7 The consultation of lecturers is work in progress; it should be completed in the coming months and should yield a more solid foundation for designing and realizing Ranke.2, the new teaching platform on Digital Source Criticism.

The next step was to organize focus groups with colleagues from the new center. Four meetings were held with three to four participants, a mix of junior and senior colleagues. In addition, a few face-to-face interviews were held. The background of the participants varied; most of them were historians, among whom media studies was overrepresented. Special weight was given to the feedback of an information scientist and of two historians specialized in digital methods, all three with ample teaching experience. Again, they were first shown the presentation on the ideal-typical format of the Digital Source Criticism lesson, after which three main questions were presented:

I. In what way is digital source criticism relevant for your research?
II. What do you regard as necessary digital skills for students (basic, academic, specific for historians)?
III. What would you choose to integrate in your courses: the video essay, the assignments, the hands-on component or a combination?

The feedback to the presentation and questions was in most cases recorded and later transcribed. In a few cases notes were jotted down during the interview. The most salient concerns and preferences that came out of the consultations are summarized below:

- The level of digital literacy when entering the university

The level of competences is too diverse because of a lack of systematic coverage of the topic in secondary education. An entrance test should be considered, so that the gaps can be covered with individual training units.

- Limited time

Digital literacy and the competences to deal with digital data are best taught in collaborative projects, which take up time because of the need to teach skills. Think of how much time it takes to learn to write according to academic standards. At the same time, lecturers of thematic courses consider digital source criticism a topic that belongs to the subject ‘research methods’, a subject with a limited number of hours in the curriculum which is offered only once, most often in the first year of a bachelor. Most teaching is thematic and not about methods.

- The ‘branding’ of the term Digital Source Criticism is problematic

Creating a special term for this type of source criticism suggests it is a different and new practice. A lecturer of ‘methods of research’ suggested using the generic term Source Criticism, which can be applied to any source, regardless of whether it is in analogue or digital form.

- There is a need for continuity in the ‘framing’ of the problem

Some lecturers of media studies stated that giving too much attention to the transformation from analog to digital risks obscuring the many transformations and manipulations that already occur between analog media (e.g. in the process of editing newsreels). They prefer to frame the subject in a more general way, e.g. ‘reflecting on transformations’.

- The majority of researchers and PhD students work with non-digitized sources

Taking into account how many lecturers and researchers work with thematic subjects and with data and literature that is not digitized, it would be disproportionate to place Digital Source Criticism, a methodological topic, as a central subject on the curriculum. The principle of ‘hybrid’ research cultures should be emphasized, as it connects better to the dominant teaching practice.

Conclusion

To address such concerns, a smart communication strategy should be considered in which ‘digital source criticism’ is presented as a ‘hybrid concept’ that encompasses both differences and continuities in dealing with source criticism. What could be considered is to substitute the principle of a series of lessons that would take up much of the time in the curriculum with smaller teaching units with a digital component. These could be complementary in a thematic course, and more central in a methodological subject. A way to support this approach would be to follow the pedagogical principle of the SAMR model, which stands for Substitute, Augment, Modify, Redefine. It was designed to gradually integrate technology into the curriculum (Puentedura 2014). The process starts with merely substituting tasks that have to be completed manually with a technology, and then gradually adding technological components to familiarize new users with the possibilities that they offer. The outcome of this gradual process should be a redefinition of the original task.

This SAMR model approach is currently being considered as an instrument to realize the envisioned transition. At the same time, however, master and PhD students will be immersed in intensive DH collaborative courses with experimental components at the new centre.

The policy of combining gradual change with immersive and experimental learning could be the solution to create a common ground among different generations of historians and future generations of students of history.

References

James A. Bellanca (2010), 21st Century Skills: Rethinking How Students Learn, Solution Tree Press. See also: http://www.p21.org/about-us/our-history

Catherine Francis Brooks (2016), ‘Disciplinary convergence and interdisciplinary curricula for students in an information society’. In: Innovations in Education and Teaching International, http://www.tandfonline.com/toc/riie20/current

Antonio Cartelli (ed.) (2013), Fostering 21st Century Digital Literacy and Technical Competency, Information Science Reference.

Jose Van Dijck (2010), Search engines and the production of academic knowledge. International Journal of Cultural Studies, 13(6). doi:10.1177/1367877910376582.

Andreas Fickers (2012), ‘Towards A New Digital Historicism? Doing History in the Age of Abundance.’ VIEW Journal of European Television History and Culture, 1(1).

Pascal Föhr, ‘Poster “Historical Source Criticism in the Digital Age”’, Historical Source Criticism, 31 March 2015, http://hsc.hypotheses.org/328.

Michael Henderson and Jeoff Romeo (2016), Teaching and Digital Technologies: Big Issues and Critical Questions, Cambridge University Press.

Rodney H. Jones and Christoph A. Hafner (2012), Understanding Digital Literacies; a Practical Introduction, Routledge.

De Jong, Ordelman, Scagliola, Audio-visual Collections and the User Needs of Scholars in the Humanities; a Case for Co-Development, Proceedings of Supporting Digital Humanities, 2011, Copenhagen. http://files.beeldengeluid.nl/pdf/r-en-d_audio-visual-collections-and-userneeds_dejong-ordelman-scagliola_20111117.pdf

Alan Liu (2014), “Theses on the Epistemology of the Digital: Advice For the Cambridge Centre for Digital Knowledge.” http://liu.english.ucsb.edu/theses-on-the-epistemology-of-the-digital-page

Ruben Puentedura (2014), SAMR and TPCK: A Hands-On Approach to Classroom Practice, http://www.hippasus.com/rrpweblog/archives/000140.html

Howard Rheingold (2013), http://rheingold.com/2013/crap-detection-mini-course/, retrieved 1-5-2017.

Kasper Risbjerg Eskildsen, ‘Leopold Ranke’s archival turn: location and evidence in modern historiography’, Modern Intellectual History, 5, 3 (2008), pp. 425–453. doi:10.1017/S1479244308001753

Eileen Scanlon (2014), Scholarship in the digital age: Open educational resources, publication and public engagement. Br Educ Technol, 45: 12–23. doi:10.1111/bjet.12010

Matteo Treleani (2013), ‘Recontextualisation; ce que les média numériques font aux documents audiovisuels’, in: Réseaux, 1, (no 177) http://www.cairn.info/publications-de-Treleani-Matteo--99590.htm

Stefania Scagliola (2016), Digital Source Criticism in the 21st Century: Reconsidering Ranke’s Principle in the Digital Age, blog Digital History Lab, August 2016. http://www.dhlab.lu/blog-post/digital-source-criticism-inthe-21st-century-reconsidering-rankes-principles-in-the-digital-age/

Joshua Sternfeld (2014), ‘Historical Understandings in the Quantum Age’, Journal of Digital Humanities, Vol 3, nr. 2, http://journalofdigitalhumanities.org/3-2/historical-understanding-in-thequantum-age/

Siva Vaidhyanathan (2009), ‘The Googlization of Universities’, in: The NEA 2009 Almanac of Higher Education, 2009 http://www.nea.org/assets/img/PubAlmanac/ALM_09_06.pdf

Paul Wouters, Anne Beaulieu, Andrea Scharnhorst and Sally Wyatt (eds) (2013), Virtual Knowledge; Experimenting in the Humanities and the Social Sciences.

Gerben Zaagsma, ‘On Digital History’, BMGN - Low Countries Historical Review 128/4 (2013) 3-29.

3. Individual presentation: Video essays and the new possibilities for film criticism and pedagogy
Irina Trocan, Cinema and Media PhD, National University of Film and Theatre Bucharest

The shift of film criticism to the online sphere in recent years has led to a number of mutations, including the increase in popularity of a relatively new format: the video essay. Roughly an audiovisual version of film criticism (a mode of analysis that employs the discussed object, the cinematic work, directly), the video essay quotes the film even as it deconstructs it. It can therefore be easier to grasp without necessarily being simplified as discourse – a seven-minute clip can be as rich and thoughtful as a longform essay – and allows for the survival of intelligent film criticism in a rather dyslexic cultural environment.

The aim of this presentation is to summarize the current state of video essays and their aesthetic and didactic possibilities. In 2017, the history of video essays is simultaneously too short and too long. Since the form is roughly a decade old in popular view, in order to discern its influences, one would have to look beyond the practice itself to examine either the more timeworn tradition of essay cinema – the non-narrative films of Chris Marker, Jean-Luc Godard, Harun Farocki – or the audiovisual histories and TV broadcasts on the subject of cinema – Mark Cousins' The Story of Film: An Odyssey or A Personal Journey with Martin Scorsese through the American Cinema being popular examples. However, a decade of video-essay-making is also long enough for the form to have experienced its first moments of crisis and for attempts to theorize it to become increasingly difficult and dangerously reductive. For instance, video essays made ca. 2014 were problematic in their over-reliance on voice-over (i.e. audio commentary of the author overlapped with the images), whereas in 2017, being aimed at social media distribution, several of them adopt the irrelevant/muted audio, text-on-screen format, thus placing all the weight on the visual component; faced with the newer pattern, commenters have gone from pleading for less voice-over to asking for more of it. This constantly changing media landscape makes it urgent to develop strategies for aesthetic evaluation and curation of video essays – otherwise, the overproduction of online content will obscure the best ones and the more provocative possibilities of the form.

Essential (though understated) production guidelines

Due to their ability to quote from film with no need of processing it into a new language, popular video essays are often made from immediately striking fragments: striking film imagery (as in Stanley Kubrick films), dialogues (Aaron Sorkin-scripted one-liners), or even blatant juxtapositions (comparing two stylistically similar films in a split-screen, with the aim of proving just how much the later film borrows from the earlier, usually canonic one). However, their range of subjects largely overlaps with that of cinephile/pop online criticism: overviews of a certain artist's filmography, a certain genre, film festival, national cinema, trend of technical evolution in film craft.

There are already a few prominent platforms for launching video essays, which provide video-essayists with opportunities (on-the-job training, access to necessary media) even as they sometimes limit their creative options. The first and already most controversial is the video-on-demand platform Fandor with its annexed publication, Keyframe; others are the BFI/Sight & Sound website; the Netherlands-based platform Filmkrant; MUBI (also annexed to a VOD platform), and the most academic-oriented, [in]Transition (which is more similar to a distributor than a producer, to borrow terminology from the film market).

While there are also video-essayist ‘superstars’ with distinctive styles, for the sake of brevity, I will only focus on the institutional guidelines which they must follow. Studying these authors' work over several years proves that, even in this seemingly lax working process, shifting editorial demands can have a significant impact on what they produce and how widely it circulates. I would further argue that the formative training of the video-essayists (whether they are filmmakers, critics, academics) is itself only partly relevant to the rigor or whimsicality of their videographic criticism. Although the format is in rapid development and expansion, and making a video essay is hypothetically accessible to anyone who owns a computer and editing software, hierarchies and mandatory style markers can easily be traced among the most well-known video essays made to date, which once again indicates that the total creative freedom of the Internet is merely a utopian dream.

Challenges to the development of video essays

The difficulties of this new form tend to be pragmatic, since the video essays depend on very precarious factors. The first is their survival and continued availability in the online sphere, which the recent Fandor scandal (involving the withdrawal of several hundred video essays) has proved tenuous. The second is the legal circumstance of their right to exist, namely the Fair Use copyright exception: this states that clips of artwork can be used by individuals without permission and copyright ownership as long as the ultimate purpose is different from the straightforward exploitation of the material. A note on Fair Use in the brochure The Videographic Essay: Criticism in Sound and Image ends with a disclaimer that they merely offer peer advice – they are not, nor do they claim to be, lawyers.

Video essays as study material

Among the most remarkable feats of video essays is the popularization of film as theory – or audiovisual thinking. As Volker Pantenburg points out in his comparative study of Farocki and Godard, theory has thus far been predominantly linguistic, even when it is self-reflexive and proposes a break with the dominant “amalgam of structuralism, Lacanian psychoanalysis, post-structuralism, and Marxism”. As Pantenburg puts it, “writing against the film theories of the 1970s continues to assume a clear distinction between the films on the one side and their analysis and theorization on the other.”


Similarly, in his 2012 essay Visualization Methods for Media Studies, Lev Manovich could be talking about video essays when using terms like “collection montage” and claiming there is a future in visualization of media artifacts when grouping them by intrinsic, yet-unarticulated features: “the most important question, which is still unresolved, is how to combine distant and close readings”. For this, video essays could be a powerful tool of scholarship and a more complex way of conveying information than written language.

Bibliography

Eric Faden, Catherine Grant, Kevin B. Lee, Jason Mittell, The Videographic Essay: Criticism in Sound and Image, caboose books

Pantenburg, Volker, Farocki/Godard: Film as Theory (Film Culture in Transition), Amsterdam University Press 2015

Wees, William C. (1993), Recycled Images: The Art and Politics of Found Footage Films, Anthology Film Archives

Manovich, Lev (2001), The Language of New Media, MIT Press

Manovich, Lev (2012), Museum Without Walls, Art History Without Names: Visualization Methods for Humanities and Media Studies, manovich.net

Witt, Michael (2013), Jean-Luc Godard, Cinema Historian, Indiana University Press


Session C

1. The Pyramid of Conscientious Digital Humanities Research: how to get a ‘general idea of what you should be seeing’
Serge ter Braake, University of Amsterdam

‘The only way to know if your results are useful or wildly off the mark is to have a general idea of what you should be seeing.’8

The question of how to cope with a massive number of digital humanities texts, and the tools to process them, has led to publications on ‘algorithmic criticism’, ‘tool criticism’ and ‘data criticism’. What these publications have in common is the quest for a conscientious way to deal with tools and data, balanced with humanist domain knowledge and methodologies.9 Humanities texts can be poems that were written after a sudden burst of inspiration, well-crafted texts on the history of an empire, the innermost thoughts of a diary writer or conscientiously crafted bookkeeping accounts of long-gone rulers. The field of Digital Humanities tends to treat these texts quite badly. Texts are ripped out of their original contexts, chopped into pieces, linked to other texts, and used for analyses that go far beyond their original intentions.

Depending on the research question of the individual researcher, or research group, this ‘text ransacking’ is not necessarily a bad thing. Digital Humanities can, should, and does ask questions that go beyond the scope of texts that could be studied intensely by one human being. There are, however, plenty of dangers involved in using digital tools without really knowing what exactly they do. First of all, there is the question of when we know enough about what a tool does to perform conscientious digital analyses. Secondly, there is the question of whether we keep (enough) in touch with the material we study with digital methods. Where lies the domain knowledge threshold that is necessary to deal with digital data carefully? At what point do we have a ‘general idea of what we should be seeing’?

The danger of ‘black box tooling’ is increasingly getting attention.10 The danger of losing touch with the original source material requires some further explanation. For some humanities scholars, digital humanities research mainly extends the work they are already doing: same kind of data, larger approaches. When Father Roberto Busa initiated the Index Thomisticus in the 1940s, he obviously was already familiar with the work of Thomas Aquinas. When literary scholars want to study the language use in the works of Jane Austen, we may assume they have already read quite a bit of Austen. These scholars certainly already have a general idea of what they could be seeing. When historians use large newspaper archives for digital research, however, including different newspapers spanning numerous decades, things become more complex. Historians are often experts on one or several historical topics, with the necessary archival sources attached to them. Few historians are experts on a wide variety of historical newspapers. This problem is enlarged by the way digital tools deal with these newspapers. Text is transformed into ‘data’, taken away from the page and its surroundings, and transformed together with other pieces of text into an aggregated result.11

8 Megan R. Brett, ‘Topic Modeling: A Basic Introduction’, Journal of Digital Humanities, vol. 2, nr. 1, Winter 2012.

9 To cite only a few: on ‘algorithmic criticism’ the slightly dated but still insightful S. Ramsay, Reading Machines: Toward an Algorithmic Criticism (Chicago 2011). On data criticism: Frederick W. Gibbs and Trevor J. Owens, ‘The Hermeneutics of Data and Historical Writing (2012 revision)’, in: Jack Dougherty and Kristen Nawrotzki eds., Writing History in the Digital Age (Michigan, 2013). On tool criticism: S. ter Braake, A.S. Fokkens, N. Ockeloen and C. van Son, ‘Digital History: towards new methodologies’, in: Bozic, Mendel-Gleason, Debruyne and O’Sullivan eds., 2nd IFIP Workshop on Computational History and Data-Driven Humanities (2016).

10 See for example the Tool Criticism Workshop in Amsterdam: http://event.cwi.nl/toolcriticism/; Albert Meroño-Peñuela, Ashkan Ashkpour, Marieke van Erp, Kees Mandemakers, Leen Breure, Andrea Scharnhorst, Stefan Schlobach, Frank van Harmelen, ‘Semantic Technologies for Historical Research: A Survey’, Semantic Web Journal, Volume 6, Number 6 (2015) 539-564; Ter Braake et al., ‘Digital History’.

The questions I want to address here are:

1. When does a researcher know enough of a tool to use it conscientiously?
2. When does a researcher know his material well enough to use digital tools for distant reading analyses?

And finally, springing forth from this:

3. At what point do we decide that the answers to 1) and 2) are not cost efficient anymore? At what point should we decide that a ‘simple’ tool and close reading practices are more practical for humanist research than complicated tools used on large datasets?

If we want to visualise the interplay between researcher, algorithm, tool, interface and data, then we can come to a pyramid of conscientious digital humanities research, as visualised below. On top there is the humanist researcher, with all of his or her presuppositions acquired from prior knowledge. This researcher will mostly be working with an interface, but also has to understand the tool behind the interface and the data and algorithms behind the tool. If the humanist lacks a sufficient grasp of either the computer algorithms or the data that is used, the results that are provided by the tool through an interface may be misinterpreted, or significant errors may not be spotted.

11 For example the ShiCo tool, tracing concepts through time: https://github.com/NLeSC/ShiCo. See for reflections on the loss of context C. Jeurgens, ‘The Scent of the Digital Archive: Dilemmas with Archive Digitisation’, BMGN - Low Countries Historical Review 128(4) (2013) pp. 30–54.

In short, there should be a ‘general idea’ of what we could be seeing, both by knowing the tool and the data. In this presentation, I will present a proposal, a step-by-step plan, of what could be done to reach this general understanding, taking the example of my own research on concept drift in De Gids and Vaderlandsche Letteroefeningen, two nineteenth-century journals dealing with all kinds of topics of general interest. These steps include: 1) manual close reading; 2) digital close reading; 3) digital analysis; 4) criticism of the results; 5) reflection on steps 1 and 2: were they sufficient? 6) reflection on step 3: was this tool the best to use for this purpose?

When going through this cycle, these questions should always be considered: at what point are the requirements for conscientious digital humanities research too high to be worth the effort? At what point is the pyramid too costly? When is it more efficient, and in fact conscientious, to settle for a ‘simpler’ tool? At what threshold should the digital make room again for more traditional humanities?

2. This is my ground truth, tell me yours: Potentials of multiple annotations for digital humanities
Berit Janssen, Meertens Institute, Amsterdam, and Institute for Logic, Language and Computation, University of Amsterdam

Many methods in digital humanities rely on computational methods, which may be trained on a set of reference annotations, also referred to as ground truth. However, human judgements are rarely unanimous: this has led to research into how information from human judges can best be combined to increase knowledge of the “true” relationships in data (e.g., Dong et al., 2014). However, in many domains, for instance in music information retrieval, it may be assumed that multiple annotator judgements form equally valid interpretations of data such as music similarity or chord estimation (Koops et al., 2016; Schedl et al., 2014). The present contribution shows how multiple annotations can be used to reveal human strategies and knowledge by investigating how annotators may agree or disagree on different subgroups in data.

As an example, I present a dataset of annotations on phrase similarity in 360 Dutch folk songs.12 These folk songs are categorized into 26 groups of variants, or tune families. Three annotators worked independently to give labels to phrases within tune families. The labels consisted of a letter combined with a number, with which annotators could indicate similarity in three categories: “almost identical” (same letter and number), “related but varied” (same letter but different number), and “different” (different letter and number). The annotators did not agree on phrase similarity at all times, but with Fleiss’ κ = 0.71 (Fleiss & Cohen, 1973), the agreement was substantial.
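
The labelling scheme and the agreement statistic can be illustrated with a short sketch. It is not the project's own code: the function names, the single-letter label format (e.g. "A0") and the toy ratings are assumptions made for illustration, and the kappa function implements the standard multi-rater Fleiss' κ formula.

```python
from collections import Counter


def similarity_category(label_a, label_b):
    """Map two phrase labels (one letter + a number, e.g. 'A0') to a category.

    Assumption: labels with different letters count as "different",
    whatever their numbers are.
    """
    if label_a[0] != label_b[0]:
        return "different"
    if label_a[1:] == label_b[1:]:
        return "almost identical"
    return "related but varied"


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    `ratings` is a list of lists; each inner list holds the category chosen
    by each rater for one item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({cat for item in ratings for cat in item})
    counts = [Counter(item) for item in ratings]

    # Share of all assignments that went to each category.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters)
             for cat in categories]
    # Observed agreement per item, averaged over items.
    p_item = [(sum(c[cat] ** 2 for cat in categories) - n_raters)
              / (n_raters * (n_raters - 1)) for c in counts]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p ** 2 for p in p_cat)

    if p_expected == 1:  # all raters always used one single category
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (hypothetical): three raters, four phrase pairs.
toy = [["same", "same", "diff"],
       ["diff", "diff", "diff"],
       ["same", "same", "same"],
       ["same", "diff", "diff"]]
print(similarity_category("A0", "A1"))
print(round(fleiss_kappa(toy), 3))
```

For real data, an established implementation such as the one in statsmodels is probably a safer choice than a hand-written function.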

12 Available from liederenbank.nl/mtc

The dataset was used to evaluate pattern matching algorithms: these algorithms compared each phrase in the dataset against the melodies within the tune family from which the query phrase was taken, and returned a match score. For evaluation purposes, the three annotations were combined through a majority vote: if two or more annotators had given any phrase in a given variant the same label as that of the query phrase, the variant was considered to contain an instance of the phrase, which a pattern matching algorithm should find (cf. Janssen, van Kranenburg & Volk, 2017).
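
The majority-vote rule can be written down in a few lines. The sketch below is only an illustration under stated assumptions: the data layout (one label per annotator for the query phrase, one list of labels per annotator for the variant) and the function name `contains_instance` are hypothetical, and "same label" is read as an exact match of letter and number.

```python
def contains_instance(query_labels, variant_labels, min_votes=2):
    """Decide by majority vote whether a variant contains the query phrase.

    query_labels:   the label each annotator gave to the query phrase.
    variant_labels: for each annotator (same order), the labels that
                    annotator gave to the phrases of the variant.
    An annotator votes "yes" if any phrase in the variant carries the
    same label as the query phrase in that annotator's own annotation.
    """
    votes = sum(1 for query, labels in zip(query_labels, variant_labels)
                if query in labels)
    return votes >= min_votes


# Hypothetical example with three annotators.
query = ["A0", "A0", "B1"]
variant = [["A0", "C0"], ["A1", "A0"], ["C0", "D0"]]
print(contains_instance(query, variant))
```

With three annotators, `min_votes=2` corresponds to the "two or more annotators" rule described above.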

The added value of combining multiple annotations is that, besides the evaluation of pattern matching algorithms, the annotators themselves can also be compared to the majority vote. This comparison shows that individual annotators agree around 87% with the majority vote: they miss about 10% of the relevant phrase instances, and find about 10% irrelevant occurrences, as compared with the majority vote. Flexer and Grill (2016) showed how such inter-rater disagreement introduces an upper bound for various tasks in music information retrieval.
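
How an individual annotator can be scored against the majority vote is sketched below. The abstract does not specify how its percentages were computed, so this is one plausible reading under stated assumptions: occurrences are represented as sets of hypothetical (query phrase, variant) pairs, the miss rate is taken relative to the majority vote, and the rate of irrelevant occurrences relative to the annotator's own judgements.

```python
def compare_to_majority(annotator, majority):
    """Compare one annotator's occurrences with the majority vote.

    Both arguments are sets of (query_phrase, variant) pairs judged to be
    occurrences. Returns (miss_rate, irrelevant_rate):
      miss_rate       = share of majority occurrences the annotator missed
      irrelevant_rate = share of the annotator's occurrences that the
                        majority vote does not contain
    """
    missed = majority - annotator
    irrelevant = annotator - majority
    miss_rate = len(missed) / len(majority) if majority else 0.0
    irrelevant_rate = len(irrelevant) / len(annotator) if annotator else 0.0
    return miss_rate, irrelevant_rate


# Hypothetical toy data: identifiers are made up for illustration.
majority = {("q1", "v1"), ("q1", "v2"), ("q2", "v1"), ("q2", "v3")}
annotator = {("q1", "v1"), ("q1", "v2"), ("q2", "v1"), ("q3", "v1")}
print(compare_to_majority(annotator, majority))
```

Computing the same two rates separately per tune family would give the per-family distribution of disagreement that the analysis builds on.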

The current work presents a way to learn from inter-rater disagreement: the dataset is categorized into tune families, which form homogeneous groups of melodies with high distinctiveness between groups. An analysis of the distribution of disagreement with the majority vote over tune families reveals that individual annotators disagree with the majority vote in different ways, such that some tune families lead to few disagreements for one annotator, but many disagreements for another annotator. This differs from the errors produced by the three best-performing pattern matching algorithms: they show similar trends over the tune families, such that a tune family in which one algorithm produces many irrelevant results will also be more difficult for the other algorithms to handle. This suggests that the strategies of the compared pattern matching algorithms may be similar, while the annotators bring different strategies to the table.

References

Dong, X. L., Gabrilovich, E., Heitz, G., Horn, W., Murphy, K., Sun, S., & Zhang, W. (2014). From data fusion to knowledge fusion. Proceedings of the VLDB Endowment, 7(10), 881-892.

Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33(3), 613-619.

Flexer, A., & Grill, T. (2016). The Problem of Limited Inter-rater Agreement in Modelling Music Similarity. Journal of New Music Research, 45(3), 239-251.

Janssen, B., van Kranenburg, P. & Volk, A. (2017, in press). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. Journal of New Music Research.

Koops, Hendrik Vincent, et al. "Integration And Quality Assessment Of Heterogeneous Chord Sequences Using Data Fusion." International Society for Music Information Retrieval Conference. 2016.

Schedl, M., Gómez, E., & Urbano, J. (2014). Music information retrieval: Recent developments and applications. Foundations and Trends in Information Retrieval, 8(2-3), 127-261.

3. Digital History Projects as Boundary Objects
Max Kemman, University of Luxembourg, max.kemman@uni.lu

Digital history is concerned with the incorporation of digital methods in historical research practices. Thus, digital history aims to use methods, concepts, or tools from other disciplines to the benefit of historical research, making it a form of methodological interdisciplinarity (Klein, 2014). This requires expertise of different facets, such as history, technology, and data management, and as a result many digital history activities are a collaboration of scholars and professionals from different backgrounds.


Such collaborations would fit Svensson’s characterisation of digital humanities as a fractioned trading zone (Svensson, 2011, 2012). Simply stated, this means first that digital humanities functions as heterogeneous collaborations, i.e., with participants from different disciplinary backgrounds, and second that the participants act voluntarily.

In this paper, we will investigate these two aspects in the context of digital history to understand how digital history projects function as heterogeneous collaborations, and what the participants’ incentives are for entering such collaborations.

We will look at digital history projects as boundary objects, a concept developed by Leigh Star and Griesemer to describe an object that maintains a common identity among the different participants, yet is shaped individually according to disciplinary needs (Star and Griesemer, 1989; Star, 2010). This concept could be used for example to refer to the tool under development, or the data on which the tool and historian will work. However, in this paper we will approach the project itself as a boundary object; the project binds the participants together, and all participants subscribe to a common description of the project’s goals, while at the same time the participants shape the project according to their own needs. As one digital history project coordinator described it in an interview:

“[Y]ou have a research idea, and you fit that to the call you’re applying to, and then you get funding… And if you then hire researchers, yes they too have their own idea of course, and their own line of research they’re working on, and they try to fit that in the research project.”

This leads us to investigate the incentives for collaboration. When writing about interdisciplinary collaboration in digital history, this is almost always done to underscore the positive or even necessary effects (e.g. Eijnatten et al., 2013; Hitchcock, 2014; Sternfeld, 2011). However, such collaboration is not trivial and requires dedication and investments from all involved, e.g. as shown by Siemens (2009; 2012). In order to investigate the activities of individual participants we will follow the work of Weedman on incentives for collaborations between earth scientists and computer scientists (1998). For several digital history projects based in the Benelux, we have interviewed the participants and inquired about their reasons for joining the project, their individual goals with the project, and the expected effects of their participation after the project has ended. For example, in an interview one historian noted about their project:

“[W]e’re supposed to be advising the team developing the tool. And trying to then carry out research on a specific case study. And so originally it was like wow we’re going to be able to use the tool, but very quickly it became clear ok actually probably we’re not going to be able to use the tool.”

By looking into the incentives of all the participants of a project, we will unpack the trading zones of digital history projects, to gain an understanding of how heterogeneous, interdisciplinary collaborations work, and how participants shape these collaborations. This will allow us to look into why a situation as described above by this historian occurs, and how individual shaping of the project can lead to this. Moreover, we will argue that these incentives go beyond disciplinary boundaries, which means that the trading zone in a digital history project is more complex than the (in)famous Two Cultures as described by C.P. Snow.

This research is part of PhD research on how the interdisciplinary interactions in digital history affect the practices of historians on a methodological and epistemological level (Kemman, 2016). By unpacking digital history projects, we aim to gain better insight into how digital history functions as a coordination of practices between historians and collaborators from different backgrounds, and how individual incentives shape this coordination.


References

Eijnatten, J. van, Pieters, T., and Verheul, J. (2013). Big Data for Global History: The Transformative Promise of Digital Humanities. BMGN - Low Countries Historical Review, 128(4):55–77.

Hitchcock, T. (2014). Big Data, Small Data and Meaning. Available from: http://historyonics.blogspot.co.uk/2014/11/big-data-small-data-and-meaning_9.html.

Kemman, M. (2016). Dimensions of Digital History Collaborations. DHBenelux. Belval, Luxembourg.

Klein, J. T. (2014). Interdisciplining Digital Humanities: Boundary Work in an Emerging Field. University of Michigan Press, online edition.

Leigh Star, S. (2010). This is Not a Boundary Object: Reflections on the Origin of a Concept. Science, Technology & Human Values, 35(5):601–617.

Leigh Star, S. and Griesemer, J. R. (1989). Institutional Ecology, ‘Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907-39. Social Studies of Science, 19(3):387–420.

Siemens, L. (2009). ‘It’s a team if you use “reply all”’: An exploration of research teams in digital humanities environments. Literary and Linguistic Computing, 24(2):225–233.

Siemens, L. and INKE Research Group (2012). From Writing the Grant to Working the Grant: An Exploration of Processes and Procedures in Transition. Scholarly and Research Communication, 3(1).

Sternfeld, J. (2011). Archival theory and digital historiography: Selection, search, and metadata as archival processes for assessing historical contextualization. American Archivist, 74(2):544–575.

Svensson, P. (2011). The digital humanities as a humanities project. Arts and Humanities in Higher Education, 11(1-2):42–60.

Svensson, P. (2012). Beyond the Big Tent. In Gold, M. K., editor, Debates in the Digital Humanities. University of Minnesota Press, online edition.

Weedman, J. (1998). The Structure of Incentive: Design and Client Roles in Application-Oriented Research. Science, Technology & Human Values, 23(3):315–345.


SessionD

1.ModellingandAnalyzingCharacterNetworksinRecentDutchLiteratureRoelSmeets(PhDcandidate)RadboudUniversityNijmegen,DepartmentofLiteraryandCulturalStudies

Keywords:socialnetworkanalysis,characternetworks,DigitalLiteraryStudies,Dutchliterature

CharacterrelationsWhenweinterpretnovelsweareinfluencedby(hierarchical)relationsbetweencharacters.Theserelationsarenotneutral,butvalue-laden:e.g.thewayinwhichweconnectClarrisawithRichardisofmajorimportanceforourinterpretationofthegenderrelationsinMrsDalloway(1925).Inliterarystudies,characterrelationshavethereforelainatthefoundationofavarietyofcriticalstudiesonliterature(e.g.Minnaard2010,Song2015).Abasicpremiseinsuchcriticismisthatideologicalbiasesareexposedinthe(hierarchical)relationsbetweenrepresentationsofcertaingroups(i.e.gender,ethnicity,socialclass).

Close reading – the common, traditional method in literary studies – is well suited for fine-grained analyses of the nuances and subtleties of character relations, but falls short when it comes to finding patterns among character relations or testing hypotheses on character relations in larger bodies of literary texts (cf. Stronks 2013).

Social Network Analysis

In computational linguistics, a broadening range of research has been carried out in recent years on the computational analysis of social networks in (literary) texts (e.g. Elson et al. 2010, Karsdorp et al. 2012). On the basis of automated, computational models, character relations of all kinds are formalized and mapped in large amounts of text. Although in its infancy, this branch of research shows that social networks can in fact be reliably extracted automatically from narrative texts (Van de Camp 2016), and that relationships can also be classified accurately by computational models trained on examples, e.g. as being romantic (Karsdorp et al. 2015).

The current research project starts from the hypothesis that a computational approach to character relations can reveal (hierarchical) patterns between characters in literary texts in a more data-driven and empirically informed way. In order to test this hypothesis, experiments are being conducted with different forms of social network analysis of characters in a corpus of 170 recent Dutch literary novels. The two major methodological challenges are:

1. to define the nodes that constitute the social network of a novel
2. to define and to weigh the relations between the nodes

The first methodological challenge is about doing a form of character detection: NLP techniques such as Named Entity Recognition and Resolution, pronominal resolution and coreference resolution come to mind. However, automatic character detection in literary texts is far from a convenient classification task (Vala et al. 2015).

The second methodological challenge is about finding a way to decide when and how two or more characters in a text ‘interact’. When Franco Moretti in his famous book Distant Reading (2013) made a character network of Shakespeare’s Hamlet, he did that on the basis of occurrences of character X (the addressee) in the lines of character Y (the speaker). Novels are fundamentally different from dramatic plays in that respect: characters in novels usually don’t speak to each other in a direct way, and the definition and weighing of character interaction therefore requires a different approach.


Top-down and bottom-up approach

In this talk I will argue that a practical combination of manually gathered data and computational analysis can yield insight into patterns between character relations in recent Dutch literature. Instead of using a bottom-up approach of character detection, I will start top-down using a predefined list of names of characters from each novel in my corpus. Furthermore, I will use manually gathered data from earlier research to ascribe demographic features to the characters that constitute the nodes of the network (Van der Deijl et al. 2016). As such, it will be possible to relate demographic backgrounds of characters to their respective place in the character network of the novel. More data are currently being gathered manually from the research corpus: thematic relations such as family, friend, lover, colleague and enemy, which will be used to depict the nature of the relations between the characters in the corpus.

I will demonstrate in this talk how manually gathered data (demographic features and thematic relations) can be used for defining both the nodes of the network and the nature of the relations between the nodes. Moreover, I will show how a top-down approach based on manually gathered data can be complemented and enriched by a bottom-up, computational analysis of co-occurrences, which will be used for weighing the relations (or: interactions) between the character nodes. The co-occurrence analysis will consist of precisely delineated textual windows (on the sentence level), within which different tokens (variants of names, pronouns) for specific character entities are searched for in adjacency with tokens belonging to other character entities.
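The sentence-level co-occurrence weighting described above can be illustrated with a minimal sketch. It is not the project's implementation: the character list, the name variants, the naive sentence splitter and the function names are all assumptions made for illustration, and pronouns are not resolved here.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical top-down input: each character entity maps to the name
# variants that count as a mention of that character.
CHARACTERS = {
    "Clarissa": {"clarissa", "mrs dalloway"},
    "Richard": {"richard"},
    "Peter": {"peter"},
}

def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def characters_in(sentence, characters):
    """Return the characters with at least one name variant in the sentence."""
    lowered = sentence.lower()
    found = set()
    for name, variants in characters.items():
        if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants):
            found.add(name)
    return found

def cooccurrence_weights(text, characters, window=1):
    """Count, per character pair, the windows of `window` consecutive
    sentences in which both characters are mentioned."""
    sentences = split_sentences(text)
    weights = Counter()
    for i in range(max(len(sentences) - window + 1, 1)):
        present = set()
        for sentence in sentences[i:i + window]:
            present |= characters_in(sentence, characters)
        for pair in combinations(sorted(present), 2):
            weights[pair] += 1
    return weights

if __name__ == "__main__":
    sample = "Clarissa saw Richard. Peter left. Richard wrote to Peter and Clarissa."
    for pair, weight in sorted(cooccurrence_weights(sample, CHARACTERS).items()):
        print(pair, weight)
```

The resulting pair counts can serve as edge weights in a network whose nodes come from the predefined character list. Widening `window` trades precision for recall.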

References

Camp, Matje van de. 2016. A link to the past: Constructing Historical Social Networks from Unstructured Data. PhD thesis, Tilburg University (Tilburg School for Humanities).

Deijl, Lucas van der, Pieterse, Saskia, Prinse, Marion & Smeets, Roel. 2016. ‘Mapping the Demographic Landscape of Characters in Recent Dutch Prose: A Quantitative Approach to Literary Representation.’ In: Journal of Dutch Literature (7:1).

Elson, David, Dames, Nicholas & McKeown, Kathleen. 2010. ‘Extracting Social Networks from Literary Fiction’. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala.

Karsdorp, Folgert, Kranenburg, Peter van, Meder, Theo & Bosch, Antal van den. 2012. ‘Casting a spell: Identification and ranking of actors in folktales.’ In: F. Mambrini, M. Passarotti, and C. Sporleder (eds.), Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), pp. 39–50.

Karsdorp, Folgert, Kestemont, Mike, Schöch, Christof & Bosch, Antal van den. 2015. ‘The Love Equation: Computational Modeling of Romantic Relationships in French Classical Drama.’ In: Proceedings of the Sixth International Workshop on Computational Models of Narrative, pp. 98–107.

Minnaard, Liesbeth. 2010. ‘The Spectacle of an Intercultural Love Affair: Exoticism in Van Deyssel's Blank en geel’. In: Journal of Dutch Literature (1:1).

Moretti, Franco. 2013. Distant Reading. London: Verso.

Song, Angeline M.G. 2015. A Postcolonial Woman’s Encounter With Moses and Miriam. New York: Palgrave Macmillan US.

Stronks, Els. 2013. ‘De afstand tussen close en distant. Methoden en vraagstellingen in computationeel letterkundig onderzoek’. In: Tijdschrift Voor Nederlandse Taal- en Letterkunde (4).

Vala, Hardik, Jurgens, David, Piper, Andrew & Ruths, Derek. 2015. ‘Mr. Bennet, his coachman, and the Archbishop walk into a bar but only one of them gets recognized: On the difficulty of detecting characters in literary texts.’ In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 769–774, Lisbon, Portugal, Association for Computational Linguistics.

2. Spinozist discourse in Dutch textual culture (1660-1720). A computational approach to the dissemination of the Radical Enlightenment

Lucas van der Deijl, University of Amsterdam

Lia van Gemert, University of Amsterdam

Erik van Zummeren, University of Amsterdam

Contact: l.a.vanderdeijl@uva.nl

Keywords: Spinozism, Radical Enlightenment, topic modeling, discourse analysis, text mining

Since the linguistic turn, the term ‘discourse’ has been an important instrument for many humanities scholars (Bové 1995). It has become common practice to study cultural history through the language and discussions in which it was mediated. Currently, the growing availability of digitised historical material provides new ways and scales to study historical discourses, which were recognised by digital humanities scholars at an early stage (Olsen & Harvey 1988). However, digital approaches to historical corpora face the problem that the often loosely defined term ‘discourse’ is not easy to formalise. In traditional literary studies, the very lack of definition is inherent to the influential post-structuralist paradigm that reinvented the term, in which meaning is considered ‘indefinite’ by definition. Within this tradition, discursive elements are measured through both manifest and latent semantic relations, with an equal focus on what is said and what is left out, forgotten or suppressed. Quantitative methods, by contrast, require a more reductive understanding of what a discourse comprises (e.g. Jockers 2013; Ramsay 2011). They primarily rely on information represented in computationally measurable text elements, which challenges the traditional use of the term. Digital Humanities thus promise new opportunities for cultural history, but also require a critical translation of traditional methodology.

A dominant approach in the study of intellectual discourses focuses on concepts (e.g. Mandelbaum 1965; Lovejoy 2001; Kuukkanen 2008). Philosophers and computational linguists have created models and methods in order to account computationally for conceptual change or drift through time (Betti & Van den Berg 2014; Kenter et al. 2015). Secondly, studies that employ digital text analysis to approach historical discourses often use ‘topics’ as a representation or indication of discursive patterns in large text corpora (e.g. Nelson 2010). Topic modeling is a useful technology for narrowing down a research corpus into a selection that could be of interest to the researcher. The method also allows tracing the evolution of dominant themes over time. It is especially useful when the researcher has no strong intuitions about the corpus: the power of topic modeling is its independence from assumptions (Underwood 2012). The use of topics as a measure for ‘discourse’ in the traditional sense is, however, problematic. A topic is formally defined as a ‘distribution [of words] over a vocabulary’ and is no more than a set of words that are statistically likely to co-occur in a given text (Blei 2012). A discourse in the Foucauldian sense comprises (historical) values, shared assumptions, ‘common sense’, associations, automated modes of writing and thinking, which constitute and regulate power relations through language and intertextuality (e.g. Foucault 1977; Bové 1995). When following Foucault's notion of discourse, collocations – the basic linguistic element for topic modeling – could be misleading. The operationalisation of discourses through topics may be intuitive, but is theoretically far from evident.


The study of the dissemination of concepts and discourses is especially relevant in the context of the so-called Radical Enlightenment, a movement of proto-Enlightenment intellectual innovation in which Spinoza played a key role (Israel 2001; Jacob 1981; Krop 2014). As a result of the explosive theological and scientific debates that threatened the stability of the Republic throughout the seventeenth century, radical discourses that challenged orthodox-Calvinist doctrine were firmly suppressed through censorship and prosecution of authors, publishers and printers (Israel 1997). In spite (or because) of this censorship, radical discourses circulated ‘underground’, in clandestine publications and circuits (cf. Darnton 1982). Many cultural historians have also indicated how authors communicated radical ideas indirectly and ambiguously through literary genres such as novels and pornography (Van Bunge 2003; Elias 1974; Leemans 2002; Wortel 2006). The Foucauldian meaning of ‘discourse’ as a possible means for the reinforcement of power relations becomes evident during the Radical Enlightenment.

Rather than elaborating on the theoretical difference between topics, concepts and discourses on an abstract level, this paper demonstrates it through a case study. It presents computer-assisted discourse analysis as an approach to a specific historical question: how did Spinozist philosophy disseminate into a ‘Spinozist’ discourse in early modern Dutch textual culture (1660-1720)? In this study, Spinozist philosophy was reduced to a set of characteristic concepts (cf. De Bolla 2013), which were identified through tf-idf¹³ frequency analyses and then refined by hand. The concepts were represented as networks of co-occurring words in seventeenth-century Dutch translations of eight works written by the philosopher, translated by Pieter Balling (?–1664) and J.H. Glazemaker (1620-1682) (Thijssen-Schouten 1967; Steenbakkers 1999).¹⁴ These conceptual networks were used as a measure to identify Spinozist ‘discourse’ in a corpus of 500 texts published between 1660 and 1720. For pragmatic reasons, the vocabularies were assumed to be stable, but this paper addresses possible advancements based on the literature on conceptual and linguistic drift (Betti & Van den Berg 2014; Kenter et al. 2015). Also, conventional procedures applied in computational intellectual history were modified in order to reduce the problems caused by spelling variation in historical Dutch (e.g. in Herbelot et al. 2012; Tangherlini & Leonard 2013).
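As an illustration of the tf-idf step, the sketch below scores the terms of one target text against a small background collection. It is a toy under stated assumptions, not the procedure used in the study: the English sample sentences, the whitespace tokenisation and the function name are hypothetical, and no normalisation of historical Dutch spelling is attempted.

```python
import math
from collections import Counter

def tf_idf(target_tokens, background_docs):
    """Score each term in the target text as
    (relative term frequency in the target) * log(N / document frequency),
    where the N documents are the background texts plus the target itself."""
    documents = [set(doc) for doc in background_docs] + [set(target_tokens)]
    n_docs = len(documents)
    counts = Counter(target_tokens)
    total = len(target_tokens)
    scores = {}
    for term, count in counts.items():
        doc_freq = sum(1 for doc in documents if term in doc)
        scores[term] = (count / total) * math.log(n_docs / doc_freq)
    return scores

if __name__ == "__main__":
    # Hypothetical toy texts; a real run would use tokenised corpus files.
    target = "god is substance and substance is nature".split()
    background = [
        "the ship is in the harbour".split(),
        "god and the king".split(),
    ]
    ranked = sorted(tf_idf(target, background).items(), key=lambda kv: -kv[1])
    for term, score in ranked:
        print(f"{term}\t{score:.3f}")
```

Terms that are frequent in the target but rare elsewhere rise to the top of the ranking, which is why such a list is a plausible starting point for the manual refinement the abstract mentions.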

The results obtained through the concept-oriented ‘top down’ approach are contrasted with a more ‘bottom up’ transformation of the corpus based on topic modeling. This paper evaluates the differences between both approximations of Spinozist discourse and shows how Spinozist texts unknown to the computer were successfully identified and described. Based on these results, it formulates a working hypothesis on the dissemination of Spinozist discourse in Dutch textual culture and advances the debate on the resonance of (Radical) Enlightenment ideas with computational results (Darnton 1982; Israel 2001; Leemans 2002; Edelstein 2010 etc.).

References

Betti, A. & H. van den Berg, ‘Modelling the History of Ideas’. British Journal for the History of Philosophy 22 (2014) 4: 812-835.

Blei, D., ‘Probabilistic Topic Models’. Communications of the ACM 55 (2012) 4: 77-84.

Bolla, P. de, The Architecture of Concepts. The Historical Formation of Human Rights. New York 2013.

13 ‘term frequency – inverse document frequency’.

14 Korte verhandeling van God, de mensch en deszelvs welstand (1660-1661); Renatus Des Cartes Beginzelen der wysbegeerte, I en II bewezen (1664); Aanhangzel, over-natuirkundige gedachten (1664); Handeling van de verbetering van 't verstant (1667); Zedekunst, In vijf delen onderscheiden (1677); Brieven Van verscheide geleerde Mannen Aan B.d.S (1677); Staatkundige verhandeling (1677); De Rechtzinnige Theologant, of godgeleerde staatkundige verhandeling (1693).

Bové, P.A., ‘Discourse’. In: F. Lentricchia & T. McLaughlin, Critical Terms for Literary Study. Chicago 1995: 50-64.

Bunge, W. van, ‘Philopater, de radicale Verlichting en het einde van de Eindtijd’. Mededelingen van de Stichting Jacob Campo Weyerman 26 (2003): 10-19.

Darnton, R., The literary underground of the Old Regime. Cambridge (MA) 1982.

Elias, W., ‘Het spinozistische erotisme van Adriaan Beverland’. Tijdschrift voor de Studie van de Verlichting 2 (1974): 283-320.

Edelstein, D., The Enlightenment. A genealogy. Chicago 2010.

Foucault, M., ‘The Archeology of Knowledge and the Discourse on Language’. Trans. A. Sheridan. New York 1977.

Gemert, L. van, ‘Stenen in het mozaïek. De vroegmoderne Nederlandse roman als internationaal fenomeen’. Tijdschrift voor Nederlandse Taal- en Letterkunde 124 (2008) 1: 20-30.

Herbelot, A., E. von Redecker, J. Müller, ‘Distributional techniques for philosophical enquiry’. Proceedings of the 6th EACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities. Avignon 2012: 45-54.

Israel, J., ‘The banning of Spinoza’s works in the Dutch Republic’. In: C. Berkvens-Stevelinck e.a. (red.), The emergence of tolerance in the Dutch Republic. Leiden 1997.

Israel, J., Radical Enlightenment. New York 2001.

Jacob, M.C., The radical Enlightenment. Pantheists, freemasons and republicans. London 1981.

Jockers, M., Macroanalysis. Digital Methods and Literary History. Urbana 2013.

Kenter, T.M., M. Wevers, P. Huijnen & M. de Rijke, ‘Ad Hoc Monitoring of Vocabulary Shifts over Time’. Proceedings of the 24th ACM International Conference on Information and Knowledge Management. Melbourne 2015.

Krop, H., Spinoza. Een paradoxale icoon van Nederland. Amsterdam 2014.

Kuukkanen, J.M., ‘Making Sense of Conceptual Change’. History and Theory 47 (2008): 351-372.

Leemans, I., Het woord is aan de onderkant. Radicale ideeën in Nederlandse pornografische romans 1670-1700. Nijmegen 2002.

Lovejoy, A.O., ‘The Historiography of Ideas’. Proceedings of the American Philosophical Society 78 (1938): 529-543.

Lovejoy, A.O., The Great Chain of Being. A Study of the History of an Idea. Cambridge, MA/London 2001 [1964].

Mandelbaum, M., ‘The History of Ideas. Intellectual History, and the History of Philosophy’. History and Theory 5 (1965): 33-66.

Nelson, R.K., ‘Mining the Dispatch’, 2010. [http://dsl.richmond.edu/dispatch/pages/home]

Olsen, M. & L.G. Harvey, ‘Computers in Intellectual History: Lexical Statistics and the Analysis of Political Discourse’. The Journal of Interdisciplinary History 18 (1988) 3: 449-464.

Ramsay, S., Reading Machines. Towards an Algorithmic Criticism. Urbana: University of Illinois Press, 2011.


Siebrand, S.J., Spinoza and the Netherlands. An inquiry into the early reception of his philosophy. Dissertation Rijksuniversiteit Groningen 1980.

Steenbakkers, P.M.L., ‘Benedictus de Spinoza. Een overzicht.’ Filosofie 9 (1999) 6: 4-14.

Tangherlini, T.R. & P. Leonard, ‘Trawling in the Sea of the Great Unread: Sub-corpus topic modeling and Humanities research’. Poetics 41 (2013) 6: 725-749.

Thijssen-Schouten, C.L., Uit de Republiek der Letteren. Elf studiën op het gebied der ideeëngeschiedenis van de Gouden Eeuw. Den Haag 1967.

Underwood, T., ‘Topic modeling made just simple enough’. Online 2012. [https://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/]

Wortel, D., ‘Vrouwen in mannenkleren en Spinoza. De Kloekmoedige Land- en Zee-Heldin (1682) als verpakking van de filosofie van Spinoza’. In: Spiegel der Letteren 48 (2006): 27-55.


Session E

1. Building a Conceptual Architecture and Data Model to address the Sustainable Data Integration Problem

George Bruseker, Maria Theodoridou, Martin Doerr (ICS-FORTH)

Research Infrastructures (RI) seeking to provide a unified resource set to their user community tend to begin with the elaboration of a new model for unifying a domain of discourse and then seek out the institutional and political support to undertake mappings to the defined common structure. These projects are undertaken with the critical aim of facilitating broad resource access within the domain of interest. Such projects, however, notably face strong challenges both in terms of defining an adequate model and, then, in sustaining a mapping and aggregation process which is unavoidably time consuming and expensive. While such resource integration projects undoubtedly serve a crucial role in research environments, an essential aspect of this process seems to be consistently overlooked. Data are fundamentally heterogeneous in nature - a state that cannot be avoided - and are in a process of continuous potential or actual change. Further, actors managing resources change composition, status and activities. This quickly creates the potential for obsolescence of any integrated data environment as the indexed resources inevitably change.

It seems, then, that value can be had from a new approach that focuses on making integration sustainable and useful in the long run by modelling and managing the integration process itself. By modelling this meta-meta level and providing a data structure for the tracking of the same, we argue, it is possible to provide the necessary management structures for building ground-up and on-demand aggregation which will meet the aims of this process both in the present and into the future. This paper will outline the proposal of a new conceptual architecture to support highly scalable integration activities for developing ever more integrated pools of resources and a conceptual model capable of representing the data required to drive this process.

The proposed conceptual architecture has at its core a registry that is a logically if not physically distinct data structure that holds data pertaining to the activities of RIs and their members themselves, the resources they provide and the manner in which they do so. The registry maintains the picture of who has and does what and where resources are, as well as their level of compatibility with other resources. The data requirements of this registry are extremely light in order to form as little a barrier as possible to participation in such a service by potential partners. The basic functional relationships that are tracked to allow the long-term management and control of resources are: part-of, metadata-of and indexed-by. Additional metadata is only requested in order to help disambiguate entities in the registry and to support its readability by the operators. In the proposed architecture, source metadata and data as well as their multiple mappings remain in a content cloud which can be either data held by trusted providers who guarantee their maintenance or, otherwise, can be copied into a stable storage facility at the time of registration. The registry has the intention to enable decisions with regards to the management of data, based on the high level view of resources, wherever they may reside across the data cloud. Such decisions could include: identifying datasets for an integration, identifying gaps in coverage, connecting orphaned datasets to appropriate curators, following up with service providers with regards to availability/quality of service etc.
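A lightweight registry of this kind can be sketched as a set of typed entries plus a set of relationship triples. The sketch below is purely illustrative and is not the Parthenos registry: the class, its method names, the entry kinds and the direction chosen for each relation are assumptions; only the three relation names come from the text.

```python
from dataclasses import dataclass, field

# The three functional relationships named in the text.
RELATIONS = {"part-of", "metadata-of", "indexed-by"}

@dataclass
class Registry:
    """Hypothetical minimal registry: entry id -> kind, plus relation triples."""
    kinds: dict = field(default_factory=dict)
    links: set = field(default_factory=set)

    def register(self, entry_id, kind):
        """Record an entry with a deliberately light description (its kind)."""
        self.kinds[entry_id] = kind

    def relate(self, source, relation, target):
        """Track one relationship between two registered entries."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        for entry_id in (source, target):
            if entry_id not in self.kinds:
                raise KeyError(entry_id)
        self.links.add((source, relation, target))

    def lacking(self, relation, kind):
        """List entries of a kind with no outgoing link of the given relation,
        e.g. datasets that no index covers yet (a gap in coverage)."""
        linked = {source for (source, rel, _) in self.links if rel == relation}
        return sorted(e for e, k in self.kinds.items() if k == kind and e not in linked)

if __name__ == "__main__":
    registry = Registry()
    registry.register("letters", "dataset")
    registry.register("maps", "dataset")
    registry.register("catalogue", "service")
    registry.relate("letters", "indexed-by", "catalogue")
    print(registry.lacking("indexed-by", "dataset"))
```

Keeping each entry this small reflects the stated goal of a low barrier to participation; management questions are then answered by querying the triples rather than the source data.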

In order to support the proposed architecture, it is necessary to propose a new conceptual model describing integration processes themselves. This is the function of the Parthenos Model. Built on an analysis of the registries of existing RIs, it aims to model the fundamental resources and relations that are of interest to manage in integration. Identified through this process were a number of fundamental entities, the study of whose relations drove the model development. These are: services, projects, datasets, software, and actors. What was of interest in the model was to understand the nature of these objects not as such but as they play a role within RIs. Taking this scope into account allowed for strong analytic distinctions of the high level entities of interest, delivering a compact model of about 38 classes and 50 relations.

Particular modelling challenges include defining the functional role of services and collections. Service plays a central, if often overlooked, role in RI discourse. It is what binds assets to actors and allows for effective communication between agents on a scientific and technical level. A particular challenge was to model service beyond the scope of e-services and to understand the full range of its meaning. This led to the definition of service as a willingness and ability for someone to take action to the benefit of some other agent. Modelling service at this generic level and then providing high level classes for hosting, curating and e-services allows a highly flexible description of the various kinds of service RIs provide to their members, notably including non-IT related services. Another particular modelling challenge faced was to address the perennial question of what constitutes a ‘collection’. The fact of the plurality of an object is easily modelled through part-of relations, but this misses an aspect of the phenomenon that ‘collection’ tries to express. Consideration of this question in relation to the context of service allowed for a highly useful new conceptualization, distinguishing persistent and volatile digital objects. The former are static information objects whose identity is fixed at the bit level and have an objectively identifiable existence over time from their structure. A volatile digital object, however, has no fixed identity in itself, since it undergoes continuous change and modification. It inherits an identity from the fact that it is an object under curation, the activity of a curation service, undertaken with some specific plan. By making reference to the service of curation and its plan, we can identify volatile digital objects or ‘collections’ over time.

The proposed shift in focus from domain modelling to modelling of RI integration processes themselves is currently being tested within the Parthenos Project where the architecture and model are being implemented. The model is being developed and validated through an iterative process of mapping from the participating RIs' registries to the model for integration in the registry. The mapping process is being undertaken using the X3ML toolkit for writing declarative mappings. Once populated, the registry will be used to get an overview of the integrated resource capacities of the joined RIs and determine appropriate deep level integrations. The technologies to run the aggregation and the subsequent VREs are provided through the GCube and D4Science systems. To date, the model has shown itself robust against basic revision and flexible enough to describe this high-level management picture of integration.


2. Improving data quality in Europeana by designing extensive EDM records - The Universitätsbibliothek Heidelberg study case

Pierre-Edouard Barrault, Valentine Charles, Antoine Isaac (Europeana Foundation, Prins Willem-Alexanderhof 5, 2595 BE, The Hague, The Netherlands)

Introduction

For this paper, we have worked on improving the results of the mapping process from the METS¹⁵ to the EDM¹⁶ schema, for metadata records associated with cultural heritage objects. We chose to present the case of the Universitätsbibliothek Heidelberg¹⁷, the library of Heidelberg University, which was founded in 1386 and is Germany's oldest university and one of the world's oldest surviving universities. Its magnificent collection of about 25,000 records¹⁸ contains parchments¹⁹ and early printed books from the 14th century until the Modern Age, as well as books, magazines and newspapers from the 19th century onward, in various languages including French²⁰, German, Italian or Spanish. It is without any doubt a solid accomplishment for an old book digitization project, demonstrating the value added from respecting both content integrity, thanks to high digitization standards coupled with the IIIF framework, and informational quality, through rich, highly-structured, open data. In addition, the institution offers its collection under the Creative Commons - Attribution, ShareAlike (BY-SA) open license, allowing for free re-use²¹.

On the other hand, Europeana Collections²² is a European platform partnering with cultural institutions to centralize, in an open online database, all metadata and content related to cultural heritage objects available across Europe. The platform acts as a search engine to explore these collections, offers a set of curated channels focused on specific themes, and also makes several Web services available that can be used by developers, creatives and researchers for tackling and re-using digital cultural resources.

Prior to this experiment, the collection of the Universitätsbibliothek Heidelberg in Europeana was based on harvests of the institution's OAI-PMH server, which exposed metadata under the ESE schema. We used to receive limited metadata records in which multiple values for a given field were mapped to only one instance of this field. Fields such as dc:date, dc:type and dc:subject were affected in this way. Having several strings packed into a single metadata field with separators prevents the Europeana automatic semantic enrichment from detecting the appropriate string and enriching the record based on the matching string. Other shortcomings were the lack of language attributes and of relevant hierarchical data.
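The multi-value problem can be made concrete with a small sketch that unpacks one separator-joined string into distinct field instances. The separator characters, the sample subject string and the function name are assumptions for illustration; the actual separators used in the harvested records are not stated in the source.

```python
import re

def split_values(raw, separators=";|"):
    """Split a packed metadata string into distinct, trimmed values,
    preserving first-seen order and dropping empties and duplicates."""
    pattern = "[" + re.escape(separators) + "]"
    seen = set()
    values = []
    for part in re.split(pattern, raw):
        value = part.strip()
        if value and value not in seen:
            seen.add(value)
            values.append(value)
    return values

if __name__ == "__main__":
    # Hypothetical packed dc:subject value.
    packed = "Botany; Herbals ;Botany|Medicine"
    for subject in split_values(packed):
        # Each value would become its own dc:subject instance.
        print("dc:subject =", subject)
```

Once each value sits in its own field instance, a string-matching enrichment step can compare it against a vocabulary without first having to guess where one term ends and the next begins.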

15 See http://www.loc.gov/standards/mets/mets-schemadocs.html

16 See http://pro.europeana.eu/share-your-data/data-guidelines/edm-documentation

17 See http://www.uni-heidelberg.de/index.html

18 Europeana records for this institution: http://www.europeana.eu/portal/en/search?view=grid&q=PROVIDER%3A%22Universit%C3%A4tsbibliothek+Heidelberg%22&per_page=96

19 See Heidelberger Schicksalsbuch (Heidelberg Book of Fate), 1491: http://www.europeana.eu/portal/en/record/07932/diglit_cpg832

20 See Le Sifflet: journal humoristique de la famille (Le Sifflet: humorous family newspaper), 1872: http://www.europeana.eu/portal/en/record/07931/diglit_sifflet1872.html?q=PROVIDER%3A%22Universit%C3%A4tsbibliothek+Heidelberg%22

21 See http://creativecommons.org/licenses/by-sa/4.0/

22 See http://www.europeana.eu/portal/en


IIIF implementation

We focused our work on this specific provider in the hope of improving its collections, which were already available in Europeana Collections, with the IIIF²³ features they had implemented on their side. This open technological framework can be implemented within content management systems to enable deep visualisation features (zoom, crop, effects), and to make image sharing easier on the Web.

The main aim of this experiment was to implement IIIF metadata elements, which were not present in the data previously submitted by this institution to the Europeana Collections database. After investigating the available data on the institution’s side, we decided to harvest METS records as this was a much richer metadata source, regarding both IIIF core elements and metadata range and quality.

Data quality

Even if metadata improvements are not always obvious on a result page in the Europeana Collections portal, they nevertheless have a strong impact on search and overall findability. Ingestion of reliable data therefore participates in ensuring a cohesive experience for its users, from

In the case of digital cultural heritage, qualitative datasets could be defined as an ensemble of standardised (such as LOD resources), granular, specific, relevant and consistent metadata, associated with high quality visualisation standards. The nature of the records themselves should obviously be taken into consideration when defining the overall strategy. For instance, OCR²⁴ techniques would make sense in the case of text documents, while focusing on high digitization standards would better suit photographs. Data quality is nevertheless critical to supporting user-focused discovery scenarios²⁵, and a long-term strategy to improve it should be considered de facto by any cultural institution, as a lever to reach a wider audience.

By using another metadata source from the Universitätsbibliothek Heidelberg, we refined and improved the overall data quality by relying on Linked Open Data resources from the GND authority vocabulary maintained by the German National Library²⁶, which were available in the original METS records. We therefore included, as systematically as possible, the provided URIs of resources related to agents, concepts and places. This approach follows LOD implementation best practices: only links to resources are provided in the ingested records, and then Europeana de-references them, fetching all the available metadata for each provided URI.

We also applied stricter conditions to the mapping in order to preserve the semantic precision and granularity of the original data as much as possible. This was done by choosing more specific metadata fields, and rejecting irrelevant ones. We focused on core metadata elements related to typology, format, temporal and geographical information. We also created an ad hoc description field in order to provide more physical location information to users.

23 See http://iiif.io/

24 See https://en.wikipedia.org/wiki/Optical_character_recognition

25 Most Europeana users rely on the search functionality, and 59% of them use extra filtering options to refine their search. More than half of the users search items based on specific geographical location. (Source: Europeana Collections Online Survey, April 2016)

26 See http://www.dnb.de/EN/Standardisierung/GND/gnd_node.html

Further normalization was done for agents related to these records (e.g. creators and contributors), which were previously sent without any role distinction. We disambiguated the mapping of these elements using the MARC Relators codes²⁷ originally embedded in the METS records, such as “aut” that represents “Author”. The codes were used to identify the agents as creators or contributors, and then were normalized into strings to be directly incorporated into the resulting EDM records as additional metadata.
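The relator-based disambiguation can be sketched as a lookup from code to role label, followed by a choice between a creator and a contributor property. This is an illustrative reading rather than the mapping actually used: which codes count as creators, the `dc:creator`/`dc:contributor` targets and the "Name (Role)" string format are assumptions, and only a handful of MARC relator codes are listed.

```python
# A few MARC relator codes and their labels.
RELATOR_LABELS = {
    "aut": "Author",
    "ill": "Illustrator",
    "edt": "Editor",
    "trl": "Translator",
}

# Assumption: only authors are treated as creators in this sketch.
CREATOR_CODES = {"aut"}

def map_agent(name, relator_code):
    """Return (target property, display string) for one agent."""
    label = RELATOR_LABELS.get(relator_code)
    prop = "dc:creator" if relator_code in CREATOR_CODES else "dc:contributor"
    value = f"{name} ({label})" if label else name
    return prop, value

if __name__ == "__main__":
    # Hypothetical agents taken from one record.
    for agent in [("Jane Doe", "aut"), ("John Roe", "trl"), ("A. N. Other", "xyz")]:
        print(map_agent(*agent))
```

Unknown codes fall back to a plain contributor without a role label, so an unexpected value in a record never blocks the mapping.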

Finally, hierarchical relationships that were not made available in the original conversion were represented in the new metadata. We focused on records for individual journals encompassed in bigger volumes, and mapped the relevant metadata - references to parent and children records - within hierarchical fields. This enabled a better experience for end users thanks to the display of a widget dedicated to browsing hierarchical resources by following their cardinality or their membership.

Results

The first outcome of this work is an extensive report presenting this study case, standing as data guidelines available in the Pro section of Europeana Collections²⁸. However, our results rely on both qualitative and quantitative achievements.

The overall data improvement empowers Europeana users - creatives, searchers, the curious - with higher quality results, allowing them to tailor their experience even further from the main public access. Specific data reuse or data mining scenarios also benefit from such an experiment, thanks to Europeana’s REST API²⁹. In addition, the compatibility with the IIIF framework ensures a seamless user experience carried out through extended visualisation features. This can be transposed into more advanced applications by directly reusing the aggregated IIIF metadata from Europeana, e.g. within Digital Humanities visualisation projects.

Finally, the updated datasets didn’t necessarily grow in size in terms of records. But instead of the former rule of one thumbnail per record (for about 25K records), the newly added IIIF metadata now enables Europeana’s viewer to fetch more than 3.5M high-resolution pictures (over 1600 px wide) from all the connected JSON manifests³⁰.

27 See http://www.loc.gov/marc/relators/

28 See http://pro.europeana.eu/share-your-data/data-guidelines/edm-case-studies/the-universitaetsbibliothek-heidelberg-case-study

29 See http://labs.europeana.eu/api/introduction

30 See http://iiif.io/api/annex/notes/jsonld/#greedy-compaction-of-terms

3. Easing Access to Linked Data Resources for Digital Humanities Scholars

Albert Meroño-Peñuela¹ and Rinke Hoekstra¹,²

¹ Computer Science Department, Vrije Universiteit Amsterdam, NL {albert.merono,rinke.hoekstra}@vu.nl

² Faculty of Law, University of Amsterdam, NL

Abstract. Semantic Web technology comprises a variety of languages, standards and practices that, over the last two decades, have facilitated the emergence of the Linked Open Data (LOD) Cloud – a global Web graph of more than 100 billion interconnected statements [1]. Datasets in this LOD cloud cover a variety of domains, including geography, government, life sciences, linguistics, media, publications and social networking. Despite this success integrating data on the Web, Semantic Web technology is still very present at every level of the LOD cloud. This includes the early layer of accessing Linked Data; that is, the mechanism by which users select and grab the data they consider for their applications or analyses. Accessing Linked Data requires certain technical skills – mostly involving understanding of the Resource Description Framework (RDF) [6] and the SPARQL [7] query language, but also others such as SQUIN [3] or Linked Data Fragments [8] – that very often exclude potential users. In the digital humanities, many scholars lack this technical knowledge, and consequently miss a great deal of LOD sources of their interest. This includes, but is not limited to, multiple linked datasets on historical statistics (e.g. CEDAR [2], CLARIAH [4]), museum collections (e.g. Amsterdam, British Museum, Smithsonian), linguistic resources (e.g. lexinfo, BabelNet), and media (e.g. MusicBrainz, BBC, New York Times, Linked Movie Database). Although these scholars are becoming more and more tech savvy, deep knowledge of technology should not be a strict requirement for accessing Linked Data. In order to address this issue, we propose grlc [5], a Linked Data accessing server that uses SPARQL queries stored anywhere on the Web to generate comprehensive, well documented, neatly organized, and provenance-trusted API specifications. Such APIs make any Linked Data actionable, making access to Linked Data sources easy, repeatable and shareable with one single URI entry point. grlc relies on the Swagger UI³¹, an OpenAPI³² frontend, to present these APIs to the user as an intuitive user interface. In this demo, we will show how grlc can help ease the traditionally high technical requirements to access Linked Data. We will illustrate this with several running use cases in CLARIAH³³, a Dutch national project to build digital infrastructure for the humanities.
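The general idea of deriving an API description from a stored query can be sketched as follows. This is not grlc's implementation, and the conventions used are assumptions made for illustration: metadata is read from hypothetical `#+ key: value` comment lines, and query variables starting with `?_` are treated as API parameters. The endpoint and predicate URLs are placeholders.

```python
import re

# Hypothetical stored query; the URLs are placeholders.
QUERY = """\
#+ summary: Municipalities in a given province
#+ endpoint: http://example.org/sparql
SELECT ?municipality WHERE {
  ?municipality <http://example.org/inProvince> ?_province .
}
"""

def describe_operation(name, query_text):
    """Build a small API description from the comments and variables of a query."""
    meta = {}
    for line in query_text.splitlines():
        match = re.match(r"#\+\s*(\w+):\s*(.+)", line)
        if match:
            meta[match.group(1)] = match.group(2).strip()
    # Variables written as ?_name are exposed as request parameters.
    parameters = sorted(set(re.findall(r"\?_(\w+)", query_text)))
    return {
        "path": "/" + name,
        "summary": meta.get("summary", ""),
        "endpoint": meta.get("endpoint"),
        "parameters": parameters,
    }

if __name__ == "__main__":
    print(describe_operation("municipalities", QUERY))
```

A description of this shape is the kind of information an OpenAPI frontend needs in order to render a form for each operation, so that a scholar fills in a parameter instead of editing the query.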

Keywords: Linked Data, API, REST, SPARQL, #LD, Web Data access, middleware, OpenAPI

References
1. Abele, A., McCrae, J.P., Buitelaar, P., Jentzsch, A., Cyganiak, R.: Linking Open Data cloud diagram. http://lod-cloud.net/ (2017)

2. CEDAR Project, http://www.cedar-project.nl/

3. Hartig, O.: Squin: A traversal based query execution system for the web of linked data. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. pp. 1081–1084. SIGMOD ’13, ACM, New York, NY, USA (2013), http://doi.acm.org/10.1145/2463676.2465231

4. Hoekstra, R., Meroño-Peñuela, A., Dentler, K., Rijpma, A., Zijdeman, R., Zandhuis, I.: An Ecosystem for Linked Humanities Data. In: Proceedings of the 1st Workshop on Humanities in the Semantic Web (WHiSe 2016), ESWC 2016 (2016)

5. Meroño-Peñuela, A., Hoekstra, R.: grlc Makes GitHub Taste Like Linked Data APIs. In: The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers. pp. 342–353. Springer (2016)

6. The World Wide Web Consortium (W3C): Resource Description Framework (RDF). http://www.w3.org/RDF/

31 See http://swagger.io/swagger-ui/
32 See https://www.openapis.org/
33 See http://www.clariah.nl/en/


7. The World Wide Web Consortium (W3C): SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/

8. Verborgh, R., Sande, M.V., Colpaert, P., Coppens, S., Mannens, E., van de Walle, R.: Web-Scale Querying through Linked Data Fragments. In: Proceedings of the 7th Workshop on Linked Data on the Web (LDOW 2014), WWW 2014 (2014)


Session F

1. The Nederlab research environment: an update
Hennie Brugman & Antal van den Bosch
Meertens Institute, Amsterdam
hennie.brugman@meertens.knaw.nl

Nederlab34 (Brugman, 2016) is a five-year 'NWO groot' project building a research infrastructure primarily for historians and for literary, linguistic and cultural scholars. Building this infrastructure involves activities in three main tracks:

1. Acquisition, harmonisation/semantic mapping, text enrichment and metadata curation of a substantial number of existing (historical) Dutch digital text collections of our academic and cultural heritage partners in the Benelux.

2. Improving the quality of the output of existing language processing tools when they are applied to historical Dutch texts from 800 until the present.

3. Building a virtual research environment with a powerful search backend for exploration, search and analysis of metadata and annotated text from our very large aggregated and integrated collections (Brouwer, 2016).

We are currently in the last year of our project. In our contribution we would therefore like to take the opportunity to evaluate to what extent we have been able to implement our original, ambitious project use cases. We intend to support this evaluation with a demonstration at DHBenelux 2017.

In general, we expect to have processed between twenty and thirty collections by the end of our project and to have made those available to the research community. At the time of writing, we have reached a total of almost ten billion words of annotated text, accessible through our online Virtual Research Environment (VRE), the 'research portal'35. During the last year of our project we are carrying out a number of scientific pilot projects in an open call, to test the usability of this VRE and the Nederlab collections, and to add extensions based on real user needs.

Below we will zoom in on our original categories of use cases.

1. Detecting the onset of change

When do new concepts occur for the first time? Or new word forms? Or word combinations (collocations)?

By the end of our project we will have collection data for all periods between 800 and the present, thereby enabling full diachronic searches. Our Nederlab research portal is able to visualise time distributions over all hits found for specific queries, for both document and hit counts, showing absolute as well as relative frequencies (for example, the number of occurrences of 'vliegtuig' - airplane - for each year). The system supports complex queries for sequential patterns over multiple parallel annotation layers using the Corpus Query Language (CQL) (2), a query language introduced by the Corpus WorkBench (CWB) and regularly used in our domain (e.g. by Sketch Engine, MTAS, BlackLab). Nederlab uses Multi Tier Annotation Search (MTAS)36. Searching for patterns using CQL, in combination with grouping of results, enables researchers to investigate word combinations and how

34 www.nederlab.nl
35 www.nederlab.nl/onderzoeksportaal
36 https://meertensinstituut.github.io/mtas/


often they occur, for specific periods in time. For example, it is possible to query for the most frequent nouns used in sentences containing the lemma 'varen', for each century, to investigate potential shifts in meaning over time (in this case from 'go' to 'go by boat').
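The difference between absolute and relative frequencies per year, as used in the time distributions described above, can be illustrated with a minimal sketch. It assumes that hits have already been retrieved and dated; the function name and the toy numbers are invented for illustration and do not reflect the Nederlab backend.

```python
from collections import Counter


def yearly_frequencies(hit_years, tokens_per_year):
    """hit_years: one year per hit; tokens_per_year: {year: total tokens in that year}."""
    absolute = Counter(hit_years)
    # Relative frequency = hits in a year divided by the corpus size for that year.
    relative = {
        year: absolute[year] / total
        for year, total in tokens_per_year.items()
        if total > 0
    }
    return dict(absolute), relative


# Invented toy data: five hits spread over two years of different corpus size.
hit_years = [1910, 1910, 1920, 1920, 1920]
tokens_per_year = {1910: 1000, 1920: 3000}
absolute, relative = yearly_frequencies(hit_years, tokens_per_year)
```

In this toy example 1920 has more hits in absolute terms, but 1910 has the higher relative frequency, which is why both views are useful when corpus size varies over time.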

2. Establishing the spread of changes

How do such changes spread, over time, over places, from one text type to another, from one author to another?

Our system allows users to search for words or patterns and visualise the results as distributions over many metadata dimensions, even over multiple dimensions simultaneously (e.g. time and genre). It is also possible to directly compare time distributions for different search terms (using a 'trends' visualisation, e.g. 'mensch' versus 'mens') (Tjong Kim Sang, 2016).

3. Finding connections and networks

Find and investigate motives using semantic word fields around concepts. Establish relations between persons and places.

We already support expansion of queries with historical variants using a web service built around the Dutch historical lexicon by the Instituut voor de Nederlandse Taal (INT). We intend to generalize and extend this query expansion mechanism to include semantic expansion and expansion with user-defined domain lexica. We will do this in collaboration with a number of our ongoing scientific pilot projects. An example of such a domain lexicon is a semantic lexicon containing emotion words.

Networks of persons and places can be charted on the basis of the named entities that were added to our corpus during the enrichment process. We use CQL searching in combination with grouping functionality to do this (e.g. list the most frequently mentioned persons in sentences or paragraphs containing the location 'Deventer').
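The kind of grouping described in the 'Deventer' example can be sketched in a few lines. This is only an illustration of the counting logic, not of CQL or MTAS: the data layout (one list of named entities per sentence), the `PER`/`LOC` tags and the toy sentences are all assumptions.

```python
from collections import Counter


def persons_near_location(sentences, location, top=3):
    """sentences: one list of (entity_text, entity_type) pairs per sentence."""
    counts = Counter()
    for entities in sentences:
        # Only sentences that mention the location contribute to the counts.
        if (location, "LOC") in entities:
            counts.update(text for text, kind in entities if kind == "PER")
    return counts.most_common(top)


# Invented toy data.
sentences = [
    [("Geert Groote", "PER"), ("Deventer", "LOC")],
    [("Erasmus", "PER"), ("Deventer", "LOC"), ("Geert Groote", "PER")],
    [("Erasmus", "PER"), ("Rotterdam", "LOC")],
]
ranking = persons_near_location(sentences, "Deventer")
```

Repeating the count for many locations would yield the weighted person-place edges from which a network can be drawn.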

4. Detecting similarities and differences between texts

Investigate reuse of text fragments among authors. Compare texts or text collections with corpus analysis tools.

For individual texts or for any subcollection of texts from our complete corpus we can query for statistics. We can determine total numbers of documents, tokens and types, but also the mean and median number of words per document; in fact, our system can return complete word count distributions that can be directly visualised. Other supported statistics include numbers of sentences, paragraphs, divisions and heads, and frequency lists over words or over any of the annotation layers, for any subcollection of our corpus. All of these statistics and lists can in principle be used to compare text documents or complete document collections. All statistics can also be exported for further analysis in external tools such as R.
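As a small illustration of the collection statistics listed above, the sketch below computes document, token and type counts plus mean and median document length for a toy subcollection. The function name and the three invented documents are assumptions; the actual system computes such figures in its search backend.

```python
from statistics import mean, median


def collection_statistics(documents):
    """documents: list of token lists, one per document."""
    lengths = [len(doc) for doc in documents]
    all_tokens = [token for doc in documents for token in doc]
    return {
        "documents": len(documents),
        "tokens": len(all_tokens),       # running words
        "types": len(set(all_tokens)),   # distinct word forms
        "mean_length": mean(lengths),
        "median_length": median(lengths),
    }


# Invented toy subcollection of three tokenised documents.
docs = [
    ["de", "kat", "zit"],
    ["de", "hond"],
    ["de", "kat", "en", "de", "hond", "slapen"],
]
stats = collection_statistics(docs)
```

Computing the same dictionary for two subcollections gives a simple basis for the comparisons mentioned in the text.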

Conclusion
After a number of years of constructing the foundations of our infrastructure, the project is now at a stage where we can start using it for real research pilots or projects. Although there is substantial room for improvement in many aspects of our products, our initial aims are within reach.

References
Brouwer, Matthijs, Hennie Brugman, Marc Kemps-Snijders (2016). ‘MTAS: A Solr/Lucene based Multi Tier Annotation Search solution’, CLARIN Annual Conference 2016, Aix-en-Provence, France, 26-28 October 2016.


Brugman, Hennie, Martin Reynaert, Nicoline van der Sijs, René van Stipriaan, Erik Tjong Kim Sang, Antal van den Bosch (2016). ‘Nederlab: Towards a Single Portal and Research Environment for Diachronic Dutch Text Corpora’, in: Proceedings of LREC (10th edition of the Language Resources and Evaluation Conference), 23-28 May 2016, Portorož (Slovenia), pp. 1277-1281.

Christ, O. (1994). A Modular and Flexible Architecture for an Integrated Corpus Query System. In: Proceedings of COMPLEX ’94: 3rd Conference on Computational Lexicography and Text Research, Budapest.

Tjong Kim Sang, Erik (2016). 'Finding Rising and Falling Words', in: Proceedings of the COLING 2016 workshop Language Technology Resources and Tools for Digital Humanities, ACL, Osaka, Japan, 2016. http://ifarm.nl/erikt/papers/lt4dh2016.pdf

2. Modeling the evolution of languages through text mining
A proposed methodology applied to the transition between Latin and Romance vernaculars

Florian Cafiero and Remy Verdo

The mechanisms at stake in the passage from a “dilated diasystem”, where a language becomes more and more complex, to a “disconnected diasystem”, where two distinct linguistic systems appear in the same cultural system, seem to be a well-studied problem.

For instance, several models have been presented over the past decades to describe the evolution from Latin to the Romance vernaculars. The first model proposed to address this question is Ernst Pulgram’s (Pulgram, 1950: 462). In this pioneering work, however, the Latin language is represented according to the traditional written vs. oral distinction, which does not allow a very detailed analysis. Its deterministic approach might also lead to some inaccuracies, the language being considered as moving ever further from “old Latin” as time goes by. In 1986, Walter Berschin (Berschin, 1986: 148) proposed a more comprehensive, two-sided diachronic model. The concept of vulgar Latin is more refined here, as it includes both written and spoken language. Yet here too, vulgar Latin is separated from literary Latin, the “stylistic level” (Stilhöhe) of which, even at its lowest, never crosses, or even touches, the curve of oral language. What is more, this vulgar Latin is supposed to evolve linearly, as in Pulgram’s work. The curve modeling literary Latin seems to represent only the evolution of the higher register of language that is observed. It disregards the co-existence of different registers of language in literary Latin, and ignores their articulation with vulgar Latin. Lastly, we can only regret the absence of data taken from “diplomatic” texts, in which stylistic and pragmatic efforts are also to be noticed.

Hence, those studies raise a few problems. They do not address the problem of registers well, using very broad distinctions and overlooking the possibility that different language registers could be used at the same time, even in the same text. They are also “expert views”, based on the author’s extensive experience rather than on a systematic analysis of the texts.

We thus propose a methodology to systematically study the evolution of a language from one form to another, taking into account our remark on registers. This methodology involves computerized statistical analysis and artificial intelligence but should not be seen as an automated process disconnected from the linguist’s analysis. On the contrary, it has been designed as a way to extend the way of thinking of a particular expert. It makes it possible to partially re-create the expert’s own point of view and to apply it to a large amount of text that would otherwise take too long to analyze.

The first step consists of a “traditional” linguistic analysis of a selection of texts, aimed at differentiating several registers used within the texts of the period of interest.


Our sample corpus consists of three hagiographical texts and twenty-one diplomatic texts. The three hagiographical texts were written in the later Merovingian or early Carolingian age (ca. 650-780), then rewritten during the Carolingian Renaissance (from 780 to the death of Charles the Bald in 877, or thereabouts). The diplomatic texts are 21 original Frankish royal charters dating from ca. 665 to 868. Most of them record a judgment. Originally part of the great collection of the monastery of Saint-Denis, they are kept in the French National Archives.

We isolate five language registers in this sample corpus, consistent with Michel Banniard’s work (Banniard, 2008), and we design a table of criteria to characterize them.

We then go through a calibration phase. We try to apply various computational methods that can help isolate the different language registers used in the various texts of the corpus, or within one text of the corpus. This initially calls for unsupervised methods, as we do not want to influence the outcome of the computations. The statistical analysis could reveal divisions we ignored or highlight unnoticed phenomena. We try to implement clustering algorithms such as k-means, hierarchical clustering, and various neural networks. We then compare the performance of those algorithms with supervised algorithms, where our sample corpus is used as training data.

Crucial for these analyses is the way we choose to present the texts to our algorithms. Lemmatizing the texts would remove too much information. Here, even small variations, such as variations in written form, are likely to be significant; they can sometimes be even more significant than the grammatical structure of the texts themselves. This is why we apply our computations to two types of version of the texts in our corpus. In the first, the texts are treated as a list of words, either without any further treatment or restricted to a selection of the most frequent words. In the second, the texts are treated as n-grams (for 8 > n > 3), either without any further treatment or restricted to a selection of the most frequent forms. N-grams can perform very well here, as they make it possible to take the structure of the sentences implicitly into account, in this case which word comes after which.
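The n-gram representation mentioned above can be sketched as follows. The abstract does not say whether the n-grams are built over characters or over words; this sketch assumes character n-grams that include spaces (so that they can span word boundaries) and uses the stated range 8 > n > 3, i.e. n from 4 to 7. The function name and the sample phrase are illustrative only.

```python
from collections import Counter


def char_ngrams(text, n_min=4, n_max=7):
    """Count all character n-grams with n_min <= n <= n_max, spaces included."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Unlemmatized input keeps spelling variants, which the text argues are informative.
profile = char_ngrams("in dei nomine")
most_frequent = profile.most_common(20)  # optional restriction to frequent forms
```

Such count profiles, one per text or text segment, are the kind of input that clustering algorithms like k-means can then operate on.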

We compare all those findings with our own “expert” model designed on our sample, and select the solution that gives the most accurate division into registers.

We then run the selected algorithm on an extended corpus, formed by a large selection of texts written during the same period (650-877), and follow the evolution of the registers across time on this broader corpus. Finally, we assess the overall consistency of these results with the model we designed by analyzing our first sample.

BIBLIOGRAPHY
Michel Banniard, « Du latin des illettrés au roman des lettrés : la question des niveaux de langue en France (viiie-xiie siècle) », in Zwischen Babel und Pfingsten: Sprachdifferenzen und Gesprächsverständigung in der Vormoderne (9.-16. Jh.): Akten der 3. deutsch-französischen Tagung des Arbeitskreises « Gesellschaft und individuelle Kommunikation in der Vormoderne » (GIK) in Verbindung mit dem Historischen Seminar der Univ. Luzern, Höhnscheid (Kassel), 16-19 nov. 2006, Peter von Moos ed., Münster, 2008 (« Gesellschaft und individuelle Kommunikation in der Vormoderne », 1), p. 269-286.

W. Berschin, Biographie und Epochenstil im lateinischen Mittelalter, Stuttgart, t. 3: Karolingische Biographie, 750-920, 1991.

Piera Molinelli, « Per una sociolinguistica del latino », in Latin vulgaire – latin tardif : actes du VIIe colloque international sur le latin vulgaire et tardif (Séville, 02-06 septembre 2003), éd. Carmen Arias Abellán, Séville : Universidad de Sevilla, 2006, p. 463-474.


Giovanni Polara, « Problemi di ortografia e di interpunzione nei testi latini di età carolina », Grafia e interpunzione del latino nel Medioevo (Roma, sept. 1984), éd. Alfonso Maieru, Rome, 1987.

Ernst Pulgram, « Spoken and written Latin » , Language. Journal of the Linguistic Society of America, t. 26, 1950.

3. Experiments in fine-grained entity typing for Dutch
Marieke van Erp and Piek Vossen
Computational Lexicology and Terminology Lab, Vrije Universiteit Amsterdam

Introduction
Many entity recognition approaches classify recognised entities into a limited set of coarse-grained entity types [1]. However, fine-grained entity types are more useful for deeper natural language analysis and end-user tasks, in particular in the digital humanities domain, where entity linking (grounding an entity in a knowledge base) is not possible. For example, while standard named entity recognition may determine that an entity is a person, knowing whether that entity is a writer or a politician is important for populating a database of persons with particular occupations. Currently, fine-grained entity typing has only been investigated for English. In this abstract, we present a fine-grained entity typing system for Dutch using training data extracted from Wikipedia and DBpedia. Our system achieves performance comparable to English, with an F1 measure of .90 on 59 types and .57 on 269 types.

Approach
Our approach to generating training data is inspired by [2] and [3]. In [2], the training data is generated using Wikipedia: the wikilink anchor text is extracted as an entity mention, which is then mapped to its corresponding Freebase entity types. We also take the Wikipedia wikilinks, anchor text and surrounding text, but instead of linking them to Freebase, we link them to DBpedia [4]. The advantage of DBpedia is that it is based on Wikipedia, so there is a direct link available between a wikilink and DBpedia through a mappings file.37

Feature name   Description                                       Example
Mention        The entity phrase                                 San Francisco
Head           The syntactic head of the entity phrase           Francisco
Non-head       The non-head tokens in the entity phrase          San
Entity-shape   The word shape of the words in the entity phrase  Aaa Aaaaaaaa
Trigrams       Character trigrams in the entity head             _Fr Fra ran anc nci cis isc sco co_
Word before    The word before the entity phrase                 te
Word after     The word after the entity phrase                  Californië

37 http://downloads.dbpedia.org/2016-04/core-i18n/nl/wikipedia_links_nl.ttl.bz2


Table 1: Description of the extracted features

We base our feature vectors on [3], leaving out the dependency and topic-related features due to processing constraints. This results in the features displayed in Table 1.
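The features in Table 1 can be illustrated with a minimal extraction sketch. It is not the authors' code: in particular, the syntactic head is approximated by the last token of the mention, the shape mapping is one common convention, and the four-token context is assembled from the example words in Table 1 purely for demonstration.

```python
def word_shape(token):
    """Map uppercase letters to 'A', lowercase to 'a', digits to '0'; keep the rest."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("A")
        elif ch.islower():
            shape.append("a")
        elif ch.isdigit():
            shape.append("0")
        else:
            shape.append(ch)
    return "".join(shape)


def char_trigrams(token):
    """Character trigrams of a token padded with '_' on both sides."""
    padded = f"_{token}_"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def extract_features(tokens, start, end):
    """tokens: tokenised context; the mention is tokens[start:end]."""
    mention = tokens[start:end]
    head = mention[-1]  # assumption: the last token stands in for the syntactic head
    return {
        "mention": " ".join(mention),
        "head": head,
        "non_head": mention[:-1],
        "entity_shape": " ".join(word_shape(t) for t in mention),
        "trigrams": char_trigrams(head),
        "word_before": tokens[start - 1] if start > 0 else None,
        "word_after": tokens[end] if end < len(tokens) else None,
    }


features = extract_features(["te", "San", "Francisco", "Californië"], 1, 3)
```

For this context the sketch yields the head "Francisco", the non-head token "San" and the trigram sequence `_Fr Fra ran anc nci cis isc sco co_`; a real system would take the head from a parser.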

To compare our results to those in previous work, we mapped the DBpedia type hierarchy to the entity typing hierarchy used in [2] and [3]. Out of the 86 types that were present, 9 types could not be mapped to the DBpedia type hierarchy.38 As not all types are present in the dataset, we only find 59 of the types from previous work in our dataset. We also perform a series of experiments with the full DBpedia type hierarchy, resulting in an experiment with 269 types to predict.

As there are no fine-grained entity typing datasets available for Dutch yet, we split the generated dataset into two thirds for training and one third for testing. This results in about 1 million instances for training on the set with 59 entity types, and 2 million on the set with 269 entity types.

We use the FastText algorithm [5, 6]39 to train our type prediction model. This algorithm learns representations for character n-grams, and words are represented as the sum of their n-gram vectors. This helps in covering morphologically rich languages, words that do not occur often and, potentially, entity mentions that do not occur in the training corpus.
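The idea that a word is represented as the sum of its character n-gram vectors can be sketched without the FastText library. The sketch below is a simplification under stated assumptions: n-gram vectors are randomly initialised rather than trained, the `<` and `>` boundary markers and the n-gram lengths 3 to 6 follow the description in [5], and details of the real implementation (a vector for the whole word, hashing of n-grams into buckets) are left out. The class and function names are hypothetical.

```python
import random


def subword_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of the word wrapped in boundary markers '<' and '>'."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


class SubwordEmbeddings:
    def __init__(self, dim=4, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}  # n-gram -> vector; random here, learned in a real model

    def ngram_vector(self, gram):
        if gram not in self.table:
            self.table[gram] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        return self.table[gram]

    def word_vector(self, word):
        """Sum the vectors of all n-grams of the word."""
        vector = [0.0] * self.dim
        for gram in subword_ngrams(word):
            for k, value in enumerate(self.ngram_vector(gram)):
                vector[k] += value
        return vector


embeddings = SubwordEmbeddings()
seen = embeddings.word_vector("kaas")
unseen = embeddings.word_vector("kaasje")  # shares n-grams such as '<kaas' with 'kaas'
```

Because "kaasje" shares several n-grams with "kaas", it receives a related vector even if it never occurred in training, which is the property the text relies on for rare words and unseen mentions.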

Experiments and Results
We first evaluate our approach on the entity types from previous work (rows 2-6 in Table 2). At Level 1, coarse-grained entity types (person, location, organisation, and other) are evaluated. These are the same high-level types that are present in most named entity classification tasks. At Level 2, the finer-grained entity types that are directly below these are evaluated (e.g. person/artist and organisation/company). At Level 3, super fine-grained types are evaluated (e.g. person/artist/music and organisation/company/news), for which we still achieve a macro F1 of .90.

Types                                          Precision  Recall  F1
Level 1: 4 types                               .98        .98     .98
Level 2: 33 types                              .92        .90     .91
Level 3: 24 types                              .89        .91     .90
Overall (59 types)                             .93        .88     .90
Overall only dark entities (59 types)          .67        .56     .60
DBpedia types (269)                            .68        .52     .57
DBpedia types, only dark entities (269 types)  .50        .41     .44

Table 2: Precision, recall and macro-average F1
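For readers unfamiliar with macro-averaging, the sketch below shows one common way to compute such scores: precision, recall and F1 are computed per type and then averaged with equal weight per type. The abstract does not spell out the exact averaging procedure, so this single-label formulation, the function name and the toy labels are assumptions for illustration.

```python
def macro_scores(gold, predicted):
    """Macro-averaged precision, recall and F1 over all types in gold or predicted."""
    types = sorted(set(gold) | set(predicted))
    precisions, recalls, f1s = [], [], []
    for t in types:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == t and p == t)
        fp = sum(1 for g, p in pairs if g != t and p == t)
        fn = sum(1 for g, p in pairs if g == t and p != t)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(types)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Invented toy labels.
gold = ["person", "person", "location", "organisation"]
predicted = ["person", "location", "location", "organisation"]
macro_p, macro_r, macro_f1 = macro_scores(gold, predicted)
```

Note that with this definition the macro F1 is the mean of the per-type F1 values, so it need not equal the harmonic mean of the macro precision and macro recall shown in the same row.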

38 The types we could not map were the following: location/structure/government, organization/stockexchange, other/health, other/livingthing, other/product/car, other/product/computer, person/education, person/education/student, person/education/teacher
39 https://github.com/facebookresearch/fastText


We also evaluated the approach on only dark entities (i.e. entity mentions that were not present in the training data).40 Here we see that the scores drop to an F1 of .60, which is in line with previous research [7]. It is unlikely that there is no overlap between the training and test data, but this issue deserves further investigation.

Furthermore, we see that the results for the DBpedia type hierarchy containing 269 types are significantly lower, but there is less training data available for those and not all 685 DBpedia types are covered. This is partly a result of the mappings file only containing the most specific DBpedia type; for example, http://nl.dbpedia.org/resource/Old_Amsterdam is listed as having type ‘Cheese’ in the mappings file, but its superclass ‘Food’ is not present.

Conclusions and Future Work
We have presented an approach and experiments for fine-grained entity typing for Dutch, which can be particularly interesting for collecting information about entities in digital humanities sources. Our results are on par with previous work for English and our software is available at https://github.com/cltl/multilingual-finegrained-entity-typing.

For future work, we aim to test the approach on historical datasets such as the NIOD “Getuigen Verhalen” dataset and Biografisch Portaal. We also intend to compile a subset of the most relevant types for the digital humanities domain and provide a trained model for reuse by humanities researchers.

References:
[1] Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 3–26 (2007)

[2] Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)

[3] Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.: Context-dependent fine-grained entity type tagging. In: arXiv (2014)

[4] Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the web of data. Web Semantics: science, services and agents on the world wide web 7(3), 154–165 (2009)

[5] Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Tech. rep., arXiv (2016), https://arxiv.org/abs/1607.04606

[6] Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. Tech. rep., arXiv (2016), https://arxiv.org/abs/1607.01759

[7] Yaghoobzadeh, Y., Schütze, H.: Corpus-level fine-grained entity typing using contextual information. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pp. 715–725. Association for Computational Linguistics, Lisbon, Portugal (17-21 September 2015)

40 Whilst we made sure the training data and test data were separate on the instance level, popular entities can still be mentioned in both datasets.


Session G

1. Predicting familial risk of dyslexia by applying machine learning to infant vocabulary data
Ao Chen*1,2, Frank Wijnen2, Charlotte Koster3, Hugo Schnack1
1 Department of Psychiatry, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
2 Institute of Linguistics, Utrecht University, Utrecht, the Netherlands
3 Center for Language and Cognition Groningen, University of Groningen, Groningen, the Netherlands

Background
The combination of rapid progress in the development of computational tools, such as machine learning, and the growing availability of digitized data in language research (e.g., the DANS data archive) and tools to assess these data (e.g., via CLARIAH), has made it possible to investigate language acquisition in an automated way and on a large scale (we used 22,000 vocabulary scores in our study). In this study, we applied a machine learning algorithm to vocabulary data to map the pattern of vocabulary development in individual children. We investigated whether individual differences between children in word knowledge in different word classes (e.g., nouns, pronouns, helping verbs) can be used to detect whether a child is at risk of developing dyslexia. Early detection of developmental dyslexia, a specific reading disorder, will enable interventions at an early age, before the onset of formal reading and spelling instruction. Although deviations in early speech/language development have frequently been related to (risk of) dyslexia (van der Leij et al, 2013), none of these markers has been successfully used to predict later language/literacy performance at the individual level. Machine learning is a technique capable of discovering patterns in data to make such predictions. In the past decade machine learning has been successfully employed in, e.g., medicine and the humanities. Recent examples include the prediction of disease course in psychosis (Koutsouleris et al, 2016) and the attribution of the Dutch anthem to a writer who was not previously considered as its author (Kestemont et al, 2016). The aim of this study was to investigate whether early vocabulary development can be used to predict whether or not an infant is at risk of dyslexia.

Method
We investigated early vocabulary development in two large, independent samples of children at familial risk of dyslexia (FR; N=495) and typically developing children (TD; N=498) between 17 and 35 months of age. The Dutch version of the MacArthur-Bates Communicative Development Inventory (Words and Sentences) (N-CDI; Fenson et al, 1993) was used to measure each infant’s vocabulary development. This was done by counting the number of words he/she knew in 22 word categories. Together, these 22 features formed the feature vector representing the subject. We trained a linear support vector machine (SVM; Vapnik, 1999) to predict at-risk status at the individual level, based on these feature vectors. SVM is a supervised machine learning technique that is able to find patterns in the input data (word counts in 22 categories, in our case) that are related to some output measure (in our case: belonging to the FR or TD group). The training procedure results in a model that optimally predicts for (new) subjects to which group they belong. This prediction is based on the weighted sum of the input variables, where the weights are the result of the optimization procedure during training.
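The prediction step of a trained linear model, a weighted sum of the input variables, can be written out in a few lines. The sketch uses three features instead of 22 for brevity; the weights, the bias, the sign convention (positive means FR) and the two example children are invented and are not the values learned in the study.

```python
def decision_value(weights, bias, features):
    """Weighted sum of the input features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_group(weights, bias, features):
    """Assumed convention: a positive decision value is classified as FR, otherwise TD."""
    return "FR" if decision_value(weights, bias, features) > 0 else "TD"


# Invented weights for three word categories; negative weights mean that
# knowing more words in a category lowers the predicted risk.
weights = [-0.5, -0.25, 0.125]
bias = 2.0

child_a = [2, 1, 4]  # hypothetical word counts per category
child_b = [4, 3, 2]

group_a = predict_group(weights, bias, child_a)
group_b = predict_group(weights, bias, child_b)
```

Training an SVM amounts to choosing `weights` and `bias`; inspecting which weights differ from zero is what allows individual word categories to be interpreted.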

Performance of our prediction model was assessed by the percentage of FR subjects that were correctly classified as FR (sensitivity), the percentage of TD subjects correctly classified (specificity) and the balanced accuracy (mean of sensitivity and specificity).
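These three measures follow directly from the counts of correct and incorrect classifications in each group. The sketch below assumes that both groups are present in the evaluated sample; the function name and the eight toy labels are invented for illustration.

```python
def classification_metrics(true_labels, predicted_labels, positive="FR"):
    """Return (sensitivity, specificity, balanced accuracy) for a two-group problem."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)  # share of FR subjects classified as FR
    specificity = tn / (tn + fp)  # share of TD subjects classified as TD
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Invented toy labels: four FR and four TD subjects.
truth = ["FR", "FR", "FR", "FR", "TD", "TD", "TD", "TD"]
guess = ["FR", "FR", "FR", "TD", "TD", "TD", "FR", "FR"]
sensitivity, specificity, balanced_accuracy = classification_metrics(truth, guess)
```

Balanced accuracy is preferred over plain accuracy when the two groups differ in size, because each group contributes equally to the score.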


The model’s generalizability was tested using cross-validation. In this setup the model is trained and tested on different subsamples.
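A minimal sketch of how cross-validation separates training and test subsamples is given below. The abstract does not state the number of folds or whether the data were shuffled or stratified, so the contiguous k-fold scheme, the function name and the choice of three folds are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Ten hypothetical subjects split into three folds.
folds = list(k_fold_indices(10, 3))
```

Each subject appears in exactly one test fold, so every prediction is made by a model that did not see that subject during training; in practice the subject order would normally be shuffled first.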

Results
There was a specific age period, 18-20 months, in which the model was able to predict the status of being at risk (FR). At 19-20 months of age, the cross-validation accuracy was 68% (p<0.01), with a sensitivity of 70% and a specificity of 67%. In the other age groups the accuracy was lower and not significant.

Not all 22 features contributed to the same extent to the discrimination between the FR and TD subjects at age 19-20 months. The weights of 5 word categories were significantly different from zero. The categories ‘helping verbs’ and ‘prepositions and locations’ contributed most. The model had learnt from the data that knowing fewer words in these categories at this age is a significant marker for being at family risk.

Conclusion
Machine learning methods are promising techniques for separating FR and TD children at an early age, before they start reading. There is a sensitive window in which the difference between FR and TD is most evident. The model also indicated the word categories in which FR infants know (on average) fewer words than TD infants. It should be noted that we did not predict the manifestation of dyslexia, but only elevated risk. We will follow these children up, and the ultimate goal is to train a model that is able to discriminate at an early age between the FR children who develop dyslexia and those who do not.

References
CLARIAH. http://www.clariah.nl

DANS. https://dans.knaw.nl

Fenson, L., Dale, P.S., Reznick, J.S., Thal, D., Bates, E., Hartung, J.P., et al. (1993). The MacArthur Communicative Development Inventories: User’s Guide and Technical Manual. San Diego, CA: Singular Publishing Group.

Kestemont M, Stronks E, De Bruin M, De Winkel T. Van wie is het Wilhelmus? (2016 Dec) Amsterdam University Press.

Koutsouleris N, Kahn RS, Chekroud AM, Leucht S, Falkai P, Wobrock T, Derks EM, Fleischhacker WW, Hasan A. Multisite prediction of 4-week and 52-week treatment outcomes in patients with first-episode psychosis: a machine learning approach. Lancet Psychiatry. 2016 Oct;3(10):935-946. doi:10.1016/S2215-0366(16)30171-7.

van der Leij, A., van Bergen, E., van Zuijen, T., de Jong, P., Maurits, N., and Maassen, B. (2013). Precursors of developmental dyslexia: an overview of the longitudinal Dutch dyslexia programme study. Dyslexia 19, 191–213. doi:10.1002/dys.1463.

Vapnik, VN. (1999). An overview of statistical learning theory. Neural Networks, IEEE Transactions on, 10(5), 988-999. doi:10.1109/72.788640.


2. The Dictionary of the Southern Dutch Dialects (DSDD): Designing a Virtual Research Environment for digital lexicographical research
Prof. dr. Jacques Van Keymeulen
Ghent University, Belgium

The southern Dutch dialect area consists of four dialect groups: (1) the Flemish dialects, spoken in French Flanders (France), West and East Flanders (Belgium) and Zeeland Flanders (The Netherlands); (2) the Brabantic dialects, spoken in Antwerp and Flemish Brabant (Belgium) and Northern Brabant (The Netherlands); (3) the Limburgian dialects, spoken in the Limburg provinces of Belgium and The Netherlands; (4) the Zeeland dialects, spoken in Zeeland and Goeree-Overflakkee (The Netherlands).

The dialect vocabulary of the Flemish, Brabantic and Limburgian dialects is collected in three regional dictionaries (WVD, WBD and WLD respectively), which are set up according to the same plan, conceived by prof. A. Weijnen (Nijmegen): they are onomasiologically arranged and published in thematic fascicles. Contrary to their titles, these dictionaries are to be considered as geographically oriented inventories of word usage, and not as dictionaries proper, since it is impossible to describe meaning in an onomasiologically arranged dictionary. They are atlases, not dictionaries! We nevertheless retain the word 'dictionaries', since the three projects are traditionally known as such.

Figure 1: Research areas of the 4 regional dialect dictionaries of the southern Dutch area

The three dictionaries describe the vocabulary of the traditional dialects of the first half of the twentieth century in the southern part of the Dutch language area, in a joint international and inter-university project. The WBD, the 'mother' of the two other projects, was finished in 2005; the WLD was completed in 2008. They were compiled at the University of Nijmegen and the University of Leuven. The WVD started 12 years later than its sister projects (in 1972 at Ghent University, by prof. W. Pée) and will continue until about 2019.


The dictionaries were set up in parallel in order to make the aggregation of the data possible, thus fulfilling the objectives of the founders of the projects. To that effect, a consortium of 11 linguists, computer scientists, digital humanities experts and geographers was created in 2016 to support the project “Dictionary of the Southern Dutch Dialects” (DSDD). It aims at the aggregation and standardization of the three comprehensive dialect lexicographic databases into one DSDD database (to which the alphabetically arranged WZD will hopefully be added in the future). In particular, dialectologists from Ghent University work closely with the Ghent Centre for Digital Humanities (GhentCDH) to prepare the ground for the aggregation of the three Southern Dutch dialect databases and their exploitation via a Virtual Research Environment for digital lexicographical research. The Ghent team will work closely with the Instituut voor de Nederlandse Taal with regard to the technical and linguistic sustainability of the DSDD. Through this collaboration, interoperability with CLARIN will also be ensured. The DSDD is additionally a pilot project of DARIAH-BE (Belgium).

The DSDD Virtual Research Environment will enable a research programme with new research questions, particularly in the field of quantitative lexicology and geographical analysis. During the project, 2-3 research use cases will be developed to test the applicability of the newly aggregated DSDD for digital scholarship. For example:

1. What systematic lexico-geographical patterns do the southern Dutch dialects show? Do they coincide with the traditional ones, based on phonology (see De Vriendt 2012)? Are there geographical patterns in semantics?

2. In order to explore the geographical spread of several dialectology concepts and to link them to “Kloekeplaatscodes” (which are used in linguistic research for mapping/linking dialectology concepts to geographical regions), a set of generic building blocks for automatic atlas/heatmap generation will be developed. Segmentation and clustering techniques can be run over the generated atlases/heatmaps in order to automatically detect the homogeneity (or heterogeneity) of a particular dialectology concept. Furthermore, spatial querying techniques will be supported in order to geographically search/explore this kind of dialectology concept.

3. Cluster analysis and exploration of the linkage (and visualization) of linguistic data with synchronic and diachronic extralinguistic data of all kinds.

By the end of the project, the newly aggregated DSDD will a) be available via a user-friendly website and b) be enabled for digital scholarship. To this end, a professionally designed, user-friendly web application, or Virtual Research Environment (including an application programming interface (API) for data export), will be created. The exported data will be used with existing digital research tools (e.g. for geo-visualisation, qualitative lexicology and dialectometry) to validate the research case studies described above.

At the DHBenelux Conference, we will present the plan for the aggregation and the structure of the database, and dwell on the different ‘editorial’ problems that have to be solved. The different dictionaries/databases were indeed composed over a very long period of time, at different places (Nijmegen, Leuven, Ghent) and by different editors; hence a great number of inconsistencies arose over time. In order to compose an aggregated DSDD database, a number of standardization activities have to be carried out. Additionally, we will present the initial results of the Virtual Research Environment requirements analysis.

3. Establishing interdisciplinary dialogue: conducting a qualitative investigation into linguistic requirements for Natural Language Generation
Emma Clarke and Owen Conlan


Background
Dialogue systems, commonly referred to as chatbots, are becoming increasingly popular. In 2016, chatbot was shortlisted as word of the year by Oxford Dictionaries41 and platforms such as Facebook Messenger2 are frequently utilised to communicate updates or information, sell products or provide services. While the goal of a dialogue system which communicates naturally with its user appeared to have been ‘within reach’ as far back as 2001 (Rambow et al., 2001), current Natural Language Generation (NLG) research approaches continue to have limitations when it comes to the ‘natural-ness’ of their interactions (LeCun et al., 2015) (Reiter and Dale, 2006) (Manning and Schütze, 1999). Thus, the NLG field is looking to move towards more natural conversational interfaces by taking influence from natural human speech, and as dialogue systems become more human-like, the interspersion of persuasive language within them will become more applicable. Some prior research has been carried out on the development of persuasive dialogue systems (Prakken, 2009) (Parsons et al., 2003) (Walton and Krabbe, 1995). Most recently, Hiraoka et al. (2016) observed that “these persuasive dialogue systems are in their first stages of development, and are far from the abilities of their human counterparts, both in terms of persuasive ability, and also ability to achieve user satisfaction”. The focus of this research project is the language of persuasion, namely rhetorical devices. We believe that in order to understand the requirements of the NLG community in this area, the establishment of cross-disciplinary conversation is essential.

Challenge
The nuances of human speech such as sarcasm, slang and wordplay, and the human ability to process and understand these subtleties, make them equally fascinating and frustrating for researchers in the areas of natural language processing, understanding and generation. A major challenge faced by Natural Language Generation (NLG) researchers is how to incorporate linguistic understanding into NLG systems in order to generate more natural-sounding language. This challenge is expected to persist in the next generation of natural language systems (Dale, 2016; LeCun et al., 2015; Ward and DeVault, 2015; Gartner, n.d.).

Often lacking in dialogue systems and NLG research is linguistic expertise presented in an understandable form that dissects natural elements of human speech, particularly elements which are difficult for machines to learn. Ward and DeVault (2015) highlight this interdisciplinary engagement in their ‘Ten Challenges in Highly-Interactive Dialog Systems’.

As interdisciplinary research becomes more prevalent, the requirement for computer science practitioners to engage with non-technical researchers from diverse backgrounds will increase. Dale (2016) also refers to cross-disciplinary conversations and encourages dialogue systems developers to access the expertise of the computational linguistics community, in which research into discourse phenomena has been ongoing since the inception of the field. Dale (2016) presents an encouraging call to action: “If we want to have better conversations with machines, we stand to benefit from having better conversations among ourselves.”

Approach
The overall aim of this research (fig. 1) is to establish an approach to understanding how rhetorical devices function in natural human speech in order to propose a method which can be built into practical NLG applications such as dialogue systems (chatbots). The work will draw upon structured rather than random influence by observing the usage of these linguistic strategies for persuasion in human speech. From these observations, a TEI schema has been customised in order to mark up a set of rhetorical devices within a corpus.

41 https://en.oxforddictionaries.com/word-of-the-year/word-of-the-year-2016
2 https://www.messenger.com/

Figure 1

This paper will present findings on the central component of the diagram above: the cross-disciplinary engagement with NLG practitioners in order to develop a pragmatic approach to incorporating persuasive language into dialogue systems. We explore how a customised TEI schema is used in semi-structured interviews with NLG researchers (an ongoing, iterative process). Based on qualitative findings from the interviews, the schema is revised and amended to incorporate requirements and suggestions. The final schema will ultimately be used to mark up and annotate speeches from the corpus so that they can be added to NLG systems as part of the system training.
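
As a rough illustration of what marking up rhetorical devices might look like in practice, the sketch below parses a small TEI-style fragment with Python's standard library and tallies the devices it finds. The choice of `seg` elements with `type` and `subtype` attributes, the device names and the sample sentences are assumptions made for this example only; the customised schema itself is not reproduced in this abstract.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TEI documents
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical annotated fragment; attribute values are invented for illustration
SAMPLE = """
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>
      <seg type="rhetoricalDevice" subtype="anaphora">We shall go on. We shall not tire.</seg>
      <seg type="rhetoricalDevice" subtype="rhetoricalQuestion">Who among us would refuse?</seg>
      <seg type="rhetoricalDevice" subtype="anaphora">We shall prevail.</seg>
    </p>
  </body>
</text>
""".strip()


def count_devices(tei_xml: str) -> Counter:
    """Count annotated rhetorical devices per subtype in a TEI-style string."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    # Namespaced tags are written as {namespace}localname in ElementTree
    for seg in root.iter(f"{{{TEI_NS}}}seg"):
        if seg.get("type") == "rhetoricalDevice":
            counts[seg.get("subtype", "unspecified")] += 1
    return counts


if __name__ == "__main__":
    for device, n in count_devices(SAMPLE).items():
        print(device, n)
```

Counts of this kind are one simple way an annotated corpus could be summarised before being passed to an NLG system; how the annotations are actually consumed is left open here.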

Method
A series of semi-structured interviews is being carried out in which ten NLG practitioners are asked questions in order to understand current and future requirements of NLG applications such as dialogue systems.

In the course of each interview, the TEI schema is presented and the suggestions of the NLG practitioners are sought. The interviews are recorded and the resulting outcomes are analysed using ATLAS.ti software. The results are then summarised to create an overall picture of NLG researcher requirements.

Outcomes (to date)
The process outlined above is ongoing at the time of submission. However, preliminary findings from the interviews can be summarised as follows:

• Both template-driven and deep learning systems use annotated data. In a rule-based approach, annotations are used to help further engineer features by hand, while a deep learning approach uses annotation to help learn and understand structure.

• There is an emerging question in NLG research about how to deal with sentence structure and nuance. Increasingly, researchers are using marked-up text to help systems learn higher-order structures.

• Pattern-matching alone is not a robust enough approach.

• A very clear annotation schema that marks up features of rhetorical devices would be useful for NLG researchers working in the area of persuasion.

Conclusion
The aim of this research is to engage in an interdisciplinary conversation with NLG practitioners. The process of engagement and the findings from the interviews will be presented in this paper.

References
Dale, R., 2016. The return of the chatbots. Nat. Lang. Eng. 22, 811–817.
Gartner, n.d. Gartner’s 2016 Hype Cycle for Emerging Technologies Identifies Three Key Trends That Organizations Must Track to Gain Competitive Advantage [WWW Document]. URL http://www.gartner.com/newsroom/id/3412017 (accessed 11.24.16).
Hiraoka, T., Neubig, G., Sakti, S., Toda, T., Nakamura, S., 2016. Construction and analysis of a persuasive dialogue corpus, in: Situated Dialog in Speech-Based Human-Computer Interaction. Springer, pp. 125–138.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444.
Manning, C.D., Schütze, H., 1999. Foundations of statistical natural language processing. MIT Press, Cambridge, Mass.; London.
Parsons, S., Wooldridge, M., Amgoud, L., 2003. Properties and complexity of some formal inter-agent dialogues. J. Log. Comput. 13, 347–376.
Prakken, H., 2009. Models of persuasion dialogue, in: Argumentation in Artificial Intelligence. Springer, pp. 281–300.
Rambow, O., Bangalore, S., Walker, M., 2001. Natural language generation in dialog systems, in: Proceedings of the First International Conference on Human Language Technology Research. Association for Computational Linguistics, pp. 1–4.
Reiter, E., Dale, R., 2006. Building natural language generation systems, Digitally printed 1st pbk. version. ed, Studies in natural language processing. Cambridge University Press, Cambridge, U.K.; New York.
Walton, D., Krabbe, E.C., 1995. Commitment in dialogue: Basic concepts of interpersonal reasoning. SUNY press.
Ward, N.G., DeVault, D., 2015. Ten challenges in highly-interactive dialog systems, in: AAAI Spring Symposium on Turn-Taking and Coordination in Human-Machine Interaction.


Session H

1. Getting the Bigger Picture: Exploratory Search and Narrative Creation for Media Research into Disruptive Events
dr. Berber Hagedoorn, University of Groningen, Research Centre for Media Studies and Journalism
dr. Sabrina Sauer, University of Groningen, Research Centre for Media Studies and Journalism

Introduction
Digital Humanities centres on questions that are raised by and answered with digital tools in the Humanities. At the same time, it interrogates the value and limitations of digital methods in Humanities’ disciplines. While it is important to understand how digital technologies can offer new venues for Humanities research, it is equally essential to understand, and therefore to be able to interpret, ‘the user side’ of Digital Humanities: specifically, how Humanities researchers appropriate and domesticate search tools to ask and answer new questions, and apply digital methods. Previous user research in Digital Humanities concentrates on assessing, for example, how and why Digital Humanities benefits from studies into user needs and behaviour (Warwick, 2012), user requirement research, as well as participatory design research (Kemman & Kleppe, 2014).

Exploratory search is crucial for Humanities researchers who draw upon media materials in their research. Audio-visual, online and digital sources are in abundance, scattered across different platforms, and changing daily in our contemporary landscape. Supporting researchers' explorations becomes even more important when scholars study media events. A ‘media event’ is an event with a specific narrative that gives the event its meaning, and is in contemporary societies increasingly recognized as non-planned or disruptive. Disruptive media events, such as the ‘sudden’ rise of populist politicians, terrorist attacks or environmental disasters, are shocking and unexpected, making them difficult to interpret. This leads to problems for media researchers who analyse how narratives construct different political, economic or cultural meanings around such events. Previous research argues that media events should always be viewed in relation to their wider political and sociocultural contexts. Events, as they unfold in the media, may correspond to long-term social phenomena, and the way in which such events are ‘constructed’ has particular connotations (Jiménez-Martínez, 2016). Specific actors (newscasters, governments, institutions) use media events to build narratives in line with their own political, economic or cultural purposes. Media researchers also build narratives around events; prior research underlines the importance of visualizing, constructing and storing narratives during information navigation to contextualize material (Akker et al., 2011; Kruijt, 2016; De Leeuw, 2012). Offering media researchers the ability to explore and create lucid narratives about media events therefore greatly supports their interpretative work.

This paper proposes to add to this body of research by presenting the insights of a cross-disciplinary user study that involves, broadly speaking, researchers studying audio-visual materials in a co-creative design process, intended to fine-tune and further develop a digital tool that supports Humanities’ research through exploratory search. This paper focuses on how researchers, in both academic and professional settings, use digital search technologies in their daily work practices to discover and explore digital audio-visual archival material. We focus specifically on three user groups, namely (1) Media Studies researchers, (2) Humanities researchers who use audio-visual materials as a source and (3) media professionals. These user groups are the foreseen end users of the tool, because they create audio-visual narratives for their respective work purposes. We set up co-creative design sessions with 74 participants (group 1: 24; group 2: 40; group 3: 10) to observe and reflect on the practices of media researchers in terms of how they interact with search tools to explore, access and retrieve digitized audio-visual material, in order to interpret, and in some cases re-use, this material in new audio-visual productions.


Methodology
In our user study, we employ a user-centred design methodology to evaluate and fine-tune the exploratory search tool DIVE+ media browser. It offers events-driven exploration of digital heritage material, where events are prominent building blocks in the creation of narrative backbones (De Boer et al., 2015), and links a variety of different media sources and collections. DIVE+ offers intuitive exploration of media events at different levels of detail. It connects media objects, subjects (“concepts”), events, and persons to aid in the formulation of research questions, and to contextualize the former into overarching narratives and timelines. Our main research question throughout the case study is: how does exploratory search support media researchers in their study of how media events are constructed across different media and instilled with specific cultural or political meanings? To be able to answer this question, we study how media researchers construct navigation paths via exploratory search and, by means of user studies, evaluate the role of narratives in (1) learning and (2) research. In this process, we compare DIVE+ to other online search tools.

The user study observes media researchers as they use DIVE+ to explore media events, across 3 stages: (1) research question formulation; (2) DIVE+ use; and (3) comparative user evaluations of the DIVE+ browser against other online search tools. The collected data, consisting of both qualitative (observational and focus group) data and logging data gathered during user testing, provides insights into how media researchers search and explore digital audio-visual archives. We utilize a case study approach, which combines grounded theory (that fosters an understanding of how researchers interpret and create narratives) with usability methodologies, such as work task evaluations. This, first of all, allows us to draw conclusions about how search tools and digital technologies co-construct the researcher’s professional practice. Second, the data helps us probe the question of how the ‘digitality’ of search and retrieval shapes the practice of media research and, by extension, creative processes.

The research presented in this paper takes an interdisciplinary approach: it combines insights from Media Studies, as well as from Information Studies and Science and Technology Studies, and integrates ideas about narrative creation, search practices, and overarching notions about how users and technologies co-construct meaning. Therefore the presented research does not focus on how Digital Humanities’ tools have an impact on researchers’ practices, but rather analyses how researchers make use of search tools. We subsequently (1) draw conclusions about scholarly practice and the role of search technologies for digitized audio-visual materials therein; and (2) present lessons learned on how to optimize the search tool that is used, in order to improve its performance.

Acknowledgments
The authors would like to thank the anonymous reviewers of the first version of this abstract for their helpful comments and suggestions. This research was supported by the Netherlands Institute for Sound and Vision (partially in the context of Berber Hagedoorn as Sound and Vision Researcher in Residence in 2016-7) and the Netherlands Organisation for Scientific Research (NWO) under project number CI-14-25 as part of the MediaNow project. This research was also supported by CLARIAH, Common Lab Infrastructure of Arts and Humanities, in the context of the Research Pilot Narrativizing Disruption: How exploratory search can support media researchers to interpret ‘disruptive’ media events as lucid narratives (https://www.clariah.nl/projecten/research-pilots/nardis), CLARIAH project number CC17-13. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Bibliography
Akker, C. van den, Legêne, S., Erp, M. van, Aroyo, L., Segers, R., Meij, L. van der, Ossenbruggen, J. van, Schreiber, G., Wielinga, B., Oomen, J., Jacobs, G. (2011). Digital Hermeneutics: Agora and the Online Understanding of Cultural Heritage Categories and Subject Descriptors. WebSci 11, Koblenz, Germany.

Boer, V. de, Oomen, J., Inel, O., Aroyo, L., Staveren, E. van, Helmich, W., & Beurs, D. de. (2015). DIVE into the Event-Based Browsing of Linked Historical Media. Web Semantics: Science, Services and Agents on the World Wide Web, 35(3), 152–158.

De Leeuw, S. (2012). European Television History Online: History and Challenges. VIEW Journal of European Television History and Culture, 1(1), 3–11.

Jiménez-Martínez, C. (2016). Integrative disruption: the rescue of the 33 Chilean miners as a live media event. In: Fox, A., (ed.) Global Perspectives on Media Events in Contemporary Society. IGI Publishers, Hershey, USA, 60-77.

Katz, E., and Liebes, T. (2007). ‘No More Peace!’: How Disaster, Terror and War Have Upstaged Media Events. International Journal of Communication 1, 157-166.

Kemman, M., and Kleppe, M. (2014). "User Required? On the Value of User Research in the Digital Humanities." Selected Papers from the CLARIN 2014 Conference, October 24-25, 2014, Soesterberg, The Netherlands. No. 116. Linköping University Electronic Press.

Kruijt, M. (2016). Supporting Exploratory Search with Features, Visualizations, and Interface Design: A Theoretical Framework. University of Amsterdam.

Warwick, C. (2012). "Studying users in Digital Humanities." Digital Humanities in practice, 1-21.

2. Bias in the analysis of multilingual legislative speech
Laura Hollink, Astrid van Aggelen, Jacco van Ossenbruggen
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
l.hollink@cwi.nl

In this paper we investigate the application of natural language processing tools to the multilingual proceedings of the European Parliament. This work is part of a study in which we explore (1) how subcorpora in different languages may lead to different conclusions about the political landscape, (2) how to determine what a potential language-related bias originates from, and (3) to what extent we can limit or even prevent an unwanted language bias.

Parliamentary speech has been used to study party positions [1,2,3], issue selection [4,5,6,7] and the level of disagreement within a debate [8]. Many studies have moved away from manual coding (which is done in e.g. [4,5]) and instead position speech texts on one or more (latent) dimensions in statistical models based on relative word frequencies [1,2,3,6,7,8], often in combination with basic pre-processing steps such as stemming and stop-word removal. These models and tools, while imperative for analysing bigger datasets, add a source of errors and bias. One source of potential bias comes from the fact that the tools used perform differently on different languages. Considering that the aforementioned studies were carried out on the European, Irish, US, Spanish, Norwegian and Swedish legislatures, the comparability and reproducibility of the results for different languages is unclear.

In the European Parliament, the spoken accounts appear in (currently) 24 languages. Here, the uncertainty stems not only from tools that perform differently on each language, but also from the fact that the availability of data in each language varies. Members of Parliament (MEPs) are free to speak in any of the official languages. Speeches are sometimes translated into (some) other languages, depending on prioritization within the EP, specific translation requests of the members and (supposedly) budgetary constraints. Thus, we are left with 24 subcorpora of varying size, one per language, including both original and translated speech.

The need to study language effects in this context has been recognised before. Proksch and Slapin [3] reported a modest language effect42 in their study of party positions in the European Parliament, which they ascribed to translation rather than actual differences in position taking between three countries. However, while the overall effect may be small, we argue that specific local effects could still lead to significant biases in the results. For example, French translations of German texts seemed to systematically get a more neutral position than the original text, while the opposite was not the case. It is important to realise that the proceedings of the European Parliament are not only a corpus for researchers. Residents of the European Union have a right to access these documents in order to make informed votes and to hold the MEPs accountable43. This right would be compromised if French-speaking citizens came to different conclusions about what has been discussed than German-speaking citizens. Our aim is to gain insight into how working with subcorpora in different languages may lead to different conclusions about the political landscape.

In this study, we use the data provided by the Talk of Europe project [9], in which speech transcripts and all available translations were crawled from the website of the EP44 and converted into the Semantic Web format RDF. Data is available from 1999 to 2015 and contains around 300K speeches in 22K debates. We apply topic detection to six language-specific subcorpora of the proceedings of the European Parliament: German, English, French, Italian, Spanish and Dutch. We use the JEX software developed by the European Commission's Joint Research Centre, which learns multi-label categorisation rules from documents that were previously manually indexed using the multilingual Eurovoc thesaurus [10]. The advantage of using this tool over, for instance, widely used topic modeling approaches such as LDA [11] is that the output is directly comparable across languages: the tool uses a single thesaurus, Eurovoc, to classify documents in each language, and concepts in the Eurovoc thesaurus have labels in all languages. In a later stage of the study, we plan to include other topic detection techniques, and widen the scope to all EU languages.

Over 2000 distinct Eurovoc topics were detected in the six subcorpora. The frequency distributions over topics vary per language. Figure 1 visualises the distance between languages. We use Kullback–Leibler divergence [12], a non-symmetric measure for the difference between two distributions. A higher score, visualized as a redder colour, signifies a greater distance. For example, Italian and French are relatively close, while Spanish and German are far apart. There are four hypotheses as to what these differences originate from:

1. MEPs speaking one language indeed speak about different topics than their colleagues who speak in another language.

2. There is a bias in the selection of speeches that are being translated.

3. There is a bias in how certain topics are translated, e.g. translators use more ambiguous or polarized language.

4. The topic detection tool works differently on one language than on another.
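
The distance measure used for the heatmap can be made concrete with a small sketch. For two topic distributions P and Q over the same set of topics, the Kullback–Leibler divergence is D(P || Q) = sum over topics t of P(t) · log(P(t) / Q(t)). The topic counts below are invented, and the add-one smoothing (used so that a topic missing from one subcorpus does not cause a division by zero) is an assumption of this example, not necessarily what was done in the study.

```python
import math


def topic_distribution(counts, topics, alpha=1.0):
    """Turn raw topic counts into a smoothed probability distribution.

    alpha is an additive smoothing constant (an assumption of this sketch).
    """
    total = sum(counts.get(t, 0) for t in topics) + alpha * len(topics)
    return {t: (counts.get(t, 0) + alpha) / total for t in topics}


def kl_divergence(p, q):
    """D(P || Q) in nats; p and q must share the same keys and q must be > 0."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


# Invented counts of detected topics in two language subcorpora
german = {"nuclear energy": 5, "nuclear weapons": 1, "fisheries policy": 14}
spanish = {"nuclear energy": 30, "fisheries policy": 2}

topics = sorted(set(german) | set(spanish))
p_de = topic_distribution(german, topics)
p_es = topic_distribution(spanish, topics)

if __name__ == "__main__":
    # The two directions differ, which is why the measure is called non-symmetric
    print("D(de || es) =", round(kl_divergence(p_de, p_es), 4))
    print("D(es || de) =", round(kl_divergence(p_es, p_de), 4))
```

Because the measure is non-symmetric, a heatmap built from it has different values above and below the diagonal; the cell for (German, Spanish) need not equal the cell for (Spanish, German).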

42 A correlation coefficient ranging between 0.86 and 0.93 when comparing party positions derived from texts in German, French and English [3].
43 Regulation (EC) No 1049/2001 of the European Parliament and of the Council
44 http://www.europarl.europa.eu


Figure 1: Heatmap of differences between topic distributions in languages.

In our presentation, we will tackle this issue from two sides. Firstly, we compare different subsets of topics based on whether or not speeches were translated, and to which languages, to explore hypotheses 1 and 2. Then, to study hypothesis 4 (and to a lesser extent hypothesis 3) we zoom in to topics that appear to be particularly distinctive between languages, and compare the topic annotations to what was actually said in the debates. As an example of the latter method, Figure 2 shows the differences in frequency of the detected topics “nuclear weapons” and “nuclear energy”. Remarkably, only French and Italian speeches seem to be about nuclear weapons, while English and Spanish speeches are often about nuclear energy. As a comparison, Figure 3 plots the occurrences of the phrases “nuclear weapons” and “nuclear energy” (and translations thereof) in the raw speech texts. Here, part of the effect is gone, suggesting an error of the topic annotation software, while part of the effect remains: German texts indeed seem to talk less about both nuclear weapons and nuclear energy.
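
The comparison against the raw texts can be sketched as a simple phrase count per language. The translated phrase lists, the normalisation by number of speeches and the toy speeches below are illustrative assumptions; the abstract does not specify how phrase occurrences were counted.

```python
import re

# Illustrative phrase variants per language; a real study would need vetted lists
PHRASES = {
    "nuclear weapons": {
        "en": ["nuclear weapons"],
        "fr": ["armes nucléaires"],
        "de": ["Atomwaffen", "Kernwaffen"],
    },
    "nuclear energy": {
        "en": ["nuclear energy"],
        "fr": ["énergie nucléaire"],
        "de": ["Kernenergie", "Atomenergie"],
    },
}


def phrase_share(speeches, variants):
    """Fraction of speeches that mention at least one variant (case-insensitive)."""
    if not speeches:
        return 0.0
    pattern = re.compile("|".join(re.escape(v) for v in variants), re.IGNORECASE)
    hits = sum(1 for speech in speeches if pattern.search(speech))
    return hits / len(speeches)


# Toy speeches, invented for this example
SPEECHES = {
    "en": [
        "Nuclear energy remains part of our mix.",
        "We must reduce nuclear weapons.",
        "Fisheries quotas were discussed.",
    ],
    "de": [
        "Die Kernenergie ist umstritten.",
        "Der Haushalt wurde verabschiedet.",
    ],
}

if __name__ == "__main__":
    for phrase, by_lang in PHRASES.items():
        for lang, speeches in SPEECHES.items():
            print(phrase, lang, phrase_share(speeches, by_lang[lang]))
```

Placing such shares next to the shares of speeches that received the corresponding topic label is one way to see where the annotation tool and the text disagree.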

With this study, we aim to contribute to the discussion about systematic methods for tool criticism and source criticism in a complex multilingual context like the European Parliament.

Figure 2: Frequency of topics in debates.

Figure 3: Frequency of phrases in debate texts.

References
[1] Benoit, Kenneth, and Michael Laver. Estimating Irish Party Positions Using Computer Wordscoring: The 2002 Elections. Irish Political Studies Vol. 18, Iss. 1, 2003.

[2] Laver, Michael J., Kenneth R. Benoit, and John Garry. Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review 97(2): 311–31, 2003.

[3] Proksch, S.-O. and Slapin, J. B. Position Taking in European Parliament Speeches, British Journal of Political Science, 40(3), pp. 587–611, 2010.

[4] Hanna Bäck, Marc Debus & Jochen Müller. Who Takes the Parliamentary Floor? The Role of Gender in Speech-making in the Swedish Riksdag. Political Research Quarterly 67: 504–518, 2014.

[5] Markus Baumann. Constituency Demands and Limited Supplies: Comparing Personal Issue Emphases in Co-sponsorship of Bills and Legislative Speech. Scandinavian Political Studies, Vol. 39, issue 4, pp. 366-387, 2016.

[6] Pardos-Prado, Sergi, and Iñaki Sagarzazu. The Political Conditioning of Subjective Economic Evaluations: The Role of Party Discourse. British Journal of Political Science 46(4), 799-823, 2016.

[7] Kevin M. Quinn, Burt L. Monroe, Michael Colaresi, Michael H. Crespin, Dragomir R. Radev. An automated method of topic-coding legislative speech over time with application to the 105th-108th US Senate. Midwest Political Science Association Meeting. 2006.

[8] Benjamin E. Lauderdale, Alexander Herzog. Measuring Political Positions from Legislative Speech. Polit Anal; 24(3): 374-394, 2016.

[9] Astrid van Aggelen, Laura Hollink, Max Kemman, Martijn Kleppe, and Henri Beunders. The debates of the european parliament as linked open data. Semantic Web, 8(2): 271–281, 2017.

[10] Pouliquen Bruno, Steinberger Ralf, Camelia Ignat. Automatic Annotation of Multilingual Text Collections with a Conceptual Thesaurus. In Proceedings of the Workshop Ontologies and Information Extraction at the Summer School The Semantic Web and Language Technology - Its Potential and Practicalities (EUROLAN'2003). Bucharest, Romania, 28 July - 8 August 2003.

[11] Blei, David M., Ng, Andrew Y., Jordan, Michael I. Lafferty, John, ed. Latent Dirichlet Allocation. Journal of Machine Learning Research. 3(4–5): pp. 993–1022, 2003.

[12] Kullback, S., Leibler, R. A. On information and sufficiency. Annals of Mathematical Statistics. 22(1): 79–86, 1951.


Session I

Cultural Heritage Data for Research: A Europeana Research Panel
Nienke van Schaverbeke, Head of Europeana Collections
Marjolein de Vos, Europeana Data Partner Services
Dr. Agiatis Benardou, Digital Curation Unit, R.C. "Athena", Institute for the Management of Information Systems

Panel members:

Nienke van Schaverbeke - Head of Europeana Collections - session Chair

Dr. Agiatis Benardou - Digital Curation Unit, R.C. "Athena", Institute for the Management of Information Systems - Researcher Needs Management

1 Member of our Board from a research network (http://research.europeana.eu/blogpost/europeana-research-advisory-board-established) - TBC

Marjolein de Vos - Europeana, Digitised Medieval Manuscripts Maps - Data Quality

Dr. Caroline Ardrey - University of Birmingham - Europeana Grants Winner

Dr. Dana Mustata - University of Groningen. Academic in a digital humanities related field, outsider to Europeana - TBC


In this panel, members of the Europeana Research Advisory Board, Europeana Data Partner Services, one of the Research Grants winners and, importantly, an academic external to Europeana will present and discuss the value of Europe’s cultural heritage data for research in the humanities and social sciences, and the ways in which Europeana Research is promoting and enabling its use. The panel is part of a larger ongoing discussion about making cultural heritage available for research and the opportunities, challenges, and considerations involved in this.

In short, the panel will focus on the following points:

• Europeana Research - Objectives & Achievements
• Relationship to other research networks and infrastructures (DARIAH, CLARIN, EHRI, Parthenos etc.)
• Researcher needs and community engagement
• Data aggregation and quality improvement
• Using Europeana data in research

Europeana Research was established as a link between cultural heritage institutions and researchers. We recognize that undertaking research on the digitised content of Europe’s galleries, museums, libraries, and archives has huge potential that should be exploited. But issues with regard to licensing, interoperability, and access can often impede the re-use of that data in research. Europeana Research aims to help with these issues, liberating cultural heritage for meaningful academic re-use. We work on a series of activities to enhance and increase the use of Europeana data for research, and develop the content, capacity, and impact of Europeana, by fostering collaborations between Europeana and the cultural heritage and research sector, as well as liaising with other digital research infrastructures and networks.

Europeana Research is governed by an Advisory Board comprising renowned digital humanities experts who help us grow and strengthen services for DH researchers. In the first section of the panel we will highlight our main objectives and greatest achievements, such as the Research Grants Programme.

Following this introduction, one of our panel members, a representative from a research network that we collaborate with, and an academic who is not connected to Europeana will expand and elaborate on the relationship between their network and Europeana, and the value thereof.

Since our target audience is research communities in the humanities and the social sciences, it is vital to understand their heterogeneous needs vis-à-vis their information behaviour and their interaction with digital content. In this part of the panel, we will go into detail about how we come to understand the needs of our users, how we cater to them, and how we continuously develop and further this understanding and adapt to the requirements.

With more than 54 million objects from 40 countries and in a variety of languages, the Europeana portal contains a substantial amount of data to manage. The Data Partner Services team not only works continuously on ingesting new data for the portal, but also invests time in evaluating and improving existing data. We make data quality plans with aggregators and direct providers to further the findability and granularity of the records in the portal. Furthermore, there is a specially assigned Data Quality Committee that works on refining and expanding the Europeana Data Model. During this part of the panel, we will talk about the work that is being done from the metadata perspective on data quality, the importance of understanding researchers' needs for this, and the value of cultural heritage data for research.

In 2016 the Europeana Research Grants Programme was launched, in which Digital Humanities researchers were encouraged to apply with a project where Europeana data would be central in answering their research question. The unprecedented success of this call for proposals shows us how important it is to make heritage data available, and the variety of ideas shows us the range of potential of what is in the portal. To further illustrate and strengthen the points that will be mentioned in the panel, one of the winners of the Europeana Research Grants Programme 2016 will discuss her project as a showcase of Europeana data re-use for research and the potential offered to research communities through open access, clear licensing, and adequate digital tools.

After providing short explanations on the points mentioned in this proposal, we will encourage discussion from the panel and the audience on these matters. These could lead to valuable insights for Europeana Research in the wider discussion of opening up cultural heritage for the research community. We also welcome suggestions for Europeana Research’s future activities and for improving services.


Session J

Text mining in practice: A discussion on user-applied text mining techniques in historical research.
Language: English, Duration: 60 minutes

In this panel we look at the application of text mining techniques in historical research. In recent years, text mining has come within reach of any vaguely computer-literate scholar. The growing availability of large digital text collections leads to growing abilities to apply digital and quantitative approaches to the study of historical texts. Commonly used languages and statistical environments such as Python and R offer applicable software solutions for free. This has liberated historians and other humanities scholars from the shackles of time-consuming and often expensive programming work by hired external programmers.

Techniques like topic modelling, word embeddings, sentiment and emotion mining are increasingly being used in the humanities and social sciences. Historians, political scientists, sociologists and others now have the opportunity to use advanced text mining techniques on large datasets from their desktops. Although still mostly experimental, the potential gains now appear enormous.

It is often claimed that this enables researchers to study concepts and developments in longitudinal, systematic and quantitative ways that were impossible before. But what do these digital techniques really add to more traditional approaches? How can traditional approaches and innovative digital methodologies be paired in a meaningful and enriching manner? Does quantitative text analysis primarily provide context to existing knowledge, or is it a radical departure from what went before?

We believe that quantitative text analysis could well prove to be a dramatic, agenda-setting change. As yet, however, several problems need to be addressed. First, most of the techniques involved are less than a decade old, researchers are scattered among departments and disciplines, and there is as yet no overarching discussion about best practices, pitfalls and problems with methodology, nor even a shared platform to discuss basic technical problems. There is a distinct need for a better exchange of information and sharing of experience, both inside and outside the world of digital humanities.

A second problem that needs to be addressed is the slow advancement of new techniques in published research outside the narrow digital humanities world. Anecdotal evidence suggests that leading journals in the humanities, political and social sciences are not particularly keen on papers using text-mining methodologies. This unwillingness is at least in part inspired by the problem mentioned above. There are few established norms to evaluate the validity of new techniques. On the other hand, conservatism may also play a role.

A third problem, which also impacts publication opportunities, is that the bulk of publications using text-mining techniques are still primarily about text mining. The corpora used, and the research questions asked, in many cases still seem peripheral to technological glitz. It is of course useful to investigate the technical opportunities that new techniques have to offer, but for the wider dissemination of these techniques it will probably prove necessary to tackle existing research problems in various fields and show that this particular field of the digital humanities has something to offer to the study of history.

We propose to discuss these problems with a mixed panel of experienced text mining researchers from different (sub-)disciplines. Our central goal is to discuss practices for validation of techniques and methodologies. We want to come up with a proposal for integrating text mining techniques in historical research practice in a meaningful, substantive, and contributive way, and pave the way for the move of text mining into common research practice, beyond the current hype.

Chair:

• Dr. Ralf Futselaar (EUR/NIOD)

Panel members:

• Dr. Jesse de Does (IvdNT)
• Prof. dr. Yasuto Nakano (KGU, Japan)
• Dr. Martijn Schoonvelde (VU)
• Milan van Lange, MA (NIOD/UU)


Session K

Mapping Historical Leiden: The Creation of a Digital Atlas
Organiser: Arie van Steensel, University of Groningen (a.van.steensel@rug.nl)

Panellist: Jaap Evert Abrahamse, Cultural Heritage Agency (j.abrahamse@cultureelerfgoed.nl)

Speakers: Ellen Gehring, Erfgoed Leiden en Omstreken (e.gehring@erfgoedleiden.nl); Roos van Oosten, Leiden University (r.m.r.van.oosten@arch.leidenuniv.nl); Arie van Steensel, University of Groningen (a.van.steensel@rug.nl)

The digital revolution has rendered maps even more useful for all kinds of purposes, such as navigating, locating services, or geotagging activities. Moreover, a growing array of digital technologies, applications and platforms offers new research opportunities for scholars in the humanities, for whom maps are both a source about the past and a tool to study the past, and allows heritage organisations to unlock, visualise and analyse diverse historical and archaeological data and objects innovatively on the basis of geographical relations. It is beyond doubt that the spatial encoding of objects and textual information offers a new framework of analysis and enables us to better explore the experiences and meanings of space and place in the past.45 Tools, maps and data are often readily available for the study of the more recent past, but this is less the case for the pre-modern period. In general, it requires a considerable time investment to develop historical Geographic Information Systems (GIS) and online mapping platforms. These efforts, however, pay off in the long run, since these applications open a whole range of new research opportunities and novel ways to present and visualise research results.46

This panel presents and critically discusses the first results of the Mapping Historical Leiden project, which aims to develop a dynamic digital atlas of the pre-modern city of Leiden. The first phase of this project, a collaboration between historians, archaeologists and Leiden’s heritage organisation (Erfgoed Leiden en Omstreken), was recently completed (the first version of the atlas is accessible online at hlk.erfgoedleiden.nl, in Dutch). The mapping tool still requires further technical improvements to make it easier to upload and analyse additional data, and more geocoded datasets will become available in the coming months. The tool enables users to link, identify and search data across place and time, rather than providing static snapshots of the urban space in the past.

Apart from its technical resources and aspects, the mapping tool’s research possibilities will be demonstrated by two case studies: one on the relation between space and wealth in sixteenth-century Leiden, and the other on the city’s sanitary infrastructure in the early modern period. Together, these presentations will offer an opportunity to discuss the possibilities of digital mapping tools and the value of collaboration between scholars and specialists from the heritage sector in the field of digital humanities, but also the practical and technical challenges of historical GIS and potential pitfalls of partnerships.

45 See, for example, Anne Kelly Knowles and Amy Hillier, eds., Placing History: How Maps, Spatial Data, and GIS Are Changing Historical Scholarship (Redlands, Calif: ESRI Press, 2008); David J. Bodenhamer, John Corrigan, and Trevor M. Harris, eds., The Spatial Humanities: GIS and the Future of Humanities Scholarship (Bloomington: Indiana University Press, 2010); Alexander von Lünen and Charles Travis, eds., History and GIS: Epistemologies, Considerations and Reflections (Dordrecht: Springer, 2013); Ian N. Gregory and A. Geddes, eds., Toward Spatial Humanities: Historical GIS and Spatial History (Bloomington: Indiana University Press, 2014).

46 See, for example, Onno Boonstra and Gerrit Bloothooft, eds., Tijd en ruimte: nieuwe toepassingen van GIS in de alfawetenschappen (Utrecht: Matrijs, 2009); a theme issue of PCA. Post Classical Archaeologies 2 (2012) on GIS for archaeologists and historians; Hélène Noizet, Boris Bove, and Laurent Jacques Costa, eds., Paris de parcelles en pixels: analyse géomatique de l’espace parisien médiéval et moderne (Saint-Denis: Presses Universitaires de Vincennes, 2013); Nicholas Terpstra and Colin Rose, eds., Mapping Space, Sense, and Movement in Florence: Historical GIS and the Early Modern City (New York: Routledge, 2016).

Presentation 1 (Ellen Gehring): One Size Fits All? Developing a Multi-Functional Digital Mapping Tool

Building a cutting-edge map application for scholars, heritage managers and the general public is a major challenge in technical and methodological terms. Mapping Historical Leiden has overcome some of the barriers, and this presentation focuses on the technical aspects of the mapping tool. Crucial for the project, for example, was the development of a so-called historical geocoder, which makes it possible to link different geometric forms and to define their relations. Apart from technicalities, it will further be shown how very diverse data can be standardised through an advanced use of databases to ensure meaningful spatial analyses. The code of the mapping tool is available as open source, and since it is unnecessary for others to reinvent the wheel, it will finally be explained how the tool can be utilised in other contexts.

Presentation 2 (Arie van Steensel): Wealth and Place in Late Medieval Leiden: a Parcel-Based Analysis

Leiden has a unique source, the so-called Book of Waterways and Streets, which contains about a hundred cadastral maps that were drawn for fiscal purposes in the second half of the sixteenth century. In this presentation, it will first be demonstrated how these maps were turned into a georeferenced base map. Secondly, it will be shown how this sixteenth-century pre-cadastral map can be used to analyse the relation between wealth and space in the city of Leiden at a parcel level, resulting in a more refined understanding of the complex relationship between occupation, wealth and place, which challenges common assumptions about the social geography of premodern cities and towns. The main point to be made is that historical GIS makes it possible to reinterpret sources that inform us about the importance of space and locality in structuring human interactions, as well as to present these data in an attractive and accessible way.

Presentation 3 (Roos van Oosten): Was sanitary infrastructure a privilege?

Scholars have generally accepted that sanitary infrastructure was the privilege of the wealthy few. However, with the uncovering of hundreds of cesspits and water supply facilities in the town of Leiden in the past decades, this assumption can now be tested for different time periods. In order to investigate the question of accessibility to sanitary arrangements, the archaeologically documented sanitary structures must be plotted and a financial valuation attached to them. Socio-economic data based on tax registers are available from about 1600, which will be most useful in this venture. Furthermore, thanks to HISGIS, we also have access to socio-economic data from 1832, which will allow us to establish a long-term perspective on the development of Leiden’s sanitary infrastructure.


Session L

1. Was the Ferguut written by one or two authors? Theo Meder, Gosse Bouma, Hannah Mars, Trudy Havinga (RUG)

In 1989, Willem Kuiper published his thesis on the Middle Dutch romance Ferguut, in which he concluded that the romance was written by two authors. Kuiper showed differences in writing style at all levels (rhyme, syntax, vocabulary, spelling) and concluded this was no coincidence. According to Kuiper, the first author translated the Old French Fergus by Guillaume le Clerc, approximately until vs. 2592, after which the second author completed the second half without a French example – in the spirit of Fergus, but in his own words. Nowhere in the text is there a clear reference to a dual authorship (cf. the Roman van Walewein), but the style break halfway through the text was nevertheless something that a scholar like Eelco Verwijs noticed as well. Other researchers questioned or denied the finding that the Ferguut was written by two authors, like W.J.A. Jonckbloet, and after the appearance of Kuiper’s thesis also Bart Besamusca and Mike Kestemont. With the thesis of Kestemont we entered the era of e-humanities. Whereas Kuiper had to do his quantitative style analysis by hand, today the programming language R in collaboration with the stylometric program Stylo can perform the job much faster, more thoroughly and completely unbiased (Stylo doesn’t know or care what texts it gets presented and what the outcome may be, whereas human researchers may be influenced by preconceived ideas). In its analysis, the software not only takes all the differences into account (like Kuiper did), but all the similarities as well, even at levels of which writers and readers are hardly aware, such as word order and the use of function words. At this level every author leaves his most personal fingerprint behind.

Somewhat cautiously, Kestemont finally assumes that Ferguut was written by one author, who as a translator drew on a different register than as a free writer. Because Ferguut plays no prominent role in Kestemont’s investigation, we want to zoom in on this particular romance. The central question: was the Ferguut written by one or two authors?

In order to investigate whether the two parts of the Ferguut are stylistically similar, we compare the similarity between the two parts of the Ferguut with the similarity between two or three parts of other ‘randomly’ selected Middle Dutch texts from around the same period and region, most of them dealing with courtly life. Seven texts we know to have been written by a single author; an eighth text we know to have been written by two authors. We involve the following texts in the analysis: Ferguut, Beatrijs, De Borchgravinne van Vergi, Lanceloet en het hert met de witte voet, Van den vos Reynaerde by Willem (the Aernout mentioned in the preface is the author of an Old French Renart tranche), three poems (a deliberate misfit) by Willem van Hildegaersberch (Van den Serpent, Van den Paep die sijn Baeck gestolen wert, Van den Wijnvaet) and De Roman van Walewein. For this experiment we looked at the complete texts, and cut up the longer texts into two or three even pieces in case there were no clear textual divisions. All the editions had to be thoroughly cleaned and converted to txt format.

Only De Roman van Walewein was most certainly written by two authors: about two-thirds of the total number of verses were written by Penninc (vs. 1–7,880), and the last part was written by Pieter Vostaert (vs. 7,881–11,198). For the analysis we therefore cut this text into three pieces, so that the third part is written by Vostaert. As an experiment, we cut Van den Vos Reynaerde into three even pieces. The other longer texts we cut into two even pieces. Ferguut is cut at the location where the style transition should occur, that is, the place where the second author took over from the first, according to Kuiper. All these texts and fragments are then presented to Stylo for analysis. In this way, we can compare the similarity between the two parts of the Ferguut with the similarity between the two or three parts of a number of texts that we know were written by a single author, and the three parts of a text which we know was written by two authors. If the stylometric analysis shows that the two parts of the Ferguut look as much alike as two parts of the texts of one author, and resemble each other more than the first two and the third part of the Walewein, that indicates that the Ferguut was also written by one author. If the analysis shows that the two parts of the Ferguut look less alike than the two parts of the texts of one author, and just as much as, or less than, the three parts of the Walewein together, this may indicate that the Ferguut was written by two authors.
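The comparison logic described above can be illustrated with a minimal Python sketch. It is not the Stylo (R) workflow used in this study: it assumes whitespace-tokenised texts, uses single words rather than word or character n-grams, and measures distance with a simple form of Burrows's Delta (mean absolute difference of z-scored relative frequencies of the most frequent words). The function names and the `n_mfw` parameter are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def split_even(tokens, k):
    """Cut a token list into k consecutive pieces of (nearly) equal length."""
    size, rest = divmod(len(tokens), k)
    pieces, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rest else 0)
        pieces.append(tokens[start:end])
        start = end
    return pieces


def relative_freqs(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one sample."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty samples
    return [counts[w] / total for w in vocabulary]


def burrows_delta(samples, n_mfw=100):
    """Return {(name_a, name_b): delta} for every pair of samples.

    `samples` maps a sample name to its list of tokens. The vocabulary is
    the n_mfw most frequent words of all samples taken together.
    """
    corpus_counts = Counter()
    for tokens in samples.values():
        corpus_counts.update(tokens)
    vocabulary = [w for w, _ in corpus_counts.most_common(n_mfw)]

    names = sorted(samples)
    freqs = {n: relative_freqs(samples[n], vocabulary) for n in names}

    # z-score every word's frequency across the samples
    z = {n: [] for n in names}
    for j in range(len(vocabulary)):
        column = [freqs[n][j] for n in names]
        mu, sigma = mean(column), pstdev(column)
        for n in names:
            z[n].append((freqs[n][j] - mu) / sigma if sigma > 0 else 0.0)

    deltas = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            deltas[(a, b)] = mean(abs(x - y) for x, y in zip(z[a], z[b]))
    return deltas
```

In such a setup, a low Delta between the two Ferguut pieces relative to the Delta between pieces of known single-author texts would point towards single authorship, which is the reasoning the paragraph above spells out.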

In the graph above, based on word tri-grams, Stylo shows what many already expected: all novels and writers cluster neatly together (N.B.: the same happens with word bi-grams and with character bi-grams and tri-grams. As one can see in the graph, in hindsight the full texts need not have been included, but we wanted to be very sure we would not encounter any nasty surprises). The three parts of the Reynaert are stylistically most alike, the two parts of the Beatrijs mostly resemble each other, Vergi part 1 looks most like Vergi part 2, etc. Also the two parts of Ferguut stylistically match each other rather than any other text. Even the exemplum, the jest and the song of Hildegaersberch share the style of one and the same author. Only Walewein exhibits the expected deviation: Part 3 wanders off and positions itself somewhere between Ferguut and Reynaert, rather than next to the other parts of the Walewein. This graph of the stylometric analysis justifies no other conclusion than that the Walewein was written by two authors, but Ferguut by one author. Furthermore, it shows that the three Arthurian romances and Reynaert cluster together, and the courtly, religious and moralistic texts stand together separately. We experimented with all kinds of different parameters, but the results (practically) remained the same. Rolling delta resulted in nothing conclusive. Only cutting up the Ferguut into even smaller pieces and clustering them resulted in the style differences that Kuiper discovered, based on small pieces of comparison, but deprived of a long-term similarity overview over the text material.


Reservations can be made about the techniques used: stylometry works better with longer texts than shorter ones; stylometry works better on Standard Modern Dutch than on Middle Dutch texts with their unstable spelling; stylometry works better on Middle Dutch rhyme pairs; all the editions should be either diplomatic or critical or in some other way normalized/standardized; etcetera.

Still, all things considered, based on multiple stylometric examinations, Stylo sees more similarities than differences between the two parts of the Ferguut, both on the level of word order and the use of function words – traits that are considered to be rather personal for each author. The Ferguut was most probably written by one author. In writing the second half of the text, the author may – also stylistically – have been inspired by the fairy tale known as ATU 314A The Shepherd and the Three Giants, which was present in the Old French Fergus as well. What we already knew about Walewein is confirmed: the last part of the romance shows more stylistic differences than similarities compared to other romances like the Reynaert and even Ferguut, and therefore Walewein was written by two authors. Finally, it is good to know now that one author could have several stylistic registers: one for when he translated, and one for when he freely retold a story.

References

B. Besamusca: ‘De Vlaamse opdrachtgevers van Middelnederlandse literatuur: een literair-historisch probleem’, in: De nieuwe taalgids 84 (1991), p. 150-162.

A.Th. Bouwman: Reinaert en Renart. Het dierenepos Van den vos Reynaerde vergeleken met de Oudfranse Roman de Renart. 2 parts, Amsterdam 1991.

W. Bisschop & E. Verwijs (eds.): Willem van Hildegaersberch: Gedichten. ’s-Gravenhage 1870.

K.H. van Dalen-Oskam: De stijl van R. Amsterdam 2013.

T. Dekker, J. van der Kooi & T. Meder: Van Aladdin tot Zwaan kleef aan. Lexicon van sprookjes: ontstaan, ontwikkeling, variaties. Nijmegen 1997.

M. Draak (ed.): Lanceloet en het hert met de witte voet. 6th imprint, Den Haag 1979.

M. Eder, J. Rybicki & M. Kestemont: ‘Stylometry with R: a package for computational analyses’, in: The R Journal (2016), as download: https://journal.r-project.org/archive/accepted/eder-rybicki-kestemont.pdf

G.A. van Es (ed.): De jeeste van Walewein en het schaakbord. Zwolle 1957.

J.D. Janssens, R. van Daele & V. Uyttersprot (eds.): Van den Vos Reynaerde. Het Comburgse handschrift. 2nd imprint, Leuven 1998.

W.J.A. Jonckbloet (ed.): Beatrijs. Eene sproke uit de XIIIe eeuw. Den Haag 1841.

W.J.A. Jonckbloet: Geschiedenis der Nederlandsche letterkunde. 4th imprint, Groningen 1888, part 1.

M. Kestemont: Het gewicht van de auteur. Stylometrische auteursherkenning in Middelnederlandse literatuur. Gent 2013.

P. de Keyser (ed.): De Borchgravinne van Vergi. Antwerpen 1943.

W. Kuiper: Die riddere metten witten scilde. Oorsprong, overlevering en auteurschap van de Middelnederlandse Ferguut, gevolgd door een diplomatische editie en een diplomatisch glossarium. Amsterdam 1989.

E. Rombauts, N. de Paepe & M.J.M. de Haan (eds.): Ferguut. Den Haag 1982.

E. Stamatatos: ‘A survey of modern authorship attribution methods’, in: Journal of the Association for Information Science and Technology 60 (2008) 3, p. 538–556.

H.-J. Uther: The Types of International Folktales. A Classification and Bibliography. 3 volumes. Helsinki 2004.

2. Stylometry applied to book preferences Peter Boot, peter.boot@huygens.knaw.nl

Introduction
One of the oldest and most active fields in Digital Humanities is authorship attribution. It has been shown many times that writers have a characteristic style that can be used to tell them apart (e.g. Burrows, 2002). It is also well known that word usage can be used to predict personality characteristics (e.g. Noecker, Ryan, & Juola, 2013). Personality characteristics in turn are related to preferences in different art forms (e.g. Cantador, Fernández-Tobías, Bellogín, Kosinski, & Stillwell, 2013). This suggests that, as one would hope, the stylistic differences whereby we tell authors apart (such as differences in function word usage) are not just meaningless preferences for one function word over another, but are related to artistic preference, in a way that is still to be clarified.

This paper, continuing earlier work (Boot, 2014), tries to contribute to that clarification, in that it will remove the middle term (the personality characteristics) and show that there is a direct relation between the words that people use and their preferences in art, in this case, for books. The writers that I study here are the writers of book reviews, not books. In the first section, I will use book reviews and ratings from book discussion sites and show correlations between word usage and book ratings. In the second section, I will take an exploratory approach and create a clustering of reviewers by word usage. For the two clusters, I will then look at their preferred word usage, as well as the word usage in the book descriptions of their preferred books.

Correlations between word usage and ratings
The data that the paper uses were collected from a number of Dutch book discussion sites. These sites include hebban.nl, lezerstippenlezers.be, bol.com and the now defunct sites watleesjij.nu and dizzie.nl.

The correlations were computed as follows: I selected reviews from users who had written at least 100,000 characters, excluding some users with multiple accounts. I computed relative word frequencies in their reviews, and normalized the results (centering around zero and dividing by the standard deviation). In order to remove words with thematic links to books (murder, war, castle, love) I limited the computation to words defined as function words in the Dutch LIWC 2007 dictionary (Boot, Zijlstra, & Geenen, 2017, in press). For the same users I retrieved the book ratings and created a matrix of users by rating, excluding books that were rated only once. I computed the bias-corrected distance correlation (a multivariate generalization of the correlation coefficient, see Székely & Rizzo, 2013) between the two matrices, and repeated that computation for reviews in all genres, in literature and in the literary thriller. The results are given in the first row of Table 1.
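To make the two computational steps concrete, the following Python sketch normalizes a users-by-words frequency matrix column by column and then computes a distance correlation between two row-aligned matrices. Note the simplification: it implements the plain sample distance correlation, not the bias-corrected variant of Székely & Rizzo (2013) used for Table 1, and it yields no p-value. The function names are hypothetical, and missing ratings would need to be handled before calling it.

```python
import math
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Center each column on zero and divide by its standard deviation."""
    scaled = []
    for col in zip(*matrix):
        mu, sigma = mean(col), pstdev(col)
        scaled.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def _centered_distances(rows):
    """Euclidean distance matrix between rows, double-centered."""
    n = len(rows)
    d = [[math.dist(a, b) for b in rows] for a in rows]
    row_means = [sum(r) / n for r in d]  # equal to column means (symmetry)
    grand = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand
             for j in range(n)] for i in range(n)]


def distance_correlation(x_rows, y_rows):
    """Plain (uncorrected) sample distance correlation.

    x_rows and y_rows hold one row per user, in the same user order;
    the number of columns may differ between the two matrices.
    """
    if len(x_rows) != len(y_rows):
        raise ValueError("both matrices need one row per user")
    a, b = _centered_distances(x_rows), _centered_distances(y_rows)
    n = len(a)

    def inner(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    dvar = inner(a, a) * inner(b, b)
    if dvar <= 0:
        return 0.0
    return math.sqrt(max(inner(a, b), 0.0) / math.sqrt(dvar))
```

Because the measure works on distances between users rather than on individual columns, the word-frequency matrix and the ratings matrix can have different numbers of columns.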

To be absolutely sure that no content aspects of the reviews were reflected in the word usage, I repeated the computation using part-of-speech (POS) tags. The texts were tagged using TreeTagger and instead of the relative word frequencies I used relative frequencies of POS bigrams. The results are given in the second row of the table.
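As an illustration of the feature used in this second computation, the sketch below turns a sequence of POS tags into relative bigram frequencies. It assumes the tagging has already been done; the tag names in the usage note are hypothetical placeholders, not the actual TreeTagger tagset for Dutch.

```python
from collections import Counter


def pos_bigram_frequencies(tags):
    """Relative frequency of each pair of adjacent tags in one tagged text."""
    bigrams = Counter(zip(tags, tags[1:]))
    total = sum(bigrams.values())
    if total == 0:  # fewer than two tags: no bigrams to count
        return {}
    return {pair: count / total for pair, count in bigrams.items()}
```

For the hypothetical tag sequence `["DET", "NOUN", "VERB", "DET", "NOUN"]`, the pair `("DET", "NOUN")` accounts for two of the four bigrams and therefore receives a relative frequency of 0.5.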


Table 1

Correlations with p-values         All genres            Literature            Literary thriller
                                   189 reviewers,        41 reviewers,         32 reviewers,
                                   166 reviews (avg.)    126 reviews (avg.)    88 reviews (avg.)
function words (200) vs. ratings   0.20 (0.000)          0.16 (0.000)          0.41 (0.000)
POS bigrams (100) vs. ratings      0.16 (0.000)          0.10 (0.002)          0.22 (0.000)

It is hard to interpret these correlation sizes, but it is clear that there are very significant correlations between function word usage and book ratings. The fact that these correlations persist even when looking at POS bigrams shows that the relation is to some extent based purely on linguistic style, not on content. Why sequences of POS tags should be related to literary preference is an intriguing question that this paper will not solve.

Exploratory analysis
To get a feel for what this correlation might mean in terms of real reviews and ratings, I created a clustering based on function word usage for a group of reviewers. I removed a few outliers and was left with two clusters, cluster 1 containing 20 reviewers and cluster 2 containing 11.

I then looked at their reviews and preferred books. A sample of reviews from cluster 1 showed their informal, direct and very personal writing, characteristics that were much less prominent in cluster 2. This impression is confirmed when looking at contrastive keywords in the reviews of both clusters. The 20 keywords with the largest effect size (Gabrielatos & Marchi, 2011) for both clusters are shown in Table 2. It is clear that cluster 1 prefers the first person, while cluster 2 has more interest in writing.

Table 2

Cluster   Preferred review words
1         thought (was of the opinion), very, because, completely, me, actually, therefore, read (past part.), beautiful, after all, had, have (1st pers. sing.), am, I, very, all, good, otherwise, yet, again
2         writer (fem.), writer, novel, reader, years, under, know, these, characters, one, between, gives, second, the, them, of, until, end, in, who
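Contrastive keywords of this kind can be ranked with an effect-size measure. The sketch below assumes the %DIFF metric, the percentage difference between a word's normalised frequencies in a study corpus and a reference corpus; the text only says that an effect size following Gabrielatos & Marchi (2011) was used, so the choice of %DIFF, the `zero_floor` substitute for words absent from the reference corpus, and all function names are assumptions.

```python
from collections import Counter


def percent_diff(study_tokens, reference_tokens, zero_floor=1e-12):
    """%DIFF score for every word that occurs in the study corpus."""
    if not study_tokens or not reference_tokens:
        raise ValueError("both corpora must contain at least one token")
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    n_study, n_ref = len(study_tokens), len(reference_tokens)
    scores = {}
    for word, count in study.items():
        nf_study = count / n_study
        nf_ref = ref[word] / n_ref
        if nf_ref == 0:
            nf_ref = zero_floor  # assumed stand-in for a zero frequency
        scores[word] = (nf_study - nf_ref) * 100 / nf_ref
    return scores


def top_keywords(study_tokens, reference_tokens, k=20):
    """The k words with the largest %DIFF, highest first."""
    scores = percent_diff(study_tokens, reference_tokens)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

An effect size says nothing about statistical significance, and words that are missing from the reference corpus receive extreme scores, so in practice a minimum frequency threshold or a significance filter would usually be combined with it.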

Turning to the ratings, while there were many books that were rated significantly higher by one of the groups, the preferences were hard to understand in terms of taste. Ratings summed by genre didn’t show a very clear picture either. It was only when looking at contrastive word usage in the (publisher-provided) book descriptions for books read by either cluster that a clearer picture emerged.

Table 3

Cluster   Keywords in preferred book descriptions
1         thriller, investigation, police, murdered, murder, case, body, someone, further, secret, above, know, very, sits, very, disappeared, within, nothing, appears, found, become, part, truth, books, there, something, else
2         in which, without, about, parents, family, city, big stories, last, exist, us, we, writer, history, love, country, tells, century, novel, Netherlands, war


Here it becomes clear that cluster 1 prefers thrillers and police novels, while cluster 2 has a less focussed interest in family, writing and the country. It is worthwhile to repeat that these clusters of content words result from clustering reviewers on the basis of function words.

Conclusion
Taken together, the correlations and the exploratory analysis show that there is a relation between the function words that people use and their preferences for books. This relation still holds at the level of part-of-speech tags. This clearly shows that the word usage that helps tell authors apart is to some extent related to artistic preference. A possible explanation would be that the reviewers unconsciously imitate the books they read in their use of function words. That seems unlikely, among other reasons because the effect is also visible when we just look at the reviews in a single genre (second and third column of Table 1). The more likely explanation is that function word usage is at least in part determined by artistic preference and related personality characteristics. The ‘fingerprint’ metaphor that is often used in this context, with its suggestion of an essentially random identifier, unlikely to be related to artistic preference, must therefore be considered inappropriate.

Literature
Boot, P. (2014). Dimensions of literary appreciation. Word use and ratings on a book discussion site. Digital Humanities 2014. Retrieved from http://dharchive.org/paper/DH2014/Paper-825.xml

Boot, P., Zijlstra, H., & Geenen, R. (2017, in press). The Dutch translation of the Linguistic Inquiry and Word Count (LIWC) 2007 dictionary. Dutch Journal of Applied Linguistics, 6(1).

Burrows, J. (2002). ‘Delta’: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3), 267-287.

Cantador, I., Fernández-Tobías, I., Bellogín, A., Kosinski, M., & Stillwell, D. (2013). Relating Personality Types with User Preferences in Multiple Entertainment Domains. Proceedings of the 1st Workshop on Emotions and Personality in Personalized Services (EMPIRE 2013), at the 21st Conference on User Modeling, Adaptation and Personalization (UMAP 2013).

Gabrielatos, C., & Marchi, A. (2011). Keyness: Matching metrics to definitions. Theoretical-methodological challenges in corpus approaches to discourse studies - and some ways of addressing them.

Noecker, J., Ryan, M., & Juola, P. (2013). Psychological profiling through textual analysis. Literary and Linguistic Computing, 28(3), 382-387.

Székely, G. J., & Rizzo, M. L. (2013). The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis, 117, 193-213.

3. Corpus enrichment for 17th century Dutch: a pilot study Feike Dietz1, Marjo van Koppen2, Irene Kramer1 and Marijn Schraagen2

1 Institute for Cultural Inquiry, 2 Utrecht Institute of Linguistics OTS, Utrecht University

1 Introduction
The Dutch language in the 17th century was a mixture of fading linguistic properties from the preceding language phase, Middle Dutch, and upcoming new ways to construct words and sentences. Within these language dynamics we observe a type of language variation that has rarely been addressed before: variation within individual language users (intra-author variation). The aim of the current project is to describe and analyse in detail the linguistic and literary/rhetorical contexts in which intra-author variation occurs. As a prerequisite, the data need to be annotated linguistically, using part of speech (POS) information and (morpho-)syntactic structure, and sociolinguistically, describing various factors that influence language use.

In a pilot project we restrict our research to the letters of the famous Dutch author and politician P.C. Hooft, written between 1600 and 1638. This collection is relatively large (approximately 800 letters, ∼300,000 words) and contains sociolinguistic variation in type of correspondent and type of letter. The corpus can be used, i.a., to study the loss of negative concord in Dutch, which is observed in Hooft’s letters from this period (Paardekooper, 2016).

As a starting point for obtaining POS tags, the Adelheid tagger for Middle Dutch (van Halteren and Rem, 2013) is used. Because the tagger is trained on Middle Dutch, the results are not highly accurate for 17th century texts. Therefore, a correction procedure for POS tags and lemmas is performed by human annotators. Additionally, the annotators provide the necessary sociolinguistic information about letters and correspondents. When annotation is completed, a detailed and systematic analysis of linguistic phenomena will become feasible.

2 Approach
The source data is available in a diplomatic edition (Van Tricht, 1976). We use this edition after separating Hooft’s original seventeenth-century texts from the metadata (page numbers, footnotes, annotations).

Figure 1: Example of the newly developed annotation tool

2.1 Part-of-Speech tagging

A collaboration with the Nederlab project (Brugman et al., 2016) has been established to increase availability of the enriched corpus, by including the POS tagging and sociolinguistic metadata in the Nederlab research infrastructure. The integration necessitates conversion of the CRM tagset used by Adelheid to the CGN tagset used by Nederlab. Additionally, the tagging needs to be represented in the FoLiA XML format for linguistic annotation (van Gompel and Reynaert, 2013). The CRM tagset is more extensive than CGN, notably in the use of surface form features such as form-e (words ending in -e). Surface form features are related to case marking, which is an important aspect in the study of linguistic variation in 17th century Dutch. Therefore, we decided to keep these features in the mapping to CGN tags (see Figure 1).
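The idea of carrying surface form features over into converted tags can be sketched in Python. The sketch assumes tags written as a part-of-speech label followed by comma-separated features in parentheses, as in the annotation guideline examples of Figure 2 (for instance `N(ev,non-nom,form-s)`). The actual CRM-to-CGN conversion table is not given in this abstract, so the function below only illustrates appending `form-` features to an already converted tag; all names are hypothetical.

```python
import re
from typing import List, NamedTuple

# POS label followed by a parenthesised, comma-separated feature list
TAG_PATTERN = re.compile(r"^(?P<pos>[A-Za-z]+)\((?P<features>[^()]*)\)$")


class Tag(NamedTuple):
    pos: str
    features: List[str]       # all features except surface forms
    surface_forms: List[str]  # features starting with "form-"


def parse_tag(text):
    """Split a tag string into POS label, plain features and surface forms."""
    match = TAG_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised tag: {text!r}")
    parts = [p.strip() for p in match.group("features").split(",") if p.strip()]
    surface = [p for p in parts if p.startswith("form-")]
    other = [p for p in parts if not p.startswith("form-")]
    return Tag(match.group("pos"), other, surface)


def merge_surface_forms(converted_tag, source_tag):
    """Append the source tag's surface form features to a converted tag."""
    target, source = parse_tag(converted_tag), parse_tag(source_tag)
    features = target.features + target.surface_forms
    features += [f for f in source.surface_forms if f not in features]
    return f"{target.pos}({','.join(features)})"
```

Keeping the surface forms in a separate list makes it straightforward to query case-marking evidence later without re-parsing every tag.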

2.2 Sociolinguistic tagging

A key hypothesis in intra-author variation is the influence of sociological factors on linguistic choices. To evaluate this hypothesis systematically, all letters are being annotated with the following information:

• Goal: express thanks, ask advice, recommend, invite
• Topic: politics, religion, personal affairs, administration
• For individual correspondents:
  o name, gender, year of birth and death
  o status of correspondent as literary author
  o relation to Hooft: family members, literary friends, politicians, etc.
• For group correspondents:
  o name
  o domain: government, financial or legal institutions, civil associations
• Letter structure: greeting, introduction, narratio, closing formulas
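One possible way to hold this information per letter is sketched below as Python dataclasses. The field names, the `letter_id` field and the rule that a letter has either an individual or a group correspondent (not both) are assumptions made for illustration; they are not taken from the project's annotation tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndividualCorrespondent:
    name: str
    gender: Optional[str] = None
    year_of_birth: Optional[int] = None
    year_of_death: Optional[int] = None
    literary_author: Optional[bool] = None
    relation_to_hooft: Optional[str] = None  # e.g. "family member"


@dataclass
class GroupCorrespondent:
    name: str
    domain: Optional[str] = None  # e.g. "government"


@dataclass
class LetterAnnotation:
    letter_id: str  # hypothetical identifier for the letter
    goals: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    individual: Optional[IndividualCorrespondent] = None
    group: Optional[GroupCorrespondent] = None
    structure: List[str] = field(default_factory=list)

    def validate(self):
        """Assumed rule: exactly one kind of correspondent per letter."""
        if (self.individual is None) == (self.group is None):
            raise ValueError(
                f"{self.letter_id}: set either 'individual' or 'group'"
            )


# Usage with placeholder values only
example = LetterAnnotation(
    letter_id="letter-0001",
    goals=["invite"],
    topics=["personal affairs"],
    individual=IndividualCorrespondent(name="Example Correspondent"),
    structure=["greeting", "narratio", "closing formulas"],
)
example.validate()
```

Using optional fields lets annotators leave unknown values (such as a year of birth) empty rather than forcing a guess.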

2.3 Annotation process

A tool has been developed (see Figure 1) to perform POS and sociolinguistic annotation in an efficient way. A pool of annotators is available for the task, who will perform partly overlapping annotations to allow for agreement measurements. The annotation process is currently ongoing. A protocol has been developed to guide the post-correction process (see Figure 2 for examples).
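Agreement on the overlapping portion could be measured in several ways; the text does not say which measure will be used. As one common option for two annotators, the sketch below computes Cohen's kappa over the tags both annotators assigned to the same tokens. The function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # proportion of items on which the annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # agreement expected by chance, from each annotator's label distribution
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / n ** 2
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```

For more than two annotators, or when annotators label different subsets, a measure such as Fleiss' kappa or Krippendorff's alpha would be the more usual choice.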

Figure 2: Annotation guideline examples

Comparative and superlative adjectives are annotated individually. This rule is also applied for irregular adverbs, such as veel, meer, meest and wel/goed, beter, best. As an example, minste in the sentence below (1634, Van Tricht p. 527) receives a separate lemma minst:

. . . waer aen het minste deel niet en zal hebben, Me Joffre.

Nominatives and non-nominatives are differentiated. We chose not to denominate dative, genitive, accusative and ablative. Instead, the surface form, related to case marking, is annotated. An example from 1633 (Van Tricht p. 437):

Veel gelux[N(ev,non-nom,form-s)] met . . . den[LID(bep,form-n)] jongen[N(ev,non-nom,form-n)] Arnout, dien god geeve ’t lof des[LID(bep,form-s)] geenen nae te ijvren, daer hij den naem af draeght.

3 Analysis
In related work (Kramer, 2016) the use of negation by Hooft has been studied manually. Kramer shows that Hooft uses mostly single negation in different syntactical environments (subclauses, inversion, main clauses, local negation, V1 (verb-initial) sentences). Additionally, the negation particle niet can be used as an alternative for the noun nothing. Furthermore, Hooft uses bipartite negation in almost all syntactical environments as well (all except in V1). In Kramer’s research, not one environment seemed to particularly ask for the use of bipartite negation. This research, however, encompassed only 107 letters. The fully annotated corpus will allow a more quantitative analysis, as well as a larger range and higher level of detail of linguistic phenomena.

Nobels and Rutten (2014) note the influence of gender and social class on negation (p. 41): ‘while single negation spread from the north to the south, it also turned into a social variant, as the upper ranks in society and male letter writers seemed to be quicker to pick up on the incoming variant than the lower ranks and female letter writers’. Nobels and Rutten (2014) also note (p. 43) that traditions in letter writing affect linguistic development: ‘fixed formulae were memorized as a whole (or copied) by writers from any social background. These fixed formulae occur in certain parts of the letters, mostly in the beginning and the ending’. With the current annotation effort, this type of observation can be studied systematically.

References
Brugman, H., Reynaert, M., van der Sijs, N., van Stipriaan, R., Tjong Kim Sang, E., and van den Bosch, A. (2016). Nederlab: Towards a single portal and research environment for diachronic Dutch text corpora. In Proceedings of LREC 2016.

van Gompel, M. and Reynaert, M. (2013). FoLiA: A practical XML format for linguistic annotation - a descriptive and comparative study. Computational Linguistics in the Netherlands Journal, 3:63–81.

van Halteren, H. and Rem, M. (2013). Dealing with orthographic variation in a tagger-lemmatizer for fourteenth century Dutch charters. Language Resources and Evaluation, 47(4):1233–1259.

Kramer, I. (2016). Variatie in negatie, een syntactische en retorische analyse van het gebruik van enkele en tweeledige negatie in de brieven van P.C. Hooft van 1633 tot 1638 aan Joost Baek en Tesselschade Roemersdochter Visser. BA thesis, Universiteit Utrecht.

Nobels, J. and Rutten, G. (2014). Language norms and language use in seventeenth-century Dutch: negation and the genitive. In Rutten, G., editor, Norms and usage in language history, 1600-1900. A sociolinguistic and comparative perspective, pages 21–48. John Benjamins Publishing Company.

Paardekooper, P. (2016). Bloei en ondergang van onbeperkt ne/en, vooral dat bij niet-woorden. Neerlandistiek.nl.

van Tricht, H. (1976). De briefwisseling van Pieter Corneliszoon Hooft. Tjeenk Willink/Noorduijn.