Comparing Vocabulary Term Recommendations using Association Rules and Learning To Rank: A User Study

Johann Schaible, Pedro Szekely, and Ansgar Scherp, at ESWC 2016

Problem Statement: Reuse Vocabulary Terms!

§  When modeling LOD, it is customary to reuse vocabulary terms (→ classes and properties)
§  However, it is a challenging task

Increasing need for Vocabulary Term Recommendations

Term Recommendations based on …

§  Popularity of a candidate (i.e., vocabulary term)
    }  Number of LOD sources using a candidate
    }  Number of LOD sources using the candidate’s vocabulary
    }  Number of total occurrences of a candidate
§  Candidate from an already used vocabulary
§  Collaborative filtering:
    }  How did others use a recommendation candidate?

Example of Schema-Level Patterns (SLPs):

slp = ({swrc:Publication}, {dc:creator}, {foaf:Person})

Resources of type swrc:Publication are connected to resources of type foaf:Person via the property dc:creator
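The SLP example above can be sketched in code. This is a minimal illustration, not the authors' implementation: the `SLP` tuple, the `extract_slps` helper, and the `ex:` resources are hypothetical names, and the extraction rule (one SLP per pair of typed, directly connected resources) is a simplifying assumption.

```python
from collections import namedtuple

# An SLP as on the slide: (subject type set, property set, object type set).
SLP = namedtuple("SLP", ["subject_types", "properties", "object_types"])

RDF_TYPE = "rdf:type"

def extract_slps(triples):
    """Derive one SLP per pair of typed, connected resources (toy simplification)."""
    # Collect the type set of every resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # Collect the properties that link two typed resources.
    links = {}
    for s, p, o in triples:
        if p != RDF_TYPE and s in types and o in types:
            links.setdefault((s, o), set()).add(p)
    return {
        SLP(frozenset(types[s]), frozenset(props), frozenset(types[o]))
        for (s, o), props in links.items()
    }

# Hypothetical example data.
triples = [
    ("ex:paper1", "rdf:type", "swrc:Publication"),
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:paper1", "dc:creator", "ex:alice"),
]
slps = extract_slps(triples)
```

For this toy input, `slps` contains exactly the pattern shown on the slide.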

Utilized Approaches (1/2): Association Rules (AR)

§  Rules calculated on the set of frequent SLPs:

SLP-feature

SLP_LOD = SLPs computed from datasets on the LOD cloud
SLP_LOD = {slp_1, slp_2, ..., slp_n}

slp_i = ({swrc:Publication}, {dc:creator}, {foaf:Person})
s_i = ({swrc:Publication}, {}, {foaf:Person})

s_i → (slp_i \ s_i) := dc:creator

Recommendation:

When using a set of given vocabulary terms, which further classes and properties did others also use?
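The rule s_i → (slp_i \ s_i) can be illustrated with a small sketch. It assumes that every SLP counts as one transaction and that candidates are ranked by rule confidence. The function name `recommend_properties`, the `min_confidence` parameter, and the toy `slp_lod` list are made up for illustration and are not taken from the study.

```python
from collections import Counter

def recommend_properties(slps, subject_types, object_types, min_confidence=0.0):
    """Rank properties p by the confidence of the rule
    (subject_types, {}, object_types) -> p, estimated over the given SLPs."""
    # SLPs whose subject and object type sets contain the given terms.
    matching = [
        props for (st, props, ot) in slps
        if subject_types <= st and object_types <= ot
    ]
    if not matching:
        return []
    counts = Counter(p for props in matching for p in props)
    scored = [(p, n / len(matching)) for p, n in counts.items()]
    scored = [(p, c) for p, c in scored if c >= min_confidence]
    # Highest confidence first; ties broken alphabetically.
    return sorted(scored, key=lambda pc: (-pc[1], pc[0]))

# Toy stand-in for SLP_LOD (invented counts, for illustration only).
slp_lod = [
    ({"swrc:Publication"}, {"dc:creator"}, {"foaf:Person"}),
    ({"swrc:Publication"}, {"dc:creator"}, {"foaf:Person"}),
    ({"swrc:Publication"}, {"foaf:maker"}, {"foaf:Person"}),
    ({"foaf:Person"}, {"foaf:knows"}, {"foaf:Person"}),
]
recs = recommend_properties(slp_lod, {"swrc:Publication"}, {"foaf:Person"})
```

Raising `min_confidence` drops weakly supported candidates from the list entirely instead of merely ranking them lower.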

Utilized Approaches (2/2): Learning To Rank (L2R)

§  Family of supervised machine learning algorithms based on data with relevance annotations
§  State of the art in IR to compute a generalized ranking model over a given set of features
§  Ranking model is derived by observing correlations between feature values and candidate relevance
§  Features:
    }  (i) number of datasets using a vocabulary term, (ii) number of total occurrences of a vocabulary term
    }  Term from an already used vocabulary
    }  SLP-feature
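To make the idea of deriving a ranking model from relevance-annotated features concrete, here is a minimal sketch using a pairwise perceptron. This is a swapped-in, deliberately simple technique, not the L2R algorithm used in the study. The feature order follows the slide (datasets using the term, total occurrences, term from an already used vocabulary, SLP-feature); the candidate names, feature values, and relevance labels are invented.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_pairwise(examples, epochs=10):
    """Pairwise perceptron: learn weights w such that relevant candidates
    score higher than irrelevant ones."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    # All (relevant, irrelevant) feature-vector pairs.
    pairs = [
        (pos, neg)
        for pos, rel_pos in examples
        for neg, rel_neg in examples
        if rel_pos > rel_neg
    ]
    for _ in range(epochs):
        for pos, neg in pairs:
            diff = [a - b for a, b in zip(pos, neg)]
            if dot(w, diff) <= 0:  # pair is ranked wrongly, so update w
                w = [wi + di for wi, di in zip(w, diff)]
    return w

# Invented toy candidates: name -> (features, relevance label).
# Features: [datasets (scaled), occurrences (scaled), same vocabulary, SLP-feature]
candidates = {
    "term_a": ([0.9, 0.8, 0.0, 0.0], 0),
    "term_b": ([0.2, 0.1, 1.0, 1.0], 1),
    "term_c": ([0.5, 0.5, 1.0, 0.0], 0),
    "term_d": ([0.1, 0.2, 0.0, 1.0], 1),
}
weights = train_pairwise(list(candidates.values()))
ranking = sorted(candidates, key=lambda t: dot(weights, candidates[t][0]), reverse=True)
```

Unlike a rule-based filter, a model of this kind assigns a score to every candidate, so less suitable terms still appear in the list, only further down.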

Why a User Study?

§  In offline evaluations
    }  There is no gold standard data
    }  No observations of users and their behavior
§  In A/B tests (online evaluation)
    }  No fully functioning system yet
    }  Not enough users to produce meaningful results

Study in a controlled lab environment with invited participants

User Study - Procedure

§  Latin-square within-subject design study
    }  Each participant asked to model three different datasets as LOD (max. 6 minutes each) with (a) Learning To Rank based recommendations, (b) Association Rule based recommendations, (c) no recommendations
§  Participants first train on example data
    }  Avoids carry-over effects
    }  Participants get used to the system
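A Latin-square assignment of the three conditions can be sketched as follows. The cyclic construction and the rule of assigning rows by participant index are assumptions made for illustration; the slides do not state how the orders were actually allocated.

```python
CONDITIONS = ["L2R", "AR", "None"]

def latin_square(items):
    """Cyclic Latin square: every item appears once per row and once per column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def condition_order(participant_index, items=CONDITIONS):
    """Order of conditions for one participant (rows are reused cyclically)."""
    square = latin_square(items)
    return square[participant_index % len(items)]

square = latin_square(CONDITIONS)
```

With this construction, each condition is seen first, second, and third equally often whenever the number of participants is a multiple of three.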

User Study - Modeling Tasks

§  Task: Finish the model for the data from the Music, Museum, and Product Offers domains with Karma¹
§  Replace owl:Thing and rdfs:label with better fitting classes and properties, respectively
§  Define object properties specifying that
    }  a musician is a member of a band
    }  a musician recorded an album
    }  a musician has a Wikipedia page

¹ http://usc-isi-i2.github.io/karma/

User Study - Measurements

§  The participants’ effort
    }  Task completion time (max. 6 min. to avoid tiredness)
    }  Recommendation acceptance rate (number of terms chosen from recommendations)
§  The quality of the resulting data
    }  Number of vocabulary terms that were also used by five different data modeling experts
§  Level of satisfaction with both recommenders
    }  5-point Likert scale rating AR and L2R
    }  Ranking of L2R, AR, and using no recommendations
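Two of these measurements can be computed as in the sketch below. The function names are hypothetical, the example terms are invented, and counting a term as "also used by the experts" when at least one expert model contains it is an assumption; the slides do not give the exact matching rule.

```python
def acceptance(chosen_terms, recommended_terms):
    """Return (count, rate) of chosen terms that came from the recommendation list."""
    accepted = [t for t in chosen_terms if t in recommended_terms]
    rate = len(accepted) / len(chosen_terms) if chosen_terms else 0.0
    return len(accepted), rate

def expert_agreement(chosen_terms, expert_models):
    """Number of chosen terms that appear in at least one expert model (assumed rule)."""
    expert_terms = set().union(*expert_models) if expert_models else set()
    return sum(1 for t in set(chosen_terms) if t in expert_terms)

# Invented example values.
chosen = ["mo:MusicArtist", "mo:member_of", "foaf:name", "ex:band"]
recommended = {"mo:MusicArtist", "foaf:name", "dc:title"}
experts = [
    {"mo:MusicArtist", "mo:member_of"},
    {"mo:MusicArtist", "foaf:name"},
]
accepted_count, accepted_rate = acceptance(chosen, recommended)
agreement = expert_agreement(chosen, experts)
```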

Results (1/3) - Participants

§  20 participants (5 female)
    }  18 in academia, 2 in both academia and industry
    }  2 master students, 14 research associates, 3 postdocs, 1 professor
    }  8 recruited from USC, 12 recruited from GESIS
§  Knowledge and experience
    }  Karma: 7 high to expert knowledge, 13 none at all
    }  LOD: average experience of 3 years
    }  Self-rated experience (5-point Likert): M = 2.8, SD = 1.6
    }  Task knowledge (5-point Likert): M = 2.1, SD = 1.1
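The M and SD values reported for Likert items are the mean and standard deviation of the individual ratings. The sketch below uses Python's `statistics` module with made-up ratings; whether the study used the sample or the population standard deviation is not stated, so the use of `stdev` (sample SD) is an assumption.

```python
from statistics import mean, stdev

def summarize(ratings):
    """Return (M, SD) for a list of Likert ratings, rounded to two decimals."""
    return round(mean(ratings), 2), round(stdev(ratings), 2)

# Hypothetical ratings on a 5-point scale, not data from the study.
example_ratings = [1, 2, 3, 4, 5]
m, sd = summarize(example_ratings)
```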

Results (2/3)

[Figure: three panels showing Time Completion, Recommendation Acceptance, and Data Quality]

Results (3/3)

§  General level of satisfaction (5-point Likert scale)
    }  Learning To Rank: M = 3.00, SD = 1.1
    }  Association Rules: M = 4.23, SD = 0.7
§  Comparing AR to L2R directly
    }  Rating (scale from "AR much worse" to "AR much better"): M = 4.56, SD = 0.4
    }  Ranking: All participants ranked AR higher than L2R

Discussion of Results

§  AR filters out inappropriate terms, L2R ranks them at a lower position
§  Additional features let L2R rank popular but inappropriate terms higher
§  With L2R based recommendations, it was observed that participants
    }  overlooked relevant recommendations in the top-10 list
    }  felt uncertain, such that they searched longer and often used string-based search

Conclusion

§  AR based recommendations, i.e., collaborative filtering, perform better in
    }  Time, effort, quality, general satisfaction
§  With L2R, participants were unsure; with AR, they were more confident that the recommendations are correct and commonly used

Using AR-based recommendations, participants with little LOD and domain expertise were able to produce high quality LOD close to the experts

→ Easier vocabulary reuse to decrease heterogeneity in data representation

Thank You!

1.  Copy of the questionnaire and raw results of the user study: http://dx.doi.org/10.7802/1206

2.  Accompanying material and modeling results: https://github.com/WanjaSchaible/termpicker_karmaeval_material

Acknowledgements:
Generating the gold standard: Laura Hollink, Benjamin Zapilko, Ruben Verborgh, Jérôme Euzenat, and Oscar Corcho
Thanks to the participants of the user study