Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Questions mostfavorable LeastfavorableHoweasywasittoreadtheschemasummary? Freebase Diverse Graph Experts YPS09 Concise Tight
Howmuchunderstandingofthedatacanyougainfromit? Graph Freebase YPS09 Diverse Concise Tight Experts
Howhelpfulwasitinassistingyoutounderstandthedata? Graph Freebase YPS09 Diverse Experts Concise Tight
Isitmissingimportantinformation? YPS09 Concise Experts Graph Tight Freebase Diverse
SteepFlag-DownCost
GeneratingPreviewTablesforEntityGraphsNingYan1#,SonaHasani*,Abolfazl Asudeh *,Chengkai Li *
HuaweiU.S.R&DCenter# UniversityofTexasatArlington*1TheworkwasdonewhileatUTA. InnovativeDatabaseandInformationSystemsResearch(IDIR)LaboratoryInnovativeDatabaseandInformationSystemsResearch(IDIR)Laboratory
UserStudy
NeedforaQuickOverview
ExperimentResults
PreviewTables
OptimalPreviewDiscovery
AttributeScoring
Ultra-heterogeneousEntityGraphs
q Freebase:1.9billiontriples
qDBpedia :3billiontriples
q YAGO:120milliontriples
q LinkedOpenData:52billiontriples
Approach 1: Schema Graph
Approach 2: Schema Summaryq Schemasummarizationinrelational
database[YangPVLDB09,YangPVLDB11]
q XMLsummarization[YuVLDB06]q Graphsummarization
[TianSIGMOD08,ZhangICDE10]
FILM Actor Genres
4 6 5
FILMACTOR Actor AwardWinners
2 6 2
Keyattributescoring
4× (6+5) = 44
2× (6+2) = 16
Score of thePreview
60+
Findthepreviewwithhighestscorethatsatisfies
SizeconstraintNumberofkeyattributesKNumberofnon-key attributesN
Distancebetweentwopreviewtablesd
0
0.2
0.4
0.6
0.8
1
1 6 11 16
Book
0
0.2
0.4
0.6
0.8
1
1 3 5 7 9 11 13 15 17 19
Music
0
0.2
0.4
0.6
0.8
1
1 3 5 7 9 11 13 15 17 19
TV
0
0.2
0.4
0.6
0.8
1
1 3 5 7 9 11 13 15 17 19
People
Optimal p@kYPS09[Yang PVLDB09]CoverageRandom Walk
KeyAttribute Non-keyAttribute
Domain YPS09 Coverage Random Walk Coverage Entropy
books 0.4 0.55 0.43 0.43 0.43
film -0.01 0.48 0.25 0.35 0.35
music 0.37 0.33 0.46 0.42 0.41
TV 0.37 0.69 0.65 0.47 0.47
people 0.36 0.31 0.29 0.43 0.43
Tight Diverse Freebase Experts YPS09 SchemaGraph
Concise z=1.59p=0.0559
z=−2.28p=0.0113
z=0.49p=0.3121
z=−0.13p=0.4483
z=0.36p=0.3594
z=−0.43p=0.3336
Tight z=−3.48p=0.0003
z=−1.12p=0.1314
z=−1.69p=0.0455
z=−1.282p=0.0999
z=−1.93p=0.0268
Diverse z=2.57p=0.0051
z=2.10p=0.0179
z=2.60p=0.0047
z=1.70p=0.0446
Freebase z=−0.61p=0.2709
z=−0.15p=0.4404
z=−0.87p=0.1922
Experts z=0.49p=0.3121
z=−0.29p=0.3859
YPS09 z=−0.77p=0.2206
http://linkeddata.org/
Large and complex graphs capturing millions ofentities and billions of relationships betweenentities.
Applications:search,recommendationsystems,businessintelligence,healthinformatics,factchecking
EntityGraph
FILM FILMSETDECORATOR FILMEDITOR FILM FILMDIRECTOR FILMPRODUCER
PRODUCTION COMPANY FILM FILMWRITER FILM
TooManyPreviews.WhichOnetoChoose?
A B
AggregateScoring
Coverage-basedmethod:Coverage(Genres)=5
Entropy-basedmethod:Entropy(Genres)=(2/3)log(3/2)+(1/3)log(3/1)=0.28
Coverage-basedmethod:Coverage(FILM)=3
Randomwalk-basedmethod:Stationarydistributionofarandomwalkprocessdefinedovertheschemagraph
Tight
Diverse
Concise
FILM Performances Genres DirectedBy
FILM DIRECTOR FilmsDirected
FILM PRODUCER Films Produced
FILMFESTIVAL Location Focus
FILM COMPANY Films
FILM CHARACTER Portrayed inFilm
Tight
Diverse
Algorithms
WeassumeallKkeyattributesareorderedarbitrarily.optimalconcisepreview(k,n,X)isthebestof:optimalconcisepreview(k,n,X-1)optimalconcisepreview(k-1,n-1,X-1)� X-th Key-attributewith1non-keyattributeoptimalconcisepreview(k-1,n-2,X-1)� X-th Key-attributewith2non-keyattributes……optimalconcisepreview(k-1,k-1,X-1)� X-th Key-attributewith(n-k+1)non-key
attributes
Concisepreview,dynamicprogrammingalgorithm
Tight/Diversepreview,Apriori propertyalgorithm
NP-hard
Diverse(Gs,k,k,2,0)
Tight(Gs,k,k,1,0)
1.Construct2-cliquesbyenumeratingallkeyattributepairs2.fori=3tokgenerate i-cliques from(i-1)-cliques based on Aprioriproperty3.find the k-clique with highestscore,return asoptimalpreview
Systemssortedbyaverageuserexperiencescoresacrossfivedomains
Pairwisecomparisonsofconversionrates,domain=“music”, α=0.1
Keyattributescoring(precision-at-k)
Comparisonbetweenrankingsbyourapproachandthecrowd,PearsonCorrelationCoefficient(PCC)
dist(Ti,Tj)≤d
dist(Ti,Tj)≥d
Schemagraphof“Film”domaininFreebaseEntitygraph:2Mentities,18MedgesSchemagraph:63entitytypes,136edges
Schema Graph itself can be too complex.
Timetakenonexistencetests,domain=“music”
FILM Director Genres
MeninBlack BarrySonnenfeld {ActionFilm,ScienceFiction}
MeninBlackII BarrySonnenfeld {ActionFilm,ScienceFiction}
I,Robot _ {ActionFilm}
Domain Coverage Entropy
books 0.8 0.786
film 0.2 0.25
music 0.528 0.589
TV 0.622 0.379
people 0.708 0.606
MeanReciprocalRank(MRR)ofNon-keyattributes
Domains:film,books,music,TV,peopleHand-craftedpreviewtables10PhDstudents inDatabaseresearchgroupIndividuallyandasagroup$20giftcard
Existence/experiencequestionsq Schemagraphq Concisepreviewq Tightpreviewq Diversepreviewq Freebasegroundtruthq YPS09q Hand-crafted preview tables84Master’s andPhD students indatabase area$15gift card
0
0.2
0.4
0.6
0.8
1
1 6 11 16
Film
Non-keyattributescoring