Transcript
Page 1: sig-sona-poster-final - Ranger (ranger.uta.edu/.../2016/tabview-sigmod2016-yan-poster.pdf)
Title: sig-sona-poster-final
Created Date: 6/27/2016 5:40:13 PM

Questions (systems listed from most favorable to least favorable)

How easy was it to read the schema summary? Freebase, Diverse, Graph, Experts, YPS09, Concise, Tight

How much understanding of the data can you gain from it? Graph, Freebase, YPS09, Diverse, Concise, Tight, Experts

How helpful was it in assisting you to understand the data? Graph, Freebase, YPS09, Diverse, Experts, Concise, Tight

Is it missing important information? YPS09, Concise, Experts, Graph, Tight, Freebase, Diverse

Steep Flag-Down Cost

Generating Preview Tables for Entity Graphs

Ning Yan 1 #, Sona Hasani *, Abolfazl Asudeh *, Chengkai Li *

Huawei U.S. R&D Center #
University of Texas at Arlington *
1 The work was done while at UTA.

Innovative Database and Information Systems Research (IDIR) Laboratory

User Study

Need for a Quick Overview

Experiment Results

Preview Tables

Optimal Preview Discovery

Attribute Scoring

Ultra-heterogeneous Entity Graphs

- Freebase: 1.9 billion triples
- DBpedia: 3 billion triples
- YAGO: 120 million triples
- Linked Open Data: 52 billion triples

Approach 1: Schema Graph

Approach 2: Schema Summary

- Schema summarization in relational database [YangPVLDB09, YangPVLDB11]
- XML summarization [YuVLDB06]
- Graph summarization [TianSIGMOD08, ZhangICDE10]

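Table FILM with non-key attributes Actor, Genres
Scores: FILM = 4, Actor = 6, Genres = 5

Table FILM ACTOR with non-key attributes Actor, Award Winners
Scores: FILM ACTOR = 2, Actor = 6, Award Winners = 2

Key attribute scoring

4 × (6 + 5) = 44

2 × (6 + 2) = 16

Score of the Preview: 44 + 16 = 60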
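The arithmetic in this example can be reproduced with a short Python sketch. It assumes, based only on the two worked lines 4 × (6 + 5) = 44 and 2 × (6 + 2) = 16, that a table's score is its key attribute score times the sum of its non-key attribute scores, and that a preview's score is the sum of its table scores. The function names are illustrative and do not come from the poster.

```python
def table_score(key_score, non_key_scores):
    """Score of one preview table: key score times the sum of its non-key scores."""
    return key_score * sum(non_key_scores)


def preview_score(tables):
    """Score of a preview: the sum of the scores of its tables."""
    return sum(table_score(key, non_keys) for key, non_keys in tables)


# The two tables from the example, written as (key score, non-key scores).
film = (4, [6, 5])        # FILM with Actor and Genres
film_actor = (2, [6, 2])  # FILM ACTOR with Actor and Award Winners

print(table_score(*film))                 # 44
print(table_score(*film_actor))           # 16
print(preview_score([film, film_actor]))  # 60
```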

Find the preview with the highest score that satisfies:

- Size constraint: number of key attributes K, number of non-key attributes N
- Distance between two preview tables d

[Figure: four plot panels titled Book, Music, TV, and People. Vertical axis from 0 to 1; horizontal axis from 1 to 19 (1 to 16 for Book). Legend: Optimal p@k, YPS09 [YangPVLDB09], Coverage, Random Walk]

          Key attribute                       Non-key attribute
Domain    YPS09    Coverage   Random Walk     Coverage   Entropy
books     0.4      0.55       0.43            0.43       0.43
film      -0.01    0.48       0.25            0.35       0.35
music     0.37     0.33       0.46            0.42       0.41
TV        0.37     0.69       0.65            0.47       0.47
people    0.36     0.31       0.29            0.43       0.43

          Tight              Diverse            Freebase           Experts            YPS09              SchemaGraph
Concise   z=1.59 p=0.0559    z=−2.28 p=0.0113   z=0.49 p=0.3121    z=−0.13 p=0.4483   z=0.36 p=0.3594    z=−0.43 p=0.3336
Tight     -                  z=−3.48 p=0.0003   z=−1.12 p=0.1314   z=−1.69 p=0.0455   z=−1.282 p=0.0999  z=−1.93 p=0.0268
Diverse   -                  -                  z=2.57 p=0.0051    z=2.10 p=0.0179    z=2.60 p=0.0047    z=1.70 p=0.0446
Freebase  -                  -                  -                  z=−0.61 p=0.2709   z=−0.15 p=0.4404   z=−0.87 p=0.1922
Experts   -                  -                  -                  -                  z=0.49 p=0.3121    z=−0.29 p=0.3859
YPS09     -                  -                  -                  -                  -                  z=−0.77 p=0.2206

http://linkeddata.org/

Large and complex graphs capturing millions of entities and billions of relationships between entities.

Applications: search, recommendation systems, business intelligence, health informatics, fact checking

Entity Graph

[Figure: schema graph excerpt with node labels FILM, FILM SET DECORATOR, FILM EDITOR, FILM DIRECTOR, FILM PRODUCER, PRODUCTION COMPANY, FILM WRITER]

Too Many Previews. Which One to Choose?

A B

Aggregate Scoring

Coverage-based method: Coverage(Genres) = 5

Entropy-based method: Entropy(Genres) = (2/3) log(3/2) + (1/3) log(3/1) = 0.28

Coverage-based method: Coverage(FILM) = 3

Random walk-based method: Stationary distribution of a random walk process defined over the schema graph
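As a rough illustration of the random walk-based method, the sketch below approximates a stationary distribution by power iteration on a small undirected schema graph. The poster does not say how transition probabilities are defined, so the edge weights, the rule that a step follows an edge with probability proportional to its weight, and the "lazy" half-step (added only so the iteration converges on bipartite graphs, without changing the stationary distribution) are all assumptions made for this example.

```python
def stationary_distribution(edges, iterations=200):
    """Approximate the stationary distribution of a lazy random walk.

    edges maps an unordered pair (u, v) to a positive weight (hypothetical data).
    """
    weight = {}
    for (u, v), w in edges.items():
        weight.setdefault(u, {})[v] = w
        weight.setdefault(v, {})[u] = w
    nodes = sorted(weight)
    prob = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Lazy step: stay in place with probability 0.5.
        nxt = {node: 0.5 * prob[node] for node in nodes}
        for u in nodes:
            total = sum(weight[u].values())
            for v, w in weight[u].items():
                # Otherwise move along an edge, proportionally to its weight.
                nxt[v] += 0.5 * prob[u] * w / total
        prob = nxt
    return prob


# Hypothetical schema graph: entity types as nodes, weights invented for the example.
schema_edges = {
    ("FILM", "FILM DIRECTOR"): 3,
    ("FILM", "FILM PRODUCER"): 2,
    ("FILM", "FILM ACTOR"): 5,
}
scores = stationary_distribution(schema_edges)
for entity_type in sorted(scores, key=scores.get, reverse=True):
    print(entity_type, round(scores[entity_type], 3))
```

For an undirected weighted graph the result is proportional to each node's total edge weight, so FILM, which touches every edge here, receives the highest key attribute score.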

Tight

Diverse

Concise

FILM: Performances, Genres, Directed By

FILM DIRECTOR: Films Directed

FILM PRODUCER: Films Produced

FILM FESTIVAL: Location, Focus

FILM COMPANY: Films

FILM CHARACTER: Portrayed in Film

Tight

Diverse

Algorithms

We assume all K key attributes are ordered arbitrarily. optimalconcisepreview(k, n, X) is the best of:

- optimalconcisepreview(k, n, X-1)
- optimalconcisepreview(k-1, n-1, X-1) plus the X-th key attribute with 1 non-key attribute
- optimalconcisepreview(k-1, n-2, X-1) plus the X-th key attribute with 2 non-key attributes
- ...
- optimalconcisepreview(k-1, k-1, X-1) plus the X-th key attribute with (n-k+1) non-key attributes
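The recurrence above can be sketched as a memoized Python function. This is a minimal illustration under stated assumptions, not the authors' implementation: a table's score is taken to be its key attribute score times the sum of its chosen non-key scores, a table that uses m non-key attributes picks its m highest-scoring ones, and the preview has exactly k tables with at most n non-key attributes in total. The function and variable names are hypothetical.

```python
from functools import lru_cache
from itertools import accumulate


def optimal_concise_preview(tables, k, n):
    """Best preview score with exactly k tables and at most n non-key attributes.

    tables: list of (key_score, non_key_scores), in an arbitrary fixed order.
    Returns float("-inf") when no feasible preview exists.
    """
    # prefix[x][m] = sum of the m highest non-key scores of the x-th table.
    prefix = [[0] + list(accumulate(sorted(non_keys, reverse=True)))
              for _, non_keys in tables]

    @lru_cache(maxsize=None)
    def best(k, n, x):
        if k == 0:
            return 0.0
        if x == 0 or n < k:  # each table needs at least one non-key attribute
            return float("-inf")
        key_score = tables[x - 1][0]
        # Option 1: leave the X-th key attribute out.
        result = best(k, n, x - 1)
        # Option 2: include it with m = 1 .. n-k+1 non-key attributes.
        max_m = min(n - k + 1, len(prefix[x - 1]) - 1)
        for m in range(1, max_m + 1):
            candidate = best(k - 1, n - m, x - 1) + key_score * prefix[x - 1][m]
            result = max(result, candidate)
        return result

    return best(k, n, len(tables))


# Scores from the poster's FILM / FILM ACTOR scoring example.
example = [(4, [6, 5]), (2, [6, 2])]
print(optimal_concise_preview(example, k=2, n=4))  # 60.0
print(optimal_concise_preview(example, k=2, n=3))  # 56.0
print(optimal_concise_preview(example, k=1, n=2))  # 44.0
```

With k = 2 and n = 3 the best split gives FILM two non-key attributes (44) and FILM ACTOR one (2 × 6 = 12), which is why the score drops from 60 to 56.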

Concise preview, dynamic programming algorithm

Tight/Diverse preview, Apriori property algorithm

NP-hard

Diverse(Gs,k,k,2,0)

Tight(Gs,k,k,1,0)

1. Construct 2-cliques by enumerating all key attribute pairs
2. for i = 3 to k: generate i-cliques from (i-1)-cliques based on Apriori property
3. find the k-clique with highest score, return as optimal preview
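The three steps above can be sketched in Python as follows. The sketch assumes that two key attributes may appear together when their distance satisfies dist ≤ d (tight) or dist ≥ d (diverse), and that the score of a clique is the sum of per-table scores. The scores and distances below are invented for illustration, and the function name is hypothetical.

```python
from itertools import combinations


def best_preview_by_cliques(scores, dist, k, d, tight=True):
    """Return (score, key attributes) of the best k-clique, or None if none exists.

    scores: table score per key attribute (hypothetical input).
    dist:   distance per unordered pair, keyed by frozenset({a, b}).
    tight:  True enforces dist <= d, False (diverse) enforces dist >= d.
    Requires k >= 2.
    """
    def compatible(a, b):
        gap = dist[frozenset((a, b))]
        return gap <= d if tight else gap >= d

    nodes = sorted(scores)
    # Step 1: 2-cliques from all key attribute pairs that satisfy the constraint.
    cliques = {frozenset(pair) for pair in combinations(nodes, 2) if compatible(*pair)}
    # Step 2: grow i-cliques from (i-1)-cliques.
    for _ in range(3, k + 1):
        larger = set()
        for clique in cliques:
            for node in nodes:
                candidate = clique | {node}
                # Apriori property: every (i-1)-subset must already be a clique.
                if node not in clique and all(candidate - {m} in cliques for m in candidate):
                    larger.add(candidate)
        cliques = larger
    # Step 3: return the highest-scoring k-clique.
    if not cliques:
        return None
    best = max(cliques, key=lambda c: sum(scores[m] for m in c))
    return sum(scores[m] for m in best), sorted(best)


# Hypothetical table scores and pairwise distances.
scores = {"FILM": 44, "FILM DIRECTOR": 16, "FILM PRODUCER": 10, "FILM FESTIVAL": 30}
dist = {
    frozenset(("FILM", "FILM DIRECTOR")): 1,
    frozenset(("FILM", "FILM PRODUCER")): 1,
    frozenset(("FILM DIRECTOR", "FILM PRODUCER")): 2,
    frozenset(("FILM", "FILM FESTIVAL")): 2,
    frozenset(("FILM DIRECTOR", "FILM FESTIVAL")): 3,
    frozenset(("FILM PRODUCER", "FILM FESTIVAL")): 3,
}
print(best_preview_by_cliques(scores, dist, k=3, d=2, tight=True))
print(best_preview_by_cliques(scores, dist, k=2, d=2, tight=False))
```

The subset check prunes candidates early, but the number of cliques can still grow exponentially in the worst case, which is consistent with the problem being NP-hard.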

Systems sorted by average user experience scores across five domains

Pairwise comparisons of conversion rates, domain = "music", α = 0.1
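The z and p values reported for these pairwise comparisons are consistent with a one-tailed p-value under the standard normal distribution (for example, z = 1.59 gives p = 0.0559). The sketch below shows that conversion. It also includes a pooled two-proportion z statistic, which is only an assumed form of the test, since the poster does not state the formula or the underlying counts; the counts in the usage lines are hypothetical.

```python
from math import erfc, sqrt


def one_tailed_p(z):
    """One-tailed p-value of a z statistic under the standard normal distribution."""
    return 0.5 * erfc(abs(z) / sqrt(2))


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z statistic (an assumed form of the test)."""
    rate_a, rate_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    return (rate_a - rate_b) / sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))


ALPHA = 0.1
for z in (1.59, -2.28, -3.48):  # z values taken from the poster
    p = one_tailed_p(z)
    print(f"z={z:+.2f}  p={p:.4f}  significant at alpha={ALPHA}: {p < ALPHA}")

# Hypothetical conversion counts for two systems.
z = two_proportion_z(60, 100, 40, 100)
print(round(z, 3), round(one_tailed_p(z), 4))
```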

Key attribute scoring (precision-at-k)

Comparison between rankings by our approach and the crowd, Pearson Correlation Coefficient (PCC)
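For readers unfamiliar with the metric, a minimal sketch of the Pearson Correlation Coefficient follows. How the poster turns the two rankings into paired numeric values is not stated, so the paired score lists here are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical scores for the same five attributes from a method and from the crowd.
method_scores = [5, 4, 3, 2, 1]
crowd_scores = [4, 5, 3, 1, 2]
print(round(pearson(method_scores, crowd_scores), 2))
```

A value near 1 means the two sources order the attributes similarly, a value near 0 means little linear relationship, and a negative value means the orderings tend to disagree.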

dist(Ti,Tj)≤d

dist(Ti,Tj)≥d

Schema graph of "Film" domain in Freebase
Entity graph: 2M entities, 18M edges
Schema graph: 63 entity types, 136 edges

Schema Graph itself can be too complex.

Time taken on existence tests, domain = "music"

FILM               Director           Genres
Men in Black       Barry Sonnenfeld   {Action Film, Science Fiction}
Men in Black II    Barry Sonnenfeld   {Action Film, Science Fiction}
I, Robot           _                  {Action Film}
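The coverage and entropy values quoted on the poster for this three-film example (Coverage(FILM) = 3, Coverage(Genres) = 5, Entropy(Genres) = 0.28) can be reproduced with the sketch below. It rests on interpretations that the poster does not spell out: key attribute coverage counts the entities in the table, non-key coverage counts individual attribute values, and entropy is taken over distinct value sets with a base-10 logarithm, which is the base that yields 0.28.

```python
from collections import Counter
from math import log10

# The example preview table; None marks a missing value.
rows = [
    {"FILM": "Men in Black", "Director": "Barry Sonnenfeld",
     "Genres": {"Action Film", "Science Fiction"}},
    {"FILM": "Men in Black II", "Director": "Barry Sonnenfeld",
     "Genres": {"Action Film", "Science Fiction"}},
    {"FILM": "I, Robot", "Director": None, "Genres": {"Action Film"}},
]


def key_coverage(rows, key):
    """Number of entities listed under the key attribute."""
    return sum(1 for row in rows if row[key] is not None)


def non_key_coverage(rows, attribute):
    """Number of individual values the attribute holds across all rows."""
    total = 0
    for row in rows:
        value = row[attribute]
        if value is not None:
            total += len(value) if isinstance(value, set) else 1
    return total


def entropy(rows, attribute):
    """Entropy over distinct (set-valued) attribute values, base-10 logarithm."""
    values = [frozenset(v) if isinstance(v, set) else v
              for v in (row[attribute] for row in rows) if v is not None]
    counts = Counter(values)
    return sum((c / len(values)) * log10(len(values) / c) for c in counts.values())


print(key_coverage(rows, "FILM"))          # 3
print(non_key_coverage(rows, "Genres"))    # 5
print(round(entropy(rows, "Genres"), 2))   # 0.28
```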

Domain    Coverage   Entropy
books     0.8        0.786
film      0.2        0.25
music     0.528      0.589
TV        0.622      0.379
people    0.708      0.606

Mean Reciprocal Rank (MRR) of Non-key attributes
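As a reminder of how the metric is usually computed, the sketch below uses the standard definition of MRR: for each ranked list, take the reciprocal of the rank of the first relevant item, then average over lists. How the poster forms the ranked lists and the relevant sets is not stated, so the attribute names and ground truth here are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item in each ranked list.

    A list with no relevant item contributes 0.
    """
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical rankings of non-key attributes for two key attributes.
rankings = [
    ["Genres", "Directed By", "Performances"],
    ["Films Directed", "Awards", "Nationality"],
]
ground_truth = [{"Directed By"}, {"Films Directed"}]
print(mean_reciprocal_rank(rankings, ground_truth))  # 0.75
```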

Domains: film, books, music, TV, people

Hand-crafted preview tables
- 10 PhD students in Database research group
- Individually and as a group
- $20 gift card

Existence/experience questions
- Schema graph
- Concise preview
- Tight preview
- Diverse preview
- Freebase ground truth
- YPS09
- Hand-crafted preview tables

84 Master's and PhD students in database area, $15 gift card

[Figure: plot panel titled Film. Vertical axis from 0 to 1; horizontal axis from 1 to 16]

Non-key attribute scoring