Generating Illustrative Snippets for Open Data on the Web

Page 1: Generating Illustrative Snippets for Open Data on the Web

Generating Illustrative Snippets for Open Data on the Web

Gong Cheng, Cheng Jin, Wentao Ding, Danyun Xu, Yuzhong Qu

Websoft Research Group, National Key Laboratory for Novel Software Technology

Nanjing University, China

Page 2: Generating Illustrative Snippets for Open Data on the Web

The Web is in the era of open data.

Page 3: Generating Illustrative Snippets for Open Data on the Web

Dataset search engines have emerged.

Page 4: Generating Illustrative Snippets for Open Data on the Web

Metadata about a dataset is served,

Page 5: Generating Illustrative Snippets for Open Data on the Web

and only metadata is served.

Page 6: Generating Illustrative Snippets for Open Data on the Web

We propose to also serve an illustrative snippet,

Dataset: A set of entity-property-value triples

Snippet: A size-limited subset of triples

Snippet generation
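
The two definitions above can be sketched in a few lines of Python. This is only an illustration of the data model: the `Triple` type, the `is_snippet` helper, the size limit `k`, and the toy triples are all assumptions made for this example, not names or data from the paper.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One entity-property-value statement."""
    entity: str
    prop: str
    value: str


# Toy dataset, invented purely for illustration.
dataset = {
    Triple("Alice", "type", "Person"),
    Triple("Alice", "worksAt", "NJU"),
    Triple("NJU", "type", "University"),
    Triple("NJU", "locatedIn", "Nanjing"),
}


def is_snippet(candidate: Set[Triple], dataset: Set[Triple], k: int) -> bool:
    """A snippet is a subset of the dataset holding at most k triples."""
    return candidate <= dataset and len(candidate) <= k


snippet = {
    Triple("Alice", "worksAt", "NJU"),
    Triple("NJU", "type", "University"),
}
```

Snippet generation then amounts to choosing which of the many valid subsets to show.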

Page 7: Generating Illustrative Snippets for Open Data on the Web

and to serve a high-quality snippet.

• Coverage: To cover the most important entity types and properties.

• Familiarity: To contain entities familiar to average users.

• Cohesion: To describe a set of related entities.
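
As a rough sketch of how the three criteria could be measured, the Python below scores a candidate snippet. The slides do not give the actual measures, so every definition here is an assumption: `coverage` weights properties by how often they occur in the dataset (entity types could be handled the same way), `familiarity` averages a hypothetical per-entity `popularity` score, and `is_cohesive` checks that the snippet's entities form one connected graph.

```python
from collections import Counter


def coverage(snippet, dataset):
    """Assumed measure: frequency-weighted share of properties covered."""
    freq = Counter(p for _, p, _ in dataset)
    covered = {p for _, p, _ in snippet}
    return sum(freq[p] for p in covered) / sum(freq.values())


def familiarity(snippet, popularity):
    """Assumed measure: mean popularity of the entities described."""
    entities = {e for e, _, _ in snippet}
    if not entities:
        return 0.0
    return sum(popularity.get(e, 0.0) for e in entities) / len(entities)


def is_cohesive(snippet):
    """Assumed measure: described entities are linked into one component."""
    subjects = {e for e, _, _ in snippet}
    if not subjects:
        return True
    # A triple whose value is another described entity acts as an edge.
    adj = {e: set() for e in subjects}
    for e, _, v in snippet:
        if v in subjects and v != e:
            adj[e].add(v)
            adj[v].add(e)
    start = next(iter(subjects))
    seen, stack = {start}, [start]
    while stack:
        for n in adj[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == subjects


# Toy data, invented for illustration.
dataset = [
    ("Alice", "type", "Person"),
    ("Alice", "worksAt", "NJU"),
    ("NJU", "type", "University"),
    ("Bob", "type", "Person"),
]
linked = [("Alice", "worksAt", "NJU"), ("NJU", "type", "University")]
unlinked = [("Alice", "type", "Person"), ("Bob", "type", "Person")]
popularity = {"Alice": 0.2, "NJU": 0.8}  # hypothetical scores
```

On this toy data the `linked` snippet covers both properties and is cohesive, while `unlinked` covers only `type` and describes two unrelated entities.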

Page 8: Generating Illustrative Snippets for Open Data on the Web

To this end, we formulate and solve a new combinatorial optimization problem:

• Maximum-weight-and-coverage connected graph problem (MwcCG)
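
To give a feel for this kind of problem, here is a minimal greedy sketch in Python. It is not the paper's algorithm, which the slides do not describe; it is a plain heuristic under assumed inputs. The function name `greedy_connected_subgraph`, the adjacency map `adj`, the node `weight` map, the `covers` sets, and the choice to add weight and newly covered items into one score are all assumptions for illustration.

```python
def greedy_connected_subgraph(adj, weight, covers, k):
    """Grow a connected node set of size <= k, greedily maximizing
    node weight plus the number of newly covered items."""

    def gain(node, covered):
        return weight[node] + len(covers[node] - covered)

    # Start from the node with the best stand-alone gain.
    start = max(sorted(adj), key=lambda n: gain(n, set()))
    chosen, covered = [start], set(covers[start])

    while len(chosen) < k:
        # Only neighbors of chosen nodes are candidates, so the
        # selection stays connected.
        frontier = sorted({m for n in chosen for m in adj[n]} - set(chosen))
        if not frontier:
            break
        best = max(frontier, key=lambda n: gain(n, covered))
        chosen.append(best)
        covered |= covers[best]
    return chosen, covered


# Toy graph, invented for illustration. Node "d" is isolated.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
weight = {"a": 1.0, "b": 0.5, "c": 0.2, "d": 0.9}
covers = {
    "a": {"Person"},
    "b": {"Person", "worksAt"},
    "c": {"University"},
    "d": {"Person"},
}
chosen, covered = greedy_connected_subgraph(adj, weight, covers, k=2)
```

Restricting candidates to the frontier is what enforces connectivity; a greedy choice like this carries no optimality guarantee.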

Page 9: Generating Illustrative Snippets for Open Data on the Web

Quality of snippet: Coverage, Familiarity, Cohesion

Page 10: Generating Illustrative Snippets for Open Data on the Web

Experiment results

Baseline: PageRank-based snippet (Rietveld et al., ISWC’14)

Our snippet

Page 11: Generating Illustrative Snippets for Open Data on the Web

Summary

• Motivation
  • To help people quickly know the contents of a large dataset

• Our contribution
  • We propose to automatically extract an optimal illustrative snippet pursuing coverage, familiarity, and cohesion.
  • We formulate a new combinatorial optimization problem: to maximize coverage & weights, constrained by graph connectivity.
  • We solve the problem using an approximation algorithm.

• Paper
  • Gong Cheng, Cheng Jin, Wentao Ding, Danyun Xu, Yuzhong Qu. Generating Illustrative Snippets for Open Data on the Web. In Proc. WSDM ’17.