Generating Illustrative Snippets for Open Data on the Web

Gong Cheng, Cheng Jin, Wentao Ding, Danyun Xu, Yuzhong Qu

Websoft Research Group, National Key Laboratory for Novel Software Technology

Nanjing University, China

The Web is in the era of open data.

Dataset search engines have emerged.

Metadata about a dataset is served,

and only metadata is served.

We propose to also serve an illustrative snippet,

Dataset: A set of entity-property-value triples

Snippet: A size-limited subset of triples
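
The two definitions above can be made concrete with a minimal Python sketch. The `Triple` type, the toy triples, and the `is_snippet` helper are illustrative assumptions, not names from the paper.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One entity-property-value statement (hypothetical representation)."""
    entity: str
    prop: str
    value: str


# Toy dataset, invented purely for illustration.
dataset = {
    Triple("NanjingUniversity", "type", "University"),
    Triple("NanjingUniversity", "locatedIn", "Nanjing"),
    Triple("Nanjing", "type", "City"),
    Triple("Nanjing", "locatedIn", "China"),
    Triple("China", "type", "Country"),
}


def is_snippet(candidate: set, dataset: set, max_size: int) -> bool:
    """A snippet is a subset of the dataset with at most max_size triples."""
    return candidate <= dataset and len(candidate) <= max_size
```

Any subset within the size limit qualifies as a snippet under this definition; the quality criteria decide which subset is worth serving.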

Snippet generation

and to serve a high-quality snippet.

• Coverage: To cover the most important entity types and properties.

• Familiarity: To contain entities familiar to average users.

• Cohesion: To describe a set of related entities.
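
One possible reading of the three criteria above is sketched below. The slides do not give formulas, so every definition here is an assumption: coverage is taken as the frequency-weighted share of types and properties that the snippet touches, familiarity as the mean of an externally supplied popularity score, and cohesion as connectivity among the described entities. The property name `type` and all function names are hypothetical.

```python
from collections import Counter

TYPE = "type"  # assumed name of the property that links an entity to its type


def schema_counts(triples):
    """Count how often each property and each entity type occurs."""
    counts = Counter()
    for _entity, prop, value in triples:
        counts[("prop", prop)] += 1
        if prop == TYPE:
            counts[("type", value)] += 1
    return counts


def coverage(snippet, dataset):
    """Frequency-weighted share of dataset types/properties seen in the snippet."""
    total = schema_counts(dataset)
    if not total:
        return 0.0
    covered = schema_counts(snippet).keys()
    return sum(total[key] for key in covered) / sum(total.values())


def familiarity(snippet, popularity):
    """Mean popularity of the described entities (scores supplied by the caller)."""
    entities = {entity for entity, _prop, _value in snippet}
    if not entities:
        return 0.0
    return sum(popularity.get(e, 0.0) for e in entities) / len(entities)


def is_cohesive(snippet):
    """True if the described entities form one connected group."""
    entities = {entity for entity, _prop, _value in snippet}
    if not entities:
        return True
    adjacent = {e: set() for e in entities}
    for entity, _prop, value in snippet:
        if value in entities and value != entity:
            adjacent[entity].add(value)
            adjacent[value].add(entity)
    start = next(iter(entities))
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal over entity-to-entity links
        node = stack.pop()
        for neighbour in adjacent[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen == entities
```

The measures in the paper may be defined differently; the sketch only shows that each criterion can be turned into a computable score or constraint.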

To this end, we formulate and solve a new combinatorial optimization problem:

• Maximum-weight-and-coverage connected graph problem (MwcCG)
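
The slides name the problem without stating it formally, so the following is only a sketch under stated assumptions: each triple is viewed as an edge between its entity and its value, each triple carries a weight, covering a new property earns a bonus of 1, and at most `k` triples may be chosen while keeping the chosen edges connected. The greedy strategy below is a simple stand-in heuristic, not the approximation algorithm from the paper; `greedy_connected_snippet` and its scoring are hypothetical.

```python
def greedy_connected_snippet(triples, weight, k):
    """Greedily grow a connected set of at most k triples.

    triples: list of (entity, prop, value) tuples
    weight:  dict mapping a triple to a non-negative score (assumed input)
    """
    if not triples or k <= 0:
        return []

    def gain(triple, covered):
        # Assumed objective: triple weight plus a bonus for a new property.
        bonus = 0.0 if triple[1] in covered else 1.0
        return weight.get(triple, 0.0) + bonus

    # Seed with the single best triple.
    first = max(triples, key=lambda t: gain(t, set()))
    chosen = [first]
    nodes = {first[0], first[2]}
    covered = {first[1]}
    remaining = [t for t in triples if t != first]

    while len(chosen) < k:
        # Only triples touching an already selected node keep the graph connected.
        frontier = [t for t in remaining if t[0] in nodes or t[2] in nodes]
        if not frontier:
            break
        best = max(frontier, key=lambda t: gain(t, covered))
        chosen.append(best)
        nodes.update((best[0], best[2]))
        covered.add(best[1])
        remaining.remove(best)
    return chosen
```

A greedy expansion like this gives no quality guarantee; it only illustrates how the connectivity constraint restricts which triples can be added at each step.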

Quality of snippet: coverage, familiarity, cohesion

Experiment results

Baseline: PageRank-based snippet (Rietveld et al., ISWC’14)

Our snippet

Summary

• Motivation
  • To help people quickly know the contents of a large dataset.

• Our contribution
  • We propose to automatically extract an optimal illustrative snippet pursuing coverage, familiarity, and cohesion.
  • We formulate a new combinatorial optimization problem: to maximize coverage and weights, constrained by graph connectivity.
  • We solve the problem using an approximation algorithm.

• Paper
  • Gong Cheng, Cheng Jin, Wentao Ding, Danyun Xu, Yuzhong Qu. Generating Illustrative Snippets for Open Data on the Web. In Proc. WSDM ’17.