From Big Linked Data to Linked Big Data: DBpedia as a framework for data integration Giuseppe Futia 1 , Antonio Vetrò 1 , Giuseppe Rizzo 2 1- Nexa Center for Internet and Society, DAUIN, Politecnico di Torino 2- Istituto Superiore Mario Boella (ISMB) 7th DBpedia Community Meeting in Leipzig 15 September 2016


Page 1: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

From Big Linked Data to Linked Big Data: DBpedia as a framework for data integration

Giuseppe Futia¹, Antonio Vetrò¹, Giuseppe Rizzo²

1- Nexa Center for Internet and Society, DAUIN, Politecnico di Torino
2- Istituto Superiore Mario Boella (ISMB)

7th DBpedia Community Meeting in Leipzig, 15 September 2016

Antonio Vetrò
What do you want to say in this slide? What do the bubbles represent? It is not clear at the moment.
Giuseppe Futia
I agree, I will replace it with 2 slides.
Giuseppe Futia
This could be a slide in which I insert a diagram of what I would like to build.
Giuseppe Futia
It is worth rereading the Fujitsu papers and trying to extract concepts useful for the purpose.
Giuseppe Futia
DONE
Giuseppe Futia
Perhaps it is worth going back to the DBpedia Spotlight papers, and maybe another one on DBpedia NLP, so that I avoid talking nonsense during the presentation. This would also fit with the parts that currently amount to blah blah.
Giuseppe Futia
Can we start hinting at something related to Deep Learning for text? Perhaps we can mention the Facebook paper on this, or the studies I had started to find. Maybe using TensorFlow in some way? That would be great.
Giuseppe Futia
I need to read the RML paper.
Giuseppe Futia
It is also worth reading the following chapters on Map/Reduce, so that I can outline a first experiment that can then be carried out.
Giuseppe Futia
I did not understand the third point I wrote.
Giuseppe Futia
I need to read a paper on this, partly to understand what can be said about the ontology, and partly to see whether there are interesting use cases that support my ideas.
Page 2: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

PhD candidate on semantics at Nexa Center for Internet & Society, DAUIN, Politecnico di Torino

Page 3: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

Experiences with LOD and DBpedia

• TellMeFirst, a tool for classifying and enriching textual documents built on DBpedia Spotlight (http://tellmefirst.polito.it)

• Contratti Pubblici, a tool for processing, exploring, and visualizing Italian Public Procurements (http://public-contracts.nexacenter.org/)

Page 4: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration


How TellMeFirst works

Page 5: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

TellMeFirst

Results obtained with a description of the Eyes Wide Shut movie

Page 6: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

Anti-corruption National Authority

Contratti Pubblici (Synapta + Nexa)

Different data sources to build a search engine on Italian Public Contracts

Agency for Digital Italy

Page 7: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

Contratti Pubblici (Synapta + Nexa)

Linked Data repository of Public Contracts, linked to DBpedia and SPC

Contratti Pubblici

Page 8: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

DBpedia in our projects

• TellMeFirst:
  – Training set used for the semantic classification task
  – Several interlinks used for the enrichment task

• Contratti Pubblici:
  – Data enrichment to enable advanced SPARQL queries
  – Data quality improvement (i.e., consistent labels)
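As an illustration of what an "advanced" query over enriched data might look like, the sketch below builds a SPARQL query that filters contracts by the population of the contracting authority's city, a fact that would come from DBpedia rather than from the procurement data. This is only a sketch under assumptions: the `ex:` vocabulary and the `build_query` helper are hypothetical and do not reflect the actual Contratti Pubblici schema, and the query presumes that the relevant DBpedia triples are reachable from the same store. `owl:sameAs` and `dbo:populationTotal` are existing terms.

```python
# Hypothetical vocabulary: every ex: term is a placeholder, not the
# project's real schema. owl:sameAs and dbo:populationTotal are real terms.
QUERY_TEMPLATE = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX ex:  <http://example.org/contracts#>

SELECT ?contract ?authority ?population
WHERE {{
  ?contract    ex:contractingAuthority ?authority .
  ?authority   ex:locatedIn            ?city .
  ?city        owl:sameAs              ?dbpediaCity .
  ?dbpediaCity dbo:populationTotal     ?population .
  FILTER (?population < {max_population})
}}
ORDER BY ?population
"""


def build_query(max_population: int) -> str:
    """Return a SPARQL query for contracts awarded in small municipalities."""
    if max_population <= 0:
        raise ValueError("max_population must be positive")
    # Doubled braces in the template become literal SPARQL braces here.
    return QUERY_TEMPLATE.format(max_population=int(max_population))


if __name__ == "__main__":
    print(build_query(5000))
```

The interlink (`owl:sameAs`) is what lets the filter reach a property that the original procurement records never contained.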

Page 9: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

• Big Linked Data
  – Already a reality, as shown by the exponential growth of Linked Data in recent years

• Linked Big Data
  – RDF data model for Big Data Variety
  – Meta information to enable powerful analytics
  – Simplify Big Data access, integration, and interlinking

From Big Linked Data to Linked Big Data
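To make the "RDF data model for Big Data Variety" point concrete, here is a minimal sketch in which two sources with different formats and field names (a CSV file and a JSON document) are both reduced to subject-predicate-object triples and merged with a plain set union. Everything in it is an assumption made for illustration: the `http://example.org/` namespace, the field names, and the sample records are invented, and triples are modeled as Python tuples rather than with an RDF library.

```python
import csv
import io
import json

EX = "http://example.org/"  # hypothetical namespace, for illustration only


def triples_from_csv(text):
    """Turn CSV rows (columns: vat, name, city) into triples."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = EX + "company/" + row["vat"]
        triples.add((subject, EX + "name", row["name"]))
        triples.add((subject, EX + "city", row["city"]))
    return triples


def triples_from_json(text):
    """Turn JSON objects (keys: id, label, contracts) into triples."""
    triples = set()
    for item in json.loads(text):
        subject = EX + "company/" + item["id"]
        triples.add((subject, EX + "name", item["label"]))
        for contract in item.get("contracts", []):
            triples.add((subject, EX + "wonContract", EX + "contract/" + contract))
    return triples


# Invented sample data describing the same company in two formats.
CSV_SOURCE = "vat,name,city\n0001,Acme,Torino\n"
JSON_SOURCE = '[{"id": "0001", "label": "Acme", "contracts": ["C42"]}]'

# Once both sources share one data model, integration is a set union;
# the name triple stated by both sources is stored only once.
graph = triples_from_csv(CSV_SOURCE) | triples_from_json(JSON_SOURCE)

if __name__ == "__main__":
    for triple in sorted(graph):
        print(triple)
```

The design choice worth noting is that the format-specific code is confined to the two small readers; after that step the sources are indistinguishable.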

Page 10: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

Big Data notion of Variety

• Variety of data and representation formats

• Variety of conceptualizations and data models

• Variety related to temporal and spatial dependencies

• Variety as a “generalization of the semantic heterogeneity as studied in the field of Linked Data”

(Pascal Hitzler & Krzysztof Janowicz)

Page 11: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

PhD research questions (i)

• RQ1: How can the technological foundations of Linked Data and Big Data be further improved and combined to create an open software architecture for a multi-thematic, multi-perspective, and multi-medial knowledge graph from heterogeneous sources?

Page 12: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

PhD research questions (ii)

• RQ2: What are the features of a research method to meet and evaluate the security, scalability, performance, openness, and interoperability of the software architecture mentioned earlier? And how can we measure the quality of the knowledge graph produced with this software architecture?

Page 13: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

Key ideas for my PhD

• Get concepts and ontologies from the DBpedia knowledge base to support semantic alignment during the integration stage

• Use frameworks for data integration of structured information with Big Data technologies: RDF Mapping Language (RML) + Hadoop or Spark

• Exploit Machine Learning techniques (e.g., Deep Learning) to enrich datasets with unstructured data
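The second idea, declarative mappings executed on a parallel engine, can be sketched as follows. This is not RML syntax and not Spark code: `MAPPING` is a hypothetical Python dictionary that only mimics the spirit of a mapping rule (a subject template plus predicate-to-field pairs), and the built-in `map` and `functools.reduce` stand in for the map and reduce phases that Hadoop or Spark would distribute across partitions. All URIs and records are invented.

```python
from functools import reduce

# Hypothetical declarative mapping in the spirit of RML (not real RML syntax):
# a subject URI template plus (predicate, source field) pairs.
MAPPING = {
    "subject": "http://example.org/contract/{id}",
    "predicate_object": [
        ("http://example.org/title", "title"),
        ("http://example.org/amount", "amount"),
    ],
}


def map_partition(records, mapping=MAPPING):
    """Map phase: apply the mapping to every record of one partition."""
    triples = set()
    for record in records:
        subject = mapping["subject"].format(**record)
        for predicate, field in mapping["predicate_object"]:
            value = record.get(field)
            if value is not None:  # skip missing values instead of emitting them
                triples.add((subject, predicate, str(value)))
    return triples


def integrate(partitions):
    """Reduce phase: merge per-partition triple sets, dropping duplicates."""
    return reduce(set.union, map(map_partition, partitions), set())


# Invented sample data split into two partitions, with one repeated record.
partitions = [
    [{"id": "C1", "title": "Road works", "amount": 1000}],
    [
        {"id": "C2", "title": "IT services", "amount": None},
        {"id": "C1", "title": "Road works", "amount": 1000},
    ],
]
graph = integrate(partitions)

if __name__ == "__main__":
    for triple in sorted(graph):
        print(triple)
```

Because each partition is mapped independently and set union is associative and commutative, the same logic could in principle be handed to a distributed engine without changing the mapping itself.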

Page 14: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

DBpedia as knowledge base for:

• Entity linking and annotations in documents

• Assertion of additional categories for data

• Improvement of multilingual information

• Estimation of data quality of integrated information according to different features (e.g., provenance)
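The first use listed above, entity linking, can be illustrated with a deliberately naive sketch: a lookup table of surface forms mapped to DBpedia resource URIs, matched against the text. This is not how DBpedia Spotlight or TellMeFirst work (they rely on trained models and disambiguation); the `SURFACE_FORMS` table and the `link_entities` function are hypothetical, and the sketch assumes ASCII input so that lowercasing does not shift character offsets.

```python
import re

# Hypothetical, hand-written lookup table: lowercase surface form -> DBpedia URI.
SURFACE_FORMS = {
    "eyes wide shut": "http://dbpedia.org/resource/Eyes_Wide_Shut",
    "stanley kubrick": "http://dbpedia.org/resource/Stanley_Kubrick",
    "new york city": "http://dbpedia.org/resource/New_York_City",
}


def link_entities(text):
    """Return (offset, matched text, URI) annotations sorted by offset.

    Naive exact matching on whole words, with no disambiguation.
    Assumes ASCII text, so lowercasing preserves character offsets.
    """
    lowered = text.lower()
    annotations = []
    for surface, uri in SURFACE_FORMS.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, lowered):
            mention = text[match.start():match.end()]
            annotations.append((match.start(), mention, uri))
    return sorted(annotations)


if __name__ == "__main__":
    sample = "Eyes Wide Shut is a film directed by Stanley Kubrick."
    for offset, mention, uri in link_entities(sample):
        print(offset, mention, uri)
```

Once mentions are attached to URIs, the other items on the slide (additional categories, multilingual labels) become lookups on the linked resource rather than text-processing problems.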

Page 15: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

Challenges

• Greater accuracy (integrating different datasets)

• Immediacy (near-real time data, from new data sources)

• Flexibility (not constrained by database structure)

• Better analytics (the ability to change the rules)

• Data quality (reliability and effectiveness of data)

Page 16: From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration

Suggestions and/or comments?

[email protected]

GitHub repository: https://github.com/giuseppefutia/