From Big Linked Data to Linked Big Data: DBpedia as a framework for data integration
Giuseppe Futia (1), Antonio Vetrò (1), Giuseppe Rizzo (2)
(1) Nexa Center for Internet and Society, DAUIN, Politecnico di Torino; (2) Istituto Superiore Mario Boella (ISMB)
7th DBpedia Community Meeting in Leipzig, 15 September 2016
PhD candidate on semantics at Nexa Center for Internet & Society, DAUIN, Politecnico di Torino
Experiences with LOD and DBpedia
• TellMeFirst, a tool for classifying and enriching textual documents built on DBpedia Spotlight (http://tellmefirst.polito.it)
• Contratti Pubblici, a tool for processing, exploring, and visualizing Italian Public Procurements (http://public-contracts.nexacenter.org/)
How TellMeFirst works
TellMeFirst: results obtained with a description of the movie Eyes Wide Shut
Contratti Pubblici (Synapta + Nexa)
Different data sources to build a search engine on Italian Public Contracts
[Diagram labels: Anti-corruption National Authority; Agency for Digital Italy]
Linked Data repository of Public Contracts, linked to DBpedia and SPC
DBpedia in our projects
• TellMeFirst:
  – Training set used for the semantic classification task
  – Several interlinks used for the enrichment task
• Contratti Pubblici:
  – Data enrichment to enable advanced SPARQL queries
  – Data quality improvement (e.g., consistent labels)
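The label-consistency point can be illustrated with a minimal Python sketch. It assumes each contract record already carries a link to a DBpedia resource. The records, the `dbpedia_labels` lookup table, and the `harmonize_labels` helper are hypothetical and are not taken from the Contratti Pubblici code.

```python
# Hypothetical records: the same contracting authority spelled three ways,
# each already linked to one DBpedia resource (the links are assumed).
records = [
    {"authority": "Comune di Torino", "dbpedia": "http://dbpedia.org/resource/Turin"},
    {"authority": "COMUNE TORINO", "dbpedia": "http://dbpedia.org/resource/Turin"},
    {"authority": "Citta' di Torino", "dbpedia": "http://dbpedia.org/resource/Turin"},
    {"authority": "Ufficio sconosciuto", "dbpedia": None},
]

# Hypothetical lookup of canonical labels, e.g. fetched once from DBpedia.
dbpedia_labels = {"http://dbpedia.org/resource/Turin": "Turin"}


def harmonize_labels(records, labels):
    """Return copies of the records with a canonical label where a link exists."""
    result = []
    for record in records:
        # Fall back to the original spelling when no canonical label is known.
        canonical = labels.get(record["dbpedia"], record["authority"])
        result.append({**record, "label": canonical})
    return result


harmonized = harmonize_labels(records, dbpedia_labels)
distinct_labels = sorted({r["label"] for r in harmonized})
print(distinct_labels)
```

Grouping by the shared resource rather than by the raw string is what collapses the three spellings into one label; unlinked records keep their original name.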
From Big Linked Data to Linked Big Data
• Big Linked Data
  – Already a reality, as shown by the exponential growth of Linked Data in recent years
• Linked Big Data
  – RDF data model for Big Data Variety
  – Meta information to enable powerful analytics
  – Simplify Big Data access, integration, and interlinking
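As a rough illustration of using a triple-based model to cope with Variety, the following Python sketch converts two differently shaped sources into (subject, predicate, object) triples and queries them together. The sample data, the `http://example.org/` namespace, and all function names are assumptions made for this example; a real pipeline would use an RDF library and a published vocabulary.

```python
import csv
import io
import json

# Two hypothetical sources describing contracts in different formats.
csv_source = "id,supplier,amount\nc1,Acme Srl,1000\n"
json_source = '[{"contract": "c2", "winner": "Acme Srl", "value": 2500}]'

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def triples_from_csv(text):
    """Yield triples for each CSV row."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + "contract/" + row["id"]
        yield (subject, BASE + "supplier", row["supplier"])
        yield (subject, BASE + "amount", float(row["amount"]))


def triples_from_json(text):
    """Yield triples for each JSON object, using the same predicates."""
    for item in json.loads(text):
        subject = BASE + "contract/" + item["contract"]
        yield (subject, BASE + "supplier", item["winner"])
        yield (subject, BASE + "amount", float(item["value"]))


# Once both sources are triples, they merge by simple set union.
graph = set(triples_from_csv(csv_source)) | set(triples_from_json(json_source))


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


acme_contracts = sorted(
    t[0] for t in match(graph, p=BASE + "supplier", o="Acme Srl")
)
print(acme_contracts)
```

The column names differ between the two sources, but after conversion a single pattern retrieves contracts from both.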
Big Data notion of Variety
• Variety of data and representation formats
• Variety of conceptualizations and data models
• Variety related to temporal and spatial dependencies
• Variety as a “generalization of the semantic heterogeneity as studied in the field of Linked Data” (Pascal Hitzler & Krzysztof Janowicz)
PhD research questions (i)
• RQ1: How can the technological foundations of Linked Data and Big Data be further improved and combined to create an open software architecture for a multi-thematic, multi-perspective, and multi-medial knowledge graph from heterogeneous sources?
PhD research questions (ii)
• RQ2: What are the features of a research method to meet and evaluate the security, scalability, performance, openness, and interoperability of the software architecture mentioned earlier? And how can we measure the quality of the knowledge graph produced with this software architecture?
Key ideas for my PhD
• Get concepts and ontologies from the DBpedia knowledge base to support semantic alignment during the integration stage
• Use frameworks for integrating structured information with Big Data technologies: RDF Mapping Language (RML) + Hadoop or Spark
• Exploit Machine Learning techniques (e.g., Deep Learning) to augment datasets with unstructured data
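The mapping-based integration idea can be sketched in plain Python. Real RML mappings are written in RDF (typically Turtle) and executed by an RML processor; the dictionary layout, the `apply_mapping` helper, and the sample rows below are illustrative assumptions that only mimic the notion of a subject template plus predicate-object pairs.

```python
# A mapping rule loosely modelled on RML's subject map and predicate-object maps.
# This dictionary layout is an illustrative assumption, not RML syntax.
mapping = {
    "subject_template": "http://example.org/contract/{id}",
    "predicate_object_maps": [
        ("http://example.org/supplier", "supplier"),
        ("http://example.org/amount", "amount"),
    ],
}


def apply_mapping(row, mapping):
    """Turn one record into N-Triples lines with plain string literals."""
    subject = mapping["subject_template"].format(**row)
    lines = []
    for predicate, column in mapping["predicate_object_maps"]:
        # Escape backslashes and double quotes inside the literal.
        value = str(row[column]).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{value}" .')
    return lines


# Hypothetical input rows, e.g. parsed from a CSV file.
rows = [
    {"id": "c1", "supplier": "Acme Srl", "amount": 1000},
    {"id": "c2", "supplier": 'Beta "B" Spa', "amount": 2500},
]

# Each row is mapped independently, so the same function could in principle
# be handed to a distributed map step; that part is not shown here.
ntriples = [line for row in rows for line in apply_mapping(row, mapping)]
print("\n".join(ntriples))
```

Keeping the rule as data rather than code is the point of a mapping language: the same rule can be reused across sources that share a schema.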
DBpedia as knowledge base for:
• Entity linking and annotations in documents
• Assertion of additional categories for data
• Improvement of multilingual information
• Estimation of the quality of integrated information according to different features (e.g., provenance)
Challenges
• Greater accuracy (integrating different datasets)
• Immediacy (near-real-time data, from new data sources)
• Flexibility (not constrained by database structure)
• Better analytics (the ability to change the rules)
• Data quality (reliability and effectiveness of data)