Published on 24-May-2018

Transcript

Linked Data & DBpedia
M. Freudenberg, K. Müller & M. Ackermann
AKSW/KILT - Leipzig
DBpedia Association

AKSW/KILT
- KILT: Knowledge Integration and Language Technology
- part of Agile Knowledge Engineering and Semantic Web (AKSW)

Tim Berners-Lee
- British computer scientist, director of the W3C
- inventor of the World Wide Web (1989 @ CERN)
- born out of his frustration with disconnected islands of information (about scientists, projects and results)
- published the first website: http://info.cern.ch/hypertext/WWW/TheProject.html

From the WWW to the Web of Data
- applying the principles of the WWW to data
- data is relationships, not only properties

TimBL's next leap: from WWW to WOD
Use Linked Data to build a Web of Data:
- applying the principles of the WWW to data
- data is relationships, not only properties
- the more data you have to connect together, the more you can find out
- use Linked Data to bridge disciplines and domains (by linking their data) and to unlock the potential of island repositories: don't hoard your data; if possible, share it
Watch TimBL's TED talk about Linked Data: https://www.ted.com/talks/tim_berners_lee_on_the_next_web?language=en

Linked Data Principles
1. Use HTTP URIs as identifiers for resources, so people can look up the names.
2. Provide data at the location of the URIs, to provide data for interested parties.
3. Include links to other resources, so people can discover more things - bridging disciplines and domains; the more linked resources, the more one can find out.

RDF - Resource Description Framework
- a statement of the form subject > predicate > object is a so-called triple
- example: <http://dbpedia.org/resource/Siemens> (subject) - label (predicate) - "Siemens" (object)

Knowledge Graphs
- combining multiple triples is known as a graph
- linking resource to resource, inside or outside the current graph/dataset
- a knowledge base of this style is considered a Knowledge Graph (KG)

The Data (in RDF/XML)
(only the literal "Siemens" survived in this transcript; the same statements appear in Turtle below)

The Data (in Turtle)
<http://dbpedia.org/resource/Siemens>
    dbo:type      <http://dbpedia.org/resource/Aktiengesellschaft> ;
    rdfs:label    "Siemens"@de ;
    dbo:location  <http://dbpedia.org/resource/Munich> ;
    dbo:country   <…> .                     # object not preserved in the transcript

Linked Data vs Open Data
4. Final principle: open your data using open licenses.
- not all Linked Data is open
- licensed data can still profit from using the standards: it can be enriched with links to Linked Data and accessed by standard tools

5-star Linked Open Data (figure)

Why publish Linked Data?
- ease of discovery through linking
- easy to consume by humans and machines
- reduce data redundancy
- support collaboration & interoperability
- add value and visibility

Benefits of Linked Data: Consumer View
- discover more related data by following links
- reuse the data of other datasets
- combine data safely from different sources
- formulate sophisticated queries (example in appendix; see also the sketch below)
- query data over multiple repositories
- semantic enrichment of text resources
- semantic features for machine learning models (e.g. deep learning, word embeddings, etc.)
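As a first taste of such queries, here is a minimal SPARQL sketch (the language itself is introduced in the second half of the slides) that could be run against the public endpoint http://dbpedia.org/sparql. It reuses dbo:location from the Siemens Turtle example above and assumes the current DBpedia data still uses that property this way:

PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Resources located in Munich, together with their German labels
SELECT ?company ?label
WHERE {
  ?company dbo:location dbr:Munich ;
           rdfs:label   ?label .
  FILTER (lang(?label) = "de")
}
LIMIT 10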
Benefits of Linked Data: Publisher View
- link data to any other resource on the web, thereby increasing the value of your data
- make your data discoverable (via links)
- exhaustive descriptions of large and changing domains (Gene Ontology, Human Disease Ontology)
- structured representation of large, versatile datasets (knowledge graphs, thesauri, taxonomies)
- deal with unstructured data (text) as no DB schema could
- data and schemata use the same format (RDF)
- store metadata alongside the actual data (e.g. DCAT)

Linked Open Data
LOD-Cloud 2014: Linked Data datasets under an open access license
- 1014 datasets
- any subject
- over 50B triples
- over 100M links

DBpedia
- the first public Knowledge Graph
- has become the focal point of the so-called Linked Open Data Cloud
- is the most universal dataset (since it is based on Wikipedia)
- links actively to many relevant Linked Open Datasets
- is a link destination for many other datasets
(more on DBpedia later)

Other Linked Data sets: Freebase
- managed and hosted by Google until 2015; now (in part) subsumed by Wikidata
- extracted structured data from Wikipedia and other sources, available in RDF
- differences to DBpedia:
  - Freebase used several sources (but DBpedia+ does as well)
  - Freebase could be directly edited by users
  - ontology and mappings were not coordinated by a community
  - never established a community which enriched or validated the data; it was mostly generated by crawlers

Wikidata
- initiated by Wikimedia Germany e.V. in 2012
- a free knowledge base about the world that can be read and edited by humans and machines alike
- can offer a variety of statements from different sources
- DBpedia is extracting information from Wikidata to fuse it with knowledge from Wikipedia
- the goal is to provide a single point of truth for facts in Wikipedia across different language versions

Other datasets
- Geonames: geographical database, covers all countries, contains over eleven million placenames, e.g. http://www.geonames.org/3399415/fortaleza.html
- Linked Open Vocabularies (LOV): keeps track of available open ontologies and provides them as a graph; search for available ontologies, open for reuse, e.g. http://lov.okfn.org/dataset/lov/vocabs/foaf
- Lexvo.org: information about languages, words, characters, and other human language-related entities, e.g. http://www.lexvo.org/page/iso639-3/deu

Excursus: Ontologies
This is a concise introduction to ontologies and their role as schemata in Linked Data. (No worries, we keep this short ;)

Levels of Knowledge (figure)

Different Perceptions (figure)

Conceptualization (figure)

Ontologies in Computer Science
- an ontology has a common language (symbols, expressions): Syntax
- the meaning of symbols and expressions is clear: Semantics
- symbols and expressions with similar semantics are grouped into concepts (classes): Conceptualization
- concepts are organized in a hierarchical way: Taxonomy
- concepts might be related to others: Relations
- implicit knowledge can be made explicit: Reasoning

Ontology, a definition
"An ontology is an explicit, formal specification of a shared conceptualization." (Thomas R. Gruber, 1993)
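A minimal sketch of what such a shared conceptualization looks like when written down in RDF; the ex: class and property names are invented for illustration, and OWL, the language used here, is introduced a few slides below:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/ontology/> .

# Conceptualization: similar things are grouped into classes ...
ex:Company     a owl:Class ;
               rdfs:label "Company"@en .
ex:City        a owl:Class .

# ... organized hierarchically (taxonomy) ...
ex:Corporation a owl:Class ;
               rdfs:subClassOf ex:Company .

# ... and related to each other (relations)
ex:locatedIn   a owl:ObjectProperty ;
               rdfs:domain ex:Company ;
               rdfs:range  ex:City .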
Example (figure)

Axioms
Axioms are knowledge definitions in the ontology that are explicitly stated and assumed rather than proven. Implicit knowledge can be made explicit by logical inference: reasoning over an ontology.
Source for the ontology-related slides: http://www.slideshare.net/SergeLinckels/semantic-web-ontologies

Ontology Language
To express ontologies in a formal, machine-readable way, and to reason over the outlined knowledge, we need a specialized language.
- most common: Web Ontology Language (OWL)
- represents rich and complex knowledge about things
- based on a subset of First Order Logic (FOL)
- can be used to verify the consistency of a knowledge base
- can make implicit knowledge explicit
- like the data it conceptualizes, it is serializable as RDF

How to utilize Linked Data standards
Any OWL ontology/taxonomy can be used in a non-LD context.
- through its ability to link resources, RDF-based ontologies can easily amalgamate, making them reusable
- extending ontologies to fit narrower use cases
- reducing ontologies of a certain area to fit a broader scope
- separating the semantic structure (classes, properties) from use-case specific restrictions (e.g. cardinalities) -> SHACL (see the sketch below)
- example: DataID
The W3C, responsible for common standards on the Web, is focusing on RDF-based standards in many fields.
http://w3c.github.io/data-shapes/shacl/
http://wiki.dbpedia.org/projects/dbpedia-dataid
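A minimal SHACL sketch of such a separation, reusing the hypothetical ex: names from the earlier ontology sketch: the ontology stays untouched, while the shape adds use-case specific cardinality restrictions:

@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ontology/> .

ex:CompanyShape
    a sh:NodeShape ;
    sh:targetClass ex:Company ;        # validate all instances of ex:Company
    sh:property [
        sh:path rdfs:label ;
        sh:minCount 1 ;                # use-case restriction: at least one label
    ] ;
    sh:property [
        sh:path ex:locatedIn ;
        sh:maxCount 1 ;                # cardinality kept out of the ontology itself
    ] .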
Incremental adoption of LD technologies
Linked Data standards and technologies are manifold and, at times, confusing. Fortunately, introducing Linked Data into existing IT environments can be accomplished in an incremental fashion:
- collect data without given schemata/ontologies - very helpful when dealing with semi- or unstructured data
- use RDF views on top of an existing DBMS, with an easy-to-change R2RML mapping
- develop a domain ontology over time (Open World Assumption) - especially useful in fast-changing domains
- enrich data with every iteration of your data management cycle (see the ALIGNED methods for more)
- start using LD-based tooling, e.g. RelFinder: http://www.visualdataweb.org/relfinder/demo.swf?obj1=Sm9obiBDbGVlc2V8aHR0cDovL2RicGVkaWEub3JnL3Jlc291cmNlL0pvaG5fQ2xlZXNl&obj2=VGVycnkgR2lsbGlhbXxodHRwOi8vZGJwZWRpYS5vcmcvcmVzb3VyY2UvVGVycnlfR2lsbGlhbQ==&obj3=TW9udHkgUHl0aG9ufGh0dHA6Ly9kYnBlZGlhLm9yZy9yZXNvdXJjZS9Nb250eV9QeXRob24=&name=REJwZWRpYSAobWlycm9yKSAoZnJvbSBVUkwgcGFyYW1ldGVycyk=&abbreviation=ZGJwMTQ2MjczNzExMTgxMQ==&description=TGlua2VkIERhdGEgdmVyc2lvbiBvZiBXaWtpcGVkaWEu&endpointURI=aHR0cDovL2RicGVkaWEub3JnL3NwYXJxbA==&dontAppendSPARQL=ZmFsc2U=&defaultGraphURI=aHR0cDovL2RicGVkaWEub3Jn&isVirtuoso=dHJ1ZQ==&useProxy=dHJ1ZQ==&method=UE9TVA==&autocompleteLanguage=ZW4=&autocompleteURIs=aHR0cDovL3d3dy53My5vcmcvMjAwMC8wMS9yZGYtc2NoZW1hI2xhYmVs&ignoredProperties=aHR0cDovL2RicGVkaWEub3JnL29udG9sb2d5L3dpa2lQYWdlV2lraUxpbmssaHR0cDovL2RicGVkaWEub3JnL3Byb3BlcnR5L3dpa2lQYWdlVXNlc1RlbXBsYXRlLGh0dHA6Ly9kYnBlZGlhLm9yZy9wcm9wZXJ0eS93aWtpbGluayxodHRwOi8vZGJwZWRpYS5vcmcvcHJvcGVydHkvd29yZG5ldF90eXBlLGh0dHA6Ly9wdXJsLm9yZy9kYy90ZXJtcy9zdWJqZWN0LGh0dHA6Ly93d3cudzMub3JnLzE5OTkvMDIvMjItcmRmLXN5bnRheC1ucyN0eXBlLGh0dHA6Ly93d3cudzMub3JnLzIwMDIvMDcvb3dsI3NhbWVBcyxodHRwOi8vd3d3LnczLm9yZy8yMDA0LzAyL3Nrb3MvY29yZSNzdWJqZWN0&abstractURIs=aHR0cDovL2RicGVkaWEub3JnL29udG9sb2d5L2Fic3RyYWN0&imageURIs=aHR0cDovL2RicGVkaWEub3JnL29udG9sb2d5L3RodW1ibmFpbCxodHRwOi8veG1sbnMuY29tL2ZvYWYvMC4xL2RlcGljdGlvbg==&linkURIs=aHR0cDovL3B1cmwub3JnL29udG9sb2d5L21vL3dpa2lwZWRpYSxodHRwOi8veG1sbnMuY29tL2ZvYWYvMC4xL2hvbWVwYWdlLGh0dHA6Ly94bWxucy5jb20vZm9hZi8wLjEvcGFnZQ==&maxRelationLegth=Mg==

Linked Data in the context of Big Data
In 2006, Clive Humby coined the phrase "the new oil" for (digital) data, heralding the ever-expanding realm of what is now summarised as Big Data. What role does Linked Data play in the context of buzzwords like:
- Big Data
- Smart Data (e.g. the Smart Data Forum, http://www.digitale-technologien.de/DT/Navigation/EN/Foerderprogramme/Smart_Data/smart_data.html)

The four Vs of Big Data (figure)

The four Vs heatmap for Linked Data
A Gartner study in 2013 found that many organizations consider the variety dimension a much bigger challenge than volume or velocity.
Linked Data to the rescue:
- combine multiple sources with different structures
- while retaining the flexibility to add new ones
- without adapting schemata
- query combined data, or multiple sources at once (see the sketch below)
- detect patterns in the data
https://www.gartner.com/doc/2589121
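As an illustration of querying multiple sources at once, a hedged sketch of a federated SPARQL query (SPARQL 1.1 SERVICE). It assumes that the DBpedia endpoint permits federation and that Wikidata property P1128 ("employees") is the one we want; both are assumptions to verify before relying on the query:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Companies located in Munich (from DBpedia), with employee counts fetched live from Wikidata
SELECT ?company ?wdEntity ?employees
WHERE {
  ?company dbo:location dbr:Munich ;
           owl:sameAs   ?wdEntity .
  FILTER (STRSTARTS(STR(?wdEntity), "http://www.wikidata.org/entity/"))
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdEntity wdt:P1128 ?employees .      # P1128 = "employees" (assumed Wikidata property)
  }
}
LIMIT 10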
Linked Data in the context of Big Data (Big Data -> Smart Data -> Linked Data)
- Linked Data can describe any kind of data, no matter the amount or domain
- especially useful for:
  - graph-structured data (social media, knowledge graphs)
  - (multi-)lingual data
  - easy incorporation of unstructured data - perfect for annotation purposes
  - data for complex domains, e.g. taxonomies in the life sciences
  - ontologies, metadata and provenance info; ontologies are modeled with OWL, an RDF extension

Linked Data in Research
- Computer science: especially graph- and NLP-related work, QA, AI - e.g. IBM: Natural Language Understanding of Unstructured Data (https://www.ibm.com/developerworks/community/blogs/nlp/entry/natural_language_understanding_of_unstructured_data1?lang=en)
- Life sciences: to describe complex domains (large ontologies & taxonomies) - e.g. Human Disease Ontology (http://disease-ontology.org)
- (Digital) Humanities: to manage (record, annotate) large text records - e.g. Homer Multitext Project (http://www.homermultitext.org/about.html)
- Libraries: recording of metadata and interlinking it with other institutions - e.g. Deutsche Nationalbibliothek (http://www.dnb.de/EN/Service/DigitaleDienste/LinkedData/linkeddata_node.html)

Linked Data in Industry
- Online search (large knowledge graphs) - e.g. Google
- Social media (social network analysis) - e.g. Facebook
- Publishing industry (large text corpora annotation) - e.g. Wolters Kluwer (http://linkeddatadeveloper.com/Projects/Linking-Enterprise-Data/Manuscript/led-hondros.html), NYT (http://open.blogs.nytimes.com/2010/01/13/more-tags-released-to-the-linked-data-cloud/)
- Broadcasting (ontology-centered Linked Data services) - e.g. BBC (http://www.bbc.co.uk/blogs/internet/entries/78d4a720-8796-30bd-830d-648de6fc9508)
- (Open) Government Data - e.g. the US publishing data as RDF: http://data.gov

Who uses RDF (in public)
http://radar.oreilly.com/2010/05/facebook-open-graph-and-the-se.html
https://www.linkedin.com/company/linkedin-economic-graph

DBpedia
- a fused, multi-domain, multilingual dataset
- a crowd-sourced community effort to extract structured information from Wikipedia and Wikidata
- enriches the extracted information with a semantic layer
- provides a query service and many additional tools

A web of knowledge (figure)

DBpedia History
- the DBpedia project was started in 2006 as a collaboration of Freie U. Berlin, U. Leipzig and U. Mannheim
- it has been a key factor in the rapid growth of the LOD initiative and the overall success of Linked Data
- http://wiki.dbpedia.org
- example: http://dbpedia.org/page/Leipzig_University
- http://en.wikipedia.org/wiki/Monty_Python -> http://dbpedia.org/resource/Monty_Python
Some statistics
The latest release (2015-10):
- was extracted from 127 language editions
- describes up to 20 million things with 8.8 billion RDF statements (triples)
- mirrors every Wikipedia page: 6.2M things, of which 4.6M have abstracts, 955K have geo-coordinates and 1.54M have depictions
- in general, we observed a significant growth of close to 10% in raw infobox and mapping-based statements

Structure of DBpedia
- structured into language datasets with multiple subsets
- each sub-dataset specializes in a certain type of data (see below)
- mapping-based types and facts are governed by the DBpedia Ontology

DBpedia Ontology
- a cross-domain ontology, maintained and extended by the community in the DBpedia Mappings Wiki
- manually created, based on the most commonly used infoboxes
- currently covers 685 classes, which form a subsumption hierarchy and are described by 2,795 different properties
- subsumption hierarchy with a maximal depth of 5
DBpedia Ontology Extract (figure)

DBpedia Mappings Wiki
A community effort to:
- develop an ontology schema
- provide mappings from Wikipedia infobox properties to this ontology
- create an alignment between Wikipedia and DBpedia
- eliminate name variations in properties and classes - a big boost for precision
http://mappings.dbpedia.org/

Extracting a DBpedia
- Wikipedia articles consist mostly of free text
- they also comprise various types of structured information, depending on the template used for a specific article (e.g. Actor, Village etc.), including: infobox templates, categorisation information, images, geo-coordinates, links to external web pages, disambiguation pages, redirects between pages, other language links

Wikipedia Article Structure
- Title
- Abstract
- Infoboxes
- Geo-coordinates
- Categories
- Images
- Links: other language versions, other Wikipedia pages, to the Web, redirects, disambiguations

Infobox Encoding (figure)
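To make this concrete, a hedged sketch of the kind of triples the extraction yields for a single article. The exact properties vary per extractor and release, and the coordinate values, abstract text and image URL below are illustrative rather than taken from an actual DBpedia dump:

@prefix dbo:     <http://dbpedia.org/ontology/> .
@prefix dbr:     <http://dbpedia.org/resource/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

dbr:Leipzig
    rdfs:label      "Leipzig"@en ;                                   # article title
    dbo:abstract    "Leipzig is a city in Saxony, Germany ..."@en ;  # abstract (shortened)
    geo:lat         "51.34"^^xsd:float ;                             # geo-coordinates
    geo:long        "12.37"^^xsd:float ;
    dcterms:subject <http://dbpedia.org/resource/Category:Cities_in_Saxony> ;  # category link
    foaf:depiction  <http://example.org/Leipzig.jpg> .               # image link (placeholder URL)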
DIEF - DBpedia Information Extraction Framework
- extracts structured information from Wikipedia and turns it into a rich knowledge base
- Mapping-Based Infobox Extraction, Raw Infobox Extraction, Feature Extraction, Statistical Extraction
- updated to adapt to changes in Wikipedia
- expanded for new knowledge extraction methods, e.g. by multiple GSoC projects (table extraction, NIF, ...)
- open source, code in Scala & Java

DBpedia Live
- Wikipedia articles are continuously revised at a very high rate: the English Wikipedia, in June 2013, had approximately 3.3 million edits per month (≈ 77 edits per minute)
- DBpedia Live was developed to keep DBpedia in synchronization with Wikipedia
- works on a continuous stream of updates from Wikipedia and processes that stream on the fly

Accessing and Querying DBpedia
- per-resource view: Linked Data interfaces, e.g. http://dbpedia.org/page/Immanuel_Kant
- navigation view: LodLive browser, e.g. http://en.lodlive.it/?http://dbpedia.org/resource/Immanuel_Kant
- querying for resources: SPARQL (introduced later), DBpedia Lookup Service

DBpedia Lookup Service
- REST service to query for DBpedia resources
- index of DBpedia resources, including alternative names/labels (page redirects, disambiguation links, ...)
- search by complete keywords and prefix search
- results ranked by relevance (PageRank)
- filtering by DBpedia ontology classes
http://lookup.dbpedia.org/api/search/KeywordSearch?QueryClass=place&QueryString=berlin
http://lookup.dbpedia.org/api/search/KeywordSearch?QueryClass=person&QueryString=berlin

DBpedia internationalised
- non-English versions of DBpedia offer coverage of more entities
- more detailed or up-to-date information for entities associated with the particular countries
- an international mapping community helps in the provision of localized DBpedia datasets for 125 languages
- 15 DBpedia chapters (by language): autonomous management of mappings, organisation of the local community, hosting of datasets and services
- canonicalized datasets: facts derived from localized Wikipedias, but only statements for resources also present in the English DBpedia

DBpedia Association
- founded in 2014, based in Leipzig
- goal: support the DBpedia community and provide free data and services to the general public (data releases, software maintenance, dissemination, data accessibility, internal and external communication)
- persons and organisations can become members:
  - gaining support for all DBpedia-specific problems (queries, tools etc.)
  - deciding on the future of DBpedia
  - acquiring help for creating and linking their own datasets

A need for information
Which films starred John Cleese without any other members of Monty Python?
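One possible way to phrase this question in SPARQL against the public DBpedia endpoint; the query language itself is introduced on the following slides. dbo:starring and the resource names for the other Python members are assumptions about the current DBpedia data, so treat this as a sketch rather than a guaranteed-complete answer:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

# Films starring John Cleese in which no other Monty Python member appears
SELECT DISTINCT ?film
WHERE {
  ?film dbo:starring dbr:John_Cleese .
  FILTER NOT EXISTS {
    ?film dbo:starring ?other .
    FILTER (?other IN (dbr:Graham_Chapman, dbr:Terry_Gilliam, dbr:Eric_Idle,
                       dbr:Terry_Jones, dbr:Michael_Palin))
  }
}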
SPARQL - Protocol and RDF Query Language
- RDF data query language
- also query and data transfer protocol specifications (HTTP-based)
- graph-data oriented, designed independently from ontologies & related reasoning, but some SPARQL implementations can provide reasoning (e.g. RDFS+)
- declarative approach carrying several similarities to SQL
Tutorial: https://jena.apache.org/tutorials/sparql.html

Basic Graph Patterns (figure)

Graph Group Pattern (figure)

Filtering Unwanted Results (figure)

SPARQL: Combination of Complex Graph Patterns (figure)

SPARQL - additional constructs
- alternative result types: ASK (true, if a valid binding can be found), CONSTRUCT (create a new graph from result bindings)
- combinators and modifiers for queries/graph patterns: UNION, MINUS, subqueries, LIMIT, OFFSET, DISTINCT, ORDER BY
- property paths as a regular language (*, +, ^, {n,m}), e.g. rel:hasParent / rel:hasChild{2} / rel:hasFriend+
- sizable library of functions and operators for resources and literal values
Find it all at: http://www.w3.org/TR/sparql11-query/

End of session 1
Grab a coffee. Next session: helpful LD technologies for NLP, link discovery and data fusion, plus a common use case for integrating LD technologies.

NIF - Natural Language Processing Interchange Format
- RDF/OWL-based
- utilizes various existing standards: RDF, OWL2, PROV Ontology, ITSRDF, ...
- promotes stable URIs to identify primary text, its structure, annotations and their metadata

Interoperability for Language Data and Tools
- Structural interoperability: a common data format and structure for annotations - RDF & the NIF vocabulary
- Conceptual interoperability: identical vocabularies/taxonomies for annotations (or linkage to a common reference vocabulary) - Ontology of Linguistic Annotations (OLiA), GOLD, ...
- Access interoperability: a common, widespread, easily adoptable access method - REST

NIF: String Relations and Text Structure (figure)

Integration of NIF service results (figures)

Linking OWL Ontologies for Conceptual Interoperability
- Ontology of Linguistic Annotations (OLiA)
- linking specific annotation tag sets to linguistic reference models/ontologies
- machine-actionable, granular representations of the semantics of tags (beyond string values)

NIF: Further Widespread Requirements Covered
- provenance and confidence for annotations
- multiple alternative annotations

exdoc:2_offset_23_29
    nif:anchorOf "Berlin" ;
    itsrdf:taIdentRef <…> ;                  # entity URI not preserved in the transcript
    nif-ann:taIdentConf "0.9"^^xsd:decimal ;
    nif-ann:taIdentProv exdoc:eEntityProdServiceInvocation ;
    nif:annotationUnit [
        itsrdf:taIdentRef <…> ;              # alternative entity URI not preserved
        nif-ann:taIdentConf "0.32"^^xsd:decimal ;
        nif-ann:taIdentProv exdoc:eEntityExpServiceInvocation
    ] .

Available NIF Resources
- corpora
- services: tokenisation, annotation, validation, combining tools' outputs
- documentation, specs

NIF Corpora
- Brown Corpus
- AQUAINT News Corpus
- NER corpora: RSS-500, Reuters-128, KORE 50, Microposts NEEL, ACE
- Multilingual DBpedia abstract corpora: English, French, German, Dutch, ...

NIF Services
- OpenNLP: POS tags
- Stanford NLP: POS tags, lemmatization
- Snowball: stemming
- DBpedia Spotlight
- Validation (via RDFUnit)
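A minimal sketch of the structural NIF vocabulary that these corpora and services exchange, assuming the NIF 2.0 core terms; the sentence, offsets and entity link are invented for illustration:

@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix exdoc:  <http://example.org/doc/> .

# The primary text is a nif:Context; annotated substrings point back to it via offsets.
exdoc:offset_0_33
    a nif:Context, nif:OffsetBasedString ;
    nif:isString   "Berlin is the capital of Germany." ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex   "33"^^xsd:nonNegativeInteger .

exdoc:offset_0_6
    a nif:OffsetBasedString, nif:Word ;
    nif:referenceContext exdoc:offset_0_33 ;
    nif:anchorOf   "Berlin" ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex   "6"^^xsd:nonNegativeInteger ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Berlin> .   # entity link, as in the example above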
Mapping Languages
- help to create class mappings between a source dataset and a target RDF ontology
- e.g. table headings to RDF predicates (e.g. rdfs:label)

Mapping Languages cont.
- R2RML: only supports mappings between relational databases and RDF (https://www.w3.org/TR/r2rml/)
- RML: extension of R2RML, supports other input data formats such as CSV, JSON, XML (http://rml.io/)
- SML: extension of R2RML, supports other input formats such as CSV (http://sml.aksw.org/)

ETL Frameworks
- Extract Transform Load (ETL), common in data warehousing
- Extract phase: extract data from data sources (e.g. CSV, JSON, databases, etc.)
- Transform phase: transform data for storing in the target format - boilerplating/normalization, content enrichment (e.g. loading of geo-information)
- Load phase: load data into the target data store (e.g. Virtuoso)

ETL Frameworks - LDIF
- Linked Data Integration Framework (LDIF), http://ldif.wbsg.de/
- Hadoop-based ETL pipeline
- supports provenance metadata
- components: scheduler, data import (crawl, SPARQL, dump), ETL, custom mapping language

ETL Frameworks - UnifiedViews
- joint project between Semantic Web Company and Semantica.cz, https://www.semantic-web.at/unifiedviews
- supported by the LOD2 FP7 project
- components: frontend UI, backend database, scheduler
- possible to add custom plugins

Link Discovery Frameworks
- finding links between related data items in different datasets
- use cases: owl:sameAs links, class mappings (e.g. like R2RML), data transformation
- matching algorithms: string similarity, geo-location matching, regular expressions, etc.
- link discovery strategies:
  - rule-based: using predefined rules to find matching data items
  - statistical: using machine learning techniques to find matching data items
Survey: http://www.semantic-web-journal.net/system/files/swj1029.pdf

Link Discovery Frameworks - LIMES
- LInk discovery framework for MEtric Spaces, http://aksw.org/Projects/LIMES.html
- fast, large-scale link discovery using a specification language

Link Discovery Frameworks - Silk
- UI-driven linking framework, http://silkframework.org/
- uses its own specification language
- supports data transformation

Data Fusion
- "Fusing of multiple records representing the same real-world object into a single, consistent, and clean representation" (Bleiholder & Naumann 2008)
- possible use cases: the same value for the same property in all datasets (e.g. name), different values for the same property across datasets (e.g. age), new information
- problems: no unique IDs; real-world data is dirty, big and complex; no training data for many linkage applications; trustworthiness of external data

Data Fusion - Strategies
- Rule-based: use the observed value from the most up-to-date source, or take the average/maximum/minimum for numerical values; the idea is to improve efficiency
- Statistical: unsupervised/supervised strategies
  - Vote: take the value supported by the largest number of sources
  - Quality-based: evaluate trustworthiness (web-link-based, IR-based, Bayesian, graphical models)
- Relation-based: extends quality-based methods and considers relationships between sources (e.g. one source copying data from another, etc.)
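A toy illustration (made-up sources and values) of the kind of conflict these strategies resolve, and one way a fused statement can be recorded together with its provenance, here using RDF reification and PROV-O; other encodings such as named graphs work as well:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbo:  <http://dbpedia.org/ontology/> .
@prefix dbr:  <http://dbpedia.org/resource/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/fusion/> .

# Two sources disagree on the same property (values are invented):
#   source A (older dump):   dbr:Leipzig dbo:populationTotal "544479" .
#   source B (recent crawl): dbr:Leipzig dbo:populationTotal "571088" .

# Fused output following the "most up-to-date source" rule, with per-statement provenance:
dbr:Leipzig dbo:populationTotal "571088"^^xsd:nonNegativeInteger .

ex:fusedStatement1 a rdf:Statement ;
    rdf:subject     dbr:Leipzig ;
    rdf:predicate   dbo:populationTotal ;
    rdf:object      "571088"^^xsd:nonNegativeInteger ;
    prov:wasDerivedFrom ex:sourceB .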
Data Fusion - LD-FusionTool
- developed in conjunction with the UnifiedViews project, http://mifeet.github.io/LD-FusionTool/
- features: resolution of schema and identity conflicts, resolution of data conflicts, quality assessment, provenance tracking
- no machine-learning-based fusion

Data Fusion - Sieve
- developed in conjunction with the LDIF project, http://sieve.wbsg.de/
- features: resolution of data conflicts, quality assessment, provenance tracking, support for plugins

Data Fusion - Sieve Examples
http://sieve.wbsg.de/#examples

ALIGNED - Software & Data Engineering
- a quality-centric software and data engineering research project funded by Horizon 2020 (EC)
- will develop new ways to build and maintain IT systems that use big data on the web

ALIGNED - One Page (figure)

ALIGNED - Goals
- a new methodology for parallel software and data engineering of web-scale information systems
- Linked Data as the unifying foundation for system specification, process and tool integration
- support the evolution of software dependent on heterogeneous, complex data of varying quality with an independent lifecycle

ALIGNing Problem: the example of DBpedia
- lots of code & a lot more data
- Wikipedia evolves over time: infobox templates change, merge or get deleted; new formatting templates appear; structural differences exist per language edition
- the DBpedia Ontology and Mappings change as well
- code should adapt to all these changes, which is hard at this (data) scale, so data quality will suffer

Unit-testing to the rescue?
- Software & data testing: straightforward for software (since the 70s), still preliminary for (RDF) data - RDFUnit, SPIN, W3C Data Shapes WG
- Data testing: generation (manual, (semi-)automatic, ...), linking data & software tests
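To make the idea of generated data tests concrete, here is a sketch of the kind of SPARQL pattern such a test case can boil down to; the concrete constraint (every dbo:Company needs an rdfs:label) is a made-up example, not an actual RDFUnit-generated test:

PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Violation pattern: every dbo:Company is expected to carry at least one rdfs:label.
# Every binding returned by the query is one violating resource.
SELECT ?resource
WHERE {
  ?resource a dbo:Company .
  FILTER NOT EXISTS { ?resource rdfs:label ?label }
}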
RDF Unit
http://rdfunit.aksw.org

RDF Unit
- input: ontologies, updated Data Quality Patterns (DQP), and the datasets to test against
- produces data unit test cases automatically by applying DQPs to the axioms of an ontology
- user-defined test cases are added as well
- runs all data unit test cases against a given dataset
- generates test case result data for every triple violating one of these DQPs
- (evaluate test case results to change triples, software or DQPs/test cases, then run RDF Unit again)

DBpedia+ Workflow (figure)

FREME: Multilingual Content Enrichment & Curation
- a two-year H2020 innovation action: bridging language and data
- driven by four business use cases; business partners:
  - vistatec - translation, localisation, content creation/curation
  - tilde - language & terminology services
  - agroknow - agriculture & food information & research
  - wripl - content optimisation & personalisation, SEO

Current State and Challenges for Content Enrichment (figure)

Contribution of FREME
- various target user groups: developers, content authors, content architects, ...
- several access modes: graphical interfaces, programmatic access, official endpoints and local service instances

FREME's e-Services
- e-Entity: enriching content with information on named entities
- e-Link: enrichment with linked data sources
- e-Terminology: detecting terms and enriching them with term-related information
- e-Translation: providing custom machine translation systems
- e-Publishing: exporting enriched content in the ePub format

FREME Demo (figure)

Linked Data in FREME
- service interoperability and integration using NIF
- low entrance barrier: start with tests / limited volumes immediately, just using REST queries
- utilization of several popular Linked Data knowledge bases: Europeana (cultural heritage data, http://europeana), ORCID (researcher identifiers, http://orcid.org/), ONLD (organisation names, https://www.lib.ncsu.edu/ld/onld/), Library of Congress Authorities (http://id.loc.gov/descriptions)

Named Entity Spotting, Recognition and Linking (figure)

Choice of the Most Appropriate NER/NEL Tool
- topic/domain of the used training data
- knowledge bases linked against
- overall performance (precision, recall, ...)
- support for / performance on specific entity categories (companies, artists, ...)?

GERBIL - Sustainable Entity Annotation Benchmarks
- unified experiment setups
- extensible for additional services and datasets
- experiment results as Linked Data resources
- easy documentation, improving reproducibility

Smart Data Web (SDW)
- BMWi-funded project
- main goal: data collection for German industry using state-of-the-art extraction and enrichment technologies
- use cases: supply chain management, market research

SDW - Knowledge Graph
- the AKSW/KILT group is responsible for the Knowledge Graph
- curated sources: DBpedia, PermID, GRID, etc.
- uncurated sources: Twitter, news feeds, etc.
- data quality and persistence:
  - RDFUnit: test-driven data-debugging framework (http://aksw.org/Projects/RDFUnit.html)
  - LIMES: link discovery for the web of data (http://aksw.org/Projects/LIMES.html)
SDW - Use Case: Supply Chain Management
- the public KG is fed continuously with information from news channels, websites, etc.
- the corporate/internal KG is connected to the public KG
- detection of potential problems in the supply chain
- need to check suppliers regularly: compliance, quality of products, who else is being supplied by this supplier, strikes, natural disasters, insolvency, etc.
- get information about potential problems as quickly as possible

SDW - Use Case: Market Research
- the public KG is fed continuously with information from news channels, websites, etc.
- finding more information about the value chain: potential leads/customers, competition, potential suppliers, price development on the market, customer satisfaction
- connecting information from different data silos
- finding out about new relations

SDW - NLP
- extraction of company and traffic events using state-of-the-art NLP and machine learning technologies
- Demo: http://ta.dfki.de/

SDW - Linked Data
- common public knowledge graph (KG)
- modular corporate ontology
- ETL pipeline for different datasets
- fusion of different datasets and web data
- unique URIs for all entities (through knowledge fusion)
- store metadata (e.g. provenance) for each RDF statement
- use the KG for NLP tasks (e.g. Named Entity Recognition, disambiguation, etc.)
- use the KG for enterprise search

Use Case: Introducing Linked Data into an established IT environment
- Task 1: Support the transformation of relational database data from different sources to RDF - establish a transformation process from SQL data to RDF; combine multiple DBs into a single RDF dataset
- Task 2: Generate software components capable of manipulating the underlying data by any user, based on the domain description (ontology)
- Task 3: Enable data quality checks using data constraints - applying an iterative process of developing an ontology, providing test results and correcting both data and ontology in a changing overall software environment

Task 1: Transforming DB data into RDF
- using a mapping language like R2RML (https://www.w3.org/TR/r2rml/), an RDF (or triple) view of the relational data can be generated (a minimal sketch follows below)
- databases like Virtuoso (http://virtuoso.openlinksw.com) already include all tools needed for this mapping and provide an automated mapping function if needed
- multiple approaches for the automated creation of R2RML mappings exist, creating a wrapper layer on top of the relational database (RDB)
- a different approach is query rewriting (SPARQL -> SQL), which needs an ontology adapted to the RDB schema
- multiple mappings for additional databases
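A minimal R2RML sketch, assuming a hypothetical COMPANY(ID, NAME) table; it only shows the shape of such a mapping, not a mapping for an actual schema:

@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix dbo:  <http://dbpedia.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/mapping/> .

ex:CompanyMapping
    a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "COMPANY" ] ;            # hypothetical source table
    rr:subjectMap [
        rr:template "http://example.org/company/{ID}" ;     # one URI per row, built from the ID column
        rr:class dbo:Company ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate rdfs:label ;
        rr:objectMap [ rr:column "NAME" ] ;                  # the NAME column becomes the label
    ] .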
Task 2: ALIGNED Tool - Semantic Booster
- given a domain description as input (ontology, DB schema, etc.), with an optional mapping to an existing DB schema
- automatic creation of a Booster specification
- automatic generation of an SQL-based DB specification
- the data is propagated through the Booster web interface

Semantic Booster - Features
- generating high-quality software components from precise models with metadata annotations
- using Model-Driven Engineering techniques to generate a complete information system
- key feature: allows for the smooth transition of data when the underlying database is updated
- enables domain experts to develop systems conformant with existing standards, datasets or systems

Booster Web Interface (figure)

Task 3: Data Quality Validation
- Option A: use the form validation methods of the Booster web interface to validate user input directly - useful for simple domains without many restrictions
- Option B: use RDF Unit to automatically generate data unit tests for more complex domains

Creating data unit tests with RDF Unit
- common restrictions (e.g. cardinalities, domain/range) are automatically transformed into unit tests by RDF Unit
- more complex restrictions can be added as new test patterns (custom SPARQL queries, defined by the user, that describe the pattern to look for), enabling RDF Unit to generate the tests itself
- a failed unit test produces an error object in RDF serialization containing the necessary metadata to pinpoint the offending triple

Evaluating RDF Unit results
- test case results can be used to implement a (semi-)automatic process to improve the tested data or its generating software
- or to provide statistics about the quality of a dataset
- ...

Completing the tasks
In addition to validation processes, any number of tasks can be executed on the given data, e.g. extracting NIF annotations on linguistic data, Spotlight entity recognition, etc.

Summary
- WOD: applying the principles of the WWW to data, bridging disciplines and domains (by linking their data)
- Linked Data makes Smart Data out of Big Data
- many Linked Data standards can be reused for Big Data
- DBpedia can be used for many domains and processes
- Linked Data can be applied in many different parts of commercial environments

Q&A
