Slides from my keynote at the 1st International Workshop on Semantic Music and Media (SMAM2013) http://iswc2013.semanticweb.org/content/smam-2013
Mapping, Interlinking and Exposing MusicBrainz as Linked Data
1st International Workshop on Semantic Music and Media (SMAM2013)
Sydney, Oct 21, 2013
Peter Haase
What this talk is about
A Linked Data Perspective
[Figure: graph describing the talk as Linked Data; edge labels: affiliation, affiliation (previous), participatesIn, isAbout, publishedTo, builtWith, worksOn]
EUCLID: Educational Curriculum for the usage of Linked Data
@euclid_project euclidproject euclidproject
http://www.euclid-project.eu
Other channels
eBook Course
EUCLID Scenario
[Figure: EUCLID scenario architecture. Labels, by layer:
• Application: Visualization Module, Analysis & Mining Module
• LD Dataset Access: SPARQL Endpoint, Publishing
• Integrated Dataset: Interlinking, Cleansing, Vocabulary Mapping
• Data acquisition: Physical Wrapper, LD Wrapper, R2R Transf.
• Sources and formats: Metadata, Streaming providers, Downloads, Musical Content, Other content, RDF/XML, RDFa]
MusicBrainz
• MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.
• MusicBrainz aims to be:
  • The ultimate source of music information by allowing anyone to contribute and releasing the data under open licenses.
  • The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.
• Like Wikipedia, MusicBrainz is maintained by a global community of users and we want everyone, including you, to participate and contribute.
• MusicBrainz is operated by the MetaBrainz Foundation, dedicated to keeping MusicBrainz free and open source.
Publishing Relational Databases as RDF: W3C RDB2RDF
Task: publish data from a relational DBMS as Linked Data
Approach: map from the relational schema to a semantic vocabulary with R2RML
Publishing: two alternatives
• Translate SPARQL into SQL on the fly
• Batch-transform the data into RDF, then infer, index, integrate and provide SPARQL access in a triplestore
[Figure: Relational DBMS → R2RML Engine (data acquisition) → Integrated Data in Triplestore (Interlinking, Cleansing, Vocabulary Mapping) → LD Dataset Access (SPARQL Endpoint, Publishing)]
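The batch alternative above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory SQLite table, its rows, and the helper names `materialize` and `match` are hypothetical stand-ins, and the `artist` table is simplified to carry the name directly rather than through a separate name table.

```python
import sqlite3

# Hypothetical stand-in for the relational DBMS (illustrative rows only).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT, name TEXT)")
db.executemany(
    "INSERT INTO artist VALUES (?, ?, ?)",
    [(1, "gid-0001", "Artist One"), (2, "gid-0002", "Artist Two")],
)

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def materialize(conn):
    """Batch alternative: run the SQL once and store every resulting triple."""
    store = set()
    for gid, name in conn.execute("SELECT gid, name FROM artist"):
        subject = f"http://musicbrainz.org/artist/{gid}#_"
        store.add((subject, FOAF_NAME, name))
    return store


def match(store, s=None, p=None, o=None):
    """Answer one triple pattern from the materialized store (None = wildcard)."""
    return sorted(
        t for t in store
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    )


triples = materialize(db)
names = [o for _, _, o in match(triples, p=FOAF_NAME)]
print(names)
```

In the on-the-fly alternative nothing would be stored: each incoming pattern would instead be rewritten into a SQL query against the source tables at request time.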
Publishing MusicBrainz
MusicBrainz DB → R2RML → Music Ontology
https://wiki.musicbrainz.org/Next_Generation_Schema
http://musicontology.com
Concrete example mapping: table Recording(gid, length) → R2RML mapping → ontology concept mo:recording
MusicBrainz Next Gen Schema
• artist: as pre-NGS, but with further attributes
• artist_credit: allows joint credit
• release_group: cf. ‘album’, versus:
  • release
  • medium
• track
• tracklist
• work
• recording
https://wiki.musicbrainz.org/Next_Generation_Schema
Music Ontology
OWL ontology with the following core concepts (classes) and relationships (properties):
Source: http://musicontology.com
R2RML Class Mapping
Mapping tables to classes is ‘easy’:
lb:Artist a rr:TriplesMap ;
    rr:logicalTable [rr:tableName "artist"] ;
    rr:subjectMap [rr:class mo:MusicArtist ;
                   rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
    rr:predicateObjectMap [rr:predicate mo:musicbrainz_guid ;
                           rr:objectMap [rr:column "gid" ; rr:datatype xsd:string]] .
R2RML Property Mapping
Mapping columns to properties can be easy:
lb:artist_name a rr:TriplesMap ;
    rr:logicalTable [rr:sqlQuery """SELECT artist.gid, artist_name.name
                                    FROM artist
                                    INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
    rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
    rr:predicateObjectMap [rr:predicate foaf:name ;
                           rr:objectMap [rr:column "name"]] .
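To make the role of `rr:template` in the two mappings above concrete, here is a minimal Python sketch of how a template string could be expanded for one result row. The function name `expand_template` and the sample rows are hypothetical. The percent-encoding only approximates R2RML's IRI-safe rule (it also encodes non-ASCII characters, which the specification leaves alone), and backslash-escaped braces are not handled.

```python
import re
from urllib.parse import quote


def expand_template(template, row):
    """Replace each {column} placeholder with the row's percent-encoded value."""
    def substitute(m):
        value = row[m.group(1)]
        # Keep only unreserved characters; everything else is percent-encoded.
        return quote(str(value), safe="-._~")
    return re.sub(r"\{([^{}]+)\}", substitute, template)


ARTIST_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"

# Hypothetical rows: a well-formed identifier and one with unsafe characters.
plain = expand_template(ARTIST_TEMPLATE, {"gid": "abc-123"})
escaped = expand_template(ARTIST_TEMPLATE, {"gid": "a b/c"})
print(plain)
print(escaped)
```

Because the subject IRI is derived purely from `gid`, both mappings above produce the same subject for the same artist row, which is what lets their triples merge in the resulting graph.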
NGS Advanced Relations
• Major entities (Artist, Release Group, Track, etc.) plus URL are paired (e.g. l_artist_artist)
• Each pairing of instances refers to a Link
• Links have types (cf. RDF properties) and attributes
http://wiki.musicbrainz.org/Advanced_Relationship
R2RML Mapping Editor
Problem: R2RML mappings are hard to create
R2RML: expose data from a relational DBMS as RDF / via a SPARQL endpoint
[Figure: Relational Database → R2RML Mappings → R2RML Engine → SPARQL Endpoint]
R2RML editing made easy!
• Hides vocabulary intricacies from the end user
• Access to metadata about relational databases
• Preview of generated triples and SQL queries
• Very expressive (supports most of R2RML)
See our R2RML Mapping Editor in the ISWC Demo Session on Wednesday!
Scale
MusicBrainz RDF derived via R2RML:
lb:artist_member a rr:TriplesMap ;
    rr:logicalTable [rr:sqlQuery """SELECT a1.gid, a2.gid AS band
                                    FROM artist a1
                                    INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
                                    INNER JOIN link ON l_artist_artist.link = link.id
                                    INNER JOIN link_type ON link.link_type = link_type.id
                                    INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
                                    WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ;
    rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
    rr:predicateObjectMap [rr:predicate mo:member_of ;
                           rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                                         rr:termType rr:IRI]] .
150M Triples
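The join behind the `lb:artist_member` mapping can be tried out on a toy database. The sketch below is an assumption-laden illustration, not the MusicBrainz schema: the tables are reduced to the columns the query touches, all rows and `gid` values other than the link type are invented, and an in-memory SQLite database stands in for the real DBMS.

```python
import sqlite3

MEMBER_LINK_TYPE = "5be4c609-9afa-4ea0-910b-12ffb71e3821"  # link_type.gid used in the mapping
MO_MEMBER_OF = "http://purl.org/ontology/mo/member_of"

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER);
CREATE TABLE l_artist_artist (entity0 INTEGER, entity1 INTEGER, link INTEGER);
""")
# Hypothetical rows: artist 1 is a member of artist 2; artist 3 is linked
# to artist 2 by some other relationship type and must be filtered out.
db.executemany("INSERT INTO artist VALUES (?, ?)",
               [(1, "person-gid"), (2, "band-gid"), (3, "other-gid")])
db.executemany("INSERT INTO link_type VALUES (?, ?)",
               [(10, MEMBER_LINK_TYPE), (11, "some-other-link-type")])
db.executemany("INSERT INTO link VALUES (?, ?)", [(100, 10), (101, 11)])
db.executemany("INSERT INTO l_artist_artist VALUES (?, ?, ?)",
               [(1, 2, 100), (3, 2, 101)])

QUERY = """
SELECT a1.gid, a2.gid AS band
FROM artist a1
INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
INNER JOIN link ON l_artist_artist.link = link.id
INNER JOIN link_type ON link.link_type = link_type.id
INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
WHERE link_type.gid = ?
"""

# One mo:member_of triple per result row, using the same IRI template as the mapping.
member_triples = [
    (f"http://musicbrainz.org/artist/{gid}#_", MO_MEMBER_OF,
     f"http://musicbrainz.org/artist/{band}#_")
    for gid, band in db.execute(QUERY, (MEMBER_LINK_TYPE,))
]
print(member_triples)
```

Only the pairing whose link has the membership type survives the `WHERE` clause, which is how a single generic pairing table yields a specific RDF property.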
Some Statistics – RDF Dump
(Lead) Table     Triples      Time (s)
area             59798        2
artist           36868228     423
dbpedia          172017       13
label            201832       3
medium           18069143     163
recording        11400354     209
release_group    3050818      31
release          9764887      151
track            75506495     794
work             1728955      20
Total            156822527    1809
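As a quick check of the table, the totals and the overall conversion rate can be recomputed from the per-table figures. The short Python sketch below uses only the numbers on the slide; the variable names are illustrative.

```python
# (lead table, triples, time in seconds) as listed in the statistics above
rows = [
    ("area", 59798, 2),
    ("artist", 36868228, 423),
    ("dbpedia", 172017, 13),
    ("label", 201832, 3),
    ("medium", 18069143, 163),
    ("recording", 11400354, 209),
    ("release_group", 3050818, 31),
    ("release", 9764887, 151),
    ("track", 75506495, 794),
    ("work", 1728955, 20),
]

total_triples = sum(triples for _, triples, _ in rows)
total_seconds = sum(seconds for _, _, seconds in rows)
throughput = total_triples / total_seconds  # triples per second overall

# Per-table rate, to see which lead tables convert fastest.
per_table = {name: triples / seconds for name, triples, seconds in rows}

print(total_triples, total_seconds, round(throughput))
```

The sums match the slide's totals (156822527 triples in 1809 seconds), which works out to roughly 86,700 triples per second, or about half an hour for the full dump.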
Information Workbench: Platform for Linked Data Applications
• Open standards and technologies
  • Semantic Wiki based frontend (using SMW syntax)
  • Supporting W3C standards (OWL, RDF, SPARQL, …)
  • Community Edition (open source) + Enterprise Edition (commercial)
• Semantics- and Linked Data-based integration of private and public data sources based on data providers
  • Generic and specific providers for various data formats and sources
  • Supports established mapping frameworks (e.g. R2RML, SILK, …)
  • Named graphs for managing contexts and provenance
• Intelligent data access and analytics
  • Flexible self-service UI
  • Visualization, exploration, dashboarding and reporting
  • Semantic search
• Collaboration and knowledge management
  • Curation & authoring
  • Collaborative workflows
Data storage and management platform
Reusable UI and data integration components
Customized application solutions
External resources to reuse data and create mashups
Realization within the Information Workbench Architecture
The “MusicBrainz Explorer” Application
Data
Data Providers
Ontology
Templates
Widgets
Music Ontology
R2RML
Template: …
Ontology as a “Structural Backbone”
[Figure: in the RDF data graph, The_Beatles has rdf:type mo:Artist and Yesterday has rdf:type mo:Track. The ontology (RDFS/OWL) defines the data structure. The UI templates Template:mo:Artist and Template:mo:Track define the UI structure of the corresponding resource pages.]
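The idea that a resource's `rdf:type` selects the UI template for its resource page can be sketched as a simple lookup. This is a hypothetical illustration of the principle, not the Information Workbench implementation: the function `template_for`, the fallback name `Template:Default`, and the tuple-based graph are all assumptions.

```python
RDF_TYPE = "rdf:type"

# The two typed resources named on the slide, as (subject, predicate, object).
graph = [
    ("The_Beatles", RDF_TYPE, "mo:Artist"),
    ("Yesterday", RDF_TYPE, "mo:Track"),
]

# One UI template per ontology class.
templates = {
    "mo:Artist": "Template:mo:Artist",
    "mo:Track": "Template:mo:Track",
}


def template_for(resource, triples, registry, default="Template:Default"):
    """Pick the template registered for the resource's rdf:type, if any."""
    for s, p, o in triples:
        if s == resource and p == RDF_TYPE and o in registry:
            return registry[o]
    return default  # hypothetical fallback for untyped resources


print(template_for("The_Beatles", graph, templates))
print(template_for("Yesterday", graph, templates))
```

With this arrangement, adding a class to the ontology and registering one template is enough to give every instance of that class a tailored page.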
Information Workbench: Browsing a Music Artist
Information Workbench: Visualization techniques
Navigation Through the Data
Source: http://musicbrainz.fluidops.net/resource/Analytical5
SPARQL visualization
SELECT ?release ((SUM(xsd:double(?duration/60000))) AS ?totalMinutes)
WHERE {
  <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
  ?release mo:record ?record .
  ?record mo:track ?track .
  ?track mo:duration ?duration .
}
GROUP BY ?release
ORDER BY DESC(?totalMinutes)
LIMIT 10
SPARQL Query
Result set
Top ten The Beatles releases according to the sum of track durations in minutes
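For readers less familiar with SPARQL aggregates, the query's GROUP BY, SUM, ORDER BY DESC and LIMIT steps correspond to the following Python sketch. The bindings are invented sample data and `top_releases` is a hypothetical helper; durations are assumed to be in milliseconds, as the division by 60000 in the query suggests.

```python
from collections import defaultdict

# Hypothetical (release, track duration in ms) bindings, one per matched track.
bindings = [
    ("release-a", 120000),
    ("release-a", 180000),
    ("release-b", 240000),
    ("release-c", 60000),
]


def top_releases(rows, limit=10):
    """Sum track durations per release in minutes and return the longest first."""
    totals = defaultdict(float)
    for release, duration_ms in rows:
        totals[release] += duration_ms / 60000  # GROUP BY release, SUM(minutes)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]  # ORDER BY DESC(total) LIMIT n


print(top_releases(bindings))
```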
Widget
Visualization: Bar chart
{{#widget: BarChart
 | query ='SELECT (COUNT(?Release) AS ?COUNT) ?label WHERE { <http://musicbrainz.org/artist/8538e728-ca0b-4321-b7e5-cff6565dd4c0#_> foaf:made ?Release . ?Release rdf:type mo:Release . ?Release dc:title ?label . } GROUP BY ?label ORDER BY DESC(?COUNT) LIMIT 20'
 | settings = 'Settings:barvertical_mb'
 | asynch = 'true'
 | input = 'label'
 | output = 'COUNT'
 | height = '300'
}}
Information Workbench: SPARQL visualization
Top ten The Beatles releases according to the sum of track durations in minutes
Other visualizations of the same result set …
Line chart:
Pie chart:
Automated Widget Suggestion
[Figure: suggested visualizations for a result set: Bar chart, Line chart, Pie chart, Table, Pivot view]
Select a suggested visualization
Visualization automatically built
Try it out!
R2RML Mappings
• https://github.com/LinkedBrainz/MusicBrainz-R2RML
MusicBrainz RDF Dump
• http://mbsandbox.org/~barry/
MusicBrainz Linked Data Demo system
• http://musicbrainz.fluidops.net/
Information Workbench
• http://www.fluidops.com/information-workbench/
Euclid Project
• http://euclid-project.eu/
Acknowledgements
The Euclid Project, Barry Norton, Michael Meier, Andriy Nikolov, Yves Raimond, Kurt Jacobson, Thomas Gaengler, Juan Sequeda, Simon Dixon
(in no particular order)
Contact
Peter Haase
fluid Operations AG
Altrottstr. 31
Walldorf, Germany
+49 (0) 6227 358087-0
www.fluidops.com
[email protected]
Thank you!