Mapping, Interlinking and Exposing MusicBrainz as Linked Data. 1st International Workshop on Semantic Music and Media (SMAM2013), Sydney, Oct 21, 2013. Peter Haase

Mapping, Interlinking and Exposing MusicBrainz as Linked Data


DESCRIPTION

Slides from my keynote at the 1st International Workshop on Semantic Music and Media (SMAM2013) http://iswc2013.semanticweb.org/content/smam-2013


Page 1: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Mapping, Interlinking and Exposing MusicBrainz as Linked Data
1st International Workshop on Semantic Music and Media (SMAM2013)
Sydney, Oct 21, 2013
Peter Haase

Page 2: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

What this talk is about: A Linked Data Perspective

[Figure: graph with edges labeled affiliation, affiliation (previous), participatesIn (twice), isAbout, publishedTo, builtWith, worksOn]

Page 3: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

EUCLID: EdUcational Curriculum for the usage of Linked Data

@euclid_project euclidproject euclidproject

http://www.euclid-project.eu

Other channels

eBook Course

Page 4: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

EUCLID Scenario

[Architecture diagram with three layers:]

• Data acquisition: Metadata, Streaming providers, Downloads, Musical Content, Other content (RDFa); Physical Wrapper, LD Wrapper, R2R Transf., RDF/XML
• LD Dataset Access: Integrated Dataset; Interlinking, Cleansing, Vocabulary Mapping; SPARQL Endpoint; Publishing
• Application: Visualization Module, Analysis & Mining Module

Page 5: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

MusicBrainz

• MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.

• MusicBrainz aims to be:
  • The ultimate source of music information by allowing anyone to contribute and releasing the data under open licenses.
  • The universal lingua franca for music by providing a reliable and unambiguous form of music identification, enabling both people and machines to have meaningful conversations about music.

• Like Wikipedia, MusicBrainz is maintained by a global community of users and we want everyone, including you, to participate and contribute.

• MusicBrainz is operated by the MetaBrainz Foundation, dedicated to keeping MusicBrainz free and open source.

Page 6: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Publishing Relational Databases as RDF: W3C RDB2RDF

Task: Publish data from relational DBMS as Linked Data

Approach: map from relational schema to semantic vocabulary with R2RML

Publishing: two alternatives

• Translate SPARQL into SQL on the fly

• Batch transform data into RDF, infer, index, integrate and provide SPARQL access in a triplestore

[Diagram: Data acquisition (Relational DBMS, R2RML Engine); LD Dataset Access (Integrated Data in Triplestore; Interlinking, Cleansing, Vocabulary Mapping; SPARQL Endpoint; Publishing)]

Page 7: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Publishing MusicBrainz

The MusicBrainz DB (https://wiki.musicbrainz.org/Next_Generation_Schema) is mapped with R2RML to the Music Ontology (http://musicontology.com).

Concrete example mapping: table Recording(gid, length) is mapped via an R2RML mapping to the ontology concept mo:recording.

Page 8: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

MusicBrainz Next Gen Schema

• artist: as pre-NGS, but further attributes
• artist_credit: allows joint credit
• release_group: cf. 'album', versus: release, medium
• track
• tracklist
• work
• recording

https://wiki.musicbrainz.org/Next_Generation_Schema

Page 9: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Music Ontology

OWL ontology with the following core concepts (classes) and relationships (properties):

Source: http://musicontology.com

Page 10: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

R2RML Class Mapping

Mapping tables to classes is 'easy':

lb:Artist a rr:TriplesMap ;
  rr:logicalTable [rr:tableName "artist"] ;
  rr:subjectMap
    [rr:class mo:MusicArtist ;
     rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:musicbrainz_guid ;
     rr:objectMap [rr:column "gid" ;
                   rr:datatype xsd:string]] .
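As a rough illustration of what an R2RML engine does with the lb:Artist mapping, the following Python sketch expands the subject template for one row of the artist table and prints the two resulting triples in an N-Triples-like form. The helper names (expand_template, artist_row_to_triples) and the expanded namespace URIs are assumptions made for this example; the sample gid is the one used in the bar chart widget example later in these slides. A real engine additionally deals with NULL values, literal escaping and other details.

```python
import re
from urllib.parse import quote

# Namespace URIs assumed for the prefixes used on the slide.
MO = "http://purl.org/ontology/mo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def expand_template(template, row):
    """Replace each {column} placeholder with the percent-encoded column value."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: quote(str(row[match.group(1)]), safe=""),
        template,
    )


def artist_row_to_triples(row):
    """Hypothetical helper: apply the lb:Artist mapping to a single table row."""
    subject = expand_template("http://musicbrainz.org/artist/{gid}#_", row)
    return [
        # rr:class mo:MusicArtist
        f"<{subject}> <{RDF_TYPE}> <{MO}MusicArtist> .",
        # rr:predicate mo:musicbrainz_guid with rr:column "gid"
        f'<{subject}> <{MO}musicbrainz_guid> "{row["gid"]}"^^<{XSD_STRING}> .',
    ]


row = {"gid": "8538e728-ca0b-4321-b7e5-cff6565dd4c0"}
for triple in artist_row_to_triples(row):
    print(triple)
```

Every row of the artist table yields one subject IRI, one rdf:type triple and one literal triple, which is why mapping a table to a class needs no SQL beyond naming the table.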

Page 11: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

R2RML Property Mapping

Mapping columns to properties can be easy:

lb:artist_name a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid, artist_name.name
       FROM artist
       INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
  rr:subjectMap [rr:template
    "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate foaf:name ;
     rr:objectMap [rr:column "name"]] .
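The join in lb:artist_name can be tried out on a toy database. The sketch below uses Python's built-in sqlite3 module with a heavily simplified two-table schema and one made-up artist (the table contents, the placeholder gid and the function name artist_name_triples are all assumptions for illustration). It runs the SQL query from the mapping and turns each result row into a foaf:name triple.

```python
import sqlite3

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Toy stand-in for the MusicBrainz schema: artist.name is a foreign key
# into artist_name, as in the SQL query on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT, name INTEGER);
    INSERT INTO artist_name VALUES (1, 'Example Artist');
    INSERT INTO artist VALUES (10, '00000000-0000-0000-0000-000000000001', 1);
""")


def artist_name_triples(connection):
    """Run the logical-table query and emit one foaf:name triple per row."""
    rows = connection.execute("""
        SELECT artist.gid, artist_name.name
        FROM artist
        INNER JOIN artist_name ON artist.name = artist_name.id
    """)
    triples = []
    for gid, name in rows:
        subject = f"http://musicbrainz.org/artist/{gid}#_"
        # Escape backslashes and quotes so the literal stays well-formed.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'<{subject}> <{FOAF_NAME}> "{escaped}" .')
    return triples


for triple in artist_name_triples(conn):
    print(triple)
```

The subject template is the same as in the class mapping, so the name triple attaches to the same artist resource that the class mapping typed as mo:MusicArtist.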

Page 12: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

NGS Advanced Relations

• Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)

• Each pairing of instances refers to a Link

• Links have types (cf. RDF properties) and attributes

     

http://wiki.musicbrainz.org/Advanced_Relationship

Page 13: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

R2RML Mapping Editor

[Diagram: Relational Database, R2RML Mappings, R2RML Engine, SPARQL Endpoint]

R2RML: Expose data from relational DBMS as RDF / via SPARQL Endpoint

R2RML Editing Made Easy!

• Hides vocabulary intricacies from the end user

• Access to metadata about relational databases

• Preview of generated triples and SQL queries

• Very expressive (supports most of R2RML)

Problem: R2RML Mappings are hard to create

See our R2RML Mapping Editor in the ISWC Demo Session on Wednesday!

Page 14: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Scale

MusicBrainz RDF derived via R2RML:

lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
         INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
         INNER JOIN link ON l_artist_artist.link = link.id
         INNER JOIN link_type ON link.link_type = link_type.id
         INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'"""] ;
  rr:subjectMap [rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:member_of ;
     rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                   rr:termType rr:IRI]] .

150M Triples
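To see how the advanced-relationship tables turn into mo:member_of triples, here is a small sketch using Python's sqlite3 module. The schema is cut down to the columns the query touches, the artist gids and the second link type are invented, and the function name member_of_triples is hypothetical; only the link type gid and the shape of the joins come from the slide. The link.link_type column is written fully qualified here to keep the join unambiguous.

```python
import sqlite3

MEMBER_OF = "http://purl.org/ontology/mo/member_of"
MEMBER_LINK_TYPE_GID = "5be4c609-9afa-4ea0-910b-12ffb71e3821"  # from the slide

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
    CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
    CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER);
    CREATE TABLE l_artist_artist (id INTEGER PRIMARY KEY, link INTEGER,
                                  entity0 INTEGER, entity1 INTEGER);
    -- Invented sample data: one membership link and one unrelated link.
    INSERT INTO artist VALUES (1, 'member-gid'), (2, 'band-gid'), (3, 'other-gid');
    INSERT INTO link_type VALUES (1, '5be4c609-9afa-4ea0-910b-12ffb71e3821'),
                                 (2, 'some-other-link-type');
    INSERT INTO link VALUES (100, 1), (200, 2);
    INSERT INTO l_artist_artist VALUES (1, 100, 1, 2), (2, 200, 1, 3);
""")


def member_of_triples(connection, link_type_gid):
    """Pair artists through l_artist_artist, keeping only the given link type."""
    rows = connection.execute("""
        SELECT a1.gid, a2.gid AS band
        FROM artist a1
          INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
          INNER JOIN link ON l_artist_artist.link = link.id
          INNER JOIN link_type ON link.link_type = link_type.id
          INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
        WHERE link_type.gid = ?
    """, (link_type_gid,))
    base = "http://musicbrainz.org/artist/"
    # Both subject and object are IRIs (rr:termType rr:IRI on the object map).
    return [f"<{base}{gid}#_> <{MEMBER_OF}> <{base}{band}#_> ." for gid, band in rows]


for triple in member_of_triples(conn, MEMBER_LINK_TYPE_GID):
    print(triple)
```

Only the pairing whose link has the membership link type produces a triple; the second pairing is filtered out by the WHERE clause, which is how one link type in the database corresponds to one RDF property.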

Page 15: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Some Statistics – RDF Dump

(Lead) Table        Triples    Time (s)
area                  59798           2
artist             36868228         423
dbpedia              172017          13
label                201832           3
medium             18069143         163
recording          11400354         209
release_group       3050818          31
release             9764887         151
track              75506495         794
work                1728955          20
Total             156822527        1809
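The totals in the last row can be cross-checked, and an average throughput derived, with a few lines of Python. The numbers below are copied from the statistics above; the variable names are chosen for this example only.

```python
# (lead) table -> (triples, time in seconds), copied from the statistics above.
stats = {
    "area": (59798, 2),
    "artist": (36868228, 423),
    "dbpedia": (172017, 13),
    "label": (201832, 3),
    "medium": (18069143, 163),
    "recording": (11400354, 209),
    "release_group": (3050818, 31),
    "release": (9764887, 151),
    "track": (75506495, 794),
    "work": (1728955, 20),
}

total_triples = sum(triples for triples, _ in stats.values())
total_seconds = sum(seconds for _, seconds in stats.values())
throughput = total_triples / total_seconds  # average triples per second

print(total_triples, total_seconds, round(throughput))
```

The per-table figures add up to the reported 156822527 triples and 1809 seconds, which works out to roughly 87 thousand triples per second averaged over the whole dump.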

Page 16: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Information Workbench: Platform for Linked Data Applications

§ Open standards and technologies

• Semantic Wiki based frontend (using SMW syntax)

• Supporting W3C standards (OWL, RDF, SPARQL, …)

• Community Edition (Open Source) + Enterprise Edition (Commercial)

§ Semantics- & Linked Data-based integration of private and public data sources based on data providers

• Generic and specific providers for various data formats and sources

• Supports established mapping frameworks (e.g. R2RML, SILK, …)

• Named graphs for managing contexts and provenance

§ Intelligent Data Access and Analytics

• Flexible self-service UI

• Visualization, exploration, dashboarding and reporting

• Semantic search

§ Collaboration and knowledge management

• Curation & authoring

• Collaborative workflows

Page 17: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Realization within the Information Workbench Architecture

• Data storage and management platform

• Reusable UI and data integration components

• Customized application solutions

• External resources to reuse data and create mashups

Page 18: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

The “MusicBrainz Explorer” Application

Data

Data Providers

Ontology

Templates

Widgets

Music Ontology

R2RML

Page 19: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Ontology as a “Structural Backbone”

[Figure: the ontology (RDFS/OWL) defines the data structure and the UI templates define the UI structure. In the RDF data graph, The_Beatles has rdf:type mo:Artist and Yesterday has rdf:type mo:Track; their resource pages are rendered with Template:mo:Artist and Template:mo:Track respectively.]

Page 20: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Information Workbench: Browsing a Music Artist

Page 21: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Information Workbench: Visualization techniques

Page 22: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Navigation Through the Data

Source: http://musicbrainz.fluidops.net/resource/Analytical5

Page 23: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

SPARQL visualization

SELECT ?release
       ((SUM(xsd:double(?duration/60000))) AS ?avg)
WHERE {
  <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
  ?release mo:record ?record .
  ?record mo:track ?track .
  ?track mo:duration ?duration .
}
GROUP BY ?release
ORDER BY DESC(?avg)
LIMIT 10

SPARQL  Query    

Result  set  

Top ten The Beatles releases according to the sum of track durations in minutes
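The aggregation this query performs (group by release, sum the track durations converted from milliseconds to minutes, sort descending, keep ten) can be mirrored in plain Python. The sketch below is not how the triplestore evaluates the query; it uses invented sample rows and a hypothetical function name to show the same computation. Note that the query's ?avg alias actually holds a sum, as the caption says.

```python
from collections import defaultdict


def top_releases_by_total_minutes(tracks, limit=10):
    """tracks: iterable of (release, duration_ms) pairs, one per track.

    Returns up to `limit` (release, total_minutes) pairs, longest first.
    """
    totals = defaultdict(float)
    for release, duration_ms in tracks:
        totals[release] += duration_ms / 60000  # milliseconds -> minutes
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Invented sample data standing in for the ?release / ?duration bindings.
sample_tracks = [
    ("Release A", 120000),
    ("Release A", 180000),
    ("Release B", 240000),
    ("Release C", 60000),
]

for release, minutes in top_releases_by_total_minutes(sample_tracks):
    print(release, minutes)
```

Dividing by 60000 before summing matches the SPARQL expression; dividing the sum once at the end would give the same totals up to floating-point rounding.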

Page 24: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

SPARQL visualization

Top ten The Beatles releases according to the sum of track durations in minutes

Widget

Visualization: Bar chart

{{#widget: BarChart
 | query ='SELECT (COUNT(?Release) AS ?COUNT) ?label WHERE {
     <http://musicbrainz.org/artist/8538e728-ca0b-4321-b7e5-cff6565dd4c0#_> foaf:made ?Release .
     ?Release rdf:type mo:Release .
     ?Release dc:title ?label .}
   GROUP BY ?label ORDER BY DESC(?COUNT) LIMIT 20'
 | settings = 'Settings:barvertical_mb'
 | asynch = 'true'
 | input = 'label'
 | output = 'COUNT'
 | height = '300'
}}

Page 25: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Information Workbench: SPARQL visualization

Top ten The Beatles releases according to the sum of track durations in minutes

Other visualizations of the same result set …

Line  chart:  

Pie  chart:  

Page 26: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Automated Widget Suggestion

[Figure: numbered steps 1 to 3. Suggested visualizations: Bar chart, Line chart, Pie chart, Table, Pivot view. Captions: "Select a suggested visualization", "Visualization automatically built".]

Page 27: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Try it out!

R2RML Mappings
• https://github.com/LinkedBrainz/MusicBrainz-R2RML

MusicBrainz RDF Dump
• http://mbsandbox.org/~barry/

MusicBrainz Linked Data Demo system
• http://musicbrainz.fluidops.net/

Information Workbench
• http://www.fluidops.com/information-workbench/

Euclid Project
• http://euclid-project.eu/

Page 28: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Acknowledgements

• The Euclid Project
• Barry Norton
• Michael Meier
• Andriy Nikolov
• Yves Raimond
• Kurt Jacobson
• Thomas Gaengler
• Juan Sequeda
• Simon Dixon

(in no particular order)

 

Page 29: Mapping, Interlinking and Exposing MusicBrainz as Linked Data

Contact

Peter Haase
fluid Operations AG
Altrottstr. 31
Walldorf, Germany
+49 (0) 6227 358087-0
www.fluidops.com
[email protected]

Thank you!