Functional manipulations of large data graphs 20160601

Page 1: Functional manipulations of large data graphs 20160601

Linked Datasets as of August 2014

[Figure: Linking Open Data cloud diagram. Each node is a published dataset (for example DBpedia, Uniprot, GeoNames, Freebase, YAGO, LinkedGeoData, the Bio2RDF, StatusNet and RKB Explorer families); edges are links between datasets. Legend categories: Linguistics, Social Networking, Life Sciences, Cross-Domain, Government, User-Generated Content, Publications, Geographic, Media.]

Functional Manipulation of Large Data Graphs

David Hyland-Wood

@prototypo 1 June 2016

Page 2: Functional manipulations of large data graphs 20160601
Page 3: Functional manipulations of large data graphs 20160601
Page 4: Functional manipulations of large data graphs 20160601

Something -- a relationship --> Something else

Page 5: Functional manipulations of large data graphs 20160601

UQ -- is a --> University

Page 6: Functional manipulations of large data graphs 20160601

UQ -- label --> "The University of Queensland"
UQ -- is a --> University
UQ -- affiliation --> Group of 8
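
The subject, relationship, object pattern on this slide can be sketched in plain Scala. This is only an illustration of the idea: `Triple`, `TripleDemo` and `objectsOf` are hypothetical names that do not come from the talk's code, and the short strings stand in for the full URIs a real RDF graph would use.

```scala
// A minimal sketch: RDF-style statements as plain Scala values.
// `Triple` and `TripleDemo` are hypothetical names, not from the talk's code.
final case class Triple(subject: String, predicate: String, obj: String)

object TripleDemo {
  // The three statements drawn on the slide.
  val triples: List[Triple] = List(
    Triple("UQ", "label", "The University of Queensland"),
    Triple("UQ", "is a", "University"),
    Triple("UQ", "affiliation", "Group of 8")
  )

  // All objects reachable from `subject` by following `predicate`.
  def objectsOf(subject: String, predicate: String): List[String] =
    triples.collect { case Triple(`subject`, `predicate`, o) => o }

  def main(args: Array[String]): Unit =
    println(objectsOf("UQ", "affiliation")) // prints List(Group of 8)
}
```

Because every statement has the same three-part shape, new facts can be added without changing a schema, which is what lets independently published graphs be merged.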

Page 7: Functional manipulations of large data graphs 20160601
Page 8: Functional manipulations of large data graphs 20160601
Page 9: Functional manipulations of large data graphs 20160601

We’ve Seen This Before

Page 10: Functional manipulations of large data graphs 20160601
Page 11: Functional manipulations of large data graphs 20160601

08 Oct 2007

Page 12: Functional manipulations of large data graphs 20160601
Page 13: Functional manipulations of large data graphs 20160601
Page 14: Functional manipulations of large data graphs 20160601
Page 15: Functional manipulations of large data graphs 20160601
Page 16: Functional manipulations of large data graphs 20160601
Page 17: Functional manipulations of large data graphs 20160601
Page 18: Functional manipulations of large data graphs 20160601
Page 19: Functional manipulations of large data graphs 20160601

The RDF Data Model

Standard serialisation formats:

• Turtle
• TriG
• N-Triples
• N-Quads
  (the four above: Turtle family of RDF formats)
• JSON-LD
• RDFa
• RDF/XML

Possibly lossy alternatives:

• CSV
• OData
• etc

Page 20: Functional manipulations of large data graphs 20160601

https://en.wikipedia.org/wiki/University_of_Queensland

$ curl http://dbpedia.org/page/University_of_Queensland
  -> HTML

$ curl http://dbpedia.org/data/University_of_Queensland
  -> RDF in XML (Yuck!)

$ curl http://dbpedia.org/data/University_of_Queensland.n3 > University_of_Queensland.n3
  -> Many formats, e.g. sane RDF, OData, Microdata, JSON…

Page 21: Functional manipulations of large data graphs 20160601
Page 22: Functional manipulations of large data graphs 20160601
Page 23: Functional manipulations of large data graphs 20160601
Page 24: Functional manipulations of large data graphs 20160601
Page 25: Functional manipulations of large data graphs 20160601
Page 26: Functional manipulations of large data graphs 20160601

UQ -- label --> "The University of Queensland"
UQ -- affiliation --> Group of 8
UQ -- number of undergraduate students --> 34228
UQ -- number of students --> 48771

Page 27: Functional manipulations of large data graphs 20160601
Page 28: Functional manipulations of large data graphs 20160601

# G8 universities ordered by the number of students
# at each university.
# (rdfs: is predeclared on the DBpedia endpoint; it is declared here for portability.)
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?students ?undergrads
where {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL { ?s dbo:numberOfStudents ?students }
  OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
  FILTER ( lang(?name) = "en" )
}
ORDER BY DESC (?students)
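
One way to send a query like this to an endpoint is the SPARQL 1.1 Protocol's "query via GET" form, where the query text is URL-encoded into a `query` parameter. The sketch below only builds the request URL and makes no network call. `SparqlRequest` and `requestUrl` are hypothetical names, and the use of the public endpoint at http://dbpedia.org/sparql is an assumption; the slides do not say how the query was run.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// A minimal sketch, assuming the public DBpedia endpoint.
// `SparqlRequest` and `requestUrl` are hypothetical names.
object SparqlRequest {
  val endpoint = "http://dbpedia.org/sparql"

  // The query from the slide, with the rdfs prefix declared explicitly.
  val query: String =
    """PREFIX dbo: <http://dbpedia.org/ontology/>
      |PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      |select ?name ?students ?undergrads
      |where {
      |  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
      |  ?s rdfs:label ?name .
      |  OPTIONAL { ?s dbo:numberOfStudents ?students }
      |  OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
      |  FILTER ( lang(?name) = "en" )
      |}
      |ORDER BY DESC (?students)""".stripMargin

  // Build a "query via GET" URL: <endpoint>?query=<url-encoded query text>.
  def requestUrl(endpoint: String, sparql: String): String =
    endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8.name())

  def main(args: Array[String]): Unit =
    println(requestUrl(endpoint, query))
}
```

The printed URL can be passed to `curl`, ideally with an `Accept` header naming the desired result format; the results depend on the live data and may differ from those shown in the talk.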

Page 36: Functional manipulations of large data graphs 20160601
Page 37: Functional manipulations of large data graphs 20160601

OpenStreetMap

Wikimedia Commons

DBpedia

US EPA RCRA

US EPA FRS

ABT Associates

Page 38: Functional manipulations of large data graphs 20160601
Page 39: Functional manipulations of large data graphs 20160601
Page 40: Functional manipulations of large data graphs 20160601
Page 41: Functional manipulations of large data graphs 20160601
Page 42: Functional manipulations of large data graphs 20160601

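[Figure: graph of the Group of Eight (Go8) and its member universities]

Nodes and their labels:
  UQ -- label --> "The University of Queensland"
  ANU -- label --> "Australian National University"
  UMelbourne -- label --> "University of Melbourne"
  Monash -- label --> "Monash University"
  UAdelaide -- label --> "University of Adelaide"
  USydney -- label --> "University of Sydney"
  UNSW -- label --> "University of NSW"
  Go8 -- label --> "Group of 8"

Edges:
  memberOf: seven edges, one between each university and Go8
  affiliation: five edges between universities (the endpoints are not recoverable from the extracted text)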

Page 43: Functional manipulations of large data graphs 20160601

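[Figure: the same graph reduced to the universities and their affiliation edges]

Nodes:
  UQ -- label --> "The University of Queensland"
  ANU -- label --> "Australian National University"
  Monash
  UMelbourne
  UNSW
  USydney
  UAdelaide

Edges:
  affiliation: five edges between universities (the endpoints are not recoverable from the extracted text)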

Page 44: Functional manipulations of large data graphs 20160601
Page 45: Functional manipulations of large data graphs 20160601

Graphs in Scala

import org.apache.spark.graphx.Graph

// vertexRDD and edgeRDD are built from the RDF triples elsewhere
// in the talk's code; they are not shown on this slide.
val graph: Graph[String, String] = Graph(vertexRDD, edgeRDD)

// Create a subgraph based on the vertices connected
// by an "affiliation" property.
val affiliationRelatedSubgraph =
  graph.subgraph(t => t.attr == "http://dbpedia.org/ontology/affiliation")

// Find connected components of affiliationRelatedSubgraph.
val ccGraph = affiliationRelatedSubgraph.connectedComponents()
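
To make the result of `connectedComponents()` concrete: GraphX labels every vertex with the lowest vertex id in its component. The sketch below reproduces that labelling on plain Scala collections, without Spark, by repeatedly pushing the smaller label across each edge until nothing changes. It is an illustration under stated assumptions, not GraphX's implementation; `ComponentsSketch` is a hypothetical name, the vertex ids in the demo are arbitrary, and every edge endpoint is assumed to appear in `vertices`.

```scala
// A minimal sketch of connected-component labelling on plain collections.
// `ComponentsSketch` is a hypothetical name; this is not GraphX's implementation.
object ComponentsSketch {
  // Returns vertex id -> smallest vertex id in the same component.
  // Edges are treated as undirected; each endpoint must be in `vertices`.
  def connectedComponents(vertices: Set[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    @annotation.tailrec
    def loop(label: Map[Long, Long]): Map[Long, Long] = {
      // One pass: give both ends of every edge the smaller of their labels.
      val next = edges.foldLeft(label) { case (acc, (a, b)) =>
        val m = math.min(acc(a), acc(b))
        acc.updated(a, m).updated(b, m)
      }
      // Stop at the fixpoint, when a full pass changes nothing.
      if (next == label) label else loop(next)
    }
    // Start with every vertex labelled by its own id.
    loop(vertices.map(v => v -> v).toMap)
  }

  def main(args: Array[String]): Unit = {
    // Arbitrary demo ids: {1, 2, 3} and {4, 5} form two components.
    val result = connectedComponents(Set(1L, 2L, 3L, 4L, 5L), Seq((3L, 2L), (2L, 1L), (5L, 4L)))
    println(result.toList.sorted)
  }
}
```

The fold-and-recurse shape keeps the state immutable between passes, which is close in spirit to how a distributed implementation exchanges labels between neighbours in rounds.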

Page 46: Functional manipulations of large data graphs 20160601

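Graphs in Scala

import scala.collection.mutable.{HashMap, ListBuffer}
import org.apache.spark.graphx.VertexId

// Create a hashmap of componentLists, keyed by component id.
// (The declaration is not shown on the slide; this form is assumed.)
val componentLists = new HashMap[VertexId, ListBuffer[VertexId]]

affiliationRelatedSubgraph.vertices.leftJoin(ccGraph.vertices) {
  case (id, u, comp) => comp.get
}.collect().foreach { // collect() first, so the driver-side map is the one updated
  case (id, startingNode) =>
    if (!componentLists.contains(startingNode)) {
      componentLists(startingNode) = new ListBuffer[VertexId]
    }
    componentLists(startingNode) += id
}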

Page 47: Functional manipulations of large data graphs 20160601

Graphs in Scala

// Output a report on the connected components.
// labelMap (vertex id -> label) is defined elsewhere in the talk's code.
println("------ connected components in related triples ------\n")
for ((component, componentList) <- componentLists) {
  if (componentList.size > 1) {
    for (c <- componentList) {
      println(labelMap(c))
    }
    println("--------------------------")
  }
}
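
The grouping and reporting steps above can also be written without a mutable map. The sketch below is an alternative under stated assumptions rather than the talk's code: `GroupComponents` and `report` are hypothetical names, and the two input maps stand in for data that would first be collected from `ccGraph.vertices` and `labelMap`.

```scala
// A minimal sketch: group vertices by component with immutable collections.
// `GroupComponents` and `report` are hypothetical names.
object GroupComponents {
  // componentOf: vertex id -> component id; labelOf: vertex id -> display label.
  // Returns the labels of each component with more than one member,
  // ordered by component id, with labels sorted inside each component.
  def report(componentOf: Map[Long, Long], labelOf: Map[Long, String]): List[List[String]] =
    componentOf.toList
      .groupBy { case (_, comp) => comp }
      .toList
      .sortBy { case (comp, _) => comp }
      .map { case (_, members) => members.map { case (id, _) => labelOf(id) }.sorted }
      .filter(_.size > 1)

  def main(args: Array[String]): Unit = {
    // Placeholder ids and labels, not the talk's data.
    val groups = report(Map(1L -> 1L, 2L -> 1L, 3L -> 3L), Map(1L -> "A", 2L -> "B", 3L -> "C"))
    groups.foreach { g =>
      g.foreach(println)
      println("--------------------------")
    }
  }
}
```

Sorting by component id and by label makes the report deterministic, which iteration over a hash map does not guarantee.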

Page 48: Functional manipulations of large data graphs 20160601

------ connected components in related triples ------

Australian National University
University of Sydney
University of Adelaide
University of New South Wales
--------------------------
The University of Queensland
University of Melbourne
Monash University
--------------------------

Page 49: Functional manipulations of large data graphs 20160601

Resources

• Slides: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/slides

• Code: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/code

Page 50: Functional manipulations of large data graphs 20160601

Resources

• Callimachus: http://callimachusproject.org

• Apache Spark: http://spark.apache.org

• GraphX Programming Guide: http://spark.apache.org/docs/latest/graphx-programming-guide.html

Page 51: Functional manipulations of large data graphs 20160601

Attributions

• Linking Open Data cloud diagram by Richard Cyganiak and Anja Jentzsch, used under a CC license: http://lod-cloud.net/

Page 52: Functional manipulations of large data graphs 20160601

This work is Copyright © 2015 David Hyland-Wood.
It is licensed under the Creative Commons Attribution 3.0 Unported License.
Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.