[Figure: Linking Open Data cloud diagram, "Linked Datasets as of August 2014". Datasets are grouped by domain: Linguistics, Social Networking, Life Sciences, Cross-Domain, Government, User-Generated Content, Publications, Geographic and Media. Labelled datasets include DBpedia, Freebase, YAGO, GeoNames, LinkedGeoData, UniProt and Bio2RDF.]
Functional Manipulation of Large Data Graphs
David Hyland-Wood
@prototypo 1 June 2016
Something --a relationship--> Something else

UQ --is a--> University
UQ --label--> "The University of Queensland"
UQ --affiliation--> Group of 8
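To make the triple idea concrete, here is a minimal plain-Scala sketch of the UQ facts above. It is not code from the talk. Each fact is stored as a subject/predicate/object value. The short names are hypothetical stand-ins for full IRIs: "isA" would be rdf:type, and "affiliation" would be http://dbpedia.org/ontology/affiliation.

```scala
// Minimal in-memory sketch of the RDF data model: every fact is a
// (subject, predicate, object) triple. Short names are hypothetical
// stand-ins for full IRIs.
object TripleSketch {
  final case class Triple(subject: String, predicate: String, obj: String)

  val triples: List[Triple] = List(
    Triple("UQ", "isA", "University"),
    Triple("UQ", "label", "The University of Queensland"),
    Triple("UQ", "affiliation", "Group of 8")
  )

  // All (predicate, object) pairs that describe the given subject.
  def describe(subject: String): List[(String, String)] =
    triples.filter(_.subject == subject).map(t => (t.predicate, t.obj))
}
```

Adding a new fact is just appending another triple to the list.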
We’ve Seen This Before
08 Oct 2007
The RDF Data Model
Standard serialisation formats:
• Turtle, TriG, N-Triples, N-Quads (the Turtle family of RDF formats)
• JSON-LD
• RDFa
• RDF/XML
Possibly lossy alternatives:
• CSV
• OData
• etc.
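N-Triples is the simplest member of the Turtle family. It writes one triple per line as `<subject> <predicate> <object> .`. The sketch below is a hypothetical helper, not a library API. It covers only IRIs and plain or language-tagged literals. It escapes only backslashes, double quotes, newlines and carriage returns. Blank nodes and datatyped literals are left out.

```scala
// Hypothetical, minimal N-Triples writer: IRIs and plain or
// language-tagged literals only (no blank nodes, no datatypes).
object NTriplesSketch {
  sealed trait Term
  final case class Iri(value: String) extends Term
  final case class Literal(lexical: String, lang: Option[String] = None) extends Term

  // Escape backslash first so later escapes are not doubled.
  private def escape(s: String): String =
    s.replace("\\", "\\\\")
      .replace("\"", "\\\"")
      .replace("\n", "\\n")
      .replace("\r", "\\r")

  def render(t: Term): String = t match {
    case Iri(v)                => "<" + v + ">"
    case Literal(lex, None)    => "\"" + escape(lex) + "\""
    case Literal(lex, Some(l)) => "\"" + escape(lex) + "\"@" + l
  }

  // One N-Triples statement, terminated by " .".
  def line(s: Iri, p: Iri, o: Term): String =
    render(s) + " " + render(p) + " " + render(o) + " ."
}
```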
Source page: https://en.wikipedia.org/wiki/University_of_Queensland
$ curl http://dbpedia.org/page/University_of_Queensland    # HTML
$ curl http://dbpedia.org/data/University_of_Queensland    # RDF in XML (Yuck!)
$ curl http://dbpedia.org/data/University_of_Queensland.n3 > University_of_Queensland.n3
Many formats are available, e.g. sane RDF, OData, Microdata, JSON…
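An alternative to choosing a format by URL suffix is HTTP content negotiation, which means asking for Turtle in the Accept header. This is a sketch only. It assumes Java 11+ (java.net.http). It also assumes that DBpedia honours the Accept header on its /resource/ URIs and redirects to a Turtle document. Check the response status and content type before relying on it.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object ConnegSketch {
  // Build a GET request that asks for Turtle via content negotiation.
  def turtleRequest(resource: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(resource))
      .header("Accept", "text/turtle")
      .GET()
      .build()

  // Send it, following redirects (NORMAL never downgrades HTTPS to HTTP).
  // Assumption: the server answers with Turtle for this Accept header.
  def fetch(req: HttpRequest): String = {
    val client = HttpClient.newBuilder()
      .followRedirects(HttpClient.Redirect.NORMAL)
      .build()
    client.send(req, HttpResponse.BodyHandlers.ofString()).body()
  }
}
```

For example, `ConnegSketch.fetch(ConnegSketch.turtleRequest("http://dbpedia.org/resource/University_of_Queensland"))` would return the response body as a string.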
UQ --label--> "The University of Queensland"
UQ --affiliation--> Group of 8
UQ --number of undergraduate students--> 34228
UQ --number of students--> 48771
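Values such as 34228 are literals: they are the object of a triple, not another node. In real RDF such a literal would typically carry an XML Schema datatype. The sketch below simplifies this. It keeps UQ's properties as a hypothetical predicate-to-string map, which allows only one value per predicate. It then parses numeric values on demand. A missing property and a non-numeric value both give None.

```scala
object LiteralSketch {
  // UQ's outgoing edges from the diagram, as predicate -> object.
  // Short predicate names are hypothetical stand-ins for DBpedia IRIs.
  val uq: Map[String, String] = Map(
    "label" -> "The University of Queensland",
    "affiliation" -> "Group of 8",
    "numberOfUndergraduateStudents" -> "34228",
    "numberOfStudents" -> "48771"
  )

  // Parse an integer-valued property; None if absent or not an integer.
  // (toIntOption needs Scala 2.13+.)
  def intProperty(props: Map[String, String], key: String): Option[Int] =
    props.get(key).flatMap(_.toIntOption)
}
```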
    # G8 universities ordered by the number of students
    # at each university.
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    select ?name ?students ?undergrads
    where {
      ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
      ?s rdfs:label ?name .
      OPTIONAL { ?s dbo:numberOfStudents ?students }
      OPTIONAL { ?s dbo:numberOfUndergraduateStudents ?undergrads }
      FILTER ( lang(?name) = "en" )
    }
    ORDER BY DESC(?students)
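To show what each clause of the query does, here is a hypothetical in-memory sketch over a list of triples. It is not how a SPARQL engine works, and it is not DBpedia's data.
- The affiliation pattern selects members.
- FILTER keeps English labels.
- OPTIONAL becomes Option.
- ORDER BY DESC sorts unbound counts last, as SPARQL does.

One simplification: each OPTIONAL takes at most the first value, where SPARQL would produce one row per value.

```scala
object G8QuerySketch {
  // A triple whose object may carry a language tag (None for IRIs/numbers).
  final case class Triple(s: String, p: String, o: String, lang: Option[String] = None)
  final case class Row(name: String, students: Option[Int], undergrads: Option[Int])

  def query(data: Seq[Triple], go8: String): Seq[Row] = {
    def objs(subj: String, pred: String): Seq[Triple] =
      data.filter(t => t.s == subj && t.p == pred)
    // ?s dbo:affiliation <Go8>
    val members = data.filter(t => t.p == "affiliation" && t.o == go8).map(_.s).distinct
    val rows = for {
      s    <- members
      // ?s rdfs:label ?name . FILTER ( lang(?name) = "en" )
      name <- objs(s, "label").filter(_.lang.contains("en")).map(_.o)
    } yield Row(
      name,
      // OPTIONAL: unbound becomes None (first value only, a simplification)
      objs(s, "numberOfStudents").headOption.flatMap(_.o.toIntOption),
      objs(s, "numberOfUndergraduateStudents").headOption.flatMap(_.o.toIntOption))
    // ORDER BY DESC(?students): None sorts below Some, so it ends up last.
    rows.sortBy(_.students)(Ordering[Option[Int]].reverse)
  }
}
```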
OpenStreetMap
Wikimedia Commons
DBpedia
US EPA RCRA
US EPA FRS
ABT Associates
[Figure: the Group of Eight as a graph. UQ, ANU, Monash, UMelbourne, UNSW, USydney and UAdelaide are linked to the Go8 node by affiliation and memberOf edges, and each node carries a label: The University of Queensland, Australian National University, Monash University, University of Melbourne, University of NSW, University of Sydney, University of Adelaide, Group of 8.]
[Figure: the same seven universities, each shown with its affiliation edge.]
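The GraphX code that follows builds its graph from vertexRDD and edgeRDD, which the slides do not show. One plausible way to derive them from triples is sketched below in plain Scala. It is a hypothetical helper, not the talk's loading code.
- Each distinct IRI gets a Long vertex ID, since GraphX uses Long IDs.
- Each triple becomes a (srcId, dstId, predicate) edge.

In Spark, these collections would then be parallelised with sc.parallelize, with each edge wrapped in GraphX's Edge class. That step is omitted here so the sketch runs without Spark.

```scala
object GraphBuildSketch {
  final case class Triple(s: String, p: String, o: String)

  // Assign each distinct subject/object IRI a Long ID, in first-seen order.
  // A vertexRDD would hold the swapped pairs: (id, IRI).
  def vertexIds(ts: Seq[Triple]): Map[String, Long] =
    ts.flatMap(t => Seq(t.s, t.o)).distinct.zipWithIndex
      .map { case (iri, i) => iri -> i.toLong }
      .toMap

  // Edges as (srcId, dstId, predicate), mirroring Edge(src, dst, attr).
  def edges(ts: Seq[Triple], ids: Map[String, Long]): Seq[(Long, Long, String)] =
    ts.map(t => (ids(t.s), ids(t.o), t.p))
}
```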
Graphs in Scala

    import org.apache.spark.graphx._

    val graph: Graph[String, String] = Graph(vertexRDD, edgeRDD)

    // Keep only the edges whose attribute is the dbo:affiliation property.
    // All vertices are retained; unconnected ones become singleton components.
    val affiliationRelatedSubgraph =
      graph.subgraph(t => t.attr == "http://dbpedia.org/ontology/affiliation")

    // Find connected components of affiliationRelatedSubgraph.
    val ccGraph = affiliationRelatedSubgraph.connectedComponents()
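GraphX's connectedComponents labels every vertex with the lowest vertex ID in its connected component, treating edges as undirected. GraphX computes this in parallel. The loop below is only a sequential, single-machine illustration of the same result, using a hypothetical helper name.
- Each vertex starts with its own ID as its label.
- Each edge repeatedly pushes the smaller label to both endpoints.
- The loop stops when no label changes.

```scala
object ConnectedComponentsSketch {
  // Returns vertexId -> lowest vertex ID reachable through undirected edges.
  def components(vertices: Seq[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      for ((a, b) <- edges) {
        val m = math.min(labels(a), labels(b))
        if (labels(a) != m) { labels = labels.updated(a, m); changed = true }
        if (labels(b) != m) { labels = labels.updated(b, m); changed = true }
      }
    }
    labels
  }
}
```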
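    import scala.collection.mutable
    import scala.collection.mutable.ListBuffer

    // Map each component ID (the lowest vertex ID in the component)
    // to the vertex IDs that belong to it.
    val componentLists = mutable.HashMap[VertexId, ListBuffer[VertexId]]()

    // Create a hashmap of componentLists. collect() brings the joined pairs
    // to the driver: mutating a driver-side map inside RDD.foreach is not
    // guaranteed to work, per the Spark programming guide.
    affiliationRelatedSubgraph.vertices
      .leftJoin(ccGraph.vertices) { case (id, u, comp) => comp.get }
      .collect()
      .foreach { case (id, startingNode) =>
        if (!componentLists.contains(startingNode)) {
          componentLists(startingNode) = new ListBuffer[VertexId]
        }
        componentLists(startingNode) += id
      }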
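Once the (vertexId, componentId) pairs are on the driver, the mutable map can be replaced by an immutable groupBy. The plain-Scala sketch below uses hypothetical names. It also applies the report's "more than one member" filter. On a very large graph, the grouping could instead stay distributed, for example by keying an RDD by component ID and using groupByKey.

```scala
object ComponentListsSketch {
  // (vertexId, componentId) pairs -> componentId -> member vertex IDs,
  // preserving the input order within each component.
  def componentLists(pairs: Seq[(Long, Long)]): Map[Long, Seq[Long]] =
    pairs.groupBy(_._2).map { case (comp, ps) => comp -> ps.map(_._1) }

  // Keep only components with more than one member, as the report does.
  def nonTrivial(lists: Map[Long, Seq[Long]]): Map[Long, Seq[Long]] =
    lists.filter { case (_, members) => members.size > 1 }
}
```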
    // Output a report on the connected components, skipping singletons.
    // labelMap (VertexId -> label) comes from the talk's full code; it is
    // not shown on the slide.
    println("------ connected components in related triples ------\n")
    for ((component, componentList) <- componentLists) {
      if (componentList.size > 1) {
        for (c <- componentList) {
          println(labelMap(c))
        }
        println("--------------------------")
      }
    }
------ connected components in related triples ------

Australian National University
University of Sydney
University of Adelaide
University of New South Wales
--------------------------
The University of Queensland
University of Melbourne
Monash University
--------------------------
Resources
• Slides: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/slides
• Code: http://w3id.org/people/prototypo/talks/UQ-DKE-20160601/code
• Callimachus: http://callimachusproject.org
• Apache Spark: http://spark.apache.org
• GraphX Programming Guide: http://spark.apache.org/docs/latest/graphx-programming-guide.html
Attributions
• Linking Open Data cloud diagram by Richard Cyganiak and Anja Jentzsch, used under a CC license: http://lod-cloud.net/
This work is Copyright © 2015 David Hyland-Wood. It is licensed under the Creative Commons Attribution 3.0 Unported License. Full details at: http://creativecommons.org/licenses/by/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.