Using Neo4j for exploring the research graph connections made by RD-Switchboard
Dr. Amir Aryani (ANDS), and Dr. Jingbo Wang (NCI)
October 2016
Agenda
• Background: RD-Switchboard & Research Graph
• Neo4j: Queries
• NCI: Graph connections made by RD-Switchboard using NCI’s metadata
• Q & A
Data Description Registry Interoperability (DDRI) Working Group
Research Data Alliance
Goal: enabling cross-platform discovery between research data infrastructures
Precipitous Growth
RDA Launch / First Plenary, March 2013, Gothenburg, Sweden
• 240 participants
• First Working Groups and Interest Groups
RDA Second Plenary, September 2013, Washington, DC, USA
• 380 participants from 22 countries
• First “neutral space” community meeting (Data Citation Summit)
• First Organizational Partner Meet-up
• First BOFs
RDA Third Plenary, March 2014, Dublin, Ireland
• 497 participants from 32 countries
• First Organizational Assembly
• 6 co-located events
• 14 BOF, 12 Working Groups, 22 Interest Groups
RDA Fourth Plenary, September 2014, Amsterdam, Netherlands
• 550 participants from 40 countries
• 1st RDA Deliverables presented
• Organizational Assembly and first OAB / Council meeting
• 10 co-located events
• 11 BOF, 14 Working Groups, 36 Interest Groups
RDA Fifth Plenary, March 2015, San Diego, CA, USA
• 383 participants from 30 countries
• 2nd RDA Deliverables presented
• Organizational Assembly / Council meetings
• 1st Adoption Day & Large scale data projects meeting
• 10 BOF, 10 Working Groups, 20 Interest Groups; 10 joint Sessions; 4 thematic Plenary Sessions
Research Data Alliance
June 2016: close to 4,200 members from 110 countries
DDRI WG Approach
Connecting datasets on the basis of co-authorship or other collaboration models such as joint funding and grants.
https://researchdata.ands.org.au/idmm-immunome-database-for-marsupials-and-monotremes/11139
http://dx.doi.org/10.1371/journal.pone.0079092
One of the 105 articles …
doi:10.5061/dryad.4qq0v
Authors: Wong ESW, Nichol S, Warren WC, Belov K
Dryad Dataset
http://datadryad.org/resource/doi:10.5061/dryad.4qq0v
More info
http://researchgraph.org/schema
https://github.com/researchgraph/schema
https://github.com/rd-switchboard/Inference
Neo4j Queries
1. Find a Dataset
2. Find a Publication
3. Find a Grant
4. Find a Researcher
5. Find links to ORCID
6. Find datasets that have DOI
7. Find DOIs using prefix
8. Find highly connected datasets
9. Connections with multiple degrees of separation
10. Find shortest path between two researchers
Find a Dataset
match (n:dataset) where n.doi='10.5524/100166' return n
match (n:dataset) where n.title='The genome of the Australian dragon lizard Pogona vitticeps' return n
Find a Publication
match (n:publication) where n.doi='10.5170/CERN-2014-008.181' return n
match (n:cern:publication) where n.title='LHC Results - Highlights' return n
Find a Grant
match (n:grant) where n.purl='purl.org/au-research/grants/arc/LP0991658' return n
match (n:grant) where n.title='Hyper-accumulations of monosulfidic sediments' return n
Find a Researcher
match (n:researcher) where n.scopus_id='37071260700' return n
match (n:researcher) where n.orcid='0000-0002-7875-2902' return n
match (n:researcher) where n.last_name='Rajiah' and n.first_name='Kingston' return n
Find links to ORCID
match (n:dataset:dryad)--(o:orcid) return count(n)
match (n:dataset:ands)--(o:orcid) where n.ands_group='The University of Sydney' return n limit 10
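Items 6 and 7 of the query list (datasets that have a DOI, and DOIs by prefix) could be sketched as follows. This is a minimal sketch, assuming the `doi` property used in the queries above holds the bare DOI string; the prefix `10.5061` is the one seen in the Dryad DOI `10.5061/dryad.4qq0v`.

```cypher
// Assumption: a dataset without a DOI simply has no doi property.
match (n:dataset) where n.doi is not null return n.key, n.doi limit 25;

// Assumption: DOIs are stored without a resolver prefix such as http://dx.doi.org/.
match (n:dataset) where n.doi starts with '10.5061' return n.doi limit 25;
```

If DOIs are stored as full URLs instead, `contains` would be needed in place of `starts with`.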
Find Highly Connected Datasets
match (n:ands:dataset)--(x) return n.key, n.title, count(x) order by count(x) DESC limit 25
Connections with Multiple Degrees of Separation
match (n:ands:dataset)-[*1..3]-(d:dryad:dataset) return n.title, d.key limit 25
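A possible variation on the query above, which also reports how many hops separate each pair. It is only a sketch: binding the path to `p` and the alias `hops` are illustrative choices, not part of the original slides.

```cypher
// Same pattern as above, but the path is bound to p so its length can be returned.
match p=(n:ands:dataset)-[*1..3]-(d:dryad:dataset)
return n.title, d.key, length(p) as hops
order by hops limit 25
```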
Find Shortest Path Between Two Researchers
MATCH p=shortestPath( (d1:dryad:dataset {doi: '10.5061/dryad.4qq0v'})-[*]-(d2:ands:dataset {doi:'10.1186/1471-2172-12-48'})
) RETURN p
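The query above connects two dataset nodes. The same pattern between two researcher nodes might look like the sketch below. It assumes researchers carry the `orcid` property used earlier; the second ORCID is a placeholder rather than a real record, and the upper bound of 6 hops is an arbitrary choice to keep the search small.

```cypher
// '0000-0000-0000-0000' is a placeholder; substitute the ORCID of the second researcher.
MATCH p=shortestPath(
  (r1:researcher {orcid: '0000-0002-7875-2902'})-[*..6]-(r2:researcher {orcid: '0000-0000-0000-0000'})
) RETURN p
```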
nci.org.au
Research Data Collections: 10 PB+
• CMIP5: 3 PB
• Astronomy (Optical): 200 TB
• Water/Ocean: 1.5 PB
• Atmosphere: 2.4 PB
• Earth Observ.: 2 PB
• Marine Videos: 10 TB
• Geophysics: 300 TB
• Weather: 340 TB
© National Computational Infrastructure 2015
Current research record status
Each individual catalogue record describes a linear relationship among entities:
Researcher A --use--> Data 1 --supported by--> Grant a --generate--> Paper I, II
Researcher B --use--> Data 1 --supported by--> Grant b --generate--> Paper II, III
Researcher B --use--> Data 2 --supported by--> Grant b --generate--> Paper IV
Graph database structure
The relational database is converted and presented as a graph database using Research Data Switchboard (RD-Switchboard):
[Figure: graph whose nodes are Researcher A, Researcher B, Data 1, Data 2, Grant a, Grant b and Papers I, II, III and IV, joined by "use", "supported by" and "generate" edges]
Catalogue system infrastructure
NCI GeoNetwork architecture: http://geonetwork.nci.org.au
RD-Switchboard benefits so far…
• Identify the missing critical metadata entries;
• Identify errors in the catalogue entries;
• Provide an analytical view of how research data has been used so far (highly utilised or underutilised?);
• Evaluate the impact of the datasets, researchers and institutes;
• Encourage the usage of URI, DOI and ORCID, etc.
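The first of these benefits, spotting missing metadata, can be illustrated with queries like the ones below. They are a sketch under assumptions: they reuse the `dataset` and `researcher` labels and the `doi`, `key` and `title` properties from the earlier queries, and how NCI records are actually labelled in the graph is not stated here.

```cypher
// Datasets with no DOI recorded.
match (n:dataset) where n.doi is null return n.key, n.title limit 25;

// Datasets with no directly linked researcher node.
match (n:dataset) where not (n)--(:researcher) return n.key, n.title limit 25;
```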
[Figure: graph of researcher 1, researcher 2, paper 1, paper 2 and a dataset, with further nodes data2, data3, data4 and data5]
Any conflict of interest?
Possible collaboration?
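Questions like these might start from a query that lists pairs of researchers who share a dataset. This is a rough sketch, assuming researchers are linked to datasets directly; in the actual graph the link may run through publications or grants, in which case a variable-length pattern would be needed.

```cypher
// id(r1) < id(r2) keeps each pair once and prevents a researcher pairing with themselves.
match (r1:researcher)--(d:dataset)--(r2:researcher)
where id(r1) < id(r2)
return r1.last_name, r2.last_name, d.title limit 25
```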
eResearch BOF
Tuesday 11 October 2016 / 16:35
BoF: Research Graph: Connecting Researchers, Research Data, Publications and Grants using the Graph Technology