Enabling Linked Data in Open PHACTS by Hugh Williams Professional Services, OpenLink Software



Page 1

Enabling Linked Data in Open PHACTS

by Hugh Williams, Professional Services, OpenLink Software

Page 2

OpenLink Company Overview

OpenLink Software is a privately held company founded in 1992 by its President & CEO, Kingsley Idehen. The company is an industry-acclaimed technology innovator in the following areas:

• ODBC, JDBC, ADO.NET, and OLE-DB compliant Data Access Drivers for Oracle, SQL Server, Informix, Ingres, Sybase, Progress, MySQL, and PostgreSQL
• High-Performance & Scalable Multi-Model (Relational & Graph) Database Technology
• Data Integration Middleware (Data Virtualization Technology across a wide variety of Protocols & Formats)
• Web Application Server Technology
• Linked Data Deployment & Management
• Socially-enhanced Distributed Collaborative Applications Platforms (Weblogs, Wikis, Feed Aggregation and Syndication, Web File Systems, Discussion Forums, etc.)
• Identity Management

Page 3

Virtuoso Product Architecture Diagram

A high-performance, scalable, secure, and operating-system-independent server designed to handle contemporary challenges associated with data access, data integration, and data management.

Page 4

Original contributor to and supporter of the Linked Open Data movement

Page 5

Other Life Science Linked Data projects hosted in Virtuoso

• Bio2RDF
• NeuroCommons
• UniProt
• ChEMBL
• Others …

Page 6

OpenLink main contributions to the Open PHACTS project

• Hosting provider for the Open PHACTS platform
  – Initial setup and hosting of the Open PHACTS public API service
  – 24/7 maintenance of the platform
• Continuing as hosting provider for the Open PHACTS platform for the Foundation since Q4 2014

Page 7

Open PHACTS Virtuoso RDF Database Stats

API     # Triples     # of Buffers   Buffers in Use   File size (pages)
v1.4    2288155489    4000000        1731998          7992320
v1.5    2729734901    4000000        2938504          11952128
v2.0    2937381162    4000000        1890390          12496896

• 48GB RAM per API instance
• 16 CPUs (Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz)
• Disk Storage - SSD Drives
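
The page and buffer counts in the stats table can be turned into approximate sizes. The sketch below assumes Virtuoso's documented 8 KB database page size, with one buffer caching one page; the slides do not state the page size for this deployment, so the results are estimates only. The names `PAGE_BYTES`, `STATS`, `pages_to_gib` and `summarize` are illustrative.

```python
# Assumption (not stated on the slide): 8 KB pages, one page per buffer.
PAGE_BYTES = 8 * 1024
GIB = 1024 ** 3

# (triples, buffers, buffers in use, file size in pages), copied from the stats table.
STATS = {
    "v1.4": (2288155489, 4000000, 1731998, 7992320),
    "v1.5": (2729734901, 4000000, 2938504, 11952128),
    "v2.0": (2937381162, 4000000, 1890390, 12496896),
}


def pages_to_gib(pages):
    """Convert a page (or buffer) count to GiB under the 8 KB assumption."""
    return pages * PAGE_BYTES / GIB


def summarize(stats):
    """Return {api: (file GiB, buffer pool GiB, fraction of buffers in use)}."""
    return {
        api: (pages_to_gib(file_pages), pages_to_gib(buffers), in_use / buffers)
        for api, (_triples, buffers, in_use, file_pages) in stats.items()
    }


if __name__ == "__main__":
    for api, (file_gib, pool_gib, used) in summarize(STATS).items():
        print(f"{api}: file ~{file_gib:.1f} GiB, pool ~{pool_gib:.1f} GiB, "
              f"{used:.0%} of buffers in use")
```

Under that assumption the 4,000,000-buffer pool comes to roughly 30.5 GiB against 48GB of RAM per instance, while the database files come to roughly 61, 91 and 95 GiB for v1.4, v1.5 and v2.0.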

Page 8

Challenges

The RDF Tax
– Complex analytical queries touching large parts of the data
– Keeping the database working set in memory is critical to performance
– Query optimisation is critical to performance
– The API has become more complex, so the queries it generates must be optimal for execution
– Poor performance comes from non-optimal queries or query plans
– Join for every attribute
  • Vectoring & right plans help
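
The "join for every attribute" cost can be illustrated with a toy triple table. This is a minimal sketch, not how Virtuoso executes queries: the data, the `match` and `select` helpers and the join counter are all hypothetical. It shows that reading N attributes of an entity from triples takes one pattern match per attribute and N - 1 joins on the shared subject, where a relational row would be read in one lookup.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("compound:1", "name", "aspirin"),
    ("compound:1", "formula", "C9H8O4"),
    ("compound:1", "mass", 180.16),
    ("compound:2", "name", "caffeine"),
    ("compound:2", "formula", "C8H10N4O2"),
]


def match(triples, predicate):
    """One triple-pattern scan: return {subject: object} for a predicate."""
    return {s: o for s, p, o in triples if p == predicate}


def select(triples, predicates):
    """Join one triple pattern per requested attribute on the shared subject.

    Returns (rows, joins), where rows maps subject -> tuple of values and
    joins counts how many joins were needed.
    """
    rows = None
    joins = 0
    for predicate in predicates:
        bindings = match(triples, predicate)
        if rows is None:
            rows = {s: (o,) for s, o in bindings.items()}
        else:
            # Inner join: keep only subjects that also have this attribute.
            rows = {s: vals + (bindings[s],)
                    for s, vals in rows.items() if s in bindings}
            joins += 1
    return rows, joins


if __name__ == "__main__":
    rows, joins = select(TRIPLES, ["name", "formula", "mass"])
    print(rows, "joins:", joins)
```

Asking for three attributes costs two joins here, and `compound:2` drops out because it has no `mass` triple.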

Page 9

Challenges

Platform Monitoring
– Monitoring of the multiple components that constitute the platform
– Zabbix SNMP tool integrated to monitor services and ensure they stay online
– Alerts sent to sysadmins and others to automatically notify them of failures
– Required to help meet the 90% uptime SLA
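
To make the uptime figure concrete, the arithmetic below converts an uptime percentage into a downtime budget. The function name and the 30-day window are illustrative choices, not part of the SLA as stated on the slide.

```python
def allowed_downtime_hours(uptime_percent, days=30):
    """Hours of downtime permitted over `days` at the given uptime percentage."""
    return (100 - uptime_percent) * days * 24 / 100


if __name__ == "__main__":
    print(allowed_downtime_hours(90))        # budget over a 30-day window
    print(allowed_downtime_hours(90, 365))   # budget over a 365-day year
```

At 90% uptime a 30-day window allows 72 hours of downtime.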

Page 10

Recent EU FP7 Projects

• LOD2 – Linked Open Data large scale integration project
• LDBC - Transparency and Relevance for Graph DB, RDF performance
• GeoKnow - GeoData is everywhere, how to carry the planet in your pocket
• Fusepool P3 – Platform for improving publishing and reuse of data
• HOBBIT – Developing Platform for providing Holistic Benchmarking of Big Linked Data

Page 11

Pharma & Life Science Customers

• Agrimetrics **
• DNA Drug Bank, Japan
• Bayer Crop Science
• Eli-Lilly **
• Johnson & Johnson
• Novartis **
• Roche
• Sanofi **
• Syngenta **
• UCB

Page 12

Enterprise Customers

• Bank of America
• BAE Systems
• Bloomberg
• Booz Allen Hamilton
• Capita
• Daimler Benz
• Data.gov (US)
• Deutsche Bank
• Elsevier
• European Commission
• French National Library
• Fluor Enterprises, Inc.
• Fujitsu
• GNOSS
• Northrop Grumman
• Ordina
• RTI
• Samsung
• St. Jude Medical
• and many more …

Page 13

What can we learn from others

Make public datasets available for public download as part of the Linked Open Data Cloud

Page 14

What can we learn from others

Provide a SPARQL Endpoint in addition to the API, enabling direct querying of the data by experts and exposure as 5-star Linked Data
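
As a rough illustration of what direct querying looks like for a client, the sketch below builds a SPARQL 1.1 Protocol GET request using only the Python standard library. The endpoint URL is a placeholder (the slides do not give one), the query is a generic example, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute a real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Generic example query: fetch any ten triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, QUERY)
    print(req.get_method(), req.full_url)
    # To execute against a real endpoint: urllib.request.urlopen(req)
```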

Page 15

What can we learn from others

Provide a Faceted Search & Find service to enable discovery of the data available in the RDF Quad Store.
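
The idea behind faceted discovery can be sketched as counting distinct subjects per facet value over a set of quads. This is a toy illustration with hypothetical data and helper names, not Virtuoso's Faceted Search service.

```python
from collections import Counter

# Hypothetical quads: (subject, predicate, object, graph).
QUADS = [
    ("ex:aspirin", "rdf:type", "ex:Compound", "g:chem"),
    ("ex:caffeine", "rdf:type", "ex:Compound", "g:chem"),
    ("ex:cox1", "rdf:type", "ex:Target", "g:targets"),
    ("ex:aspirin", "ex:inhibits", "ex:cox1", "g:chem"),
]


def facet_counts(quads, position):
    """Count distinct subjects per value at `position`.

    position: 1 = predicate, 2 = object, 3 = graph.
    """
    seen = {(quad[0], quad[position]) for quad in quads}
    return Counter(value for _subject, value in seen)


def type_facet(quads):
    """Facet on rdf:type: how many distinct subjects each class has."""
    typed = [quad for quad in quads if quad[1] == "rdf:type"]
    return facet_counts(typed, 2)


if __name__ == "__main__":
    print(type_facet(QUADS))
    print(facet_counts(QUADS, 3))
```

A user could then narrow the result set by picking one facet value (for example a class or a graph) and recomputing the counts on the remaining quads.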

Page 16

What can we learn from others

Dynamic Transformation of Structured & Unstructured Data sources to RDF in Transient or Persistent form using Virtuoso “Sponger” Middleware
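
To show what transforming a structured record into RDF means at its simplest, the sketch below turns a flat record into N-Triples lines. It does not use or imitate the Sponger's API; the IRIs, the vocabulary base and the function names are hypothetical, and every value is emitted as a plain string literal.

```python
def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(subject_iri, record, vocab_base):
    """Emit one N-Triples line per field of a flat record."""
    lines = []
    for field, value in record.items():
        predicate = vocab_base + field  # hypothetical vocabulary term
        literal = escape_literal(str(value))
        lines.append(f'<{subject_iri}> <{predicate}> "{literal}" .')
    return lines


if __name__ == "__main__":
    record = {"name": "aspirin", "formula": "C9H8O4"}  # hypothetical record
    for line in record_to_ntriples("http://example.org/compound/1",
                                   record,
                                   "http://example.org/vocab#"):
        print(line)
```

The output can be held only for the duration of a request (transient) or loaded into a store (persistent).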

Page 17

What can we learn from others

ODBC compliance enables use of client applications (e.g. Microsoft Access) as front-ends for Virtuoso, 3rd party RDBMS engines, and the World Wide Web hosted Linked Open Data Cloud.

Page 18

What can we learn from others

Attribute & Role Based Access Controls (ABAC & RBAC) for Authentication & Authorisation to Protected Resources

Virtuoso Authentication Layer (VAL)
– Digest via SQL Accounts
– OAuth Protocol
– Web-ID Protocol
– Web-ID with Delegation
– LDAP
– Kerberos
– Social Media SSO services, e.g. Facebook, LinkedIn, etc.
– Graph Groups

Page 19

What can we learn from others

Structure-aware RDF brings RDF performance to about 90% of SQL using Characteristic Sets
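
A characteristic set is the set of predicates that occur together on a subject; subjects sharing one can be laid out like rows of a table, which avoids a join per attribute. The sketch below shows the general idea on toy data. It is not Virtuoso's implementation, the data and function names are hypothetical, and it assumes each predicate is single-valued per subject.

```python
from collections import defaultdict

# Hypothetical toy triples: (subject, predicate, object).
TRIPLES = [
    ("c1", "name", "aspirin"), ("c1", "mass", "180.16"),
    ("c2", "name", "caffeine"), ("c2", "mass", "194.19"),
    ("t1", "name", "COX-1"), ("t1", "organism", "human"),
]


def characteristic_sets(triples):
    """Group subjects by the set of predicates they carry."""
    predicates = defaultdict(set)
    for s, p, _o in triples:
        predicates[s].add(p)
    groups = defaultdict(list)
    for s, preds in predicates.items():
        groups[frozenset(preds)].append(s)
    return dict(groups)


def as_table(triples, char_set, subjects):
    """Lay out subjects sharing one characteristic set as table-style rows.

    Assumes one value per (subject, predicate) pair.
    """
    columns = sorted(char_set)
    values = {(s, p): o for s, p, o in triples}
    rows = [[s] + [values[(s, p)] for p in columns] for s in subjects]
    return columns, rows


if __name__ == "__main__":
    for char_set, subjects in characteristic_sets(TRIPLES).items():
        print(as_table(TRIPLES, char_set, subjects))
```

Once subjects are grouped this way, fetching several attributes of one subject becomes a single row read within its group.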

Page 20

In Summary …

• Publish dataset as part of LOD
• Provide public SPARQL Endpoint
• Faceted Search for data discovery
• Dynamic Transformation of data to RDF
• Querying of RDF data with traditional client/server apps
• Authentication & Authorisation (ABAC & RBAC)
• Structure-aware RDF bringing performance on “par” with SQL

Page 21

Additional Information

OpenLink Software
– OpenLink Software - www.openlinksw.com
– OpenLink Virtuoso - virtuoso.openlinksw.com
– Universal Data Access - uda.openlinksw.com

Social Media Data spaces
– http://virtuoso.openlinksw.com/blog/ (weblog)
– https://plus.google.com/112399767740508618350/posts (Google+)
– https://twitter.com/OpenLink (Twitter)
– http://www.linkedin.com/company/openlink-software (LinkedIn)
– Hashtag: #LinkedData (Anywhere)