Enabling Linked Data in Open PHACTS
by Hugh Williams, Professional Services, OpenLink Software
OpenLink Company Overview
OpenLink Software is a privately held company founded in 1992 by its President & CEO, Kingsley Idehen. The company is an industry-acclaimed technology innovator in the following areas:
• ODBC, JDBC, ADO.NET, and OLE-DB compliant Data Access Drivers for Oracle, SQL Server, Informix, Ingres, Sybase, Progress, MySQL, and PostgreSQL
• High-Performance & Scalable Multi-Model (Relational & Graph) Database Technology
• Data Integration Middleware (Data Virtualization Technology across a wide variety of Protocols & Formats)
• Web Application Server Technology
• Linked Data Deployment & Management
• Socially-enhanced Distributed Collaborative Application Platforms (Weblogs, Wikis, Feed Aggregation and Syndication, Web File Systems, Discussion Forums, etc.)
• Identity Management
Virtuoso Product Architecture Diagram
A high-performance, scalable, secure, and operating-system-independent server designed to handle contemporary challenges associated with data access, data integration, and data management.
Other Life Science Linked Data projects hosted in Virtuoso
Original contributor to and supporter of the Linked Open Data movement
• Bio2RDF
• NeuroCommons
• UniProt
• ChEMBL
• Others …
OpenLink's main contributions to the Open PHACTS project
Hosting provider for the Open PHACTS platform
– Initial setup and hosting of the Open PHACTS public API service
– 24/7 maintenance of the platform
Continuing as hosting provider for the Open PHACTS platform for the Foundation since Q4 2014
Open PHACTS Virtuoso RDF Database Stats
API    # Triples       # of Buffers   Buffers in Use   File Size (pages)
v1.4   2,288,155,489   4,000,000      1,731,998        7,992,320
v1.5   2,729,734,901   4,000,000      2,938,504        11,952,128
v2.0   2,937,381,162   4,000,000      1,890,390        12,496,896
• 48GB RAM per API instance
• 16 CPUs (Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz)
• Disk Storage - SSD Drives
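To relate the stats above to the 48GB of RAM per instance, the figures can be converted into approximate sizes. This is a minimal sketch, assuming one 8 KB database page per buffer and ignoring per-buffer bookkeeping overhead; the `ReleaseStats` type and helper names are hypothetical, not part of any Virtuoso API.

```python
from dataclasses import dataclass

PAGE_BYTES = 8 * 1024  # assumption: one 8 KB page per buffer and per file page


@dataclass
class ReleaseStats:
    api: str
    triples: int
    buffers: int          # buffers configured for the page cache
    buffers_in_use: int   # buffers actually holding pages
    file_pages: int       # database file size in pages


def gib(pages: int) -> float:
    """Convert a page count to GiB under the 8 KB page assumption."""
    return pages * PAGE_BYTES / 2**30


def summarise(s: ReleaseStats) -> dict:
    return {
        "api": s.api,
        "cache_gib": round(gib(s.buffers), 1),
        "file_gib": round(gib(s.file_pages), 1),
        # share of the configured cache that is occupied
        "cache_used_pct": round(100 * s.buffers_in_use / s.buffers, 1),
        # rough share of the database file resident in memory
        "resident_pct": round(100 * s.buffers_in_use / s.file_pages, 1),
    }


STATS = [
    ReleaseStats("v1.4", 2_288_155_489, 4_000_000, 1_731_998, 7_992_320),
    ReleaseStats("v1.5", 2_729_734_901, 4_000_000, 2_938_504, 11_952_128),
    ReleaseStats("v2.0", 2_937_381_162, 4_000_000, 1_890_390, 12_496_896),
]

if __name__ == "__main__":
    for s in STATS:
        print(summarise(s))
```

Under these assumptions the 4,000,000 buffers come to roughly 30.5 GiB of page cache, while the database files range from about 61 GiB (v1.4) to about 95 GiB (v2.0), so only about 15-25% of each file can be resident at once. This is why keeping the working set in memory matters so much.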
Challenges
The RDF Tax
– Complex analytical queries touching large parts of the data
– Database working set in memory critical to performance
– Query optimisation critical to performance
– The API has become more complex, so the generated queries must be optimal for execution
– Poor performance stems from non-optimal queries or query plans
– A join for every attribute
• Vectored execution & the right query plans help
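The "join for every attribute" cost can be illustrated with a toy triple table. This sketch (all IRIs, data and function names are hypothetical) shows that fetching n attributes of an entity needs n + 1 triple patterns and therefore n self-joins on `?s`. It is a naive evaluation for illustration only, not how Virtuoso executes queries.

```python
from collections import defaultdict
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for illustration

Triple = tuple[str, str, str]


def sparql_for_attributes(cls: str, attrs: list[str]) -> str:
    """Build a star query: one triple pattern per requested attribute, all on ?s."""
    patterns = [f"?s a <{cls}> ."]
    patterns += [f"?s <{p}> ?v{i} ." for i, p in enumerate(attrs)]
    return "SELECT * WHERE {\n  " + "\n  ".join(patterns) + "\n}"


def scan(triples: list[Triple], pred: str, obj: Optional[str] = None) -> dict[str, list[str]]:
    """One pass over the triple table, indexing matching objects by subject."""
    index: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == pred and (obj is None or o == obj):
            index[s].append(o)
    return index


def evaluate(triples: list[Triple], cls: str, attrs: list[str]) -> tuple[list[dict], int]:
    """Naively evaluate the star query, counting the joins on ?s it needs."""
    solutions = [{"s": s} for s in scan(triples, RDF_TYPE, cls)]
    joins = 0
    for i, p in enumerate(attrs):
        index = scan(triples, p)  # another access to the same triple table
        solutions = [
            {**sol, f"v{i}": o}
            for sol in solutions
            for o in index.get(sol["s"], [])
        ]
        joins += 1  # each attribute adds one self-join on ?s
    return solutions, joins


DATA: list[Triple] = [
    (EX + "c1", RDF_TYPE, EX + "Compound"),
    (EX + "c1", EX + "name", "aspirin"),
    (EX + "c1", EX + "formula", "C9H8O4"),
    (EX + "c2", RDF_TYPE, EX + "Compound"),
    (EX + "c2", EX + "name", "caffeine"),
]
```

Each additional attribute adds another pass over the triple table and another join. A relational row store would return the same attributes from a single row, which is the "RDF tax" referred to above.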
Platform Monitoring
– Monitoring of the multiple components that make up the platform
– Zabbix SNMP monitoring tool integrated to check that services stay online
– Alerts sent to the sysadmin and others to automatically notify them of failures
– Required to help meet the 90% uptime SLA
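The 90% uptime SLA translates into a concrete downtime budget, which is what the monitoring and alerting ultimately protect. Below is a minimal sketch of that arithmetic. The function names are hypothetical, and in practice the outage durations would come from the monitoring system, which is not modelled here.

```python
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(uptime_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget implied by an uptime percentage over a period."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period_hours * (100 - uptime_pct) / 100


def sla_met(outage_minutes: list[float], uptime_pct: float, period_hours: float) -> bool:
    """True if the recorded outages (e.g. from monitoring alerts) fit the budget."""
    return sum(outage_minutes) / 60 <= allowed_downtime_hours(uptime_pct, period_hours)


if __name__ == "__main__":
    print(allowed_downtime_hours(90))           # 876.0 hours per 365-day year
    print(allowed_downtime_hours(90, 30 * 24))  # 72.0 hours per 30-day month
```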
Recent EU FP7 Projects
LOD2 – Linked Open Data large-scale integration project
LDBC – Transparency and relevance for graph DB and RDF performance
GeoKnow – GeoData is everywhere: how to carry the planet in your pocket
Fusepool P3 – Platform for improving publishing and reuse of data
HOBBIT – Developing a platform for holistic benchmarking of Big Linked Data
Pharma & Life Science Customers
Agrimetrics **
DNA Drug Bank, Japan
Bayer Crop Science
Eli Lilly **
Johnson & Johnson
Novartis **
Roche
Sanofi **
Syngenta **
UCB
Enterprise Customers
Bank of America
BAE Systems
Bloomberg
Booz Allen Hamilton
Capita
Daimler Benz
Data.gov (US)
Deutsche Bank
Elsevier
European Commission
French National Library
Fluor Enterprises, Inc.
Fujitsu
GNOSS
Northrop Grumman
Ordina
RTI
Samsung
St. Jude Medical
and many more …
What can we learn from others
Make public datasets available for public download as part of the Linked Open Data Cloud.
Provide a SPARQL endpoint in addition to the API, enabling direct querying of the data by experts and exposure as 5-star Linked Data.
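A public SPARQL endpoint is queried over plain HTTP following the W3C SPARQL 1.1 Protocol. The sketch below uses only the Python standard library. The endpoint URL is a placeholder, not the Open PHACTS endpoint, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://sparql.example.org/sparql"  # hypothetical endpoint URL


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol 'query via GET': the URL-encoded query goes in ?query=."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query and return the flattened rows (needs network access)."""
    with urllib.request.urlopen(build_query_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(json.load(resp))
```

For example, `run_query(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")` would return up to five rows such as `{"s": "http://..."}` from a live endpoint.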
Provide a Faceted Search & Find service to enable discovery of the data available in the RDF quad store.
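Faceted discovery starts from a text match and lets the user narrow the result set by predicate and value, with counts shown for each option. Below is a toy, in-memory sketch of those steps. The data, prefixes and function names are hypothetical; a real service would compute these counts with queries against the quad store.

```python
from collections import Counter, defaultdict

Triple = tuple[str, str, str]


def text_search(triples: list[Triple], text: str) -> set[str]:
    """Subjects with any object value containing `text` (case-insensitive)."""
    needle = text.lower()
    return {s for s, _, o in triples if needle in o.lower()}


def predicate_facets(triples: list[Triple], subjects: set[str]) -> dict[str, int]:
    """For each predicate, the number of distinct result subjects that use it."""
    users: dict[str, set[str]] = defaultdict(set)
    for s, p, _ in triples:
        if s in subjects:
            users[p].add(s)
    return {p: len(ss) for p, ss in users.items()}


def value_facets(triples: list[Triple], subjects: set[str], predicate: str) -> Counter:
    """Value counts for one predicate across the result subjects."""
    return Counter(o for s, p, o in triples if s in subjects and p == predicate)


def refine(triples: list[Triple], subjects: set[str], predicate: str, value: str) -> set[str]:
    """Narrow the result set to subjects with predicate = value."""
    return {s for s, p, o in triples if s in subjects and p == predicate and o == value}


DATA: list[Triple] = [
    ("ex:c1", "rdfs:label", "aspirin tablet"),
    ("ex:c1", "ex:category", "drug"),
    ("ex:c2", "rdfs:label", "aspirin analogue"),
    ("ex:c2", "ex:category", "compound"),
    ("ex:c3", "rdfs:label", "caffeine"),
    ("ex:c3", "ex:category", "drug"),
]
```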
Dynamic transformation of structured & unstructured data sources to RDF, in transient or persistent form, using the Virtuoso “Sponger” middleware.
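The Sponger's own cartridges are not shown here. As a hedged illustration of the general idea, this sketch maps a structured source (CSV) to RDF in N-Triples form. The base IRI, vocabulary namespace and column-to-predicate mapping are assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/compound/"  # hypothetical base IRI for subjects
VOCAB = "http://example.org/vocab#"    # hypothetical vocabulary namespace


def escape_literal(value: str) -> str:
    """N-Triples escaping for backslash, double quote and line breaks."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str, key: str) -> list[str]:
    """One subject per row (from the `key` column), one triple per other non-empty column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{quote(row[key], safe='')}>"
        for column, value in row.items():
            if column == key or value == "":
                continue
            predicate = f"<{VOCAB}{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines
```

Returning the lines to the caller loosely corresponds to the transient case. Loading them into the store would make the result persistent.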
ODBC compliance enables client applications (e.g., Microsoft Access) to be used as front-ends for Virtuoso, third-party RDBMS engines, and the Linked Open Data Cloud hosted on the World Wide Web.
Attribute- & Role-Based Access Controls (ABAC & RBAC) for authentication & authorisation to protected resources.
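The difference between the two access-control styles can be sketched in a few lines. RBAC grants actions through roles, while ABAC decides from attributes of the user and the resource. Everything here is hypothetical, including the role table, the `allowed_orgs` rule and the policy that both checks must pass. Virtuoso's own ACL model is not represented.

```python
from dataclasses import dataclass, field

# Hypothetical role -> permitted actions mapping (the RBAC part)
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "reader": {"read"},
    "curator": {"read", "write"},
}


@dataclass
class Subject:
    name: str
    roles: set[str]
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Resource:
    iri: str
    attributes: dict[str, set[str]] = field(default_factory=dict)


def rbac_allows(subject: Subject, action: str) -> bool:
    """Role-based check: does any of the subject's roles grant the action?"""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in subject.roles)


def abac_allows(subject: Subject, resource: Resource) -> bool:
    """Attribute-based check (hypothetical rule): the subject's organisation
    must be listed in the resource's allowed_orgs attribute."""
    org = subject.attributes.get("org")
    return org is not None and org in resource.attributes.get("allowed_orgs", set())


def is_authorised(subject: Subject, resource: Resource, action: str) -> bool:
    # Assumed combining policy: both the role check and the attribute check must pass.
    return rbac_allows(subject, action) and abac_allows(subject, resource)
```

Authentication (establishing who the subject is) happens before this step and is out of scope for the sketch.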
Virtuoso Authentication Layer (VAL)
– Digest via SQL Accounts
– OAuth Protocol
– Web-ID Protocol
– Web-ID with Delegation
– LDAP
– Kerberos
– Social media SSO services, e.g., Facebook and LinkedIn
– Graph Groups
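Of the VAL methods listed, HTTP Digest is compact enough to sketch. The code follows the RFC 2617 computation for `qop=auth`, with a hypothetical in-memory dictionary standing in for the SQL accounts. It is not VAL's implementation, and nonce validity and replay checks are omitted.

```python
import hashlib
import hmac

# Hypothetical stand-in for the SQL account table: user -> password
ACCOUNTS = {"alice": "s3cret"}


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user: str, realm: str, password: str, method: str, uri: str,
                    nonce: str, nc: str, cnonce: str, qop: str = "auth") -> str:
    """RFC 2617 request-digest for qop=auth."""
    ha1 = _md5_hex(f"{user}:{realm}:{password}")  # secret part, derived from the account
    ha2 = _md5_hex(f"{method}:{uri}")             # request part
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def verify(client_response: str, user: str, realm: str, method: str, uri: str,
           nonce: str, nc: str, cnonce: str, qop: str = "auth") -> bool:
    """Server side: recompute the digest from the stored password and compare.
    Nonce freshness and nc replay checks are omitted in this sketch."""
    password = ACCOUNTS.get(user)
    if password is None:
        return False
    expected = digest_response(user, realm, password, method, uri, nonce, nc, cnonce, qop)
    return hmac.compare_digest(expected, client_response)
```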
Structure-aware RDF, using Characteristic Sets to bring RDF performance to about 90% of SQL.
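The characteristic set of a subject is the set of predicates it uses, and subjects sharing one can be laid out like rows of the same table. The sketch below groups a toy triple set by characteristic set, lays each group out as a table, and uses the groups to estimate the result size of a star query. It illustrates the concept only. How Virtuoso stores and exploits characteristic sets is not shown, and the data and function names are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def characteristic_sets(triples: list[Triple]) -> dict[frozenset, list[str]]:
    """Map each characteristic set (frozenset of predicates) to its subjects."""
    preds: dict[str, set[str]] = defaultdict(set)
    for s, p, _ in triples:
        preds[s].add(p)
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for s, ps in preds.items():
        groups[frozenset(ps)].append(s)
    return dict(groups)


def as_tables(triples: list[Triple], groups: dict[frozenset, list[str]]) -> dict[tuple, list[tuple]]:
    """One table per characteristic set: a row per subject, a column per predicate.
    Assumes single-valued predicates to keep the example short."""
    values: dict[str, dict[str, str]] = defaultdict(dict)
    for s, p, o in triples:
        values[s][p] = o
    tables = {}
    for cs, subjects in groups.items():
        cols = tuple(sorted(cs))
        tables[cols] = [(s, *(values[s][c] for c in cols)) for s in subjects]
    return tables


def estimate_star(groups: dict[frozenset, list[str]], query_preds: set[str]) -> int:
    """Subjects matching a star query: those whose set contains all query predicates."""
    q = frozenset(query_preds)
    return sum(len(subs) for cs, subs in groups.items() if q <= cs)


DATA: list[Triple] = [
    ("c1", "name", "aspirin"),
    ("c1", "formula", "C9H8O4"),
    ("c2", "name", "caffeine"),
    ("c2", "formula", "C8H10N4O2"),
    ("t1", "name", "PTGS1"),
]
```

Once subjects are grouped this way, a star query over `name` and `formula` can read one table row per subject instead of joining once per attribute.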
In Summary …
Publish datasets as part of the LOD Cloud
Provide a public SPARQL endpoint
Faceted search for data discovery
Dynamic transformation of data to RDF
Querying of RDF data with traditional client/server apps
Authentication & authorisation (ABAC & RBAC)
Structure-aware RDF bringing performance on “par” with SQL
Additional Information
OpenLink Software
– OpenLink Software - www.openlinksw.com
– OpenLink Virtuoso - virtuoso.openlinksw.com
– Universal Data Access - uda.openlinksw.com
Social Media Data Spaces
– http://virtuoso.openlinksw.com/blog/ (weblog)
– https://plus.google.com/112399767740508618350/posts (Google+)
– https://twitter.com/OpenLink (Twitter)
– http://www.linkedin.com/company/openlink-software (LinkedIn)
– Hashtag: #LinkedData (Anywhere)