Barcelona Annual Conference
Monday, 10th October 2016

Semantics 101 for Pharma

Tim Williams, UCB Biosciences Inc., USA
tim.williams@ucb.com

Marc Andersen, StatGroup ApS, Denmark


Related PhUSE 2016 Presentations

• Interactive Visualization of Linked Data (Monday, 14:30, Data Visualization)
• Generating Analysis Results and Metadata (Monday, 16:00, Trends and Technology)
• Constructing Interoperable Study Documents From A Semantic Technology-based Repository (Poster)
• CS Discussion Club (Tuesday, 11:00 – 12:30)


Thank you and Enjoy the Conference!


Learning Resources

• PhUSE Wiki “Semantic Technology Working Groups” http://www.phusewiki.org/wiki/index.php?title=Semantic_Technology
• PhUSE Wiki “Semantic Technology Curriculum” http://www.phusewiki.org/wiki/index.php?title=Semantic_Technology_Curriculum
• White papers, publications, presentations.
• “Learning SPARQL” by Bob DuCharme http://www.learningsparql.com/index.html (examples for download)
• Semantic University by Cambridge Semantics http://www.cambridgesemantics.com/semantic-university
• RDF Primer http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/
• CDISC Standards in RDF User Guide v1 Final http://www.cdisc.org/system/files/members/standard/RDF/CDISC%20Standards%20RDF%20User%20Guide%201.0%20Final%202015-07-21.pdf
• Knowledge Engineering with Semantic Web Technologies 2015 https://open.hpi.de/courses/semanticweb2015


Exercises

Due to time constraints and the large number of attendees, we were unable to provide hands-on experience during the session. This section provides exercises and a link to materials so you may try creating and querying Linked Data on your own.

To obtain files for the exercises, go to: http://www.phusewiki.org/wiki/index.php?title=Semantic_Technology_Curriculum

Download the file: PhUSECSS-Semantics101-AttendeeFiles.zip


Introduction to Jena Fuseki

• Apache-Jena – contains the APIs, SPARQL engine, the TDB native RDF database and command line tools (ARQ, RIOT …)
• Apache-Jena-Fuseki – the Jena SPARQL server


Load a File into Fuseki

File: ex001.ttl

@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ct:NCT00799760 css:title "Evaluation of Efficacity…"@en ;
    css:phase "Phase 3"@en ;
    css:enrollment "541"^^xsd:int .

Instructions sent to attendees/available on wiki
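
The upload can also be scripted. The following is a minimal sketch in Python (standard library only) that builds the HTTP request for adding Turtle to a local Fuseki dataset through the SPARQL Graph Store Protocol. The dataset name `test` and port 3030 are taken from the endpoint used in the R example of these slides; the `/data` service path is an assumption based on a default Fuseki configuration, and `build_upload_request` is a hypothetical helper. The request is built but not sent.

```python
from urllib.request import Request

# Assumed endpoint: local Fuseki, dataset "test", Graph Store Protocol service.
FUSEKI_DATA_URL = "http://localhost:3030/test/data"

# Two of the triples from ex001.ttl.
TURTLE = """\
@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ct:NCT00799760 css:phase "Phase 3"@en ;
    css:enrollment "541"^^xsd:int .
"""


def build_upload_request(turtle_text, url=FUSEKI_DATA_URL):
    """Build a POST request that adds Turtle triples to the default graph."""
    return Request(
        url + "?default",
        data=turtle_text.encode("utf-8"),
        headers={"Content-Type": "text/turtle; charset=utf-8"},
        method="POST",
    )


request = build_upload_request(TURTLE)
print(request.get_method(), request.full_url)

# To actually send it (requires a running Fuseki server):
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.status)
```

A POST to `?default` merges the triples into the default graph, so repeating the upload does not remove data that is already in the store.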


Query #1: Getting Started

File: ex002.rq

PREFIX css: <http://www.example.org/CSS/>

SELECT *
WHERE {
  ?s ?p ?o .
} LIMIT 10

See Exercises


Query #2: Graph Pattern for Title

Query

PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>

SELECT ?nctid ?title
WHERE {
  ?nctid css:title ?title .
}

Data

ct:NCT00799760 css:title "Evaluation of Efficacity and Safety…"@en ;

Graph pattern (S, P, O): ?nctid, css:title, ?title


Query for Study Title

File: ex003.rq

PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>

SELECT ?nctid ?title
WHERE {
  ?nctid css:title ?title .
}

See Exercises


Upload another file

File: ex004.TTL

@prefix css: <http://www.example.org/CSS/> .
@prefix ct: <http://bio2rdf.org/clinicaltrials/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ct:NCT00799760 css:title "Evaluation of Efficacity …"@en ;
    css:phase "Phase 3"@en ;
    css:enrollment "541"^^xsd:integer ;
    css:primOutcome css:outcome1 .

css:outcome1 rdf:type ct:primary-outcome ;
    ct:measure "RT-PCR for influenza A virus…"@en ;
    ct:time-frame "2 days" .

See Exercises


Query for Primary Outcome

Data

ct:NCT00799760 css:title "Evaluation of Efficacity …"@en ;
    css:phase "Phase 3"@en ;
    css:enrollment "541"^^xsd:integer ;
    css:primOutcome css:outcome1 .

css:outcome1 rdf:type ct:primary-outcome ;
    ct:measure "RT-PCR for influenza A virus…"@en ;
    ct:time-frame "2 days" .

Graph Query

ct:NCT00799760 css:primOutcome ?outURI .
?outURI ct:measure ?outcome .


SPARQL Query

Retrieve data that matches the Graph Pattern

PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>

SELECT ?outcome
WHERE
{
  ct:NCT00799760 css:primOutcome ?outURI .
  ?outURI ct:measure ?outcome .
}

Graph pattern: NCTID -> primOutcome -> ?outURI -> measure -> ?outcome
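
To make "matching a graph pattern" concrete, here is a toy sketch in Python. It is not how Jena evaluates queries; it only illustrates the idea that each triple pattern is compared with every triple, and that a variable such as `?outURI` must take the same value wherever it appears. The triples are copied from ex004.ttl with literals simplified to plain strings; `match_pattern` and `query` are hypothetical helper names.

```python
# Triples from ex004.ttl, written as (subject, predicate, object) tuples.
TRIPLES = [
    ("ct:NCT00799760", "css:phase", "Phase 3"),
    ("ct:NCT00799760", "css:primOutcome", "css:outcome1"),
    ("css:outcome1", "rdf:type", "ct:primary-outcome"),
    ("css:outcome1", "ct:measure", "RT-PCR for influenza A virus…"),
    ("css:outcome1", "ct:time-frame", "2 days"),
]


def match_pattern(pattern, triple, bindings):
    """Match one triple pattern against one triple.

    Returns the extended bindings, or None if the triple does not match.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must be identical.
            return None
    return result


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph_pattern = [
    ("ct:NCT00799760", "css:primOutcome", "?outURI"),
    ("?outURI", "ct:measure", "?outcome"),
]
for solution in query(graph_pattern, TRIPLES):
    print(solution["?outcome"])
```

The first pattern binds `?outURI` to `css:outcome1`; the second pattern then only matches triples whose subject is that same resource.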


Query for Study Outcome

File: ex005.rq

PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>

SELECT ?outcome
WHERE {
  ct:NCT00799760 css:primOutcome ?outURI .
  ?outURI ct:measure ?outcome .
}

See Exercises


Trial Triples with SPARQL

http://lod.openlinksw.com/sparql

DESCRIBE <http://bio2rdf.org/clinicaltrials:NCT00799760>

ns1:NCT00799760 rdf:type ns2:Resource ,
        ns2:Clinical-Study .
ns1:NCT00799760 ns3:title "Evaluation of Efficacity and Safety of Oseltamivir and Zanamivir"@en .
    ns2:actual-enrollment 541 ;
…AND MUCH MORE….




Query with R

R Packages:
• rrdf
• rrdflibs

http://github.com/egonw/rrdf

Requires Java 7 or higher

Willighagen E. (2014) Accessing biological data in R with semantic web technologies. PeerJ PrePrints 2:e185v3. See https://dx.doi.org/10.7287/peerj.preprints.185v3


File: queryLocalTTL.R

library(rrdf)

# Load the Turtle file into an RDF model
dataSource = load.rdf("<path to the TTL file>/ex004.ttl", format="N3")

query = 'PREFIX css: <http://www.example.org/CSS/>
PREFIX ct: <http://bio2rdf.org/clinicaltrials/>
SELECT ?primaryOutcome
WHERE
{
  ct:NCT00799760 css:primOutcome ?outURI .
  ?outURI ct:measure ?primaryOutcome .
}'

# Run the query against the loaded model and convert the result to a data frame
queryResult = as.data.frame(sparql.rdf(dataSource, query))
queryResult

See Exercises


Query an Endpoint with R

File: queryLocalFuseki.R

library(rrdf)

endpoint = "http://localhost:3030/test/query"
query = "SELECT * WHERE {?s ?p ?o . } LIMIT 10 "

queryResult = sparql.remote(endpoint, query)
queryResult

See Exercises
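
For readers who want to see what travels over the wire, the sketch below (Python, standard library only) builds the same kind of request according to the SPARQL 1.1 Protocol and flattens a JSON result document into rows. `build_query_request` and `rows_from_results` are hypothetical helpers, and the sample response is hand-written for illustration; it is not output captured from a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:3030/test/query"  # same endpoint as the R example
QUERY = "SELECT * WHERE {?s ?p ?o . } LIMIT 10"


def build_query_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of dicts."""
    document = json.loads(results_json)
    variables = document["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in document["results"]["bindings"]
    ]


request = build_query_request(ENDPOINT, QUERY)
print(request.full_url)

# Hand-written sample response, for illustration only.
SAMPLE = """{
  "head": {"vars": ["s", "p", "o"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://bio2rdf.org/clinicaltrials/NCT00799760"},
     "p": {"type": "uri", "value": "http://www.example.org/CSS/phase"},
     "o": {"type": "literal", "xml:lang": "en", "value": "Phase 3"}}
  ]}
}"""
for row in rows_from_results(SAMPLE):
    print(row)
```

To run it against a live server, pass `request` to `urllib.request.urlopen` and feed the decoded response body to `rows_from_results`.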


Query with SAS

SAS Macros:
• %sparqlquery - SPARQL query
• %sparqlupdate - SPARQL update

https://github.com/MarcJAndersen/SAS-SPARQLwrapper

Implementation:
• SAS PROC HTTP to access the service
• Send query/update as text file
• Input result using SAS LIBNAME for XML

Other approaches:
• PROC groovy to execute Java Code from Apache Jena
• SAS Java objects to interface to Apache Jena

Requires a running SPARQL service, for example Apache Jena


File: queryLocalFuseki.sas

Assumptions:
• Service active at endpoint
• TTL file uploaded to store


Query a Remote Source

At: http://lod.openlinksw.com/sparql


Create RDF using R

• R with rrdf, rrdflibs https://github.com/egonw/rrdf
• R Data frame to RDF
  – Excel -> data frame -> RDF
  – SAS dataset -> data frame -> RDF


Create RDF using R

Packages: rrdf, rrdflibs

• add.triple()
  – Add a triple: object is a URI
• add.data.triple()
  – Add a triple: object is a literal
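
The difference between the two kinds of triple shows up directly in the serialized output. The sketch below is plain Python rather than rrdf, and the helpers `uri_triple` and `data_triple` are hypothetical names chosen only to mirror the two cases above: a URI object is written in angle brackets, a literal object in quotes with an optional language tag. The record in `studies` is made up to stand in for a row of a data frame.

```python
CSS = "http://www.example.org/CSS/"
CT = "http://bio2rdf.org/clinicaltrials/"


def uri_triple(subject, predicate, obj):
    """Triple whose object is a URI (the add.triple case)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def data_triple(subject, predicate, value, lang=None):
    """Triple whose object is a literal (the add.data.triple case)."""
    # Escape characters that would otherwise end or break the quoted string.
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    suffix = f"@{lang}" if lang else ""
    return f'<{subject}> <{predicate}> "{escaped}"{suffix} .'


# Hypothetical record, standing in for one row of a data frame.
studies = [{"nctid": "NCT00799760", "phase": "Phase 3", "outcome": "outcome1"}]

lines = []
for study in studies:
    subject = CT + study["nctid"]
    lines.append(data_triple(subject, CSS + "phase", study["phase"], lang="en"))
    lines.append(uri_triple(subject, CSS + "primOutcome", CSS + study["outcome"]))

print("\n".join(lines))
```

Because every term is written as a full URI, the output needs no prefix declarations and can be loaded as Turtle as it stands.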


Create RDF using R

Try or follow along

File: createTTLFromR.R

Output File: createTTLFromR.TTL


Create RDF using SAS

• SAS accessing SPARQL service using PROC HTTP
  – All functions provided by the service, see SPARQL 1.1 Protocol (https://www.w3.org/TR/sparql11-protocol/)
  – Implemented as SAS macros https://github.com/MarcJAndersen/SAS-SPARQLwrapper
• SAS generating text files with
  – RDF in Turtle
  – SPARQL INSERT statements


Create RDF using SAS

File: createTTLFromSAS.SAS

Output File: createTTLFromSAS.TTL

Try or follow along


Validate

• Apache Jena RIOT (RDF I/O Technology)

riot --validate CreateTTLFromEditor.TTL

Example errors

1. Forgot PAV prefix
08:45:44 ERROR riot :: [line: 9, col: 16] Undefined prefix: pav

2. Incorrect triples termination
08:45:44 ERROR riot :: [line: 9, col: 32] Unexpected IRI for predicate…

* note: requires Apache Jena in the system path
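
When validating many files it can help to pull the line and column out of each message. This is a small sketch in Python, assuming the messages are shaped like the two examples on this slide; the exact layout may differ between Jena versions, and `parse_riot_errors` is a hypothetical helper.

```python
import re

# Matches messages shaped like the two example errors on this slide.
RIOT_ERROR = re.compile(
    r"ERROR riot\s+::\s+\[?line: (?P<line>\d+), col: (?P<col>\d+)\s*\]\s*(?P<message>.*)"
)


def parse_riot_errors(log_text):
    """Return (line, column, message) for each riot error line found."""
    errors = []
    for log_line in log_text.splitlines():
        match = RIOT_ERROR.search(log_line)
        if match:
            errors.append(
                (int(match["line"]), int(match["col"]), match["message"].strip())
            )
    return errors


LOG = """\
08:45:44 ERROR riot :: [line: 9, col: 16] Undefined prefix: pav
08:45:44 ERROR riot :: [line: 9, col: 32] Unexpected IRI for predicate
"""
for line, col, message in parse_riot_errors(LOG):
    print(f"line {line}, column {col}: {message}")
```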