alisoncallahan
Bio2RDF Best Practices
1. Assign URIs for all things
2. Assign labels and identifiers
3. Declare and assign types
4. Provide dataset provenance
1. Assign URIs for all things
● The base Bio2RDF URI pattern: http://bio2rdf.org/namespace:identifier
● Data provider record identifiers are maintained from the source
○ e.g. DrugBank’s resource IRI for Leucovorin: http://bio2rdf.org/drugbank:DB00650
● Linked Data = no blank nodes!
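As a minimal sketch of the base pattern (in Python, using a hypothetical helper name `bio2rdf_uri` that is not part of the Bio2RDF API), minting a URI only prefixes the provider's own identifier with its namespace. A missing identifier is treated as an error rather than a reason to fall back on a blank node:

```python
BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build http://bio2rdf.org/namespace:identifier (hypothetical helper).

    The provider's identifier is kept exactly as the source gives it.
    """
    if not namespace or not identifier:
        # No identifier means no URI; a blank node is not an acceptable fallback.
        raise ValueError("both a namespace and an identifier are required")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


print(bio2rdf_uri("drugbank", "DB00650"))  # http://bio2rdf.org/drugbank:DB00650
```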
● Vocabulary namespaces are used for dataset-specific types and predicates
○ e.g. http://bio2rdf.org/drugbank_vocabulary:Drug
● Resource namespaces are used to assign an identifier when the source does not provide one
○ the identifier can be a UUID, a hash, a counter, concatenated strings, etc.
○ e.g. http://bio2rdf.org/drugbank_resource:DB00440_DB00650
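The vocabulary and resource namespaces can be sketched the same way. `vocabulary_uri` and `resource_uri` are hypothetical helpers, and the MD5 option is only one of the hashing strategies the slide allows; the concatenation case reproduces the DrugBank example.

```python
import hashlib

BIO2RDF_BASE = "http://bio2rdf.org/"


def vocabulary_uri(dataset: str, term: str) -> str:
    """Dataset-specific type or predicate, e.g. drugbank_vocabulary:Drug."""
    return f"{BIO2RDF_BASE}{dataset}_vocabulary:{term}"


def resource_uri(dataset: str, *parts: str, hashed: bool = False) -> str:
    """Mint an ID in the dataset's _resource namespace when the source has none.

    Parts are joined with "_"; with hashed=True the joined string is replaced by
    its MD5 hex digest, giving a fixed-length but still deterministic ID.
    """
    local_id = "_".join(parts)
    if hashed:
        local_id = hashlib.md5(local_id.encode("utf-8")).hexdigest()
    return f"{BIO2RDF_BASE}{dataset}_resource:{local_id}"


print(vocabulary_uri("drugbank", "Drug"))
# http://bio2rdf.org/drugbank_vocabulary:Drug
print(resource_uri("drugbank", "DB00440", "DB00650"))
# http://bio2rdf.org/drugbank_resource:DB00440_DB00650
```

Concatenation and hashing both give the same URI every time the script runs over the same input. A random UUID such as `uuid.uuid4()` would not, so rerunning the conversion would mint new URIs for the same thing.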
● All valid namespaces are listed in the Bio2RDF Life Sciences Registry
○ ensures that URIs are consistent across all Bio2RDF datasets
○ registry is publicly available at http://tinyurl.com/dataregistry
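Because the registry is the reference list of prefixes, a converter can check its own output against it. This sketch assumes a local snapshot of registry prefixes (the one-entry set below is hypothetical, not the real registry) and treats `ns_vocabulary` and `ns_resource` as belonging to `ns`:

```python
BIO2RDF_BASE = "http://bio2rdf.org/"


def namespace_of(uri: str) -> str:
    """Return the registry prefix of a Bio2RDF URI, dropping _vocabulary/_resource."""
    if not uri.startswith(BIO2RDF_BASE):
        raise ValueError(f"not a Bio2RDF URI: {uri}")
    ns = uri[len(BIO2RDF_BASE):].split(":", 1)[0]
    for suffix in ("_vocabulary", "_resource"):
        if ns.endswith(suffix):
            return ns[: -len(suffix)]
    return ns


def unregistered(uris, registry):
    """Sorted list of namespaces used in `uris` that are absent from `registry`."""
    return sorted({namespace_of(u) for u in uris} - set(registry))


# Hypothetical one-entry snapshot; the real list comes from the registry itself.
registry = {"drugbank"}
print(unregistered([
    "http://bio2rdf.org/drugbank:DB00650",
    "http://bio2rdf.org/drugbank_vocabulary:Drug",
    "http://bio2rdf.org/drugbnk:DB00440",  # misspelled namespace
], registry))
# ['drugbnk']
```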
2. Assign labels and identifiers
● Use rdfs:label to assign a language-tagged label to every resource
○ the label can be a source-provided title, a script-generated phrase, or a phrase provided by a third-party dataset
○ Pattern: rdfs:label "label [ns:id]"@lang
● Use Dublin Core predicates for source-provided labels and identifiers
○ Pattern: dc:title "label"@lang (assign a language tag only when the source provides one)
○ Pattern: dc:identifier "ns:id"^^xsd:string
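A sketch of these labelling rules as a Python function that emits Turtle statements with prefixed names (prefix declarations omitted). `label_triples` is a hypothetical helper, and defaulting the `rdfs:label` language to `en` is an assumption:

```python
from typing import List, Optional


def turtle_string(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslashes, double quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def label_triples(qname: str, title: str, label_lang: str = "en",
                  source_lang: Optional[str] = None) -> List[str]:
    """rdfs:label, dc:title and dc:identifier statements for one resource.

    rdfs:label always carries a language tag (assumed default "en");
    dc:title is tagged only when the source supplied a language (source_lang).
    """
    title_tag = f"@{source_lang}" if source_lang else ""
    label = turtle_string(f"{title} [{qname}]")
    return [
        f"{qname} rdfs:label {label}@{label_lang} .",
        f"{qname} dc:title {turtle_string(title)}{title_tag} .",
        f"{qname} dc:identifier {turtle_string(qname)}^^xsd:string .",
    ]


for line in label_triples("drugbank:DB0159", "Nitrazepam", source_lang="en"):
    print(line)
```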
● Use Bio2RDF predicates to assign the Bio2RDF namespace and identifier:
○ Pattern: bio2rdf_vocabulary:namespace "ns"^^xsd:string
○ Pattern: bio2rdf_vocabulary:identifier "id"^^xsd:string
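Both values can be derived from the resource's prefixed name. This hypothetical helper splits on the first colon only, so an identifier that itself contains a colon stays whole; it assumes neither part contains a double quote or backslash:

```python
from typing import List


def namespace_identifier_triples(qname: str) -> List[str]:
    """bio2rdf_vocabulary:namespace and :identifier statements for a prefixed name."""
    # Split on the first colon only; assumes no quotes or backslashes in either part.
    namespace, sep, identifier = qname.partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"expected ns:id, got {qname!r}")
    return [
        f'{qname} bio2rdf_vocabulary:namespace "{namespace}"^^xsd:string .',
        f'{qname} bio2rdf_vocabulary:identifier "{identifier}"^^xsd:string .',
    ]


for line in namespace_identifier_triples("drugbank:DB0159"):
    print(line)
```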
Example: DrugBank entry for Nitrazepam
drugbank:DB0159 rdfs:label "Nitrazepam [drugbank:DB0159]"@en ;
    dc:title "Nitrazepam"@en ;
    dc:identifier "drugbank:DB0159"^^xsd:string ;
    bio2rdf_vocabulary:namespace "drugbank"^^xsd:string ;
    bio2rdf_vocabulary:identifier "DB0159"^^xsd:string .
3. Declare and assign types
● All resources should be typed as resources of the dataset
○ Pattern: rdf:type namespace_vocabulary:Resource
● Instances of a dataset vocabulary type should also be typed as owl:NamedIndividual
○ Pattern: rdf:type namespace_vocabulary:Type
○ Pattern: rdf:type owl:NamedIndividual
● Classes should be typed as owl:Class
○ Pattern: rdf:type owl:Class
○ If the superclass has been described using the namespace_vocabulary pattern, link the class to it with rdfs:subClassOf
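The typing rules translate directly into statement lists. In this sketch the helpers are hypothetical, the dataset prefix is taken from the subject's namespace, and the superclass is assumed to live in the same `_vocabulary` namespace; `class_types("drugbank:DB0159", "Drug")` yields the three `drugbank:DB0159` statements in the examples that follow.

```python
from typing import List, Optional


def _vocab(qname: str) -> str:
    # drugbank:DB0159 -> drugbank_vocabulary
    return qname.split(":", 1)[0] + "_vocabulary"


def individual_types(qname: str, vocab_type: str) -> List[str]:
    """Instance of a dataset type: Resource, the type itself, and owl:NamedIndividual."""
    voc = _vocab(qname)
    return [
        f"{qname} rdf:type {voc}:Resource .",
        f"{qname} rdf:type {voc}:{vocab_type} .",
        f"{qname} rdf:type owl:NamedIndividual .",
    ]


def class_types(qname: str, superclass: Optional[str] = None) -> List[str]:
    """Class: Resource and owl:Class, plus rdfs:subClassOf when a
    namespace_vocabulary superclass is given (assumed same dataset)."""
    voc = _vocab(qname)
    triples = [f"{qname} rdf:type {voc}:Resource .", f"{qname} rdf:type owl:Class ."]
    if superclass is not None:
        triples.append(f"{qname} rdfs:subClassOf {voc}:{superclass} .")
    return triples


for line in class_types("drugbank:DB0159", "Drug"):
    print(line)
```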
● Object properties and datatype properties should also be typed
○ Pattern: rdf:type owl:ObjectProperty
○ Pattern: rdf:type owl:DatatypeProperty
● Examples:
drugbank:DB0159 rdf:type drugbank_vocabulary:Resource ;
    rdf:type owl:Class ;
    rdfs:subClassOf drugbank_vocabulary:Drug .
drugbank_vocabulary:ddi-interactor-in rdf:type owl:ObjectProperty .
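One way to choose between `owl:ObjectProperty` and `owl:DatatypeProperty` is to look at how a converter actually uses each predicate. The sketch below assumes the converter records, for every statement, whether the object is a literal; `property_declarations` is a hypothetical helper and the sample statements are illustrative input, not real DrugBank content.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def property_declarations(triples: Iterable[Tuple[str, str, str, bool]]) -> List[str]:
    """Type each predicate from how it is used.

    Each triple is (subject, predicate, object, object_is_literal). A predicate
    whose objects are all resources becomes owl:ObjectProperty, all literals
    owl:DatatypeProperty; mixed usage is reported as an error.
    """
    kinds = defaultdict(set)
    for _subject, predicate, _obj, is_literal in triples:
        kinds[predicate].add(is_literal)
    declarations = []
    for predicate in sorted(kinds):
        if kinds[predicate] == {False}:
            declarations.append(f"{predicate} rdf:type owl:ObjectProperty .")
        elif kinds[predicate] == {True}:
            declarations.append(f"{predicate} rdf:type owl:DatatypeProperty .")
        else:
            raise ValueError(f"{predicate} has both resource and literal objects")
    return declarations


sample = [  # illustrative input only
    ("drugbank:DB00650", "drugbank_vocabulary:ddi-interactor-in",
     "drugbank_resource:DB00440_DB00650", False),
    ("drugbank:DB0159", "bio2rdf_vocabulary:identifier", "DB0159", True),
]
for line in property_declarations(sample):
    print(line)
```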
4. Provide dataset provenance
[Figure: a data item is linked to its Bio2RDF dataset with void:inDataset, and the Bio2RDF dataset is linked to its source dataset with prov:wasDerivedFrom. Bio2RDF dataset features: entity-dataset link, creator, publisher, date created, license & rights, source, availability (SPARQL endpoint, data dump). Vocabularies: VoID, Dublin Core, W3C Provenance, Bio2RDF vocabulary.]
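The dataset-level features in the figure map onto standard VoID, Dublin Core and PROV terms. The predicate choices below (`dcterms:creator`, `dcterms:license`, `void:sparqlEndpoint`, and so on) are an assumed mapping rather than the Bio2RDF schema, `dataset_description` is a hypothetical helper, and every `example.org` IRI is a placeholder:

```python
from datetime import date
from typing import List


def dataset_description(dataset: str, source: str, creator: str, publisher: str,
                        created: date, license_url: str, sparql_endpoint: str,
                        data_dump: str) -> List[str]:
    """Turtle statements describing a Bio2RDF dataset (all arguments except `created` are IRIs)."""
    d = f"<{dataset}>"
    return [
        f"{d} rdf:type void:Dataset .",
        f"{d} prov:wasDerivedFrom <{source}> .",
        f"{d} dcterms:creator <{creator}> .",
        f"{d} dcterms:publisher <{publisher}> .",
        f'{d} dcterms:created "{created.isoformat()}"^^xsd:date .',
        f"{d} dcterms:license <{license_url}> .",
        f"{d} void:sparqlEndpoint <{sparql_endpoint}> .",
        f"{d} void:dataDump <{data_dump}> .",
    ]


lines = dataset_description(
    dataset="http://bio2rdf.org/dataset:drugbank-03-07-2013",
    source="http://example.org/source",
    creator="http://example.org/creator",
    publisher="http://example.org/publisher",
    created=date(2013, 7, 3),
    license_url="http://example.org/license",
    sparql_endpoint="http://example.org/sparql",
    data_dump="http://example.org/dump.nt",
)
print("\n".join(lines))
```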
● Link every resource to the versioned/dated Bio2RDF dataset in which it is described
○ Pattern: void:inDataset <http://bio2rdf.org/dataset:namespace-dd-mm-yyyy.rdf>
○ Example: drugbank:DB0159 void:inDataset <http://bio2rdf.org/dataset:drugbank-03-07-2013> .
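The dated dataset URI can be built from the namespace and a release date. This sketch reads the example's `03-07-2013` as dd-mm-yyyy, as the pattern states. The pattern line ends in `.rdf` while the example does not, so the hypothetical `dataset_uri` helper leaves the suffix as a parameter and defaults to the example's form:

```python
from datetime import date


def dataset_uri(namespace: str, release: date, suffix: str = "") -> str:
    """Versioned dataset URI with the date written dd-mm-yyyy.

    suffix=".rdf" matches the Pattern line; the default matches the Example line.
    """
    return f"http://bio2rdf.org/dataset:{namespace}-{release:%d-%m-%Y}{suffix}"


def in_dataset(qname: str, namespace: str, release: date) -> str:
    """void:inDataset statement linking a resource to its dated dataset."""
    return f"{qname} void:inDataset <{dataset_uri(namespace, release)}> ."


print(in_dataset("drugbank:DB0159", "drugbank", date(2013, 7, 3)))
# drugbank:DB0159 void:inDataset <http://bio2rdf.org/dataset:drugbank-03-07-2013> .
```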
PHP: Hypertext Preprocessor
● A general-purpose open source scripting language
○ homepage: http://php.net
● PHP scripts can be executed from the command line or embedded in HTML documents
● Syntactically similar to C/C++/Java but it is not strongly typed
Using the Bio2RDF PHP API to create an RDFizer
● Basic structure of a Bio2RDFizer script:
○ Initialize script parameters - input file(s), default dataset namespace, etc.
○ Define a Run() function that downloads and iterates over the input files and calls the functions that parse and convert the input data to RDF
○ Define function(s) to convert input data to RDF using Bio2RDF API helper functions
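For readers without a PHP setup, the same three-part layout can be sketched as a hypothetical Python class. It is not the Bio2RDF API: downloading is left out, the input format (tab-separated identifier and name) is assumed, and the English label tag is an assumption.

```python
import csv
import io
from typing import Dict, List, TextIO


def turtle_string(text: str) -> str:
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


class TsvRDFizer:
    """Hypothetical Python analogue of the Bio2RDFizer script structure."""

    def __init__(self, namespace: str, inputs: Dict[str, TextIO]):
        # 1. Script parameters: default dataset namespace and the input files.
        self.namespace = namespace
        self.inputs = inputs
        self.triples: List[str] = []

    def run(self) -> List[str]:
        # 2. Iterate over the inputs (no downloading in this sketch) and
        #    hand each one to the converter.
        for _name, handle in self.inputs.items():
            self.convert(handle)
        return self.triples

    def convert(self, handle: TextIO) -> None:
        # 3. Convert each "identifier<TAB>name" row to labelled, typed statements.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            identifier, name = row[0], row[1]
            subject = f"{self.namespace}:{identifier}"
            label = turtle_string(f"{name} [{subject}]")
            self.triples.append(f"{subject} rdfs:label {label}@en .")
            self.triples.append(f"{subject} rdf:type {self.namespace}_vocabulary:Resource .")


rdfizer = TsvRDFizer("drugbank", {"drugs.tsv": io.StringIO("DB00650\tLeucovorin\n")})
for line in rdfizer.run():
    print(line)
```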
● Bio2RDF PHP API defines helper functions that implement Bio2RDF best practices:
○ getNamespace()
○ getVoc()
○ getRes()
○ triplify($subject, $predicate, $object) // object is an RDF resource
○ triplifyString($subject, $predicate, "string") // object is a literal
○ describeIndividual($uri, $label, $type, $title, $description, $language)
○ describeClass( ... )
○ describeProperty( ... )
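The PHP helpers are listed here only by name and argument list, so the following is a Python sketch of the kind of output `triplify` and `triplifyString` stand for; it is not a port of the Bio2RDF PHP library. The prefix table, in particular binding `dc` to DCMI Terms, is an assumption, and unknown prefixes are expanded with the base pattern from section 1.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "dc": "http://purl.org/dc/terms/",  # assumption: dc bound to DCMI Terms
    "void": "http://rdfs.org/ns/void#",
}
BIO2RDF_BASE = "http://bio2rdf.org/"


def expand(qname: str) -> str:
    """Expand a prefixed name; unknown prefixes are treated as Bio2RDF namespaces."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {qname!r}")
    if prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return BIO2RDF_BASE + qname  # drugbank:DB00650 -> http://bio2rdf.org/drugbank:DB00650


def triplify(subject: str, predicate: str, obj: str) -> str:
    """N-Triples line whose object is a resource."""
    return f"<{expand(subject)}> <{expand(predicate)}> <{expand(obj)}> ."


def triplify_string(subject: str, predicate: str, text: str,
                    lang: str = "", datatype: str = "") -> str:
    """N-Triples line whose object is a literal, optionally language-tagged or typed."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    literal = f'"{escaped}"'
    if lang:
        literal += f"@{lang}"
    elif datatype:
        literal += f"^^<{expand(datatype)}>"
    return f"<{expand(subject)}> <{expand(predicate)}> {literal} ."


print(triplify("drugbank:DB00650", "rdf:type", "drugbank_vocabulary:Drug"))
print(triplify_string("drugbank:DB0159", "dc:title", "Nitrazepam", lang="en"))
```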
Example: The Comparative Toxicogenomics Database
The CTD Bio2RDFizer script is available on GitHub
Using and contributing to the Bio2RDF project on GitHub
1. Fork the bio2rdf-scripts and php-lib repositories on GitHub (https://help.github.com/articles/fork-a-repo)
2. Write some code!
3. Commit code to your fork
4. Make a pull request to the bio2rdf-scripts repo