Slide presented at the BioHackathon 2014 symposium http://2014.biohackathon.org/symposium
NBDC / DBCLS presents
BioHackathon 2014
Standardization and utilization of human genome information with Semantic Web technologies
Toshiaki Katayama <[email protected]>
http://jp.linkedin.com/in/toshiakikatayama
Database Center for Life Science (DBCLS),
Research Organization of Information and Systems (ROIS), Japan
2014/11/9 @ Tohoku Medical Megabank, Sendai, Japan
Mission of NBDC/DBCLS
• Biomedical domain
  • So many databases, so many publications
• Integration of life science databases
  • To accelerate data driven science
• Standardization and interoperability
  • Semantic Web and Linked Open Data
  • Software development
  • :
• Right technology + collaborative community
  • BioHackathon = Bio + Hack + Marathon = effective innovation
BioHackathon 2014 - the 7th NBDC/DBCLS BioHackathon
http://2014.biohackathon.org
• BioHackathon 2008 in Tokyo
  • Towards integrated Web service in life science with Open Bio* libraries
  • http://hackathon.dbcls.jp
• BioHackathon 2009 in Okinawa
  • Integration of Web services in bioinformatics applications
  • http://hackathon2.dbcls.jp
• BioHackathon 2010 in Tokyo
  • Integration and interpretation of biological knowledge with the Semantic Web technologies
  • http://hackathon3.dbcls.jp
• BioHackathon 2011 in Kyoto
  • Creation and utilization of Linked Data in life sciences
  • http://2011.biohackathon.org
• BioHackathon 2012 in Toyama
  • Biomedical applications based on the Semantic Web technologies
  • http://2012.biohackathon.org
• BioHackathon 2013 in Tokyo
  • Semantic interoperability and standardization of bioinformatics data and Web services
  • http://2013.biohackathon.org
Linked Open Data
• Use URIs as names for things
• Use HTTP URIs
  • so that people can look up those names
• When someone looks up a URI
  • provide useful information
  • using the standards (RDF*, SPARQL)
• Include links to other URIs
  • so that they can discover more things
• Genome annotation / Protein annotation / Biomedical ontologies / URIs
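The "look up a URI and get useful information using the standards" principle is usually realized through HTTP content negotiation. The following is a minimal sketch under stated assumptions, not something taken from the slides: the gene URI is a hypothetical `example.org` identifier, the helper name `build_lookup_request` is made up for illustration, and the request is only prepared, never sent.

```python
from urllib.request import Request

# Media types for RDF serializations; Turtle is preferred, RDF/XML is a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF describing `uri`."""
    if not uri.startswith(("http://", "https://")):
        # Linked Data names are expected to be HTTP URIs so they can be looked up.
        raise ValueError("not an HTTP URI: " + uri)
    return Request(uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical identifier, used only for illustration.
request = build_lookup_request("http://example.org/gene/geneA")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether a given server answers with Turtle depends on that server.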
TogoGenome: RDF-based genome DB
http://togogenome.org
Accumulate annotations in RDF
Genome: Regulatory region, Protein coding gene, rRNA gene
<gene> rdfs:subClassOf obo:SO_0000704 ;   ← Sequence ontology types
    faldo:location [ ... ] ;              ← FALDO locations
    rdfs:label "geneA" ;                  ← Label of annotations
    rdfs:seeAlso <UniProt> .              ← Link to external resources
<exon> rdfs:subClassOf obo:SO_0000147 .
Annotation w/ in-house developed ontologies
Accumulate annotations in RDF
+
In-house developed ontologies
• MEO (environment)
• MPO (phenotype)
• GMO (growth medium)
• MCCV (culture collection)
• PDO (infectious disease)
↓ Stored in triple store
↓ SPARQL query
↓ TogoGenome / TogoStanza
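The flow above (RDF in a triple store, retrieved by SPARQL, rendered by an application) can be sketched as follows. This is an illustrative assumption rather than the TogoGenome implementation: the endpoint URL is a hypothetical placeholder, the helper names are invented, and the query only mirrors the `rdfs:subClassOf` / `rdfs:label` pattern shown on the annotation slide. The URL form follows the SPARQL protocol convention of passing the query text in a `query` parameter of a GET request.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit


def label_query(so_class: str, limit: int = 10) -> str:
    """Build a SPARQL query listing features typed with a Sequence Ontology class."""
    if not re.fullmatch(r"SO_\d{7}", so_class):
        raise ValueError("expected an identifier like SO_0000704")
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?feature ?label
WHERE {{
  ?feature rdfs:subClassOf obo:{so_class} ;
           rdfs:label ?label .
}}
LIMIT {int(limit)}"""


def query_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET URL for a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": query})


ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, not a real service
url = query_url(ENDPOINT, label_query("SO_0000704"))

# Round-trip check: decoding the URL gives back the original query text.
recovered = parse_qs(urlsplit(url).query)["query"][0]
print(recovered)
```

A front end such as a stanza would send this URL, typically with an `Accept: application/sparql-results+json` header, and render the returned bindings.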
Genome sequences
• NCBI: BioProject/RefSeq -- existing reference sequences
• DDBJ: Annotation pipeline/GTPS -- newly sequenced
Ontologies
• NCBO: BioPortal, OBO (GO, SO, ...)
• DBCLS: FALDO, MEO, MPO, GMO, MCCV, PDO, ...
• DDBJ: INSDC, Taxonomy, ...
• Titech: PDO, ...
• GOLD: Environmental metadata
Samples and metadata
• INSDC, NCBI: SRA, GEO
• DBCLS: RefEx, Kusarinoko
• Bulk data: Literature, Images, ...
Annotations
• UniProt: Protein functions and links
• Formats: GFF3, GTF, GVF, DAS, BED, ...
• Tools: Cufflinks, BLAST, InterPro, ...
<gene> rdf:type insdc:Gene ; so:so_part_of <chromosome> .
<mRNA> rdf:type insdc:Messenger_RNA ; sio:is-transcribed-from <gene> ; sio:has-ordered-part <p1>, <p2>, ... .
<p1> sio:has-value "1"^^xsd:integer ; sio:refers-to <exon1> .
<p2> sio:has-value "2"^^xsd:integer ; sio:refers-to <exon2> .
<exon1> rdf:type insdc:Exon ; faldo:location <region1> .
<region1> rdf:type faldo:Region ; faldo:begin <position1> ; faldo:end <position2> .
<position1> rdf:type faldo:ExactPosition, faldo:ForwardStrandPosition ; faldo:position 12345 ; faldo:reference <chromosome> .
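The location pattern above (a feature points to a `faldo:Region`, whose begin and end are positions carrying a coordinate, a strand type, and a reference sequence) can be generated mechanically from plain coordinates. The sketch below is a hedged illustration only: the function name and the `_region` / `_begin` / `_end` node naming are hypothetical, relative IRIs such as `<exon1>` stand in for real identifiers as they do on the slide, and prefix declarations are left out.

```python
def faldo_region_turtle(feature: str, reference: str,
                        begin: int, end: int, forward: bool = True) -> str:
    """Return Turtle lines describing the location of `feature` on `reference`."""
    if begin < 1 or end < 1:
        raise ValueError("positions are expected to be 1-based")
    strand = ("faldo:ForwardStrandPosition" if forward
              else "faldo:ReverseStrandPosition")
    lines = [
        f"<{feature}> faldo:location <{feature}_region> .",
        f"<{feature}_region> rdf:type faldo:Region ;",
        f"    faldo:begin <{feature}_begin> ;",
        f"    faldo:end <{feature}_end> .",
    ]
    # Begin and end share the same structure: an exact position on one strand.
    for name, coordinate in (("begin", begin), ("end", end)):
        lines += [
            f"<{feature}_{name}> rdf:type faldo:ExactPosition, {strand} ;",
            f"    faldo:position {coordinate} ;",
            f"    faldo:reference <{reference}> .",
        ]
    return "\n".join(lines)


# Illustrative coordinates only; 12345 echoes the slide, 12500 is made up.
turtle = faldo_region_turtle("exon1", "chromosome", 12345, 12500)
print(turtle)
```

Keeping the strand on each position, rather than on the region, matches the slide's use of `faldo:ForwardStrandPosition` as a second type on `<position1>`.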
INSDC/RefSeq/Ensembl RDF:
RDF summit May 17-20, 2014
Standardization of RDF models for genomics
Standardization of INSDC nucleotide annotations in RDF
INSDC
Ontology for locations of annotations
Common URIs to be shared
Common RDF model for genomes
Transcriptomes and regulations
Personal/Japanese genomes
Global Alliance for Genomics and Health
http://genomicsandhealth.org/
The greatest need was a common framework of international standards designed to enable and oversee how genomic and clinical data are shared in an effective, responsible, and interpretable manner.
Aim: to develop this common framework, enabling learning from data while protecting participant autonomy and privacy.
Over 180 organizations worldwide (2013-14) + Google (since 2014/2/28)
To enable secure sharing of genomic and clinical data
Tohoku Medical Megabank Organization
• As one of the reconstruction plans, Tohoku Medical Megabank Organization was founded for rebuilding the community medical system by developing a biobank that combines medical and genome information for supporting health and welfare in the Tohoku area.