BioSQL: A Generic relational model for Bioinformatics
Chandan Kumar Deb (10272)
Ph.D. (Computer Application)
BI-691
Contents
• Introduction
• Generic Data Model
• Preface of BioSQL
• Overview of BioSQL Schema
• Dependency of BioSQL
• Installation of BioSQL
Introduction
The relational model is very important for database management
It conceptualizes real-world things into a logical model
First formulated and proposed in 1969 by Edgar F. Codd
The logical model is built from relations and the relationships between them
Introduction
Relational Model
• Table
• Tuple
• Relation Instance
• Relation Schema
• Relation Key
• Attribute Domain
• Key Constraint
• Domain Constraint
• Referential Integrity Constraint
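The three constraint types listed above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `organism` and `sequence` tables are hypothetical examples invented for this illustration and are not part of BioSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables, for illustration only (not the BioSQL schema).
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,   -- key constraint
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE sequence (
    sequence_id INTEGER PRIMARY KEY,
    -- referential integrity constraint: must point at an existing organism
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    -- domain constraint: only these three values are allowed
    alphabet    TEXT NOT NULL CHECK (alphabet IN ('dna', 'rna', 'protein'))
);
""")
conn.execute("INSERT INTO organism VALUES (1, 'Homo sapiens')")
conn.execute("INSERT INTO sequence VALUES (1, 1, 'dna')")


def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


key_violation = violates("INSERT INTO organism VALUES (1, 'Mus musculus')")
domain_violation = violates("INSERT INTO sequence VALUES (2, 1, 'lipid')")
fk_violation = violates("INSERT INTO sequence VALUES (3, 99, 'dna')")
print(key_violation, domain_violation, fk_violation)
```

Each rejected statement corresponds to one constraint type: a duplicate primary key, a value outside the attribute's domain, and a reference to a row that does not exist.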
Introduction
The model represents data as tuples, grouped into relations
A database organized in terms of the relational model is a relational database
The relational data model is the primary data model
It is used widely around the world for data storage and processing
Generic Data Model
The generic data model is a generalization of the conventional data model
The generic data model defines standardised relation types
Consensus among different relational modelers can produce a generic model for a particular domain
Preface of BioSQL
Generic Data Model
Ewan Birney started BioSQL in 2001
Major redesign and refactoring in 2002-2003
PhyloDB module added in 2006
v1.0 released in March 2008
Preface of BioSQL
Covers sequences, features, sequence and feature annotation, a reference taxonomy, and ontologies
Requires a highly normalized relational model
Local storage of global biological data
The BioSQL schema does not follow a strongly typed paradigm
Derived entities are always derived in the object-oriented sense
It follows a weakly typed paradigm
Generic, but can hold any number of specializations
Overview of BioSQL schema
[Schema diagram with four parts: Bioentry with taxon and namespaces; Seqfeature with location and annotation; Ontology term and relationship; Annotation bundle]
BioEntry
Core entity of BioSQL
Tracks any single entry or record in a biological database
The BIOENTRY table contains information about the record's public name, public accession and version
BioDatabase
A BIODATABASE is simply a collection of bioentries
One BIOENTRY may belong to only one BIODATABASE
One BIODATABASE may contain many bioentries
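The one-to-many relationship between BIODATABASE and BIOENTRY can be sketched as follows. This is a simplified illustration using Python's built-in sqlite3 module: the tables carry only a few columns and are not the official BioSQL DDL, and the database name and accessions are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified stand-ins for the BioSQL tables (assumed, reduced column set).
conn.executescript("""
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id    INTEGER PRIMARY KEY,
    -- a single foreign key column: each entry belongs to exactly one database
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    name           TEXT NOT NULL,
    accession      TEXT NOT NULL,
    version        INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO biodatabase VALUES (1, 'genbank_local')")
conn.executemany(
    "INSERT INTO bioentry VALUES (?, 1, ?, ?, ?)",
    [(1, "entry_a", "ACC0001", 1), (2, "entry_b", "ACC0002", 1)],
)

# One database row, many entry rows.
entries_per_db = conn.execute(
    "SELECT d.name, COUNT(*) FROM biodatabase d "
    "JOIN bioentry e USING (biodatabase_id) GROUP BY d.name"
).fetchall()
print(entries_per_db)
```

Because `biodatabase_id` is a single NOT NULL column on the entry side, an entry cannot belong to two databases, while any number of entries may share one.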
BioSequence
In BioSQL, all relations have bioentries
BIOSEQUENCE table contains the raw sequence information associated with a BIOENTRY
Alphabet information ('protein', 'dna', 'rna')
One-to-one relationship with BIOENTRY
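A one-to-one link of this kind is commonly modelled by making the foreign key the primary key of the dependent table. The sketch below assumes that approach and uses reduced, illustrative versions of the two tables (not the official BioSQL DDL); the accession and sequence are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Reduced, illustrative tables (assumed column set).
conn.executescript("""
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL
);
CREATE TABLE biosequence (
    -- primary key and foreign key at once: at most one sequence per bioentry
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    alphabet    TEXT NOT NULL,
    seq         TEXT NOT NULL
);
""")

conn.execute("INSERT INTO bioentry VALUES (1, 'ACC0001')")
conn.execute("INSERT INTO biosequence VALUES (1, 'dna', 'ATGGCC')")

# A second sequence for the same bioentry is rejected by the primary key.
try:
    conn.execute("INSERT INTO biosequence VALUES (1, 'dna', 'TTTT')")
    second_sequence_rejected = False
except sqlite3.IntegrityError:
    second_sequence_rejected = True

row = conn.execute(
    "SELECT e.accession, s.alphabet, length(s.seq) "
    "FROM bioentry e JOIN biosequence s USING (bioentry_id)"
).fetchone()
print(second_sequence_rejected, row)
```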
BioEntryRelationship
BIOENTRY records may themselves be related to one another
(e.g., a PDB record may be composed of multiple subrecords for separate chains)
Taxon, Taxon Name
Basic taxonomic information about the organism to which a given BIOENTRY refers
Reflects the structure of NCBI's taxonomy database
Each BIOENTRY can be associated with only one taxon
Many BIOENTRY records can be associated with the same taxon
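The taxon relationships above can be sketched in the same way. The tables below are simplified assumptions rather than the official BioSQL DDL, and the accessions are made up; 9606 is the NCBI taxonomy identifier for Homo sapiens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, illustrative tables (assumed column set).
conn.executescript("""
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,
    ncbi_taxon_id INTEGER UNIQUE
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER NOT NULL REFERENCES taxon(taxon_id),
    name       TEXT NOT NULL,
    name_class TEXT NOT NULL          -- e.g. scientific or common name
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    taxon_id    INTEGER REFERENCES taxon(taxon_id),  -- at most one taxon per entry
    accession   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO taxon VALUES (1, 9606)")
conn.executemany(
    "INSERT INTO taxon_name VALUES (1, ?, ?)",
    [("Homo sapiens", "scientific name"), ("human", "common name")],
)
conn.executemany(
    "INSERT INTO bioentry VALUES (?, 1, ?)",
    [(1, "ACC0001"), (2, "ACC0002")],
)

# Many bioentries share the same taxon; look them up by scientific name.
accessions = [
    r[0]
    for r in conn.execute(
        "SELECT e.accession FROM bioentry e "
        "JOIN taxon_name n ON n.taxon_id = e.taxon_id "
        "WHERE n.name = ? AND n.name_class = 'scientific name' "
        "ORDER BY e.accession",
        ("Homo sapiens",),
    )
]
print(accessions)
```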
Schema overview
Seqfeatures, Location & Annotation
[Diagram: Location, SeqFeature, SEQFEATURE_RELATIONSHIP, Location Q.value, S.Q.Value, S.F DBxref]
Term and Ontology
A term is used to "label" a seqfeature's name
An ontology is essentially a dictionary of terms in a somewhat-controlled vocabulary
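The idea of an ontology as a dictionary of terms, with terms labelling seqfeatures, can be sketched as below. The tables are reduced illustrations (not the official BioSQL DDL), and the ontology name `feature_keys` is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Reduced, illustrative tables (assumed column set).
conn.executescript("""
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)        -- a term name is unique within its ontology
);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    type_term_id  INTEGER NOT NULL REFERENCES term(term_id)  -- the "label"
);
""")

conn.execute("INSERT INTO ontology VALUES (1, 'feature_keys')")
conn.executemany("INSERT INTO term VALUES (?, ?, 1)", [(1, "gene"), (2, "CDS")])
conn.executemany("INSERT INTO seqfeature VALUES (?, ?)", [(1, 1), (2, 2), (3, 2)])

# Resolve each feature's label through the term table.
labels = conn.execute(
    "SELECT f.seqfeature_id, t.name FROM seqfeature f "
    "JOIN term t ON t.term_id = f.type_term_id "
    "ORDER BY f.seqfeature_id"
).fetchall()
print(labels)
```

The feature type is stored as a reference to a term rather than as a separate table per type, which is one way to read the weakly typed design described earlier.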
Advantages of BioSQL
The BioSQL project provides a well-thought-out relational database schema for storing biological sequences and annotations
Advantages of reusability
Compatible with several Bio* toolkits such as Biopython, BioPerl, BioJava and BioRuby
Flexible storage of data via a key/value pair model
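The key/value pair model can be sketched as a qualifier table in which the key is a term and the value is free text. The tables below are simplified assumptions rather than the official BioSQL DDL, and `annotate` is a hypothetical helper written for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified, illustrative tables (assumed column set).
conn.executescript("""
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    term_id     INTEGER NOT NULL REFERENCES term(term_id),  -- the key
    value       TEXT,                                       -- the value
    rank        INTEGER NOT NULL DEFAULT 0,                 -- order of repeated keys
    UNIQUE (bioentry_id, term_id, rank)
);
""")


def annotate(bioentry_id, key, value):
    """Hypothetical helper: attach one key/value pair to a bioentry."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (key,))
    term_id = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (key,)
    ).fetchone()[0]
    # Repeated keys get increasing ranks so that their order is preserved.
    rank = conn.execute(
        "SELECT COUNT(*) FROM bioentry_qualifier_value "
        "WHERE bioentry_id = ? AND term_id = ?",
        (bioentry_id, term_id),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
        (bioentry_id, term_id, value, rank),
    )


annotate(1, "keyword", "kinase")
annotate(1, "keyword", "membrane")
annotate(1, "note", "locally generated")

pairs = conn.execute(
    "SELECT t.name, q.value FROM bioentry_qualifier_value q "
    "JOIN term t USING (term_id) "
    "WHERE q.bioentry_id = ? ORDER BY t.name, q.rank",
    (1,),
).fetchall()
print(pairs)
```

A new kind of annotation only needs a new term row, so the table structure stays unchanged as annotation types are added.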
Advantages of BioSQL
Extensible as the situation requires
Overall data model based on GenBank flat files
It also allows great flexibility in choosing the data used by Snapshot, since sequence data from any source, including online databases and locally generated sequence data, can be added
Limitations
This is a single-user solution
This is the least flexible option, since the database cannot be shared
No consideration of protein secondary structure prediction
Conclusion
Local ‘GenBank’ with random access
‘GenBank’ in relational format
Easy loading of NCBI taxonomy data into a local DB
Integrated sequence and annotation databases
Handy tool for the bioinformatics community