Dan Masiga
Molecular Biology and Biotechnology Department
International Centre of Insect Physiology and Ecology, Nairobi, Kenya
The BARCODE Data Standard: Enabling Molecular Diagnostics for Biodiversity
Western and Central Africa: DNA barcoding Meeting
One-day course on DNA barcoding: Practical advice
23rd October 2008
The Infrastructure of Taxonomy
• Collections and databases of specimens
• Codes of Taxonomic Nomenclature
• Compilations of taxonomic names
• Data repositories (characters, gene sequences, images, trees)
• Monographs
• Floristic and faunistic surveys/inventories
• Revisions
• The (undigitized) Taxonomic Literature
Roles of INSDC
An archival database/repository for nucleotide sequences
[Diagram labels: Output of Project A; Output of Project B; Output of Project C; Common access interface; Users]
Standardization of data structure including data items and values
Assignment of a unique identifier (an accession number) to a sequence
New tools for taxonomy
DNA Barcoding
The ability to compare genotype information across a huge range of organisms is a powerful tool
“Only [27%] of papers had a legitimate specimens examined section, with museum numbers for each
voucher, and names of the museums where the specimens used in the study could be examined”
Couplets Consisting of:
“Species Name - DNA Sequence”
• Basis of a “look-up table” enabling molecular diagnostic applications
• However, both elements need validation
• Underlying specimens and associated raw sequence data are not typically available for secondary inspection
Problem Areas
TRANSPARENCY AND TRACEABILITY
• Genetic Data Quality
• Specimen Data Quality
• Taxonomy
• Access to Information
Barcoders began calling for a Paradigm Shift
Depositing barcode sequences in public databases, along with primer sequences, trace files, and associated quality scores, makes this species identification technique widely accessible. Reference DNA barcode sequences should be derived from, and linked to, specimens of known provenance in web-accessible collections in order to validate this system of molecular diagnostics.
Rationale for Defining “BARCODE” keyword in GenBank
• Provides the community with reference records with verifiable and retrievable data:
– Associated with retrievable voucher specimens (liberally defined: tissue, DNA, etc.)
– Linked to on-line metadata
– Meet an agreed upon standard of taxonomic identification
– Provide an assured level of data completeness
– On an agreed upon gene region
– Recommended for use in identifying unknowns
The Barcode Data Standard
Establishing a new data standard for “BARCODE” keyword records in DDBJ/EMBL/GenBank:
1. Minimum 500bp, <1% ambiguous base calls
2. Double stranded sequence
3. Trace files and associated quality scores
4. Primers used to generate sequence
5. Linkages to:
• A morphological voucher specimen
• Structured reference to collections
• Geospatial reference information
• Valid species name
• Who performed the identification
• Literature citations
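Item 1 of the standard is the only criterion that can be checked from the sequence string alone. A minimal sketch of such a check follows. The function names are hypothetical, and the sketch assumes that any character other than A, C, G, or T counts as an ambiguous base call. The slides do not define "ambiguous", so that rule is an assumption. The remaining items (trace files, primers, voucher linkages) need metadata and are not covered.

```python
# Hypothetical helpers: they test only the two numeric thresholds from item 1
# (minimum 500 bp, fewer than 1% ambiguous base calls).

# Assumption: every symbol outside A/C/G/T is treated as an ambiguous call.
UNAMBIGUOUS_BASES = set("ACGT")


def ambiguous_fraction(sequence: str) -> float:
    """Return the fraction of base calls that are not A, C, G, or T."""
    bases = sequence.upper()
    if not bases:
        raise ValueError("sequence is empty")
    ambiguous = sum(1 for base in bases if base not in UNAMBIGUOUS_BASES)
    return ambiguous / len(bases)


def meets_length_and_ambiguity(
    sequence: str, min_length: int = 500, max_ambiguous: float = 0.01
) -> bool:
    """Check the length and ambiguity thresholds stated on the slide."""
    long_enough = len(sequence) >= min_length
    clean_enough = ambiguous_fraction(sequence) < max_ambiguous
    return long_enough and clean_enough


if __name__ == "__main__":
    # Made-up sequences for illustration, not real barcode data.
    print(meets_length_and_ambiguity("ACGT" * 125))           # 500 bp, no ambiguity
    print(meets_length_and_ambiguity("N" * 10 + "A" * 490))   # 500 bp, 2% ambiguous
```

The thresholds are keyword arguments so that a different cut-off can be tried without editing the function.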
Features, Qualifiers and Values
The Feature table is updated based on discussions at the International Collaborators meeting of INSDC
NCBI Trace Archive accepts BARCODE as a keyword that identifies “a DNA
sequence analysis of a uniform target gene to enable species identification”
Triplet structure for specimen identifiers
/specimen_voucher=“<institution-code>|<collection-code>|<specimen-id>”
<institution-code> - abbreviation of the archiving institution
<collection-code> - collection within the institution (*)
<specimen-id> - specimen identifier within the collection
The above approach is used in DarwinCore/GBIF and is parallel to the Life Science Identifier (LSID), which is an Object Management Group (OMG) standard.
(*) museums & herbaria; culture collections; stock centers; germplasm repositories (seed banks); frozen tissue banks; zoos/aquaria/botanical gardens; DNA banks, personal collections; e-voucher archives
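The triplet structure can be illustrated by splitting a /specimen_voucher value into its three named parts. This is a sketch only: the class and function names are hypothetical, and the example value is invented. The separator defaults to the "|" shown on the slide but is left as a parameter, because the published INSDC feature table definition writes the triplet with colons.

```python
from typing import NamedTuple


class SpecimenVoucher(NamedTuple):
    """The three parts of the triplet described on the slide."""
    institution_code: str
    collection_code: str
    specimen_id: str


def parse_specimen_voucher(value: str, separator: str = "|") -> SpecimenVoucher:
    """Split a voucher string into institution, collection, and specimen id.

    Assumption: all three parts are required and none of them contains
    the separator character.
    """
    parts = [part.strip() for part in value.split(separator)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three non-empty parts, got: {value!r}")
    return SpecimenVoucher(*parts)


if __name__ == "__main__":
    # "XYZ|Ent|12345" is a made-up value, not a real specimen record.
    voucher = parse_specimen_voucher("XYZ|Ent|12345")
    print(voucher.institution_code, voucher.collection_code, voucher.specimen_id)
```

Rejecting values with missing parts keeps incomplete references from passing silently, which matches the stated goal of a structured reference to collections.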