Regional meeting of the European GBIF Nodes, March 2010. Presentation of current activities at the NordGen node: implementation of the GBIF IPT toolkit for genebanks in Europe, upgrading selected genebanks from the BioCASE publishing toolkit to the IPT. This is the first step of a larger implementation scheduled to start in 2011 as part of the EuroGeneBank application, pending an EU funding decision. Keywords: NordGen, IPT, EURISCO.
GBIF IPT installations for EURISCO
GBIF Tools and Darwin Core extension for germplasm
Cartoon by Sasha Kopf (Creative Commons)
European GBIF Nodes Meeting 2010, March 10th-12th, Alicante, Spain
Dag Endresen, Nordic Genetic Resource Center (NordGen)
Topics for this session
GBIF IPT installation for EURISCO
• Overview of the project
• Darwin Core extension for germplasm
• GBIF informatics tools
• Integrated Publishing Toolkit (IPT)
• IPT installations for EURISCO
• Possible PGR network model
Darwin Core extension for Germplasm (presented at TDWG 2009)
Opened up for use of new GBIF technology in the gene banking world
Proposal to implement GBIF technology as a test in the European gene banking community
From the contract between NordGen and GBIF:
“... a feasibility study aimed at demonstrating the practical implementation of the GBIF decentralised architecture strategy and in particular in the context of the EURISCO Network.”
“... focused on the adoption of the IPT by selected gene banks in Europe, the publishing of richer content using the Darwin Core germplasm extension and the indexing of these published resources by the EURISCO platform.”
“... implemented in the context of EURISCO and therefore in close collaboration with the EURISCO Coordinator.”
GBIF Informatics Suite
GBIF tools to empower decentralized thematic or regional networks
Darwin Core extension for germplasm makes these tools usable for crop gene banks.
Darwin Core
The purpose of DwC terms is to facilitate data sharing:
• a well-defined standard core vocabulary
• a flexible framework to maximize re-usability
The Darwin Core can be extended by adding new terms to share additional information.
TDWG standard 2009
“The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information.”
http://rs.tdwg.org/dwc/
DwC star schema model
http://code.google.com/p/darwincore-germplasm
http://rs.nordgen.org/dwc
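A minimal sketch of the star schema idea, assuming the usual Darwin Core arrangement in which each extension row points back to exactly one core record by its identifier. The field names `id` and `coreid`, the sample accessions and the trait rows are illustrative and are not taken from the germplasm extension draft.

```python
from collections import defaultdict

# Core records: one row per germplasm accession (illustrative values).
core = [
    {"id": "acc-1", "scientificName": "Hordeum vulgare"},
    {"id": "acc-2", "scientificName": "Triticum aestivum"},
]

# Extension rows: zero or more per core record, linked through "coreid".
extension = [
    {"coreid": "acc-1", "trait": "plant height", "value": "85 cm"},
    {"coreid": "acc-1", "trait": "days to heading", "value": "62"},
]


def join_star(core_rows, extension_rows):
    """Attach every extension row to the core record it points to."""
    by_core = defaultdict(list)
    for row in extension_rows:
        by_core[row["coreid"]].append(row)
    # Core records without extension rows get an empty list.
    return [dict(rec, extensions=by_core.get(rec["id"], [])) for rec in core_rows]


joined = join_star(core, extension)
for rec in joined:
    print(rec["id"], rec["scientificName"], len(rec["extensions"]))
```

The core file stays small and stable, while any number of extension files can hang off it without changing the core terms.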
DwC extension for Germplasm
DwC Germplasm : DRAFT 0.1 : August 26, 2009
• “MCPD in Darwin Core”
• Maintained by gene banks worldwide
• Additional terms to describe germplasm samples
• Includes the new terms for crop trait experiments developed as part of the European EPGRIS3 project
• Includes a few additional terms for new international crop treaty regulations
DwC Germplasm (1)
DwC Germplasm (2)
DwC Germplasm (3)
DwC Germplasm (4)
DwC Germplasm (5)
GermplasmDistribution
Perhaps add new terms to facilitate the reporting of germplasm distribution for the ITPGRFA (International Treaty on Plant Genetic Resources for Food and Agriculture)
GermplasmManagement
The Millennium Seed Bank (Kew) has contributed feedback to the DwC-G modeling and proposed to include a number of seed management descriptors.
• Seed processing terms
• Seed cleaning
• Seed germination testing
ConservationStatus
Suggested by ENSCONET: threat status for populations in situ
DwC Germplasm (6)
Mapping of DwC-G terms to the MCPD descriptors
Mapping of DwC-G terms to the MCPD descriptors (continued)
MCPD -> ABCD 2.06 (2004)
• National Inventory Code
• Institute Code
• Accession Number
• Collecting Number
• Collecting Institute Code
• Genus
• Species
• Species Authority
• "Subtaxa"
• "Subtaxa" Authority
• Common Crop Name
• Accession Name
• Acquisition Date
• Country of Origin
• Location of Collection Site
• Latitude of CS
• Longitude of CS
• Elevation of CS
• Collecting Date of Sample
• Breeding Institute Code
• Biological Status of Accession
• Ancestral Data
• Collecting/Acquisition Source
• Donor Institute Code
• Donor Accession Number
• Other Identification (Number) associated with the accession
• Location of Safety Duplicates
• Type of Germplasm Storage
• Remarks
• Decoded Collecting Institute
• Decoded Breeding Institute
• Decoded Donor Institute
• Decoded Safety Duplication Location
• Accession URL
Descriptors marked red did not match the earlier versions of ABCD.
ABCD was extended by a PGR section [W. Berendsohn, H. Knüpffer]
Helmut Knüpffer, IPK Gatersleben
Walter Berendsohn, BGBM
http://www.ecpgr.cgiar.org/epgris/Tech_papers/EURISCO_Descriptors.pdf
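A rough sketch of what such a descriptor mapping can look like in practice. The four-entry mapping table is an illustrative assumption, not the mapping shown on the slides, and the sample record is invented. The coordinate conversion assumes the MCPD text format (DDMMSSH for latitude, DDDMMSSH for longitude, with hyphens for missing minutes or seconds).

```python
# Illustrative subset of a MCPD-to-Darwin-Core mapping (an assumption for
# this sketch; the authoritative mapping is maintained with the extension).
MCPD_TO_DWC = {
    "INSTCODE": "institutionCode",
    "ACCENUMB": "catalogNumber",
    "GENUS": "genus",
    "SPECIES": "specificEpithet",
}


def mcpd_coordinate_to_decimal(value):
    """Convert a MCPD coordinate (DDMMSSH or DDDMMSSH) to decimal degrees.

    Missing minutes or seconds are written as hyphens and count as zero.
    """
    hemisphere = value[-1].upper()
    digits = value[:-1]
    degrees, minutes, seconds = digits[:-4], digits[-4:-2], digits[-2:]

    def part(text):
        return 0 if "-" in text else int(text)

    decimal = int(degrees) + part(minutes) / 60 + part(seconds) / 3600
    return -decimal if hemisphere in ("S", "W") else decimal


def mcpd_to_dwc(record):
    """Rename mapped descriptors and convert coordinates when present."""
    out = {MCPD_TO_DWC[k]: v for k, v in record.items() if k in MCPD_TO_DWC}
    if record.get("LATITUDE"):
        out["decimalLatitude"] = round(mcpd_coordinate_to_decimal(record["LATITUDE"]), 6)
    if record.get("LONGITUDE"):
        out["decimalLongitude"] = round(mcpd_coordinate_to_decimal(record["LONGITUDE"]), 6)
    return out


# Invented passport record; codes and numbers are placeholders.
example = {
    "INSTCODE": "XXX000",
    "ACCENUMB": "ACC001",
    "GENUS": "Hordeum",
    "SPECIES": "vulgare",
    "LATITUDE": "5540--N",
    "LONGITUDE": "0130500E",
}
print(mcpd_to_dwc(example))
```

Descriptors without a Darwin Core counterpart are simply dropped here; in the extension approach they would be carried by the additional germplasm terms instead.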
Home: http://code.google.com/p/gbif-providertoolkit/
Primary developers: Markus Döring, Tim Robertson, John Wieczorek
Source code: Java
Released: 2009
Demo at http://ipt.gbif.org/
Genebank example at http://ipt.nordgen.org/ipt/
A tool in support of data publishers.
A simple and straightforward mechanism to share primary biodiversity data following the Darwin Core standard.
Open source, Java based web application.
Provides a local tool for data quality assessment.
Integrated Publishing Toolkit (IPT)
GBIF Integrated Publishing Toolkit (IPT)
- Java 1.5 or higher is required
- Apache Tomcat is recommended (1 GB RAM+)
- GBIF IPT is provided as a WAR archive (for easy deployment)
- GeoServer is included for web mapping (OGC compliant, WFS, WMS, etc.)
- H2 embedded Java database (with JDBC interface and web console)
- Hibernate (object-relational mapping)
IPT Interfaces
• REST
• XML
• TAPIR
• DwC Archive
• OGC (WFS, WMS, web mapping)
• EML (Ecological Metadata Language)
Darwin Core Archive (DwC-A)
• DwC-A publishes DwC records, including extensions
• Simple text-based format
• Zipped single-file archive
Germplasm.txt
http://code.google.com/p/gbif-ecat/wiki/DwCArchive
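A minimal sketch of how such an archive can be assembled: a tab-delimited core file, a germplasm extension file, and a `meta.xml` descriptor, zipped into one file. The file names, the sample rows and the `http://example.org/germplasm/` term namespace are placeholders assumed for illustration; the real germplasm term URIs are not given in these slides.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# Placeholder namespace for the extension terms (an assumption).
EXT = "http://example.org/germplasm/"

# Descriptor that tells a consumer how to read the two text files.
META_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="{DWC}Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="{DWC}institutionCode"/>
    <field index="2" term="{DWC}scientificName"/>
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
             ignoreHeaderLines="1" rowType="{EXT}Germplasm">
    <files><location>germplasm.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="{EXT}storageType"/>
  </extension>
</archive>
"""


def tsv(rows):
    """Serialise rows as tab-delimited text, one record per line."""
    return "".join("\t".join(row) + "\n" for row in rows)


def build_archive():
    """Return the bytes of a zipped archive holding core, extension and meta.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("meta.xml", META_XML)
        archive.writestr("occurrence.txt", tsv([
            ["id", "institutionCode", "scientificName"],
            ["acc-1", "XXX000", "Hordeum vulgare"],
        ]))
        archive.writestr("germplasm.txt", tsv([
            ["coreid", "storageType"],
            ["acc-1", "seed collection"],
        ]))
    return buffer.getvalue()


data = build_archive()
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    names = sorted(archive.namelist())
print(names)
```

Because the archive is plain text plus one small XML descriptor, a genebank can produce it from a database export without running any protocol wrapper.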
Alternatives:
• TAPIR (2004 ->)
• DiGIR (PHP, 2001-2006)
• TapirLink (PHP, 2007 ->)
• BioCASE (Python, 2001-2008)
• PyWrapper3 (2006-2008)
• EURISCO (tab-delimited, 2003)
• ICIS (Java, 1996 ->)
• BioMOBY (Perl, 2001 ->)
IPT service from NordGen at http://ipt.nordgen.org/ipt/
• Embeds its own database
• Multilingual
• Has a user management feature based on roles, which allows for multiple data managers to share a common instance
• Manages multiple data sources
• Several upload options: relational database management systems or data files
• Public web interface allows for data browsing and full text search
• Customised detail pages
GBIF IPT
GBIF IPT implements the Darwin Core standard and provides an interface to easily build extensions to the core Darwin Core terms.
The draft germplasm extension is one example of how to extend the Darwin Core terms for the GBIF IPT.
The IPT user interface includes the germplasm extension
The XML interface includes the germplasm extension
Addresses the need of Node managers to aggregate indexes of published primary biodiversity data.
Aims to ease the complexity of heterogeneous networks of data publishers, by shielding the end-user from the complexities of the different protocols.
The Harvesting and Indexing Toolkit (HIT)
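The idea of shielding the end user from protocol differences can be sketched as follows. This is not the HIT's actual design or API: the harvester functions are hypothetical stand-ins that return canned records, where real ones would query remote endpoints over protocols such as TAPIR or read Darwin Core Archives.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


# Hypothetical protocol-specific harvesters; each one hides how its
# publisher is contacted and yields records in one common shape.
def harvest_dwca() -> Iterable[Record]:
    yield {"id": "a1", "scientificName": "Hordeum vulgare", "source": "dwca"}


def harvest_tapir() -> Iterable[Record]:
    yield {"id": "b7", "scientificName": "Avena sativa", "source": "tapir"}


def build_index(harvesters: List[Callable[[], Iterable[Record]]]) -> Dict[str, List[Record]]:
    """Merge records from all publishers into one index keyed by lower-cased name."""
    index: Dict[str, List[Record]] = {}
    for harvest in harvesters:
        for record in harvest():
            index.setdefault(record["scientificName"].lower(), []).append(record)
    return index


index = build_index([harvest_dwca, harvest_tapir])
print(sorted(index))
```

A search against `index` never needs to know which protocol delivered a record; only the `source` field keeps track of it.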
A Yellow Pages-style directory of biodiversity resources.
The IPT and HIT instances installed in the course of this project will be registered in the GBRDS.
Any biodiversity organisation should be able to register their resources and services into the GBRDS and contribute to the discovery services.
Global Biodiversity Resources Discovery System (GBRDS)
Objectives of the European genebank project
Evaluate the GBIF decentralized architecture
Upgrade the Integrated Publishing Toolkit (IPT) with the genebank extension and develop associated documentation.
Install and test the IPT installation in various genebanks in Europe that, as far as possible, are also EURISCO/ECPGR partners.
Test the registration of IPT installation through the GBIF Global Biodiversity Resources Discovery System (GBRDS).
Test the Harvesting and Indexing Toolkit (HIT) installation for the EURISCO platform.
Install an IPT instance on the EURISCO platform and synchronize with GBIF central Index.
Project runs until 20 December 2010.
IPT deployment in Europe
NordGen in Sweden, covering 5 countries (Denmark, Sweden, Finland, Norway and Iceland)
EURISCO / Bioversity-HQ (Italy)
Bioversity-Montpellier (France)
IPK Gatersleben (Germany)
WUR CGN (The Netherlands)
CRI (Czech Republic)
VIR (Russia)
Balkan countries (Albania, Bosnia, Croatia, Macedonia, Serbia, Romania)
Baltic countries (Estonia, Latvia, Lithuania)
Possible PGR Network model
The gene bank dataset is shared from the holding gene bank.
The National Inventory (NI) endorses all national gene banks (and eventually individual accessions) for EURISCO.
ECPGR Crop databases can access passport data from EURISCO and additional crop specific data from the genebank IPT interface.
Standard data sharing tools ensure that the genebank dataset is available to other relevant decentralized thematic, regional or global networks.
Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.
The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community (TDWG, GBIF).
Potential of GBIF technology
http://data.gbif.org/datasets/network/2
• GBIF, Global Biodiversity Information Facility http://www.gbif.org
• TDWG, Biodiversity Information Standards http://www.tdwg.org
• BioCASE, The Biological Collection Access Service for Europe. http://www.biocase.org
• Bioversity International http://www.bioversityinternational.org
Things can happen in a band, or any type of collaboration, that would not otherwise happen. (Jim Coleman, Musician)
Special thanks to: