Biodiversity Data Provider Software
Hands-on exercises with TAPIR
Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008)
Dag Terje Filip Endresen, Nordic Genetic Resources Center (Sweden) / Bioversity International (Italy)

Data exchange alternatives, SBIS conference in Stockholm (2008)


Biodiversity Data Publishing Software for the Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008). Stockholm, 3rd December 2008. Dag Endresen (Bioversity/NordGen).

Fallacies of Distributed Computing

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

This list of fallacies came about at Sun Microsystems around 1994.

TAPIR

Cartoon by Sasha Kopf (Creative Commons)

TAPIR

• TAPIR: TDWG Access Protocol for Information Retrieval.

• During the 2004 TDWG meeting in Christchurch, NZ, work started on a unified protocol, which was named TAPIR.

• TAPIR is based on the protocols of two earlier data provider software packages, BioCASE and DiGIR.

Provider software, wrappers

• DiGIR (2002, not active)
  – http://digir.sourceforge.net

• BioCASE (2003, PyWrapper)
  – http://www.biocase.org

• PyWrapper3 (2006, not active)
  – http://trac.pywrapper.org/

• TapirLink (2007)
  – http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink

• GBIF Provider Toolkit (2009)
  – http://code.google.com/p/gbif-providertoolkit

BioCASE 2.5.0RC

• The BioCASE provider software is a product of the EU-funded BioCASE project (2001-2004).

• Developed at BGBM in Berlin.

• Last updated in April 2008, with support for Python version 2.5 and fewer required external libraries.

• Implement the BioCASE provider to share data as ABCD 2.06.

http://www.biocase.org

BioCASE 2.5.0RC

1. Make sure you have Python 2.5 installed (command line: python -V).

2. Download the latest provider software from http://www.biocase.org

3. Uncompress the BioCASE provider software [provider_software_2.5.0RC.tar.gz] to a folder on your system (tar -xzvf provider_...tar.gz).

4. Run setup.py (python setup.py).

5. Configure your web server to mount biocase/www as http://localhost/biocase/

Hint: You will find an example for httpd.conf as the last terminal output from running setup.py.

BioCASE 2.5.0RC

6. Visit the library test page: http://localhost/biocase/utilities/testlibs.cgi

6a. Download the latest 4Suite from http://4suite.org/, then uncompress and install it [4Suite-XML-1.0.2.tar.bz2].

6b. Install additional Python libraries, including the desired database driver. For each Python package: (python setup.py install)

6c. Graphviz is useful to visualize the database table structure.

BioCASE 2.5.0RC

7. Configuration

• Add datasource (dsa)

• Database connection

• Database table structure

• Mapping of data model to standard schema

BioCASE 2.5.0RC

8. Query form

The manual query form is useful for understanding exactly how the wrapper software works! http://localhost/biocase/utilities/queryforms/qf_manual.cgi?dsa=sesto

PyWrapper3

Home: http://trac.pywrapper.org/
Primary developers: Markus Döring, Javier de la Torre

14/07/2008 - Development stalled
"We are sorry to inform you that development of the TAPIR branch of PyWrapper has been stalled. The latest 3.1 alpha version is not stable and not recommended for production!" (Message from the home page)

• PyWrapper 3.0.0 (latest stable version, requires Python 2.4)
• PyWrapper 3.1.0 alpha (development version, works with Python 2.5)

PyWrapper is tested and verified to work with Windows, Mac OS X and Linux.

Required configuration

Web server: Any CGI compliant web server: Apache, IIS etc. (The built in CherryPy web server can also be used).

Database: Major databases are supported, including MySQL, Oracle, SQLServer, Sybase, Access, PostgreSQL. Theoretically any database with a Python library should work.

Python: PyWrapper is developed in the Python programming language. (The latest version from the SVN code repository works with Python version 2.5.)

Apache, MySQL and Python are open source software, free to use - even for commercial products.
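The remark that any database with a Python library should work presumably rests on Python's common database interface (DB-API 2.0), which most drivers implement. The following is a minimal sketch of that idea, not PyWrapper code: sqlite3 from the standard library stands in for MySQL, PostgreSQL and the others, and the table and column names are hypothetical.

```python
import sqlite3


def fetch_rows(connection, sql, params=()):
    """Run a query through the generic DB-API cursor interface."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


# sqlite3 is only a stand-in here. With another driver, mainly the connect()
# call changes; the placeholder style ("?" below) can also differ per driver.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE accession (id INTEGER PRIMARY KEY, genus TEXT, species TEXT)"
)
connection.executemany(
    "INSERT INTO accession (genus, species) VALUES (?, ?)",
    [("Hordeum", "vulgare"), ("Triticum", "aestivum")],
)

rows = fetch_rows(
    connection,
    "SELECT genus, species FROM accession WHERE genus = ?",
    ("Hordeum",),
)
print(rows)
```

Because fetch_rows only touches the cursor interface, the same function would in principle work with a connection object from any DB-API compliant driver.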

Installation

http://trac.pywrapper.org/pywrapper/wiki/InstallationGuide

1. Download the latest PyWrapper3 installer. Use SVN export or checkout for Python 2.5 support.

2. Uncompress to a folder of your choice.
Example: "/usr/local/pywrapper3/"
Example: "C:\pywrapper\"

Local installation: If you have a Subversion client installed, you may use the automatic installer. (Local Python and libraries are installed to your pywrapper folder.)

prompt$ svn export svn://svn.pywrapper.org:80/pywrapper/trunk pywrapper
prompt$ cd pywrapper/tools
prompt$ /bin/sh install.sh

This requires a bash shell, and probably a Unix-like system such as FreeBSD, Linux or Mac OS X.

3. Execute pywrapper/setup.py.
Example: prompt$ python setup.py (Mac OS X, Linux)
On Windows, locate setup.py and double-click it.

Start standalone server

Execute start_server.py (default port is 8080).

prompt$ cd webapp/
prompt$ ./start_server.py 8088 (example to start on port 8088)

On a Windows system you may do this in an MS-DOS window (or double-click the file, if you accept the default port).

Some messages will pass across your screen. Please be patient, this could take a minute. Wait for the message "start server …" and find PyWrapper at:

http://localhost:8088/pywrapper

Configuration

After successful installation, you will need to configure your data provider. Follow the instructions from the PyWrapper documentation web page to configure.

Data sources. If you provide several datasets or databases, each is configured as an individual data source (dsa).

Database connection. For PyWrapper to access your database.

Database structure. Define the relevant database tables, the primary keys and foreign keys.

Data model. Map your database model to the standard represented by the XML Schemas you choose.

http://trac.pywrapper.org/pywrapper/wiki/Documentation
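The database structure and data model steps above can be pictured as a lookup from standard concepts to table columns, plus the key relations needed to join the tables. The sketch below is only an illustration of that idea and does not reflect PyWrapper's actual configuration format; all table, column and key names are hypothetical.

```python
# Hypothetical mapping from standard concepts to "table.column" in a local database.
CONCEPT_MAPPING = {
    "CatalogNumber": "accession.accession_number",
    "ScientificName": "taxon.scientific_name",
    "Country": "accession.origin_country",
}

# Hypothetical foreign key: accession.taxon_id refers to taxon.id.
FOREIGN_KEYS = {("accession", "taxon_id"): ("taxon", "id")}


def build_select(concepts, root_table="accession"):
    """Translate requested concepts into one SQL SELECT with the needed joins."""
    columns = [CONCEPT_MAPPING[concept] for concept in concepts]
    # Tables other than the root table that the requested columns live in.
    other_tables = {column.split(".")[0] for column in columns} - {root_table}
    sql = "SELECT " + ", ".join(columns) + " FROM " + root_table
    for (child, fk_column), (parent, pk_column) in sorted(FOREIGN_KEYS.items()):
        if child == root_table and parent in other_tables:
            sql += f" JOIN {parent} ON {child}.{fk_column} = {parent}.{pk_column}"
    return sql


print(build_select(["CatalogNumber", "ScientificName"]))
```

A wrapper's configuration tool captures this kind of information once, so that incoming requests phrased in terms of the standard schema can be answered from the local tables.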

Screen examples

PyWrapper comes with a graphical web based configuration tool

For more information and more screen dumps from the configuration of PyWrapper, see: http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i

TapirLink 0.6.1

TapirLink 0.6.1

Home: http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
Primary developers: Renato De Giovanni, Dave Vieglais
Download: http://sourceforge.net/project/showfiles.php?group_id=38190

Uncompress the PHP source code.
E.g.: /usr/local/tapir/tapirlink

Permissions: read on all directories; write on cache, config, log, statistics.

Mount the admin and www directories for your web server.

Example: Apache "httpd.conf"

Alias /tapirlink "/usr/local/tapir/tapirlink/www"
Alias /tapirlink-admin "/usr/local/tapir/tapirlink/admin"
<Location /tapirlink>
    Order allow,deny
    Allow from all
</Location>
<Location /tapirlink-admin>
    Order allow,deny
    Allow from all
</Location>
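The permission requirements on this slide (read on all directories; write on cache, config, log and statistics) can be checked with a short script before opening the admin pages. This is a minimal sketch, not part of TapirLink. It tests access for the user running the script, so it is only meaningful when run as the web server's user, and the example path is the one used above.

```python
import os

# Directories that must be writable according to the slide.
WRITABLE_DIRS = ("cache", "config", "log", "statistics")


def check_permissions(root, writable=WRITABLE_DIRS):
    """Return a list of permission problems found under the folder `root`."""
    problems = []
    if not os.access(root, os.R_OK | os.X_OK):
        problems.append("not readable: " + root)
    # Every subdirectory should be readable (and searchable).
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append("not readable: " + path)
    # The listed directories should exist and be writable.
    for name in writable:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            problems.append("missing: " + path)
        elif not os.access(path, os.W_OK):
            problems.append("not writable: " + path)
    return problems


if __name__ == "__main__":
    for problem in check_permissions("/usr/local/tapir/tapirlink"):
        print(problem)
```

An empty result means the basic file system requirements are met for the checking user.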

TapirLink 0.6.1

Step 1: Describe your new resource

Start by adding a new resource at http://localhost/tapirlink-admin/

TapirLink 0.6.1

Step 2: Data source, database connection

Step 3: Table structure

TapirLink 0.6.1

Step 4: Filter

Step 5: Select mapping standards to use

TapirLink 0.6.1

Step 5b: Mapping data schema (ABCD 2.06)

etc…

TapirLink 0.6.1

Step 5c: Mapping data concepts (Darwin Core)

etc…

Step 5d: Remember that DwC has an extension for geospatial descriptors

etc…

TapirLink 0.6.1

Step 6: Settings

New resource successfully configured

TapirLink 0.6.1

Test the resource with the client form: http://localhost/tapirlink/tapir_client.php

The XML client form is very useful for understanding exactly how the wrapper software works!

Service interface

EXAMPLE OF A SERVICE REQUEST

All exchanged data is formatted with XML tags.

EXAMPLE OF A SERVICE RESPONSE

...

EXAMPLE TAPIR SERVICE REQUEST
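
Besides XML messages, TAPIR requests can be expressed as key-value parameters in a URL. The following is a minimal sketch of building such URLs. The operation names come from the TAPIR specification as understood here, and the parameter names (op, concept, limit) should be checked against it. The access point URL, the resource code "myresource" and the concept identifier are hypothetical.

```python
from urllib.parse import urlencode

# Operations defined by TAPIR (to be verified against the specification).
TAPIR_OPERATIONS = ("ping", "capabilities", "metadata", "inventory", "search")


def tapir_url(access_point, operation, **params):
    """Build a key-value TAPIR request URL for the given operation."""
    if operation not in TAPIR_OPERATIONS:
        raise ValueError("unknown TAPIR operation: " + operation)
    query = {"op": operation}
    query.update(params)
    return access_point + "?" + urlencode(query)


# Hypothetical access point for a locally installed provider.
ACCESS_POINT = "http://localhost/tapirlink/tapir.php/myresource"

print(tapir_url(ACCESS_POINT, "ping"))
print(
    tapir_url(
        ACCESS_POINT,
        "inventory",
        concept="http://example.org/schema/genus",  # hypothetical concept id
        limit=10,
    )
)
```

A ping request is a convenient first test, since it needs no concept or schema knowledge.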

EXAMPLE TAPIR SERVICE RESPONSE

singer:/sourcename
singer:/taxonomy/genus
singer:/taxonomy/species
singer:/taxonomy/subspecies
singer:/holding/ID
singer:/holding/name
singer:/origin/collecting/countrysource
singer:/origin/collecting/countrysourceID
singer:/status/biologicalstatus
singer:/status/biologicalstatusID

...

EXAMPLE TAPIR SERVICE SEARCH REQUEST

EXAMPLE TAPIR SERVICE SEARCH RESPONSE

EXAMPLE OF OAI-PMH SERVICE REQUEST

http://an.oa.org/OAI-script?verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_dc

OAI-PMH requests are expressed as HTTP requests.

OAI-PMH requests must be submitted using either the HTTP GET or POST methods.
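
The GetRecord request above can be assembled from its arguments as in this minimal sketch. The helper name oai_request_url is hypothetical; the base URL and argument values are the ones from the example. Special characters such as ":" and "/" in the identifier are percent-encoded here, and decoding the query gives back the original values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL from a verb and its arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


url = oai_request_url(
    "http://an.oa.org/OAI-script",
    "GetRecord",
    identifier="oai:arXiv.org:hep-th/9901001",
    metadataPrefix="oai_dc",
)
print(url)

# Decoding the query string recovers the original argument values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["identifier"][0])
```

For a POST request, the same encoded query string would be sent as the request body instead of being appended to the URL.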

EXAMPLE OF OAI-PMH SERVICE RESPONSE

OAI-PMH responses are formatted as HTTP responses, with the Content-Type set to text/xml.
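
A client therefore checks the Content-Type and then parses the XML body. The sketch below uses a shortened, made-up GetRecord response to show the idea. The element names and namespace follow OAI-PMH 2.0, while the date values are invented for illustration and the metadata section is omitted.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Shortened, illustrative response; the values are invented for this sketch.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-12-03T10:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:hep-th/9901001"
           metadataPrefix="oai_dc">http://an.oa.org/OAI-script</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:hep-th/9901001</identifier>
        <datestamp>1999-01-01</datestamp>
      </header>
    </record>
  </GetRecord>
</OAI-PMH>"""


def is_xml_content_type(header_value):
    """True if a Content-Type header value denotes text/xml."""
    return header_value.split(";")[0].strip().lower() == "text/xml"


def parse_get_record(xml_text):
    """Extract the record header fields from a GetRecord response."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise RuntimeError(error.get("code", "unknown") + ": " + (error.text or ""))
    header = root.find("oai:GetRecord/oai:record/oai:header", OAI_NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
    }


print(is_xml_content_type("text/xml; charset=UTF-8"))
print(parse_get_record(SAMPLE_RESPONSE))
```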

OAI-PMH PROTOCOL, METADATA FORMATS

Request types (verb):

• Identify
• ListMetadataFormats
• ListSets
• GetRecord
• ListIdentifiers
• ListRecords

For purposes of interoperability, the metadataPrefix 'oai_dc' is reserved for Dublin Core.

Communities may adopt their own metadataPrefixes for their own metadata formats. Relevant formats/schemas for biodiversity informatics are Darwin Core and ABCD.
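
Each of the six verbs has its own required arguments. The following minimal sketch lists them as defined in OAI-PMH 2.0 and reports which ones a request is missing. It deliberately ignores optional arguments and resumptionToken continuation requests, and the helper name missing_arguments is hypothetical.

```python
# Required arguments per OAI-PMH verb (optional arguments are left out).
REQUIRED_ARGUMENTS = {
    "Identify": (),
    "ListMetadataFormats": (),
    "ListSets": (),
    "GetRecord": ("identifier", "metadataPrefix"),
    "ListIdentifiers": ("metadataPrefix",),
    "ListRecords": ("metadataPrefix",),
}


def missing_arguments(verb, arguments):
    """Return the required arguments that are absent from `arguments`."""
    if verb not in REQUIRED_ARGUMENTS:
        raise ValueError("unknown verb: " + verb)
    return [name for name in REQUIRED_ARGUMENTS[verb] if name not in arguments]


print(missing_arguments("ListRecords", {"metadataPrefix": "oai_dc"}))
print(missing_arguments("GetRecord", {"metadataPrefix": "oai_dc"}))
```

A harvester that wants Darwin Core or ABCD records would first call ListMetadataFormats to learn which metadataPrefix values a repository actually offers.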

Why share data?

Distributed network

The image is from the BioCASE web site

Decentralized network

[Figure: network diagram. Labels: EURISCO (Europe); NordGen (Northern Europe); IPK Gatersleben (Germany); IHAR (Poland); WUR CGN (Netherlands); other European gene banks; SINGER (CGIAR); CGIAR International Future Harvest gene banks; USDA GRIN (USA); USDA ARS National Germplasm Repositories; GBIF (Global Biodiversity Information Facility); ALIS (Accession Level Information System); Svalbard Global Seed Vault (safe backup); USER; Web Services; MCPD]

Crop Wild Relatives

ARM, LKA, BOL, MDG, UZB

National datasets are shared with the central CWR data index.

The national datasets, as well as access to other international datasets, are provided from the CWR data portal.

EURISCO

SINGER

http://www.cropwildrelatives.org

Data portal example

http://wwwdev.ngb.se/portal/index.php?scope=demo

Outlook

• The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community.

• Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.

• Use of GBIF technology and integration of PGR collection data into GBIF allows PGR users to simultaneously search PGR collections and other biodiversity collections, and to get access to the data (and possibly the material) of relevant biodiversity collections.

• New data portals for a specific crop, a regional thematic network or a similar subset of the total global biodiversity datasets can be established with relatively little effort. This requires only that all the relevant datasets are provided by GBIF-compatible web services (like the BioCASE PyWrapper).

Participating in global and national biodiversity projects, and sharing your institute's datasets with them:

• is important for your public and scientific visibility,

• promotes the use (usefulness) of your data,

• and is ultimately important for the continued funding of your institutional activities.

Special thanks to

• Bioversity International [http://www.bioversityinternational.org]

• GBIF, Global Biodiversity Information Facility [http://www.gbif.org]

• BioCASE, The Biological Collection Access Service for Europe. [http://www.biocase.org]

• TDWG, Biodiversity Information Standards [http://www.tdwg.org]

Special thanks to

• BioCASE and PyWrapper3 software
  – Markus Döring
  – Javier de la Torre

• DiGIR and TapirLink software
  – Renato De Giovanni
  – Dave Vieglais

Thank you for listening!