randell-floyd
BioData: a new bioassessment database for the USGS
Briefing for the CDI 2011.06.08
http://aquatic.biodata.usgs.gov
Today
• What is BioData?
• Why Did We Build It?
• Current Capabilities
• Future Possibilities
• Data Integration/Interoperability Challenges
What is BioData? – in a nutshell
A data management, storage, and distribution system for aquatic bioassessment data.
• data capture
• data curation
• data publication
Why We Built It - A Brief History
1992 – National Water-Quality Assessment Program (NAWQA) began collecting bioassessment data (macroinvertebrates, fish, algae, stream habitat)
[Figure: NAWQA Study Units]
1992 – 1999: Local data management and national data aggregations
1999 – NAWQA national bioassessment database (BioTDB)
WRD Needs Assessment (2006)
Surveyed WRD Science Centers to find out:
• How much aquatic ecology data is being collected outside the NAWQA Program?
• What kinds?
• What methods?
• Where and how are data being stored?
What We Discovered
• Water Science Center (WSC) collaborative projects with other agencies, states, localities, and partners are producing as much data as the NAWQA Program
• 80% of WSCs reported projects collecting aquatic ecology data
• 120 projects had a macroinvertebrate, fish, algae, or habitat component (2000–2005)
• Approximately 15,000 samples
• The majority of samples are being collected using NAWQA and USEPA national stream bioassessment protocols
• Samples are being sent to a variety of taxonomic labs
What We Discovered
The data are stored electronically, but are very difficult to discover, access, and integrate:
• 47% in Excel
• 13% in EPA databases
• 19% in home-grown relational databases
Together these three account for 79%.
U.S. Department of the Interior, U.S. Geological Survey
Briefing for the USGS GCMRC, 5/9/2011
What Should We Do?
1. Do nothing?
2. Implement a federated system?
3. Incrementally refurbish existing NAWQA database?
4. Redesign and “re-build” using modern, web-enabled, extensible architecture? (BioData)
BioData – Version 1 Objective
A data storage, retrieval, and distribution system for aquatic bioassessment data most commonly produced by USGS WRD projects.
“Most Commonly Produced”
• Project objectives: bioassessment and monitoring
• Setting: streams and rivers
• Types of data: macroinvertebrates, fish, algae, study reach habitat
• Sampling protocols: NAWQA, USEPA
Additional Characteristics
• An internet application
• Available to any USGS ecologist
• Designed to be adapted and extended
• Supports scientific workflow
• Serves as an online data archive
• Curates taxonomic nomenclature: maps it forward and harmonizes it across all the data
• Supports data exchange between biologists and labs
• Makes it easy to add web data services
[Figure: BioData system overview]
• BioData Input (project data management): field data and lab data; field data input, data exchange with labs, data review
• External data: NAWQA legacy data
• BioData Retrieval (DWH) (data distribution): public web site, web data services, application-specific output
Data Retrieval Features (https://aquatic.biodata.usgs.gov)
• Real-time feedback on how many samples your query will return
• Save the query to your desktop, then email it to colleagues so they can run it
• Variety of file formats
• Multiple data sets downloaded in one step
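The retrieval features above can be pictured with a small sketch: a query is a plain filter that can count the samples it would return and can be serialized to JSON, so a saved copy reproduces the same search for whoever runs it. Everything here (`SampleQuery`, its fields, the JSON layout) is hypothetical and illustrative only; it is not BioData's actual query format or API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SampleQuery:
    """Hypothetical sample filter; field names are illustrative, not BioData's schema."""
    community: str   # e.g. "macroinvertebrates", "fish", "algae"
    state: str       # two-letter state code
    start_year: int
    end_year: int

    def matches(self, sample: dict) -> bool:
        # A sample matches when community, state, and year range all agree.
        return (sample["community"] == self.community
                and sample["state"] == self.state
                and self.start_year <= sample["year"] <= self.end_year)

    def count(self, samples: list[dict]) -> int:
        """How many samples the query would return (the 'real-time feedback')."""
        return sum(1 for s in samples if self.matches(s))

    def to_json(self) -> str:
        """Serialize the query so it can be saved to a file and shared."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SampleQuery":
        """Rebuild a query from its saved JSON text."""
        return cls(**json.loads(text))


samples = [
    {"community": "fish", "state": "CO", "year": 2003},
    {"community": "fish", "state": "CO", "year": 1998},
    {"community": "algae", "state": "CO", "year": 2003},
]
query = SampleQuery("fish", "CO", 2000, 2005)
print(query.count(samples))
saved = query.to_json()  # text that could be written to a file and emailed
assert SampleQuery.from_json(saved) == query
```

Keeping the saved query as data rather than as a result set means the recipient re-runs it against current holdings instead of receiving a stale download.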
Data Retrieval Demo
https://aquatic.biodata.usgs.gov
Data Input/Management Features
• Retrieve restricted (unreleased) data
• Manage and organize data by project
• Project-level control over rights to enter and edit data
• Built-in help and data validation checks
• Auto-saving
• Data entry screens tailored to field sheets
• Send electronic orders to labs
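As an illustration of the kind of built-in validation check listed above, the sketch below reports problems in a field-sheet record instead of silently rejecting it. The field names (`site_id`, `sample_date`, `community`, `reach_length_m`) and the rules are assumptions made up for this example; they are not BioData's actual schema or checks.

```python
from datetime import date
from typing import Optional

# Hypothetical required fields and allowed values for a field-sheet record.
REQUIRED_FIELDS = ("site_id", "sample_date", "community", "reach_length_m")
COMMUNITIES = {"macroinvertebrates", "fish", "algae", "habitat"}


def validate_sample(record: dict, today: Optional[date] = None) -> list[str]:
    """Return human-readable problems; an empty list means the record passed."""
    today = today or date.today()
    # Missing or blank required fields.
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    # Community must be one of the supported sample types.
    if record.get("community") and record["community"] not in COMMUNITIES:
        problems.append(f"unknown community {record['community']!r}")
    # A sample cannot have been collected in the future.
    d = record.get("sample_date")
    if isinstance(d, date) and d > today:
        problems.append("sample_date is in the future")
    # Reach length must be a positive number.
    length = record.get("reach_length_m")
    if isinstance(length, (int, float)) and length <= 0:
        problems.append("reach_length_m must be positive")
    return problems


good = {"site_id": "X1", "sample_date": date(2004, 7, 15),
        "community": "fish", "reach_length_m": 150}
bad = {"site_id": "X1", "sample_date": date(2012, 1, 1),
       "community": "plankton", "reach_length_m": 0}
print(validate_sample(good, today=date(2011, 6, 8)))
print(validate_sample(bad, today=date(2011, 6, 8)))
```

Returning a list of messages rather than raising on the first error lets a data entry screen show every problem at once.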
Data Input/Mgt Demo
Data integration – touchpoints
• First challenge – find the data
• Second challenge – compatible methods?
• Third challenge – get the data
We need to pick a data exchange standard
• Fourth challenge – harmonize taxonomy
Does “Thienemannimyia group” = “Thienemannimyia gr.”? Does ITIS solve this?
ITIS
• ITIS only handles published names
• We have to handle unpublished names:
  – Provisional = new taxon claimed but not “officially” published
  – Conditional = uncertain or indeterminate identification, e.g., “Thienemannimyia group”
• ITIS is not complete for all groups:
  – Fish – good, we can integrate tightly with it
  – Macroinvertebrates – doable
  – Algae – ITIS not ready yet
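The “Thienemannimyia group” vs. “Thienemannimyia gr.” question, and the split between published, provisional, and conditional names, can be sketched as a normalize-then-classify step. This is a minimal illustration under stated assumptions: the abbreviation table and qualifier list are hypothetical, a local `published` set stands in for an ITIS lookup (no ITIS service is called), and “provisional” here simply means “not found in the published list,” which a real system would still route to a taxonomist for review.

```python
# Hypothetical abbreviation map; a production system would curate this with taxonomists.
ABBREVIATIONS = {"gr.": "group", "grp.": "group", "nr.": "near"}
# Hypothetical qualifiers marking an uncertain or indeterminate ("conditional") identification.
CONDITIONAL_QUALIFIERS = {"group", "near", "cf.", "sp."}


def normalize_name(raw: str) -> str:
    """Collapse whitespace and expand known abbreviations so variant spellings compare equal."""
    words = raw.strip().split()
    return " ".join(ABBREVIATIONS.get(w.lower(), w) for w in words)


def classify(raw: str, published: set[str]) -> str:
    """Label a name 'conditional', 'published', or 'provisional' (illustrative rules only)."""
    name = normalize_name(raw)
    if any(w.lower() in CONDITIONAL_QUALIFIERS for w in name.split()):
        return "conditional"
    if name in published:
        return "published"
    return "provisional"  # unknown name: candidate for curator review


published = {"Thienemannimyia"}  # stand-in for names confirmed in ITIS
print(normalize_name("Thienemannimyia gr.") == normalize_name("Thienemannimyia group"))
print(classify("Thienemannimyia gr.", published))
print(classify("Newgenus novus", published))  # hypothetical unpublished name
```

Normalizing before comparing is what lets the two spellings in the question map to one harmonized name; the classification rules are deliberately crude and only show where ITIS-backed and locally curated names would diverge.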
• Fifth challenge – integrate with physicochemical and ancillary data; a common geospatial framework would help
NHD (National Hydrography Dataset)
• Which NHD?
• An NHD “snap to” service with APIs that developers could use in their applications?
• A service to translate an NHD address to other versions of NHD (including future ones)?
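A “snap to” service of the kind asked about above would, at its core, find the closest point on the nearest flowline to a sampling site. The sketch below shows that geometric step only, under loud assumptions: flowlines are hypothetical in-memory polylines keyed by an ID, coordinates are in a projected (planar) system so Euclidean distance is meaningful, and `measure` is a simple 0–1 fraction from the first vertex, which is not necessarily how NHD encodes its own reach addresses.

```python
import math
from typing import NamedTuple, Optional


class Snap(NamedTuple):
    flowline_id: str
    x: float
    y: float
    distance: float
    measure: float  # hypothetical: fraction (0-1) along the line from its first vertex


def _project(px, py, ax, ay, bx, by):
    """Closest point to P on segment AB, plus its parameter t in [0, 1] along AB."""
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return ax + t * dx, ay + t * dy, t


def snap_to_flowline(px: float, py: float,
                     flowlines: dict[str, list[tuple[float, float]]]) -> Optional[Snap]:
    """Snap point (px, py) to the nearest segment of any flowline; None if there are none."""
    best = None
    for fid, coords in flowlines.items():
        lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
        total = sum(lengths) or 1.0  # guard against zero-length lines
        along = 0.0                  # distance from the first vertex to the current segment
        for (a, b), seg_len in zip(zip(coords, coords[1:]), lengths):
            qx, qy, t = _project(px, py, *a, *b)
            d = math.hypot(px - qx, py - qy)
            if best is None or d < best.distance:
                best = Snap(fid, qx, qy, d, (along + t * seg_len) / total)
            along += seg_len
    return best


flowlines = {
    "A": [(0.0, 0.0), (10.0, 0.0)],
    "B": [(0.0, 5.0), (0.0, 15.0)],
}
print(snap_to_flowline(4.0, 1.0, flowlines))
```

A real service would add a spatial index instead of scanning every flowline, and would report positions in whatever addressing scheme the chosen NHD version defines; the brute-force loop here only shows the projection logic.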