SINE Workshop, 29-31 Oct 2001, SDSC Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit James H. Beach Biodiversity Research Center University of Kansas [email protected]

Page 1: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Biodiversity Data Retrieval and Integration

Distributed species, data, computation and credit

James H. Beach

Biodiversity Research Center

University of Kansas

[email protected]

Page 2: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Museums and their Data

• 3 B specimens – and data – documenting the distribution of life on earth

• 2 M species

• 300 years of biological exploration

• Data are held in dynamic, autonomous, self-organizing and spatially-distributed collections

Page 3: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Paris Museum Mexican Birds

Page 4: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

British Museum Mexican Birds

Page 5: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Field Museum Mexican Birds

Page 6: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

KU Museum Mexican Birds

Page 7: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

“World Museum” Mexican Birds

Page 8: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

The Species Analyst Network

Data Resources

Client API

Desktop Applications

Broadcast query

• Direct access to live primary data

• Ownership and control maintained locally

• Z39.50, HTTP, XML data, XML Query
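
The broadcast query pattern on this slide can be sketched roughly as follows. This is a minimal Python illustration, not the Species Analyst implementation: the provider names, the record fields, the coordinates, and the `broadcast_query` helper are all hypothetical, and in-memory lists stand in for the Z39.50 or HTTP endpoints that real providers would expose.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "providers"; each stands in for a museum's own server.
# The coordinates are invented for illustration.
PROVIDERS = {
    "museum_a": [
        {"species": "Icterus gularis", "lat": 19.4, "lon": -96.9},
        {"species": "Momotus mexicanus", "lat": 18.1, "lon": -99.2},
    ],
    "museum_b": [
        {"species": "Icterus gularis", "lat": 16.8, "lon": -93.1},
    ],
    "museum_c": [],
}


def query_provider(name, species):
    """Ask one provider for matching records and tag each with its source."""
    records = PROVIDERS[name]
    return [dict(r, provider=name) for r in records if r["species"] == species]


def broadcast_query(species, providers=PROVIDERS):
    """Send the same query to every provider in parallel and merge the results."""
    names = sorted(providers)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_provider = pool.map(lambda n: query_provider(n, species), names)
        merged = [rec for recs in per_provider for rec in recs]
    return merged


results = broadcast_query("Icterus gularis")
```

Each record keeps a `provider` tag, which mirrors the slide's point that ownership stays with the source collection even after results are merged into one "world museum" view.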

Page 9: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Species Analyst HTML Gateway

Page 10: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Results of Species Analyst Query

Page 11: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

GARP: Genetic Algorithm for Rule-set Production

• Developed by David Stockwell, San Diego Supercomputer Center

• Takes advantage of multiple algorithms (BIOCLIM, logistic regression, etc.)

• Different decision rules may apply to different sectors of species’ distributions

• Uses a genetic algorithm for choosing rules

• Implemented on WWW, and open for public use
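
To make "a genetic algorithm for choosing rules" concrete, here is a toy sketch. It is not GARP: it evolves only one kind of rule (a BIOCLIM-style environmental envelope), the training points are invented, and the selection, crossover, and mutation settings are arbitrary assumptions chosen to keep the example short.

```python
import random

# Invented training data: (temperature_C, precipitation_mm, species present?)
POINTS = [
    (22.0, 900, True), (24.0, 1100, True), (21.0, 1000, True), (23.5, 950, True),
    (10.0, 400, False), (30.0, 200, False), (15.0, 1500, False), (28.0, 1000, False),
]


def predicts_presence(rule, temp, precip):
    """An envelope rule predicts presence when both variables fall in range."""
    t_lo, t_hi, p_lo, p_hi = rule
    return t_lo <= temp <= t_hi and p_lo <= precip <= p_hi


def fitness(rule):
    """Fraction of training points the rule classifies correctly."""
    hits = sum(predicts_presence(rule, t, p) == present for t, p, present in POINTS)
    return hits / len(POINTS)


def random_rule(rng):
    t_lo, t_hi = sorted(rng.uniform(0, 35) for _ in range(2))
    p_lo, p_hi = sorted(rng.uniform(0, 2000) for _ in range(2))
    return (t_lo, t_hi, p_lo, p_hi)


def crossover(a, b, rng):
    """Take each bound from one parent or the other."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(rule, rng):
    """Jitter each bound slightly, then restore lo <= hi ordering."""
    t_lo, t_hi, p_lo, p_hi = rule
    t_lo, t_hi = sorted((t_lo + rng.gauss(0, 1.0), t_hi + rng.gauss(0, 1.0)))
    p_lo, p_hi = sorted((p_lo + rng.gauss(0, 50.0), p_hi + rng.gauss(0, 50.0)))
    return (t_lo, t_hi, p_lo, p_hi)


def evolve(generations=40, size=30, seed=1):
    rng = random.Random(seed)
    population = [random_rule(rng) for _ in range(size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children  # parents survive unchanged (elitism)
    best = max(population, key=fitness)
    return best, history + [fitness(best)]


best_rule, best_history = evolve()
```

Because the parents are carried over unchanged, the best fitness in `best_history` never decreases from one generation to the next. A fuller treatment along the lines of the slide would keep a set of rules of several types and let different rules cover different sectors of the distribution.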

Page 12: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Species Analyst + GARP: A Powerful Tool

• Integrates distributed biodiversity data

• Provides current information on species’ ranges

• Models species’ ecological niches

• Predicts geographic distributions

• Integrates niche models with environmental change scenarios, e.g. global climate change and biodiversity, invasive species, emerging diseases
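
The last two steps, projecting a niche model onto geography and re-projecting it under a change scenario, can be illustrated with a deliberately tiny example. Everything here is an assumption made for illustration: the single-variable temperature niche, the 3x4 grid values, and the uniform 2 degree warming are invented, and a real model would use many environmental layers.

```python
# Hypothetical niche model: the species tolerates mean temperatures in this range.
NICHE_TEMP_RANGE = (18.0, 24.0)

# Invented 3x4 grid of mean annual temperatures (deg C), rows running north to south.
TEMPERATURE_GRID = [
    [14.0, 15.0, 16.5, 17.0],
    [18.5, 19.0, 21.0, 22.5],
    [23.0, 24.5, 26.0, 27.0],
]


def predict_distribution(grid, temp_range):
    """Mark each cell True where the environment falls inside the niche."""
    lo, hi = temp_range
    return [[lo <= cell <= hi for cell in row] for row in grid]


def apply_warming(grid, delta):
    """Build a scenario layer by adding a uniform temperature change."""
    return [[cell + delta for cell in row] for row in grid]


def count_suitable(mask):
    """Count the cells predicted suitable."""
    return sum(cell for row in mask for cell in row)


current = predict_distribution(TEMPERATURE_GRID, NICHE_TEMP_RANGE)
warmed = predict_distribution(apply_warming(TEMPERATURE_GRID, 2.0), NICHE_TEMP_RANGE)
```

In this toy grid the number of suitable cells happens to stay the same, but the suitable band moves north: the southern row drops out and part of the northern row becomes suitable. The niche itself is held fixed, and only the environmental layer changes.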

Page 13: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Asian Longhorn Beetle (Anoplophora glabripennis)

Page 14: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Longhorn Beetle - Modeled Asian Distribution

Page 15: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Asian Longhorn Beetle – Predicted U.S. Distribution

Page 16: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

A Global Encyclopedia of Life or The World According to GARP

• Research
  – Biogeographic analysis on distributions
  – Invasive species predictions
  – Monitoring and conservation planning
  – Global climate change impacts on biota

• Outreach, Education and Training
  – Backyard biodiversity, spatial data queries, GIS functions
  – Interactive data entry, observational data

• Data Analysis Services for Museums
  – Uniqueness and value of collections holdings
  – Data quality issues
  – Summary statistics and analyses

Page 17: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

A Global Encyclopedia of Life or The World According to GARP (2)

• Every documented species with georeferenced localities in the Species Analyst Network

• North America, Western Hemisphere, World

• Resolution 1 km grid NA, 10 km elsewhere

• 1 M+ species in collections with data?

• Computational Requirements
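
A back-of-envelope estimate gives a feel for the computational requirements. The land areas below are rounded figures supplied here as assumptions, not numbers from the talk, and the estimate further assumes every species is projected over all land worldwide, which overstates the work for species modeled only regionally.

```python
# Rounded land areas in km^2; both are assumptions for a rough estimate.
NORTH_AMERICA_KM2 = 24_700_000
WORLD_LAND_KM2 = 149_000_000

SPECIES = 1_000_000  # "1 M+ species" from the slide


def cells(area_km2, cell_size_km):
    """Number of square grid cells of the given edge length covering an area."""
    return area_km2 // (cell_size_km ** 2)


na_cells = cells(NORTH_AMERICA_KM2, 1)                           # 1 km grid
elsewhere_cells = cells(WORLD_LAND_KM2 - NORTH_AMERICA_KM2, 10)  # 10 km grid
cells_per_species = na_cells + elsewhere_cells
total_predictions = cells_per_species * SPECIES
```

Under these assumptions one global map has roughly 26 million cells, so a million species implies on the order of 10^13 cell predictions, before counting the cost of fitting each model.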

Page 18: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Metacomputing Museum Data

• Global species distributions: parallel computation

• SETI@home
  – Collaborative computing
  – 1 M simultaneous users

• Port GARP to Win32 to run in background or foreground
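
The collaborative-computing idea can be sketched as a server that hands out one species per work unit and records which client returned each model. This is an illustrative assumption about how such a scheme might look, not the Lifemapper or SETI@home protocol: the `WorkUnitServer` class, the client names, and `fake_model` are hypothetical, and networking, timeouts, and result validation are left out.

```python
from collections import Counter, deque


class WorkUnitServer:
    """Hands out one species per work unit and records who returned each model."""

    def __init__(self, species_names):
        self.pending = deque(species_names)
        self.in_progress = {}    # species -> client currently working on it
        self.results = {}        # species -> model returned by a client
        self.credit = Counter()  # client -> number of completed units

    def checkout(self, client):
        """Give the next pending species to a client, or None when none remain."""
        if not self.pending:
            return None
        species = self.pending.popleft()
        self.in_progress[species] = client
        return species

    def submit(self, client, species, model):
        """Accept a finished model and credit the client that computed it."""
        if self.in_progress.get(species) != client:
            raise ValueError("unit was not checked out by this client")
        del self.in_progress[species]
        self.results[species] = model
        self.credit[client] += 1


def fake_model(species):
    """Placeholder for a long-running modeling job on the client machine."""
    return {"species": species, "status": "modeled"}


server = WorkUnitServer(["sp1", "sp2", "sp3", "sp4", "sp5"])
clients = ["alice-pc", "bob-pc"]
turn = 0
while True:
    client = clients[turn % len(clients)]  # clients take turns asking for work
    unit = server.checkout(client)
    if unit is None:
        break
    server.submit(client, unit, fake_model(unit))
    turn += 1
```

Species are independent of one another, which is what makes the problem easy to split across many desktop machines. The `credit` counter is one simple way to track contributions per participant.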

Page 19: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Lifemapper =

Georeferenced Species Data

+ Distributed Query Architecture

+ Predictive Modeling

+ Distributed Computation

+ Spatial Map and Model Archive

+ Open Access Web Portal

Page 20: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Lifemapper Demonstration

• Server

• GARP client

Page 21: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Lifemapper Future Directions

• Diversify modeling options, add interactivity, 3D analysis and visualization

• Add new classes of data layers, remote sensing, human impacts element, ecological models

• Add observational species data

• Embed dispersion models, temporal dimension

• Add internet services API, UDDI, SOAP

• Add more value-added services for data providers

• Embed LM data and analysis tools within a semantic research and decision support network

• Integrate LM into informal and formal science education

Page 22: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Lifemapper Social Scaling

• Distributed authorship

• Desktop computing

• User preferences

• Value-added collections data analysis

• Acknowledgement and accreditation of contributions, ranks and statistics

Page 23: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Museums as Sensor Networks

• Data, servers & connections are dynamic
  – Deborah Estrin: adaptive self-organization of the network, unattended and untethered; parallels to curators and collection managers

• Self-assembling, observational data

• Do not usually have the requirement of real time

• Changes are as important as the values themselves
  – Source data (West Nile virus), model outputs
  – Frank Vernon mentioned that in many cases it is not the data values per se but the change that is of importance

• People as part of the Network
  – Doug Goodin: “people are part of the technological system”
  – Museums are sensors, they are observatories, but the latency of bringing the data into analysis engines is not measured in milliseconds but in field seasons, or in decades to get formal publication of new scientific concepts. Many specimens and data are centuries old.

Page 24: Biodiversity Data Retrieval and Integration Distributed species, data, computation and credit

Acknowledgements

• University of Kansas
  – Dave Vieglais, Ricardo Pereira, Aimee Stewart, Greg Vorontsov, Town Peterson, BRC

• SDSC
  – David Stockwell, Environmental Computing

• University of Massachusetts-Boston
  – Bob Morris, CS; Rob Stevenson, Biology

• UC Berkeley
  – John Wieczorek, Museum of Vertebrate Zoology
  – Dan Werthimer, Space Sciences Laboratory

• Agriculture Canada
  – Derek Munro, ITIS Canada Office

• California Academy of Sciences
  – Stan Blum, Informatics