Building Integrated Data Streams for Large-Scale Paleoclimatology & Biogeography CDSCO Neotoma DB www.neotomadb.org Neotoma DB www.neotomadb.org C4P Jack Williams Simon Goring UW-Madison


Page 1

Building Integrated Data Streams for Large-Scale Paleoclimatology & Biogeography

CDSCO

Neotoma DB

www.neotomadb.org

C4P

Jack Williams, Simon Goring

UW-Madison

Page 2

Many Big Questions require assembly of individual records into larger networks

Do global temperatures lead or lag CO2 during deglaciations?

[Figure: Global temperatures & CO2, 22 ka to 0 ka. Shakun et al. (2012) Nature]

How far and fast can species migrate when climates change?

[Figure: Spruce distributions, last glacial maximum to present. Map panels labeled 21,000; 15,000; 11,000; 7,000; Modern. Legend: spruce pollen (%), ice, no data. Williams et al. (2004) Ecological Monographs]

Page 3

Paleoecological Data: Key characteristics

• ‘Long Tail’: Collected in the field by small scientific teams. Workers vary in data management expertise, capacity, and interest

• Highly valuable – specimens & samples collected decades ago are still analyzed

• Scientific expertise distributed by proxy type, region, time period, and/or taxonomic group


Page 4

Community Data Repositories have emerged to tackle these bigger questions

Neotoma DB (www.neotomadb.org)

Key Characteristics

Open Data

Curated by Community

Standardized Taxonomy

Time: Age Controls and Age Models

Paleobiology DB (paleobiodb.org)

Page 5

PALEOBIOLOGICAL DATA

CONSORTIUM

COMMUNITY

GEODATA

OPEN-SOURCE

BIODATA

Paleobiology DB

NOW DB

Continental Scientific Drilling Office (CDSCO)

Digimorph

NOAA Paleoclimatology

DarwinCore

iDigPaleo

MorphoBank

Neotoma DB

VertNet

Early Career Members-at-Large

ROpenSci

GBIF/BISON

STEPPE

Open Geospatial Consortium

Interdisciplinary Earth Data Alliance

iDigBio

C4P

• Share best practices & protocols

• Build compatibility between geo- & bioinformatics

Page 6

Neotoma Paleoecology Database: Design Concepts

• Spatiotemporal database: species occurrences & abundances in space and time

• Age controls and age models stored

• Centralized IT and Distributed Scientific Governance. Neotoma is composed of several constituent databases (e.g. North American Pollen Database, FAUNMAP)

• Open data accessible via Explorer, APIs, and the neotoma R package

• Broad user community: Paleoecologists, ecosystem modellers, paleoclimatologists, biogeographers, educators, …
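Storing age controls alongside the age model means sample ages can be recomputed whenever a control is revised. The sketch below illustrates that idea with the simplest possible model, linear interpolation between dated horizons. It is an illustration only, not Neotoma's schema or method: the `AgeControl` class, its field names, and `interpolate_age` are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeControl:
    """One dated horizon in a core (hypothetical structure, for illustration)."""
    depth_cm: float
    age_yr_bp: float  # age in years before present


def interpolate_age(controls: list[AgeControl], depth_cm: float) -> float:
    """Estimate the age at a depth by linear interpolation between controls.

    Raises ValueError outside the dated interval, because extrapolation
    needs assumptions that this sketch does not make.
    """
    ordered = sorted(controls, key=lambda c: c.depth_cm)
    if len(ordered) < 2:
        raise ValueError("need at least two age controls")
    depths = [c.depth_cm for c in ordered]
    if any(a >= b for a, b in zip(depths, depths[1:])):
        raise ValueError("age controls must have distinct depths")
    if not depths[0] <= depth_cm <= depths[-1]:
        raise ValueError("depth outside the range of age controls")
    # Index of the control at or just above the requested depth, capped so
    # that a control always remains below it to interpolate toward.
    i = min(bisect_right(depths, depth_cm) - 1, len(ordered) - 2)
    upper, lower = ordered[i], ordered[i + 1]
    fraction = (depth_cm - upper.depth_cm) / (lower.depth_cm - upper.depth_cm)
    return upper.age_yr_bp + fraction * (lower.age_yr_bp - upper.age_yr_bp)
```

Because the controls are kept as data rather than baked into the sample ages, swapping in a different model (for example a Bayesian age-depth model) would only require replacing the interpolation function.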

Page 7

Neotoma Domain

• Time: Late Neogene (~last 5 million years)

• Most records: 10^4 to 10^5 yrs

• Space: North American to Global

• Datasets:

  • Plants & pollen

  • Vertebrates

  • Ostracodes

  • Diatoms

  • Insects

  • Testate Amoebae

  • Physical Sedimentology

[Figure: Temporal Domains of Paleoecological Databases. Brewer et al. 2012 TREE]

Page 8

Neotoma as Boundary Organization

[Diagram: Neotoma DB at the center, connecting the groups below]

Data Users: Paleoecologists, Ecosystem Modelers, Biogeographers, Paleoclimatologists

Paleoecologists: Pollen, Vertebrates, Insects, Diatoms, Ostracodes, Amoebae, Packrat Middens

Informatics & Computer Scientists: IEDA, GeoWS, Open Core

Exchanged: Data, Best Practices, Shared Protocols, New Questions

Page 9

Paleodata Workflows: State of Field

1. Cores Collected

2. Cores Split, Sampled, Logged

3. Proxies Measured by PIs

4. Papers Written

5. Data & Metadata Assembled

6. Data Deposited (Journals, NOAA-Paleo, Neotoma, etc.)

7. Data Synthesized into Regional-Global Studies

8. Metadata gaps discovered

9. New Analyses

Consequences:

• Variably documented data

• Challenging project management

• Multiple inefficiencies, sources of data friction

• Synthetic research hard at anything beyond site scale

Page 10

Key Need: Integrated Data Workflows

1. Cores Collected, Tagged with IGSNs, Metadata Logged in Field

2. Cores Split, Sampled, Logged, Samples Tagged with IGSNs, Data Stored in Common Data Structures (Open Core Data)

3. Proxies Measured by PIs, Data Stored in Common Data Formats

4. Papers Written, Embargoed Data Passed to Community Data Repositories (e.g. Neotoma)

5. Data & Metadata Assembled

6. Paper Published, Embargo Lifted from Repository
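The embargo steps in this workflow can be sketched as a small state change on a deposited dataset: data enter the repository before publication, tied to their samples by IGSN, and become public once the paper appears. This is a toy model under stated assumptions, not how Neotoma or any real repository implements embargoes. The `Dataset` and `Repository` classes and all of their fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Proxy measurements tied to physical samples by their IGSNs."""
    name: str
    sample_igsns: list[str]
    embargoed: bool = True          # deposited before publication (step 4)
    citation: Optional[str] = None  # filled in when the paper appears (step 6)


@dataclass
class Repository:
    """Stand-in for a community data repository (hypothetical design)."""
    datasets: list[Dataset] = field(default_factory=list)

    def deposit(self, dataset: Dataset) -> None:
        """Accept a dataset only if it references at least one sample."""
        if not dataset.sample_igsns:
            raise ValueError("dataset must reference at least one IGSN")
        self.datasets.append(dataset)

    def publish(self, name: str, citation: str) -> None:
        """Record the publication and lift the embargo."""
        for dataset in self.datasets:
            if dataset.name == name:
                dataset.citation = citation
                dataset.embargoed = False
                return
        raise KeyError(name)

    def public_datasets(self) -> list[Dataset]:
        """Return only the datasets whose embargo has been lifted."""
        return [d for d in self.datasets if not d.embargoed]
```

Requiring sample identifiers at deposit time is the point of the design: the metadata gaps that the current workflow discovers late, during synthesis, would instead be caught when the data first enter the repository.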

Page 11

Current & Future Neotoma Activities

1. Data Uploads

2. Partnership with LacCore/CDSCO et al. to establish common standards & linked data flows

3. neotoma R – establishing data models, integration with R packages

4. API development, user-driven

5. New tools for data visualization & exploration


Page 12

This talk represents the work of many

Neotoma PIs & Developers: Eric C. Grimm, Russ Graham, Mike Anderson, Allan Ashworth, Brian Bills, Jessica Blois, Bob Booth, Ed Davis, Don Charles, Simon Goring, Steve Jackson, Alison Smith, Jack Williams

C4P Steering Committee: Kerstin Lehnert, David Anderson, Doug Fils, Leslie Hsu, Chris Jenkins, Anders Noren, Tom Olszewski, Dena Smith, Mark Uhen, Jack Williams

Neotoma DB

NSF-Geoinformatics

NSF-EarthCube

Eric Grimm

C4P