Big Data Initiatives for Agroecosystems
Cynthia Parr
Knowledge Services Division
National Agricultural Library
Ecological Society of America, 2015
Outline
• Data management at the National Agricultural Library
• Four examples
  1. Insects 5K – i5K Workspace
  2. Life Cycle Assessment
  3. Long-Term Agroecosystem Research
  4. Ag Data Commons
• General principles
8.1 million items, Agricola, PubAg
http://blog.thingarage.com/
raw data
citable publication
raw data collection → cleaning, enrichment, analysis → registration, preservation
temporary data → referable data → citable data → citable publication
Modified from Peter Wittenberg, Research Data Alliance, https://rd-alliance.org/group/data-fabric-ig.html
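The lifecycle in this diagram can be read as a series of state changes, where each processing step moves data one stage closer to being citable. A minimal Python sketch, assuming a strictly linear progression; the `DataState` enum, the `STEP_RESULT` table, and the `advance` helper are hypothetical illustrations, not part of the Research Data Alliance data fabric model.

```python
from enum import Enum


class DataState(Enum):
    """Hypothetical states, in the diagram's left-to-right order."""
    TEMPORARY = 1   # produced by raw data collection
    REFERABLE = 2   # after cleaning, enrichment, analysis
    CITABLE = 3     # after registration and preservation


# Hypothetical mapping from each processing step to the state it produces.
STEP_RESULT = {
    "cleaning, enrichment, analysis": DataState.REFERABLE,
    "registration, preservation": DataState.CITABLE,
}


def advance(state: DataState, step: str) -> DataState:
    """Apply one processing step, refusing to skip or reverse stages."""
    target = STEP_RESULT[step]
    if target.value != state.value + 1:
        raise ValueError(f"cannot go from {state.name} to {target.name} via {step!r}")
    return target


state = DataState.TEMPORARY
for step in ["cleaning, enrichment, analysis", "registration, preservation"]:
    state = advance(state, step)
print(state.name)  # CITABLE
```

Modeling the stages as an ordered enum makes it easy to reject shortcuts, such as registering data that has never been cleaned.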
i5k.nal.usda.gov
Genome project hosting at the i5k Workspace
• 27 pilot genomes hosted; 45 total
  – Storage and dissemination of a genome assembly and anything mapped to it
  – BLAST, JBrowse Genome Browser
• Manual Curation: Web Apollo
• Post-curation maintenance
  – Quality Control
  – Official Gene Set generation
Genome Project Trajectory
• Research plan
• Generate material
• Sequencing
• Assembly
• Automated annotation
• Manual Curation
• Official gene set generation
• Genome project maintenance
• Biological insights/Publication
Unformatted, non-standard
LCA Commons Concept
LCA Community
Open LCA Framework: common computing environment, application, data standards, and development
NAL LCADC
NREL USLCI
XYZ LCI DB
ABC LCI DB
Distributed computing environment & application
Common data standards
Distributed computing environment
DEF LCI DB
Common application & data standards
Interoperability Tools
Ag Data Commons
Catalog and Repository
Long Term Agro-ecosystem Research (LTAR)
LTAR Data
Common Observatory
  – Meteorology
  – Hydrology
  – Eddy flux CO2
  – Non-CO2 gases
  – Soil
  – Biological
Common Experiment Approach
  – Business as usual
  – Aspirational
Will include data about:
  – Management practices
  – Results
LTAR Data Loss: N=194 of ~500 citations in 2011 LTAR site proposals
Bad links to data
No data available
80% of papers provide no way to obtain data
Data are accessible
Refers to general data source
LTAR information management
• Support for download of files, web services
• Metadata in FGDC CSDGM, ISO 19115, EML, Project Open Data
• Catalog of instrument specs using SensorML 2
• Data dictionaries in ISO 19110
• Weather data to be converted to other formats
• Field names could be converted to match different conventions (AgMIP, etc.)
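The last bullet describes renaming fields so the same dataset can be delivered under another community's conventions. A minimal Python sketch, assuming a simple per-convention lookup table. The convention key `example_convention`, the source field names, and the short target names are all hypothetical and illustrative. They are not taken from LTAR or AgMIP documentation.

```python
from typing import Any, Dict, List

# Hypothetical field-name mappings, keyed by target convention.
# Names are illustrative only; consult each convention's own data dictionary.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "example_convention": {
        "air_temp_max_c": "TMAX",
        "air_temp_min_c": "TMIN",
        "precip_mm": "RAIN",
    },
}


def rename_fields(rows: List[Dict[str, Any]], convention: str) -> List[Dict[str, Any]]:
    """Return new rows with keys renamed; keys without a mapping are kept as-is."""
    mapping = FIELD_MAPS[convention]
    return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]


# Illustrative daily weather record (made-up values).
daily = [{"date": 20150809, "air_temp_max_c": 31.2, "air_temp_min_c": 17.5, "precip_mm": 0.0}]
print(rename_fields(daily, "example_convention"))
```

Because the function builds new dictionaries instead of editing rows in place, the original field names stay available for other conventions.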
Ag Data Commons
data.nal.usda.gov
Enhanced DKAN
Distributed repositories
AG DATA COMMONS
Search & Knowledge Discovery
Thesaurus & Indexing
Ag Data Commons Repository
Organization & Curation
Grant management systems
INGESTION
DISSEMINATION
PubAg
Dataset Submission
Analytics & Tools
Data.gov
Forest Service
NCBI
Ag Data Commons Catalog
Color legend: Building; Adapt/Re-use; Existing
LCA Commons
Guiding principle 1: a distributed network of networks
Geospatial Catalog
Geospatial Repository
STEWARDS
Ag Data Commons (catalog)
Ag Data Commons (repository)
USDA Enterprise Inventory
National Weather Service
Data.gov
Ecosystems.data.gov
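Guiding principle 1 describes a distributed network of catalogs and repositories. One way to picture this is a catalog that harvests records from several source catalogs and merges them into one view. A minimal Python sketch, assuming each source supplies records with Project Open Data-style `identifier` and `modified` fields (ISO 8601 dates). The source names and records are hypothetical. Keeping the most recently modified copy is also an assumed policy, not a documented Ag Data Commons rule.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def merge_catalogs(sources: Dict[str, Iterable[Record]]) -> List[Record]:
    """Merge records from several catalogs, keeping one copy per identifier.

    When the same identifier appears in more than one source, the copy with the
    latest 'modified' date wins (YYYY-MM-DD strings compare correctly as text).
    """
    merged: Dict[str, Record] = {}
    for source_name, records in sources.items():
        for record in records:
            # Remember which catalog each record was harvested from.
            tagged = {**record, "harvested_from": source_name}
            current = merged.get(record["identifier"])
            if current is None or record["modified"] > current["modified"]:
                merged[record["identifier"]] = tagged
    return sorted(merged.values(), key=lambda r: r["identifier"])


# Hypothetical harvest results from two source catalogs.
sources = {
    "catalog_a": [{"identifier": "ds-1", "title": "Soil moisture", "modified": "2015-03-01"}],
    "catalog_b": [
        {"identifier": "ds-1", "title": "Soil moisture (v2)", "modified": "2015-06-15"},
        {"identifier": "ds-2", "title": "Eddy flux CO2", "modified": "2015-01-20"},
    ],
}
for rec in merge_catalogs(sources):
    print(rec["identifier"], rec["title"], rec["harvested_from"])
```

Keeping only the newest copy is one possible design. A real aggregator might instead keep links back to every source that holds the record.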
Public access to open, machine-readable data enables larger-scale, integrative and innovative data science
The long tail
Guiding principle 2: big data AND the long tail
Guiding principle 3: curation adds value
• Data dictionaries
• Standards & templates
• Linkages
• Semantics
• Preservation
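To show what the first item, data dictionaries, adds, here is a minimal Python sketch that uses a data dictionary to check incoming records. The field names, units, and ranges are hypothetical. Real dictionaries, such as the ISO 19110 data dictionaries listed for LTAR, record much more, including definitions and controlled vocabularies.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry: expected type, units, and optional valid range."""
    dtype: type
    units: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Hypothetical data dictionary for a small soil dataset.
DICTIONARY: Dict[str, FieldSpec] = {
    "site_id": FieldSpec(str, "none"),
    "depth_cm": FieldSpec(float, "cm", minimum=0.0),
    "soil_ph": FieldSpec(float, "pH units", minimum=0.0, maximum=14.0),
}


def validate(record: Dict[str, Any]) -> List[str]:
    """Return the problems found in one record (an empty list if it conforms)."""
    problems = []
    for name, spec in DICTIONARY.items():
        if name not in record:
            problems.append(f"missing field {name!r}")
            continue
        value = record[name]
        if not isinstance(value, spec.dtype):
            problems.append(f"{name!r} should be {spec.dtype.__name__}")
            continue
        if spec.minimum is not None and value < spec.minimum:
            problems.append(f"{name!r} below minimum {spec.minimum} {spec.units}")
        if spec.maximum is not None and value > spec.maximum:
            problems.append(f"{name!r} above maximum {spec.maximum} {spec.units}")
    # Fields absent from the dictionary are undocumented and worth flagging.
    for name in record:
        if name not in DICTIONARY:
            problems.append(f"undocumented field {name!r}")
    return problems


print(validate({"site_id": "A1", "depth_cm": 15.0, "soil_ph": 6.8}))  # []
print(validate({"site_id": "A1", "depth_cm": -5.0, "soil_ph": 6.8, "notes": "x"}))
```

The type checks here are strict. For example, an integer `15` in a `float` field would be flagged, and a curator may want to relax that.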
Thanks!
National Agricultural Library, Knowledge Services Division: Susan McCarthy
LTAR: Jeffrey Campbell, Charles Lockwood
i5K: Monica Poelchau, Chris Childers
LCA Commons: Peter Arbuckle, Ezra Kahn
Ag Data Commons: Ursula Pieper, Jocelyn McNamara, Qing Qu, Erin Antognoli, Melissa Lowrey, Jaylen Nathwani, NuCivic
… and collaborators and testers