The AgBioData Consortium
Genetic, Genomic and Breeding Databases Working Together
Lisa Harper, Jacqueline Campbell, Ethy Cannon, Sook Jung, Dorrie Main, Monica Poelchau, Ramona Walls and AgBioData Members
NSF Midwest Big Data Hub - Digital Agriculture
All Hands Meeting
Sept 21, 2018, Lincoln NE
Overview
Introduction to AgBioData
Objective 1: Agreeing on Standards and Best Practices
Objective 2: Working Towards Compliance
Thoughts on what makes a successful collaboration
Data Through the Ages
Proper data management is critical for scientific discovery and reproducibility. This has always been true.
Databases store these data and decide on the best methods for data storage, description, access and sharing. This has been true since the late 1980s.
The NEW thing: database curation has become increasingly important as data volume and complexity increase. This has been sneaking up on us curators for the past 20 years.
Data vs. Curation in Ag Biology
[Figure: Data Volume and Complexity and Curator Time, from the 1990s to today]
Sometimes I feel like this…
“Gotta move on to that NEW project!”
“Wait, someone else can USE this Data!!”
[Figure: Data Life Cycle, with stages labeled Idea, Funding, Experiments, Analysis, Publication, Reuse]
Comply with the FAIR data principles: making data Findable, Accessible, Interoperable and Re-usable.
https://www.force11.org/group/fairgroup/fairprinciples
AgBioData: Background
Consistent handling of genomic, genetic and breeding (GGB) data in agriculture is still challenging.
Many database curators do similar things, yet don’t communicate and share.
The AgBioData consortium was founded in 2015 in an effort to “leverage” our work.
We aim to improve handling of data in GGB databases via collaboration, communication, and the development of recommendations and standards specific to GGB data (FAIR).
Facilitate enhanced basic, translational and applied research outcomes.
The AgBioData consortium
Composed of >100 members from >30 databases
https://www.agbiodata.org/databases
AgBase
AgroPortal
Animal QTLdb
AraPort
CassavaBase
Citrus Genome Database
Cool Season Food Legume Database
CottonGen
CyVerse
Genome Database for Rosaceae
Genome Database for Vaccinium
GrainGenes
Gramene
GRIN
Hardwood Genomics
i5K National Ag Library
Legume Information System
MaizeGDB
Maize Stock Center
MusaBase
National Animal Disease Center
PeanutBase
Planteome
Solanaceae Genomics Network
SoyBase
SweetPotatoBase
T3
TAIR
TreeGenes
YamBase
Obj 1: Developing and Agreeing on Standards and Best Practices
Following meetings at PAG and via Zoom, we divided into 6 Working Groups: Biocuration, Ontologies, Metadata and Persistence, Data Sharing, Database Platforms and Communications (2015-2017).
Working Groups met independently every month via Zoom (2016-2017).
Monthly “All hands” virtual meetings (ongoing).
Obj 1: Developing and Agreeing on Standards and Best Practices
In 2017, NSF supported 45 AgBioData members to attend a workshop to develop a white paper describing challenges and recommendations for GGB databases.
This was a 2-day WORK ONLY meeting. No formal talks.
Followed by more virtual meetings.
White paper submitted in March 2018, published Sept 2018.
AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture
Each section contains:
Overview of the issues
Challenges and Opportunities
Recommendations
https://doi.org/10.1093/database/bay088
BioCuration
Regular communication between biocurators
Work with researchers, publishers and funders to promote more data love, and self curation
Report errors and omissions of data to authors, editors, databases and publishers
Ontologies
Use ontologies in data curation to allow cross-query and integration across platforms.
Make methods transparent to researchers when computational annotation is done instead of manual curation.
Facilitate ontology annotation by data generators.
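As a minimal sketch of why shared ontology terms allow cross-query, the Python below joins annotation records from two hypothetical databases on a common term identifier. The database contents, gene identifiers and record layout are illustrative assumptions, not AgBioData specifications. The only real-world convention used is the Gene Ontology identifier shape (`GO:` followed by seven digits).

```python
import re
from collections import defaultdict

# Gene Ontology identifiers look like "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")

# Hypothetical annotation records from two databases (illustrative only).
# Each record is (gene identifier, annotation).
DB_A = [("geneA1", "GO:0009414"), ("geneA2", "GO:0006950")]
DB_B = [("geneB7", "GO:0009414"), ("geneB9", "drought response")]


def index_by_term(records):
    """Group gene IDs by ontology term, skipping free-text annotations."""
    index = defaultdict(list)
    for gene, term in records:
        if GO_ID.match(term):
            index[term].append(gene)
    return index


def cross_query(term, *databases):
    """Return every gene annotated with `term` across all given databases."""
    hits = []
    for records in databases:
        hits.extend(index_by_term(records).get(term, []))
    return hits


if __name__ == "__main__":
    print(cross_query("GO:0009414", DB_A, DB_B))
```

In this sketch the free-text annotation "drought response" is never returned by a term query, which is the practical argument for annotating with term identifiers rather than free text.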
Metadata and Persistence
Use what already exists if possible; collaborate with existing groups, especially when creating new metadata standards.
Improve database compliance.
Encourage and enforce use of persistent identifiers for data sets.
Explore tools and processes that can ease the burden of metadata collection for both data curators and researchers.
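One way tooling could ease metadata collection is an automated check at submission time. The sketch below is an assumption-laden illustration, not an AgBioData standard: the required field names are hypothetical, and the persistent identifier is assumed to be a DOI, matched with a commonly used DOI pattern.

```python
import re

# Common shape of a DOI: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical minimum metadata fields; a real standard would define its own.
REQUIRED_FIELDS = ("title", "creator", "license", "identifier")


def metadata_problems(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not record.get(name)]
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append(f"identifier is not a DOI: {identifier}")
    return problems


if __name__ == "__main__":
    # Illustrative record: no license, and a local ID instead of a DOI.
    submitted = {
        "title": "Example dataset",
        "creator": "Example Lab",
        "identifier": "local-id-17",
    }
    for problem in metadata_problems(submitted):
        print(problem)
```

Returning a list of problems, rather than raising on the first one, lets a submitter fix everything in one pass.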
Programmatic Access to Data
Move to a federated model of data exchange.
Minimize authentication hurdles to data access.
Store data in a manner that is consistent with community standards and adopt appropriate ontologies.
Select one or more database schemas and APIs that are well supported by the development community.
Provide explicit licensing information that specifies download and reuse policies.
Database Platforms
Choose from available platforms and use open-source tools if possible.
Plan for web services.
Make your database discoverable by indexing and providing a search engine.
Make your data connected using best practices for exposing, sharing and using Uniform Resource Identifiers (URI) and Resource Description Framework (RDF).
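To make the URI and RDF recommendation concrete, here is a minimal sketch that describes one dataset as RDF in N-Triples syntax using only the Python standard library. The dataset URI under `example.org` and the function names are hypothetical. The predicates come from the Dublin Core Terms vocabulary (`http://purl.org/dc/terms/`), chosen here as an assumption; a given database may prefer other vocabularies.

```python
DCTERMS = "http://purl.org/dc/terms/"


def literal(value):
    """Format a plain string as an N-Triples literal with basic escaping."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def dataset_triples(uri, title, license_uri):
    """Return N-Triples lines describing one dataset identified by `uri`."""
    return [
        f"<{uri}> <{DCTERMS}title> {literal(title)} .",
        f"<{uri}> <{DCTERMS}license> <{license_uri}> .",
    ]


if __name__ == "__main__":
    # Hypothetical dataset URI; the license URI is only an example choice.
    for line in dataset_triples(
        "https://example.org/dataset/42",
        "Example GGB dataset",
        "https://creativecommons.org/licenses/by/4.0/",
    ):
        print(line)
```

Because the dataset and its license are both identified by URIs, another database can link to either without knowing anything about the local schema.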
Communication (People)
Communication among database folks
- Join AgBioData
- Attend monthly conference calls
Communication with researchers
- Provide tutorials and outreach
- Communicate on data management
Communication with funding agencies and journals
- Engage in joint outreach activities
- Collaborate on data management guidelines and enforcement
Read the Paper in Database (Oxford), 2018
Current Projects
Monthly Seminars/Discussions (via Zoom) on database issues.
Genome Nomenclature
1. How to name and version genome assemblies.
2. How to name and version the “official” annotation sets for each assembly.
3. How to name and version second-party annotations.
4. How to name each gene model within a set, while trying to keep the assembly and version obvious.
Etc.
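The four nomenclature questions above can be sketched as a small naming scheme. Everything here is a hypothetical illustration, not a convention adopted by AgBioData: the `label.vN` assembly form, the `.aN` annotation suffix, the optional provider tag for second-party annotations, and the zero-padded gene number are all assumptions chosen so that the assembly and version stay visible in every gene model name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assembly:
    label: str    # organism/accession label (hypothetical format)
    version: int  # assembly version

    def name(self) -> str:
        return f"{self.label}.v{self.version}"


@dataclass(frozen=True)
class AnnotationSet:
    assembly: Assembly
    version: int                    # annotation version on this assembly
    provider: Optional[str] = None  # None marks the "official" set

    def name(self) -> str:
        source = f".{self.provider}" if self.provider else ""
        return f"{self.assembly.name()}{source}.a{self.version}"

    def gene_model(self, number: int) -> str:
        """Name a gene model so the assembly and versions stay visible."""
        return f"{self.name()}.g{number:06d}"


if __name__ == "__main__":
    asm = Assembly("Xx-Line1", 2)
    official = AnnotationSet(asm, 1)
    second_party = AnnotationSet(asm, 1, provider="labY")
    print(official.gene_model(123))      # Xx-Line1.v2.a1.g000123
    print(second_party.gene_model(123))  # Xx-Line1.v2.labY.a1.g000123
```

Deriving every gene model name from its annotation set and assembly means a name alone tells a reader which assembly version it refers to.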
Now the Hard Part
Obj 2: Working Towards Compliance
Funding for AgBioData activities
Currently applying for a NIFA FACT Coordinated Innovation Network grant, and considering more
Establish Priorities and means of compliance
Requires a “mind set” shift.
Eventually have personnel dedicated to interoperability, consistency and enforcement of data standards
Role of Databases
Currently
“Gotta move on to that NEW project!”
“Wait, someone else can USE this Data!!”
Future
“Here is my Data! With Metadata! Correct format!”
“Thank You!! I will take GOOD Care of it!”
“So easy to get the Data!”
Thoughts on Large Collaborations
Recruit Team Players
Value everyone, reward non-PIs, foster a sense of mutual trust and respect
Provide means for good communication for all
Work with an Amazing Steering Committee
Promote Open and Free exchange of ideas
GO SLOW and steady. Don’t overwork people
Go gently, but Bully with Enthusiasm when needed
Requests (please)
Acknowledge/cite the databases you use in your research
Provide your data (FAIR)
Be advocates for your database
Acknowledgements
NSF PGRP
USDA ARS
USDA NIFA
Land Grant Universities
AgBioData community of researchers
THANK YOU!!