The AgBioData Consortium: Genetic, Genomic and Breeding Databases Working Together

Lisa Harper, Jacqueline Campbell, Ethy Cannon, Sook Jung, Dorrie Main, Monica Poelchau, Ramona Walls and AgBioData Members

NSF Midwest Big Data Hub - Digital Agriculture All Hands Meeting, Sept 21, 2018, Lincoln NE

The AgBioData Consortium - digital.ag.iastate.edu · Overview Introduction to AgBioData Objective 1: Agreeing on Standards and Best Practices Objective 2: Working Towards compliance




Overview

Introduction to AgBioData

Objective 1: Agreeing on Standards and Best Practices

Objective 2: Working Towards Compliance

Thoughts on what makes a successful collaboration


Presenter
Presentation Notes
Data ‘lifecycle’ within a database Database roles/tasks within the data life cycle

Data Through the Ages

Proper data management is critical for scientific discovery and reproducibility. This has always been true.

Databases store these data and decide on the best methods for data storage, description, access and sharing. This has been true since the late 1980s.

The NEW thing: Database curation has become increasingly important as data volume and complexity increase. This has been sneaking up on us curators for the past 20 years.


Data vs. Curation in Ag Biology

[Figure: Data Volume and Complexity versus Curator Time, 1990s to Today]


Sometimes I feel like this…

[Cartoon] "Gotta move onto that NEW project!" / "Wait, someone else can USE this Data!!"


Data LifeCycle

[Figure: data life cycle with the stages Idea, Funding, Experiments, Analysis, Publication, Reuse]

Comply with the FAIR data principles, making data Findable, Accessible, Interoperable and Re-usable.

https://www.force11.org/group/fairgroup/fairprinciples


AgBioData: Background

Consistent handling of genomic, genetic and breeding (GGB) data in agriculture is still challenging.

Many database curators do similar things, yet don't communicate and share.

The AgBioData consortium was founded in 2015 in an effort to "leverage" our work.

We aim to improve handling of data in GGB databases via collaboration, communication, and the development of recommendations and standards specific to GGB data (FAIR).

Facilitate enhanced basic, translational and applied research outcomes.


The AgBioData consortium

Comprised of >100 members from >30 databases

https://www.agbiodata.org/databases

AgBase
AgroPortal
Animal QTLdb
AraPort
CassavaBase
Citrus Genome Database
Cool Season Food Legume Database
CottonGen
CyVerse
Genome Database for Rosaceae
Genome Database for Vaccinium
GrainGenes
Gramene
GRIN
Hardwood Genomics
i5K National Ag Library
Legume Information System
MaizeGDB
Maize Stock Center
MusaBase
National Animal Disease Center
PeanutBase
Planteome
Solanaceae Genomics Network
SoyBase
SweetPotatoBase
T3
TAIR
TreeGenes
YamBase

Presenter
Presentation Notes
Emphasize that the databases are diverse in maturity and scope – some well-established databases with model organism data are present (e.g. TAIR), and some newer resources with small research communities and ‘shallow’ data are also present (e.g. i5k workspace) – but all can benefit from standards and data sharing

Obj 1: Developing and Agreeing on Standards and Best Practices

Following meetings at PAG and via Zoom, we divided into 6 Working Groups: Biocuration, Ontologies, Metadata and Persistence, Data Sharing, Database Platforms and Communications (2015-2017).

Working Groups met independently every month via Zoom (2016-2017).

Monthly "All hands" virtual meetings (ongoing).


Obj 1: Developing and Agreeing on Standards and Best Practices

In 2017, NSF supported 45 AgBioData members to attend a workshop to develop a white paper describing challenges and recommendations for GGB databases.

This was a 2-day WORK ONLY meeting. No formal talks.

Followed by more virtual meetings.

White paper submitted in March 2018, published Sept 2018.


AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture

Each Section contains:

Overview of the issues

Challenges and Opportunities

Recommendations

https://doi.org/10.1093/database/bay088


BioCuration

Regular communication between biocurators

Work with researchers, publishers and funders to promote more data love and self-curation

Report errors and omissions of data to authors, editors, databases and publishers


Ontologies

Use ontologies in data curation to allow cross-query and integration across platforms.

Make methods transparent to researchers when computational annotation is done instead of manual curation.

Facilitate ontology annotation by data generators.


Metadata and Persistence

Use what already exists if possible; collaborate with existing groups, esp. when creating new metadata standards.

Improve database compliance.

Encourage and enforce use of persistent identifiers for data sets.

Explore tools and processes that can ease the burden of metadata collection for both data curators and researchers.


Programmatic Access to Data

Move to a federated model of data exchange.

Minimize authentication hurdles to data access.

Store data in a manner that is consistent with community standards and adopt appropriate ontologies.

Select one or more database schemas and APIs that are well supported by the development community.

Provide explicit licensing information that specifies download and reuse policies.


Database Platforms

Choose from available platforms and use open-source tools if possible.

Plan for web services.

Make your database discoverable by indexing and providing a search engine.

Make your data connected using best practices for exposing, sharing and using Uniform Resource Identifiers (URI) and Resource Description Framework (RDF).
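As a rough illustration of the URI and RDF recommendation, the sketch below turns one dataset record into N-Triples lines using only the Python standard library. Everything specific in it is an assumption made for the example: the `https://example.org/ggb/` base URI, the record id, and the choice of Dublin Core terms as predicates are placeholders, not identifiers or vocabularies prescribed by AgBioData.

```python
# Hypothetical example: BASE, the record id, and the field values are placeholders,
# not identifiers used by any AgBioData member database.
BASE = "https://example.org/ggb/"
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def escape_literal(value: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(record_id: str, fields: dict) -> list:
    """Return one N-Triples line per field, sorted by field name."""
    subject = f"<{BASE}{record_id}>"
    lines = []
    for term, value in sorted(fields.items()):
        predicate = f"<{DCTERMS}{term}>"
        # Values that look like web addresses become URI objects; the rest are literals.
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            obj = f'"{escape_literal(value)}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


if __name__ == "__main__":
    record = {
        "title": "Example genotype set",
        "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    }
    for line in to_ntriples("dataset/1", record):
        print(line)
```

Giving every record a resolvable URI as its subject is what lets a second database link to it rather than copy it; a production system would more likely use an established RDF library and community ontologies than hand-built strings.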


Communication (People)

Communication among database folks
- Join AgBioData
- Attend monthly conference calls

Communication with researchers
- Provide tutorials and outreach
- Communicate on data management

Communication with funding agencies and journals
- Engage in joint outreach activities
- Collaborate on data management guidelines and enforcement


Read the Paper in Database (Oxford)

2018


Current Projects

Monthly Seminars/Discussions (via Zoom) on database issues.

Genome Nomenclature

1. How to name and version genome assemblies.

2. How to name and version the "official" annotation sets for each assembly.

3. How to name and version second party annotations.

4. How to name each gene model within a set, while trying to keep the assembly and version obvious.

Etc.
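One way to picture the nomenclature questions above is a scheme in which each name embeds the one before it, so a gene model id always reveals its assembly and annotation version. The sketch below is purely illustrative: the dot separator, the field order, the `gnm`/`ann` prefixes, and the sample values are assumptions made for this example, not a convention proposed in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModelName:
    """Hypothetical layered name: species.accession.gnmN.annM.gene"""
    species: str     # short species code (made up for the example)
    accession: str   # line or cultivar the assembly came from
    assembly: int    # assembly version
    annotation: int  # annotation-set version for that assembly
    gene: str        # gene identifier within the annotation set

    def assembly_name(self) -> str:
        return f"{self.species}.{self.accession}.gnm{self.assembly}"

    def annotation_name(self) -> str:
        # The annotation name extends the assembly name it was built on.
        return f"{self.assembly_name()}.ann{self.annotation}"

    def gene_model_id(self) -> str:
        # The gene model id extends the annotation name, keeping both versions visible.
        return f"{self.annotation_name()}.{self.gene}"


def parse_gene_model_id(identifier: str) -> GeneModelName:
    """Split an id produced by gene_model_id() back into its parts."""
    species, accession, gnm, ann, gene = identifier.split(".", 4)
    if not (gnm.startswith("gnm") and ann.startswith("ann")):
        raise ValueError(f"not a recognised gene model id: {identifier!r}")
    return GeneModelName(species, accession, int(gnm[3:]), int(ann[3:]), gene)


if __name__ == "__main__":
    name = GeneModelName("exsp", "LineA", 2, 1, "G000123")
    print(name.assembly_name())
    print(name.annotation_name())
    print(name.gene_model_id())
```

Because the id round-trips through `parse_gene_model_id`, tools can recover the assembly and annotation versions from the id alone. Second party annotations (question 3) are not covered here and would need an extra field for the annotating group.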


Now the Hard Part


Obj 2: Working Towards Compliance

Funding for AgBioData activities: currently applying for a NIFA FACT Coordinated Innovation Network grant, and considering more.

Establish priorities and means of compliance.

Requires a "mind set" shift.

Eventually have personnel dedicated to interoperability, consistency and enforcement of data standards.


Role of Databases Currently

[Cartoon] "Gotta move onto that NEW project!" / "Wait, someone else can USE this Data!!"


Future

[Cartoon] "Here is my Data! With Metadata! Correct format!" / "Thank You!! I will take GOOD Care of it!" / "So easy to get the Data!"


Thoughts on Large Collaborations

Recruit Team Players

Value everyone, reward non-PIs, foster a sense of mutual trust and respect

Provide means for good communication for all

Work with an Amazing Steering Committee

Promote Open and Free exchange of ideas

GO SLOW and steady. Don't overwork people

Go gently, but Bully with Enthusiasm when needed


Requests (please)

Acknowledge/cite the databases you use in your research

Provide your data (FAIR)

Be advocates for your database

Acknowledgements

NSF PGRP

USDA ARS

USDA NIFA

Land Grant Universities

AgBioData community of researchers

THANK YOU!!
