EC US Workshop on Plant Bioinformatics and Databases Sponsored by EC US Taskforce on Plant Biotechnology Research Jane Silverthorne and Annette Schneegans Workshop Co-Chairs Klaus Mayer (MIPs) and Doreen Ware (USDA-ARS)



Workshop

• Objective: Identify recommendations for priority areas of collaboration between the EC and US in the area of Plant Bioinformatics and Databases

• Wellcome Trust Center, Hinxton, December 6-7, 2009

• 41 participants: scientists and observers from the EC and US

• Presentations: ELIXIR (Mark Foster), iPlant Initiative (Steve Goff), Animal Bioinformatics Workshop Nov. 2009 (Jeff Silverstein), Next Generation Needs (Ewan Birney)

• Sessions/breakout groups:
– Cyberinfrastructure (Dan Stanzione, Paul Kersey)
– Sequencing and Databases (Catherine Feuillet, Richard W. McCombie)
– Data Integration (Heiko Schoof and Chris Town)
– Data Analysis (Volker Brendel, Michelle Clamp)
– Phenotyping and Ontology (Pankaj Jaiswal, Chris Schön)
– Data Education (Ottoline Leyser, Anne-Francoise Lamblin)

• Whitepaper:
– Session chairs and workshop organizers; preliminary draft in progress


Vision

• A unified platform for plant genome biology

• A cyberinfrastructure (CI) that would facilitate any analysis, at any scale, on any published data, by users at any level of expertise


Requirements to meet the “Vision”

• Primary and derived data need to be annotated with standard metadata describing provenance, experimental design and analysis procedures.

• Knowledgebases need to store and present these data and ensure standards compliance, comparability and integration into an interoperability framework.

• Integration happens through user interfaces that aggregate data from the knowledgebases and allow cross-database queries, and through computational tools that cross-associate and correlate heterogeneous data.


Recommendations

• Training that uses complementary approaches addressing the need for timeliness, efficiency, alternative methods of education and human networking

– In the life sciences, adoption and integration require a user community that is educated in bioinformatics concepts, methods and tools, and equipped with skills in computational and quantitative analytical approaches from computer science, statistics and mathematics.

• Standards for interoperability that will move plant informatics from its current state to one in which the vision becomes technically feasible

– Successful adoption of standards and best practices reduces duplication of effort, so that effort can concentrate on innovative contributions, and lowers the barrier to entry for new participants testing new approaches.

• Genome stewardship: genome sequencing and annotation are evolving processes

– Data sets have a lifecycle: even after a draft reference genome has been produced, the reference sequence and annotation need further improvement to add value.


Education/Training

• On-demand, web-based training material relevant to modern biology
– E-learning resources offer “anytime and anywhere” on-demand education and allow broader outreach

• Creation of opportunities for short residential courses and exchange programs (e.g. http://www.ebi.ac.uk/training/MarieCurie/)
– Both research and infrastructure
– Opportunities to develop and nurture these collaborations and partnerships will stimulate the required interdisciplinary interactions


Standards Development

• Standards development is often best achieved as a collaborative effort between knowledgebases, data generators and developers of cyberinfrastructure.

• Together these groups can shape standards more effectively, and active development of these standards is a necessary component of the project.

• Development will require resource commitments in the form of coordination workshops, development of tools to facilitate annotation and data deposition, and curation.


Existing practices hinder emergence of de facto standards

• Perception that funding practices and professional recognition often encourage the development of new tools and resources rather than the reuse or improvement of existing ones

• Change is needed, together with support mechanisms to establish metrics for evaluating the impact of tools, databases or datasets that go beyond web statistics and literature citations


Examples of types of standards

• Periodic assessment of important tools and datasets, similar to CASP, will be important for monitoring the quality of datasets and selecting the best tools for analysis and integration of data. This effort will be essential for ensuring best practice and quality for reference data.

• Plant-specific ontologies: coordinated efforts on controlled vocabularies for data collection and submission to databases. For phenotypes relevant in a breeding context, data validation needs to be undertaken by linking data from surrogate systems with data from multi-environment field trials

• Community-based curation and curation of legacy data


Plant Genome Sequence

• Challenges associated with genome sequencing and assembly seem to be well handled, but standards for sequence assembly need to be established

• New standards to describe the range of genome sequence models/assemblies that can now be produced

• Mechanisms to represent the quality of genome sequences and their corresponding assembly models

• Joint efforts and platforms need to be developed to ensure stewardship of the sequences


Knowledgebases

• A single everything-integrated warehouse is not seen as practical. Instead, domain-specific, sustainable, community endorsed knowledgebases embedded in an interoperable network will be more flexible and scalable

• They must be more than catch-all repositories that store and distribute datasets: working with the data generators, they must actively define (meta)data standards for integration, carry out integration for the data they handle, and make integration-ready datasets available through standard interfaces

• Reference for training


Genome Stewardship

• Cost-effective and scientifically correct
– Ad hoc integration efforts can suffer from restricted or overlapping scopes and, catastrophically, can disappear in the absence of institutional commitment to their maintenance

• Community expertise will be captured
– Redundancy of stewardship and software development efforts will be reduced; different communities will be able to learn each other’s lessons and contribute to the development of a unified system.
– The task of converting archived data into reference information for use by the community is partly computational but also biological.

• Necessary to identify reference genomes and associated data
– A clade-centric approach reduces the administrative and technical burden, unlocks the potential of evolutionary and comparative approaches, and allows the analysis and exploitation of genomes with small research communities that could not justify independent resources of their own
– Priority clades determined by scientific and economic priorities

• Essential that appropriate repositories are developed for each type of data


Thanks and Questions

• Klaus Mayer

• Session Chairs: Dan Stanzione, Paul Kersey, Catherine Feuillet, Richard W. McCombie, Heiko Schoof and Chris Town, Volker Brendel, Michelle Clamp, Ottoline Leyser, Anne-Francoise Lamblin

• Jane Silverthorne and Annette Schneegans

• All the other participants

• Funding: NSF Conference Grant and EC