

NICL Program: Bioinformatics Track

Washington, DC
June 17, 2014

Cathy H. Wu, Ph.D.
Edward G. Jefferson Chair of & Director

Center for Bioinformatics & Computational Biology
Program Coordinator, Delaware INBRE

University of Delaware

Bioinformatics Research Collaboration: NECC & BiND

5th Biennial National IDeA Symposium of Biomedical Research Excellence (NISBRE)

Skate Genome Project
Northeast Cyberinfrastructure Consortium (NECC)

Collaborative use of specialized resources & expertise in an integrated process

• Little Skate (Leucoraja erinacea) Clones: MDIBL-Mount Desert Island Biological Lab (ME)

• Next-Generation Sequencing: UD DNA Sequencing & Genotyping Center (DE)

• Sequence Assembly: Vermont Genetics Network (VT) with ME, RI

• Sequence Analysis & Annotation: Bioinformatics pipeline at UD CBCB (DE), ME, RI, NH, VT

• Storage & Access of Sequence/Annotation data: Shared data center (DE, VT, ME)

• Public Dissemination: NCBI (BioProject, SRA, GenBank), SkateBase (skatebase.org)

• Scientific publications: Science [PMC3264428], PNAS [PMC3150877], Database [PMC3308154]

[Poster sections: Collaboration, Scientific Resource, Education]

[Figure: NECC/NEBC member states: DE, ME, NH, RI, VT]

[Figure: Little Skate Project Workflow. Skate DNA (stage 32 embryo, ME); Sequencing (DE); Assembly (VT, ME); Annotation (DE, ME, VT, NH, RI); Data Storage (Regional Data Center); Public Access (DE, ME)]

The Northeast Cyberinfrastructure Consortium (NECC)

The NECC is a consortium of five IDeA states (Maine, Vermont, Delaware, Rhode Island & New Hampshire), established in 2006.

NECC Goals
1. Build Cyberinfrastructure in NE region
2. Workforce training and diversity
3. Collaborative Research

The Northeast Bioinformatics Consortium (NEBC) began the Little Skate Genome Project in 2010 as a model for distributed collaboration: a data-intensive project that requires integration of specialized resources and expertise.

http://www.necyberconsortium.org/

Current NEBC research efforts include characterization and comparison of embryonic transcriptomes from three chondrichthyan species: the little skate, Leucoraja erinacea; the small-spotted catshark, Scyliorhinus canicula; and the elephant shark, Callorhinchus milii.

Project Based Discovery

The project is a valuable resource for comparative biomedical research and evolutionary biology. Publications using little skate genome and transcriptome data are increasingly prevalent in the literature.

1. Boehm, T. & Swann, J. B. Origin and Evolution of Adaptive Immunity. Annu Rev Anim Biosci 2, 259–283 (2014).
2. Braasch, I. et al. Connectivity of vertebrate genomes: Paired-related homeobox (Prrx) genes in spotted gar, basal teleosts, and tetrapods. Comp Biochem Physiol C Toxicol Pharmacol 163, 24–36 (2014).
3. Falcón, J. et al. Drastic neofunctionalization associated with evolution of the timezyme AANAT 500 Mya. Proc Natl Acad Sci U S A 111, 314–9 (2014).
4. Modrell, M. S. et al. A fate-map for cranial sensory ganglia in the sea lamprey. Dev Biol 385, 405–16 (2014).
5. Moore, D. B. et al. Asynchronous evolutionary origins of Aβ and BACE1. Mol Biol Evol 31, 696–702 (2014).
6. Venkatesh, B. et al. Elephant shark genome provides unique insights into gnathostome evolution. Nature 505, 174–9 (2014).
7. Amemiya, C. T. et al. The African coelacanth genome provides insights into tetrapod evolution. Nature 496, 311–6 (2013).
8. Frankenberg, S. & Renfree, M. B. On the origin of POU5F1. BMC Biol 11, 56 (2013).
9. Gillis, J. A., Modrell, M. S. & Baker, C. V. H. Developmental evidence for serial homology of the vertebrate jaw and gill arch skeleton. Nat Commun 4, 1436 (2013).
10. Lopes-Marques, M., Cunha, I., Reis-Henriques, M. A., Santos, M. M. & Castro, L. F. C. Diversity and history of the long-chain acyl-CoA synthetase (Acsl) gene family in vertebrates. BMC Evol Biol 13, 271 (2013).
11. Richards, V. P., Suzuki, H., Stanhope, M. J. & Shivji, M. S. Characterization of the heart transcriptome of the white shark (Carcharodon carcharias). BMC Genomics 14, 697 (2013).
12. Gaudet, P. et al. Recent advances in biocuration: meeting report from the Fifth International Biocuration Conference. Database (Oxford) 2012, bas036 (2012).
13. Tossidou, I. et al. CD2AP regulates SUMOylation of CIN85 in podocytes. Mol Cell Biol 32, 1068–79 (2012).
14. Västermark, Å. et al. Identification of distant Agouti-like sequences and re-evaluation of the evolutionary history of the Agouti-related peptide (AgRP). PLoS One 7, e40982 (2012).
15. Wang, Q. et al. Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees. Database (Oxford) 2012, bar064 (2012).
16. King, B. L., Gillis, J. A., Carlisle, H. R. & Dahn, R. D. A natural deletion of the HoxC cluster in elasmobranch fishes. Science 334, 1517 (2011).
17. Schneider, I. et al. Appendage expression driven by the Hoxd Global Control Region is an ancient gnathostome feature. Proc Natl Acad Sci U S A 108, 12782–6 (2011).

NEBC Research

Three workshops and an annotation Jamboree were coordinated by the NECC collaboration. The workshops were designed to teach gene and protein annotation from a next-generation sequencing data perspective to participants with little or no experience. Instructors included regional NEBC experts and NIH and industry leaders. Lecture materials are linked from SkateBase and serve as a valuable educational resource available to the public.

Workshops

Curriculum

SkateBase includes the infrastructure to teach gene and protein annotation. SkateBase has been used by NECC IDeA state institutions in both graduate and undergraduate classes. Through active and domain-targeted outreach, use of SkateBase as an educational model has expanded outside the NECC institutions and now includes the Virginia Institute of Marine Science (VIMS). Genes annotated by students are reviewed by SkateBase curators before a gene page is created with structural and functional annotation, linked homology, and relevant PubMed references.

The American Elasmobranch Society (AES) is a non-profit organization that seeks to advance the scientific study of living and fossil sharks, skates, rays, and chimeras, and the promotion of education, conservation, and wise utilization of natural resources. SkateBase data and resources were presented to AES members at the Joint Meeting of Ichthyologists and Herpetologists in 2013 as well as the 2014 Plant and Animal Genome Conference. Researchers were introduced to the project and invited to use the resource for research and educational purposes. Continued expansion of SkateBase includes community annotation, which will benefit from the participation of domain experts. SkateBase is linked from the AES website as well as the Elephant Shark Genome Project website.

University of Maine at Machias
• Introduction to Biochemistry

University of Rhode Island
• Practical Tools for Molecular Sequence Analysis

University of Delaware
• Bioinformatics
• Experimental Molecular Biology

Georgetown
• Bioinformatics

Virginia Institute of Marine Science
• Molecular Genetic Data Analysis

Acknowledgements

Funding provided by a re-entry career award to JTW: NIGMS INBRE 3P20GM103446-12S1

Skate genome sequencing was funded by: NIH NCRR ARRA Supplements to 5 P20 RR016463-12 (MDIBL), 5 P20 RR016472-12 (UD), 5 P20 RR16462 (UVM).

The North East Cyberinfrastructure Consortium is funded by:
• NIH National Center for Research Resources grants: 5 P20 RR016463-12 (MDIBL), 5 P20 RR016472-12 (UD), 5 P20 RR16462 (UVM), 5 P20 RR016457-11 (URI), 5 P20 RR030360-03 (UNH)
• NIH National Institute of General Medical Sciences grants: 8 P20 GM103423-12 (MDIBL), 8 P20 GM103446-12 (UD), 8 P20 GM103449 (UVM), 8 P20 GM103430-11 (URI), 8 P20 GM103506-03 (Dartmouth)
• National Science Foundation EPSCoR grants: EPS-0904155 (UM), EPS-081425 (UD), EPS-1101317 (UVM), EPS-1004057 (URI), EPS-1101245 (UNH).

Outreach

Cartilaginous fishes are divided into two major groups, elasmobranchs and holocephalans. The skate genome project is currently the only public elasmobranch sequencing project. SkateBase.org serves as the project hub. SkateBase includes all data generated by the research effort, relevant links, tools including SkateBlast for local queries, a gene table containing annotation features, and a project vitae. For community annotation and educational applications, protein and gene annotation guides and examples are provided in addition to the online interface.

[Figures: Protein Annotation Interface; Gene Annotation Workflow; Gene Table; Global Chondrichthyan Genome Sequencing Efforts]

The Skate Genome Project: A Model for Scientific Collaboration and Education

Jennifer T. Wyffels, Benjamin L. King, Shawn W. Polson, James Vincent, Chuming Chen and Cathy H. Wu

North East Bioinformatics Collaborative of the North East Cyberinfrastructure Consortium

Skates as Biomedical Models: Fundamental Vertebrate Characteristics

Most evolutionarily distant jawed vertebrate

Pressurized circulatory system

Adaptive immune system

Neural crest

Renal physiology

Reproductive Modes:

Oviparity – Placental Viviparity & Parthenogenesis


Research Capabilities/Needs

[Figure: expertise and needs across HPC, Computer Science, NGS, Public Health, Data Mining, Data Management, BioStatistics, Medical Informatics]

Integrated Clinical Genomic Variant Analysis 

Waters Poster Award


CBCB: Promote, coordinate and support interdisciplinary activities in Bioinformatics & Computational Biology
http://bioinformatics.udel.edu/

• Research: Foster interdisciplinary, cross-campus and inter-institutional research collaborations synergistic to UD strategic areas

• Education: Establish graduate degree programs
  – Fall 2010: Master’s Program in Bioinformatics & Computational Biology
  – Fall 2012: PhD program in Bioinformatics & Systems Biology

• Core: Provide scientific expertise and infrastructure support in Bioinformatics & Computational Biology for the Delaware research and education community

• > 60 affiliated faculty from five Colleges
  – Engineering (CoE), Arts & Sciences (CAS), Agriculture & Natural Resources (CANR), Earth, Ocean & Environment (CEOE), Health Sciences (CHS)

Bioinformatics Research Infrastructure


NGS (Next-Gen Sequencing) Data Analysis

[Figure: Short Reads; Organize; Visualize; Analyze]

Analysis Pipelines for
• RNA-Seq
• miRNA
• De novo Genome Assembly
• Reference Mapping
• Genomic Structural Variation: SNP/Indel/CNV
• Reduced Representation Library
• Amplicon Library (16S rRNA)
• Metagenome
• Metatranscriptome

Bioinformatics NGS Analysis Workflow

[Figure: Variant Detection; SNPs; Annotation & Filtering]

Variant Pathway Analysis

The web-based iProXpress provides tools for functional profiling, such as pathway and GO enrichment analysis, and allows for custom display of selected fields from >160 databases integrated in PIR iProClass, including OMIM and KEGG
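The enrichment analysis mentioned above can be illustrated with a one-sided hypergeometric test, a common choice for pathway and GO enrichment. This is only a sketch: the slide does not say which statistic iProXpress uses, and the function name and the gene counts below are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population: total genes in the background set
    annotated:  background genes annotated to the pathway or GO term
    sample:     genes in the variant list being tested
    hits:       variant-list genes annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of seeing `hits` or more annotated genes by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical toy numbers: 10 background genes, 4 in the pathway,
# 3 variant genes of which 2 fall in the pathway.
print(f"{hypergeom_enrichment_p(10, 4, 3, 2):.4f}")  # prints 0.3333
```

In practice the p-value would be computed for every pathway or term and then corrected for multiple testing; that step is omitted here.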


Gene Variant Network Analysis

Pedigree & Phenotype Description
• Patellae dislocation
• Ligamentous laxity
• Small thorax
• Brachydactyly
• Short femoral necks
• Development delay
• Flat midface
• Depressed nasal bridge
• Stub thumb

• STRING interaction network of variant genes

• Cytoscape clustering and visualization
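As a rough illustration of grouping variant genes by their interactions, the sketch below finds connected components in an edge list. It is a simple stand-in, not the STRING API or any Cytoscape clustering algorithm; the function name and the gene identifiers are hypothetical placeholders.

```python
from collections import defaultdict, deque

def connected_components(edges, nodes=()):
    """Group genes so that each group is connected by interaction edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    for n in nodes:
        adjacency.setdefault(n, set())  # keep genes with no interactions

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search from each unvisited gene.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(group))
    return components

# Hypothetical interaction pairs between variant genes.
edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_D", "GENE_E")]
print(connected_components(edges, nodes=["GENE_F"]))
# prints [['GENE_A', 'GENE_B', 'GENE_C'], ['GENE_D', 'GENE_E'], ['GENE_F']]
```

Connected components are the coarsest possible grouping; dedicated clustering methods would split dense networks further.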


Clinical NGS Analysis: Multi-Institution Multi-Core Collaboration

Project Coordination

• Two Cores already using iLab Solutions
• Investigate unifying cores under iLab “Collaborating Cores” integration
• Unified sample submission
• Cores pass collaborative project seamlessly without user intervention
• Project tracking
• Mechanisms for simplified billing/accounting

Not just implementation . . .

How can we improve the process?

[Figure: Project Workflow]