Web Apollo and the VectorBase user community Gloria I. Giraldo-Calderón March 31, 2015


Web Apollo and the VectorBase user community

Gloria I. Giraldo-Calderón, March 31, 2015

Outline

● Gene annotation
o Gene automatic annotations
o Gene manual annotation and metadata
o Basics: A good vs a bad gene model
o Why do we need gene manual annotations and gene metadata?

● Why did we replace the Community Manual Annotation (CAP) with Web Apollo (WA)?
o Offline vs. online
o Advantages vs disadvantages

● How do we interact with WA developers and outreach representatives?

● How do we get the community to submit data?

Gene annotation

VectorBase gene “automatic” annotations

[Figure: assembly diagram. Labels: gap (100 Ns); scaffolds or supercontigs; mapping (optional; not possible with bioinformatics, must be experimental)]

Gene prediction: evidence based (BLAST), Ab initio (SNAP), experimental evidence (ESTs, RNAseq, protein or peptide sequencing)
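The figure above labels a gap as a run of 100 Ns. As a minimal sketch of how such gaps could be located in a scaffold sequence, the snippet below scans a string for runs of N. The function name `find_gaps` and the toy sequence are assumptions made for illustration; they are not part of any VectorBase tool.

```python
import re


def find_gaps(sequence, min_length=1):
    """Return (start, end, length) for each run of N, 1-based inclusive."""
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_length:
            gaps.append((match.start() + 1, match.end(), length))
    return gaps


# Toy scaffold, invented for illustration: 20 bases, a 100 N gap, 20 bases.
toy_scaffold = "ACGT" * 5 + "N" * 100 + "TTGA" * 5
print(find_gaps(toy_scaffold))  # [(21, 120, 100)]
```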

Gene “manual” annotation and metadata


Metadata

- VectorBase gene ID (e.g., AGAP000002)

- Organism (species) (e.g., Anopheles gambiae)

- Symbol (e.g., para)

- Synonym (e.g., kdr, VSC)

- Description (e.g., voltage-gated sodium channel)

- Comments/notes (e.g., truncated gene, other part on scaffold xxx)
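The metadata fields listed above could be held in a small record type. This is only a sketch: the class name `GeneMetadata`, its field names and the `missing_fields` helper are hypothetical and do not reflect how VectorBase or Web Apollo actually store metadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneMetadata:
    """Hypothetical container for the metadata fields listed on the slide."""
    gene_id: str                  # VectorBase gene ID
    organism: str                 # species name
    symbol: str = ""              # short gene symbol
    synonyms: List[str] = field(default_factory=list)
    description: str = ""
    comments: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of optional fields that are still empty."""
        optional = ("symbol", "synonyms", "description", "comments")
        return [name for name in optional if not getattr(self, name)]


# The values are the slide's own examples, combined here for illustration
# only; they are not asserted to describe one real database record.
record = GeneMetadata(
    gene_id="AGAP000002",
    organism="Anopheles gambiae",
    symbol="para",
    synonyms=["kdr", "VSC"],
    description="voltage-gated sodium channel",
)
print(record.missing_fields())  # ['comments']
```

A helper such as `missing_fields` shows the kind of gap a manual annotator would be asked to fill in.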

Why do we need gene manual annotations and gene metadata?

Genome Browser: Gene Page

- Homologs and Phylogenetics
- Ontology
- Variation (e.g., Single Nucleotide Polymorphisms, SNPs)

Why do we need gene manual annotations and gene metadata?

For downstream analyses of gene(s), gene families or genomes such as:

Homologs and Phylogenetics

- wrong assignment of orthologs and paralogs
- gene alignment ---> tree
- wrong inference of evolutionary relationships between genes or species
- branches with a wrong length could give a misleading picture of how lineages change over time (the longer the branch, the larger the amount of change)
- wrong estimates about the ancestral and derived states, genes or species
- wrong taxonomic interpretations

Ontology

GO: biological process (ion transport, sodium ion transport, transmembrane transport)

GO: molecular function (ion channel activity, voltage-gated sodium channel activity, calcium ion binding)

GO: cellular component (voltage-gated sodium channel, membrane)

Variation (e.g., Single Nucleotide Polymorphisms, SNPs)

. . . T T A . . .

. . . T T T . . .

SNP

L 1014 F

Leucine ---> Phenylalanine
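The codon change above (TTA to TTT, leucine to phenylalanine at residue 1014) can be written out as a small lookup. This is a minimal sketch: the table holds only the two codons from the slide, and the function name `describe_substitution` is an assumption made for illustration.

```python
# Only the two codons shown on the slide; this is not a full codon table.
CODON_TO_AA = {"TTA": "L", "TTT": "F"}  # L = leucine, F = phenylalanine


def describe_substitution(ref_codon, alt_codon, position):
    """Name an amino acid change in the usual <ref><position><alt> form."""
    ref_aa = CODON_TO_AA[ref_codon]
    alt_aa = CODON_TO_AA[alt_codon]
    if ref_aa == alt_aa:
        return f"synonymous at residue {position}"
    return f"{ref_aa}{position}{alt_aa}"


print(describe_substitution("TTA", "TTT", 1014))  # L1014F
```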

Hypothetical example:

- A user is interested in gene “x”
- They download this gene from VB
- They start their analyses
- They find and report the presence/absence of the SNP
- If the gene of interest is not correctly annotated, e.g., missing an exon or part of an exon, the results are going to be wrong
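To illustrate why a missing exon makes the results wrong, the sketch below maps a genomic position to a residue number under two gene models. All coordinates are toy values invented for this example, and `residue_number` is a hypothetical helper that assumes forward-strand exons containing only coding sequence.

```python
def residue_number(exons, genomic_pos):
    """Map a 1-based genomic position to a 1-based residue number.

    exons: list of (start, end) tuples, 1-based inclusive, forward strand,
    sorted by start, assumed to contain only coding sequence.
    Returns None if the position is not inside any annotated exon.
    """
    coding_offset = 0  # coding bases in the exons before the current one
    for start, end in exons:
        if start <= genomic_pos <= end:
            coding_index = coding_offset + (genomic_pos - start)  # 0-based
            return coding_index // 3 + 1
        coding_offset += end - start + 1
    return None


# Toy coordinates, invented for illustration.
correct_model = [(101, 160), (301, 390), (501, 590)]
missing_exon_model = [(101, 160), (501, 590)]  # second exon not annotated

snp_position = 520
print(residue_number(correct_model, snp_position))       # 57
print(residue_number(missing_exon_model, snp_position))  # 27
print(residue_number(missing_exon_model, 350))           # None
```

With the incomplete model the same SNP is reported at a different residue, and a SNP inside the missing exon is not reported as coding at all.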


- The size of the genomes

- The phylogenetic distance among genomes

Number of genomes (genome size):
- VB: 37 (110 Mbp – 3,000 Mbp)
- EuPathDB: 186 (2 Mbp – 193 Mbp)
- PATRIC: 3,481 Bacteria & 186 Archaea (10 kbp – 14 Mbp)
- ViPR: 546,381 & IRD: 365,618 (few kbp – 250 kbp)

Why did we replace the Community Manual Annotation (CAP) with Web Apollo?

Offline vs. online curation

Community Manual Annotation (CAP)

Web Apollo

[Screenshot labels: gene models; RNAseq; User-created Annotations]

Advantages & Disadvantages

Community Manual Annotation (CAP)

- People had to use Artemis or (Desktop) Apollo: requires downloading scaffolds or supercontigs from VB

- VB gene updates can take 2 months or more → more than one person working on the same gene

- Most of the time our internal GFF3 validator found issues with submitted data files.
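The kind of issue a GFF3 validator can catch is sketched below. This is not VectorBase's internal validator, whose checks are not described in these slides; it only tests a few basic rules from the GFF3 format (nine tab-separated columns, start <= end, valid strand and phase), and the function names are assumptions.

```python
def check_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return [f"expected 9 tab-separated columns, found {len(columns)}"]
    seqid, source, ftype, start, end, score, strand, phase, attributes = columns
    problems = []
    if not (start.isdecimal() and end.isdecimal()):
        problems.append("start and end must be positive integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")
    if strand not in {"+", "-", ".", "?"}:
        problems.append(f"invalid strand {strand!r}")
    if phase not in {"0", "1", "2", "."}:
        problems.append(f"invalid phase {phase!r}")
    elif ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase of 0, 1 or 2")
    return problems


def check_gff3_text(text):
    """Check every feature line; skip blank lines and '#' lines."""
    report = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        problems = check_gff3_line(line)
        if problems:
            report[number] = problems
    return report


# Toy submission, invented for illustration: line 3 has start > end
# and a CDS without a phase.
example = (
    "##gff-version 3\n"
    "scaffold_1\tuser\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "scaffold_1\tuser\tCDS\t500\t400\t.\t+\t.\tID=cds1\n"
)
print(check_gff3_text(example))
```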

Web Apollo

- Is web-based, which allows easier collaboration

- There is not, however, a clear way to indicate/know when a user is “still working” or “done” with an annotation.

- New annotations, though, are instantly visible to all WA users.

How do we interact with Web Apollo developers and outreach representatives?

- Developers:
○ Monthly WA developers open conference call
○ email

- Outreach:
○ Meetings, workshops and conferences
○ email or phone

We are also subscribed to their user email list (help desk).

How do we get the community to submit data?

- First invitation comes from genome leaders directly (genome paper)

- Users send emails to our help desk ([email protected])

- During outreach events, such as workshops, meetings and conferences

- Social media posts (Facebook and Twitter)

- Help content: Tutorial page

Genome group manual annotation efforts

- Workshops
- Annotation jamborees
- Webinars
- Independent work

Help content: Tutorial page

- Decision tree

- FAQs

- Web Apollo resources: user guide, slides with speaker notes, sample exercises

- Documentation about available tracks

- Video tutorial (Intro, ~ 50 min) and a video clip (Intron/exon boundaries ~2:45 min)

User submission stats and import of data into VectorBase

To be continued by Daniel Lawson . . .