Annotator Interface Sharon Diskin GUS 3.0 Workshop June 18-21, 2002


Page 1: Annotator Interface

Annotator Interface

Sharon Diskin

GUS 3.0 Workshop

June 18-21, 2002

Page 2: Annotator Interface

Outline

Current annotation efforts Motivation for new annotation tool Requirements for new annotation tool Thoughts on design and implementation Future plans

Page 3: Annotator Interface

Current Annotation Efforts

Page 4: Annotator Interface

Overview of Current Efforts

Automated annotation has been applied to the DoTS transcripts
  – Predicted gene ownership (clustering of assemblies)
  – BlastX against NR
      • Automated assignment of descriptions based on similarity
  – BlastX against ProDom and RPS-Blast against CDD
      • Predicted GO Functions
  – Framefinder
      • Predicted Protein Sequences
  – Blat alignments
  – EPCR, Index Words, etc.

Manual annotation efforts have focused on
  – validating the automated annotation and
  – adding additional information at the central dogma level

Manual annotation of the gene index utilizes an annotation tool, the GUS Annotator Interface, which directly updates the GUSdev database.

Page 5: Annotator Interface

[Flow diagram]

Incoming Sequences (EST/mRNA)
  • GenBank, dbEST sequences
  • Make Quality (remove vector, polyA, NNNs)
“Quality” sequences
  • Block with RepeatMasker
Blocked sequences
  • Blastn to cluster sequences
“Unassembled” clusters
  • Assemble sequences with CAP4
CAP4 assemblies (generate consensus sequences)
DoTS consensus sequences
  • BLASTn DoTS consensus sequences (98% identity, 150 bps)
Gene Cluster (RNAs in the Gene)

DoTS RNA transcripts

The assembly of sequences generates a consensus sequence or DoTS transcript
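
The last step of the flow, grouping DoTS consensus sequences into gene clusters from BLASTn hits at 98% identity over 150 bp, can be illustrated with a small union-find sketch. The hit tuple layout, the name `cluster_by_hits`, the example identifiers, and the single-linkage grouping are all assumptions made for illustration; the slide does not say how hits are combined into clusters.

```python
from collections import defaultdict

MIN_IDENTITY = 98.0   # percent identity threshold quoted on the slide
MIN_LENGTH = 150      # aligned length threshold (bp) quoted on the slide


def cluster_by_hits(sequence_ids, hits):
    """Group sequences connected by qualifying hits (single linkage).

    hits: iterable of (query_id, subject_id, percent_identity, aligned_length).
    This tuple layout is hypothetical, not a real BLAST output format.
    """
    parent = {sid: sid for sid in sequence_ids}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, length in hits:
        if identity >= MIN_IDENTITY and length >= MIN_LENGTH:
            parent[find(query)] = find(subject)

    clusters = defaultdict(list)
    for sid in sequence_ids:
        clusters[find(sid)].append(sid)
    return sorted(sorted(members) for members in clusters.values())


example_hits = [
    ("DT.1", "DT.2", 99.1, 420),   # passes both thresholds
    ("DT.2", "DT.3", 98.0, 150),   # passes exactly at the thresholds
    ("DT.3", "DT.4", 97.5, 800),   # identity too low
    ("DT.4", "DT.5", 99.9, 120),   # alignment too short
]
gene_clusters = cluster_by_hits(
    ["DT.1", "DT.2", "DT.3", "DT.4", "DT.5"], example_hits
)
```

Here the first three sequences end up in one cluster and the last two remain singletons, because neither of their hits meets both thresholds.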

Page 6: Annotator Interface

Current Efforts: Gene Annotation (1)

[Diagram: "Generate DoTS transcripts". Columns as labelled on the slide: Gene (Gene_A), RNA (RNA_1 to RNA_5), RNAInstance (Instance_1 to Instance_5), RNAFeature (Feature_1 to Feature_5), Assembly (Assembly_1 to Assembly_5).]

Task 1: Validation of Gene Membership

Page 7: Annotator Interface

Current Efforts: Gene Annotation (2)

[Diagram: "Generate DoTS transcripts". Columns as labelled on the slide: Gene (Gene_A, Gene_B), RNA (RNA_1 to RNA_5), RNAInstance (Instance_1 to Instance_5), RNAFeature (Feature_1 to Feature_5), Assembly (Assembly_1 to Assembly_5).]

  – Removing RNAs from the cluster results in the creation of a new Gene
  – An entry is made in the MergeSplit table for tracking purposes
  – Similar process followed when an RNA is added to a Gene
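
The split behaviour described on this slide (removing RNAs creates a new Gene and records an entry for tracking) can be sketched as below. The class names, the `MergeSplitRecord` fields, and the in-memory storage are hypothetical; the actual MergeSplit table layout is not shown in the slides.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Gene:
    gene_id: int
    rna_ids: set = field(default_factory=set)


@dataclass
class MergeSplitRecord:
    # Hypothetical columns; the real MergeSplit table is not described here.
    old_gene_id: int
    new_gene_id: int
    is_merge: bool


class GeneIndex:
    def __init__(self):
        self._ids = count(1)
        self.genes = {}
        self.merge_split_log = []

    def new_gene(self, rna_ids=()):
        gene = Gene(next(self._ids), set(rna_ids))
        self.genes[gene.gene_id] = gene
        return gene

    def split(self, gene_id, rna_ids_to_remove):
        """Move RNAs out of a gene into a newly created gene and log the split."""
        source = self.genes[gene_id]
        moving = set(rna_ids_to_remove)
        # Require a non-empty proper subset so neither gene ends up empty.
        if not moving or not moving < source.rna_ids:
            raise ValueError("must remove a non-empty proper subset of the gene's RNAs")
        source.rna_ids -= moving
        new_gene = self.new_gene(moving)
        self.merge_split_log.append(
            MergeSplitRecord(gene_id, new_gene.gene_id, is_merge=False)
        )
        return new_gene


index = GeneIndex()
gene_a = index.new_gene({"RNA_1", "RNA_2", "RNA_3", "RNA_4", "RNA_5"})
gene_b = index.split(gene_a.gene_id, {"RNA_4", "RNA_5"})
```

Adding an RNA to a Gene would follow the same pattern, with a record logged for the change.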

Page 8: Annotator Interface

Current Efforts: Gene Annotation (3)

Task 2: Assign Reference RNA
  – will be annotated further
  – RNA table

Task 3: Assign Approved Gene Name/Symbol
  – Gene Table
  – Evidence: Comment (specifies database link)

Task 4: Assign Gene Description
  – Gene Table
  – Evidence: Comment

Task 5: Associate known Gene synonyms
  – GeneSynonym table
  – Evidence: Comment

Page 9: Annotator Interface

Current Efforts: RNA Annotation

Annotation of “Reference Sequence”

Task 1: Assign/Confirm Description of assembly
  – RNA table

Task 2: Confirm/Add/Delete GO Functions
  – ProteinGOFunction (in GUSdev; GO tables have been re-designed in GUS3.0)
  – Evidence: Comments or Similarity (ProDom, CDD-Pfam, CDD-Smart, or NR)

Page 10: Annotator Interface

Current Annotator Interface Architecture

[Diagram: components GUSdev, “XML” file, Annotator Interface (Java Servlet), and AnnotatorInterfaceSubmitter (GA-Plugin); arrows labelled writes, reads, and executes; database access via JDBC (Query Only) and via the Perl Object Layer over DBI (Insert/Update/Delete).]
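
The slide pairs a query-only database path with a separate insert/update/delete path that works from an “XML” file. A minimal sketch of that kind of file-based handoff follows, assuming the interface side writes the pending changes and a submitter side reads them. The element and attribute names (`annotation_changes`, `change`, `table`, `action`) are invented for illustration and do not reflect the actual file format.

```python
import io
import xml.etree.ElementTree as ET


def write_changes(changes, stream):
    """Interface side: serialize pending edits instead of touching the database."""
    root = ET.Element("annotation_changes")   # hypothetical element name
    for change in changes:
        ET.SubElement(root, "change", attrib={
            "table": change["table"],
            "action": change["action"],
            "id": str(change["id"]),
            "value": change["value"],
        })
    stream.write(ET.tostring(root, encoding="unicode"))


def read_changes(stream):
    """Submitter side: parse the file and return the edits to apply."""
    root = ET.fromstring(stream.read())
    return [
        {
            "table": c.get("table"),
            "action": c.get("action"),
            "id": int(c.get("id")),
            "value": c.get("value"),
        }
        for c in root.findall("change")
    ]


pending = [
    {"table": "Gene", "action": "update", "id": 42, "value": "example description"},
    {"table": "GeneSynonym", "action": "insert", "id": 42, "value": "example synonym"},
]
buffer = io.StringIO()          # stands in for the file on disk
write_changes(pending, buffer)
buffer.seek(0)
round_tripped = read_changes(buffer)
```

Keeping all writes behind one submitter means the interactive side only ever needs read access to the database.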

Page 11: Annotator Interface

Current Annotator Interface

Page 12: Annotator Interface

Current Gene Annotation

Validate Cluster and Assign Reference RNA/Assembly

Page 13: Annotator Interface

Current Gene Annotation (cont.)

Assign Gene Name/Symbol

Assign Gene Description

Assign Gene Synonym(s)

Evidence

Page 14: Annotator Interface

Current RNA (and Protein) Annotation

RNA Description

GO Functions

Evidence

Page 15: Annotator Interface

Allgenes Display of Gene Annotation

Page 16: Annotator Interface

Allgenes Display of RNA Annotation

(Confirmed or manually added GO Functions)

RNA Description

Page 17: Annotator Interface

Status of Current Annotation (as of June 20, 2002)

1289 manually reviewed genes
  – 1003 with gene name
  – 697 with gene synonyms
  – 1046 with description

6146 manually reviewed RNAs/DoTS assemblies

949 ‘proteins’ with reviewed GO function

Page 18: Annotator Interface

Motivation for new tool

Want to annotate using genomic sequence

• Create “curated” gene models specifying structure

• Increase structure of annotation in GUS

• Annotation of proteins

• Redefinition of annotation tasks

• Current interface not designed for this purpose

Page 19: Annotator Interface

Some Other Annotation Tools

• Artemis
    • Developed and used at Sanger
    • Reads and writes flat files
    • Supports rich set of annotations
    • Save as EMBL format

• Apollo
    • Combined effort including members from Sanger and Berkeley
    • Flat files (CORBA access to ENSEMBL)
    • 2 versions, currently being merged
        • Sanger: annotation viewer
        • Berkeley: focus on editing

No Existing Tool To Meet All of Our Needs

Page 20: Annotator Interface

Requirements At a High Level

Page 21: Annotator Interface

Requirements: Graphical View

Provide alignment of features on genomic sequence
  – could potentially display any feature type currently stored in GUS3.0
  – features can be selected and used to generate “curated” features
  – similar to display and functionality in Apollo

Toggle (or configure) the display of each feature type

Zoom to sequence level, with links to functionality relevant to the highlighted feature

Also support creation of features “from scratch”
  – based on literature, etc.

Detail editors provide ability to change endpoints, etc.
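
The toggle and zoom requirements amount to filtering features by type and by overlap with the current view window. A minimal sketch, assuming 1-based inclusive coordinates and made-up feature type names (neither is specified in the slides):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    feature_type: str   # e.g. "BlatAlignment"; type names here are made up
    start: int          # 1-based inclusive genomic coordinates (an assumption)
    end: int
    name: str


def visible_features(features, enabled_types, view_start, view_end):
    """Return features of enabled types that overlap the current view window."""
    return [
        f for f in features
        if f.feature_type in enabled_types
        and f.start <= view_end
        and f.end >= view_start
    ]


track = [
    Feature("BlatAlignment", 100, 900, "DT.1"),
    Feature("BlatAlignment", 5000, 5600, "DT.2"),
    Feature("GenePrediction", 50, 1200, "pred_1"),
]
# Only alignments are toggled on, and the view covers positions 1 to 1000.
shown = visible_features(track, {"BlatAlignment"}, view_start=1, view_end=1000)
```

A real display would also need layout and rendering, which this sketch leaves out.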

Page 22: Annotator Interface

Gene Annotation

Create curated gene model
  – specify gene boundaries
  – specify location of exons (and thus introns)
      • 5' exon boundary (putative transcription start site)
      • 3' exon boundary (include polyadenylation signal)
  – automatic creation of Gene entry
  – merge with existing gene instances through GeneInstance table
  – tables/views affected:
      • GeneFeature
      • ExonFeature
      • GeneInstance
      • Gene
      • MergeSplit
  – evidence: features used to create model, PubMed ID
  – should be as easy as clicking on existing features and saying “make curated” (then can modify endpoints, etc. if needed)
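
Specifying exons fixes the introns implicitly: each intron is the gap between two consecutive exons. A short sketch of that derivation, assuming 1-based inclusive coordinates and a hypothetical function name `derive_introns`:

```python
def derive_introns(gene_start, gene_end, exons):
    """Return intron (start, end) pairs implied by a curated exon list.

    Coordinates are 1-based and inclusive (an assumption). Exons must lie
    inside the gene boundaries and must not overlap.
    """
    ordered = sorted(exons)
    if not ordered:
        raise ValueError("a gene model needs at least one exon")
    for start, end in ordered:
        if start > end or start < gene_start or end > gene_end:
            raise ValueError(
                f"exon {(start, end)} is invalid for gene {(gene_start, gene_end)}"
            )
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("exons overlap")
        if next_start > prev_end + 1:
            # The intron fills the gap between two consecutive exons.
            introns.append((prev_end + 1, next_start - 1))
    return introns


introns = derive_introns(
    1000, 5000, [(1000, 1200), (3000, 3300), (1800, 2000), (4500, 5000)]
)
```

Exons may be supplied in any order, as they would be when clicked in the interface; the function sorts them first. Changing an exon endpoint in a detail editor would simply mean recomputing the list.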

Page 23: Annotator Interface

Gene Annotation (2)

Assign (HUGO or MGI approved) abbreviated gene name/symbol
  – Gene Table
  – Evidence: ExternalDatabaseLink

Assign full gene name (MGI or HUGO full gene name)
  – Gene Table
  – Evidence: ExternalDatabaseLink

Assign abbreviated gene name/symbol synonyms (non-approved gene symbols)
  – GeneSynonym Table
  – Evidence: ExternalDatabaseLink

Assign full gene name aliases
  – GeneAlias Table
  – Evidence: ExternalDatabaseLink

Page 24: Annotator Interface

Gene Annotation (3)

Assign gene category (e.g. non-coding)
  – Gene Table
  – Evidence:
      • ExternalDatabaseLink/Literature Reference
      • Similarity (e.g. to known non-coding RNA)

Confirm/assign gene chromosomal location
  – GeneChromosomalLocation
  – Evidence:
      • ExternalDatabaseLink/Literature Reference
      • RH mapping data
      • Alignments/Features

OMIM Link assignment (verification if computationally determined)
  – ExternalDatabaseLink

Page 25: Annotator Interface

RNA Annotation (1)

Create “curated RNAs”
  – Define RNA transcript forms of gene (create RNAs)
  – Using exons defined by curated gene
  – 5' and 3' UTRs
  – Automatic creation of RNA entry
  – Merge existing RNA instances
  – Tables affected:
      • RNAFeature
      • UTRFeature
      • RNAInstance
      • RNA
  – Evidence: Features used to create

Assign RNA categories to created RNAs (e.g. alternative form)
  – RNARNACategory Table

Page 26: Annotator Interface

RNA Annotation

Assign (or confirm computed) RNA description
  – RNA table
  – Evidence: Gene from which it is derived

Anatomy expression assignment(s)
  – RNAAnatomy
  – RNAAnatomyLOE
  – Evidence:
      • ExternalDatabaseLink/Literature references
      • Assembly anatomy percent from DoTS
      • RAD experiments

Assign GO terms to curated RNA (non-coding RNAs, e.g. small RNA involved in splicing)
  – GOTermAssociation
  – GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink, Literature References

Computational analysis performed on curated RNA sequences
  – Annotation workflow
      • Framefinder translation, GO terms, Similarities, etc.

Page 27: Annotator Interface

Requirements: Protein Annotation

Confirm/assign GO Function
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or Literature References

Confirm/assign GO Biological Process
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or Literature References

Confirm/assign GO Cellular Component
  – GOTermAssociation, GOTermAssociationEvid
  – Evidence: ExternalDatabaseLink and/or Literature References

Assign protein name
  – Protein Table
  – Evidence: ExternalDatabaseLink, Literature Ref, Similarities

Assign protein name synonyms
  – Protein Table
  – Evidence: ExternalDatabaseLink, Literature Ref, Similarities

Page 28: Annotator Interface

Protein Annotation (2)

Assign protein category (post-translational modifications)
  – ProteinProteinCategory
  – Evidence: ExternalDatabaseLink, Literature References

Protein-protein interactions assigned
  – Interaction
  – InteractionInteractionLOE
  – Evidence: PubMed ID, etc.

Protein pathway assignments
  – PathwayInteraction (for newly created interactions)
  – Still under consideration: what is the best way to link with an existing pathway?
      • for example, a Pathway is represented in DoTS, and we want to say that this curated Protein is really the same as a protein in a pathway.

Assign post-translational modification category
Assign interactions involving this protein
Assign pathway protein is known to be involved in
Assign protein family
Ability to modify and/or delete curated protein

Evidence will be associated with all annotation
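
The rule that evidence accompanies every annotation can be enforced at the point where an annotation is created. The sketch below is one way to express that; the `Evidence` and `Annotation` classes and their fields are hypothetical and are not the GUS evidence tables.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    kind: str        # e.g. "ExternalDatabaseLink", "LiteratureReference", "Similarity"
    reference: str   # free-text identifier; the format is hypothetical


@dataclass
class Annotation:
    table: str       # GUS table the annotation targets, e.g. "Protein"
    attribute: str
    value: str
    evidence: list = field(default_factory=list)

    def __post_init__(self):
        # Refuse to construct an annotation that has no supporting evidence.
        if not self.evidence:
            raise ValueError("every annotation must carry at least one piece of evidence")


ok = Annotation(
    "Protein", "name", "example protein name",
    [Evidence("LiteratureReference", "PubMed ID placeholder")],
)

try:
    Annotation("Protein", "name", "unsupported name")
    rejected = False
except ValueError:
    rejected = True
```

Rejecting evidence-free annotations at construction time keeps the check in one place, rather than repeating it in every editor.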

Page 29: Annotator Interface

Next Steps / Open Issues

Completion of Java Object Layer

Decision regarding BioJava wrappers
  – What exactly will this give us to aid in interface development (e.g. FeatureRenderer, etc.)

Discussion on layout of interface
  – Joan’s input after experimentation with other tools

Depending on the above:
  – Client-side portion which communicates with remote GUS Server
  – Interface implementation