
NCBI Genome Workbench

Chuong Huynh, NIH/NLM/NCBI

São Paulo, Brazil, July 15, 2004

huynh@ncbi.nlm.nih.gov

Slides from Michael Dicuccio’s June 21, 2004 Genome Workbench talk


Obtaining Genome Workbench

• Not officially released to the public
• Beta version snapshots:
  – ftp://ftp.ncbi.nih.gov/toolbox/gbench/


NCBI Genome Workbench


Genome Workbench: Goals

• Provide an interactive, client-side GUI
• Provide a full suite of annotation tools
  – Sequin does a lot of this
    • older code
    • primarily a submission tool
• Provide a platform for visualization and analysis
• Provide a platform that offers easy extensibility


Why Client-Side?

• Clients are now pretty fast
  – you can actually BLAST genomes on the client side!
• Access to private data
  – “If you can’t bring the data to GenBank, bring GenBank to the data!”
  – Not just private data: extend to private data sources and data management
• Ability to mix and match analytical methods


Application Architecture

• Core application
  – provides application services, data management, standard dialogs and components
• Plug-ins
  – handle most of the requests
  – everything is a plug-in


Plug-in Manager


Core Application: MVC

• MVC = Model / View / Controller
  – application paradigm dating from the late 1970s
  – separates the responsibilities of the application into discrete components
• Genome Workbench uses this extensively
  – Model = data being viewed
  – View = viewers on this data
  – Controller = application, editing framework
    • under construction
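The following is a minimal Python sketch of the model/view/controller split described above. It assumes a simple observer callback from model to view. The class names (SequenceModel, SummaryView, EditController) are hypothetical and are not Genome Workbench's actual C++ classes.

```python
from typing import Callable, List


class SequenceModel:
    """Model: holds the data being viewed and notifies attached views."""

    def __init__(self, seq_id: str, residues: str) -> None:
        self.seq_id = seq_id
        self.residues = residues
        self._listeners: List[Callable[["SequenceModel"], None]] = []

    def attach(self, listener: Callable[["SequenceModel"], None]) -> None:
        self._listeners.append(listener)

    def set_residues(self, residues: str) -> None:
        self.residues = residues
        for listener in self._listeners:  # every view refreshes itself
            listener(self)


class SummaryView:
    """View: renders the model and never modifies it directly."""

    def __init__(self, model: SequenceModel) -> None:
        self.rendered = ""
        model.attach(self.refresh)
        self.refresh(model)

    def refresh(self, model: SequenceModel) -> None:
        self.rendered = f"{model.seq_id}: {len(model.residues)} residues"


class EditController:
    """Controller: turns user edits into model updates."""

    def __init__(self, model: SequenceModel) -> None:
        self.model = model

    def append_residues(self, extra: str) -> None:
        self.model.set_residues(self.model.residues + extra.upper())


model = SequenceModel("seq1", "ACGT")
view = SummaryView(model)
controller = EditController(model)
controller.append_residues("gg")
print(view.rendered)  # seq1: 6 residues
```

The view only learns about the edit through the model's notification. Any number of viewers can share one model without knowing about each other.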


Extensibility: Plug-Ins

• Framework provides standard interfaces for defining and manipulating plug-ins
• Dynamically loaded at runtime, and only when needed
• Plug-ins live in shared libraries
  – can have more than one plug-in per library
• No need to rebuild the entire application to add new features
• Three types:
  – data sources, viewers, algorithms
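This is a minimal sketch of a lazily loading plug-in registry covering the three plug-in types above. It assumes that a loader callable stands in for opening a shared library. PluginType, PluginRegistry and load_reverse_complement are hypothetical names, not the Genome Workbench plug-in API.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple


class PluginType(Enum):
    """The three plug-in categories named on the slide."""
    DATA_SOURCE = "data source"
    VIEWER = "viewer"
    ALGORITHM = "algorithm"


class PluginRegistry:
    """Hypothetical registry: plug-ins are registered up front, loaded on demand."""

    def __init__(self) -> None:
        self._loaders: Dict[Tuple[PluginType, str], Callable[[], Any]] = {}
        self._loaded: Dict[Tuple[PluginType, str], Any] = {}

    def register(self, kind: PluginType, name: str, loader: Callable[[], Any]) -> None:
        # Only the loader is stored; nothing is loaded yet.
        self._loaders[(kind, name)] = loader

    def names(self, kind: PluginType) -> List[str]:
        return sorted(name for (k, name) in self._loaders if k is kind)

    def is_loaded(self, kind: PluginType, name: str) -> bool:
        return (kind, name) in self._loaded

    def get(self, kind: PluginType, name: str) -> Any:
        key = (kind, name)
        if key not in self._loaded:
            # First request: run the loader once and cache the result.
            self._loaded[key] = self._loaders[key]()
        return self._loaded[key]


def load_reverse_complement() -> Callable[[str], str]:
    """Stands in for opening a shared library and resolving its entry point."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return lambda seq: seq.translate(table)[::-1]


registry = PluginRegistry()
registry.register(PluginType.ALGORITHM, "revcomp", load_reverse_complement)
print(registry.is_loaded(PluginType.ALGORITHM, "revcomp"))  # False
revcomp = registry.get(PluginType.ALGORITHM, "revcomp")
print(revcomp("AACG"))  # CGTT
```

Adding a feature then means registering one more loader. The rest of the application does not change.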


Extensibility: Scripting

• Wrap C++ interfaces with a bit of glue code to make them available to scripting languages
• Goals are two-fold:
  – obtain a command console for the scripting language
  – write plug-ins entirely in a scripting language
• Initial focus on Perl and Python; others to be added later
• Not yet completed


Client-Side Benefits

• Data Caching
  – data in GenBank is updated, but updates to individual sequences are infrequent
  – usage patterns are often well suited to caching
• BLAST request caching
  – BLAST requests are valid for 24 hours
  – request IDs are unique, so results can be cached on the client side
• Directory Indexing
  – can index directories of files
  – can search by content, molecule type, IDs, etc.
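This is a minimal sketch of the BLAST request caching idea: results keyed by their unique request ID and discarded after the 24-hour validity window. BlastResultCache and the ID "RID-EXAMPLE" are hypothetical. The injectable clock exists only so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# The slides state BLAST requests are valid for 24 hours.
BLAST_RESULT_LIFETIME = 24 * 60 * 60  # seconds


class BlastResultCache:
    """Hypothetical client-side cache keyed by unique BLAST request ID."""

    def __init__(self, clock: Callable[[], float] = time.time,
                 lifetime: float = BLAST_RESULT_LIFETIME) -> None:
        self._clock = clock
        self._lifetime = lifetime
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, request_id: str, result: str) -> None:
        self._entries[request_id] = (self._clock(), result)

    def get(self, request_id: str) -> Optional[str]:
        entry = self._entries.get(request_id)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._lifetime:
            del self._entries[request_id]  # expired: the server no longer honors it
            return None
        return result


now = [0.0]  # fake clock, in seconds
cache = BlastResultCache(clock=lambda: now[0])
cache.put("RID-EXAMPLE", "<hits>")
now[0] = 23 * 3600
hit = cache.get("RID-EXAMPLE")   # still valid
now[0] = 25 * 3600
miss = cache.get("RID-EXAMPLE")  # past 24 hours
print(hit, miss)  # <hits> None
```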


Some Functionality NOT Enabled

• Only blastn works over the network

• “Choose From Other Documents” does not work in the 20040712 build


Demo

• Main Application Window
  – List of loaded records
• Graphical Sequence Viewer
  – Navigation, GUI controls
  – Displayed features / annotations
  – Basic searching
  – Configuration properties
• Basic Sequence Analysis
  – Compositional Questions
    • GC Content
    • CpG Islands
    • Protein, nucleotide molecular properties
  – Searching
    • Pattern search
    • Named sets of patterns (Kozak Scan)
    • Restriction Sites
    • PROSITE
    • Open Reading Frames
    • Simple Repeats
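The following sketches two of the compositional questions above. It computes GC content and scans for CpG islands using the widely cited criteria of a window of at least 200 bp, GC of at least 50% and an observed/expected CpG ratio of at least 0.6. These thresholds are an assumption; the slides do not say what criteria Genome Workbench applies. The function names are hypothetical.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_ratio: float = 0.6) -> List[Tuple[int, int]]:
    """Return (start, end) of every window that meets both thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_content(w) >= min_gc and cpg_obs_exp(w) >= min_ratio:
            hits.append((start, start + window))
    return hits


print(round(gc_content("ATGCGC"), 3))  # 0.667
print(cpg_obs_exp("CGCG"))             # 2.0
```

Recounting every window costs O(n × window). A production scanner would update the counts incrementally and merge overlapping windows into single islands.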


Demo

• Text Sequence Viewers
  – GenBank flat-file
  – FASTA, ASN.1
  – Feature Tables
• Advanced Sequence Analysis
  – Gene Prediction
    • GNOMON: HMM-based gene predictor
  – Protein characterization
    • Conserved domains
    • Active regions
    • Coiled coils
    • Antigenic Sites
• Side-by-Side Data
  – Ensembl models
  – Tab-delimited data
• Alignments
  – Local BLAST
  – Network BLAST
  – Global (Needleman-Wunsch) alignment
  – Splign
• Miscellaneous Visualization Topics
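For the global alignment item, here is a textbook Needleman-Wunsch sketch with linear gap costs. The scores (match +1, mismatch -1, gap -1) are illustrative assumptions, not the parameters Genome Workbench uses.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score, top, bottom)  # 2 ACGT A-GT
```

Unlike local BLAST hits, this aligns the sequences end to end. That is why it suits comparing two sequences already known to be related over their full length.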


Future Work

• Scripting
  – should be the first option for extension
• Workspace Integration
  – only one top-level window
  – provide better integration of views and tools
  – provide better data organization
• Full editing features