NCBI
Genome Workbench
Chuong Huynh
NIH/NLM/NCBI
Sao Paulo, Brasil
July 15, 2004
Slides from Michael DiCuccio’s June 21, 2004 Genome Workbench talk
Obtaining Genome Workbench
• Not officially released to the public
• Beta version snapshots:
  – ftp://ftp.ncbi.nih.gov/toolbox/gbench/
NCBI Genome Workbench
Genome Workbench: Goals
• Provide an interactive, client-side GUI
• Provide full suite of annotation tools
  – Sequin does a lot of this
    • older code
    • primarily a submission tool
• Provide a platform for visualization and analysis
• Provide a platform that offers easy extensibility
Why Client-Side?
• Clients are now pretty fast
  – you can actually BLAST genomes on the client-side!
• Access to private data
  – “If you can’t bring the data to GenBank, bring GenBank to the data!”
  – Not just private data: extend to private data sources, data management
• Ability to mix and match analytical methods
Application Architecture
• Core application
  – provides application services, data management, standard dialogs and components
• Plug-ins
  – handle most of the requests
  – everything is a plug-in
Plugin Manager
Core Application: MVC
• MVC = Model / View / Controller
  – 30+ year old paradigm for applications
  – separates responsibilities of the application into discrete components
• Genome Workbench uses this extensively
  – Model = Data being viewed
  – View = Viewers on this data
  – Controller = Application, editing framework
    • under construction
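
The division of responsibilities above can be illustrated with a small sketch. This is not Genome Workbench code (the application itself is written in C++); the class names SequenceModel, LengthView, FastaView and Controller are hypothetical and chosen only to mirror the three roles on the slide.

```python
class SequenceModel:
    """Model: the data being viewed. Notifies attached views on change."""

    def __init__(self, seq_id, residues):
        self.seq_id = seq_id
        self.residues = residues
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def set_residues(self, residues):
        self.residues = residues
        for view in self._views:
            view.refresh(self)


class LengthView:
    """View: renders a one-line summary of the model."""

    def __init__(self):
        self.rendered = ""

    def refresh(self, model):
        self.rendered = f"{model.seq_id}: {len(model.residues)} bp"


class FastaView:
    """View: renders the same model as FASTA text."""

    def __init__(self):
        self.rendered = ""

    def refresh(self, model):
        self.rendered = f">{model.seq_id}\n{model.residues}"


class Controller:
    """Controller: opens views and routes edits to the model."""

    def __init__(self, model):
        self.model = model

    def open_view(self, view):
        self.model.attach(view)
        view.refresh(self.model)
        return view

    def edit(self, residues):
        # Views never touch the model directly; edits go through here.
        self.model.set_residues(residues.upper())
```

Because both views observe the same model, a single edit through the controller updates every open view, which is the property that lets several viewers show the same record at once.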
Extensibility: Plug-Ins
• Framework provides standard interfaces for defining, manipulating plug-ins
• Dynamically loaded at runtime; Only loaded when needed
• Plug-ins live in shared libraries
  – can have more than one plug-in per library
• Don’t need to rebuild the entire application to add new features
• Three types:
  – Data sources, Viewers, Algorithms
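
As a rough illustration of "only loaded when needed", the sketch below registers plug-in factories up front and creates an instance only on first request. It is a Python analogy, not the real framework, which loads C++ plug-ins from shared libraries; PluginType, PluginRegistry and ReverseComplement are hypothetical names.

```python
from enum import Enum


class PluginType(Enum):
    # The three plug-in types named on the slide.
    DATA_SOURCE = "data source"
    VIEWER = "viewer"
    ALGORITHM = "algorithm"


class PluginRegistry:
    """Keeps plug-in factories and instantiates each plug-in lazily."""

    def __init__(self):
        self._factories = {}  # name -> (PluginType, factory callable)
        self._loaded = {}     # name -> plug-in instance

    def register(self, name, plugin_type, factory):
        # Registering is cheap: nothing is constructed yet.
        self._factories[name] = (plugin_type, factory)

    def names(self, plugin_type):
        return sorted(
            name for name, (kind, _) in self._factories.items()
            if kind is plugin_type
        )

    def is_loaded(self, name):
        return name in self._loaded

    def get(self, name):
        # Construct on first use, then reuse the same instance.
        if name not in self._loaded:
            _, factory = self._factories[name]
            self._loaded[name] = factory()
        return self._loaded[name]


class ReverseComplement:
    """Example algorithm plug-in (hypothetical)."""

    _PAIRS = str.maketrans("ACGT", "TGCA")

    def run(self, residues):
        return residues.upper().translate(self._PAIRS)[::-1]
```

A new feature is added by registering another factory; nothing else in the registry or the calling code has to change.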
Extensibility: Scripting
• Wrap C++ interfaces with a bit of glue to make them available to scripting languages
• Goals are two-fold:
  – obtain command console for scripting language
  – write plug-ins entirely in a scripting language
• Focus initially on PERL, Python; intend to add others
• Not yet completed
Client-Side Benefits
• Data Caching
  – data in GenBank is updated, but updates for individual sequences are infrequent
  – Pattern of use is frequently optimal for caching
• BLAST request caching
  – BLAST requests valid for 24 hours
  – IDs unique, can be cached on the client-side
• Directory Indexing
  – can index directories of files
  – can search by content, molecule type, IDs, etc.
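
The BLAST request caching idea (unique IDs, valid for 24 hours) can be sketched as a small time-limited cache. This is an assumption-laden illustration, not the Workbench's implementation: RequestCache, its methods, and the sample ID "RID-1" are hypothetical, and the clock is injectable only so the expiry can be exercised without waiting.

```python
import time

DAY_SECONDS = 24 * 60 * 60  # validity window stated on the slide


class RequestCache:
    """Client-side cache keyed by request ID, with a fixed lifetime."""

    def __init__(self, ttl_seconds=DAY_SECONDS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # request ID -> (stored_at, result)

    def put(self, request_id, result):
        self._entries[request_id] = (self._clock(), result)

    def get(self, request_id):
        entry = self._entries.get(request_id)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired: drop it so the caller resubmits the request.
            del self._entries[request_id]
            return None
        return result
```

Since the IDs are unique, the ID alone is a safe cache key; the only invalidation rule needed is the age check.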
Some Functionality NOT Enabled
• Only blastn works over the network
• “Choose From Other Documents” does not work in the 20040712 build
Demo
• Main Application Window
  – List of loaded records
• Graphical Sequence Viewer
  – Navigation, GUI controls
  – Displayed features / annotations
  – Basic searching
  – Configuration properties
• Basic Sequence Analysis
  – Compositional Questions
    • GC Content
    • CpG Islands
    • Protein, nucleotide molecular properties
  – Searching
    • Pattern search
    • Named sets of patterns (Kozak Scan)
    • Restriction Sites
    • ProSITE
    • Open Reading Frames
    • Simple Repeats
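
For readers unfamiliar with the compositional questions listed above, the sketch below computes GC content and the observed/expected CpG ratio that is commonly used when screening for CpG islands. The slides do not say which criteria Genome Workbench applies, so the formula here is the widely used textbook ratio and the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of residues that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count sees every occurrence.
    return seq.count("CG") * len(seq) / (c_count * g_count)
```

In practice both values are computed over a sliding window along the sequence, and windows exceeding chosen thresholds are reported as candidate islands.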
Demo
• Text Sequence Viewers
  – GenBank flat-file
  – FastA, ASN.1
  – Feature Tables
• Advanced Sequence Analysis
  – Gene Prediction
    • GNOMON – HMM-based gene predictor
  – Protein characterization
    • Conserved domains
    • Active regions
      – Coiled coils
      – Antigenic Sites
• Side-by-Side Data
  – Ensembl models
  – Tab-delimited data
• Alignments
  – Local BLAST
  – Network BLAST
  – Global (Needleman-Wunsch) alignment
  – Splign
• Miscellaneous Visualization Topics
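
The global (Needleman-Wunsch) alignment mentioned in the list above can be sketched in a few lines. This is a minimal textbook version with a linear gap penalty; the scoring values (match=1, mismatch=-1, gap=-1) are illustrative assumptions and not the parameters Genome Workbench uses.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill: each cell is the best of diagonal, up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner, preferring the diagonal.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, needleman_wunsch("ACGT", "AGT") aligns ACGT against A-GT. Unlike BLAST, which reports local matches, this aligns both sequences end to end.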
Future Work
• Scripting
  – should be first option for extension
• Workspace Integration
  – only one top-level window
  – provide better integration of views, tools
  – provide better data organization
• Full editing features