Galaxy History: Genome Informatics 2008
DESCRIPTION
Talk on Galaxy at Genome Informatics 2008. I was a session chair, so no oversight on this one. We were obsessed with authorization that year, and this talk is probably the most detailed ever on roles, groups, and dataset security in Galaxy. Another classic team slide.
Citation preview
- Galaxy http://galaxy-project.org James Taylor, Emory
University
- Galaxy?
- Galaxy goals: making large-scale computational analysis more
accessible; facilitating transparent analysis; ensuring that
analyses are reproducible
- What Galaxy provides: an open-source framework for integrating
various computational tools and databases into a cohesive workspace;
a web-based service we provide, integrating many popular tools and
resources for comparative genomics; a completely self-contained
application for building your own Galaxy-style sites
- So, what about all this data?
- Tool suites
- What is a Galaxy Tool? The basic unit of analysis in Galaxy. A
program, script, external web resource, whatever... adapted to a
standard structured interface: parameters, data inputs, data
outputs
- Short read sequence analysis: analyzing read quality and
filtering; genomic analysis (mapping against assembled genomes;
coverage, polymorphism, ...); metagenomic analysis (mapping against
sequence databases; taxonomy analysis, visualization, ...)
- Statistical Genetics: quality control and filtering; estimating
ancestry and correction; case control analysis; ...
- Data and analysis management
- The Galaxy History
- Beyond the history
- Beyond the History I Workflows
- Galaxy workflows: abstract description of an analysis procedure.
Essentially: what tools to run, and the flow of data between
tools
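
The idea of a workflow as "what tools to run, and the flow of data between tools" can be sketched as a small directed graph. This is only an illustration, not Galaxy's workflow format or API: the `Step` class, the `execution_order` helper, and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    """One workflow step (hypothetical structure, not Galaxy's own)."""
    tool: str
    # Maps an input name to the id of the upstream step that produces it.
    inputs: dict[str, str] = field(default_factory=dict)


def execution_order(steps: dict[str, Step]) -> list[str]:
    """Return step ids ordered so every step runs after its upstream steps."""
    graph = {step_id: set(step.inputs.values()) for step_id, step in steps.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical short-read workflow: upload -> filter -> map -> coverage.
workflow = {
    "filter": Step("quality_filter", {"reads": "upload"}),
    "upload": Step("upload_reads"),
    "map": Step("map_to_genome", {"reads": "filter"}),
    "coverage": Step("compute_coverage", {"alignments": "map"}),
}
order = execution_order(workflow)
```

Because the description only names tools and data flow, the same workflow can be re-run on different input datasets without changing it.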
- Beyond the History II Data Libraries
- Galaxy Data Libraries: mechanism for storing and organizing
shared datasets in a Galaxy instance. An instance can have many
libraries, each containing datasets organized using folders as well
as tags. Full type-specific metadata like any other dataset in
Galaxy
- Driving use cases: large shared datasets (genotype data;
sequencing reads, direct from the instrument!); data management for
distributed projects
- What about protected data?
- Galaxy dataset security: fine-grained access controls for Galaxy
datasets. Different actions on datasets require different
permissions. Users and groups are granted these permissions. Enforced
throughout Galaxy, e.g. a History can still be shared, but access to
individual datasets in the history is controlled
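
A minimal sketch of per-action permissions, assuming a simple model in which each dataset records which users or groups have been granted each action. The class names, the `grants` mapping, and the action name `"access"` are assumptions for illustration; they are not taken from Galaxy's code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset[str] = frozenset()


@dataclass
class Dataset:
    name: str
    # Hypothetical layout: action name -> user or group names granted it.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def allows(self, user: User, action: str) -> bool:
        """True if the user, or any of the user's groups, holds the grant."""
        granted = self.grants.get(action, set())
        return user.name in granted or bool(user.groups & granted)


def visible_datasets(history: list[Dataset], user: User) -> list[Dataset]:
    """A shared history still shows only the datasets the user may access."""
    return [d for d in history if d.allows(user, "access")]


alice = User("alice", frozenset({"lab"}))
bob = User("bob")
history = [
    Dataset("genotypes", {"access": {"lab"}, "edit": {"alice"}}),
    Dataset("public_reads", {"access": {"lab", "bob"}}),
]
alice_sees = [d.name for d in visible_datasets(history, alice)]
bob_sees = [d.name for d in visible_datasets(history, bob)]
```

Here both users can open the same shared history, but only `alice` sees the `genotypes` dataset, which mirrors the slide's point that sharing a history does not bypass dataset-level checks.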
- Security customization: authentication mechanism can be
replaced, or can leverage a single sign-on mechanism (e.g. through
a proxying web server). Authorization provider can be customized or
replaced
- Completely integrated with analysis: dataset restrictions
propagate through an analysis. Analyses that combine datasets also
combine their restrictions
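
One way to picture restrictions propagating through an analysis is sketched below. It assumes each dataset's restriction is a set of roles a user must hold, and that an output inherits the union of its inputs' sets. The slide does not say how restrictions are combined, so both the representation and the function names are assumptions.

```python
def combine_restrictions(*inputs: frozenset[str]) -> frozenset[str]:
    """Output restriction: every role required by any input (assumed rule)."""
    return frozenset().union(*inputs)


def can_access(user_roles: frozenset[str], required: frozenset[str]) -> bool:
    """A user may access a dataset only if they hold all required roles."""
    return required <= user_roles


# Hypothetical inputs: one restricted to a study's members, one to a lab.
cases = frozenset({"study_members"})
controls = frozenset({"lab"})
joined = combine_restrictions(cases, controls)

lab_only_user = frozenset({"lab"})
both_roles_user = frozenset({"lab", "study_members"})
```

Under this assumed rule, a user who could read only one of the inputs cannot read a result derived from both, while an unrestricted input adds nothing to the output's restriction.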
- Up next... Libraries: sequencer integration, versioning, tagging
and annotation, automatic workflow triggering. Security: configurable
adapters to different authorization providers (e.g. directory
services)
- Acknowledgements. Data and browser connections: UCSC, Biomart,
GMOD, Intermine. Funding: National Science Foundation, Huck
Institutes, Pennsylvania Dept. of Health
- The Galaxy Team: Guru Ananda | Penn State; Dan Blankenberg | Penn
State; Wen-Yu Chung | Penn State; Nate Coraor | Penn State; Greg Von
Kuster | Penn State; Sergei Kosakovsky | UCSD; Ross Lazarus | Harvard
MS; Anton Nekrutenko | Penn State
- p.s. I have job openings for people who like to do cool stuff:
James.Taylor@emory.edu