Of Mice and Motifs and Best Laid Plans

BC Cancer Agency Genome Sciences Centre
Michael Smith Genome Sciences Centre
Terry Fox Laboratory
British Columbia Cancer Agency, Vancouver
CMMT, Vancouver


Of mice…

• To measure gene expression levels in tissues of developing mice to gain insight into the normal development process.

• To develop supporting technologies and techniques that improve how these data are generated and analysed


LongSAGE (Saha et al., 2002)

The data are:
• not constrained to known transcripts, allowing novel gene discovery
• digital in nature
• easy to transfer


Overview: SAGE data

• 2 datasets
  – October freeze
    • 72 21-mer libraries
    • 8.55 million tags
    • 924,392 unique tag types
    • 49 tissues, 25 developmental stages
  – January freeze
    • 105 21-mer libraries (92 fully sequenced)
    • 11.65 million tags
    • 1,235,833 unique tag types
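Because SAGE data are digital, totals such as "tags" and "unique tag types" are plain counts. A minimal sketch, assuming each library is available as a list of tag strings; the input format and the function names `summarize_libraries` and `tags_per_million` are hypothetical, not part of the Mouse Atlas software:

```python
from collections import Counter

def summarize_libraries(libraries: dict[str, list[str]]) -> dict[str, int]:
    """Count libraries, total tags and unique tag types across SAGE libraries.

    `libraries` maps a library name to the tag sequences observed in it
    (hypothetical input format).
    """
    pooled = Counter()
    for tags in libraries.values():
        pooled.update(tags)  # tag type -> total count over all libraries
    return {
        "libraries": len(libraries),
        "tags": sum(pooled.values()),
        "unique_tag_types": len(pooled),
    }

def tags_per_million(tags: list[str]) -> dict[str, float]:
    """Normalise raw tag counts so libraries of different depth can be compared."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: 1e6 * n / total for tag, n in counts.items()}
```

Normalising to tags per million is one common convention for comparing libraries sequenced to different depths; it is shown here only as an illustration.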


Processing SAGE data

Raw library → Tag clustering → Assign confidence scores → Assign tags to transcripts → Localise transcripts on genome → Analysis tools: DiscoverySpace


Tag Clustering

• Tag types cluster in tag space (Colinge and Ferge, 2001)

• Errors arise from PCR and from sequencing (Akmaev and Wong, 2004)

• We have used the real PHRED values to compute a p-value for each tag type


Filtering out tags with low sequence quality reduces the error rate

[Figure: error rate (5.0% to 10.0%) against 1 − sequence quality (0.001% to 100%, log scale)]
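The quality filter in the figure can be sketched from per-base PHRED scores. A minimal sketch, assuming a tag's sequence quality is the probability that every base is called correctly, with independent base errors of 10^(-Q/10); the function names and the `max_error` cutoff are illustrative, not the values used for these libraries:

```python
import math

def phred_to_error(q: float) -> float:
    """Probability that a base call with PHRED score q is wrong."""
    return 10 ** (-q / 10)

def tag_sequence_quality(quals: list[float]) -> float:
    """Probability that every base of the tag is correct (independent errors assumed)."""
    return math.prod(1 - phred_to_error(q) for q in quals)

def filter_tags(tags: list[tuple[str, list[float]]], max_error: float = 0.01) -> list[str]:
    """Keep tags whose 1 - sequence quality is at most `max_error` (hypothetical cutoff)."""
    return [seq for seq, quals in tags if 1 - tag_sequence_quality(quals) <= max_error]
```

For a 21-bp tag with every base at Q30, 1 − quality is about 2%, so it would fail a 1% cutoff; at Q40 it is about 0.2% and passes.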


Tag / Tag Type Confidence

• Individual tag error = (base library error) combined with (tag sequence error)

• Individual tag errors are combined to give the tag type error for each library

• Tag type errors from each library are combined to give the tag type error for the metalibrary
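The slide does not say how the errors are combined. A minimal sketch, assuming the library error and the sequence error are independent, and that a tag type is erroneous only if every tag observed for it is erroneous, so errors multiply across tags and across libraries; all function names are hypothetical:

```python
import math

def tag_error(library_error: float, sequence_error: float) -> float:
    """Error for one tag: it is correct only if neither source introduced an error."""
    return 1 - (1 - library_error) * (1 - sequence_error)

def tag_type_error(tag_errors: list[float]) -> float:
    """Error for a tag type within one library: all of its tags are errors (assumption)."""
    return math.prod(tag_errors)

def metalibrary_error(per_library_errors: list[float]) -> float:
    """Error for a tag type in the metalibrary: every per-library observation is an error."""
    return math.prod(per_library_errors)
```

Under these assumptions, confidence in a tag type grows quickly with the number of independent observations, which is one way to read why abundant tags are more trustworthy.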


CMOST: Tag Mapping

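• Virtual tag databases are built from RefSeq, Ensembl transcripts, MGC, the mitochondrion and the genome

• Tags from each SAGE library are matched against the virtual tag databases

• Tag “modification”: single-base substitution, insertion or deletion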
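A virtual tag database can be sketched by extracting the tag each transcript would yield and then allowing one-base modifications when looking tags up. A minimal sketch, assuming a 21-bp LongSAGE tag is the 3'-most CATG (NlaIII site) plus the next 17 bases; the function names are hypothetical and this is not the CMOST implementation:

```python
from typing import Optional

ANCHOR = "CATG"  # NlaIII recognition site that anchors SAGE tags
TAIL = 17        # bases after the anchor in a 21-bp LongSAGE tag
BASES = "ACGT"

def virtual_tag(transcript: str) -> Optional[str]:
    """21-bp tag at the 3'-most CATG that still has 17 bases after it."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    while pos != -1:
        if pos + len(ANCHOR) + TAIL <= len(seq):
            return seq[pos:pos + len(ANCHOR) + TAIL]
        pos = seq.rfind(ANCHOR, 0, pos)  # try the next CATG upstream
    return None

def build_tag_db(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each virtual tag to the IDs of the transcripts that produce it."""
    db: dict[str, set[str]] = {}
    for tx_id, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            db.setdefault(tag, set()).add(tx_id)
    return db

def one_edit_variants(tag: str) -> set[str]:
    """Same-length sequences that could have produced `tag` through one base error.

    The read length is fixed, so a base lost at i pulls one unknown base in at the
    3' end, and a base gained at i pushes the true last base out of the read.
    """
    n = len(tag)
    out = set()
    for i in range(n):
        for b in BASES:
            out.add(tag[:i] + b + tag[i + 1:])   # substitution at i
            out.add(tag[:i] + b + tag[i:n - 1])  # read lost the base at i
            out.add(tag[:i] + tag[i + 1:] + b)   # read gained a base at i
    out.discard(tag)
    return out

def map_tag(tag: str, db: dict[str, set[str]]) -> tuple[set[str], bool]:
    """Return (transcript IDs, exact match?), trying the exact tag before variants."""
    if tag in db:
        return db[tag], True
    hits: set[str] = set()
    for variant in one_edit_variants(tag):
        hits |= db.get(variant, set())
    return hits, False
```

Keeping every variant at 21 bp lets insertions and deletions be looked up in the same fixed-length dictionary as substitutions.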


Tag Localization

[Diagram: using MGC, RefSeq and genome sequence, the tag mapper places the tag in a known exon of a gene model (exon, exon, exon)]


Tag Localization

[Diagram: using MGC, RefSeq and genome sequence, the tag mapper places the tag outside the annotated exons (exon, exon, exon): novel gene/exon?]


Tag Localization

[Diagram: using MGC, RefSeq and genome sequence, the tag mapper finds more than one location for the tag (exon, exon, exon): ambiguous mapping]
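The three localization cases can be written as a small decision rule. A minimal sketch, assuming each genomic hit is a (chromosome, position) pair and known exons are (chromosome, start, end) intervals; the category labels follow the slides, while the types and function names are hypothetical:

```python
Hit = tuple[str, int]        # (chromosome, position)
Exon = tuple[str, int, int]  # (chromosome, start, end), inclusive

def in_known_exon(hit: Hit, exons: list[Exon]) -> bool:
    """True if the hit falls inside any annotated exon."""
    chrom, pos = hit
    return any(c == chrom and start <= pos <= end for c, start, end in exons)

def classify_localization(hits: list[Hit], exons: list[Exon]) -> str:
    """Label a tag by where the tag mapper placed it on the genome."""
    if not hits:
        return "unmapped"
    if len(hits) > 1:
        return "ambiguous mapping"  # several equally good genomic locations
    if in_known_exon(hits[0], exons):
        return "known exon"
    return "novel gene/exon?"       # unique hit outside annotated exons
```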


Abundant tags are more likely to map


Coverage of Transcript Databases

| Data source | Number of transcripts | Number observable (multiple) | Number observed (multiple) | Number observable (single) | Number observed (single) |
|---|---|---|---|---|---|
| Ensembl (known) | 25,226 | 24,674 | 21,277 | 19,536 | 14,334 |
| Ensembl (predicted) | 8,317 | 7,598 | 4,455 | 5,122 | 1,308 |
| RefSeq NM | 17,720 | 17,319 | 15,008 | 16,416 | 13,076 |
| MGC | 14,594 | 14,518 | 14,225 | 9,413 | 7,479 |
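Dividing the observed counts by the observable counts gives the share of observable transcripts that were actually seen. A short sketch using the figures from the table, assuming the "observed" columns are transcript counts (their values are far above 100, so they cannot be percentages); the dictionary layout and function name are hypothetical:

```python
# (observable, observed) pairs for the "multiple" and "single" columns of the table.
COVERAGE = {
    "Ensembl (known)": ((24_674, 21_277), (19_536, 14_334)),
    "Ensembl (predicted)": ((7_598, 4_455), (5_122, 1_308)),
    "RefSeq NM": ((17_319, 15_008), (16_416, 13_076)),
    "MGC": ((14_518, 14_225), (9_413, 7_479)),
}

def percent_observed(observable: int, observed: int) -> float:
    """Share of observable transcripts that were actually seen, as a percentage."""
    return 100.0 * observed / observable

for source, (multiple, single) in COVERAGE.items():
    print(f"{source:20s} multiple: {percent_observed(*multiple):5.1f}%  "
          f"single: {percent_observed(*single):5.1f}%")
```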


Is wider sampling better than very deep sampling?

• 120,000 tags per library is roughly equivalent to a chip experiment (Lu et al., 2004)

• Ideally, 300,000 to 400,000 tags would be sampled to recover most genes

• There is a benefit to sampling a greater number of tissue/stage combinations

[Figure: number of NM RefSeq genes observed (0 to 12,000) against sampling depth (0 to 900,000)]
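The depth-versus-breadth question can be explored with a saturation (rarefaction) curve: how many distinct genes appear as more tags are sampled. A minimal sketch, assuming every sequenced tag has already been assigned to a gene; the function name and input format are hypothetical:

```python
import random

def saturation_curve(tag_genes: list[str], depths: list[int], seed: int = 0) -> list[int]:
    """Distinct genes seen at each sampling depth, in ascending order of depth.

    `tag_genes` holds the gene each sequenced tag maps to. The tags are shuffled
    once and read in that order, so each depth extends the previous sample and
    the curve never decreases.
    """
    order = tag_genes[:]
    random.Random(seed).shuffle(order)
    seen: set[str] = set()
    curve, pos = [], 0
    for depth in sorted(depths):
        while pos < min(depth, len(order)):
            seen.add(order[pos])
            pos += 1
        curve.append(len(seen))
    return curve
```

Where the curve flattens, sampling deeper adds few new genes, and sampling more tissue/stage combinations is likely to be the better use of sequencing.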


GO Analysis of 177 common genes

• 38%: metabolism
• 19%: cell growth and/or maintenance
• 13%: transport
• 6%: cell communication


Where do the tags map?

| Location | Gene evidence | All (A > 0) | A > 1 | A > 10 | A > 60 | A > 1000 |
|---|---|---|---|---|---|---|
| Number of unique locations | - | 261,134 | 106,961 | 25,829 | 8,855 | 424 |
| Annotated exon | Known | 12.1% | 17.9% | 23.8% | 28.3% | 34.7% |
| | Novel | 0.9% | 1.2% | 1.2% | 1.1% | 0.7% |
| Annotated UTR | Known | 8.0% | 14.6% | 30.9% | 46.0% | 58.0% |
| | Novel | 0.3% | 0.5% | 1.0% | 1.2% | 1.4% |
| Intron | Known | 20.0% | 14.3% | 4.4% | 1.8% | 1.2% |
| | Novel | 1.5% | 1.1% | 0.4% | 0.2% | 0% |
| Putative UTR | Known | 0.5% | 0.7% | 0.8% | 0.5% | 0.5% |
| | Novel | 0.2% | 0.2% | 0.2% | 0.2% | 0% |
| Intergenic | - | 56.3% | 49.5% | 37.4% | 20.8% | 3.5% |
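Each column of this breakdown is a percentage over the unique locations that pass an abundance threshold. A minimal sketch, assuming each unique location already carries a category label and a value for A, taken here to be the tag count at that location (the slide does not define A); the function name is hypothetical:

```python
from collections import Counter

def location_breakdown(locations: list[tuple[str, int]], min_abundance: int) -> dict[str, float]:
    """Percentage of unique locations per category among those with A > min_abundance.

    `locations` holds one (category, A) pair per unique genomic location.
    """
    kept = [category for category, a in locations if a > min_abundance]
    if not kept:
        return {}
    counts = Counter(kept)
    return {category: 100.0 * n / len(kept) for category, n in counts.items()}
```

Running it with thresholds 0, 1, 10, 60 and 1000 would produce one column of the table each.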


How many genes are observed?

• 107k transcripts covering 18.6k high-quality annotated genes

• 14k transcripts covering 4k predicted RefSeq and ENSEMBL genes

• ~21k genes observed


What are the “intergenic tags”?

• 140k tags unaccounted for…
• Novel genes?
• 24k transcripts covering 12k UNIGENE and ENSEMBL EST genes
• 36% map antisense to annotated genes
• Many are singletons
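Calling a tag antisense needs only its genomic position and strand compared with the annotated genes. A minimal sketch, assuming tags and genes are given as plain coordinate tuples; the tuple layout and function name are hypothetical:

```python
Gene = tuple[str, int, int, str]  # (chromosome, start, end, strand), inclusive coordinates

def antisense_genes(chrom: str, pos: int, strand: str, genes: list[Gene]) -> list[Gene]:
    """Annotated genes that overlap the tag position but lie on the opposite strand."""
    return [g for g in genes
            if g[0] == chrom and g[1] <= pos <= g[2] and g[3] != strand]
```

A tag with at least one such overlap would count toward the antisense fraction.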


Singletons

• Unannotated singletons: no genes or ESTs
• 81% success rate for meta-singletons
• 74% success rate for library singletons
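The two kinds of singleton can be identified directly from tag counts. A minimal sketch, assuming a library singleton is a tag type seen exactly once in one library and a meta-singleton is a tag type seen exactly once across all libraries combined (these definitions are an interpretation of the slide); the function name is hypothetical:

```python
from collections import Counter

def singletons(libraries: dict[str, list[str]]) -> tuple[dict[str, set[str]], set[str]]:
    """Return (library singletons for each library, meta-singletons)."""
    per_library = {}
    pooled = Counter()
    for name, tags in libraries.items():
        counts = Counter(tags)
        pooled.update(counts)
        per_library[name] = {tag for tag, n in counts.items() if n == 1}
    meta = {tag for tag, n in pooled.items() if n == 1}
    return per_library, meta
```

Every meta-singleton is also a library singleton, but a tag can be a singleton in several libraries without being a meta-singleton.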


Summary

• The majority of singletons represent bona fide transcriptional elements

• We have identified novel transcripts
• Evidence of differentially regulated variants resulting in different proteins
• The data provide functional annotation


… and motifs…

• The transcription of a gene depends on at least:
  – 1) the DNA-binding factors present in the nucleus at a given time, and
  – 2) the DNA sequences, or cis-regulatory motifs, in the gene region to which these factors can bind

Our goal is to identify these regulatory motifs.


Project Goals

• High-quality in silico discovery of gene regulatory elements on a genome-wide scale

Approach based on:

• Overrepresentation of similar DNA motifs in the upstream sequences of genes under the same regulatory control
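Overrepresentation can be scored with a hypergeometric tail probability: given N sequences in total, K of which contain the motif, how likely is it that the n target sequences contain at least k of them by chance? A minimal sketch, assuming exact string matching of a fixed motif on both strands; this is a swapped-in textbook test, not the scoring used by the discovery methods in the pipeline, which work with position weight matrices and richer background models:

```python
from math import comb

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def has_motif(seq: str, motif: str) -> bool:
    """True if the motif occurs on either strand of the sequence."""
    reverse_complement = motif.translate(COMPLEMENT)[::-1]
    return motif in seq or reverse_complement in seq

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

def overrepresentation_p(motif: str, targets: list[str], background: list[str]) -> float:
    """p-value for seeing the motif in at least this many target sequences by chance."""
    k = sum(has_motif(s, motif) for s in targets)
    K = k + sum(has_motif(s, motif) for s in background)
    N = len(targets) + len(background)
    return hypergeom_sf(k, N, K, len(targets))
```

With 5 target sequences that all contain a motif absent from 5 background sequences, the p-value is 1/252.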


Our Method

• Use orthologous genes, i.e. the equivalent genes in different organisms.

• Use regions from genes which display strong co-expression (infer co-regulation).
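Co-expression is measured later in the pipeline with a Pearson distance. A minimal sketch, assuming expression profiles are equal-length lists of numbers and taking Pearson distance as 1 − r; the `max_distance` cutoff and the function names are hypothetical:

```python
import math
from itertools import combinations

def pearson_distance(x: list[float], y: list[float]) -> float:
    """1 - Pearson correlation of two expression profiles (0 = perfectly co-expressed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # a flat profile has no defined correlation; treat as unrelated
    return 1 - cov / (sx * sy)

def coexpressed_pairs(profiles: dict[str, list[float]], max_distance: float = 0.1) -> list[tuple[str, str]]:
    """Gene pairs whose profiles are within `max_distance` (hypothetical cutoff)."""
    return [(g1, g2) for g1, g2 in combinations(sorted(profiles), 2)
            if pearson_distance(profiles[g1], profiles[g2]) <= max_distance]
```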


Orthologues From ComparaDB

E. Birney et al., Nucl. Acids Res. 32 (2004); M. Clamp et al., Nucl. Acids Res. 31 (2003)

[Figure: Actin Alpha Cardiac]


Multiple Sequence Alignment

[Figure: Actin Alpha Cardiac]


Co-expression datasets

1. Cancer Genome Anatomy Project; Gene Expression Omnibus
2. Gene Expression Omnibus
3. Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science 2003, 302(5643):249-255.


The Regulatory Element Pipeline

[Figure: pipeline diagram organized into Gene Expression Data, Sequence Identification, Algorithm Implementation, Post-Processing, Known Resources and Accuracy Assessment. Labelled components: a co-expressed gene pipeline (SAGE, Affy and cDNA array data; identify reliably co-expressed genes by Pearson distance; gene list); a sequence set builder (sequence data, orthology data from 'compara', co-expression, EnsEMBL (vXX), synthetic orthologue generator (DUNE)) that generates FASTA files for motif discovery: target gene, background and 'null' distribution sequence sets for each target gene; a manager for discovery pipeline jobs running method 'wrappers' (motif discovery application with pre- and post-processing) on the pipeline cluster: (W)CONSENSUS, MotifSampler, MEME, MDmodule, Gibbs MotifSampler, BioProspector, ANN-Spec, AlignACE, ...; original output files from the discovery methods converted to output files in a common text format; post-processing (assign method-independent motif significance, filter motifs by p-value, identify known motifs, motif clustering, module detection, visualize results) whose individual motifs, nonredundant motif clusters, modules and versions are stored in cisRED; 'known' resources (TRANSFAC, JASPAR, user PFMs and site sequences, Gene Ontology, protein-protein binding, literature mining, known regulatory motifs from the literature in a common format); and an accuracy assessment framework (AAF) for training/optimizing that produces accuracy results.]


Pipeline Core: Parallel Multi-Method Pipeline

Set of motif discovery algorithms: WCONSENSUS, PHYLOCON, TEIRESIAS, MOTIFSAMPLER, MEME, MDMODULE, GIBBS, CONSENSUS, BIOPROSPECTOR, ANNSPEC, etc.

[Figure: input multi-FASTA files (Input MFA 1 to N) and background files (Bck Files 1 to N) are converted into the input format specific to each method (1 to M); an HPC cluster of 368 CPUs runs #genes × #[algorithm, parameter set] jobs; the raw, method-dependent output is converted into standardized, method-independent results.]
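The job count on this slide is a Cartesian product of genes and (algorithm, parameter set) pairs. A minimal sketch of enumerating the job list before submission to the cluster; the method names come from the slide, but the parameter dictionaries and gene names are placeholders, not real options of those tools:

```python
from itertools import product

def discovery_jobs(genes: list[str], methods: dict[str, list[dict]]) -> list[tuple[str, str, dict]]:
    """One job per gene per (algorithm, parameter set): #genes x #[algorithm, parameter set]."""
    configs = [(name, params) for name, param_sets in methods.items() for params in param_sets]
    return [(gene, name, params) for gene, (name, params) in product(genes, configs)]

# Hypothetical parameter sets; real tools each have their own option names.
methods = {"MEME": [{"width": 8}, {"width": 12}], "MotifSampler": [{"width": 8}]}
jobs = discovery_jobs(["geneA", "geneB", "geneC"], methods)
```

Three genes and three method configurations give nine independent jobs, which is what makes the workload easy to spread across many CPUs.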


Method Independent Scoring

[Figure: discovery output from the pipeline core is rescored by a scoring function for each target-sequence hit. Inputs shown: motif width w; sequence weights w1, w2, w3 … wn based on phylogenetic distance or co-expression; weighted sequence similarity; information content profile; “known” information content profiles (TRANSFAC, JASPAR); number of sequences with hits vs. number of sequences in the input file; base frequencies compared with the whole genome; number of input sequences. A further step determines the SNP profile for all species sequences.]
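Comparing base frequencies with the whole genome suggests scoring motifs against a genomic background. A minimal sketch of one such term, the information content of a motif's position frequency matrix relative to background base frequencies (the standard relative-entropy definition); whether the pipeline's scoring function uses exactly this form is an assumption, and the function name is hypothetical:

```python
import math

def information_content(pfm: list[dict[str, float]], background: dict[str, float]) -> float:
    """Total information content, in bits, of a position frequency matrix.

    Each column holds base counts; it is scored as the relative entropy of its
    base distribution against `background` (e.g. whole-genome base frequencies).
    """
    total = 0.0
    for column in pfm:
        depth = sum(column.values())
        for base, count in column.items():
            p = count / depth
            if p > 0:  # bases that never occur contribute nothing
                total += p * math.log2(p / background[base])
    return total
```

Against a uniform background a fully conserved column scores 2 bits, and against an AT-rich background a conserved G or C column scores more than 2 bits.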


Cumulative distributions of method-independent (MI) scores


HitPlotter: 1,500 bp


...

TATA box


Co-occurring Motifs

• The red and blue motifs co-occur in the promoter regions of these two genes

• The separation of the two motifs may be constrained

• Use co-occurring motifs to define regulatory modules
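Co-occurrence with a constrained separation can be checked directly from motif hit positions. A minimal sketch, assuming the start positions of the two motifs' hits are already known for each promoter; `max_gap` and the function names are hypothetical:

```python
def cooccurring_spacings(hits_a: list[int], hits_b: list[int], max_gap: int) -> list[int]:
    """Distances between hits of motif A and motif B that are at most `max_gap` bases apart."""
    return sorted(abs(a - b) for a in hits_a for b in hits_b if abs(a - b) <= max_gap)

def promoters_with_module(promoters: dict[str, tuple[list[int], list[int]]], max_gap: int) -> list[str]:
    """Genes whose promoter contains both motifs within `max_gap` bases of each other."""
    return sorted(gene for gene, (a, b) in promoters.items()
                  if cooccurring_spacings(a, b, max_gap))
```

Collecting the spacings across many promoters would show whether the separation clusters around a preferred distance.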


Putting it all together…

[Figure: the regulatory element pipeline diagram again, combined with tissue-specific gene expression patterns to yield gene-specific motifs and modules and tissue-specific motifs and modules]


…And best laid plans

• But, Mousie, thou art no thy lane,
In proving foresight may be vain;
The best-laid schemes o' mice an' men
Gang aft agley,
An' lea'e us nought but grief an' pain,
For promis'd joy!

– Robert Burns


The Moral of the Story

• If you’re a mouse, don’t make your home in a farmer’s field – build it next to the field!

• Risk management!
• What are the issues associated with running a large bioinformatics activity?


Running a bioinformatics group

• What does everyone do?
• How are they doing it?
• Are they talking to the right people?
• Have they got the right requirements?
• Is anyone waiting for information?
• Are they running on schedule?
• Is there an issue that needs escalating?
• Are there HR, training, management or coaching issues that need to be addressed?


Organizational Complexity

• The organizational complexity of bioinformatics projects has increased. Projects:
  – are made up of larger teams
  – have multiple stakeholders
  – contain many organizational layers


Technical Complexity

• The number of databases is increasing
• The number of methods is increasing
• The body of knowledge is developing rapidly
• Requirements change rapidly
• Staff must be well read in a large number of fields


Common Statements

– “Things change all the time; it’s impossible to plan”
– “I’d like you to do some analysis”
– “I don’t have time to plan”
– “We’ll figure it out as we go along”
– Not so common: “An ounce of prevention is worth a pound of cure”


Software Engineering Management

• Large body of knowledge
• Requirements engineering
• Architecture and design
• Validation
• Change management
• Risk management


Solutions at the GSC

• CM (configuration management) controls
• Bug tracking controls
• Some validation controls
• Various levels of design and architecture
• Implementation of a structured engineering process under way to define, track and manage work
  – Requirements control
  – Risk/change management


• ~90% of the work performed by the group can be planned or assigned a level of effort (LOE)

• Some areas harder – finishing a genome, algorithm development, exploratory analysis

• There is always a schedule and a budget


Process controls risk… but at a cost

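Risk exposure: $\mathrm{RE} = P(L) \times S(L)$, where $P(L)$ = probability of loss and $S(L)$ = size of loss

[Figure: risk exposure against time and effort invested in plans, with one curve for RE due to inadequate planning and one for RE due to market share erosion]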
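The two curves add up to a total risk exposure, and its minimum marks the planning effort worth investing. A minimal sketch, assuming each curve can be written as a function of planning effort; the function names are hypothetical and the curve shapes in the usage note are illustrative, not measured:

```python
from typing import Callable, Iterable

def risk_exposure(p_loss: float, size_loss: float) -> float:
    """RE = P(L) * S(L)."""
    return p_loss * size_loss

def best_planning_effort(efforts: Iterable[float],
                         re_inadequate_planning: Callable[[float], float],
                         re_market_erosion: Callable[[float], float]) -> float:
    """Effort level that minimises the total risk exposure of the two curves."""
    return min(efforts, key=lambda e: re_inadequate_planning(e) + re_market_erosion(e))
```

With a planning curve of `10 / (1 + e)` and an erosion curve of `2 * e`, `best_planning_effort(range(6), ...)` returns 1: too little planning leaves high risk from inadequate plans, and too much lets the erosion term dominate.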


Hacky Scripts/Code have their place

• Ideal for prototyping
  – Only prototype when you are trying to get a handle on things

• Throw away the prototype when you’re done experimenting!

• …but stop and think!


And standards…


Steven Jones, Genome Sciences Centre

Asim Siddiqui, Scott Zuyderduyn, Richard Varhol, Derek Leung, Kevin Teague, Lisa Lee, Anita Landry

Mouse Atlas

Elizabeth M. Simpson, CMMT

Robert Xie, Slavita Bohacec, Byron Kuo

Adrian Burke, GenomeBC

Caroline Astell, Project Manager

Pamela Hoodless, Terry Fox Laboratory

Jim Rupert, Mona Wu, Rebecca Cullum

Cheryl Helgason, Cancer Endocrinology

Brad Hoffman, Teresa Ruiz de Alagara, Ida Zhang

Marco Marra, Genome Sciences Centre

Jaswinder Khattra, Allen Delaney, Jennifer Asano, Susanna Chan

Gregory Riggins, Johns Hopkins


cisRED

GSC: Marco Marra, Gordon Robertson, Richard Varhol, Kevin Teague, Obi Griffith, Erin Pleasance, Debra Fulton, Keven Lin, Mikhail Bilenky, Neil Roberston, Monica Sluemer, Stephen Montgomery, Asim Siddiqui

Ian Holmes, UC Berkeley

Ewan Birney, EBI

Stanford University: Rick Myers, Nathan Trinklein, Shelley Force Alldred, Sarah Hartman


www.mouseAtlas.org

www.cisRed.org