Genomic Arrays: Tools for cancer gene discovery
Ian Roberts, MRC Cancer Cell Unit, Hutchison MRC Research Centre
[email protected]


What’s a genomic array?

• A platform of regularly spaced genomic sequences, covering all known genes or a subset of genes of interest

• A tool for querying the genome about damage:
  – genomic gains (oncogenes)
  – genomic losses (tumour suppressor genes)

• Applications:
  – research: disease gene discovery
  – clinical diagnostic tests

Comparative genomic hybridisation

Tumour DNA (test) and normal DNA (reference) are hybridised together to the available probes on the array platform:

• GAIN: more test than reference signal at a probe (oncogene)

• LOSS: reference signal in excess of test (tumour suppressor gene)

• The vast majority of probes are normal (no change)
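The gain/loss logic can be sketched as a small BASH function that labels each probe from its log2(test/reference) ratio. This is a minimal sketch, not the author's pipeline: the function name `classify_calls`, the tab-separated input layout and the ±0.3 cut-offs are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label probes GAIN / LOSS / NORMAL from log2 ratios.
# stdin: probe_id<TAB>log2(test/reference), one probe per line.
# $1 = gain cut-off, $2 = loss cut-off (the +/-0.3 defaults are assumptions).
classify_calls() {
  local gain_cut="${1:-0.3}" loss_cut="${2:--0.3}"
  awk -F'\t' -v g="$gain_cut" -v l="$loss_cut" 'BEGIN { OFS = "\t" }
  {
    r = $2 + 0
    call = "NORMAL"                      # most probes show no change
    if (r >= g + 0) call = "GAIN"        # more test than reference
    else if (r <= l + 0) call = "LOSS"   # reference in excess of test
    print $1, call
  }'
}
```

For example, `printf 'p1\t0.8\n' | classify_calls` labels probe `p1` as a gain; real analyses rely on segmentation rather than per-probe cut-offs, which is why snapCGH is used downstream.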


New generation arrays produce large amounts of data

• Agilent 244K array: 243,504 defined spots

• Raw data are foreground and background signal intensities in two channels

• The median ratio of the foreground signals is the key quantity
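Turning the two-channel raw data into one ratio per spot can be sketched as below. This is a minimal sketch assuming a hypothetical tab-separated layout (probe, test foreground median, test background median, reference foreground median, reference background median) and plain background subtraction; it is not the Agilent export format, and snapCGH provides its own background-correction methods that this does not reproduce.

```shell
#!/usr/bin/env bash
# Hypothetical helper: background-subtract each channel, then take log2(test/ref).
# stdin columns (assumed): probe, test_fg, test_bg, ref_fg, ref_bg (tab-separated).
log2_ratios() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
  {
    t = $2 - $3   # background-corrected test (tumour) channel
    r = $4 - $5   # background-corrected reference (normal) channel
    if (t <= 0 || r <= 0) { print $1, "NA"; next }   # spot unusable after correction
    printf "%s\t%.3f\n", $1, log(t / r) / log(2)
  }'
}
```

Spots whose corrected signal drops to zero or below are reported as `NA` rather than forced into a ratio; a real pipeline would filter or flag them before segmentation.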

aCGH data analysis using camgrid


Genomic array analysis strategy using R

1. Array data are processed by the snapCGH R package:
  – correct array data for background noise and normalise the mean of the distribution
  – order data by genomic location
  – apply an aCGH segmentation algorithm
  – draw some plots

2. Determine significant findings (in-house R functions):
  – common and minimum genomic regions of gain and loss
  – summarise output
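Two of the steps above, ordering by genomic location and finding a minimum region shared by several samples, can be sketched in BASH. This is a minimal sketch under stated assumptions, not the in-house R functions: the function names, the tab-separated column layouts, numeric chromosome coding and the "intersection of all segments" definition of a minimum common region are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: order probes by chromosome, then by start position.
# stdin: probe<TAB>chrom<TAB>start<TAB>log2_ratio. Assumes chromosomes are
# coded numerically (e.g. X as 23) so a numeric sort gives 1, 2, ..., 10.
order_by_location() {
  sort -t "$(printf '\t')" -k2,2n -k3,3n
}

# Hypothetical helper: overlap shared by every gained (or lost) segment.
# stdin: sample<TAB>chrom<TAB>start<TAB>end, all on one chromosome.
# Prints chrom<TAB>start<TAB>end, or "none" if the segments do not all overlap.
minimal_common_region() {
  awk -F'\t' '
  NR == 1 { chr = $2; lo = $3 + 0; hi = $4 + 0; next }
  {
    if ($3 + 0 > lo) lo = $3 + 0   # latest start so far
    if ($4 + 0 < hi) hi = $4 + 0   # earliest end so far
  }
  END {
    if (NR == 0) exit 1            # no segments supplied
    if (lo <= hi) printf "%s\t%d\t%d\n", chr, lo, hi
    else print "none"
  }'
}
```

Taking the latest start and earliest end gives the narrowest interval every sample shares, which is one simple way to shrink a broad common gain down to candidate genes.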

R: www.cran.r-project.org
snapCGH: www.bioconductor.org
parrot R on camgrid: http://www.bio.cam.ac.uk/local/condor-parrot.html


Old vs. new genomic array plots (figure: chromosome 7)

Significant region detection is computationally intensive

Distributed aCGH analysis

(Figure: workflow for an example run of 3 chromosomes × 2 analysis methods, executed as Condor jobs, shown as Condor Job 1 and Condor Job 2)

1. Preprocess data: input data to snapCGH
2. Generate genome-ordered data and Condor DAGMan analysis batch files
3. Perform aCGH analysis + region detection, 1 run per chromosome per analysis method (Chr 1, Chr 2 and Chr 3, each with DNAcopy and GLAD)
4. Each run is driven by a DAGMan description file (e.g. for DNAcopy) covering DAGMan jobs 1 … n: segmentation step, clone call scoring (1 … n), score combining, CRI/MRI detection
5. Consolidate output
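The DAGMan description file for one chromosome and one method can be sketched as a BASH function that writes standard DAGMan `JOB` and `PARENT … CHILD` lines. This is a minimal sketch: the function name `write_dag`, the job names and the `.sub` file names are hypothetical placeholders, and the dependency chain (segmentation, then n clone-call scoring jobs, then score combining, then region detection) is read from the workflow figure rather than taken from the original scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a DAGMan file for one chromosome/method run.
# $1 = DAG file to write, $2 = number n of clone-call scoring jobs.
write_dag() {
  local dag="$1" n="$2" i scoring=""
  {
    echo "JOB segment segment.sub"           # segmentation step
    for ((i = 1; i <= n; i++)); do
      echo "JOB score$i score$i.sub"         # clone call scoring job i
      scoring="$scoring score$i"
    done
    echo "JOB combine combine.sub"           # score combining
    echo "JOB regions regions.sub"           # CRI/MRI detection
    echo "PARENT segment CHILD$scoring"      # scoring jobs wait for segmentation
    echo "PARENT$scoring CHILD combine"      # combining waits for all scoring jobs
    echo "PARENT combine CHILD regions"
  } > "$dag"
}
```

Because the n scoring jobs only depend on the segmentation job, DAGMan can run them in parallel across the grid, which is where the speed-up for the computationally intensive region detection comes from. The resulting file would be submitted with `condor_submit_dag`.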


Condor job scripting in BASH & R

BASH function

• Responsible for producing the required Condor files for discrete jobs

• Default_submit has 2 positional parameters:
  – $1: R script name
  – $2: data files

• Initiates aCGH analysis on the grid
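A function with Default_submit's two positional parameters might write a Condor submit description file like the one below. This is a minimal sketch, not the original function: the lower-case name `default_submit`, the wrapper executable `run_R.sh`, the comma-separated data-file list and the output file naming are assumptions; the submit commands themselves are standard HTCondor ones.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Default_submit.
# $1 = R script name, $2 = data files (comma-separated list, an assumption).
# Writes <script>.sub next to the R script and prints its path.
default_submit() {
  local rscript="$1" data="$2" base
  base="${rscript%.R}"
  cat > "$base.sub" <<EOF
universe                = vanilla
executable              = run_R.sh
arguments               = $(basename "$rscript")
transfer_input_files    = $rscript,$data
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = $base.out
error                   = $base.err
log                     = $base.log
queue
EOF
  echo "$base.sub"
}
```

For example, `default_submit chr1_DNAcopy.R chr1.txt` would write `chr1_DNAcopy.sub`, which can be submitted directly with `condor_submit` or referenced from a DAGMan `JOB` line.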

Condor DAGMan R function set

• R-scripter: writes the appropriate R script for the current job

• R-condor-submitter: writes the Condor job submission file

• R-condor-executer: writes the Condor job executable file

• R-job-descriptor: writes the Condor DAGMan description file
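The job executable file that R-condor-executer produces is, in essence, a small wrapper that starts R on the execute node. The sketch below shows one in BASH rather than R; the function name `write_executer`, the wrapper contents and the use of `Rscript --vanilla` are assumptions (on camgrid, R was made available through parrot, which this sketch does not model).

```shell
#!/usr/bin/env bash
# Hypothetical analogue of R-condor-executer, written in BASH.
# $1 = path of the wrapper to create. The wrapper runs the R script passed
# as its first argument; its exit status tells Condor/DAGMan whether the job
# succeeded. Assumes Rscript is on the PATH of the execute node.
write_executer() {
  local exe="$1"
  cat > "$exe" <<'EOF'
#!/bin/sh
exec Rscript --vanilla "$1"
EOF
  chmod +x "$exe"
}
```

Keeping the wrapper this thin means all analysis logic lives in the generated R script, so only R-scripter changes when a new segmentation algorithm is added.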


End user abstraction – start_aCGH.sh

• aCGH analysis undertaken by a single shell command

• Manages array data input

• Collects user-specified parameters:
  – chromosome range
  – segmentation algorithms
  – significance thresholds

• Links the Condor R job scripting
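Collecting those parameters and expanding them into one job per chromosome per analysis method could look like the following. This is a minimal sketch, not the real start_aCGH.sh: the option letters `-c`, `-m`, `-t`, the function names, the variable names and the default values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical option parsing for a start_aCGH.sh-style front end.
# -c chromosome range (e.g. 1-3), -m comma-separated segmentation
# algorithms (e.g. DNAcopy,GLAD), -t significance threshold.
parse_args() {
  local OPTIND=1 opt
  CHR_RANGE="1-22"; METHODS="DNAcopy"; THRESHOLD="0.05"   # assumed defaults
  while getopts "c:m:t:" opt; do
    case "$opt" in
      c) CHR_RANGE="$OPTARG" ;;
      m) METHODS="$OPTARG" ;;
      t) THRESHOLD="$OPTARG" ;;
      *) echo "usage: start_aCGH.sh [-c from-to] [-m method,...] [-t threshold]" >&2
         return 1 ;;
    esac
  done
}

# One job name per chromosome per analysis method.
list_jobs() {
  local from="${CHR_RANGE%-*}" to="${CHR_RANGE#*-}" chr method
  for ((chr = from; chr <= to; chr++)); do
    for method in ${METHODS//,/ }; do
      echo "chr${chr}_${method}"
    done
  done
}
```

With `-c 1-3 -m DNAcopy,GLAD` this yields 3 × 2 = 6 jobs, matching the example in the distributed workflow; each job name could then be handed to the submit-file and DAG writers.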


start_aCGH.sh session on mole

… continued … 1 hr – 6 hr later!

aCGH region information and plots


Summary findings (38 arrays)

• Rapid identification of regions of interest

• Easy comparison of aCGH analyses across different algorithms

(Figure panels: BioHMM and DNAcopy; axes: sample percentage, region size)
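The "sample percentage" axis can be read as the share of arrays with a called segment overlapping a region. The sketch below computes that share; the definition, the function name `sample_percentage` and the input layout are assumptions, not the in-house R code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of samples with a called segment
# overlapping a region of interest (each sample counted once).
# $1 chrom, $2 start, $3 end of the region, $4 total number of samples.
# stdin: sample<TAB>chrom<TAB>start<TAB>end of called segments.
sample_percentage() {
  awk -F'\t' -v c="$1" -v s="$2" -v e="$3" -v n="$4" '
  $2 == c && $3 + 0 <= e + 0 && $4 + 0 >= s + 0 { hit[$1] = 1 }
  END {
    k = 0
    for (x in hit) k++
    printf "%.1f\n", 100 * k / n
  }'
}
```

With 38 arrays, a region hit in 19 of them would score 50.0; plotting this against region size makes frequent, narrow regions stand out.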


Real life application

Retrospective analysis confirms initial findings! (summary of 38 samples)

(Figure: OSMR; axes: sample percentage, region size)


Future development

• Tailor output for specific user requirements

• Produce an overall summary plot

• Apply the approach to expression arrays

www.bio.cam.ac.uk/~ir210

Grace Ng, Steph Carter, Konstantina Karagavriliidou, Jenny Barna, Mark Calleja, Nick Coleman