Multi-Sample analysis of microarray based copy-number aberration data Gregory R. Grant [email protected] Mitchell Guttman [email protected] March 6, 2006 Copy Number Detection Meeting


  • Multi-Sample analysis of microarray based copy-number aberration data

    Gregory R. Grant [email protected]

    Mitchell Guttman [email protected]

    March 6, 2006

    Copy Number Detection Meeting

  • Motivating Framework

    The ability to map the location and magnitude of aberrations is important.

    Aberration regions can be small.

    We are interested in regions of copy number aberration (CNA) that are recurrent across a class of samples.

      Myc-N amplification in high-risk neuroblastoma.
      ErbB2 amplification in higher-risk breast cancer.

    Both of these are highly correlated with prognosis.

  • Single Slide Methods

    There are numerous single-slide methods for determining aberration within an array.

    These methods use multiple elements in a region as replicates for determining aberration in the region.

    With a single slide this is the best one can do. The resolution of detection is lower than the resolution of the array.

    With multiple slides we can take a different strategy: be more liberal on the single-slide calls, and only believe the calls when we see them replicated across samples significantly often.

    Finally, while there may be aberration present within a single array that is not present across samples, this aberration is unlikely to be due to a population effect.

  • Multiple Sample Analysis (MSA)

    By using multiple samples as replicates, we are able to characterize genomic aberrations at a higher resolution (the resolution of the array). This also allows us to identify regions of importance to the population.

    Use information from multiple samples to find aberrations characteristic of the class of samples.

    Rather than looking across the genome, we look across experiments at each location.

    This allows us to pick up small regions of tight concordance regardless of their small size within a single experiment.

  • STAC Statistical Algorithm

    Given a set of calls, STAC finds aberrations that are significantly concordant across samples.

    STAC provides two statistical tests of significance: frequency and footprint.

    Frequency measures the number of samples that overlap a particular clone.

    Footprint measures how tight the overlap is.

    [Figure: two stacks of calls with frequency = 5 in both cases; one has footprint 7, the other footprint 4.]

    http://www.cbil.upenn.edu/STAC/
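
The two statistics above can be sketched in a few lines. This is a minimal illustration, not the STAC implementation: it assumes each sample contributes one aberrant interval given as inclusive clone indices, and it takes the footprint of a stack to be the number of clone positions covered by at least one of its intervals. The function names and the example intervals are hypothetical, chosen only to mirror the frequency = 5, footprint 7 versus 4 figure; the permutation-based significance testing is not shown.

```python
def frequency(intervals, location):
    """Number of samples whose aberrant interval covers `location`."""
    return sum(start <= location <= end for start, end in intervals)


def footprint(intervals):
    """Number of distinct clone positions covered by at least one interval
    (assumed definition of footprint for this sketch)."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return len(covered)


# Hypothetical stacks of five samples; every interval covers clone index 3.
loose_stack = [(0, 3), (1, 4), (2, 5), (3, 6), (3, 3)]
tight_stack = [(1, 3), (2, 3), (3, 4), (3, 3), (2, 4)]

for name, stack in (("loose", loose_stack), ("tight", tight_stack)):
    print(name, "frequency:", frequency(stack, 3), "footprint:", footprint(stack))
```

Both stacks have the same frequency at clone 3, but the second is more tightly aligned and so has the smaller footprint.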

  • Motivating Dataset (Mies Lab)

    Formalin-Fixed Paraffin-Embedded (FFPE) sample DNA.

    Challenging case

    Laser-capture micro-dissected samples from FFPE, archived (10+ years), degraded tissue, with no exact normal analog.

    Indirectly labeled samples, due to the small quantity of DNA and the need for sufficient amplification.

    Amplification based on human-specific degenerate oligo primers.

    2-Channel BAC Arrays made by the Penn Microarray Core based on the Weber library.

  • Making Calls and Processing Data

    Ratios are formed for each clone with the reference (normal) intensity in the denominator and the experimental sample in the numerator.

      If a segment of DNA containing a clone is not altered, then ideally the ratio for that clone should be 1.
      If (in one chromosome) a segment of DNA containing a clone is missing, then ideally the ratio should be 1/2.
      If (in one chromosome) a segment of DNA containing a clone is duplicated, then ideally the ratio should be 3/2.
      If the segment is tripled, then ideally the ratio should be 2.

    Of course, data are noisy and subject to bias and artifacts.
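
The ideal ratios quoted above all follow from dividing the sample copy number by the two copies in a normal diploid reference. A small sketch of that arithmetic follows; the function name is hypothetical, and the log2 column is added only because log-ratios are a common way to report such data, not because the slides use them.

```python
import math


def ideal_ratio(sample_copies, reference_copies=2):
    """Ideal sample/reference intensity ratio for a clone, assuming a
    diploid reference and noise-free, unbiased hybridization."""
    return sample_copies / reference_copies


# 1 copy: one-chromosome loss; 2: unaltered; 3: duplicated; 4: tripled.
for copies in (1, 2, 3, 4):
    ratio = ideal_ratio(copies)
    print(copies, ratio, round(math.log2(ratio), 3))
```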

  • Processing Issues

    Clone/array quality issues

    Clone mapping issues

      Overlaps and inconsistencies
      Unequally spaced clones
      How to infer behavior at locations between clones
      Tiling paths

    Clone-to-clone variation

      Differing clone hybridization affinities, clone/dye interaction effects, etc.

    Normalization

      Removing dye bias, etc.
      Within-array normalization
      Between-array normalization

    Nature of clone coverage

      Inconsistent spacing due to both technical considerations and biological reality.

  • First Step: Develop a parameterized protocol for single slide calls.

    Make calls per clone

      Use normal/normal distribution

    Make calls for each nucleotide covered by at least 1 clone

      How to deal with overlapping clones.
      How to deal with replicate (and potentially inconsistent) clones.

    Extend the calls to regions with no coverage.

      Develop method for extension from neighboring clones.
      Determine how to divide regions flanked by inconsistent clones.

    Standardize genome spacing for analysis.

      Merging continuous genome into discrete regions.
      How to deal with overlapping regions.

  • Making clone-wise calls from raw data

    Absolute threshold cutoffs.

  • Using Normal Controls

    Using normal samples as controls.

    A distribution is built from normal samples, analogous to the test channel of interest, hybridized against the same reference channel used for the experimental hybridizations.

    Possible cutoff parameters using normal samples

      Percentiles
      Standard deviations
      Z-scores
      User specified

    Given a fixed scheme (above), how can we find an optimal parameter setting?
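
One of the cutoff schemes listed above, standard deviations from the normal/normal distribution, might look like the following sketch. It assumes a per-clone list of ratios from normal-versus-reference hybridizations; the function name, the +1/-1/0 encoding of calls, and the six example values are all hypothetical.

```python
from statistics import mean, stdev


def call_clone(ratio, normal_ratios, sd_cutoff):
    """Return +1 (gain), -1 (loss) or 0 (no call) for one clone, by
    comparing its ratio with the spread seen in normal samples."""
    mu = mean(normal_ratios)
    sigma = stdev(normal_ratios)  # sample standard deviation
    z = (ratio - mu) / sigma
    if z > sd_cutoff:
        return 1
    if z < -sd_cutoff:
        return -1
    return 0


# Hypothetical ratios for one clone across six normal samples.
normals = [0.95, 1.02, 1.00, 0.98, 1.05, 1.00]

print(call_clone(1.50, normals, sd_cutoff=2.0))  # far above the normals
print(call_clone(1.05, normals, sd_cutoff=2.0))  # within 2 SD of the mean
print(call_clone(1.05, normals, sd_cutoff=1.0))  # called at a looser cutoff
```

The last two lines show why the choice of parameter matters: the same clone is called or not depending on the cutoff.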

  • Extending calls to regions with no coverage

    Note: We don't extend calls over gaps of any length, only over small spans. We cut out regions longer than a specified length.
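
A rough sketch of this gap-handling rule follows. It assumes clones are given as sorted, non-overlapping (start, end, call) tuples with exclusive ends, and it splits each short gap at its midpoint between the two flanking clones. The midpoint rule is an assumption made for illustration, since the slides leave open how to divide regions flanked by inconsistent clones. The function name and `max_gap` parameter are hypothetical.

```python
def extend_calls(clones, max_gap):
    """Extend calls across gaps no longer than `max_gap`; longer gaps are
    left uncovered (cut out). Returns merged (start, end, call) segments."""
    extended = [list(clone) for clone in clones]
    # Split each short gap at its midpoint between the flanking clones.
    for left, right in zip(extended, extended[1:]):
        gap = right[0] - left[1]
        if 0 < gap <= max_gap:
            mid = left[1] + gap // 2
            left[1] = mid
            right[0] = mid
    # Merge touching segments that carry the same call.
    merged = []
    for start, end, call in extended:
        if merged and merged[-1][1] == start and merged[-1][2] == call:
            merged[-1][1] = end
        else:
            merged.append([start, end, call])
    return [tuple(segment) for segment in merged]


# Hypothetical clones: two gains, one unaltered clone, one distant gain.
clones = [(0, 100, 1), (150, 250, 1), (300, 400, 0), (2000, 2100, 1)]
print(extend_calls(clones, max_gap=100))
```

In this example the two neighboring gains merge into one segment, the gap between inconsistent clones is divided in the middle, and the 1600-base gap before the last clone is left without a call.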

  • Standardizing Genome Spacing
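
The protocol slide describes this step as merging the continuous genome into discrete regions. One way to do that is sketched below, under stated assumptions: fixed-width bins, and each bin taking the call that covers most of it. Both choices, and the function and parameter names, are hypothetical; the slides do not say how overlapping regions are resolved.

```python
def bin_calls(segments, chrom_length, bin_size):
    """Assign one call per fixed-width bin: the call covering the largest
    part of the bin, or None if the bin has no coverage at all."""
    bins = []
    for bin_start in range(0, chrom_length, bin_size):
        bin_end = min(bin_start + bin_size, chrom_length)
        coverage = {}
        for start, end, call in segments:
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                coverage[call] = coverage.get(call, 0) + overlap
        bins.append(max(coverage, key=coverage.get) if coverage else None)
    return bins


# Hypothetical (start, end, call) segments with exclusive ends.
segments = [(0, 275, 1), (275, 400, 0)]
print(bin_calls(segments, chrom_length=500, bin_size=100))
```

With every sample reduced to the same bins, calls can be compared across experiments location by location.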

  • Analysis

    In an ideal situation we would believe every aberration call.

    We would then ask the question: which aberrations occur concordantly across samples?

    This is where the STAC statistic helps us out.

  • Finding a reasonable cutoff

    For cutoff SD=1, we are definitely picking up false signal.

    For cutoff SD=6, we are likely missing true signal.

    Looking one slide at a time, it is hard to tell what is a reasonable cutoff.

    [Figure: a single array with calls made at 11 different cutoff values.]

  • 6 normals, 15 tumor samples, in parallel, for 11 values of the SD cutoff

    [Figure panels, one per SD cutoff: 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]

  • [Figure panels: High Cutoff, Middle Cutoff, Low Cutoff]

  • Methodology

    Avoid making a decision on cutoffs.

    Calculate significance at a range of cutoff values, using STAC at each cutoff.

    Combine results using multiple testing correction.

    [Figure labels: Start Point, End Point, Percent Aberration, SD Cutoff Values (less conservative to more conservative)]
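
The combination step could be sketched as follows. The slides do not name the multiple testing correction used, so this example assumes a simple Bonferroni adjustment across cutoffs: for each location, take the smallest p-value over all cutoffs and multiply it by the number of cutoffs tested. The function name and the p-values are hypothetical; only the 11 cutoff values come from the slides.

```python
def combine_across_cutoffs(pvalues_by_cutoff):
    """pvalues_by_cutoff maps each SD cutoff to a list of per-location
    p-values. Returns one Bonferroni-adjusted p-value per location."""
    n_cutoffs = len(pvalues_by_cutoff)
    per_location = zip(*pvalues_by_cutoff.values())
    return [min(1.0, min(pvals) * n_cutoffs) for pvals in per_location]


# The 11 SD cutoffs from the slides: 1.0, 1.5, ..., 6.0.
cutoffs = [1.0 + 0.5 * i for i in range(11)]

# Hypothetical p-values for three locations at three of those cutoffs.
example = {
    1.0: [0.5, 0.001, 0.2],
    2.0: [0.4, 0.002, 0.9],
    3.0: [0.6, 0.0005, 0.03],
}
print(combine_across_cutoffs(example))
```

A location then stays significant only if its best cutoff survives the penalty for having tried several, which removes the need to pick a single cutoff in advance.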

  • Results

    Chromosome 8 is important in breast cancer.

    Provides fine resolution of aberration.

      Rather than simply providing gross changes.
      Able to characterize aberration at the resolution of the array.

    Able to characterize important regions.

      Myc, FGFR, etc.
      Other regions previously uncharacterized.

  • ChARM: Chromosome 8

  • CBS: Chromosome 8

  • MSA: Chromosome 8

    Able to characterize a 1 Mb amplification of the FGFR oncogene. All single-slide methods missed this.

    Able to pick up the Myc oncogene amplification. Single-slide methods missed it despite its presence in every sample.

    Also characterizes other regions.

      Some of these regions the single-slide methods were able to detect.
      Detected other, smaller regions of aberration.

    Allows finer resolution mapping.

      With single-slide methods, smaller regions are either missed or clumped together into larger regions of aberration.

    Note: We are working on adding the CBS algorithm implementation to MSA, to allow the use of its single-slide approach within our multiple sample approach.

    [Figure labels: MYC, FGFR]

  • Discussion

    To our knowledge, there are no methods that combine preprocessing and analysis harnessing the power of multiple samples.

    Because most methods are single-array methods, integration between experiments is difficult to define.

    MSA provides statistical analysis at higher resolution.

    MSA works with difficult data: on the Pinkel and Albertson scale of difficulty, our method has been tested on, and works well with, 5 of the 6 criteria.

  • Future Plans

    Handle Affymetrix SNP Chip data. Many of the ideas for leveraging multiple samples should also apply to the analysis of Affy SNP data. We are currently working on this extension.

    Release stand-alone GUI software package (CGH-MSA). To be released this month. www.cbil.upenn.edu/MSA

    Incorporate Single slide methods.

    Extend the STAC algorithm beyond binary data to account for levels of change.

    Estimate bias in non-Controlled experiments.

    Our method works well on other cases as well.

    Downloadable app