Genomics Technologies

Sean Davis, M.D., Ph.D.

Genetics Branch, Center for Cancer Research

National Cancer Institute

National Institutes of Health

High-Resolution Views of the Cancer Genome

I am going to spend a few minutes illustrating how existing and emerging high-throughput genomic technologies are being used to understand cancer, a mind-numbingly complex and dysregulated biologic process.

The Human Genome Project

The Central Dogma

[Figure labels: Phenotype, Gene Copy Number, Sequence Variation, Chromatin Structure and Function, Gene Expression, Transcriptional Regulation, DNA Methylation]

Patient and Population Characteristics

Since Knudson's famous hypothesis proposing the two-hit model, our understanding of cancer as a genetic disease has progressed to the realization that cancer is seldom a function of a single gene gone awry, but more likely reflects a complex interaction of multiple processes in the genome, including altered copy number, gene expression, transcriptional regulation, chromatin modification, sequence variation, and DNA methylation. To produce better patient outcomes, it is vital to understand not only which genes are involved in a certain type of cancer, but also how these other processes affect gene regulation. In short, an integrated view of the cancer genome is necessary and is now becoming possible.

Overview

Microarrays

Gene expression

Comparative genomic hybridization

Tiling arrays and data integration

Next-generation sequencing

DNAse-Seq application

Normal Karyotype

Tumor Karyotype

The first karyotypes were produced in 1956. Shown here is a comparison of a karyotype from a normal female and one from a tumor. By 1960, a karyotype of a cancer genome had revealed the presence of the Philadelphia chromosome. Although the Philadelphia chromosome is now known to produce the BCR-ABL fusion protein, it was not until 2001, 41 years later, that a drug targeting the fusion product, Gleevec (imatinib), became available. By applying high-throughput microarray technologies, the Cancer Genetics Branch is striving to make observations of the cancer genome that will provide a deeper understanding of the biology of cancer, to develop prognostic and diagnostic markers that improve patient-specific treatments, and to find promising targets for directed drug therapy.

Hybridization

Highly robust propensity of nucleic acid polymers to form dimers with known base pairings

Integral to life as we know it

Can be leveraged to build systems for interrogating biologic processes

[Figure: panels 1 to 5; ~30,000 genes]

DNA Microarrays

Gene Expression

RNA hybridized to DNA, one spot per array

Gene Expression Microarrays

Golub et al., Science 286:531-537. (1999).

Spellman et al., Molecular Biology of the Cell 9:3273-3297. (1998).

Ductal Carcinoma In Situ

Breast cancer precursor or benign lesion?

Histologic grading used to help define treatment

Low grade and high grade clear-cut, but intermediate grade problematic

DNA Microarrays

Gene Expression

RNA hybridized to DNA, one probe per gene

RNA hybridized to DNA, multiple probes per gene

RNA hybridized to DNA, one or more probes per exon

DNA Microarrays

Gene Expression

Comparative Genomic Hybridization (CGH)

Genomic DNA hybridized to DNA

Useful for determining relative copy number of DNA across the entire genome

Not useful (directly) for determining genome structure

Array Comparative Genomic Hybridization (aCGH)

Tumor DNA, Normal DNA: Hybridize

Normal Copy Number; DNA Loss (Tumor Suppressors); Amplification (Oncogenes)

The technology for looking at genomic copy number is quite simple. DNA extracted from tumors is labeled with a red fluorochrome and normal DNA is labeled in green. The two samples are allowed to hybridize to a microarray slide that contains probes, each of which binds the DNA from a specific region of the genome. A scanner extracts the intensities in the red and green channels, providing a measurement of the relative amounts of tumor and normal DNA present at each spot. Spots that show more red indicate a relative excess of tumor DNA (amplification), while spots that show more green indicate a relative loss of tumor DNA. By lining the probes up along the chromosomes, we can begin to define regions of altered DNA copy number that could contain oncogenes or tumor suppressor genes.
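The red/green comparison described above is commonly summarized as a log2 ratio per spot. The following is a minimal sketch of that arithmetic, not the branch's actual pipeline: the function names, the pseudocount, the +/-0.3 thresholds, and the intensities are all illustrative assumptions.

```python
import math

def log2_ratio(tumor_red, normal_green, pseudocount=1.0):
    """log2(tumor / normal) for one spot; the pseudocount avoids division by zero."""
    return math.log2((tumor_red + pseudocount) / (normal_green + pseudocount))

def call_spot(ratio, gain=0.3, loss=-0.3):
    """Classify a spot by its log2 ratio. Thresholds are illustrative, not calibrated."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "normal"

# Hypothetical (red, green) scanner intensities for three spots.
spots = [(4000, 1000), (1000, 1000), (500, 1000)]
calls = [call_spot(log2_ratio(red, green)) for red, green in spots]
print(calls)
```

A log scale is used so that a doubling and a halving of tumor DNA sit at equal distances on either side of zero.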

Tumor Chromosome

Normal Chromosome

Doing so results in a very high-resolution map of, in this case, chromosome 8. On the left is what is typically seen in normal germline DNA. On the right is a view of a cancer chromosome. In red are regions of copy number increase, some with as many as 20-30 copies. In green are regions that show relative loss of DNA in the tumor and may represent LOH or total deletion. On average in this view, there is a probe every 75 kb throughout the genome, or between 1.5 and 2 probes per gene. With this resolution, we can quickly determine the breakpoints associated with these regions of copy number change and determine which genes are involved in a given amplification or deletion. Of course, the figures here represent only one chromosome.
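One simple way to picture breakpoint finding is to collapse per-probe calls along a chromosome into runs and then list the genes each run overlaps. The sketch below assumes probes already carry a gain/loss/normal call; the coordinates and the gene names GENE_A and GENE_B are hypothetical, and real analyses use statistical segmentation rather than exact run-length grouping.

```python
from itertools import groupby

def find_segments(probes):
    """Collapse (position_bp, call) pairs, sorted by position, into runs.

    Returns one (start_bp, end_bp, call, n_probes) tuple per run. A breakpoint
    lies somewhere between the last probe of one run and the first of the next.
    """
    segments = []
    for call, run in groupby(probes, key=lambda p: p[1]):
        run = list(run)
        segments.append((run[0][0], run[-1][0], call, len(run)))
    return segments

def genes_in_segment(segment, genes):
    """Names of genes whose (start, end) interval overlaps the segment."""
    start, end = segment[0], segment[1]
    return [name for name, g_start, g_end in genes
            if g_start <= end and g_end >= start]

# Hypothetical probes spaced 75 kb apart and two made-up genes.
probes = [(0, "normal"), (75_000, "normal"), (150_000, "gain"),
          (225_000, "gain"), (300_000, "loss")]
genes = [("GENE_A", 160_000, 200_000), ("GENE_B", 290_000, 310_000)]

segments = find_segments(probes)
print(segments)
print(genes_in_segment(segments[1], genes))
```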

A Genome View of Copy Number

Zooming out to look at the whole genome at once, the normal genome, with normal female DNA in red and normal male DNA in green, shows the expected differences on the X and Y chromosomes. Comparing that to a single breast cancer genome reveals the richness of the data that we are producing. Nearly every chromosome shows some copy number alteration that can be mapped to the genome to produce lists of candidate genes. But with so many alterations, it is helpful to consider multiple genomes at once, as copy number changes that occur in multiple samples are more likely to be of biological importance and not simply a product of an unstable cancer genome.

Frequency of Copy Number Changes

Summary of copy number changes from 46 breast cancer cell lines

This figure shows the frequency of copy number changes along chromosome 17 and summarizes the results of measuring copy number in 46 breast cancer cell lines. The right-hand y-axis shows the percentage of samples with copy number gain (in red) and copy number loss (in green). I have marked the location of a gene known to be important in breast cancer, ERBB2 (also known as Her-2). This gene is known to be amplified in 10%-40% of breast cancers, agreeing with our own estimate from the breast cancer cell lines, and is associated with poor prognosis. It is now the target of a directed monoclonal antibody therapy. It is enticing to think that hidden in the other peaks of copy number change are multiple other potential drug targets. With such a high-resolution overview of the breast cancer genome, we can begin to dissect each region to determine what those genes might be.
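The frequency plot described above amounts to counting, probe by probe, the fraction of samples called as gained or lost. A minimal sketch follows, assuming each sample has already been reduced to one call per probe; the four samples and three probes are made-up data, not the 46 cell lines.

```python
def alteration_frequency(calls_by_sample):
    """Per-probe percentage of samples with a gain and with a loss.

    calls_by_sample: one list of calls per sample, all the same length,
    with probes in the same genomic order in every sample.
    """
    n_samples = len(calls_by_sample)
    freqs = []
    for probe_calls in zip(*calls_by_sample):
        pct_gain = 100.0 * probe_calls.count("gain") / n_samples
        pct_loss = 100.0 * probe_calls.count("loss") / n_samples
        freqs.append((pct_gain, pct_loss))
    return freqs

# Hypothetical calls: 4 samples x 3 probes.
samples = [
    ["gain", "normal", "loss"],
    ["gain", "normal", "normal"],
    ["normal", "normal", "loss"],
    ["gain", "gain", "loss"],
]
freqs = alteration_frequency(samples)
print(freqs)
```

Probes where many samples agree stand out as peaks, which is why recurrent changes are easier to separate from the background of an unstable genome.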

GLI3 mutations and deletions can lead to Greig cephalopolysyndactyly syndrome (GCPS)

Typically requires mutation screening, FISH, and some small deletions may be missed

Use array comparative genomic hybridization to get very high-resolution view of the region

DNA Microarrays

Gene Expression

Comparative Genomic Hybridization (CGH)

Single Nucleotide Polymorphisms

Arrays can be used as a genotyping platform

Again, DNA hybridized to DNA, but designed to detect the differences in hybridization due to a SNP

Can be used for measuring copy number, finding stretches of uniparental disomy, risk alleles for certain conditions, and in linkage and association studies
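As one illustration of the uses listed above, a stretch of uniparental disomy can show up on a SNP array as a long run of homozygous genotype calls. The sketch below only finds such runs; the "AA"/"AB"/"BB" encoding, the function name, and the `min_snps` cutoff are assumptions, and a real analysis would also check copy number and compare against parental genotypes.

```python
def homozygous_runs(genotypes, min_snps=4):
    """Find runs of consecutive homozygous calls.

    genotypes: calls such as "AA", "AB", "BB", in genomic order.
    Returns (start_index, end_index) for each run of at least min_snps calls.
    """
    runs = []
    start = None
    # A trailing heterozygous sentinel closes a run that reaches the end.
    for i, call in enumerate(list(genotypes) + ["AB"]):
        if call[0] == call[1]:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs

# Hypothetical genotype calls along one chromosome.
genotypes = ["AB", "AA", "BB", "AA", "AA", "BB", "AB", "AA", "AB"]
runs = homozygous_runs(genotypes)
print(runs)
```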

DNA Microarrays

Gene Expression

Comparative Genomic Hybridization (CGH)

Single Nucleotide Polymorphisms

DNA methylation

MicroRNA expression

Tiling array applications

Others...

Tiling Array Technology

To dissect some of these regions in more detail, we can employ a new, extremely flexible and powerful technology now referred to as Tiling Array Technology. It works in principle the same as the microarrays that I have already described except that we can design the arrays to cover portions of the genome with extraordinary resolution. Continuing with the example of ERBB2 that we saw in the last slide, we can take a zoomed-in look at the ERBB2 gene. Zooming in again, we are now looking at exons in blue connected by introns. We choose probes spaced throughout the region, covering both exons and introns. The technology has progressed so that we can measure the copy number of 400,000 probes on a single array at any resolution we desire.
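The trade-off between resolution and coverage on a tiling array is simple arithmetic: a fixed probe budget covers less genome as the spacing shrinks. The sketch below places probes at a regular spacing and computes how much sequence 400,000 probes can span; the 60-base probe length, the 100-base spacing, and the function names are illustrative assumptions, not the array's actual design.

```python
def tile_probes(region_start, region_end, spacing, probe_length=60):
    """Start coordinates of probes placed every `spacing` bases across a region.

    The last probe must fit entirely inside the region.
    """
    last_start = region_end - probe_length
    return list(range(region_start, last_start + 1, spacing))

def max_region_bp(n_probes, spacing):
    """Approximate length of sequence a probe budget can tile at a given spacing."""
    return n_probes * spacing

# Hypothetical 1 kb region tiled every 100 bases.
starts = tile_probes(0, 1000, spacing=100)
print(starts)

# With 400,000 probes at 100-base spacing, roughly 40 Mb can be covered.
print(max_region_bp(400_000, 100))
```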

Tracks: Annotated Genes; Expression; Copy Number, Sample 1; Copy Number, Sample 2

Simultaneous measurement of copy number in two samples and gene expression in one sample, overlaid on a map of genes in the region

Simultaneous Gene Expression and Copy Number on Tiling Arrays

Tracks: Annotated Genes; Expression; Copy Number, Sample 1; Copy Number, Sample 2

Increased expression in the small amplicon does not include all genes, giving clues as to the biologically important genes in the region

Tracks: Annotated Genes; Expression; Copy Number, Sample 1; Copy Number, Sample 2

Spikes of expression at exons of ERBB2

Simultaneous Expression and Copy Number

Tracks (added one at a time): Copy Number; Expression; Evolutionary Conservation; Opposite Strand Expression

ZNF217 is a candidate oncogene located at chromosome 20q13. When overexpressed in human mammary epithelial cells, it is sufficient to immortalize them. Here, I show how integrating gene copy number, gene expression, and other genomic information, namely evolutionary conservation, helps to frame the observation that the gene is amplified and shows expression at the exons as expected, but also shows expression outside the exons that roughly correlates with areas of evolutionarily conserved sequence. Interestingly, there is expression on the opposite strand that is not accounted for by any known transcribed element (gene or otherwise). Observations of aberrant transcription outside of exons and transcription not associated with any known genes reveal the need for more observations at this level of detail. However, they also raise questions that demand further experimentation at this incredible resolution to help understand the processes that lead to these phenomena and their biological importance.
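The comparison of expression against annotated exons can be pictured as a simple interval lookup: each expressed probe is either inside a known exon or not. The sketch below is an assumption-laden illustration; the coordinates, signals, threshold, and function names are invented and are not taken from the ZNF217 data.

```python
def in_exon(position, exons):
    """True if a probe position falls inside any (start, end) exon interval."""
    return any(start <= position <= end for start, end in exons)

def split_expressed(probes, exons, threshold):
    """Separate expressed probes into exonic and non-exonic positions.

    probes: list of (position, expression_signal) pairs.
    Probes with signal below the threshold are ignored.
    """
    exonic, non_exonic = [], []
    for position, signal in probes:
        if signal < threshold:
            continue
        if in_exon(position, exons):
            exonic.append(position)
        else:
            non_exonic.append(position)
    return exonic, non_exonic

# Hypothetical exon intervals and tiling-probe signals.
exons = [(100, 200), (500, 650)]
probes = [(120, 8.1), (300, 1.2), (350, 6.4), (550, 9.0), (700, 0.8)]
exonic, non_exonic = split_expressed(probes, exons, threshold=5.0)
print(exonic, non_exonic)
```

The non-exonic list is the interesting output here: those positions are the candidates to check against conservation tracks or opposite-strand signal.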

2,000 spots, 1997

8,000 spots, 2000

36,000 spots, 2003

85,000 to 390,000 spots, 2004

10,000,000 beads, 2005

Growth in Density Over Time

And these numbers are from only a single array!

Excel doesn't work!

Why did the chicken cross the road?

Darwin1: It was the logical next step after coming down from the trees.

Darwin2: The fittest chickens cross the road.

Sequencing

Why use hybridization, which is just an indirect measure of sequence?

Sequencing is costly, time- and labor-intensive, and inefficient

Next-generation sequencing technology changes the equation such that sequencing can be more efficient, cheaper, and less time- and labor-intensive than hybridization-based methods like microarrays

Next-generation Sequencing

Chromatin

Chromatin is the complex of protein and DNA that makes up the chromosomes. It is not a static structure.

The nucleosomes are the basic building blocks of chromatin structure. Their positioning on the genome and the regulation of their placement are not well described.

DNAse is an enzyme that cuts DNA at locations where DNA is accessible

These accessible regions have been associated with open chromatin

Regions of open chromatin are necessary for transcriptional and regulatory machinery to have access to gene neighborhoods and facilitate transcription

DNAse Hypersensitivity

Method for finding regions of open chromatin

In data published with the ENCODE consortium, DNAse hypersensitive (HS) sites were shown to be correlated with:

Histone modification

Transcription start sites

Early replicating regions

Transcription factor binding sites (experimentally determined by ChIP/chip, etc.)

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. The ENCODE Consortium. Nature, 2007.

DNAse-Seq Method

Crawford, G.E., Davis, S., Scacheri, P.C., Renaud, G., Halawi, M.J., Erdos, M.R., Green, R., Meltzer, P.S., Wolfsberg, T.G., and Collins, F.S. Nat Methods, 2006
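At its simplest, a sequencing-based search for open chromatin can be sketched as counting the sequence tags that map into each genomic window and flagging windows with unusually many. The code below is a toy version under stated assumptions: the 500-base window, the fixed tag threshold, and the function names are hypothetical, and a real analysis would model background cutting rather than use a hard cutoff.

```python
from collections import Counter

def window_counts(tag_positions, window=500):
    """Count sequence tags per fixed-size window, keyed by window start coordinate."""
    return Counter((pos // window) * window for pos in tag_positions)

def hypersensitive_windows(tag_positions, window=500, min_tags=5):
    """Window starts holding at least min_tags tags (an illustrative cutoff)."""
    counts = window_counts(tag_positions, window)
    return sorted(start for start, n in counts.items() if n >= min_tags)

# Hypothetical mapped tag positions on one chromosome.
tags = [100, 120, 130, 480, 499, 900,
        2100, 2110, 2120, 2130, 2140, 2150, 3000]
hs = hypersensitive_windows(tags)
print(hs)
```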

Distances between sequences in non-DNAse HS regions show an oscillating pattern with a period that corresponds to a single turn of the double helix

DNAse is known to cut preferentially in the minor groove, which is exposed every 10.4 bases when DNA is wrapped around a nucleosome

About 147 base pairs of DNA are wrapped around each nucleosome

Implication: Nucleosomes are positioned in a highly organized, precise manner
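An oscillation like the one described above can be checked by asking which period carries the most power in the histogram of distances between sequences. The sketch below evaluates a Fourier component at a grid of candidate periods on a synthetic histogram; the data, the amplitudes, the candidate grid, and the function names are assumptions for illustration, not the published analysis.

```python
import cmath
import math

def power_at_period(counts, period):
    """Squared magnitude of the Fourier component of `counts` at a period (in bases)."""
    mean = sum(counts) / len(counts)
    total = sum((c - mean) * cmath.exp(-2j * math.pi * d / period)
                for d, c in enumerate(counts))
    return abs(total) ** 2

def dominant_period(counts, candidate_periods):
    """Candidate period with the largest power."""
    return max(candidate_periods, key=lambda p: power_at_period(counts, p))

# Synthetic distance histogram: flat background plus a 10.4-base oscillation.
counts = [100 + 20 * math.cos(2 * math.pi * d / 10.4) for d in range(200)]
candidates = [8.0 + 0.1 * i for i in range(51)]  # 8.0 to 13.0 bases
best = dominant_period(counts, candidates)
print(round(best, 1))
```

Scanning periods directly, rather than using a standard FFT, makes it easy to test non-integer periods such as 10.4 bases.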

Nucleosome Positioning

[Figure labels: Phenotype, Gene Copy Number, Sequence Variation, Chromatin Modification, Gene Expression, Transcriptional Regulation, DNA Methylation]

To this end, the Cancer Genetics Branch is actively developing and using these technologies to look at these other aspects of the cancer genome. Again, the goal is to produce an integrated view of the cancer genome in unprecedented detail; to distill from that view genes of therapeutic import and convenience and observations of prognostic or diagnostic importance; and, in the process, to answer questions of intrinsic biologic interest.

Public Data

NCBI Gene Expression Omnibus (GEO)

250,000 microarray experiments already done!

NCBI Short Read Archive (SRA)

Compendium of sequencing experiments utilizing next-generation sequencing technologies

GWAS databases

Databases of gene and protein function and interactions

Challenges

Most of these technologies are still quite expensive and do not adapt well to clinical laboratory settings

Designing studies that evaluate the operating characteristics of new testing methods is costly and requires the appropriate patient populations

There are many ethical concerns associated with the enormous amounts of personal information that might be gleaned from genomic technologies applied in the clinical setting

The Biggest Challenge?

How do we integrate all the disparate pieces of information, collected longitudinally and by many sources, to improve the health of the individual?

One day the zookeeper noticed that the orangutan was reading two books: the Bible and Darwin's The Origin of Species. In surprise he asked the ape, "Why are you reading both those books?"

"Well," said the orangutan, "I just wanted to know if I was my brother's keeper or my keeper's brother."


National Human Genome Research Institute

Cancer Genetics Branch
