Chapter 6: Structural Variation and Medical Genomics
CS-6293 Bioinformatics
Instructor: Dr. Jianhua Ruan
Presented by: Nesthor Perez
Outline
1. Introduction
2. Germline and Somatic SVs
3. Technologies for Measurement of SVs
4. Resequencing Strategies for SVs
5. Representation of SVs
6. Challenges for Cancer Genomics
7. Future Prospects
1. Introduction
• Every human has a different genome.
• Each genome carries its own predisposition to particular traits and diseases.
• Genome-wide association studies (GWAS) have identified common germline DNA variants associated with diabetes, heart disease, and other diseases.
• GWAS have explained only a fraction of the heritability of these traits.
1. Introduction
Every person has a different genome sequence.
Each person's genome influences their predisposition to particular traits and diseases.
1. Introduction
• Cancer genome sequencing studies have identified somatic mutations associated with cancer progression.
• These mutations are very heterogeneous.
• Few mutations are shared between patients.
• This makes it hard to tell which mutations contribute to cancer.
• Comprehensive studies must consider all variants, so an individual genome is required for each case.
1. Introduction
• GWAS focus on single-nucleotide polymorphisms (SNPs); every human genome is unique.
• Germline variants occur at a range of scales, from SNPs to structural variants (SVs).
• Examples of SVs:
– Duplications.
– Deletions.
– Inversions.
– Translocations.
1. Introduction
• GWAS have identified common SNPs:
– Common SNPs for common diseases (similarities).
– Common variants between diseases (differences).
• Main purpose: disease association and cancer genetics studies.
• In the last 5 years, next-generation DNA sequencing technologies have become commercially available from companies such as Illumina, Life Technologies, and Complete Genomics.
1. Introduction
Chromosome components:
1. Introduction
Variation relative to a reference genome ranges from SNPs to structural variants:
1. Introduction
In the last 5 years, these companies have developed sequencing technology.
Consequently, the cost of DNA sequencing has decreased.
1. Introduction
• Consequently, the cost of DNA sequencing has decreased.
• With low-cost sequencing, it is possible to study all variants.
• All variants:
– Germline.
– Somatic.
– SNPs (single-nucleotide polymorphisms).
– SVs (structural variants).
• This chapter discusses these sequencing technologies, focusing on structural variants (SVs).
2.1 Germline Structural Variation
• Human genetics studies have a major goal: identify the DNA sequence differences that make each individual unique.
• Attempts:
– Identify common SNPs (HapMap project).
– Whole-genome sequencing and microarray measurements found common SVs, including duplications, deletions, and inversions.
• Common SVs have since been linked to:
– Autism.
– Schizophrenia.
2.1 Germline Structural Variation
Purpose of human genetics studies: identify the DNA sequence differences that make each individual unique.
Steps:
– Identify common SNPs.
– Whole-genome sequencing and microarray measurements found common SVs in large DNA sequences:
  duplications, deletions, and inversions.
2.2 Somatic Structural Variation
• Cancer is driven by somatic mutations that accumulate during a lifetime: a "microevolutionary process".
• Early studies were in leukemia and lymphoma.
• These studies identified recurrent chromosomal rearrangements.
• These rearrangements are present in many patients with the same cancer.
• Next-generation DNA sequencing reconstructs how cancer genomes are organized at single-nucleotide resolution.
2.3 Mechanisms of Structural Variation
• Based on the amount of sequence similarity (homology) at the breakpoints of SVs, there are two mechanisms:
– NHEJ (non-homologous end joining): little or no sequence similarity.
– NAHR (non-allelic homologous recombination): high sequence similarity.
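As a rough illustration of the distinction above, the sketch below compares the sequences flanking two breakpoints and labels the event by how similar they are. The function names, the 0.9 identity cutoff, and the requirement that both flanks have equal length are assumptions made for this example, not values from the chapter.

```python
def fraction_identical(seq_a: str, seq_b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("flanks must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return matches / len(seq_a)


def guess_mechanism(flank_a: str, flank_b: str, threshold: float = 0.9) -> str:
    """Label an SV by the homology between its two breakpoint flanks.

    threshold is a hypothetical cutoff chosen only for this example.
    """
    identity = fraction_identical(flank_a, flank_b)
    return "NAHR" if identity >= threshold else "NHEJ"


print(guess_mechanism("ACGTACGTAC", "ACGTACGTAC"))  # identical flanks
print(guess_mechanism("ACGTACGTAC", "TTGACCATGG"))  # unrelated flanks
```

Real breakpoint analysis would align the flanks rather than compare them position by position; the position-wise comparison only keeps the example short.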
2.3 Mechanisms of Structural Variation
Cytogenetic techniques: chromosome painting.
2.3 Mechanisms of Structural Variation
Cytogenetic techniques: fluorescent in situ hybridization (FISH).
3. Technologies for Measurement of Structural Variation
• SVs are characterized by:
– Size.
– Complexity.
– Range: from hundreds of nucleotides to large-scale chromosomal rearrangements.
• Cytogenetic techniques:
– Chromosome painting.
– Spectral karyotyping (SKY).
– Fluorescent in situ hybridization (FISH).
3. Technologies for Measurement of Structural Variation
• Large SVs can be observed directly on chromosomes:
3.1 Microarrays
• This technology was used for the first genome-wide surveys in 2004.
• The technique applies array comparative genomic hybridization (aCGH).
• The reference genome is labeled with a fluorescent dye, and the test genome with a dye of a different color.
• Arrays with hundreds of thousands of probes are now available.
• Because individual copy number ratios are subject to experimental error, computational techniques are required to analyze aCGH data.
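To show why computation is needed, here is a minimal sketch that converts probe intensities to log2 ratios, smooths them with a sliding median, and labels each probe. The intensities, the window size, and the ±0.3 cutoffs are made-up values for illustration; real aCGH analysis uses dedicated segmentation methods.

```python
import math
from statistics import median


def log2_ratios(test, reference):
    """Per-probe log2(test / reference) intensity ratios."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def smooth(values, window=3):
    """Sliding-window median; the window shrinks at the array edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def call_probes(ratios, gain=0.3, loss=-0.3):
    """Label each probe; the cutoffs are illustrative assumptions."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral"
            for r in ratios]


# Hypothetical intensities: one noisy probe (400), then a gain and a loss.
test_signal = [100, 105, 400, 98, 102, 150, 155, 148, 50, 52, 49]
ref_signal = [100] * 11

raw_calls = call_probes(log2_ratios(test_signal, ref_signal))
calls = call_probes(smooth(log2_ratios(test_signal, ref_signal)))
print(raw_calls)
print(calls)
```

Without smoothing, the single noisy probe is called as a gain; after the median filter it is treated as noise, while the runs of consecutive gained and lost probes survive.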
3.1 Microarrays
• aCGH can measure both germline SVs in normal genomes and somatic SVs in cancer genomes.
• aCGH was initially developed for cancer genomics applications.
• aCGH is now also used to detect copy number variants in large numbers of genomes at low cost.
• aCGH limitations:
– Detects only copy number variants.
– Requires that genomic probes from the reference genome lie in non-repetitive regions.
3.2 Next-generation DNA Sequencing Technologies
• As DNA sequencing technology has grown more sophisticated, the cost of sequencing has dropped substantially.
• One limitation is the length of the DNA molecule that can be sequenced.
• Reads are short, ranging from about 30 to 1000 nucleotides, or base pairs (bp).
3.2 Next-generation DNA Sequencing Technologies
• Some sequencing technologies use a paired-end sequencing protocol to increase the effective read length.
• In earlier Sanger sequencing protocols, the DNA fragment size depended on the cloning vector.
• In next-generation technologies, several techniques have been used to generate paired reads.
• Current techniques produce paired reads from fragments ranging from a few hundred bp to 2-3 kb.
3.2 Next-generation DNA Sequencing Technologies
• Next-generation sequencing technologies have limited read lengths and limited insert sizes compared with Sanger sequencing.
• Two approaches to detecting SVs with next-generation sequencing technology:
– De novo assembly:
  Sophisticated algorithms reconstruct the genome sequence from overlaps between reads.
  Human genome assemblies are highly fragmented.
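The basic operation behind overlap-based assembly can be sketched as follows: find the longest suffix of one read that matches a prefix of another, then join the two reads on that overlap. The function names and the minimum overlap of 3 are assumptions for this example; real assemblers also handle sequencing errors and repeats, which this sketch ignores.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def merge(a: str, b: str, min_length: int = 3) -> str:
    """Join two reads on their overlap, or return '' if they do not overlap."""
    n = overlap(a, b, min_length)
    return a + b[n:] if n else ""


print(overlap("ACGTTGCA", "TGCATTAG"))
print(merge("ACGTTGCA", "TGCATTAG"))
```

Repetitive sequence makes many reads overlap equally well in several places, which is one reason assemblies of human genomes from short reads end up fragmented.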
3.2 Next-generation DNA Sequencing Technologies
• Two approaches to detecting SVs with next-generation sequencing technology:
– Resequencing:
  Differences are found between an individual genome and a closely related reference genome.
  These differences are inferred from differences between the aligned reads and the reference sequence.
3.2 Next-generation DNA Sequencing Technologies
From earlier generations of DNA sequencing to new sequencing technology:
Advantages:
Disadvantages:
– Limited length of the DNA molecule that can be sequenced: today's technologies produce short sequences of DNA, ranging from 30 to 1000 nucleotides.
– To increase read length, these sequencing technologies use paired-end or mate-pair sequencing.
3.2 Next-generation DNA Sequencing Technologies
There are two approaches to detecting SVs:
3.3 New DNA Sequencing Technologies
• Previous DNA sequencing technologies have several limitations.
• For example, SV breakpoints in highly repetitive sequences are hard to resolve.
• Third-generation and single-molecule technologies offer additional advantages for SV detection:
– Longer read lengths.
– Easier sample preparation.
– Lower input DNA requirements.
– Higher throughput.
3.3 New DNA Sequencing Technologies
• Expected improvements in third-generation technologies:
– Paired reads that include more than two reads from a single DNA fragment.
– Long-range sequence information with low input DNA requirements.
• Sequencing technologies continue to develop rapidly thanks to improvements in:
– Chemistry.
– Imaging.
– Manufacturing technology.
3.3 New DNA Sequencing Technologies
• Further improvements are expected in:
– Read lengths.
– Insert lengths.
– Throughput.
• A newer sequencing technology is nanopore sequencing, which directly reads the nucleotides of long DNA molecules, a dramatic advance.
• Nanopore sequencing can generate extremely long reads (tens of kb).
3.3 New DNA Sequencing Technologies
New features:
– Longer read lengths.
– Higher throughput.
– Easier sample preparation.
– Lower input DNA requirements.
3.3 New DNA Sequencing Technologies
Development remains active thanks to improvements in:
– Chemistry.
– Image processing.
– Data processing.
4. Resequencing Strategies for Structural Variation
• Purpose: predict SVs from alignments of sequence reads to the reference genome.
• Steps:
– Align reads to the reference genome.
– Predict SVs from the alignments.
• Resequencing is straightforward in principle, but detecting SVs in human genomes is difficult in practice.
• Some types of SVs are easy to detect; others are very difficult.
4. Resequencing Strategies for Structural Variation
Step 1: Alignment of reads.
4. Resequencing Strategies for Structural Variation
Step 2: Prediction of SVs from alignments.
4. Resequencing Strategies for Structural Variation
• Some SVs are hard to detect because of technological limitations and biological features.
• Technological limitations:
– Sequencing errors.
– Limited read lengths.
– Limited insert sizes.
• Biological features of SVs:
– Enriched for repetitive sequences near their breakpoints.
– Overlapping variants with multiple states or complex architectures.
– Recurrent variants at the same locus.
4. Resequencing Strategies for Structural Variation
• Therefore, aligning reads and predicting SVs are not easy tasks.
• Effective algorithms are required for highly sensitive and specific SV prediction.
• Three approaches identify SVs from aligned reads:
– Split reads.
– Depth of coverage analysis.
– Paired-end mapping.
4.1 Read Alignment
• This is one of the most researched problems in bioinformatics.
• The specialized task of aligning millions to billions of individual short reads is done by software such as:
– Maq.
– BWA.
– Bowtie/Bowtie2.
– BFAST.
– mrsFAST.
4.1 Read Alignment
• An aligner can report a single alignment for each read, or report every high-quality alignment for reads that have several.
• Another option is to choose one alignment at random when multiple alignments have equal scores.
• Keeping only unique alignments limits detection of SVs with breakpoints in repetitive regions.
• Keeping ambiguous alignments requires the SV prediction algorithm to distinguish between multiple possible alignments for each read.
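The alignment-reporting options above can be sketched as a small selection function. The policy names, the (position, score) tuple layout, and the example hits are hypothetical and do not correspond to the options of any particular aligner.

```python
import random


def select_alignments(alignments, policy="unique", rng=None):
    """Pick which candidate alignments of one read to keep.

    alignments: list of (position, score) tuples; higher score is better.
    policy: "unique" keeps the best hit only if it is untied,
            "random" picks one of the tied best hits at random,
            "all" keeps every tied best hit for a downstream SV caller.
    """
    if not alignments:
        return []
    best = max(score for _, score in alignments)
    top = [a for a in alignments if a[1] == best]
    if policy == "unique":
        return top if len(top) == 1 else []
    if policy == "random":
        return [(rng or random).choice(top)]
    if policy == "all":
        return top
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical read with two equally good hits, as in a repetitive region.
hits = [(1200, 60), (88150, 60), (40321, 35)]
print(select_alignments(hits, "unique"))
print(select_alignments(hits, "all"))
```

With the "unique" policy this read is discarded, which is how breakpoints in repeats get lost; with "all", both candidates are passed on and the SV caller must decide between them.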
4.2 Split Reads
• This is a direct approach to detecting SVs: a split read aligns to the reference in two parts, one on each side of a breakpoint.
• To reduce false positive predictions, multiple split reads supporting the same breakpoint are required.
• Split-read analysis is feasible only when reads are sufficiently long.
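The requirement for multiple split reads can be sketched as a simple support count. The coordinates and the minimum support of 2 are assumptions for this example, and breakpoints are matched exactly here, whereas a practical caller would allow a small tolerance.

```python
from collections import Counter


def supported_breakpoints(split_reads, min_support=2):
    """Return breakpoint pairs backed by at least min_support split reads.

    split_reads: list of (left_end, right_start) reference coordinates,
    one pair per read whose two parts align to different places.
    min_support is an assumed cutoff for this example.
    """
    counts = Counter(split_reads)
    return sorted(bp for bp, n in counts.items() if n >= min_support)


# Three reads agree on one breakpoint; a fourth is an isolated, likely false, signal.
reads = [(5000, 7500), (5000, 7500), (5000, 7500), (9100, 120000)]
print(supported_breakpoints(reads))
```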
4.3 Depth of Coverage
• Depth of coverage analysis detects differences in the number of reads that align to intervals of the reference genome.
• The average number of reads covering a nucleotide is $c = NL/G$, where:
– $N$ is the number of reads,
– $L$ is the length of each read,
– $G$ is the length of the genome,
– $c$ is the coverage.
• For example, "30X coverage" means $c = 30$: each nucleotide is covered by 30 reads on average.
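A short worked example of the coverage formula, written as a sketch. The genome length of 3 billion bp and the 100 bp read length are assumed round numbers, not figures from the chapter.

```python
def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average depth of coverage c = N * L / G."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> float:
    """Number of reads N = c * G / L needed to reach a target coverage."""
    return target_coverage * genome_length / read_length


GENOME = 3_000_000_000  # assumed approximate human genome length in bp
print(coverage(900_000_000, 100, GENOME))
print(reads_needed(30, 100, GENOME))
```

Under these assumptions, 900 million reads of 100 bp give 30X coverage, and rearranging the same formula gives the number of reads needed for a target depth.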
4.3 Depth of Coverage
• If an individual diploid genome has a deletion of a segment, the coverage of that segment is reduced: by half for a heterozygous deletion, and to zero for a homozygous deletion.
• If an interval of the reference genome is duplicated or amplified, its coverage increases in proportion to the number of copies.
• The depth of coverage thus indicates the number of copies of an interval in the genome.
• Coverage calculations are affected by repetitive sequences.
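The link between depth and copy number can be sketched as a simple ratio, assuming a diploid genome and a known genome-wide average depth. The depths below are invented for illustration, and the function name is hypothetical.

```python
def estimate_copy_number(observed_depth: float, expected_depth: float, ploidy: int = 2) -> int:
    """Round ploidy * observed / expected to the nearest whole copy number."""
    return round(ploidy * observed_depth / expected_depth)


expected = 30.0  # assumed genome-wide average depth ("30X")
examples = [("normal", 29.4), ("one copy lost", 15.2),
            ("both copies lost", 0.3), ("one extra copy", 44.1)]
for label, depth in examples:
    print(label, estimate_copy_number(depth, expected))
```

This ignores the effect of repetitive sequences and other biases in coverage, which a real analysis would need to correct for before interpreting the ratio.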
4.4 Paired-end Sequencing and Mapping
• This is the most common resequencing approach.
• It is used to identify both somatic SVs in cancer genomes and germline SVs.
• It is supported by several next-generation sequencing technologies.
• Paired reads are obtained from opposite ends of a larger DNA fragment.
• The length of any particular sequenced fragment is unknown.
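One common way to use paired reads, sketched here under stated assumptions: although the length of each individual fragment is unknown, the library as a whole is assumed to have a known mean and standard deviation of fragment length, and pairs whose mapped distance falls far outside that range are flagged. The mean of 300 bp, the standard deviation of 30 bp, and the cutoff of 3 standard deviations are illustrative values.

```python
def classify_pair(mapped_distance: float, mean_insert: float,
                  sd_insert: float, n_sd: float = 3.0) -> str:
    """Compare a pair's mapped distance with the expected fragment length.

    mean_insert and sd_insert describe the library and are assumed known;
    n_sd is an assumed cutoff for calling a pair discordant.
    """
    if mapped_distance > mean_insert + n_sd * sd_insert:
        return "possible deletion"   # ends map farther apart than expected
    if mapped_distance < mean_insert - n_sd * sd_insert:
        return "possible insertion"  # ends map closer together than expected
    return "concordant"


for distance in (310, 2900, 120):
    print(distance, classify_pair(distance, mean_insert=300, sd_insert=30))
```

A single discordant pair is weak evidence; as with split reads, several pairs pointing to the same event would normally be required.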
5. Representation of Structural Variants
• The DNA technologies described earlier have reduced the cost of surveying SVs.
• The Cancer Genome Atlas (TCGA) is performing paired-end sequencing and aCGH on several human genomes.
• Meanwhile, microarray-based techniques are being used for small or single-investigator projects.
• Therefore, an enormous number of SV measurements is expected in the future.
6. Challenges for Cancer Genomics Studies
• Most cancer genomes are aneuploid, so the number of copies of a region is variable.
• High-resolution reconstruction of cancer genomes is needed because many rearrangements are too small to be detected by cytogenetics.
• A tumor is a heterogeneous mixture of cells that may carry different numbers of mutations.
• Heterogeneity includes admixture with normal cells and subpopulations of tumor cells.
• Some subpopulations contain mutations that others lack.
• Most cancer genome studies do not sequence single tumor cells; they sequence a mixture of cells.
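To illustrate why sequencing a mixture of cells complicates analysis, the sketch below computes the read-depth ratio expected for a region when only a fraction of the sampled cells are tumor cells. The simple two-population model, the function name, and the purity values are assumptions for this example.

```python
def expected_depth_ratio(tumor_copies: float, purity: float,
                         normal_copies: float = 2.0) -> float:
    """Expected depth in a mixed sample relative to a diploid normal sample.

    purity is the assumed fraction of cells in the sample that are tumor cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mixed / normal_copies


# A region with one copy in tumor cells, at decreasing tumor purity.
for purity in (1.0, 0.5, 0.2):
    print(purity, expected_depth_ratio(tumor_copies=1, purity=purity))
```

As purity falls, the ratio moves toward 1 and the deletion becomes harder to distinguish from experimental noise.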
7. Future Prospects
• It will become possible to systematically measure all but the most complex variants in an individual genome.
• SVs between nearly identical sequences might remain inaccessible until significantly different types of DNA sequencing technology become available.
• Once a complete list of germline SVs is available, unexplained heritability for a trait can no longer readily be attributed to a lack of measured genetic information.
• Establishing the efficacy of particular treatments will require substantial additional work.
Thanks