PR9_Genomic technologies – methods for genome analysis
Genome sequencing was driven by
technological progress
Genome projects
Overview of a genome project.
First, the genome must be selected,
which involves several factors including
cost and relevance. Second, the
sequence is generated and assembled
at a given sequencing center (such as
BGI or DOE JGI). Third, the genome
sequence is annotated at several levels:
DNA, protein, gene pathways, or
comparatively.
Increase in the number of genomes completed per year separated by
bacterial phylum. Data source: NCBI, complete genomes
Lagesen K, Ussery DW, Wassenaar TM. 2010. Genome update: the 1000th
genome - a cautionary tale. Microbiology 156:603-608.
Genome sequencing
Processing and analysis of genome data is a "big data problem"
Bioinformatics tools for whole-genome analysis
Whole-genome alignments
available online
Selected web-based bioinformatic tools and web
services for integrative cancer genome analysis
Genome Browsers
AGBL5 (ATP/GTP binding protein-like 5) is a protein-coding
gene. Diseases associated with AGBL5 include intrahepatic
cholangiocarcinoma and cholangiocarcinoma. GO annotations
related to this gene include metallocarboxypeptidase activity and
tubulin binding. An important paralog of this gene is AGTPBP1.
HAVANA - Human and Vertebrate Analysis
and Annotation (at the Sanger Institute)
The HAVANA group provides the manual annotation of
human, mouse, zebrafish and other vertebrate
genomes that appears in the Vega browser. The value
of a genome is only as good as its annotation. To
create a gold standard reference annotation the
Human and Vertebrate Analysis and Annotation
(HAVANA) team uses tools developed in-house to
manually annotate human, mouse and zebrafish
genomes. The team aims to develop accurate and
comprehensive annotation representing the full
complexity of gene loci and their features. Manual
annotation is especially important in areas that are not
well catered for by automated annotation systems, such
as splice variation, pseudogenes, conserved gene
families, duplications and non-coding genes. The
HAVANA team constantly updates its methods by
incorporating new data sources that are created as
new technologies are developed. HAVANA annotation
is freely available through genome browsers, including
VEGA, Ensembl and UCSC.
mouse GENCODE freeze (version M2, 10/12/2013)
Total No of Genes 38924
Protein-coding genes 22572
Long non-coding RNA genes 4074
Small non-coding RNA genes 5853
Pseudogenes 5948
- processed pseudogenes: 4556
- unprocessed pseudogenes: 1157
- unitary pseudogenes: 14
- polymorphic pseudogenes: 15
- pseudogenes: 204
Immunoglobulin/T-cell receptor gene segments
- protein coding segments: 477
- pseudogenes: 2
Total No of Transcripts 94545
Protein-coding transcripts 47394
- full length protein-coding: 38260
- partial length protein-coding: 9134
Nonsense mediated decay transcripts 4134
Long non-coding RNA loci transcripts 6053
Total No of distinct translations 38862
Genes that have more than one distinct translation 7946
Human GENCODE freeze (version 19, 10/12/2013)
Total No of Genes 57820
Protein-coding genes 20345
Long non-coding RNA genes 13870
Small non-coding RNA genes 9013
Pseudogenes 14206
- processed pseudogenes: 10532
- unprocessed pseudogenes: 2942
- unitary pseudogenes: 161
- polymorphic pseudogenes: 45
- pseudogenes: 296
Immunoglobulin/T-cell receptor gene segments
- protein coding segments: 386
- pseudogenes: 230
Total No of Transcripts 196520
Protein-coding transcripts 81814
- full length protein-coding: 57005
- partial length protein-coding: 24809
Nonsense mediated decay transcripts 13052
Long non-coding RNA loci transcripts 23898
Total No of distinct translations 61559
Genes that have more than one distinct translation 13600
The GENCODE Project: Encyclopaedia of genes and gene variants (http://www.gencodegenes.org/)
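The two freeze tables are easier to compare as ratios. The following is a minimal sketch that uses only figures quoted above; the dictionary layout and the `summarize` helper are illustrative names, not part of any GENCODE tool.

```python
# Figures copied from the two GENCODE freeze tables above.
GENCODE = {
    "human v19": {"genes": 57820, "protein_coding": 20345,
                  "lncRNA": 13870, "pseudogenes": 14206,
                  "transcripts": 196520},
    "mouse M2": {"genes": 38924, "protein_coding": 22572,
                 "lncRNA": 4074, "pseudogenes": 5948,
                 "transcripts": 94545},
}

def summarize(stats):
    """Return (share of protein-coding genes in %, mean transcripts per gene)."""
    coding_share = 100.0 * stats["protein_coding"] / stats["genes"]
    transcripts_per_gene = stats["transcripts"] / stats["genes"]
    return round(coding_share, 1), round(transcripts_per_gene, 2)

for name, stats in GENCODE.items():
    share, per_gene = summarize(stats)
    print(f"{name}: {share}% protein-coding, {per_gene} transcripts per gene")
```

In these freezes roughly a third of annotated human genes are protein-coding, against more than half in mouse; the gap mostly reflects the larger number of annotated human long non-coding RNA genes and pseudogenes.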
‘Omics’ data are providing comprehensive
descriptions of nearly all components and interactions
within the cell.
Omics data sets that describe virtually all biomolecules in the cell
are starting to become available. These data can be generally
classified into three categories: components, interactions and
functional-states data. Components data detail the molecular
content of the cell or system, interactions data specify links
between molecular components, and functional-states data provide
an integrated readout of all omics data types by revealing the
overall cellular phenotype. The central pathway traces the
biological information flow from the genome to the ultimate
cellular phenotype, and the available omics data types that are
used to describe these processes are indicated in the adjacent
boxes. From the top, DNA (genomics) is first transcribed to
mRNA (transcriptomics) and translated into protein (proteomics),
which can catalyse reactions that act on and give rise to
metabolites (metabolomics), glycoproteins and oligosaccharides
(glycomics), and various lipids (lipidomics). Many of these
components can be tagged and localized within the cell
(localizomics). The processes that are responsible for generating
and modifying these cellular components are generally dictated
by molecular interactions, for example by protein–DNA
interactions in the case of transcription, and protein–protein
interactions in translational processes as well as enzymatic
reactions. Ultimately, the metabolic pathways comprise
integrated networks, or flux maps (fluxomics), which dictate
the cellular behaviour, or phenotype (phenomics).
'Omics' data repositories
Data visualization. The University of California-Santa Cruz (UCSC) Genome Browser is a tool for viewing genomic data sets. A vast amount
of data is available for viewing through this browser. This example from the browser shows numerous data types in K562 cells from the ENCODE
Consortium. A random gene was selected — katanin p60 subunit A-like 1 (KATNAL1) — that shows several points that can be identified by using
this tool. The promoter has a typical chromatin structure (a peak of histone 3 lysine 4 trimethylation (H3K4me3) between the bimodal peaks of
H3K4me1), is bound by RNA polymerase II (RNAPII) and is DNase hypersensitive. The gene is transcribed, as indicated by RNA sequencing
(RNA–seq) data, as well as H3K36me3 localization. The gene lies between two CCCTC-binding factor (CTCF)-bound sites that could be tested
for insulator activity. An intronic H3K4me1 peak (highlighted) predicts an enhancer element, corroborated by the DNase I hypersensitivity site
peak. There is a broad repressive domain of H3K27me3 downstream, which could have an open chromatin structure in another cell type.
The interplay and codependence of
experimental and computational
approaches.
The centrally located yellow box labeled
“Validation of Biological Mechanisms”
depicts the ultimate goal for researchers
studying a biological pathway. Approaches
illustrated from the top of the image
downward depict high-throughput
analyses used to predict transcription
factor binding sites and to determine
functional activity of those elements.
Approaches illustrated from the bottom of
the image upward signify more
conventional “locus-specific” analyses
that start from a narrowly defined
hypothesis of biological function and can
include the use of animal models.
Evolutionary analyses of genomes
The phylogenetic
inference process
Methods of phylogenomic inference
The flowchart shows steps in the inference
of evolutionary trees from genomic data.
Genomic information is obtained by large-
scale DNA sequencing. In general, sets of
orthologous genes are then assembled from
specific sets of species for phylogenetic
analysis. This homology or orthology
assessment is a crucial step that is almost
always based on simple similarity
comparisons (for example, BLAST
searches). Most methods used for the
subsequent reconstruction of phylogenetic
trees are either sequence-based or are based
on whole-genome features.
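The orthology assessment step can be illustrated with reciprocal best hits, one common similarity-based heuristic. The caption does not name a specific method, so this choice is an assumption, and the scores below are invented rather than real BLAST output.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is {(query, subject): similarity score}, e.g. BLAST bit scores.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)

# Toy scores for genes of two hypothetical species A and B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 60.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 58.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Gene `a3` is dropped because its best hit `b2` prefers `a2`. This is the kind of case (for example, a paralog) that simple similarity comparisons can misassign.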
Structural genomics – analysis of SDs and CNVs
(the variable genome)
Array-based, genome-wide methods for the
identification of copy-number variants
Functional genome analysis – ENCODE (in human),
modENCODE (in the fruit fly and the nematode C. elegans)
The Organization of the ENCODE Consortium.
(A) Schematic representation of the major methods
that are being used to detect functional elements
(gray boxes), represented on an idealized model of
mammalian chromatin and a mammalian gene.
(B) The overall data flow from the production
groups after reproducibility assessment to the Data
Coordinating Center (UCSC) for public access and
to other public databases. Data analysis is
performed by production groups for quality control
and research, as well as at a cross-Consortium level
for data integration.
modENCODE
(www.modencode.org/)
Finding function in novel targets: C.
elegans as a model organism
Overview of the RNA interference
supported target identification process.
Studying interactions at the genome level
•Y1H (protein (TF)–DNA interaction)
•Y2H (protein–protein interactions)
•Y3H (RNA–protein interaction), also used to detect small ligands
•membrane Y2H (interactions of membrane proteins in their natural environment)
Comparison of experimental protocols. Experiments to
detect different aspects of DNA-binding proteins share
many of the same steps; simplified schematics of the
main steps are shown.
a| Chromatin immunoprecipitation followed by
sequencing (ChIP–seq) for DNA-binding proteins such
as transcription factors. Recent variations on the
standard protocol include using endonuclease digestion
instead of sonication (ChIP–exo) to increase the
resolution of binding-site detection and to eliminate
contaminating DNA, and DNA amplification after ChIP
for samples with limited cells.
b| ChIP–seq for histone modifications uses micrococcal
nuclease (MNase) digestion to fragment DNA and can
also now be run on low-quantity samples when combined
with the additional post-ChIP amplification.
c| DNase–seq relies on digestion by the DNaseI nuclease
to identify regions of nucleosome-depleted open
chromatin where there are binding sites for all types of
factors, but it cannot identify what specific factors are
bound.
d| Formaldehyde-assisted identification of regulatory
elements (FAIRE–seq) similarly identifies nucleosome-
depleted regions by extracting fragmented DNA that is
not crosslinked to nucleosomes.
LinDA, single-tube linear DNA amplification; T7, T7
phage RNA polymerase.
DNaseI footprints correspond to bound
proteins.
The distribution of DNaseI digestion sites within
DNaseI hypersensitive regions is not uniform; peaks
and troughs occur in the signal, where troughs are
due to the protection of DNA sequences by bound
proteins. Transcription factor binding motif databases
such as JASPAR can be searched using the sequence
from each footprint to predict what factor is bound.
Shown here are data from the proximal promoter
region of the human fragile-X mental retardation 1
(FMR1) gene, with motif-matching results for one
footprint indicating that potentially bound factors are
interferon regulatory factor 1 (IRF1) or IRF2. DNaseI
footprints had been identified previously at this locus
in lymphoblastoid cells. More recent data from
DNase–seq was used to recapitulate these results in a
single experiment.
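Searching a footprint sequence against a motif database amounts to sliding a position weight matrix along the sequence and keeping the best-scoring window. The sketch below assumes a uniform background and a pseudocount of 1. The 4-bp motif is hypothetical and is not a real JASPAR profile.

```python
import math

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def best_match(sequence, matrix):
    """Slide the matrix along the sequence; return (score, start) of the best window."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(matrix, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical base counts for a 4-bp motif with consensus GAAA.
toy_counts = [{"G": 9, "A": 1}, {"A": 10}, {"A": 9, "T": 1}, {"A": 8, "C": 2}]
matrix = log_odds_matrix(toy_counts)
score, start = best_match("CCTGAAAGTC", matrix)
print(start, round(score, 2))
```

A positive score means the window resembles the motif more than the background. Which factor is actually bound still has to be confirmed experimentally, since related factors (such as IRF1 and IRF2 in the example above) can share similar motifs.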
Detecting chromatin interactions.
In three-dimensional space, distal genomic regions
on the same or different chromosomes interact, and
this can be mediated by one or more DNA-binding
proteins.
a| Chromatin conformation capture experiments use
a ligation step to join distant fragments that are
interacting in three-dimensional chromatin space,
thus providing information on possible targets for
DNA-bound proteins.
b| Chromatin interaction analysis with paired-end
tag sequencing (ChIA-PET) similarly detects
chromatin interactions using a ligation step to pair
non-adjacent interacting regions. However, ChIA-
PET uses a chromatin immunoprecipitation (ChIP)
step to more specifically identify interactions with a
particular bound protein, such as RNA polymerase
II. It should be noted that the DNA that is actually
sequenced as part of the paired-end sequencing does
not necessarily correspond to the precise region of
interaction but is dictated by the presence of restriction
enzyme targets.
Identification of protein–DNA
interactions using the ChIP-chip
approach.
Transcription factors are cross-linked
in vivo to their binding sites, sonicated
and DNA fragments that are covalently
bound to a transcription factor are
enriched and purified by
immunoprecipitation. The isolated DNA
is subsequently tagged by fluorescence
labels and hybridized on a DNA
microarray, thereby allowing the
identification of genomic regulatory
DNA elements. Control experiments are
carried out to detect non-specific
background.
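The comparison between the immunoprecipitated sample and the control is often summarised as a log2 intensity ratio per array probe. The sketch below assumes a plain two-fold threshold and invented intensities; real analyses would normalise the arrays and use replicates and statistics.

```python
import math

def enriched_probes(ip_signal, control_signal, min_log2_ratio=1.0):
    """Return {probe: log2(IP/control)} for probes at or above the threshold."""
    hits = {}
    for probe, ip in ip_signal.items():
        ratio = math.log2(ip / control_signal[probe])
        if ratio >= min_log2_ratio:
            hits[probe] = ratio
    return hits

# Invented fluorescence intensities for four hypothetical array probes.
ip = {"probe_1": 1200.0, "probe_2": 400.0, "probe_3": 3200.0, "probe_4": 510.0}
control = {"probe_1": 1000.0, "probe_2": 400.0, "probe_3": 400.0, "probe_4": 500.0}

hits = enriched_probes(ip, control)
print(hits)
```

Only `probe_3`, with an eight-fold signal over the control, passes the threshold here; the other probes sit at background level.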
Methods to analyze transcription factor function.
(A) The modular structure of TFs. Domains which function in
activation, repression, dimerization or protein–protein
interaction, and DNA-binding are color coded in red, green,
yellow, or blue, respectively.
(B) Overview of methods which can be used to elucidate TF
functions.
For details see text. Y2H, yeast two-hybrid; Y1H, yeast one-
hybrid; B1H, bacterial one-hybrid; P2H, protoplast two-hybrid;
BiFC, bimolecular fluorescence complementation; FRET,
fluorescence resonance energy transfer; Co-IP, co-immuno-
precipitation; PTA, protoplast transactivation; EMSA,
electrophoretic mobility shift assay; SELEX, systematic
evolution of ligands by exponential enrichment; DamID, DNA
adenine methylation identification; ChIP, chromatin immunoprecipitation;
ChIP-chip, ChIP combined with tiling array
technology; ChIP-seq, ChIP combined with sequencing of
immunoprecipitated DNA fragments; CRES-T, chimeric
repressor gene silencing technology; RNAi, RNA interference;
TSS, transcription start site.
Next-generation sequencing and its applications
High-throughput sequencing technologies.
The read length, the number of reads and the total amount of sequence generated in a
typical run are indicated for each of the new-generation high-throughput sequencing
technologies. a| 454 GS FLX pyrosequencing. Oligonucleotide adaptors are ligated
to fragmented DNA and immobilized to the surface of microscopic beads before
PCR amplification in an oil–droplet emulsion. Beads are isolated in picolitre wells
and incubated with dNTPs, DNA polymerase and beads bearing enzymes for the
chemiluminescent reaction. Incorporation of a nucleotide into the complementary
strand releases pyrophosphate, which is used to produce ATP. This, in turn, provides
the energy for the generation of light. The light emitted is recorded as an image for
analysis. b| SOLiD sample preparation is similar to that of 454 pyrosequencing.
After amplification, the beads are immobilized onto a custom substrate. A primer
that is complementary to the adaptor sequence (green), random oligonucleotides with
known 3' dinucleotides and a corresponding fluorophore are hybridized sequentially
along the sequence and image data collected. After five repeats, the complementary
strand is melted away and a new primer is added to the adaptor sequence, ending at a
position one nucleotide upstream of the previous primer. Second-strand synthesis is
repeated, allowing two-colour encoding and double reading of each of the target
nucleotides. Repeats of these cycles ensure that nucleotides in the gap between
known dinucleotides are read. Knowledge of the first base in the adaptor reveals the
dinucleotide using the colour-space scheme. In other words, knowing that the last
adaptor nucleotide is T and the colour is red means that the first base to be sequenced
must be A. Knowing that the first base is A and the colour is green means that the
next base must be C and so on. c| For Solexa GA sequencing, adaptors are ligated
onto DNA and used to anchor the fragments to a prepared substrate. Fold-back PCR
results in isolated spots of DNA of a large enough quantity that the amassed
fluorophore can be detected. Terminator nucleotides and DNA polymerase are then
used to create complementary-strand DNA. Images are collected at the end of each
cycle before the terminator is removed. d| Heliscope sequencing immobilizes
unamplified DNA with ligated adaptors to a substrate. Each species of dNTP with a
bright fluorophore attached is used sequentially to create second-strand DNA; a
'virtual terminator' prevents the inclusion of more than one nucleotide per strand and
cycle, and background signal is reduced by removal of 'used' fluorophore at the start
of each cycle. e| The new sequencing method developed by Pacific Biosciences
occurs in zeptolitre wells that contain an immobilized DNA polymerase. DNA and
dNTPs are added for synthesis. Fluorophores are cleaved from the complementary
strand as it grows and diffuse away, allowing single nucleotides to be read.
Continuous detection of fluorescence in the detection volume and high dNTP
concentration allow extremely fast and long reading.
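The SOLiD colour-space example in panel b (T followed by red gives A; A followed by green gives C) can be reproduced with a small decoder. It assumes the commonly described two-base encoding, in which each colour is the XOR of the 2-bit codes of adjacent bases, with colours numbered blue 0, green 1, yellow 2, red 3; the function names are illustrative.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3
# Assumed colour numbering for the two-base encoding.
COLOURS = {"blue": 0, "green": 1, "yellow": 2, "red": 3}

def decode_colour_space(known_first_base, colours):
    """Translate a colour read into bases, starting from a known base."""
    current = BASES.index(known_first_base)
    decoded = []
    for colour in colours:
        current ^= COLOURS[colour]  # next base code = previous code XOR colour
        decoded.append(BASES[current])
    return "".join(decoded)

def encode_colour_space(sequence):
    """Return the colour call for each adjacent pair of bases."""
    names = {code: name for name, code in COLOURS.items()}
    return [names[BASES.index(a) ^ BASES.index(b)]
            for a, b in zip(sequence, sequence[1:])]

# The last adaptor base is T; the first two colours are red and green.
print(decode_colour_space("T", ["red", "green"]))
```

Because every base is decoded from the one before it, a single wrong colour call corrupts all downstream bases. This is why colour-space reads are usually aligned in colour space instead of being decoded first.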
Sequencing protocols are the molecular steps by which specific information in these cells is captured and
transformed into a population of adaptor-flanked DNA fragments for sequencing. Future applications of
sequencing may arise through new combinations of steps, the introduction of new steps or entirely new
approaches to sequencing.
High-throughput sequencing platforms.
The schematic shows the main high-
throughput sequencing platforms available
to microbiologists today, and the
associated sample preparation and
template amplification procedures.
Comparison of next-generation sequencing platforms
Sequencing technologies and their uses.
Various NGS methods can precisely map and quantify
chromatin features, DNA modifications and several
specific steps in the cascade of information from
transcription to translation. These technologies can be
applied in a variety of medically relevant settings,
including uncovering regulatory mechanisms and
expression profiles that distinguish normal and cancer
cells, and identifying disease biomarkers, particularly
regulatory variants that fall outside of protein coding
regions. Together, these methods can be used for
integrated personal omics profiling to map all
regulatory and functional elements in an individual.
Using this basal profile, dynamics of the various
components can be studied in the context of disease,
infection, treatment options, and so on. Such studies will
be the cornerstone of personalized and predictive
medicine.
A High Diversity of Next-Generation or Deep Sequencing Approaches Is Currently Available for Profiling Genomes,
Epigenomes, Methylomes, and Transcriptomes. A plethora of deep sequencing approaches are now available, ranging from
approaches to map the primary sequence of DNA (whole-genome-seq and exome-seq), map DNA methylation marks (meDIP-
seq, 5-hmC-seq, and many others), profile chromatin structure (MNase-seq, DNase-seq, and FAIRE-seq), profile all the
different stages of the transcriptome (GRO-seq, RNA-seq, and ribo-seq), profile transcription factors, cofactors, and histone
marks (ChIP-seq), profile RNA interactions to the genome or the transcriptome (ChIRP-seq and CLIP-seq and variants), and
finally profile the structure of the genome in the tridimensional space (ChIA-PET, HiC, and several others). All these
approaches are now available for the neurobiology community and are primed to revolutionize the field.
Next-generation DNA sequencing of paired-end tags
(PET) for transcriptome and genome analyses. PET
applications to address genome biology questions.
Cells have many different mechanisms for processing,
modifying, controlling, and transducing information
encoded in the genome. The PET technology can be
applied to investigate many questions regarding
nuclear processes, such as transcriptomes by RNA-
PET, transcription and epigenetic regulation by ChIP-
PET and ChIA-PET, as well as genome structural
variation by DNA-PET. Examples of PET data from
GIS-PET (an early version of RNA-PET), ChIP-PET, and
ChIA-PET experiments of human breast cancer MCF-7
cells with estrogen induction treatment at the TFF1 locus
(chr21:42,653,000-42,673,000) are shown: the high level
of TFF1 gene expression and the low level of TMPRSS3
gene expression; the ERα binding at the TFF1 promoter
and enhancer sites; and the long-range chromatin
interactions between the two ERα binding sites. An
example of DNA-PET data at the TNFRSF14 locus in the
genome of MCF-7 cells shows an inversion event
detected by two clusters of discordant DNA-PET cluster
mapping.
High-throughput sequencing: HTS or HT-Seq
Overview of NGS-based analysis strategies.
Primary analysis: This part describes analysis steps that are based directly on
the reads and are physically derived from sequence comparisons: CNVs and
chromosomal InDels (insertions or deletions, including translocations). SNP
annotation concerns SNPs already known and described (e.g. in dbSNP), while de
novo SNP detection results from alleles detected via multiple sequencing
coverage of the SNP position. Alternative splice sites can be detected via
mapping to a splice junction library or by direct genomic alignment of exon
spanning reads (when reads or clusters are longer than 50 nucleotides), new
transcripts/loci are derived by direct mapping of novel exons and splice-
overlaps. Downstream analysis: This part differs for the three major application
areas: genomic DNA-seq is genomic resequencing employed in genome-wide-
association-studies (GWAS), the definition of Haplotypes and tumor typing
usually via tumor-specific chromosomal InDels. ChIP-Seq determines genome-
wide patterns of modified chromatin such as histone methylation or acetylation
status as well as binding regions for DNA-binding proteins, usually DNA-
dependent RNA polymerases or TFs leading to the definition of patterns such as
TFBSs. RNA-Seq determines the genome-wide expression of known as well
as unknown transcripts, which can be identified by mapping of the RNA-Seq
sequence tags to the genome and the transcriptome. This will also identify most
splice variants. All three strategies converge into the biology-oriented
downstream analysis involving identification of pathways, cis-regulatory
modules and regulatory networks, which also involves the integration of prior
knowledge as depicted in the flanking database areas in addition to the
experimental data. Finally, meta-analysis allows merging of several lines of
evidence [NGS results and other results (e.g. proteomics, metabolomics, etc)]
into a more complete description of the underlying biology, via network
reconstruction, multiple correlations of various lines of evidence (such as histone
modifications, pol II binding and transcription rates) and the cross examination
of multiple experiments such as transcriptional profiles from several patients.
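De novo SNP detection from multiple coverage of a position can be pictured as counting the bases that reads place over one reference coordinate. The sketch below is a deliberately naive caller; the depth and allele-fraction thresholds are assumptions, and it ignores base qualities, mapping qualities and diploid genotype models that real callers use.

```python
from collections import Counter

def call_snp(reference_base, pileup, min_depth=8, min_alt_fraction=0.2):
    """Return the best-supported non-reference allele, or None.

    `pileup` is the string of bases that aligned reads show at one position.
    """
    depth = len(pileup)
    if depth < min_depth:
        return None  # too little coverage to make a call
    alt_counts = Counter(base for base in pileup if base != reference_base)
    if not alt_counts:
        return None  # every read agrees with the reference
    allele, count = alt_counts.most_common(1)[0]
    return allele if count / depth >= min_alt_fraction else None

print(call_snp("A", "AAAAGGGGGA"))  # half of ten reads support G
print(call_snp("A", "AAAAAAAAAG"))  # a single G read looks like a sequencing error
```

Requiring a minimum depth is what the caption means by multiple sequencing coverage: one discordant read cannot be told apart from an error.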
How we are getting there: a subway map
of sequencing technology.
Despite the disparate goals of different sequencing
experiments, the great variety of sequencing
experiments is a result of distinct combinations
of a relatively small set of core techniques,
which are represented as open circles or
‘stations’. Like subway lines, individual
sequencing experiments move from station to
station, until they ultimately arrive at a common
terminal: DNA sequencing. For example, the
initial demonstration of Hi-C was a comparative
experiment that progressed through cell culture,
cross-linking, proximity ligation, mechanical
shearing, affinity purification, adaptor ligation and
PCR amplification, before finally arriving at
sequencing. Other examples shown correspond to
sequencing applications in Table 1. For visual
clarity, not all stations and routes are shown. New
routes are being added regularly. TrAP, translating
ribosome affinity purification.
Strategies for next-generation sequencing in cancer. This schematic demonstrates the potential strategies for the
application of next-generation sequencing (NGS) in clinical oncology testing and research. Whole-genome sequencing
evaluates the entire genome and includes both gene-coding and non-coding regions. Exome sequencing uses baits to
hybridize and capture corresponding regions of the genome, focusing on the coding regions of the genome. Exome
sequencing can include the whole exome (about 20,000 genes), comprising just over 1% of the genome; alternatively, it can
focus on a panel of genes (hundreds of genes or more). Amplicon-based sequencing utilizes PCR amplification to isolate a
smaller region for sequencing. Transcriptome sequencing or RNAseq evaluates the expressed RNA and can be used to
measure gene expression, splice variants and nominate candidate gene fusions. Similar to exome sequencing, complementary
baits can be used to hybridize and capture portions of the transcriptome to focus on selected genes of interest.
NGS transforms today’s biology
● Genome sequencing
● Comparative genomics
● Mutation discovery
● Gene expression
● DNA barcoding
● Metagenomics
● Epigenomics
• 2nd and 3rd generation sequencing instruments are
revolutionizing biological research.
• Earliest impacts have been on cancer genomics and
metagenomics.
• The extreme need for bioinformatics-based analytical
approaches to interpret these large data sets has
revitalized the field and introduced statistical and
mathematical rigor.
• Integration across data sets from DNA, RNA,
methylation, proteomics, etc. presents the next challenge
but provides comprehensive analytical power to inform
biology.
• With newer instruments, clinical applications have
potential for implementation, with appropriate interpretive
algorithms.
1D and 3D genome analysis
Dimensionality of the genome.
The understanding of the human
genome has expanded with
advances of sequencing
technologies, from (A) 1D
sequencing of the human genome to
(B) 2D mapping of structural
variants (SVs) using methods such as
paired-end sequencing, (C) 3D
genome wide chromosomal
conformation capture using ChIA-
PET and Hi-C, and (D) four
dimensions across time.
1D genome analysis. Mapping various genomic and
chromatin features along chromosomes yields 1D data. The
top part of the figure shows a screenshot of the Genome
browser (http://genome.ucsc.edu/), presenting genome
tracks for the ENCODE region Enm005. All data were
generated by the ENCODE pilot project and are publicly
available through the UCSC genome browser. Methods
used to collect data are indicated on the left.
Tracks indicate the presence of genes (GENCODE genes);
the presence of DNase-I-hypersensitive sites; the level of
RNA (expression profiling data from HeLa cells); the level
of acetylated H3 (H3Ac; generated by ChIP-chip); and
levels of RNA polymerase II (Pol2) binding, and levels of
acetylated H4 (H4Ac) and methylated H3 (H3K27me3) in
HeLa cells as determined by ChIP-chip.
The bottom part of the figure shows an analysis of
chromosomal domains, by using a combination of wavelet
analysis and Hidden-Markov-Model segmentation of 1D
datasets, including replication timing, DNase I
hypersensitivity and histone modifications.
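The Hidden-Markov-Model part of such a segmentation can be sketched with the Viterbi algorithm on a single binarised track. This is only an illustration: the wavelet step is omitted, the two states and all probabilities are invented, and the analysis in the figure combined several data types.

```python
import math

STATES = ("repressed", "active")
# All probabilities below are invented for illustration.
START = {"repressed": 0.5, "active": 0.5}
TRANSITION = {"repressed": {"repressed": 0.9, "active": 0.1},
              "active": {"repressed": 0.1, "active": 0.9}}
EMISSION = {"repressed": {0: 0.8, 1: 0.2},  # mostly low signal
            "active": {0: 0.2, 1: 0.8}}     # mostly high signal

def viterbi(observations):
    """Return the most probable state path for a sequence of 0/1 signal bins."""
    scores = {s: math.log(START[s]) + math.log(EMISSION[s][observations[0]])
              for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for state in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANSITION[p][state]))
            new_scores[state] = (scores[prev] + math.log(TRANSITION[prev][state])
                                 + math.log(EMISSION[state][obs]))
            pointers[state] = prev
        scores = new_scores
        backpointers.append(pointers)
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def segments(path):
    """Collapse a state path into (state, first_bin, last_bin) domains."""
    result, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            result.append((path[start], start, i - 1))
            start = i
    return result

signal = [0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
domains = segments(viterbi(signal))
print(domains)
```

With these parameters the isolated high bin at position 3 and the isolated low bin at position 9 are absorbed into their neighbours, so the track splits into one repressed and one active domain. That smoothing is the reason to use an HMM here instead of thresholding each bin separately.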
3D genome analysis. The spatial organization of genomes can be studied
using single-cell methods or using population-based methods, and at different
resolutions or length scales (A-D).
(A) A hypothetical pair of metaphase chromosomes. 1D compartmentalization is
indicated: constitutive heterochromatin domains include the centromere (cen),
pericentromeric heterochromatin (subcen het) and telomeres (tel). Chromosome arms
further consist of alternating active and repressed domains (indicated by different
colors). Numbers indicate (chromosomal) regions to be analysed by 3D methods in
panels B, C and D.
(B) Spatial organization of chromosomes shown in A in the interphase nucleus.
Chromosomal regions that are located far apart on the same chromosome (2 and 3) or
located on different chromosomes (1 and 2, 1 and 3) can colocalize in 3D to form
spatial compartments.
(C) A higher-resolution (Mb scale) analysis of cis and trans associations of
chromosomal regions 1 to 3 shown in B. At this resolution, associations of groups of
genes can be detected surrounding subnuclear structures such as transcription
factories (green circles) and splicing bodies [characterized by the presence of the
splicing factor SC35 (red)]. For example, a trans interaction between regions 1 and 2
can occur through colocalization to the same transcription factory or to two different
transcription factories that are both associated with one SC35 granule.
(D) High-resolution (Kb scale) analysis of 3D folding and long-range associations
that can be studied using 3C-based methods. At this scale, specific looping
interactions can be detected between genes and regulatory elements. This scheme
provides an example of a 3C analysis in which the interaction probability of a single
defined genomic element is mapped throughout the larger region 3 (right). Peaks in this
3C map indicate long-range interactions that suggest a looped conformation
(indicated on the left).
Schematic representation of 3C-based
methods.
There are many methods derived from the original
3C design. Here, we present a few popular
methods. In brief, cells are cross-linked, and
chromatin is digested by restriction enzymes or
sonicated. The structures of protein complexes
containing DNA are preserved. These complexes
are then diluted to a very low concentration, and
ligation reactions are performed. Different
amplification strategies are used to measure the
relative cross-linking efficiency between loci.
3C is used to detect one specific interaction.
4C detects all possible interacting regions of one
given locus.
5C and HiC provide “many-to-many” interacting
efficiencies in a large genomic region or the whole
genome.
ChIA-PET includes immunoprecipitation to
specifically examine the long-range interactions
associated with a specific protein.
3C (chromosome conformation capture) is a high-throughput molecular
biology technique used to analyze the organization of chromosomes in a cell's
natural state. Studying the structural properties and spatial organization of
chromosomes is important for the understanding and evaluation of the
regulation of gene expression, DNA replication and repair, and recombination.
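The difference between the 3C, 4C and 5C/Hi-C readouts can be shown on one contact matrix built from ligation products. In this sketch the loci are numbered bins and the ligation pairs are invented; mapping, filtering and normalisation are left out.

```python
def contact_matrix(ligation_pairs, n_loci):
    """Count ligation products between loci; the matrix is symmetric."""
    matrix = [[0] * n_loci for _ in range(n_loci)]
    for i, j in ligation_pairs:
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix

# Invented ligation products between four hypothetical loci (0 to 3).
pairs = [(0, 1), (0, 1), (0, 3), (1, 2), (0, 3), (0, 3), (2, 3)]
matrix = contact_matrix(pairs, n_loci=4)

one_to_one = matrix[0][3]  # 3C-style question: how often do loci 0 and 3 ligate?
one_to_all = matrix[0]     # 4C-style profile: every locus contacting locus 0
many_to_many = matrix      # 5C/Hi-C-style view: all pairs at once

print(one_to_one, one_to_all)
```

ChIA-PET would fill the same kind of matrix, but only with ligation products pulled down together with the protein of interest.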
Three-dimensional interpretation (left) of
regulatory and transcriptional complexity in one-
dimensional genome representation (right).
(A) The genome forms large complex clusters and
introspective folded clusters with specialized
transcription compartments. Each of these clusters
correlates to a collection of transcripts and
“background” ChIP-seq enrichment.
(B) Within each cluster the genome is folded to
associate with subnuclear structures containing
transcription factors and machinery, splicing, and
other accessory proteins. These associations
coregulate genes to generate interleaved complex
transcriptional networks of coding (blue) and
noncoding transcripts (green). Proximal cross-linking
with ChIP-seq results in a complex landscape of
enrichment across loci that reflect the folded genome
structure.
(C) Within each gene, local dynamic chromatin
folding determines the association of alternative
promoters and local noncoding RNAs with a shared
regulatory architecture, thereby mediating
coregulated gene expression.
Hype cycles in genome analysis (Biosciences)
Hype cycles
The Hype Cycle is a branded graphical tool developed and used by IT research and advisory firm Gartner for representing the maturity, adoption
and social application of specific technologies. Hype in new media (in the more general media sense of the term "hype") plays a large part in the
adoption of new media forms by society. Terry Flew states that hype (generally the enthusiastic and strong feeling around new forms of media
and technology in which people expect everything will be modified for the better) surrounding new media technologies and their popularization,
along with the development of the Internet, is a common characteristic. But following shortly after the period of 'inflated expectations', as per the
diagram above, the new media technologies quickly fall into a period of disenchantment, which is the end of the primary, and strongest, phase of
hype.