Ruslan Kalendar
http://www.helsinki.fi/people/Ruslan.Kalendar/
DNA, PCR primer design, FastPCR and WEB tools
e-mail: [email protected]
PCR primer design is a critical step in all types of PCR methods.
Design of sub-optimal PCR primers can lead to non-specific amplification or no amplification at all.
The adaptation of PCR for different applications has made it necessary to develop new criteria for PCR primer and probe design to cover uses such as RT-PCR, qPCR, group-specific PCR, unique PCR, multiplex PCR, polymerase extension PCR, multi-fragment assembly cloning (OE-PCR) and others such as PCR with TaqMan™ or molecular beacon probes.
PCR primers design factors
1. High priming efficiency
To achieve this, a good primer should be
- of a reasonably high Tm,
- without dimers, especially on their 3’-ends (to prevent self-extension),
- without hairpin stems, especially on their 3’-ends (to prevent self-priming),
- lacking repetitive sequences to ensure quick and correct annealing.
- All primers/probes in one incubation mixture should not form significant 3’-dimers between each other.
2. High specificity
To achieve this, a good primer should be
- long enough to increase specificity,
- unique, especially at its 3’-end, to avoid false priming,
- moderately stable at its 3’-end (as opposed to highly GC-rich) to ensure that a very short fragment won’t initialize the extension (too low 3’-end stability hurts the priming efficiency).
1. The primary nucleotide sequence of the primer determines linguistic complexity (nucleotide arrangement and composition) and specificity by the uniqueness of the primer 3’ end;
2. Primer melting temperature (dH and dS) and the melting temperature of the primer 3’ end (stability at the 3′ end in primer template complexes will improve the polymerization efficiency);
3. Minimise intra- and inter-primer interactions to avoid primer-dimer formation, including hydrogen bonding that is alternative to Watson-Crick base pairing.
Sequence linguistic complexity (LC) is a measure of the richness of the nucleotide sequence vocabulary.
The complexity values were converted to a percentage value, in which 100% means maximal ‘vocabulary richness’ of a sequence.
Primer sequence LC, %
5’-AAAAAAAAAAAAAAAAAAAAA 8
5’-ACACACACACACACACACACA 15
5’-TTTTTTTTTTGGGGGGGGGAG 36
5’-GCTACCAATGAGAAGGTCACGT 98
5’-TGTTCTCCCATAGCACAAGAGGA 98
5’-TGGCTATTCTGAACCAGCGTTGC 100
Sequence linguistic complexity, specificity, the uniqueness of the primer
Sequence linguistic complexity
Kalendar R, Lee D, Schulman AH 2011. Java web tools for PCR, in silico PCR, and oligonucleotide assembly and analysis. Genomics, 98(2): 137-144.
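Linguistic complexity (LC) values for sequence length (s) are converted to percentages, 100% being the highest level:
$$\mathrm{LC}\,(\%) = \frac{100 \sum_{i=1}^{L} x_i}{E}, \quad \text{where} \quad E = \sum_{i=1}^{L} \begin{cases} s - i + 1, & s < 4^{i} + i - 1 \\ 4^{i}, & s \ge 4^{i} + i - 1 \end{cases}$$
Here $x_i$ is the number of different words of length $i$ observed in the sequence, $E$ is the largest number of such words the sequence could contain, summed over word lengths 1 to $L$, and the maximum word length $L$ is derived from $\log_4$ of the sequence length $s$.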
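The LC values in the table above can be reproduced with a short sketch. It assumes that the observed term counts the distinct words of each length, that the expected term is the smaller of 4^i and s - i + 1, and that the maximum word length is 3 for primers of about 20 nt. The function name and the `max_word` parameter are illustrative and are not taken from FastPCR.

```python
def linguistic_complexity(seq: str, max_word: int = 3) -> float:
    """Return LC (%) as observed/expected distinct words of length 1..max_word.

    max_word = 3 is an assumption that fits ~20-nt primers; it is not a
    documented FastPCR setting.
    """
    seq = seq.upper()
    s = len(seq)
    if s == 0:
        raise ValueError("empty sequence")
    observed = 0
    expected = 0
    for i in range(1, max_word + 1):
        n_windows = s - i + 1          # number of i-mers the sequence contains
        if n_windows <= 0:
            break
        words = {seq[j:j + i] for j in range(n_windows)}
        observed += len(words)         # distinct i-mers actually present
        expected += min(4 ** i, n_windows)  # most distinct i-mers possible
    return 100.0 * observed / expected


if __name__ == "__main__":
    for primer in ("AAAAAAAAAAAAAAAAAAAAA",
                   "ACACACACACACACACACACA",
                   "TTTTTTTTTTGGGGGGGGGAG"):
        print(primer, round(linguistic_complexity(primer)))
```

Under these assumptions the three low-complexity primers come out at 8, 15 and 36%, as tabulated.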
Melting temperature (Tm) calculation
Generally, sequences with a higher fraction of G-C base pairs have a higher Tm than do AT-rich sequences.
However, the Tm of an oligo is not simply the sum of AT and GC base content.
Over the past 50 years, a large number of alternative methods for predicting DNA duplex Tm have been published.
The simplest equation based on base content is the “Wallace rule”:
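$$T_m\,(^\circ\mathrm{C}) = 2\,(L + G + C)$$
where L is the oligonucleotide length and G and C are the numbers of guanines and cytosines (equivalently, 2 °C for each A or T plus 4 °C for each G or C).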
Wallace RB, Shaffer J, Murphy RF, Bonner J, Hirose T, Itakura K (1979) Hybridization of synthetic oligodeoxyribonucleotides to ΦX 174 DNA:
the effect of single base pair mismatch. Nucleic Acids Research 6 (11):3543-3558.
A somewhat more advanced base content model is the salt-adjusted Tm calculation:
$$T_m\,(^\circ\mathrm{C}) = 77.1 + 11.7 \log_{10}[\mathrm{K}^+] + \frac{41\,(G + C) - 528}{L}$$
where L is the length of the hybrid duplex in base pairs.
Nonetheless, this formula does not include the bimolecular initiation that is present in oligonucleotides, does not account for sequence-dependent effects, and does not account for terminal end effects that are present in oligonucleotide duplexes.
Thus, this equation works well for DNA polymers, where sequence-dependent effects are averaged out, and for long duplexes (greater than 40 base pairs), but breaks down for the short oligonucleotide duplexes that are typically used for PCR.
von Ahsen N, Wittwer CT, Schutz E (2001) Oligonucleotide melting temperatures under PCR conditions: Nearest-neighbor corrections for Mg2+, deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas. Clin Chem 47 (11):1956-1961
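The two base-content formulas can be sketched in a few lines. The function names and the default potassium concentration of 50 mM are assumptions made for illustration; the constants are those of the Wallace rule and of the salt-adjusted formula in this section.

```python
import math


def wallace_tm(primer: str) -> float:
    """Wallace rule: 2 degrees C per A/T plus 4 per G/C, i.e. 2 * (L + G + C)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    return 2.0 * (len(seq) + gc)


def salt_adjusted_tm(primer: str, k_molar: float = 0.05) -> float:
    """Salt-adjusted base-content Tm; k_molar (mol/L of K+) is an assumed default."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    return 77.1 + 11.7 * math.log10(k_molar) + (41.0 * gc - 528.0) / len(seq)


if __name__ == "__main__":
    for primer in ("CACACACACACACACACACA", "AAAAAAAAAACCCCCCCCCC"):
        print(primer, wallace_tm(primer), round(salt_adjusted_tm(primer), 1))
```

Both example primers have the same length and GC content, so either formula gives them the same Tm: base-content models cannot see the order of the bases.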
Base stacking interactions must also be taken into account, so the actual sequence must be known to predict Tm accurately.
The effects of neighbouring bases contributed through base stacking are called “nearest neighbour effects”. They are accounted for mathematically in calculations that use experimentally determined nearest neighbour (NN) thermodynamic parameters.
In the NN calculation of Tm, dH is the enthalpy of helix formation, dS is the entropy of helix formation, R is the molar gas constant (1.987 cal/(K·mol)) and c is the nucleic acid molar concentration.
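A common form of the NN expression is Tm = dH / (dS + R ln(c/f)) - 273.15, with f = 4 for two different strands and f = 1 for a self-complementary one. The sketch below applies it with the unified parameters of SantaLucia (1998) at 1 M NaCl. It applies no salt, Mg2+ or dangling-end corrections, so its output is not expected to match FastPCR; the function names and the default concentration are assumptions for illustration.

```python
import math

R = 1.987  # molar gas constant, cal/(K*mol)

# Unified NN parameters (SantaLucia 1998, 1 M NaCl):
# dH in kcal/mol, dS in cal/(K*mol), keyed by the 5'->3' dinucleotide.
NN = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}
# Initiation terms depend on whether the terminal pair is A.T or G.C.
INIT = {"A": (2.3, 4.1), "T": (2.3, 4.1), "G": (0.1, -2.8), "C": (0.1, -2.8)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def nn_thermo(primer: str) -> tuple[float, float]:
    """Sum dH (kcal/mol) and dS (cal/(K*mol)) over initiation and NN stacks."""
    seq = primer.upper()
    dh = INIT[seq[0]][0] + INIT[seq[-1]][0]
    ds = INIT[seq[0]][1] + INIT[seq[-1]][1]
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NN:          # e.g. TG is the same stack as CA
            step = revcomp(step)
        dh += NN[step][0]
        ds += NN[step][1]
    return dh, ds


def nn_tm(primer: str, conc_molar: float = 0.5e-6) -> float:
    """Two-state NN Tm in degrees C; conc_molar is an assumed total strand concentration."""
    seq = primer.upper()
    dh, ds = nn_thermo(seq)
    self_comp = seq == revcomp(seq)
    if self_comp:
        ds += -1.4                  # symmetry correction
    f = 1.0 if self_comp else 4.0
    return 1000.0 * dh / (ds + R * math.log(conc_molar / f)) - 273.15


if __name__ == "__main__":
    for primer in ("CACACACACACACACACACA", "AAAAAAAAAACCCCCCCCCC"):
        print(primer, round(nn_tm(primer), 1))
```

Unlike a base-content formula, this calculation separates two primers of identical length and GC content, because the stacks they contain differ.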
The complexity of duplex stability calculations is easiest to see in the free energy (ΔG, another measure of DNA helix stability), because its formula is simpler:
ΔG = ΔH – TΔS
(T is the temperature in K).
Consider the ΔG calculation for a 4-mer, TCGA, hybridizing to a longer strand.
The ΔG is then given by this surprisingly long sum:
ΔG (TC) + ΔG (CG) + ΔG (GA) + initiation ΔG for T + initiation ΔG for A + ΔG of the 3’-dangling end AC + ΔG of the 5’-dangling end GT
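As a worked illustration, the stacking and initiation terms can be filled in with the unified free-energy values at 37 °C of SantaLucia (1998), in kcal/mol at 1 M NaCl. These numbers are not given in the presentation and are used here as an assumption. The two dangling-end terms depend on the flanking bases of the longer strand and are left symbolic.

```latex
\begin{aligned}
\Delta G^{\circ}_{37}
  &= \Delta G(\mathrm{TC}) + \Delta G(\mathrm{CG}) + \Delta G(\mathrm{GA})
   + \Delta G_{\mathrm{init}}(\mathrm{T}) + \Delta G_{\mathrm{init}}(\mathrm{A})
   + \Delta G_{3'\text{-dangle}} + \Delta G_{5'\text{-dangle}} \\
  &= (-1.30) + (-2.17) + (-1.30) + 1.03 + 1.03
   + \Delta G_{3'\text{-dangle}} + \Delta G_{5'\text{-dangle}} \\
  &= -2.71\ \text{kcal/mol} + \Delta G_{3'\text{-dangle}} + \Delta G_{5'\text{-dangle}}
\end{aligned}
```

The three stacks contribute -4.77 kcal/mol, and the two A·T initiation penalties take back 2.06 kcal/mol, so for a duplex this short the end terms are a large share of the total.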
Sequence                CG%   Empirical Tm   Thermodynamic Tm (dS, dH)
CACACACACACACACACACA    50    54.0           56.1
AAAAAAAAAACCCCCCCCCC    50    54.0           57.5
Figure: Comparison of different Tm calculations (18 nt); axis label: Tm. Plotted series: nearest neighbour thermodynamic parameters (SantaLucia, 1998) and the empirical formula 0.41·CG% + 75.1 + 11.7·lg[K+] - 528/L.
A-form and the B-A transition
The B-form is the most frequently observed conformation of DNA.
The base pairs in the B-form are perpendicular to the double-helix axis.
As (A + T) content increases, the negative band of the circular dichroism (CD) spectrum becomes deeper and conformational variability increases.
Poly[d(A)]·poly[d(T)] and poly[d(G)]·poly[d(C)] adopt unusual B-forms.
http://fbio.uh.cu/sites/genmol/adic/na_arch.htm
A-form and the B–A transition
A-form is a constitutive conformation of RNA or RNA/DNA hybrid.
DNA adopts the A-form in aqueous ethanol and other solutions.
Some molecules of DNA (e.g. poly[d(A)]·poly[d(T)]) do not adopt the
A-form conformation at all.
Others [e.g. (G + C) rich DNA fragments] exhibit A-form features
even in aqueous solution.
The Z-form and the B–Z transition
The base pairs in the left-handed Z-DNA double helix are oriented oppositely, with respect to the backbone, to those in the B- and A-forms.
As with the B-form, there are several variants of the Z-form.
In contrast to the B–A transition, the B–Z transition is slow.
This is connected with the base pair flip that is required during the
transition, which is a kinetically difficult process.
Hydrogen bonding in Watson–Crick and Hoogsteen base pairs
(A) Watson–Crick A·T and G·C base
pairs.
(B) Hoogsteen base pair formation
between adenine and thymine,
guanine and cytosine.
Some structures showing possible reasons why large DNA (L-DNA) constructs do not self-assemble from more
than approximately a dozen synthetic single-stranded DNA oligonucleotides.
Top from left to right: The presence of strong (C:G) and weak (T:A) nucleobase pairs complicates the design of
self-assembling fragments. G-quartets can arise from G-rich sequences, with major groove interactions involving
hydrogen bonding to the “Hoogsteen edge” of purines. Wobble pairing can compete with Watson–Crickery.
Bottom. Even if Watson–Crickery were the only way for single stranded DNA sequences to interact, the low
information density of four-nucleotide DNA allows easy off-target hybridization and unimolecular hairpin formation.
Unimolecular processes (such as hairpin formation) compete with the desired intermolecular hybridization,
especially at low concentrations of oligonucleotide.
Watson–Crick pairing follows two rules of complementarity:
(a) size complementarity (large purines pair with small pyrimidines) and
(b) hydrogen bonding complementarity (hydrogen bond acceptors, A, pair with hydrogen bond donors D).
Rearranging donor and acceptor groups on the nucleobases creates an artificially expanded genetic
information system (AEGIS), whose components can independently pair. AEGIS adds information
density to the DNA oligonucleotides, thereby diminishing off-target hybridization and other undesired
aggregation/folding motifs. With strength comparable to the G:C pair, AEGIS components
form S:B, Z:P, V:J, and K:X pairs.
Quadruplex structures
Four guanines can hydrogen bond in a square arrangement to form a G-quartet, with
a Hoogsteen G–G pairing pattern.
Hoogsteen pairing also allows G-rich single-stranded DNA and RNA to form secondary structures called G-quadruplexes (G4-DNA and G4-RNA), at least in vitro.
A G-quadruplex needs four triplets of G, separated by short spacers. This permits assembly of planar quartets, which are composed of stacked associations of Hoogsteen-bonded guanines.
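A minimal sketch of motif-based detection of such sequences: four runs of at least three G separated by short spacers. The spacer length of 1 to 7 nt is an assumption borrowed from commonly used G4 motif searches, not a value stated here, and the function name is illustrative. A match only flags a candidate; it does not show that a quadruplex actually forms.

```python
import re

# Four G-runs (>= 3 G each) separated by three spacers of 1-7 nt (assumed range).
G4_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_candidates(seq: str) -> list[tuple[int, str]]:
    """Return (start, matched text) for each non-overlapping candidate motif."""
    seq = seq.upper()
    return [(m.start(), m.group()) for m in G4_MOTIF.finditer(seq)]


if __name__ == "__main__":
    print(find_g4_candidates("GGGTTAGGGTTAGGGTTAGGG"))  # four G triplets
    print(find_g4_candidates("GGGTTAGGGTTAGGG"))        # only three G triplets
```

To screen the complementary strand, the same search can be run on the reverse complement of the sequence.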
Circular dichroic (CD) spectroscopy spectra of quadruplexes
Kypr J, Kejnovska I, Renciuk D, Vorlickova M (2009) Circular dichroism and conformational polymorphism of DNA. Nucleic Acids Research 37 (6):1713-1725.
CD spectra of guanine quadruplexes
(A) Time-dependent formation of a parallel-stranded
quadruplex of d(G4) stabilized by 16 mM K+.
(B) Na+-induced formation of an anti-parallel
bimolecular quadruplex of d(G4T4G4). The
triangles in the sketches indicate guanines and
point in the 5′–3′ direction. The G-tetrad is
shown in the middle.
(C) CD spectra reflecting the acid-induced transition of a DNA fragment, d(TCCCCACCTTCCCCACCCTCCCCACCCTCCCCA), of a c-myc human oncogene into an intercalated cytosine quadruplex.
The triangles in the sketch indicate cytosines and
point in the 5′–3′ direction. The C+·C pair is shown
in the insert.
Cytosine quadruplexes
DNA strands rich in cytosine also generate quadruplexes.
These consist of two parallel homoduplexes connected through hemi-protonated
C·C+ pairs.
The two duplexes are mutually intercalated in an anti-parallel
orientation. Therefore, these structures are called intercalated
or i-tetraplexes.
Formation of cytosine quadruplexes is promoted by slightly
acid pH, which is needed for C·C+ pair hemiprotonation.
Similar to guanine quadruplexes, intermolecular cytosine
quadruplexes are formed with slow kinetics.
DNA fragments rich in guanine and adenine
DNA fragments rich in guanine and adenine exhibit cooperatively
melting conformers that differ from classical structures.
The first conformer is an anti-parallel homoduplex containing G·A
pairs.
The second conformer of the alternating GA sequence is a parallel homoduplex with G·G and A·A pairs; its CD spectrum is similar to that of parallel guanine quadruplexes but with smaller positive amplitudes.
This indicates that the guanine–guanine stacking does not change significantly and that the duplex formation is mediated by inter-strand adenine–adenine interactions.
Ethanol and even dimethylsulphoxide (DMSO), both DNA
denaturing agents, stabilize the single-stranded ordered
d(GA)n structure as does acid pH.
Primer-dimer detection criteria
• Detection of 3’-end and internal primer-dimers;
• Tm prediction for primer-dimers with mismatches, for standard and degenerate oligonucleotide bases (B, D, H, K, M, N, R, S, V, W, Y) and for modifications (inosine, uridine or LNA), using nearest neighbour thermodynamic parameters;
• Detection of non-Watson-Crick base pairing, such as Hoogsteen base-pairing: base triads in a DNA triple helix structure and G-quadruplexes.
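The 3’-end criterion can be illustrated with a deliberately simple sketch: a primer is flagged if its last few bases are perfectly Watson-Crick complementary to any stretch of the other primer, or of itself. The function names and the `tail` length are assumptions. Unlike the criteria listed above, the sketch ignores mismatches, degenerate bases and thermodynamic stability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_3prime_dimer(primer_a: str, primer_b: str, tail: int = 5) -> bool:
    """True if the 3'-terminal `tail` bases of primer_a can pair within primer_b.

    Antiparallel pairing means the reverse complement of the tail must occur
    in primer_b; pass the same primer twice to test for a self-dimer.
    """
    tail_seq = primer_a.upper()[-tail:]
    return reverse_complement(tail_seq) in primer_b.upper()


if __name__ == "__main__":
    # Palindromic 3' end (GAATTC): the primer can anneal to another copy of itself.
    print(has_3prime_dimer("ACGTTGCAGAATTC", "ACGTTGCAGAATTC", tail=6))
    # 3' end GGTCC pairs with GGACC inside the second primer.
    print(has_3prime_dimer("ATGCATGCAAGGTCC", "TTTGGACCTTT"))
    # No complementary stretch available.
    print(has_3prime_dimer("CAGTCAGTCAGTACCA", "AAAAAAAAAAAA"))
```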
Internal single mismatches
Base pairing stability in order of decreasing stability:
G-C > A-T > G·G > G·T ≥ G·A > T·T ≥ A·A > T·C ≥ A·C ≥ C·C
Guanine is the most universal base, since it forms the strongest base pair and the strongest mismatches.
On the other hand, cytosine is the most discriminating base, since it forms the strongest pair and the three weakest mismatches.
Designing Forward and Reverse primers to have matching Tm’s is the best strategy to optimize PCR
The Tm is the temperature at which half the primer strands are bound to target.
The PCR annealing temperature is typically chosen to be 10°C below the Tm.
However, different primers have different dH of binding, which results in different slopes at the Tm of the melting transition.
Thus, the hybridization behaviour at the Tm is not the same as the behaviour at the annealing temperature.
The quantity that is important for PCR design is the amount of primer bound to target at the annealing temperature.
If equal fractions of the two primers are bound, they will be extended equally by DNA polymerase, resulting in efficient amplification.
Differences in primer binding are amplified with each cycle of PCR, thereby reducing the amplification efficiency and providing opportunity for artifacts to develop.
SantaLucia J, Jr. (2007) Physical principles and visual-OMP software for optimal PCR design. Methods Mol Biol 402:3-34.
Illustration of hybridization profiles of primers with two different design strategies.
In the left panel, the Tm’s are matched at 68.6°C, but at the annealing temperature of 58°C, primer B (squares) binds 87% and primer A (diamonds) binds 97%.
This would lead to unequal hybridization and polymerase extension, thus reducing the efficiency of PCR. In the right panel, the dG at 58°C of the two primers is matched by redesigning primer B.
The result is that both primers are now 97% bound, and thus optimal PCR efficiency would be observed. Notice that the Tm’s of the two primers are not equal in the right panel.
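Why equal Tm does not mean equal binding can be sketched with a two-state model. Assuming the primer is in large excess over the target and Tm is the temperature at which half the target is bound, the bound fraction at temperature T is 1 / (1 + exp((dH/R)(1/T - 1/Tm))). The dH values below are hypothetical, chosen only to show the trend; they are not the values behind the figure.

```python
import math

R = 1.987  # molar gas constant, cal/(K*mol)


def fraction_bound(dh_kcal: float, tm_c: float, temp_c: float) -> float:
    """Two-state bound fraction at temp_c for a primer with the given dH and Tm.

    Assumes primer in large excess and Tm defined as the half-bound point.
    """
    tm_k = tm_c + 273.15
    t_k = temp_c + 273.15
    exponent = (dh_kcal * 1000.0 / R) * (1.0 / t_k - 1.0 / tm_k)
    return 1.0 / (1.0 + math.exp(exponent))


if __name__ == "__main__":
    # Two hypothetical primers with the same Tm but different binding enthalpy.
    for name, dh in (("steep transition", -100.0), ("shallow transition", -40.0)):
        print(name, round(fraction_bound(dh, tm_c=68.6, temp_c=58.0), 3))
```

The primer with the smaller enthalpy has a shallower melting transition and is less completely bound 10°C below the shared Tm, which is the effect the left panel describes.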
Primer quality (virtual PCR efficiency) determination
An abstract parameter called Primer Quality (PQ) can help to estimate the efficiency of primers for PCR.
PQ is calculated by the consecutive summation of points according to the following parameters:
total sequence and purine–pyrimidine sequence complexity, and the melting temperatures of the whole primer and of its 3′ and 5′ termini;
self-complementarity, which gives rise to possible dimer and hairpin structures, reduces the final value.
Almost all primers of “excellent” quality can be used at annealing temperatures from 68°C to 72°C without loss of PCR efficiency, and they show stable, efficient amplification over a range of PCR annealing temperatures.
Higher quality primers are not only better for PCR efficiency but are also more immune to changing PCR conditions.
Primer design selection criteria
Criteria                                    Range      Ideal
Length (nt)                                 >11        >20
Tm range (°C) a                             45 – 75    60 – 68
Tm of 12 bases at 3’-end (°C) a             34 – 48    41 – 47
GC (%)                                      30 – 70    50
3’-end composition (5’-nnn-3’)              nnn        ssa, sws, wss
Sequence linguistic complexity (LC, %) b    50 – 100   >95
Sequence Quality (PQ, %)                    50 – 100   >95
a Nearest neighbour thermodynamic parameters.
b Sequence linguistic complexity measurement was performed using the alphabet-capacity l-gram method.
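A few of these criteria depend only on the sequence and can be checked directly. The sketch below tests length, GC content and the 3’-end triplet against the table, reading s as G or C and w as A or T (IUPAC codes) and a as adenine. That reading of the 3’-end patterns, and all function and key names, are assumptions. Tm, LC and PQ are left out because they need the models described earlier.

```python
import re

# Ideal 3'-end triplets from the table: ssa, sws, wss (s = G/C, w = A/T, a = A).
IDEAL_3PRIME = re.compile(r"(?:[GC][GC]A|[GC][AT][GC]|[AT][GC][GC])$")


def check_primer(primer: str) -> dict:
    """Check the sequence-only criteria: length, GC (%) and 3'-end composition."""
    seq = primer.upper()
    gc = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return {
        "length_in_range": len(seq) > 11,       # Range column: >11 nt
        "length_ideal": len(seq) > 20,          # Ideal column: >20 nt
        "gc_percent": round(gc, 1),
        "gc_in_range": 30.0 <= gc <= 70.0,      # Range column: 30-70 %
        "ideal_3prime_end": bool(IDEAL_3PRIME.search(seq)),
    }


if __name__ == "__main__":
    print(check_primer("TGGCTATTCTGAACCAGCGTTGC"))
    print(check_primer("AAAAAAAAAAAAAAAAAAAAA"))
```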
The secondary (non-specific) binding test
The specificity of the oligonucleotides is one of the most important factors for good PCR;
optimal primers should hybridize only to the target sequence, particularly when complex genomic DNA is used as the template.
Amplification problems can arise due to primers annealing to repetitious sequences (retrotransposons, DNA transposons, or tandem repeats).
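A naive version of such a test counts the exact binding sites of a primer on both strands of a template; more than one site signals a specificity risk. This is only a sketch with assumed names. Real tools, including those discussed here, also tolerate mismatches and weight the 3’-end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_sites(primer: str, template: str) -> list[tuple[str, int]]:
    """Return (strand, start) for every exact site, overlapping sites included.

    "+" means the primer sequence itself occurs in the template as written;
    "-" means its reverse complement occurs, i.e. the primer reads the other way.
    """
    primer = primer.upper()
    template = template.upper()
    sites = []
    for strand, target in (("+", primer), ("-", reverse_complement(primer))):
        start = template.find(target)
        while start != -1:
            sites.append((strand, start))
            start = template.find(target, start + 1)
    return sites


if __name__ == "__main__":
    template = "TTACGTACGGTTTTCCGTACGTAA"
    # One site on each strand: this single primer could amplify on its own.
    print(binding_sites("ACGTACGG", template))
```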
Comparison of primer design and oligonucleotide analysis tools
+ Feature supported; - Feature not supported
Values are listed in this tool order: Primer-BLAST (Primer3) | IDT SciTools: PrimerQuest, OligoAnalyzer 3.1 | PerlPrimer | BiSearch Web server | PrimerDigital Web Tools
Primer or probe design, length (nt): 15-30 | 16-35 | 12-30 | 10-35 | 12-500
Limit for sequence length (nt): 50,000 | no limit | no limit | 5,000 | no limit
Relative calculation speed: quick | slow | slow | very slow | very quick
Multiple templates (sequences or primers) and multiple targets inside each sequence: - | - | - | - | +
Individual options for each sequence: + | + | + | + | +
Degenerate nucleotides in all operations (Tm calculation, searches and probe, primer design, etc.): - | + | - | + | +
LNA and other nucleotide modifications: - | + | - | - | +
High-throughput runs enabled: - | - | - | - | +
Calculation of optimal annealing temperature: - | - | - | - | +
Primer's 3'-end cross and self-dimers: - | + | + | + | +
G-quadruplex detection: - | - | - | - | +
BLAST search: + | - | + | + | -
Internal sequence test: - | - | - | - | +
External (specific library) test: + | - | + | + | +
Multiplex with pair primers and/or single primers: - | - | - | - | +
In silico for multiple sequences and primers: - | - | - | + | +
Universal and unique: - | - | - | - | +
Inverted and circular sequences: - | - | - | - | +
Bisulphite modification assays and in silico: - | - | + | + | +
Polymerase Extension multi-fragment assembly cloning: - | - | - | - | +
Oligonucleotide assembly for LCR: - | - | - | - | +
Comparison of Primer Quality with some online software
PCR primers design software, in silico PCR, and oligonucleotide assembly and analysis WEB tools
http://primerdigital.com/tools/
Analyze Features:
- general information;
- oligonucleotide (Tm, dG, dS and dH), melting temperature calculation for standard and degenerate oligonucleotides including LNA and other modifications;
- evaluation of PCR efficiency;
- linguistic complexity;
- dimer and G/C-quadruplex detection;
- dilution and resuspension calculator.
FastPCR is an integrated tool for PCR primers or probe design, in silico PCR, oligonucleotide assembly and analyses, alignment and repeat searching
http://primerdigital.com/fastpcr.html
The FastPCR software is an integrated tools environment that provides comprehensive and professional facilities for designing any kind of PCR primers: standard, long distance, inverse and real-time PCR (LUX and self-reporting); multiplex PCR; group-specific PCR (universal primers for phylogenetically related DNA sequences) and unique PCR (specific primers for each of the phylogenetically related DNA sequences); overlap extension PCR (OE-PCR) for multi-fragment assembly cloning; single primer PCR (design of PCR primers from closely located inverted repeats); automatic detection of SSR loci with direct PCR primer design; degenerate PCR from amino acid sequences; Polymerase Chain Assembly (PCA); and much more.
The software supports combinations of normal and degenerate primers in all tools, and the melting temperature calculation is based on nearest neighbour thermodynamic parameters.
The “in silico” (virtual) PCR tool searches for primers or probes against whole genome(s) or a list of chromosomes, predicting probable PCR products and locating potential mismatched binding sites of the specified primers or probes. The “in silico” oligonucleotide search is helpful for discovering target binding sites, with calculation of the melting temperature and the PCR annealing temperature.
Long oligonucleotides can be designed for microarray analyses, and dual-labeled oligonucleotides for probes such as molecular beacons.
A comprehensive primer test covers the melting temperature calculation for standard and degenerate oligonucleotides, the primer's PCR efficiency and linguistic complexity, and a dilution and resuspension calculator.
Primers (probes) are analyzed for all primer secondary structures, including G-quadruplexes (Hoogsteen base pairs), hairpins, self-dimers and cross-dimers in primer pairs.
FastPCR can handle long sequences and sets of nucleic acid or protein sequences; it allows an individual task and parameters for each given sequence, and several different tasks can be joined in a single run. It also allows sequence editing and database analysis.
Efficient and complete detection of various types of repeats has been developed and built into the program, with visualisation.
The program includes various bioinformatics tools: analysis of sequences for GC or AT skew, CG content, purine-pyrimidine skew and linguistic sequence complexity; generation of random DNA sequences; analysis of restriction enzymes (types I, II and III) and homing endonucleases; finding or creating restriction enzyme recognition sites in coding sequences; clustering of sequences and consensus sequence generation; and analysis of sequence similarity and conservation.
Alignment algorithm and repeat search. An efficient tool for discovering LTR retrotransposons
The analysis shows that 26.5% of the 30.4-Mb sequence of Arabidopsis thaliana chromosome 1 is covered by repeat sequences. The centromere of chromosome 1 is easy to detect in the middle of the picture from the characteristic structure of its clusters of centromeric repeats; chromosome duplications are also shown.
FastPCR software
http://primerdigital.com/fastpcr.html
online Java Tools
http://primerdigital.com/tools/
Kalendar R, Lee D, Schulman AH 2011. Java web tools for PCR, in silico PCR, and oligonucleotide assembly and analysis. Genomics, 98(2): 137-144.
Kalendar R, Lee D, Schulman AH 2009. FastPCR Software for PCR Primer and Probe Design and Repeat Search. Genes, Genomes and Genomics, 3 (1): 1-14.