DNA, PCR primer design, FastPCR and WEB tools

Ruslan Kalendar

http://www.helsinki.fi/people/Ruslan.Kalendar/


e-mail: [email protected]


PCR primer design is a critical step in all types of PCR methods.

Design of sub-optimal PCR primers can lead to non-specific amplification or no amplification at all.

The adaptation of PCR to different applications has made it necessary to develop new criteria for PCR primer and probe design, covering uses such as RT-PCR, qPCR, group-specific PCR, unique PCR, multiplex PCR, polymerase extension PCR, multi-fragment assembly cloning (OE-PCR) and probe-based methods such as PCR with TaqMan™ probes or molecular beacons.

PCR primers design factors

1. High priming efficiency

To achieve this, a good primer should be:
- of a reasonably high Tm,
- without dimers, especially at its 3’-end (to prevent self-extension),
- without hairpin stems, especially at its 3’-end (to prevent self-priming),
- lacking repetitive sequences, to ensure quick and correct annealing.

In addition, the primers/probes in one incubation mixture should not form significant 3’-dimers with each other.

2. High specificity

To achieve this, a good primer should be:
- long enough to increase specificity,
- unique, especially at its 3’-end, to avoid false priming,
- moderately stable at its 3’-end (as opposed to highly GC-rich), so that a very short fragment cannot initiate extension (although too low a 3’-end stability hurts priming efficiency).

1. The primary nucleotide sequence of the primer determines its linguistic complexity (nucleotide arrangement and composition) and its specificity, through the uniqueness of the primer 3’-end;

2. Primer melting temperature (from dH and dS) and the melting temperature of the primer 3’-end (stability at the 3’-end of primer-template complexes improves polymerization efficiency);

3. Minimal intra- and inter-primer interactions, to avoid primer-dimer formation, including hydrogen bonding that is alternative to Watson-Crick base pairing.

PCR primers design factors

Sequence linguistic complexity (LC) is a measure of the richness of the nucleotide sequence vocabulary.

The complexity values were converted to a percentage value, in which 100% means maximal ‘vocabulary richness’ of a sequence.

Primer sequence                 LC, %
5’-AAAAAAAAAAAAAAAAAAAAA        8
5’-ACACACACACACACACACACA        15
5’-TTTTTTTTTTGGGGGGGGGAG        36
5’-GCTACCAATGAGAAGGTCACGT       98
5’-TGTTCTCCCATAGCACAAGAGGA      98
5’-TGGCTATTCTGAACCAGCGTTGC      100

Sequence linguistic complexity, specificity, the uniqueness of the primer

Sequence linguistic complexity

Kalendar R, Lee D, Schulman AH 2011. Java web tools for PCR, in silico PCR, and oligonucleotide assembly and analysis. Genomics, 98(2): 137-144.

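Linguistic complexity (LC) values for sequence length (s) are converted to percentages, 100% being the highest level:

$$LC(\%) = \frac{100}{E}\sum_{i=1}^{L} x_i$$

where

$$E = \sum_{i=1}^{L} \begin{cases} s - i + 1, & s - i + 1 < 4^i \\ 4^i, & s - i + 1 \ge 4^i \end{cases}$$

Here x_i is the number of distinct words of length i observed in the sequence, E is the largest number of such words that a sequence of length s could contain, and L is the longest word length counted, which grows with log_4 of s (L = 3 reproduces the values tabulated for the 21 to 23 nt primers).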
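The LC calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the FastPCR implementation: the function name `linguistic_complexity` is hypothetical, and the choice of L as the smallest word length with 4^L > s is an assumption that happens to reproduce the tabulated values.

```python
def linguistic_complexity(seq):
    """Return the linguistic complexity (LC, %) of a DNA sequence."""
    seq = seq.upper()
    s = len(seq)
    if s == 0:
        raise ValueError("empty sequence")
    # ASSUMPTION: L is the smallest word length with 4**L > s
    # (equivalent to floor(log4(s)) + 1); L = 3 for 21-23 nt primers.
    max_word = 1
    while 4 ** max_word <= s:
        max_word += 1
    observed = 0  # sum of x_i: distinct words of length i actually present
    expected = 0  # E: largest possible number of distinct words
    for i in range(1, max_word + 1):
        words = {seq[j:j + i] for j in range(s - i + 1)}
        observed += len(words)
        expected += min(4 ** i, s - i + 1)
    return 100.0 * observed / expected


if __name__ == "__main__":
    for primer in ("AAAAAAAAAAAAAAAAAAAAA", "TGGCTATTCTGAACCAGCGTTGC"):
        print(primer, round(linguistic_complexity(primer)))
```

For a homopolymer only one word of each length is present, so LC is low; a primer that contains every possible di- and trinucleotide its length allows reaches 100%.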

Melting temperature (Tm) calculation

Generally, sequences with a higher fraction of G·C base pairs have a higher Tm than AT-rich sequences.

However, the Tm of an oligo is not simply the sum of its AT and GC base content.

Over the past 50 years, a large number of alternative methods for predicting DNA duplex Tm have been published.

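The simplest equation based on base content is the “Wallace rule”:

$$T_m(^\circ\mathrm{C}) = 2\,(L + G + C)$$

where L is the oligonucleotide length and G + C is the number of G and C bases; this is the same as 2(A + T) + 4(G + C).

Wallace RB, Shaffer J, Murphy RF, Bonner J, Hirose T, Itakura K (1979) Hybridization of synthetic oligodeoxyribonucleotides to ΦX 174 DNA: the effect of single base pair mismatch. Nucleic Acids Research 6 (11):3543-3558.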


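A somewhat more advanced base-content model is the salt-adjusted Tm calculation:

$$T_m(^\circ\mathrm{C}) = 77.1 + 11.7\,\log_{10}[\mathrm{K^+}] + \frac{41\,(G + C) - 528}{L}$$

where L is the length of the hybrid duplex in base pairs and G + C is the number of G·C pairs in it.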

Nonetheless, this formula does not include the bimolecular initiation that is present in oligonucleotides, does not account for sequence-dependent effects, and does not account for the terminal end effects that are present in oligonucleotide duplexes.

Thus, this equation works well for DNA polymers, where sequence-dependent effects are averaged out, and for long duplexes (greater than 40 base pairs), but breaks down for the short oligonucleotide duplexes that are typically used for PCR.

von Ahsen N, Wittwer CT, Schutz E (2001) Oligonucleotide melting temperatures under PCR conditions: Nearest-neighbor corrections for Mg2+, deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas. Clin Chem 47 (11):1956-1961.
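The two base-content formulas above are simple enough to sketch directly. The function names and the default K+ concentration of 50 mM are assumptions made for illustration; neither comes from the slides.

```python
import math


def wallace_tm(seq):
    """Wallace rule: Tm = 2(L + G + C), i.e. 2(A + T) + 4(G + C)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 2 * (len(seq) + gc)


def salt_adjusted_tm(seq, k_molar=0.05):
    """Salt-adjusted base-content Tm.

    k_molar is the monovalent cation concentration in mol/L
    (ASSUMPTION: 0.05 M is only an illustrative default).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 77.1 + 11.7 * math.log10(k_molar) + (41 * gc - 528) / len(seq)


if __name__ == "__main__":
    for primer in ("CACACACACACACACACACA", "AAAAAAAAAACCCCCCCCCC"):
        print(primer, wallace_tm(primer), round(salt_adjusted_tm(primer), 1))
```

Both example sequences have the same length and GC content, so either formula returns the same Tm for them. That is exactly the limitation discussed above: base-content models cannot see the order of the bases.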


Base stacking interactions must also be taken into account, so the actual sequence must be known to predict Tm accurately.

The effects of neighbouring bases contributed through base stacking are called “nearest neighbour effects”; they are accounted for mathematically using experimentally determined nearest neighbour (NN) thermodynamic parameters.

In these calculations, dH is the enthalpy of helix formation; dS is the entropy of helix formation; R is the molar gas constant (1.987 cal/(K·mol)); c is the nucleic acid molar concentration.
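A nearest neighbour Tm calculation can be sketched as follows. The dH and dS values are the unified parameters of SantaLucia (1998) for 1 M NaCl. Everything else is an assumption made for this sketch: the two-state formula Tm = dH / (dS + R ln(c/4)) - 273.15 for non-self-complementary strands, the default strand concentration, the absence of any salt correction, and the function names.

```python
import math

# Unified NN parameters (SantaLucia 1998, 1 M NaCl), keyed by the 5'->3'
# dinucleotide of one strand: (dH in kcal/mol, dS in cal/(K*mol)).
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
# Initiation terms for a terminal G.C or A.T pair, applied at each end.
INIT = {"G": (0.1, -2.8), "C": (0.1, -2.8), "A": (2.3, 4.1), "T": (2.3, 4.1)}
R = 1.987  # molar gas constant, cal/(K*mol)


def nn_thermo(seq):
    """Return (dH, dS) for a perfectly matched duplex of seq."""
    seq = seq.upper()
    dh = sum(NN[seq[i:i + 2]][0] for i in range(len(seq) - 1))
    ds = sum(NN[seq[i:i + 2]][1] for i in range(len(seq) - 1))
    for end in (seq[0], seq[-1]):
        dh += INIT[end][0]
        ds += INIT[end][1]
    return dh, ds


def nn_tm(seq, conc=0.5e-6):
    """Two-state Tm in deg C (ASSUMPTION: total strand concentration conc,
    non-self-complementary strands, 1 M NaCl, no salt correction)."""
    dh, ds = nn_thermo(seq)
    return dh * 1000.0 / (ds + R * math.log(conc / 4.0)) - 273.15


if __name__ == "__main__":
    for primer in ("CACACACACACACACACACA", "AAAAAAAAAACCCCCCCCCC"):
        print(primer, round(nn_tm(primer), 1))
```

Unlike the base-content formulas, this model returns different Tm values for two sequences of identical length and GC content, because the sums run over neighbouring base pairs rather than single bases.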


The complexity of duplex stability calculations is easiest to see from the free energy (dG, another measure of DNA helix stability), because its formula is simpler:

dG = dH – T·dS

(T is the temperature in K).

Here is an example of the dG calculation for a 4-mer, TCGA, hybridizing to a longer strand. The dG is given by a surprisingly long sum:

dG (TC) + dG (CG) + dG (GA) + initiation dG for T + initiation dG for A + dG of the 3’-dangling end AC + dG of the 5’-dangling end GT

Sequence                 CG%   Empirical Tm   Thermodynamic Tm (dS, dH)
CACACACACACACACACACA     50    54.0           56.1
AAAAAAAAAACCCCCCCCCC     50    54.0           57.5

[Figure: Comparison of different Tm calculations (18 nt). Axis label: Tm. Series: nearest neighbour thermodynamic parameters (SantaLucia, 1998) and the base-content formula 0.41·CG% + 75.1 + 11.7·lg[K+] - 528/L.]

A-form and the B-A transition

The B-form is the most frequently observed conformation of DNA.

The base pairs in the B-form are perpendicular to the double-helix axis.

As (A + T) content increases, the negative CD band becomes deeper and conformational variability increases.

Poly[d(A)]·poly[d(T)] and poly[d(G)]·poly[d(C)] adopt unusual B-forms.

http://fbio.uh.cu/sites/genmol/adic/na_arch.htm

A-form and the B–A transition

The A-form is the constitutive conformation of RNA and of RNA/DNA hybrids.

DNA adopts the A-form in aqueous ethanol and other solutions.

Some DNA molecules (e.g. poly[d(A)]·poly[d(T)]) do not adopt the A-form at all.

Others [e.g. (G + C)-rich DNA fragments] exhibit A-form features even in aqueous solution.

The Z-form and the B–Z transition

The base pairs in the Z-DNA (left-handed) double helix have the opposite orientation with respect to the backbone compared with the B- and A-forms.

As with the B-form, there are several variants of the Z-form.

In contrast to the B–A transition, the B–Z transition is slow.

This is because the transition requires a base pair flip, which is a kinetically difficult process.

Hydrogen bonding in Watson–Crick and Hoogsteen base pairs

(A) Watson–Crick A·T and G·C base pairs.

(B) Hoogsteen base pair formation between adenine and thymine, and between guanine and cytosine.


Some structures showing possible reasons why large DNA (L-DNA) constructs do not self-assemble from more than approximately a dozen synthetic single-stranded DNA oligonucleotides.

Top, from left to right: The presence of strong (C:G) and weak (T:A) nucleobase pairs complicates the design of self-assembling fragments. G-quartets can arise from G-rich sequences, with major groove interactions involving hydrogen bonding to the “Hoogsteen edge” of purines. Wobble pairing can compete with Watson–Crickery.

Bottom: Even if Watson–Crickery were the only way for single-stranded DNA sequences to interact, the low information density of four-nucleotide DNA allows easy off-target hybridization and unimolecular hairpin formation. Unimolecular processes (such as hairpin formation) compete with the desired intermolecular hybridization, especially at low concentrations of oligonucleotide.

Watson–Crick pairing follows two rules of complementarity:

(a) size complementarity (large purines pair with small pyrimidines) and

(b) hydrogen bonding complementarity (hydrogen bond acceptors, A, pair with hydrogen bond donors, D).

Rearranging donor and acceptor groups on the nucleobases creates an artificially expanded genetic information system (AEGIS), whose components can pair independently. AEGIS adds information density to DNA oligonucleotides, thereby diminishing off-target hybridization and other undesired aggregation/folding motifs. With strength comparable to the G:C pair, AEGIS components form S:B, Z:P, V:J, and K:X pairs.


Quadruplex structures

Four guanines can hydrogen bond in a square arrangement to form a G-quartet, with a Hoogsteen G–G pairing pattern.

This pairing also allows G-rich single-stranded DNA and RNA to form secondary structures called G-quadruplexes (G4-DNA and G4-RNA), at least in vitro.

Formation requires four triplets of G, separated by short spacers. This permits assembly of planar quartets, which stack on one another and are composed of Hoogsteen-bonded guanines.
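The sequence requirement just described (four G triplets separated by short spacers) is often screened for with a regular expression. The sketch below is one simple way to do that and is not FastPCR's detection method; the spacer length of 1 to 7 nt is an assumed reading of "short spacers", and the names are hypothetical.

```python
import re

# Four runs of >= 3 G separated by spacers.
# ASSUMPTION: "short spacers" is taken to mean 1-7 nucleotides.
G4_MOTIF = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_g4_motifs(seq):
    """Return (start, matched_sequence) for each potential G-quadruplex motif.

    Only the given strand is scanned; to cover C-rich strands, also scan
    the reverse complement.
    """
    return [(m.start(), m.group()) for m in G4_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Four telomere-like GGGTTA repeats contain one candidate motif.
    print(find_g4_motifs("TTGGGTTAGGGTTAGGGTTAGGGTT"))
```

A match only shows that the sequence pattern is present; whether a quadruplex actually forms depends on cations, strand concentration and competing structures.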

Hoogsteen base-pairing

Circular dichroic (CD) spectroscopy spectra of quadruplexes

Kypr J, Kejnovska I, Renciuk D, Vorlickova M (2009) Circular dichroism and conformational polymorphism of DNA. Nucleic Acids Research 37 (6):1713-1725.

CD spectra of guanine quadruplexes

(A) Time-dependent formation of a parallel-stranded quadruplex of d(G4) stabilized by 16 mM K+.

(B) Na+-induced formation of an anti-parallel bimolecular quadruplex of d(G4T4G4). The triangles in the sketches indicate guanines and point in the 5′–3′ direction. The G-tetrad is shown in the middle.

(C) CD spectra reflecting the acid-induced transition of a DNA fragment, d(TCCCCACCTTCCCCACCCTCCCCACCCTCCCCA), of a c-myc human oncogene into an intercalated cytosine quadruplex. The triangles in the sketch indicate cytosines and point in the 5′–3′ direction. The C+·C pair is shown in the insert.

Cytosine quadruplexes

DNA strands rich in cytosine also generate quadruplexes.

These consist of two parallel homoduplexes connected through hemi-protonated C·C+ pairs.


The two duplexes are mutually intercalated in an anti-parallel orientation. Therefore, these structures are called intercalated or i-tetraplexes.

Formation of cytosine quadruplexes is promoted by slightly acid pH, which is needed for C·C+ pair hemiprotonation.

Similar to guanine quadruplexes, intermolecular cytosine quadruplexes are formed with slow kinetics.


DNA fragments rich in guanine and adenine

DNA fragments rich in guanine and adenine exhibit cooperatively melting conformers that differ from classical structures.

The first conformer is an anti-parallel homoduplex containing G·A pairs.

The second conformer of the alternating GA sequence is a parallel homoduplex with G·G and A·A pairs; its CD spectrum is similar to that of parallel guanine quadruplexes but with smaller positive amplitudes.

This indicates that the guanine–guanine stacking does not change significantly and that the duplex formation is mediated by inter-strand adenine–adenine interactions.

Ethanol and even dimethylsulphoxide (DMSO), both DNA denaturing agents, stabilize the single-stranded ordered d(GA)n structure, as does acid pH.

Primer-dimer detection criteria

• Detection of 3’-end and internal primer-dimers;

• Tm prediction for primer-dimers with mismatches, for standard and degenerate oligonucleotide bases (B, D, H, K, M, N, R, S, V, W, Y) and for modifications (inosine, uridine or LNA), using nearest neighbour thermodynamic parameters;

• Detection of non-Watson-Crick base pairing, such as Hoogsteen base pairing: base triads in a DNA triple helix structure and G-quadruplexes.
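The first criterion, 3’-end dimer detection, can be illustrated with a very reduced check: does the 3’-terminal stretch of one primer have a perfectly complementary site in the other? This sketch is not the FastPCR algorithm; it ignores mismatches, degenerate bases and thermodynamic stability, and the function names and the minimum overlap of 4 nt are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_dimer(primer_a, primer_b, min_overlap=4):
    """Length of the longest 3'-terminal stretch of primer_a that is
    perfectly complementary to some site in primer_b (0 if shorter
    than min_overlap). Pass the same primer twice to test a self-dimer.
    """
    a, b = primer_a.upper(), primer_b.upper()
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if reverse_complement(a[-k:]) in b:
            return k
    return 0


if __name__ == "__main__":
    primer = "ACGTTGCAGGAATTC"  # ends in the palindrome GAATTC
    print(three_prime_dimer(primer, primer))
    print(three_prime_dimer("AAAAAAAAAA", "CCCCCCCCCC"))
```

A primer whose 3’-end is palindromic, as in the first example, can anneal to a second copy of itself and be extended, which is why 3’-end dimers are treated more strictly than internal ones.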


Internal single mismatches

Base pairing stability in order of decreasing stability:

G-C > A-T > G·G > G·T ≥ G·A > T·T ≥ A·A > T·C ≥ A·C ≥ C·C

Guanine is the most universal base, since it forms the strongest base pair and the strongest mismatches.

On the other hand, cytosine is the most discriminating base, since it forms the strongest pair and the three weakest mismatches.

Is designing forward and reverse primers to have matching Tm’s the best strategy to optimize PCR?

The Tm is the temperature at which half the primer strands are bound to target.

The PCR annealing temperature is typically chosen to be 10°C below the Tm.

However, different primers have different dH of binding, which results in different slopes at the Tm of the melting transition.

Thus, the hybridization behaviour at the Tm is not the same as the behaviour at the annealing temperature.

The quantity that is important for PCR design is the amount of primer bound to target at the annealing temperature.

If equal amounts of the two primers are bound, then they will be equally extended by DNA polymerase, resulting in efficient amplification.

Differences in primer binding are amplified with each cycle of PCR, thereby reducing the amplification efficiency and providing opportunity for artifacts to develop.
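The point about different slopes at the Tm can be illustrated with a two-state binding model. The sketch assumes the primer is in large excess over the target, so the fraction of target bound is K·[P] / (1 + K·[P]) with K = exp(-dG/RT). The dH values, the primer concentration and the helper names are illustrative assumptions, not values from SantaLucia (2007).

```python
import math

R = 1.987  # molar gas constant, cal/(K*mol)


def fraction_bound(dh_kcal, ds_cal, temp_c, primer_conc=0.5e-6):
    """Fraction of target bound by primer at temp_c (two-state model,
    ASSUMPTION: primer in large excess over target)."""
    t = temp_c + 273.15
    dg = dh_kcal * 1000.0 - t * ds_cal  # dG = dH - T*dS, in cal/mol
    x = math.exp(-dg / (R * t)) * primer_conc
    return x / (1.0 + x)


def ds_for_tm(dh_kcal, tm_c, primer_conc=0.5e-6):
    """Hypothetical helper: the dS that places the half-bound point at tm_c."""
    tm = tm_c + 273.15
    return dh_kcal * 1000.0 / tm - R * math.log(primer_conc)


if __name__ == "__main__":
    # Two illustrative primers with the same Tm but different dH.
    for dh in (-120.0, -180.0):
        ds = ds_for_tm(dh, 68.6)
        print("dH = %6.1f kcal/mol: bound at 58 C = %.4f"
              % (dh, fraction_bound(dh, ds, 58.0)))
```

Both primers are half bound at the shared Tm, but the one with the larger enthalpy change has the steeper transition and is more completely bound at the annealing temperature. Matching the bound fraction (that is, dG at the annealing temperature) is therefore not the same as matching Tm.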

SantaLucia J, Jr. (2007) Physical principles and visual-OMP software for optimal PCR design. Methods Mol Biol 402:3-34.

Illustration of hybridization profiles of primers with two different design strategies.

In the left panel, the Tm’s are matched at 68.6°C, but at the annealing temperature of 58°C, primer B (squares) is 87% bound and primer A (diamonds) is 97% bound. This would lead to unequal hybridization and polymerase extension, thus reducing the efficiency of PCR.

In the right panel, the dG at 58°C of the two primers is matched by redesigning primer B. The result is that both primers are now 97% bound, and thus optimal PCR efficiency would be observed. Notice that the Tm’s of the two primers are not equal in the right panel.


Primer quality (virtual PCR efficiency) determination

An abstract parameter called Primer Quality (PQ) can help to estimate the efficiency of primers for PCR.

PQ is calculated by the consecutive summation of points for the following parameters: total sequence complexity and purine–pyrimidine sequence complexity, and the melting temperatures of the whole primer and of its terminal 3′ and 5′ ends. Self-complementarity, which gives rise to possible dimer and hairpin structures, reduces the final value.

Almost all primers of “excellent” quality can be used at annealing temperatures from 68°C to 72°C without loss of PCR efficiency, and show stable, efficient amplification over a range of PCR annealing temperatures.

Higher quality primers are not only better for PCR efficiency but are also more immune to changing PCR conditions.


Primer design selection criteria

Criteria                                     Range       Ideal
Length (nt)                                  >11         >20
Tm range (°C) a                              45 – 75     60 – 68
Tm of 12 bases at 3’-end (°C) a              34 – 48     41 – 47
GC (%)                                       30 – 70     50
3’-end composition (5’-nnn-3’)               nnn         ssa, sws, wss
Sequence linguistic complexity (LC, %) b     50 – 100    >95
Sequence Quality (PQ, %)                     50 – 100    >95

a Nearest neighbour thermodynamic parameters.
b Sequence linguistic complexity measurement was performed using the alphabet-capacity l-gram method.
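Some of the criteria in the table can be checked with simple string operations. The sketch below covers only length, GC content and 3’-end composition; the Tm, LC and PQ rows need the separate calculations discussed earlier. The function name and result keys are hypothetical, and reading s as G or C, w as A or T, and a as adenine is an assumption about the table's notation.

```python
def check_primer(seq):
    """Check a primer against the length, GC and 3'-end rows of the table."""
    seq = seq.upper()
    length = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / length
    tail = seq[-3:]
    # ASSUMPTION: s = strong (G or C), w = weak (A or T), a = adenine.
    classes = "".join("s" if base in "GC" else "w" for base in tail)
    ideal_tail = classes in ("sws", "wss") or (classes == "ssw" and tail[-1] == "A")
    return {
        "length_in_range": length > 11,
        "length_ideal": length > 20,
        "gc_percent": gc_percent,
        "gc_in_range": 30.0 <= gc_percent <= 70.0,
        "ideal_3prime_end": ideal_tail,
    }


if __name__ == "__main__":
    for primer in ("TGTTCTCCCATAGCACAAGAGGA", "GCTACCAATGAGAAGGTCACGT"):
        print(primer, check_primer(primer))
```

A primer that misses the ideal 3’-end pattern is not necessarily unusable; the table distinguishes an acceptable range from an ideal value for every criterion.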

The secondary (non-specific) binding test

The specificity of the oligonucleotides is one of the most important factors for good PCR;

optimal primers should hybridize only to the target sequence, particularly when complex genomic DNA is used as the template.

Amplification problems can arise due to primers annealing to repetitious sequences (retrotransposons, DNA transposons, or tandem repeats).

Comparison of primer design and oligonucleotide analysis tools

+ feature supported; - feature not supported

Tools compared:
(1) Primer-BLAST (Primer3)
(2) IDT SciTools: PrimerQuest, OligoAnalyzer 3.1
(3) PerlPrimer
(4) BiSearch Web server
(5) PrimerDigital Web Tools

Features                                               (1)       (2)        (3)        (4)         (5)
Primer or probe design, length (nt)                    15-30     16-35      12-30      10-35       12-500
Limit for sequence length (nt)                         50,000    no limit   no limit   5,000       no limit
Relative calculation speed                             quick     slow       slow       very slow   very quick
Multiple templates (sequences or primers) and
  multiple targets inside each sequence                -         -          -          -           +
Individual options for each sequence                   +         +          +          +           +
Degenerate nucleotides in all operations (Tm
  calculation, searches and probe, primer design, etc.) -        +          -          +           +
LNA and other nucleotide modifications                 -         +          -          -           +
High-throughput runs enabled                           -         -          -          -           +
Calculation of optimal annealing temperature           -         -          -          -           +
Primer's 3'-end cross and self-dimers                  -         +          +          +           +
G-quadruplex detection                                 -         -          -          -           +
BLAST search                                           +         -          +          +           -
Internal sequence test                                 -         -          -          -           +
External (specific library) test                       +         -          +          +           +
Multiplex with pair primers and/or single primers      -         -          -          -           +
In silico for multiple sequences and primers           -         -          -          +           +
Universal and unique                                   -         -          -          -           +
Inverted and circular sequences                        -         -          -          -           +
Bisulphite modification assays and in silico           -         -          +          +           +
Polymerase Extension multi-fragment assembly cloning   -         -          -          -           +
Oligonucleotide assembly for LCR                       -         -          -          -           +

Comparison of Primer Quality with some on-line software

PCR primers design software, in silico PCR, and oligonucleotide assembly and analysis WEB tools

http://primerdigital.com/tools/

Analyze Features:

- general information;

- oligonucleotide thermodynamics (Tm, dG, dS and dH): melting temperature calculation for standard and degenerate oligonucleotides, including LNA and other modifications;

- evaluation of PCR efficiency;

- linguistic complexity;

- dimer and G/C-quadruplex detection;

- dilution and resuspension calculator.

FastPCR is an integrated tool for PCR primers or probe design, in silico PCR, oligonucleotide assembly and analyses, alignment and repeat searching

http://primerdigital.com/fastpcr.html

The FastPCR software is an integrated tool environment that provides comprehensive facilities for designing PCR primers for many applications: standard, long-distance, inverse and real-time PCR (LUX and self-reporting); multiplex PCR; group-specific PCR (universal primers for phylogenetically related DNA sequences) and unique PCR (specific primers for each of a set of phylogenetically related DNA sequences); overlap extension PCR (OE-PCR) multi-fragment assembly cloning; single-primer PCR (design of PCR primers from closely located inverted repeats); automatic detection of SSR loci with direct PCR primer design; degenerate PCR from amino acid sequences; Polymerase Chain Assembly (PCA); and much more.

The software accepts combinations of normal and degenerate primers in all tools, and melting temperature calculations are based on nearest neighbour thermodynamic parameters.


The “in silico” (virtual) PCR tool searches for primers or probes against whole genome(s) or a list of chromosomes, predicting probable PCR products and locating potential mismatched binding sites of the specified primers or probes. The “in silico” oligonucleotide search is helpful for discovering target binding sites, and it reports the melting temperature and the PCR annealing temperature.
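The core idea of in silico PCR can be shown with exact string matching. This is a deliberately reduced sketch and not how FastPCR works: it allows no mismatches or degenerate bases, searches a single template with the forward primer on the given strand only, and computes no Tm. The function names and the `max_product` limit are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Return every start position of pattern in text (overlaps allowed)."""
    hits, pos = [], text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def in_silico_pcr(template, forward, reverse, max_product=5000):
    """Return (start, length) of predicted products for exact primer matches.

    The forward primer must match the template as given; the reverse primer
    must match the opposite strand downstream of it.
    """
    template = template.upper()
    fwd_sites = find_all(template, forward.upper())
    rev_sites = find_all(template, reverse_complement(reverse))
    products = []
    for f in fwd_sites:
        for r in rev_sites:
            length = r + len(reverse) - f
            if r >= f and length <= max_product:
                products.append((f, length))
    return products


if __name__ == "__main__":
    template = "TTTACGTACGGGGGTTCAGAAAA"
    print(in_silico_pcr(template, "ACGTAC", "TCTGAA"))
```

Every forward site is paired with every downstream reverse site, so a primer that binds at several locations yields several predicted products, which is the signal a specificity check looks for.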

A long oligonucleotide can be designed for microarray analyses and dual-labeled oligonucleotides for probes such as molecular beacons.

Comprehensive primer test, the melting temperature calculation for standard and degenerate oligonucleotides, primer's PCR efficiency and linguistic complexity, dilution and resuspension calculator.

Primers (probes) are analyzed for all primer secondary structures including G-quadruplexes detection (Hoogsteen base pairs), hairpins, self-dimers and cross-dimers in primer pairs.



FastPCR can handle long sequences and sets of nucleic acid or protein sequences; it allows individual tasks and parameters for each given sequence, and several different tasks can be joined in a single run. It also allows sequence editing and database analysis.

Efficient and complete detection of various types of repeats has been developed and built into the program, with visualisation.

The program includes various bioinformatics tools: analysis of sequences for GC or AT skew, CG content, purine-pyrimidine skew and linguistic sequence complexity; generation of random DNA sequences; analysis of type I, II and III restriction enzymes and homing endonucleases; finding or creating restriction enzyme recognition sites in coding sequences; and clustering of sequences, consensus sequence generation, and sequence similarity and conservation analysis.



Alignment algorithm and repeat search. An efficient tool for discovering LTR retrotransposons

The analysis shows that 26.5% of the 30.4 Mb sequence of Arabidopsis thaliana chromosome 1 is covered by repeat sequences. The centromere of chromosome 1 is easy to detect in the middle of the picture from the special structure of the clusters of centromeric repeats; the chromosome duplications are also shown.


FastPCR software


online Java Tools


Kalendar R, Lee D, Schulman AH 2011. Java web tools for PCR, in silico PCR, and oligonucleotide assembly and analysis. Genomics, 98(2): 137-144.

Kalendar R, Lee D, Schulman AH 2009. FastPCR Software for PCR Primer and Probe Design and Repeat Search. Genes, Genomes and Genomics, 3 (1): 1-14.


