MOLECULAR DYNAMICS SIMULATION OF DNA LESIONS By MATTHEW BRIAN ERNST A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN COMPUTER SCIENCE WASHINGTON STATE UNIVERSITY School of Electrical Engineering and Computer Science December 2005


MOLECULAR DYNAMICS SIMULATION OF DNA LESIONS

By

MATTHEW BRIAN ERNST

A thesis submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN COMPUTER SCIENCE

WASHINGTON STATE UNIVERSITY

School of Electrical Engineering and Computer Science

December 2005

To the faculty of Washington State University:

The members of the Committee appointed to examine the thesis of MATTHEW BRIAN ERNST find it

satisfactory and recommend that it be accepted.

__________________________________Chair

__________________________________

__________________________________


MOLECULAR DYNAMICS SIMULATION OF DNA LESIONS

Abstract

by Matthew Brian Ernst, M.S.

Washington State University

December 2005

Chair: John H. Miller

Damage to DNA by physiological process byproducts or by introduced factors such as ionizing radiation leads to aging and carcinogenesis in multicellular organisms. Such damage manifests itself most fundamentally at the molecular level as a chemical alteration or lesion in a portion of the DNA. Although damage to isolated molecules is difficult to examine experimentally due to the very fine time and length scales involved, computer simulations can bridge the gap between theory and experiment to examine single DNA oligonucleotides under approximately physiological conditions. Molecular dynamics simulations, which approximate changes in molecular conformation over time using classical mechanics, have been used to examine a variety of lesions with particular attention to changes in hydration and free energy brought about by lesion introduction. Changes in hydration are often dramatic with the lesion’s introduction, as are changes in free energy. Simulations in the context of thermodynamic cycles indicate reasonable preliminary agreement between simulated and experimental lesion duplex destabilization. It appears, however, that multiple lesions in close proximity are not more energetically significant than the sum of their parts.


Contents

Abstract
1 Introduction
1.1 The genetic code and genetic damage
1.2 Studying molecular genetic damage through modeling and simulation
2 Methods overview
2.1 Molecular dynamics simulation
2.1.1 The AMBER force field
2.1.2 Fragment file development
2.1.3 AMBER parameter development
2.1.4 Standard simulation method
2.2 Analysis
2.2.1 Root mean square deviation
2.2.2 Curves analysis
3 Investigations
3.1 8-oxoguanine hydration
3.2 1A9G hydration
3.3 Duplex DNA destabilization in presence of 8-oxoguanine
3.3.1 Introduction
3.3.2 Thermodynamic integration
3.3.3 Thermodynamic integration with 8oG
3.4 Energetic additivity of two common DNA lesions, 8oG and thymine-glycol
4 Conclusions and future work
A Code appendix
A.1 Python scripts for NWChem input generation
A.1.1 fragment-downgrade.py
A.1.2 make_nw_inputs.py
A.1.3 make_scripts.py
A.2 NWChem inputs
A.2.1 equilibration
A.2.2 production
A.2.3 initial thermodynamic integration
A.2.4 extended thermodynamic integration
A.2.5 thymine-glycol task prepare
A.3 Analysis tools
A.3.1 pdb_cleanup.py
A.3.2 extract_lis.py
A.3.3 iplot.py
A.3.4 hydrogen bond detection in 8oG
A.3.5 hydration_stats.py

List of Figures

1 CURVES parameters illustrated
2 CURVES parameters illustrated (continued)
3 8-oxoguanine and its parent guanine
4 DNA sequences, native and lesioned with 8oG
5 water bridging between O5* and O8 in 8oG
6 opportunities for hydrogen bonding in 8-oxoguanine
7 1A9G sequence
8 water in apurinic gap in 1A9G
9 RMSD for 1A9G
10 RMSD for 1A9G with initial water molecule removed from apurinic gap
11 differing Curves Y-displacement in 1A9G (top) and 1A9G without initial water in apurinic gap (bottom)
12 apurinic gap reference point from DNA atoms, with reference atoms highlighted
13 sequence used for 8oG melting temperature determination
14 thermodynamic cycle for lesion-induced duplex stability alteration
15 trimer systems used in duplex stability calculations
16 The transformation of G to 8oG is slightly more energetically favorable in a single strand. The implication is that energy A is greater than energy B; it has to go “further uphill” thermodynamically to separate the strands.
17 thymine-glycol and its parent thymine
18 placement of the thymine-glycol lesion in 12mer
19 RMSD for thymine-glycol in 12mer: RMSD is perhaps still growing at 2000 ps, but not rapidly
20 RMSD for 8oG 12mer extended to 2000 ps
21 placement of adjacent thymine-glycol and 8oG lesions in 12mer
22 RMSD for 12mer with adjacent thymine-glycol and 8oG
23 separate vs. combined 8oG and thymine glycol lesions: free energy change of lesion introduction appears to be additive
24 placement of dual 8oG lesions in 12mer sequence
25 RMSD for 12mer with dual 8oG lesions

1 Introduction

1.1 The genetic code and genetic damage

Living things are dependent upon DNA to store and pass on the information needed to produce the RNAs

and proteins that cellular machinery is made of. DNA – long polymeric molecular strands composed of

nucleotides containing four major bases (guanine, cytosine, adenine, and thymine) – encodes the complete

genetic makeup of an organism. When the encoded information is altered by the accidental breaking and

forming of covalent bonds within DNA, the cell may suffer too. Most of the time, mutations that significantly

alter gene function confer moderate to severe disadvantages to offspring. Damage to DNA in somatic cells

promotes individual cell deaths and inhibits replication. In multicellular organisms, carcinogenesis by the

alteration of proto-oncogenes and tumor-suppressor genes is another undesirable outcome of DNA damage.

When permanent alterations to nucleotide sequence appear in germ lines, they occasionally confer an advan-

tage to offspring in the local environment by altering gene function; these rare beneficial mutations are how

evolution progresses.

Most damage to DNA is endogenous, caused by spontaneous deamination of nucleotide bases (most

prominently in cytosine), spontaneous hydrolysis of the bond between base and pentose in purines, and

from reactive species generated within the cell during aerobic metabolism. DNA may also be damaged by

externally introduced chemicals such as arsenic, benzene, aflatoxins, and nickel. Finally, ionizing radiation

(fast-moving particles or high-frequency electromagnetic emissions, both capable of stripping electrons and

breaking covalent chemical bonds) may damage DNA. Ionizing radiation can produce qualitatively different

effects from endogenous damage at the molecular level. Particularly in the case of high linear energy transfer

radiation, damage may be clustered, with multiple lesions or defects introduced in close proximity to one

another. These clusters arise when radiation deposits energy over a short range in the cellular medium (as

when an alpha particle is emitted within the cell), both directly damaging DNA and producing reactive species

from the surrounding medium which then interact with DNA. They may present a greater challenge to repair

systems evolved by organisms to deal with endogenous damage.

Improved determination of the effects of isolated and clustered damage on the conformation and energetics

of DNA may enable improved understanding of the biochemistry associated with DNA damage, in turn

better quantifying the risks associated with low-dose radiation exposure.

1.2 Studying molecular genetic damage through modeling and simulation

Molecular-level study of DNA damage by experiment is difficult due to the small spatial and time scales

involved, and the difficulty of introducing only the desired alterations in sufficient DNA for laboratory study.

Computer models and simulations can bridge the gap between experiment and theory by providing more de-

tailed information than may be readily available from instruments or from simplified analytical treatments

of problems, though they are sometimes difficult to keep consistent with known facts from experiment.

Chemistry on computers (“computational chemistry”), both quantum and classical mechanics approaches

to molecular structures and energies, provides the fundamental tools for simulation studies of DNA.

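Electronic structure methods in computational chemistry attempt an approximate numerical solution

to the Schrödinger equation for a molecular system defined by a fixed arrangement of nuclei (the

Born-Oppenheimer approximation), with more accurate approximations rapidly becoming more expensive. Even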

Oppenheimer approximation), with more accurate approximations rapidly becoming more expensive. Even

the commoner methods are computationally expensive enough that they are rarely used with systems of more

than a few dozen atoms, though improved codes and computer hardware continue to expand their reach. Al-

though computational expense precludes them from routine use in geometry optimization or simulation of

DNA-sized systems, they are crucial in assigning correct partial charges in parameterized models.

Molecular mechanics is a far more computationally tractable approach to larger systems, albeit one that is more difficult to set up and less flexible. In these so-called force field methods, electronic behavior is not considered as a function of nuclear geometry. Instead, each bond type is described by just a few parameters, a few more parameters describe nonbonded interactions, and simple equations using parameters tuned to different atom and bond types give a useful approximation of the forces at work in a particular chemical system. Determining correct parameters can be challenging,

and molecular mechanics models are incapable of describing electronic effects (e.g. excited states and the

formation and breaking of covalent bonds). Still, such models can offer considerable insight.


Molecular dynamics extends the use of computational chemistry to describe behaviors in a chemical

system over time, rather than at a single instant. In reality nuclei are never perfectly stationary in space,

and cannot be fully described by a fixed geometry. Over a large number of discrete time steps, each very

small, the forces and momenta found in the system by a particular method (usually a force field due to the

computational expense of electronic structure methods) are used to find the forces and momenta for the next

time step. With sufficiently small step sizes, a good approximation to continuous change over time in the

given force field can be computed.
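The time-stepping idea in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration only: the velocity Verlet scheme and the one-dimensional harmonic "bond" are assumptions chosen for brevity, not a statement of which integrator or force field was used in this work, and the names `harmonic_force`, `velocity_verlet`, and `total_energy` are hypothetical.

```python
def harmonic_force(x, k=1.0, x_eq=0.0):
    """Force derived from the potential E = k * (x - x_eq)**2."""
    return -2.0 * k * (x - x_eq)


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Advance position x and velocity v through n_steps steps of size dt."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
    return x, v


def total_energy(x, v, mass, k=1.0, x_eq=0.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + k * (x - x_eq) ** 2


# Toy values in arbitrary units (not taken from any simulation in this work).
x0, v0, m = 1.0, 0.0, 1.0
x1, v1 = velocity_verlet(x0, v0, m, dt=0.001, n_steps=10000)
drift = abs(total_energy(x1, v1, m) - total_energy(x0, v0, m))
```

With a sufficiently small step the total energy of this toy system stays nearly constant, which is the practical sense in which small steps give a good approximation to continuous change over time.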

Electronic structure methods and molecular dynamics have been used in a complementary fashion to

probe the behavior of nucleotide sequences. Ab initio electronic structure methods can optimize geometry

for small systems for comparison to molecular dynamics results, and they can be used to develop parameters

such as equilibrium bond lengths/angles and partial charges for use in force fields. Molecular dynamics can

be used to study time-dependent evolution of systems, rapidly sample structures near but not on the minima of

potential energy surfaces, explicitly model solvents, and treat systems too large to investigate with electronic

structure methods.

2 Methods overview

2.1 Molecular dynamics simulation

2.1.1 The AMBER force field

DNA simulation via molecular dynamics with the AMBER force field was the primary computational method

used to make the studies detailed under Section 3. AMBER, which stands for Assisted Model Building with

Energy Refinement, is used to refer both to a set of force fields and to a software package[23] where the

force fields were first developed. AMBER was developed with a focus on biopolymers, e.g. proteins and

nucleic acids, but is also capable of treating smaller organic molecules. All molecular dynamics simulations

were performed using the NWChem[14] software package, which contains a high-performance, parallelized


implementation of molecular mechanics/dynamics based on the AMBER force field. NWChem uses an up-to-date

revision of AMBER parameters, Parm99[29], which is itself based on the older Parm98[9] modification

to the second-generation AMBER force field[10].

The basic AMBER model is given in the following equation:

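$$E_{\text{total}} = \sum_{\text{bonds}} K_r (r - r_{eq})^2 + \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2 + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] + \sum_{i<j} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\varepsilon R_{ij}}\right] \tag{1}$$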

The total energy $E_{\text{total}}$ is the sum of several components related to bond lengths, bond angles, dihedral angles, and nonbonded interactions. Bond lengths: $K_r$ is a force constant altering the stiffness of bond stretching/contraction in a particular bond type; $r - r_{eq}$ represents the deviation of actual bond length $r$ from equilibrium bond length $r_{eq}$. Bond angles: Similarly, $K_\theta$ is a constant altering the energy cost for an angle $\theta$ in a certain bonding arrangement to deviate from the equilibrium bond angle $\theta_{eq}$. Dihedrals: The $V_n$ are constants giving the energy barrier to rotation for each of the $n$ terms, $\gamma$ is the phase angle, and $\phi$ the dihedral angle under evaluation.

Nonbonded interactions: The final sum gives nonbonded (van der Waals and electrostatic) interactions; these interactions contribute to the total energy for pairs of atoms that are not in the same molecule or that are separated by at least three bonds in the same molecule. Interactions separated by exactly three bonds (“1-4 interactions”) are diminished by a scaling factor. $\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}}$ is a Lennard-Jones potential representing the van der Waals interaction, where $A$ and $B$ are constants determined by the atom types of the atoms $i, j$ involved, and $R_{ij}$ is the distance between the atoms. $\frac{B_{ij}}{R_{ij}^{6}}$ represents the attractive dispersion force between atoms, which depends on the inverse sixth power of distance. $\frac{A_{ij}}{R_{ij}^{12}}$ represents the repulsive force between atoms based on their interacting electron clouds; its inverse 12th power dependence on distance is justified on grounds of computational efficiency rather than theoretical purity. 1-4 van der Waals interactions are scaled by 0.5. Finally, electrostatic interactions are represented by $\frac{q_i q_j}{\varepsilon R_{ij}}$, where $q_i$ and $q_j$ are the charges on atoms $i$ and $j$, $R_{ij}$ is the distance between the atoms, and $\varepsilon$ is the effective dielectric function for the medium. 1-4 electrostatic interactions are scaled by $\frac{1}{1.2}$.
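As a rough illustration of how the individual terms of equation (1) are evaluated, the following Python sketch computes each contribution for a single bond, angle, dihedral, or atom pair. The function names and all numeric inputs are hypothetical placeholders rather than AMBER parameters, and the Coulomb term is written exactly as in equation (1), so any unit conversion constant is assumed to be folded into the charges.

```python
import math


def bond_energy(k_r, r, r_eq):
    """Harmonic bond term K_r * (r - r_eq)**2."""
    return k_r * (r - r_eq) ** 2


def angle_energy(k_theta, theta, theta_eq):
    """Harmonic angle term K_theta * (theta - theta_eq)**2 (angles in radians)."""
    return k_theta * (theta - theta_eq) ** 2


def dihedral_energy(v_n, n, phi, gamma):
    """One torsional term (V_n / 2) * [1 + cos(n * phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def nonbonded_energy(a_ij, b_ij, q_i, q_j, r_ij, eps=1.0,
                     vdw_scale=1.0, ele_scale=1.0):
    """Lennard-Jones plus Coulomb energy for one atom pair.

    For a 1-4 pair, pass vdw_scale=0.5 and ele_scale=1/1.2 as described
    in the text; other pairs use the default scale of 1.
    """
    lennard_jones = a_ij / r_ij ** 12 - b_ij / r_ij ** 6
    coulomb = q_i * q_j / (eps * r_ij)
    return vdw_scale * lennard_jones + ele_scale * coulomb


# Placeholder inputs in arbitrary units, chosen only to exercise the functions.
e_pair = nonbonded_energy(1.0, 2.0, 0.3, -0.3, 1.0)
e_pair_14 = nonbonded_energy(1.0, 2.0, 0.3, -0.3, 1.0,
                             vdw_scale=0.5, ele_scale=1.0 / 1.2)
```

The total energy of a system would be the sum of these functions over all bonds, angles, dihedrals, and eligible atom pairs.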


Limiting the pairwise atom calculations to $i < j$ optimizes the calculation of nonbonded interactions. By Newton’s third law, the force on an atom $i$ applied by another atom $j$, $F_{ij}$, is opposite but equal in magnitude to the force applied to $j$ by $i$; $F_{ij} = -F_{ji}$. Half of the pairwise potential calculations can be replaced by a simple negation of already-calculated forces.
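A minimal sketch of this optimization, assuming a one-dimensional toy system and a hypothetical pair force function `repulsion` (neither is taken from NWChem), loops only over pairs with i < j and applies each computed force to both atoms with opposite signs:

```python
def pairwise_forces(positions, pair_force):
    """Accumulate net forces using only pairs with i < j.

    pair_force(x_i, x_j) returns the force on atom i due to atom j;
    the force on j due to i is obtained by negation (Newton's third law).
    """
    n = len(positions)
    forces = [0.0] * n
    evaluations = 0
    for i in range(n):
        for j in range(i + 1, n):
            f_ij = pair_force(positions[i], positions[j])
            evaluations += 1
            forces[i] += f_ij
            forces[j] -= f_ij
    return forces, evaluations


def repulsion(x_i, x_j):
    """Toy inverse-square repulsion along a line (illustrative only)."""
    d = x_i - x_j
    return d / abs(d) ** 3


forces, evaluations = pairwise_forces([0.0, 1.0, 3.0], repulsion)
```

For n atoms the loop performs n(n-1)/2 force evaluations rather than n(n-1), and the net force on the whole system sums to zero by construction.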

Nonbonded interactions rapidly come to account for the vast bulk of computation time as systems treated by molecular mechanics increase in size; a straightforward implementation of nonbonded interactions scales as $O(n^2)$, where $n$ is the number of atoms in the system. In order to improve the speed and scalability of simulations, molecular dynamics simulations generally use a cutoff radius beyond which nonbonded interactions are ignored. This approximation taken alone works poorly for large molecules such as proteins and DNAs, often leading to clearly unphysical loss of structural definition after several hundred picoseconds of simulation. In order to reduce the time spent on calculating nonbonded interactions while avoiding undesirable artifacts, the smooth particle mesh Ewald method is implemented in NWChem. In this method, long-range electrostatic effects beyond the cutoff radius are found by convolution on an interpolation grid[13]. The method exhibits $n \log(n)$ scaling and permits extended simulations of biopolymers that remain structurally correct. It was always used in the DNA simulations described under Section 3, except for the thermodynamic integration phase of free energy simulations, where it was not available.

2.1.2 Fragment file development

NWChem comes with a substantial database of fragment files for nucleic acids, amino acids, and small

molecules. These fragment files are used in molecular mechanics computations to supply (most importantly)

connectivity, atom types, atom names, and partial charges. Connectivity simply describes which atoms are

connected to other atoms by explicit bonds. Atom types are important because the parameterized force fields

of molecular mechanics differentiate between atoms based on the chemical context those atoms appear in.

To a chemist, a carbon atom in benzene and a carbon atom in methane are both carbon, C, but to AMBER

they are CA and CT, needing different atom types to describe “aromatic carbon” and “tetrahedral carbon”

behavior. Partial charges affect electrostatic interactions and are especially important to hydrogen bonding,

which plays a large role in biopolymer structure and solvent interaction; they are usually determined by ab


initio electronic structure calculations. Atom names are simply convenient identifiers for use in software

commands and output. Every fragment file corresponds to a “residue,” which is a description of a molecule

or part of a molecule. Residues are named with codes of up to three letters in PDB¹ files, and are given

fragment files of the same name.

If a molecule contains an unknown component (the molecular topology is inconsistent with the corre-

sponding fragment file, or no corresponding fragment file is found), the user must give the nonstandard

residue a unique name and create a new fragment file for it. ECCE, the Extensible Computational Chemistry

Environment[8], is the most convenient tool for this purpose. Typically, one would begin by loading the PDB

file containing the unknown structure into ECCE, isolating the unknown portion if it is part of a larger struc-

ture, and performing ab initio partial charge calculations. Since this step varies considerably from system to

system, it will not be further elaborated here.

The first step in building a fragment file for a residue with known partial charges is to assign atom types.

In the case of a modified nucleic acid, many of the atom types can be copied unchanged from the native

nucleic acid, whose atom types can be viewed in the ECCE interface or found by browsing the appropriate

file in NWChem’s parameter directories. New atoms, modified atoms, or atoms bonded to new or modified

atoms are assigned atom types based on the chemical context they appear in. There is a table of atom types in

[10] that describes the most appropriate contexts for the common atom types; this table plus some knowledge

of chemistry enables the choice of appropriate atom types. A fragment file cannot be exported from ECCE

until all atoms have been assigned a type.

Once all atom types have been assigned, partial charges (previously calculated) are assigned as well.

Interior DNA residues’ partial charges should sum to -1.0; testing that this condition holds true is an easy way

to double-check the copying of partial charges into the new residue. Unique atom names are also assigned

to any unnamed atoms. Connectivity is important, but is automatically assigned by ECCE based on atom

distances when a structure is imported. No manual intervention is required to control connectivity in most

cases. The finished fragment file, containing atom types, partial charges, atom names, and connectivity, may

be exported from ECCE and used in NWChem.
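The charge-sum check mentioned above is easy to automate. The following is a small sketch, assuming the partial charges have already been read into a dictionary keyed by atom name; the function name `check_net_charge`, the tolerance, and the four-atom toy residue are all hypothetical and do not correspond to any real fragment file.

```python
import math


def check_net_charge(partial_charges, expected=-1.0, tol=1e-4):
    """Return (ok, total) where ok is True if the charges sum to expected."""
    total = math.fsum(partial_charges.values())
    return abs(total - expected) <= tol, total


# Made-up charges for a fictitious four-atom residue (illustration only).
toy_residue = {"A1": 1.2, "A2": -0.8, "A3": -0.8, "A4": -0.6}
ok, total = check_net_charge(toy_residue)

# A copying error (a misplaced decimal point) is caught by the same check.
miscopied = dict(toy_residue, A4=-0.06)
ok_bad, total_bad = check_net_charge(miscopied)
```

For a terminal residue or a neutral small molecule, the `expected` argument would be changed to the appropriate net charge.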

¹ Protein Data Bank files contain the atoms, geometric coordinates, and residue names that describe a stationary molecule or collection of molecules.


There is one final difficulty in using ECCE in conjunction with NWChem: newer ECCE releases, from

3.2.2 on, export a fragment file that is incompatible with NWChem 4.5. NWChem 4.5 was needed for

multiconfiguration thermodynamic integration calculations, and modern versions of ECCE needed to be used

to obtain new features and bugfixes. A small tool written in the Python programming language[25], fragment-

downgrade [A.1.1], was the solution, enabling new ECCE to produce fragments for older NWChem.

2.1.3 AMBER parameter development

Methods such as AMBER potentially require a very large number of parameters to describe all possible

chemical systems, due to the many combinations possible for bond lengths, angles, and (especially) dihe-

drals. While NWChem comes with AMBER parameters sufficient to work with many small molecules and

biopolymers, modifications to these molecules will often introduce parameters whose values are not found

in the standard database. For example, the addition of hydrogen to N7 in 8-oxoguanine introduced the un-

known bond angle CB-NA-H into the system. Each missing parameter must be defined before calculations

are performed on the system.

The easiest approach to supplying missing parameters is to search the AMBER database for parameters

involving atom types similar to the types in the missing parameter and duplicate them under new names. A

missing parameter involving H1 might be substituted with a similar parameter involving HC, for example;

the relevant entry would be copied to a new text file named amber.par, and the atom types would be changed

to suit the current system. Unfortunately, this method is inexactly defined and there are not always obvious

candidates for missing parameter substitution.

Parameters chosen by substitution are double-checked by certain tests: if there is a quantum geometry

optimized structure available, the candidate parameters may be compared with measured parameters (lengths

and angles) in the structure, and the closest candidate selected. Obviously this method only helps in selecting

equilibrium values, not force constants. A related approach is to perform a molecular dynamics simulation

using the chosen parameters, then extract metrics like average bond lengths and angles and evaluate them in

light of chemical intuition and experimental results. If the differences between expected and actual metrics

are too large, the parameters may be adjusted and the system simulated again. Detailed protocols for AMBER

parameter development from ab initio calculations are given in [10] and could in principle be applied

to develop missing parameters, but in practice this is a time-consuming task and simple substitution with

occasional alterations appears to give chemically reasonable results for the systems examined herein.
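The comparison of candidate equilibrium values against a quantum-optimized structure, described earlier in this section, amounts to picking the candidate nearest the measured value. A minimal sketch follows, with hypothetical candidate labels and made-up angle values (none are taken from the AMBER parameter database):

```python
def closest_candidate(candidates, measured):
    """Return the label of the candidate whose equilibrium value is
    nearest to the value measured in the optimized structure."""
    return min(candidates, key=lambda label: abs(candidates[label] - measured))


# Made-up equilibrium angles in degrees for three substitute parameters.
candidate_angles = {"candidate A": 120.0, "candidate B": 125.0, "candidate C": 118.0}
measured_angle = 124.1  # hypothetical angle read from an optimized geometry

best = closest_candidate(candidate_angles, measured_angle)
```

As noted in the text, this kind of comparison only helps with equilibrium lengths and angles; it says nothing about the force constants.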

2.1.4 Standard simulation method

All of the simulations detailed herein follow a basic template: the system undergoes preparation, equili-

bration, and production. In order to ensure that the template is followed and to ease the management of

large numbers of simulations, the basic directory structure and NWChem command scripts common to all

simulations are generated with Python scripts originally written by Alejandro Aceves-Gaona [A.1.2, A.1.3].

NWChem prepare

 1 permanent_dir /scratch/matt/sim/current/8oGrerun/step1
 2
 3 Title "8oG 12mer rerun"
 4
 5 print high
 6
 7 start 8oGrerunInit
 8
 9 prepare
10 system 8oGrerun_tp
11 chain *
12 fraction 1 2 3
13 new_top new_seq
14 grid 24 0.8
15 counter 22 Na
16 touch 0.3
17 expand 0.2
18 center; orient
19 solvate box 6.8 6.8 9.8
20 write pdb 8oGrerun_initH2O.pdb
21 write rst 8oGrerun_init.rst
22 end
23
24 task prepare

The prepare stage starts with a molecular structure in PDB format, adds water and counterions (“solvates”),

and ultimately produces topology (.top) and restart (.rst) files that can be used directly as inputs to

NWChem simulations.

Starting structure PDB files are typically canonical B-DNA², generated using ECCE’s DNA Builder and

modified as necessary to introduce lesions. Other sources of DNA structures can be used so long as the

residues and atoms follow NWChem’s naming conventions.

Because the box of water molecules and counterions around the solute is so small, it would rapidly boil

away if it were simply treated as a tiny solvent cube in empty space. Instead, periodic boundary condi-

tions are applied to maintain reasonably constant volume/pressure and reduce boundary effects. Molecules

or interactions that act outside the boundaries of the box are wrapped around to the other side. Although

this convention can introduce some artifacts of its own, particularly if the box is too small, it is a common

approach to realistic explicit solvation and is always used in the MD simulations presented here.
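The wrapping described above can be sketched as follows. This is an illustrative Python fragment, not NWChem's implementation: it assumes a rectangular box centered on the origin, so each coordinate is mapped into [-L/2, L/2), and the function names `wrap_position` and `minimum_image` are hypothetical. The box dimensions reuse the `solvate box` values from the prepare input, in nm.

```python
def wrap_position(pos, box):
    """Wrap a position back into a box centered on the origin."""
    return tuple((x + 0.5 * length) % length - 0.5 * length
                 for x, length in zip(pos, box))


def minimum_image(r_i, r_j, box):
    """Displacement from atom j to atom i using the nearest periodic image."""
    return tuple((a - b) - length * round((a - b) / length)
                 for a, b, length in zip(r_i, r_j, box))


box = (6.8, 6.8, 9.8)

# An atom that drifts out through one face re-enters through the opposite face.
wrapped = wrap_position((3.5, 0.0, -5.0), box)

# Two atoms near opposite faces are close neighbors through the boundary.
separation = minimum_image((3.3, 0.0, 0.0), (-3.3, 0.0, 0.0), box)
```

The second call shows why a box that is too small can introduce artifacts: a solute could interact with its own periodic image if the box edge is not much longer than the interaction range.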

Because solvent molecules are typically so numerous relative to solute, they are frequently represented

computationally using simple models that afford inexpensive calculations. Explicit solvent molecules intro-

duced by NWChem’s solvate directive are, by default, water molecules represented using the SPC/E model.

This extended simple point charge model represents water with HOH angle equal to the tetrahedral angle, 0.1

nm O-H distance, O and H charges of -0.8476 and 0.4238 e respectively, and a Lennard-Jones potential on

the oxygen positions, given by

$$V_{LJ} = -\left(\frac{A}{r}\right)^{6} + \left(\frac{B}{r}\right)^{12}$$

with $A = 0.37122\ (\mathrm{kJ/mol})^{1/6} \cdot \mathrm{nm}$ and $B = 0.3428\ (\mathrm{kJ/mol})^{1/12} \cdot \mathrm{nm}$. These parameters came from the best of

four attempts to better reproduce the potential energy and density of liquid water at 300 K via modification of

the older simple point charge (SPC) model[6]. For computational reasons, NWChem treats solvent molecules

separately from any solute molecules, so that (for example) a water molecule explicitly included in a PDB

structure will be treated as solute during simulation, even if it has diffused into the bulk solvent and is from a

chemical standpoint indistinguishable from other water molecules.
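To make the oxygen-oxygen Lennard-Jones term of the SPC/E model concrete, the sketch below evaluates it using the A and B values quoted above and locates its zero crossing and minimum analytically (setting the potential, then its derivative, to zero). The function and variable names are hypothetical; distances are in nm and energies in kJ/mol.

```python
A = 0.37122  # (kJ/mol)^(1/6) * nm, attractive coefficient quoted in the text
B = 0.3428   # (kJ/mol)^(1/12) * nm, repulsive coefficient quoted in the text


def v_lj(r):
    """SPC/E oxygen-oxygen Lennard-Jones energy in kJ/mol at distance r (nm)."""
    return -(A / r) ** 6 + (B / r) ** 12


# V(r) = 0 where (A/r)^6 = (B/r)^12, i.e. r = B^2 / A.
r_zero = B ** 2 / A

# dV/dr = 0 where r^6 = 2 * B^12 / A^6.
r_min = (2.0 * B ** 12 / A ** 6) ** (1.0 / 6.0)

# Energy at the minimum; analytically this equals -A^12 / (4 * B^12).
well_depth = v_lj(r_min)
```

Inside `r_zero` the repulsive twelfth-power term dominates and the energy rises steeply, which is what keeps neighboring water oxygens from overlapping.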

² Canonical B-DNA has the perfect double helix structure usually shown in textbooks. It is an idealization of how DNA conforms when hydrated to approximately physiological conditions.


In line 10, SYSTEMNAME_TP designates the name of the system and the fact that this group of commands is for task prepare (TP). NWChem will look for a PDB file named SYSTEMNAME.pdb in the current directory, and the files it produces will share the common prefix SYSTEMNAME in their titles. Line 13 specifies the generation of new topology and sequence files; lines 14-15 specify the addition of 22 sodium counterions³ on a trial grid of size 24, with a minimum 0.8 nm separation between each counterion and atoms in the system. Since DNA strands carry a net negative charge, counterions are needed to stabilize the system. Line 16 forces a minimum 0.3 nm separation between solute and solvent atoms, line 17 expands the default box size, and line 18 places the imported solute’s center of geometry at the origin. Line 19 adds water to the system in a periodic box with the given dimensions, line 20 creates a PDB file containing the solute and added water/counterions, and line 21 generates the restart file needed by later stages. Line 24 actually commands NWChem to carry out the instructions provided in the task; without it, none of the instructions in the preceding block will execute.
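The counterion count follows from simple arithmetic: each strand of n nucleotides carries a charge of -(n - 1), so a 12mer duplex needs 22 singly charged cations. A small sketch of that bookkeeping, with hypothetical function names, is:

```python
def strand_charge(n_nucleotides):
    """Net charge of one DNA strand with n nucleotides: -(n - 1)."""
    return -(n_nucleotides - 1)


def counterions_needed(strand_lengths, counterion_charge=1):
    """Number of cations of the given charge required to neutralize
    a system made of strands with the given lengths."""
    total_charge = sum(strand_charge(n) for n in strand_lengths)
    needed, remainder = divmod(-total_charge, counterion_charge)
    if remainder:
        raise ValueError("system cannot be exactly neutralized with this ion")
    return needed


# Duplex of two 12-nucleotide strands with sodium (+1) counterions.
n_sodium = counterions_needed([12, 12])
```

The result is the value that would be passed to the `counter` directive for a system of this size.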

If the PDB file contains residues that are not found in NWChem’s standard fragment directories, the

user will have to provide in the current directory a fragment file named after the missing fragment (see

section 2.1.2). Likewise, if the nonstandard residue references AMBER parameters not present in the standard

parameter files, the user will need to provide an amber.par file containing said parameters (see section 2.1.3).

After task prepare has been successfully completed, all the necessary information is incorporated in the restart

and topology files, and the parameter and fragment files are not needed for later stages.

NWChem equilibration

 9 md
10 system 8oGrerun_rx
11 noshake solute
12 fix solute 1 24
13 sd 2000
14 end
15
16 task md optimize
17
18 # Relaxation step at 50.15 degrees K with solute fixed
19 task shell "cp 8oGrerun_rx.qrs 8oGrerun_rx001.rst"
20 md
21 system 8oGrerun_rx001
22 vreass 100 50.15
23 fix solute 1 24
24 equil 0 data 10000 step 0.001
25 isotherm 50.15 trelax 0.1 0.1
26 isobar
27 print step 100 stat 1000
28 end
29
30 task md dynamics
31
32 # Relaxation step at 298.15 degrees K with solute fixed
33 task shell "cp 8oGrerun_rx001.rst 8oGrerun_rx002.rst"
34 md
35 system 8oGrerun_rx002
36 vreass 100 298.15
37 fix solute 1 24
38 equil 0 data 10000 step 0.001
39 isotherm 298.15 trelax 0.1 0.1
40 isobar
41 print step 100 stat 1000
42 end
43
44 task md dynamics

³ Each strand of DNA carries an overall charge of −(n − 1), where n is the number of nucleotides in the strand. Therefore, a duplex DNA with 12 nucleotides in each strand needs (12 − 1) × 2 = 22 counterions to produce a neutral system.

It is all but inevitable that the initial arrangement of starting structure, water molecules, and counterions

produced in the prepare stage represents a fairly high-energy (low-probability) configuration. In order to

develop a more typical configuration before the gathering of data for analysis begins, the temperature of the

system is raised in a series of steps and the solute and solvent are permitted to move separately before they

are both freed. In this way the system can gradually release stresses without the introduction of artifacts.

Equilibration begins by performing a steepest-descent optimization on the water and counterions generated

in the prepare module. Line 11 turns off the default SHAKE[26] constraints for the solute and 12 fixes

the DNA residues in space, so that only water and counterions have their positions altered. The next two tasks

gradually raise the temperature of the solvent, first to 50.15 K and then to 298.15 K. Temperature increases

on this microscopic scale correspond to pseudorandom modification of particles’ velocities to achieve the

desired average kinetic energy within a given collection of particles. The vreass command in line 22 moves

the particles in the system toward their 50.15 K target temperature by reassigning velocities every 100 time

steps. The isotherm and isobar commands use Berendsen coupling[7] to approach and maintain a constant


temperature of 50.15 K and a constant pressure of 1.025 × 10^5 Pa (default). The imposed NPT conditions are

intended at the termination of equilibration to mimic laboratory conditions. Similar steps are taken to relax

and warm the solute and, ultimately, the solute and solvent together; the full series of commands can be seen

in A.2.1.
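The effect of weak temperature coupling can be illustrated with the velocity scaling factor from Berendsen's scheme, λ = sqrt(1 + (Δt/τ)(T0/T − 1)). The sketch below is only an illustration of that published formula using the step size and relaxation time that appear in the listing; it is not taken from NWChem's source, and the function name is hypothetical.

```python
import math


def berendsen_velocity_scale(t_current, t_target, dt, tau):
    """Velocity scaling factor for one step of Berendsen weak coupling.

    t_current, t_target: instantaneous and target temperatures (K)
    dt: time step (ps); tau: coupling relaxation time (ps)
    """
    if t_current <= 0.0:
        raise ValueError("current temperature must be positive")
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


# With the listing's step (0.001 ps) and relaxation time (0.1 ps), a system
# at 25 K coupled to a 50.15 K bath has its velocities scaled up slightly.
scale = berendsen_velocity_scale(25.0, 50.15, dt=0.001, tau=0.1)
print(scale)
```

Because the factor stays close to 1 when the step is much shorter than the relaxation time, the temperature drifts toward the target over many steps rather than being imposed at once.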

NWChem production

 9 md
10   system 8oGrerun_md001
11   equil 0 data 10000 step 0.002
12   cutoff 1.0
13   update center 1 fraction 1
14   pme grid 64 order 4
15   isotherm 298.15 trelax 0.1 0.1
16   isobar
17   mwm 6500
18   print step 100 stat 1000
19   record rest 1000 prop 100 coord 1000 scoor 100
20 end
21
22 task md dynamics
23
24 task shell "cp 8oGrerun_md001.rst 8oGrerun_md002.rst"
25 md
26   system 8oGrerun_md002
27   equil 0 data 10000 step 0.002
28   cutoff 1.0
29   update center 1 fraction 1
30   pme grid 64 order 4
31   isotherm 298.15 trelax 0.1 0.1
32   isobar
33   mwm 6500
34   print step 100 stat 1000
35   record rest 1000 prop 100 coord 1000 scoor 100
36 end
37
38 task md dynamics

Production’s core operation is the generation of 20 picoseconds of molecular dynamics simulation fol-

lowed by recording to a new restart file, copying the old output for use as the new input, and repeating

simulation for another 20 ps. Line 11 specifies the production period: no equilibration, 10000 data-gathering

steps, and a step size of 2 femtoseconds (0.002 ps). Use of “cutoff 1.0” indicates that electrostatic and van


der Waals interactions will not be calculated normally (i.e. according to equation 1) beyond one nanometer.

Molecule one (DNA) is recentered on the origin at every time step in line 13. Line 14 directs the use of the

particle mesh Ewald technique for calculation of long-range electrostatic interactions, using 64 grid points

per dimension and the default fourth order cardinal B-spline interpolation. The next two lines impose NPT

conditions like those at the end of the equilibration stage, while the command “mwm 6500” specifies a maxi-

mum of 6500 solvent molecules per node and is in place to increase the default maximum, so that larger jobs

may run on a limited number of nodes in parallel execution. Finally, the print and record directives force

NWChem to store output in .out files, trajectories, and restart files every k steps, with k either 1000 or 100

depending on the property. After this first production task has finished, line 24 copies the old restart file for

use as the starting point in the next production task, and the process is repeated until the input file is exhausted

(see A.2.2 for full production listing). 20 steps of production, each for 20 ps, appear in each input file. The

full simulation is thus produced 400 ps at a time, up to a total of at least 2000 ps.
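Because every production task repeats the same directives with only the system name changed, an input file of this kind lends itself to generation by script. The following is a hypothetical sketch, not the tooling actually used for this work: it emits only the directives shown in the listing, omits the header lines that precede line 9, and also checks the 400 ps per input file figure.

```python
def production_input(prefix, n_tasks, steps=10000, dt_ps=0.002):
    """Build chained NWChem production blocks like those in the listing.

    prefix is a hypothetical system name stem (e.g. "8oGrerun_md"); the
    header lines that precede line 9 of the listing are not reproduced.
    """
    lines = []
    for i in range(1, n_tasks + 1):
        name = "%s%03d" % (prefix, i)
        if i > 1:
            # Reuse the previous task's restart file as the new start.
            previous = "%s%03d" % (prefix, i - 1)
            lines.append('task shell "cp %s.rst %s.rst"' % (previous, name))
        lines += [
            "md",
            "  system %s" % name,
            "  equil 0 data %d step %.3f" % (steps, dt_ps),
            "  cutoff 1.0",
            "  update center 1 fraction 1",
            "  pme grid 64 order 4",
            "  isotherm 298.15 trelax 0.1 0.1",
            "  isobar",
            "  mwm 6500",
            "  print step 100 stat 1000",
            "  record rest 1000 prop 100 coord 1000 scoor 100",
            "end",
            "",
            "task md dynamics",
            "",
        ]
    return "\n".join(lines)


def simulated_time_ps(n_tasks, steps=10000, dt_ps=0.002):
    """Total simulated time covered by n_tasks production tasks."""
    return n_tasks * steps * dt_ps


text = production_input("8oGrerun_md", 20)
print(text.count("task md dynamics"))
print(simulated_time_ps(20))
```

Twenty tasks of 10000 steps at 0.002 ps each cover 400 ps, so five such input files are needed to reach 2000 ps.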

2.2 Analysis

Different simulations require different approaches to analysis depending on the features of interest. Nev-

ertheless, certain analysis methods are consistently useful for summarizing the structural behavior of DNA

and for estimating the trustworthiness of the simulation itself. Common analysis tasks have been automated

with an intricate and frequently-altered system of makefiles, shell scripts, and Python programs developed

by Alejandro Aceves-Gaona. Due to the complexity of this system, only isolated analysis tasks are discussed

here.

2.2.1 Root mean square deviation

NWChem can calculate the root mean square deviation (RMSD) of an atom selection’s Cartesian coordinates

from specified reference Cartesian coordinates for any saved coordinate snapshot. Plotting the RMSD values

over time gives a rough measure of the simulation’s stability. When the overall RMSD trend has stopped

changing appreciably (this may be difficult to judge), the simulation is considered converged to a state that is


at least locally stable. Such simulations are rarely extended further, and the stable range may be preferred for

data-gathering in more detailed analysis.

$$\mathrm{RMSD} = \sqrt{\frac{\sum_{i=1}^{N} \left(\mathbf{r}_{i,t} - \mathbf{r}_{i,t_{ref}}\right)^{2}}{N}} \tag{2}$$

The RMSD value for a single snapshot corresponding to time t is found by taking the square root of the mean

of the squared distances between current coordinates and reference coordinates (corresponding to

a snapshot at time t_ref) over all N atoms. Each snapshot is rotated and translated to superpose itself on the

reference coordinates before the RMSD result is reported, in order to filter out large-scale motions that are

not indicative of conformational changes.
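Equation 2 can be written out directly. The sketch below is an illustration only: it assumes the snapshot has already been superposed on the reference, so the rotation and translation fit that NWChem performs is not reproduced, and the function name is hypothetical.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equal-length coordinate lists.

    Each argument is a list of (x, y, z) tuples. The snapshot is assumed to
    be already superposed on the reference; no fitting is done here.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (x, y, z), (xr, yr, zr) in zip(coords, reference):
        total += (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
    return math.sqrt(total / len(coords))


# Two atoms, one of which has moved 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
snapshot = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(snapshot, reference))
```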

RMSD analysis

1 analyze
2   system thymine-glycol_ana
3   reference thymine-glycol_md001.rst file thymine-glycol_md?.trj 001 100
4   select super 1-10
5   select 1-10
6   rmsd
7   scan super thymine-glycol_rmsd
8 end
9 task analysis

Any RMSD analysis needs a system topology file, a restart file containing reference coordinates, and

one or more trajectory files containing simulation snapshots. In this sample analysis file, line 2 gives SYSTEMNAME for the analysis task. Line 3 designates that reference coordinates will come from thymine-glycol_md001.rst and that the trajectory files thymine-glycol_md001.trj through thymine-glycol_md100.trj will be processed for snapshots. Next the first 10 residues (corresponding to the 10 DNA residues of this simulation) are selected for superposition, and the same 10 residues are selected for inclusion in the final reported RMSD value. The actual analysis with superposition is triggered by “scan super” followed by the name of the output file, thymine-glycol_rmsd, and takes place once NWChem processes the “task analysis” directive. RMSD analysis produces an .rms file which contains (most importantly) columns of number pairs,

corresponding to simulation time in picoseconds and RMSD in nanometers. Plots of these pairs indicate


trends and possible problems in the simulation.

2.2.2 Curves analysis

The Curves algorithm, which was first described in [18] and modified and extended to handle local parameters

in [19], is a method for rigorously describing the conformation of an irregular nucleic acid segment based on

a global description of axis curvature and per-nucleotide helicoidal parameters. Nucleic acid segments with

a straight axis are easily described in terms of local parameters, but this approach is ill-suited to compare

different irregular segments, as local parameters alone cannot reveal the contributions of segment curvature

vs. local distortions. Alternatively, nucleotides may each be given their own local axis segments (with

each nucleotide positioned identically) and the overall conformation described by the kinks and dislocations

between local segments. Curves uses both modes of description, distributing segment irregularities over

helicoidal parameters and axis curvature in the “smoothest” way by minimizing a function that simultaneously

expresses irregularity in terms of sums of helicoidal parameter variation between successive bases and kinks

between local helical axis segments. The per-nucleotide parameters (Figures 1,2) and curvature can be used

to compare different systems on a common basis.

Two different Curves implementations have been used in analysis: CUR5_S 5.3, which is an implementation of Curves created and maintained by the original Curves authors, and PYCURVES, which is a reimplementation of the Curves algorithm by David West written in the Python programming language[30]. The

original FORTRAN implementation is faster, but West’s pycurves is more flexible in its sequence handling.

Curves implementations cannot directly read NWChem’s trajectories, so the first step is to extract snap-

shots from trajectories as PDB files. As with RMSD analysis, a restart file, topology file, and one or more

trajectory files are needed. NWChem does not provide constructs to convert a trajectory file to a series of

PDB files; input files must explicitly list every snapshot to be extracted. Generation of these large input files

is automated with some simple programming.

PDB extraction

1 analyze
2   system thymine-glycol_extract
3   reference thymine-glycol_md001.rst
4   file thymine-glycol_md001.trj
5   write 1 solute thymine-glycol_md00001.pdb
6 end
7
8 task analyze

This series of commands is similar to that for RMSD analysis, but each trajectory file must be explicitly

indicated (line 4) and a different name given to each PDB file written (line 5). The snapshot number is also

specified in line 5. There are typically 100 snapshots per trajectory to be extracted. A complete extraction

input file simply repeats this series of commands to write all snapshots from all trajectory files. Due to

extreme length and repetition, it is not included in full here or in the code appendix.
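The repetition described above is easy to automate. The following sketch is hypothetical and is not the generator used for this work: it repeats the full series of commands once per snapshot, and it assumes that trajectories are numbered 001, 002, ... and that PDB files carry a five-digit running snapshot index, which the single example in the listing suggests but does not confirm.

```python
def extraction_input(prefix, n_trajectories, snapshots_per_trajectory=100):
    """Build an NWChem analysis input that writes every snapshot as a PDB.

    Assumptions (not confirmed by the text): trajectories are numbered
    001, 002, ... and PDB files carry a five-digit running snapshot index.
    """
    blocks = []
    index = 0
    for traj in range(1, n_trajectories + 1):
        for snap in range(1, snapshots_per_trajectory + 1):
            index += 1
            blocks.append("\n".join([
                "analyze",
                "  system %s_extract" % prefix,
                "  reference %s_md001.rst" % prefix,
                "  file %s_md%03d.trj" % (prefix, traj),
                "  write %d solute %s_md%05d.pdb" % (snap, prefix, index),
                "end",
                "",
                "task analyze",
                "",
            ]))
    return "\n".join(blocks)


# Small demonstration: 2 trajectories with 3 snapshots each.
text = extraction_input("thymine-glycol", n_trajectories=2,
                        snapshots_per_trajectory=3)
print(text)
```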

Curves implementations are sensitive to naming conventions used in PDB files, so the files written by

NWChem are processed by PDB_CLEANUP.PY (Appendix A.3.1) before they are given over to Curves. After

Curves has finished with the processed PDB files, there is an .lis file for each PDB file, each .lis file containing

Curves parameters. EXTRACT_LIS.PY (Appendix A.3.2) turns these many files with one parameter sample

each into a few files with many parameter samples each, so that for example all pucker values go into a single

file. IPLOT.PY (Appendix A.3.3) offers a variety of interactive and script-driven methods for generating plots

of these parameters once they have been separated.

3 Investigations

3.1 8-oxoguanine hydration

The oxidation of guanine to 8-oxo-7,8-dihydroguanine (8oG) (Figure 3) is one of the most common

manifestations of oxidative DNA damage. It is a dangerous lesion as it permits the incorporation of adenine

during DNA replication, leading to G:C → T:A transversion mutations. Aerobic organisms have evolved

multiple enzymatic mechanisms to detect and repair such lesions. Enzymes may use conformational cues in

DNA structure induced by 8oG, but the experimental study of structure and dynamics on such a fine scale is difficult. Molecular dynamics simulation has therefore been used to study the effect of 8oG on DNA bending[21], and the lesion was found to induce changes favorable to glycosylase binding.

Figure 1: CURVES parameters illustrated

Figure 2: CURVES parameters illustrated (continued)

Figure 3: 8-oxoguanine and its parent guanine

Solute/water interaction is known to powerfully influence DNA conformation and is therefore of interest

as part of lesion-induced conformational alteration. Ishida[15] observed water molecules bridging between

atoms O8 and O5’ in a molecular dynamics simulation using the PARM94 version of the AMBER force

field. Water bridging between these positions may be a distinctive feature of 8oG, as in DNA, particularly

solvated B-DNA, water bridging between 5’ phosphate and base atoms is known to be relatively infrequent

([31], 231). Simulations were performed in NWChem to examine the hydration of 8oG compared with

native guanine and to make comparisons with Ishida’s earlier work.

Two molecular dynamics simulations of the dodecamers shown in Figure 4 were performed in the NWChem

computational chemistry package, using the PARM99 version of the AMBER force field. The nonstan-

dard 8oG residue had partial charges calculated for its base atoms by the restrained electrostatic potential

method[2] using the 6-31G* basis set. The sugar portion of the nucleotide was replaced with a methyl group

whose charge was constrained at 0.0888 eu in order to make the net charge on 8oG equal to that on guanine

in the AMBER force field[21].

In the first pair of simulations, which was the basis for an earlier publication[21], the DNA was sol-

vated in a periodic box containing 22 Na+ counterions and about 7000 water molecules. The second pair of

simulations was very similar, but the periodic box was larger and contained nearly 15000 water molecules.

Simulations ran for 2 ns of simulated time, with snapshots of all atom coordinates recorded every 2 ps to yield 1000 snapshots over the course of each simulation.

Figure 4: DNA sequences, native and lesioned with 8oG

Table 1: 8-oxoguanine modified partial charges and AMBER parameters

atom    partial charge
N1      -0.4025
H1       0.3266
C2       0.7208
N2      -0.9625
1H2      0.4371
2H2      0.4371
N3      -0.6118
C4       0.2108
C5      -0.0211
C6       0.4299
O6      -0.5500
N7      -0.5129
H7       0.4077
C8       0.4468
O8      -0.5558
N9      -0.111

missing parameter    existing AMBER parameter
CK-O                 C-O
CB-NA                CB-NB
CK-NA                CK-NB
CB-CB-NA             CB-C-NA
C-CB-NA              C-CB-NB
CB-NA-H              CC/CR/CW-NA-H
CB-NA-CK             CB-NB-CK
NA-CK-O              NA-C-O
N*-CK-O              N*-C-O
N*-CK-NA             N*-C-NA
CK-NA-H              CC/CR/CW-NA-H
X-CB-NA-X            X-CB-N*-X
X-CK-NA-X            X-CC-NA-X
X-X-CK-O             X-X-C-O
CB-N*-CB-NC          CB-NC-CA-N2
C-NA-CB-CB           CB-NC-CA-N2

The 1000 snapshot files from each of the simulations were processed to quantify interactions between

water and DNA at the site of the introduced lesion.

bridging water detection

#!/usr/bin/env python2.2
import math, glob

def dist(A, B):
    # Euclidean distance between two (x, y, z) points
    x, y, z = A[0] - B[0], A[1] - B[1], A[2] - B[2]
    return math.sqrt(x*x + y*y + z*z)

def getcoords(line):
    # x, y, z are whitespace-separated fields 5-7 of a PDB atom record
    return [float(line[5]), float(line[6]), float(line[7])]

fileList = glob.glob('/var/tmp2/8oGrerun2/pdbs/*.pdb')
fileList.sort()

# running total of bridging configurations over all snapshots
counter = 0

for entry in fileList:
    print entry
    atoms = open(entry).readlines()
    ref1 = getcoords(atoms[575].split())  # O5*
    ref2 = getcoords(atoms[586].split())  # O8 / H8
    # step of 3 visits one atom (the oxygen) of each water molecule
    for w in range(783, 21678, 3):
        waterloc = getcoords(atoms[w].split())
        distance = dist(ref1, waterloc) + dist(ref2, waterloc)
        if distance < 6.0:
            counter += 1

print counter

“Bridging” configurations near residue 19 were quantified by counting water molecules that closely ap-

proached pairs of electronegative DNA atoms. If a water molecule’s oxygen had less than 6 angstroms

combined distance between itself and the pair of DNA atoms, it was counted as a bridging configuration.

6 angstroms was the cutoff point because hydrogen bonds are typically 2.6-3.05 angstroms in length[17];

the sum of two such lengths should be ~6 angstroms or slightly less. Waters in a bridging configuration are

sharing hydrogen bonds with both DNA atoms, as the orientation of the water molecule in Figure 5 shows. Strictly speaking, true bridging hydrogen bonds are a subset of the bridging configurations found here, since they take orientation as well as distance into account.

Figure 5: water bridging between O5* and O8 in 8oG

Bridging results differed significantly between guanine and 8oG: the first simulation with fewer water

molecules showed the bridging pattern 158 times between H8 and O5* in G19, and 348 times between O8

and O5* in 8oG19. The second simulation showed the pattern 188 times between H8 and O5* in G19, and

531 times between O8 and O5* in 8oG19. The simulations with 8oG thus show a 2.2-fold and 2.8-fold

increase in close solvent-DNA interactions in this region compared to simulations without the lesion.
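The quoted fold increases follow directly from the bridging counts. A short check of that arithmetic, using only the counts given in the text:

```python
# Bridging-configuration counts quoted in the text: (native G19, 8oG19).
counts = {
    "simulation 1": (158, 348),
    "simulation 2": (188, 531),
}

for name, (native, lesioned) in sorted(counts.items()):
    fold = lesioned / float(native)
    print("%s: %.1f-fold" % (name, fold))
```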

Ishida’s approach to examining hydration was somewhat different: interactions of O8 and H7 with water

in 8oG were counted as hydrogen bonds whenever the hydrogen acceptor-proton distance was less than 2.5

angstroms and the hydrogen acceptor-proton-hydrogen donor angle was greater than 120 degrees (Figure 6).

O8 can function as a hydrogen acceptor, with a water molecule’s oxygen playing the role of donor of one

of its hydrogens. N7 can function as a hydrogen donor, donating H7 to a water molecule’s oxygen. By

examining these possible combinations for each water molecule in each trajectory snapshot, the total number

of hydrogen bonds over the course of the simulation can be found. Such examination was performed with a

small program (Appendix A.3.4) to count the number of hydrogen bonds involving O8 and H7 in 8oG19, and then to

perform a similar analysis on G19 in the native simulation.
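The distance and angle criterion can be expressed as a small geometric test. The sketch below is an independent illustration of the criterion as stated (acceptor-proton distance under 2.5 angstroms, acceptor-proton-donor angle over 120 degrees); it is not the program listed in the appendix, and the function name and coordinates are hypothetical.

```python
import math


def is_hydrogen_bond(donor, proton, acceptor, max_dist=2.5, min_angle=120.0):
    """Apply a distance/angle hydrogen bond criterion.

    donor, proton, acceptor: (x, y, z) tuples in angstroms.
    The bond counts when the acceptor-proton distance is below max_dist
    and the acceptor-proton-donor angle exceeds min_angle (degrees).
    """
    to_acceptor = [a - p for a, p in zip(acceptor, proton)]
    to_donor = [d - p for d, p in zip(donor, proton)]
    len_a = math.sqrt(sum(c * c for c in to_acceptor))
    len_d = math.sqrt(sum(c * c for c in to_donor))
    if len_a == 0.0 or len_d == 0.0 or len_a >= max_dist:
        return False
    cosine = sum(a * d for a, d in zip(to_acceptor, to_donor)) / (len_a * len_d)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine)) > min_angle


# Linear donor-H...acceptor arrangement with a 1.9 angstrom H...acceptor gap.
print(is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```

Applying such a test to O8 as acceptor and to N7-H7 as donor, for every water molecule in every snapshot, gives the per-frame bond counts from which occupancy is computed.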

Table 2: hydrogen bond occupancy as determined by number of bonds formed vs. total number of frames examined

                       native H8    8oG H7             8oG O8
simulation 1           12.3%        86.7%              93.5%
simulation 2           11.9%        93.2%              99.9%
Ishida’s simulation    15.5%        “close to 100%”    “close to 100%”

Both simulations showed high hydrogen bond occupancy in the 8oG case and low hydrogen bond occu-

pancy for H8 in the native case, though the occupancy was slightly lower than that found by Ishida (Table

2). It is possible that differences in solvent models account for the differences: although Ishida’s work also

used the AMBER force field, it used the TIP3P water model as opposed to the SPC/E model used in the

current work. Occupancy was determined by dividing the total number of frames with hydrogen bonds of the

specified type by the number of frames in the simulation. O8 had a tendency to form two hydrogen bonds

simultaneously but this behavior is not reflected in the table, which indicates only how often there was at least

one hydrogen bond present. The results show strong similarity to those of Ishida, and it is noteworthy that this

stricter definition of hydrogen bonding illuminates a sharper difference in hydration between lesion-bearing

and native structures around O8/H8. 8oG’s enhanced hydrogen bonding may contribute toward its relatively

minor destabilization of DNA.
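Occupancy as defined here counts a frame once no matter how many hydrogen bonds it contains. A minimal sketch of that bookkeeping, with a hypothetical function name and made-up per-frame counts:

```python
def occupancy(bonds_per_frame):
    """Fraction of frames in which at least one hydrogen bond is present."""
    if not bonds_per_frame:
        raise ValueError("no frames supplied")
    occupied = sum(1 for n in bonds_per_frame if n > 0)
    return occupied / float(len(bonds_per_frame))


# Hypothetical per-frame bond counts: two simultaneous bonds in one frame
# still count that frame only once.
frames = [0, 1, 2, 1]
print("%.1f%%" % (100.0 * occupancy(frames)))
```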

3.2 1A9G hydration

Beger and Bolton determined the structures of some apurinic and apyrimidinic (“abasic”) sites in duplex

DNA, using a combination of NMR instrumental data and restrained molecular dynamics[4]. Lesions of this

type are common in DNA as an intermediate stage in DNA repair, when damaged bases are cleaved from

the sugar as a prelude to restoration, and as such their structural effects upon DNA are also of interest. One

structurally interesting finding was a persistently hydrogen-bonded water molecule lingering between the

apurinic site and the base opposite it, as shown in Figure 8. Simulations were undertaken starting with their

experimentally determined structure to see if unrestrained molecular dynamics showed the same persistence

of water in the gap left by base removal.

Figure 6: opportunities for hydrogen bonding in 8-oxoguanine

Figure 7: 1A9G sequence

Figure 8: water in apurinic gap in 1A9G

Table 3: apurinic site modified partial charges and AMBER parameters

atom    partial charge
O1*     -0.511920
1HO1     0.397749
H1*      0.082669

missing parameter    existing AMBER parameter
OS-CT-OH             O/O2-C-O/O2
OH-CT-H2             OS-CT-O2

Figure 9: RMSD for 1A9G [plot of rmsd (nm) against time (ps), 0-1800 ps; series “1A9G (1A9G) rmsd”]

The structure of 11 nucleotides[3] (Figure 7) was downloaded from the Protein Data Bank[5] and modified slightly for use in ECCE/NWChem by a change of atom naming conventions. In reality, the form of the

apurinic site interconverts between α-hemiacetal, β-hemiacetal, aldehyde, and hydrated aldehyde. However,

molecular dynamics does not incorporate changes in bonding, so the starting α-hemiacetal form was retained

for the entire simulation time. Two simulations of 2000 ps each were performed. The first used the starting

structure as described, along with sufficient sodium counterions to make the system neutral and added waters

in a solvation box with periodic boundary conditions. The second was identical with the exception that the

water molecule in the apurinic gap that was originally part of the experimental structure was removed.

The plotted RMSD values for the two simulations show a fair amount of stability in the 400-1600 ps time

range in both cases. This time range was therefore selected for further analysis of hydration and conformation.

Despite roughly comparable RMSD values and near-identical starting structures, Curves analysis showed

substantial conformational differences between the two simulations, as in the Y displacement values shown in Figure 11 (see Figures 1 and 2 for visual explanation of Curves parameters).

Figure 10: RMSD for 1A9G with initial water molecule removed from apurinic gap [plot of rmsd (nm) against time (ps), 0-1800 ps; series “1A9G-nowater (1A9G-nowater) rmsd”]

The differences seen between the two simulations carry over into hydration, as shown in Table 4. In order

to observe water occupancy in the gap between the abasic site and its nominal partner, a reference point was

defined by averaging the coordinates of N3, N4, and O2 on the cytosine opposite the abasic site with those

of O1P, O2P, and O5* from the abasic site. Any water molecule whose oxygen came within five angstroms

of this reference point was counted as being “in” the gap, and the distance of the closest approach between the reference point

and water was recorded for each frame. As with Curves analysis, only frames from the 400-1600 ps range

were examined, and sampling was somewhat reduced because solvent coordinates were recorded only every

tenth frame. The analysis shows that 1A9G has modestly more “in-gap” solvent molecules than its modified

counterpart, and a considerably shorter average minimum distance between the reference point and solvent.
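The per-frame measurement described above reduces to a centroid and a distance scan. The sketch below is a hypothetical illustration of that procedure rather than the analysis code used here; it takes the six reference atom positions and the water oxygen positions for one frame, with made-up coordinates in the example.

```python
import math


def centroid(points):
    """Average position of a list of (x, y, z) tuples."""
    n = float(len(points))
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def gap_statistics(reference_atoms, water_oxygens, cutoff=5.0):
    """Count waters within cutoff of the reference point for one frame.

    Returns (number of water oxygens within cutoff, closest distance).
    Coordinates are (x, y, z) tuples in angstroms.
    """
    ref = centroid(reference_atoms)
    distances = [
        math.sqrt(sum((w[i] - ref[i]) ** 2 for i in range(3)))
        for w in water_oxygens
    ]
    in_gap = sum(1 for d in distances if d < cutoff)
    return in_gap, min(distances)


# Made-up coordinates: six reference atoms centred on the origin.
atoms = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
waters = [(0.0, 0.0, 3.0), (4.0, 4.0, 0.0), (0.0, 6.0, 0.0)]
print(gap_statistics(atoms, waters))
```

Averaging the two returned values over all examined frames gives statistics of the kind reported in Table 4.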

It defies expectations that the removal of a single water molecule (and one that is easily replaced from

the solvent, at that) would profoundly affect an oligonucleotide. Why, then, are such differences seen? The

answer may lie in the nature of molecular dynamics simulations. Molecular dynamics samples a very high-

dimensional potential energy surface. A small coordinate change on the surface can have large effects on

the simulation, because there are so many favorable local minima to visit. The simulation is sensitive to

initial conditions. 1A9G is particularly vulnerable because its loss of hydrogen bonds affords much more

conformational freedom to the cytosine opposite the abasic site. The two systems diverge within a few picoseconds of the beginning of simulation and do not find similar paths within the time permitted.

Figure 11: differing Curves Y-displacement in 1A9G (top) and 1A9G without initial water in apurinic gap (bottom) [two plots of Y displacement (angstroms) against base pair, C11-G12 through C1-G22; panel titles “Global Base pair-Axis Parameters for 1A9G C Ydisp” and “Global Base pair-Axis Parameters for 1A9G-nowater C Ydisp”]

Although real molecules experimentally exhibit statistically indistinguishable properties under controlled

conditions, experimental methods typically examine vast numbers of molecules over time scales far longer

than what can be achieved with molecular dynamics at present, reporting macroscopic values averaged over

so many molecules as to make individual variations insignificant. If molecular dynamics could examine on

the order of 10^15 slightly perturbed copies of a system, generating a one-second trajectory for each, oligonucleotide

simulations would be expected to give results as reliable and repeatable as any laboratory exercise.

Until then, dynamic properties derived from simulation must be taken with a grain of salt.

Figure 12: apurinic gap reference point from DNA atoms, with reference atoms highlighted

Table 4: hydration statistics

                                         1A9G             1A9G, without NMR water
avg. number of water molecules           8.8              7.3
avg. minimum reference-water distance    1.2 angstroms    2.7 angstroms

3.3 Duplex DNA destabilization in presence of 8-oxoguanine

3.3.1 Introduction

Under standard physiological conditions, DNA occurs as a duplex of two polymeric molecules, held together

by base-stacking and hydrogen bonding interactions. At elevated temperatures, the duplex will separate into

single strands. Since stacked DNA bases exhibit reduced ultraviolet absorption, double helix separation can

be monitored by ultraviolet absorption and plotted against temperature. Midway along the transition from

double helix to separated strands as determined by UV absorption, the temperature is designated Tm, the

melting temperature. Tm is characteristic of DNA sequences; it increases with increasing guanine/cytosine

content and with strand length. It is also affected by the introduction of lesions.

Multiple-lesion energetic effects are of interest in DNA because they may reveal clues about the biochem-

ical nature of clustered damage. Single lesions, starting with 8oG, were investigated first because they provide

a necessary basis for comparison. The single-lesion 8oG results were in turn compared with experimental results in order to evaluate the validity of the computational approach.

Figure 13: sequence used for 8oG melting temperature determination

Experimental results from Tm determinations indicate that the introduction of 8oG in the central position

of a 13 nucleotide sequence opposite cytosine (Figure 13) destabilizes the duplex by −2.0 ± 0.7 kcal/mol

at 25 °C[24]. In order to make computational comparisons with experiment, multiple simulations using

thermodynamic integration were placed in the context of a thermodynamic cycle.

3.3.2 Thermodynamic integration4

Macroscopic quantities of interest in thermodynamics, such as entropy, enthalpy, internal energy, and free energy, can be calculated from the canonical partition function Q which describes a collection of N interacting particles:

$$Q = \sum_{i}^{\text{all states}} e^{-E_i/k_B T} \tag{3}$$

Here E_i is the energy of state i, k_B is Boltzmann’s constant, and T is the temperature. In typical systems, energy levels are so closely spaced that their distribution can be thought of as continuous, and the sum can be replaced by an integral over all coordinates (r) and momenta (p) (the phase space):

$$Q = \int e^{-E(\mathbf{r},\mathbf{p})/k_B T} \, d\mathbf{r} \, d\mathbf{p} \tag{4}$$

4 A longer version of the following explanation of thermodynamic integration can be found in [16] under “Simulations, Time-Dependent Methods, and Solvation Models”.

More properly, when working in phase space, E should be replaced by the Hamiltonian H for the system. The Hamiltonian itself can be separated into kinetic and potential energy components, H = T + V. The interesting component is the potential energy V, since in its absence the system is an ideal gas. For the calculations under discussion, V is given by the AMBER force field (Equation 1).

Other thermodynamic properties, such as the Helmholtz free energy A, are defined in terms of the partition function:

$$A = -k_B T \ln Q \tag{5}$$

With Q understood in terms of a force field, the Helmholtz free energy A can be expressed as

$$A = k_B T \ln\left(\sum_{i}^{\text{all states}} e^{E_i/k_B T} P(E_i)\right) \tag{6}$$

or

$$A = k_B T \ln\left(\int e^{E(\mathbf{r},\mathbf{p})/k_B T} P(\mathbf{r},\mathbf{p}) \, d\mathbf{r} \, d\mathbf{p}\right) \tag{7}$$

where P(E_i) is the probability of the system being in energy state E_i or (alternatively) where P(r,p) is the probability of the system being at the phase space location (r,p):

$$P(E_i) = Q^{-1} e^{-E_i/k_B T} \tag{8}$$

$$P(\mathbf{r},\mathbf{p}) = Q^{-1} e^{-E(\mathbf{r},\mathbf{p})/k_B T} \tag{9}$$

Due to the vast number of states that systems can occupy, it is not practical to sum over all states or inte-

grate over all phase space. The best that can be hoped for is to use the average value of a finite, manageable,

but representative collection of states to substitute for the full sum. A representative collection of states must

sample all “important” parts of phase space and contain configurations of a given energy in such a way that

their representation is proportional to the Boltzmann distribution:

$$\frac{N_i}{N} = \frac{g_i e^{-E_i/k_B T}}{Q} \tag{10}$$

with g_i being the degeneracy of state i and N_i being the number of particles out of N total occupying state i and having energy E_i. This representative collection of states is called an ensemble, and the average of some property X that follows from the M states in the ensemble is called the ensemble average ⟨X⟩:

$$\langle X \rangle_M = \frac{1}{M}\sum_{i}^{M} X(\varepsilon_i) = \frac{1}{M}\sum_{i}^{M} X(\mathbf{r}_i,\mathbf{p}_i) \tag{11}$$

When the ensemble is generated by molecular dynamics, the average is a time average, but by the ergodic

hypothesis (which states that all points in phase space are accessible within a system regardless of the starting

position) it is assumed to be equivalent to an ensemble average:

$$\langle X \rangle = \lim_{\tau\to\infty} \frac{1}{\tau}\int_{0}^{\tau} X(t)\,dt = \lim_{M\to\infty} \frac{1}{M}\sum_{i}^{M} X_i$$

The ergodic hypothesis can only be proven for a hard sphere gas, but it is assumed to be true when

molecular dynamics trajectories are used to construct ensembles for thermodynamic measurements. For

example, the ensemble average Helmholtz free energy given by an ensemble generated from molecular dynamics is:

$$\langle A \rangle_M = k_B T \ln\left(\frac{1}{M}\sum_{i}^{M} e^{E_i/k_B T}\right) = k_B T \ln\left\langle e^{E/k_B T}\right\rangle_M \tag{12}$$

The value of ⟨X⟩ is determined with a statistical uncertainty σ(X) which is inversely proportional to the square root of the number of samples in the ensemble:

$$\sigma(X) \propto \frac{1}{\sqrt{M}} \tag{13}$$
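The practical consequence of Equation 13 is that each halving of the uncertainty costs a fourfold increase in sampling. A minimal sketch of that scaling, assuming uncorrelated samples (successive molecular dynamics snapshots are generally correlated, so the effective number of samples is smaller than the number of frames); the function name is hypothetical:

```python
import math


def standard_error(sample_std, n_samples):
    """Statistical uncertainty of a mean from n uncorrelated samples."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    return sample_std / math.sqrt(n_samples)


# Quadrupling the number of samples halves the uncertainty.
print(standard_error(1.0, 1000))
print(standard_error(1.0, 4000))
```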

The statistical uncertainty can therefore be decreased by increased sampling, but only so long as further

samples remain representative. If the system finds a local minimum and spends a disproportionate amount

of time in that region of phase space, further samples may actually increase the error in a systematic fashion,

yielding properties that are precisely determined yet non-representative of the system. In practice, a molecular

dynamics trajectory over the limited timespan typically available with current software and hardware tends to


sample configurations near the starting point in phase space. Unfortunately for free energy determination, the

average free energy is powerfully influenced by rare high-energy configurations. Even if a simulation avoids

systematic errors, it takes a vast number of samples to describe the average free energy with low statistical

error.

Fortunately, the difference between the free energies of two different systems $A$ and $B$ is chemically useful
and easier to find with reasonable statistical uncertainty. Consider these two systems and their two associated
energy functions $E_A$ and $E_B$. From Equations 5, 6, and 7, the difference in Helmholtz free energy between
systems can be expressed as:

$$A_A - A_B = -k_B T \ln Q_A + k_B T \ln Q_B = -k_B T \ln \frac{Q_A}{Q_B} \tag{14}$$

$$A_A - A_B = -k_B T \ln \left( \sum e^{-(E_A - E_B)/k_B T} \right) \tag{15}$$

and this property, like other thermodynamic properties, can also be evaluated as an ensemble average:

$$A_A - A_B = k_B T \ln \left\langle e^{(E_A - E_B)/k_B T} \right\rangle_M \tag{16}$$

Now the exponential involves an energy difference, that between $E_A$ and $E_B$. As long as this difference
is small compared with $k_B T$, the ensemble average can yield a good estimate of the free energy difference
using a tractable number of samples. Suppose the difference between the two energy functions is not small, as
in the case of energy functions (Hamiltonians) describing guanine and 8-oxoguanine. The ensemble average
will no longer give reasonable results in a reasonable amount of time. In fact, the system will probably
fly apart, as a vast change is introduced in a femtosecond, and the forces set up will spiral out of control
because standard molecular dynamics time steps are too long to accurately model the disrupted system.
However, if the transformation is controlled by a coupling parameter $\lambda$ interpolating over an arbitrary number
of intermediate points (“windows”) between the endpoint energy functions, the difference between any adjacent pair
of energy functions can be kept reasonably small. For an arbitrary intermediate point:

$$E_\lambda = \lambda E_A + (1 - \lambda) E_B \tag{17}$$

The total free energy change is found as a sum over free energy changes for all values of $\lambda$:

$$A_A - A_B = k_B T \sum_\lambda \ln \left\langle e^{\Delta E_\lambda / k_B T} \right\rangle_M \tag{18}$$
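As a minimal illustration of Equations 17 and 18, the following Python sketch applies the windowed exponential average to a toy problem that is not part of this work: two one-dimensional harmonic potentials $E = \frac{1}{2} k x^2$, for which the exact result $A_A - A_B = \frac{1}{2} k_B T \ln(k_A / k_B)$ is known. The function names, the spring constants, the reduced units ($k_B T = 1$), and the use of exact Gaussian sampling in place of a molecular dynamics trajectory are all assumptions made for illustration. Each window is sampled in the ensemble of its upper $\lambda$ value.

```python
import math
import random


def fep_windows(k_a, k_b, n_windows=10, n_samples=20000, kT=1.0, seed=0):
    """Estimate A_A - A_B for two harmonic potentials E = 0.5*k*x**2.

    The energy function is interpolated as in Equation 17, so the spring
    constant at a given lambda is lam*k_a + (1 - lam)*k_b (lambda = 1 is A).
    Each window contributes kT * ln< exp(dE/kT) > as in Equation 18.
    """
    rng = random.Random(seed)
    total = 0.0
    for j in range(n_windows):
        lam_lo = j / n_windows
        lam_hi = (j + 1) / n_windows
        k_lo = lam_lo * k_a + (1.0 - lam_lo) * k_b
        k_hi = lam_hi * k_a + (1.0 - lam_hi) * k_b
        # Assumption: exact Boltzmann sampling of the lam_hi ensemble
        # stands in for a molecular dynamics trajectory.
        sigma = math.sqrt(kT / k_hi)
        acc = 0.0
        for _ in range(n_samples):
            x = rng.gauss(0.0, sigma)
            delta_e = 0.5 * (k_hi - k_lo) * x * x  # E(lam_hi) - E(lam_lo)
            acc += math.exp(delta_e / kT)
        total += kT * math.log(acc / n_samples)
    return total


def exact_difference(k_a, k_b, kT=1.0):
    """Analytic A_A - A_B for the same toy model."""
    return 0.5 * kT * math.log(k_a / k_b)


if __name__ == "__main__":
    print("windowed estimate:", fep_windows(4.0, 1.0))
    print("exact value:      ", exact_difference(4.0, 1.0))
```

With ten windows, each step changes the spring constant by a modest fraction, so every exponential average has a finite variance. A single step from $k_B = 1$ directly to $k_A = 4$ would not, which is the toy-model analogue of the poor convergence described above for large changes in the energy function.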

If the energy function is itself a function of $\lambda$ as in Equation 17, so too are the partition function and
therefore the Helmholtz free energy as well:

$$A(\lambda) = -k_B T \ln Q(\lambda)$$

Using the definition of the partition function and the Boltzmann probability distribution (Equations 4, 4,

8, and 9) and differentiating yields:

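$$\frac{\partial A(\lambda)}{\partial \lambda} = -\frac{k_B T}{Q} \frac{\partial Q(\lambda)}{\partial \lambda} = -\frac{k_B T}{Q} \sum_i \left( -\frac{1}{k_B T} \right) e^{-E_i(\lambda)/k_B T} \frac{\partial E_i(\lambda)}{\partial \lambda} \tag{19}$$

$$\frac{\partial A(\lambda)}{\partial \lambda} = \sum_i \frac{\partial E_i(\lambda)}{\partial \lambda} P(\lambda) \tag{20}$$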

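The right hand side may be replaced by an ensemble average and integrated over $\lambda$ to yield:

$$A(1) - A(0) = \int_0^1 \left\langle \frac{\partial E(\lambda)}{\partial \lambda} \right\rangle_M d\lambda \tag{21}$$

The left hand side is the expression for the free energy difference, and the right hand side may be approximated
by a sum over a finite number of $\lambda$ values:

$$A_A - A_B = \sum_i \left\langle \frac{\partial E(\lambda)}{\partial \lambda} \right\rangle_M \Delta\lambda_i \tag{22}$$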

At long last, Equation 22 gives the method of thermodynamic integration, by which a free energy difference
is calculated between two systems via a specified number of samples taken over a specified number
of windows $i$. This method was used to compute free energy differences for a variety of DNA lesions. The

energy function (Hamiltonian) is altered over time by interpolation, to change partial charges on atoms, atom

types, and equilibrium values and force constants for bond lengths, angles, and dihedrals. The intermedi-

ate states represent chimeric molecules that can have no physical existence, but since free energy is a state

function, the strange nonphysical path taken from the starting system to the end system does not invalidate

the final result. Gibbs free energy may be preferred to Helmholtz free energy as a measure of free energy

difference, but as both are defined in terms of the partition function, analogous calculations can be used to

find Gibbs free energy differences between two systems.
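The procedure of Equation 22 can be sketched on a toy model, again two one-dimensional harmonic potentials $E = \frac{1}{2} k x^2$ rather than any DNA system. The sketch below is an illustration under stated assumptions, not the NWChem implementation: the function name, the spring constants, the reduced units ($k_B T = 1$), the exact Gaussian sampling used in place of molecular dynamics, and the choice of trapezoid weights for $\Delta\lambda_i$ are all hypothetical. For this model $\partial E / \partial \lambda = \frac{1}{2}(k_A - k_B) x^2$ and the exact answer is $\frac{1}{2} k_B T \ln(k_A / k_B)$.

```python
import math
import random


def thermodynamic_integration(k_a, k_b, n_lambda=21, n_samples=5000,
                              kT=1.0, seed=1):
    """Estimate A_A - A_B by summing <dE/dlambda> over lambda windows.

    E(lambda) = lambda*E_A + (1 - lambda)*E_B with E = 0.5*k*x**2,
    so dE/dlambda = 0.5*(k_a - k_b)*x**2 at every configuration x.
    """
    rng = random.Random(seed)
    lambdas = [i / (n_lambda - 1) for i in range(n_lambda)]
    window_means = []
    for lam in lambdas:
        k_lam = lam * k_a + (1.0 - lam) * k_b
        # Assumption: exact Boltzmann sampling replaces an MD trajectory.
        sigma = math.sqrt(kT / k_lam)
        acc = 0.0
        for _ in range(n_samples):
            x = rng.gauss(0.0, sigma)
            acc += 0.5 * (k_a - k_b) * x * x
        window_means.append(acc / n_samples)
    # Trapezoid rule over the evenly spaced lambda values.
    d_lam = 1.0 / (n_lambda - 1)
    total = 0.0
    for left, right in zip(window_means[:-1], window_means[1:]):
        total += 0.5 * (left + right) * d_lam
    return total


if __name__ == "__main__":
    print("TI estimate:", thermodynamic_integration(4.0, 1.0))
    print("exact value:", 0.5 * math.log(4.0))
```

Unlike the exponential average, each window here averages a quantity that is linear in the energy derivative, so rare high-energy configurations do not dominate the estimate.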

3.3.3 Thermodynamic integration with 8oG

In order to touch base with experimental results, an interesting value to calculate would be the free energy

required to separate duplex DNA in the presence and absence of 8oG lesions. One approach that immediately

suggests itself is to use thermodynamic integration to remove an entire strand: system A1 would contain two

native strands, one of which would be removed over time by transforming all masses, charges, and forces

associated with it toward zero, finally yielding system A2. System B1 would contain one native strand and

one strand containing the 8oG lesion, the native strand being removed over time as in system A to yield B2.

If the only difference between the two pairs of systems is the presence or absence of 8oG, comparing the free

energies associated with removing the second strand should reveal 8oG’s effect on duplex stability.

Unfortunately, the straightforward approach is intractable. For one thing, well-behaved solvated DNA

systems should have a net charge of zero, but removing a strand removes all the backbone units associated

with the strand and leaves behind an excess of sodium counterions. If counterions are simultaneously removed

to preserve neutrality, their removal may have considerably different energetic effects depending on their

location in the system at the commencement of thermodynamic integration, making it difficult to isolate

8oG’s effect on the energetics of separation. Even more worrisome, removing an entire strand represents

an enormous change in the Hamiltonian, while changes between successive windows must be small for

thermodynamic integration to yield useful results. A manageable number of windows may not parcel out the
full transformation into sufficiently small increments for the procedure to work correctly. Indeed, attempts
to make this transformation in a small DNA trimer system yielded nothing but software crashes and garbage
output.

Figure 14: thermodynamic cycle for lesion-induced duplex stability alteration

Using multiple simulations in the context of a thermodynamic cycle sidesteps the difficulties of directly
applying thermodynamic integration with very large transformations. We now look at a new collection of
systems. System A1 is a native duplex, and it is transformed via thermodynamic integration into system A2,
a duplex with a lesion on one strand. System B1 is an isolated strand with no lesions, and it is transformed
via thermodynamic integration into system B2, an isolated strand with one lesion (the other separated strand
needs no calculations because it is not transformed: its contribution to free energy change must be zero).
As shown in Figure 14, these transformations directly yield two figures for free energy change, $\Delta E_A$ and
$\Delta E_B$. It is also known that a full traversal of the thermodynamic cycle, starting from any of the four systems,
will yield a free energy change of $\Delta E = 0$, because there is no free energy difference between a system
and itself. Because free energy is a state function, because two legs of the thermodynamic cycle can be calculated
by thermodynamic integration, and because a full traversal of the cycle yields no free energy change, it is possible
to determine the relationship between the two legs representing strand separation energy, even though their
actual values remain unknown. If $\Delta E_A$ is less than $\Delta E_B$, it must be the case that strand separation energy
$A$ is less than strand separation energy $B$: the lesion has stabilized the duplex. If $\Delta E_A$ is greater than
$\Delta E_B$, strand separation energy $A$ is greater than strand separation energy $B$: the lesion has destabilized the
duplex. The difference between the energies is proportional to the stabilization/destabilization introduced by
the lesion. In this framework it is possible to relate tractable calculations to experimentally derived duplex
destabilization energies found from melting temperature measurements.

Figure 15: trimer systems used in duplex stability calculations
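The cycle argument can be written out explicitly. In the sketch below, $\Delta E_{\mathrm{sep}}^{\mathrm{native}}$ (A1 to B1) and $\Delta E_{\mathrm{sep}}^{\mathrm{lesion}}$ (A2 to B2) are labels introduced here for illustration. They are assumed to correspond to the strand separation energies called $A$ and $B$ above, an assignment that is consistent with the stated inequalities but is not spelled out in the text. Traversing the cycle A1, A2, B2, B1 and back to A1 gives:

```latex
\begin{align*}
\Delta E_A + \Delta E_{\mathrm{sep}}^{\mathrm{lesion}} - \Delta E_B - \Delta E_{\mathrm{sep}}^{\mathrm{native}} &= 0 \\
\Delta E_{\mathrm{sep}}^{\mathrm{lesion}} - \Delta E_{\mathrm{sep}}^{\mathrm{native}} &= \Delta E_B - \Delta E_A
\end{align*}
```

Under this labeling, $\Delta E_A < \Delta E_B$ makes the lesion-bearing duplex harder to separate than the native one (stabilization), while $\Delta E_A > \Delta E_B$ makes it easier to separate (destabilization), and neither separation energy has to be computed directly.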

Two series of calculations were carried out in NWChem to implement the thermodynamic cycle for 8-

oxoguanine in the context of a trimer and in the context of a 12mer. The trimer sequences are shown in Figure

15 and the 12mer sequences were shown earlier in Figure 4. Both were created as standard B-DNA in ECCE

and modified to include 8oG. The calculations actually began with the 8oG lesion-bearing systems and trans-

formed them to the native guanine systems. NWChem requires that the ending system in a mutation operation

have no more atoms than the starting system, so 8oG with its extra hydrogen atom on N7 had to go first. After

8oG is transformed to G in the forward phase, a reverse phase calculation is performed to transform G back

to 8oG, so the expected transformation does take place, but not before its “backwards” counterpart. The two

calculations should be equal in magnitude and opposite in sign. Differences of magnitude between the two

are an indicator of statistical error (though they will not reveal systematic error). NWChem actually reports

two sets of values for thermodynamic integration: one “including mass contributions” and one “excluding

mass contributions.” Since there is one more atom in 8oG than in G, and since a hydrogen atom is trans-

formed to oxygen in 8oG, the two systems have different masses. Including the mass contributions takes

this change into account when reporting numbers for free energy, and it seems more physically correct to do

so. All numbers reported for this and following free-energy calculations include mass contributions when

possible.

NWChem prepare for 8oG thermodynamic integration

    Title "8oG trimer energy calculation"

    print high
    start 8oGenergytrimerInit

    prepare
    system 8oGenergytrimer_tp
    amber
    modify atom 2:_N2 final charge -0.923000
    modify atom 2:_O6 final charge -0.569900
    modify atom 2:_C6 final charge 0.491800
    modify atom 2:_C5 final charge 0.199100
    modify atom 2:_N7 final charge -0.572500 type NB
    modify atom 2:_H7 final charge 0.0 dummy
    modify atom 2:_C8 final charge 0.073600
    modify atom 2:_O8 final charge 0.199700 type H5
    modify atom 2:_N9 final charge 0.057700
    modify atom 2:_C4 final charge 0.181400
    modify atom 2:_N3 final charge -0.663600
    modify atom 2:_C2 final charge 0.743200
    modify atom 2:_N1 final charge -0.505300
    modify atom 2:_C8 final charge 0.073600
    modify atom 2:2H2 final charge 0.423500
    modify atom 2:3H2 final charge 0.423500
    modify atom 2:_H1 final charge 0.352000
    modify atom 2:_C1* final charge 0.035800
    modify atom 2:_H1* final charge 0.174600
    chain *
    new_top new_seq
    grid 24 0.8
    counter 4 Na
    touch 0.3
    center; orient
    solvate box 2.4 2.4 3.0
    write pdb 8oGenergytrimer_initH2O.pdb
    write rst 8oGenergytrimer_init.rst
    end

    task prepare

Table 5: additional parameter needed during 8oG thermodynamic integration

    missing AMBER parameter    existing AMBER parameter
    C-NB-CB-CB                 C-NA-CB-CB (previously defined in amber.par for 8oG)

Thermodynamic integration calculations start from ordinary molecular dynamics trajectories, but the ad-

ditional information needed to use them in thermodynamic integration must be inserted in the task prepare

step, necessitating the completion of all the ordinary molecular dynamics tasks before integration can begin.

The listing shown above for the trimer differs from standard molecular dynamics preparation in its ’modify

atom’ commands. These commands tell NWChem what charge and type each specified atom should have at

the end of its transformation. Here, they are defined so as to make 8oG back into guanine. DUMMY atoms

are those that completely vanish when $\lambda = 1$. AMBER parameters are read from parameter files as usual,

though more parameters may need to be defined for thermodynamic integration calculations if bond types that

would not ordinarily be used show up during the transformation. Both duplex and single stranded systems of

three and 12 nucleotides were prepared as in the above listing, though single stranded systems had only half

the number of counterions.

After the modified prepare step, all systems went through standard equilibration and 400 ps of production.

They were then subjected to the initial thermodynamic integration stage.

8oG starting thermodynamic integration

    31 # Multistep Thermodynamic Integration: FORWARD
    32 md
    33 system 8oGenergytrimer_ti
    34 cutoff 1.0
    35 noshake solute
    36 sss delta 0.085
    37 leapfrog
    38 new forward 21 of 21 error 5.0 drift 5.0 factor 0.75
    39 step 0.001 equil 1000 data 500000 over 5000
    40 isotherm 298.15 trelax 0.1
    41 isobar
    42 print step 500 stat 5000
    43 update pairs 10 center 10
    44 record rest 1000
    45 load pairs
    46 end; task md thermodynamics

The key lines in this excerpt of the full integration script (found in A.2.3) are 38, 39, and 46, which define
and execute the thermodynamic task. Line 38 indicates that the thermodynamic integration will run over 21 windows,
with a maximum statistical error of 5.0 kJ/mol in each ensemble, a maximum drift of 5.0 kJ mol$^{-1}$ ps$^{-1}$
allowed in the ensemble average derivative of the Hamiltonian with respect to $\lambda$, and a minimum ensemble
size of 0.75 times the previous ensemble size for each successive ensemble. Line 39 specifies (for each ensemble)
a step size of 1 fs, 1000 steps of equilibration prior to data gathering, a maximum of 500000 data
gathering steps, and a minimum of 5000 data gathering steps. Each ensemble will therefore contain at least 5000
samples and at most 500000 samples, stopping at an intermediate number if the statistical error drops
below the specified value before all 500000 samples have been collected. After the integration task completes,
the system is equilibrated for a short time and then used in a very similar reverse phase calculation
transforming guanine back to 8-oxoguanine.
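The stopping rule described for each ensemble (at least 5000 samples, at most 500000, ending early once the statistical error is below tolerance) can be sketched as follows. This is a hypothetical illustration and not NWChem's algorithm: the function name and arguments are invented, the error estimate is the plain standard error of the mean under the assumption of uncorrelated samples (which molecular dynamics samples are not), and the drift and ensemble-size factor criteria are ignored.

```python
import math
import random


def sample_window(draw, error_tol, min_samples=5000, max_samples=500000,
                  check_every=1000):
    """Accumulate samples from draw() until the standard error of the mean
    falls below error_tol, subject to minimum and maximum sample counts.

    Returns (mean, standard_error, number_of_samples).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    while n < max_samples:
        value = draw()
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
        if n >= min_samples and n % check_every == 0:
            if math.sqrt(m2 / (n - 1) / n) < error_tol:
                break
    std_err = math.sqrt(m2 / (n - 1) / n) if n > 1 else float("inf")
    return mean, std_err, n


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for dE/dlambda samples: Gaussian noise around 10 (arbitrary units).
    mean, err, n = sample_window(lambda: rng.gauss(10.0, 50.0), error_tol=0.5)
    print(mean, err, n)
```

Tightening the tolerance from 5.0 to 2.5 roughly quadruples the number of uncorrelated samples required, since the standard error falls only as the inverse square root of the sample count (Equation 13).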

The differences in free energy change induced by the absence/presence of a second strand are small
compared to the magnitude of the calculated free energy change itself, so it is important to calculate the

free energy values with as much precision as possible. It is possible to use the output of one thermodynamic

integration task as the input for a second task that improves the statistics with tighter error tolerances and

additional samples in the ensembles.

8oG starting thermodynamic integration

    4 # Multistep Thermodynamic Integration: FORWARD
    5 md
    6 system 8oGenergytrimer_ti
    7 cutoff 1.0
    8 noshake solute
    9 sss delta 0.085
    10 leapfrog
    11 extend forward 21 of 21 error 2.5 drift 5.0 factor 0.75
    12 step 0.001 equil 1000 data 500000 over 5000
    13 isotherm 298.15 trelax 0.1
    14 isobar
    15 print step 500 stat 5000
    16 update pairs 10 center 10
    17 record rest 1000
    18 load pairs
    19 end; task md thermodynamics

Figure 16: The transformation of G to 8oG is slightly more energetically favorable in a single strand. The
implication is that energy A is greater than energy B; it has to go “further uphill” thermodynamically to
separate the strands.

This excerpt (full input listing A.2.4) from an extension of thermodynamic integration should look very

familiar. Only line 11 has changed: the error tolerance has been tightened. Because thermodynamic integra-

tion is so computationally expensive, it was necessary to improve the statistics over multiple runs so as not to

exceed the machine runtime limit for any one calculation. Both the trimer and 12mer were run three times,

starting with error at 5.0 and then declining to 2.5 and 1.5.

Table 6: thermodynamic integration results for 8oG in trimers and 12mers

                 DS F     DS R      SS F     SS R      DS error  SS error  F difference  R difference
    12mer
    initial      191.86   -187.34   196.41   -187.41   4.52      9.00      -4.56         -0.08
    extension 1  192.19   -187.01   196.05   -190.18   5.18      5.88      -3.86         -3.16
    extension 2  192.30   -188.74   194.40   -190.81   3.56      3.59      -2.10         -2.07
    trimer
    initial      185.85   -195.79   186.71   -195.33   -9.94     -8.62     -0.86         0.459
    extension 1  184.32   -190.46   186.33   -195.72   -6.14     -9.39     -2.00         -5.254
    extension 2  184.52   -189.25   188.39   -194.41   -4.73     -6.02     -3.87         -5.157

Table 6 (see footnote 5) contains the results for 12mers and trimers as found by using the above prescription for
calculating free energies. The differences between these energies in the cases of single and double strands indicate
the effects of 8oG on duplex stability. In all but one simulation, the free energy change from guanine to
8-oxoguanine was negative and of slightly greater magnitude in the single-stranded case. This is in qualitative
agreement with melting temperature experiments, but the 12mer results of -0.08 to -4.56 kJ/mol duplex
destabilization fall short of the previously mentioned experimental results for a 13mer of -2.0 ± 0.7 kcal/mol
(about -8.4 kJ/mol) at 25 °C. Further, the error (as defined by the difference in magnitude between forward and
reverse results) is of the same order as the calculated destabilizations themselves, and the trend upon extension
was to increase the calculated destabilization in the trimer while decreasing the calculated destabilization in the
12mer. Though it is doubtful that the indicated destabilization results are completely artifactual, given their
similar quality over multiple extensions of independent simulations, it may be very difficult and computationally
expensive to determine the destabilization with good precision and accuracy, given that the procedure will always
involve finding a small difference between two much larger numbers.
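The derived columns of Table 6 can be reproduced from the four raw values in each row. The sketch below is illustrative only: the function name is hypothetical, and the definitions of the difference columns (double stranded minus single stranded, taken on magnitudes for the reverse results) are inferred from the tabulated numbers rather than stated in the text. It also converts the experimental 13mer value to kJ/mol using 1 kcal = 4.184 kJ.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def summarize_row(ds_f, ds_r, ss_f, ss_r):
    """Derive the error and difference columns from raw TI results (kJ/mol).

    Assumed definitions, inferred from the tabulated values:
      error        = forward + reverse (zero if the two phases agree exactly)
      F difference = DS forward - SS forward
      R difference = |DS reverse| - |SS reverse|
    """
    return {
        "ds_error": round(ds_f + ds_r, 2),
        "ss_error": round(ss_f + ss_r, 2),
        "f_difference": round(ds_f - ss_f, 2),
        "r_difference": round(abs(ds_r) - abs(ss_r), 2),
    }


# 12mer, extension 2 (values from Table 6)
row = summarize_row(192.30, -188.74, 194.40, -190.81)
print(row)

# Experimental 13mer destabilization of -2.0 kcal/mol expressed in kJ/mol
experimental_kj = -2.0 * KJ_PER_KCAL
print(round(experimental_kj, 2))  # -8.37
```

For this row the forward/reverse discrepancy (3.56 and 3.59 kJ/mol) is larger than the double/single strand difference (about -2.1 kJ/mol), which is the numerical form of the statement that the error is of the same order as the calculated destabilization.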

3.4 Energetic additivity of two common DNA lesions, 8oG and thymine-glycol

The encouraging but difficult initial simulations of duplex DNA destabilization by 8oG prompted the exami-

nation of a perhaps easier question: is there significant energetic interaction between nearby lesions in DNA?

5. In Table 6: DS = double stranded, SS = single stranded, F = forward, R = reverse; all values given in kJ/mol

Figure 17: thymine-glycol and its parent thymine

Given two lesions $A$ and $B$ introduced into separate systems with the same starting sequence at nearby
locations, and the same two lesions introduced simultaneously into the same locations in the same sequence, is
the sum of the free energy changes associated with $A$ and $B$ in separate systems approximately equal to the
free energy change associated with their simultaneous introduction in the same system? Clustered damage,
associated with high-LET radiation, exhibits multiple lesions in close proximity. Biological, chemical, and
physical hallmarks of clustered damage are all interesting. If lesion formation were thermodynamically favored
or suppressed by the presence of other lesions, it would be a significant chemical revelation about an
involved biochemical/biophysical process.

In order to examine the free energy changes of simultaneous vs. isolated lesion introduction, two model

lesions were chosen. One was the standby 8oG, while the other was a somewhat more complicated lesion,

thymine-glycol. Thymine-glycol (Figure 17), as its name implies, is a thymine residue that has been mod-

ified by the oxidative introduction of two hydroxyl groups on the ring, leading to ring saturation, increased

hydrophilic character, and the distortion/displacement of thymine’s characteristic methyl group. Thymine-

glycol is known to be formed by the effects of ionizing radiation and chemical oxidants such as osmium

tetroxide on thymine and is a well-studied lesion.

In order to use thymine-glycol in any molecular dynamics simulation, appropriate parameters had to be

developed. CRESP charges were calculated using NWChem and ECCE operating on an isolated thymine-

glycol base, minus phosphate and terminated with hydrogens. The thymine-glycol base itself came from

a pentamer that had been geometry optimized by quantum mechanical methods in vacuo. 2H2* and 3H2*,

2H5* and 3H5*, and the three methyl hydrogens were constrained to have equal charges as standard AMBER

thymine has equal charges on those atoms as well. The partial charges on the terminating hydrogens were

forced to sum to 0.118300, so that the sugar and base together would have a net charge of -0.118300 when

the system was neutral. Combined with the phosphate, this would give the standard -1.0 net charge for the

nucleotide. The 6-31G* basis set was used for the calculation and all other parameters were left at their

defaults.

Missing AMBER parameters were chosen by hand-picking a small number of potential substitutes on

the basis of chemical similarity, and then using the parameters whose equilibrium values were closest to

the angles/dihedrals measured in the optimized pentamer. Some cases were degenerate, with all candidates

having the same equilibrium values and force constants, in which case measurements were not needed to make

the choice. The improper dihedral CT-CT-N*-C, whose parameters originally came from C-CT-N-H, had a

large difference in its measured dihedral angle in the optimized structure vs. the standard equilibrium dihedral

angle (2.71 vs. 3.14 radians). In this case the equilibrium dihedral angle was therefore modified to match the

measured value. Vibrational frequency calculations indicated that the 4 lowest vibrational modes for thymine-glycol
with the chosen parameters were 76.5, 113.7, 182.2, and 213.7 cm$^{-1}$, which were considered to be in
reasonable agreement with the previously found HF/6-31G vibrational modes of 94.5, 145.1, 191.3, and 230.9
cm$^{-1}$ and AMBER vibrational modes of 74.1, 99.2, 124.3, and 180.4 cm$^{-1}$ [20]. The vibrational frequency

calculation was however performed with an earlier iteration of the thymine-glycol fragment that did not use

charge constraints in the charge assignment stage, and is worth revisiting with the finalized parameters.

Once the parameters were chosen, the thymine-glycol lesion was copied from the pentamer of optimized

geometry and inserted in place of thymine in a standard B-DNA 12mer using the Insight II molecular mod-

eling software. The lesion’s geometry was retained from the optimized structure because it was believed

that its conformation plays a large role in its behavior and that the standard duration of molecular dynamics

simulation would not be sufficient to yield an equilibrium conformation due to the large rearrangements in-

volved. The structure was solvated, equilibrated, and subjected to 2000 ps of standard MD production after

having appropriate 'modify atom' commands added in task prepare to enable its transformation back to thymine
(A.2.5). The greater length of production, compared to the initial energy work with 8oG, was chosen

in an effort to start with a better-equilibrated structure. To obtain useful results from the thermodynamic
integration method, the system about to undergo transformation must have achieved equilibrium. In reality,
it is impossible to determine if a system has truly achieved equilibrium, but longer periods of production and
examination of the system RMSD can give some confidence that the system is no longer rapidly changing in
a systematic way. The RMSD plot for thymine-glycol in a 12mer (Figure 19) suggests that the RMSD is still
slowly growing at 2000 ps, but is mostly fluctuating. Had the simulation been stopped at 400 ps, it would
have had a too-low RMSD plus some possibly chaotic changes to interfere with free energy determination
during thermodynamic integration.

Table 7: thymine-glycol modified partial charges and AMBER parameters

    atom               partial charge
    C5*                 0.668360
    C4*                 0.040452
    C3*                 0.417171
    C2*                -0.105762
    C1*                 0.547308
    O4*                -0.499514
    O3*                -0.637141
    N1                 -0.669837
    C2                 -0.801431
    N3                 -0.618999
    C4                  0.586340
    C5                  0.375974
    C6                  0.614136
    O2                 -0.602388
    O4                 -0.564701
    C5M                -0.298869
    H1*                 0.041465
    2H2*/3H2*           0.007524
    H3*                -0.036346
    H4*                 0.010585
    2H5*/3H5*          -0.150685
    H3                  0.373939
    H6                  0.061631
    2H5M/3H5M/4H5M      0.075801
    O6                 -0.756966
    O5M                -0.654961
    H6O                 0.444641
    H5O                 0.402670

    missing parameter    existing AMBER parameter
    CT-N*-CT             CT-N-CT
    NA-C-CT              CT-C-N
    N*-CT-HC             CT-CT-N*
    N*-CT-OH             N*-CT-OS
    HC-CT-OH             H1-CT-OH
    CT-CT-N*-C           C-CT-N-H [footnote 6]

Figure 18: placement of the thymine-glycol lesion in 12mer

Figure 19: RMSD for thymine-glycol in 12mer: RMSD is perhaps still growing at 2000 ps, but not rapidly
[Plot: RMSD (nm) versus time (ps), 0 to 2000 ps; series "thgenergy"]

Figure 20: RMSD for 8oG 12mer extended to 2000 ps
[Plot: RMSD (nm) versus time (ps), 0 to 2000 ps; series "8oG12"]

The final restart file generated by the 2000 ps of MD production was used as input for thermodynamic

integration, which used the same parameters as discussed previously for 8oG. At the completion of the initial

stage of thermodynamic integration, the thymine to thymine-glycol transformation was found to be associated

with a free energy change of -1263 kJ/mol (according to the forward calculation) or -1266 kJ/mol (according

to the reverse calculation). These are large free energy changes. Intuitively, it makes sense that conversion to
thymine-glycol is substantially more disruptive to thymine than conversion to 8oG is to guanine, but it is still an
unusually large change in free energy. It is possible that this value is in systematic error and demands a greater
number of windows or longer per-window equilibration periods in order to avoid overestimating the free energy change.

Additionally, NWChem did not report separate energies including and excluding mass contributions, even

though thymine-glycol and thymine certainly have different masses.

In order to generate an 8oG free energy change used for comparison in a multiple-lesion case, the same

double stranded 12mer simulation that was previously used to calculate strand separation energies was ex-

tended up to 2000 ps to match the thymine-glycol simulation. The final output file was used as input to

thermodynamic integration, and it appears to come from a stable region of 1600-2000 ps similar to the region

between 200 and 800 ps. Upon standard first-stage thermodynamic integration, the free energy change from

G to 8oG was found to be -186.5 kJ/mol (according to forward calculation) or -185.5 kJ/mol (according to
reverse calculation).

Figure 21: placement of adjacent thymine-glycol and 8oG lesions in 12mer

Figure 22: RMSD for 12mer with adjacent thymine-glycol and 8oG
[Plot: RMSD (nm) versus time (ps), 0 to 2000 ps; series "thg8oGenergy"]

The simulation containing adjacent thymine-glycol and 8oG was started using the DNA 12mer that had

already been joined with thymine-glycol and introducing 8oG next to it using ECCE. Modify atom commands

were used to prepare the system for transformation of both lesions back to native nucleotides. The simulation

was run out to 2000 ps as in the simulations of individual lesions and the end of the simulation was used as

input for thermodynamic integration. The RMSD seems to indicate a lack of systematic change in the last

700 ps or so of simulation, the tail end of which was ultimately used.

Figure 23: separate vs. combined 8oG and thymine-glycol lesions: free energy change of lesion introduction
appears to be additive

At the end of the first stage of thermodynamic integration, the free energy change associated with simultaneously
transforming guanine to 8oG and thymine to thymine-glycol was found to be -1455 kJ/mol in the forward
calculation (the reverse calculation is not yet available). This is remarkably close to the sum of the separate
free energy changes, -1450 kJ/mol. If there is any energetic interaction between the lesions here, it is a very
subtle effect. The energies look quite additive.

To further test the additivity of lesions, a simulation containing two 8oG lesions in the familiar 12mer

context was undertaken. The sequence did not permit them to be adjacent; they were separated by two base

pairs. 2000 ps of MD production yielded a trajectory whose RMSD appeared nicely stable in the latter

portion; the tail end supplied the starting point for thermodynamic integration. Thermodynamic integration

yielded a free energy change for two guanine bases becoming two 8oG lesions that is very nearly the same

as twice that of the single lesion: the forward phase of thermodynamic integration gave a free energy change

of -379.0 kJ/mol, while two times the single-lesion result gives -373.0 kJ/mol. Again, if there is interaction,

it is very subtle.
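The additivity comparison amounts to a small piece of arithmetic on the forward-phase values quoted above, sketched here in Python. The function name is hypothetical, and the sketch says nothing about the statistical or systematic error of the underlying values.

```python
def additivity_gap(combined, separate):
    """Return (gap, relative_gap): combined free energy change minus the
    sum of the separately computed changes, in kJ/mol, and that gap as a
    fraction of the combined magnitude."""
    gap = combined - sum(separate)
    return gap, abs(gap) / abs(combined)


# Forward-phase values quoted in the text (kJ/mol)
THYMINE_GLYCOL = -1263.0
SINGLE_8OG = -186.5

gap_tg_8og, rel_tg_8og = additivity_gap(-1455.0, [THYMINE_GLYCOL, SINGLE_8OG])
gap_dual_8og, rel_dual_8og = additivity_gap(-379.0, [SINGLE_8OG, SINGLE_8OG])

print(gap_tg_8og, round(100 * rel_tg_8og, 2))      # -5.5 0.38
print(gap_dual_8og, round(100 * rel_dual_8og, 2))  # -6.0 1.58
```

Both gaps are a few kJ/mol, comparable to the forward/reverse discrepancies seen for single lesions, so these values alone cannot distinguish a weak interaction from none.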

Figure 24: placement of dual 8oG lesions in 12mer sequence

Figure 25: RMSD for 12mer with dual 8oG lesions
[Plot: RMSD (nm) versus time (ps), 0 to 2000 ps; series "double8oG"]

4 Conclusions and future work

Molecular dynamics has proven itself a versatile tool for investigating normal and damaged DNA oligonu-

cleotides, capable of revealing information about conformation, hydration, and even thermodynamic quan-

tities. In particular, the thermodynamic results gathered thus far are encouraging but incomplete. Further

multiple-lesion simulations, and the incorporation of said simulations into thermodynamic cycles, are needed

to ensure the reproducibility of lesion free-energy evaluations and develop their relationship with experimen-

tally accessible quantities.

References

[1] Atkins, Peter. Physical Chemistry. 5th ed. New York: W. H. Freeman and Company. 1994.

[2] Bayly, Christopher I., Piotr Cieplak, Wendy D. Cornell, and Peter A. Kollman, A Well-Behaved Electrostatic
Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP
Model. Journal of Physical Chemistry, 97, 10269-10280 (1993).

[3] PDB ID: 1A9G, Apurinic DNA With Bound Water At The Damaged Site and N3 Of Cytosine, β Form,
NMR, 1 Structure. Beger, Richard D., and Philip H. Bolton. http://www.pdb.org/

[4] Beger, Richard D., and Philip H. Bolton, Structures of Apurinic and Apyrimidinic Sites in Duplex
DNAs. The Journal of Biological Chemistry, 273, 15565-15573 (1998).

[5] Berman, H.M., J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne,
The Protein Data Bank. Nucleic Acids Research, 28, 235-242 (2000).

[6] Berendsen, H.J.C., J.R. Grigera, and T.P. Straatsma, The Missing Term in Effective Pair Potentials.
Journal of Physical Chemistry, 91, 6269-6271 (1987).

[7] Berendsen, H.J.C., J.P.M. Postma, W.F. van Gunsteren, A. DiNola, and J.R. Haak, Molecular Dynamics
With Coupling to an External Bath. Journal of Chemical Physics, 81, 3369-3755 (1984).

[8] Black, G.; Didier, B.; Elsethagen, T.; Feller, D.; Gracio, D.; Hackler, M.; Havre, S.; Jones, D.; Jurrus,
E.; Keller, T.; Lansing, C.; Matsumoto, S.; Palmer, B.; Peterson, M.; Schuchardt, K.; Stephan, E.;
Sun, L.; Taylor, H.; Thomas, G.; Vorpagel, E.; Windus, T.; Winters, C.; "Ecce, A Problem Solving
Environment for Computational Chemistry, Software Version 3.2.3" (2005), Pacific Northwest National
Laboratory, Richland, Washington 99352-0999, USA.

[9] Cheatham III, Thomas E., Piotr Cieplak, and Peter A. Kollman, A Modified Version of the Cornell et al.
Force Field with Improved Sugar Pucker Phases and Helical Repeat. Journal of Biomolecular Structure
and Dynamics, 16, 845-862 (1999).

[10] Cornell, Wendy D., Piotr Cieplak, Christopher I. Bayly, Ian R. Gould, Kenneth M. Merz, Jr., David M.

Ferguson, David C. Spellmeyer, Thomas Fox, James W. Caldwell, and Peter A. Kollman, A Second

Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules.Journal

of the American Chemical Society117, 5179-5197 (1995).

[11] Cramer, Christopher J. Essentials of Computational Chemistry: Theories and Models. 2nd ed. West

Sussex, England: John Wiley & Sons Ltd, 2004.

[12] DeLano, W. L., The PyMOL Molecular Graphics System. San Carlos, CA: DeLano Scientific, 2002.

http://www.pymol.org/

[13] Essmann, Ulrich, Lalith Perera, and Max L. Berkowitz, A Smooth Particle Mesh Ewald Method. The

Journal of Chemical Physics, 103, 8577-8593 (1995).

[14] Harrison, R.J.; Nichols, J.A.; Straatsma, T.P.; et al. NWChem, a computational chemistry package for

parallel computers, version 4.5. Pacific Northwest National Laboratory, 2003.

[15] Ishida, H., J. Biomol. Struct. Dyn. 19, 839-851 (2002).

[16] Jensen, Frank. Introduction to Computational Chemistry. New York: John Wiley and Sons Ltd, 1999.

[17] Kroon, J., J. A. Kanters, J. G. C. M. van Duijneveldt-van de Rijdt, F. B. van Duijneveldt and J. A.

Vliegenthart, O-H··O Hydrogen Bonds in Molecular Crystals: A Statistical and Quantum-Mechanical

Analysis. Journal of Molecular Structure, 24, 109-129 (1975).

[18] Lavery, Richard, and Heinz Sklenar, The Definition of Generalized Helicoidal Parameters and of Axis

Curvature for Irregular Nucleic Acids. Journal of Biomolecular Structure and Dynamics, 6, 63-91

(1988).

[19] Lavery, Richard, and Heinz Sklenar, Defining the Structure of Irregular Nucleic Acids: Conventions

and Principles. Journal of Biomolecular Structure and Dynamics, 6, 655-657 (1989).


[20] Miaskiewicz, Karol, John Miller, Rick Ornstein, and Roman Osman, Molecular Dynamics Simulations

of the Effects of Ring-Saturated Thymine Lesions on DNA Structure. Biopolymers, 35, 113-124

(1995).

[21] Miller, J.H., C. P. Fan-Chiang, T. P. Straatsma and M. A. Kennedy, 8-Oxoguanine Enhances Bending of

DNA that Favors Binding to Glycosylases. Journal of the American Chemical Society, 125, 6331-6336

(2003).

[22] Nelson, David L., and Michael M. Cox. Lehninger Principles of Biochemistry. W.H. Freeman, 2004.

[23] Pearlman, David A., David A. Case, James W. Caldwell, Wilson S. Ross, Thomas E. Cheatham III, Steve

DeBolt, David Ferguson, George Seibel, Peter Kollman, AMBER, a package of computer programs for

applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations

to simulate the structural and energetic properties of molecules. Computer Physics Communications,

91, 1-41 (1995).

[24] Plum, G.E., A. P. Grollman, F. Johnson and K. J. Breslauer, Influence of the Oxidatively Damaged

Adduct 8-oxodeoxyguanosine on the Conformation, Energetics, and Thermodynamic Stability of a

DNA Duplex. Biochemistry, 34, 16148-16160 (1995).

[25] van Rossum, Guido, and F. L. Drake (eds), Python Reference Manual Release 2.2.3. Python Software

Foundation, 2003. Available at http://www.python.org

[26] Ryckaert, Jean-Paul, Giovanni Ciccotti, and Herman J.C. Berendsen, Numerical Integration of the

Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes.

Journal of Computational Physics, 23, 327-341 (1977).

[27] Saenger, Wolfram. Principles of Nucleic Acid Structure. New York: Springer-Verlag, 1984.

[28] von Sonntag, C. The Chemical Basis of Radiation Biology. New York: Taylor & Francis, 1987.

[29] Wang, Junmei, Piotr Cieplak, Peter A. Kollman, How Well Does a Restrained Electrostatic Potential

(RESP) Model Perform in Calculating Conformational Energies of Organic and Biological Molecules?

Journal of Computational Chemistry, 21, 1049-1074 (2000).


[30] West, David. 2005. Extraction of Helical Parameters from Molecular Dynamics Simulations of Duplex

DNA. Unpublished master’s thesis. Washington State University.

[31] Westhof, Eric (ed). Water and Biological Macromolecules. Boca Raton, FL: CRC Press, 1993.


A Code appendix

A.1 Python scripts for NWChem input generation

A.1.1 fragment-downgrade.py

#!/usr/bin/env python2.2
import sys

count = 0
for line in open(sys.argv[1]).readlines():
    if count == 1:
        line = line[:5] + '\n'
    elif count == 2:
        line = ''
    elif len(line.split()) > 2:
        line = line[:30] + line[35:]
    sys.stdout.write(line)
    count += 1
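Read literally, the script above keeps only the first five characters of the second line of its input file, drops the third line, and removes characters 30 through 34 (zero-based) from every other line that has more than two whitespace-separated fields. The following is a minimal Python 3 sketch of the same transformation as a function that can be checked without a fragment file. The function name `downgrade_fragment` and the sample lines are hypothetical and do not come from the thesis; the sketch assumes nothing about the fragment file format beyond what the script itself does.

```python
def downgrade_fragment(lines):
    """Return the rewritten lines of a fragment file (hypothetical helper)."""
    out = []
    for count, line in enumerate(lines):
        if count == 1:
            # Second line: keep only the first five characters.
            line = line[:5] + "\n"
        elif count == 2:
            # Third line: dropped (the original script writes an empty string).
            continue
        elif len(line.split()) > 2:
            # Lines with more than two fields: cut characters 30-34 (zero-based).
            line = line[:30] + line[35:]
        out.append(line)
    return out


# Small in-memory demonstration with made-up lines.
sample = ["header\n", "1234567890\n", "dropped\n", "one two\n"]
print(downgrade_fragment(sample))
```

Returning a list instead of writing to standard output keeps the line handling separate from file access, which makes the column cut easy to verify on short inputs.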

A.1.2 make_nw_inputs.py

#!/usr/bin/env python2.2
import sys, string

class makeTaskPrepare:
    def __init__(self, name):
        self.name = name
        self.filename = name + '_tp.nw'
        try:
            # open the file for writing
            self.output = open(self.filename, 'w')
        except:
            print 'error opening ' + self.filename
            sys.exit(1)
        self.title = ''
        self.platform = 'none'
        self.type = 'normal'
        self.solute_range = '1 24'

    def write2file(self):
        self.output.write('# nwvisus file automatically generated by make_nw_inputs.py\n')
        self.output.write('# generated to work with nwchem in ' + self.platform + '\n')
        self.output.write('# made by Alejandro Aceves - PNNL - July 2003\n\n')
        self.output.write('permanent_dir ')
        if self.platform == 'nwvisus':
            self.output.write(self.path + '\n\n')
        else:
            self.output.write('/home/' + self.user + '/' + self.name + '/step1\n\n')
        self.output.write('Title "' + self.title + '"\n\n')
        self.output.write('# General variables\n')
        self.output.write('print high\n')
        self.output.write('\nstart ' + self.name + 'Init\n\n')
        self.output.write('prepare\n')
        self.output.write('system ' + self.name + '_tp\n')
        self.output.write('chain *\nfraction 1 2 3\n')
        self.output.write('# generate new topology file\nnew_top new_seq\n')
        self.output.write('grid 24 0.8\ncounter 8 Na\ntouch 0.3\n')
        self.output.write('expand 0.2\ncenter; orient\n')
        self.output.write('# reduce the default box size\nsolvate box 6.8 6.8 9.8\n')
        if self.type == 'pmf':
            self.output.write('# pmf parameters\n')
            self.output.write('select 1 19:_P 19:_O1P 19:_O2P 19:_O5* 18:_O3*\n')
            self.output.write('select 2 20:_P 20:_O1P 20:_O2P 20:_O5* 19:_O3*\n')
            self.output.write('pmf bias distance 1 2 0.7 0.50 12500 12500\n')
            self.output.write('pmf basepair 1 24\n')
            self.output.write('pmf basepair 12 13\n')
        self.output.write('# write pdb and restart file for future use\n')
        self.output.write('write pdb ' + self.name + '_initH2O.pdb\n')
        self.output.write('write rst ' + self.name + '_init.rst\n')
        self.output.write('end\n')
        self.output.write('\ntask prepare\n')
        self.output.write('\n# end of task prepare\n\n')
        self.output.close()

class makeRelaxation:
    def __init__(self, name):
        self.name = name
        self.filename = name + '_rx.nw'
        try:
            # open the file for writing
            self.output = open(self.filename, 'w')
        except:
            print 'error opening ' + self.filename
            sys.exit(1)
        self.title = ''
        self.platform = 'none'
        self.type = 'normal'
        self.user = 'mernst'
        self.extension = ['']
        for i in range(1, 21):
            if i < 10:
                self.extension.append('00' + '%d' % i)
            else:
                self.extension.append('0' + '%d' % i)
        self.temperature = ['50.0', '100.0', '150.0', '200.0', '250.0', '298.15']

    def write2file(self):
        self.output.write('# nwvisus file automatically generated by make_nw_inputs.py\n')
        self.output.write('# generated to work with nwchem in ' + self.platform + '\n')
        self.output.write('# made by Alejandro Aceves - PNNL - July 2003\n\n')
        self.output.write('permanent_dir ')
        if self.platform == 'nwvisus':
            self.output.write(self.path + '\n\n')
        else:
            self.output.write('/home/' + self.user + '/' + self.name + '/step2\n\n')
        self.output.write('# In order to run this script, 2 files needs to be copied from step1\n')
        self.output.write('# in this case copy:\n')
        self.output.write('# /home/' + self.user + '/' + self.name + '/step1/')
        self.output.write(self.name + '_init.rst to this directory\n')
        self.output.write('# with the name ' + self.name + '_rx.rst\n')
        self.output.write('# and:\n')
        self.output.write('# /home/' + self.user + '/' + self.name + '/step1/')
        self.output.write(self.name + '.top to this directory with the same name\n')
        self.output.write('# The following instructions should work\n')
        self.output.write('# (comment them out if it is done manually):\n')
        self.output.write('task shell "cp ')
        self.output.write('/home/' + self.user + '/' + self.name + '/step1/')
        self.output.write(self.name + '_init.rst ')
        self.output.write(self.name + '_rx.rst"\n')
        self.output.write('# task shell "cp ')
        self.output.write('/home/' + self.user + '/' + self.name + '/step1/')
        self.output.write(self.name + '.top ."\n\n')
        self.output.write('# Also, if a script has been ran, the file may be find at:\n')
        self.output.write('# ../step1/rsts\n')
        # Starting
        self.output.write('Title "' + self.title + '"\n\n')
        self.output.write('# General variables\n')
        self.output.write('print high\n')
        self.output.write('\nstart ' + self.name + '_Relaxation\n\n')
        # Steepest descent
        self.output.write('# 2000 steepest descent with solute fixed\n\n')
        self.output.write('md\n')
        self.output.write('system ' + self.name + '_rx\n')
        self.output.write('noshake solute\nfix solute %s\nsd 2000\n' % self.solute_range)
        self.output.write('end\n\ntask md optimize\n')
        # Fix solute
        temperature = ['50.15', '298.15']
        # fileExtensions is a bit of a kludge that allows steepest descent output files
        # to be copied at appropriate points, without disrupting the flow of the existing
        # code (old code called all files .rst, steepest descent makes .qrs files)
        fileExtensions = ['.rst'] * 10
        fileExtensions[0] = '.qrs'
        fileExtensions[3] = '.qrs'
        # useIsobar is a similar trick; isobar not needed at 50 K
        useIsobar = ['isobar', 'isobar']
        for i in range(2):
            self.output.write('\n\n# Relaxation step at ' + temperature[i] + ' degrees K')
            self.output.write(' with solute fixed\n')
            self.output.write('\ntask shell "cp ' + self.name + '_rx')
            self.output.write(self.extension[i] + '%s ' % fileExtensions[i])
            self.output.write(self.name + '_rx' + self.extension[i+1] + '.rst"\n')
            self.output.write('md\n')
            self.output.write('system ' + self.name + '_rx' + self.extension[i+1] + '\n')
            self.output.write('vreass 100 ' + temperature[i] + '\nfix solute %s\n' % self.solute_range)
            self.output.write('equil 0 data 10000 step 0.001\n')
            self.output.write('isotherm ' + temperature[i] + ' trelax 0.1 0.1\n%s\n' % useIsobar[i])
            self.output.write('print step 100 stat 1000\n')
            self.output.write('end\n\ntask md dynamics\n')

        # Fix solvent
        self.output.write('\n\n# 2000 steepest descent with solvent fixed\n\n')
        self.output.write('task shell "cp ' + self.name + '_rx')
        self.output.write(self.extension[2] + '.rst ')
        self.output.write(self.name + '_rx' + self.extension[3] + '.rst"\n')
        # Steepest descent with fix solvent
        self.output.write('md\n')
        self.output.write('system ' + self.name + '_rx' + self.extension[3] + '\n')
        self.output.write('fix solvent\nsd 2000\n')
        self.output.write('end\n\ntask md optimize\n')

        for i in range(3, 9):
            self.output.write('\n\n# Relaxation step at ' + self.temperature[i-3] + ' degrees K')
            self.output.write(' with solvent fixed\n')
            self.output.write('\ntask shell "cp ' + self.name + '_rx')
            self.output.write(self.extension[i] + '%s ' % fileExtensions[i])
            self.output.write(self.name + '_rx' + self.extension[i+1] + '.rst"\n')
            self.output.write('md\n')
            self.output.write('system ' + self.name + '_rx' + self.extension[i+1] + '\n')
            self.output.write('vreass 100 ' + self.temperature[i-3] + '\nfix solvent\n')
            self.output.write('equil 0 data 10000 step 0.001\n')
            self.output.write('isotherm ' + self.temperature[i-3] + ' trelax 0.1 0.1\nisobar\n')
            self.output.write('print step 100 stat 1000\n')
            self.output.write('end\n\ntask md dynamics\n')

        self.output.write('\n\n# Relaxation step at ' + temperature[1] + ' degrees K')
        self.output.write(' with nothing fixed\n')
        self.output.write('\ntask shell "cp ' + self.name + '_rx')
        self.output.write(self.extension[9] + '.rst ')
        self.output.write(self.name + '_rx' + self.extension[10] + '.rst"\n')
        self.output.write('md\n')
        self.output.write('system ' + self.name + '_rx' + self.extension[10] + '\n')
        ## self.output.write('vreass 100 ' + temperature[1] + '\nfix solvent 1 24\n')
        self.output.write('equil 0 data 10000 step 0.001\n')
        if self.type == 'pmf':
            self.output.write('cutoff 1.2\npmf\nprofile\n')
        else:
            self.output.write('cutoff 1.0\n')
        self.output.write('update center 1 fraction 1\n')
        self.output.write('pme grid 64 order 4')
        if self.type == 'pmf':
            self.output.write(' procs 4\n')
        else:
            self.output.write('\n')
        self.output.write('isotherm ' + temperature[1] + ' trelax 0.1 0.1\n')
        if self.type == 'pmf':
            self.output.write('isobar 1.025E5 trelax 0.1\n')
        else:
            self.output.write('isobar\n')
        self.output.write('print step 100 stat 1000\n')
        self.output.write('record rest 1000 prop 100 coord 1000 scoor 100\n')
        self.output.write('end\n\ntask md dynamics\n')
        self.output.write('\n# end of Relaxation\n\n')
        self.output.close()

class makeProdMD:
    def __init__(self, name):
        self.name = name
        ## fileext = ['0']
        self.title = ''
        self.platform = 'MPP2'
        self.user = 'mernst'
        self.extension = ['']

    def openOutputFile(self):
        try:
            # open the file for writing
            self.output = open(self.filename, 'w')
        except:
            print 'error opening ' + self.filename
            sys.exit(1)

    def write2file(self):
        ## print (self.psecs / self.psec_step + 1)
        for i in range(1, self.psecs / self.psec_step + 1):
            if i < 10:
                self.extension.append('00' + '%d' % i)
            elif i < 100:
                self.extension.append('0' + '%d' % i)
            else:
                self.extension.append('%d' % i)
        ## print self.extension
        ## print self.psecs / (self.inter_input * self.psec_step)
        num_files = self.psecs / (self.inter_input * self.psec_step)
        number = []
        for i in range(num_files + 1):
            number.append('%d' % (i * self.inter_input * self.psec_step))
        ## print self.psecs % (self.inter_input * self.psec_step)
        ## print (self.inter_input * self.psec_step)
        if self.psecs % (self.inter_input * self.psec_step) != 0:
            num_files = num_files + 1
            extra = 1
            number.append('%d' % self.psecs)
        else:
            extra = 0
        # sys.exit(1)
        for i in range(num_files):
            self.filename = self.name + '_md_' + number[i] + 'to' + number[i+1] + 'ps.nw'
            self.openOutputFile()
            self.output.write('# nwvisus file automatically generated by make_nw_inputs.py\n')
            self.output.write('# generated to work with nwchem in ' + self.platform + '\n')
            self.output.write('# made by Alejandro Aceves - PNNL - July 2003\n\n')
            self.output.write('permanent_dir ')
            if self.platform == 'nwvisus':
                self.output.write(self.path + '\n\n')
            else:
                self.output.write('/home/' + self.user + '/' + self.name + '/step3\n\n')
            self.output.write('# In order to run this script, 2 files needs to be copied ')
            self.output.write('in this case copy:\n')
            self.output.write('# /home/' + self.user + '/' + self.name + '/step3/')
            if i == 0:
                self.output.write(self.name + '_rx010.rst to this directory\n')
            else:
                self.output.write(self.name + '_md' + self.extension[i * self.inter_input] + '.rst\n')
            self.output.write('# with the name ' + self.name + '_md')
            self.output.write(self.extension[i * self.inter_input + 1] + ', and:\n')
            self.output.write('# /home/' + self.user + '/' + self.name + '/step1/')
            self.output.write(self.name + '.top to this directory with the same name\n')
            self.output.write('# The following instruction should work\n')
            self.output.write('task shell "cp ')
            self.output.write('/home/' + self.user + '/' + self.name + '/step1/')
            self.output.write(self.name + '.top ."\n')
            self.output.write('# And this one (remove the comment if it is done manually):\n')
            self.output.write('# task shell "cp ')
            if i == 0:
                self.output.write('/home/' + self.user + '/' + self.name + '/step2/')
                self.output.write(self.name + '_rx010.rst ' + self.name + '_md')
                self.output.write(self.extension[i * 10 + 1] + '.rst"\n')
                self.output.write('# If a script has been ran, the file may be find at:\n')
                self.output.write('# /home/' + self.user + '/' + self.name + '/step2/rsts/')
                self.output.write(self.name + '_rx010.rst\n\n\n')
            else:
                self.output.write('/home/' + self.user + '/' + self.name + '/step3/prod')
                self.output.write(number[i] + 'to' + number[i+1] + '/')
                self.output.write(self.name + '_md' + self.extension[i * self.inter_input] + '.rst ')
                self.output.write(self.name + '_md' + self.extension[i * self.inter_input + 1] + '.rst"\n')
                self.output.write('# Also, if a script has been ran, the file may be find at:\n')
                self.output.write('# /home/' + self.user + '/' + self.name + '/step3/rsts/')
                self.output.write(self.name + '_md' + self.extension[i * self.inter_input] + '.rst\n\n\n')

            self.output.write('Title "' + self.title + ' Production"\n\n')
            self.output.write('# General variables\n')
            self.output.write('print high\n')
            self.output.write('\nstart ' + self.name + '_Production\n\n')
            if (i+2) > num_files and extra:
                range1 = (i * self.inter_input)
                range2 = self.psecs / self.psec_step
            else:
                range1 = i * self.inter_input
                range2 = (i+1) * self.inter_input
            for j in range(range1, range2):
                if j > (i * self.inter_input):
                    self.output.write('\n\n# %d' % (j+1))
                    if ((j+1) % 10) == 1:
                        self.output.write('st')
                    elif ((j+1) % 10) == 2:
                        self.output.write('nd')
                    elif ((j+1) % 10) == 3:
                        self.output.write('rd')
                    else:
                        self.output.write('th')
                    self.output.write(' production step')
                    self.output.write('\ntask shell "cp ' + self.name + '_md')
                    self.output.write(self.extension[j] + '.rst ')
                    self.output.write(self.name + '_md' + self.extension[j+1] + '.rst"\n')
                else:
                    self.output.write('# %d' % (j+1))
                    if (j + 1) == 11:
                        self.output.write('th')
                    else:
                        self.output.write('st')
                    self.output.write(' production step\n')

                self.output.write('md\n')
                self.output.write('system ' + self.name + '_md' + self.extension[j+1] + '\n')
                self.output.write('equil 0 data 10000 step 0.002\ncutoff 1.0\n')
                self.output.write('update center 1 fraction 1\n')
                self.output.write('pme grid 64 order 4\n')
                self.output.write('isotherm 298.15 trelax 0.1 0.1\nisobar\nmwm 6500\n')
                self.output.write('print step 100 stat 1000\n')
                self.output.write('record rest 1000 prop 100 coord 1000 scoor 100\n')
                self.output.write('end\n\ntask md dynamics\n')

            self.output.write('\n# end of this production run\n\n')
            # if (j + 1) == (self.psecs / (self.inter_input * 2)):
            if (i+2) > num_files:
                self.output.write('\n# end of the simulation\n\n')
            self.output.close()

if __name__ == '__main__':
    sim_name = 'thgenergy'
    sim_title = 'THG 12mer energy calculation'
    tp = makeTaskPrepare(sim_name)
    tp.path = './' + sim_name + '/step1'
    tp.title = sim_title
    tp.platform = 'nwvisus'
    tp.type = 'normal'
    tp.write2file()

    tr = makeRelaxation(sim_name)
    tr.solute_range = '1 24'
    tr.path = './' + sim_name + '/step2'
    tr.title = sim_title
    tr.platform = 'mpp2'
    tr.type = 'normal'
    tr.user = 'mernst'
    tr.write2file()

    prod = makeProdMD(sim_name)
    prod.path = './' + sim_name + '/step3'
    prod.title = sim_title
    prod.platform = 'MPP2'
    prod.user = 'mernst'
    prod.psecs = 3200
    prod.inter_input = 20
    prod.psec_step = 20
    prod.write2file()
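The production class above splits a run of `psecs` picoseconds into input files that each cover `inter_input * psec_step` picoseconds, adds one shorter final file when the total is not an exact multiple, and labels restart files with zero-padded step numbers. The following is a minimal Python 3 sketch of just that bookkeeping, separated from the file writing so it can be checked on its own. The function names `production_segments`, `production_input_names`, and `restart_extension`, and the simulation name `demo`, are hypothetical and are not part of the thesis scripts.

```python
def production_segments(psecs, inter_input, psec_step):
    """Return (start_ps, end_ps) pairs, one per production input file."""
    block = inter_input * psec_step            # picoseconds covered by one input file
    num_files = psecs // block                 # number of full-length files
    bounds = [i * block for i in range(num_files + 1)]
    if psecs % block != 0:
        # A shorter final file covers the remainder of the run.
        bounds.append(psecs)
    return list(zip(bounds[:-1], bounds[1:]))


def production_input_names(name, psecs, inter_input, psec_step):
    """Build input file names following the '<name>_md_<start>to<end>ps.nw' pattern."""
    return ["%s_md_%dto%dps.nw" % (name, start, end)
            for start, end in production_segments(psecs, inter_input, psec_step)]


def restart_extension(step):
    """Zero-pad a step number to at least three digits, e.g. 7 becomes '007'."""
    return "%03d" % step


# Same run parameters as the main block above: 3200 ps in 20 ps steps, 20 steps per file.
for file_name in production_input_names("demo", 3200, 20, 20):
    print(file_name)
```

The `%03d` format reproduces the three-branch padding used in the listing for any step number of one or more digits, which removes the need for separate cases below 10 and below 100.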

A.1.3 make_scripts.py


#!/usr/bin/env python2.2
import sys, string

class makeScript:
    def __init__(self, name, platform1, platform2):
        self.name = name
        self.title = ''
        self.user = ''
        self.path = ['', '', '']
        self.type = 'normal'
        self.platform = ['', '', '']

    def openFile(self, index):
        self.filename = self.name + '_makeDirTree_' + self.platform[index] + '.batch'
        try:
            # open the file1 for writing
            self.output = open(self.filename, 'w')
        except:
            print 'error opening ' + self.filename
            sys.exit(1)

    def write2file(self, index):
        self.openFile(index)
        ## self.output.write('#!/usr/bin/csh\n')
        self.output.write('\n')
        self.output.write('# Automatic generated script file to generate a tree for\n')
        self.output.write('# all files in a nwchem simulation. Usage:\n# python make_scripts.py.\n')
        self.output.write('# There are 2 spript (batch) files: one to run in a production\n')
        self.output.write('# machine (suchs as Opus or MPP2 and another to run task prepare\n')
        self.output.write('# which usually is run in a "smaller" machine.\n')
        self.output.write('# Created by Alejandro Aceves - PNNL - July 2003\n')
        self.output.write('\n')
        num_files = self.psecs / (self.inter_input * self.psec_step)
        number = []
        for i in range(num_files + 1):
            number.append('%d' % (i * self.inter_input * self.psec_step))
        if self.psecs % (self.inter_input * self.psec_step) != 0:
            num_files = num_files + 1
            extra = 1
            number.append('%d' % self.psecs)
        else:
            extra = 0
        ## print number
        # move to desired directory
        self.output.write('echo move to desired directory\n')
        self.output.write('cd ' + self.path[index] + '\n')
        # make basic directories
        self.output.write('echo make basic directories\n')
        self.output.write('mkdir ./' + self.name + '\n')
        self.output.write('mkdir ./' + self.name + '/analysis\n')
        self.output.write('mkdir ./' + self.name + '/step1/\n')
        self.output.write('mkdir ./' + self.name + '/step2/\n')
        self.output.write('mkdir ./' + self.name + '/step3/\n')
        self.output.write('mkdir ./' + self.name + '/step1/init\n')
        self.output.write('mkdir ./' + self.name + '/step2/init\n')
        self.output.write('mkdir ./' + self.name + '/step3/init\n')
        # make secondary directories
        self.output.write('echo make secondary directories\n')
        self.output.write('mkdir ./' + self.name + '/step2/trjs\n')
        self.output.write('mkdir ./' + self.name + '/step2/outs\n')
        self.output.write('mkdir ./' + self.name + '/step2/prps\n')
        self.output.write('mkdir ./' + self.name + '/step2/rsts\n')
        self.output.write('mkdir ./' + self.name + '/step2/cmds\n')
        for i in range(num_files):
            self.output.write('mkdir ./' + self.name + '/step3/prod' + number[i] + 'to' + number[i+1] + '\n')
            self.output.write('mkdir ./' + self.name + '/step3/prod' + number[i] + 'to' + number[i+1] + '/init\n')
            self.output.write('mkdir ./' + self.name + '/step3/prod' + number[i] + 'to' + number[i+1] + '/outs\n')
            self.output.write('mkdir ./' + self.name + '/step3/prod' + number[i] + 'to' + number[i+1] + '/prps\n')
            self.output.write('mkdir ./' + self.name + '/step3/prod' + number[i] + 'to' + number[i+1] + '/rsts\n')
            self.output.write('mkdir ./' + self.name + '/step3/prod' + number[i] + 'to' + number[i+1] + '/trjs\n')
            self.output.write('mkdir ./' + self.name + '/step3/prod' + number[i] + 'to' + number[i+1] + '/cmds\n')

        # move files to their places
        self.output.write('echo move files to their places\n')
        self.output.write('mv ./' + self.name + '_tp.nw ' + self.name + '/step1/init/\n')
        self.output.write('mv ./' + self.name + '_rx.nw ' + self.name + '/step2/init/\n')
        self.output.write('chmod u+x ./after_run_step2.batch\n')
        self.output.write('mv ./after_run_step2.batch ./' + self.name + '/step2/init/\n')

        for i in range(num_files):
            dir_name = 'prod' + number[i] + 'to' + number[i+1]
            self.output.write('mv ./' + self.name + '_md_' + number[i] + 'to' + number[i+1] + 'ps.nw')
            self.output.write(' ./' + self.name + '/step3/prod' + number[i] + 'to' + number[i+1] + '/init/\n')
            after_run_fn = 'after_run' + number[i] + 'to' + number[i+1] + '.batch'

            self.output.write('chmod +x ./' + after_run_fn + '\n')
            self.output.write('mv ./' + after_run_fn)
            self.output.write(' ./' + self.name + '/step3/' + dir_name + '/init/\n')
            try:
                # open the run_after file for writing
                output2 = open(after_run_fn, 'w')
            except:

p r i n t ’ e r r o r open ing ’ + a f t e r _ r u n _ f nsys . e x i t ( 1 )

o u t p u t 2 . w r i t e ( ’ # S c r i p t f i l e t h a t he lp t o c l e a n up and p r e p a r et h e nex t s i m u l a t i o n \ n ’ )

o u t p u t 2 . w r i t e ( ’ # Crea ted by A l e j a n d r o Aceves− PNNL − J u l y 2003\ n’ )

i f i +1 < n u m _ f i l e s :n e x t _ d i r = ’ . / p rod ’ + number [ i +1]+ ’ t o ’ + number [ i +2] + ’ / i n i t / ’i f ( s e l f . i n t e r _ i n p u t * ( i +1) ) < 10 :

o u t p u t 2 . w r i t e ( ’ cp . / ’ + s e l f . name + ’_md00 ’ + ’%d ’ % ( s e l f .i n t e r _ i n p u t * ( i +1) ) + ’ . r s t ’ )

o u t p u t 2 . w r i t e ( n e x t _ d i r )o u t p u t 2 . w r i t e ( s e l f . name + ’_md00 ’ + ’%d ’ % ( s e l f . i n t e r _ i n p u t

* ( i +1) + 1) + ’ . r s t \ n ’ )e l i f s e l f . i n t e r _ i n p u t * ( i +1) < 100 :

o u t p u t 2 . w r i t e ( ’ cp . / ’ + s e l f . name + ’_md0 ’ + ’%d ’ % ( s e l f .i n t e r _ i n p u t * ( i +1) ) + ’ . r s t ’ )

o u t p u t 2 . w r i t e ( n e x t _ d i r )o u t p u t 2 . w r i t e ( s e l f . name + ’_md0 ’ + ’%d ’ % ( s e l f . i n t e r _ i n p u t*

( i +1) + 1) + ’ . r s t \ n ’ )e l s e :

68

o u t p u t 2 . w r i t e ( ’ cp . / ’ + s e l f . name + ’_md ’ + ’%d ’ % ( s e l f .i n t e r _ i n p u t * ( i +1) ) + ’ . r s t ’ )

o u t p u t 2 . w r i t e ( n e x t _ d i r )o u t p u t 2 . w r i t e ( s e l f . name + ’_md ’ + ’%d ’ % ( s e l f . i n t e r _ i n p u t*

( i +1) + 1) + ’ . r s t \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* . r s t . / ’ + d i r_name + ’ / r s t s / \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* . ou t . / ’ + d i r_name + ’ / o u t s / \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* . p rp . / ’ + d i r_name + ’ / p rps / \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* . t r j . / ’ + d i r_name + ’ / t r j s / \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* . cmd . / ’ + d i r_name + ’ / cmds / \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* . bsub . / ’ + d i r_name + ’ \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* . e r r . / ’ + d i r_name + ’ \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* . db . / ’ + d i r_name + ’ \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* ’ + number [ i ]+ ’ t o ’ + number [ i +1] + ’ * . nw . / ’

+ d i r_name + ’ \ n ’ )o u t p u t 2 . w r i t e ( ’mv . /* ’ + number [ i ]+ ’ t o ’ + number [ i +1] + ’ * .

o u t p u t . / ’ + d i r_name + ’ \ n ’ )o u t p u t 2 . w r i t e ( ’ rm . / i n i t /* . b a t c h \ nrm . / i n i t /* . r s t \ nrm . / i n i t /* . nw

\ n ’ )o u t p u t 2 . w r i t e ( ’ rm a f t e r* \ n ’ )i f i +1 < n u m _ f i l e s :

o u t p u t 2 . w r i t e ( ’ cp ’ + n e x t _ d i r + ’* . / i n i t \ n ’ )o u t p u t 2 . w r i t e ( ’ cp . / i n i t /* . \ n ’ )o u t p u t 2 . w r i t e ( ’ echo ready t o p roceed wi th s i m u l a t i o n . \ n ’ )o u t p u t 2 . w r i t e ( ’ echo t o c o n t i n u e wi th t h e nex t s i m u l a t i o n t ype

: . \ n ’ )o u t p u t 2 . w r i t e ( ’ echo . . / l l nw . phase2 \ n \ n ’ )

e l s e:o u t p u t 2 . w r i t e ( ’ echo s i m u l a t i o n has f i n i s h e d . You can proceed

wi th a n a l y s i s \ n \ n ’ )o u t p u t 2 . w r i t e ( ’ # End of s c r i p t \ n \ n ’ )o u t p u t 2 . c l o s e ( )

        self.output.write('echo Done!!!\n')
        self.output.write('# End of the script\n\n')
        self.output.close()
        # write the after_run files
        try:
            # open the file for writing
            self.output = open('after_run_step2.batch', 'w')
        except:
            print 'error opening after_run_step2.batch\n'
            sys.exit(1)
        self.output.write('# Script file that help to clean up and prepare the next simulation\n')
        self.output.write('# Created by Alejandro Aceves - PNNL - July 2003\n')
        ## self.output.write('mv *.rst rsts/\nmv *.cmd cmds/\nmv *.out outs/\n')
        ## self.output.write('mv *.prp prps/\nmv *.trj rsts/\n')
        next_dir = './../step3/prod' + number[0] + 'to' + number[1] + '/init/'
        self.output.write('cp ./' + self.name + '_rx011.rst ')
        self.output.write(next_dir)
        self.output.write(self.name + '_md001.rst\n')
        self.output.write('cp ./' + self.name + '_rx011.rst ')
        self.output.write(next_dir)
        self.output.write(self.name + '_md000.rst\n')

        self.output.write('mv ./*.rst ./rsts/\n')
        self.output.write('mv ./*.out ./outs/\n')
        self.output.write('mv ./*.prp ./prps/\n')
        self.output.write('mv ./*.trj ./trjs/\n')
        self.output.write('mv ./*.cmd ./cmds/\n')
        self.output.write('cd ' + next_dir + '\n')
        self.output.write('echo ready to proceed with simulation.\n\n')
        self.output.write('# End of script\n\n')
        self.output.close()

    def openOutput(self, filename):
        try:
            # open the file1 for writing
            self.output = open(filename, 'w')
        except:
            print 'error opening ' + filename
            sys.exit(1)

    def makeLinks(self, filename):
        self.openOutput(filename)
        num_files = self.psecs / (self.inter_input * self.psec_step)
        number = []
        for i in range(num_files+1):
            number.append('%d' % (i * self.inter_input * self.psec_step))
        if self.psecs % (self.inter_input * self.psec_step) <> 0:
            num_files = num_files + 1
            extra = 1
            number.append('%d' % self.psecs)
        else:
            extra = 0
        allwhat = ['trj', 'rst']
        for j in allwhat:
            self.output.write('pushd .\ncd ./' + self.name + '/analysis/all' + j + 's\n')
            for i in range(num_files):
                self.output.write('ln -s ../../step3/prod' + number[i] + 'to' + number[i+1] + '/' + j + 's/*.' + j + ' .\n')
            self.output.write('popd\n')
        self.output.close()

if __name__ == '__main__':
    sim_name = 'thgenergy'
    mkt = makeScript(sim_name, 'nwvisus', 'MPP2')
    mkt.title = "THG 12mer energy calculation"
    mkt.user = 'mernst'
    mkt.path[1] = '/scratch/matt/sim/current/'
    mkt.path[2] = '/home/' + mkt.user + '/'
    mkt.platform[1] = 'nwvisus'
    mkt.platform[2] = 'MPP2'
    mkt.psecs = 3200
    mkt.inter_input = 20
    mkt.psec_step = 20
    mkt.write2file(1)
    mkt.makeLinks('links_trj.batch')
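The segment bookkeeping in makeLinks and the three-way zero-padding branch used when naming restart files can be condensed as shown here. This is a minimal Python 3 sketch, not the script used for this work: the helper names segment_labels and restart_name are hypothetical, and the single '%03d' format is assumed to be an acceptable substitute for the explicit '_md00' / '_md0' / '_md' branches.

```python
def segment_labels(psecs, inter_input, psec_step):
    """Return the picosecond boundaries ('0', '400', ...) of the production segments."""
    seg_len = inter_input * psec_step      # picoseconds covered by one input file
    n_full = psecs // seg_len              # integer division, as in the Python 2 listing
    labels = ['%d' % (i * seg_len) for i in range(n_full + 1)]
    if psecs % seg_len != 0:               # a shorter final segment is needed
        labels.append('%d' % psecs)
    return labels


def restart_name(sim_name, index):
    """Build a restart file name such as 'thgenergy_md020.rst' (hypothetical helper)."""
    return '%s_md%03d.rst' % (sim_name, index)


if __name__ == '__main__':
    labels = segment_labels(3200, 20, 20)
    # Pair consecutive boundaries to obtain the prodXtoY directory names.
    for start, stop in zip(labels, labels[1:]):
        print('prod%sto%s' % (start, stop))
    print(restart_name('thgenergy', 20))
```

With the values set in the main block above (3200 ps, 20 inputs per file, 20 ps per input) every segment spans 400 ps, so no shorter trailing segment is appended.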

A.2 NWChem inputs

A.2.1 equilibration

permanent_dir /home/mernst/8oGrerun/step2

Title "8oG 12mer rerun"

print high

start 8oGrerun_Relaxation

md
  system 8oGrerun_rx
  noshake solute
  fix solute 1 24
  sd 2000
end

task md optimize


# Relaxation step at 50.15 degrees K with solute fixed
task shell "cp 8oGrerun_rx.qrs 8oGrerun_rx001.rst"
md
  system 8oGrerun_rx001
  vreass 100 50.15
  fix solute 1 24
  equil 0 data 10000 step 0.001
  isotherm 50.15 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
end

task md dynamics

# Relaxation step at 298.15 degrees K with solute fixed
task shell "cp 8oGrerun_rx001.rst 8oGrerun_rx002.rst"
md
  system 8oGrerun_rx002
  vreass 100 298.15
  fix solute 1 24
  equil 0 data 10000 step 0.001
  isotherm 298.15 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
end

task md dynamics

# 2000 steepest descent with solvent fixed
task shell "cp 8oGrerun_rx002.rst 8oGrerun_rx003.rst"
md
  system 8oGrerun_rx003
  fix solvent
  sd 2000
end

task md optimize

# Relaxation step at 50.0 degrees K with solvent fixed
task shell "cp 8oGrerun_rx003.qrs 8oGrerun_rx004.rst"
md
  system 8oGrerun_rx004
  vreass 100 50.0
  fix solvent
  equil 0 data 10000 step 0.001
  isotherm 50.0 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
end

task md dynamics

# Relaxation step at 100.0 degrees K with solvent fixed
task shell "cp 8oGrerun_rx004.rst 8oGrerun_rx005.rst"
md
  system 8oGrerun_rx005
  vreass 100 100.0
  fix solvent
  equil 0 data 10000 step 0.001
  isotherm 100.0 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
end

task md dynamics

# Relaxation step at 150.0 degrees K with solvent fixed
task shell "cp 8oGrerun_rx005.rst 8oGrerun_rx006.rst"
md
  system 8oGrerun_rx006
  vreass 100 150.0
  fix solvent
  equil 0 data 10000 step 0.001
  isotherm 150.0 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
end

task md dynamics

# Relaxation step at 200.0 degrees K with solvent fixed
task shell "cp 8oGrerun_rx006.rst 8oGrerun_rx007.rst"
md
  system 8oGrerun_rx007
  vreass 100 200.0
  fix solvent
  equil 0 data 10000 step 0.001
  isotherm 200.0 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
end

task md dynamics

# Relaxation step at 250.0 degrees K with solvent fixed
task shell "cp 8oGrerun_rx007.rst 8oGrerun_rx008.rst"
md
  system 8oGrerun_rx008
  vreass 100 250.0
  fix solvent
  equil 0 data 10000 step 0.001
  isotherm 250.0 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
end

task md dynamics

# Relaxation step at 298.15 degrees K with solvent fixed
task shell "cp 8oGrerun_rx008.rst 8oGrerun_rx009.rst"
md
  system 8oGrerun_rx009
  vreass 100 298.15
  fix solvent
  equil 0 data 10000 step 0.001
  isotherm 298.15 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
end

task md dynamics

# Relaxation step at 298.15 degrees K with nothing fixed
task shell "cp 8oGrerun_rx009.rst 8oGrerun_rx010.rst"
md
  system 8oGrerun_rx010
  equil 0 data 10000 step 0.001
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics
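The six solvent-fixed stages in the equilibration input differ only in the restart index and the target temperature, so text of that shape can be produced programmatically. The following Python sketch illustrates the idea under stated assumptions: the function names relaxation_block and heating_ramp are hypothetical, the keyword lines are copied from the listing above, and the first stage is assumed to read the .qrs file left by the preceding minimization, as it does in the listing.

```python
# Target temperatures (K) of the solvent-fixed stages, taken from the listing.
RAMP_TEMPERATURES = [50.0, 100.0, 150.0, 200.0, 250.0, 298.15]


def relaxation_block(sim_name, prev_index, temperature, prev_ext='rst'):
    """Return one solvent-fixed relaxation stage as NWChem input text."""
    prev = '%s_rx%03d.%s' % (sim_name, prev_index, prev_ext)
    curr = '%s_rx%03d' % (sim_name, prev_index + 1)
    lines = [
        '# Relaxation step at %s degrees K with solvent fixed' % temperature,
        'task shell "cp %s %s.rst"' % (prev, curr),
        'md',
        '  system %s' % curr,
        '  vreass 100 %s' % temperature,
        '  fix solvent',
        '  equil 0 data 10000 step 0.001',
        '  isotherm %s trelax 0.1 0.1' % temperature,
        '  isobar',
        '  print step 100 stat 1000',
        'end',
        '',
        'task md dynamics',
    ]
    return '\n'.join(lines)


def heating_ramp(sim_name, first_prev_index=3):
    """Chain the stages; only the first one starts from a .qrs file."""
    blocks = []
    for offset, temperature in enumerate(RAMP_TEMPERATURES):
        ext = 'qrs' if offset == 0 else 'rst'
        blocks.append(relaxation_block(sim_name, first_prev_index + offset,
                                       temperature, ext))
    return '\n\n'.join(blocks)


if __name__ == '__main__':
    print(heating_ramp('8oGrerun'))
```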

A.2.2 production

permanent_dir /home/mernst/8oGrerun/step3

Title "8oGrerun correct equilibration Production"

print high

start 8oGrerun_Production

md
  system 8oGrerun_md001
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md001.rst 8oGrerun_md002.rst"
md
  system 8oGrerun_md002
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md002.rst 8oGrerun_md003.rst"
md
  system 8oGrerun_md003
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md003.rst 8oGrerun_md004.rst"
md
  system 8oGrerun_md004
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md004.rst 8oGrerun_md005.rst"
md
  system 8oGrerun_md005
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics


task shell "cp 8oGrerun_md005.rst 8oGrerun_md006.rst"
md
  system 8oGrerun_md006
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md006.rst 8oGrerun_md007.rst"
md
  system 8oGrerun_md007
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md007.rst 8oGrerun_md008.rst"
md
  system 8oGrerun_md008
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md008.rst 8oGrerun_md009.rst"
md
  system 8oGrerun_md009
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md009.rst 8oGrerun_md010.rst"
md
  system 8oGrerun_md010
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md010.rst 8oGrerun_md011.rst"
md
  system 8oGrerun_md011
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md011.rst 8oGrerun_md012.rst"
md
  system 8oGrerun_md012
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md012.rst 8oGrerun_md013.rst"
md
  system 8oGrerun_md013
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md013.rst 8oGrerun_md014.rst"
md
  system 8oGrerun_md014
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md014.rst 8oGrerun_md015.rst"
md
  system 8oGrerun_md015
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md015.rst 8oGrerun_md016.rst"
md
  system 8oGrerun_md016
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md016.rst 8oGrerun_md017.rst"
md
  system 8oGrerun_md017
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md017.rst 8oGrerun_md018.rst"
md
  system 8oGrerun_md018
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics

task shell "cp 8oGrerun_md018.rst 8oGrerun_md019.rst"
md
  system 8oGrerun_md019
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics


task shell "cp 8oGrerun_md019.rst 8oGrerun_md020.rst"
md
  system 8oGrerun_md020
  equil 0 data 10000 step 0.002
  cutoff 1.0
  update center 1 fraction 1
  pme grid 64 order 4
  isotherm 298.15 trelax 0.1 0.1
  isobar
  mwm 6500
  print step 100 stat 1000
  record rest 1000 prop 100 coord 1000 scoor 100
end

task md dynamics
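The twenty production stages differ only in the restart index, so input of this shape lends itself to generation rather than hand editing. The Python sketch below shows one way this could be done. It is an illustration only: the names PRODUCTION_KEYWORDS, production_block and production_input are hypothetical, and the keyword lines are copied from the production listing rather than derived from any NWChem API.

```python
# Keyword lines shared by every production stage, copied from the listing.
PRODUCTION_KEYWORDS = [
    'equil 0 data 10000 step 0.002',
    'cutoff 1.0',
    'update center 1 fraction 1',
    'pme grid 64 order 4',
    'isotherm 298.15 trelax 0.1 0.1',
    'isobar',
    'mwm 6500',
    'print step 100 stat 1000',
    'record rest 1000 prop 100 coord 1000 scoor 100',
]


def production_block(sim_name, index):
    """Return the md block for stage `index` (1-based) as input text."""
    system = '%s_md%03d' % (sim_name, index)
    lines = []
    if index > 1:
        # Every stage after the first starts from the previous stage's restart file.
        previous = '%s_md%03d' % (sim_name, index - 1)
        lines.append('task shell "cp %s.rst %s.rst"' % (previous, system))
    lines.append('md')
    lines.append('  system %s' % system)
    lines.extend('  ' + keyword for keyword in PRODUCTION_KEYWORDS)
    lines.extend(['end', '', 'task md dynamics'])
    return '\n'.join(lines)


def production_input(sim_name, n_stages):
    """Concatenate n_stages production blocks separated by blank lines."""
    return '\n\n'.join(production_block(sim_name, i)
                       for i in range(1, n_stages + 1))


if __name__ == '__main__':
    print(production_input('8oGrerun', 20))
```

The header lines (permanent_dir, Title, print, start) are left out of the sketch because they appear once per file rather than once per stage.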

A.2.3 initial thermodynamic integration

8oG starting thermodynamic integration

start energy800equil
Title "8oG to G 12mer aq. free-energy Calculation: Equilibration"

md
  system 8oGenergytrimer_md020
  noshake solute
  sd 50 init 0.01 min 1.0E-4 max 0.1
  cg 200 init 0.01 min 1.0E-8 cy 10
end; task md optimize

task shell "cp 8oGenergytrimer_md020.qrs 8oGenergytrimer_md.rst"

md
  system 8oGenergytrimer_md
  step 0.001 equil 1000 data 5000
  cutoff 1.0
  noshake solute
  leapfrog
  isotherm 298.16 trelax 0.1 0.1
  isobar 1.025e5 trelax 0.4
  print step 200 stat 1000
  record rest 1000
  update pairs 10 center 10
  load pairs
end; task md dynamics

task shell "cp 8oGenergytrimer_md.rst 8oGenergytrimer_ti.rst"

Title "8oG to guanine solvation energy: Free-energy Calculation"

# Multistep Thermodynamic Integration: FORWARD
md
  system 8oGenergytrimer_ti
  cutoff 1.0
  noshake solute
  sss delta 0.085
  leapfrog
  new forward 21 of 21 error 5.0 drift 5.0 factor 0.75
  step 0.001 equil 1000 data 500000 over 5000
  isotherm 298.15 trelax 0.1
  isobar
  print step 500 stat 5000
  update pairs 10 center 10
  record rest 1000
  load pairs
end; task md thermodynamics

# Copy restart file for equilibration of reverse process
task shell "cp 8oGenergytrimer_ti.rst 8oGenergytrimer_mdR.rst"

md
  system 8oGenergytrimer_mdR
  step 0.001 equil 1000 data 5000
  cutoff 1.0
  noshake solute
  leapfrog
  isotherm 298.16 trelax 0.1 0.1
  isobar 1.025e5 trelax 0.4
  print step 200 stat 5000
  record rest 1000 coord 50
  update pairs 10 center 10
  load pairs
end; task md dynamics

task shell "cp 8oGenergytrimer_mdR.rst 8oGenergytrimer_tiR.rst"

# Multistep Thermodynamic Integration: REVERSE
md
  system 8oGenergytrimer_tiR
  cutoff 1.0
  noshake solute
  sss delta 0.085
  leapfrog
  new reverse 21 of 21 error 5.0 drift 5.0 factor 0.75
  step 0.001 equil 1000 data 500000 over 5000
  isotherm 298.15 trelax 0.1
  isobar
  print step 500 stat 5000
  update pairs 10 center 10
  record rest 1000
  load pairs
end; task md thermodynamics

A.2.4 extended thermodynamic integration

8oG starting thermodynamic integration

start energy400equil
Title "8oG trimer solvation energy: MCTI extension"

# Multistep Thermodynamic Integration: FORWARD
md
  system 8oGenergytrimer_ti
  cutoff 1.0
  noshake solute
  sss delta 0.085
  leapfrog
  extend forward 21 of 21 error 2.5 drift 5.0 factor 0.75
  step 0.001 equil 1000 data 500000 over 5000
  isotherm 298.15 trelax 0.1
  isobar
  print step 500 stat 5000
  update pairs 10 center 10
  record rest 1000
  load pairs
end; task md thermodynamics

# Copy restart file for reverse process
task shell "cp 8oGenergytrimer_ti.rst 8oGenergytrimer_tiR.rst"

# Multistep Thermodynamic Integration: REVERSE
md
  system 8oGenergytrimer_tiR
  cutoff 1.0
  noshake solute
  sss delta 0.085
  leapfrog
  extend reverse 21 of 21 error 5.0 drift 5.0 factor 0.75
  step 0.001 equil 1000 data 500000 over 5000
  isotherm 298.15 trelax 0.1
  isobar
  print step 500 stat 5000
  update pairs 10 center 10
  record rest 1000
  load pairs
end; task md thermodynamics

# Free-energy differences should be about the same for the
# forward and reverse directions.
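The closing comment in the input compares the forward and reverse free-energy differences. As a rough illustration of what such a comparison involves, the sketch below integrates ensemble averages of dH/dlambda with the trapezoidal rule and reports the gap between two directions. Several assumptions apply: the 21 lambda points are taken to be evenly spaced on [0, 1], the input values are synthetic, the names ti_free_energy and hysteresis are hypothetical, and nothing here is claimed to reproduce the integration scheme NWChem uses internally.

```python
def ti_free_energy(dh_dlambda):
    """Trapezoidal estimate of the integral of <dH/dlambda> over lambda in [0, 1].

    `dh_dlambda` holds one ensemble average per lambda point; the points are
    assumed to be evenly spaced, which is an assumption of this sketch.
    """
    n_points = len(dh_dlambda)
    if n_points < 2:
        raise ValueError('need at least two lambda points')
    width = 1.0 / (n_points - 1)
    total = 0.0
    for left, right in zip(dh_dlambda, dh_dlambda[1:]):
        total += 0.5 * (left + right) * width
    return total


def hysteresis(dg_forward, dg_reverse):
    """Absolute gap between two estimates that are expected to agree."""
    return abs(dg_forward - dg_reverse)


if __name__ == '__main__':
    lambdas = [i / 20.0 for i in range(21)]            # 21 points, as in "21 of 21"
    forward = [2.0 * lam for lam in lambdas]           # synthetic values only
    reverse = [2.0 * lam + 0.1 for lam in lambdas]     # synthetic values only
    dg_f = ti_free_energy(forward)
    dg_r = ti_free_energy(reverse)
    print('forward %.3f  reverse %.3f  gap %.3f' % (dg_f, dg_r, hysteresis(dg_f, dg_r)))
```

A large gap would suggest insufficient sampling at one or more lambda points, which is the situation the extension run with a tighter error setting is meant to address.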

A.2.5 thymine-glycol task prepare

thymine glycol prepare with modify atom commands

Title "THG 12mer energy calculation"
start thgenergyInit

prepare
  system thgenergy_tp
  amber
  modify atom 18:_P final charge 1.165900
  modify atom 18:4H5M final charge 0.077000
  modify atom 18:3H5M final charge 0.077000
  modify atom 18:2H5M final charge 0.077000
  modify atom 18:_H6 final charge 0.260700 type H4
  modify atom 18:_H3 final charge 0.342000
  modify atom 18:3H5* final charge 0.075400
  modify atom 18:2H5* final charge 0.075400
  modify atom 18:_H4* final charge 0.117600
  modify atom 18:_H3* final charge 0.098500
  modify atom 18:3H2* final charge 0.071800
  modify atom 18:2H2* final charge 0.071800
  modify atom 18:_H1* final charge 0.180400
  modify atom 18:_C5M final charge -0.226900
  modify atom 18:_C5 final charge 0.002500 type CM
  modify atom 18:_C6 final charge -0.220900 type CM
  modify atom 18:_O5M final charge 0.0 dummy
  modify atom 18:_H5O final charge 0.0 dummy
  modify atom 18:_O6 final charge 0.0 dummy
  modify atom 18:_H6O final charge 0.0 dummy
  modify atom 18:_N1 final charge -0.023900
  modify atom 18:_O2 final charge -0.588100
  modify atom 18:_C2 final charge 0.567700
  modify atom 18:_N3 final charge -0.434000
  modify atom 18:_C4 final charge 0.519400
  modify atom 18:_O4 final charge -0.556300
  modify atom 18:_O4* final charge -0.369100
  modify atom 18:_C5* final charge -0.006900
  modify atom 18:_C4* final charge 0.162900
  modify atom 18:_C3* final charge 0.071300
  modify atom 18:_C2* final charge -0.085400
  modify atom 18:_C1* final charge 0.068000
  modify atom 18:_O3* final charge -0.523200
  modify atom 18:_O5* final charge -0.495400
  modify atom 18:_O2P final charge -0.776100
  modify atom 18:_O1P final charge -0.776100
  chain *
  fraction 1 2 3
  new_top new_seq
  grid 24 0.8
  counter 22 Na
  touch 0.3
  expand 0.2
  center; orient
  solvate box 6.8 6.8 9.8
  write pdb thgenergy_initH2O.pdb
  write rst thgenergy_init.rst
end

task prepare
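With roughly forty hand-entered charges, a quick consistency check on the prepare block (net final charge of the residue, list of atoms turned into dummies) can catch transcription slips before a run is submitted. The sketch below parses "modify atom" lines of the form used above. It is a hypothetical helper written for illustration, it relies only on the line layout visible in the listing, and it is not part of NWChem.

```python
def parse_modify_atoms(text):
    """Return (atom, final_charge, is_dummy) tuples from 'modify atom' lines."""
    records = []
    for line in text.splitlines():
        tokens = line.split()
        # Only lines of the form: modify atom <id> final charge <value> [...]
        if tokens[:2] != ['modify', 'atom'] or 'charge' not in tokens:
            continue
        charge = float(tokens[tokens.index('charge') + 1])
        records.append((tokens[2], charge, 'dummy' in tokens))
    return records


def net_charge(records):
    """Sum of the final charges over all parsed atoms."""
    return sum(charge for _, charge, _ in records)


if __name__ == '__main__':
    sample = '\n'.join([
        '  modify atom 18:_P final charge 1.165900',
        '  modify atom 18:_O6 final charge 0.0 dummy',
        '  modify atom 18:_O2P final charge -0.776100',
        '  chain *',
    ])
    records = parse_modify_atoms(sample)
    print('net final charge: %.4f' % net_charge(records))
    print('dummy atoms:', [atom for atom, _, dummy in records if dummy])
```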

A.3 Analysis tools

A.3.1 pdb_cleanup.py

import sys, string, commands, glob

class convertPDB:
    def __init__(self):
        argv = sys.argv
        argc = len(argv)
        if argc > 1:
            print 'usage: [python] convert_and_renumber_PDBs.py'
            print 'this program would read all the pdbs within the current directory and'
            print 'would convert them into daw pdb file compatible'
            print 'In addition, it would delete all the NA atoms, as well as renumbering them'
            sys.exit(1)
        self.bases1 = ['  A', '  C', '  G', '  T']
        self.bases2 = ['ADE', 'CYT', 'GUA', 'THY']
        self.bases3 = [' DA', ' DC', ' DG', ' DT']
        self.bases4 = ['DA_', 'DC_', 'DG_', 'DT_']

    def openInputFile(self):
        try:
            # open the pdb file for reading
            self.input = open(self.in_filename, 'r')
        except:
            print 'error opening ' + self.in_filename
            sys.exit(1)

    def openOutputFile(self):
        try:
            # open the temporary pdb file for writing
            self.output = open(self.out_filename, 'w')
        except:
            print 'error opening ' + self.out_filename
            sys.exit(1)

def d o i t ( s e l f ) :l i s t _ p d b s = g lob . g lob ( ’* . pdb ’ )i f not l e n ( l i s t _ p d b s ) :

p r i n t ’ I cou ld no t run conver t_and_renumber_PDBs program . ’p r i n t ’ Check t h a t pdbs a r e i n t h e d i r e c t o r y and t r y a g a i n ’sys . e x i t ( 1 )

f o r s e l f . i n _ f i l e n a m e i n l i s t _ p d b s :new_atom_number = 0new_number = 0currnum = ’ 0 ’oldnum = ’ 0 ’s e l f . o p e n I n p u t F i l e ( )s e l f . o u t _ f i l e n a m e = s e l f . i n _ f i l e n a m e + ’ . tmp ’s e l f . o p e n O u t p u t F i l e ( )f o r l i n e i n s e l f . i n p u t . r e a d l i n e s ( ) :

l i n e = l i n e . r e p l a c e ( "* " , " ’ " )i f ( l i n e [ : 4 ] . f i n d ( ’ATOM’ ) > −1 ) or ( l i n e [ : 6 ] . f i n d ( ’HETATM’ ) >

−1) :resName = l i n e [ 1 7 : 2 0 ]i f resName . f i n d ( ’Na ’ ) > −1:

con t inue

87

i f resName . f i n d ( ’WAT’ ) > −1:con t inue

currnum = l i n e [ 2 3 : 2 7 ]i f currnum <> oldnum :

new_number = new_number + 1oldnum = currnum [ : ]

newn = ’%4s ’ % new_numbernewanum = ’%5s ’ % new_atom_numbernew_ l ine = l i n e [ : 6 ] + newanum + l i n e [ 1 1 : 2 2 ] + newn + l i n e [ 2 6 : ]l i n e = new_ l ine# D e l e t e a l l t h e Sodium atoms ( i o n s )# Rep lace Atom namesl i n e = l i n e . r e p l a c e ( ’HETATM’ , ’ATOM ’ )l i n e = l i n e . r e p l a c e ( ’ 2H2 ’ , ’H21 ’ )l i n e = l i n e . r e p l a c e ( ’ 3H2 ’ , ’H22 ’ )l i n e = l i n e . r e p l a c e ( ’ 2H4 ’ , ’H41 ’ )l i n e = l i n e . r e p l a c e ( ’ 3H4 ’ , ’H42 ’ )l i n e = l i n e . r e p l a c e ( ’ 2H6 ’ , ’H61 ’ )l i n e = l i n e . r e p l a c e ( ’ 3H6 ’ , ’H62 ’ )l i n e = l i n e . r e p l a c e ( ’ 2H5 ’ , ’H51 ’ )l i n e = l i n e . r e p l a c e ( ’ 3H5 ’ , ’H52 ’ )l i n e = l i n e . r e p l a c e ( ’ 4H5 ’ , ’H53 ’ )# Rep lace r e s i d u e namesl i n e = l i n e . r e p l a c e ( ’DG ’ , ’ G ’ )l i n e = l i n e . r e p l a c e ( ’DC ’ , ’ C ’ )l i n e = l i n e . r e p l a c e ( ’DA ’ , ’ A ’ )l i n e = l i n e . r e p l a c e ( ’DT ’ , ’ T ’ )

l i n e = l i n e . r e p l a c e ( ’ DTP ’ , ’ T ’ )l i n e = l i n e . r e p l a c e ( ’ DPO’ , ’ T ’ )l i n e = l i n e . r e p l a c e ( ’ DPH’ , ’ T ’ )l i n e = l i n e . r e p l a c e ( ’ 8oG ’ , ’ G ’ )l i n e = l i n e . r e p l a c e ( ’ ABA’ , ’ A ’ )l i n e = l i n e . r e p l a c e ( ’ ABC’ , ’ C ’ )l i n e = l i n e . r e p l a c e ( ’ AAB’ , ’ A ’ )l i n e = l i n e . r e p l a c e ( ’ DT_ ’ , ’ T ’ )l i n e = l i n e . r e p l a c e ( ’ DA_ ’ , ’ A ’ )l i n e = l i n e . r e p l a c e ( ’ DC_ ’ , ’ C ’ )

l i n e = l i n e . r e p l a c e ( ’ DG_ ’ , ’ G ’ )s e l f . o u t p u t . w r i t e ( l i n e )

e l i f l i n e [ : 3 ] i n [ ’END’ , ’TER ’ ] :s e l f . o u t p u t . w r i t e ( l i n e )

new_atom_number = new_atom_number + 1s e l f . i n p u t . c l o s e ( )


            self.output.close()
            # Renaming the file
            cmd_status = commands.getstatusoutput('mv %s %s' % (self.out_filename, self.in_filename))
            if cmd_status[0] <> 0:
                print 'I could not run this program succesfully.'
                print cmd_status[1]
                sys.exit(cmd_status[0])
        try:
            output = open('converted_readme.txt', 'w')
        except:
            print 'could not create converted_readme.txt'
            sys.exit(1)
        output.write('list of converted pdbs\n')
        for name in list_pdbs:
            output.write('%s\n' % name)

if __name__ == '__main__':
    cPDB = convertPDB()
    cPDB.doit()
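The renumbering step of the listing above can be illustrated with a minimal, self-contained sketch. It is written in Python 3 rather than the Python 2 of the thesis scripts, and it is not the author's code: the names `renumber_pdb` and `atom_line` are hypothetical, the residue-number slice `[22:26]` follows the standard PDB column layout rather than the `[23:27]` slice read in the listing, and the atom counter here advances only for atoms that are kept, which is an assumption about the intended behavior.

```python
def renumber_pdb(lines, skip=("Na", "WAT")):
    """Renumber atoms and residues consecutively, dropping skipped residues.

    Hypothetical helper: ions and waters (residue names containing any
    entry of `skip`) are removed, as in the listing above.
    """
    out = []
    new_res = 0      # running residue number
    new_atom = 0     # running atom serial number
    old_res = None   # residue-number field of the previous kept atom
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            res_name = line[17:20]
            if any(tag in res_name for tag in skip):
                continue
            res_seq = line[22:26]
            if res_seq != old_res:
                # a change in the original residue field starts a new residue
                new_res += 1
                old_res = res_seq
            new_atom += 1
            line = (line[:6] + "%5d" % new_atom + line[11:22]
                    + "%4d" % new_res + line[26:])
            out.append(line)
        elif line[:3] in ("END", "TER"):
            out.append(line)
    return out


def atom_line(serial, name, res_name, res_seq):
    """Build a fixed-width ATOM record for demonstration purposes only."""
    return "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f\n" % (
        serial, name, res_name, res_seq, 0.0, 0.0, 0.0)


if __name__ == "__main__":
    sample = [
        atom_line(10, "P", "DG", 5),
        atom_line(11, "O1P", "DG", 5),
        atom_line(12, "P", "DC", 9),
        atom_line(13, "O", "WAT", 10),
        "TER\n",
    ]
    for record in renumber_pdb(sample):
        print(record, end="")
```

Keeping the renumbering in a pure function that maps a list of lines to a list of lines makes it possible to check the column arithmetic without touching any files.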

A.3.2 extract_lis.py

import string, glob, sys, os
from math import sqrt

lis_files = '*.lis'

# A_string = '|A| Global axis parameters'
new_section_str = '--------------------'
I_special = 'Duplex Offset'
J_torsions = 'Torsions'
J_normal_str = '2nd strand C1'

class extract_params:
    def __init__(self):
        # First, these are the values for new files. The new file names would be the
        # sim_name + section + category first token + .txt
        # i.e. G5T6SSB_md_B_Bc.txt (G5T6SSB_md is sim_name, B is the section
        # and Bc is the category)
        # Values for the new files
        self.new_files = [
            # Section letter, Description
            # A is a special case B-H are the same and I, and J are special too
            ('A', 'Global axis parameters', ['U', 'P', 'D'], (2, 6, 10)),
            ('B', 'Global Base-Axis Parameters',
                # Categories
                ['Xdisp (dx)', 'Ydisp (dy)', 'Inclin (eta)', 'Tip (theta)', 'Bc', 'Tc'],
                # When information starts (e.g. first column with numbers)
                3),
            ('C', 'Global Base pair-Axis Parameters',
                ['Xdisp (dx)', 'Ydisp (dy)', 'Inclin (eta)', 'Tip (theta)', 'Bc', 'Tc'],
                5),
            ('D', 'Global Base-Base Parameters',
                ['Shear (Sx)', 'Stretch (Sy)', 'Stagger (Sz)', 'Buckle (kappa)',
                 'Propel (omega)', 'Opening (sigma)', 'Bc', 'Tc'],
                5),
            ('E', 'Global Inter-Base Parameters',
                ['Shift (Dx)', 'Slide (Dy)', 'Rise (Dz)', 'Tilt (tau)', 'Roll (rho)', 'Twist (Omega)', 'Dc'],
                5),
            ('F', 'Global Inter-Base pair Parameters',
                ['Shift (Dx)', 'Slide (Dy)', 'Rise (Dz)', 'Tilt (tau)', 'Roll (rho)', 'Twist (Omega)', 'Dc'],
                5),
            ('G', 'Local Inter-Base Parameters',
                ['Shift (Dx)', 'Slide (Dy)', 'Rise (Dz)', 'Tilt (tau)', 'Roll (rho)', 'Twist (Omega)', 'Dc'],
                5),
            ('H', 'Local Inter-Base pair Parameters',
                ['Shift (Dx)', 'Slide (Dy)', 'Rise (Dz)', 'Tilt (tau)', 'Roll (rho)', 'Twist (Omega)', 'Dc'],
                5),
            ('I', 'Global Axis Curvature',
                ['Ax', 'Ay', 'Ainc', 'Atip', 'Adis', 'Angle', 'Path', 'Dc'],
                5),
            ('Is', 'Global Axis Curvature',
                ['Offset', 'L.Dir... wrt end-to-end vector'],
                3),
            ('J', 'Backbone Parameters',
                ['C1-C2', 'C2-C3', 'Phase', 'Ampli', 'Pucker', 'C1', 'C2', 'C3'],
                3),
            ('Jt', 'Backbone Parameters Torsions',
                ['Chi (C1\'-N)', 'Gamma (C5\'-C4\')', 'Delta (C4\'-C3\')',
                 'Epsil (C3\'-O3\')', 'Zeta (O3\'-P)', 'Alpha (P-O5\')', 'Beta (O5\'-C5\')'],
                3)
        ]

        # default values for all command options
        self.command_params = [('-path', './'), ('-file_name', ''), ('-range', ''),
                               ('-startsim', '0.2'), ('-interval', '0.2')]
        # Init lists that would contain the data
        self.init_params = {}
        # Validate the parameters. This program works even without
        self.validate_params()
        # init lists (dictionaries, whatever)
        self.sections = {}
        self.extra_info = []
        self.categories = []
        # self explanatory, isn't it
        self.get_info_from_lises()
        self.write2file()
        # extras is a file that contains min, max, avg, std dev. etc.
        self.calculate_extras()

    def validate_params(self):
        # Check the command options the user input
        for cp in self.command_params:
            if cp[0] in sys.argv:
                self.init_params[cp[0]] = self.get_param(cp[0])
            else:
                self.init_params[cp[0]] = cp[1]
        # validate that path exists
        if not os.path.exists(self.init_params['-path']):
            self.error('directory "%s" does not exist' % self.init_params['-path'], 1)
        # validate at least one file exists
        if self.init_params['-path'][-1] <> '/':
            self.init_params['-path'] = self.init_params['-path'] + '/'
        lisnames_list = self.init_params['-path'] + self.init_params['-file_name'] + '*.lis'
        # Create the list of all .lis files
        self.lisnames = glob.glob(lisnames_list)
        if not len(self.lisnames):
            # no files with that name
            self.error('No lis files available in "%s"' % self.init_params['-path'], 1)
        # Change the format to UNIX (e.g. "/" instead of a backslash)
        for ln in self.lisnames:
            self.lisnames[self.lisnames.index(ln)] = ln.replace('\\', '/')
        # validate range
        if self.init_params['-range'] == '':  # e.g. user did not give range
            self.init_params['-range'] = self.get_range(1)  # then get the range
        else:
            self.init_params['-range'] = self.get_range(0)  # then get the range
        # Check if all filenames can be generated using the current info
        self.check_filenames()

    def get_param(self, cp):
        # just check if there are enough parameters
        argc = len(sys.argv)
        if sys.argv.index(cp) + 1 >= argc:
            self.error('need at least one parameter after "%s"' % cp, 2)
        return sys.argv[sys.argv.index(cp) + 1]

    def check_filenames(self):
        # create a copy of all filenames
        temp_names = self.lisnames[:]
        # Using the range, try to generate all the names
        for i in range(self.init_params['-range'][0], self.init_params['-range'][1] + 1):
            if self.init_params['-range'][2]:  # if same size digits
                lname = self.build_name(self.init_params['-range'][3], i)
            else:  # number is consecutive with no zeros
                lname = '%s%s%d.lis' % (self.init_params['-path'], self.init_params['-file_name'], i)
            try:  # try removing the name from the .lis list copy
                temp_names.remove(lname)
            except:
                'somehting\'s wrong'
        # If at the end there are none left, we're ok.
        if len(temp_names) <> 0:  # However, if there is at least one, the method failed
            print 'failed to generate:'
            for tn in temp_names:
                print tn
            self.error('could not built filenames with the given info', 2)

    def build_name(self, digit_length, num):
        # Init variables
        str_num = ''
        new_num = 10
        size_num = 1
        # determine the size (e.g. < 10 or < 100, or < 1000, etc.)
        while (new_num <= num):
            new_num = new_num * 10
            size_num = size_num + 1
        # add zeros before the number
        for i in range(digit_length - size_num):
            str_num = str_num + '0'
        # then add the number after that. For example, if num = 1 and digit_length = 4
        # str_num = 0001
        str_num = '%s%d' % (str_num, num)
        # then, add all the other parts of the filename, such as the path, morpheme, etc.
        # same example as above, add "./" + "NoSSB_md" + str_num + ".lis"
        fullname = '%s%s%s.lis' % (self.init_params['-path'], self.init_params['-file_name'], str_num)
        # in the end, it should return "./NoSSB_md0001.lis"
        return fullname

    def get_range(self, type):
        if len(self.lisnames) < 1:  # if the list is empty. Actually, it shouldn't be at this point
            self.error('could not find any .lis files', 1)
        # get the index for the start of the path (to take it out of the name)
        pathi = string.rfind(self.lisnames[0], '/') + 1
        just_names = []
        # get just the morphemes
        for ln in self.lisnames:
            exti = string.find(ln, '.lis')
            just_names.append(ln[pathi:exti])
        # check where the digits start. Start looking backwards (e.g. from the end of the name)
        # This is to ensure that the numbering does not interfere with the name like in T6T7SSB_md0001.lis
        for j in range(len(just_names[0]) - 1, 0, -1):
            if not just_names[0][j].isdigit():  # if it is not a digit
                break  # get out
        numberi = j + 1
        # get the morpheme (in this example: T6T7SSB_md)
        morpheme = just_names[0][:numberi]
        if (type):  # if user did not give a range
            # init to count the first time gets it
            j = 0
            # suppose that equal length is true
            equal_length_digits = 1
            # get the number of digits and numbers. Check in all of them to assure consistency
            for jn in just_names:
                num_str = jn[numberi:]
                if not j:
                    maxi_digits = len(jn) - numberi
                if maxi_digits <> len(jn) - numberi:  # if the max number of digits change
                    equal_length_digits = 0  # then turn flag off
                try:
                    num = string.atoi(num_str)  # check that there is numbering
                except:
                    self.error('could not extract range', 2)
                if j == 0:  # only first time
                    maxi = mini = num
                else:  # subsequent
                    maxi = max(maxi, num)  # get maximum
                    mini = min(mini, num)  # get minimum
                j = j + 1
        else:  # user gave a range
            mini = self.init_params['-range'][0]
            maxi = self.init_params['-range'][1]
            equal_length_digits = len(self.init_params['-range'][0]) == len(self.init_params['-range'][1])
            if equal_length_digits:
                maxi_digits = len(self.init_params['-range'][0])
            else:
                maxi_digits = 0
        # add the morpheme if there is none
        if self.init_params['-file_name'] == '':
            self.init_params['-file_name'] = morpheme
        # put lisnames in a list that will be used later
        self.lis_list = self.lisnames
        # return start, end, equal length flag and maximum number of digits
        return (mini, maxi, equal_length_digits, maxi_digits)

    def get_info_from_lises(self):
        # init the list that will contain all the new files
        self.list_by_file = {}
        # for all the .lis files (i.e. NoSSB_md0001.lis, NoSSB_md0002.lis, etc.)
        for ll in self.lis_list:
            try:  # open the file
                self.input = open(ll, 'r')
            except:
                print('could not open %s' % ll)
                continue  # just need to continue, not to break everything. We might get away with it
            # init the dictionary containing all the data
            self.list_by_file[ll] = {}
            # A couple of flags
            new_section = 1
            get_section = 0
            self.I_second_part = 0
            # variable that would determine where to put the data 'A', 'B', 'C', etc.
            section_letter = ''
            # this list would contain all the data per segment (careful J is a special case)
            self.section = []
            # torsion_c = 0
            J_normal = []
            J_torsion = []
            # for all the lines in the .lis file
            for line in self.input.readlines():
                if line[0] == '\n':
                    continue
                # if line contains "----------------------", means a new section
                if string.find(line, new_section_str) > -1:
                    if new_section:
                        get_section = 1
                        new_section = 0
                    else:
                        new_section = 1
                    continue  # don't need to go on, go back and start putting data in the section
                if string.find(line, I_special) > -1:  # I has a special case. Formatting changes.
                    self.list_by_file[ll]['I'] = self.section
                    self.section = []
                    section_letter = 'Is'  # The new section is called Is as in I special
                    get_section = 0
                    continue
                if string.find(line, J_torsions) > -1:  # J has a special torsions section
                    J_normal = J_normal + self.section  # variable would hold the "normal" data
                    section_letter = 'Jt'
                    self.section = []
                    # torsion_c = torsion_c + 1
                    continue
                if string.find(line, J_normal_str) > -1:
                    J_torsion = J_torsion + self.section  # variable would hold the Torsion data
                    section_letter = 'J'
                    self.section = []
                    continue
                if get_section:  # if this is a new section
                    if section_letter <> '':  # if the section letter is not empty
                        if section_letter <> 'Jt':  # check for special case (Torsions) in J
                            self.list_by_file[ll][section_letter] = self.section  # then get it in the dict.
                        else:  # huh, section J special then... save data in two different sections
                            J_torsion = J_torsion + self.section
                            self.list_by_file[ll]['J'] = J_normal
                            self.list_by_file[ll]['Jt'] = J_torsion
                    # reinit section (the data has been saved in the dictionary)
                    self.section = []
                    strtok = line.split()
                    section_letter = strtok[0].split('|')[1]  # get the new section letter
                    get_section = 0  # turn flag off
                    continue  # get to the new data (do not add anything, yet)
                self.add2section(section_letter, line, ll)  # if it gets here, new data is coming for current section
            self.input.close()  # close this file (NoSSB_md0001.lis, etc.)

    def add2section(self, letter, line, ll):
        trivial_tokens = ['1st', '2nd', '(dx)', 'Strand', 'Duplex', 'Average:',
                          '(Sx)', '(Dx)', 'Overall', 'Path', 'Torsions', 'C1\'-N']
        strtok = line[:-1].split()
        try:
            strtok[0][0]
        except:
            return
        try:
            strtok[2][0]
        except:
            return
        if strtok[0] in trivial_tokens:
            if string.find(strtok[2], 'has') > -1:
                if len(self.extra_info) < 2:
                    self.extra_info.append(line[:-1])
                    return
            return
        if letter == '':
            return
        elif letter == 'A':
            for i in range(len(strtok)):
                if i == 0:  # trying to get just the number from 1), 13), etc.
                    sub_section = [strtok[i].split(')')[0]]
                elif i not in [1, 5, 9]:
                    sub_section.append(strtok[i])
            self.section.append(sub_section)
        elif letter == 'B':
            self.section.append(strtok[:])
        elif letter in ['C', 'D']:
            try:
                tokspl = strtok[2].split('-')
            except:
                return
            # print 'section ', letter, ' ',
            # print tokspl
            self.section.append(strtok[0:2] + tokspl[:] + strtok[3:])
        elif letter in ['E', 'F', 'G', 'H', 'I']:
            try:
                tokspl = strtok[2].split('/')
            except:
                return
            self.section.append(strtok[0:2] + tokspl[:] + strtok[3:])
        # special case I
        elif letter == 'Is':
            try:
                self.section.append(strtok[:])
            except:
                return
        elif letter in ['J', 'Jt']:
            try:
                tokspl = strtok[0].split(')')
            except:
                return
            self.section.append(tokspl + strtok[1:])
        elif letter == 'K':
            # don't know what to do with this info
            pass

    def write2file(self):
        # for each letter category (e.g. A, B, C, D, etc.)
        A_count = 0
        borrar = 0
        for nf in self.new_files:
            # for each subcategory (Xdisp, Ydisp, etc.)
            if nf[0] == 'A':
                # self.write_section_A(nf)
                sub_cat = nf[3][A_count] - 1
            else:
                sub_cat = nf[3] - 1
            # print self.new_files
            for fn in nf[2]:
                sub_cat = sub_cat + 1
                # Create a new file
                sn = fn.split()[0]
                filename = self.init_params['-file_name'] + '_' + nf[0] + '_' + sn + '.txt'
                try:
                    input = open(filename, 'w')
                except:
                    self.error('could not create %s' % filename, 1)
                # file opened. Then, write the headers
                # First, generate first .lis file name
                if self.init_params['-range'][2]:  # if same size digits
                    key_name = self.build_name(self.init_params['-range'][3], 1)
                else:  # number is consecutive with no zeros
                    key_name = '%s%s%d.lis' % (self.init_params['-path'], self.init_params['-file_name'], 1)
                # second, write some basic info
                input.write('# Automatically generedated file from extract_lis.py\n')
                input.write('# These particular molecules contain:\n')
                for ei in self.extra_info:
                    input.write('# %s\n' % ei)
                input.write('# Notice that some numbering will be by strand instead of sorted by number\n')
                input.write('# (i.e. instead of 1-32 they would be 1-16, 32-17)\n')
                input.write('#\n# This file contains .lis info from a file series %s:\n' % \
                            self.init_params['-file_name'])
                input.write('# within the range %d - %d\n' % \
                            (self.init_params['-range'][0], self.init_params['-range'][1]))
                input.write('# from .lis parameters "|%s| %s" - "%s"\n#\n' \
                            % (nf[0], nf[1], fn))
                if nf[0] == 'A':
                    if A_count < 2:
                        input.write('#\tFile name\ttime\tbasepair\tx\ty\tz\n')
                    else:
                        input.write('#\tFile name\tparameter\t\n')
                    A_count = A_count + 1
                else:
                    # then, get info using that name (key_name) as a key; write headers into file
                    input.write('#\tFile name\ttime\t')
                    # print key_name
                    # print nf[0]
                    if self.list_by_file.has_key(key_name):
                        if self.list_by_file[key_name].has_key(nf[0]):
                            for col in self.list_by_file[key_name][nf[0]]:
                                try:
                                    col[2]
                                except:
                                    input.write('--\t')
                                    continue
                                if nf[3] > 3:
                                    input.write('%s%s-%s%s\t' % (col[1], col[2], col[3], col[4]))
                                else:
                                    input.write('%s%s\t' % (col[1], col[2]))
                    input.write('\n')
                # Last -but not least- write the info from each column
                key_name_list = []
                for i in range(self.init_params['-range'][0], self.init_params['-range'][1] + 1):
                    if self.init_params['-range'][2]:  # if same size digits
                        key_name_list.append(self.build_name(self.init_params['-range'][3], i))
                    else:  # number is consecutive with no zeros
                        key_name_list.append('%s%s%d.lis' % (self.init_params['-path'], self.init_params['-file_name'], i))
                # got the keys
                time_t = string.atof(self.init_params['-startsim'])
                interval = string.atof(self.init_params['-interval'])
                for knl in key_name_list:
                    short_file_name = knl[knl.rfind('/')+1:knl.find('.lis')]
                    if nf[0] == 'A':
                        input.write('#\t%s\t%12.3f\n' % (short_file_name, time_t))
                    else:
                        input.write('\t%s\t%12.3f\t' % (short_file_name, time_t))
                    if nf[0] == 'A':
                        for col in self.list_by_file[knl]['A']:
                            if A_count in [1, 2]:
                                ni = A_count - 1
                                input.write('\t\t\t%s\t%s\t%s\t%s' % \
                                            (col[0], col[3*ni+1], col[3*ni+2], col[3*ni+3]))
                            else:
                                try:
                                    col[7]
                                except:
                                    input.write('\t')
                                    continue
                                input.write('\t\t\t%s\t%s' % (col[0], col[7]))
                            input.write('\n')
                    else:
                        if self.list_by_file.has_key(knl):
                            if self.list_by_file[knl].has_key(nf[0]):
                                for lbf in self.list_by_file[knl][nf[0]]:
                                    if nf[0] == 'I':
                                        try:
                                            lbf[sub_cat]
                                        except:
                                            sub_cat = 3
                                    try:
                                        input.write('%s\t' % lbf[sub_cat])
                                    except:  # just write a blank
                                        input.write('\t')
                        input.write('\n')
                    time_t = time_t + interval
                input.close()

    def write_section_A(self, nf):  # A needed special attention
        i = 0
        for fn in nf[2]:
            sub_cat = nf[3][i]
            # Create a new file
            sn = fn.split()[0]
            filename = self.init_params['-file_name'] + '_' + nf[0] + '_' + sn + '.txt'
            try:
                input = open(filename, 'w')
            except:
                self.error('could not create %s' % filename, 1)
            # file opened. Then, write the headers
            # second, write some basic info
            input.write('# Automatically generedated file from extract_lis.py\n')
            input.write('# These particular molecules contain:\n')
            for ei in self.extra_info:
                input.write('# %s\n' % ei)
            input.write('# Notice that some numbering will be by strand instead of sorted by number\n')
            input.write('# (i.e. instead of 1-32 they would be 1-16, 32-17)\n')
            input.write('#\n# This file contains .lis info from a file series %s:\n' % \
                        self.init_params['-file_name'])
            input.write('# within the range %d - %d\n' % \
                        (self.init_params['-range'][0], self.init_params['-range'][1]))
            input.write('# from .lis parameters "|%s| %s" - "%s"\n#\n' \
                        % (nf[0], nf[1], fn))
            # then, get info using that name (key_name) as a key; write headers into file
            if i < 2:
                input.write('#\tFile name\ttime\tbasepair\tx\ty\tz\n')
            else:
                input.write('#\tFile name\tparameter\t\n')
            time_t = string.atof(self.init_params['-startsim'])
            interval = string.atof(self.init_params['-interval'])
            for file_name in self.list_by_file.keys():
                short_file_name = file_name[file_name.rfind('/')+1:file_name.find('.lis')]
                input.write('#\t%s\t%12.3f\n' % (short_file_name, time_t))
                time_t = time_t + interval
                for col in self.list_by_file[file_name]['A']:
                    if i in [0, 1]:
                        input.write('\t\t\t%s\t%s\t%s\t%s' % (col[0], col[3*i+1], col[3*i+2], col[3*i+3]))
                    else:
                        try:
                            col[7]
                        except:
                            input.write('\t')
                            continue
                        input.write('\t\t\t%s\t%s' % (col[0], col[7]))
                    input.write('\n')
            i = i + 1

    def calculate_extras(self):
        # for each letter category (e.g. B, C, D, etc.)
        outfilename = self.init_params['-file_name'] + '_extras.txt'
        try:
            output = open(outfilename, 'w')
        except:
            self.error('could not open extras file', 1)
        # print headers
        output.write('# Automatically generedated file from extract_lis.py\n')
        output.write('# These particular molecules contain:\n')
        for ei in self.extra_info:
            output.write('# %s\n' % ei)
        output.write('# This files contains minimum, maximum, (locatation of those),\n')
        output.write('# average, and standard deviation for each:\n')
        output.write('# File name\tSection name\tParameter name\tBase(s) or Base pair(s)\n')
        # for each file (i.e. noediav_B_Bc.txt, noediav_B_Tc.txt, etc)
        for nf in self.new_files:
            if nf[0] == 'A':  # A does not belong here (like "Sesame Street")
                continue
            # for each subcategory (Xdisp, Ydisp, etc.)
            sub_cat = nf[3] - 1
            counter = -1
            # create the name
            for fn in nf[2]:
                counter = counter + 1
                sub_cat = sub_cat + 1
                # Create a new file
                sn = fn.split()[0]
                filename = self.init_params['-file_name'] + '_' + nf[0] + '_' + sn + '.txt'
                # I got the name, then open it (or try to)
                try:
                    input = open(filename, 'r')
                except:
                    self.error('could not open %s for reading' % filename, 1)
                values = {}
                # now, read each line
                for line in input.readlines():
                    strtok = line.split()
                    try:  # skip blank line (if any)
                        strtok[0][0]
                    except:
                        continue
                    if strtok[0][0] == '#':  # skip comments
                        if string.find(line, 'File name\ttime') > -1:  # the exception is the category's title
                            # get the column's names
                            columns = strtok[3:]
                        else:
                            continue  # not title, then go to next line
                        output.write('\n')
                        continue
                    # convert line into tokens
                    strtok = line.split('\t')
                    # init dictionary
                    values[strtok[1]] = strtok[2:-1]
                input.close()
                # got all the info. Now, reconfigure it and calculate average, min, max, etc.
                extras_list = {}
                for row in range(len(columns)):
                    total = 0.0
                    items = 0
                    for va in values.keys():
                        try:
                            num = string.atof(values[va][row])
                        except:
                            continue
                        total = total + num
                        if items == 0:
                            mini = maxi = num
                            mini_file = maxi_file = va
                            mini_time = maxi_time = string.atof(values[va][0])
                        if mini > num:
                            mini = num
                            mini_file = va
                            mini_time = string.atof(values[va][0])
                        if maxi < num:
                            maxi = num
                            maxi_file = va
                            maxi_time = string.atof(values[va][0])
                        items = items + 1
                    if items > 0:
                        items_float = string.atof('%d' % items)
                        average = total / items
                        # now calculate standard deviation
                        total = 0.0
                        for va in values.keys():
                            try:
                                num = string.atof(values[va][row])
                            except:
                                continue
                            total = total + (average - num) ** 2
                        sd = sqrt(float(total / items))
                        extras_list[columns[row]] = \
                            (mini, mini_file, mini_time, maxi, maxi_file, maxi_time, average, sd, items)
                    else:
                        extras_list[columns[row]] = ('NA')
                # now, write to extras file
                output.write('#%s\t%s\t%s\t%s\t%s' % (filename, nf[1], nf[0], fn, 'parameter'))
                for c in columns:
                    output.write('\t%s' % c)
                output.write('\n\t\t\t\tminimum')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%.4f' % extras_list[c][0])
                    else:
                        output.write('\tNA')
                output.write('\n\t\t\t\tmininum file')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%s' % extras_list[c][1])
                    else:
                        output.write('\tNA')
                output.write('\n\t\t\t\tminimum time')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%.3f' % extras_list[c][2])
                    else:
                        output.write('\tNA')
                output.write('\n\t\t\t\tmaximum')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%.4f' % extras_list[c][3])
                    else:
                        output.write('\tNA')
                output.write('\n\t\t\t\tmaximum file')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%s' % extras_list[c][4])
                    else:
                        output.write('\tNA')
                output.write('\n\t\t\t\tmaximum time')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%.3f' % extras_list[c][5])
                    else:
                        output.write('\tNA')
                output.write('\n\t\t\t\taverage')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%.5f' % extras_list[c][6])
                    else:
                        output.write('\tNA')
                output.write('\n\t\t\t\tstandard deviation')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%.5f' % extras_list[c][7])
                    else:
                        output.write('\tNA')
                output.write('\n\t\t\t\ttotal # samples')
                for c in columns:
                    if extras_list[c] <> 'NA':
                        output.write('\t%d' % extras_list[c][8])
                    else:
                        output.write('\tNA')

    def error(self, text, type):
        print text
        sys.exit(type)


if __name__ == '__main__':
    extract_params()
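
The per-column summary computed at the end of the listing above (minimum and maximum with their source file and time, average, and standard deviation) can be sketched compactly as follows. This is only an illustrative sketch in modern Python 3, not the thesis code: the function name `column_stats` and the `(label, time, value)` tuple layout are assumptions made for the example. As in the listing, the standard deviation divides by the number of samples (the population form), and unparseable values are skipped.

```python
import math


def column_stats(samples):
    """Summarise hypothetical (label, time, value) samples.

    Samples whose time or value cannot be converted to float are skipped
    (the listing skips only on the value). Returns None when nothing
    usable remains, where the listing stores 'NA'.
    """
    usable = []
    for label, time, value in samples:
        try:
            usable.append((label, float(time), float(value)))
        except (TypeError, ValueError):
            continue
    if not usable:
        return None
    # min()/max() return the first extreme sample, matching the strict
    # comparisons (mini > num, maxi < num) used in the listing.
    lowest = min(usable, key=lambda s: s[2])
    highest = max(usable, key=lambda s: s[2])
    n = len(usable)
    mean = sum(v for _, _, v in usable) / n
    # Population standard deviation: divide by n, not n - 1.
    sd = math.sqrt(sum((mean - v) ** 2 for _, _, v in usable) / n)
    return {"min": lowest, "max": highest, "mean": mean, "sd": sd, "n": n}
```

Calling `column_stats` on a list of samples returns a dictionary whose `"min"` and `"max"` entries keep the label and time of the extreme value, mirroring the "minimum file" and "minimum time" rows written to the extras file.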

A.3.3 iplot.py

# iplot2.py Program that plots analysis values interactively

import sys, commands, string, glob, math


# some constants
PADDING = 0.025  # 2.5% padding
FILE_TYPE_RMSD = '*.rms'
FILE_TYPE_POLAR = '*bending.txt'
TERMINAL_DEFAULT = 'postscript eps color'
OUTPUT_DEFAULT = '.eps'

class iplot:
    def __init__(self):
        self.valid_tokens = ['set', 'get', 'quit', 'exit', 'include', 'plot',
                             'help']
        self.get_set_tokens_valid = ['plot_name',
                                     'xlabel', 'ylabel', 'title',
                                     'xrange', 'yrange', 'prange', 'trange',
                                     'key', 'grid', 'f(x)',
                                     'subtitle', 'fx_title',
                                     'padding', 'ticks',
                                     'rmsd', 'polar', 'binds', 'lis',
                                     'starwars', 'means',
                                     'font', 'latex',
                                     'terminal', 'type', 'output']
        self.plot_params = {
            'terminal': 'screen',
            'font': 'ArialMT',
            'padding': 'on'
        }
        self.types = ['rmsd', 'polar', 'binds', 'starwars', 'means', 'lis']
        self.terminal_valid = ['screen', 'postscript', 'jpg', 'png',
                               'windows', 'X11']
        self.needed_params2plot = ['plot_name', 'type', 'terminal']
        self.all_tokens_valid = self.valid_tokens + \
            self.get_set_tokens_valid
        self.interactive = 0
        self.temp_file_name = ''
        self.columns = []
        self.valid_command_params()
        self.check_running_platform()
        self.def_terminal()
        if self.interactive:
            self.interactive_shell()  # read from interactive shell
        else:
            self.read_from_file()  # read from file


    def check_running_platform(self):
        self.inwindows = sys.platform.find('win') > -1

    def interactive_shell(self):
        icounter = 1  # interactive line counter
        while True:  # cycle until found quit
            try:  # in exception for ^C, ^D, etc.
                iline = raw_input('iplot (%s) >> ' % icounter)
            except:  # ^C, ^D, etc.? then quit
                print '\nthanks for using iplot'
                break
            if not useful_line(iline):  # garbage?
                continue  # then go back to the loop
            # non-blank line, then validate it (minus CR character)
            if self.validate_line(iline):
                self.apply_command(iline)  # good parameter, then run command
                icounter = icounter + 1

    def read_from_file(self):
        try:
            input_shell = open(self.input_fn, 'r')
        except:
            self.error(101, self.input_fn)
        for line in input_shell.readlines():  # for each line in script file
            if not useful_line(line):  # if line is a comment or blank
                continue  # skip it
            # non-blank line, then validate it (minus CR character)
            if self.validate_line(line[:-1]):
                self.apply_command(line[:-1])  # valid command, run command

    def validate_line(self, line):
        strtok = line.split()
        if strtok[0] not in self.valid_tokens:
            self.error(601, strtok[0])
            return 0
        if strtok[0].find('set') > -1:
            # check next token(s) is/are valid
            if len(strtok) < 2:
                self.error(504, 'set')
                return 0  # needs more parameters
            # get rid of "set" word
            if strtok[1] not in self.get_set_tokens_valid:
                self.error(691, strtok[1])
                return 0
            self.command2apply = 10
        elif strtok[0].find('plot') > -1:
            # check all parameters needed are in already
            self.command2apply = 20
        elif strtok[0].find('get') > -1:
            # check general or specific get required
            if strtok[1].find('all') > -1:
                self.command2apply = 31
                return 1  # everything is dandy
            if len(strtok) > 1:  # get value from a specific command?
                for st in strtok[1:]:  # check all the commands are valid
                    if st not in self.get_set_tokens_valid:
                        self.error(603, st)  # not valid, write message
                        return 0  # send error back
                # specific get has been requested
                self.command2apply = 30
            else:
                self.error(504, 'get')
                return 0  # error
        elif strtok[0].find('quit') > -1:
            # just quit
            self.command2apply = 40
        elif strtok[0].find('exit') > -1:
            # just quit
            self.command2apply = 40
        elif strtok[0].find('include') > -1:
            # check parameter is coming
            self.command2apply = 50
        elif strtok[0].find('help') > -1:
            # check general or specific help required
            if len(strtok) > 1:  # help to a specific command?
                for st in strtok[1:]:  # check all the commands are valid
                    if st not in self.all_tokens_valid:
                        self.error(603, st)  # not valid, write message
                        return 0  # send error back
                self.command2apply = 61
            else:  # just general help
                self.command2apply = 60
        else:
            self.error(21)  # Unknown error. This should not be happening
        return 1  # no error: command is valid

    def apply_command(self, line):
        strtok = line.split()
        if 10 <= self.command2apply < 20:  # set
            # if the command is just to change a parameter, then change it
            # right away
            if not self.com2apply(line):
                self.apply_type(line)  # else do a lot of things
        elif 20 <= self.command2apply < 30:  # plot
            if self.all_needed_params2plot():  # check if the basic things are in
                self.create_plot_file()
                self.create_plot()  # then just run gnuplot (or any other plotter)
            else:
                self.error(24)
        elif 30 <= self.command2apply < 40:  # get
            if self.command2apply == 31:  # get all
                self.print_all_params()
                return
            self.print_get(line)  # print all the parameters given by user
        elif 40 <= self.command2apply < 50:  # exit
            if self.command2apply == 40:
                self.message(1)  # give greeting message
                sys.exit(0)  # exit with status 0
        elif 60 <= self.command2apply < 70:
            self.print_help(line)  # print help
        else:
            self.error(41)
        return

    def com2apply(self, line):
        strtok = line.split()
        if strtok[1] in self.types:
            return 0  # it is a type definition; then return
        if strtok[1].find('label') > -1 or \
           strtok[1].find('title') > -1 or \
           strtok[1].find('grid') > -1 or \
           strtok[1].find('key') > -1 or \
           strtok[1].find('ticks') > -1 or \
           strtok[1].find('padding') > -1 or \
           strtok[1].find('terminal') > -1 or \
           strtok[1].find('font') > -1:  # just pass the value along
            ## print line[line.find(strtok[1]):]
            self.plot_params[strtok[1]] = line[line.find(strtok[1]) + len(
                strtok[1]) + 1:]
            return 1
        elif strtok[1].find('range') > -1:
            return self.check_range(strtok[1:])
        elif strtok[1].find('plot_name') > -1:
            return self.check_plot_name(strtok[1:])
        elif strtok[1].find('f(x)') > -1:
            return self.check_fx(strtok[1:])
        elif strtok[1].find('padding') > -1:
            return self.check_padding(strtok[1:])
        elif strtok[1].find('latex') > -1:
            return self.check_latex(strtok[1:])
        elif strtok[1].find('terminal') > -1:
            return self.check_terminal(strtok[1:])
        elif strtok[1].find('output') > -1:
            return self.check_output(strtok[1:])
        else:
            return 0  # Don't know what it is

    def check_range(self, tokens):
        if len(tokens) != 3:  # any range needs two parameters at least
            self.error(505, tokens[0])
            return 0
        try:  # range must be a pair of numbers
            num1 = string.atof(tokens[1])
        except:
            self.error(505, tokens[0])
            return 0
        try:
            num2 = string.atof(tokens[2])
        except:
            self.error(505, tokens[0])
            return 0
        # everything fine, write it to variable
        self.plot_params[tokens[0]] = (num1, num2)
        # if changing trange, then change temporary file
        if tokens[0].find('trange') > -1:
            if self.temp_file_name != '':  # no temp file name, do nothing
                # return error if could not create temp file
                if not self.create_temp_file():
                    self.error(902, self.temp_file_name)
                    return 0
        return 1

    def check_plot_name(self, tokens):
        if len(tokens) != 2:
            self.error(25)
            return 0
        try:
            test_inp = open(tokens[1])
        except:
            self.error(105, tokens[1])
            return 0
        test_inp.close()
        # send a warning that changing plot_name is dangerous
        self.error(301)
        return 1

    def check_fx(self, tokens):
        if len(tokens) != 2:
            self.error(651)
            return 0
        try:
            num = string.atof(tokens[1])
        except:
            self.error(661)
            return 0
        self.plot_params['f(x)'] = num
        return 1

    def check_padding(self, tokens):
        if len(tokens) == 1:  # no parameter
            self.calc_padding(1)
        elif len(tokens) == 2:  # there is a parameter
            try:  # then needs to be a number
                pad = string.atof(tokens[1])
            except:
                self.error(676)  # parameter is not a number
                return 0
            self.plot_params['padding'] = pad  # everything is fine
        else:
            # more than one or two parameters (don't know what to do).
            self.error(662, 1)
            return 0
        return 1  # everything is cool

    def calc_padding(self, pad_flag):
        if self.plot_params.has_key('xrange'):  # if xrange exist
            # calculate the padding percentage (given in PADDING)
            pad = (self.plot_params['xrange'][1] -
                   self.plot_params['xrange'][0]) * PADDING
            self.plot_params['padding'] = pad
        else:  # no parameter and no range, error
            if pad_flag:
                self.error(26)
                return 26  # error

    def check_latex(self, tokens):
        # create latex input file
        self.latex_file = 'results.tex'
        try:
            latex_output = open(self.latex_file, 'w')
        except:
            print('could not open latex file "%s". Check permissions' %
                  self.latex_file)
            return
        ps_list = glob.glob('*.eps')
        ps_list = ps_list + glob.glob('*.ps')
        # Alex: here I need to add a list for all the ps files in the
        # "include" directories
        if not len(ps_list):
            print('not .ps nor .eps files in this directory')
            return
        latex_output.write('% Automatically generated latex file by iplot.py\n\n')
        latex_output.write('\\documentclass{article}\n')
        latex_output.write('\\usepackage{graphicx}\n')
        latex_output.write('\\usepackage{nopageno}\n')
        latex_output.write('\\setlength{\\textheight}{9.250in}\n')
        latex_output.write('\\setlength{\\topmargin}{-0.250in}\n')
        latex_output.write('\\begin{document}\n\n')
        local_pss = {}
        list_params = ['rms', 'roc', 'means']
        for lp in list_params:
            # iterate over a copy: removing from the list being iterated
            # would silently skip the entry after each removed one
            for pl in ps_list[:]:
                if pl.find(lp) > -1:  # found a *rms*.eps or *roc*.ps, etc.
                    if local_pss.has_key(lp):
                        local_pss[lp].append(pl)
                    else:
                        local_pss[lp] = [pl]
                    ps_list.remove(pl)
        i = 0
        for lpss in local_pss.keys():
            for l in local_pss[lpss]:
                latex_output.write('\\begin{figure*}[t]\n')
                latex_output.write('\\centering\n')
                latex_output.write('\\includegraphics[height=3.0in]{%s}\n'
                                   % l)
                latex_output.write('\\end{figure*}\n\n')
                i = i + 1
                if i % 3 == 0:
                    latex_output.write('\\clearpage\n\n')
        latex_output.write('\n')
        if len(ps_list) > 0:
            latex_output.write('% .lis figures\n')
            for pl in ps_list:
                latex_output.write('\\begin{figure*}[t]\n')
                latex_output.write('\\centering\n')
                latex_output.write('\\includegraphics[height=3.0in]{%s}\n'
                                   % pl)
                latex_output.write('\\end{figure*}\n')
                i = i + 1
                if i % 3 == 0:
                    latex_output.write('\\clearpage\n\n')
        latex_output.write('\n')
        latex_output.write('\\end{document}\n\n')
        latex_output.close()
        print 'ready to run latex on "results.tex"'
        return 1

    def check_terminal(self, tokens):
        if tokens[1] not in self.terminal_valid:
            self.error(707, tokens[1])
            return 0
        term = ''
        for to in tokens[1:]:
            term = term + '%s ' % to
        self.plot_params['terminal'] = term
        return 1

    def check_output(self, tokens):
        if len(tokens) != 2:  # just check right amount of parameters
            self.error(653, tokens[0], 1)
        self.plot_params['output'] = tokens[1].replace('"', '')
        return 1

    def apply_type(self, line):
        # types: 'rmsd', 'polar', 'binds', 'starwars', 'means', 'lis'
        strtok = line.split()
        base_pair = '0'
        type_dig = 0
        if strtok[1].find('rmsd') > -1:
            if len(strtok[1]) != 4:  # check is only rmsd
                self.error(685, strtok[1])
                return 0
            graph_type = 'rmsd'
            file_look_up = FILE_TYPE_RMSD
            type_dig = 1
        elif strtok[1].find('polar') > -1:
            # check is only polar (not polara, polarar, etc.)
            if len(strtok[1]) != 5:
                self.error(685, strtok[1])
                return 0
            graph_type = 'polar'
            file_look_up = FILE_TYPE_POLAR
            type_dig = 2
        elif strtok[1].find('lis') > -1:
            # check is only lis (not lisa, lisas, etc.)
            if len(strtok[1]) != 3:
                self.error(685, strtok[1])
                return 0
            graph_type = 'lis'
            type_dig = 3
        elif strtok[1].find('binds') > -1:
            if len(strtok[1]) != 5:
                self.error(685, strtok[1])
                return 0
            graph_type = 'binds'
            type_dig = 4
        elif strtok[1].find('means') > -1 or strtok[1].find('starwars') > -1:
            if len(strtok[1]) not in [5, 8]:
                self.error(685, strtok[1])
                # possibility of a bug: meansare would passed as means
                return 0
            graph_type = 'means'
            type_dig = 5
        elif strtok[1].find('latex') > -1:
            if len(strtok[1]) != 5:
                self.error(685, strtok[1])
                return 0
            graph_type = 'latex'
            type_dig = 6
            return 0
        else:
            self.error(42)
        if graph_type in ['binds', 'means', 'lis']:
            rest_params = self.get_rest_params(graph_type, strtok[1:])
            if not rest_params[0]:  # if parameters are not good
                return 0  # return, error has been printed
            # got part of file name
            file_look_up = '*%s.txt' % rest_params[1]
            base_pair = rest_params[2]
            self.plot_params['lis_param'] = rest_params[1]
        file_check = self.check_only_one_file(graph_type, file_look_up,
                                              base_pair)
        if not file_check[0]:  # no files with that description
            self.error(210)
            return 0
        if file_check[0] > 1:
            self.error(331)  # warning: more than one file
        if len(file_check[1]) == 1:
            self.message(11, file_check[1][0])
        self.create_temp_file_name(file_check[1][0], graph_type)
        # get extra parameters
        error = self.get_extra_params(type_dig, base_pair)
        if error:
            print error
            return 0
        # create temporary file
        if not self.create_temp_file():  # returns 0 if function fails
            self.error(901)  # could not create temporary file
        # check yrange
        self.init_rest_vars(graph_type, type_dig)

    def init_rest_vars(self, graph_type, dtype):
        if dtype in [1, 3, 4]:  # rmsd, lis, binds, means
            self.plot_params['xlabel'] = 'time (ps)'
            self.plot_params['key'] = 'key on box'
            self.plot_params['grid'] = 'grid'
        if dtype == 1:  # init variables for rmsd
            self.plot_params['ylabel'] = 'distance (nm)'
            self.plot_params['key'] = 'off'
        elif dtype == 2:  # polar
            # init variables for polar plot
            self.plot_params['grid'] = 'grid polar'
            self.plot_params['key'] = 'off'
        elif dtype == 3:  # lis
            pass  # init variables for curves plot
        elif dtype == 4:  # binds
            pass  # init variables for binds plot
        elif dtype == 5:  # means
            # self.plot_params['xlabel'] = 'base pair'
            self.plot_params['key'] = 'off'
            self.plot_params['grid'] = 'grid'
        else:
            self.error(501)
        if dtype in [3, 4, 5]:  # lis and curves
            if self.plot_params['lis_param'].find('_') > -1:
                strtok = self.plot_params['lis_param'].split('_')
                cat = strtok[0]
                param = strtok[1]
            else:
                cat = ''
                param = self.plot_params['lis_param']
            keys = get_keys(param, cat)
            if not keys or not len(keys):
                self.error(332)
                self.plot_params['keys'] = []
            else:
                self.plot_params['keys'] = keys

    def def_terminal(self):
        if self.interactive:
            if self.inwindows:
                self.plot_params['terminal'] = 'windows'
            else:
                self.plot_params['terminal'] = TERMINAL_DEFAULT
        else:
            self.plot_params['terminal'] = TERMINAL_DEFAULT

    def get_extra_params(self, dtype, bp):
        if not self.plot_params.has_key('plot_name'):
            self.error(28)
            return 28
        try:
            input = open(self.plot_params['plot_name'], 'r')
        except:
            self.error(922, self.plot_params['plot_name'])
            return 922  # could not open file
        if dtype == 5:
            return 0  # does not need columns for this plot
        if dtype == 1:
            self.columns = [0, 1]  # first two columns from .rms file
            return 0  # know which columns to extract
        elif dtype == 2:
            # all columns for polar ('*bending.txt') file
            self.columns = [0, 1, 2]
            return 0  # know which columns to extract
        gcol = []
        for line in input.readlines():
            strtok = line.split('\t')
            if not len(strtok):  # no length, then is empty
                continue  # not useful
            try:  # nothing in the first token?
                strtok[0][0]
            except:
                continue  # then useless
            if strtok[0][0] == '#':  # comment?
                if len(strtok) > 1:
                    # look for "File name"
                    if strtok[1].find('File name') > -1:
                        gcol = strtok[3:-1]  # copy all the row
                        break  # get out of the cycle
        i = 3  # base pairs start in the third column
        bpn = string.atoi(bp)  # convert base pair number
        for gc in gcol:  # for all base pair names
            tok = gc.split('-')  # separate them
            # convert the first part to number (take residue name out)
            num1 = string.atoi(tok[0][1:])
            # convert the second part to number (take residue name out)
            num2 = string.atoi(tok[1][1:])
            # if looked base-pair in file's base pair
            if bpn == num1 or bpn == num2:
                break  # get out of loop, it would contain info
            i = i + 1
        else:  # finished for without success (no base pair found)
            self.error(29)  # could not find columns
            return 29  # send error back
        self.columns = [1, i]
        return 0  # no error

    def create_temp_file_name(self, ori_name, type):
        if type.find('means') > -1:
            self.temp_file_name = ori_name.replace('.', '_') + '_means.tmp'
        else:
            self.temp_file_name = ori_name.replace('.', '_') + '.tmp'
        # print ori_name
        self.plot_params['plot_name'] = ori_name
        self.plot_params['type'] = type

    def create_temp_file(self):
        if self.temp_file_name == '':
            self.error(903, '')
            return 0  # error
        if not self.plot_params.has_key('plot_name'):
            self.error(27)
            return 0  # error
        try:
            input = open(self.plot_params['plot_name'], 'r')
        except:
            self.error(921, self.plot_params['plot_name'])
            return 0  # could not open file
        if not self.plot_params.has_key('type'):
            self.error(28)
            input.close()
            return 0
        if not self.plot_params['type'].find('means') > -1:
            if not len(self.columns):
                self.error(29)  # no column information: get out of here
                input.close()  # but close opened file first
                return 0
            nomeans = 1
        else:
            ##nomeans = 0
            # means file requires special care
            return self.create_means_temp_file()
        try:
            output_temp = open(self.temp_file_name, 'w')
        except:
            self.error(910, self.temp_file_name)
            return 0  # could not open file
        self.write_header_tmp(output_temp)
        partial = self.plot_params.has_key('trange')  # partial range
        linen = 0
        count = 0
        mini = maxi = 0.0
        param2 = 0.0
        for line in input.readlines():
            linen = linen + 1
            if not useful_line(line):
                continue
            strtok = line.split()
            if strtok[0].find('analysis') > -1:
                break
            if nomeans:  # regular file
                if partial:  # there is a trange
                    try:
                        time = string.atof(strtok[self.columns[0]])
                    except:
                        # error in format
                        self.error(935, self.plot_params['plot_name'], linen)
                    if not (self.plot_params['trange'][0] <= time <=
                            self.plot_params['trange'][1]):
                        continue
                try:
                    param1 = string.atof(strtok[self.columns[0]])
                except:
                    # error in format
                    self.error(935, self.plot_params['plot_name'], linen)
                    break
                if len(self.columns) > 2:
                    try:
                        param2 = string.atof(strtok[self.columns[2]])
                    except:
                        # error in format
                        self.error(935, self.plot_params['plot_name'], linen)
                        break
                else:
                    param2 = 0.0
                if not count:
                    mini = maxi = param1
                    mini2 = maxi2 = param2
                mini = min(mini, param1)
                maxi = max(maxi, param1)
                mini2 = min(mini2, param2)
                maxi2 = max(maxi2, param2)
                count = count + 1
                output_temp.write('%10s' % strtok[self.columns[0]])
                for c in self.columns[1:]:
                    output_temp.write('\t%10s' % strtok[c])
                output_temp.write('\n')
            else:
                # print line[:-1]
                # print 'here'
                pass  # calculate total
        input.close()
        output_temp.close()
        if self.plot_params['type'].find('polar') > -1:
            (region, tic) = self.find_tics(maxi2)
            print tic
            self.plot_params['tics'] = tic
            self.plot_params['xrange'] = (-region, region)
        else:
            padding = 1
            self.plot_params['xrange'] = (mini, maxi)
            self.calc_padding(0)
            try:
                string.atof(self.plot_params['padding'])
            except:
                padding = 0
            if padding:
                self.plot_params['xrange'] = \
                    (mini - string.atof(self.plot_params['padding']),
                     maxi + string.atof(self.plot_params['padding']))
        return 1  # no error

    def valid(self, line):
        if not len(line):
            return 0
        strtok = line.split()
        if not len(strtok):
            return 0
        if strtok[0][0] == '#':
            return 0
        return 1

    def headlines(self, line):
        if line.find('File name') > -1 and \
           line.find('time') > -1:
            return 1
        else:
            return 0

    def get_headlines(self, line):
        strtok = line.split('\t')
        return strtok

    def create_means_temp_file(self):
        fn = self.plot_params['plot_name']
        extras_pattern = '*' + fn[:fn.find('_')] + '*extra*.txt'
        extras_file = glob.glob(extras_pattern)
        if len(extras_file) <> 1:
            recalc = 1
        elif self.plot_params.has_key('trange'):
            recalc = 1
        else:
            recalc = 0
        headers_list = []
        content_list = []
        if not recalc:  # easy way out, just find average and stddev in file
                        # and write it to temp file
            try:
                input_extras = open(extras_file[0], 'r')
            except:
                self.error('could not open extras file "%s". Check permissions'
                           % extras_file[0], 1)
                return
            # extras file open
            # look for file name
            found_it = 0
            for line in input_extras.readlines():
                strtok = line.split('\t')
                if len(strtok) < 1:
                    continue
                if found_it:
                    try:
                        strtok[6]
                    except:
                        continue
                    if not len(strtok[0]):
                        temp_list = strtok[6:-1]
                        temp_list.append(strtok[-1:][0][:-1])
                        content_list.append(temp_list)
                try:
                    strtok[0][0]
                except:
                    continue
                if found_it and strtok[0][0] == '#':
                    found_it = 0
                    break
                if strtok[0][0] <> '#':  # if not a comment, file name is not here
                    continue
                if strtok[0].find(fn) > -1:
                    found_it = 1
                    headers_list = strtok[4:]
                    # print strtok
            input_extras.close()
        else:  # read data from the original file

            try:
                input = open(self.plot_params['plot_name'], 'r')
            except:
                self.error('could not open "%s" file. Check permissions' %
                           self.plot_params['plot_name'], 1)
                return
            i = 0
            in_trange_list = []
            totals = []
            minis = []
            maxis = []
            minis_str = []
            maxis_str = []
            minis_time = []
            maxis_time = []
            l = 0
            rc = 0
            nrc = 0
            for line in input.readlines():
                i = i + 1
                if not self.valid(line):
                    if not self.headlines(line):
                        continue
                    else:
                        headers_list = self.get_headlines(line)
                        # print headers_list
                        continue
                strtok = line.split('\t')
                if len(strtok) < 2:
                    continue
                try:
                    num = float(strtok[2])
                except:
                    self.error('"%s" file is malformed in line %d' %
                               (self.plot_params['plot_name'], i), 1)
                    return
                j = 0
                if num >= self.plot_params['trange'][0] and \
                   num <= self.plot_params['trange'][1]:
                    nums = []
                    j = j + 1
                    # print strtok
                    for st in strtok[2:]:
                        tstr = string.strip(st)
                        if len(tstr) < 1:
                            continue
                        try:
                            num = float(st)
                        except:
                            print '"%s" file is malformed in line %d in %s param %d' \
                                  % (self.plot_params['plot_name'], i, st, j)
                            return
                        nums.append(num)
                    k = 0
                    for nu in nums:
                        if l == 0:
                            totals.append(nu)
                            maxis.append(nu)
                            minis.append(nu)
                            minis_str.append(strtok[1])
                            maxis_str.append(strtok[1])
                            minis_time.append(strtok[2])
                            maxis_time.append(strtok[2])
                        else:
                            totals[k] = totals[k] + nu
                            if maxis[k] < nu:
                                maxis[k] = nu
                                maxis_str[k] = strtok[1]
                                maxis_time[k] = strtok[2]
                            if minis[k] > nu:
                                minis[k] = nu
                                minis_str[k] = strtok[1]
                                minis_time[k] = strtok[2]
                        k = k + 1
                    in_trange_list.append(nums)
                    rc = rc + 1
                    l = l + 1
                else:
                    nrc = nrc + 1

            averages = []
            n = len(in_trange_list)
            for t in totals:
                averages.append(float(t / n))
            totals = []
            i = 0
            for itl in in_trange_list:
                k = 0
                for it in itl:
                    if i == 0:
                        totals.append((it - averages[k]) * (it - averages[k]))
                    else:
                        totals[k] = totals[k] + ((it - averages[k]) * (it - averages[k]))
                    k = k + 1
                i = i + 1
            if n < 2:
                if n == 1:
                    n = 1
                else:
                    print 'range contains zero samples'
                    return
            else:
                n = len(in_trange_list) - 1  # it's n - 1 because it's a sample
            sds = []
            for t in totals:
                sds.append(math.sqrt(float(t / n)))
            content_list = []
            cl = []
            for m in minis[1:]:
                cl.append('%.4f' % m)
            content_list.append(cl)
            cl = []
            for m in minis_str[1:]:
                cl.append(m)
            content_list.append(cl)
            cl = []
            for m in minis_time[1:]:
                cl.append(string.strip(m))
            content_list.append(cl)
            cl = []
            for m in maxis[1:]:
                cl.append('%.4f' % m)
            content_list.append(cl)
            cl = []
            for m in maxis_str[1:]:
                cl.append(m)
            content_list.append(cl)
            cl = []
            for m in maxis_time[1:]:
                cl.append(string.strip(m))
            content_list.append(cl)
            cl = []
            for m in averages[1:]:
                cl.append('%.4f' % m)
            content_list.append(cl)
            cl = []
            for m in sds[1:]:
                cl.append('%.4f' % m)
            content_list.append(cl)
            cl = []
            for m in sds[1:]:
                cl.append('%d' % rc)
            content_list.append(cl)
            skip_it = 0
            try:
                output = open('extras.txt', 'w')
            except:
                print 'could not open "extras.txt" for writing. Check permission'
                skip_it = 1
            if not skip_it:
                output.write('# Automatically generated file by iplot.py\n')
                i = 0
                for cl in content_list:
                    if i == 0:
                        output.write('\tminimum')
                    if i == 1:
                        output.write('\tminimum file')
                    if i == 2:
                        output.write('\tminimum time')
                    if i == 3:
                        output.write('\tmaximum')
                    if i == 4:
                        output.write('\tmaximum file')
                    if i == 5:
                        output.write('\tmaximum time')
                    if i == 6:
                        output.write('\taverage')
                    if i == 7:
                        output.write('\tstandard deviation')
                    if i == 8:
                        output.write('\ttotal # samples')
                    for c in cl:
                        output.write('\t%s' % c)
                    output.write('\n')
                    i = i + 1

        ticks_list = []
        co = 1
        half = (len(headers_list) - 2) / 2
        last = (len(headers_list) - 3)
        # print headers_list
        if self.plot_params['plot_name'].find('_E_') > -1 or \
           self.plot_params['plot_name'].find('_G_') > -1:
            spec_case = 1
        else:
            spec_case = 0
        for vb in headers_list[2:-1]:
            # if long names (i.e 'A1-T24') then write as it is
            # Alex: need to add variable containing this option
            # if self.plot_params['longn']
            if (not spec_case) and (co == 1 or co == half or co == half + 1):
                tempv = vb
            else:
                tempv = ''
                # print vb
                for v in vb:
                    if v in string.letters or v == '-':
                        if spec_case:
                            if v <> '-':
                                tempv = tempv + v
                        else:
                            tempv = tempv + v
            ticks_list.append([tempv, co])
            co = co + 1
        if not spec_case:
            ticks_list.append([headers_list[-1:][0][:-1], co])
        else:
            tempv = ''
            for v in headers_list[-1:][0][:-1]:
                if v in string.letters:
                    tempv = tempv + v
            ticks_list.append([tempv, co])
        self.plot_params['xtics'] = ticks_list[:]
        # print ticks_list
        if len(headers_list) < 1 or len(content_list) < 1:
            self.error(1, 'could not find data associated with that parameter')
            return (1, 0, 0, '')
        out_fn = fn.replace('.', '_') + '_means.tmp'
        try:
            output = open(out_fn, 'w')
        except:
            self.error('could not open "%s" for writting. Check permissions.' % fn, 1)
        output.write('# Automatically generated file by iplot.py\n')
        output.write('#%8s%15s%15s%15s%15s\n' % ('x', 'y', 'error (stddev)', 'minimum', 'maximum'))
        # print content_list
        for i in range(len(content_list[6])):
            if content_list[7][i] <> '':
                try:
                    error_bar = string.atof(content_list[7][i])
                except:
                    error_bar = 0.0
            else:
                error_bar = 0.0
            output.write('%8d%15s%15.4f%15s%15s\n' % \
                         (i + 1, content_list[6][i], error_bar,
                          content_list[0][i], content_list[3][i]))
        output.close()
        return 1  # no error

    def find_tics(self, num):
        if num > 1.0:
            region = math.floor(num) + 1.0
        else:
            new_num = num
            i = 0
            while new_num < 1.0:
                new_num = new_num * 10
                i = i + 1
            region = math.floor(new_num) + 1.0
            for newi in range(i):
                region = region / 10
        tic = region / 4.0
        return (region, tic)

    def check_only_one_file(self, type, flu, bp):
        file_list = glob.glob(flu)
        return (len(file_list), file_list)

    def get_rest_params(self, type, tokens):
        if len(tokens) < 2:  # check for at least some parameters
            self.error(801, type, 2)  # print error
            return (0, 0, 0)  # return error
        base_pair = 0  # init base pair number
        if type.find('means') > -1:  # means is a special case 'cause can only get 3 params
            if len(tokens) not in [2, 3]:  # more than 3 params, error
                self.error(653, tokens[0], 2)
                return (0, 0, 0)
            if len(tokens) <> 2:  # more than 2 params
                # then concat them for example "e roll" into E_Roll
                partial_name = string.upper(tokens[1]) + '_' + \
                               string.upper(tokens[2][0]) + tokens[2][1:]
            else:  # not? then just the regular parameter name with the first
                   # letter in capital case
                partial_name = string.upper(tokens[1][0]) + tokens[1][1:]
        else:  # binds or lis
            if len(tokens) not in [3, 4]:  # no more than 4 parameters are needed
                self.error(656, tokens[0], 3)  # print error
                return (0, 0, 0)  # return error
            # if len(tokens) == 3; 2 good options:
            # lis buckle 10 or binds buckle 3
            # lis e roll 10 or binds g roll 4
            last = len(tokens) - 1  # index to the last token
            try:
                num = string.atoi(tokens[last])  # try to convert it to integer
            except:
                self.error(667, tokens[last])  # fails, the command is not valid
            if len(tokens) <> 3:  # last token is integer, but does it have a
                                  # two tokens name?
                partial_name = string.upper(tokens[1]) + '_' + \
                               string.upper(tokens[2][0]) + tokens[2][1:]
            else:
                partial_name = string.upper(tokens[1][0]) + tokens[1][1:]
            base_pair = tokens[last]  # get token number
                                      # Alex: it might need to return as integer
        return (1, partial_name, base_pair)  # return no error with partial
                                             # name and base pair number

    def all_needed_params2plot(self):
        # just check if at least the basic parameters to generate
        # a plot are already in
        for np in self.needed_params2plot:
            if not self.plot_params.has_key(np):
                print np
                return 0
        return 1

    def create_plot_file(self):
        # print self.plot_params
        self.dat_file_name = self.temp_file_name.replace('.', '_') + '_gnuplot.dat'
        try:
            dat_file = open(self.dat_file_name, 'w')
        except:
            self.error(915)
        self.write_dat_header(dat_file)
        if self.plot_params['type'].find('polar') > -1:  # define all variables for polar plot
            dat_file.write('set polar\n')
            dat_file.write('unset border\n')
            dat_file.write('set clip\n')
            dat_file.write('set angle degrees\n')
            dat_file.write('set zeroaxis\n')
            dat_file.write('set size square\n')
            if self.plot_params.has_key('tics'):
                for t in ['xtics', 'ytics']:
                    dat_file.write('set %s axis nomirror %.3f, %.3f\n' % \
                                   (t, self.plot_params['tics'], self.plot_params['tics']))
            dat_file.write('set yrange [%.3f:%.3f]\n' % \
                           (self.plot_params['xrange'][0], self.plot_params['xrange'][1]))

        if self.plot_params.has_key('xrange'):  # write xrange to dat file
            dat_file.write('set xrange [%.3f:%.3f]\n' % \
                           (self.plot_params['xrange'][0], self.plot_params['xrange'][1]))
        if self.plot_params.has_key('yrange'):  # write yrange to dat file
            dat_file.write('set yrange [%.3f:%.3f]\n' % \
                           (self.plot_params['yrange'][0], self.plot_params['yrange'][1]))
        for param in ['grid', 'key']:
            if self.plot_params.has_key(param):  # write param to dat file
                if self.plot_params[param].find('off') > -1:
                    dat_file.write('unset %s\n' % param)
                else:
                    dat_file.write('set %s\n' % self.plot_params[param])
        if self.plot_params['type'].find('means') > -1 or \
           self.plot_params['type'].find('lis') > -1:
            if len(self.plot_params['keys']) == 1:
                new_title = self.plot_params['plot_name'][:self.plot_params['plot_name'].find('_')]
                new_title = self.plot_params['keys'][0][3] + ' for ' \
                            + new_title + ' ' \
                            + self.plot_params['keys'][0][0] + ' ' \
                            + self.plot_params['keys'][0][1]
                self.plot_params['title'] = new_title
        if self.plot_params.has_key('xtics'):
            dat_file.write('set xtics (')
            i = 0
            for xt in self.plot_params['xtics']:
                if i > 0:
                    dat_file.write(', ')
                dat_file.write('"%s" %d' % (xt[0], xt[1]))
                i = i + 1
            dat_file.write(')\n')
            if not self.plot_params.has_key('xrange'):
                num = float(len(self.plot_params['xtics']) + 1)
                dat_file.write('set xrange [0.00:%.3f]\n' % num)
        if self.plot_params.has_key('title'):
            dat_file.write('set title "%s"\n' % self.plot_params['title'])
        for label in ['xlabel', 'ylabel']:
            if self.plot_params.has_key(label):  # write label to dat file
                dat_file.write('set %s "%s"' % (label, self.plot_params[label]))
                if self.plot_params.has_key('font'):
                    if self.plot_params['font'] <> '':
                        dat_file.write(' font "%s"' % self.plot_params['font'])
                dat_file.write('\n')
        if self.plot_params.has_key('terminal'):  # write param to dat file
            if not self.plot_params['terminal'].find('screen') > -1:
                dat_file.write('set terminal %s\n' % self.plot_params['terminal'])
                if self.plot_params.has_key('output'):
                    dat_file.write('set output "%s"\n' % self.plot_params['output'])
                elif not self.plot_params['terminal'].find('windows') > -1:
                    out_ter_name = self.temp_file_name.replace('.', '_') + OUTPUT_DEFAULT
                    dat_file.write('set output "%s"\n' % out_ter_name)
            else:
                if self.inwindows:
                    dat_file.write('terminal windows\n')
                else:
                    dat_file.write('terminal X11\n')
        if self.plot_params['type'].find('rmsd') > -1:
            dat_file.write('plot "%s" with lines\n' % self.temp_file_name)
        if self.plot_params['type'].find('polar') > -1:
            dat_file.write('plot "%s" using 2:3 with points pt 5 ps 0.2\n' %
                           self.temp_file_name)
        if self.plot_params['type'].find('means') > -1 or \
           self.plot_params['type'].find('lis') > -1:
            if len(self.plot_params['keys']) == 1 and \
               not self.plot_params.has_key('ylabel'):
                dat_file.write('set ylabel "%s"\n' % self.plot_params['keys'][0][2])
            if len(self.plot_params['keys']) == 1 and \
               not self.plot_params.has_key('xlabel'):
                dat_file.write('set xlabel "%s"\n' % self.plot_params['keys'][0][4])
        if self.plot_params['type'].find('means') > -1:
            dat_file.write('plot "%s" using 1:2:3 with yerrorbars lt 1 lw 6\n'
                           % self.temp_file_name)
        if self.plot_params['type'].find('lis') > -1:
            if self.plot_params.has_key('lis_param'):
                dat_file.write('plot "%s" using 1:2 with lines title "%s"\n' % \
                               (self.temp_file_name, self.plot_params['lis_param']))
            else:
                dat_file.write('plot "%s" using 1:2 with lines\n' %
                               self.temp_file_name)
        if self.plot_params['terminal'].find('windows') > -1 or \
           self.plot_params['terminal'].find('X11') > -1:
            dat_file.write('pause -1\n')
        dat_file.close()

    def create_plot(self):
        if self.inwindows:
            gnuplot_exe = 'wgnuplot'
        else:
            gnuplot_exe = 'gnuplot4'
        try:
            status = commands.getstatusoutput('%s %s' % (gnuplot_exe, self.dat_file_name))
        except:
            self.error(1001, self.dat_file_name)
            return
        if status[0]:
            self.error(1002, self.dat_file_name)
        # Check for windows, ghostview, etc.

    def print_all_params(self):
        for key in self.plot_params.keys():  # for all defined keys
            if key.find('range') > -1:  # check if it's range
                # different format for ranges
                print '%s: [%.3f:%.3f]' % \
                      (key, self.plot_params[key][0], self.plot_params[key][1])
            else:  # all the others (except ranges)
                print '%s: "%s"' % (key, self.plot_params[key])

    def print_get(self, line):
        strtok = line.split()
        for st in strtok[1:]:
            if not self.plot_params.has_key(st):
                print '%s: Not defined' % st
            else:
                if st.find('range') > -1:  # check if it's range
                    # different format for ranges
                    print '%s: [%.3f:%.3f]' % \
                          (st, self.plot_params[st][0], self.plot_params[st][1])
                else:  # all the others (except ranges)
                    print '%s: "%s"' % (st, self.plot_params[st])

    def valid_command_params(self):
        if len(sys.argv) == 1:  # check if there are no parameters
            self.interactive = 1  # no user params; then interactive
            return
        if len(sys.argv) <> 3:
            self.error(10)  # never comes back
        if not sys.argv[1].find('-f') > -1:  # "-f" exist?
            self.error(11)  # never comes back
        if not self.file_exits(sys.argv[2]):  # file exist?
            self.error(101, sys.argv[2])  # never comes back
        self.input_fn = sys.argv[2]

    def file_exits(self, fn):
        try:  # there are many ways to check if file exist
            check = open(fn, 'r')  # but trying to open it it's easiest
        except:
            return 0  # could not open, then no file
        check.close()
        return 1  # could open, then file exist

    def write_dat_header(self, dat_file):
        dat_file.write('# Automatically generated file by iplot.py\n')
        dat_file.write('# gnuplot input file for %s\n' % self.plot_params['type'])

    def error(self, type, extra='', pn=0):
        if type == 1:
            print extra
        elif type < 20:  # These are unrecoverable errors
            print 'Error (%d): %s' % (type, error_list[type])
            sys.exit(1)
        elif 20 < type < 22:  # These are unknown errors
            print 'Error (%d): Caused unknown. Contact programmer' % type
            sys.exit(type)  # quit the program
        elif 22 < type < 40:  # These are recoverable errors
            print 'Error (%d): %s' % (type, error_list[type])
        elif 100 < type < 200:  # These are unrecoverable errors with no filenames
            print 'Error (%d): file "%s" does not exit' % (type, extra)
            sys.exit(type)  # quit the program
        elif 200 < type < 300:  # These are recoverable errors
            print 'Error (%d): %s' % (type, error_list[type])
        elif 300 < type < 400:
            print 'Warning (%d): %s' % (type, warning_list[type])
        elif 500 < type < 650:
            print 'Error (%d): command "%s" needs more parameters, check help' \
                  % (type, extra)
        elif 650 < type < 655:
            print 'Error (%d): command "%s" needs exactly %d parameters. Check help.' \
                  % (type, extra, pn)
        elif 655 < type < 660:
            print 'Error (%d): command "%s" needs at least %d parameters. Check help.' \
                  % (type, extra, pn)
        elif 660 < type < 665:
            print 'Error (%d): "%s" is not a float' % (type, extra)
        elif 665 < type < 670:
            print 'Error (%d): "%s" is not a valid base pair (it\'s expecting an integer)' \
                  % (type, extra)
        elif 690 < type < 700:
            print 'Error (%d): "%s" is not a valid command' % (type, extra)
        elif 700 < type < 750:
            print 'Error (%d): "%s" is not a valid option' % (type, extra)
        elif 800 < type < 820:
            print 'Error (%d): "%s" needs at least %d parametes.' % (type, extra, pn)
        elif 900 < type < 920:
            print 'Error (%d): Could not create "%s" temporary file. Check permissions.' \
                  % (type, extra)
        elif 920 < type < 930:
            print 'Error (%d): Could not open "%s" file. Check permissions.' % \
                  (type, extra)
        elif 930 < type < 940:
            print 'Error (%d): File "%s" has a format error in line %d.' % \
                  (type, extra, pn)
        elif 1000 < type < 1020:
            print 'Error (%d): Cannot run gnuplot succesfully. Check gnuplot existance and "%s"' \
                  % (type, extra)
        else:  # no idea what they are
            print 'Unknown error type %d' % type

    def message(self, type, extra=''):  # just messages to user
        if 10 < type < 20:
            print 'file to be plotted "%s"' % extra
        else:
            print '%s' % message_list[type]

    def print_help(self, line):
        if self.command2apply == 60:
            print help_messages['general']
            return
        strtok = line.split()
        for st in strtok[1:]:
            if help_messages.has_key(st):
                print help_messages[st]

    def write_header_tmp(self, of):
        of.write('# Automatically generated file by iplot.py\n')
        of.write('# temporary file for %s\n' % self.plot_params['plot_name'])
        if self.plot_params.has_key('trange'):
            of.write('# for range [%.3f:%.3f]\n' % \
                     (self.plot_params['trange'][0], self.plot_params['trange'][1]))

def useful_line(line):  # check if line has info
    strtok = line.split()
    if not len(strtok):  # no length, then is empty
        return 0  # not useful
    try:  # nothing in the first token?
        strtok[0][0]
    except:
        return 0  # then useless
    if strtok[0][0] == '#':  # comment?
        return 0  # useless
    return 1  # none of the other conditions, useful

# separate list for errors, warnings, and messages. Easier to have separate list for
# translations and for debugging
error_list = {10: 'not enough parameters\nUsage: iplot.py [-f input_script]', \
              11: 'bad parameter\nUsage: iplot.py [-f input_script]', \
              21: 'Uknown error. Contact programmer', \
              21: 'command was not recognized. Try help.', \
              24: 'It cannot plot because plot needs more parameters to be set up', \
              25: 'command "plot_name" needs a file name', \
              26: 'Could not calculate padding. Try "help padding"', \
              27: '"plot_name" parameter does not exist.', \
              28: '"type" parameter does not exist.', \
              29: '"no column information available', \
              210: 'no file with that description' \
              }

warning_list = {301: 'Changing plot_name might result in some malfunctions', \
                331: 'More than one file was found', \
                332: 'Could not find parameter key in list'}

message_list = {1: 'thanks for using iplot'}

help_messages = {'general': '\
iplot is an interactive programm that can work in batch mode.\n\
The set of commands that can be used are:\n\
"set", "get", "quit", "exit", "include", "plot", "help".\n\
For help on each command type help "command".', \
'set': '\
set is used to initialize variables. The variables that can be initialized are:\
"xlabel", "ylabel", "title",\n\
"xrange", "yrange", "prange", "trange",\n\
"key",\n\
"grid",\n\
"f(x)",\n\
"subtitle", "fx_title",\n\
"padding", "ticks",\n\
"rmsd", "polar", "binds", "starwars", "means",\n\
"font",\n\
"latex",\n\
"terminal"\n', \
'get': '\
get is used to display the value of the variables\
usage: get [all] [interested variable(s)]', \
'quit': 'Self explanatory, ins\'t it? /', \
'exit': 'Self explanatory, ins\'t it? /', \
'include': '\
include allows other simulations to be included. There are two things that\n\
include would take into account. 1) the yrange for the plot changed to the\n\
maximum value of all includes. This feature allows a set of plot to be in\n\
the same scale (in y).\n\
Also, when using used latex, the progrm would look in those directories and\n\
it will include the .eps and/or .ps files whithin the latex and later on the\n\
pdf file.\n\
simulations can be included in two ways. If only one name is entered, files\n\
will be looked in previous directories. For example, if the current\n\
directories:\n\
/user/alex/simulations/NoSSB/images\n\
and a command:\n\
set include T6T7SSB\n\
is entered, then, T6T7SSB would be look under the current directory, if not\n\
found then under /user/alex/simulations, if not found then under /user/alex,\n\
so on and so forth.\n\
If that directory is not found all the way to the root, the it wold give an\n\
error back. If there a simulation that is not in the current path, then the\n\
whole path should be include, something like:\n\
include /user/matt/sims/1AGB_sim\n\
Then the program will look for the files under that directory structure.' \
}
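
The "include" help text above describes a search that starts in the current directory and walks up through each parent until the named simulation directory is found or the root is reached. The code that performs this search does not appear in this part of the listing, so the following is only a minimal sketch of that rule. It is written in present-day Python 3 rather than the Python 2 of the listing, and the function name find_simulation_dir and its parameters are hypothetical, not part of iplot.py.

```python
import os


def find_simulation_dir(name, start_dir):
    """Hypothetical helper: look for directory `name` in start_dir and its parents.

    Returns the path of the first match, or None if the search reaches the
    filesystem root without finding it.
    """
    # A full path is taken as given; no upward search is attempted.
    if os.path.isabs(name):
        if os.path.isdir(name):
            return name
        return None
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, name)
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # dirname of the root is the root itself
            return None
        current = parent
```

With the directories from the help text, calling find_simulation_dir('T6T7SSB', '/user/alex/simulations/NoSSB/images') would try the images directory first, then NoSSB, then /user/alex/simulations, and so on up to the root.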

def get_keys(param, cat=''):
    keys = [ \
        ('B', 'Xdisp', 'Angstroms', 'Global Base-Axis Parameters', 'residue'), \
        ('B', 'Ydisp', 'Angstroms', 'Global Base-Axis Parameters', 'residue'), \
        ('B', 'Inclin', 'degrees', 'Global Base-Axis Parameters', 'residue'), \
        ('B', 'Tip', 'degrees', 'Global Base-Axis Parameters', 'residue'), \
        ('B', 'Bc', 'Unknown', 'Global Base-Axis Parameters', 'residue'), \
        ('B', 'Tc', 'Unknown', 'Global Base-Axis Parameters', 'residue'), \
        ('C', 'Xdisp', 'Angstroms', 'Global Base pair-Axis Parameters', 'residue'), \
        ('C', 'Ydisp', 'Angstroms', 'Global Base pair-Axis Parameters', 'base pair'), \
        ('C', 'Inclin', 'degrees', 'Global Base pair-Axis Parameters', 'base pair'), \
        ('C', 'Tip', 'degrees', 'Global Base pair-Axis Parameters', 'base pair'), \
        ('C', 'Bc', 'Unknown', 'Global Base pair-Axis Parameters', 'base pair'), \
        ('C', 'Tc', 'Unknown', 'Global Base pair-Axis Parameters', 'base pair'), \
        ('D', 'Shear', 'Angstroms', 'Global Base-Base Parameters', 'base pair'), \
        ('D', 'Stretch', 'Angstroms', 'Global Base-Base Parameters', 'base pair'), \
        ('D', 'Stagger', 'Angstroms', 'Global Base-Base Parameters', 'base pair'), \
        ('D', 'Buckle', 'degrees', 'Global Base-Base Parameters', 'base pair'), \
        ('D', 'Propel', 'degrees', 'Global Base-Base Parameters', 'base pair'), \
        ('D', 'Opening', 'degrees', 'Global Base-Base Parameters', 'base pair'), \
        ('D', 'Bc', 'Unknown', 'Global Base-Base Parameters', 'base pair'), \
        ('D', 'Tc', 'Unknown', 'Global Base-Base Parameters', 'base pair'), \
        ('E', 'Shift', 'Angstroms', 'Global Inter-Base Parameters', 'residue pair'), \
        ('E', 'Slide', 'Angstroms', 'Global Inter-Base Parameters', 'residue pair'), \
        ('E', 'Rise', 'Angstroms', 'Global Inter-Base Parameters', 'residue pair'), \
        ('E', 'Tilt', 'degrees', 'Global Inter-Base Parameters', 'residue pair'), \
        ('E', 'Roll', 'degrees', 'Global Inter-Base Parameters', 'residue pair'), \
        ('E', 'Twist', 'degrees', 'Global Inter-Base Parameters', 'residue pair'), \
        ('E', 'Dc', 'Unknown', 'Global Inter-Base Parameters', 'residue pair'), \

( ’F ’ , ’ S h i f t ’ , ’ Angstroms ’ , ’ G loba l I n t e r−Base p a i r P a r a m e t e r s ’ , ’

137

base−p a i r coup le ’ ) , \( ’F ’ , ’ S l i d e ’ , ’ Angstroms ’ , ’ G loba l I n t e r−Base p a i r P a r a m e t e r s ’ , ’

base−p a i r coup le ’ ) , \( ’F ’ , ’ R ise ’ , ’ Angstroms ’ , ’ G loba l I n t e r−Base p a i r P a r a m e t e r s ’ , ’

base−p a i r coup le ’ ) , \( ’F ’ , ’ T i l t ’ , ’ d e g r e e s ’ , ’ G loba l I n t e r−Base p a i r P a r a m e t e r s ’ , ’ base−

p a i r coup le ’ ) , \( ’F ’ , ’ Ro l l ’ , ’ d e g r e e s ’ , ’ G loba l I n t e r−Base p a i r P a r a m e t e r s ’ , ’ base−

p a i r coup le ’ ) , \( ’F ’ , ’ Twis t ’ , ’ d e g r e e s ’ , ’ G loba l I n t e r−Base p a i r P a r a m e t e r s ’ , ’ base

−p a i r coup le ’ ) , \( ’F ’ , ’Dc ’ , ’Unknown ’ , ’ G loba l I n t e r −Base p a i r P a r a m e t e r s ’ , ’

base−p a i r coup le ’ ) , \( ’G ’ , ’ S h i f t ’ , ’ Angstroms ’ , ’ Loca l I n t e r−Base P a r a m e t e r s ’ , ’ r e s i d u e

coup le ’ ) , \( ’G ’ , ’ S l i d e ’ , ’ Angstroms ’ , ’ Loca l I n t e r−Base P a r a m e t e r s ’ , ’ r e s i d u e

coup le ’ ) , \( ’G ’ , ’ R ise ’ , ’ Angstroms ’ , ’ Loca l I n t e r−Base P a r a m e t e r s ’ , ’ r e s i d u e

coup le ’ ) , \( ’G ’ , ’ T i l t ’ , ’ d e g r e e s ’ , ’ Loca l I n t e r−Base P a r a m e t e r s ’ , ’ r e s i d u e

coup le ’ ) , \( ’G ’ , ’ Ro l l ’ , ’ d e g r e e s ’ , ’ Loca l I n t e r−Base P a r a m e t e r s ’ , ’ r e s i d u e

coup le ’ ) , \( ’G ’ , ’ Twis t ’ , ’ d e g r e e s ’ , ’ Loca l I n t e r−Base P a r a m e t e r s ’ , ’ r e s i d u e

coup le ’ ) , \( ’G ’ , ’Dc ’ , ’Unknown ’ , ’ Loca l I n t e r −Base P a r a m e t e r s ’ , ’ r e s i d u e

coup le ’ ) , \( ’H ’ , ’ S h i f t ’ , ’ Angstroms ’ , ’ Loca l I n t e r −Base p a i r P a r a m e t e r s ’ , ’

base−p a i r coup le ’ ) , \( ’H ’ , ’ S l i d e ’ , ’ Angstroms ’ , ’ Loca l I n t e r−Base p a i r P a r a m e t e r s ’ , ’

base−p a i r coup le ’ ) , \( ’H ’ , ’ R ise ’ , ’ Angstroms ’ , ’ Loca l I n t e r−Base p a i r P a r a m e t e r s ’ , ’ base

−p a i r coup le ’ ) , \( ’H ’ , ’ T i l t ’ , ’ d e g r e e s ’ , ’ Loca l I n t e r −Base p a i r P a r a m e t e r s ’ , ’ base−

p a i r coup le ’ ) , \( ’H ’ , ’ Ro l l ’ , ’ d e g r e e s ’ , ’ Loca l I n t e r−Base p a i r P a r a m e t e r s ’ , ’ base−

p a i r coup le ’ ) , \( ’H ’ , ’ Twis t ’ , ’ d e g r e e s ’ , ’ Loca l I n t e r−Base p a i r P a r a m e t e r s ’ , ’ base−

p a i r coup le ’ ) , \( ’H ’ , ’Dc ’ , ’Unknown ’ , ’ Loca l I n t e r −Base p a i r P a r a m e t e r s ’ , ’ base−

p a i r coup le ’ ) , \( ’ I ’ , ’Ax ’ , ’ Angstroms ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’ ) ,

\( ’ I ’ , ’Ay ’ , ’ Angstroms ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’ ) ,

\

138

( ’ I ’ , ’ Ainc ’ , ’ d e g r e e s ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’ ) ,\

( ’ I ’ , ’ A t ip ’ , ’ d e g r e e s ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’ ) ,\

( ’ I ’ , ’ Adis ’ , ’ Angstroms ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’) , \

( ’ I ’ , ’ Angle ’ , ’ d e g r e e s ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’ ), \

( ’ I ’ , ’ Pa th ’ , ’ Angstroms ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’) , \

( ’ I ’ , ’Dc ’ , ’Unknown ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’ ) , \( ’ I ’ , ’ O f f s e t ’ , ’ Angstroms ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e

coup le ’ ) , \( ’ I ’ , ’L . D i r ’ , ’ d e g r e e s ’ , ’ G loba l Axis C u r v a t u r e ’ , ’ r e s i d u e coup le ’ )

, \( ’ J ’ , ’C1−C2 ’ , ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s ’ , ’ unknown ’ ) , \( ’ J ’ , ’C2−C3 ’ ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s ’ , ’ unknown ’ ) , \( ’ J ’ , ’ Phase ’ , ’ deg ree ’ , ’ Backbone P a r a m e t e r s ’ , ’ unknown ’ ) , \( ’ J ’ , ’ Ampli ’ , ’ deg ree ’ , ’ Backbone P a r a m e t e r s ’ , ’ unknown ’ ) , \( ’ J ’ , ’ Pucker ’ , ’N/A ’ , ’ Backbone P a r a m e t e r s ’ , ’ unknown ’ ) , \( ’ J ’ , ’C1 ’ , ’ deg ree ’ , ’ Backbone P a r a m e t e r s ’ , ’ unknown ’ ) , \( ’ J ’ , ’C2 ’ , ’ deg ree ’ , ’ Backbone P a r a m e t e r s ’ , ’ unknown ’ ) , \( ’ J ’ , ’C3 ’ , ’ deg ree ’ , ’ Backbone P a r a m e t e r s ’ , ’ unknown ’ ) , \( ’ J ’ , ’ Chi ’ , ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s T o r s i o n s ’ , ’ unknown ’ ) ,

\( ’ J ’ , ’Gamma ’ , ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s T o r s i o n s ’ , ’ unknown ’ )

, \( ’ J ’ , ’ D e l t a ’ , ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s T o r s i o n s ’ , ’ unknown ’ )

, \( ’ J ’ , ’ E p s i l ’ , ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s T o r s i o n s ’ , ’ unknown ’ )

, \( ’ J ’ , ’ Ze ta ’ , ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s T o r s i o n s ’ , ’ unknown ’ ) ,

\( ’ J ’ , ’ Alpha ’ , ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s T o r s i o n s ’ , ’ unknown ’ )

, \( ’ J ’ , ’ Beta ’ , ’ d e g r e e s ’ , ’ Backbone P a r a m e t e r s T o r s i o n s ’ , ’ unknown ’ )

    ]
    pros_key = []
    for k in keys:
        if param.find(k[1]) > -1:
            pros_key.append(k)
    if len(pros_key) <= 1:
        return pros_key
    if cat == '':
        return 0 # error
    pkey = []
    for pk in pros_key:
        if pk[0].find(cat) > -1:
            pkey.append(pk)
    if len(pkey) <= 1:
        return pkey
    return 0 # error

if __name__ == '__main__':
    iplot()
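
The lookup at the end of this listing (substring match on the parameter name, then narrowing by category letter) can be restated compactly. The sketch below is a Python 3 illustration and is not part of the original program: the function name `match_keys` and the three-entry `KEYS` table are hypothetical, the table is only a small excerpt of the entries above, and the sketch returns `None` where the listing returns `0` for an ambiguous match.

```python
# Hypothetical excerpt of the parameter table:
# (category, name, units, group, scope).
KEYS = [
    ('E', 'Shift', 'Angstroms', 'Global Inter-Base Parameters', 'residue pair'),
    ('F', 'Shift', 'Angstroms', 'Global Inter-Base pair Parameters', 'base-pair couple'),
    ('I', 'Path', 'Angstroms', 'Global Axis Curvature', 'residue couple'),
]

def match_keys(param, cat='', keys=KEYS):
    """Return the entries whose name occurs in param, narrowed by category.

    Returns None when more than one candidate remains.
    """
    candidates = [k for k in keys if k[1] in param]
    if len(candidates) <= 1:
        return candidates
    if cat == '':
        return None  # ambiguous and no category letter given
    narrowed = [k for k in candidates if cat in k[0]]
    if len(narrowed) <= 1:
        return narrowed
    return None  # still ambiguous

unique = match_keys('Path')
ambiguous = match_keys('Shift')
resolved = match_keys('Shift', 'F')
```

Here `'Shift'` alone matches both the E and F entries, so a category letter is needed to select one of them.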

A.3.4 hydrogen bond detection in 8oG

#!/usr/bin/env python2.2
import psyco
psyco.full()
import math, glob, sys, Numeric

def angle(A, B):
    # get angle between two vectors by normalizing, taking acos of dot product
    A /= Numeric.dot(A, A) ** 0.5
    B /= Numeric.dot(B, B) ** 0.5
    return math.acos(Numeric.dot(A, B))

def vector(A, B):
    # return a vector based on two points
    return A-B

def dist(A, B):
    C = A-B
    return math.sqrt(Numeric.dot(C, C))

def getcoords(line):
    return Numeric.array([float(line[5]), float(line[6]), float(line[7])])

fileList = glob.glob('/var/tmp2/8oGrerun2/pdbs/ana*.pdb')
fileList.sort()

O8count = 0
H7count = 0
O8tally = 0
H7tally = 0

for entry in fileList:
    Hflag = 0
    Oflag = 0
    print entry
    atoms = open(entry).readlines()

    ref1 = getcoords(atoms[588].split()) #H7
    ref2 = getcoords(atoms[590].split()) #O8/H8
    n7 = getcoords(atoms[587].split()) #N7

    for w in range(783, 45675, 3):
        o = getcoords(atoms[w].split())
        h1 = getcoords(atoms[w+1].split())
        h2 = getcoords(atoms[w+2].split())

        # handle O8
        distance1 = dist(ref2, h1)
        distance2 = dist(ref2, h2)
        if (distance2 < 2.5 and angle(ref2-h2, o-h2) > 2.09) or \
           (distance1 < 2.5 and angle(ref2-h1, o-h1) > 2.09):
            print "O8"
            O8count += 1
            Oflag = 1

        # handle H7
        distance3 = dist(ref1, o)
        if distance3 < 2.5 and angle(n7-ref1, o-ref1) > 2.09:
            print "H7"
            H7count += 1
            Hflag = 1

    O8tally += Oflag
    H7tally += Hflag

print O8count, H7count, O8tally, H7tally
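
The geometric test applied in this listing (a hydrogen-to-acceptor distance under 2.5 Angstroms together with an angle at the hydrogen above 2.09 rad, about 120 degrees) can be isolated into a small function. The following is a Python 3 sketch using only the standard library; the names `angle_between` and `is_hbond` and the sample coordinates are assumptions made for illustration and do not come from the thesis code.

```python
import math

def angle_between(u, v):
    """Angle in radians between two 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

def is_hbond(donor, hydrogen, acceptor, max_dist=2.5, min_angle=2.09):
    """True if the H...acceptor distance is below max_dist (Angstroms) and
    the donor-H...acceptor angle at the hydrogen exceeds min_angle (radians)."""
    to_donor = [d - h for d, h in zip(donor, hydrogen)]
    to_acceptor = [a - h for a, h in zip(acceptor, hydrogen)]
    dist = math.sqrt(sum(c * c for c in to_acceptor))
    return dist < max_dist and angle_between(to_donor, to_acceptor) > min_angle

# Hypothetical coordinates: a linear O-H...O arrangement and a bent one.
linear = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
bent = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.9, 0.0))
```

Unlike the `angle` function in the listing, which normalizes its arguments in place, this sketch leaves its inputs unmodified.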

A.3.5 hydration_stats.py

#!/usr/bin/env python
# unlike trimwater-modern, this program is intended to COUNT the number of waters
# near the marked point, and also to return the closest of them
# import psyco
# psyco.full()
import glob, os, sys

def processPDBs(flist, outdir, lastatom, cluster, solvent, distance):
    count = 0
    distance *= distance
    outfile = open('1A9G-nowater-subset-stats.txt', 'w')
    for f in flist:
        mindist = 10000.0
        closecount = 0
        buffer = open(f).readlines()
        outbuffer = buffer[:solvent]
        x, y, z = 0.0, 0.0, 0.0
        for atom in cluster:
            coords = buffer[atom].split()[-3:]
            x += float(coords[0])
            y += float(coords[1])
            z += float(coords[2])
        x /= len(cluster)
        y /= len(cluster)
        z /= len(cluster)
        print x, y, z

        counter = solvent

        while counter < lastatom:
            lines = buffer[counter:counter+3]
            opoint = lines[0].split()[-3:]
            dist = (float(opoint[0]) - x) ** 2 + (float(opoint[1]) - y) ** 2 + \
                   (float(opoint[2]) - z) ** 2
            if dist < distance:
                closecount += 1
                mindist = min(mindist, dist)
                outbuffer += lines
                print "adding"
            counter += 3

        outfile.write('%s %s %s\n' % (str(count), str(closecount), str(mindist ** 0.5)))
        # open("%strimmed%s.pdb" % (outdir, str(count).zfill(5)), 'w').write("".join(outbuffer))
        count += 2

if __name__ == '__main__':
    outdir = '/var/tmp2/1A9G-redo/trimmedwaters-modern/'
    if not os.path.exists(outdir):
        os.mkdir(outdir)
    filelist = glob.glob('/md0/1A9G-compare-modern/1A9G-nowater/waterpdbs/*.pdb')
    processPDBs(filelist, outdir, 45754, (157, 158, 159, 509, 512, 514), 703, 5)
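
The core of hydration_stats.py is a centroid calculation followed by a squared-distance comparison against each water oxygen. A minimal Python 3 sketch of that step, working on in-memory coordinate tuples rather than PDB lines, is given below. The function names and the sample coordinates are hypothetical, and the sketch reports `None` when no water lies inside the cutoff, whereas the listing leaves its initial placeholder value of 10000.0 in that case.

```python
import math

def centroid(points):
    """Arithmetic mean of a sequence of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def count_close_waters(cluster, water_oxygens, cutoff):
    """Count water oxygens within cutoff of the cluster centroid.

    Returns (count, closest_distance). Squared distances are compared so
    that the square root is taken only once, as in the listing.
    """
    cx, cy, cz = centroid(cluster)
    cutoff_sq = cutoff * cutoff
    count = 0
    min_sq = None
    for (x, y, z) in water_oxygens:
        d_sq = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
        if d_sq < cutoff_sq:
            count += 1
            if min_sq is None or d_sq < min_sq:
                min_sq = d_sq
    closest = math.sqrt(min_sq) if min_sq is not None else None
    return count, closest

# Hypothetical data: two cluster atoms and three water oxygens.
cluster = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
waters = [(1.0, 3.0, 0.0), (1.0, 0.0, 4.0), (10.0, 0.0, 0.0)]
result = count_close_waters(cluster, waters, 5.0)
```

With these sample values the centroid is (1, 0, 0), two oxygens fall inside the 5 Angstrom cutoff, and the closest lies 3 Angstroms away.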
