RESTRAIN (CCP4: Supported Program)
NAME
restrain - refinement program including restraints, rigid body refinement, non-crystallographic symmetry, atomic and group isotropic, anisotropic and TLS thermal parameters, group and coupled occupancies etc.
SYNOPSIS
restrain XYZIN foo_in.brk TLSIN foo_in.tls HKLIN foo_in.mtz XYZOUT foo_out.brk TLSOUT foo_out.tls HKLOUT foo_out.mtz
RESTRAIN version 4.6

A MACROMOLECULAR REFINEMENT PROGRAM MINIMISING A FUNCTION CONTAINING TERMS INVOLVING:
  STRUCTURE AMPLITUDES
  PHASES
  INTERATOMIC DISTANCES
  GROUP PLANARITY
  ISOTROPIC THERMAL PARAMETER DIFFERENCES
  ANISOTROPIC THERMAL PARAMETER DIFFERENCES
with respect to
  OVERALL SCALE FACTOR
  OVERALL ISOTROPIC THERMAL PARAMETER
  OVERALL ANISOTROPIC THERMAL PARAMETERS
  BULK SOLVENT PARAMETERS
  ATOMIC COORDINATES
  RIGID BODY ROTATIONS AND TRANSLATIONS
  NON-CRYSTALLOGRAPHIC SYMMETRY OPERATORS
  ATOMIC ISOTROPIC THERMAL PARAMETERS
  ATOMIC ANISOTROPIC THERMAL PARAMETERS
  GROUP ISOTROPIC THERMAL PARAMETERS
  GROUP ANISOTROPIC THERMAL PARAMETERS
  GROUP TLS TENSOR COMPONENTS
  ATOMIC, GROUP AND COUPLED OCCUPANCIES
Department of Crystallography
Birkbeck College, Malet Street
London WC1E 7HX, UK
Contact: Ian Tickle (firstname.lastname@example.org).
1.1 REFINEMENT FACILITIES
RESTRAIN is a computer program for the least-squares refinement of protein and nucleic acid structures using X-ray or neutron single crystal diffraction data. It incorporates facilities for
The design and implementation follow papers by Waser (1963), Rollett (1969), Moss (1981), Moss & Morffew (1982), Haneef et al. (1985) and Driessen et al. (1989).
The function minimised is of the form:
M = SUM [w(f) (|Fo| - G.|Fc|)^2]
  + SUM [w(p) (PHIo - PHIc)^2]
  + SUM [w(d) (d(t) - d(c))^2]
  + SUM [w(b) (b(o) - b(min))^2]
  + SUM [w(U) delta-U^2]
  + SUM [w(Ua) delta-Ua^2]
  + SUM [w(v) |V|^2]
  + SUM [w(c) (d(t) - d(c))^2]                (1)
where
The non-bonded interaction is only operational when b(o) < b(min) and chirality restraints are applied as distance restraints along the edges of chiral tetrahedra. Equation (1) may be written as a function of three terms: M = M(a) + M(b) + M(c). M(a) is the first term and is the one conventionally found in crystallographic least-squares procedures. M(b) is the second term which allows the use of estimates of phases from isomorphous and/or anomalous data. M(c) is the sum of the remaining terms and represents pseudo-potential energy terms.
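As an illustration only, the bookkeeping behind equation (1) can be sketched in Python. The function and argument names below are hypothetical and are not part of RESTRAIN; only the amplitude, distance and non-bonded terms are shown, each supplied as a list of tuples, and the numbers are made up.

```python
def amplitude_term(refl, G):
    """M(a): refl holds (w_f, Fo, Fc) tuples; G is the overall scale factor."""
    return sum(w * (abs(fo) - G * abs(fc)) ** 2 for w, fo, fc in refl)


def distance_term(dists):
    """Distance part of M(c): dists holds (w_d, d_target, d_calc) tuples."""
    return sum(w * (d_t - d_c) ** 2 for w, d_t, d_c in dists)


def nonbonded_term(contacts):
    """Non-bonded part of M(c): contacts holds (w_b, b_o, b_min) tuples.

    A contact contributes only when b_o < b_min, as stated in the text.
    """
    return sum(w * (b_o - b_min) ** 2
               for w, b_o, b_min in contacts if b_o < b_min)


# Made-up values, purely to show how the terms combine.
refl = [(1.0, 10.0, 4.0), (1.0, 6.0, 3.5)]
dists = [(2500.0, 1.53, 1.50)]
contacts = [(1.0, 3.0, 3.4), (1.0, 3.8, 3.4)]  # second contact is inactive

M = amplitude_term(refl, G=2.0) + distance_term(dists) + nonbonded_term(contacts)
```

Only the relative sizes of the weights matter here, which is why the example does not try to use realistic absolute values.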
The function M may be minimised with respect to a selection of the following parameters:
Although RESTRAIN has been written primarily for refinement of macromolecular structures, the use of a user defined dictionary for interatomic and planar restraints and other options allows the user to specify additional interatomic restraints and planes, and means that virtually any structure can be refined by the program. The program at present uses a four-Gaussian expansion of scattering factors (INTERNATIONAL TABLES FOR X-RAY CRYSTALLOGRAPHY, Vol. IV). Coefficients for this expansion suitable for X-ray or neutron diffraction may be read from the dictionary.
The program is completely general and may be used for any number of reflections in any space group. The program can be used for any size of problem. The number of atoms which may be refined is only limited by the available memory of the computer used. Array sizes are increased by a global change of the relevant variables in PARAMETER statements in an INCLUDE file (common.inc), followed by re-compilation of the source file.
At Birkbeck College this program has been used for refinement of protein and nucleic acid structures using X-ray or neutron diffraction data. It has generally been used in conjunction with model building using an interactive graphics system. The program has been set up so that the input/output interfaces easily with the graphics model building program O (Jones 1991) and FFT programs. Coordinate files have the standard PDB format. Reflection input files may be either formatted or unformatted (CCP4 MTZ).
1.2 PROGRAM IMPLEMENTATION
RESTRAIN has been written in standard FORTRAN 77 (ANSI X3.9-1978) with the sole exception of the INCLUDE facility for inserting the common blocks in the individual subroutines. The program has been designed to take advantage of vector or scalar processing computers. To obtain the highest speed, space group specific versions of SG0001 have been written for some of the most common space groups. However, some options of RESTRAIN (NCS, anisotropic and TLS) are not available when using them. There is no difference in the steering parameters when using them. Currently there are subroutines available for:
Note that some of these subroutines (the monoclinic and orthorhombic ones) may be used for a higher symmetry space group provided it is a super-group with the same origin. For example P213 is a super-group of P212121 with no origin shift.
User friendliness of input/output has been an important criterion in the design of RESTRAIN. No preparation programs need be used. The authors have endeavoured to print sensible error messages on job failure, and to intercept lethal input. Any suggestions for improvement will be welcome.
2. THE USE OF RESTRAIN
2.1 GETTING STARTED
In order to run RESTRAIN you will need 3, 4 or 5 input files. These are listed below together with the names by which they are referenced in the RESTRAIN output.
File                          File name         Explanation
- Script with control         none              section 3.1
  and steering data
- Dictionary                  DICTION           section 3.2
- Coordinates                 XYZIN             section 3.3
- Optional group              TLSIN             section 3.4
  thermal parameters
- Optional reflections        HKLIN or REFIN    section 3.5
Alternatively you may have the control and steering data in an input file separate from the job script.
Care must be taken in preparing the coordinates for refinement. After each polymer chain a TER record must be inserted. This includes breaks in the chain due to one or more missing residues. The residues need not be numbered sequentially and the residue labels may contain non-numeric characters at any position. However, to maintain compatibility with the PDB standard format it is advisable to restrict the use of non-numeric characters to alphabetic characters, and then only in the last character position (residue insertion code).
The C-terminal residue of a protein chain may have an extra O (carboxyl) or N (amide) atom, but it must be put in a separate residue (CAR or CAM) with the atom label OXT or NXT. All atoms not contained in chains must be supplied as HETATMs. The atom labels in a residue must correspond exactly (i.e. in case and justification) with the supplied dictionary, and there must be no missing or extra atoms. Missing atoms can be dealt with by temporarily renaming the residue (e.g. for missing protein side-chain rename to GLY or ALA). Extra disordered atoms must be supplied after the TER record as HETATMs; extra distance restraints will have to be supplied for these atoms.
The PDB file may contain either Uiso's or Biso's, but the appropriate steering parameter must be specified (ISO=true and BINPUT=false or true respectively). The file may also contain anisotropic U's in the standard PDB format.
After previous refinement and extensive rebuilding you may want to reset large U or B values for atoms incorrectly positioned before rebuilding (e.g. U > 0.8 or B > 64Å^2) to more reasonable starting values (e.g. U=0.2 or B=16Å^2).
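The paired U and B values quoted above follow from the standard relation B = 8.pi^2.U. A minimal Python sketch of the conversion and of the suggested reset is given below; the helper names are hypothetical and the reset is not something RESTRAIN is stated to do for you.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2  # B = 8 * pi^2 * U


def u_to_b(u):
    """Convert an isotropic U (in Å^2) to the equivalent B (in Å^2)."""
    return EIGHT_PI_SQ * u


def b_to_u(b):
    """Convert an isotropic B (in Å^2) to the equivalent U (in Å^2)."""
    return b / EIGHT_PI_SQ


def reset_large_u(u, limit=0.8, start=0.2):
    """Return `start` when u exceeds `limit`, otherwise u unchanged.

    The defaults are the example values quoted in the text.
    """
    return start if u > limit else u
```

With these helpers, u_to_b(0.8) and u_to_b(0.2) come out close to the rounded B values of 64 and 16 used in the text.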
The atomic coordinates in the polymer chains need not be ordered in each residue in the same way as the atoms in the residue are ordered in the dictionary. If they are not they will be re-ordered and the output file of atomic coordinates will then be produced in dictionary order for subsequent cycles. Alternatively, set TESTIN=true and ORDER=true to use the program to order and analyse the file without carrying out any refinement.
After each run (1 or more cycles) 1 or more output files will be created:
File                  Filename            Explanation
- coordinates         XYZOUT              section 4.2
- TLS parameters      TLSOUT              section 4.3
- reflections         HKLOUT or REFOUT    section 4.4
- normal matrix       MATOUT              section 4.5
Furthermore the listing of the run (section 4.1) will have to be examined closely, since the steering data may need to be updated for the next run, especially G, U, SB1 and SB2 (section 3.1.3). You should update the cycle number CYCNO, so that you keep track of how many cycles you have done, and later relate this to the R-factor.
If you are refining NCS parameters you will need to supply updated parameters. You may also want to change the weighting coefficients for the reflections WF(i), section 3.1.3. All the required parameters are always printed at the end of every log file whenever new values have been computed; these can be pasted into the steering data ready for the next run. Refined coordinates and group thermal parameters can be read back in by the program without modification. In order to obtain output reflections define HKLOUT or REFOUT (section 3.1.1).
The input that is necessary and the sections that are relevant to you depend on the application for which you intend to use RESTRAIN. There are basically two categories:
2.2 GEOMETRIC REGULARISATION
If you wish to regularise the geometry of your model structure, omit the definition of HKLIN or REFIN, or set FREF=false, which disables structure factor refinement. Regularisation may be useful after heavy rebuilding of a coordinate set, and may point to gross errors (look at the weighted differences between calculated and observed distances), which need manual correction on the graphics before continuing. Usually a few cycles suffice before reflections are used. You may have to raise MFACR to 0.2 or 0.3 to assist the solution of the normal equations, bearing in mind that the larger the value of MFACR the smaller the shifts in your output coordinates will be.
2.3 REFINING WITH THE USE OF REFLECTION DATA
For refinement against X-ray or neutron diffraction data, use the default FREF=true. Again the input you need will depend on the application for which you intend to use RESTRAIN. The most common categories are outlined below. If your category does not appear, or you are not sure what you want to do, seek assistance.
2.3.1 The options available.
The following options are available, either separately or in combination, to refine a set of coordinates from low to high resolution. Note, however, that some combinations do not make sense and will cause abnormal program termination. For example, if both RIGID coordinate groups and UISO/UANISO/TLS thermal parameter groups are defined, the thermal parameter groups must be completely contained within the coordinate groups; otherwise application of the refined RIGID body rotations and translations to the thermal parameters would destroy the correlations within the thermal parameter group.
2.3.2 Initial refinement from MIR and MIRAS models.
You will normally start by setting ISO=false to get an overall thermal parameter U and scale factor G. At this stage you may still want to include the MIR or MIRAS phases in the refinement. Set PHAS=true and make sure that the input reflection file contains these phases. However, if your low-resolution model is reasonable, you may not want to use these data.
Unless phasing extends to a resolution of better than 3Å, you may find it best to begin by breaking the structure up into rigid body segments and refining these as strictly rigid bodies. Set RIGID=true and specify RIGID groups; structure outside the rigid groups will not be refined. Such segments may be as small as one residue or one side chain.
If the bonds between such segments become seriously disrupted during rigid body refinement, then those parts may have to be rebuilt on a graphics system; otherwise the structure may be annealed by restrained refinement. Remember that refinement cannot usually correct errors which are larger than one third of the high resolution cut-off.
Regions of the structure which are more highly disordered may have to be omitted initially if maps show no clear main chain density. In this case the structure will have to be broken up into extra chains with TER records at the end of each chain. If the main chain density is clear and the side chain is unclear or the sequence at this point is uncertain then the residue should be treated as ALA or GLY in the case of proteins. Remember that the number of atoms in a residue in the coordinates must correspond with the number of atoms in the residue of that name in the dictionary.
Initially the data-parameter ratio will be unfavourable and the normal matrix for the positional parameters ill-conditioned. In the first cycles at low resolution you will normally get large shifts.
2.3.3 Initial refinement from molecular replacement models.
In the case where only one molecule is present in the asymmetric unit it is best to start by refining the six rigid body parameters from the molecular replacement by using RIGID=true and the RIGID specification to delineate the molecule. After convergence it may be possible to break the structure up into large chunks, e.g. in the case of domains. See sections 2.3.2 and 2.3.5 for further information.
In the case where more than one molecule is present in the asymmetric unit, one may want to proceed as with one molecule. However, it is possible at low to intermediate resolution to save on time and parameters by refining the structure making use of non-crystallographic symmetry and then only to rebuild one molecule on the graphics before further refinement. There are two modes to deal with non-crystallographic symmetry.
The orthogonal coordinates of one molecule are supplied along with the transformations operating on these coordinates which generate the coordinates of up to 14 molecules.
Set RIGID=true and define one or more RIGID bodies as before, and the molecules will then be refined as independent rigid bodies. Output will be the refined coordinates of the generated molecules, and the refined transformations, which should be input to the next cycle.
Note that the program will not notify you if the same molecule is generated twice. This may happen if a dimer is supplied and also generated. You therefore must make sure that only one molecule and the correct transformation are used by the program.
Input is the same as for MODE 1 except that RIGID=false and RIGID specifications are absent (see above). The transformations supplied are used as extra "equivalent positions" and the refinement produces an asymmetric unit where the molecules are identical and tend to an average of the real molecules.
The coordinates of only one molecule are written out and the same transformations are supplied for subsequent cycles. As in MODE 1, it is important to make sure that you do not generate a molecule that has already been read in.
See sections 2.3.2 and 2.3.5 for further information.
2.3.4 Initial refinement of macromolecule-ligand complexes.
When small errors in isomorphism are present, it may be useful to refine the protein in CONSTRAINED mode before difference Fouriers are calculated. Set RIGID=true and use RIGID groups. Use only an overall Uiso. After difference Fouriers and building in the ligand, it may be advisable to refine the ligand and the macromolecule in CONSTRAINED-RESTRAINED mode by setting RIGID=false and defining RIGID groups (see above). See sections 2.3.2 and 2.3.5 for further information.
2.3.5 Refinement at intermediate resolution.
How to proceed at intermediate resolution has already been discussed partially in sections 2.3.3 and 2.3.4. Generally it may be still useful to do some cycles of CONSTRAINED-RESTRAINED refinement before proceeding to RESTRAINED refinement only. Set RIGID=false and use RIGID records to delineate "rigid" bodies. This will accelerate convergence. Finally RESTRAINED refinement is obtained by removing the definitions of any RIGID groups.
It may now be useful to refine individual isotropic thermal parameters. When these are already present in the input coordinates they can be used and refined using ISO=true together with BINPUT=false (if Uiso's are present in the PDB file), or BINPUT=true (the default, if B's are present), together with the default ISOREF=true. When not present in your input coordinate data set, use ISO=false and ISOREF=true in the initial run. Having ISO=true and ISOREF=false will merely indicate that you want to read isotropic thermal parameters, but not refine them. This can be useful for molecular replacement models. In order to get meaningful isotropic thermal parameters it is usually necessary to include data higher than 3Å resolution. Note that MFACR (see section 3.1.3) is used to remove ill-conditioning. The input Uiso for each atom is checked and reset if necessary. The lowest allowed Uiso is set with ULIML; the highest by ULIMH.
2.3.6 Group isotropic and anisotropic thermal parameters.
In this option the thermal parameters of atomic groups are refined using the approximation that the groups possess, either partly or wholly, "correlated amplitude" motion. This is not necessarily the same as "rigid body" motion because the Bragg scattering is sensitive only to the amplitudes of vibrating atoms, not to their relative phases. Small rigid groups of bonded atoms such as the planar aromatic rings in HIS, PHE, TYR and TRP are likely to vibrate as rigid bodies, because the mean square vibration amplitude of a typical bond is very small (~ 0.002Å^2). However larger groups such as secondary structure elements or domains are likely to have larger internal motions, where sub-structures have vibration amplitudes which are correlated, but whose relative phases are not (e.g. in anti-phase, as opposed to in phase); this correlated amplitude motion will be indistinguishable from true in-phase rigid body motion if only Bragg scattering data are used.
The atomic groups may be whole molecules, units of secondary structure (e.g. alpha helices) or they may be pseudo-rigid side groups such as phenyl rings, imidazole, carboxylate, guanidinium or amide groups. When units of secondary structure are chosen, there is an option to include main chain atoms only. For small groups (i.e. < 20 atoms) data at high resolution (e.g. 1.5Å) may be required for success. It should also be remembered that the model assumes harmonic thermal parameters and this may not be valid for side groups on the surface of a macromolecule.
There are three group thermal parameter options: UISO, UANISO and TLS. The UISO option refines 1 parameter per group, the UANISO option 5 or 6 per group, and the TLS (translation/libration/screw-rotation) option 19 or 20 per group. This is still likely to be far fewer than the 6 per atom required in full anisotropic refinement (see section 2.3.7). The potentially rigid groups in proteins which may be suitable are aromatic rings, the "propellers" of ASP/ASN, GLU/GLN and ARG, ligands such as heme, the secondary structure elements, domains, the entire molecule, or even the entire contents of the asymmetric unit.
For the UANISO and TLS options it is possible to refine the atomic isotropic thermal parameters in addition to the group parameters; this reduces the number of group parameters from 6 to 5 and 20 to 19 respectively (because the isotropic component of the T tensor is then not used, and is set to the mean Uiso). This is in fact the default if atomic isotropic thermal parameters are refined (ISOREF=true); if this option is not desired it must be deselected (see option NOATOM in the description of parameters).
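The parameter counts quoted in this section can be tabulated with a short Python sketch. The function names and the `atomic_iso_refined` flag are hypothetical; the flag simply stands for the case in which atomic isotropic thermal parameters are refined alongside the group parameters.

```python
# Parameters per group when atomic isotropic parameters are NOT refined.
GROUP_PARAMS = {"UISO": 1, "UANISO": 6, "TLS": 20}


def group_param_count(option, atomic_iso_refined=False):
    """Number of thermal parameters refined for one group.

    For UANISO and TLS one parameter is dropped when atomic isotropic
    parameters are refined as well, because the isotropic component of
    the T tensor is then not used.
    """
    n = GROUP_PARAMS[option]
    if atomic_iso_refined and option in ("UANISO", "TLS"):
        n -= 1
    return n


def full_aniso_param_count(n_atoms):
    """Thermal parameters needed for full atomic anisotropic refinement."""
    return 6 * n_atoms
```

For a 20-atom group, for instance, a single TLS group needs at most 20 thermal parameters whereas full anisotropic refinement needs 120, which is the economy the text refers to.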
In order to analyse the TLS tensors, the output files may be used as input to the CCP4 program TLSANL. The resulting anisotropic tensors may be visualised by using the output coordinate file to compute very high atomic resolution (0.7Å) structure factors, and then contouring the Fcalc electron density with a program such as O.
2.3.7 Individual atomic anisotropic thermal parameters.
If your data extend to atomic resolution it will be possible to refine individual atomic anisotropic thermal parameters using a 6 element anisotropic U tensor. This type of refinement can be started up by defining groups using the ANISO keyword. The isotropic U value of each atom will be put in the diagonal elements of the anisotropic U tensor (U11, U22, U33) to use as a starting value.
After refinement the new anisotropic U tensor (U11 U22 U33 U12 U13 U23) will be written to the coordinate file behind the ATOM record in a separate record identified by ANISOU using the standard PDB format. These records will then be used in future runs for reading and writing the anisotropic tensors.
2.3.8 Occupancy refinement.
Uncoupled group occupancy refinement may be useful for protein-inhibitor complexes, where the inhibitor is not present in stoichiometric amounts. The occupancy groups are defined in the control data with records using keyword OCCUp. The contiguous segment(s) comprising each group is/are specified by the starting atom number as present in the coordinates, the number of atoms in the segment (may be just 1 atom), and the group identifier using free format. As the starting occupancy for the atoms in the group, use a value suggested by the electron density.
Coupled alternative sites may be most easily created by using extra dictionary entries (see section 3.2). For example, call the short alternative site residue ASX if it is the alternative site of the side chain for an ASP. These alternative site residues should then be added to the coordinate data set as ATOM records after chains terminated by TER, and effectively treated as separate protein chains themselves by inserting a TER record. Both the first and subsequent sites are specified as described above, but with different coupling identifiers appended; the group identifier must be the same for these coupled sites. It will be useful to use an extra restraint to tie the alternative site(s) down to the atom where it diverges, and extra restraints will also be required between atoms defined as HETATMs (see XTRDIST in section 3.1.1). Van der Waals repulsion is automatically turned off for coupled groups. It is always important to study the U values for the atoms in alternative sites because of the strong correlation between occupancy and U. Too large a U value with a low occupancy either means that the coordinates have been built in the wrong position, or that the site is not "real". A reasonable starting atomic isotropic U value for the second site is 0.2Å^2.
2.4 WEIGHTING
Weighting may assume two distinct purposes in the refinement of protein structures. Firstly it may be used to drive the refinement down the correct minimum in as few cycles as possible. This will be used in the initial stages of a refinement, where as many errors should be corrected as possible. This is achieved by coarse resolution cut-off, and/or by using a small amplitude cut-off and/or a SIGMA type cut-off, and by down-weighting higher angle reflections in the remainder. This may be called convergence weighting.
Secondly in the latter stages of refinement, the weights may be used to reflect the expected discrepancies between observations and target values or functions and the corresponding quantities calculated from the model. As the model improves, higher resolution data may be included, and the higher angle data and weak reflections may be given higher weighting until the sum of the weighted residual squared over all observations and restraints equals the total number of observations and restraints minus the total number of variable parameters. This may be called statistical weighting. The weighting strategies to be adopted in the two cases may be quite different.
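The statistical weighting criterion described above can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the weighted residuals are assumed to be available as parallel lists of weights and differences covering all observations and restraints.

```python
def weighted_residual_sum(weights, deltas):
    """Sum of w * delta^2 over all observations and restraints."""
    return sum(w * d * d for w, d in zip(weights, deltas))


def statistical_target(n_obs, n_restraints, n_params):
    """Expected value of the weighted sum: observations plus restraints
    minus variable parameters."""
    return n_obs + n_restraints - n_params


def goodness_ratio(weights, deltas, n_obs, n_restraints, n_params):
    """Ratio of the weighted sum to its target.

    A value near 1 indicates that the weights are on a statistical scale;
    a value well above 1 suggests the weights are too large overall.
    """
    target = statistical_target(n_obs, n_restraints, n_params)
    return weighted_residual_sum(weights, deltas) / target
```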
When applying any weights one has to recall the function that is minimised:
M = SUM [w(f) (|Fo| - G.|Fc|)^2]              [=M(a)]
  + SUM [w(p) (PHIo - PHIc)^2]                [=M(b)]
  + SUM [w(d) (d(t) - d(c))^2]
  + SUM [w(b) (b(o) - b(min))^2]
  + SUM [w(U) delta-U^2]
  + SUM [w(Ua) delta-Ua^2]
  + SUM [w(v) |V|^2]
  + SUM [w(c) (d(t) - d(c))^2]                [=M(c)]
The factors w(f), w(p), w(d), w(U), w(Ua), w(v) and w(c) are the weights, the choice of which determines the relative influence of the terms in the function M which is to be minimised. It should be noted that only relative weights are significant. The choice of the absolute value of the weights does not influence the course of refinement. The relative contributions to the residual will be found in the general weighting analysis table (***ANALYSIS OF FUNCTION MINIMISED***). The weights are not directly supplied by the user. Instead weighting coefficients are supplied which are used in a formula to generate the weights. The formulae and their use are discussed in the sections below.
2.4.1 Structure amplitude weighting.
If the structure factor model perfectly described the diffraction of the macromolecule, the theory of least squares shows that the structure amplitudes should be given weights which are inversely proportional to their variances. However, due to the disorder present in macromolecular crystals, the structure factor model is always significantly in error. The final values of residuals and R factors usually owe more to errors in the model than to experimental errors in the diffraction data.
The object of weighting the structure amplitude terms is to ensure that terms heavily affected by model or experimental errors are down-weighted. Several weighting schemes may be employed.
Note that the previously suggested procedure of adjusting the WE coefficients on each cycle is not recommended. The current recommendation is to leave the WE coefficients set at their default values, and adjust the WF coefficients only after a rebuild. In any case because the structure factor and energy weights are purely relative, adjusting only WF(1) to raise or lower the F weights will give the same effect as simultaneously adjusting the geometry weights.
Alternatively the weighting coefficients can be chosen manually so that the mean values of w(f).(|Fo| - |Fc|)^2 are approximately independent of Fo and/or resolution.
It is recommended that the user starts with scheme 1 and then when most of the ordered atoms have been refined, scheme 2 should be selected if standard deviations are available, otherwise use scheme 3 or 4. The choice of weighting coefficients is not a precise science but the resulting parameters are not likely to be critically dependent on it.
For schemes 2, 3 and 4, the optimum coefficients to make the mean values of w(f).(|Fo| - |Fc|)^2 approximately independent of Fo and/or resolution will be calculated by Nielsen's method before the first refinement cycle if USEWFC is set true, and the same values will then be used for all the cycles in the job.
2.4.2 Phase weighting.
Phase observations from isomorphous replacement or anomalous scattering measurements may be weighted using the figure of merit. The weighting formula is designed to weight down those reflections according to the difference between the observed and calculated values. Centric reflections are always given zero weight as they cannot contribute to a refinement. The formula is
w(p) = WP(1)*FOM*[180 - |PHIo - PHIc|WP(2)]^2
The figure of merit (FOM) must be read from the reflection file. The best way to choose WP(1) and WP(2) requires further research. Use the weighting analysis table for guidance.
2.4.3 Energy weighting.
Energy weighting involves the application of geometric restraints to the structure during refinement. The paucity of reflection data in a macromolecular refinement usually means that large random errors in atomic coordinates occur when an unrestrained refinement is attempted. These errors result in poor molecular stereochemistry.
Energy weighting uses a dictionary of target interatomic distances and standard deviations which govern the allowed deviations from the target values. Alternatively, the weights may be controlled by use of weight coefficients (WE) supplied in the steering data.
Weight             Case                             Ideal r.m.s deviation
W(d) = WE(1)^2     if d(t) < 2.12Å                  0.02Å
W(d) = WE(2)^2     if 2.12Å < d(t) < 2.625Å         0.04Å
W(d) = WE(3)^2     if d(t) > 2.625Å                 0.05Å
W(v) = WE(4)^2     for planar peptide groups        0.01Å
W(v) = WE(5)^2     for all other planar groups      0.01Å
W(c) = WE(6)^2     for edges of chiral tetrahedra   0.02Å
Chiral restraints are applied as distance restraints along the edges of chiral tetrahedra with d(t) <= 2.12Å. In all cases WE(i)^2 is the weighting coefficient that decides the relative weight of the particular energy restraint and the other terms in the function minimised.
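The choice of distance weight from the table above can be sketched in Python. The function name is hypothetical and the WE values in the usage note are invented for illustration. The table does not say which class a target distance of exactly 2.12Å or 2.625Å falls into, so the handling of those boundaries here is an assumption.

```python
def distance_weight(d_target, we):
    """Return (weight, ideal r.m.s. deviation in Å) for a target distance.

    `we` holds the coefficients WE(1)..WE(6) in order, stored zero-based.
    Assumption: a distance exactly on a class boundary is placed in the
    longer-distance class.
    """
    if d_target < 2.12:
        coeff, ideal = we[0], 0.02   # WE(1)
    elif d_target < 2.625:
        coeff, ideal = we[1], 0.04   # WE(2)
    else:
        coeff, ideal = we[2], 0.05   # WE(3)
    return coeff ** 2, ideal
```

For example, with made-up coefficients `we = [50, 25, 20, 1, 1, 1]`, a 1.5Å target distance takes its weight from WE(1) and a 3.0Å target distance from WE(3).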
Softer restraints than those suggested above may assist convergence at earlier stages. Note that application of harder restraints at too early a stage may severely reduce the rate of convergence. Because the structure factor and geometry weights are purely relative, the effect of reducing all the geometry weights can be obtained by increasing the weight coefficient WF(1).
Relevant information about the weighting can be found in the table under the heading:
***ANALYSIS OF ENERGY TERMS***
2.4.4 Thermal parameter restraint weighting.
There are 2 weighting coefficients (WU(1) and WU(2)) for the thermal parameter restraints which aim to minimise the difference between thermal parameters of pairs of atoms whose interatomic distance is also restrained (i.e. 1-2 and 1-3 bonded atoms), though the two types of restraint can be applied independently. WU(1) applies to isotropic thermal parameters, and WU(2) to anisotropic thermal parameters (but not group thermal parameters as these are already constrained).
The standard deviation of the half-bond restraint for an atom in the isotropic and anisotropic cases (where d is the interatomic distance) is given by the equations:
s(iso) = WU(1).Uiso^2
s(aniso) = WU(2).d^2
The weight for the restraint on the thermal parameter difference between atoms i and j is then:
w(ij) = 1/(s(i)^2 + s(j)^2)
The target of the restraint is also different in the two cases; in the isotropic case it is simply the difference between the Uiso's; in the anisotropic case it is the difference between the components of the anisotropic tensors along the line joining the atoms.
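The standard deviations and the restraint weight defined above translate directly into Python. The function names are hypothetical; the formulae are the ones given in this section.

```python
def s_iso(wu1, u_iso):
    """Half-bond standard deviation, isotropic case: WU(1) * Uiso^2."""
    return wu1 * u_iso ** 2


def s_aniso(wu2, d):
    """Half-bond standard deviation, anisotropic case: WU(2) * d^2,
    where d is the interatomic distance."""
    return wu2 * d ** 2


def restraint_weight(s_i, s_j):
    """Weight for the restraint between atoms i and j:
    1 / (s_i^2 + s_j^2)."""
    return 1.0 / (s_i ** 2 + s_j ** 2)


# Isotropic example: each atom has its own Uiso, hence its own s.
w_iso = restraint_weight(s_iso(1.0, 0.2), s_iso(1.0, 0.3))

# Anisotropic example: both atoms share the same interatomic distance d.
s = s_aniso(0.01, 1.5)
w_aniso = restraint_weight(s, s)
```

Note that in the isotropic case the weight falls rapidly as either Uiso grows, whereas in the anisotropic case it depends only on the interatomic distance.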
There are sound statistical and physical reasons for using different forms of the weight in the isotropic and anisotropic cases.
In the isotropic case the differences are purely statistical in origin: they are almost entirely due to the assumption of isotropy, not to any actual difference in thermal parameters. In reality atomic vibrations in a macromolecule, in particular in loosely bound regions such as chain termini and side-chains will have large anisotropic and/or librational components, so that the isotropy assumption is only very approximate.
The distribution of Uiso's is always very skewed, i.e. most cluster near the modal value, but with a long tail of large values. Consequently an atom with a value near the mode is most likely to find itself next to one with a similar value giving a small difference, whereas one with a value much larger than the mode will also most likely be near one with a value near the mode, giving a large difference. This leads to a dependence of the r.m.s. difference in Uiso proportional to the square of the mean Uiso, with a proportionality factor found empirically from refinement of high resolution (1Å) structures of ~ 1; this is the weighting coefficient WU(1).
In contrast, in the anisotropic case, where the difference is between the along-bond components of the tensors, the differences are real and reflect the physical situation. From IR spectroscopy it is found that the mean square amplitude of a typical bond vibration (single C-C bond) at ambient temperature is about 0.002Å^2 (equivalent to delta-B ~ 0.16Å^2), i.e. the bond is very rigid in comparison with the atomic vibrations (B typically > 5 to 10Å^2). The atomic vibrations therefore arise almost entirely as a consequence of bond librations.
In the anisotropic case, therefore, the r.m.s. difference in the thermal tensor components should be independent of the isotropic thermal parameters. The difference between thermal tensor components will however be larger across bond angles (1-3 restraints), so a dependence on the square of the interatomic distance is used. The default value of the weighting coefficient WU(2) (0.01) is rather larger than the expected difference (0.002). This is because if the correct value is used initially the restraints are so tight that the refinement often fails to converge. It may be possible to use the correct value of WU(2) (0.0007) once convergence has been attained.
3. INPUT FILES
There are 5 input files to RESTRAIN.
FILE                          FILE NAME        EXPLANATION
- control and steering data                    section 3.1
- dictionary                  DICTION          section 3.2
- atomic coordinates          XYZIN            section 3.3
- group thermal parameters    TLSIN            section 3.4
- reflections                 REFIN or HKLIN   section 3.5
3.1 CONTROL and STEERING DATA
The control and steering data in the standard input data set both consist of a number of optional items. Within each of these data blocks the order of these items is immaterial.
Any record or part of a record can be temporarily "commented out" by use of the ! or # character; this causes all subsequent characters on the same line to be skipped.
Each record in the control data is identified by a keyword, but only the first 4 characters are significant and case-insensitive. Any other input required follows immediately in free-format (space-separated) on the same line, with the sole exception of the keyword STEER where the data must follow on the succeeding line(s). Data records (but not comments) may be continued by finishing a line with a "-". The keywords available are:
ANISo, DESOut, DICTion, DNAMe, FORMat, HKLIn, HKLOut, LABIn, LABOut, MATOut, NCSYmm, OCCUp, PNAMe, PRIVate, REFIn, REFOut, RIGId, STEEr, SYMMetry, TITLe, TLSIn, TLSOut, USECwd, XTRDist, XTRPlan, XYZIn, XYZOut
3.1.1 Description of control data.
Each of the keywords DICTION, XYZIN, TLSIN, HKLIN, REFIN, XYZOUT, TLSOUT, HKLOUT, REFOUT, MATOUT and DESOUT specifies a filename. Files may also be connected using the CCP4 logical names matching these keywords. The keyword information overrides the logical names.
3.1.2 List of steering data.
After the record with the single keyword STEER, the data follows on the next line and consists of a series of "name=value" specifications separated either by a comma or by the end of the line (a comma at the end of a line is optional) e.g.:
A=10.8, Gamma =90, ISO= f , isoref = T, Aniso= False
G=2 ,High=2.8, dxyzlm=.02 , wF(1)=1.234e-6
The read statement makes use of a simulated version of the FORTRAN NAMELIST facility and thus the order in which the variables are given is immaterial. The letter case and spacing do not matter, and there may be any number of "name=value" specifications per record, up to 80 columns. However a "name=value" specification may not be split across two or more lines, and the use of the "-" continuation character is not allowed.
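The rules above can be illustrated with a small reader for "name=value" items. This is only a sketch in Python, not RESTRAIN's own NAMELIST emulation: the function names are hypothetical, the accepted spellings of logical values (T, F, True, False) are taken from the example record, the handling of ! and # comments inside steering data is an assumption, and the sketch simply stops at the first &EOF rather than restarting refinement.

```python
import re

EXAMPLE = """A=10.8, Gamma =90, ISO= f , isoref = T, Aniso= False
G=2 ,High=2.8, dxyzlm=.02 , wF(1)=1.234e-6
&EOF
High=3.0
"""

def convert(value):
    """Turn a steering value into bool, int, float or leave it as text."""
    low = value.lower()
    if low in ("t", "true"):
        return True
    if low in ("f", "false"):
        return False
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def parse_steering(text):
    """Read name=value items separated by commas or line ends, up to &EOF."""
    values = {}
    for line in text.splitlines():
        # Assumption: ! and # start a comment here, as in the control data.
        line = re.split(r"[!#]", line, maxsplit=1)[0]
        for item in line.split(","):
            item = item.strip()
            if not item:
                continue  # an optional comma at the end of a line
            if item.upper() == "&EOF":
                return values  # sketch only: ignore anything after &EOF
            name, sep, value = item.partition("=")
            if not sep:
                raise ValueError(f"not a name=value item: {item!r}")
            # Letter case and spacing do not matter, so normalise the name.
            values["".join(name.split()).upper()] = convert(value.strip())
    return values

if __name__ == "__main__":
    for name, value in parse_steering(EXAMPLE).items():
        print(name, value)
```

Because each item is split on commas within a single line, an item broken across two lines is rejected, which mirrors the restriction stated above.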
Only those items which you want to differ from default values need be entered. For example cell parameters are not normally supplied in the steering data because the values in the reflection and/or coordinate files are usually the correct ones. A list of variables which can be input to the program is given below. A detailed explanation of each variable is given in section 3.1.3.
The steering data may be terminated either by end-of-file, or by a variable name &EOF (without a value). In either case, this will cause refinement to be initiated. Additional steering data items (starting on a new line) may follow the &EOF variable. The refinement will then be restarted from the point that it was terminated. The values of the variables used will be those at the termination of the original refinement updated by the new supplied values. This may be repeated as often as desired.
Note for table: refer to full explanation of variable in the next section.
3.1.3 Full description of the steering data.
Default values are given in brackets immediately after the variable name.
3.2 DICTIONARY
The dictionary is read from the file DICTION. The use of a user defined dictionary makes RESTRAIN extremely flexible with respect to the type of structures that can be refined. The dictionary is divided into two blocks, the first containing all residues and accompanying restraints, the second containing all the information necessary for the program to calculate the scattering factors for each atom type included in the first block. Keyworded free format input is used throughout, with spaces, tabs or newlines separating items, and with record continuations (max 24) being specified by a "-" at the end of the line. Character strings containing leading spaces (e.g. atoms with single character atomic symbols) must be enclosed in quotes ("..."). REMARK records may be interspersed freely to make comments.
The first block is organised into residue types, the first entry for each type being "RESI" followed by the residue name as a three letter abbreviation. Note that these residue names must correspond to those present in your coordinate set (see section 3.3). Within each residue entry the records may appear in any order.
Following the residue entry record are a series of "DIST" records defining the atom names, and each distance restraint in sequence moving down the residue. Each restraint is specified by a positional number defining which atom following the current atom it is restrained to, then the distance in Å and its standard deviation. The order of the different atoms in the residue therefore specifies the positional number. By default the restraint weights are calculated from the standard deviations. Note that the atom names must correspond to those present in your coordinate file (see section 3.3).
"DIHE" records define the name of each dihedral angle and the four positional numbers of the atoms defining this angle. Note that the names are not stored in the program. It is however sensible to use a consistent logical order, since the calculated dihedral angles will be printed in the same order, e.g. phi and psi, chi angles, omega for amino acid residues.
"CHIR" records define the name of each chiral centre and the four positional numbers of the atoms defining this centre. The order in which these atoms should be given should refer to a right-handed rotation when looking along the bond between the first atom (with the lowest positional number in the table) and the one at the centre of the tetrahedron. For Calpha chiral centres in amino acids the order therefore is N-Calpha-C-Cbeta. Note that the names are not stored in the program.
"PLAN" records define the name of each plane, the plane type, an individual plane weight (not used; for future development), and the atom pointers defining these planes. In this version of RESTRAIN only two types of planes are recognised. Planes of type 1 in the list will be put in the first category (PLANE1), all of type 2 in the second one (PLANE2). For amino acid residues the peptide planes therefore are usually put in first position. The reason for this is that RESTRAIN allows different weighting to be used for the two types of plane (see section 2.4). Note that the plane names are not stored in the program.
The residue entries in the first block are terminated by a record starting with END.
The second block consists of "ATOM" records and is organised into atom types, the first entry for each type being the atom name. Note that these atom names must correspond to those present in the first block and in your coordinate set (see section 3.3). Each atom name is followed by a record containing the 4 constants S(i), the 4 constants E(i), the constant C and the closest van der Waals radius RKL.
These constants will be used for a four-Gaussian expansion of the scattering factor:
f(hkl) = SUM(i) S(i) exp(-E(i) (sin(theta)/lambda)^2) + C    for i = 1,4
These constants can be found in INTERNATIONAL TABLES FOR X RAY CRYSTALLOGRAPHY, Vol. IV. The van der Waals radius is used for calculation of nearest allowed distances of atoms more than three bond distances apart when REPEL=true. The second block is terminated by a record starting with END.
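As a rough illustration of the four-Gaussian expansion, the Python sketch below evaluates f for one atom type. The function name is hypothetical, and the carbon constants are the commonly tabulated four-Gaussian values quoted here as an assumption; they should be checked against the Tables or the distributed dictionary before any real use.

```python
import math

# Assumed four-Gaussian constants for carbon (verify against the Tables).
CARBON_S = (2.3100, 1.0200, 1.5886, 0.8650)
CARBON_E = (20.8439, 10.2075, 0.5687, 51.6512)
CARBON_C = 0.2156

def scattering_factor(s_coef, e_coef, c, stol):
    """Return SUM(i) S(i) exp(-E(i) stol^2) + C, with stol = sin(theta)/lambda."""
    if len(s_coef) != 4 or len(e_coef) != 4:
        raise ValueError("expected 4 S(i) and 4 E(i) constants")
    return sum(s * math.exp(-e * stol ** 2) for s, e in zip(s_coef, e_coef)) + c

if __name__ == "__main__":
    for stol in (0.0, 0.1, 0.25, 0.5):
        print(stol, scattering_factor(CARBON_S, CARBON_E, CARBON_C, stol))
```

At sin(theta)/lambda = 0 the sum of the S(i) and C should be close to the number of electrons of the atom, which is a convenient check on a newly typed dictionary entry.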
The distributed dictionaries (in $CLIBD) are:
chiral_pep4.dic: Main-chain chiral restraints; 4-atom peptide planes.
The first is the default if DICTION isn't assigned. A program "rdent" is available to generate RESTRAIN dictionary entries from PDB coordinate files; however it only makes the distance records (without standard deviations). The user has to work out the other sections, but this is not difficult.
The peptide dictionaries use values published by Engh & Huber (1991).
3.3 ATOMIC COORDINATES
Cartesian orthogonal coordinates are read from file XYZIN. The default set of orthogonal axes XO, YO and ZO is defined as follows:
XO || a
YO || c* x a
ZO || c*
If SCALE records are present in the file, these will override the above, as well as any cell parameters given in the steering data.
A CRYST record if present will override any crystal data (i.e. cell and space group) read from the MTZ file (if used). However any crystal data given in the steering data will override both the PDB and MTZ files.
The coordinate records must be in the format designed by the Brookhaven Protein Data Bank. The format expected is:
Care must be taken in preparing the coordinates for refinement. After each polymer chain a TER record must be inserted. All atoms not contained in chains must be labelled HETATM.
Note that atomic thermal parameters can be read as either U's or B's (B = 8.PI^2.U); the variable BINPUT must be set accordingly. After previous refinement and extensive rebuilding you may want to reset large U or B values for atoms incorrectly positioned before rebuilding (e.g. U > 0.8 or B > 64Å^2) to more reasonable starting values (e.g. U=0.2 or B=16Å^2).
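The relation between U and B, and the suggested reset of very large values, can be sketched in a few lines of Python. The function names and the default limit and starting value are illustrative only (the numbers are the examples quoted above); this is not part of RESTRAIN.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2  # B = 8.PI^2.U

def u_to_b(u):
    """Convert a mean square displacement U (Å^2) to a B value (Å^2)."""
    return EIGHT_PI_SQ * u

def b_to_u(b):
    """Convert a B value (Å^2) to U (Å^2)."""
    return b / EIGHT_PI_SQ

def reset_large_u(u, limit=0.8, start=0.2):
    """Replace a U above `limit` by the starting value `start`."""
    return start if u > limit else u

if __name__ == "__main__":
    print(u_to_b(0.2), u_to_b(0.8))
    print([reset_large_u(u) for u in (0.15, 0.95, 0.40)])
```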
The number of atoms in each residue in the polymer chains must be the same as the number of atoms in that residue in the dictionary. The names of all atoms must correspond to the names of the atoms in the dictionary. Blanks (including leading blanks) are significant in assessing an atom name.
The atomic coordinates in the polymer chains must be ordered in each residue in the same way as the atoms in the residue are ordered in the dictionary. If this is not the case, set ORDER=true in the steering data in the initial cycle. The output file of atomic coordinates will then be produced in dictionary order for subsequent cycles. Alternatively, set TESTIN=true and ORDER=true to use the program to order and analyse the file without carrying out any refinement.
For anisotropic thermal parameters the six values defining the U tensor of an atom U(11) U(22) U(33) U(12) U(13) U(23) are written out (multiplied by 10^4) immediately following the coordinate record of that atom. The record containing the U tensor is identified by the label ANISOU. The format used for this record is (A6,22X,6I7).
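A minimal Python sketch of writing such a record is given below. It follows the stated format (A6,22X,6I7) literally, so the 22 skipped columns are left blank here, whereas in a real coordinate file they would normally carry the atom identification. The function name is hypothetical and rounding to the nearest integer is an assumption.

```python
def format_anisou(u):
    """Format (U11, U22, U33, U12, U13, U23) in Å^2 as an ANISOU record.

    Layout: 6-character label, 22 skipped columns, six 7-column integers
    holding the tensor elements multiplied by 10^4.
    """
    if len(u) != 6:
        raise ValueError("expected six tensor elements")
    scaled = [round(x * 1.0e4) for x in u]  # assumption: round to nearest
    return "ANISOU" + " " * 22 + "".join(f"{i:7d}" for i in scaled)

if __name__ == "__main__":
    print(format_anisou((0.0512, 0.0634, 0.0478, -0.0021, 0.0009, 0.0)))
```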
3.4 GROUP THERMAL PARAMETER CONTROL DATA
All information for the group thermal parameter refinement is contained in the file assigned to TLSIN; the steering data does not contain any information. Each thermal parameter group is defined by an entry in the TLSIN file.
The layout of a UISO entry is typically:
UISO name
RANGE atom_id_start atom_id_end [selection]
RANGE . . . . . . . . . .
U Uiso                                 (Å^2)
The layout of a UANISO entry is typically:
UANISO name
RANGE atom_id_start atom_id_end [selection]
RANGE . . . . . . . . . .
U U11 U22 U33 U23 U31 U12              (Å^2)
The layout of a TLS entry is typically:
TLS name
RANGE atom_id_start atom_id_end [selection]
RANGE . . . . . . . . . .
ORIGIN x y z                           (Å)
T T11 T22 T33 T23 T31 T12              (Å^2)
L L11 L22 L33 L23 L31 L12              (deg^2)
S S1 S2 S23 S31 S12 S32 S13 S21        (Å.deg.)
Uij means the element (i,j) of tensor U. Since X-ray data allow the calculation of only eight of nine S tensor elements, the usual constraint of setting the trace of S to zero is adopted. This means that the elements S1 and S2 are (S33 - S22) and (S11 - S33) of the S tensor as defined by the equation
Note that the order of the off-diagonal terms in the group U, T and L tensors is different from that of the U tensor in the coordinate file (the 23 and 12 elements are swapped).
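The difference in element order is easy to get wrong when moving values by hand between the two files, so a small Python sketch of the reordering follows. The function names are hypothetical; the sketch assumes only that the tensors are symmetric, so that element 31 equals element 13.

```python
def coord_to_group_order(t):
    """(11, 22, 33, 12, 13, 23) as in the coordinate file -> (11, 22, 33, 23, 31, 12)."""
    t11, t22, t33, t12, t13, t23 = t
    return (t11, t22, t33, t23, t13, t12)  # 31 == 13 for a symmetric tensor

def group_to_coord_order(t):
    """(11, 22, 33, 23, 31, 12) as in the group file -> (11, 22, 33, 12, 13, 23)."""
    t11, t22, t33, t23, t31, t12 = t
    return (t11, t22, t33, t12, t31, t23)

if __name__ == "__main__":
    u_coord = (0.05, 0.06, 0.04, 0.001, 0.002, 0.003)
    print(coord_to_group_order(u_coord))
```

Both conversions amount to swapping the fourth and sixth elements, which is why the same values read in the wrong convention still look plausible.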
All the records of each entry except the first (UISO, UANISO or TLS) are optional, and can appear in any order. The data will assume sensible defaults if not supplied (so the TLSIN file may contain only 1 line). If the U or T record is omitted, the mean isotropic thermal parameter for the group is either used as is for UISO, or converted to the equivalent anisotropic tensor for UANISO or TLS. ORIGIN specifies the local origin of a TLS group; if omitted it is set to the mean centre of the group. The L and S tensors if omitted are set to zero. In addition to the keyworded records shown above, the following are also accepted: DEFAULT, NOATOM, RESIDUE (see the next section for details).
3.4.1 Description of the data records in the TLSIN file.
Only the first 4 letters of the keywords are significant and they are case-insensitive. The format is free, that is items separated by one or more spaces. If items are left blank they default to zero values.
RANGE atom_id_start atom_id_end [atom_selection]
ORIGIN x y z
T T11 T22 T33 T23 T31 T12
L L11 L22 L33 L23 L31 L12
S S1 S2 S23 S31 S12 S32 S13 S21
TLS N domain
RANGE 1. 68. MNCH
RANGE 129. 300. MNCH
T .112 .165 .131 -.052 -.003 -.003
L 1.877 2.165 3.471 4.562 6.152 7.313
S .366 -.382 .147 -.981 .185 .118 .132 .140
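To show how the free-format records of an entry fit together, here is a minimal reader for a TLSIN-style file in Python. It is a sketch under stated assumptions, not the program's own reader: the function name and the returned dictionary layout are hypothetical, the atom identifiers and selections of RANGE records are kept as uninterpreted text, and any other record (e.g. DEFAULT, NOATOM, RESIDUE) is stored unparsed.

```python
EXAMPLE = """TLS N domain
RANGE 1. 68. MNCH
RANGE 129. 300. MNCH
T .112 .165 .131 -.052 -.003 -.003
L 1.877 2.165 3.471 4.562 6.152 7.313
S .366 -.382 .147 -.981 .185 .118 .132 .140
"""

def parse_tlsin(text):
    """Split TLSIN-style text into a list of group dictionaries."""
    groups = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        key = fields[0][:4].upper()  # only the first 4 letters are significant
        rest = fields[1:]
        if key in ("UISO", "UANI", "TLS"):
            groups.append({"type": key, "name": " ".join(rest), "ranges": []})
        elif not groups:
            raise ValueError(f"record before first group entry: {raw!r}")
        elif key == "RANG":
            groups[-1]["ranges"].append(tuple(rest))
        elif key in ("ORIG", "T", "L", "S", "U"):
            groups[-1][key] = [float(x) for x in rest]
        else:
            # Records not modelled in this sketch are kept as raw fields.
            groups[-1].setdefault("other", []).append(fields)
    return groups

if __name__ == "__main__":
    for group in parse_tlsin(EXAMPLE):
        print(group["type"], group["name"], group["ranges"])
```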
Where TLS tensors result in a U tensor that is not positive-definite, a warning message is printed out stating the atom name, number and U tensor.
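How RESTRAIN itself tests for this is not described here. One standard way to check a symmetric 3x3 tensor is Sylvester's criterion (all leading principal minors positive), sketched below in Python with a hypothetical function name and the elements in coordinate-file order.

```python
def is_positive_definite(u11, u22, u33, u12, u13, u23):
    """Sylvester's criterion for the symmetric tensor
        | u11 u12 u13 |
        | u12 u22 u23 |
        | u13 u23 u33 |
    """
    minor1 = u11
    minor2 = u11 * u22 - u12 * u12
    minor3 = (u11 * (u22 * u33 - u23 * u23)
              - u12 * (u12 * u33 - u23 * u13)
              + u13 * (u12 * u23 - u22 * u13))
    return minor1 > 0.0 and minor2 > 0.0 and minor3 > 0.0

if __name__ == "__main__":
    print(is_positive_definite(0.05, 0.06, 0.04, 0.0, 0.0, 0.0))
    print(is_positive_definite(0.05, 0.06, 0.04, 0.08, 0.0, 0.0))
```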
If the L tensor elements are large (>20 deg^2) and an atom is far away from the origin used for the calculation of the TLS tensors (>20Å), then the observed and calculated structure factor amplitudes can be different by several orders of magnitude. This is a consequence of the numerical instability in calculation of derivatives of the TLS tensors with respect to positional coordinates (on some machines it may also result in a floating point overflow error). These problems usually appear at the beginning of the TLS refinement of large groups if the user does not set the initial L small enough and the origin of the rigid group sufficiently close to the centre of gravity. Such an error is checked for in two ways. First, a warning message is printed if the selected origin is more than 10Å away from the centre of gravity. Second, a warning message is printed if more than 30% of elements of U tensors for individual atoms had to be reset to an arbitrary interval [0, ULIMH].
Note that TLS calculations, like all anisotropic calculations, cannot take advantage of space-group specific subroutines. The general space-group subroutine must be used.
3.5 AMPLITUDE AND PHASE DATA
The reflections are read from file REFIN or HKLIN. These files may contain:
H K L FOBS SIGMA(FOBS) PHASE FOM FREERFLAG
Item          Description                                               Formatted   Unformatted
H K L         Miller indices of reflection                              I           R
FOBS          Observed structure factor amplitude                       I           R
SIGMA(FOBS)   Standard deviation in observed amplitude                  I           R
PHASE         Estimated phase from isomorphous and/or anomalous data    I           R
FOM           Figure of merit for phase (on scale of 0-100)             I           R
FREERFLAG     Free R flag (MTZ only)                                                I
Two file types containing the amplitude and/or phase data are accepted. Which file type is actually read depends on the keyword REFIN or HKLIN (see section 3.1.1).
When REFIN is used, a formatted reflection file is read and the input depends on the value specified for MAXFMT which must be >=4 and <=7. When MAXFMT is 5 the items H, K, L, FOBS and SIGMA(FOBS) will be read. The reflections are read in with the format specified after the steering data. Note that the format must be consistent with the value for MAXFMT.
When HKLIN is used then the input is read from an unformatted (MTZ) reflection file. The file has header information containing the crystal data (cell parameters and space group), which means that this information does not normally need to be supplied in the steering data.
4. OUTPUT FILES
Besides line printer output (described below in section 4.1) there are a number of output files depending on the steering data.
File                                 File name          Description
- refined atomic coordinates         XYZOUT             section 4.2
- refined group thermal parameters   TLSOUT             section 4.3
- structure factors                  HKLOUT or REFOUT   section 4.4
- full normal matrix                 MATOUT             section 4.5
- design matrix                      DESOUT             section 4.5
There are also 3 scratch files used by RESTRAIN:
File                                       Unit   Description
- coordinates for ordering                 12     section 4.6
- reflections for scaling & weighting      14     section 4.6
- normal equations for positional parms.   11     section 4.6
4.1 LINE PRINTER OUTPUT
The program is so designed that all possible information that could be required by the user is accessible. However, to prevent unnecessary output the user can manipulate parameters that control the amount of output (see section 3.1.3). Obviously a run will produce a limited selection of the output items, depending on the refinement parameters. The major output items for each cycle are summarised below. They are subdivided in major blocks indicated by a title between a pair of three asterisks.
1. The program header stating the version number used.
2. The array dimensions which have been set using the PARAMETER statements.
3. The TITLE as supplied by the user in the control data.
4. The filenames for coordinates input XYZIN, reflections input REFIN, dictionary DICTION, coordinates output XYZOUT and reflections output REFOUT.
5. FORMAT FOR INPUT: The format specified by the user is printed.
6. Under this heading there follows a list of all the steering parameters, with their default values and the input values which were specified by the user. If no value for a parameter has been given, the default value is used, with the exception of the cell parameters and the scale factor G and overall thermal parameter U, which must be supplied by the user.
7. FRACTIONAL CRYSTALLOGRAPHIC EQUIVALENT POSITIONS. The general equivalent positions are given in the format of International Tables Vol. A. It is advisable to check these at the beginning of a refinement.
8. When refining using non-crystallographic symmetry MODE 2 (RIGID=false) ORTHOGONAL NON-CRYSTALLOGRAPHIC EQUIVALENT POSITIONS will be printed. These will then be followed by a list of ALL ORTHOGONAL EQUIVALENT POSITIONS including those generated by the non-crystallographic symmetry.
9. When extra distance restraints are to be used NUMBER OF NON-DICTIONARY RESTRAINTS will be printed. Six restraints per line are listed. These restraints are ATOM1 ATOM2 DISTANCE e.g. 190- 638 2.08 means that the distance between atoms 190 and 638 is 2.08Å. Check that the restraints are correct.
10. When extra planes are to be used NUMBER OF NON-DICTIONARY PLANES will be printed. Check that the planes are correct. E.g.
FIRST ATOM   ATOMS IN PLANE
1045         6
This means that there are 6 atoms in the extra plane, the first atom being number 1045, the other 5 atoms following sequentially with no atoms being skipped.
11. When atoms are to have occupancies refined (OCCREF=true) NUMBER OF OCCUPANCY GROUPS will be printed. The occupancy groups are then listed.
FIRST ATOM   NUMBER OF ATOMS   GROUP NUMBER   COUPLING NUMBER
910          6                 1              1
952          5                 2              1
1045         6                 1              -1
This shows the two cases:
The present occupancies as read from the input coordinates are then listed.
ATOM 910 HAS OCCUPANCY 0.621.
ATOM 1045 HAS OCCUPANCY 0.379.
ATOM 952 HAS OCCUPANCY 0.565.
Note that coupled occupancies should add up to 1.
12. In case DICPRI is true, the contents of the dictionary will be printed as it is read to facilitate the development of new entries. At the end some overall statistics are printed.
13. MOLECULAR PARAMETERS. This is self-explanatory. Note that (groups of) terminal atoms may be counted as extra residues. This is seen when for the carboxyterminal oxygen a separate residue entry in the dictionary is used.
14. If there are groups of atoms which are to have their thermal parameters refined by rigid body option, the header ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED ANISOTROPICALLY BY RIGID BODY (TLS) is printed, followed by the description of rigid bodies using the format in section 2.3.6.
15. If there are groups of atoms which are to have their thermal parameters refined anisotropically the header ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED ANISOTROPICALLY is printed, followed by 10 ranges per line giving first and last atom number (internal counters).
16. If there are rigid groups, these are listed under the heading ATOMS IN THE FOLLOWING RANGES TO BE REFINED AS RIGID GROUPS. Ten ranges per line are printed giving first and last atom number (internal counters).
17. When refining using non-crystallographic symmetry MODE 1 (RIGID=true) ATOMS IN THE FOLLOWING RANGES ARE TO BE REFINED AS RIGID GROUPS RELATED BY NON-CRYSTALLOGRAPHIC SYMMETRY is printed. For each molecule the atom ranges (internal counters) are given, followed by a description of the non-crystallographic symmetry operation in terms of a rotation and a screw translation. This is an aid in visualising the transformation involved.
18. NUMBER OF PARAMETERS TO BE REFINED. This gives an indication of the stability of the refinement seen in relation to the number of observations and restraints.
19. The cycle number CYCNO as supplied by the user (or default value 1).
20. When refining TLS parameters there is a list of those atoms within TLS groups for which the derived anisotropic tensors are not positive definite. This information is listed below details of the TLS group concerned.
***AGREEMENT BETWEEN FO AND FC BASED ON INPUT COORDINATES***
26. TITLES READ FROM REFLECTION FILE when a binary reflection file is used.
27. UNFAVOURABLE AGREEMENTS BETWEEN F(OBS) AND F(CALCS) AS DETERMINED BY RWDMIN. Under this header structure factors are listed, when their rootweighted (Fo - G.Fc) (DELTA ROOTW) is larger than the user supplied value for RWDMIN. In the early stages of a refinement it is advisable to print some structure factors, to check whether the amplitudes and/or phases are read correctly, and to see which reflections cause problems. In later stages this output can then be suppressed.
28. TABLE OF TOTALS DERIVED FROM THE STRUCTURE FACTORS INCLUDING THE R FACTOR. This table gives information about the number of reflections (and phases) used. W DELTA SQ, or SUM w(f)(Fo - G.Fc)^2, is the term being minimised. Then two residuals and a correlation coefficient are printed.
R = SUM(| |Fo| - G.|Fc| |) / SUM(|Fo|)
RDASH = (SUM(W.(|Fo| - G.|Fc|)^2) / SUM(W.|Fo|^2))^1/2
C = (N.SUM(|Fo|.|Fc|) - SUM(|Fo|).SUM(|Fc|)) / ((N.SUM(|Fo|^2) - SUM(|Fo|)^2) . (N.SUM(|Fc|^2) - SUM(|Fc|)^2))^1/2
where N is the number of amplitudes used.
The conventional R-factor is self-explanatory. However, it is the weighted R-factor which gives an indication of the progress of the refinement. As long as this residual is decreasing, there is hope, even when the unweighted R-factor temporarily increases (which is sometimes seen in the initial cycles of a refinement). The correlation coefficient may have a greater discerning power than the R-factors, when refining potential molecular replacement solutions at low resolution.
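The three statistics can be computed from lists of amplitudes as in the Python sketch below. The function name is hypothetical, the inputs are taken to be amplitudes (already non-negative), and the conventional R is computed with the absolute value of each difference, as is usual for this residual.

```python
import math

def agreement_statistics(fo, fc, w, g=1.0):
    """Return (R, RDASH, C) for observed/calculated amplitudes and weights."""
    n = len(fo)
    if not (n == len(fc) == len(w)) or n < 2:
        raise ValueError("need at least two reflections of equal-length data")
    r = sum(abs(o - g * c) for o, c in zip(fo, fc)) / sum(fo)
    rdash = math.sqrt(
        sum(wi * (o - g * c) ** 2 for o, c, wi in zip(fo, fc, w))
        / sum(wi * o * o for o, wi in zip(fo, w))
    )
    so, sc = sum(fo), sum(fc)
    soo = sum(o * o for o in fo)
    scc = sum(c * c for c in fc)
    soc = sum(o * c for o, c in zip(fo, fc))
    corr = (n * soc - so * sc) / math.sqrt(
        (n * soo - so * so) * (n * scc - sc * sc)
    )
    return r, rdash, corr

if __name__ == "__main__":
    print(agreement_statistics([10.0, 20.0, 30.0], [12.0, 18.0, 30.0], [1.0] * 3))
```

Note that the correlation coefficient does not depend on the scale factor G, which is one reason it can be more discriminating than the R-factors when the scale is still poorly determined.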
***ANALYSIS OF STRUCTURE FACTOR TERMS***
29. This table prints the mean w.delta^2 values for amplitudes (and phases if PHAS is true) in batches according to the resolution (columns) and amplitudes (rows). The table will be very useful when judging the effect of the weights which are printed above the table. Above the table the weighting formula as defined by SCHEME and WF(i) is shown.
30. The values of the refined scale (G) and overall thermal parameter (U). If WATER=true, the values of the parameters SB1 and SB2 will also be printed.
***GEOMETRY OF INPUT COORDINATES***
31. Under this header restrained interatomic distances are listed, when their rootweighted d(t) - d(c) (RWDELTA) is larger than the user supplied value for RWLMIN. In the early stages of a refinement it is advisable to print some differences, to check whether the order of the coordinates is correct, and to see which distances cause problems. In later stages this output can then be suppressed. This table also gives the r.m.s deviations from planarity of the peptide and ring planes where they exceed 0.03Å. If a chiral centre threatens to reverse hand, or has already done so, the tetrahedral volume will be printed. If many residues have this tendency as sometimes happens in the early stages of a refinement, it may be useful to use a dictionary with extra chiral restraints, and to use a value for the weighting coefficient WE(6) < WE(1).
At the right-hand side of this table the torsion angles as calculated from the coordinates are listed in the order as defined by the dictionary.
***ANALYSIS OF ENERGY TERMS***
32. A table printing the mean w.delta^2 values for distance and planarity restraints in groups according to the target distance or plane type is given. This table will be very useful when judging the effect of the weighting coefficients which are also printed in this table, with WE(1) to WE(6) from left to right.
***ANALYSIS OF FUNCTION MINIMISED***
33. Under this heading a table prints the value of the function minimised (see section 1.1), showing the sum of the w.delta^2 values for the amplitudes, phases, distance restraints and planarity restraints, and their relative contribution to the total minimum. This will be useful in defining the relative weights for each term. When FREF=true there will be a second table showing the relative residuals in dependence on the resolution.
***ANALYSIS OF GAUSS-SEIDEL SOLUTION OF NORMAL EQUATIONS***
34. This next block of information describes the convergence of the Gauss-Seidel iterative method for solving the normal equations for the positional parameters. The first table describes the condition of the matrix.
This is followed by a table describing the solution of the normal equations, listing for each iteration: the iteration number I, MEAN(Q) and MAX(Q), the mean and maximum respectively of the elements of DELTA P(I) - DELTA P(I-1) and (DELTA P(I) - DELTA P(I-1)) / DELTA P(I), where P(I) = solution vector at iteration I.
The ANGLE BETWEEN SHIFT VECTOR AND DIRECTION OF STEEPEST DESCENT gives an indication of the progress towards the minimum.
In case the program cannot solve the normal equations, MFACR will be automatically incremented, and a retry will take place. When this leads to divergence again, some suggestions are printed.
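For readers unfamiliar with the method, the Python sketch below shows a plain Gauss-Seidel iteration on a small dense system, recording the mean and maximum change per iteration, together with the angle between two vectors. It is a generic textbook version with hypothetical function names; RESTRAIN's sparse storage, its exact definition of Q and its use of MFACR are not reproduced.

```python
import math

def gauss_seidel(a, b, max_iter=50, tol=1.0e-8):
    """Solve a.x = b iteratively; return x and per-iteration (I, mean, max) changes."""
    n = len(b)
    x = [0.0] * n
    history = []
    for iteration in range(1, max_iter + 1):
        changes = []
        for i in range(n):
            # Use the most recent values of the other unknowns.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new = (b[i] - s) / a[i][i]
            changes.append(abs(new - x[i]))
            x[i] = new
        history.append((iteration, sum(changes) / n, max(changes)))
        if max(changes) < tol:
            break
    return x, history

def angle_between(u, v):
    """Angle in degrees between two vectors, e.g. shift and steepest descent."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

if __name__ == "__main__":
    matrix = [[4.0, 1.0], [1.0, 3.0]]
    rhs = [1.0, 2.0]  # plays the role of the steepest descent direction
    shift, table = gauss_seidel(matrix, rhs)
    for row in table:
        print(row)
    print(shift, angle_between(shift, rhs))
```

For a positive definite matrix the solved shift always lies within 90 degrees of the right-hand side vector; an angle close to 90 degrees suggests a poorly conditioned system.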
***ANALYSIS OF RESIDUAL TO DETERMINE OPTIMUM SHIFT FACTOR***
35. This table shows the results of the sampled residual calculations using
Actual shift = SFACR * calculated shift
Sampled residual calculations are made to determine the optimum shift factor (ESTIMATED SHIFT FACTOR).
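How the estimate is derived from the samples is not stated here. One simple possibility, shown purely as an assumption, is to fit a parabola through three sampled residuals and take its minimum; the Python sketch below does this, falling back to the best sampled factor when the fit has no minimum. All names are hypothetical.

```python
def estimate_shift_factor(residual, factors=(0.0, 0.5, 1.0)):
    """Estimate the shift factor minimising residual(factor) from 3 samples."""
    f0, f1, f2 = factors
    r0, r1, r2 = (residual(f) for f in factors)
    # Coefficients of the parabola a*f^2 + b*f + c through the three samples.
    denom = (f0 - f1) * (f0 - f2) * (f1 - f2)
    a = (f2 * (r1 - r0) + f1 * (r0 - r2) + f0 * (r2 - r1)) / denom
    b = (f2 * f2 * (r0 - r1) + f1 * f1 * (r2 - r0) + f0 * f0 * (r1 - r2)) / denom
    if a <= 0.0:
        # No minimum in the fitted curve: keep the best sampled factor.
        return min(zip((r0, r1, r2), factors))[1]
    return -b / (2.0 * a)

def apply_shift(params, shift, sfacr):
    """Actual shift = SFACR * calculated shift, applied to each parameter."""
    return [p + sfacr * s for p, s in zip(params, shift)]

if __name__ == "__main__":
    factor = estimate_shift_factor(lambda s: (s - 0.8) ** 2 + 3.0)
    print(factor, apply_shift([1.0, 2.0], [0.5, -0.5], factor))
```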
36. The r.m.s atomic shift is printed out. This indicates whether any refinement is still taking place, or if convergence has been reached.
37. If there are rigid groups, for each group the three translations and a rotation angle around an axis, of which the direction cosines are given, are printed together with the r.m.s atomic shift. The latter value will give an indication if convergence is being approached.
***ANALYSIS OF NON-CRYSTALLOGRAPHIC SYMMETRY***
38. When refining using non-crystallographic symmetry MODE 1 (RIGID=true) the program will print the new transformation for each molecule, followed by a description of this non-crystallographic symmetry operation in terms of a rotation and a screw translation. This can then be compared to the input value printed in item 17.
***SHIFTS IN OUTPUT COORDINATES***
39. Next is printed a listing of all atoms, to which shifts larger than DXYZLM have been applied, or which have U values not within the range ULOW to UHIGH. In case of anisotropic atoms the trace is used to determine whether the tensor is printed. In the case of multiple cycles the shifts refer to the last cycle only.
40. The r.m.s atomic shift for the original input coordinates is printed out. This will be different from the one under item 31 when more than one cycle has been run, and/or when constrained-restrained refinement has taken place.
41. When refining TLS parameters there is a list of the refined TLS groups with the derived anisotropic tensor for each atom in the group. This is checked for being positive definite. The results may be compared with those of item 20.
4.2 REFINED COORDINATES
These are written out to file XYZOUT. The coordinates are written out in the same format as the input coordinates (see section 3.3). Atomic anisotropic U tensors are also written to this file and are in the format described earlier. In the next run the file specified as XYZOUT should therefore be used as XYZIN.
4.3 REFINED GROUP THERMAL PARAMETERS
These are written out to file TLSOUT, in the same format as in file TLSIN (section 3.4), provided the latter was supplied. In the next run the file specified as TLSOUT should therefore be used as TLSIN.
4.4 STRUCTURE FACTORS
These are written out to file HKLOUT or REFOUT, and are ideally meant for FFT input. Each record contains
H K L 40000(sin(theta)/lambda)^2 Fo/G SIGMA/G Fc PHASE
in the format (3I4,4I6,I4) for REFOUT, or
H K L Fo/G SIGMA/G Fc PHASE
unformatted for HKLOUT. When no sigma is read in, 1/sqrt(weight) replaces SIGMA in the output.
4.5 NORMAL MATRIX
If FULMAT or NORMAT is set true, the normal matrix is written to the file MATOUT. This is used for calculating standard deviations of all parameters (FULMAT) or just coordinates (NORMAT).
If DESMAT is set true, the design matrix is written to the file DESOUT. The output file is used by another program (FUMAIN2*) for estimation of the variance of the least-squares residual. At present this feature is experimental.
4.6 SCRATCH FILES
A formatted scratch file (unit 12) is used for temporarily storing the newly ordered coordinates when the option ORDER is true. Otherwise this scratch file will not be opened.
An unformatted scratch file (unit 14) may be used for temporary reflection storage when initial calculation of the overall scale and thermal parameters, or of the amplitude weighting coefficients, is required.
An unformatted scratch file (unit 11) will be opened to store the approximation to the normal matrix where contributions to the off-diagonal terms are included for the energy restraints and 3x3 blocks are used for the contribution from the positional parameters of the atoms. All other off-diagonal terms are taken as zero. This file is read several times during the solving of the normal equations (see variables SFTLIM, CGFACR and GSFACR in section 3.1.3).
5. JOB FAILURES
RESTRAIN is designed to check the input data, and to either print out a message informing the user what the problem is and what corrective action has been taken, or in more severe cases to print out a message and stop, as continuation would not be useful in these cases. These messages are usually preceded by '***'. Much care has been taken to make the messages as informative as possible and thought has gone into the detection of illegal combinations of refinement options (see section 2.3.1). Obviously it is impossible to allow for all eventualities, so if you find an error that is not covered or that you do not understand, please seek assistance. When starting up a refinement use low values for RWDMIN (the weighted differences between observed and calculated structure factors) and RWLMIN (the weighted differences between observed and ideal distances) to obtain as much information as possible about the input reflections and coordinates respectively (see section 3.1.3). In the following paragraphs some common errors are described.
5.1 ARRAY DIMENSION ERRORS
Depending on the number of atoms and residues in your structure a suitably dimensioned version of RESTRAIN will have to be used. The array dimensions of RESTRAIN dealing with problem specific variables are set using the PARAMETER statement. They are printed in each listing immediately after the program title. Exceeding the boundaries will produce a message telling which parameter to increase and a run termination. Recompilation will then be necessary. If you are not sure what to do seek assistance.
5.2 COORDINATE FILE ERRORS
Make sure that protein chains are terminated with TER records.
5.3 REFLECTION FILE ERRORS
Reflection file errors are often caused by format errors when reading formatted files.
When using weighting schemes with the standard deviation or when using MIR or MIRAS phases you must have these present in your reflection file.
Illustration of input.
This is not intended to be a working example; it contains all the commonly used options together in the same script, and is meant to illustrate the available options. Most restrain scripts are nowhere near as long as this one! Just change the filenames and column labels, and delete the other bits you don't need. Note that the script below will apply both geometric and thermal parameter (isotropic or anisotropic as appropriate) restraints by default.
#!/bin/tcsh
set r=$0:r
time restrain <<EOF
TITLE Illustrating all the options in one script!
!
! First define the input and output files (can also do it on command line).
! All input is free format, order and letter case of keywords don't matter.
!
XYZIN hexpep.brk                ! Check section 3.3 for preparation guide.
TLSIN hexpep.tls                ! Needed for group thermal parameters.
                                ! Described in detail below.
HKLIN hexpepf.mtz
LABIN FP=FP_hexpep SIGFP=SP_hexpep FREE=FreeR_flag
XYZOUT $r.brk
TLSOUT $r.tls
HKLOUT $r.mtz
LABOUT FC=FC_hexpep PHIC=PC_hexpep
!
! ANISO creates individual atomic anisotropic thermal tensors (high res. only!).
!
ANISO 327.CA                    ! This will match either Calpha or calcium.
ANISO 10. 50.                   ! Residues 10-50, all atoms.
ANISO 200. 250. ' CA' ' CB'     ! Calpha's (but not calcium!) & Cbeta's only.
ANISO 100. 150. mnch            ! Main chain atoms only.
ANISO 151. 190. sdch            ! Side chain atoms only.
!
! NCSYMM defines NCS operators (3 molecules/a.u. here; identity is assumed).
!
NCSY POLAR 25.563 87.995 127.906    ! Can also say "NCSY MATRIX ...".
NCSY TRANS 100.076 -3.502 9.137     ! Use lsqkab to get these.
NCSY POLAR 65.746 117.435 180.153
NCSY TRANS 119.479 46.151 31.805
!
! OCCU allows occupancies in PDB file to be used, and creates occupancy groups.
! Here group A consists of 4 atoms with 3 coupled occupancy parameters,
! i.e. their sum is constant.
! Group B consists of 6 atoms with one free occupancy parameter.
!
OCCU 101.CG 4 A 1               ! First atom id, no. of atoms, group id, coupling id.
OCCU 151.CB 5 B
OCCU 51.SG 1 B
OCCU 251.CG 4 A 2
OCCU 201.CG 4 A 3
!
! RIGID defines rigid bodies.
!
RIGID 10. 50. A                 ! Residues 10-50, all atoms, rigid group A.
RIGID 200. 250. A               ! More atoms in group A.
RIGID 100. 150. A               ! Yet more.
RIGID 151. 190. B               ! These are in rigid group B.
!
! XTRD defines extra distance restraints.
! Here's a real example with a disordered cystine.
!
XTRD 18.N 618.CB 2.455 0.034    ! Atom 1 Atom 2 d [sigma(d)]
XTRD 18.CA 618.CB 1.530 0.020   ! Residue 618 is an alternate s/c of 18.
XTRD 18.CA 618.SG 2.822 0.043
XTRD 18.CB 622.SG 3.034 0.059
XTRD 18.SG 622.CB 3.034 0.059
XTRD 18.SG 622.SG 2.030 0.008
XTRD 18.C 618.CB 2.504 0.038
XTRD 22.N 622.CB 2.455 0.034    ! Residue 622 is an alternate s/c of 22.
XTRD 22.CA 622.CB 1.530 0.020
XTRD 22.CA 622.SG 2.822 0.043
XTRD 22.C 622.CB 2.504 0.038
XTRD 618.CB 622.SG 3.034 0.059
XTRD 618.SG 622.CB 3.034 0.059
XTRD 618.SG 622.SG 2.030 0.008
!
STEER
!
! "Steering data" follows STEER keyword (uses simulated Fortran NAMELIST).
!
NCYC=8, CYCNO=21, SCHEME=5      ! May want to modify these.
EOF
Here is an illustrative example of a TLSIN file (group thermal parameters):
UANISO                  ! Overall anisotropic tensor.
DEFAULT                 ! Defines default values,
                        ! i.e. may be overridden.
UANISO N-domain         ! Group anisotropic tensor just for one domain.
RANGE 1. 180.           ! Domain consists of 2 contiguous segments.
RANGE 98. 327.
TLS C-domain            ! TLS tensor for other domain.
RANGE 191. 290.         ! This domain has just one segment.
TLS A-helix             ! TLS tensor for helix main chain.
RANGE 30. 55. mnch
NOATOM                  ! Don't refine atomic Uiso's for this group.
UANISO TRP 99 s/c
RANGE 99. '' sdch       ! U tensor for individual side chain.
NOATOM                  ! Don't refine atomic Uiso's for this group.
UISO                    ! Can also do group isotropic tensors.
RANGE 100. 130. sdch    ! Side-chains of residues 100-130 will have
RESIDUE                 ! separate group Uiso's.
Note that you don't need to put in any values for the tensor components; the program will supply sensible defaults for any undefined tensors. Once the job has been run, the refined values of all tensor components will be put in the TLSOUT file ready for the next run. Here is an example:
! APP ANISO/TLS AT 2.1Å no refine Creact.
! Output from refinement cycle 5
UANISO Polypro helix.
RANGE 1. 9. ALL
U 0.0053 -0.0086 0.0033 -0.0140 0.0044 0.0007
! (0.0112)(0.0127)(0.0204)<0.0064>(0.0063)(0.0066)
TLS Alpha helix.
RANGE 13. 32. ALL
ORIGIN -3.103 -8.863 3.788
T 0.2217 0.1885 0.2016 -0.0052 0.0067 0.0045
! <0.0083><0.0090><0.0145>(0.0053)<0.0052>(0.0052)
L 0.62 1.67 2.23 -0.45 -0.15 0.23
! ( 1.12)< 0.46>< 0.87>< 0.44>( 0.82)( 0.47)
S -0.025 0.000 0.045 -0.008 0.048 -0.034 -0.034 -0.066
! ( 0.057)( 0.070)< 0.034>( 0.049)( 0.052)( 0.052)( 0.054)< 0.033>
END
If further refinement is necessary, for example after rebuilding, one would normally replace the old XYZIN and TLSIN files with the new ones and insert new names for the output files. In the script above the output names are derived from the script's own name ($r), so copying the script under a new name renames the outputs automatically. Certain parameters would also need to be updated, in particular any NCS operators, the overall scale factor G, the solvent background parameters SB1 and SB2, and the F-weighting coefficients WF(2) ... WF(4). The new values of all updated parameters are always printed at the end of the standard output, e.g.:
NCSY POLAR ...
NCSY TRANS ...
G=5.0972, SB1=3.6804, SB2=10.6158
WF(2)= 1.94021E+03, WF(3)= 1.06121E+01, WF(4)= 1.22706E-02
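Carrying these values into the next script by hand is error-prone, so a small helper can collect them. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that just the final block of the standard output (in the form shown above) is passed in, since earlier parts of a log may contain other NAME=value text. The NCSY numbers in the demonstration are taken from the example script, not from a real run.

```python
import re

# Matches NAME=value pairs such as "G=5.0972" or "WF(2)= 1.94021E+03".
PAIR = re.compile(
    r"([A-Z][A-Z0-9]*(?:\(\d+\))?)\s*=\s*"
    r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?)"
)


def parse_updated_parameters(text):
    """Split the trailing parameter block into NCSY lines and steering values.

    Returns (ncs_lines, steering): the NCSY lines verbatim, and a dict
    mapping each steering variable name to its value as a float.
    """
    ncs_lines = []
    steering = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("NCSY"):
            ncs_lines.append(stripped)
        else:
            for name, value in PAIR.findall(stripped):
                steering[name] = float(value)
    return ncs_lines, steering


if __name__ == "__main__":
    tail = """NCSY POLAR 25.563 87.995 127.906
NCSY TRANS 100.076 -3.502 9.137
G=5.0972, SB1=3.6804, SB2=10.6158
WF(2)= 1.94021E+03, WF(3)= 1.06121E+01, WF(4)= 1.22706E-02
"""
    ncs, steer = parse_updated_parameters(tail)
    print("\n".join(ncs))
    # Re-emit the steering values as one comma-separated line.
    print(", ".join(f"{k}={v:.6G}" for k, v in steer.items()))
```

The NCSY lines can be pasted back into the keyword section, and the steering values placed after the STEER keyword of the next script.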
Unix example script found in $CEXAM/unix/runnable/
VMS example script found in $CEXAM/vms/
SEE ALSO
Alternative refinement program: