VECREF (CCP4: Supported Program)

NAME

vecref - Vector-space refinement of heavy atom sites in isomorphous derivatives.

SYNOPSIS

vecref MAPIN foo_in.map ATOUT foo_out.dat
[Keyworded input]

DESCRIPTION

Vector-space refinement is an alternative to the standard reciprocal-space refinement of Busing & Levy ([1]); instead of least-squares minimisation of the sum of weighted squared differences between observed and calculated structure factor amplitudes with respect to the atomic parameters, we minimise the same function of the observed and calculated heavy-atom difference Patterson function values.
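As a sketch of the two target functions just described (the symbols are notation assumed here for illustration, not taken from the program): w denotes a weight, p the set of atomic parameters, and u_i the Patterson sample points included in the fit.

```latex
% Reciprocal-space least squares: sum over reflections hkl
M_F(\mathbf{p}) = \sum_{hkl} w_{hkl} \left( |F_o(hkl)| - |F_c(hkl;\mathbf{p})| \right)^2

% Vector-space least squares: the same form, summed over Patterson sample points
M_P(\mathbf{p}) = \sum_{i} w_i \left( P_o(\mathbf{u}_i) - P_c(\mathbf{u}_i;\mathbf{p}) \right)^2
```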

Although the Patterson function is a complete representation in vector space of the set of structure factor amplitudes (provided the Patterson function is sampled at a sufficiently fine interval), the two types of refinement are not equivalent. This is because the minimisation in vector space is only done for sample points where the calculated Patterson density is significantly positive, in other words near the interatomic vector peaks, whereas each structure factor contains information from all points in real space (except in special zones). In all real crystal structures the atoms fill all the space (there are no holes), so there would be no advantage with vector-space refinement; indeed there would be problems due to considerable overlap of peaks in vector space.

For a difference structure the situation is reversed; most of the space is empty and the probability of overlaps is small; and since the structure factors represent mostly empty space, they are dominated by errors arising from taking a small difference between large quantities with small relative errors. As a result, reciprocal-space refinement of heavy-atom parameters is notoriously slow to converge and insensitive to misplaced sites and to errors in the starting values of the parameters, and refinement statistics usually give little indication of the correctness of the solution. (Typical conventional R-factors for reciprocal-space heavy-atom refinement are in the range 30% (rarely) to 70%, usually around 50%, compared with the random value for acentric data of 58.6%.)

The theoretical convergence radius for reciprocal-space refinement is dmin/4, i.e. 0.75Å at 3Å, but this is unlikely to be achieved in practice because of random errors; for vector-space refinement the theoretical convergence radius is the apparent atomic diameter, which is about dmin/sqrt(2), i.e. about 2.1Å at 3Å.
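The two theoretical radii quoted above are simple functions of dmin. The following minimal sketch tabulates them for a few resolutions; the function names are chosen here for illustration only.

```python
import math


def reciprocal_space_radius(d_min: float) -> float:
    """Theoretical convergence radius for reciprocal-space refinement: d_min / 4."""
    return d_min / 4.0


def vector_space_radius(d_min: float) -> float:
    """Theoretical convergence radius for vector-space refinement: d_min / sqrt(2)."""
    return d_min / math.sqrt(2.0)


# Tabulate both radii (in Angstrom) for a few example resolutions.
for d_min in (2.0, 3.0, 4.5):
    print(
        f"d_min = {d_min:.1f} A: "
        f"reciprocal-space {reciprocal_space_radius(d_min):.2f} A, "
        f"vector-space {vector_space_radius(d_min):.2f} A"
    )
```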

Vector-space refinement also has a considerable advantage over reciprocal-space refinement: the least-squares fit to the 3-dimensional data set can be performed using only isomorphous differences. Heavy-atom derivatives for which anomalous scattering data are unavailable (or of dubious reliability) can therefore be refined using information from all data, centric and acentric. In this case it is important to have the derivative data already scaled accurately against the native data, as the derivative scale-factor cannot be refined in real space. The Kraut scaling technique ([2]) is recommended for this; see the documentation for program FHSCAL.

In vector space, there exists no procedure corresponding to the heavy-atom difference Fourier for finding minor sites. However other options which utilise the Patterson, such as superposition methods, are available, and of course one can still calculate structure factors from the refined atomic parameters and do a difference Fourier.

KEYWORDED INPUT

Free format using keywords. The following keywords may be used; only the leading 4 characters are significant and the order is immaterial:

ANISO, ATOM, BLIMITS, BREFINE, CYCLES, DAMP, END, GROUP, ORIGIN, RCUT, RESOLUTION, SCALE, SPACEGROUP, THRESHOLD, TITLE.

The keywords SPACEGROUP, RESOLUTION, CYCLES and ATOM are always required, the rest are optional, and assume default values if omitted.
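The keyword rules above (4 significant characters, any order, four required keywords) can be checked before a run with a small script. This is only a sketch: it is not part of VECREF, the function names are hypothetical, and treating keywords as case-insensitive is an assumption. The ATOM values in the example are made up.

```python
KEYWORDS = [
    "ANISO", "ATOM", "BLIMITS", "BREFINE", "CYCLES", "DAMP", "END", "GROUP",
    "ORIGIN", "RCUT", "RESOLUTION", "SCALE", "SPACEGROUP", "THRESHOLD", "TITLE",
]
REQUIRED = ["SPACEGROUP", "RESOLUTION", "CYCLES", "ATOM"]


def key4(word):
    """Reduce a keyword to its 4 significant characters (case folding is an assumption)."""
    return word[:4].upper()


def check_control_lines(lines):
    """Return (unknown_keywords, missing_required_keywords) for a list of control lines."""
    known = {key4(k) for k in KEYWORDS}
    seen = set()
    unknown = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        k = key4(fields[0])
        if k in known:
            seen.add(k)
        else:
            unknown.append(fields[0])
        if k == "END":
            break  # END terminates the input
    missing = [k for k in REQUIRED if key4(k) not in seen]
    return unknown, missing


# Hypothetical control data; abbreviated keywords are accepted.
control = [
    "TITLE test",
    "SPAC P212121",
    "RESO 3.0 20.0",
    "CYCL 3 5 10",
    "ATOM HG 1 0.5 0.1 0.2 0.3 20.0",
    "END",
]
print(check_control_lines(control))
```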

TITLE <title>

Title (max 100 characters).

SPACEGROUP <space_group>

Space group name (e.g. P212121) or number (e.g. 19).

RESOLUTION <d_min> [<d_max>]

Approximate minimum and maximum d-spacings (high and low resolution cutoffs respectively; may be given either way round) of the data used to compute the Patterson map. d_max may be omitted, in which case no low resolution cutoff is assumed.

RCUT <radius>

Radius cutoff for Patterson peaks. The default radius is defined by the point where the density either becomes zero or reaches a minimum; this is usually satisfactory. If RCUT is positive, the radius is taken as RCUT times the RMS radius. Note that the RMS radius is typically 0.5-0.6 times the default radius, so there is no point setting RCUT above about 1.5. If RCUT is negative, the radius (in Angstrom) is taken as abs(RCUT).
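The three cases for RCUT can be summarised in a few lines. This is a sketch of the rule as stated above, not the program's code; the function name is hypothetical, and treating an RCUT of exactly zero like an omitted keyword is an assumption.

```python
from typing import Optional


def peak_radius(default_radius: float, rms_radius: float,
                rcut: Optional[float] = None) -> float:
    """Return the peak radius cutoff in Angstrom for a given RCUT setting."""
    if rcut is None or rcut == 0:
        # Keyword omitted (or zero, by assumption): use the default radius.
        return default_radius
    if rcut > 0:
        # Positive RCUT is a multiple of the RMS radius.
        return rcut * rms_radius
    # Negative RCUT gives the radius directly, in Angstrom.
    return abs(rcut)


# Illustrative values only: default radius 2.0 A, RMS radius 1.1 A.
print(peak_radius(2.0, 1.1))        # default radius
print(peak_radius(2.0, 1.1, 1.5))   # 1.5 times the RMS radius
print(peak_radius(2.0, 1.1, -1.8))  # explicit radius of 1.8 A
```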

GROUP <gs> <as1> <w1> <r1> <as2> <w2> <r2>

<gs>: Group symbol, any 2 characters used to represent the group scatterer, e.g. HI for HgI4-- (do not use a symbol which could be confused with an atomic symbol).
<as1> <w1> <r1>: Atomic symbol, multiplicity and radius for the first type of atom in the group; i.e. if this is the central atom then w1 = 1 and r1 = 0.
<as2> <w2> <r2>: The same for the second type of atom (optional).

CYCLES <ncyc1> <ncyc2> <ncyc3>

The program will perform <ncyc1> cycles of refinement of the occupancy factors (usually 3 to 5, possibly up to 10 for difficult cases), followed by <ncyc2> cycles of refinement of the occupancies and coordinates (usually about 5), and finally <ncyc3> cycles of refinement of all variables (occupancies, coordinates and thermal parameters) (usually 10-20).
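The three-stage schedule can be pictured as a list of which parameter classes are refined in each cycle. The sketch below only illustrates the ordering described above; the function name and parameter labels are hypothetical.

```python
def refinement_schedule(ncyc1, ncyc2, ncyc3):
    """Return, for each cycle in order, the tuple of parameter classes refined."""
    stages = [
        (ncyc1, ("occupancy",)),
        (ncyc2, ("occupancy", "coordinates")),
        (ncyc3, ("occupancy", "coordinates", "thermal")),
    ]
    schedule = []
    for ncyc, params in stages:
        schedule.extend([params] * ncyc)
    return schedule


# Example: CYCLES 3 5 10
for cycle, params in enumerate(refinement_schedule(3, 5, 10), start=1):
    print(cycle, ", ".join(params))
```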

ORIGIN

Specify this keyword to include the Patterson origin peak in the refinement. The origin is usually excluded, except in the case of 1 atom in space group P1, when the origin is the only vector. See note 5.

SCALE <scale(i)>

Scale factors for the coordinates; if these are fractional then scale(i) = 1 (this is the default). If they are in grid units (i.e. taken directly from a map) then scale(i) = number of grid units along each cell edge. (These grid units need not be the same as those used in the Patterson read by the program.)
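On the natural reading of the description above, the supplied coordinates are divided by the scale factors to obtain fractional coordinates. The sketch below assumes exactly that; the function name and the grid values are illustrative.

```python
def to_fractional(coords, scale=(1.0, 1.0, 1.0)):
    """Convert input coordinates to fractional by dividing by the SCALE values."""
    return tuple(c / s for c, s in zip(coords, scale))


# Coordinates read from a map sampled on a 48 x 60 x 90 grid (illustrative values).
print(to_fractional((12, 30, 45), scale=(48, 60, 90)))

# Coordinates that are already fractional pass through unchanged with the default scale.
print(to_fractional((0.25, 0.5, 0.5)))
```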

ATOM <as> <iat> <occ> <x> <y> <z> <b>

<as>: Atomic or group symbol.
<iat>: Atom/group identifier (integer from 1 to 9999).
<occ>: Occupancy.
<x> <y> <z>: Real space (not Patterson space) fractional/grid coordinates.
<b>: Thermal parameter (B-factor).

ANISO

This is currently not operational.

BREFINE

Specify to refine individual isotropic thermal parameters (ncyc3 must also be > 0). The default is to refine an overall B factor (when ncyc3 > 0). See note 4.

DAMP <damp>

Damping factor for occupancy and B-factor shifts when these are refined together. The default value is 0.25. The occupancies and B-factors are always highly correlated, so without damping the shifts tend to oscillate.

THRESHOLD <trms>

Only atoms with refined occupancies above this threshold times the estimated standard deviation are written to the ATOUT file. If threshold is given as negative, the absolute value of the occupancy is tested, so that negative occupancies may also be written out. Default is 2 sigma.
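The output test can be sketched as follows. This is an illustration of the rule as described, not the program's code; the function name is hypothetical and the use of a strict "greater than" comparison is an assumption.

```python
def passes_threshold(occ, esd, threshold=2.0):
    """Decide whether an atom would be written to ATOUT under the THRESHOLD rule."""
    if threshold < 0:
        # Negative threshold: test the absolute occupancy, so that
        # significantly negative occupancies are also kept.
        return abs(occ) > abs(threshold) * esd
    return occ > threshold * esd


print(passes_threshold(0.50, 0.10))         # occupancy is 5 sigma: kept
print(passes_threshold(0.15, 0.10))         # occupancy is 1.5 sigma: dropped
print(passes_threshold(-0.50, 0.10))        # negative occupancy: dropped by default
print(passes_threshold(-0.50, 0.10, -2.0))  # kept when the threshold is negative
```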

BLIMITS <blim(1)> <blim(2)>

Only atoms with B factors within the specified limits are accepted on input, and are written to the ATOUT file. Defaults are 0 200.

END

Terminate input and start the calculation.

INPUT AND OUTPUT FILES

MAPIN: A Patterson map in standard CCP4 format is required; see note 2 for important information concerning its preparation.
ATOUT: The refined coordinates are written to a file in a format suitable for inclusion in the control data for subsequent runs.

The standard library of symmetry positions (SYMOP.LIB) is only required if the space group number is specified.

Logical names used:

INPUT_DENSITY_FILE        MAPIN
OUTPUT_COORDINATE_FILE    ATOUT
Symmetry library          SYMOP           [default $CLIBD/symop.lib]
Form-factor library       ATOMSF          [default $CLIBD/atomsf.lib]
Dynamic array dimension   VECREF_MAXPTS   [default 200000]

The dynamic array dimension will be automatically increased if it is not large enough. However, the program will run faster if it is set large enough initially (e.g. setenv VECREF_MAXPTS 1000000).

NOTES

1. Lattice type: The program takes account of non-primitive lattices by multiplying the atomic form-factor by the multiplicity factor. This means that the refined occupancies are true occupancies, not multiplied by the lattice multiplicity.

2. Map file: Great care must be taken in the preparation of the map file, particularly with non-primitive space groups such as C2, as the VECREF program may not be able to detect mistakes made. The most likely result of a mistake is that you will finish up with incorrect occupancies, and hence biased protein phases. The following points should be carefully observed:

   (a) It is recommended that Kraut scaling of the derivative be used, with or without anomalous differences. In the latter case the FHSCAL program can be used. The scales and temperature factors given to FFT will then be 1 and 0 respectively. A bias factor of 1 is recommended, but this is not critical.

   (b) It is recommended that a list of the largest isomorphous differences (or Fhle's) be checked for obvious outliers and that these be rejected before the Patterson is computed, otherwise they will strongly bias the occupancies. A convenient way of checking is to run a program which analyses the scaling in terms of the hkl indices, such as SCALEIT or LOCAL, as large outliers usually stand out; the fractional rms difference in F^2 is a particularly sensitive check. A low resolution cutoff (e.g. 20Å) in FFT will often remove most of the outliers, though this should be done with caution.

   (c) The number of sampling intervals specified critically affects the run-time of VECREF, which is inversely proportional to the cube of the size of the interval. Normally use an interval of about one quarter of the resolution, i.e. Nx = 4*hmax etc. The Patterson must of course constitute at least an asymmetric unit in vector space, but it doesn't matter if it is more than an asymmetric unit.

   (d) Use the VF000 map scaling option; for V use the value computed by FHSCAL, or for Fhle Pattersons use the actual volume. F000 should be 0 as VECREF computes the value from the occupancies and scattering factors.

3. Group scattering factors: These are useful for complex ions like AuI4-, PtI6--, HgI4-- etc, where a substantial part of the scattering arises from atoms not located at the centre of the electron density. The spherically averaged combined scattering factor is used.

4. Thermal parameters: For low resolution (dmin > ~4.5Å) thermal parameters cannot be sensibly refined due to high correlations with the occupancies, so this should then be zero, otherwise about 5.

   Usually when refining the thermal parameters, the R-factor (residual) goes up initially and then converges slowly (due to the high correlations), but not back as far as the level attained by refining occupancies and coordinates alone. This in itself should not necessarily be taken to mean that the thermal parameters are not meaningful. The refinement increases the peak widths and hence the number of observations; this by itself will tend to push up the R-factor, and in addition the extra observations will be in the tails of the peaks where the signal to noise ratio is poor.

5. Patterson origin: The program should normally be instructed to ignore the origin peak for the purposes of least-squares, unless you are confident there is no systematic discrepancy between the observed and calculated origin peak heights. Such a discrepancy is usually present, probably because of random errors and/or missing sites.
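The sampling rule given in the map-file note (Nx = 4 times hmax, with run-time inversely proportional to the cube of the interval) can be worked through numerically. The sketch below assumes hmax can be taken as the integer part of a/dmin for a cell edge a, which is a simplification; the function names and cell dimensions are hypothetical, and any further restrictions the map program places on grid numbers are not considered.

```python
import math


def min_sampling(cell_edge: float, d_min: float) -> int:
    """Number of grid intervals along one cell edge using N = 4 * hmax."""
    h_max = math.floor(cell_edge / d_min)  # assumed estimate of the maximum index
    return 4 * h_max


def relative_run_time(old_interval: float, new_interval: float) -> float:
    """Run-time ratio new/old, taking run-time proportional to interval**-3."""
    return (old_interval / new_interval) ** 3


# Hypothetical cell of 60 x 75 x 90 A with data to 3 A resolution.
for edge in (60.0, 75.0, 90.0):
    n = min_sampling(edge, 3.0)
    print(f"edge {edge:.0f} A: N = {n}, interval = {edge / n:.3f} A")

# Halving the sampling interval multiplies the run-time by about 8.
print(relative_run_time(0.75, 0.375))
```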

PRINTER OUTPUT

The refined atomic parameters appear in a table under the following headings:

Atom  Parameter  Init  Old  Shift  Change  New  Esd

Atom: the atom or group symbols and identifiers.
Parameter: the names of the atomic parameters being refined.
Init: the values of the parameters at the start of the current run.
Old: the values of the parameters before the current cycle of refinement.
Shift: Under Shift appear, on the same line as the atom name, the net shifts in Angstrom for the current cycle, and, on the same line as the parameter name, the shifts for the current cycle in the same units as the corresponding parameter.
Change: Under Change appear, on the same line as the atom name, the net shifts in Angstrom from the initial values, and, on the same line as the parameter name, the shifts from the initial values in the same units as the corresponding parameter.
New: the new values of the parameters for the current cycle.
Esd: Under Esd appear, on the same line as the atom name, the estimated standard deviation of the atomic position in Angstrom, and, on the same line as the parameter name, the estimated standard deviations of the parameters in the same units as the corresponding parameter.

Atoms are not allowed to shift from their initial position by more than twice the expected convergence radius (approximately dmin*sqrt(2)); atoms for which the calculated shift would take them over this limit have their occupancy set to zero; the occupancy is then allowed to refine upwards only if the coordinate shifts move the atom back towards its initial position. Atoms with occupancy exactly zero are ignored on input; if this effect is not desired, alter the zero occupancy to some small number (e.g. 0.001).
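The shift limit can be illustrated as follows. This is only a sketch of the rule as stated: it assumes positions are available as Cartesian coordinates in Angstrom, the function names are hypothetical, and the subsequent "refine upwards only when moving back" behaviour is not modelled.

```python
import math


def max_allowed_shift(d_min: float) -> float:
    """Twice the expected convergence radius, i.e. approximately d_min * sqrt(2)."""
    return d_min * math.sqrt(2.0)


def apply_shift_limit(initial_xyz, shifted_xyz, occupancy, d_min):
    """Return the occupancy, set to zero if the atom has moved beyond the limit."""
    distance = math.dist(initial_xyz, shifted_xyz)  # Cartesian distance in Angstrom
    if distance > max_allowed_shift(d_min):
        return 0.0
    return occupancy


# At 3 A resolution the limit is about 4.24 A.
print(round(max_allowed_shift(3.0), 2))
print(apply_shift_limit((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), 0.8, 3.0))  # within limit
print(apply_shift_limit((0.0, 0.0, 0.0), (5.0, 0.0, 0.0), 0.8, 3.0))  # beyond limit
```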

I have not yet had sufficient experience with the program to know what constitute good R-factors and correlation coefficients, or which statistic to place the greatest reliance on, though the R-factor seems to be more discriminating. The best result I have had so far is R = 17.5%, C = 0.976 (excluding the Patterson origin, and not refining B-factors) for the test data distributed with the program. Note that this is real data! (Acknowledgements to Dr. Simon Phillips, University of Leeds.)

The list of vectors, which appears after the occupancy refinement cycles and again at the end of the run, has the Patterson map coordinates in the same units as the atomic coordinates if these were supplied as map grid coordinates, or in the same units as the map supplied if the atomic coordinates were supplied as fractions of a unit cell. The columns labelled Pobs, Pcalc, and <Pobs>, <Pcalc> are approximate peak heights and mean peak values respectively. They are intended only as a guide to the fit; in particular the Pcalc values do not include a contribution from any overlapping peaks. (These are not the data used in the least squares, where the fit is done on a grid point basis, not on a peak basis.)

ERRORS

One or more of the following messages occur when errors are discovered in the input control data; the program continues to process the data but stops when all data has been read:


   *** ERROR: Centring translation not integer.
   *** ERROR: Identity position not found.
   *** ERROR: Invalid number of atoms.
   *** ERROR: Atom symbol not in scat. fact. list.
   *** ERROR IN GETINP: Error(s) in input data.

The following conditions indicate that an array is not large enough for the problem and should be cured either by correcting the input data (in most cases by increasing the sampling interval of the Patterson), or by increasing the value assigned to the symbol specified in PARAMETER statements in the source code, taking care to modify all PARAMETER statements containing the symbol, and re-compiling and linking.


   *** ERROR: Array bound check (MSECT).
   *** ERROR: Array bound check (MRHO).
   *** ERROR IN ADDAST: Array bound check (MCF).
   *** ERROR IN PKLIST: Array bound check (MPL).
   *** ERROR IN GENPTS: Array bound check (MPT).

The following conditions are likely to occur if either the symmetry specified is wrong, or the Patterson map is less than an asymmetric unit:


   *** ERROR: Point not found.
   *** ERROR IN GENPTS: No points.

The following errors occur when the standard map handling routine detects an error; this is likely to indicate something seriously wrong with the map file, like data corruption:


   *** ERROR IN PKLIST: MGULP error.
   *** ERROR IN REFCYC: MGULP error.

PROGRAM DESCRIPTION

One important point often not clearly understood about this method of refinement is that the program does handle overlapping vectors correctly, provided first that the overlap is between known atoms and second that not all vectors arising from a pair of atoms overlap. This is because the calculated Patterson density is summed over all contributing atom pairs before being compared with the observed value. Vectors which overlap with those due to as yet unknown atoms will positively bias the occupancy, and will also affect the refined coordinates, but this is also true for the reciprocal-space method.

REFERENCES

[1] Busing WR and Levy HA (1961) in Computing Methods and the Phase Problem in X-ray Crystal Analysis, Pergamon, Oxford.
[2] Kraut J, Sieker LC, High DF, Freer ST (1962), Proc. Nat. Acad. Sci. USA, 48, 1417-1424.

EXAMPLES

Simple unix example script found in $CEXAM/unix/runnable/

vecref.exam (Use vecref to refine sites determined using RSPS)

(A vms version found in $CEXAM/vms/vecref.com)

AUTHOR

Originator: Ian Tickle, Birkbeck College, London
