CCP4i: Graphical User Interface

MIR Tutorial Bath - Heavy atom refinement

The heavy-atom refinement programs in the CCP4 package (**MLPHARE**, which can
do phasing as well as refinement, and **VECREF**, which does refinement only) have
now been in use for about 5 years. These programs differ in several fundamental
ways from their predecessors, and to understand what the new programs do
differently, it is useful to review some recent history.

The method of refining heavy atoms in protein derivatives that was in common use
before 1991 was conceived in 1961 and remained essentially unchanged for 30 years,
although in the 1970s it was observed that the method had poor convergence
properties, particularly when (as is common) several derivatives had some or all of
their major sites in common. The basic idea was to calculate the **most probable**
value of the native phase for each reflection in turn from one derivative, or a subset
of the derivatives, and then to use this **fixed** estimate of the phase to obtain the
calculated value of |*F*_{PH}|. The difference between this and the measured |*F*_{PH}|
is the **lack of closure error**, and the sum of the squared errors could be minimised
by a conventional least-squares procedure. In principle, initial estimates of the
heavy-atom parameters could then be adjusted until, at convergence, they gave the set
of parameters that best fitted the measurements.
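
The fixed-phase scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PHARE's actual code: the function names `heavy_atom_fh`, `lack_of_closure` and `least_squares_target` are hypothetical, every site is given the same constant scattering factor `f0`, and temperature factors are isotropic. The native phase `alpha_p` enters only as a fixed number, which is the weakness described above.

```python
import cmath
import math

def heavy_atom_fh(hkl, sites, f0, stol2):
    """Calculated heavy-atom structure factor F_H for one reflection.

    sites: list of (x, y, z, B, occ) in fractional coordinates.
    f0:    Assumption: one constant scattering factor for every site.
    stol2: (sin(theta)/lambda)^2 for this reflection.
    """
    h, k, l = hkl
    fh = 0j
    for x, y, z, b, occ in sites:
        amp = occ * f0 * math.exp(-b * stol2)   # occupancy times isotropic temperature factor
        fh += cmath.rect(amp, 2.0 * math.pi * (h * x + k * y + l * z))
    return fh

def lack_of_closure(fp_amp, alpha_p, fh, fph_obs):
    """Observed minus calculated |F_PH|, with the native phase alpha_p (radians) held FIXED."""
    fph_calc = abs(cmath.rect(fp_amp, alpha_p) + fh)   # |F_P + F_H|
    return fph_obs - fph_calc

def least_squares_target(reflections):
    """Sum of squared lack-of-closure errors; each item is (fp_amp, alpha_p, fh, fph_obs)."""
    return sum(lack_of_closure(*r) ** 2 for r in reflections)

# F_P and F_H in phase: calculated |F_PH| = 100 + 20 = 120, so closure is exact.
print(lack_of_closure(100.0, 0.0, 20 + 0j, 120.0))   # 0.0
```

A least-squares refinement would adjust the `(x, y, z, B, occ)` values fed to `heavy_atom_fh` to reduce `least_squares_target`, while each `alpha_p` stays at its previously calculated value.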

The method, implemented in the program PHARE (phase and refine), worked
reasonably well provided that the set of sites used to calculate the phases was not
the same as the set whose heavy-atom parameters were being refined. In particular,
it could not cope with the case where only one derivative is available
(**single isomorphous replacement** or **SIR**), since the phases must then be
calculated from the very sites that are being refined.

In order to get round these problems, an alternative method ("**F_{HLE}**")
that required the measurement of

Because the isomorphous difference is a good approximation to |*F*_{H}| for centric
reflections (whose native and heavy-atom contributions are collinear), these
reflections can be used in the initial stages of refinement; however, this is not a
general solution, because several space groups have either no centric zone
(*e.g.* R3) or only one (*e.g.* P2_{1}).
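
A small numerical check of the centric approximation, in Python. For a centric reflection **F**_{P} and **F**_{H} are collinear, so the isomorphous difference recovers |*F*_{H}| exactly unless **F**_{H} opposes **F**_{P} and is larger than it (a "crossover"); for an acentric reflection it only recovers the component of **F**_{H} along **F**_{P}. The helper name `iso_diff` and the amplitudes are illustrative assumptions.

```python
import cmath
import math

def iso_diff(fp, fh):
    """Isomorphous difference |F_PH| - |F_P|, taking F_PH = F_P + F_H (complex)."""
    return abs(fp + fh) - abs(fp)

# Centric: F_P and F_H collinear (relative phase 0 or pi).
centric = abs(iso_diff(150 + 0j, -30 + 0j))      # 30.0, equal to |F_H|

# Acentric (assumed example): F_H is 60 degrees out of phase with F_P.
acentric = iso_diff(150 + 0j, cmath.rect(30.0, math.radians(60.0)))   # about 17.03, well below 30

# Centric crossover: F_H opposes F_P and is larger, so the estimate fails.
crossover = abs(iso_diff(20 + 0j, -30 + 0j))     # 10.0, not 30

print(centric, acentric, crossover)
```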

The principle of **maximum likelihood** is that a **joint conditional probability density
function** is constructed, whose value measures the likelihood that the particular
set of measurements actually obtained would have been obtained, given any
specified set of values for the unknown parameters. The optimal set of
parameters is the one that maximises the likelihood of having made the actual set of
measurements. Usually the errors in the individual measurements are all
independent of each other, so the likelihood is just a product of individual probability
functions, whose algebraic form is based on informed guesswork about the
probability of obtaining a measurement given its true value and its known error
estimate.

$$L = \prod_{hkl} P\left(|F_{P}|, |F_{PHj}| \;\middle|\; (x_i, y_i, z_i, B_i, O_i)_j,\ \sigma^{2}(|\mathbf{D}_j|)\right)$$

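The likelihood *L* is the conditional probability of having made the set of observations
|*F*_{P}|, |*F*_{PHj}|, given values of the heavy-atom parameters
(*x*_{i}, *y*_{i}, *z*_{i}, *B*_{i}, *O*_{i})_{j} (the coordinates, temperature factor and
occupancy of site *i* in derivative *j*) and of the variances σ^{2}(|**D**_{j}|).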

$$\log L = \sum_{hkl} \log P(\ldots \mid \ldots)$$

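Because the logarithm increases monotonically, the set of parameters that maximises log *L* also maximises *L*; the most likely set of parameters is therefore the one that maximises the log-likelihood, which for independent measurements is a sum rather than a product over reflections.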
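
Why the sum of logarithms is used in practice can be seen with a toy calculation. In this sketch Gaussian densities stand in for the individual terms P(...|...), an assumption for illustration only (MLPHARE's actual probability functions are not reproduced here). With a few thousand reflections the raw product underflows to zero in double precision, whereas the log-likelihood stays finite and still ranks candidate parameter values correctly. The names `gaussian_logpdf` and `log_likelihood` are hypothetical.

```python
import math
import random

def gaussian_logpdf(x, mean, sigma):
    """Log of a normal probability density, computed directly in log form."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

rng = random.Random(1)
# Assumption: 2000 independent toy measurements with Gaussian errors
# (sigma = 5) about a true value of 100, standing in for P(...|...) terms.
obs = [rng.gauss(100.0, 5.0) for _ in range(2000)]

def log_likelihood(mean):
    return sum(gaussian_logpdf(x, mean, 5.0) for x in obs)   # log L = sum of logs

# Multiplying the raw densities underflows double precision to exactly 0.0 ...
product = 1.0
for x in obs:
    product *= math.exp(gaussian_logpdf(x, 100.0, 5.0))

# ... while the sum of logs stays finite and still picks out the best value.
best = max([98.0, 100.0, 102.0], key=log_likelihood)
print(product, log_likelihood(100.0), best)
```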

The main drawback of the old methods of refinement was that the protein phase was either fixed or just ignored during refinement, leading either to bias in the parameters or to loss of information. The important breakthrough with the new method is that all possible values of the phase are considered during refinement, each value being weighted according to its probability of being correct.
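
The idea of weighting every possible phase by its probability can be sketched for the single-derivative (SIR) case that PHARE could not handle. This is a deliberately simplified model chosen for illustration, not MLPHARE's: the native phase has a uniform prior, errors are Gaussian on |*F*_{PH}| only with a known `sigma`, the integral over phase is replaced by an average over `n_phases` equally spaced values, and the only parameter refined is one occupancy scaling a known unit-occupancy **F**_{H}. All function names and the simulated data are hypothetical.

```python
import cmath
import math
import random

def p_given_phase(fph_obs, fp_amp, alpha, fh, sigma):
    """Gaussian density of the observed |F_PH| if the native phase were alpha."""
    fph_calc = abs(cmath.rect(fp_amp, alpha) + fh)
    z = (fph_obs - fph_calc) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def phase_integrated(fph_obs, fp_amp, fh, sigma, n_phases=180):
    """Average over ALL native phases (Assumption: uniform prior, rectangle rule)
    instead of fixing the phase at a single 'most probable' value."""
    return sum(p_given_phase(fph_obs, fp_amp, 2.0 * math.pi * k / n_phases, fh, sigma)
               for k in range(n_phases)) / n_phases

def simulate_sir(n_refl, occ, fh_unit=30.0, sigma=2.0, seed=7):
    """Synthetic single-derivative data: (|F_P|, unit-occupancy F_H, observed |F_PH|)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_refl):
        fp = cmath.rect(rng.uniform(100.0, 200.0), rng.uniform(0.0, 2.0 * math.pi))
        fh1 = cmath.rect(fh_unit, rng.uniform(0.0, 2.0 * math.pi))
        data.append((abs(fp), fh1, abs(fp + occ * fh1) + rng.gauss(0.0, sigma)))
    return data

def log_likelihood(occ, data, sigma=2.0):
    """log L: sum over reflections of the log of the phase-integrated probability."""
    return sum(math.log(phase_integrated(fph, fp_amp, occ * fh1, sigma))
               for fp_amp, fh1, fph in data)

data = simulate_sir(200, occ=0.6)
occs = [0.1 * k for k in range(1, 16)]
best = max(occs, key=lambda o: log_likelihood(o, data))
print(round(best, 1))   # should lie near the simulated occupancy of 0.6
```

Because no single phase is fixed, the log-likelihood still responds to the occupancy even though the native phases are unknown.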

The **MLPHARE** program, a direct descendant of PHARE, implements the likelihood
maximisation procedure, and adjusts the overall and individual heavy-atom
parameters of a set of derivatives simultaneously from initial estimates to optimum
values.

The principle of **vector-space refinement** is very simple: the Patterson is calculated
from the initial heavy-atom parameters, it is compared with the observed
isomorphous difference Patterson, and the parameters are adjusted to minimise the
sum of weighted squared differences between the calculated and observed Pattersons.
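
The vector-space target can be sketched in one dimension. The following Python is an illustration under stated assumptions, not VECREF's implementation: the heavy atoms are points in a 1-D cell of length 1, each Patterson peak is a Gaussian of arbitrary width `width`, all weights are set to 1, and the names `patterson_1d` and `vector_space_residual` are hypothetical. The optional `mask` argument restricts the sum to a chosen set of grid points.

```python
import math

def patterson_1d(positions, occs, grid, width=0.02):
    """Toy 1-D Patterson of point heavy atoms in a cell of length 1: a Gaussian
    peak of height occ_j * occ_k at every interatomic vector x_j - x_k.
    Assumption: Gaussian peak shape of fixed width."""
    values = []
    for u in grid:
        total = 0.0
        for xj, oj in zip(positions, occs):
            for xk, ok in zip(positions, occs):
                d = (u - (xj - xk)) % 1.0
                d = min(d, 1.0 - d)                 # shortest periodic distance
                total += oj * ok * math.exp(-0.5 * (d / width) ** 2)
        values.append(total)
    return values

def vector_space_residual(p_obs, p_calc, weights, mask=None):
    """Sum of weighted squared differences between observed and calculated
    Pattersons, optionally over a subset (mask) of grid-point indices."""
    indices = range(len(p_obs)) if mask is None else mask
    return sum(weights[i] * (p_obs[i] - p_calc[i]) ** 2 for i in indices)

grid = [i / 100 for i in range(100)]
weights = [1.0] * len(grid)                             # Assumption: unit weights
p_obs = patterson_1d([0.10, 0.35], [1.0, 1.0], grid)    # "observed" from the true sites

# Scan the second site with the first fixed at 0.10 and keep the best fit.
scores = {k / 100: vector_space_residual(p_obs, patterson_1d([0.10, k / 100], [1.0, 1.0], grid), weights)
          for k in range(100)}
best = min(scores, key=scores.get)
print(best)   # 0.35
```

Because a Patterson is centrosymmetric, the inverted arrangement (second site at 0.85) fits this 1-D example equally well; the scan simply returns the first of the equally good minima.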

It can be shown that the isomorphous difference Patterson has the same peaks as the heavy-atom Patterson, but at half height, together with additional uncorrelated noise peaks. To reduce the effect of this noise, only the grid points that fall within the peaks of the calculated Patterson, rather than all the grid points, are used in the refinement.
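
The factor of one half can be checked numerically. Patterson peak heights follow from the Fourier coefficients, so the sketch below estimates how the average squared isomorphous difference compares with |*F*_{H}|^{2}. It assumes acentric reflections with a uniformly random phase of **F**_{H} relative to **F**_{P}, |*F*_{P}| much larger than |*F*_{H}|, and no measurement error; the function name `mean_sq_iso_diff` is hypothetical.

```python
import cmath
import math
import random

def mean_sq_iso_diff(fp_amp, fh_amp, n=100_000, seed=0):
    """Monte Carlo average of (|F_PH| - |F_P|)^2 over a uniformly random phase
    of F_H relative to F_P (Assumption: acentric case, no measurement error)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        fph = fp_amp + cmath.rect(fh_amp, rng.uniform(0.0, 2.0 * math.pi))
        total += (abs(fph) - fp_amp) ** 2
    return total / n

# With |F_P| >> |F_H|, |F_PH| - |F_P| is close to |F_H| cos(phi),
# and the average of cos^2(phi) over all phi is 1/2.
ratio = mean_sq_iso_diff(1000.0, 50.0) / 50.0 ** 2
print(ratio)   # close to 0.5
```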

The weight is the reciprocal of the variance of the Patterson density, which depends only
on the position in Patterson space (positions on or near symmetry elements of the
point group have higher variance than average). The **VECREF** program
implements this **real-space** method, in complete contrast to the **reciprocal-space**
method used by MLPHARE and, in fact, by all other heavy-atom refinement programs
(as far as the author is aware).

The isomorphous difference Patterson, of course, contains the complete set of information embodied in the measured native and derivative intensities and, apart from assuming that the overall scale factor is correct, incorporates no additional assumptions. In particular, and in contrast to the difference Fourier, the Patterson does not rely on phase information. Because any Fourier synthesis tends to be dominated by the phases rather than the amplitudes, a difference Fourier is very easily biased by the sites used to calculate the phases.
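
The dominance of phases over amplitudes in a Fourier synthesis can be illustrated with the classic amplitude/phase swap, sketched here in Python with 1-D random signals standing in for electron density (an assumption for illustration; `dft`, `idft` and `pearson` are local helpers, not library functions). A synthesis using the amplitudes of one signal and the phases of another resembles the signal that supplied the phases.

```python
import cmath
import math
import random

def dft(x):
    """Plain O(n^2) discrete Fourier transform (adequate for a small demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(coeffs):
    """Inverse DFT, keeping the real part (the spectra used here are Hermitian)."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

rng = random.Random(42)
n = 128
# Assumption: random 1-D signals as stand-ins for two electron densities.
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
b = [rng.gauss(0.0, 1.0) for _ in range(n)]
A, B = dft(a), dft(b)

# Hybrid synthesis: amplitudes from A, phases from B.
hybrid = idft([abs(ak) * cmath.exp(1j * cmath.phase(bk)) for ak, bk in zip(A, B)])
corr_a, corr_b = pearson(hybrid, a), pearson(hybrid, b)
print(corr_a, corr_b)   # the hybrid correlates far better with b, whose phases it carries
```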

It therefore seems logical to make the essential cross-check against the Patterson an integral part of the refinement. A bonus of this procedure is that wrong sites are much more easily discriminated by the Patterson than by the Fourier, and so instead of the cautious stepwise addition of new sites that is required when refining in reciprocal space, caution can be thrown to the winds, and many new trial sites can be added in one go.

It might be argued that several of the weak peaks in the difference Fourier are not heavy-atom sites at all but are due instead to imperfect isomorphism; against this, it should be remembered that the aim is to model all the differences between the native and derivative structures. Provided there is no significant change in the unit cell dimensions, it does not matter whether the differences are caused by heavy-atom substitution or by movement of atoms in the native structure: the effect is the same.

Although VECREF can be used for the refinement, it does not calculate phases, so MLPHARE is still needed for phasing.
