**truncate hklin** *foo_in.mtz* **hklout** *foo_out.mtz*
[ **plot** *foo.plt* ]

[Keyworded input]

The standard use of the program is to read a file of averaged intensities (output from SCALA, SCALEPACK2MTZ or DTREK2MTZ) and write a file containing mean amplitudes and the original intensities. If anomalous data is present then F(+), F(-), with the anomalous difference, plus I(+) and I(-) are also written out. The amplitudes are put on an approximate absolute scale using the scale factor taken from a Wilson plot.

There are two ways in TRUNCATE to calculate the amplitudes from the intensities. The simplest is just to take the square root of the intensities, setting any negative ones to zero (keyword TRUNCATE NO). Alternatively, the "truncate" procedure (keyword TRUNCATE YES, the default) calculates a best estimate of F from I, sd(I), and the distribution of intensities in resolution shells (see below). This has the effect of forcing all negative observations to be positive, and inflating the weakest reflections (less than about 3 sd), because an observation significantly smaller than the average intensity is likely to be underestimated. See reference below.

This program can be used even if the "truncate" procedure is not desired, since it produces some useful statistics on intensity distributions. These can indicate problems with the data, for instance if the data are extremely anisotropic (see the FALLOFF keyword) or likely to be twinned. For twinning, see the cumulative distribution plot, which becomes sigmoidal for a perfect twin, and the moments of I (or E or z), which differ between twinned and untwinned data.

If the input specified on the LABIN line includes an assignment for F, then no output reflection file will be generated. It is most undesirable to TRUNCATE a set of data where the intensities have already been modified to generate amplitudes.

The general formula for expected moments <I^k> /<I>^k for untwinned acentric data is:

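The k-th moment is Gamma(k+1), which equals k! if k is an integer. For half-integer k (the odd moments of E, since the n-th moment of E is the (n/2)-th moment of I), use the recurrence Gamma(k+1) = k Gamma(k) with Gamma(1/2) = sqrt(PI); for example <E> = Gamma(3/2) = sqrt(PI)/2 = 0.886.

Table of moments:

| Moment | Acentric, untwinned | Acentric, perfect twin | Centric, untwinned | Centric, perfect twin |
|---|---|---|---|---|
| <E> | 0.886 | 0.94 | 0.798 | ? |
| <E^3> | 1.329 | 1.175 | 1.596 | ? |
| <I^2> | 2.0 | 1.5 | 3.0 | ? |
| <I^3> | 6.0 | 3.0 | 15.0 | ? |
| <I^4> | 24.0 | 7.5 | 105.0 | ? |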
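The untwinned acentric moments can be checked numerically. The following is a minimal Python sketch, not part of TRUNCATE; the function names are illustrative, and it uses only the Gamma-function formula stated above.

```python
import math

def acentric_moment_of_i(k):
    """Expected <I^k>/<I>^k for untwinned acentric data: Gamma(k+1)."""
    return math.gamma(k + 1.0)

def acentric_moment_of_e(n):
    """Expected <E^n> for untwinned acentric data.

    E^n corresponds to I^(n/2), so the moment is Gamma(n/2 + 1).
    """
    return math.gamma(n / 2.0 + 1.0)

if __name__ == "__main__":
    for n in (1, 3):
        print("<E^%d> = %.3f" % (n, acentric_moment_of_e(n)))
    for k in (2, 3, 4):
        print("<I^%d> = %.1f" % (k, acentric_moment_of_i(k)))
```

Comparing the observed moments printed by the program with these values is one quick way to spot data that depart from untwinned acentric statistics.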

The scale factor estimated from the Wilson plot is applied to the data and allows the data to be put on a (very approximate) absolute scale. This at least gives amplitudes of a sensible magnitude for further calculations. The calculation relies on the number of residues/atoms given by the keywords NRESIDUE/CONTENTS being roughly correct. The program does not, however, apply any temperature factor.

The various data control lines are identified by keywords. Only the first 4 letters of each keyword are necessary. Most keywords are optional.

ANOMALOUS,CELL,CONTENTS,FALLOFF,HEADER,HISTORY,LABIN,LABOUT,NRESIDUE,PLOT,RANGES,RESOLUTION,RSCALE,SCALE,SYMMETRY,TITLE,TRUNCATE,VPAT

In addition, the following optional keywords control the data harvesting functionality:

DNAME,NOHARVEST,PNAME,PRIVATE,RSIZE,USECWD

**TITLE** (default TITLE='From Truncate') [OPTIONAL INPUT]

Title to write to output reflection file

**RANGES** (Default <nrange>=60) [OPTIONAL INPUT]

<nrange> is the number of resolution bins over the resolution
range for the Wilson plot. <range> is the width of the bins in
4 sin^2(theta)/lambda^2 and is an alternative to <nrange>. The resolution
range used for the Wilson plot is taken from the input data file, or set
with the RESOLUTION keyword. A subset of these bins, covering a resolution
range defined with the RSCALE keyword, is used to estimate the scale and
B-factor.

The use of this card is discouraged, as the choice of the number of bins
is important. With too few bins, the scale and overall B will be less accurate.
With too few reflections per bin, the points may scatter widely about
the straight line, giving a large uncertainty in the values for
scale and B.

If this card is omitted, the program divides the resolution range into
60 bins, then checks that there are no fewer than 40 reflections in any
one bin and, if necessary, reduces the number of bins until this condition
is satisfied. In all cases, the program will stop if the width of the ranges
is greater than 0.03 or the number of bins is greater than 60.
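The default bin selection can be sketched as follows. This is a hedged Python illustration, not the program's code: the function name is hypothetical, and reducing the bin count one at a time is an assumption, since the text only says the number of bins is reduced until every bin holds at least 40 reflections.

```python
def choose_nbins(s_values, s_min, s_max, start_bins=60, min_per_bin=40):
    """Pick the number of equal-width bins in 4 sin^2(theta)/lambda^2.

    s_values: 4 sin^2(theta)/lambda^2 for each reflection.
    Starts at start_bins and lowers the count (assumed step: 1)
    until no bin has fewer than min_per_bin reflections.
    """
    nbins = start_bins
    while nbins > 1:
        width = (s_max - s_min) / nbins
        counts = [0] * nbins
        for s in s_values:
            if s_min <= s <= s_max:
                # Clamp so that s == s_max falls in the last bin.
                k = min(int((s - s_min) / width), nbins - 1)
                counts[k] += 1
        if min(counts) >= min_per_bin:
            break
        nbins -= 1
    return nbins
```

For example, 400 reflections spread evenly over the range cannot fill more than 10 bins with 40 reflections each, so the sketch settles on 10 bins.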

**RESOLUTION** [OPTIONAL INPUT]

Resolution limits, given either as 4 sin^2(theta)/lambda^2 or as d in Angstroms
(in either order). Reflections outside these limits will be excluded from
all analysis and omitted on output. Defaults are taken from the range of
data in the input file (*i.e.* all data included).
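The two ways of expressing a resolution limit are related through Bragg's law (lambda = 2 d sin theta), which gives 4 sin^2(theta)/lambda^2 = 1/d^2. A small Python sketch of the conversion, with illustrative function names:

```python
import math

def d_to_four_ssq(d):
    """Convert d-spacing in Angstroms to 4 sin^2(theta)/lambda^2 = 1/d^2."""
    return 1.0 / (d * d)

def four_ssq_to_d(s):
    """Convert 4 sin^2(theta)/lambda^2 back to d-spacing in Angstroms."""
    return 1.0 / math.sqrt(s)
```

A limit of d = 2.0 A therefore corresponds to 4 sin^2(theta)/lambda^2 = 0.25.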

**RSCALE** [OPTIONAL INPUT]

Resolution limits for scaling (either 4 sin^2(theta)/lambda^2 or d). This option allows you to exclude low resolution reflections from the calculation of the scale and B factor. However, all points in the range defined by RESOLUTION are plotted on the Wilson plot. It is probably a good idea to use only high resolution data (beyond 3A, if you have any data there) for the scale and B factor, because the assumptions behind Wilson statistics are invalid for low resolution data. The default high resolution limit is the same as RESOLUTION. The low resolution limit is, by default, set to 4.0A if the high resolution limit is greater than 3.5A.

**SCALE** [OPTIONAL INPUT]

The default is to apply a scale factor from the Wilson plot. If a scale factor is given here, then that is applied instead. This option is useful if relative scaling is already done in SCALA.

If amplitudes rather than intensities are specified on the LABIN line,
then the Wilson scale is *not* applied, and a default scale
of 100 is used.

**FALLOFF** [OPTIONAL INPUT]

The first argument of the FALLOFF keyword should be "YES" or "NO", followed optionally by subkeywords controlling the detailed behaviour. The default is "YES", which triggers an analysis of the anisotropy of the data according to the "falloff" procedure contributed by Yorgo Modis. This calculates the falloff of mean F and mean F/sigma values as a function of (sin theta/lambda)^2 in 3 orthogonal directions. An overall falloff of all reflections is also calculated. The 3 mutually perpendicular directions are:

- Direction 2 = b*-axis
- Direction 3 = perpendicular to a* and b*
- Direction 1 = perpendicular to b* and Direction 3

If either of the subkeywords PLTX or PLTY is specified, then an output plot file (PLOT) is produced, in which Direction 1 is plotted as a thick line, Direction 2 is plotted as a hollow line with boxes at regular intervals of resolution, and Direction 3 is plotted as a thin line. The resolution range and number of resolution bins used in the calculation can be set by the keywords RESOLUTION and RANGES respectively.

Subkeywords:

- CONE <cone>: The falloff of mean F-values along each orthogonal direction is calculated
using reflections falling within a cone orientated along that direction.
<cone> is the angle the surface of the cone makes with the associated
direction. Reflections which are located at a greater angle than
<cone> from the closest direction will not be included in the falloff
calculations. Default: 30.0 degrees.
- PLTX | PLTY: Produce an output plot file (PLOT) and orientate it horizontally
or vertically. Default is horizontally (PLTX).
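The three directions and the cone test can be illustrated with a short Python sketch. It is not the falloff code itself. It assumes the reciprocal-axis vectors a* and b* and the reflection's scattering vector are already expressed in Cartesian coordinates, all function names are hypothetical, and using the absolute cosine (so that a reflection and its Friedel mate are treated alike) is an assumption.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    norm = math.sqrt(sum(x * x for x in u))
    return tuple(x / norm for x in u)

def falloff_directions(a_star, b_star):
    """Return unit vectors (d1, d2, d3) as defined for FALLOFF."""
    d2 = unit(b_star)                 # Direction 2 = b* axis
    d3 = unit(cross(a_star, b_star))  # Direction 3 is perpendicular to a* and b*
    d1 = unit(cross(d2, d3))          # Direction 1 is perpendicular to b* and d3
    return d1, d2, d3

def assign_direction(s, directions, cone_deg=30.0):
    """Return 1, 2 or 3 for the closest direction, or None if the
    reflection lies outside the cone around every direction."""
    s_hat = unit(s)
    best, best_angle = None, None
    for index, d in enumerate(directions, start=1):
        # Assumption: sign is ignored, so s and -s give the same angle.
        cosine = abs(sum(x * y for x, y in zip(s_hat, d)))
        angle = math.degrees(math.acos(min(1.0, cosine)))
        if best_angle is None or angle < best_angle:
            best, best_angle = index, angle
    return best if best_angle <= cone_deg else None
```

With the default 30 degree cone, a reflection along a body diagonal of an orthogonal reciprocal cell is about 55 degrees from every direction and so contributes to none of the three falloff curves.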

**LABIN** Specify input column labels. [OPTIONAL INPUT]

Truncate takes output from SCALA, SCALEPACK2MTZ or DTREK2MTZ, which generate standard labels. This is the most common usage of the program, in which case LABIN records are not required. If LABIN is given, you must assign either IMEAN/SIGIMEAN or F/SIGF. If F is assigned, there will be no reflections output.

The program labels defined are: IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-) F SIGF F(+) SIGF(+) F(-) SIGF(-).

- IMEAN: Original average Structure Intensity
- SIGIMEAN: Standard deviation of the above
- I(+): Structure Intensity of hkl
- SIGI(+): Standard deviation of the above
- I(-): Structure Intensity of -h -k -l
- SIGI(-): Standard deviation of the above
- F: Original average Structure Amplitude
- SIGF: Standard deviation of the above
- F(+): Structure Amplitude of hkl
- SIGF(+): Standard deviation of the above
- F(-): Structure Amplitude of -h -k -l
- SIGF(-): Standard deviation of the above

**LABOUT** Specify output column labels. [OPTIONAL INPUT]

The labels allowed are F SIGF DANO SIGDANO F(+) SIGF(+) F(-) SIGF(-) IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-) ISYM. The output labels will default to these unless they are changed by assigning a program label to a user label.

- F: Structure Amplitude
- SIGF: Standard deviation of the above
- DANO: Anomalous difference
- SIGDANO: Standard deviation of the above
- F(+): Structure Amplitude for hkl
- SIGF(+): Standard deviation of the above
- F(-): Structure Amplitude for -h -k -l
- SIGF(-): Standard deviation of the above
- IMEAN: Original average Structure Intensity
- SIGIMEAN: Standard deviation of the above
- I(+): Structure Intensity of hkl
- SIGI(+): Standard deviation of the above
- I(-): Structure Intensity of -h -k -l
- SIGI(-): Standard deviation of the above
- ISYM: Symmetry number for F: normally 0, but 1 or 2 if the F column arises entirely from F+ or F- reflections respectively

If there is no anomalous data present then only the appropriate columns (F, SIGF, IMEAN and SIGIMEAN) are output. Values may be given in any order and as either Proglabel=Userlabel or Userlabel=Proglabel.

**CONTENTS** [ALTERNATIVE COMPULSORY INPUT]

The keyword is followed by the number of atoms of each type in the asymmetric unit, including hydrogens.

A maximum of 20 atom (element) types is allowed, each followed by a
number, *e.g.*

CONTENTS H 746 C 454 N 115 O 139 S 12 ! Must include hydrogens

The average scattering power is calculated from a table of form factors. By default the file $CLIBD/atomsf.lib contains this table of form factors. You can change the table used by assigning 'ATOMSF' to your preferred file. [NOTE the program RWCONTENTS provides the information for this keyword; how many Carbons etc., from a PDB file. Also, it gives an estimate of the number of hydrogens there would be.]

**NRESIDUE** [ALTERNATIVE COMPULSORY INPUT]

<Nres> is the number of residues expected in the asymmetric unit

A very approximate atom composition is calculated:

- mean mass of an amino acid = 110
- adding one ordered water per amino acid = ca. 128

The number of atoms in the asymmetric unit is then taken as 5 C + 1.35 N + 1.5 O + 8 H per residue.
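The per-residue composition translates into atom counts as in the following Python sketch. The function names are illustrative, and rounding to whole atoms when formatting a CONTENTS-style line is an assumption made for readability only.

```python
# Approximate atoms per residue, as quoted above.
PER_RESIDUE = {"C": 5.0, "N": 1.35, "O": 1.5, "H": 8.0}

def approximate_contents(nres):
    """Approximate atom counts in the asymmetric unit for nres residues."""
    return {element: count * nres for element, count in PER_RESIDUE.items()}

def contents_line(nres):
    """Format the counts in the style of a CONTENTS record (rounded)."""
    counts = approximate_contents(nres)
    parts = ["%s %d" % (el, round(counts[el])) for el in ("H", "C", "N", "O")]
    return "CONTENTS " + " ".join(parts)
```

For 100 residues this gives roughly 500 C, 135 N, 150 O and 800 H atoms.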

**VPAT** [OPTIONAL INPUT]

Volume per atom (default 10).

**PLOT** [OPTIONAL INPUT]

PLOT or PLOT ON produces extra ASCII plots in the log output. The default is PLOT OFF.

**HEADER** [OPTIONAL INPUT]

Controls printout from reading file and batch headers

- file header printing:
- batch header printing:

**HISTORY** [OPTIONAL INPUT]

History strings to be added to history records in output file

**ANOMALOUS** [OPTIONAL INPUT]

Controls whether anomalous differences are output. Defaults to YES if anomalous information is present in the input file, otherwise NO.

**TRUNCATE** [OPTIONAL INPUT]

If YES (default) the data will be truncated according to the procedure of French and Wilson. If NO the data are not truncated but the structure amplitudes are calculated simply by taking the square root of the intensities. Negative intensities are set to zero.
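The simple conversion used when truncation is switched off is a clipped square root. A minimal Python sketch, with a hypothetical function name, and leaving aside how the standard deviation of F is derived because that is not described here:

```python
import math

def simple_amplitude(i_obs):
    """Amplitude from intensity without truncation:
    sqrt(I) for positive I, and zero for negative (or zero) I."""
    return math.sqrt(i_obs) if i_obs > 0.0 else 0.0
```

All negative observations collapse to F = 0 under this rule, which is the information loss that the default French and Wilson treatment is designed to avoid.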

**SYMMETRY** [OPTIONAL INPUT]

Default is to use symmetry in input HKLIN file. (Normally OMIT this command.)

**CELL** [OPTIONAL INPUT]

The cell dimensions in Angstroms and degrees. The angles default to 90 degrees. If this keyword is omitted then the cell dimensions are taken from the input file (normally OMIT this command).

Provided a Project Name and a Dataset Name are specified (either explicitly or from the MTZ file) and provided the NOHARVEST keyword is not given, the program will automatically produce a data harvesting file. This file will be written to

`$HARVESTHOME`/`DepositFiles`/*<projectname>*/*<datasetname>.truncate*

The environment variable `$HARVESTHOME` defaults to the user's
home directory, but could be changed, for example, to a group project
directory.

**PNAME** Project Name. In most cases, this will be inherited from the MTZ file.

A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name
pair. The project-name specifies a particular structure solution project,
while the dataset-name specifies a particular dataset contributing to the
structure solution. An entry in the PNAME keyword should therefore be
accompanied by a corresponding entry in the DNAME keyword.

**DNAME** Dataset Name. In most cases, this will be inherited from the MTZ file.

A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name
pair. The project-name specifies a particular structure solution project,
while the dataset-name specifies a particular dataset contributing to the
structure solution. An entry in the DNAME keyword should therefore be
accompanied by a corresponding entry in the PNAME keyword.

**PRIVATE** Set the directory permissions to '700', *i.e.* read/write/execute for
the user only (default '755').

**USECWD** Write the deposit file to the current directory, rather than a
subdirectory of `$HARVESTHOME`. This can
be used to send deposit files from speculative runs to the local directory
rather than the official project directory, or can be used
when the program is being run on a machine without access to the directory
`$HARVESTHOME`.

**RSIZE** Maximum width of a row in the deposit file (default 80). <row_length> should be between 80 and 132 characters.

**NOHARVEST** Do not write out a deposit file; default is to do so provided Project and Dataset names are available.

**The input files are:**

Control data file.

- HKLIN - Input reflection data file in MTZ format.
This will contain one record per reflection with the following items:

  - SCALA (with or without anomalous): H K L IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-)
  - SCALEPACK2MTZ or DTREK2MTZ (with anomalous): H K L IMEAN SIGIMEAN I(+) SIGI(+) I(-) SIGI(-)
  - Any set of amplitudes in MTZ format for which you wish to analyse statistics: H K L F SIGF F(+) SIGF(+) F(-) SIGF(-)

where I+ and I- are used for the anomalous differences

**The output files are:**

- HKLOUT - Output reflection data file. Not generated if F is assigned as an input column.
The output file is a reflection data file in standard MTZ format (*i.e.* one record/reflection) containing 7 or 18 items per reflection as follows (see above for labels used):

  H K L F SDF [DANOM SD(DANOM) F(+) SDF(+) F(-) SDF(-)] Imean SDImean [I(+) SDI(+) I(-) SDI(-) ISYM]

  The Fs are multiplied by the Wilson scale and the Is are multiplied by the square of the Wilson scale / 100.0.
- PLOT - plot file showing fall-off of mean F in 3 perpendicular directions.
Produced if either of the subkeywords PLTX or PLTY of the FALLOFF keyword is specified. This can be viewed with XPLOT84DRIVER, or converted to postscript with PLTDEV.

The printer output starts with details of the control data and details of the input MTZ reflection data file. Analyses of the data against resolution are then given and include intensity distributions for comparison with Wilson's theoretical distributions. The following graphs are output (which can be viewed via XLOGGRAPH or LOGGRAPH):

- Wilson plot.
- Moments of the intensity for acentric and centric data. These plots may indicate twinning, see above.
- Cumulative intensity distribution for both acentric and centric data compared to theoretical predictions. If twinning is present, these curves become sigmoidal (and will therefore be shifted to the right of the theoretical curves for some or all of the plot), although anisotropic diffraction can confuse this picture.
- Plots of mean amplitude (F) and mean amplitude/standard deviation (F/sd) calculated for the data in resolution shells, and plotted against resolution.
- Mean F, mean F/sd and number of reflections calculated for data lying in the region of 3 orthogonal directions d1, d2, d3 (see the FALLOFF keyword). Differences in the falloff rate in the 3 directions indicate anisotropy of the data.
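The theoretical cumulative distributions are not written out in this document. The Python sketch below uses the standard Wilson-statistics expressions for N(z), the fraction of reflections with I/<I> below z, and adds the usual perfect-twin acentric curve for comparison. The function names are illustrative, and these formulas are textbook results rather than something taken from the program.

```python
import math

def n_acentric(z):
    """Untwinned acentric data: N(z) = 1 - exp(-z)."""
    return 1.0 - math.exp(-z)

def n_centric(z):
    """Untwinned centric data: N(z) = erf(sqrt(z/2))."""
    return math.erf(math.sqrt(z / 2.0))

def n_acentric_perfect_twin(z):
    """Perfectly twinned acentric data: N(z) = 1 - (1 + 2z) exp(-2z)."""
    return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)

if __name__ == "__main__":
    for z in (0.1, 0.2, 0.4, 0.8):
        print("z=%.1f  acentric=%.3f  twin=%.3f  centric=%.3f"
              % (z, n_acentric(z), n_acentric_perfect_twin(z), n_centric(z)))
```

At small z the twinned curve lies well below the untwinned acentric one, which is the sigmoidal shape referred to in the list above.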

The program TRUNCATE reads a reflection data file of averaged intensities
(SCALA, SCALEPACK2MTZ or DTREK2MTZ output) and outputs an MTZ reflection data file
containing F and DeltaFanom values. The input intensities are assumed to
follow a normal distribution with the given standard deviations, *i.e.* negative
observations must have been preserved. The truncation procedure used was
devised by French and Wilson and is based on
Bayesian statistics. The F's are calculated using the prior knowledge of
Wilson's distributions for acentric or centric data (calculated in shells
of reciprocal space in a first pass through the data) and the mean intensity
and standard deviation values. The F's output are all positive and follow
Wilson's distribution. The truncation procedure has little effect on reflections
larger than 3 standard deviations but should give significantly better
values for the weak data than those obtained by merely taking the square
root of the intensities and setting negative intensities to zero. Reflections
with intensities of less than minus four standard deviations are rejected.
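The Bayesian estimate can be illustrated for the acentric case with a short Python sketch. This is not the program's implementation: it evaluates the posterior mean of F by direct numerical integration, the function name and the integration settings are assumptions, and `mean_i` stands for the mean intensity of the reflection's resolution shell.

```python
import math

def french_wilson_acentric(i_obs, sd, mean_i, n_steps=4000):
    """Posterior mean of F = sqrt(J) for one acentric reflection.

    Assumes I_obs ~ Normal(J, sd^2) and the acentric Wilson prior
    p(J) = exp(-J / mean_i) / mean_i for true intensity J >= 0.
    """
    def log_weight(j):
        # Log of prior times likelihood, up to an additive constant.
        return -j / mean_i - 0.5 * ((i_obs - j) / sd) ** 2

    # The posterior peaks at i_obs - sd^2/mean_i, clipped at zero.
    mode = max(0.0, i_obs - sd * sd / mean_i)
    lower = max(0.0, mode - 10.0 * sd)
    upper = mode + 10.0 * sd
    step = (upper - lower) / n_steps
    ref = log_weight(mode)  # subtracted to avoid underflow

    num = den = 0.0
    for k in range(n_steps):
        j = lower + (k + 0.5) * step  # midpoint rule
        w = math.exp(log_weight(j) - ref)
        num += math.sqrt(j) * w
        den += w
    return num / den
```

A strong reflection (for example I = 100, sd = 1) comes out essentially at sqrt(I), while a negative observation still yields a small positive F, in line with the behaviour described above.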

The following warnings should be heeded:

- Do not truncate data more than once, *e.g.* do not merge truncated data with untruncated data and then truncate again.
- The standard deviations are crucial to the procedure and the standard deviation analysis from SCALA should be checked.
- The procedure should not be used on data which has been forced to be positive, *e.g.* from ordinate analysis measurements on a diffractometer or where negative observations have been set to zero.

The Wilson plot part of the program attempts to calculate an absolute scale and temperature factor for a set of observed intensities, using the theory of A.J.C. Wilson. This says that IF the atoms are randomly distributed through the asymmetric unit THEN

<f^2> should equal scale * <Fobs^2> * exp(-2B sin^2(theta)/lambda^2)

By fitting a least squares line through ln(<f^2>/<Fobs^2>) against 2 sin^2(theta)/lambda^2 the program derives the scale and B value.
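The straight-line fit can be sketched in Python as follows. The sketch takes the relation exactly as it is written in this section, so that the intercept is ln(scale) and the slope is -B; sign and scale conventions differ between texts, so treat this as an illustration rather than a description of the program's internals. The function names and the per-shell input lists are assumptions.

```python
import math

def fit_line(x, y):
    """Unweighted least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def wilson_scale_and_b(ssq, mean_f2, mean_fobs2):
    """Estimate (scale, B) from per-shell averages.

    ssq:        sin^2(theta)/lambda^2 at the centre of each shell
    mean_f2:    expected <f^2> per shell from the cell contents
    mean_fobs2: observed <Fobs^2> per shell
    """
    x = [2.0 * s for s in ssq]
    y = [math.log(f2 / fo2) for f2, fo2 in zip(mean_f2, mean_fobs2)]
    slope, intercept = fit_line(x, y)
    return math.exp(intercept), -slope
```

Only the shells inside the RSCALE range would be passed to such a fit, even though every shell appears on the plot.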

For real structures the assumption that the atoms are randomly distributed is obviously incorrect. The effect of this is most obvious in the low resolution reflections. The Wilson plot will deviate from a straight line from about 3.0A - 4.0A downwards. Although all the points on the Wilson plot are plotted, the scale and B are only determined from a limited resolution range determined by the user (see keyword RSCALE).

There may be a problem in evaluating <Fobs^2> if all the weak data have been systematically omitted. This should NOT be the case for data measured in any proper manner; if it IS the case, the truncate procedure will also fail and you need to use TRUNCATE NO. The program estimates the expected number of reflections in each resolution shell and then calculates <Fobs^2> by dividing by the number of predicted reflections.

scala, scalepack2mtz, dtrek2mtz, rwcontents, Data Harvesting

- French G.S. and Wilson K.S. Acta Cryst. (1978), A34, 517.

K.S. Wilson and S. French

"falloff" program contributed by Yorgo MODIS, European Molecular Biology Lab (original program: W.G.J. HOL/SINEKE BREEN, part of the Groningen BIOMOL package). Incorporated into TRUNCATE by Martyn Winn.