CAD (CCP4: Supported Program)

NAME

cad - Collect and sort crystallographic reflection data from several files, to generate a single set.

SYNOPSIS

cad hklin1 foo_in_1.mtz hklin2 foo_in_2.mtz ... hklini foo_in_i.mtz hklout foo.mtz
[Keyworded input]

DESCRIPTION

Uses:

Combine and sort reflection data from up to 9 input reflection data files into a single output data file, with various possible operations being performed on the input data items. For example, you can specify a new cell or spacegroup, change column names and/or types, etc. Data can be converted from one area of reciprocal space to another, converting phases, Hendrickson-Lattmann coefficients (providing all 4 are present) and anomalous differences appropriately.
Unless otherwise instructed, the program places output data in the CCP4 asymmetric unit (which sometimes differs from that in the International Tables), and sorts it to a standard order. This is an important step when importing data from other packages. It is thus a good idea to run data through CAD after converting it to MTZ format with f2mtz.
Extend reflection data to cover more of reciprocal space. For example, it is convenient for many purposes to extend cubic data to include hkl, klh and lhk. Or you may want to run refinement calculations in spacegroup P1.
Prepare data for translation functions of various types, e.g. tffc or rsearch.
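
As a rough illustration of the cubic extension mentioned above, the following Python sketch (not part of CAD) lists the cyclic permutations hkl, klh and lhk of one reflection index. The function name is hypothetical, and only the index bookkeeping is shown; the handling of the associated data items is not.

```python
def cyclic_permutations(hkl):
    """Return the index triples hkl, klh and lhk for one reflection."""
    h, k, l = hkl
    return [(h, k, l), (k, l, h), (l, h, k)]


# Example: the three cyclically related indices of reflection (1, 2, 3).
print(cyclic_permutations((1, 2, 3)))
```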

INPUT AND OUTPUT FILES

The input files are one or more (up to 9) reflection data files in MTZ format, assigned to HKLIN1, HKLIN2, ... HKLIN9.

The output file is a reflection data file in MTZ format.

Each non-unique column (other than H, K, L) in the input files generates an output column of the same name with the number of the input file appended, so that column <name> from file <number> is output with the label <name><number>. The labels of columns which appear in only a single input file are preserved.
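
The default labelling rule can be sketched in Python as follows. This is an illustration only, not CAD's own code: the function name and the dictionary layout are hypothetical, and labels are assumed to be unique within any one input file.

```python
from collections import Counter

INDEX_LABELS = {"H", "K", "L"}


def output_labels(files):
    """Map (file_number, input_label) to the default output label.

    `files` maps an input file number to the list of column labels
    selected from that file.
    """
    # Count in how many places each non-index label occurs.
    counts = Counter(
        label
        for labels in files.values()
        for label in labels
        if label not in INDEX_LABELS
    )
    result = {}
    for number, labels in files.items():
        for label in labels:
            if label in INDEX_LABELS:
                continue  # H, K, L are shared and never renamed
            if counts[label] > 1:
                # Non-unique label: append the input file number.
                result[(number, label)] = f"{label}{number}"
            else:
                # Label occurs in a single file only: keep it as it is.
                result[(number, label)] = label
    return result


labels = output_labels({
    1: ["H", "K", "L", "F", "SIGF"],
    2: ["H", "K", "L", "F", "PHIC"],
})
print(labels)
```

Here F occurs in both files and so becomes F1 and F2, while SIGF and PHIC keep their names.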

Missing data items, i.e. empty column entries corresponding to reflections that occur in some input files but not in the input file contributing that particular column, are represented by Missing Number Flags (see VALM keyword). A particularly important example of this is the use of CAD to fill in missing data in a dataset with MNFs, thus completing the dataset. More details can be found in the unique documentation.
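
The filling-in of missing entries can be pictured with the Python sketch below. It is a simplified model under stated assumptions, not CAD's implementation: each input file is represented as a hypothetical (column names, rows) pair, and NaN stands in for the Missing Number Flag.

```python
import math


def merge_columns(files, mnf=math.nan):
    """Merge per-file reflection tables into a single table.

    `files` is a list of (column_names, rows) pairs, where `rows` maps an
    (h, k, l) tuple to a list of values, one per column.
    Returns (all_column_names, merged_rows).
    """
    all_columns = [name for names, _ in files for name in names]
    all_hkl = sorted({hkl for _, rows in files for hkl in rows})
    merged = {}
    for hkl in all_hkl:
        record = []
        for names, rows in files:
            # A reflection absent from this file gets the MNF in each
            # of the columns that this file contributes.
            record.extend(rows.get(hkl, [mnf] * len(names)))
        merged[hkl] = record
    return all_columns, merged


columns, merged = merge_columns([
    (["F"], {(1, 0, 0): [10.0], (1, 1, 0): [20.0]}),
    (["PHIC"], {(1, 0, 0): [45.0]}),
])
print(columns)
print(merged)
```

Reflection (1, 1, 0) occurs only in the first table, so its PHIC entry is flagged as missing.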

KEYWORDED INPUT

The various data control lines are identified by keywords, those available being:

CELL, CENTRIC_ONLY, CTYPIN, END, HISTORY, LABIN (compulsory), LABOUT, MONITOR, OUTLIM, REFMONITOR, RESOLUTION, SCALE, SORT, SYMMETRY, SYSAB_KEEP, TITLE, VALM

In addition, there are a few keywords for editing dataset information in the MTZ file header:

DCELL, DNAME, DWAVELENGTH, PNAME

General Keywords

LABIN FILE_NUMBER [ ALL | <column assignment> ... ]

(Compulsory.) A line giving the names of the input data items to be selected from input file FILE_NUMBER. Up to 29 columns can be specified for input from each HKLIN. If you want to pick up all items from a file, and there are fewer than 30 items excluding H K L, then you can specify

LABIN FILE_NUMBER ALL

e.g.: LABI FILE_NUMBER 1 E1=F E2=SIGF E3=FC E4=PHIC ... E29=SIGFau (E<j> stands for ENTRY<j>.)

LABOUT <column assignment> ...

A line giving the new names for the data items which will be written to HKLOUT. Output labels can be changed if you wish, but the default is to keep the input label, unique-ified with the input file number if necessary (see above). E.g.:

LABO FILE_NUMBER 1 E1=Fnat1 E2=SIGFnat1

This changes the first 2 labels and leaves all the rest the same.

CTYPIN FILE <program label>=<type> ...

A line giving the names of the data types to be assigned to the entries selected for FILE. The default is to leave the input datatypes unaltered.

The data types for the different types of data which can be present in an MTZ file are as follows:
H F J D G K Q L M P W A B Y I R [ U V ]

H: index h,k,l
F: structure amplitude, F
J: intensity
D: anomalous difference
G: member of Friedel pair, F+ or F-
K: member of Friedel pair, I+ or I-
Q: standard deviation of J,F,D or other
L: standard deviation of F+ or F-
M: standard deviation of I+ or I-
P: phase angle in degrees
W: weight (of some sort)
A: phase probability coefficients (Hendrickson/Lattmann)
B: BATCH number
Y: M/ISYM, packed partial/reject flag and symmetry number
I: any other integer
R: any other real

It is essential to have correct column types for PHASES and ANOMALOUS differences:

to distinguish phases which will require changing if the reflection is moved to a symmetry equivalent;
anomalous differences which require changing sign if the reflection is changed to a Friedel pair.
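
The Python sketch below illustrates why the column type matters. It assumes the usual convention that moving a reflection to its Friedel mate negates the indices, the phase (type P) and the anomalous difference (type D), and it leaves every other column untouched. Only the Friedel case is shown; the phase shift for a symmetry operator with a translational part, and the handling of F+/F- style columns, are not covered. The function name and record layout are hypothetical.

```python
def to_friedel_mate(hkl, values, types):
    """Move one reflection record to its Friedel mate.

    `values` maps column label -> value; `types` maps column label -> MTZ
    column type letter. Only types P and D are changed in this sketch.
    """
    h, k, l = hkl
    moved = {}
    for label, value in values.items():
        col_type = types[label]
        if col_type == "P":
            # Phase in degrees: negate and bring back into [0, 360).
            moved[label] = (-value) % 360.0
        elif col_type == "D":
            # Anomalous difference changes sign.
            moved[label] = -value
        else:
            # Everything else is copied unchanged in this simplified model.
            moved[label] = value
    return (-h, -k, -l), moved


new_hkl, new_values = to_friedel_mate(
    (1, 2, 3),
    {"F": 100.0, "PHI": 30.0, "DANO": 5.0},
    {"F": "F", "PHI": "P", "DANO": "D"},
)
print(new_hkl, new_values)
```

If PHI were wrongly typed as R, it would be copied unchanged and the phase would no longer match the moved reflection.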

In addition two special data types are used to signal that you are preparing data for translation functions of various types. They are:

U: partial FC
V: partial PHIC

There must be only one FCpart PHICpart pair per input file, and they must be the last items specified for LABIN. CAD generates equivalent reflections using only the ROTATIONAL part of the primitive symmetry operators (i.e., if the spacegroup is P212121, these reflections are analysed as though the spacegroup were P222). This is allowed for in the TFFC and RSEARCH programs; see their documentation.

For the above example their output labels would be
FC1 PHIC1 FC2 PHIC2 ... FCnsymp PHICnsymp
where nsymp is the number of primitive symmetry operators.
See example.
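
The label pattern above can be written out with a few lines of Python. This is only a sketch of the naming scheme, with a hypothetical function name; nsymp = 4 is used as an example value, corresponding to the four rotational operators of P222.

```python
def partial_fc_labels(nsymp):
    """Return FC1 PHIC1 ... FCnsymp PHICnsymp as a list of labels."""
    labels = []
    for i in range(1, nsymp + 1):
        labels.extend([f"FC{i}", f"PHIC{i}"])
    return labels


print(" ".join(partial_fc_labels(4)))
```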

CELL <a> <b> <c> [ <alpha> <beta> <gamma> ]

The values given override all input cells from MTZ files. The default is to take cell information from HKLIN1.

CENTRIC_ONLY

Only output centric terms.

HISTORY <string>

History strings to be added to the output MTZ file HKLOUT.

MONITOR NONE | BRIEF | HIST | FULL

Controls how much MTZ file header information is printed:

NONE

(default) no header information output

BRIEF

brief header output

HIST

brief + mtz history

FULL

full header output

OUTLIM [ SPACEGROUP <spacegroup> ] [ HKLLIM <hmin> <hmax> <kmin> <kmax> <lmin> <lmax> ]

Defines limits for the OUTPUT file. Use this for expanding data to cover more of reciprocal space. Subsidiary keywords:

SPACEGROUP <name or number of spacegroup>: this is used to choose a Laue code defined for the appropriate point group. The name (or number) corresponds to the spacegroup whose limits are used. NB: This does NOT alter the symmetry operators stored in the MTZ file. In the unlikely event of wanting to change these, use the keyword SYMMETRY.
HKLLIM <hmin> <hmax> <kmin> <kmax> <lmin> <lmax>: used to set your own choice of hkl limits. It is better to use the spacegroup to choose a Laue group. Using HKLLIM often duplicates reflections with a zero index.

REFMONITOR <nmon>

The program prints lots of information about every <nmon>-th reflection (default 0).

RESOLUTION [ OVERALL <dmin> <dmax> ] | [ FILE_NUMBER <dmin> <dmax> ]

Use either:

RESOLUTION OVERALL <dmin> <dmax>: for overall resolution limits, or:
RESOLUTION FILE_NUMBER <dmin> <dmax>: to set input limits for FILE_NUMBER.

<dmin>, <dmax> are the resolution limits for the data to be included, i.e. data are included for which
(1/<dmax>)**2 <= 4 sin**2theta/lambda**2 <= (1/<dmin>)**2
NOTE: Defaults are 0.1 and 1000.0 Angstrom.
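
A minimal Python sketch of such a resolution test is given below. It assumes that the limits are d-spacings in Angstrom with dmin <= d <= dmax (which is consistent with the defaults of 0.1 and 1000.0), and uses the relation 4 sin**2theta/lambda**2 = 1/d**2. For simplicity it handles only cells with alpha = beta = gamma = 90 degrees; the function names are hypothetical.

```python
def inv_d_squared(hkl, cell):
    """1/d**2 for a cell with all angles equal to 90 degrees."""
    h, k, l = hkl
    a, b, c = cell
    return (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2


def within_resolution(hkl, cell, dmin=0.1, dmax=1000.0):
    """True if the reflection's d-spacing lies between dmin and dmax."""
    s2 = inv_d_squared(hkl, cell)
    return (1.0 / dmax) ** 2 <= s2 <= (1.0 / dmin) ** 2


cell = (50.0, 60.0, 70.0)
# d = 10 A for (5, 0, 0): kept with limits of 2 A and 20 A.
print(within_resolution((5, 0, 0), cell, dmin=2.0, dmax=20.0))
```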

SCALE FILE_NUMBER <scale> [ <temperature_factor> ]

Specifies <scale> (and optionally <temperature_factor>) to be applied to all items in FILE_NUMBER which are flagged as type F D Q (or G L for F+ F- alternatives), i.e. all items except intensities and PHASES.

The scale is applied as <scale> exp( -<temperature_factor> s**2)

(If <temperature_factor> is not explicitly supplied then it defaults to 0.0)
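
The scaling rule can be sketched in Python as below. This is an illustration rather than CAD's code: the function names are hypothetical, and because s is not defined in this document the sketch simply takes s**2 as an input value.

```python
import math

# Column types that are scaled: F D Q, plus G L for the F+/F- alternatives.
SCALED_TYPES = {"F", "D", "Q", "G", "L"}


def scale_factor(scale, temperature_factor=0.0, s_squared=0.0):
    """Return <scale> * exp(-<temperature_factor> * s**2)."""
    return scale * math.exp(-temperature_factor * s_squared)


def apply_scale(values, types, scale, temperature_factor=0.0, s_squared=0.0):
    """Scale the amplitude-like columns of one reflection record.

    `values` maps column label -> value; `types` maps column label -> MTZ
    column type letter. Intensities, phases and other types are unchanged.
    """
    factor = scale_factor(scale, temperature_factor, s_squared)
    return {
        label: value * factor if types[label] in SCALED_TYPES else value
        for label, value in values.items()
    }


record = {"F": 10.0, "SIGF": 1.0, "PHI": 30.0, "I": 100.0}
types = {"F": "F", "SIGF": "Q", "PHI": "P", "I": "J"}
print(apply_scale(record, types, scale=2.0))
```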

SORT <sort order>

Sort order for indices H K and L, e.g.


   SORT H K L
   SORT L K H

This means that the first index will be the slowest, the second the intermediate, and the last the fastest varying, e.g. SORT H K L will have H slowest, K intermediate and L fastest. Note that SORT H K L is the default sort order (i.e. that used in the absence of the SORT keyword), so that SORT is only necessary when you require a sort order which is different from this default.
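
In Python terms, with the first named index varying slowest and the last fastest, the ordering corresponds to sorting on a key built from the indices in the stated order. The sketch below is illustrative only and the function name is hypothetical.

```python
def sort_reflections(reflections, order="H K L"):
    """Sort (h, k, l) tuples; first named index slowest, last fastest."""
    position = {"H": 0, "K": 1, "L": 2}
    idx = [position[name] for name in order.upper().split()]
    return sorted(reflections, key=lambda hkl: tuple(hkl[i] for i in idx))


reflections = [(1, 0, 2), (0, 1, 0), (1, 0, 1), (0, 0, 3)]
print(sort_reflections(reflections, "H K L"))
print(sort_reflections(reflections, "L K H"))
```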

SYMMETRY <spacegroup>

This can be used to change the symmetry operators in the output file. The default is to keep the symmetry of the first input file, HKLIN1.

SYSAB_KEEP

Keep systematic absences in output file. (The default is to reject them.)

TITLE <title>

Title to be used in output log file and in output hkl file.

VALM <valml> [NOOUTPUT]

The Missing Number Flag (MNF) written to HKLOUT is set to <valml>, which can take the value NaN or be a real number. If this keyword is not set, then the value of the MNF is taken from the header of HKLIN1 or set to NaN if it is not present there. If NOOUTPUT is specified then reflections with all data items missing are not output to HKLOUT.

END

Terminate input.

Dataset keywords

The following keywords allow you to change the dataset headers in MTZ files. These are necessarily complicated to allow for all possibilities! The Graphical User Interface has an interface to these options called Edit MTZ Project & Dataset which is much more user friendly!

PNAME FILE_NUMBER <program label> = <project name> ...

A line assigning project names to the columns of the input data selected from FILE_NUMBER to be read from HKLIN. The program labels should be a subset of those assigned on LABIN. Ranges can be specified with the subkeyword TO, or all program labels can be selected with the subkeyword ALL. In addition, for the first file, the program label HKL can be used to change the project name of the 3 indices. Examples:


PNAME FILE_NUMBER 1 E5=toxd
PNAME FILE_NUMBER 2 E2 TO E4=toxd
PNAME FILE_NUMBER 3 E1=toxd E2 TO E4=rnase E5 TO E6=toxd
PNAME FILE_NUMBER 4 ALL=toxd
PNAME FILE_NUMBER 1 HKL=proj1 E4 TO E5=proj2

This keyword can be used to assign a project name where there was previously none, or to replace an existing assignment.
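
To make the range syntax concrete, the Python sketch below expands the assignment part of such a line (the text after the file number) into one entry per program label. It is a hypothetical helper written for this illustration, not CAD's parser, and it assumes that program labels have the form E<j>, with ALL and HKL passed through as they are.

```python
import re


def entry_number(label):
    """Return j for a program label of the form E<j>."""
    match = re.fullmatch(r"E(\d+)", label, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not an E<j> program label: {label!r}")
    return int(match.group(1))


def expand_assignments(text):
    """Expand e.g. 'E1=toxd E2 TO E4=rnase' into {label: name}."""
    # Put spaces around '=' so 'E5=toxd' and 'E5 = toxd' tokenize alike.
    tokens = text.replace("=", " = ").split()
    result = {}
    i = 0
    while i < len(tokens):
        first = last = tokens[i]
        i += 1
        if i < len(tokens) and tokens[i].upper() == "TO":
            last = tokens[i + 1]  # end of the range
            i += 2
        if i >= len(tokens) or tokens[i] != "=":
            raise ValueError(f"expected '=' after {last!r}")
        name = tokens[i + 1]
        i += 2
        if first.upper() in ("ALL", "HKL"):
            result[first.upper()] = name
            continue
        for n in range(entry_number(first), entry_number(last) + 1):
            result[f"E{n}"] = name
    return result


print(expand_assignments("E1=toxd E2 TO E4=rnase E5 TO E6=toxd"))
```

For the third example line above this gives toxd for E1, E5 and E6, and rnase for E2, E3 and E4.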

A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name pair. The project-name specifies a particular structure solution project, while the dataset-name specifies a particular dataset contributing to the structure solution. If either the PNAME keyword or the DNAME keyword or both are specified for a particular column, then the dataset assigned for that column will be changed (either to an existing dataset, or a new one). There should only be one PNAME card per file (use continuation lines if necessary).

DNAME FILE_NUMBER <program label> = <dataset name> ...

A line assigning dataset names to the columns of the input data selected from FILE_NUMBER to be read from HKLIN. The syntax is the same as for the PNAME keyword. This keyword can be used to assign a dataset name where there was previously none, or to replace an existing assignment.

A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name pair. The project-name specifies a particular structure solution project, while the dataset-name specifies a particular dataset contributing to the structure solution. If either the PNAME keyword or the DNAME keyword or both are specified for a particular column, then the dataset assigned for that column will be changed (either to an existing dataset, or a new one). There should only be one DNAME card per file (use continuation lines if necessary).

DCELL FILE_NUMBER [ <pname> <dname> | <dataset ID> ] <a> <b> <c> [ <alpha> <beta> <gamma> ]

Keyword for changing cell information for specific datasets from FILE_NUMBER read from HKLIN. The dataset is identified either by a pname/dname pair, or by the dataset number. The latter is the number listed by MTZDUMP when run on HKLIN. Note that this number may be different in HKLOUT. If you want to change the cell information for several datasets, then this keyword can be included more than once.

DWAVELENGTH FILE_NUMBER [ <pname> <dname> | <dataset ID> ] <wavelength>

Keyword for adding/changing wavelength information for specific datasets from FILE_NUMBER read from HKLIN. The dataset is identified either by a pname/dname pair, or by the dataset number. The latter is the number listed by MTZDUMP when run on HKLIN. Note that this number may be different in HKLOUT. If you want to change the wavelength information for several datasets, then this keyword can be included more than once.

PRINTER OUTPUT

The printer output first gives details taken from the input control data.

Then, for each input reflection data file, the information in the MTZ header, according to the requested level of monitoring. The labels are checked for consistency with those in the file, and the list of output labels is prepared.

The reflection data for each file are read and a summary table of the data is output.

The total number of reflection records in the output file is printed, followed by a summary of HKLOUT.

EXAMPLES

Simple Unix example scripts can be found in $CEXAM/unix/runnable/

cad.exam (Example of combining several files and example of data being extended to P1).

(A vms version found in $CEXAM/vms/cad.com)

cad_rnase.exam (Example of adding project- and dataset-information to an mtz file).

Also found combined with other programs in the example scripts ($CEXAM/unix/runnable/)

tffc_procedure (Combining two files prior to running tffc).

RF-with-Es (Use in Rotation Function using Es procedure).

scalepack2mtz.exam (Use in getting scalepack data into CCP4).

phased_translation_calc (Example of extending phased MTZ file from P212121 to P1).

...and non-runnable examples in $CEXAM/unix/non-runnable/

cad_then_mtzutils.exam (Example of how to save time using both cad and mtzutils).

cad_raxis.exam (f2mtz+cad on Raxis data).

mlphare_heavyatoms.exam (Extend isomorphously phased file from P212121 to P1).

AUTHOR

Eleanor Dodson, York University