CAD (CCP4: Supported Program)
NAME
cad - Collect and sort crystallographic reflection data from several files, to generate a single set.
SYNOPSIS
cad hklin1 foo_in_1.mtz hklin2 foo_in_2.mtz
... hklini foo_in_i.mtz hklout foo.mtz
INPUT AND OUTPUT FILES
The input files are one or more (up to 9) reflection data files in MTZ format, assigned to HKLIN1, HKLIN2, ... HKLIN9.
The output file is a reflection data file in MTZ format.
Each non-unique column (other than H,K,L) in the input files will generate an output column of the same name with the number of the input file appended, so that column <name> from file <number> is output with the label <name><number>. Otherwise the labels of columns which only appear in a single input file are preserved.
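The labelling rule above can be sketched in Python. This is only an illustration of the rule as described, not CAD's own code; the function name output_labels and the example column names are hypothetical.

```python
from collections import Counter


def output_labels(files):
    """Return the output labels for the columns of each input file.

    `files` maps an input file number to its list of column labels
    (H, K, L excluded). A label that occurs in more than one file gets
    the file number appended; a label found in one file only is kept.
    """
    # Count in how many input files each label appears.
    counts = Counter(label for labels in files.values() for label in set(labels))
    return {
        number: [f"{label}{number}" if counts[label] > 1 else label
                 for label in labels]
        for number, labels in files.items()
    }


# Hypothetical example: F and SIGF occur in both files, PHIC only in file 2.
print(output_labels({1: ["F", "SIGF"], 2: ["F", "SIGF", "PHIC"]}))
```

In this sketch F and SIGF become F1, SIGF1, F2 and SIGF2, while PHIC keeps its name.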
Missing data items, i.e. empty column entries corresponding to reflections that occur in some input files but not in the input file contributing that particular column, are represented by Missing Number Flags (see VALM keyword). A particularly important example of this is the use of CAD to fill in missing data in a dataset with MNFs, thus completing the dataset. More details can be found in the unique documentation.
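As a rough illustration of how missing entries arise, the following Python sketch merges two reflection lists and fills the gaps with a Missing Number Flag. It assumes each file is reduced to a dictionary keyed on (h, k, l) with a single data value; the name merge_reflections and the data values are hypothetical, and this is not how CAD itself is implemented.

```python
import math


def merge_reflections(file1, file2, mnf=math.nan):
    """Merge two {(h, k, l): value} maps into {(h, k, l): [value1, value2]}.

    A reflection present in only one file gets the Missing Number
    Flag `mnf` in the column contributed by the other file.
    """
    merged = {}
    for hkl in sorted(set(file1) | set(file2)):
        merged[hkl] = [file1.get(hkl, mnf), file2.get(hkl, mnf)]
    return merged


# Hypothetical data: (1, 0, 0) is only in file 1, (2, 0, 0) only in file 2.
file1 = {(1, 0, 0): 10.0, (1, 1, 0): 20.0}
file2 = {(1, 1, 0): 5.0, (2, 0, 0): 7.0}
for hkl, values in merge_reflections(file1, file2).items():
    print(hkl, values)
```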
KEYWORDED INPUT
The various data control lines are identified by keywords. Those available are:
CELL, CENTRIC ONLY, CTYPIN, END, HISTORY, LABIN(compulsory), LABOUT, MONITOR, OUTLIM, REFMONITOR, RESOLUTION, SCALE, SORT, SYMMETRY, SYSAB KEEP, TITLE, VALM
In addition, there are a few keywords for editing dataset information in the MTZ file header:
DCELL, DNAME, DWAVELENGTH, PNAME
LABIN FILE_NUMBER <i> [ ALL | <column assignment> ... ]
(Compulsory.) A line giving the names of the input data items to be selected from FILE_NUMBER <i>, which is read from HKLIN<i>. Up to 29 columns can be specified for input from each HKLIN<i>. If you want to pick up all items from a file, AND there are fewer than 30 items excluding H K L, then you can specify ALL instead of listing the column assignments.
e.g.: LABI FILE_NUMBER 1 E1=F E2=SIGF E3=FC E4=PHIC ... E29=SIGFau (E<j> stands for ENTRY<j>.)
LABOUT <column assignment> ...
A line giving the new names for the data items which will be written to HKLOUT. Output labels can be changed if you wish, but the default is to keep the input label, unique-ified with the input file number if necessary (see above). E.g.:
This changes the first 2 labels and leaves all the rest the same.
CTYPIN FILE <i> <program label>=<type> ...
A line giving the names of the data types to be assigned to the entries selected for FILE <i> . The default is to leave the input datatypes unaltered.
The data types for the different types of data which can be present in an MTZ file are as follows:
It is essential to have correct column types for PHASES and ANOMALOUS differences:
In addition two special data types are used to signal that you are preparing data for translation functions of various types. They are:
There must be only one FCpart PHICpart per input file, and they must be the last items specified for LABIN. CAD generates equivalent reflections using only the ROTATIONAL part of the primitive symmetry operator (i.e. if the spacegroup is P212121, these reflections are analysed as though the spacegroup were P222). This is allowed for in the TFFC and RSEARCH programs; see their documentation.
For the above example their output labels would be
CELL <a> <b> <c> [ <alpha> <beta> <gamma> ]
The values given override all input cells from the MTZ files. The default is to take cell information from HKLIN1.
CENTRIC ONLY
Only output centric terms.
HISTORY
History strings to be added to the MTZ output file HKLOUT.
MONITOR NONE | BRIEF | HIST | FULL
Printing MTZ file header information as:
OUTLIM [ SPACEGROUP <spacegroup> ] [ HKLLIM <hmin> <hmax> <kmin> <kmax> <lmin> <lmax> ]
Defines limits for the OUTPUT file. Use this for expanding data to cover more of reciprocal space. Subsidiary keywords:
REFMONITOR <nmon>
The program prints lots of information about every <nmon>-th reflection (default 0).
RESOLUTION [ OVERALL <dmin> <dmax> ] | [ FILE_NUMBER <i> <dmin> <dmax> ]
<dmin> and <dmax> are the resolution limits for the data to be included, i.e. data are included for which the d-spacing lies between the two limits.
SCALE FILE_NUMBER <i> <scale> [ <temperature_factor> ]
Specifies <scale> (and optionally <temperature_factor>) to be applied to all items in FILE_NUMBER which are flagged as type F D Q (or G L for F+ F- alternatives), i.e. all items except intensities and PHASES.
The scale is applied as <scale> exp( -<temperature_factor> s**2)
(If <temperature_factor> is not explicitly supplied then it defaults to 0.0)
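The scaling formula can be written out as a short Python sketch. The text above does not define s; it is assumed here to be sin(theta)/lambda, i.e. 1/(2d) for a reflection with d-spacing d. The function name scaled_value and the numbers are hypothetical.

```python
import math


def scaled_value(value, s, scale, temperature_factor=0.0):
    """Apply <scale> * exp(-<temperature_factor> * s**2) to one data value.

    ASSUMPTION: `s` is sin(theta)/lambda, which equals 1/(2d).
    """
    return value * scale * math.exp(-temperature_factor * s ** 2)


d = 2.0               # hypothetical d-spacing of the reflection
s = 1.0 / (2.0 * d)   # assumed definition of s, see above

# With no temperature factor only the plain scale is applied.
print(scaled_value(100.0, s, scale=2.0))
# A positive temperature factor damps high-resolution terms more strongly.
print(scaled_value(100.0, s, scale=2.0, temperature_factor=16.0))
```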
SORT <sort order>
Sort order for indices H K and L, e.g.
SORT H K L
SORT L K H
This means that the first index will be the slowest, the second the intermediate, and the last the fastest varying, e.g. SORT H K L will have H slowest, K intermediate and L fastest. Note that SORT H K L is the default sort order (i.e. that used in the absence of the SORT keyword), so SORT is only necessary when you require a sort order which is different from this default.
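The meaning of "slowest" and "fastest varying" can be illustrated with a small Python sketch, in which the first index named is the primary sort key and the last is the least significant. The function name sort_reflections and the reflection list are hypothetical; this is not CAD's own sorting code.

```python
def sort_reflections(reflections, order="H K L"):
    """Sort (h, k, l) tuples so that the first index named in `order`
    varies slowest and the last varies fastest."""
    position = {"H": 0, "K": 1, "L": 2}
    key_indices = [position[axis] for axis in order.upper().split()]
    return sorted(reflections, key=lambda hkl: tuple(hkl[i] for i in key_indices))


reflections = [(1, 0, 1), (0, 1, 0), (0, 0, 1), (1, 0, 0)]

# H slowest, L fastest (the default order).
print(sort_reflections(reflections, "H K L"))
# L slowest, H fastest.
print(sort_reflections(reflections, "L K H"))
```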
SYMMETRY
This can be used to change the symmetry operators in the output file. The default is to keep the symmetry of the first input file, HKLIN1.
SYSAB KEEP
Keep systematic absences in output file. (The default is to reject them.)
TITLE
Title to be used in output log file and in output hkl file.
VALM <valml> [NOOUTPUT]
The Missing Number Flag (MNF) written to HKLOUT is set to <valml>, which can take the value NaN or be a real number. If this keyword is not set, then the value of the MNF is taken from the header of HKLIN1 or set to NaN if it is not present there. If NOOUTPUT is specified then reflections with all data items missing are not output to HKLOUT.
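The effect of NOOUTPUT can be pictured with the following Python sketch, which drops reflections whose data items are all equal to the Missing Number Flag, whether that flag is NaN or a real number. The names is_missing and drop_empty and the data are hypothetical; the sketch only mirrors the behaviour described above.

```python
import math


def is_missing(value, mnf):
    """True if `value` equals the Missing Number Flag (NaN or a real number)."""
    if math.isnan(mnf):
        # NaN never compares equal to itself, so it needs a dedicated test.
        return math.isnan(value)
    return value == mnf


def drop_empty(rows, mnf=math.nan):
    """Keep only reflections that have at least one non-missing data item."""
    return {hkl: values for hkl, values in rows.items()
            if not all(is_missing(v, mnf) for v in values)}


rows = {(1, 0, 0): [math.nan, math.nan], (1, 1, 0): [3.0, math.nan]}
print(list(drop_empty(rows)))   # only (1, 1, 0) has any data
```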
Dataset keywords
The following keywords allow you to change the dataset headers in MTZ files. These are necessarily complicated to allow for all possibilities! The Graphical User Interface has an interface to these options called Edit MTZ Project & Dataset which is much more user friendly!
PNAME FILE_NUMBER <i> <program label> = <project name> ...
A line assigning project names to the columns of the input data selected from FILE_NUMBER <i> to be read from HKLIN<i>. The program labels should be a subset of those assigned on LABIN. Ranges can be specified with the subkeyword TO, or all program labels can be selected with the subkeyword ALL. In addition, for the first file, the program label HKL can be used to change the project name of the 3 indices. Examples:
PNAME FILE_NUMBER 1 E5=toxd
PNAME FILE_NUMBER 2 E2 TO E4=toxd
PNAME FILE_NUMBER 3 E1=toxd E2 TO E4=rnase E5 TO E6=toxd
PNAME FILE_NUMBER 4 ALL=toxd
PNAME FILE_NUMBER 1 HKL=proj1 E4 TO E5=proj2
This keyword can be used to assign a project name where there was previously none, or to replace an existing assignment.
A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name pair. The project-name specifies a particular structure solution project, while the dataset-name specifies a particular dataset contributing to the structure solution. If either the PNAME keyword or the DNAME keyword or both are specified for a particular column, then the dataset assigned for that column will be changed (either to an existing dataset, or a new one). There should only be one PNAME card per file (use continuation lines if necessary).
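The TO and ALL subkeywords can be understood as shorthand for a list of single assignments. The Python sketch below expands the assignment part of a PNAME (or DNAME) line into one name per program label. It is a hypothetical helper written for illustration, not CAD's parser: the name expand_assignments is invented, ranges are assumed to follow the order of the supplied program labels, and the special HKL label is not handled.

```python
import re


def expand_assignments(spec, program_labels):
    """Expand e.g. 'E1=toxd E2 TO E4=rnase' into {program label: name}.

    Each assignment is '<label>=<name>', '<label> TO <label>=<name>'
    or 'ALL=<name>'. ASSUMPTION: a range covers the labels between its
    end points in the order given by `program_labels`.
    """
    pattern = re.compile(r"(\w+)(?:\s+TO\s+(\w+))?\s*=\s*(\S+)")
    result = {}
    for first, last, name in pattern.findall(spec):
        if first == "ALL":
            selected = program_labels
        elif last:
            start, end = program_labels.index(first), program_labels.index(last)
            selected = program_labels[start:end + 1]
        else:
            selected = [first]
        for label in selected:
            result[label] = name
    return result


labels = ["E1", "E2", "E3", "E4", "E5", "E6"]
print(expand_assignments("E1=toxd E2 TO E4=rnase E5 TO E6=toxd", labels))
print(expand_assignments("ALL=toxd", labels))
```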
DNAME FILE_NUMBER <i> <program label> = <dataset name> ...
A line assigning dataset names to the columns of the input data selected from FILE_NUMBER <i> to be read from HKLIN<i>. The syntax is the same as for the PNAME keyword. This keyword can be used to assign a dataset name where there was previously none, or to replace an existing assignment.
A dataset, as listed in the MTZ header, is specified by a project-name/dataset-name pair. The project-name specifies a particular structure solution project, while the dataset-name specifies a particular dataset contributing to the structure solution. If either the PNAME keyword or the DNAME keyword or both are specified for a particular column, then the dataset assigned for that column will be changed (either to an existing dataset, or a new one). There should only be one DNAME card per file (use continuation lines if necessary).
DCELL FILE_NUMBER <i> [ <pname> <dname> | <dataset ID> ] <a> <b> <c> [ <alpha> <beta> <gamma> ]
Keyword for changing cell information for specific datasets from FILE_NUMBER <i> read from HKLIN<i>. The dataset is identified either by a pname/dname pair, or by the dataset number. The latter is the number listed by MTZDUMP when run on HKLIN<i>. Note that this number may be different in HKLOUT. If you want to change the cell information for several datasets, then this keyword can be included more than once.
DWAVELENGTH FILE_NUMBER <i> [ <pname> <dname> | <dataset ID> ] <wavelength>
Keyword for adding/changing wavelength information for specific datasets from FILE_NUMBER <i> read from HKLIN<i>. The dataset is identified either by a pname/dname pair, or by the dataset number. The latter is the number listed by MTZDUMP when run on HKLIN<i>. Note that this number may be different in HKLOUT. If you want to change the wavelength information for several datasets, then this keyword can be included more than once.
PRINTER OUTPUT
The printer output first gives details taken from the input control data.
Then, for each input reflection data file, the information in the MTZ header is printed, according to the requested level of monitoring. The labels are checked for consistency with those in the file, and the list of output labels is prepared.
The reflection data for each file are read and a summary table of the data is output.
The total number of reflection records in the output file is printed, followed by a summary of HKLOUT.
EXAMPLES
Simple Unix example scripts can be found in $CEXAM/unix/runnable/
(A VMS version can be found in $CEXAM/vms/cad.com)
CAD is also used in combination with other programs in the example scripts ($CEXAM/unix/runnable/)
....and in non-runnable examples in $CEXAM/unix/non-runnable/
SEE ALSO
mtzutils, rsearch, tffc, unique.
AUTHOR
Eleanor Dodson, York University