WATNCS (CCP4: Supported Program)

NAME

watncs - Pick waters which follow NCS and sort out to NCS asymmetric unit

SYNOPSIS

watncs
[Keyworded input]

DESCRIPTION

Author:
Guoguang Lu Department of Molecular Biophysics Lund University Box 124, 221 00, Lund, Sweden
E-mail:
Guoguang.Lu@mbfys.lu.se
Contents:
  1. Introduction
  2. Example Command File
  3. Keyworded Input
  4. More Hints

Introduction

In protein crystallographic refinement, it is important to avoid false water molecules, which introduce noise into the electron density map. When non-crystallographic symmetry (NCS) exists in the crystal, many water molecules bound to the protein should follow the same NCS as their host protein. From all the water candidates found in difference Fourier maps, WATNCS picks those which follow the NCS operations. It can also sort them into the NCS asymmetric units, so that NCS restraints can be applied to them in crystallographic refinement. In the first few cycles of adding water, such a procedure is a powerful way to steer the refinement in the right direction.

WATNCS reads a coordinate file (in PDB format) and the NCS operations. Groups of atoms satisfying all NCS operations are written to a file with identical residue numbers but different chain names, so that refinement programs such as XPLOR or REFMAC can recognize them as NCS-equivalent atoms. The program can also write out atoms which only partially satisfy the NCS operations (for example, 2 of 3).
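
The selection idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's actual algorithm: the names apply_op and find_mate are hypothetical, an operation is assumed to be a (rotation, translation) pair acting on orthogonal coordinates, and a water is taken to "follow" an operation when another candidate lies within the allowed error of its transformed position.

```python
import math

def apply_op(op, x):
    """Apply X2 = A*X1 + t, where op = (A, t) with A a 3x3 nested list."""
    A, t = op
    return [sum(A[i][j] * x[j] for j in range(3)) + t[i] for i in range(3)]

def find_mate(x, op, candidates, error):
    """Return the index of the candidate closest to op(x),
    or None if no candidate lies within `error` of it."""
    target = apply_op(op, x)
    best, best_d = None, error
    for k, c in enumerate(candidates):
        d = math.dist(target, c)
        if d <= best_d:
            best, best_d = k, d
    return best

# Hypothetical data: an exact two-fold about z and three water candidates.
twofold_z = ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])
waters = [[1.0, 2.0, 3.0], [-1.1, -2.0, 3.0], [5.0, 5.0, 5.0]]
mate_of_first = find_mate(waters[0], twofold_z, waters, 0.6)
mate_of_third = find_mate(waters[2], twofold_z, waters, 0.6)
```

Note that a water lying on a rotation axis would be returned as its own mate by this sketch; a real implementation would have to decide how to treat that case.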

Example Command File

UNIX example script found in $CEXAM/unix/runnable/

Another example

#
rm fort.1*
ln -s $SCR/junk1.pdb fort.11
ln -s $SCR/junk2.pdb fort.12
ln -s $SCR/junk3.pdb fort.13
ln -s $SCR/junk4.pdb fort.14
watncs << 'end'
pdb wat1wchk.pdb
out wat1wncs.pdb
mol refm2w.pdb
RELATE 
 -0.9999970     -4.2974250E-04 -2.4399513E-03 
 -1.4226431E-03 -0.7066850      0.7075269    
 -2.0283312E-03  0.7075282      0.7066822    
  0.4821968      9.3696594E-02 -0.1603031 
RELATE 
  0.9999976      2.0712232E-03  7.7215023E-04 
  2.0715299E-03 -0.9999978     -3.9639443E-04 
  7.7132753E-04  3.9799302E-04 -0.9999996    
  0.3441153     -1.6441345E-03 -0.5252590    
RELATE 
 -0.9999971     -2.3997077E-03 -2.4492259E-04 
 -1.5290348E-03  0.7091355     -0.7050705    
  1.8656463E-03 -0.7050681     -0.7091371    
  0.5303268     -0.2367973     -0.4432688    
error 0.6
least 2
!CHAIN W U V X Y Z S T
group W A
group U B
group V E
group X F
number 61
num1 127
atom O1
residue HOH
occu 1.0
temp 30.
'end'

Keyworded Input

General: Each line which starts with "!" or "#" will be ignored. Available keywords are:

ATOM, CHAIN, ERROR, GROUP, LEAST, MOL, NUM2, NUMBER, OCCU, OUT, PDB, RELATE, RESIDUE, TEMP.

PDB <filename>

Input file name of the water molecules

OUT <filename>

Output file name of the water molecules

MOL <filename> (optional)

Input file name of the protein molecule. This is used when you want to sort the chain names of the water molecules according to the chain names of the protein molecules. If you have 2-fold rotational NCS (or 222 symmetry), the protein file also helps to put each water molecule into the "right" asymmetric unit. See GROUP. Note: the file must NOT contain any water molecules.

RELATE 3*4 matrix - a11,a21,a31,a12,a22,a32,a13,a23,a33,tx,ty,tz

     (a11 a12 a13)      (tx)
X2 = (a21 a22 a23)*X1 + (ty)
     (a31 a32 a33)      (tz)
The NCS rotation and translation (the matrix must start on a new line, after the keyword). The translation should be in orthogonal coordinates (assuming the PDB file is). The command can be repeated, once per NCS operation. If you have coordinates of the protein molecules, the matrix can be obtained from the program FIT or from O. If you use FIT, its output file MATRIX contains this matrix. If you use O, you can use lsq_exp to store the matrix under a name such as lsq_rt_atob, then write it to a file with the command "write .lsq_rt_atob filename (3f10.6)". The matrix can be taken from that file.
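
As an illustration of the ordering given in the RELATE heading, the sketch below builds the rotation matrix and translation from twelve numbers and applies X2 = A*X1 + t. It assumes the numbers are read exactly in the listed order (a11, a21, a31, ..., tx, ty, tz), i.e. the first three values form the first column of the matrix. The function names parse_relate and apply_relate are hypothetical and are not part of WATNCS.

```python
def parse_relate(values):
    """Build (A, t) from 12 numbers ordered
    a11,a21,a31,a12,a22,a32,a13,a23,a33,tx,ty,tz."""
    values = list(values)
    if len(values) != 12:
        raise ValueError("RELATE needs 9 matrix elements and 3 translations")
    # Each consecutive triple is one COLUMN of the rotation matrix.
    cols = [values[0:3], values[3:6], values[6:9]]
    A = [[cols[j][i] for j in range(3)] for i in range(3)]
    t = values[9:12]
    return A, t

def apply_relate(A, t, x1):
    """Return X2 = A*X1 + t for a point given in orthogonal coordinates."""
    return [sum(A[i][j] * x1[j] for j in range(3)) + t[i] for i in range(3)]

# Hypothetical operation: 90 degree rotation about z plus a shift of (1, 2, 3).
A, t = parse_relate([0, 1, 0,  -1, 0, 0,  0, 0, 1,  1, 2, 3])
x2 = apply_relate(A, t, [1, 0, 0])
```

If the matrix comes from a source that lists it row by row, the transpose would be needed before applying it, so it is worth checking the operation on a known pair of NCS-related atoms.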

ERROR <error>

Allowed error range for NCS-related water molecules. I recommend 2-3 times the RMS value of the protein superposition, or 1/3 of the data resolution.
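
The two rules of thumb above amount to simple arithmetic, sketched here. The helper name suggested_error and the input values are hypothetical; the result is only a starting point for the ERROR value, in the same units as the inputs.

```python
def suggested_error(rms=None, resolution=None):
    """Return candidate ERROR values from the two rules of thumb:
    2-3 times the superposition RMS, or one third of the resolution."""
    out = {}
    if rms is not None:
        out["from_rms"] = (2.0 * rms, 3.0 * rms)
    if resolution is not None:
        out["from_resolution"] = resolution / 3.0
    return out

# Hypothetical inputs: RMS of 0.25 after superposition, data to 1.8 resolution.
hint = suggested_error(rms=0.25, resolution=1.8)
```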

LEAST <least>

Minimum number of NCS operations which selected waters must follow. In the example file there is NCS with 222 symmetry, which means there are 3 NCS operations. If 4 water molecules follow this 222 symmetry, the program considers all 3 NCS operations satisfied and writes out these 4 atoms with an identical residue number but different chain names (W, U, V and X in this case). If 1 of the atoms is missing, the program considers 2 operations satisfied. With LEAST 2, as in the example file, the program still writes them out, but with another chain ID (Y) and without identical residue numbers. If 2 of the 4 atoms are missing, the program considers only 1 NCS operation satisfied, which is below the LEAST requirement of 2 here, so they are not written to the output file. If you need them output too, change the command to "LEAST 1". The default is the number of NCS operations.
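
The counting rule described above can be summarised in a short sketch. It is a simplified reading of the description, not the program's code: the function name classify_group and the labels "full", "partial" and "rejected" are hypothetical, and a group of n NCS-related waters is taken to satisfy n - 1 operations.

```python
def classify_group(n_found, n_ops, least=None):
    """Classify a group of NCS-related waters.

    n_found: number of waters found in the group (including the first one)
    n_ops:   number of NCS operations (RELATE commands)
    least:   LEAST value; defaults to n_ops as stated in the text
    """
    if least is None:
        least = n_ops
    satisfied = n_found - 1
    if satisfied >= n_ops:
        return "full"       # same residue number, NCS chain names
    if satisfied >= least:
        return "partial"    # written out under another chain ID
    return "rejected"       # not written to the output file

# The 222 example from the text: 3 operations, LEAST 2.
results = [classify_group(n, 3, least=2) for n in (4, 3, 2)]
```

With LEAST 1, the group of two waters in the example would be classed as "partial" instead of "rejected".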

CHAIN <chain1> <chain2> ....

Chain names for the output waters. If you have 3 NCS operations, NCS restraints can be applied to the waters with the first 4 chain names. In fact, it is possible to apply NCS restraints to all the water molecules output by this program, by reading the log file and writing the commands for the refinement program carefully, although I believe most people only have the patience to use the restraints recommended by the program.

GROUP <Chain_water> <Chain_protein>

If you have 2-fold symmetry, the program might put a water molecule in the "wrong" NCS asymmetric unit (giving it a wrong chain name). This is not a problem at all for NCS restraints. However, if you want to sort the water molecules to the right protein, you have to input the protein coordinates and tell the program which water chain name corresponds to which protein chain name. In the example file, output waters with chain W will belong to protein A, and so on. See also MOL.
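
One way to picture the sorting is the sketch below, which gives each water the chain name paired with the protein chain of its nearest protein atom. The nearest-atom criterion is an assumption made for illustration only; the text does not say how WATNCS decides which protein a water belongs to. The names nearest_chain and water_chain_for are hypothetical.

```python
import math

def nearest_chain(water, protein_atoms):
    """protein_atoms is a list of (chain_id, (x, y, z)) tuples.
    Return the chain ID of the protein atom closest to the water."""
    return min(protein_atoms, key=lambda atom: math.dist(water, atom[1]))[0]

def water_chain_for(water, protein_atoms, group):
    """group maps protein chain -> water chain, mirroring GROUP lines
    such as 'group W A' (water chain W belongs to protein chain A)."""
    return group[nearest_chain(water, protein_atoms)]

# Hypothetical two-chain protein and the pairing from 'group W A', 'group U B'.
protein = [("A", (0.0, 0.0, 0.0)), ("B", (10.0, 0.0, 0.0))]
group = {"A": "W", "B": "U"}
chain1 = water_chain_for((1.0, 0.0, 0.0), protein, group)
chain2 = water_chain_for((9.0, 1.0, 0.0), protein, group)
```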

NUMBER <number>

For those water molecules for NCS restraint, the output residue number will start after this number.

NUM2 <num2>

For those water molecules not for NCS restraint, the output residue number will start after this number.

ATOM <atomname>

If this command is present, the water atoms will be output with the given atom name instead of the name from input file.

RESIDUE <HOH>

If this command is present, the water atoms will be output with the given residue name instead of the name from input file.

OCCU <occupancy_value>

If this command is present, the water atoms will be output with the given occupancy value instead of the value from input file.

TEMP <Bfactor>

If this command is present, the water atoms will be output with the given B factor instead of the value from input file.

More Hints

Automatically adding waters to models with NCS
The author recommends adding waters after all or most of the protein atoms have been defined. Then one can do the following steps.

  1. Calculate an Fo-Fc map.
  2. Search for peaks in the map with a peak-searching program (such as PEAKMAX in CCP4).
  3. Run a stereochemistry filtering program to select waters which have good interactions with the protein (or other waters). I use PEAKCHECK made by Janet Smith. (I think watpeak in CCP4 can do a similar job.)
  4. Use watncs to remove the molecules which do not fit the NCS. Usually in the first cycle I only use waters which satisfy all the NCS operations. In the 2nd or 3rd cycle, I allow waters which fit only part of the NCS. There is almost no chance that false peaks fit both the stereochemistry and the NCS, so the user hardly needs to check these newly added waters interactively on graphics.
  5. For those waters which fit all the NCS operations, introduce NCS restraints in the refinement program.
  6. Repeat steps 1 to 5 until R-free converges. The old waters which do not have all their NCS-related mates (Y chain waters in the example file) must be input together with the new waters. Otherwise some "good waters" which appear in an improved electron density map and are NCS-related to old waters might be missed.
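
The bookkeeping in the last step, feeding the old partially matched waters back in with the new candidates, could be sketched as follows. This is a hypothetical helper, not part of any CCP4 program: it assumes standard PDB records with the chain ID in column 22, and that the partial waters carry chain ID Y as in the example file.

```python
def merge_water_candidates(old_lines, new_lines, partial_chain="Y"):
    """Combine old waters lacking some NCS mates with new peak candidates.

    old_lines: lines of the previous watncs output file
    new_lines: lines of the new candidate water file
    Only ATOM/HETATM records are kept; from the old file only those
    whose chain ID (column 22, index 21) equals partial_chain.
    """
    records = ("ATOM", "HETATM")
    old_partial = [l for l in old_lines
                   if l.startswith(records) and len(l) > 21
                   and l[21] == partial_chain]
    new_waters = [l for l in new_lines if l.startswith(records)]
    return old_partial + new_waters

# Hypothetical minimal records: only the record name and chain column are filled.
old = ["HETATM".ljust(21) + "Y", "HETATM".ljust(21) + "W", "REMARK old"]
new = ["HETATM".ljust(21) + " ", "END"]
merged = merge_water_candidates(old, new)
```

The merged list would then be written to the file given to the PDB keyword in the next watncs run.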

There are always real water molecules which do not follow the NCS because of packing or other reasons. One has to run the above procedure without WATNCS at least once and check these water molecules interactively. However, this should be done in the last few cycles.

The automatic procedure cannot prevent waters being added at sites which protein atoms should occupy (when the protein atoms have been misplaced elsewhere). However, statistically most water molecules are correct and the procedure can significantly improve the map, so the problem can show up automatically when the atoms with negative values in the Fo-Fc map are listed using the DIFLIST program.

It cannot be prevented that some compounds, such as citrate, are identified as waters. In this case, I think users can find the problem by checking only the protein atoms, without checking all the waters.

The advantage of the above automatic procedure is that it makes sure all the water molecules in the first few cycles are real. The map is improved by real molecules and NCS restraints. Later water adding will then be based on better maps.