PROTIN (CCP4: Unsupported Program)

NAME

protin - prepare restraints file for old refmac (obsolete).

SYNOPSIS

protin xyzin foo_cycle_i.pdb dictprotn protin.dic protcounts foo_cycle_i.counts protout foo_cycle_i.protout
[Keyworded input]

CONTENTS

  1. Program description
  2. Keyworded input
  3. Examples
  4. Input and output files
  5. Notes
  6. Program output
  7. Error messages
  8. References
  9. Program function
  10. Output files PROTOUT and PROTCOUNTS
  11. Format of restraint dictionaries e.g. protin.dic

DESCRIPTION

PROTIN is run before the restrained maximum likelihood refinement program REFMAC to prepare an input file which contains the ideal and observed atomic parameters and details of the restraints to be applied. The program uses a dictionary of ideal protein geometry. Three dictionaries are available: the default, which is based on John Priestle's dictionary with added co-factors; John Priestle's original; and a dictionary from Victor Lamzin. See DICTPROTN below.

There are some examples of input for different cases given below.

KEYWORDED INPUT

The various data control lines are identified by the following keywords:

CHNNAM, CHNTYP, CONTACTS, DISULPHIDE, HATOM, LIST, NONX, PEPP, SPECIAL, SYMMETRY, TITLE, VDWCUT, VDWRADII, END.

DESCRIPTION OF KEYWORDS

Compulsory keywords are: CHNNAM CHNTYP

CHNNAM ID <chainid> CHNTYP <ichtyp> [ROFFSET <nroffset>]

This keyword must be given once for each chain in the input coordinate file.
For example: CHNNAM ID A CHNTYP 1 ROFFSET 200

Subsidiary keywords:

ID <chainid>
This is the single character chain identifier which is present in the input coordinate file. If you have only one chain, this can default to a space. If you have two or more, each chain must have an ID.
CHNTYP <ichtyp>
This sets a flag for the chain type. Details of the chain type are given by the CHNTYP keyword. If there is non crystallographic symmetry you may have several chains with the same chain type, but different chain IDs. In this case, you would have a CHNNAM keyword for each chain, but only one CHNTYP keyword, thus avoiding repetition of most information.
ROFFSET <nroffset>
This is the base residue number for this chain. If your chain is numbered from 101 you will need to set ROFFSET = 100. (Default 0)
WARNING: if you specify ROFFSET then in all appropriate keywords the residue number MUST be given as the relative residue number for that chain. Thus if you have this line
CHNNAM ID B CHNTYP 2 ROFFSET 100
then B101 should be referred to as B1.
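
The ROFFSET rule above amounts to a simple subtraction. The following is a minimal sketch in Python; the function name relative_resnum is hypothetical and is not part of PROTIN.

```python
def relative_resnum(absolute_resnum, roffset=0):
    """Residue number to quote in keywords once ROFFSET is set for the chain.

    Hypothetical helper for illustration only; PROTIN does this internally.
    """
    return absolute_resnum - roffset


# Chain B is declared with ROFFSET 100, so residue B101 is referred to as B1.
print("B%d" % relative_resnum(101, 100))
```

With the default ROFFSET of 0 the residue number is unchanged.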

CHNTYP <ichtyp> <type information>...

<ichtyp> is the chain type code (1 to NCHTYP). This must match the information on the CHNNAM input.

<type information>... may be one or more of:

      
      NTERMinal <nres0> <resnam> <ntertyp>
      CTERMinal <cres0> <resnam> <ctertyp>
      WATer | SOLvent | HOH | NON-protein
      DISULphide <ndss> <ires1> <jres1>  ... <ires_ndss> <jres_ndss>
      CISPeptide <ncis> <ires1> ... <ires_ncis>
      MULPLN     <nmlp> <ires1> ... <ires_nmlp>
      CARBOhydrate <ncarb> <ires1> ... <ires_ncarb>
      SECOndary  <ires1> <ires2> <kode>
      SPECial    RESN <ires1> <ires2> |  RESN <ires1> TO <ires2> 
                 ATNAMe <atid1> <atid2> 
                 DIST <dist> <ikwt> <ibwt> SUGA <angle>

One or more CHNTYP keywords can be entered for any chain type. Examples:

CHNTYP 1 NTERM 1 GLY 3     CTERM 21 ALA 2     MULPLN 1 143 CISPRO 1 17 
CHNTYP 1 DISUL 1 6 11     SECO  246 256 1
CHNTYP 1 SPEC RESN 202 107 ATNA CA NE2 DIST 2.1 5 1 SUG 120. 0 0
CHNTYP 1 NTER 1 GLY 3       CTER 21 ASN 2   DISU 1 6 11
CHNTYP 2 WAT

Subsidiary keywords to CHNTYP:

NTERM <nres0> <resnam> <ntertyp>
<nres0> is the sequence number of the N-terminal residue of this chain.
<resnam> is the amino acid name of N-terminal group
<ntertyp> is the type of N-terminal group as specified in the standard dictionary (e.g. 3 for an N amino terminus). (See Note 1)
CTERM <cres0> <resnam> <ctertyp>
<cres0> is the sequence number of the C-terminal residue of this chain.
<resnam> is the amino acid name of C-terminal group.
<ctertyp> is the type of C-terminal group (e.g. 2 for a COO terminus). (See Note 1).
WAT or NON-PROTEIN or SOL or HOH
This indicates that this chain is non-protein; i.e. a solvent chain. The program will not try to link residues in such a chain. Non-protein CHNTYPs should follow those for proteins.
DISU <ndss> <ires1> <jres1>...<ires_ndss> <jres_ndss>
(Use primary keyword DISU described below to specify disulphides between different chains.)
<ndss>: Number of disulphide bridges for this chain type. (max MXSS given at top of output, currently 20)
<iresi> <jresi> are the linked residue numbers. (Each disulphide bridge should be specified only once, as <iresi> <jresi>, and not again in the reverse order <jresi> <iresi>.)
CISPEP <ncis> <ires1> ... <ires_ncis>
<ncis> is the number of residues with cis peptide bonds (maximum 5; usually prolines). Remember the main chain atoms are N and CA of this residue, plus C, O and CA of the residue before.
<ires1> ... <ires_ncis> are the residue numbers of these peptides. (The list may also be used to specify D amino acids. In this case give <iresi> = - the residue number.)
MULPLN <nmlp> <ires1> ... <ires_nmlp>
<nmlp> is the number of special multiplanar groups. (Max MXMP given at top of output, currently 25.) A multiplanar group is any substrate which contains more than one plane; eg.
HEM(Haem) NAD CLM INT ACO ....
<ires1> <ires_nmlp> are the residue numbers of these special multiplanar groups.
CARBO <ncar> <ires1> ... <ires_ncar>
<ncar> is the number of glycosylated residues. (Max MXNCAR given at top of output, currently 6.)
<ires1> ... <ires_ncar> are the residue numbers of these residues.
SECO <ires1> <ires2> <kode>
Secondary structure restraints. Only one restraint per SECO keyword is allowed, and there is currently a maximum of 50 restraints per chain type.
<ires1> is the number of the initial residue for this stretch of structure.
<ires2> is the number of the final residue for this stretch of structure.
<kode> is a code identifying the type of structure. <kode>=-1 means restrain Phi, Psi to the values of the initial structure. Otherwise <kode>=n means restrain Phi, Psi to the values specified in the ideal dictionary for secondary structure type n:

 kode  structure                   Phi     Psi
 = 1   ALPHA HELIX (3.6/13)      -64.0   -40.0 
 = 2   3-10 HELIX (3.0/10)       -75.5    -4.5
 = 3   PI HELIX (4.4/16)         -57.1   -69.7 
 = 4   COLLAGEN-TYPE HELIX       -64.0   145.0 
 = 5   PARALLEL BETA SHEET      -119.0   113.0 
 = 6   ANTI-PARALLEL BETA       -139.0   135.0
 = 7   CLASSIC BETA-BULGE 1      -95.0   -65.0
 = 8   CLASSIC BETA-BULGE 2     -130.0   150.0
 = 9   BETA-BEND I (1-4)  2      -70.0   -30.0
 = 10  BETA-BEND I (1-4)  3      -90.0    10.0
 = 11  BETA-BEND II (1-4) 2      -60.0   130.0
 = 12  BETA-BEND II (1-4) 3       80.0     0.0
 = 13  GAMMA-TURN (1-3) 1        172.0   128.0
 = 14  GAMMA-TURN (1-3) 2         68.0   -61.0
 = 15  GAMMA-TURN (1-3) 3       -131.0   162.0

All Phi and Psi angles not set by these cards are unrestrained.
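
As an illustration of how <kode> selects the target angles, the table above can be written as a lookup. This is a sketch only: the values are copied from the table, while the names IDEAL_PHI_PSI and seco_target are hypothetical and do not come from the PROTIN source.

```python
# (Phi, Psi) in degrees keyed by <kode>, copied from the table above.
IDEAL_PHI_PSI = {
    1: (-64.0, -40.0),    # alpha helix
    2: (-75.5, -4.5),     # 3-10 helix
    3: (-57.1, -69.7),    # pi helix
    4: (-64.0, 145.0),    # collagen-type helix
    5: (-119.0, 113.0),   # parallel beta sheet
    6: (-139.0, 135.0),   # anti-parallel beta
    7: (-95.0, -65.0),    # classic beta-bulge 1
    8: (-130.0, 150.0),   # classic beta-bulge 2
    9: (-70.0, -30.0),    # beta-bend I, residue 2
    10: (-90.0, 10.0),    # beta-bend I, residue 3
    11: (-60.0, 130.0),   # beta-bend II, residue 2
    12: (80.0, 0.0),      # beta-bend II, residue 3
    13: (172.0, 128.0),   # gamma-turn, residue 1
    14: (68.0, -61.0),    # gamma-turn, residue 2
    15: (-131.0, 162.0),  # gamma-turn, residue 3
}


def seco_target(kode, initial_phi_psi=None):
    """Return the (Phi, Psi) pair a SECO restraint would aim for.

    kode = -1 keeps the angles of the initial structure, which the caller
    must supply; any other kode is looked up in the table.
    """
    if kode == -1:
        return initial_phi_psi
    return IDEAL_PHI_PSI[kode]


# SECO 246 256 1 restrains residues 246 to 256 towards alpha-helical values.
print(seco_target(1))
```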
SPECial RESN <ires1> <ires2> ATNAMe <atid1> <atid2>
SPECial DIST <dist> <ikwt> <ibwt> SUGA <angle>
Special distances restraint. (Max. MXDG given at top of output, currently 100.)

Subsidiary keywords to CHNTYP SPEC:

RESNUMBER <ires1> <ires2>
<ires1> and <ires2> are the numbers of the linked residues.
RESNUMBER <ires1> TO <ires2>
<ires1> and <ires2> are the range of numbers of the linked residues, i.e. all consecutive pairs from <ires1> <ires1>+1 to <ires2>-1 <ires2> will be linked. This is useful, for instance, to link polynucleotide chains.
ATNAME <atid1> <atid2>
<atid1> <atid2> are the names of linked atoms.
DISTANCE <dist> <ikwt> <ibwt>
<dist> is the ideal distance between the 2 atoms.
<ikwt> gives the code number for the distance restraint to be used in PROLSQ.
<ikwt> can be 1, 2, 3, 4 or 5
Default PROLSQ weights are:
               Bonding length (1-2 neighbour)          0.02A
               Angle related distance (1-3 neighbour)  0.04A
               Intraplanar distance (1-4 neighbour)    0.05A
               H bond or metal coordination distance   0.05A
               Special distance:                       ????

<ibwt> gives the code number for the Bfactor restraint to be used in PROLSQ.
<ibwt> can be 1,2,3,4,or 5

               Main chain bond (1-2 neighbour)         1.00A**2
               Main chain angle (1-3 neighbour)        1.50A**2
               Side chain bond                         1.00A**2
               Side chain angle                        1.50A**2
               Special                                  ?????
SUGar (optional) followed by 3 numbers: SUGA,XSIAN,CHIAN
This specifies the linkage between two carbohydrate residues. SUGA is the bond angle C1 - On - Cn in a (1 - n) linkage. (The special distance should be the C1 - On distance.) If SUGA is negative, the linkage is restrained to be a beta (1-n) linkage, if positive, an alpha (1-n) linkage. This should also be used to specify the linkage between the first carbohydrate residue and the protein sidechain (Asn or Ser). SUGA will normally have amplitude 114 degrees, but 120 degrees for the linkage to Asn.
      XSIAN is the angle O5-C1-On (default 107.6)
      CHIAN is the angle C2-C1-On (default 109.0)

Carbohydrate residues should have the same chain identifier as the protein chain to which they are attached. In the special distance defining the Asn to carbohydrate linkage, the Asn residue should be given as the second residue (JRES) and NOT the first.
Note that the code assumes that ASN is group number 3 in the dictionary, and also assumes the order of atoms for Asn and the carbohydrates. (For Asn, CG,OD1,ND2 should be atoms 6,7,8: for carbohydrates, C1,C2 must be the first and second atoms, O5 must be the last (O6 for sialic acid)).
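
The sign convention for SUGA described above can be sketched as follows. The function name describe_suga is hypothetical, and the treatment of a zero angle as an error is an assumption: the text above does not say what a zero value means.

```python
def describe_suga(suga):
    """Split a SUGA value into (anomer, bond angle amplitude in degrees).

    Negative means a beta (1-n) linkage, positive an alpha (1-n) linkage.
    Rejecting zero is an assumption made for this sketch.
    """
    if suga == 0:
        raise ValueError("SUGA of zero carries no alpha/beta information")
    anomer = "beta" if suga < 0 else "alpha"
    return anomer, abs(suga)


# SUGA -120: beta linkage to Asn; SUGA -114: beta linkage between sugars.
for value in (-120.0, -114.0, 114.0):
    print(value, describe_suga(value))
```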

See CCP4 newsletter 17, March 1986, article by Pete Artymiuk for further details.
Example: CHNTYP 1 SPEC RESN 202 107 ATNA CA NE2 DIST 2.1 5 1 SUG 120. 0 0
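
For the RESNUMBER <ires1> TO <ires2> form described above, the set of residue pairs that get linked can be enumerated as below. This is a sketch of the stated rule only; expand_resn_range is a hypothetical name.

```python
def expand_resn_range(ires1, ires2):
    """List the consecutive residue pairs implied by RESN <ires1> TO <ires2>."""
    return [(i, i + 1) for i in range(ires1, ires2)]


# RESNO 1 TO 21, as used for an RNA backbone, links 1-2, 2-3, ... 20-21.
pairs = expand_resn_range(1, 21)
print(len(pairs), pairs[0], pairs[-1])
```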

CONTACTS [NOINtermolecular] | [TRANS <ntrans>] | [ CONFormers ] | [ OCCUP <bump_occ> ] | [ NAMEd <nbump> CHNNAM <chnid1> RESNo <ires1> ... ]

NOTE: CONTACTS requires the presence of the SYMM keyword.

Specifies that van der Waals contact searches are to be done. By default, non bonded contacts between all symmetry equivalent molecules are checked. The number of unit cell translations to be tested can be defined by TRANS. Translations up to <ntrans> unit cells will be applied (positive and negative). <ntrans> defaults to 2.

To prevent symmetry checking, specify NOINtermolecular.
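
To give a feel for what TRANS controls, the sketch below enumerates the unit cell translations for a given <ntrans>. It assumes the translations are applied independently along all three cell axes, which the text above does not state explicitly; cell_translations is a hypothetical name.

```python
from itertools import product


def cell_translations(ntrans=2):
    """All integer cell shifts from -ntrans to +ntrans along each axis.

    Assumes the three axes are treated independently (an assumption).
    """
    return list(product(range(-ntrans, ntrans + 1), repeat=3))


# Under that assumption the default TRANS 2 tests 5*5*5 shifts,
# and TRANS 1 tests 3*3*3.
print(len(cell_translations(2)), len(cell_translations(1)))
```

Under this assumption the search cost grows with the cube of (2*ntrans + 1), which is why a small TRANS value is worth setting when the packing is already known.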

Atoms are considered as bumping if the sum of their occupancies is greater than bump_occ (default 1.2). If you wish to reset this, use the keyword:

OCCUP <bump_occ>
bump_occ used to be 1.0, but SHELXL refines occupancies to different values.
CONFormers
If this keyword is set, bump checks are performed between all atoms in multiple conformations which belong to the same set, i.e. all those where both residues are labelled as A, or both as B, etc. For example:
ATOM    233  N   HIS B  10  0   -1.988   5.291   5.945  1.00 13.55   7
ATOM    234  CD2AHIS B  10  0   -6.528   3.345   6.769  0.50 15.87   6
ATOM    235  NE2AHIS B  10  0   -7.263   2.526   5.971  0.50 21.09   7
ATOM    236  CE1AHIS B  10  0   -6.574   1.869   5.099  0.50 19.35   6
ATOM    237  ND1AHIS B  10  0   -5.338   2.215   5.293  0.50 18.85   7
ATOM    238  CG AHIS B  10  0   -5.260   3.171   6.298  0.50 18.26   6

ATOM    404  CD2BHIS B  10  0   -1.925   2.079   7.626  0.50 15.07   6
ATOM    405  NE2BHIS B  10  0   -1.383   0.986   6.999  0.50 18.07   7
ATOM    406  CE1BHIS B  10  0   -2.210   0.582   6.038  0.50 16.09   6
ATOM    407  ND1BHIS B  10  0   -3.240   1.428   6.021  0.50 18.78   7
ATOM    408  CG BHIS B  10  0   -3.067   2.379   7.005  0.50 16.62   6

....
ATOM    656  OE2AGLU D  13  0   -0.518   4.288   0.606  0.67 18.74
ATOM    658  OE1AGLU D  13  0    1.477   4.954   1.382  0.67 18.61
ATOM    660  CD AGLU D  13  0    0.555   4.929   0.567  0.67 17.79

ATOM    657  OE2BGLU D  13  0    1.114   3.390  -0.972  0.33 18.14
ATOM    659  OE1BGLU D  13  0    0.766   3.550   1.136  0.33 18.70
ATOM    661  CD BGLU D  13  0    0.836   4.077   0.029  0.33 17.79
NAMEd <nbump> CHNNAM <chnid1> RESNo <ires1> CHNNAM <chnid2> RESNo <ires2> ...
This performs bump checks for each of the <nbump> named residues, regardless of the occupancy of the atoms. Each residue in the list is specified by its chain id <chnid1> and residue number <ires1>.
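
The occupancy condition given above for CONTACTS can be sketched as a one-line test. Only the occupancy sum is modelled here; any distance criterion is left out, and passes_occupancy_test is a hypothetical name.

```python
def passes_occupancy_test(occ1, occ2, bump_occ=1.2):
    """True if two atoms have enough combined occupancy to be bump-checked."""
    return occ1 + occ2 > bump_occ


# Two fully occupied atoms are checked; two half-occupied atoms are not,
# unless the threshold is lowered with OCCUP.
print(passes_occupancy_test(1.0, 1.0))
print(passes_occupancy_test(0.5, 0.5))
print(passes_occupancy_test(0.5, 0.5, bump_occ=0.9))
```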

DISULPHIDE <ndss> CHNNAM <chnid1> <chnjd1> <ires1> <jres1> ...

Use primary keyword DISU to specify disulphides between different chains (max MXSS given at top of output, currently 20).
<ndss> is the number of disulphide bridges between these chains.
<chnid1> is the chain id for the CYS residue on one side of the SS bridge.
<chnjd1> is the chain id for the CYS residue on the other side of the SS bridge.
<iresi> <jresi> are the linked residue numbers.
(Each disulphide bridge should be specified only once, as <iresi> <jresi>, and not again in the reverse order <jresi> <iresi>.)
Example: DISU 2 CHNNAM A B 7 7 CHNNAM A B 20 19

HATOM <ihatom>

<ihatom> defines the required type of hydrogen atom treatment:

=0
standard non-hydrogen case (default).
=1
neutron case: permit D/H interchange.
=2
X-ray case with hydrogen atoms; shrink the ideal X-H bond lengths.

LIST [ ALL | SOME | FEW ]

ALL
list all distances calculated.
SOME
list only distances that deviate by more than 0.2A from the ideal values. Also lists the ideal dictionary.
FEW
omit lists of distances from printer output except symmetry contacts and disordered contacts.

NONX <kchn> CHNID <chnid1> <chnid2> .. <chnid_kchn> NSPA <nsp> <ires1 jres1 kode1> ... <ires_nsp jres_nsp kode_nsp> [MATRIX]

Sets up noncrystallographic symmetry restraints.

Example:
NONX 4 CHNID A B C D NSPANS 3 5 8 1 13 21 2 89 105 1

<kchn> is the number of chains in the symmetry group.

Subsidiary keywords:

CHNID <chnid1> <chnid2> ... <chnid_kchn>
Defines the chains related by noncrystallographic symmetry.
<chnid1> <chnid2> ... <chnid_kchn> are the chain ids in this symmetry group.
NSPANS <nsp> <ires1 jres1 kode1> ... <ires_nsp jres_nsp kode_nsp>
<nsp> is the number of residue spans specified on this card. (limited to 40)
<iresi jresi kodei> give the numbers of the initial and final residues in each span, and a code for that span defined as follows:
         KODE     Main_Chain       Side_Chain

         1    tight restraint   tight restraint
         2    tight restraint   medium restraint
         3    tight restraint   loose restraint
         4    medium restraint  medium restraint
         5    medium restraint  loose restraint
         6    loose restraint   loose restraint

MATRIX
(Best not to use this!) However, if keyword MATRIX is given then it means that symmetry transformations are known exactly a priori and KCHN matrices defining the noncrystallographic symmetry must be given after the NONX control line in the following format:
             R11 R12 R13 T1
             R21 R22 R23 T2
             R31 R32 R33 T3

R is the rotation matrix and T the translation vector of the symmetry transformation.
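
As a sketch of what one such matrix does, the code below applies a single 3x4 operator to a coordinate. It assumes the usual convention x' = R.x + T; the text above does not say which chain is mapped onto which, so the direction is an assumption. apply_ncs_operator is a hypothetical name.

```python
def apply_ncs_operator(rows, xyz):
    """Apply one operator given as three rows (Ri1, Ri2, Ri3, Ti).

    Assumes the convention x' = R.x + T (not confirmed by the text above).
    """
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in rows)


# Illustrative operator: a twofold rotation about z plus a 10 A shift along z.
twofold = [
    (-1.0, 0.0, 0.0, 0.0),
    (0.0, -1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 10.0),
]
print(apply_ncs_operator(twofold, (1.0, 2.0, 3.0)))
```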

If MATRIX is NOT given then it means that the symmetry transformations are not known, and the program works them out for you (MUCH easier!)
IMPORTANT: If the non crystallographic symmetry operations are not supplied then they are determined by PROLSQ, using ONLY those atoms with tight restraints. Thus in this case, there must be sufficient atoms with tight restraints specified to allow the symmetry operations to be determined with reasonable accuracy.
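
The NSPANS field layout and the span code table above can be illustrated with a small parser. This is a sketch for checking input by hand, not the PROTIN reader; SPAN_RESTRAINTS and parse_nspans are hypothetical names.

```python
# Span code -> (main chain restraint, side chain restraint), from the table above.
SPAN_RESTRAINTS = {
    1: ("tight", "tight"),
    2: ("tight", "medium"),
    3: ("tight", "loose"),
    4: ("medium", "medium"),
    5: ("medium", "loose"),
    6: ("loose", "loose"),
}


def parse_nspans(fields):
    """Turn the fields after NSPANS into (ires, jres, main, side) tuples."""
    nsp = int(fields[0])
    values = [int(f) for f in fields[1:1 + 3 * nsp]]
    if len(values) != 3 * nsp:
        raise ValueError("expected %d numbers after <nsp>" % (3 * nsp))
    spans = []
    for k in range(nsp):
        ires, jres, kode = values[3 * k:3 * k + 3]
        main, side = SPAN_RESTRAINTS[kode]
        spans.append((ires, jres, main, side))
    return spans


# Fields taken from: NONX 4 CHNID A B C D NSPANS 3 5 8 1 13 21 2 89 105 1
spans = parse_nspans("3 5 8 1 13 21 2 89 105 1".split())
for span in spans:
    print(span)
```

A listing like this makes it easy to confirm that enough residues carry tight restraints when the symmetry operations are to be determined automatically.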

PEPP <napep>

<napep> (default 5) is the number of atoms in the main chain that should be restrained to lie in the same planes (atoms of the link group):

    =4, restrain Ca, C, O, N to one plane.
    =5, restrain Ca, C, O, N, Ca to one plane (default).

SPECIAL [ CHNNAM <chnid1> <chnid2> RESNo <ires1> <ires2> ] | [ ATNAMe <atid1> <atid2> ] | [ DIST <dist ikwt ibwt>] | [ SYMM <symop> ]

Special distances restraint between atoms on different chains. (Max MXDG given at top of output, currently 100)

Example:
SPEC CHNNAM A B ATNA ZN1 NE2 RESN 49 10 DIST 2.0 1 1 SYMM -X,1-Y,Z

Subsidiary keywords:

CHNNAM <chnid1> <chnid2>
<chnid1> is the chain id for the residue on one side of the bond. <chnid2> is the chain id for the residue on the other side of the bond.
RESNUMBER <ires1> <ires2>
<ires1> and <ires2> are the numbers of the linked residues.
ATNAME <atid1> <atid2>
<atid1> <atid2> are the names of linked atoms.
DISTANCE <dist> <ikwt> <ibwt>
<dist> is the ideal distance between the 2 atoms.
<ikwt> gives the code number for the distance restraint to be used in PROLSQ and can be 1, 2, 3, 4 or 5. Default PROLSQ weights are:
               Bonding length (1-2 neighbour)          0.02A
               Angle related distance (1-3 neighbour)  0.04A
               Intraplanar distance (1-4 neighbour)    0.05A
               H bond or metal coordination distance   0.05A
               Special distance:                       ????

<ibwt> gives the code number for the Bfactor restraint to be used in PROLSQ and can be 1,2,3,4,or 5

               Main chain bond (1-2 neighbour)         1.00A**2
               Main chain angle (1-3 neighbour)        1.50A**2
               Side chain bond                         1.00A**2
               Side chain angle                        1.50A**2
               Special                                  ?????
SYMM <symop> (This keyword must be the last given.)
<symop> must have a format like this: X-1/2,3/2-Y,1+Z
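
To show how an operator string of this form maps a fractional coordinate, here is a minimal sketch. It handles only sums and differences of X, Y, Z and simple fractions, as in the examples in this document; it is not the CCP4 symmetry parser, and apply_symop is a hypothetical name.

```python
import re
from fractions import Fraction


def apply_symop(symop, xyz):
    """Apply an operator such as 'X-1/2,3/2-Y,1+Z' to fractional coordinates.

    xyz may hold strings like '1/4' or numbers accepted by Fraction.
    """
    coords = dict(zip("XYZ", (Fraction(v) for v in xyz)))
    result = []
    for component in symop.upper().replace(" ", "").split(","):
        total = Fraction(0)
        # Split each component into signed terms, e.g. '3/2-Y' -> '3/2', '-Y'.
        for term in re.findall(r"[+-]?[^+-]+", component):
            sign = -1 if term.startswith("-") else 1
            body = term.lstrip("+-")
            value = coords[body] if body in coords else Fraction(body)
            total += sign * value
        result.append(total)
    return tuple(result)


print(apply_symop("X-1/2,3/2-Y,1+Z", ("1/4", "1/4", "0")))
```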

SYMMETRY <nspgrp> | <NAMSPG>

Specify the space group in International Tables style. Default is <nspgrp> = 1.

TITLE <string>

Title used on the printer output.

VDWCUT <vdwcut>

<vdwcut> is the cut-off value for looking for possible Van der Waals contacts. This saves time. (Default=5A if VDWCUT=0 or not specified.)

VDWRADII <nvdw> <type_1> <icode_1> <dvdw_1> ... <type_nvdw> <icode_nvdw> <dvdw_nvdw>

Change Van-der-Waals contact distances for some atom types.

The default atom types and contact distances held are as follows:

          TYPE     ICODE  DVDW
           C         1    3.70
           N         2    3.10
           O         3    3.00
           S         4    3.60
           FE        5    2.40
           H         6    2.40
           CA        7    3.80
           I         8    4.30
           C_SP2     9    3.40
           OW       10    3.00

Example: VDWR 1 ZN 7 3.00

<nvdw>
the number of distances to be specified.
<type_i>
the atom type name (max of 4 characters). This is not used.
<icode_i>
the atom type number from 1 to 10.

In the PROTIN dictionary, ICODE numbers have been assigned to standard atom TYPE names (C, N etc.). PROTIN assigns Van-der-Waals contact distances using these ICODEs. It is essential that you check the PROTIN dictionary to see that the atom type you are interested in has the same ICODE as you assign here.

  Part of dictionary:
                                                 1   26          NIC J
 -10.73511  -2.60989   1.44399              7         1                    P
  -6.68856   0.79139  -2.59833              2         2                    N1
  ..........................................ICODE ..........................
<dvdw_i>
the Van-der-Waals contact distance between two atoms of this type. VDW contact distances between 2 atoms of types i and j = (DVDWi + DVDWj)/2

If you are changing the default DVDW be careful to choose an ICODE which isn't used for other atom types in your coordinate file. So VDWR 1 ZN 7 3.00 will override the contact distance for all atom types with ICODE = 7 including for example P. If DVDW=0 then no VDW restraint is applied to atoms with this ICODE.
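
The averaging rule and the effect of an override can be sketched as follows. The default values are copied from the table above; DEFAULT_DVDW and contact_distance are hypothetical names, and returning None when either DVDW is zero is one reading of the "no VDW restraint" statement.

```python
# Default DVDW in Angstroms keyed by ICODE, copied from the table above.
DEFAULT_DVDW = {
    1: 3.70, 2: 3.10, 3: 3.00, 4: 3.60, 5: 2.40,
    6: 2.40, 7: 3.80, 8: 4.30, 9: 3.40, 10: 3.00,
}


def contact_distance(icode_i, icode_j, dvdw=DEFAULT_DVDW):
    """Contact distance for two atom types: (DVDWi + DVDWj) / 2.

    Returns None when either DVDW is 0, read here as "no restraint"
    (an interpretation made for this sketch).
    """
    di, dj = dvdw[icode_i], dvdw[icode_j]
    if di == 0 or dj == 0:
        return None
    return (di + dj) / 2


# VDWR 1 ZN 7 3.00 changes ICODE 7 for every atom type that uses it.
overridden = dict(DEFAULT_DVDW)
overridden[7] = 3.00

print(contact_distance(1, 2))              # C ... N with the defaults
print(contact_distance(7, 3))              # ICODE 7 ... O with the defaults
print(contact_distance(7, 3, overridden))  # same pair after the override
```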

END

This is obligatory as the final card.

EXAMPLES

Simple

     protin                          
     XYZIN $CTEST/toxd/toxd.pdb  
     PROTOUT $SCRATCH/protout.dat            
     PROTCOUNTS $SCRATCH/counts.dat          
     << END-protin 
     CHNNAM ID  B  CHNTYP 1
     CHNNAM ID  W  CHNTYP 2
     !CHNTYP    NTER=N-terminal resid type;CTER=C-terminal   resid type
     CHNTYP 1  NTER 1 GLN 3   CTER 59 GLY 2 
     CHNTYP 2 WAT
     PEPP 4
     SYMM 19
     VDWRadii 1 CA 7 3.8
     VDWCUT 5
     CONTACTS
     END
     END-protin 

... WITH CARBOHYDRATE

There are 4 copies of the protein chain, and 4 solvent chains. The protein chains start at residue 24 and end at 408. There are two carbohydrate groups (the NAG residues 410,411) attached to the one glycosylated residue 315. The NAG linkage is beta(1-4).

     protin  << eof
     TITLE REFINE OVALBUMIN
     CHNNAM ID A CHNTYP 1
     CHNNAM ID B CHNTYP 1
     CHNNAM ID C CHNTYP 1
     CHNNAM ID D CHNTYP 1
     CHNNAM ID E CHNTYP 2
     CHNNAM ID F CHNTYP 2
     CHNNAM ID G CHNTYP 2
     CHNNAM ID H CHNTYP 2
     CHNTYP 1 NTERM 24 GLY 5 CTERM 408 PRO  2  DISUL 1 96 143  CARBO 1 315
     CHNTYP 1 SPEC RESN 410 315 ATNA C1 ND2 DIST 1.4 1 3 SUGA -120
     CHNTYP 1 SPEC RESN 411 410 ATNA C1 O4 DIST 1.4 1 3 SUGA -114
     CHNTYP 2 NTERM  0 HOH 3 CTERM   0 HOH  2
     PEPP 4
     VDWRadii 1 CU 8 0.3
     VDWCUT 5
     CONTACTS NOINTER
     SYMM 1
     END
     eof

... WITH NONCRYSTALLOGRAPHIC SYMMETRY

There are 2 identical protein chains and 1 solvent chain. The N terminus is an acetylated ALA. Residue 401 in each protein is a multiplanar group. The distance between atoms N7N and OG of residues 401 and 160 within the same chain is restrained. Noncrystallographic symmetry between the protein chains is defined and 2 residue spans have reduced restraint.

protin  XYZIN bin5.pdb DICTPROTN ideal_nad.dat PROTOUT protout.dat 
        PROTCOUNTS  counts.dat << END-protin
TITLE restrain LDH/NADH dimer
CHNNAM ID A CHNTYP 1
CHNNAM ID B CHNTYP 1
CHNNAM CHNTYP 2 
CHNTYP 1 NTERM 1 ALA 5 CTERM 331 PHE 2
CHNTYP 1 MULPLN 1 401 CISPRO 1 138 SECO 246 256 1 
CHNTYP 1 SECO 246 256 1 SPEC RESN 401 160 ATNA N7N OG DIST 4.5 1 3
CHNTYP 2 WAT 
CONTACTS TRANS 1
LIST FEW  
NONX 2 CHNID A B NSPANS 2 72 107 2 197 219 4
PEPP 4
SYMM 18
VDWRadii 1 CA 8 3.8
VDWCUT 5
END
END-protin
#

... WITH RNA

There are 4 protein chains, each linked with an RNA chain. This example includes both protein and single stranded RNA. Links are defined for the RNA (or DNA) sugar-phosphate backbone.

protin  XYZIN coords.pdb  \
        PROTOUT hktmp.protout PROTCOUNTS hktmp.counts \
        DICTPROTN ${saved}/protin_vl_rna.idl   << EOF-protin
TITLE U1A/RNA 1.92A cycle 391
SYMMETRY P6522
CHNNAME ID A CHNTYP  1 ROFFSET 0
CHNNAME ID B CHNTYP  1 ROFFSET 0
CHNNAME ID C CHNTYP  1 ROFFSET 0
CHNNAME ID P CHNTYP  2 ROFFSET 0  # RNA
CHNNAME ID Q CHNTYP  2 ROFFSET 0  # RNA
CHNNAME ID R CHNTYP  2 ROFFSET 0  # RNA
CHNNAME ID X CHNTYP  3 ROFFSET 0  # Cl ion + glycerol
CHNNAME ID U CHNTYP  3 ROFFSET 0  # waters
CHNNAME ID V CHNTYP  3 ROFFSET 0
CHNNAME ID W CHNTYP  3 ROFFSET 0
CHNNAME ID Y CHNTYP  3 ROFFSET 0
CHNTYP  1    NTER   1 SER 3     CTER 99 SER 2
CHNTYP  2    WAT  # or NONprotein
CHNTYP  3    WAT
LIST    FEW
PEPP    5
CHNTYP 2 SPEC ATNAM O3' P   RESNO 1 TO 21 DIST 1.61  1 1
CHNTYP 2 SPEC ATNAM C3' P   RESNO 1 TO 21 DIST 2.61  2 2
CHNTYP 2 SPEC ATNAM O3' O1P RESNO 1 TO 21 DIST 2.53  2 2
CHNTYP 2 SPEC ATNAM O3' O2P RESNO 1 TO 21 DIST 2.53  2 2
CHNTYP 2 SPEC ATNAM O3' O5' RESNO 1 TO 21 DIST 2.53  2 2
END
EOF-protin
#

INPUT AND OUTPUT FILES

Input Files

XYZIN
The input coordinate file in standard Brookhaven format.
For Brookhaven files:
(i) the following header must be given.
 CRYST1 a b c alpha beta gamma (a6,3f9.3,3f7.2)
 SCALE1 .....................
 SCALE2 .....................
 SCALE3 .....................


(ii) some residue or atom names may need changing to match non-standard names in the dictionary. eg. an N-acetyl group has OT,CT1,CT2 not O,C,CH3
(iii) PROTIN does not support insertion codes so residues must be renumbered such that eg. 132A,132B become 132,133
(iv) Disordered residues must be given in Standard Brookhaven format.
Example:

 ....
 ATOM     94  O   TYR A  14      15.430  34.659  33.979  1.00 20.98   2
 ATOM     95  CB ATYR A  14      16.476  33.812  37.212  0.50 22.22   2
 ATOM     96  CB BTYR A  14      16.502  33.908  37.229  0.50 19.09   2
  .....
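
The conformer label in a disordered residue sits in column 17 of the ATOM record, directly after the atom name. The sketch below reads it using the standard fixed PDB columns; the trailing atom type number shown in the example above is omitted from the test records, and parse_atom_record is a hypothetical name.

```python
def parse_atom_record(line):
    """Pull the fields relevant to disorder out of one fixed-column ATOM record."""
    return {
        "name": line[12:16].strip(),
        "altloc": line[16].strip(),   # conformer label, empty if not disordered
        "resname": line[17:20],
        "chain": line[21],
        "resseq": int(line[22:26]),
        "occupancy": float(line[54:60]),
    }


records = [
    "ATOM     94  O   TYR A  14      15.430  34.659  33.979  1.00 20.98",
    "ATOM     95  CB ATYR A  14      16.476  33.812  37.212  0.50 22.22",
    "ATOM     96  CB BTYR A  14      16.502  33.908  37.229  0.50 19.09",
]
parsed = [parse_atom_record(r) for r in records]
for atom in parsed:
    print(atom["name"], repr(atom["altloc"]), atom["occupancy"])
```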
DICTPROTN
The dictionary file of ideal protein values (see Appendix A); default is $CLIBD/protin.dic. There are three dictionaries altogether:
  1. protin.dic based on John Priestle's with added co-factors from York.
  2. protin_jp.idl John Priestle's original.
  3. protin_vl.idl a new dictionary by Victor Lamzin, which also includes some co-factors from York. All of these co-factors except NAD were NOT modified by VL; see protin_vl.doc.

The default dictionary $CLIBD/protin.dic was based on John Priestle's using the Engh and Huber parameters [Acta Cryst. A47, 392-400, 1991]. There is also an RNA/DNA dictionary comprising five residues (ADE, CYT, GUA, THY and URA) plus other co-factors. A list is given below:

      Acetyl - CoA                            ACO
      Rna Adenine ( =DNA + O2' )              ADE
      Adenine Ribose Phosphate of CoA         AP
      Adenosine Triphosphate                  ATP
                                              BME
      chloramphenicol                         CLM
      Co-enzyme A                             COA
      Rna Cytosine ( =DNA + O2' )             CYT
      dfp ???                                 DFP
      fucose                                  FUC
      galactose                               GAL
      glucose                                 GLC
      guanosine monophosphate  - 
      all 3 possible mono phosphates included GMP
      Rna Guanine ( =DNA + O2' )              GUA
      Haem                                    HEM
      4-iodophenol                            IPH
                                              IND
      Transition state intermediate in CAT    INT
      Isoquinoline                            ISQ
      mannose                                 MAN
      M-cresol                                MCR
      Methyl paraben   In insulin             MPB
      methotrexate                            MTX
      Nicotinamide adenine dinucleotide       NAD
      N-acetyl Glucosamine                    NAG
      nicotinamide - part of NAD              NIC
      oxalic acid                             OXA
      Phenyl acetic acid                      PAA
      Phenyl methyl Sulphonyl                 PMS
      resorcinol   In insulin                 RES
      Resorcinol                              RPH
      sialic acid                             SIA
      Thiocyanate                             SCN
      SO4                                     SUL
      Dna Thymine                             THY
      trimethoprim - bacterial DHFR inhibitor TMP
      Rna Uracil ( =DNA + O2' )               URA
      xylose                                  XYL
SYMOP
symmetry library (default $CLIBD/symop.lib)

Output Files

PROTCOUNTS
A file containing the counts of restraints. (Unformatted sequential i.e. you can't look at it.) See Data Formats section for a detailed list of contents.
PROTOUT
A file containing details of the protein structure and restraints for input to the Hendrickson-Konnert program PROLSQ. (unformatted sequential i.e. you can't look at it) Details of the contents are listed in the Data Formats section
SCRAT1
When refining carbohydrate structures, the program uses a scratch file opened on unit 25.

NOTES

(1) Table of values for terminal groups
These values given are held in the standard dictionary which assigns the number (negative IN2) to a terminal group type. See Appendix A, card 1a.
      Number   Type
         2     C-Terminal
         3     N-Amino
         4     N-Formyl
         5     N-Acetyl

(2) Table of values for amino acids
The numbers (IN2) correspond to the types in the standard dictionaries supplied.
          Number  Type Slc  Number  Type Slc
            1     ALA  A      13    MET  M
            2     ARG  R      14    PHE  F
            3     ASN  N      15    PRO  P
            4     ASP  D      16    SER  S
            5     CYS  C      17    THR  T
            6     GLN  Q      18    TRP  W
            7     GLU  E      19    TYR  Y
            8     GLY  G      20    VAL  V
            9     HIS  H      21    HEM  X
            10    ILE  I      22    WAT  O Water
            11    LEU  L      23    SUL  U Sulphate
            12    LYS  K

Disorder is now properly treated automatically as long as your input file follows the Brookhaven format.
    e.g. ATOM ... CE1A   .. TYR ...............x y z 0.4 b
         ATOM ... CE1B   .. TYR ...............x y z 0.6 b

PRINTER OUTPUT

The printer output is divided into a number of sections as indicated below. In the sections of distances, only distances that deviate by more than 0.2 Angstroms from the ideal values are listed if LIST FEW is given. If LIST SOME is given then the control data, the dictionary of the standard groups (the first part of the dictionary listing) and the section giving the summary counts are output. With LIST ALL, all of the following is output.

a) Details of the input control data.

b) Details of the ideal protein dictionary under the following headings:

 
 Dictionary of Standard Groups.
 Dictionary of Distance Restraints.
 Dictionary of Planar Groups.
 Dictionary of Chiral Centres.
 Dictionary of Potential Contact Restraints.
 Dictionary of Conformation Torsional Angles.
 Dictionary of Secondary Structure.

c) List of the atomic coordinates. Missing atoms for standard groups are listed at the end of this section. The items listed for each atom are:

 
 The atom sequence number (following the input order)
 The atom name.
 The chain number.
 The residue name.
 The residue number.
 The atom type number.
 The atom number within the residue.
 X, Y, Z, B, OCC.
 The chain identifier.
 The single letter amino acid code.
 The atom label (residue no. + atom name + chain identifier) for PROLSQ.

d) Table of interatomic distances giving ideal and model values. Those distances deviating by more than 0.2 Angstroms from the ideal distance are flagged with an asterisk. The items listed for each distance are as follows:

 
 The distance sequence number. (The position in the  list  of distances).
 The sequence number of the first atom.
 The sequence number of the second atom.
 The residue number of the first atom (with chain offset).
 The label of the first atom (residue no. + atom name).
 The residue number of the second atom.
 The label of the second atom (residue no. + atom name).
 The ideal distance.
 The observed (model) distance.
 The distance type code KDWT (See Appendix A Card 2b.)
 The distance type code KBWT (See Appendix A Card 2b.)
 blank or *

The distances are listed for the following categories:

   Intra-residue distances.
   Inter-residue or link distances.
   Special distances including Disulphides.
   Externally defined distances.
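
The asterisk flag in the last column of the distance table follows the 0.2 Angstrom rule stated above. A minimal sketch, with flag_distance as a hypothetical name:

```python
def flag_distance(ideal, model, tolerance=0.2):
    """Return '*' if the model distance deviates from ideal by more than tolerance."""
    return "*" if abs(model - ideal) > tolerance else " "


# (ideal, model) pairs in Angstroms; only the first exceeds the 0.2 A limit.
for ideal, model in [(1.53, 1.80), (1.53, 1.60), (2.05, 1.95)]:
    print("%6.2f %6.2f %s" % (ideal, model, flag_distance(ideal, model)))
```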

e) Listing of planar groups. This includes terminal group, link group and side chain planar groups, followed by planes within a special group, e.g. a Haem group. The items given are:

The plane sequence number.
The residue name or LINK.
The residue number.
A list of the atoms in the plane (sequence no. + atom name).
A list of the distance codes specifying the bonded pairs (positive if in the same direction as defined in the distance table, negative if in the opposite direction).

f) Table of chiral centres. The items given are:

 The chiral centre sequence number.
 The residue name.
 The residue number.
 The 4 atoms around the chiral centre (residue number +  atom name).
 The six distance codes.
 The ideal chiral volume.
 The model chiral volume.

g) Possible Van-der-Waals contacts for Intra-residue and Link contacts, Inter-residue contacts and possible Hydrogen Bonds. The items given are:

 
 The Van-der-Waals contact sequence number.
 The sequence number of the first atom.
 The sequence number of the second atom.
 The residue number of the first atom + atom name.
 The residue number of the second atom + atom name.
 The ideal Van-der-Waals contact distance.
 The observed distance.
 The distance type code (see Appendix A).
 * if the observed is too close.

h) Conformational torsion angles (Preceded by details of any secondary structural elements defined.) The items given are:

(1) The torsion angle identification for the residue:

The single letter amino acid code.
The residue number.
The number of side chain torsion angles.
A list of the atoms defining the torsion angles (sequence number + atom name).
(2) Details of the torsion angles (only if LIST ALL),
first for the main chain and then for the side chain:
The torsion angle sequence number.
The angle type (1=Phi, 2=Psi, 3=omega, 4=Chi).
Identifiers of the four atoms defining the torsion angle (sequence no. + atom name).
Six distance codes.
The weighting code.
The ideal angle.
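
For readers checking a listed torsion angle by hand, the sketch below computes the dihedral angle defined by four atom positions using a standard projection method (0 degrees for cis, 180 for trans). It is an illustration only: the function name is hypothetical and it is an assumption that PROTIN follows the same sign convention.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Dihedral angle in degrees for the atom string p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```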

i) Non-crystallographic symmetry. This gives for each symmetry group (maximum 2):
(1) The flag KEND (=0 if a second group will be listed).
The number of chains in the symmetry group (NCHN).
The flag KNOWNR (=0 if the symmetry transformations are not known, =1 if known).
The identification numbers of the chains in this symmetry group.
(2) Details of the symmetry transformations if known with one matrix for each chain.
(3) List of the atom equivalences. The items are:
The atom equivalence sequence number.
The weighting code type (See data card 9b).
Atom identifiers of the NCHN atoms in the equivalence group (sequence number + atom name).

j) The thermal ellipsoid specifiers. These are only listed if LIST ALL is used. The items given for each atom are:
The thermal ellipsoid sequence number.
The residue number.
The atom identifiers of the five atoms used in defining the thermal ellipsoid parameters. (Sequence number + atom name)

k) The summary counts for the following items:

 
 The number of atoms (NA).
 The number of distances (NDIS).
 The number of planes (NPLN).
 The number of chiral centres (NCHR).
 The number of possible contacts (NVDW).
 The number of present contacts.
 The number of torsion angles (NTOR).
 The number of group 1 symmetry equivalences (NSYM1).
 The number of group 2 symmetry equivalences (NSYM2).
 The number of thermal ellipsoid specifiers.
 The number of variable occupancy factors.

ERROR MESSAGES

Errors in control data

A syntax error in a numerical field of a data control card will give the following error message and the program will stop.
**SYNTAX ERROR IN FIELD n** text
If too many chains are defined (see data card 2) the program will stop.
STOP **TOO MANY CHAINS**
If too many special distances are defined (see data cards 7) the program will stop with the following message.
**** INSUFFICIENT SPACE EXISTS FOR STORAGE OF SPECIAL DISTANCE DATA
ICH NDIG MAXDIG IPR NDIC MAXDIC ich ndig maxdig ipr ndic maxdic
(maxdig is the max. allowed number of intra-chain special distances, maxdic is the max. allowed number of inter-chain distances)

Errors when reading dictionary file
The program will continue after these messages unless a STOP is indicated below.
a) Distances
*ERROR**** LINK GROUP IDENTIFIER, IDGRP=n OUTSIDE ALLOWED RANGE OF 1-5
*ERROR**** GROUP NO. N IS CALLED name1 ON THE DISTANCE CARD, BUT IS CALLED name2 IN THE ATOM DECK
*ERROR**** ATOM NO. n FOR DISTANCE NO. m OF RESIDUE name EXCEEDS THE HIGHEST ATOM NO. l FOR THAT RESIDUE

b) Planes
*ERROR**** GROUP NO. n IS CALLED name1 ON THE PLANES CARD, BUT IS CALLED name2 IN THE ATOM DECK
*ERROR**** THE LINK GROUP HAS MORE THAN 5 ATOMS COMPUTATION HALTED

c) Chiral centres
*ERROR**** GROUP NUMBER n IS CALLED name1 ON THE CHIRAL CARD, BUT IS CALLED name2 IN THE ATOM DECK
NUMBER OF CHIRAL CENTRE TYPES EXCEEDS THE AVAILABLE STORAGE

d) Contacts
*ERROR**** GROUP NUMBER n IS CALLED name1 ON THE CONTACTS CARD, BUT IS CALLED name2 IN THE ATOM DECK
*ERROR**** ATOM NO. n FOR CONTACT NO. m OF RESIDUE name EXCEEDS THE HIGHEST ATOM NO. max FOR THAT RESIDUE

e) Torsion angles
*ERROR**** GROUP NO. n IS CALLED name1 ON THE TORSION CARD, BUT IS CALLED name2 IN THE ATOM DECK

Errors in reading atoms and preparing the output file

If a chain identifier found in the input file was not defined in the control data the following message will be printed.
ERROR IN CHAIN IDENTIFICATION

An unidentified residue type will give the following message and the program will stop.
*ERROR**** GROUP NAME restyp GIVEN FOR RESIDUE n IS NOT AMONG THOSE IN THE TABLE OF STANDARD GROUPS. COMPUTATION HALTED

An unidentified atom name will give the following message and the atom will be omitted though the program will continue.
*ERROR**** ATOM NAME atname GIVEN FOR RESIDUE n IS NOT AMONG THOSE IN THE STANDARD TABLE FOR THE restyp GROUP. ATOM OMITTED
A duplicate atom will give the following message and the atom will be omitted though the program will continue.
*ERROR**** A DUPLICATE OF ATOM name FOR RESIDUE restyp resnumber HAS BEEN DETECTED. ATOM OMITTED
Atoms missing from standard groups in the dictionary will be flagged as follows:
THE FOLLOWING ATOMS ARE MISSING FROM THE DECK OF ATOMIC COORDINATES
atname OF RESIDUE restyp resnumber
atname OF RESIDUE restyp resnumber

The following message will be printed if missing atoms cause problems in the definition of a plane.
*THE PLANAR TERMINAL GROUP OF resname resnumber HAS NOT BEEN SPECIFIED SINCE MISSING SIDE CHAIN ATOMS OBFUSCATE THE LOCATION OF NEEDED ATOM PAIRS

REFERENCES

  1. Refinement of Protein Structures (Proceedings of the Daresbury Study Weekend, SERC Daresbury Laboratory, 1980) Compiled by P.A. Machin, J.W. Campbell and M. Elder.
  2. Macromolecular Refinement (Proceedings of the CCP4 Study Weekend, 1996), ed. E. Dodson, M. Moore, A. Ralph and S. Bailey.

AUTHORS

Authorship : W.A. Hendrickson and J.H. Konnert.

Modifications for this version including the conversion to FORTRAN 77 have been made by W. Pulford (Oxford), E.J. Dodson (York) and J.W. Campbell (Daresbury). The documentation for Daresbury was prepared by J.W. Campbell and was based on existing documentation by A. Sielecki, A. Wlodawer and W. Hendrickson.
Key words added. (Eleanor Dodson)
UPDATED July 1990 Andrew Leslie - Eleanor Dodson Symmetry contacts added. Code based loosely on S Sheriff
1993 document updated (Cameron Dunn)

PROGRAM FUNCTION AND STRUCTURE

PROTIN is run prior to running the restrained maximum likelihood protein refinement program REFMAC (see refs. 1, 2 and 3). The program reads in a standard dictionary of protein geometry and then reads in an input coordinate file and compares the observed geometry with the ideal geometry. A file (PROTOUT) of data for use by REFMAC is created containing the following sections of data. These are in the order indicated though all sections need not necessarily be present.

 
 Atom list
 List of distances
 List of planar groups
 List of chiral centres
 List of possible contacts
 List of torsion angles
 List of non-crystallographic symmetry data
 List of thermal ellipsoids

The main control of the program is divided amongst 6 subroutines PROT1 to PROT6 which are called in sequence from a jiffy main program. The main functions of these subroutines are outlined below:

PROT1
Read in and print the control data.
Set up the arrays of the ideal geometry derived from the ideal protein geometry file by the subroutines RESIDU, (SHRINK), DISTNS, PLANES, CHIRAL, VDWAAL, TORSHN, (ELLIPS).
Read the atom data from the input coordinate file in Brookhaven format, check the atom identifiers etc. and write the atom details to the printer, if required, and the PROTIN output file on Unit 10.
PROT2
Check for missing atoms and print details of these.
Calculate and examine the various kinds of distances and write details of these to the printer, if required, and to the PROTIN output file on Unit 10. Special distances are also read in and processed at this stage if required.
PROT3
Prepare details of the planar groups and write them to the printer, if required, and to the PROTIN output file on Unit 10.
Prepare details of the chiral centres and write them to the printer, if required, and to the PROTIN output file on Unit 10.
PROT4
Prepare details of the non-bonded contacts and hydrogen bonds and write them to the printer, if required, and to the PROTIN output file on Unit 10.
PROT5
Prepare details of the torsion angles and write them to the printer, if required, and to the PROTIN output file on Unit 10.
PROT6
Read in and prepare details of the non-crystallographic symmetry and write them to the printer, if required, and to the PROTIN output file on Unit 10.
Write the output counts summary to the printer and write the output counts file on Unit 11.
SUGLNK Deals with linkage of carbohydrate residues. See CCP newsletter number 17, article by Pete Artymiuk.

DATA FORMATS

Format of PROTOUT

This is an unformatted sequential file divided into a number of sections as described below. These are present in the file in the order described.

a) Atom parameter records (8 words in length).
 
 word 1  (Integer)     Atom number.
 word 2  (Character*9) Atom label (residue code,residue no.,atom ID,chain ID).
 word 3  (Integer)     Atom type number (1=C, 2=N etc. See Appendix A).
 word 4  (Real)        Fractional coordinate X.
 word 5  (Real)        Fractional coordinate Y.
 word 6  (Real)        Fractional coordinate Z.
 word 7  (Real)        Temperature factor.
 word 8  (Real)        Occupancy.
b) Bonded distance records (6 words in length).
Sets of records are present for:
 
 Intra-residue distances.
 Inter-residue or Link distances.
 Special distances including disulphides.
 Externally defined distances.

The format of the records is as follows:

 
  word 1  (Integer)  Distance sequence number.
  word 2  (Integer)  Sequence no. of the first atom.
  word 3  (Integer)  Sequence no. of the second atom.
  word 4  (Real)     Ideal distance in Angstroms.
  word 5  (Integer)  Distance type code KDWT (or LDWT).
  word 6  (Integer)  Distance type code KBWT (or LBWT).
c) Planar groups records (2*NP+NA+4 words in length).
These have, in residue order, the planes for the terminal groups, the link groups and the side chains. These are followed by the planes for special multiplanar groups. The record format is as follows:
   word 1           (Integer)  The plane sequence number.
   word 2           (Integer)  The number of atoms in the plane (NA).
   word 3 to NA+2   (Integer)  The NA sequence numbers of the
                                atoms in the plane.
   word NA+3        (Integer)  The  number  of  bonded  pairs (NP).
   word NA+4 to NA+2*NP+3  (Integer)  The NP pairs  of  bonded  pair
                                      atom sequence number codes.
d) Chiral centre records (12 words in length)
   word 1        (Integer)  The chiral centre sequence number.
   word 2 to 5   (Integer)  The sequence  numbers  of  the  4  atoms
                            around the chiral centre.
   word 6 to 11  (Integer)  A list of the 6 distance sequence number
                            codes.
   word 12       (Real)     The ideal chiral volume.
e) Possible contact records (5 words in length).
 
   word 1  (Integer)  The Van-der-Waals contact sequence number.
   word 2  (Integer)  The sequence number of the 1st atom.
   word 3  (Integer)  The sequence number of the 2nd atom.
   word 4  (Real)     The allowed contact distance in Angstroms.
   word 5  (Integer)  The contact distance type code KTYP.
f) Torsion angle records (12 words in length).
   word 1        (Integer)  The torsion angle sequence number.
   word 2        (Integer)  The residue number.
   word 3        (Integer)  The angle type, 1=Phi, 2=Psi, 3=omega,
                            (>=4)=Chi.
   word 4 to 7   (Integer)  The sequence  numbers  of  the  4  atoms
                            defining the torsion angle.
   word 5 to 10  (Integer)  The six distance sequence number codes.
   word 11       (Integer)  The weighting code.
   word 12       (Real)     The ideal angle in degrees.
g) Non-crystallographic symmetry records (3 types).
For each symmetry group defined (max of 2), the following records are present:
  (i) Chains in symmetry group records (NCHN+2 words in length).
  word 1            (Integer)  The  number  of   chains   in   the
                               symmetry group (NCHN).
  word 2            (Integer)  KNOWNR =1, symmetry matrices known,
                                      =0, not known.
  word 3 to NCHN+2  (Integer)  The  chain  numbers  of  the   NCHN
                               chains.
 
 (ii)  NCHN records of transformation matrices (12  words).  These  are
       only present if KNOWNR=1.
 
 (iii) Atom equivalence records (NCHN+2 words in length).
    word 1            (Integer)  The   atom   equivalence   sequence
                                  number.
    word 2            (Integer)  The weighting code specification.
    word 3 to NCHN+2  (Integer)  The NCHN atom sequence numbers  for
                                 this equivalence.
h) The thermal ellipsoid records (5 words in length)
word 1 to 5 (Integer): The atom sequence numbers of the five atoms used in the definition of the thermal ellipsoid.
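
The variable-length planar group record (section c above) is the least obvious of these layouts, so a small decoding sketch may help. It assumes the record has already been read from the unformatted file into a list of integers; reading Fortran unformatted records is compiler-dependent and is not shown. The function name and the dictionary keys are hypothetical.

```python
def decode_plane_record(words):
    """Split one planar group record into its fields.

    Layout as described above: word 1 = plane sequence number,
    word 2 = NA, words 3..NA+2 = atom sequence numbers,
    word NA+3 = NP, remaining 2*NP words = bonded pair codes.
    """
    seq = words[0]
    na = words[1]
    atoms = list(words[2:2 + na])
    n_pairs = words[2 + na]
    flat = words[3 + na:3 + na + 2 * n_pairs]
    if len(atoms) != na or len(flat) != 2 * n_pairs:
        raise ValueError("record is shorter than NA and NP imply")
    pairs = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
    return {"seq": seq, "atoms": atoms, "pairs": pairs}


# Made-up record: plane 7, three atoms, two bonded pairs.
plane = decode_plane_record([7, 3, 10, 11, 12, 2, 1, 2, -3, 4])
```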

Format of PROTCOUNTS

This contains 11 integer values as follows:

          word 1   NOATOM  The number of atoms.
          word 2   NODIST  The number of distances.
          word 3   NOPLAN  The number of planes.
          word 4   NOCHRL  The number of chiral centres.
          word 5   NOVDW   The number of possible contacts.
          word 6   NONOW   The number of current contacts.
          word 7   NOTOR   The number of torsion angles.
          word 8   NSYMM1  The number of group1 equivalences.
          word 9   NSYMM2  The number of group2 equivalences.
          word 10  NAXES   The number of ellipsoid specifiers.
          word 11  NOCC    The number of variable occupancies.
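
A minimal sketch of attaching the names above to the 11 values. It assumes the counts are available as whitespace-separated integers in a text string; the on-disk layout of PROTCOUNTS is not specified in this section, so that is an assumption, and the function name is hypothetical.

```python
from collections import namedtuple

# Field names in the order given in the table above.
ProtCounts = namedtuple(
    "ProtCounts",
    "NOATOM NODIST NOPLAN NOCHRL NOVDW NONOW NOTOR NSYMM1 NSYMM2 NAXES NOCC",
)


def parse_counts(text):
    """Turn 11 whitespace-separated integers into a named record."""
    values = [int(token) for token in text.split()]
    if len(values) != 11:
        raise ValueError("expected 11 counts, got %d" % len(values))
    return ProtCounts(*values)


# Made-up numbers for illustration.
counts = parse_counts("1200 3100 210 150 4000 3800 600 0 0 0 0")
```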

APPENDIX A

Setting up an Ideal Parameters Dictionary for PROTIN

This section describes the way in which a standard groups dictionary is set up. Three standard dictionaries are available for proteins without defined hydrogen atoms (referred to as IDEALS). Certain problems will require user modified dictionaries. The dictionary is set up as a card image file.

Data Cards 1 Standard Groups Definition

For each type of group defined, there is a header card 1a followed by a set of atom definition cards 1b. Cards 1a and 1b are distinguished by the value of the item IN1.

Card 1a. Name card (45X,2I5,10X,A4,A1)

IN1 IN2 IN3 IN4

 
    If IN1 >= 1,  Indicates a Name card.
                  If IN2=0 then IN1  takes  the  following
                  values for the following cases.

        1  trans-peptide
        2  cis-peptide
        3  trans-proline
        4  cis-proline
        5  disulphide bridge

                   If IN2>=1  then  IN1=2  flags  the  last
                   amino acid side chain  in  the  list  of
                   standard groups.
 
      If IN1  = -1,  Indicates the end of data cards 1  (Also
                          set IN3=END)
 
      If IN2  > 0,   The  residue  or  group   identification
                        number (i.e. 1 for ALA, 2 for ARG etc.)
 
      If IN2  = 0,   Indicates that the group is a link group
                        (e.g. cis or trans peptide, set  IN1  to
                         specify which type)
 
                 < 0,   MAIN, C-terminal or N-terminal group.

      IN3 is the 3 letter amino acid name (a4)

      IN4 is the 1 letter amino acid code (a1)

The set of codes for IN2 defined in the standard dictionaries are given below. It is inadvisable to alter the codes from -1 to -3 or from 1 to 20 and it should be noted that PROTIN makes some specific assumptions about particular residue types e.g. CYS=5, GLY=8, PRO=15. MET is taken as the standard group defining chirality at CA.

 
  Number Type              Number  Type Slc  Number  Type Slc
  -1     MAIN                1     ALA  A      13    MET  M
  -2     C-Terminal          2     ARG  R      14    PHE  F
  -3     N-Amino             3     ASN  N      15    PRO  P
  -4     N-Formyl            4     ASP  D      16    SER  S
  -5     N-Acetyl            5     CYS  C      17    THR  T
                             6     GLN  Q      18    TRP  W
   0     Trans-peptide       7     GLU  E      19    TYR  Y
   0     Cis-peptide         8     GLY  G      20    VAL  V
   0     Trans-proline       9     HIS  H      21    HEM  X
   0     Cis-proline         10    ILE  I      22    WAT  O Water
   0     Disulphide          11    LEU  L      23    SUL  U Sulphate
                             12    LYS  K
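
The fixed-column layout of a Name card (45X,2I5,10X,A4,A1) can be read with plain string slicing, as sketched below. The helper names and the example card are made up for illustration; the card is not copied from a real dictionary.

```python
def fortran_int(field):
    """Read an I-format field; a blank field is taken as 0."""
    field = field.strip()
    return int(field) if field else 0


def parse_name_card(line):
    """Split a card 1a image into IN1, IN2, IN3 and IN4."""
    line = line.ljust(70)
    return {
        "IN1": fortran_int(line[45:50]),   # columns 46-50
        "IN2": fortran_int(line[50:55]),   # columns 51-55
        "IN3": line[65:69].strip(),        # columns 66-69, residue name
        "IN4": line[69:70].strip(),        # column 70, single letter code
    }


# Hypothetical card for PHE (group number 14).
card_1a = " " * 45 + "%5d%5d" % (1, 14) + " " * 10 + "PHE " + "F"
name = parse_name_card(card_1a)
```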

Cards 1b Atom cards (3F10.5,10X,3I5,20X,A4)

X Y Z KATOM IN1 IN2 LABEL

X, Y, Z are the Cartesian coordinates, in Angstroms, in a reference frame with its origin at the Calpha atom.

KATOM is the atom type code. Each atom type in the dictionary must be assigned a code in the range 1 to 10 (numbers outside this range must not be used unless the programs are modified). Each particular code may correspond to several atom types (e.g. most metals have code = 7), but the default types are as follows:

                    Code   Type
                     1      C
                     2      N
                     3      O
                     4      S
                     5      FE
                     6      H
                     7      CA
                     8      I
                     9      C-SP2
                    10      OW

In the program PROTIN, Van-der-Waals contact distances are assigned on the basis of the atom type code (rather than the name). The distance associated with a particular code may be changed with the keyword VDWRADII, but remember that this will change the distances for all atom types with this code.

IN1 is a flag set to 0 for an atom card. A non-zero value indicates another card of type 1a and terminates the atom cards for the current group.

IN2 is the order number of an atom within a given residue, starting with 1 for N, 2 for Calpha etc. Amino acid side chains start with IN2=5 for the Cbeta atom. For peptide groups, corresponding negative numbers are used for denoting atoms belonging to the previous residue e.g. with the first 3 atoms belonging to the previous residue we have:
CA(-2)-C(-3)-O(-4)-N(1)-CA(2)

LABEL is the atom name up to 4 characters in length following the naming as used in the Brookhaven file format.
Cards 1 are terminated by IN1=-1 on a Name card with IN3=END
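
An atom card (3F10.5,10X,3I5,20X,A4) can be split in the same fixed-column way. The sketch below assumes every real field carries an explicit decimal point, and the example card is invented for illustration; the function name is hypothetical.

```python
def parse_atom_card(line):
    """Split a card 1b image into X, Y, Z, KATOM, IN1, IN2 and LABEL."""
    line = line.ljust(79)

    def real(field):
        field = field.strip()
        return float(field) if field else 0.0

    def integer(field):
        field = field.strip()
        return int(field) if field else 0

    return {
        "X": real(line[0:10]),
        "Y": real(line[10:20]),
        "Z": real(line[20:30]),
        "KATOM": integer(line[40:45]),  # after the 10X gap
        "IN1": integer(line[45:50]),    # 0 for an atom card
        "IN2": integer(line[50:55]),    # order number within the residue
        "LABEL": line[75:79].strip(),   # after the 20X gap
    }


# Hypothetical CB atom: carbon (code 1), fifth atom of the residue.
card_1b = ("%10.5f%10.5f%10.5f" % (-0.5, 1.25, 0.0) + " " * 10
           + "%5d%5d%5d" % (1, 0, 5) + " " * 20 + "CB  ")
atom = parse_atom_card(card_1b)
```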

Data Cards 2 Interatomic Distances and Codes

For each group specified on data cards 1, a set of distance cards should be given. For each group there is a header card followed by the distance cards.

Cards 2a Header card (A4,6X,3I5)
IDGRP KIND ND IPEP
IDGRP is the residue name (equivalent to IN3 on data cards 1a). KIND is the group number (equivalent to IN2 on data cards 1a). KIND=100 terminates the distance cards. ND is the number of distances for this group. IPEP is the peptide type for link groups.

Cards 2b. Distance cards for ND distances with up to 8 distances per card. (8(2I3,2I2))
IATM(1) JATM(1) KDWT(1) KBWT(1) IATM(2) JATM(2) KDWT(2) KBWT(2) ...

 
    IATM(i) Number of the origin atom within the group.
    JATM(i) Number of the target atom within the group.
    KDWT(i), KBWT(i) are two codes indicating the type of distance
             for weighting purposes. The codes are as follows:

 KDWT(i)  KBWT(i)   Description
    1       1       Bonded pair between 2 main chain atoms.
    1       3       Bonded pair involving 1 or more side  chain
                    atoms.
    2       2       Angle pair involving only main chain atoms.
    2       4       Angle pair involving 1 or more  side  chain
                    atoms.
    3       0       Atoms determining a torsion  angle  of  the
                    form A-D:

         A------D
              /
           B--C

      e.g. O(i)---Calpha(i+1)
 
    4       4       Used for special inter-group contacts.
 
Cards 2 are terminated by a distance header card with KIND=100.
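
The packing of up to 8 distances per card (8(2I3,2I2)) means each distance occupies 10 columns. A sketch of unpacking ND distances from the card images following a header card is given below; the function name is hypothetical and the example card is invented.

```python
def parse_distance_cards(lines, nd):
    """Return nd tuples (IATM, JATM, KDWT, KBWT) read from cards 2b."""

    def integer(field):
        field = field.strip()
        return int(field) if field else 0

    entries = []
    for line in lines:
        line = line.ljust(80)
        for k in range(8):                      # 8 distances per card
            if len(entries) == nd:
                return entries
            f = line[10 * k:10 * k + 10]        # one 2I3,2I2 group
            entries.append((integer(f[0:3]), integer(f[3:6]),
                            integer(f[6:8]), integer(f[8:10])))
    if len(entries) != nd:
        raise ValueError("fewer distances on the cards than ND")
    return entries


# Hypothetical card holding three distances.
card_2b = "  1  2 1 1" + "  2  3 1 1" + "  1  3 2 2"
dists = parse_distance_cards([card_2b], 3)
```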

Data Cards 3 Planar Groups Information

Cards 3a. Planar groups (A4,I2,2I3,17I4)
IDGRP KIND NCNO NA IAT(1) IAT(2) ... IAT(NA)

 
   IDGRP is the residue name 
         (equivalent to  IN3  on  data  cards 1a).
   KIND is the group number (equivalent to IN2 on data cards 1a).
         KIND=100 terminates the planar groups cards.
   NCNO is the number of non-hydrogen atoms in the planar group.
   NA is the number of atoms in the plane for this group (max. of
         17, or for link groups, 6).
   IAT(1)...IAT(NA) are the numbers of the atoms within the group
         (equivalent to IN2 on cards 1b.) for the NA atoms of the
         plane.

Cards 3b. Bonded pairs for link groups (16I5)

These cards are only given for Link groups.
IA(1) JA(1) IA(2) JA(2) ...

 
   IA(i), JA(i) are the atom numbers for the  bonded  pairs. The
        number of pairs given is NA*(NA-1)/2. Remember that only the
        first NAPEP atoms as defined in the data control cards for
        PROTIN will be considered to form the plane. 

Specification for multiplanar groups has been simplified.

Multiplanar groups are now indicated by a non-zero flag in IAT(17). Otherwise the input is identical to that given above. Note that this imposes a limit of 16 atoms in the plane.

Original specification of multiplanar groups is given below.

This is now redundant, but is included because many users will have dictionaries with the old type of multiplanar specification. It has been changed because of the coupling between the dictionary and the PARAMETER MXG in the source code. This meant that if MXG was changed in the program, the dictionary also had to be edited !!

For compatibility with existing dictionaries, the new version of PROTIN will STILL read and deal with multiplanar specifications using the old type of specification. It is NOT necessary to edit the dictionary. To do this, the assumption made is that if KIND is greater than the largest group number given in the list of coordinates (Card 1a), then it is an old-style dictionary, and the true group number is given in IAT(17).

Old Style Dictionary Multiplanar Specification:

KIND should be a unique identifier for each plane, and the first value used should be MXG+1, where MXG is the maximum number of groups (set in a PARAMETER statement in the program). Subsequent planes should then have identifiers MXG+2, MXG+3 etc. The other parameters are as above, except that the true group number for this residue type (i.e. IN2 on data cards 1a) must be given in the last I4 field on the card (i.e. IAT(17)). This restricts the possible number of atoms in a plane to 16.

Cards 3 are terminated by a planar group card with KIND=100.
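
The rule for telling old-style from new-style multiplanar cards can be written out as a short sketch. This follows the description above rather than the PROTIN source; the function name and the returned dictionary are hypothetical, and terminator cards (KIND=100) are assumed to have been filtered out by the caller.

```python
def classify_plane_card(kind, iat, max_group_number):
    """Interpret KIND and IAT(17) on a card 3a.

    kind             KIND as read from the card
    iat              list of IAT values (shorter lists are padded with 0)
    max_group_number largest group number on the Name cards (card 1a)
    """
    iat17 = iat[16] if len(iat) >= 17 else 0
    if kind > max_group_number:
        # Old style: KIND is only a plane identifier, and the true group
        # number is carried in IAT(17).
        return {"style": "old", "multiplanar": True, "group": iat17}
    # New style: KIND is the group number and IAT(17) is just a flag.
    return {"style": "new", "multiplanar": iat17 != 0, "group": kind}


# Hypothetical cards for a dictionary whose largest group number is 23.
single = classify_plane_card(14, [5, 6, 7, 8, 9, 10, 11], 23)
new_multi = classify_plane_card(21, [1, 2, 3, 4] + [0] * 12 + [1], 23)
old_multi = classify_plane_card(41, [1, 2, 3, 4] + [0] * 12 + [21], 23)
```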

Data Cards 4 Chiral Centres Specification (A4,2I3,4I5)

IDGRP KIND IHAND IA(1) IA(2) IA(3) IA(4)

 
   IDGRP is the residue name 
         (equivalent to  IN3  on  data  cards 1a).
   KIND is the residue type number (equivalent to IN2 on data cards 1a).
         KIND=100 terminates the chiral centre cards.
   IHAND = 1 for intrinsically chiral groups, = 0 for those whose
             chirality is related to nomenclature (e.g. Leu, Val)
   IA(1)..IA(4) are the numbers of the atoms within the group
       (equivalent to IN2 on cards 1b.) for the atom at the
       asymmetric centre and the three other atoms that determine
       the chirality of the group.

MET is chosen in the standard dictionaries to specify the Calpha centre for all handed amino acids.

Cards 4 are terminated by a chiral centres card with KIND=100.

Data Cards 5 Non-Bonded Contact Codes (A4,6X,3I5)

Cards 5a. Header cards
IDGRP KIND ND MD

 
   IDGRP is the residue name 
        (equivalent to  IN3  on  data  cards 1a).
   KIND is the residue type number (equivalent to IN2 on data cards 1a).
        KIND=100 terminates the non-bonded contact cards.
   ND is the number of non-bonded  contacts  specified  for  this
      group.
   MD is the number of  non-bonded  contacts  specified  for  the
      prolyl link group if KIND=0.

Cards 5b. Distance cards (10(2I3,I2))

As many of these cards as required are given following the header card to hold details of the ND possible contact distances (up to 10 per card).
IATM(1) JATM(1) KTYP(1) IATM(2) JATM(2) KTYP(2) ...

 
   IATM(i) is the order within the residue of  the  origin  atom
           (equivalent to IN2 on a card 1b.).
   JATM(i) is the order within the residue of the target atom.
   KTYP(i) is the distance type code:
           =1, The  relative  position  of  the   given   atoms   is
               determined by only one torsion angle.
           =2, The  relative  position  of  the   given   atoms   is
               determined by two or more torsion angles.

Cards 5 are terminated by a header card with KIND=100.

Data Cards 6 Torsion Angle Specification Cards

Cards 6a. Header cards (A4,2I3,14I5)
IDGRP KIND NCHI IA(1) IA(2) ...

 
   IDGRP is the residue name 
         (equivalent to  IN3  on  data  cards  1a).
   KIND is the residue type number (equivalent to IN2 on data cards 1a).
        KIND=100 terminates the torsion angle cards.
   NCHI is the number of side chain (Chi) torsion angles for this
        residue.
   IA(1),  IA(2)...  are  the  atom  numbers  within  the   group
        specifying the torsion angles.
 
       e.g. for PHE
              IA(1), IA(2) ... =  3 1 2 3 1 2 5 6 7

               3 C(i-1)
               1 N(i)
               2 CA(i)
               3 C(i)
               1 N(i+1)
               2 CA(i+1)
               5 CB(i)
               6 CG(i)
               7 CD1(i)

               C(i-1)-N(i)-CA(i)-C(i)      specifies Phi
               N(i)-CA(i)-C(i)-N(i+1)      specifies Psi
               CA(i)-C(i)-N(i+1)-CA(i+1)   specifies omega
               N(i)-CA(i)-CB(i)-CG(i)      specifies Chi1
               CA(i)-CB(i)-CG(i)-CD1(i)    specifies Chi2
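
The way the atom list maps onto torsion angles in the PHE example can be sketched as sliding windows of four atoms. This is a generalisation from that single example (first six numbers for the main chain, then N, CA and the side chain atoms for the Chi angles); PROTIN's own handling of other residues may differ, and the function name is hypothetical.

```python
def torsion_quads(ia, nchi):
    """Return the four atom numbers defining each torsion angle."""
    main = list(ia[:6])
    quads = {
        "Phi": tuple(main[0:4]),
        "Psi": tuple(main[1:5]),
        "omega": tuple(main[2:6]),
    }
    # Side chain: start from N(i), CA(i), then follow the remaining atoms.
    chain = [ia[1], ia[2]] + list(ia[6:])
    for k in range(nchi):
        quads["Chi%d" % (k + 1)] = tuple(chain[k:k + 4])
    return quads


# The PHE list quoted above, with NCHI=2.
phe = torsion_quads([3, 1, 2, 3, 1, 2, 5, 6, 7], 2)
```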

 
Cards 6b. Chi weighting codes (10X,6I5)
 
This card is not read if NCHI=0
 
IWT(1) IWT(2) ... IWT(NCHI)
 
    IWT(i) are the weighting codes for the NCHI side  chain  (Chi)
           angles as follows:

         0  no specifications
         2  planar (e.g. Chi5 of ARG)
         3  staggered (e.g. aliphatics)
         4  orthonormal (e.g. Chi2 of aromatics)


Cards 6c. Neighbour identifications of terminal group 
          and main chain  atoms (10X,6I5)
 
These are only read if KIND < 0.
 
MNABOR(1) ... MNABOR(6)
 
     MNABOR(1) ... MNABOR(6) are codes as follows:

     -1  atom is from residue i-1
      0  atom is from residue i
      1  atom is from residue i+1
      5  atom is from the terminal group
         (e.g. OT of the carboxyl terminus)


Cards 6d. Distance codes (6I4,2(4x,6I4))
 
DST1(1) ... DST1(6)  DST2(1) ... DST2(6)  DST3(1) ... DST3(6)
 
  DST1(1) ... DST1(6) are the distance codes for the Phi  angle.
         For an atom string 1-2-3-4 the six distances referred  
         to  are 1-2, 1-3, 1-4, 2-3, 2-4 and 3-4 respectively.  
         The  code  must correspond to a distance number identified 
         from cards 2.
 DST2(1) ... DST2(6) are the distance codes for the Psi angle.
 DST3(1) ... DST3(6) are  the  distance  codes  for  the  omega
             angle.
 
Cards 6 are terminated by a header card with KIND=100.
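
The order of the six distances on cards 6d (1-2, 1-3, 1-4, 2-3, 2-4, 3-4 for an atom string 1-2-3-4) is the order in which pairs are generated by taking combinations of the four atoms. A short sketch, with a hypothetical function name:

```python
from itertools import combinations


def distance_pairs(atoms):
    """Pairs of atoms in the order used for the six distance codes."""
    return list(combinations(atoms, 2))


pairs = distance_pairs([1, 2, 3, 4])
```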
 

Data Cards 7 Secondary Structure Conformations (A20,I4,2F8.1)

LABEL KODE PHI PSI
LABEL is the label identifying this element of secondary structure. KODE is the number code for this type of structure as referred to in the subkeyword CHNTYP SECO of PROTIN: KODE=1 for alpha helix, KODE=2 for 3-10 helix, KODE=5 for parallel beta, KODE=6 for antiparallel beta, etc. PHI is the characteristic Phi value and PSI is the characteristic Psi value.

Data Cards 8 Thermal Ellipsoid Specification Cards

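Cards 8a. Header cards (A4,I3,I3)
IDGRP KIND NAT

   IDGRP is the residue name (equivalent to IN3 on data cards 1a).
   KIND is the group number (equivalent to IN2 on data cards 1a).
        KIND=100 terminates the thermal ellipsoid cards.
   NAT is the number of atomic ellipsoids defined for this group.

Cards 8b. Ellipsoid specifiers (5(I4,4I3))

As many of these cards as are required to specify NAT ellipsoids are given, with up to 5 ellipsoids specified per card.
IATOM KPS1 KPS2 KPS3 KPS4

   IATOM is the number of the atom within the group.
   KPS1 is the number of the atom within the group for atom P0.
   KPS2 is the number of the atom within the group for atom P1.
   KPS3 is the number of the atom within the group for atom S0.
   KPS4 is the number of the atom within the group for atom S1.

Negative KPS numbers refer to atoms in the previous residue.

Atom ellipsoids are oriented as follows:

   p    = r(P1) - r(P0)
   s    = r(S1) - r(S0)
   eps1 = p / |p|
   eps3 = (p x s) / |p x s|
   eps2 = eps3 x eps1

Cards 8 are terminated by a header card with KIND=100.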
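
The orientation of an atom ellipsoid from the four atoms P0, P1, S0 and S1 can be sketched as below. The sketch reads the orientation rule as eps1 = p/|p|, eps3 = (p x s)/|p x s| and eps2 = eps3 x eps1, where x is the vector cross product; that reading is an interpretation of the text, not something checked against the PROTIN source, and the function name is hypothetical.

```python
import math


def ellipsoid_axes(p0, p1, s0, s1):
    """Return the orthonormal axes (eps1, eps2, eps3) for one atom."""

    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def unit(a):
        length = math.sqrt(sum(x * x for x in a))
        return tuple(x / length for x in a)

    p = sub(p1, p0)               # p = r(P1) - r(P0)
    s = sub(s1, s0)               # s = r(S1) - r(S0)
    eps1 = unit(p)
    eps3 = unit(cross(p, s))      # perpendicular to both p and s
    eps2 = cross(eps3, eps1)      # completes the right-handed set
    return eps1, eps2, eps3


# Made-up coordinates: p along x, s along y.
e1, e2, e3 = ellipsoid_axes((0, 0, 0), (2, 0, 0), (0, 0, 0), (0, 3, 0))
```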
IDGRP is the residue name (equivalent to IN3 on data cards 1a). KIND is the group number (equivalent to IN2 on data cards 1a). KIND=100 terminates the thermal ellipsoid cards. NAT is the number of atomic ellipsoids defined for this group. Cards 8b. Ellipsoid specifiers (5(I4,4I3)) As many of these cards as are required to specify NAT ellipsoids are given with up to 5 ellipsoids being specified per card. IATOM KPS1 KPS2 KPS3 KPS4 IATOM is the number of the atom within the group. KPS1 the the number of the atom within the group for atom P0. KPS2 the the number of the atom within the group for atom P1. KPS3 the the number of the atom within the group for atom S0. KPS4 the the number of the atom within the group for atom S1. Negative KPS numbers refer to atoms in the previous residue. Atom ellipsoids are oriented as follows: p = r(P1) - r(P0) s = r(S1) - r(S0) eps1 = p / p eps3 = ( p * s ) / ( p * s ) eps3 = eps3 * eps1 Cards 8 are terminated by a header card with KIND=100.