AREAIMOL (CCP4: Supported Program)
NAME
areaimol - Analyse solvent accessible areas
SYNOPSIS
areaimol XYZIN foo_in.pdb [XYZIN2 foo2_in.pdb] [XYZOUT foo_out.pdb]
DESCRIPTION
The solvent accessible surface of a protein is defined (Lee and Richards (1971)) as the locus of the centre of a probe sphere (representing a solvent molecule) as it rolls over the Van der Waals surface of the protein. AREAIMOL calculates the solvent accessible surface area by generating surface points on an extended sphere about each atom (at a distance from the atom centre equal to the sum of the atom and probe radii), and eliminating those that lie within equivalent spheres associated with neighbouring atoms. This is different from the original Lee and Richards (1971) algorithm, which is implemented in the program SURFACE. Note also that the solvent accessible surface is distinct from the molecular surface, which is the locus of the inward-facing point of the probe sphere (the sum of the contact and re-entrant surfaces).
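The point-elimination scheme described above can be illustrated with a minimal Python sketch. It is not AREAIMOL's own code: the function names, the golden-spiral point layout and the brute-force neighbour search are all assumptions made for illustration, and the real program may place and test its points differently.

```python
import math

def sphere_points(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral layout, an assumption)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n          # evenly spaced in z
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden
        pts.append((r * math.cos(phi), r * math.sin(phi), z))
    return pts

def accessible_areas(atoms, probe=1.4, density=1.0):
    """atoms: list of (x, y, z, vdw_radius) in Angstroms.
    Returns the estimated accessible area of each atom in square Angstroms."""
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe                     # radius of the extended sphere
        sphere_area = 4.0 * math.pi * big_r * big_r
        n = max(1, round(density * sphere_area))
        kept = 0
        for ux, uy, uz in sphere_points(n):
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                rj_ext = rj + probe
                d2 = (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                if d2 < rj_ext * rj_ext:       # inside a neighbour's extended sphere
                    buried = True
                    break
            if not buried:
                kept += 1
        areas.append(sphere_area * kept / n)   # surviving fraction of the sphere
    return areas
```

For an isolated atom every point survives, so the sketch returns the full area of the extended sphere; a neighbouring atom removes the points that fall inside its own extended sphere.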
AREAIMOL finds the solvent accessible area of atoms in a PDB coordinate file, and summarises the accessible area by residue, by chain and for the whole molecule. It will also attempt to identify isolated areas of surface (which could be cavities either within the molecule, or formed as a result of intermolecular contacts). It is capable of excluding specified residues from the calculations, and of generating symmetry related molecules. It can also be used to compare accessible area and analyse area differences. Accessible areas (or area differences) for individual atoms can be written to a pseudo-PDB output file.
This is an extensively revised version of the old AREAIMOL program which now also incorporates the functions of DIFFAREA, RESAREA and WATERAREA. The flexibility of the area calculation has been extended by the addition of new keywords PROBE (sets probe radius), PNTDEN (sets precision of area calculation) and ATOM (allows new atom types to be defined).
The keywords are split into three groups:
DIFFMODE OFF | IMOL | WATER | COMPARE
This keyword controls the program function, the data required, and how it is processed and analysed (see PROGRAM FUNCTION). DIFFMODE must be the first keyword, unless it is omitted in which case the program defaults to DIFFMODE OFF.
MODE ALL | NOHOH | HOH | HOHALL
Controls which type of residues are included and how they are treated. There are four possible modes of operation, specified by one of the subkeywords below.
SMODE IMOL | OFF
Symmetry mode keyword which is used to look at intermolecular contacts. There are two options:
SYMMETRY <space-group name | space-group number | symmetry operators>
Read the symmetry operations, specified as a name (eg P212121), the International Tables number, or as a series of symmetry operations (e.g. SYMMETRY X,Y,Z * -X,Y+1/2,-Z). In the latter case, all the symmetry operators must be supplied on a single SYMMETRY keyword.
If the SYMMETRY keyword is omitted when SMODE has been specified as IMOL then the program will generate symmetry related molecules assuming P1 symmetry (essentially, lattice translations only). If SMODE is OFF then the SYMMETRY keyword is optional.
Under DIFFMODE IMOL a second SYMMETRY keyword is necessary, to specify the symmetry operators required for the second area calculation (see below).
Note that unlike previous versions of the program, it is no longer necessary to manually exclude the identity operation when entering symmetry operations. The identity is implicitly assumed. If the identity is the only operation that has been entered (or if P1 symmetry is specified) then a warning may appear, but this can be ignored (unless you are not in P1 symmetry).
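As an illustration of the operator form accepted by SYMMETRY (e.g. X,Y,Z * -X,Y+1/2,-Z), the following Python sketch turns such a string into rotation matrices and fractional translations. The function names are hypothetical and the parser is deliberately simple: it assumes terms of the form X, Y, Z or simple fractions only, and is not the parser used by AREAIMOL.

```python
import re
from fractions import Fraction

# One signed term: an axis letter or a number such as 1/2 (assumed input forms).
TERM = re.compile(r"([+-]?)(\d+(?:/\d+)?|[XYZ])")

def parse_operator(text):
    """Parse one operator such as '-X,Y+1/2,-Z' into (rotation rows, translation)."""
    rows, shift = [], []
    for part in text.upper().replace(" ", "").split(","):
        row, t = [0, 0, 0], Fraction(0)
        for sign, body in TERM.findall(part):
            s = -1 if sign == "-" else 1
            if body in ("X", "Y", "Z"):
                row["XYZ".index(body)] += s    # rotation-matrix entry
            else:
                t += s * Fraction(body)        # fractional translation
        rows.append(row)
        shift.append(t)
    if len(rows) != 3:
        raise ValueError("an operator needs exactly three components: " + text)
    return rows, shift

def parse_symmetry(text):
    """Split a '*'-separated list of operators and parse each one."""
    return [parse_operator(op) for op in text.split("*")]

ops = parse_symmetry("X,Y,Z * -X,Y+1/2,-Z")
```

Each parsed operator maps a fractional coordinate x to R x + t, with R given row by row and t kept as exact fractions.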
TRANS [ NONE | 1 | 2 | BOTH ]
TRANSlation keyword. This causes the program to generate additional symmetry-related molecules by applying 125 translations made up from linear combinations of the primitive lattice vectors (+/-2 lattice vectors in each direction). Combining these with the spacegroup operators via the SYMMETRY keyword will generate the crystal lattice.
Only takes effect if DIFFMODE IMOL or SMODE IMOL have been specified.
For SMODE IMOL, NONE turns off the translations [default] and TRANS on its own is sufficient to switch them on.
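The figure of 125 translations follows from taking every combination of -2 to +2 lattice vectors along each of the three directions (5 x 5 x 5). A short Python sketch of that enumeration, working in fractional coordinates; the function names are illustrative and do not come from AREAIMOL:

```python
from itertools import product

def lattice_translations(max_shift=2):
    """All integer combinations (i, j, k) of lattice vectors with |i|, |j|, |k| <= max_shift."""
    steps = range(-max_shift, max_shift + 1)
    return list(product(steps, repeat=3))

def translate_fractional(frac_xyz, shift):
    """Shift a fractional coordinate by a whole number of unit cells."""
    return tuple(c + s for c, s in zip(frac_xyz, shift))

shifts = lattice_translations()
print(len(shifts))  # 5 * 5 * 5 = 125, including the zero shift
```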
ATOM <name> <no> <radius>
Add or change an atom type and associated Van der Waals radius recognised by the program. <name> is the element name (as appears in columns 13-14 of the pdb file), and can be given in either upper or lower case (it is automatically upper-cased and right-justified before being processed). <no> is the atomic number and <radius> is the Van der Waals radius to be assigned to this atom type, in Angstroms.
If both <name> and <no> match those belonging to an atom already in the list then its Van der Waals radius will be changed to <radius>. If only one of the two matches, then the program ignores that occurrence of the ATOM keyword and the radius will remain unchanged.
AREAIMOL assumes a single radius for each element, and only recognises a limited number of different elements. Unknown atom types (i.e. those not in AREAIMOL's internal database) will be assigned the default radius of 1.8 A. The list of recognised atoms is:
Name   Atomic no.   VdW rad. (A)
--------------------------------
C           6          1.80
N           7          1.65
O           8          1.60
MG         12          1.60
S          16          1.85
P          15          1.90
CL         17          1.80
CO         27          1.80
The ATOM keyword must appear once for each atom definition. The program can store up to twelve new atom types, in addition to those listed above.
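The matching rule for the ATOM keyword can be summarised in a small Python sketch. The table contents and default radius are taken from the text above; the function names are hypothetical, and the behaviour once twelve new types are stored is an assumption, since the documentation does not say what the program does in that case.

```python
DEFAULT_RADIUS = 1.80   # radius given to unknown atom types
MAX_NEW_TYPES = 12      # limit on user-defined atom types

# name -> (atomic number, Van der Waals radius in Angstroms)
BUILTIN = {
    "C": (6, 1.80), "N": (7, 1.65), "O": (8, 1.60), "MG": (12, 1.60),
    "S": (16, 1.85), "P": (15, 1.90), "CL": (17, 1.80), "CO": (27, 1.80),
}

def apply_atom_keyword(table, name, number, radius):
    """Apply one 'ATOM <name> <no> <radius>' line to the table, in place."""
    name = name.strip().upper()
    known_numbers = {no for no, _ in table.values()}
    if name in table and table[name][0] == number:
        table[name] = (number, radius)      # both match: change the radius
        return "changed"
    if name in table or number in known_numbers:
        return "ignored"                    # only one of the two matches
    if len(table) - len(BUILTIN) >= MAX_NEW_TYPES:
        return "full"                       # assumption: extra types are dropped
    table[name] = (number, radius)          # neither matches: new atom type
    return "added"

def radius_for(table, name):
    """Look up a radius, falling back to the default for unknown types."""
    entry = table.get(name.strip().upper())
    return entry[1] if entry else DEFAULT_RADIUS
```

For example, applying `ATOM fe 26 1.70` to a copy of the built-in table adds a new type, whereas `ATOM C 7 1.50` is ignored because the name matches an existing entry but the atomic number does not.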
EXCLUDE <residue1> <residue2> ...
Here <residueN> represents a three-character residue name (e.g. ARG for arginine). Atoms belonging to any of the named residues will be ignored in the area calculations, and will not be written to the output Brookhaven file.
Any number of specified residue names can appear together after a single EXCLUDE, separated by a space (eg EXCLUDE PRO ARG GLY). The EXCLUDE keyword can also be repeated any number of times with one or more specified residue names.
There is a maximum number of excluded residues which is set inside the program (currently 30). If there are more than this limit then extra names will not be recorded. Names entered in lower case will automatically be converted to uppercase. Note also that the program does not check that the entries given are valid residue names, or if any are repeated.
In DIFFMODE COMPARE, the named residues will be excluded from both of the input files before the areas are calculated.
MATCHUP ALL | NOCOORDS
In DIFFMODE COMPARE, MATCHUP sets the comparison criteria used when comparing XYZIN and XYZIN2:
Atoms which are not included in the comparison are ignored in the output. MATCHUP is only available for DIFFMODE COMPARE.
PNTDEN <point_density>
The point density keyword sets the precision of the area calculation. <point_density> is the number of points per square angstrom, so that the smallest area that can be calculated is the reciprocal of this value. The default is <point_density> = 1 point per square angstrom.
Note: High values of <point_density> allow more precise estimates of the accessible surface area, but will take longer to calculate - and if <point_density> is too large then the program may exceed its memory resources and stop. At lower values of <point_density> it is possible that atoms with low surface accessibility may be diagnosed as having no accessible surface area at all.
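The trade-off between precision and cost can be estimated with a little arithmetic. The Python sketch below assumes that the points are spread over the whole extended sphere (atom radius plus probe radius), which is an interpretation of the description rather than a statement about the program's internals; the function names are illustrative.

```python
import math

def points_on_extended_sphere(vdw_radius, probe=1.4, point_density=1.0):
    """Approximate number of surface points generated for one atom."""
    extended = vdw_radius + probe
    return point_density * 4.0 * math.pi * extended * extended

def smallest_area(point_density):
    """Smallest non-zero area that can be reported, in square Angstroms."""
    return 1.0 / point_density

# A carbon atom (1.80 A) with the default probe and density needs about 129 points;
# raising the density to 10 needs ten times as many but resolves areas of 0.1 A^2.
carbon_points = points_on_extended_sphere(1.80)
fine_resolution = smallest_area(10.0)
```

The point count grows linearly with <point_density>, which is why large values increase both run time and memory use.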
PROBE <x>
Sets the radius of the solvent molecule used as a probe in the area calculations to be equal to <x> angstroms.
The probe radius must be greater than zero, up to a limit of 25 A. The default radius is 1.4 A.
VERBOSE
Switch on extended (i.e. verbose) printer output. In addition to the output described in 'PRINTER OUTPUT', the log file will also contain the following information:
OUTPUT
The OUTPUT keyword causes a list of atoms to be written to the file with logical name XYZOUT. This file has a pseudo-pdb format and should contain the CRYST1 and SCALE cards from the input file, plus for each atom: the coordinates, the associated residue, and the accessible area (if DIFFMODE OFF) or area difference (in other DIFFMODES) in the B-factor column. This is intended to mimic the output from the old AREAIMOL program.
NB: The input pdb file must contain CRYST1 cards for the OUTPUT option to function.
END
(Optional) Specifies the end of keyworded input and starts AREAIMOL running.
INPUT AND OUTPUT FILES
PRINTER OUTPUT
For each area calculation performed by the program it will output an analysis of the accessible area by residue, by chain, and for the whole molecule. For each chain the accessible area of each residue will be listed, followed by the total for the chain. In the cases where only waters are considered (DIFFMODE WATERS, or MODEs HOH or HOHALL) an additional breakdown is presented of the waters which have no accessible area, and those which have areas < 5 A^2, < 10 A^2 and > 10 A^2.
The program also outputs the contact area for each residue, chain and for the whole molecule. The contact area is defined as the area on the Van der Waals surface of an atom that can be contacted by a sphere of the given probe radius.
For modes NOHOH and ALL the program analyses the atoms which have been assigned accessible area and tries to determine how many isolated areas of surface there are (i.e. areas of surface which are unconnected to each other on the original molecule). Multiple isolated surfaces could represent any combination of:
In the case when differences in area are calculated (DIFFMODE other than OFF), an additional analysis is presented of the number of each atom type which have non-zero area differences. This is summarised in a table with the following quantities:
There is also a breakdown of accessible area differences by residue, chain and for the whole molecule.
Additional output can be obtained by specifying the VERBOSE keyword. This causes the program to print out diagnostic information such as recognised atom types and radii and the symmetry matrices derived from the symmetry cards.
PROGRAM FUNCTION
Analysis of surface accessible areas and area differences.
There were originally four programs to analyse solvent accessible area (AREAIMOL, RESAREA, WATERAREA and DIFFAREA). This version combines the function of the original set of programs into a single run which is controlled by the DIFFMODE keyword:
DIFFMODE OFF
This mode analyses the accessible surface area of a molecule.
In the most basic mode of operation the program performs a single area calculation, obtaining the solvent accessibility of each atom under consideration. These individual areas are then used to obtain an analysis of the total accessible area for each residue, chain and for the whole molecule.
The MODE keyword can be used to exclude certain types of residue (e.g. waters) from the calculation. The effect of intermolecular contacts (which will reduce the accessible area) can be included using the SMODE keyword (which generates symmetry-related copies of the original molecule by applying the symmetry operations supplied with the SYMMETRY keyword) and the TRANS keyword (which will apply linear combinations of primitive lattice vectors to the symmetry-related molecules to generate further copies). Combining the primitive lattice vectors with spacegroup symmetry will effectively generate the crystal lattice.
This reproduces the function of the old AREAIMOL program followed by either WATERAREA or RESAREA as appropriate.
DIFFMODE IMOL
This mode compares the difference in accessible area due to the presence of intermolecular contacts, e.g. changes in accessible area due to oligomer formation.
Two area calculations are performed, one for each set of supplied symmetry operations (see SYMMETRY and TRANS keywords - if only one set of operators is supplied then the second set is assumed to consist of the identity). The difference in accessible area on each atom is then calculated and the overall area differences analysed.
The SMODE keyword has no function under the DIFFMODE IMOL option, and the SYMMETRY keyword can appear twice: each occurrence gives the operators for one calculation of accessible area. Other keywords maintain their function and take effect during both calculations.
DIFFMODE WATERS
This mode only considers waters and compares the difference in accessible area when waters are treated as solvent as opposed to as protein (i.e. water treated as protein can 'obscure' surface area on other waters).
Only one set of coordinates is input, and two separate area calculations are carried out (the first treating waters as solvent, i.e. equivalent to MODE HOH, and the second treating them as protein, i.e. equivalent to MODE HOHALL). The area differences are then calculated and output.
The results of the calculations can be interpreted as follows:
The MODE keyword has no function under this option, although the other keywords maintain their function and take effect during both calculations.
DIFFMODE COMPARE
This mode compares the difference in accessible areas for two similar molecules, e.g. changes due to substrate or ligand binding.
Two input coordinate files are required, and two separate area calculations are carried out, one for each set of coordinates. The same MODE and symmetry operators etc (if relevant) are used in each case, so the resulting area differences will depend only on differences between the contents of the files. Area differences are calculated only for those atoms which are common to both files.
E.g. if one file describes a protein bound to a ligand and the other describes the protein alone, then using this mode will calculate the change in surface area of the protein in the presence of the ligand, or more specifically the area obscured by the ligand.
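The restriction to atoms common to both files can be pictured with a short Python sketch. The atom key (chain, residue number, residue name, atom name), the function name and the sign convention (second file minus first) are all assumptions for illustration; the documentation does not state how AREAIMOL matches atoms internally or which way the difference is taken.

```python
def area_differences(areas_1, areas_2):
    """Per-atom area difference for atoms present in both inputs.

    areas_1, areas_2: dicts mapping an atom key, here assumed to be
    (chain, residue number, residue name, atom name), to an accessible area.
    Atoms found in only one dict are skipped.
    """
    common = areas_1.keys() & areas_2.keys()
    return {key: areas_2[key] - areas_1[key] for key in sorted(common)}

# Hypothetical areas: protein alone, then the same protein with a ligand bound.
apo = {("A", 10, "SER", "OG"): 12.0, ("A", 11, "GLY", "CA"): 8.0}
holo = {("A", 10, "SER", "OG"): 3.0, ("A", 11, "GLY", "CA"): 8.0,
        ("L", 1, "LIG", "C1"): 20.0}
diff = area_differences(apo, holo)
```

In this made-up example the ligand atom is ignored because it has no counterpart in the first file, and the serine OG atom loses 9.0 square Angstroms of accessible area.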
GENERAL NOTES
The following comments are based on those in the original documentation:
The area calculations also depend critically upon various parameters, such as the probe radius (taken to be 1.4 A for most calculations) and the Van der Waals radii chosen for different atoms. Many programs (including AREAIMOL) choose one radius for all carbons, one radius for all nitrogens, one for all oxygens, whereas others (e.g. SURFACE) are able to differentiate between different carbons (aliphatic, aromatic etc.), different nitrogens and so on.
SURFACE assigns the Van der Waals radius for a given atom according to both the element and also the residue in which it appears, and thus may lead to differences in estimates of the accessible area.
Note that SURFACE calculates both the accessible area and the contact area, but does not include options for accounting for intermolecular contacts.
Unix example scripts can be found in $CEXAM/unix/runnable/
AUTHOR
Originator: Peter Brick, Imperial College
Substantial modifications/additional features: Peter Briggs, CCP4
SEE ALSO
surface, contact