AREAIMOL (CCP4: Supported Program)
NAME
areaimol - Analyse solvent accessible areas
SYNOPSIS
areaimol XYZIN foo_in.pdb [XYZIN2 foo2_in.pdb] [XYZOUT foo_out.pdb]
[Keyworded input]
The solvent accessible surface of a protein is defined
(Lee and Richards (1971)) as the locus of
the centre of a probe sphere (representing a solvent molecule) as it
rolls over the Van der Waals surface of the protein. AREAIMOL calculates
the solvent accessible surface area by generating surface points on an
extended sphere about each atom (at a distance from the atom centre equal
to the sum of the atom and probe radii), and eliminating those that lie
within equivalent spheres associated with neighbouring atoms. This is
different from the original Lee and Richards (1971) algorithm, which
is implemented in the program SURFACE. Note also that the solvent
accessible surface is distinct from the molecular surface,
which is the locus of the inward-facing point of the probe sphere (the
sum of the contact and re-entrant surfaces).
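The point-elimination idea described above can be sketched as follows. This is a minimal illustration only, not AREAIMOL's own code: the function names, the golden-spiral point layout and the fixed number of points per atom are all assumptions made for the example (the program itself sets the sampling via PNTDEN).

```python
import math

def sphere_points(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral layout).
    The layout is an assumption; the source does not say how points are placed."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts

def accessible_areas(atoms, probe=1.4, n_points=500):
    """atoms: list of ((x, y, z), vdw_radius). Returns one area (A^2) per atom."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the extended sphere about atom i
        kept = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            buried = False
            for j, (cj, rj) in enumerate(atoms):
                if j == i:
                    continue
                rj_ext = rj + probe
                d2 = sum((p[k] - cj[k]) ** 2 for k in range(3))
                if d2 < rj_ext * rj_ext:
                    buried = True  # point lies inside a neighbour's extended sphere
                    break
            if not buried:
                kept += 1
        # Each surviving point represents an equal share of the extended sphere.
        areas.append(4.0 * math.pi * big_r * big_r * kept / n_points)
    return areas
```

For an isolated atom every point survives, so the area is simply that of the extended sphere; a close neighbour removes the points it covers and the area drops accordingly.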
AREAIMOL finds the solvent accessible area of atoms
in a PDB coordinate file, and summarises the accessible area by
residue, by chain and for the whole molecule. It will also attempt
to identify isolated areas of surface (which could be cavities
either within the molecule, or formed as a result of intermolecular
contacts).
It is capable of excluding specified residues from the calculations, and
of generating symmetry related molecules. It can also be used to compare
accessible area and analyse area differences.
Accessible areas (or area differences) for individual atoms can be written
to a pseudo-PDB output file.
This is an extensively revised version of the old AREAIMOL program which
now also incorporates the functions of DIFFAREA, RESAREA and WATERAREA.
The flexibility of the area calculation has been extended by
the addition of new keywords PROBE (sets probe radius), PNTDEN (sets
precision of area calculation) and ATOM (allows new atom types to be
defined).
KEYWORDED INPUT
The keywords are split into three groups:
- Main keywords, which control the type of calculation:
  DIFFMODE, MODE, SMODE, SYMMETRY, TRANS
- Secondary keywords, which control parameters within the calculation:
  ATOM, EXCLUDE, MATCHUP, PNTDEN, PROBE
- Auxiliary keywords:
  VERBOSE, OUTPUT, END
MAIN KEYWORDS
DIFFMODE OFF | IMOL | WATERS | COMPARE
This keyword controls the program function, the data required, and
how it is processed and analysed (see PROGRAM FUNCTION).
DIFFMODE must be the first keyword, unless it is omitted in which case
the program defaults to DIFFMODE OFF.
Subkeywords:
- OFF [default]
  This corresponds to the function of the original AREAIMOL program. A single
  input file is required and a single accessible area calculation and analysis
  is performed.
- IMOL
  This mode analyses the differences in accessible areas of a molecule due to
  different intermolecular contacts (generated from different sets of symmetry
  operators and/or lattice translations).
- WATERS
  This mode analyses the differences in accessible area for the waters in
  a molecule depending on whether they are treated as solvent or as protein.
- COMPARE
  This mode is used to analyse the area differences for atoms and residues which
  are common between molecules held in two different files (XYZIN and XYZIN2).
The value of DIFFMODE may moderate the behaviour of other keywords:
MODE, SMODE,
SYMMETRY and TRANS (see below).
MODE ALL | NOHOH | HOH | HOHALL
Controls which type of residues are included and how they are treated.
There are four possible modes of operation, specified by one of the
subkeywords below.
Subkeywords:
- NOHOH [default]
  All waters (residue type HOH or WAT) are ignored.
- HOH
  The accessible area will only be calculated for waters (HOH or WAT),
  treating other waters as solvent. Only waters will be analysed.
- HOHALL
  As HOH, but waters are treated as protein, and consequently more
  waters will have low solvent accessibility.
- ALL
  Calculate accessible area for all atoms, including waters if present
  in the file. Water atoms are treated as solvent when calculating
  accessible area.
  Warning: waters may have large accessible area assigned in this MODE,
  leading to unrealistically inflated estimates of the total accessible
  area. Check the output carefully.
Under DIFFMODE WATERS, the MODE keyword is
redundant and is ignored.
SMODE IMOL | OFF
Symmetry mode keyword which is used to look at intermolecular contacts.
There are two options:
Subkeywords:
- IMOL
  Account for intermolecular contacts by generating symmetry related
  molecules from coordinates in Brookhaven
  file before calculating accessible areas, using symmetry operators
  supplied by the SYMMETRY keyword. (The
  TRANS keyword can also be used, to generate molecules
  related by lattice translation symmetry.)
- OFF [default]
  Symmetry related atoms will not be generated and intermolecular contacts
  are not accounted for.
Under DIFFMODE IMOL, the SMODE keyword is
redundant and is ignored.
SYMMETRY <space-group name | space-group
number | symmetry operators>
Read the symmetry operations, specified as a name (e.g. P212121), the
International Tables number, or as a series of symmetry operations (e.g.
SYMMETRY X,Y,Z * -X,Y+1/2,-Z). In the latter case, all the symmetry
operators must be supplied on a single SYMMETRY keyword.
If the SYMMETRY keyword is omitted when SMODE has been specified as IMOL
then the program will generate symmetry related molecules assuming P1
symmetry (essentially, lattice translations only). If SMODE is OFF then
the SYMMETRY keyword is optional.
Under DIFFMODE IMOL a second SYMMETRY keyword
is necessary, to specify the symmetry operators required for the second
area calculation (see below).
Note that, unlike previous versions of the program, it is no longer necessary
to manually exclude the identity operation when entering symmetry operations.
The identity is implicitly assumed. If the identity is the only operation that
has been entered (or if P1 symmetry is specified) then a warning may
appear; this can be ignored provided the crystal really does have P1 symmetry.
TRANS [ NONE | 1 | 2 | BOTH ]
TRANSlation keyword. This causes the program to generate additional
symmetry-related molecules by applying 125 translations made up from linear
combinations of the primitive lattice vectors (+/-2 lattice vectors in each
direction). Combining these with the spacegroup operators via the
SYMMETRY keyword will generate the crystal lattice.
Only takes effect if DIFFMODE IMOL or
SMODE IMOL have been specified.
Subkeywords for DIFFMODE IMOL:
- 1 (or 2)
  Apply the translation vectors on the first (or second) area calculation only.
- BOTH
  Apply the translations on both the first and the second area calculation.
- NONE [default]
  Do not apply any translations.
For SMODE IMOL, NONE turns off the translations [default] and TRANS on its
own is sufficient to switch them on.
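The count of 125 translations follows from taking every integer combination of the three primitive lattice vectors with coefficients from -2 to +2 (5 x 5 x 5). A minimal sketch, in which the function name is hypothetical and the shifts are expressed in fractional (unit cell) coordinates:

```python
from itertools import product

def lattice_translations(max_shift=2):
    """All (i, j, k) combinations of the primitive lattice vectors with each
    coefficient in -max_shift..+max_shift. Note that this set includes the
    zero translation (0, 0, 0); whether AREAIMOL counts it among its 125
    is an inference from 5**3 = 125, not something the text states."""
    shifts = range(-max_shift, max_shift + 1)
    return list(product(shifts, repeat=3))

translations = lattice_translations()
print(len(translations))  # 125
```

Applying each of these shifts to every symmetry-generated copy of the molecule gives the block of neighbouring cells used to find intermolecular contacts.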
SECONDARY KEYWORDS
ATOM <name> <no> <radius>
Add or change an atom type and associated Van der Waals radius recognised
by the program. <name> is the element name (as appears in
columns 13-14 of the pdb file), and can be given in either upper or lower
case (it is automatically upper-cased and right-justified before being
processed).
<no> is the atomic number and <radius> is the
Van der Waals radius to be assigned to this atom type, in Angstroms.
If both <name> and <no> match those belonging to
an atom already in the list then its Van der Waals radius will be changed
to <radius>. If only one of the two matches, then the program
ignores that occurrence of the ATOM keyword and the radius will remain
unchanged.
AREAIMOL assumes a single radius for each element, and only recognises
a limited number of different elements. Unknown atom types (i.e. those not
in AREAIMOL's internal database) will be assigned the default radius of 1.8 A.
The list of recognised atoms is:
Name   Atomic no.   VdW rad. (A)
--------------------------------
C           6           1.80
N           7           1.65
O           8           1.60
MG         12           1.60
S          16           1.85
P          15           1.90
CL         17           1.80
CO         27           1.80
The ATOM keyword must appear once for each atom definition. The program
can store up to twelve new atom types, in addition to those listed above.
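The matching rules for the ATOM keyword can be summarised in a short sketch. The table contents come from the list above; the function names, the return values and the behaviour once the twelve-entry limit is reached are assumptions made for illustration.

```python
DEFAULT_RADIUS = 1.8   # A, used for unknown atom types
MAX_NEW_TYPES = 12     # limit on user-defined atom types

# name -> (atomic number, Van der Waals radius in A)
BUILTIN = {
    "C": (6, 1.80), "N": (7, 1.65), "O": (8, 1.60), "MG": (12, 1.60),
    "S": (16, 1.85), "P": (15, 1.90), "CL": (17, 1.80), "CO": (27, 1.80),
}

def apply_atom_keyword(table, name, number, radius):
    """Apply one ATOM <name> <no> <radius> line. Returns True if table changed."""
    name = name.strip().upper()
    name_known = name in table
    number_known = any(no == number for no, _ in table.values())
    if name_known and table[name][0] == number:
        table[name] = (number, radius)   # both match: change the radius
        return True
    if name_known or number_known:
        return False                     # only one matches: keyword ignored
    if len(table) - len(BUILTIN) >= MAX_NEW_TYPES:
        return False                     # assumed behaviour when the list is full
    table[name] = (number, radius)       # neither matches: new atom type
    return True

def radius_for(table, name):
    """Look up a radius, falling back to the default for unknown types."""
    entry = table.get(name.strip().upper())
    return entry[1] if entry else DEFAULT_RADIUS
```

For example, starting from a copy of BUILTIN, `apply_atom_keyword(table, "c", 6, 1.7)` changes the carbon radius, whereas `apply_atom_keyword(table, "N", 8, 1.5)` is ignored because only the name matches.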
EXCLUDE <residue1> <residue2> ...
Here <residueN> represents a three-character residue name (e.g.
ARG for arginine). Atoms belonging to any of the named residues will be
ignored in the area calculations, and will not be written to the
output Brookhaven file.
Any number of specified residue names can appear together after a single
EXCLUDE, separated by a space (e.g. EXCLUDE PRO ARG GLY). The EXCLUDE
keyword can also be repeated any number of times with one or more
specified residue names.
There is a maximum number of excluded residues which is set inside the
program (currently 30). If there are more than this limit then extra names
will not be recorded. Names entered in lower case will automatically be
converted to uppercase. Note also that the program does not check that the
entries given are valid residue names, or if any are repeated.
In DIFFMODE COMPARE, the named residues will
be excluded from both of the input files before the areas are calculated.
MATCHUP ALL | NOCOORDS
Default: ALL
In DIFFMODE COMPARE, MATCHUP sets the criteria used when comparing
XYZIN and XYZIN2:
- ALL [default]
  Atoms are only included in the comparison if the atom name,
  residue name/number, chain id and atomic coordinates are the same in both
  files.
- NOCOORDS
  Uses the same criteria as ALL, except that differences in the atomic
  coordinates between the files are ignored. This makes it suitable for use
  with different conformations.
Atoms which are not included in the comparison are ignored in the output.
MATCHUP is only available for DIFFMODE COMPARE.
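The two matching criteria can be pictured as building a lookup key for each atom. The sketch below is illustrative only: the dictionary layout, the function names and the rounding of coordinates to three decimal places (the precision of a PDB file) are assumptions, not a description of the program's internals.

```python
def atom_key(atom, matchup="ALL"):
    """Build the identity used to pair atoms between XYZIN and XYZIN2."""
    key = (atom["chain"], atom["resname"], atom["resnum"], atom["name"])
    if matchup == "ALL":
        # Coordinates must agree too; rounding to 3 decimals is an assumption.
        key += tuple(round(c, 3) for c in atom["xyz"])
    elif matchup != "NOCOORDS":
        raise ValueError("MATCHUP must be ALL or NOCOORDS")
    return key

def common_atoms(atoms1, atoms2, matchup="ALL"):
    """Return (atom1, atom2) pairs for atoms present in both lists."""
    index2 = {atom_key(a, matchup): a for a in atoms2}
    pairs = []
    for a in atoms1:
        k = atom_key(a, matchup)
        if k in index2:
            pairs.append((a, index2[k]))
    return pairs
```

With NOCOORDS an atom that has moved between the two files is still paired, which is what makes that option usable for comparing different conformations.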
PNTDEN <point_density>
The PNTDEN keyword sets the precision of the area calculation.
<point_density> is the number of points per square angstrom, so that
the smallest area that can be calculated is the reciprocal of this value. The
default is <point_density> = 1 point per square angstrom.
Note: High values of <point_density> allow more precise estimates of the
accessible surface area, but will take longer to calculate - and if
<point_density> is too large then the program may exceed its memory
resources and stop. At lower values of <point_density> it is possible that
atoms with low surface accessibility may be diagnosed as having no accessible
surface area at all.
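The trade-off between precision and cost can be made concrete with a little arithmetic. The sketch below assumes that the number of points on an atom's extended sphere is the point density multiplied by the sphere's area, rounded to the nearest integer; the rounding and the function names are assumptions.

```python
import math

def points_on_sphere(vdw_radius, probe=1.4, point_density=1.0):
    """Approximate number of surface points generated for one atom."""
    area = 4.0 * math.pi * (vdw_radius + probe) ** 2  # extended sphere, A^2
    return max(1, round(point_density * area))

def area_resolution(point_density):
    """Smallest non-zero area that can be reported, in A^2."""
    return 1.0 / point_density
```

Under these assumptions a carbon atom (radius 1.8 A) with the default probe carries about 129 points at the default density and about 1287 points at a density of 10, so the work per atom grows in proportion to the density while the smallest measurable area shrinks from 1 A^2 to 0.1 A^2.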
PROBE <x>
Sets the radius of the solvent molecule used as a probe in the area calculations
to be equal to <x> angstroms.
The probe radius must be greater than zero, up to a limit of 25 A. The default
radius is 1.4 A.
AUXILIARY KEYWORDS
VERBOSE
Switch on extended (i.e. verbose) printer output. In addition to the
output described in 'PRINTER OUTPUT', the log
file will also contain the following information:
- A list of the recognised atom types and their associated radii
(see the ATOM keyword),
- The matrices derived from the SCALE cards of the input file,
- The symmetry matrices used to generate symmetry-related atoms
in SMODE IMOL or DIFFMODE IMOL.
OUTPUT
The OUTPUT keyword causes a list of atoms to be written to the file with logical
name XYZOUT. This file has a pseudo-pdb format and should contain the CRYST1
and SCALE cards from the input file, plus for each atom: the coordinates,
the associated residue, and the accessible area (if DIFFMODE
OFF) or area difference (in other DIFFMODES) in the B-factor column.
This is intended to mimic the output from the old AREAIMOL program.
NB: The input pdb file must contain CRYST1 cards for the OUTPUT option to
function.
END
(Optional) Specifies the end of keyworded input and starts AREAIMOL running.
INPUT AND OUTPUT FILES
- XYZIN
  Input coordinates in CCP4 PDB format. Must contain SCALE cards if
  symmetry-related molecules are required. Must contain a CRYST1 card if the
  OUTPUT option is being used.
- XYZIN2
  A second set of input coordinates, only used in DIFFMODE COMPARE.
- XYZOUT
  Output coordinate file in a pseudo-PDB format,
  where the B-factor column contains the accessible area calculated for each of the
  atoms (or area difference if DIFFMODE is anything other than OFF). This is an
  attempt to reproduce part of the functionality of the old AREAIMOL program.
  See keyword OUTPUT.
Notes
- Atoms with zero occupancy in the input files will be ignored in
the area calculations; those with finite occupancies less than unity
will be included. In either instance a warning message will appear. Also,
the program does not recognise 'TER' cards and will skip them, continuing to
read in any atoms appearing afterwards.
- The program always ignores hydrogens which are present in the input file(s).
This is because the van der Waals radii used in the area calculations implicitly
include the atomic hydrogens.
PRINTER OUTPUT
For each area calculation performed by the program it will output an
analysis of the accessible area by residue, by chain, and for the whole
molecule. For each chain the accessible area of each residue will be listed,
followed by the total for the chain. In the cases where only waters are
considered (DIFFMODE WATERS, or MODEs HOH or HOHALL) an additional breakdown
is presented of the waters which have no accessible area, and
those which have areas < 5 A^2, < 10 A^2 and > 10 A^2.
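One way to picture the water breakdown is as a simple binning of per-water areas. The text does not say whether the classes are cumulative or what happens at exactly 5 or 10 A^2; the sketch below assumes exclusive bins, with boundary values falling into the higher bin, and the function name is hypothetical.

```python
def bin_waters(areas, tol=1e-6):
    """Count waters by accessible area (A^2) in four exclusive classes."""
    bins = {"zero": 0, "< 5": 0, "< 10": 0, "> 10": 0}
    for area in areas:
        if area <= tol:
            bins["zero"] += 1      # no accessible area
        elif area < 5.0:
            bins["< 5"] += 1
        elif area < 10.0:
            bins["< 10"] += 1
        else:
            bins["> 10"] += 1      # assumption: exactly 10 is counted here
    return bins
```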
The program also outputs the contact area for each residue, chain and for
the whole molecule. The contact area is defined as the area on the Van
der Waals surface of an atom that can be contacted by a sphere of the given
probe radius.
For modes NOHOH and ALL the program analyses the atoms which have been
assigned accessible area and tries to determine how many isolated areas
of surface there are (i.e. areas of surface which are unconnected to each
other on the original molecule).
Multiple isolated surfaces could represent any combination of:
- cavities within a single molecule,
- cavities formed as a result of intermolecular contacts (in which case
the calculated area will be the area of the part of the cavity formed by a
single molecule, and not of the whole cavity), and
- separate molecules or clusters of atoms.
For each isolated area of surface identified, the program reports the number
of atoms, the total accessible area and the centre of mass.
In the case when differences in area are calculated (DIFFMODE other than OFF),
an additional analysis is presented of the number of each atom type which have
non-zero area differences. This is summarised in a table with the following
quantities:
- Number
  the number of atoms of each type read in from the input file
- Area1 (Area2)
  the total accessible area for all atoms of that type
  from the first (second) area calculation
- N-diff
  the number of atoms of that type which have non-zero
  area differences
- Area-diff
  the total difference in accessible area
  for all atoms of that type
There is also a breakdown of accessible area differences by residue, chain
and for the whole molecule.
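The per-atom-type table can be thought of as a simple accumulation over atoms. The sketch below is illustrative: the input layout, the function name, the tolerance used to decide that a difference is non-zero, and the sign convention (first area minus second area) are all assumptions.

```python
from collections import defaultdict

def summarise_by_atom_type(atoms, tol=1e-6):
    """atoms: iterable of (element, area1, area2) tuples, areas in A^2.
    Returns {element: {"Number", "Area1", "Area2", "N-diff", "Area-diff"}}."""
    table = defaultdict(lambda: {"Number": 0, "Area1": 0.0, "Area2": 0.0,
                                 "N-diff": 0, "Area-diff": 0.0})
    for element, area1, area2 in atoms:
        row = table[element]
        row["Number"] += 1
        row["Area1"] += area1
        row["Area2"] += area2
        diff = area1 - area2          # assumed sign convention
        if abs(diff) > tol:
            row["N-diff"] += 1
        row["Area-diff"] += diff
    return dict(table)
```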
Additional output can be obtained by specifying the VERBOSE
keyword. This causes the program to print out diagnostic information such as
recognised atom types and radii and the symmetry matrices derived from the
symmetry cards.
PROGRAM FUNCTION
Analysis of surface accessible areas and area differences.
There were originally four programs to analyse solvent accessible area
(AREAIMOL, RESAREA, WATERAREA and DIFFAREA). This version combines the
function of the original set of programs into a single run which is
controlled by the DIFFMODE keyword:
DIFFMODE OFF
This mode analyses the accessible surface area of a molecule.
In the most basic mode of operation the program performs
a single area calculation, obtaining the solvent accessibility of each atom
under consideration. These individual
areas are then used to obtain an analysis of the total accessible area for
each residue, chain and for the whole molecule.
The MODE keyword can be used to exclude certain types
of residue (e.g. waters) from the calculation. The effect of intermolecular
contacts (which will reduce the accessible area) can be included using the
SMODE keyword (which generates symmetry-related copies
of the original molecule by applying the symmetry operations supplied with
the SYMMETRY keyword) and the
TRANS keyword (which will apply linear combinations of
primitive lattice vectors to the symmetry-related molecules to generate
further copies). Combining the primitive lattice vectors with spacegroup
symmetry will effectively generate the crystal lattice.
This reproduces the function of the old AREAIMOL program followed by either
WATERAREA or RESAREA as appropriate.
DIFFMODE IMOL
This mode compares the difference in accessible area due to the presence of
intermolecular contacts, e.g. changes in accessible area due to oligomer
formation.
Two area calculations are performed, one for each set of supplied symmetry
operations (see SYMMETRY and TRANS
keywords - if only one set of operators is supplied then the second set is
assumed to consist of the identity). The difference in accessible area on each
atom is then calculated and the overall area differences analysed.
The SMODE keyword has no function under the DIFFMODE IMOL option, and the
SYMMETRY keyword can appear twice: each occurrence gives the operators for one
calculation of accessible area. Other keywords maintain their function and
take effect during both calculations.
DIFFMODE WATERS
This mode only considers waters and compares the difference in accessible
area when waters are treated as solvent as opposed to as protein (i.e.
water treated as protein can 'obscure' surface area on other waters).
Only one set of coordinates is input, and two separate area calculations are
carried out (the first treating waters as solvent, i.e. equivalent to MODE HOH,
and the second treating them as protein, i.e. equivalent to MODE HOHALL). The
area differences are then calculated and output.
The results of the calculations can be interpreted as follows:
- Waters with zero area in the first calculation are completely
enclosed by protein.
- Waters which have non-zero area in the first calculation but zero
area in the second are enclosed between protein and/or other waters.
- Waters which have non-zero area in both calculations are not
completely enclosed by protein and/or waters and so are on the 'outside'
of this shell.
The value of the area difference for each water listed is equal to the
reduction in accessible area due to being obscured by neighbouring
waters. Waters buried completely in protein will not be listed in the area
difference analysis.
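The interpretation rules above amount to a three-way classification of each water from its two areas. A minimal sketch, in which the function name, the labels and the tolerance for "zero" are assumptions:

```python
def classify_water(area_as_solvent, area_as_protein, tol=1e-6):
    """Classify one water from the two DIFFMODE WATERS calculations.
    area_as_solvent: area from the first calculation (waters as solvent).
    area_as_protein: area from the second calculation (waters as protein)."""
    if area_as_solvent <= tol:
        return "enclosed by protein"
    if area_as_protein <= tol:
        return "enclosed by protein and/or waters"
    return "outside"

def area_reduction(area_as_solvent, area_as_protein):
    """Accessible area obscured by neighbouring waters."""
    return area_as_solvent - area_as_protein
```

A water in the first class has no area in either calculation, so its difference is zero, which is consistent with such waters not appearing in the area difference analysis.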
The MODE keyword has no function under this option, although the other keywords
maintain their function and take effect during both calculations.
DIFFMODE COMPARE
This mode compares the difference in accessible areas for two similar molecules,
e.g. changes due to substrate or ligand binding.
Two input coordinate files are required, and two separate area calculations
are carried out, one for each set of coordinates. The same MODE and symmetry
operators etc (if relevant) are used in each case, so the resulting area
differences will depend only on differences between the contents of the files.
Area differences are calculated only for those atoms which are common to both
files.
E.g. if one file describes a protein bound to a ligand and the other describes
the protein alone, then using this mode will calculate the change in surface
area of the protein in the presence of the ligand, or more specifically the area
obscured by the ligand.
GENERAL NOTES
The following comments are based on those in the original documentation:
The area calculations depend critically
upon various parameters, such as the probe radius (taken to be 1.4 A for
most calculations) and the Van der Waals radii chosen for different
atoms. Many programs (including AREAIMOL) choose one radius for all
carbons, one radius for all nitrogens, one for all oxygens, whereas
others (e.g. SURFACE) are able to differentiate
between different carbons (aliphatic, aromatic etc.), different nitrogens
and so on.
SURFACE assigns the Van der Waals radius for a given atom
according to both the element and also the residue in which it appears,
and thus may lead to differences in estimates of the accessible area.
Note that SURFACE calculates both the accessible area and
the contact area, but does not include options for accounting for intermolecular
contacts.
A Unix example script can be found in $CEXAM/unix/runnable/areaimol.exam
REFERENCES
- B. Lee and F.M. Richards, J. Mol. Biol., 55, 379-400 (1971)
AUTHOR
Originator: Peter Brick, Imperial College
Substantial modifications/additional features: Peter Briggs, CCP4
SEE ALSO
surface, contact