
|
PDB FORMAT: (CCP4: Formats)
NAME
PDB format for CCP4
- the PDB coordinate format as used in CCP4
CCP4 uses a subset of the PDB format for holding information on
atomic coordinates and other model information. The current format
is based on the PDB version 2.1 draft format, see
full format description.
CCP4 programs also use the PDB format for holding other information,
such as lists of peaks in maps, Patterson vectors, etc.
Authors of this document: John W. Campbell, Adam. C. Ralph and Martyn Winn.
- INTRODUCTION
- FORMAT OF THE 'ATOM/HETATM'CARDS
- FORMAT OF THE 'TER' CARDS
- FORMAT OF THE 'CRYST1' CARD
- FORMAT OF THE 'SCALE' CARDS
- FORMAT OF THE 'ANISOU' CARD
- STANDARD RESIDUE NAMES
- ATOM IDENTIFIERS FOR AMINO ACIDS
- REFERENCES
- INTRODUCTION
The standard coordinate data file format adopted for the CCP4 protein
crystallography program suite is that of the Brookhaven Protein Data Bank
(ref. 1). The programs will handle either
complete files or files containing only a subset of the types of record
which may be present in a complete file. In particular, the records
containing the coordinate data (ATOM, HETATM and ANISOU records) are of
interest. Their structures and those of the TER, CRYST1 and SCALE records,
which are also used by the file handling subroutines described in
rwbrook.html, are outlined below. The Brookhaven format
defines a standard setting of orthogonal axes with respect to the
crystallographic axes and this has been adopted as the standard for
the CCP. The standard set of orthogonal axes XO, YO and ZO is
defined as follows:
XO // a
YO // c* × a
ZO // c*
Within a Brookhaven format file, however, coordinates may be held with
respect to other sets of axes. If files are in a non-standard axis setting
then the CRYST1 or SCALE cards must be present. A complete description
of the file format is available from the
Brookhaven Protein Data Bank
but some selected features, relevant to the handling of the coordinate
data are described below. In general terms it may be noted that the
format is basically a card image format with fixed length 80 byte records.
- FORMAT OF THE 'ATOM/HETATM' CARDS
The format of an 'ATOM' card or 'HETATM' card is as follows:
Cols. 1-6 Record name "ATOM " or "HETATM"
7-11 Atom serial number (see note i)
13-14 Chemical symbol (right justified) )
15 Remoteness indicator ) (see note ii)
16 Branch designator )
17 Alternate location indicator (see note iii)
18-20 Residue name (see note iv)
21 Reserved )
22 Chain identifier )
) (see note v)
23-26 Residue sequence number )
27 Code for inserting residue )
31-38 X )
39-46 Y ) Orthogonal Angstrom coordinates
47-54 Z )
55-60 Occupancy
61-66 Isotropic B-factor
73-76 Segment identifier, left justified (used by XPLOR)
77-78 Element symbol, right justified )
) (see note vi)
79-80 Charge on atom )
Typical format:
(6A1,I5,1X,A4,A1,A3,1X,A1,I4,A1,3X,3F8.3,2F6.2,6X,2A4)
Notes:
- Residues occur in order of their sequence numbers which
always increase starting from the N-terminal residue. Within
each residue, the order of the atoms does not matter in general.
However, there is a standard order defined by the PDB standard.
If the residue sequence is known, certain
serial numbers may be omitted to allow for the future inser-
tion of any missing atoms. If the sequence is not reliably
known these serial numbers are simply ordinals.
- The atom names are described in below.
- Alternate locations for atoms may be denoted by A, B, C etc. here.
- The standard residue names are given in paragraph 6 below.
- The sequence identifier is a composite field made up as follows:
Cols. 21 Reserved for future expansion
22 Chain identifier, e.g. A for Haemoglobin
alpha chain
23-26 Residue sequence number
27 Code for insertions of residues,
e.g. 66A, 66B etc.
- FORMAT OF THE 'TER' CARDS
'TER' cards are used to indicate chain terminations. They are placed
at the appropriate positions within the atom cards. The format of a 'TER'
card is as follows:
Cols. 1-3 Record name "TER"
7-11 Serial number
18-20 Residue name
21-27 Sequence identifier (see description of 'ATOM'
cards above)
Typical format: (6A1,I5,6X,A3,1X,A1,I4,A1)
- FORMAT OF THE 'CRYST1' CARD
This card holds the cell parameters and has the following format
Cols. 1-6 Record name "CRYST1"
7-15 a (Å)
16-24 b (Å)
25-33 c (Å)
34-40 alpha (°)
41-47 beta (°)
48-54 gamma (°)
56-66 Space group symbol, left justified (not used)
67-70 Z (not used)
Typical Format: (6A1,3F9.3,3F7.2,1X,11A1,I4)
- FORMAT OF THE 'SCALE' CARDS
These cards hold the matrix for transforming the stored orthogonal
Angstrom coordinates to fractional crystallographic coordinates. Three
cards are required. 'S' is the rotation matrix and 'U' is the translation
matrix. The format of the cards is as follows.
Cols. 1-6 SCALE1 SCALE2 SCALE3
11-20 S11 S21 S31
21-30 S12 S22 S32
31-40 S13 S23 S33
46-55 U1 U2 U3
Typical Format: (6A1,4X,3F10.5,5X,F10.5)
To remind you:
If you have a PDB file
,
where
,
the matrix at the head of a PDB file, i.e.
.
Therefore, extending Scalei to ScaleiExt so that the
4×4 inverse matrix can be generatedfootnote ¶
,
.
The programs COORDCONV, VECTORS and HAVECS will all
convert (various formats of) fractional coordinates to orthogonal ones.
HAVECS's PHARE input type corresponds to MLPHARE's
output coordinate format.
Footnote
¶: The extension line
([ 0.00 0.00 0.00 1.00]) is necessary to cope with
[Scale14,Scale24,Scale34],
the translation component of the transformation.
- FORMAT OF 'ANISOU' CARD
This card holds information about the anisotropic temperature factors for
a particular atom, if they are refined. Note that columns 7-27 and 73-80 are
the same as the corresponding atom card. The temperature factors are
multiplied by a factor of 10**4, held as integers and represent othogonal
Us. The axis system they are based on is the same as that on which the
orthogonal co-ordinates are based. The format follows:
Cols: 1-6 Record name "ANISOU"
7-11 Atom serial number.
13-16 Atom name
17 Alternate location indicator.
18-20 Residue name
22 Chain identifier.
23-26 Residue sequence number.
27 Insertion code.
29-35 U(1,1)
36-42 U(2,2)
43-49 U(3,3)
50-56 U(1,2)
57-63 U(1,3)
64-70 U(2,3)
73-76 Segment identifier, left-justified.
77-78 Element symbol, right-justified.
79-80 Charge on the atom.
The isotropic temperature factor defined in the ATOM card is defined as:
Biso = 8pi² × (U(1,1) + U(2,2) + U(3,3))/3
- STANDARD RESIDUE NAMES
The residue abbreviations for the amino acids conform to the IUPAC-IUB
rules (ref. 2). Non-standard residues are
given a three character abbreviation chosen by the user. The amino
acids and their abbreviations are given in the table below.
Residue | Abb. | Residue | Abb.
| | |
| Acidic unknown | ACD | Homoserine | HSE
| Acetyl | ACE | Hydroxyproline | HYP
| Alanine | ALA | Hydroxylysine | HYL
| beta-Alanine | ALB | Isoleucine | ILE
| Aliphatic unknown | ALI | Leucine | LEU
| gamma-Aminobutyric acid | ABU | Lysine | LYS
| Arginine | ARG | Methionine | MET
| Aromatic unknown | ARO | Ornithine | ORN
| Asparagine | ASN | Phenylalanine | PHE
| Aspartic acid | ASP | Proline | PRO
| ASP/ASN ambiguous | ASX | Pyrollidone carboxylic acid | PCA
| Basic unknown | BAS | Sarcosine | SAR
| Betaine | BET | Serine | SER
| Cysteine | CYS | Taurine | TAU
| Cystine | CYS | Terminator | TER
| Formyl | FOR | Threonine | THR
| Glutamic acid | GLU | Thyroxine | THY
| Glutamine | GLN | Tryptophan | TRP
| GLU/GLN ambiguous | GLX | Tyrosine | TYR
| Glycine | GLY | Unknown | UNK
| Heterogen | HET | Valine | VAL
| Histidine | HIS | Water | HOH
|
- ATOM IDENTIFIERS FOR AMINO ACIDS
The atom names used follow the IUPAC-IUB rules
(ref. 3) except that
the Greek letter remoteness codes are transliterated as follows:
alpha - A beta - B gamma - G delta - D
epsilon - E zeta - Z eta - H
Atoms for which some ambiguity exists in the crystallographic
results are designated A. This will usually apply only to the
terminal atoms of asparagine and glutamine and to the ring atoms
of histidine.
The extra oxygen of the carboxyl terminal amino acid is designated OXT.
Four characters are reserved for the atom names as follows:
1-2 Chemical symbol - right justified
3 Remoteness indicator (alphabetic)
4 Branch designator (numeric)
The does not have to be adhered strictly because the chemical symbol
(element name) is defined in columns 77-78. This definition will be
taken in preference.
- REFERENCES
- F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer, Jr., M.D.
Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi and M. Tasumi, J. Mol.
Biol., 112 , 535-42 (1977).
- J. Biol. Chem., 241 , 527, 2491 (1966).
- IUPAC-IUB Commission on Biological Nomenclature. "Abbreviations and
Symbols for the Description of the Conformation of Polypeptide Chains.
Tentative Rules (1969)", J. Biol. Chem., 245 , 6489 (1970).
|