**refmac** **XYZIN** *foo_cycle_i.brk* **HKLIN** *foo.mtz*
**PROTOUT ***protout.cycle_i* **PROTCOUNTS** *counts.cycle_i*
**XYZOUT** *foo_cycle_j.brk* **HKLOUT** *foo_cycle_j.mtz*

[Keyworded input]

The REFMAC program can carry out rigid body, restrained or unrestrained refinement against Xray data, or idealisation of a macromolecular structure. It minimises the coordinate parameters to satisfy either a Maximum Likelihood or Least Squares residual. There are options to use different minimization methods. If the user wishes to invoke geometric restraints the program PROTIN which analyses the protein geometry and produces an output file containing restraints information must be run prior to running REFMAC. REFMAC also produces a MTZ output file containing weighted coefficients for SigmaA weighted mFo-DFcalc and 2mFo-DFcalc maps, where "missing data" have been restored.

[Program organisation and communication with it is illustrated in the EXAMPLES section.]

* If you use REFMAC please refer to one of these papers!!! *

- "Application of Maximum Likelihood Refinement" G. Murshudov, A.Vagin and E.Dodson, (1996) in the Refinement of Protein structures, Proceedings of Daresbury Study Weekend.
- "Refinement of Macromolecular Structures by the Maximum-Likelihood
Method" G.N. Murshudov, A.A.Vagin and E.J.Dodson, (1997) in Acta Cryst.
**D53**, 240-255 - "Efficient anisotropic refinement of Macromolecular structures using FFT"
G.N.Murshudov, A.Lebedev, A.A.Vagin, K.S.Wilson and E.J.Dodson (1999)
Acta Cryst. section
**D55**, 247-255

Anything input on a line after an exclamation mark ("!" or "#") is ignored and lines can be continued by using a minus sign. The program only checks the first 4 characters of each keyword. The order of the cards is not important except that an END card must be last. Each keyword has various subsiduary keywords.

Principal keywords controlling Xray refinement:

**LABIN**, **REFIne**,
**SCALe**, **SIGMaa**,
**WEIGht**

All keywords for Xray refinement :

**BINS/RANGE**, **BLIMit**,
**CELL**, **DAMP**,
**FREEflag**, **LABIn**,
**LABOut**, **MODE**,
**MONItor**, **NCYCLE**,
**PHASE**, **REFIne**,
**RIGIdbody**, **SCALe**,
**SCPArt**, **SIGMaa**,
**SYMMetry**, **WEIGht**,
**END/GO**

Keywords controlling geometric restraints:

**BFAC/TEMP**, **CHIRal**,
**DISTance**, **NCSR/NONX**,
**PLANe**, **RBONd**,
**SPHEricity**, **TORSION**,
**VDWR**, **HOLD**

(Their function is very similar to that in PROLSQ)

Optional keywords controlling the data harvesting functionality:

**PNAME**, **DNAME**,
**PRIVATE**, **USECWD**,
**RSIZE**, **NOHARVEST**

This card is essential for all refinement except idealisation of geometry. To some extent the course of the refinement is governed by the assignments given.

The following program labels can be assigned:

FP SIGFP FREE FPARTi PHIPi HLA HLB HLC HLD or PHIB FOM

Assignments for FP and SIGFP are always required.

The use of the free R flag is highly recommended. This is an important component in using maximum likelihood sensibly. If FREE is assigned, reflections which are flagged with nfree_exclud (default 0) are excluded from the gradient calculation, and therefore the agreement between them and Fc is not part of the refinement procedure.

REMARK 0: It is strongly recommended to run the uniqueify script on the first dataset as soon as possible, e.g. after TRUNCATE. One purpose of this script is to add a column of free-R flags, and it is important for the validity of the free-R approach that this is done before any model refinement. If you are continuing model refinement with a new data set, it is important to preserve the FreeR assignment used before. See FreeR assignment.

REMARK 1: Reflections flagged for free R flag calculation are omitted from any parameter refinement, and also from the scale and B-factor calculation. For ML refinement the default is to use them to estimate SigmaA. [See SCALe and SIGMA keywords.]

If you wish to add known FPART(s) to the structure factors, assign FPARTi and PHIPi. This option gives the program a lot of flexibility. Possible examples of it use are: 1) Adding in calculated hydrogen contributions. 2) Adding in anisotropic metal atoms. 3) Adding in contributions from unmodelled parts of a structure. Eg: A solvent continuum, Uninterpretable parts of an MIR map.

REMARK 2: See SCPART keyword: The FPARTi will be added to the FC *without*
any further scaling unless this is set.

PHIB FOM An input phase and its figure of merit.

Or: HLA HLB HLC HLD The Hendrickson-Lattman coefficients describing an input phase. These can be obtained from the usual routes; MIR, MAD plus density modification.

REMARK 3: In our experience using DM phases gave better result than using MIR phases. The FOM weighting may need to be changed. See PHASe keyword

FP SIGFP (and FREE if available) plus the following additional columns are output. Labels can be assigned: FC PHIC FWT PHWT DELFWT PHDELWT FOM [PHCOMB]

By default, FP and SIGFP are scaled to match FC, using 1/K0. The overall B correction, which can be anisotropic, is applied to the FC terms. IF you have APPLY OBSERVED the inverse of this B factor is applied to FP.

FC PHIC are the structure factor contributions from the input atoms only, WITHOUT any FPARTi terms added.

FWT and DELFWT are similar to the terms output from SIGMAA:

- DELFWT and PHDELWT - Fourier amplitude and phase for 'difference' map (mFo-DFCalc)
- FWT and PHWT - Fourier amplitude and phase for '2Fo-Fc' map (2mFo-DFCalc)

PHCOMB is the combined phase to use with FP. It is output only if experimental phases were input.

If prior phase probabilty distribution is available then all output Fourier coefficients correspond to combined phases.

FOM = <m> - The "figure of merit" for this reflection . See Paper for definition ( Ref ??)

For the FWT and DELFWT terms, FP is scaled by 1/K0, FCalc= vector sum of {Dc FC + D1 FPART1 + D2 FPART2 + .. }, Di are the resolution dependent SIgmAA term, and m is the figure of merit of this reflection.. (PHDELWT and PHWT will be "combined" phases if HLA etc or PHIB FOM have been assigned.) Otherwise they equal either PHIC or PHIC+180.

Missing Data: For those reflections where the FP are missing mFo is set equal to dFc. Hence the terms become FWT=dFC and DELFWT=0.0 The m and D are based on the program's estimate of SigmaA.

Rebuilding into these 2mFo-DFcalc and mFo-DFcalc maps seems easier than using classic nFO-(n-1)FC and difference maps, consistent with the established technique for SigmaA style maps. One advantage here is that since the m and D values are based on the Free set of reflections they are less biased than the values obtained by the ccp4 version of SIGMAA after refinement.

For example:

REFI TYPE RESTrained RESOLUTION 20 1.50 REFI RESI MLKF REFI BREF ISOTropic METH CDIR or REFI TYPE REST RESO 20 1.50 RESI MLKF BREF ISOT METH CDIR or REFI TYPE RIGID (all other definitions are defaults)

This keyword controls the type of refinement or idealization. Subkeywords:

- TYPE RESTrained | UNREstrained | IDEAlise | RIGID
- [Default RESTrained]
- PHASed SCBLurred <scblur> BBLUrred <bblur> SIGMacalc
- [Default : only used if PHASE definition given on LABIN ; scblur = 1.0, bblur = 0. Do not use phases for SigmA estimation even if available]
- RESIdual LSQF | MLKF
- [Default MLKF ]
- BREFinement OVERall | ISOTropic | ANISotropic | MIXED
- [Default ISOTropic] [ANISotric MIXED]
- METHod CGMAt | CGRAd | CDIR
- [Default CDIR]
- RESO <resmin> <resmax>
- [Default: all data]

In more detail, these subkeywords are:

- TYPE
- This keyword describes type of refinement.
- RESTrained invokes restrained refinement, where both the Xray residual (reflecting the agreement between the observed and the calculated Fs, and the geometric residual (reflecting the fit between the expected and the observed geometry) are minimised at the same time. The relative weighting of these two terms is given by values of Wxray and Wgeom. These can be set using the WEIGht keyword.
- UNREstrained is for unrestrained refinement. [ie: Wxray =1, Wgeom=0]
- IDEAlised is for geometric idealization. [ie: Wxray =0, Wgeom=1]
- RIGID invokes rigid body refinement. The description of domains is given below. Alternative to MODE RIGID.
- PHASed
- SCBL <scblur> BBLUr <bblur>
- If experimental phases are being used it may be necessary to blur the phase probabilities, especially after some density modification calculations. (This information can also be input with the keyword PHASE - see below.)
- SIGMAcalc - use vector differences when estimating SIGMaa; ie use the phase information.
- RESIdual
- This keyword describes the Xray part of the function.
- LSQF - defines amplitude based least-squares residual.
- MLKF - A -loglikelihood residual derived from Rice distribution for centric and acentric cases of Fs.
- If experimental phase information is available the residual is modified appropriately. This invoked by assigning appropriate input columns; see key word LABIN. (For methodology see G.N. Murshudov, A.A.Vagin and E.J.Dodson,(1997) in Acta Cryst. D53, 240-255)
- METHod
- This keyword describes method of minimization (see Tronrud, Acta Cyst 1992 A48 pp 912-916).
- CGMAT is sparse matrix as in PROLSQ.
- CGRAD is conjugate gradient.
- CDIR (default) is the conjugate direction method
- BREFinement
- This keyword describes method for assigning atomic Bvalues.
- OVERall :- Overall B-factor (Boverall) obtained from scaling is added to the atomic Bvalues.
- ISOTropic :- Individual isotropic B-factor refined for all atoms.
- ANISotropic - Individual anisotropic B-factor refined for all atoms
- MIXEd - Some atoms with isotropic, some with anisotropic B-values. In this case input file (pdb) defines which atom should be refined isotropicly and which anisotropicly. Only atoms with ANISOU card are refined anisotropicly. (This option has not been tested extensively).
- RESO
- <resmin> <resmax> [ Default: all data]

<dmin>, <dmax> (default 1, 1000) are resolution limits used for refinement in Angstroms (or in 4*sin**2/l**2 if both are < 1.0. They can be given in either order. If only one value is given, it is assumed to be the high resolution cutoff.

Program will apply blurring as follows:

HLAnew = HLA*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLBnew = HLB*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLCnew = HLC*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLDnew = HLD*scblur*exp(-(sin(theta)/lambda)**2*bblur)

or

If PHASE and FOM are given: the program first generates the Hendrickson-Lattmann coefficients using the formula HLA = Func(FOM)*COS(DEGTOR*PHASE), HLB = Func(FOM)*SIN(DEGTOR*PHASE), HLC = HLD = 0. ie the Phase probability distribution is unimodal.

Fxray = SUM(Whkl*(|FO|-|FC|)**2)

Fxray = SUM(LLKcentric_hkl) + SUM(LLKacentric_hkl)

REMARK: If you are using IDEAlise it is better to use CGRAD option. When using RESTrained option CDIR will work much better. Maybe for NCS the CGMAT is more appropriate; it is the only one which considers off-diagonal terms for atom-atom interactions, and these can be large when the NCS restraints are tight. If the minimisation fails it may be worth trying another option.

Include all well measured data, not omitting the weak observations; it will be weighted appropriately. The low resolution data helps define the solvent shell. However if you have lost strong terms by some accident of data collection the scaling may not behave well.

The SCALE keyword has several different options.. See below for keywords for estimation of SIGMaA, triggered by SCALe MLSC. For example:

SCAL TYPE BULK LSSC ANIS RESO 10 2.1 SCAL TYPE SIMP LSSC EXPE SCALe LSSC FIXBulk BBULk 70.0 SBULk -0.75 ! Fix bulk solvent parameters for LSQ scaling

These keyword controls the type of scaling of Fo and Fc. Subkeywords:

- TYPE BULK | SIMPle
- [Default BULK]
- BAVERage <baverage>
- [If there is not sufficient data to refine a B value it is possible to hold it at some sensible value derived from the Wilson plot.]
- LSSC
- Flag to indicate all following keywords apply to estimation of scale between Fo and Fc.
- ANISotropic
- [Default isotropic overall scale]
- FIXBulk <scbulk> <bbulk>
- [Lower resolution structures may not have sufficient data to find sensible overall scales and B values for both the BULK and the protein component. It can help to fix these]
- RESO <resmin> <resmax>
- [Default: all data used for refinement]
- NCYCle <ncyc>
- [Default ncyc = 10]
- EXPE
- [Default is to not use experimental sigmas in the determination. the keyword EXPE changes this to use experimental sigmas]
- FREE
- [Default Scales are calculated against the WORKing set of reflections, but if requested it can be derived from the FREE set.]
- APPLY OBS | CALC
- [Default - ouput file contains Fobs modified to Fcalc scale]

In more detail, these subkeywords are:

- TYPE
- with one of the following sub-subkeywords:
- BULK
- (Default.) If TYPE BULK, then the scale KB is a function of 4 variables with the form:
- SIMPLE
- If TYPE is SIMPle the scale factor has the form:
- BAVErage <baverage>

- Lower resolution structures may not have sufficient data to give a robust Wilson plot overall B factor, so it is possible to fix the <B> for the structure to a set value. If you are using this option it is important to add remaining B-value to observed structure factors.
- FIXB <scbulk> <Bbulk>

- Lower resolution structures may not have sufficient data to find sensible overall scales and B values for both the BULK and the protein. SCBULK = <solvent_density>/<protein_density> Ie: For aqueous solvent, with solvent density ~ 1.0. and protein density ~ 1.35, SCBULK ~ 1.0/(1.35)
- ANISO
- Many crystals generate seriously "anisotropic" reflection data. This is presumably due to some crystalline disorder, and is not the same as anisotropy of individual atoms. However the correction can be expressed in a similar form.
- RESO
- <sub-subkeywords> <resmin resmax> or <dmin> <dmax>

[Default all data used for refinement] - NCYC <ncyc>
- Default: <ncyc> = 10
- EXPE
- Default is to not use experimental sigmas in the determination. the keyword EXPE changes this to use experimental sigmas.
- FREE
- Default is to use all reflections in the WORKing set for scaling. The keyword FREE changes this to determine the scale from the FREE set of reflections.
- APPL OBSE|CALC
- APPL OBSE | CALC will apply overall Bcorrection to either observed or calculated structure factors.

KB = K0*exp(-B0*s^2) * (1- K1*exp(-B1*s^2))

The scale formulation is based on the Babinet principal and described by Dale Tronrud and others. Ref?? TNT manual. Better results can be obtained by more sophisticated modelling, but this is a simple compromise.

KB = K0*exp(-B0*s^2) (Simple Wilson scaling) ie: K1 = 0

This may be more appropriate if you are adding bulk solvent contributions as an FPARTi term or you have parameterised most of the expected waters.

The isotropic overall B factor B0, is replaced by:

B11*s1*s1+2.0*B12*s1*s2+2.0*B13*s1*s3+B22*s2*s2+2.0*B23*s2*s3+B33*s3*s3

i.e. [s] [B] [s]* where:

[s] is vector in reciprocal orthogonal system = [s1 s2 s3] = [ h k l ] [RFR] , [h k l] are the Miller indices; RFR is the fractionalisation matrix. [B11 B12 B13] [B] = [B21 B22 B23] ; B21 = B12, B31 = B32, B32 = B23. [B31 B32 B33]

Anisotropic scaling of data should ideally be done at the merging stage but often the distortion aligns with the crystal axes, and therefore cannot be detected ifrom symmetry equivalent reflections alone. Large improvements in Rfactors can result from this correction.

Defines resolution limit for scaling. If you are using the Wilson plot option with only one gaussian for scaling this keyword allows you to exclude the low resolution data which is affected by solvent. Advice: use 4.5A to the highest resolution.

NB: Before applying bulk solvent scaling and including all low resolution data Check your distribution of <F> looks sensible. This is the raw material for all overall scaling algorithms. A good way to check this is to look at a <Fsq> plot against resolution.

This should look something like this:

+ + + + + + + + + + + <10A 5A 4.5A ............

If the low resolution looks strange - it may mean your backstop was causing problems, intensities were saturated etc etc, and including such data may give crazy solvent scales. A sensible sort of value would be : Fc 1.0 0.0 Bulk Solvent -0.4 200.0

Program always first calculates log scaling assuming Bs = exp(-B1*s^2) This minimises: LFoKBFc = sum(ln(|FO|-ln(K)-ln(|FC|)+B1*S^2)^2 where the FC is simply derived from the input coordinates, without any addition of FPARTi terms. (This option does not require iterative cycles of scaling) and after this it scales KFoBFc =sum(K*|FO|-Bs*|FC|)^2 where the FC include FPARTi and Bulk solvent correction if requested.

NB-=======================================================================

NB-=======================================================================

NB-=======================================================================

We are not really sure how best to handle scaling. If you have problems
please get in touch. In our experience there have been no problems with
data sets with resolution 2.5A or higher, unless there was some obvious
flaw; huge ice rings or Is labelled as Fs or some such thing. But with
one unusual data set which died at 2.7A there has been a problem, which
we got round by tweaking parameters, but these cases should be automatically
checked.

NB-=======================================================================

NB-=======================================================================

NB-=======================================================================

NOTE: When doing ML refinement the scale factors are only used to calculate R values.

For example:

SCALe MLSC FIXBulk BVALue 100.0 SCVAlue -0.1

The SigmA estimate SA is generally fitted to the normalised Free reflections using a 4 parameter equation of an analogous form to the bulk scaling:

SA = SA0*exp(-T0*s^2) * (1- SA1*exp(-T1*s^2))

These keywords controls the estimation of SA. Subkeywords:

- FIXBulk
- The options FIXBulk to fix parameters can be evoked in the same way as for the SCAL LSSC options, but should only be used with EXTREME care! There is no logical way to estimate expected values.
- NCYCle <ncyc>
- [Default <ncyc> = 10]

Use <ncyc> iterative cycles to determine the parameters. - WORK
- [Default Sigmaa is calculated against the FREE set of reflections]

The keyword WORK changes this to determine the scale from the WORKing set of reflections.

[Default EXPE MATR 0.5]

This keyword controls the weighting of the terms

- NOEX
- Exclude experimental sigmas from weighting.

This sub-keyword allows you not to use experimental sigmas of the observations for the Xray residual. The default action is to use them.

The remaining sub-keywords control the relative weighting of the X-ray and geometry terms in the residual.

- MATR <wmat>
- [Default 0.5] Weighting using average diagonal term of Xray and geometry Hessians (same as PROLSQ). Weighting equates wmat*average_diagonal_of_geometry to average_diagonal_of_Xray terms. Increase <wmat> to increase the weight on the Xray terms; decrease <wmat> to tighten the geometry.
- GRAD <wgrad>
- [Default 1.0] Weighting using norms of Xray and geometry gradient vectors. Weighting equates wgrad*norm_of_geometry_gradient to norm_of_Xray_gradient. Increase <wgrad> to increase the weight on the Xray terms; decrease <wgrad> to tighten the geometry.
- SIGMA <wsigma>
- [Default 0.5] <wsigma> is applied to Xray terms as 1/(wsigma**2). Decrease <wsigma> to increase the weight on the Xray terms; increase <wsigma> to tighten the geometry.

Number of bins <nbins>. Default 20, maximum allowed 100. For least-square refinements it is useful only for monitoring statistics on resolution ranges. Used for normalisation to convert Fs to Es.

Or <range>. If the value after BINS/RANGE is less than 1.0 this is assumed to define the bin width in units of 4*sin**2/Lambda**2

[Default 2.0 1000.0]

Bmin and Bmax are minimum and maximum B-factors allowed. This keyword is
not recommended. The lower limit is required by the program, but is set
by default.

Defines parameters of the cell. If this keyword is specified then cell parameters from MTZ and coordinate files will be overridden. This keyword could be important when cell dimensions are suspect and the user wants to change them.

[Default 1.0 1.0 for resolution > 2.5A, 0.5 0.5 for
resolutions < 2.5]

<Pdamp> <Bdamp> scales shifts after each cycle of refinement. If
there is limited data, or you are using unrestrained refinement it is
sensible to scale down the shifts.

[Default 0]

The default value for exclusion for the FreeR is zero. However there is
an opportunity to reset this exclusion flag here. See the description of
the program FREERFLAG.

- [Default HKRF]
- HKRF
- This invokes standard refinement of individual atomic parameters.
- RIGID
- This invokes RIGID body refinement. The description of domains is given below under keyword RIGID.

<subkeyword 1> can be one of:

- NONE
- means no monitoring. Program will work without any output info.
- FEW
- means only overall statistics will be monitored.
- MEDI
- means MEDIum number of statistics will be monitored. You can choose quantity of statistics by choosing numberi.
- MANY
- means all available info about statistics will be monitored.

The subsequent <subkeyword i> are (only used if output requested is MEDIum):

- TORSION <badtor>
- Print restrained torsion angles differing by more than BADTOR * torsig from the ideal value. Default is 3.
- HBOND
- Print distance information on possible hydrogen bonds. Not printed by default.
- DISTANCE <baddis>
- Count restrained distances differing by more than BADDIS * dissig from the ideal value and print dis- tances differing by more than 2 * BADDIS * dissig. Default is 3.
- PLANE < badpln>
- Print restrained planes differing by more than BAD- PLN * plnsig from the ideal value. Default is 2.
- VANDERWAAL (or VDWR) <badvdw>
- Print contact distances that are closer than badvdw Angstrom from the ideal. Default is .6 Angstrom.
- XSHIFT <badshi>
- Print shifts that are greater than BADSHI Angstrom. Default is 0.4A.
- CHIRAL <badchi>
- Print restrained chiral volumes differing by more than BADCHI * chisig from the ideal value. Default is 2.5
- BSHIft <dbcut>
- Needs to be explained. [ default 0.25]

[Default 5]

This keyword defines number of cycles of refinement done before PROTIN
is run again.

(See above; this functionality can also be input on the REFI line.) If experimental phases are being used it may be necessary to reduce their assigned figure of merits,especially after some density modification calculations.

Program will apply blurring as follows:

HLAnew = HLA*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLBnew = HLB*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLCnew = HLC*scblur*exp(-(sin(theta)/lambda)**2*bblur) HLDnew = HLD*scblur*exp(-(sin(theta)/lambda)**2*bblur)

or

If PHASE and FOM are given: the program first generates HLA and HLB using the formula HLA = Func(FOM)*COS(DEGTOR*PHASE), HLB = Func(FOM)*SIN(DEGTOR*PHASE), HLC = HLD = 0. ie the Phase probability distribution is unimodal.

Rigid body refinement in REFMAC is performed as following. Program first finds the center of mass of all domains. This is defined as (Tgx,Tgy,Tgz) where:

TgX = sum(Occ(i)*X(i))/Natom TgY = sum(Occ(i)*Y(i))/Natom summed over all atoms in the domain. TgZ = sum(Occ(i)*Z(i))/Natom

Both Euler and polar rotation angles and the coordinates of the center of mass are refined for each domain. (i.e. program finds Delta_Alpha, Delta_Beta, Delta_Gamma, and deltaTg).

REMARK: In the solution of linear equation to obtain the shifts the program uses singular value decomposition, thus avoiding problems with singularities such as that when Beta =~ 0.0 Only alpha+gamma can be determined.

Subkeywords:

- GROUp <ngroup> FROM <resnum> <chnid> TO <resnum> <chnid> | EXCLude | TRANS | EULErangles
- <ngroup>
- Domain number. Maximum number of domains in the current version of program is 100. Each domain can consist of 100 different pieces. For example:
- <resnum> <chnid>
- Spans are specified by the residue number, and the chain ID. See examples. By default all atoms belong to one domain.
- EXCLude SCHAin | MCHAin | NONE
- Exclude main chain, or side chain atoms from refinement. Rotation and translation are determined for the defined subset. Rotation and translation are applied for all atoms also. Default is NO EXCLUSION.
- TRANS <Tx> <Ty> <Tz>
- To add translation for this domain before starting refinement. Default is 0.0 0.0 0.0
- EULErangles <alpha> <beta> <gamma>
- Rotate domain by these angles (in degrees) before starting refinement (Note that all rotations are performed around center of mass of this domain). Default is 0.0 0.0 0.0
- PRINT EULEr MATRix
- Print rotation angles (EULErangles) and rotation MATRix. Default is EULErangles
- NCYCle <ncycle>
- Number of cycles for rigid body refinement. Default is 40 cycles

RIGIDbody GROUP 1 FROM 10 A TO 100 A } A domain containing most of the A chain, and RIGIDbody GROUP 1 FROM 108 A TO 176 A } an embedded bit of a B chain, consisting of RIGIDbody GROUP 1 FROM 200 B TO 220 B } residues 200-220. .. RIGIDbody GROUP 2 FROM 10 B TO 100 B } for example the equivalent domain across a NCS axis RIGIDbody GROUP 2 FROM 108 B TO 176 B } RIGIDbody GROUP 2 FROM 200 A TO 220 A }

Note: For rigid body refinement you don't need to run protin or other supporting programs.

If NSCi are set, the FPARTnsci is scaled relative of the FC FC_tot = FC_calc(PHIcalc) +nsc1*FPART1(PHIP1) + nsc2*FPART2(PHI2) + ....

If NSCi set - that partial structure factor will be scaled

Defines space group symmetry. If MTZ file is defined then symmetry from MTZ will be used. This keyword is essential for idealisation.

The following key words are used to set the weighting of the restraints. We have attempted to make the default values agree with those used for PROLSQ.

[Default 1.0 0.02 0.04 0.05 0.05 0.02]

<WDSKAL> is a multiplier used in calculating the weights for the distance restraints. The weights are of the form: Weight = WDSKAL*(model distance - ideal distance) / SIGD**2

<SIGD1> is the a value associated with bonded distances (e.g. Calpha-C).

<SIGD2> is the a value associated with a non-bonded distance determining an angle to be restrained (e.g. N - C).

<SIGD3> is the a value for a planar 1-4 distance (e.g. O(i-1)-Calpha(i)).

<SIGD4> is the a value associated with a bond distance involving hydrogen (e.g. Calpha - Halpha).

<SIGD5> is reserved for special distance groups (as set up with KDWT = 5 in the dictionary of ideal atomic dis- tances).

[Default 1.0 0.03 0.02]

<WPSKAL> is a multiplier for the distance restraints defining planes. The weight used for the peptide planes is (WPSKAL / SIGPP)**2, and for the aromatics (WPSKAL / SIGPA)**2.

<SIGPP> is the sigma value associated with distances used to define peptide planes (e.g. 0.03).

<SIGPA> is the sigma value associated with distances used to define aromatic planes (e.g. 0.02 ).

[Default 1.0 0.15]

<WCSKAL> is used in weighting Chiral groups restraints. The weight used is (<WCSKAL> / <SIGC>)**2

<SIGC> is the value associated with Chiral groups restraints (e.g. 0.02 to 0.04).

[Default 1.0 2.0 3.0 2.0 3.0 ]

<WBSKAL> is the used in calculating the weight for the tem- perature factor restraint parameters based on the types of bonding in which the atoms are involved. (See DISTANCE keyword.)

[Default 2.0 ]

<sigsph> is used to restraint atomic anisotropic tensor to be spherical

[Default 1.0 ]

<sigrbond> is used to make minimum of projections of anisotropic tensors along bond for the bonded atoms.

[Default 1.0 0.05 0.5 5.0 0.5 2.0 10.0]

<WSSCAL> is used in weighting restraints involving non- crystallographic symmetry. The weight is given by (<WSSCAL> / <SIGS>)**2

<SIGSP1>, <SIGSP2>, <SIGSP3> are a values associated with non- crystallographic positional restraints and are for tight, medium and loose restraints respectively.

<SIGSB1>, <SIGSB2>, <SIGSB3> are a values associated with non- crystallographic thermal restraints and are for tight, medium and loose restraints respectively.

[Default 1.0 15.0 3.0 15.0 20.0]

<WTSCAL> is a multiplier used in calculating the weights for the torsional angle restraints. The weight is of the form Weight = (<WTSKAL> / <SIGT>)**2

<SIGT1> is the a value associated with a pre-specified angle (usually Phi and Psi of a regular secondary structure).

<SIGT2> is the a value associated with a planar angle (e.g. omega).

<SIGT3> is the a value associated with a staggered angle (e.g. Chi1).

<SIGT4> is the a value associated with an orthonormal angle (e.g. Chi2 of aromatics).

Starting values of say 20 degress are suggested for <SIGT1> to <SIGT4>.

[Default 1.0 1.0 -0.3,0.0]

<WVSKAL> is used in the weighting scheme for Van der Waals contacts. The weight used is (<WVSKAL> / <SIGV>)**2

<SIGV> is the sigma value associated with Van der Waals dis- tances.

<DINC(1)>, <DINC(2)>, <DINC(3)>, <DINC(4)>. The relevant matrix elements are only incremented if the distance between the non-bonded atoms is less than the sum of the Van der Waals radii. The values of DINC are used to change the ideal distance for each sort of contact.

<DINC(1)> is for atoms where relative position is determined by only one torsion angle: single torsion contact.

<DINC(2)> is for atoms where two or more torsion angles determine their relative positions: multiple torsion con- tact.

<DINC(3)> is for possible hydrogen bonds (contacts between any N and any O except Nmain - Nmain or Omain - Omain).

[Default 0.3 3.0 0.2]

Restraints to current values <PDEL> is a positional shifts magnitude restraint. This goes into the matrix as (a/<PDEL>)**2, (b/<PDEL>)**2, (c/<PDEL>)**2 where a, b, c are the unit cell parameters.

<BDEL> is a shifts magnitude restraint on individual thermal parameters and goes into the matrix as (1/<BDEL>)**2. It is not used if ITEMP = 0.

<QDEL> is the shifts magnitude restraint on variable occu- pancy factors (not used if NOCC=0). This goes into the matrix as (1/<QDEL>)**2.

[Also see keyword MONItor]

`$HARVESTHOME`/`DepositFiles`/*<projectname>*/
*<datasetname>.refmac*

The environment variable `$HARVESTHOME` defaults to the user's
home directory, but could be changed, for example, to a group project
directory.

End of keywords. Time to do work.

- XYZIN
- The input coordinate file in PDB format.
- HKLIN
- Reflection data in MTZ format. This must contain columns H K L FP SIGFP and it should contain a FreeR_flag column. It may also contain FPARTi PHIPi - up to 9 partial terms may be used. It is essential to input ONLY one asymetric unit of reciprocal space (any one will do).
- PROTOUT
- The file of atom parameters and restraints passed from the program PROTIN. This is an unformatted sequential file whose structure is described in the program documentation for PROTIN. This is rewritten to a scratch file PROTSCR with certain modifications for special atoms during a run.
- PROTCOUNTS
- The second output file from PROTIN. A single record unformatted sequential file containing the summary counts from PROTIN.

It MUST contain reflections from only one asymmetric unit. REFMAC resorts the hkl list, so sort order and choice of asymmetric unit are not important. It is very sensible to have followed the "complete" script given as Example 0. See more reasons for this in the description of HKLOUT.

It is EXTREMELY ADVISEABLE to assign FreeR_flags for each reflection if you are using the Maximum Likelihood option. The FreeR set is used to estimate comparatively unbiased SigmaA values.

IF FPARTi and PHIPi have been assigned, the calculated structure factor will equal the scaled FPARTi vector(s) plus the contribution calculated from the input coordinates.

- XYZOUT
- The output coordinate file with the calculated shifts applied.
- HKLOUT
- MTZ file containing the columns H K L FP SIGFP (FreeR_flag) FC PHIC FWT PHWT DELFWT PHDELWT FOM [ PHCOMB]

PROTSCR,SCRAT1,REFSCR. These will be automatically deleted.

The printer output (logfile) is divided into a number of sections as described below. Not all the sections need appear on a given run of the program as their output depends on the options selected via the control data.

- a)
- A section giving details of the control data including details of the geometry restraints and weights.
- b)
- A section on the distance restraint information. This may give, depending on the MONItor requirements:
- The numbers of distances which differ my more than NSIG*sigma and 2*NSIG*sigma from the ideal values.
- A list of the distances which differ from the ideal values by more than 2*NSIG*sigma giving both the ideal and model values.
- The value of Sigma w*deltad**2 .
- A table giving the type of distance, the root weight, the number of distances, the rms(delta) value and the value for sigmaD for each distance type. The distance types are as follows: Type 1 = Bonded distance. Type 2 = Distance used to restrain an angle. Type 3 = Planar 1 to 4 atom distance. Type 4 = Hydrogen bond, Metal coordination distance etc.
- c)
- A section on the plane restraint information. This gives:
- Details of the planes whose rms deviation from the plane is greater than 2 sigma.
- The overall rms deviation from the planes.
- The value of Sigma w*deltap**2 .
- A matrix/gradient sample.
- d)
- A section describing the chiral centre restraints. This gives:
- A list of the chiral centres and the ideal and model values for the chiral volumes. Centres for which the ideal and model volumes differ by more than 1.0 are flagged with an asterisk.
- The value of Sigma w*deltac**2 .
- A table giving the root weight, the rms delta value, the sigmaC value and the number of chiral centres with delta values greater than 1.0.
- A matrix/gradient sample.
- e)
- A section on the non-bonded contacts. This gives:
- A list of the the single torsion contacts which are shorter than the allowed values.
- A list of the multiple torsion contacts which are shorter than the allowed values.
- A list of possible hydrogen bonds giving the atoms involved and the ideal and model distances.
- The value of Sigma w*deltav**2 .
- A table giving the contact type, the root weight, the number of contacts, the rms(delta) value, the sigmaV value and the DINC value for each contact type as follows: Type 1 = Single torsion contact. Type 2 = Multiple torsion contact. Type 3 = Possible hydrogen bond.
- f)
- A section on the conformation torsional angle restraints. This gives:
- A list of the conformational torsion angles.
- The value of Sigma w*delta**2 and the average value of w*deltat**2.
- A table giving the torsion angle type, the root weight, the number of torsion angles, the rms(delta) value and the sigmaT value for each con- formational type as follows: Type 1 = Pre-specified value. Type 2 = Planar (0,180 degrees). Type 3 = Staggered (+/- 60, 180 degrees). Type 4 = Orthonormal (+/- 90 degrees).
- g)
- A section with details of the non-crystallographic symmetry. This gives:
- For each pair of chains the transformation matrices and the relative rotation angles Phi, Psi and Chi are given.
- The number of positions in each chain which differ by more than 2 sigma from the average posi- tion.
- A list of the atoms which are more than 2 sigma from the average position.
- The values of Sigma w*deltas**2 and aver- age(w*delta**2) both positional and thermal parame- ters.
- A table giving for each type of restraint the restraint type, the root weight, the number of restraints, the sigma value and the rms(delta) val- ues for up to 4 chains. The resraint types are: Type 1 = Tight positional restraint. Type 2 = Medium positional restraint. Type 3 = Loose positional restraint. Type 4 = Tight thermal restraint. Type 5 = Medium thermal restraint. Type 6 = Loose thermal restraint.
- h)
- A section on the thermal restraints. This gives:
- The value of Sigma w*deltab**2 and the average value of w*deltab**2 .
- A table giving for each type of thermal restraint, the restraint type, the root weight, the number of restraints, the rms(deltaB) value, the sigmaB value, the mean deltaB value, the rms(deltaU) value and the mean deltaU value. The restraint types are: Type 1 = Main chain bonded atom. Type 2 = Main chain angle. Type 3 = Side chain bonded atom. Type 4 = Side chain angle. Type 5 = X-H bond. (X=any atom, H=hydrogen) Type 6 = X-H angle. Type 7 = Hydrogen bond, Metal coordination bond etc.
- i)
- A section on the Xray scaling of FO to FC. The program first estimates an approximate Wilson scale using logaritmic scaling. It then cycles round to estimate a better Wilson or Bulk scale set by least squares minimisation. The initial and final values are output. If Anisotropic scaling is requested, it then carries out a further 10 cycles to estimate the anisotropic parameters. For Example:
- j)
- Information about the Rfactor and the FreeR factor. This is presented in a form suitable for xloggraph, and for a mmCIF table. (You can grep for _R_factor )
- k)
- For the maximum likelihood option, information about the correlation coefficients, figure of merit and sigmaA. The correlation coefficient and figure of merit should increase during refinement.
- Values of K and B to fit sigmaA to the Free set of reflections are estimated. Program ouput: Final values of K-s and B-s after 40 cycles 1 1.048 3.120 2 -0.184 149.968 sigmaA should not be greater than 1.0, and if these values diverge wildly you may have too few "free" reflections to determine sigmaA accurately. You can increase the scaling low resolution cutoff, and this might help.
- l)
- Information about the Xray and geometry gradients. You should worry if the Magnitude of X_ray positional gradient is wildly different to Magnitude of Geom. positional gradient The Cosine of the angle between the Xray shift and Geometry shift direction indicates whether the two terms are moving the structure in a concerted direction(cos > 0.0, angle acute) or in opposition(cos $lt; 0.0). If the Cosine tends to -1.0, you may as well stop refinement; the Xray and geometric corrections are cancelling each other out.
- m)
- (Optional) A section on the matrix solution. This gives:
- A sample of the matrix and the vector elements.
- A sample of the progress of the conjugate gra- dient solution.
- n)
- A section on the parameter shifts. This gives:
- A list of atoms and their shifts for those atoms which have large positional or thermal shifts.
- The mean values of the shifts (in fractional coordinates and in Angstroms for the x,y,z parameters).
- The rms values of the shifts (in fractional coordinates and in Angstroms for the x,y,z parameters).

Scale and B factors calculated by log scaling = 0.4365 -7.855 log scaling is minimization of sum(ln(FO)-ln(FC)-log(scale)+sin^2(th/lm)*B) Number of reflection used for scaling= 13868 *************************************** Ls initialisation is o.k. **** Estimating Overall scales or Sigmaa values **** Maximum number of refinement cycles 10 Initial values of K-s and B-s 1 0.437 -7.855 2 -0.400 150.000 Final values of K-s and B-s after 10 cycles 1 0.469 -4.220 2 -0.400 150.000 Initial values of K-s and B-s 1 0.469 -4.220 0.000 0.000 -4.220 0.000 -4.220 2 -0.400 150.000 Final values of K-s and B-s after 10 cycles 1 0.472 -12.268 0.000 -3.527 5.880 0.000 -3.284 2 -0.400 150.000 Overall isotropic B = -3.3507

Various errors may be reported by the input-parsing routines and should be self-explanatory.

If a specific space group is requested and the required subroutines are not available then the program will stop with the error message:

SPACE GROUP UNAVAILABLE

If the limits of the program's capacity are exceeded then one of the following messages will be printed and the program will stop:

THE NUMBER OF VECTOR OR MATRIX ELEMENTS EXCEEDS AVAILABLE STORAGE OF n OR m RESPECTIVELY THE NUMBER OF ATOMS OR DISTANCES EXCEEDS AVAILABLE STORAGE OF n OR m RESPECTIVELY

It may help to read the ps version of Garib's talk at Chester ($CDOC/garib_talk.ps) or one of the references.

**Terminology:**

|Fo(h)| - observed amplitude of structure factor. |Fc(h)| - calculated amplitude of structure factor Io(h) - observed intensity of structure factor IO = |FO|**2 Ic(h) - calculated intensity of structure factor IC = |FC|**2 |Eo(h)| - normalised observed amplitude of structure factor. |Ec(h)| - normalised calculated amplitude of structure factor s : 2.0*sin(theta)/lambda D : <cos(2*pi*s^2*delta(r))> Luzzati 1952. sA : SigmaA = (ref - Read, Srinavasan) Theoretically,if EPS is the multiplicity for the reflection zone SigmaA = D*sqrt(SigmaP/SigmaN) where SigmaN = <Fo**2/EPS> and SigmaP = <Fc**2/EPS>. X(h) : 2|Eo|*|Ec|*sA/Var_D Var_D(h) : variance of distribution Var_D(h) : W*SigmaExpt(Eo) **2 + Epsilon * (1- sA**2) ) ; W=1 if SigmaExpt used, W=0 otherwise. Define I0(X) : 0 order first type modified Bessel function I1(X) : 1st order first type modified Bessel function cosh : hyberbolic cose (EXP(X)+EXP(-X))/2.0 sinh : hyberbolic sine (EXP(X)-EXP(-X))/2.0 tanh : hyberbolic tangent sinh(X)/cosh(X) m(h): figure of merit (m = I1(X)/I0(X) for acentric, tanh(X) for centric reflections) m(h) should equal <cos(AlphaTrue - AlphaCalc)> The program estimates SigmaA by minimising -logLikelihood calculated with normalised Eo and Ec. In REFMAC SigmaA can be estimated from the FreeR reflections and fitted by a two gaussian approximation similar to Tronrud's scaling. The figure of merit m which should equal <cos(AlphaTrue - AlphaCalc)> is calculated from Eo, Ec and SigmaA. where Eo = Fo/sqrt(EPS*SIGMAN) and Ec = Fc/sqrt(EPS*SIGMAN) Geometry: D_12_id : ideal value of bond distances D_12_mod: real value of bond distances D_13_id : ideal value of angle distances D_13_mod: real value of angle distances D_14_id : ideal value of distance between atoms connected by torsion angle D_14_mod: real value of this

**Description of program**

It minimises:

Ftot = Wxray *Fxray + Wgeom *Fgeom where Wgeom = 0 or 1 (Wgeom = 1 for restrained refinement) (Wgeom = 0 for unrestrained refinement) Wxray = 1/(SIGMA**2) where SIGMAis given on keyword REFI (Geometric Idealisation is equivalent to Wxray = 0) Fgeom is based on the geometric quantities refined in PROLSQ. ie: Fgeom = Wds*SUM((D_12_id-D_12_mod)**2/SIGD1**2) ===== +SUM (D_13_id-D_13_mod)**2/SIGD2**2) +SUM((D_14_id-D_14_mod)**2/SIGD3)**2 + terms for planarity, chirality etc etc... exactly as in PROTIN and PROLSQ.

Choosing of Fxray has two options so far; amplitude based least squares, and a loglikelihood residual:

LSQF defines amplitude based least-square residual. ==== SUM : sum over hkl FXRAY = SUM(Whkl*(|FO|-|FC|)**2) MLKF defines simplest version of -loglikelihood residual which comes ==== from Rice distribution for centric and acentric cases assuming there is no knowledge of the phases. Ref:?? Fxray = SUM(LLK_cen) + SUM(LLK_acen) where LLK_cen is -loglikelihood function for centric reflections LLK_acen is -loglikelihood function for acentric reflections LLK_cen = (|EO|**2+sA**2*|EC|**2)/Var_D - LN(cosh(X)) + LN(Var_D)/2 LLK_acen = (|EO|**2+sA**2*|EC|**2)/Var_D - LN(I0(X)) + LN(Var_D)

All this now uses Normalised structure factors.

Var_D = W*SigmaExpt(Eo) **2 + (1- Sum(sA(j)**2) )

where the sA(j) are the SigmaA for each independent partial structures, (W=1 if experimental sigmas used; 0 otherwise).

A two Gaussian fit to the <Eo-Ec>**2 is used to define SigmaA.

sA(j) = sA(j)0 exp(-B(j)0*s**2) ( 1-sAsolv * exp(-bsolv s**2) )

This gives a smooth curve against resolution. If no bulk solvent (ie a one Gaussian approximation) sAsolv = 0 .

The values of m and sA are best estimated using the FREE reflections. Here the sA which are a function of resolution are fitted to the FreeR reflections as a second order Gaussian. In the CCP4 version of SIGMAA the sA are deduced from the fit over resolution bins of <Eo**2> and <Ecalc**2>.

The SigmaA technique means that sA tends to 1.0 as the Rfactor falls even if this is being achieved by some unrealistic technique; loosening geometric restraints, adding too many waters etc.. Using the FreeR set helps avoid this problem ,but the number of reflections per bin were too few to give a smooth curve without interpolation. Fitting the gaussian seems a satisfactory compromise.)

The gradients are estimated from a modified difference map with coefficients {(m|FO| - D |FC|}, PHIcalc. At the end of each set of cycles an HKLOUT file is written containing h k l Sc*Fobs Fc PHIcalc FWT =(~2mFO-dFC) PHIWT DELFWt =(~ mFO-dFC) PHIDELWT Rebuilding into the FWT and DELFWT maps seems better than using classic 2Fo-Fc and Fo-Fc maps. The "missing Data" hkl terms for FWT are set = dFC , DELFWt = 0.0. This restoration of terms helps reduce systematic noise in the 2mFO-dFC maps. See example 0 for details on how to include missing data.

Scaling of Fobs to Fc now uses the Bulk solvent correction of the same type as TNT, Shelx etc. The scale is Kexp(-bs) ( 1-Ksolv exp(-bsolv))

This is excellent as long as your data is reasonable at low resolution; Dont include complete junk; eg reflections behind the Hamburg backstop. Ie - look at your xloggraph from hklplot, or some similar program which gives <F**2> v resolution. If it has one minima at ~ 5A, and a peak at 4.5A then falls away smoothly this is appropriate; if the shape is abnormal, worry about why!

The following examples are included to illustrate the various ways in which REFMAC can be run:

- Example 0.
- Completing the data to include all possible hkls. Should do this after data reduction, and certainly before using REFMAC.
- Example 1.
- Restrained refinement with overall B-factor refinement. Method is sparse matrix method. PROTIN is run first.
- Example 2.
- Unrestrained refinement by maximum likelihood method.
- Example 3.
- Idealization. Method of minimization is conjugate gradient method
- Example 4.
- Restrained refinement with Partial contribution from hydrogens.
- Example 5.
- Restrained refinement with maximum likelihood method followed by ARPP to place waters.
- Example 6.
- Restrained refinement with maximum likelihood method etc. An extremely complicated script incorporating averaging, arp, etc etc.
- Example 7.
- Restrained refinement with maximum likelihood method etc. 3A data requires fixing of protein Bfactor and BULK scales
- Example 8a.
- Example of rigid body refinement in refmac. Ordinary case with several domains. Side chains excluded from refinement
- Example 8b.
- Example of rigid body refinement in refmac. Experimental phase information used. Side chains excluded from refinement
- Example 9.
- Example of using experimental phase information. Very bad model (RMS error 2A)
- Example 10.
- Example of individual anisotropic B-value refinement. Hydrogens must be added prior to refinement

Completing the data to include all possible hkls. Should do this after data reduction, and certainly before using REFMAC. This is now done with the uniqueify script. Martyn - I think all this should be deleted?

#! /bin/sh # set -e # # Replace missing data with Missing Number Flags (in this case NaNs). # mtzmnf hklin $CEXAM/toxd/toxd_old.mtz hklout $CCP4_SCR/toxd_nan.mtz \ <<eof TITLE toxd data, testing LABI F1=FTOXD3 SIGF1=SIGFTOXD3 - D2=ANAU20 SIGD2=SIGANAU20 - F2=FAU20 SIGF2=SIGFAU20 - F3=FMM11 SIGF3=SIGFMM11 - F4=FI100 SIGF4=SIGFI100 END eof # # Case (1) # # Complete dataset and add free-R column. # Keep systematic absences with -s switch (though you probably wouldn't # want to do this). # uniqueify -s $CCP4_SCR/toxd_nan.mtz $CCP4_SCR/toxd-unique.mtz # # Case (2) # # Complete dataset and add free-R column. # Increase the fraction of reflections tagged with each free-R # indicator above the default 0.05 (sensible for toxd which has # small dataset). # uniqueify -p 0.1 $CCP4_SCR/toxd_nan.mtz $CCP4_SCR/toxd-unique2.mtz # # Case (3) # # First add free-R column to incomplete dataset. # (Silly thing to do - this is just to create a dataset with an existing # free-R column for illustrating use of uniqueify with -f switch.) # freerflag hklin $CCP4_SCR/toxd_nan.mtz hklout $CCP4_SCR/toxd_free.mtz << eof END eof # # Now complete with -f switch to indicate free-R column already present. # uniqueify -f FreeR_flag $CCP4_SCR/toxd_free.mtz $CCP4_SCR/toxd-unique3.mtz #

Restrained refinement with overall B-factor refinement. Method is sparse matrix method. PROTIN is run first.

#!/bin/csh -f # # Example of refinement by refmac # # # Set parameters # set CTEST=/y/programs/xtal/ccp4/examples cp $CTEST/toxd/toxd.pdb $CCP4_SCR/toxd0.pdb # uniqueify $CTEST/toxd/toxd.mtz set inmtz=toxd-unique.mtz start: set name = toxd set last = 0 set cycles = 4 set count = 0 while ($count != $cycles) echo '*******************************************************************' echo $count echo '*******************************************************************' @ curr = $last + 1 # # # Coomplete data and add freeR flag. You may not need this step # See complete_toxd.sh # # # Run protin to set up geometric restraints # PROTCOUNTS contains the number of distances, chiral centres,etc.. # PROTOUT contains atomic coordinates plus all possible restraint # pairings. # protin \ XYZIN $CCP4_SCR/${name}${last}.pdb \ PROTOUT $SCRATCH/protout.dat \ PROTCOUNTS $SCRATCH/counts.dat \ << eop !CHNNAME ID=chain iden;CHNTYP=match CHNTYP;ROFFSET=start resid CHNNAM ID B CHNTYP 1 CHNNAM ID W CHNTYP 2 CHNTYP 1 NTER 1 GLN 3 CTER 59 GLY 2 CHNTYP 2 WAT PEPP 4 SYMM 19 VDWRadii 1 CA 7 3.8 VDWCUT 5 END eop if ($status) exit # # Refmac step. Refine # PROTSCR - an abbreviated version of PROTOUT # it contains atomic coordinates plus all restraint # pairings for atoms which actually are present. # refmac: refmac \ HKLIN $inmtz \ HKLOUT $CCP4_SCR/${name}${last}.mtz \ PROTOUT $SCRATCH/protout.dat \ PROTCOUNTS $SCRATCH/counts.dat \ PROTSCR $SCRATCH/counts.scr \ XYZIN $CCP4_SCR/${name}${last}.pdb \ XYZOUT $CCP4_SCR/${name}${curr}.pdb \ << eor LABIN FP=FTOXD3 SIGFP=SIGFTOXD3 FREE=FreeR_flag LABO FC=FC PHIC=PHIC FWT=2FOFCWT PHWT=PH2FOFCWT - DELFWT=FOFCWT PHDELWT=PHFOFCWT REFI TYPE RESTrained RESOLUTION 20 1.50 REFI RESI MLKF REFI BREF ISOTropic METH CGMAT WEIGHT MATRIX 0.35 !Scaling parameters SCAL TYPE BULK NCYC 2 MONI FEW BINS 20 end eor # if ($status) exit # make maps. # # Sigmaa style 2mfo-dfc map with restored data # fft: fft \ hklin $CCP4_SCR/${name}${last}.mtz \ mapout $CCP4_SCR/2mfodfc_${name}${last}.mtz \ <<eof title Sigmaa style 2mfo-Dfc map calculated with refmac coefficients labi F1=2FOFCWT PHI=PH2FOFCWT binmapout end eof if ($status) exit # # Sigmaa style mfo-dfc map # fft \ hklin $CCP4_SCR/${name}${last}.mtz \ mapout $CCP4_SCR/mfodfc_${name}${last}.mtz \ <<eof title Sigmaa style mfo-Dfc map calculated with refmac coefficients labi F1=FOFCWT PHI=PHFOFCWT binmapout end eof if ($status) exit # @ last++ @ count++ end

Unrestrained refinement by maximum likelihood method.

#!/bin/csh -f # # Example of refinement by refmac # set CTEST=/y/programs/xtal/ccp4/examples cp $CTEST/toxd/toxd.pdb $CCP4_SCR/toxd0.pdb # uniqueify $CTEST/toxd/toxd.mtz set inmtz=toxd-unique.mtz start: set name = toxd set last = 0 set cycles = 4 set count = 0 while ($count != $cycles) echo '*******************************************************************' echo $count echo '*******************************************************************' @ curr = $last + 1 # # # Coomplete data and add freeR flag. You may not need this step # See complete_toxd.sh # # # # # # # Refmac step. Refine # refmac: refmac \ HKLIN $inmtz \ HKLOUT $CCP4_SCR/${name}${last}.mtz \ XYZIN $CCP4_SCR/${name}${last}.pdb \ XYZOUT $CCP4_SCR/${name}${curr}.pdb \ << eor LABIN FP=FTOXD3 SIGFP=SIGFTOXD3 FREE=FreeR_flag LABO FC=FC PHIC=PHIC FWT=2FOFCWT PHWT=PH2FOFCWT - DELFWT=FOFCWT PHDELWT=PHFOFCWT REFI TYPE UNREstrained RESOLUTION 20 1.50 REFI RESI MLKF REFI BREF ISOTropic #WEIGHT MATRIX 0.35 !Scaling parameters SCAL TYPE BULK NCYC 2 MONI FEW BINS 20 end eor # if ($status) exit # make maps. # # Sigmaa style 2mfo-dfc map with restored data # fft: fft \ hklin $CCP4_SCR/${name}${last}.mtz \ mapout $CCP4_SCR/2mfodfc_${name}${last}.mtz \ <<eof title Sigmaa style 2mfo-Dfc map calculated with refmac coefficients labi F1=2FOFCWT PHI=PH2FOFCWT binmapout end eof # if ($status) exit # Sigmaa style mfo-dfc map # fft \ hklin $CCP4_SCR/${name}${last}.mtz \ mapout $CCP4_SCR/mfodfc_${name}${last}.mtz \ <<eof title Sigmaa style mfo-Dfc map calculated with refmac coefficients labi F1=FOFCWT PHI=PHFOFCWT binmapout end eof if ($status) exit

Idealization. Method of minimization is conjugate gradient method

#!/bin/csh -f # # Example of refinement by refmac # # # Set parameters # set CTEST=/y/programs/xtal/ccp4/examples set crdin=$CTEST/toxd/toxd.pdb set crdout=toxd_1.pdb # # # # Run protin to set up geometric restraints # PROTCOUNTS contains the number of distances, chiral centres,etc.. # PROTOUT contains atomic coordinates plus all possible restraint # pairings. # protin \ XYZIN $crdin \ PROTOUT $SCRATCH/protout.dat \ PROTCOUNTS $SCRATCH/counts.dat \ << 'END-protin ' !CHNNAME ID=chain iden;CHNTYP=match CHNTYP;ROFFSET=start resid CHNNAM ID B CHNTYP 1 CHNNAM ID W CHNTYP 2 CHNTYP 1 NTER 1 GLN 3 CTER 59 GLY 2 CHNTYP 2 WAT PEPP 4 SYMM 19 VDWRadii 1 CA 7 3.8 VDWCUT 5 END 'END-protin ' if ($status) exit # # Refmac step. Refine # PROTSCR - an abbreviated version of PROTOUT # it contains atomic coordinates plus all restraint # pairings for atoms which actually are present. # refmac: refmac \ PROTOUT $SCRATCH/protout.dat \ PROTCOUNTS $SCRATCH/counts.dat \ PROTSCR $SCRATCH/counts.scr \ XYZIN $crdin \ XYZOUT $crdout \ << eor REFI TYPE IDEAlise REFI RESI MLKF REFI METH CGMAT NCYC 2 MONI FEW end eor if ($status) exit #

Restrained refinement with Partial contribution from hydrogens.

#!/bin/csh -f # start: set name = kak1_ set last = 1 set cycles = 4 set count = 0 while ($count != $cycles) echo '*******************************************************************' echo $count echo '*******************************************************************' @ curr = $last + 1 # # Run protin to set RESTRAINT lists. # protin: protin \ XYZIN $SCRATCH/${name}${last}.pdb \ PROTOUT $SCRATCH/kak_protout.dat \ PROTCOUNTS $SCRATCH/kak_counts.dat \ << END-protin CHNNAME ID A CHNTYP 1 CHNNAME ID B CHNTYP 2 CHNNAME ID C CHNTYP 3 CHNNAME ID D CHNTYP 4 CHNNAME ID E CHNTYP 5 CHNTYP 1 NTER 1 ALA 3 CTER 517 HIS 2 CISPRO 1 283 CHNTYP 2 NTER 1 LYS 3 CTER 3 LYS 2 CHNTYP 3 WAT # You can call a CHAIN WAT as long as there is no connectivity.. CHNTYP 4 WAT CHNTYP 5 WAT SYMMETRY 19 END END-protin if ($status) exit # # Run hgen to generate hydrogen position # hgen: hgen \ XYZIN $SCRATCH/${name}${last}.pdb \ XYZOUT $SCRATCH/hyd_${name}${last}.pdb \ <<eof HYDROGENS SEPARATE eof if ($status) exit # # Run sfall to calculate structure factors from hydrogens # sfall: sfall \ HKLIN $SCRATCH/kak_freer_unobs.mtz \ HKLOUT $SCRATCH/kak_freer_unobs_hyd.mtz \ XYZIN $SCRATCH/hyd_${name}${last}.pdb \ << eof NOSCALE MODE SFCALC XYZIN HKLIN LABI FP=F SIGFP=SIGF FREE=FreeR_flag LABO FC=FC_hyd PHIC=PHIC_hyd !Refinement parameters BINS 20 END eof if ($status) exit # # Run refmac - refinement adding in hydrogen contributions # refmac: refmac \ HKLIN $SCRATCH/kak_freer_unobs_hyd.mtz \ XYZIN $SCRATCH/${name}${last}.pdb \ HKLOUT $SCRATCH/${name}${last}.mtz \ PROTOUT $SCRATCH/kak_protout.dat \ PROTCOUNTS $SCRATCH/kak_counts.dat \ REFSCR $SCRATCH/prot_scr.mtz \ XYZOUT $SCRATCH/${name}${curr}.pdb << eop MODE HKRF LABI FP=F SIGFP=SIGF FREE=FreeR_flag FPART1=FC_hyd PHIP1=PHIC_hyd LABO FC=FC_p PHIC=PHIC_p FWT=2FOFCWT_p DELFWT=FOFCWT_p !Refinement parameters SCPART 1 MONI MANY PLAN 0.3 REFI TYPE REST REFI RESI MLKF RESO 20.0 1.2 REFI BREF ISOT METH CDIR WEIGHT MATRIX 0.5 !Scaling parameters SCAL TYPE BULK SCAL LSSC RESO 20 1.2 NCYC 5 BINS 20 END eop if ($status) exit # @ last++ @ count++ end

Restrained refinement with maximum likelihood method followed by ARPP to place waters

# #!/bin/csh -f # # A script which refined a structure # After refinement the REFMAC maps are calculated, and density # passed to ARPP to fit waters into. # # Step 1: protin and REFMAC. # # Step 2: Calculate ffts for 2mFodFc and mFodFc maps . # Step 3: Expand the maps as ARPP requires ( you need the "edges") # # Step 4: Run arpp to find the waters # #goto start # ########################################################################### ########################################################################### # Step 1: protin and REFMAC. # NCS restrains on molecules A B C AND water chains W X Y # start: set name = 'cutm_' set last = 6 set cycles = 3 set count = 0 set data = 'cute_dnpp.free.mtz' # while ($count != $cycles) @ curr = $last + 1 # # protin \ XYZIN $SCRATCH/${name}${last}.pdb \ PROTOUT $SCRATCH/cut_protout.dat \ PROTCOUNTS $SCRATCH/cut_counts.dat \ << eof TITL Cutinase monoclinic + DNPP CHNNAM ID A CHNTYP 1 CHNNAM ID B CHNTYP 1 CHNNAM ID C CHNTYP 1 CHNNAM ID W CHNTYP 2 CHNNAM ID X CHNTYP 2 CHNNAM ID Y CHNTYP 2 CHNTYP 1 NTERM 3 GLY 3 CTERM 193 ARG 2 DISUL 2 17 94 156 163 CHNTYP 2 WAT NONX 3 CHNID A B C NSPANS 1 1 200 1 NONX 3 CHNID W X Y NSPANS 1 1 200 2 SYMMETRY 4 LIST FEW END eof # if ($status) exit # refmac: refmac \ HKLIN $data \ PROTOUT $SCRATCH/cut_protout.dat \ PROTCOUNTS $SCRATCH/cut_counts.dat \ PROTSCR $SCRATCH/cut_counts.scr \ XYZIN $SCRATCH/${name}${last}.pdb \ HKLOUT $SCRATCH/${name}${curr}.mtz \ XYZOUT $SCRATCH/${name}${curr}.pdb << eop # LABI FP=FP SIGFP=SIGFP FREE=FreeR_flag LABO FC=FC PHIC=PHIC FWT=FWT DELFWT=DELFWT !Refinement parameters REFI TYPE REST REFI RESI MLKF RESO 20.0 2.05 PLAN 1 0.04 0.015 REFI BREF ISOT METH CGMAT WEIGH MATRIX 0.8 TEMP 1.0 3.0 5.0 6.0 8.0 1.0 !Scaling parameters SCAL TYPE BULK LSSC ANIS RESO 20 2.05 NCYC 4 ! cycles round refinement NCYC times before redoing PROTIN MONI FEW BINS 20 END eop # if ($status) exit # ########################################################################### ########################################################################### #---- fft 2Fo-Fc map # # # Step 2: Calculate ffts for 2mFodFc and mFodFc maps # ( this facilitates the averaging) # fft HKLIN $SCRATCH/${name}${curr}.mtz \ MAPOUT $SCRATCH/${name}${curr}_32_p21.map << eof TITLE 2FO-1FC RESOLUTION 20.0 2.05 GRID 128 128 128 XYZLIM 0 127 0 63 0 127 BINMAPOUT LABIN F1=FWT PHI=PHWT END eof # # # if ($status) exit # # fft HKLIN $SCRATCH/${name}${curr}.mtz \ MAPOUT $SCRATCH/${name}${curr}_11_p21.map \ << eof TITLE 1FO-1FC RESOLUTION 20.0 2.05 GRID 128 128 128 XYZLIM 0 127 0 63 0 127 BINMAPOUT LABIN F1=DELFWT PHI=PHDELWT END eof # # # if ($status) exit 2: # ########################################################################### ########################################################################### # Step 3: Expand the maps to fill the edges for ARPP # mapmask \ MAPIN $SCRATCH/${name}${curr}_32_p21.map \ MAPOUT $SCRATCH/${name}${curr}_32_p21.ext \ << eof SYMM P21 XYZLIM 0 128 0 64 0 128 eof # # if ($status) exit # mapmask \ MAPIN $SCRATCH/${name}${curr}_11_p21.map \ MAPOUT $SCRATCH/${name}${curr}_11_p21.ext \ << eof SYMM P21 XYZLIM 0 128 0 64 0 128 eof # #exit # if ($status) exit # # ########################################################################### ########################################################################### # Step 4: Run arpp to find the waters # # arpp: # arpp \ xyzin $SCRATCH/${name}${curr}.pdb \ MAPIN1 $SCRATCH/${name}${curr}_32_p21.ext \ MAPIN2 $SCRATCH/${name}${curr}_11_p21.ext \ xyzout $SCRATCH/temp_arp.pdb << eof SYMMETRY P21 REMOVE ATOMS 10 ANALYSE WATERS CUTSIGMA 0.50 MERGE 1.8 FIND ATOMS 60 BFACTOR 30 CHAIN Y CUTSIGMA AUTO #FIND ATOMS 100 BFACTOR 30 CHAIN Y CUTSIGMA 3 FDISTANCE NEWOLD 2.2 3.3 NEWNEW 2.5 REFINE WATERS END eof # # if ($status) exit # mv $SCRATCH/temp_arp.pdb $SCRATCH/${name}${curr}.pdb # /bin/rm *TMP*,*ARP* # @ last++ @ count++ end # exit # # echo " check log file error in one step " #

Restrained refinement with maximum likelihood method etc. An extremely complicated script incorporating averaging, arp, etc etc.

#!/bin/csh -f # # A script which refined a structure with 3 molecules in the # asymmetric unit; # After refinement the REFMAC maps are averaged, and density around one # molecule only is passed to ARPP to fit waters into. # This set of waters is expanded to include those around the other two molecules. then the process is cycled.. # Step 1: Find the rotations which convert the master molecule ( in this case C) # to overlap the other two ( here labelled A & B) # # Step 2: protin and REFMAC. # NCS restrains on molecules A B C AND water chains W X Y # # Step 3: Calculate ffts for 2mFodFc and mFodFc maps in P1. # ( this facilitates the averaging) # Step 4: Extract atoms for chains C and Y from refmac output coordinates. # Use ncs mask to define a mask round Molecule C and its associated waters Y # # Step 5: Average 2mFodFc and mFodFc maps over this mask. # Output two maps - wrkout only covers the masked region. # mapout is a map expanded to the volume # given on the MAPIN limits ( usually P1) # It contains the averaged density in the # volume under the mask, and its non # crystallographic equivalents, and the # unmodified density everywhere else. # # Step 6: Expand the wrkmap to lie within a P1 cell with zeros padded # where there is no density. ( ARPP requires a full cell) # # Step 7: Run arpp in P1 to find the common waters which should show up # in the averaged density. # # Step 8: Rebuild a coordinate set containing molecules A B C plus # the new ARPP waters, plus those generated by applying the # ncs symmetry to these. # ( Use grep and pdbset) goto start ########################################################################### ########################################################################### # Get rotations to average C coordinates over A and B # I assume these wont change during cycles... # Step 1: Find the rotations which convert the master molecule ( in this case C) # to overlap the other two ( here labelled A & B) # lsqkab \ workcd cutm_0.pdb \ refrcd cutm_0.pdb \ << eof FIT WRES 1 to 193 WCHAIN C MATCH RRES 1 to 193 RCHAIN A OUTPUT RMS END eof if ($status) exit # lsqkab \ workcd cutm_0.pdb \ refrcd cutm_0.pdb \ << eof FIT WRES 1 to 193 WCHAIN C MATCH RRES 1 to 193 RCHAIN B OUTPUT RMS END eof if ($status) exit # ########################################################################### ########################################################################### # Check the transformations are correct. They are! cutm_0_A.pdb is the same # as the A chain cordinates in cutm_0.pdb.. pdbset: pdbset \ xyzin $SCRATCH/cutm_0_C.pdb \ xyzout $SCRATCH/cutm_0_A.pdb \ << eof ROTATE EULER 9.70152 -119.69961 7.80144 SHIFT 29.55211 8.54860 19.71765 CHAIN W eof if ($status) exit # pdbset \ xyzin $SCRATCH/cutm_0_C.pdb \ xyzout $SCRATCH/cutm_0_B.pdb \ << eof ROTATE EULER 175.90645 -120.30214 172.70070 SHIFT -1.62819 4.70139 35.11710 CHAIN X eof if ($status) exit # exit # ########################################################################### ########################################################################### # Step 2: protin and REFMAC. # NCS restrains on molecules A B C AND water chains W X Y # start: set name = 'cutm_' set last = 6 set cycles = 3 set count = 0 set data = 'cute_dnpp.free.mtz' # while ($count != $cycles) @ curr = $last + 1 # # protin \ XYZIN $SCRATCH/${name}${last}.pdb \ PROTOUT $SCRATCH/cut_protout.dat \ PROTCOUNTS $SCRATCH/cut_counts.dat \ << eof TITL Cutinase monoclinic + DNPP CHNNAM ID A CHNTYP 1 CHNNAM ID B CHNTYP 1 CHNNAM ID C CHNTYP 1 CHNNAM ID W CHNTYP 2 CHNNAM ID X CHNTYP 2 CHNNAM ID Y CHNTYP 2 CHNTYP 1 NTERM 3 GLY 3 CTERM 193 ARG 2 DISUL 2 17 94 156 163 CHNTYP 2 WAT NONX 3 CHNID A B C NSPANS 1 1 200 1 NONX 3 CHNID W X Y NSPANS 1 1 200 2 SYMMETRY 4 LIST FEW END eof # if ($status) exit # refmac: refmac \ HKLIN $data \ PROTOUT $SCRATCH/cut_protout.dat \ PROTCOUNTS $SCRATCH/cut_counts.dat \ PROTSCR $SCRATCH/cut_counts.scr \ XYZIN $SCRATCH/${name}${last}.pdb \ HKLOUT $SCRATCH/${name}${curr}.mtz \ XYZOUT $SCRATCH/${name}${curr}.pdb << eop # LABI FP=FP SIGFP=SIGFP FREE=FreeR_flag LABO FC=FC PHIC=PHIC FWT=FWT DELFWT=DELFWT !Refinement parameters REFI TYPE REST REFI RESI MLKF RESO 20.0 2.05 PLAN 1 0.04 0.015 REFI BREF ISOT METH CGMAT WEIGH MATRIX 0.5 # (default) TEMP 1.0 3.0 5.0 6.0 8.0 1.0 !Scaling parameters SCAL TYPE BULK LSSC ANIS RESO 20 2.05 NCYC 4 ! cycles round refinement NCYC times before redoing PROTIN MONI FEW BINS 20 END eop # if ($status) exit # ########################################################################### ########################################################################### #---- fft 2Fo-Fc map in P1 cell ( needed for averaging) # # # Step 3: Calculate ffts for 2mFodFc and mFodFc maps in P1. # ( this facilitates the averaging) # fft HKLIN $SCRATCH/${name}${curr}.mtz \ MAPOUT $SCRATCH/${name}${curr}_32_p1.map << eof TITLE 2FO-1FC FFTSYMMETRY P1 GRID 128 128 128 XYZLIM 0 127 0 127 0 127 BINMAPOUT LABIN F1=FWT PHI=PHWT END eof # # # if ($status) exit # # fft HKLIN $SCRATCH/${name}${curr}.mtz \ MAPOUT $SCRATCH/${name}${curr}_11_p1.map \ << eof TITLE 1FO-1FC FFTSYMMETRY P1 GRID 128 128 128 XYZLIM 0 127 0 127 0 127 BINMAPOUT LABIN F1=DELFWT PHI=PHDELWT END eof # # # if ($status) exit 2: ########################################################################### ########################################################################### # # Remove A B W and X chains from REFMAC output coordinates grep -i -v " b " $SCRATCH/${name}${curr}.pdb > $SCRATCH/junk1.pdb grep -i -v " a " $SCRATCH/junk1.pdb > $SCRATCH/junk2.pdb grep -i -v " w " $SCRATCH/junk2.pdb > $SCRATCH/junk3.pdb grep -i -v " x " $SCRATCH/junk3.pdb > $SCRATCH/${name}${curr}_CY.pdb # # make a mask from the CY atoms only # Step 4: Use ncs mask to define a mask round Molecule C # and its associated Waters Y # rot0: ncsmask \ xyzin $SCRATCH/${name}${curr}_CY.pdb \ mskout $SCRATCH/${name}${curr}_CY.msk \ <<eof RADIUS 4 SYMM P1 GRID 128 128 128 eof # # if ($status) exit # ########################################################################### ########################################################################### # # Step 5: Average 2mFodFc and mFodFc maps over this mask. # Output two maps - wrkout only covers the masked region. # mapout is a map expanded to the volume # given on the MAPIN limits ( usually P1) # It contains the averaged density in the # volume under the mask, and its non # crystallographic equivalents, and the # unmodified density everywhere else. # # rot: # wrkout map - Averaged map within the mask only - # not expanded. One molecule # mapout map - Averaged map expanded by symm ops to # cover the extent of mapin- three ncs *nsym mols. maprot \ MAPIN $SCRATCH/${name}${curr}_32_p1.map \ mskin $SCRATCH/${name}${curr}_CY.msk \ wrkout $SCRATCH/${name}${curr}_32_CY.map \ mapout $SCRATCH/${name}${curr}_32_all.map \ <<eof MODE BOTH SCALE 0.3333 AVERAGE 3 # ROTATE EULER 0 0 0 TRANSLATE 0 0 0 ROTATE EULER 9.70152 -119.69961 7.80144 TRANSLAT 29.55211 8.54860 19.71765 ROTATE EULER 175.90645 -120.30214 172.70070 TRANSLAT -1.62819 4.70139 35.11710 END eof # if ($status) exit # Now the difference map maprot \ MAPIN $SCRATCH/${name}${curr}_11_p1.map \ mskin $SCRATCH/${name}${curr}_CY.msk \ wrkout $SCRATCH/${name}${curr}_11_CY.map \ mapout $SCRATCH/${name}${curr}_11_all.map \ <<eof MODE BOTH SCALE 0.3333 AVERAGE 3 # ROTATE EULER 0 0 0 TRANSLATE 0 0 0 ROTATE EULER 9.70152 -119.69961 7.80144 TRANSLAT 29.55211 8.54860 19.71765 ROTATE EULER 175.90645 -120.30214 172.70070 TRANSLAT -1.62819 4.70139 35.11710 END eof # # if ($status) exit # ########################################################################### ########################################################################### # Step 6: Expand the wrkmap to lie within a P1 cell with zeros padded # where there is no density. ( ARPP requires a full cell) # # Pad - work map to fill P1 cell, move it to 0-1,0-1,0-1 # ready for arpp # mapmask \ MAPIN $SCRATCH/${name}${curr}_32_CY.map \ MAPOUT $SCRATCH/${name}${curr}_32_CY_p1.map \ << eof PAD 0.0 SYMM P1 XYZLIM 0 128 0 128 0 128 eof # # if ($status) exit # mapmask \ MAPIN $SCRATCH/${name}${curr}_11_CY.map \ MAPOUT $SCRATCH/${name}${curr}_11_CY_p1.map \ << eof PAD 0.0 SYMM P1 XYZLIM 0 128 0 128 0 128 eof # #exit # if ($status) exit # # ########################################################################### ########################################################################### # Step 7: Run arpp to find the common waters which should show up # in the averaged density. # # arpp: # arpp \ xyzin $SCRATCH/${name}${curr}_CY.pdb \ MAPIN1 $SCRATCH/${name}${curr}_32_CY_p1.map \ MAPIN2 $SCRATCH/${name}${curr}_11_CY_p1.map \ xyzout $SCRATCH/temp_arp.pdb << eof SYMMETRY 1 REMOVE ATOMS 10 ANALYSE WATERS CUTSIGMA 0.50 MERGE 1.8 FIND ATOMS 60 BFACTOR 30 CHAIN Y CUTSIGMA AUTO #FIND ATOMS 100 BFACTOR 30 CHAIN Y CUTSIGMA 3 FDISTANCE NEWOLD 2.2 3.3 NEWNEW 2.5 REFINE WATERS END eof # # if ($status) exit # ########################################################################### ########################################################################### # Step 8: Rebuild a coordinate set containing molecules A B C plus # the new ARPP waters, plus those generated by applying the # ncs symmetry to these. # grep -i wat $SCRATCH/temp_arp.pdb > $SCRATCH/temp_arp_Y.pdb # coordinates from refmac - # # output CHAINS A B C of the output coordinates from refmac to # $SCRATCH/${name}${curr}_nowat.pdb # grep -i -v wat $SCRATCH/${name}${curr}.pdb > $SCRATCH/${name}${curr}_nowat.pdb # Expand the Y waters to lie near the A molecule; call them W ... pdbset \ xyzin $SCRATCH/temp_arp_Y.pdb \ xyzout $SCRATCH/temp_arp_W.pdb \ << eof ROTATE EULER 9.70152 -119.69961 7.80144 SHIFT 29.55211 8.54860 19.71765 CHAIN W eof # if ($status) exit # Expand the Y waters to lie near the B molecule; call them X ... pdbset \ xyzin $SCRATCH/temp_arp_Y.pdb \ xyzout $SCRATCH/temp_arp_X.pdb \ << eof ROTATE EULER 175.90645 -120.30214 172.70070 SHIFT -1.62819 4.70139 35.11710 CHAIN X eof # if ($status) exit # # # Put them all back together.. cat $SCRATCH/${name}${curr}_nowat.pdb $SCRATCH/temp_arp_W.pdb $SCRATCH/temp_arp_X.pdb $SCRATCH/temp_arp_Y.pdb > $SCRATCH/temp_arp.pdb mv $SCRATCH/temp_arp.pdb $SCRATCH/${name}${curr}.pdb # #/bin/rm *TMP* # @ last++ @ count++ end # exit #

Restrained refinement with maximum likelihood method etc. 3A data requires fixing of protein Bfactor and BULK scales

#!/bin/csh -f # set verbose set name = piitrig- set last = 0 set cycles = 4 set count = 0 while ($count != $cycles) @ curr = $last + 1 # #goto refmac protin: protin XYZIN $SCRATCH/$name$last.pdb \ PROTOUT $SCRATCH/${name}_protout.dat \ PROTCOUNTS $SCRATCH/${name}_counts.dat \ << 'END-protin' TITLE trigonal pII at 3.0A SYMMETRY 154 CHNNAME ID A CHNTYP 1 CHNNAME ID B CHNTYP 1 CHNNAME ID C CHNTYP 1 CHNNAME ID W CHNTYP 2 CHNTYP 1 NTER 1 MET 3 CTER 95 ASP 2 CHNTYP 2 WAT NONX 3 CHNID A B C NSPANS 1 1 95 1 LIST FEW PEPP 5 END 'END-protin' if ($status) exit # refmac: refmac \ HKLIN pii_154free.mtz \ HKLOUT $SCRATCH/$name$last.mtz \ PROTOUT $SCRATCH/${name}_protout.dat \ PROTCOUNTS $SCRATCH/${name}_counts.dat \ PROTSCR $SCRATCH/${name}_counts.scr \ XYZIN $SCRATCH/$name$last.pdb \ XYZOUT $SCRATCH/$name$curr.pdb \ << 'END-sfrk' LABIN FP=FP SIGFP=SIGFP FREE=FreeR_flag !Refinement parameters REFI TYPE RESTrained RESOLUTION 20 3.0 REFI RESI MLKF REFI BREF ISOTropic METH CDIR WEIGHT MATRIX 0.1 DAMP 0.5 0.5 !Scales shifts to avoid large shifts !Scaling parameters SCAL TYPE BULK LSSC FIXBulk SCBUlk -0.75 BBULk 70 ! Fixes bulk solvent parameters SCAL LSSC ANISO APPLy OBSErved !Scales aniso B and applies to observed SCAL BAVERAGE 30.0 ! Keeps average B of molecule constatnt BFAC 1 2.0 2.5 3.0 4.5 NCYC 4 MONI FEW BINS 20 LABO FC=FC PHIC=PHIC FWT=2FOFCWT DELFWT=FOFCWT 'END-sfrk' if ($status) exit # @ last++ @ count++ end

Example of rigid body refinement in refmac. Ordinary case with several domains. Side chains excluded from refinement

#!/bin/csh -f # ################################################################### ##################################################################### set name = cytc_refmac_ set inmtz=/y/ccp4/cytc/pseudo_alc_cprime1_sc_feph_sq_sf2_sfx_sq_a+p-unique_pt2p.mtz set last = 0 set cycles = 1 set count = 0 while ($count != $cycles) @ curr = $last + 1 refmac \ HKLIN $inmtz \ HKLOUT ${name}${curr}.mtz \ XYZOUT ${name}${curr}.pdb \ XYZIN ${name}${last}.pdb \ << eop LABI FP=Fnata SIGFP=SIGFnata FREE=FreeR_flag ! !Refinement parameters ! REFI TYPE RIGID RESI MLKF RESO 15 2.0 ! Maximum likelihood refinement. Resolution limit 15 to 2.0 A !Scaling parameters ! SCALe TYPE BULK LSSC ANIS FIXBulk BBULk 70.0 SCBUlk -0.75 ! Fix bulk solvent parameters for LSQ scaling SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.1 ! Fix bulk solvent parameters for ML scaling ! !Rigid body parameters ! RIGIdbody NCYCle 10 ! Number of cycles for rigid body refi RIGIdbody GROUp 1 FROM 2 A TO 32 A EXCLU SCHAIns ! Define domains. Exclude side chains RIGIdbody GROUp 2 FROM 38 A TO 55 A EXCLU SCHAIns RIGIdbody GROUp 3 FROM 76 A TO 99 A EXCLU SCHAIns RIGIdbody GROUp 4 FROM 101 A TO 126 A EXCLU SCHAIns RIGIDbody GROUp 5 FROM 56 A to 75 A EXCLU SCHAins RIGIDbody PRINt EULErangles MATRix ! Print Euler angles and Matrices MONI MANY END eop if ($status) exit # @ last++ @ count++ end

Same problem but now using experimental phases.

#!/bin/csh -f set inmtz=/y/ccp4/cytc/pseudo_alc_cprime1_sc_feph_sq_sf2_sfx_sq_a+p-unique_pt2p.mtz set name = cytc_refmac_ set last = 0 set cycles = 1 set count = 0 while ($count != $cycles) @ curr = $last + 1 refmac \ HKLIN $inmtz \ HKLOUT ${name}${curr}.mtz \ XYZOUT ${name}${curr}.pdb \ XYZIN ${name}${last}.pdb \ << eop LABI FP=Fnata SIGFP=SIGFnata FREE=FreeR_flag - HLA=HLA_pt2 HLB=HLB_pt2 HLC=HLC_pt2 HLD=HLD_pt2 LABO FC=FC_DP5 PHIC=PHIC_1 FWT=2FOFCWT DELFWT=FOFCWT ! REFI PHASE BBLUrring 30.0 SCBLurring 0.9 ! " Blur" (ie Scale down) FOMs REFI PHASE ! use phased Fo Fc differences for sigmaA estimation ! REFI TYPE RIGI RESI MLKF RESO 15 2.0 ! Maximum likelihood refinement ! !Scaling parameters SCALe LSSC FIXBulk BBULk 70.0 SCBUlk -0.75 ! Fix bulk solvent parameters for LSQ scaling ! SCAL MLSC WORK ! Use working reflections for sigmaA estimation. ! Useful at the very early stages of refinement SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.1 ! Fix bulk solvent parameters for ML scaling ! !Rigid body parameters ! RIGIdbody NCYCle 10 ! Number of cycles for rigid body refi RIGIdbody GROUp 1 FROM 2 A TO 32 A EXCLU SCHAIns ! Define domains. Exclude side chains RIGIdbody GROUp 2 FROM 38 A TO 55 A EXCLU SCHAIns RIGIdbody GROUp 3 FROM 76 A TO 99 A EXCLU SCHAIns RIGIdbody GROUp 4 FROM 101 A TO 126 A EXCLU SCHAIns RIGIDbody GROUp 5 FROM 56 A to 75 A EXCLU SCHAins RIGIDbody PRINt EULErangles MATRix ! Print Euler angles and Matrices MONI FEW END eop if ($status) exit # @ last++ @ count++ end

Example of using experimental phase information . Very bad model ( RMS error 2A)

#!/bin/csh -f # #!/bin/csh -f #set verbose set name = tnc_test_ set last = 0 set cycles = 100 set count = 0 while ($count != $cycles) @ curr = $last + 1 #goto refmac # protin \ XYZIN ${name}${last}.pdb \ << END-protin CHNNAM ID 5 CHNTYP 1 CHNTYP 1 NTERM 2 SER 3 CTERM 162 GLY 2 SYMMETRY 154 LIST FEW TITLE TNC very bad model END END-protin if ($status) exit refi: date refmac: refmac \ HKLIN tnc_test.mtz \ HKLOUT ${name}${curr}.mtz \ XYZOUT ${name}${curr}.pdb \ XYZIN ${name}${last}.pdb \ << eop LABI FP=Fnati SIGFP=SIGFnati FREE=FreeR_flag - HLA=HLA_t HLB=HLBt HLC=HLC_t HLD=HLD_t !HL coefficients. It tells program !use phased refinement #PHIB=PHIDM_t FOM=FOMDM_t ! Option if there are no HL coefficients. # LABO FC=FC_main PHIC=PHIC_main FWT=2FOFCWT DELFWT=FOFCWT !Labels for output file !Refinement parameters REFI TYPE REST RESO 12 2.8 !restrained refinement. resolution for refinement REFI RESI MLKF ! Use maximum likelihood residual for refinement REFI BREF OVER ! Refine overall B-values REFI METH CGMAT ! Sparse matrix for minimisaion WEIG MATR 0.15 ! Weight geometry and x-ray according to diagonal terms of second ! derivative matrix DAMP 0.5 0.5 ! Scales down shifts to avoid large shifts. Sometimes this value ! should be decreased. Especially at early stages of refi REFI PHASed BBLUrring 20.0 SCBLurring 0.7 ! You may need to play with blurring ! factors if you think the phase ! weighting is overestimated. !Scaling parameters SCAL TYPE BULK LSSC ANISO !Aniso scaling. Gaussian bulk solvent ! correction SCALe LSSC FIXBulk BBULk 70.0 SCBUlk -0.75 ! Fix bulk solvent parameters - often ! ill determined for resolutions < 2.8A SCAL MLSC FIXBulk BBULk 100.0 SCBUlk -0.50 ! Fix second gaussian parameters for ! SigmaA estimaton NCYC 5 ! Number of refinement cycles MONI FEW ! Give just overall statistics BINS 20 ! Number of resolution bins END eop if ($status) exit #exit # @ last++ @ count++ end

Example of refinement of individual anisotropic B values. Hydrogens must be added prior to refinement.

#!/bin/csh -f #set verbose # ########################################################################### # You have to edit labi, reso, name, data, spgr and protin # # # # data - mtz file # # name - name of coordinate file # # reso - resolution of data or resoloution limits you want to refine # # labi - input file labels # # spgr - space group of crystal # # protin - program which makes list of restraints # # # #######Following things should be optimised according to data used # # # # WEIG MATR <number> weighting x-ray and geometry. At low resolution# # it should be small (sometimes 0.1) at high # # resolution high (sometimes 6.0 or even higher) # # SPHEricity restraints on sphericity of atoms. Larger # # less spherical # # # # NB: with current version only CDIR (conjugate direction) method of # # minimisation is active # # # ########################################################################### set name = 'p1lys_' set last = 0 set cycles = 3 set count = 0 set data = 'p1lys_val-unique' set reso = '100 0.92' set spgr = '1' set labi = 'FP=FO SIGFP=SIGFO FREE=FreeR_flag' while ($count != $cycles) @ curr = $last + 1 #goto refmac # # # Generate hudrogens and calculate contribution of them. ################################################################### hgen: hgen \ XYZIN ${name}${last}.pdb \ XYZOUT ${name}_hydr.pdb \ << eof HYDROgens SEPArate ! Hydrogens in seperate file eof # # # Calculate contribution from hydrogens. NOSCALE option should # be used to preserve absolute scale of structure factros of # hydrogens ################################################################### sfall: sfall \ HKLIN ${data}.mtz \ HKLOUT ${data}_hydr.mtz \ XYZIN ${name}_hydr.pdb \ << eof NOSCALE MODE SFCALC XYZIN HKLIN LABI $labi LABO FC=FC_hyd PHIC=PHIC_hyd !Refinement parameters RESO $reso BINS 20 END eof # # Run protin to generate list of restraints ################################################################### protin \ XYZIN ${name}${last}.pdb \ << eof TITL PAV2 2.01A data RXIS at 120K CHNNAME ID A CHNTYP 1 ROFFSET 0 CHNNAME ID W CHNTYP 2 ROFFSET 0 CHNTYP 1 NTER 1 LYS 3 CTER 129 LEU 2 CHNTYP 2 WAT LIST FEW #[few]/some (for >20A)/all PEPP 5 #no. of atoms restrained to be in a plane [5] SYMMETRY $spgr LIST FEW END eof refi: date refmac: # # Run refmac to refine structure. It is refinement with # anisotropic B values ################################################################### refmac \ HKLIN ${data}_hydr.mtz \ HKLOUT ${name}${curr}.mtz \ XYZOUT ${name}${curr}.pdb \ XYZIN ${name}${last}.pdb \ << eop LABI $labi FPART1=FC_hyd PHIP1=PHIC_hyd LABO FC=FC_main PHIC=PHIC_main FWT=2FOFCWT DELFWT=FOFCWT !Labels for output file !Refinement parameters REFI TYPE REST RESO $reso !restrained refinement. resolution for refinement REFI RESI MLKF ! Use maximum likelihood residual for refinement REFI BREF ANISotropic ! Refine individual anisotropic B-values REFI METH CDIR ! Conjugate direction method. WEIG MATR 4.0 ! Weight geometry and x-ray according to diagonal ! terms of second derivative matrix. SPHERicity 5.0 ! Restraints on sphericity of atoms. Default is 2.0 ! but is not good !Scaling parameters SCAL TYPE BULK LSSC ANISO ! Aniso scaling to remove overall crystallographic ! mode. Gaussian bulk solvent correction BLIM 1.0 150.0 ! Limit of B values. For anisotropic refinement it ! is limit of eignevalues of anisotropic B tensor NCYC 5 ! Number of refinement cycles MONI FEW ! Give just overall statistics BINS 20 ! Number of resolution bins END eop if ($status) exit #exit # @ last++ @ count++ end

Garib Murshudov, York University.

This document has been prepared by: Garib Murshudov, Eleanor Dodson, Maria Turkenburg.