content="text/html; charset=iso-8859-1"> content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">

VECFEM3 Reference Manual: LSOLPP

Type: FORTRAN routine

NAME

LSOLPP - LINSOL: Solves a linear system of equations MAT*x = b

SYNOPSIS

call LSOLPP(lmat,lprec,liprec,lindex,l,ldw,liw,nproc,: lsym,ia1,info,MAT,prec,x,b,dw,iprec,index,iw,ilin, lmatbk,ptrinf,tids,epslin,iconv,ierr)
integer: lmat,lprec,liprec,lindex,l,ldw,liw,nproc,ia1,iconv, ierr
integer: index(lindex),info(ia1,7),ilin(50),iprec(liprec), iw(liw),lmatbk(nproc),ptrinf(12,nproc), tids(nproc)
double precision: MAT(lmat),prec(lprec),dw(ldw),x(l),b(l),epslin
logical: lsym

PURPOSE

LSOLPP is a routine to solve a linear system MAT*x = b with a large and sparse l x l real matrix MAT and a right hand side b of length l. Different solution methods of the generalised CG-method are used, which can be selected by the user. To accelerate the iteration and to get a better stability, the matrix MAT is normalised by a diagonal matrix and/or precondition from the left-hand side and/or the right hand side. MAT is handed over in a sparse matrix storage scheme.

ARGUMENTS

If a parameter has the attribute 'local', the parameter values can vary on different processors. If a parameter has the attribute 'global', it must have the same value on all processors. The label *PAR means that the parameter only has to be declared, but does not have to be set (initialised) on NON-PARALLEL computers. The label **PAR means that the parameter only has to be declared, but does not have to be set, if nproc is set (initialised) to 1 (on NON-PARALLEL computers nproc is internally set to 1).

lmat integer, scalar, input, local

Length of (local) matrix array MAT.

lprec integer, scalar, input, local

Length of preconditioning matrix prec. lprec must be greater or equal 5*l.

lindex integer, scalar, input, local

Length of the integer array index.

liprec integer, scalar, input, local

Length of the integer array iprec. liprec must be greater or equal 1.

l integer, scalar, input, global

Maximal number of unknowns on a processor (it must hold: l = max{lmatbk(i),i=1,..,nproc}).

ldw integer, scalar, input, local

Length of the real work array dw.

ldw =>	22*l	for PRES20 and Polyalgorithm
ldw =>	22*l	for BICO, QMR-Simulator and CG
ldw =>	22*l	Otherwise

liw integer, scalar, input, local

Length of the integer work array iw.

liw => 2*nproc	,if optim<>100 and optim<>200
liw => max(2nproc,4l)	,if optim==100 or optim==200

nproc integer, scalar, input, global <<< *PAR

Number of processors (processes), see combgn.

lsym logical, scalar, input, global

is .true., if the matrix MAT is symmetric.

ia1 integer, scalar, input, local

First dimension of matrix information array info, see sparse matrix storage scheme. The parameter must be greater or equal max{ptrinf(12,i),i=1,nproc} on each processor.

info integer, array: info(ia1,ia2), input, local

Information array for the matrix MAT (ia2 {= 7} is a constant), see sparse matrix storage scheme.

MAT double precision, array: MAT(lmat), input/output, local

Matrix array, see sparse matrix storage scheme.

prec double precision, array: prec(lprec), input, local

Preconditioning matrix.

x double precision, array: x(l), input/output, local

Solution vector.

b double precision, array: b(l), input/output, local

Right hand side.

dw double precision, array: dw(ldw), internal, local

Real work array.

iprec integer, array: iprec(liprec), input, local

Information vector of preconditioning matrix.

index integer, array: index(lindex), input, local

Array of indices allowing indirect access to the matrix array MAT, see sparse matrix storage scheme.

iw integer, array: iw(liw), internal, local

Integer work array.

ilin integer, array: ilin(nilin), input/output, global

Information vector (nilin {=50} is a constant).

(1) = ms, input, global

Method selection.

ms = 1	PRES20
ms = 2	BICO
ms = 3	Bi-CGSTAB
ms = 4	AT-PRES
ms = 5	CGS
ms = 6	QMR-Simulator
ms = 7	GMERR
ms = 9	CG-AT for non-symmetric matrices
ms =10	Polyalgorithm PRES20 -> BICO -> AT-PRES
ms =123	Classical CG for symmetric matrices.

(2) = itmax, input, global

Maximal number of iteration steps.

(3) = nmsg, input/output, global <<< *PAR

Message counter for the number of communication units.

(4) not used

(5) not used

(6) = isprec, input/output, global

indicates, if the matrix MAT is alread normalised.

isprec = 0	matrix is not normalised,
isprec = 1	matrix is normalised.

(7) not used

(8) = msprec, input, global

selects the method of preconditioning. (The preconditioning matrices are stored as one-dimensional arrays in prec and accessed via information array iprec).

msprec = 0 :	no preconditioning,
msprec = 1 :	preconditioning by (complete) LU factorisation << in preparation >>

(9) = noprec, input, global

noprec = 0	normalised system is considered for checking of stopping criterion,
noprec = 1	original system is considered for checking of stopping criterion.

(10) = nmvm, output, global

Number of computed matrix-vector multiplications.

(11) not used

(12) = lout, input, local

Unit number of the standard output file (default is 6).

(13) = idoku, input, global

Output control.

idoku = 0	print only error messages,
idoku = 1	print further information, but no output during iteration,
idoku = i>1	output always after i iteration steps on the processor with logical process(or) number 1,
idoku = -i<0	it holds the same as for idoku>0 with the difference of output on all processors.

(14) not used

(15) = msnorm, input, global

selects the method of normalisation. Note: "A" stands for a usually stored matrix (matrix MAT is stored as a one-dimensional array and accessed via array info).

msnorm = 0	no normalisation,
msnorm = 1	normalisation by row sum (prec(i) = sign A_ii/sum abs(A_ij),j=1,l),
msnorm = 2	normalisation by main diagonal elements (prec(i) = 1/A_ii),
msnorm = 3	normalisation by square sum (prec(i) = 1/sqrt(sum A_ij**2,j=1,l)),
msnorm = 4	Froboenius normalisation (prec(i) = A_ii/sum A_ij**2,j=1,l),
msnorm = 11	same as 1-4, but with square root of preconditioning matrix from left and right (automatically selected if the matrix `MAT` is symmetric).
msnorm = 12
msnorm = 13
msnorm = 14

(16) = startx, input, global

startx <> 4711	iteration starts from zero vector,
startx = 4711	an initial guess is given in `x`.

(17) = myproc, input, local <<< **PAR

Logical process(or) number.

(18) = smooth, input, global

smooth = 0	iteration returns the non-smoothed solution,
smooth <> 0	iteration returns the smoothed solution.

(19) = misc, input, global

misc = 00000	Standard LSOLPP
misc = 00001	Printing and checking - as far as possible - of vector terms(you can set ilin(21))
misc = 00010	Check, if recursion emerge(only of use on vector computers) <not yet implemented>
misc = 00100	Reading of (distributed) matrix `MAT` in LSOLPP-Format (you must set ilin(22))
misc = 01000	Storing of (distributed) matrix `MAT` in LSOLPP-Format (you must set ilin(23))
misc = 10000	Output of error

(20) = optim, input, global

optim = 000	No optimisation
optim = 001	Optimisation of data structures<not yet implemented>
optim = 010	Optimisation of data structures with printing of statistics regarding the data structures <not yet implemented>
optim = 100	Safe cache reuse, ie. arrays `MAT` and `INDEX` do not change
optim = 200	Unsafe cache reuse, ie. arrays `MAT` and `INDEX` do change

(21) = vtunit, input, global

I/O unit for printing of vector terms (should be greater 20 and less 100 - if not set, then printing on standard output).

(22) = matin, input, global

I/O unit for reading of matrix MAT (should be greater 20 and less 100 - if not set, then LSOLPP stops).

(23) = matout, input, global

I/O unit for storing of matrix MAT (should be greater 20 and less 100 - if not set, then LSOLPP stops).

lmatbk integer, array: lmatbk(nproc), input, global <<< **PAR

lmatbk(i) returns the number of unknowns on process(or) i, see sparse matrix storage scheme.

ptrinf integer, array: ptrinf(ntyp+1,nproc), input, local

ptrinf(ityp,i)+1 points to the first entry within the array info pointing to vector terms of storage pattern ityp in block i; the array info must be sorted by storage patterns (ntyp {=11} is a constant) and by blocks, see sparse matrix storage scheme.

tids integer, array: tids(nproc), input, global <<< **PAR

tids defines the mapping of the logical process(or) numbers to the physical process ID's, see combgn.

iconv integer, output, global

iconv indicates the behaviour of LSOLPP.

iconv = 1	convergence,
iconv = 2	iteration reached the maximal number of matrix-vector-multiplications (declared in `ilin(2)`),
iconv = 3	divergence,
iconv = 4	matrix is not suitable for LSOLPP, e.g. non-regular matrix,
iconv = 99	fatal error.

ierr integer, output, local

ierr is a error number with information about the error.

`ierr` =ijk	i = 0,..,9 indicates the level of routines, on which the error occurs,
	j = 0 means: wrong parameters,
	j = 1 means: error during normalisation,
	j = 2 means: error during iteration process,
	j = 3 means: error during communication,
	k is a general error counter with 0<k<100.

epslin double precision, input, global

epslin is the relative stopping factor. If (lcond1 .and. lcond2) then the iteration is stopped. If (lcond1 .and. .not. lcond2) then a new stopping criterion is computed.

lcond1=(||(precondition) smoothed residuum||₂ <= epslin *|| (precondition) r.h.s. ||₂)

and

noprec=1	Lcond2=(\|\|original smoothed residuum\|\|_max<=epslin *\|\|original r.h.s. \|\|_max)
noprec=0	Lcond2=(\|\|(precondition) smoothed residuum\|\|_max<=epslin*\|\|(preconitioned) r.h.s. \|\|_max)

EXAMPLE

N/A

METHOD

The iterative methods, smoothing, and normalisation are described in the LINSOL user's guide [10].

Normalization

At the beginning the matrix MAT is normalised from the left by a diagonal matrix prec(1..l), if MAT is not normalised (ilin(6)=isprec = 0) and a normalisation method is chosen (ilin(15)=msnorm <> 0). If MAT is symmetric or a symmetric normalisation is selected, it is additionally normalised from the right to preserve symmetry. The user can choose different methods of normalisation. We recommend the normalisation by row sum (ilin(15) = 1), because in the most cases it is the best method with regard to the performed matrix-vector multiplications. Returning from LSOLPP the array MAT is normalised (ilin(6)=isprec = 1). This is important to remember, if LSOLPP is recalled or the matrix should be used in other contexts after calling LSOLPP!

Smoothing

The residual of a generalised CG-method can oscillate strongly. Therefore both the residual and the approximated solution are smoothed in every iteration step to ensure that the Euclidean norm of the residual does not increase. By setting the parameter ilin(18) to zero it is possible to obtain the original solution instead of the smoothed one, but the iteration process is always controlled by the smoothed residual.

Non-symmetric

For non-symmetric problems, LSOLPP provides seven different solution algorithms. These algorithms are implemented both with smoothing and without smoothing. Thus we get from seven implemented algorithms thirteen methods. In the sequel the most popular and "well-known" ten basic methods, one error-minimising method and the polyalgorithm are described.

1. PRES20 [1] (Pseudo-RESidual 20), ilin(1)=1, ilin(18)<>0: For sufficient diagonal dominance of the matrix this is a quickly converging method. If the norm of the residual decreases only in the first iteration steps, then choose one of the other methods. The residuals decrease monotonically.
2a. BICO [1] (BICOnjugate gradients smoothed), ilin(1)=2, ilin(18)<>0: BICO is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BICO uses two matrix-vector multiplications (MVM) by both the matrix MAT and its transposed matrix MAT^T per iteration step. The residuals decrease monotonically.
2b. BCG [2] (BiConjugate Gradients), ilin(1)=2, ilin(18)=0: BCG is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BICO uses two matrix-vector multiplications (MVM) by both the matrix MAT and its transposed matrix MAT^T per iteration step. The residuals oscillate heavily. Therefore, use BICO or QMR.
3a. Bi-CGSTAB smoothed [3,8] , ilin(1)=3, ilin(18)<>0: The Bi-Conjugate Gradient STABilized (Bi-CGSTAB) method is a modification of CGS that should be more stable. The residuals decrease monotonically.
3b. Bi-CGSTAB [3] , ilin(1)=3, ilin(18)=0: This is a modification of CGS that should be more stable.
4a. AT-PRES = CGNR [4] , ilin(1)=4, ilin(18)<>0: The A_Transposed-Pseudo_RESidual (AT-PRES) method is equivalent to the CG Normal equations Residual-minimising (CGNR) method and should only be used if one of the methods BICO, CGS or Bi-CGSTAB does not converge. It may converge slowly but it is very robust. The residuals decrease monotonically.
4b. CGNE (CG Normal eq. Error-min.) [5], ilin(1)=4, ilin(18)=0: CGNE should only be used if one of the methods BICO, CGS or Bi-CGSTAB does not converge. It may converge slowly but it is very robust. The errors decrease monotonically.
5a. CGS (Conjugate Grad. Squared) smoothed, ilin(1)=5, ilin(18)<>0: The smoothed CGS [6,8] is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it could run a little better (-: It's getting better all the time :-> on parallel computers. The residuals decrease monotonically.
5b. CGS (Conjugate Gradient Squared) [6], ilin(1)=5, ilin(18)=0: CGS is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it could run a little better (-: It's getting better all the time :-> on parallel computers.
6. QMR simplified [7,9], ilin(1)=6, ilin(18)<>0: The Quasi-Minimal Residual (QMR) method behaves similar as BICO. In this implementation the look-ahead process for avoiding breakdowns is omitted.
7. GMERR (General Minimal ERRor), ilin(1)=7: GMERR is the generalisation for arbitrary matrices of Fridman's algorithm. GMERR should be competitive with GMRES for symmetric matrices.
9. CG-AT (CG with matrix A*A^T), ilin(1)=9: LSOLPP computes the linear system (MAT)(MAT^T)x=b; you have only to hand over the matrix MAT to LSOLPP. The matrix MAT must not be symmetric. The matrix (MAT)(MAT^T) then is symmetric; thus LSOLPP chooses the CG method as solution process.
10. Polyalgorithm [1]: The Polyalgorithm tries to exploit the advantages of the methods PRES20, BICO and AT-PRES. It starts with the most efficient - but least robust - method PRES20. If PRES20 falls asleep, it switches to BICO. If BICO does not converge fast enough, the Polyalgorithm changes to the "emergency exit" AT-PRES. The Polyalgorithm should be used if no detailed knowledge of an optimal iterative method exists. In 95% of all cases the Polyalgorithm is as fast as the best method in the set {PRES20,BICO,AT-PRES}. .RE -7 If you use the above-mentioned methods, the matrix must not be stored in the symmetric storage scheme.

Symmetric

The matrix MAT has to be given in the symmetric storage scheme. This reduces both the amount of workspace and CPU-time needed. You again get two different methods, if you choose smoothing and no smoothing respectively. For symmetric matrices both CG methods are faster and more robust than the above mentioned methods.

123a. CG smoothed = GMRES, ilin(1)=123, ilin(18)<>0: The smoothed Conjugate Gradient (CG) method is mathematically equivalent to the GMRES method.
123b. Classical CG, ilin(1)=123, ilin(18)=0

REFERENCES

[1] W. Schoenauer, Scientific Computing on Vector Computers, North-Holland, 1987, Amsterdam, New York, Oxford, Tokyo.
[2] R. Fletcher, Conjugate Gradient Methods for Indefinite Systems, Proc. Dundee Biennial Conf. on Num. Anal., 1975, ed. G.A. Watson, pp. 73-89, Springer-Verlag.
[3] H.A. van der Vorst, BI-CGSTAB: A Fast and Smoothly Converging Variant of BI-CG for the Solution of Non-symmetric Linear Systems, SIAM J. Sci. Stat. Comp., 1992, Vol. 13, No. 2, pp. 631-644.
[4] W. Schoenauer, M. Schlichte, R. Weiss, Wrong Ways and Promising Ways towards Robust and Efficient Iterative Linear Solvers, Advances in Computer Methods for Partial Differential Equations VI, Publ. IMACS, 1987, pp. 7-14.
[5] E.J. Craig, The N-Step Iteration Procedures, Math. Phys., 1955, Vol. 34, pp. 64--73.
[6] P. Sonneveld, CGS, A fast Lanczos-type Solver for non-symmetric linear Systems, SIAM J. Sci. Stat. Comp., 1989, Vol. 10, No.1, pp. 36-52.
[7] R.W. Freund and N.M. Nachtigal, QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems, Numer. Math., 1991, Vol. 60, pp. 315--339.,
[8] R. Weiss, Properties of Generalised Conjugate Gradient Methods, Num. Lin. Alg. with Appl., 1994, Vol. 1, No. 1, pp. 45--63.
[9] L. Zhou and H.F. Walker, Residual Smoothing Techniques for Iterative Methods, SIAM J. Sci. Comput., 1994, Vol. 15, No. 2, pp. 297--312.
[10] R. Weiss, H. Haefner, W. Schoenauer: LINSOL (LINear SOLver) - Description and User's Guide for the Parallelized Version (DRAFT Version 0.96); University of Karlsruhe; Computing Center; Internal Report 61/95; 1995.

Authors

Program by L. Grosz, H. Haefner, C. Roll, P. Sternecker, R. Weiss 1989-96. Please contact L.Grosz.

By L. Grosz, Auckland, 4 June 2000.