cwidth Subroutine

subroutine cwidth(w, b, nequ, ncols, integs, nbloks, d, x, iflag)

************80

! CWIDTH solves an almost block diagonal linear system.

Discussion:

This routine is a variation of the theme in the algorithm
by Martin and Wilkinson.  It solves the linear system

  A * X = B

of NEQU equations in case A is almost block diagonal with all
blocks having NCOLS columns using no more storage than it takes to
store the interesting part of A.  Such systems occur in the determination
of the B-spline coefficients of a spline approximation.

Block structure of A:

The interesting part of A is taken to consist of NBLOKS
consecutive blocks, with the I-th block made up of NROWI = INTEGS(1,I)
consecutive rows and NCOLS consecutive columns of A, and with
the first LASTI = INTEGS(2,I) columns to the left of the next block.
These blocks are stored consecutively in the work array W.

For example, here is an 11th order matrix and its arrangement in
the work array W.  (The interesting entries of A are indicated by
their row and column index modulo 10.)

               ---   A   ---                          ---   W   ---

                  NROW1=3
       11 12 13 14                                     11 12 13 14
       21 22 23 24                                     21 22 23 24
       31 32 33 34      NROW2=2                        31 32 33 34
LAST1=2      43 44 45 46                               43 44 45 46
             53 54 55 56         NROW3=3               53 54 55 56
      LAST2=3         66 67 68 69                      66 67 68 69
                      76 77 78 79                      76 77 78 79
                      86 87 88 89   NROW4=1            86 87 88 89
               LAST3=1   97 98 99 90   NROW5=2         97 98 99 90
                  LAST4=1   08 09 00 01                08 09 00 01
                            18 19 10 11                18 19 10 11
                     LAST5=4

For this interpretation of A as an almost block diagonal matrix,
we have NBLOKS = 5, and the INTEGS array is

                      I = 1   2   3   4   5
                K =
INTEGS(K,I) =      1      3   2   3   1   2
                   2      2   3   1   1   4

Method:

Gauss elimination with scaled partial pivoting is used, but
multipliers are not saved in order to save storage.  Rather, the
right hand side is operated on during elimination.  The two parameters
IPVTEQ and LASTEQ are used to keep track of the action.  IPVTEQ
is the index of the variable to be eliminated next, from equations
IPVTEQ+1,...,LASTEQ, using equation IPVTEQ, possibly after an
interchange, as the pivot equation.

The entries in the pivot column are always in column
1 of W.  This is accomplished by putting the entries in rows
IPVTEQ+1,...,LASTEQ revised by the elimination of the IPVTEQ-th
variable one to the left in W.  In this way, the columns of the
equations in a given block, as stored in W, will be aligned with
those of the next block at the moment when these next equations
become involved in the elimination process.

Thus, for the above example, the first elimination steps proceed
as follows.

*11 12 13 14    11 12 13 14    11 12 13 14    11 12 13 14
*21 22 23 24   *22 23 24       22 23 24       22 23 24
*31 32 33 34   *32 33 34      *33 34          33 34
 43 44 45 46    43 44 45 46   *43 44 45 46   *44 45 46
 53 54 55 56    53 54 55 56   *53 54 55 56   *54 55 56
 66 67 68 69    66 67 68 69    66 67 68 69    66 67 68 69

In all other respects, the procedure is standard, including the
scaled partial pivoting.

Modified:

14 February 2007

Author:

Carl DeBoor

Reference:

Carl DeBoor,
A Practical Guide to Splines,
Springer, 2001,
ISBN: 0387953663,
LC: QA1.A647.v27.

Roger Martin, James Wilkinson,
Solution of Symmetric and Unsymmetric Band Equations and
the Calculation of Eigenvectors of Band Matrices,
Numerische Mathematik,
Volume 9, Number 4, December 1976, pages 279-301.

Parameters:

Input/output, real ( kind = 8 ) W(NEQU,NCOLS), on input, contains
the interesting part of the almost block diagonal coefficient matrix
A.  The array INTEGS describes the storage scheme.  On output, W
contains the upper triangular factor U of the LU factorization of a
possibly permuted version of A.  In particular, the determinant of
A could now be found as
  IFLAG * W(1,1) * W(2,1) * ... * W(NEQU,1).

Input/output, real ( kind = 8 ) B(NEQU); on input, the right hand
side of the linear system.  On output, B has been overwritten by
other information.

Input, integer ( kind = 4 ) NEQU, the number of equations.

Input, integer ( kind = 4 ) NCOLS, the block width, that is, the number of
columns in each block.

Input, integer ( kind = 4 ) INTEGS(2,NEQU), describes the block structure
of A.
INTEGS(1,I) = number of rows in block I = NROW.
INTEGS(2,I) = number of elimination steps in block I = overhang over
next block = LAST.

Input, integer ( kind = 4 ) NBOKS, the number of blocks.

Workspace, real D(NEQU), used to contain row sizes.  If storage is
scarce, the array X could be used in the calling sequence for D.

Output, real ( kind = 8 ) X(NEQU), the computed solution, if
IFLAG is nonzero.

Output, integer ( kind = 4 ) IFLAG, error flag.
= (-1)**(number of interchanges during elimination) if A is invertible;
= 0 if A is singular.

Arguments

Type IntentOptional Attributes Name
real(kind=8) :: w(nequ,ncols)
real(kind=8) :: b(nequ)
integer(kind=4) :: nequ
integer(kind=4) :: ncols
integer(kind=4) :: integs(2,nbloks)
integer(kind=4) :: nbloks
real(kind=8) :: d(nequ)
real(kind=8) :: x(nequ)
integer(kind=4) :: iflag

Source Code

subroutine cwidth ( w, b, nequ, ncols, integs, nbloks, d, x, iflag )

  !*****************************************************************************80
  !
  !! CWIDTH solves an almost block diagonal linear system.
  !
  !  Discussion:
  !
  !    This routine is a variation of the theme in the algorithm
  !    by Martin and Wilkinson.  It solves the linear system
  !
  !      A * X = B
  !
  !    of NEQU equations in case A is almost block diagonal with all
  !    blocks having NCOLS columns using no more storage than it takes to
  !    store the interesting part of A.  Such systems occur in the determination
  !    of the B-spline coefficients of a spline approximation.
  !
  !  Block structure of A:  
  !
  !    The interesting part of A is taken to consist of NBLOKS
  !    consecutive blocks, with the I-th block made up of NROWI = INTEGS(1,I)
  !    consecutive rows and NCOLS consecutive columns of A, and with
  !    the first LASTI = INTEGS(2,I) columns to the left of the next block.
  !    These blocks are stored consecutively in the work array W.
  !
  !    For example, here is an 11th order matrix and its arrangement in
  !    the work array W.  (The interesting entries of A are indicated by
  !    their row and column index modulo 10.)
  !
  !                   ---   A   ---                          ---   W   ---
  ! 
  !                      NROW1=3
  !           11 12 13 14                                     11 12 13 14
  !           21 22 23 24                                     21 22 23 24
  !           31 32 33 34      NROW2=2                        31 32 33 34
  !    LAST1=2      43 44 45 46                               43 44 45 46
  !                 53 54 55 56         NROW3=3               53 54 55 56
  !          LAST2=3         66 67 68 69                      66 67 68 69
  !                          76 77 78 79                      76 77 78 79
  !                          86 87 88 89   NROW4=1            86 87 88 89
  !                   LAST3=1   97 98 99 90   NROW5=2         97 98 99 90
  !                      LAST4=1   08 09 00 01                08 09 00 01
  !                                18 19 10 11                18 19 10 11
  !                         LAST5=4
  !
  !    For this interpretation of A as an almost block diagonal matrix,
  !    we have NBLOKS = 5, and the INTEGS array is
  !
  !                          I = 1   2   3   4   5
  !                    K =
  !    INTEGS(K,I) =      1      3   2   3   1   2
  !                       2      2   3   1   1   4
  !
  !
  !  Method:
  !
  !    Gauss elimination with scaled partial pivoting is used, but
  !    multipliers are not saved in order to save storage.  Rather, the
  !    right hand side is operated on during elimination.  The two parameters
  !    IPVTEQ and LASTEQ are used to keep track of the action.  IPVTEQ 
  !    is the index of the variable to be eliminated next, from equations 
  !    IPVTEQ+1,...,LASTEQ, using equation IPVTEQ, possibly after an 
  !    interchange, as the pivot equation.  
  !
  !    The entries in the pivot column are always in column
  !    1 of W.  This is accomplished by putting the entries in rows
  !    IPVTEQ+1,...,LASTEQ revised by the elimination of the IPVTEQ-th
  !    variable one to the left in W.  In this way, the columns of the
  !    equations in a given block, as stored in W, will be aligned with
  !    those of the next block at the moment when these next equations 
  !    become involved in the elimination process.
  !
  !    Thus, for the above example, the first elimination steps proceed
  !    as follows.
  !
  !    *11 12 13 14    11 12 13 14    11 12 13 14    11 12 13 14
  !    *21 22 23 24   *22 23 24       22 23 24       22 23 24
  !    *31 32 33 34   *32 33 34      *33 34          33 34
  !     43 44 45 46    43 44 45 46   *43 44 45 46   *44 45 46        
  !     53 54 55 56    53 54 55 56   *53 54 55 56   *54 55 56
  !     66 67 68 69    66 67 68 69    66 67 68 69    66 67 68 69
  !
  !    In all other respects, the procedure is standard, including the
  !    scaled partial pivoting.
  !
  !  Modified:
  !
  !    14 February 2007
  !
  !  Author:
  !
  !    Carl DeBoor
  !
  !  Reference:
  !
  !    Carl DeBoor,
  !    A Practical Guide to Splines,
  !    Springer, 2001,
  !    ISBN: 0387953663,
  !    LC: QA1.A647.v27.
  !
  !    Roger Martin, James Wilkinson,
  !    Solution of Symmetric and Unsymmetric Band Equations and
  !    the Calculation of Eigenvectors of Band Matrices,
  !    Numerische Mathematik,
  !    Volume 9, Number 4, December 1976, pages 279-301.
  !
  !  Parameters:
  !
  !    Input/output, real ( kind = 8 ) W(NEQU,NCOLS), on input, contains
  !    the interesting part of the almost block diagonal coefficient matrix 
  !    A.  The array INTEGS describes the storage scheme.  On output, W 
  !    contains the upper triangular factor U of the LU factorization of a 
  !    possibly permuted version of A.  In particular, the determinant of 
  !    A could now be found as
  !      IFLAG * W(1,1) * W(2,1) * ... * W(NEQU,1).
  !
  !    Input/output, real ( kind = 8 ) B(NEQU); on input, the right hand
  !    side of the linear system.  On output, B has been overwritten by
  !    other information.
  !
  !    Input, integer ( kind = 4 ) NEQU, the number of equations.
  !
  !    Input, integer ( kind = 4 ) NCOLS, the block width, that is, the number of 
  !    columns in each block.
  !
  !    Input, integer ( kind = 4 ) INTEGS(2,NEQU), describes the block structure 
  !    of A.
  !    INTEGS(1,I) = number of rows in block I = NROW.
  !    INTEGS(2,I) = number of elimination steps in block I = overhang over 
  !    next block = LAST.
  !
  !    Input, integer ( kind = 4 ) NBOKS, the number of blocks.
  !
  !    Workspace, real D(NEQU), used to contain row sizes.  If storage is 
  !    scarce, the array X could be used in the calling sequence for D.
  !
  !    Output, real ( kind = 8 ) X(NEQU), the computed solution, if 
  !    IFLAG is nonzero.
  !
  !    Output, integer ( kind = 4 ) IFLAG, error flag.
  !    = (-1)**(number of interchanges during elimination) if A is invertible;
  !    = 0 if A is singular.
  !
  implicit none

  integer ( kind = 4 ) nbloks
  integer ( kind = 4 ) ncols
  integer ( kind = 4 ) nequ

  real ( kind = 8 ) awi1od
  real ( kind = 8 ) b(nequ)
  real ( kind = 8 ) colmax
  real ( kind = 8 ) d(nequ)
  integer ( kind = 4 ) i
  integer ( kind = 4 ) icount
  integer ( kind = 4 ) iflag
  integer ( kind = 4 ) ii
  integer ( kind = 4 ) integs(2,nbloks)
  integer ( kind = 4 ) ipvteq
  integer ( kind = 4 ) ipvtp1
  integer ( kind = 4 ) istar
  integer ( kind = 4 ) j
  integer ( kind = 4 ) jmax
  integer ( kind = 4 ) lastcl
  integer ( kind = 4 ) lasteq
  integer ( kind = 4 ) lasti
  integer ( kind = 4 ) nexteq
  integer ( kind = 4 ) nrowad
  real ( kind = 8 ) ratio
  real ( kind = 8 ) rowmax
  real ( kind = 8 ) sum1
  real ( kind = 8 ) temp
  real ( kind = 8 ) w(nequ,ncols)
  real ( kind = 8 ) x(nequ)

  iflag = 1
  ipvteq = 0
  lasteq = 0
  !
  !  The I loop runs over the blocks.
  !
  do i = 1, nbloks
     !
     !  The equations for the current block are added to those currently
     !  involved in the elimination process, by increasing LASTEQ
     !  by INTEGS(1,I) after the row size of these equations has been
     !  recorded in the array D.
     !
     nrowad = integs(1,i)

     do icount = 1, nrowad

        nexteq = lasteq + icount
        rowmax = maxval ( abs ( w(nexteq,1:ncols) ) )

        if ( rowmax == 0.0D+00 ) then
           iflag = 0
           return
        end if

        d(nexteq) = rowmax

     end do

     lasteq = lasteq + nrowad
     !
     !  There will be LASTI = INTEGS(2,I) elimination steps before
     !  the equations in the next block become involved.
     !
     !  Further, LASTCL records the number of columns involved in the current
     !  elimination step.  It starts equal to NCOLS when a block
     !  first becomes involved and then drops by one after each elimination
     !  step.
     !
     lastcl = ncols
     lasti = integs(2,i)

     do icount = 1, lasti

        ipvteq = ipvteq + 1

        if ( lasteq <= ipvteq ) then

           if ( d(ipvteq) < abs ( w(ipvteq,1) ) + d(ipvteq) ) then
              exit
           end if

           iflag = 0
           return

        end if
        !
        !  Determine the smallest ISTAR in (IPVTEQ,LASTEQ) for
        !  which abs ( W(ISTAR,1) ) / D(ISTAR) is as large as possible, and
        !  interchange equations IPVTEQ and ISTAR in case  IPVTEQ < ISTAR.
        !
        colmax = abs ( w(ipvteq,1) ) / d(ipvteq)
        istar = ipvteq
        ipvtp1 = ipvteq + 1

        do ii = ipvtp1, lasteq
           awi1od = abs ( w(ii,1) ) / d(ii)
           if ( colmax < awi1od ) then
              colmax = awi1od
              istar = ii
           end if
        end do

        if ( abs ( w(istar,1) ) + d(istar) == d(istar) ) then
           iflag = 0
           return
        end if
        !
        !  Rearrange data because of pivoting.
        !
        if ( istar /= ipvteq ) then

           iflag = -iflag

           temp = d(istar)
           d(istar) = d(ipvteq)
           d(ipvteq) = temp

           temp = b(istar)
           b(istar) = b(ipvteq)
           b(ipvteq) = temp

           do j = 1, lastcl
              temp = w(istar,j)
              w(istar,j) = w(ipvteq,j)
              w(ipvteq,j) = temp
           end do

        end if
        !
        !  Subtract the appropriate multiple of equation IPVTEQ from
        !  equations IPVTEQ+1,...,LASTEQ to make the coefficient of the
        !  IPVTEQ-th unknown (presently in column 1 of W) zero, but
        !  store the new coefficients in W one to the left from the old.
        !  
        do ii = ipvtp1, lasteq

           ratio = w(ii,1) / w(ipvteq,1)
           do j = 2, lastcl
              w(ii,j-1) = w(ii,j) - ratio * w(ipvteq,j)
           end do
           w(ii,lastcl) = 0.0D+00
           b(ii) = b(ii) - ratio * b(ipvteq)

        end do

        lastcl = lastcl - 1

     end do

  end do
  !
  !  At this point, W and B contain an upper triangular linear system
  !  equivalent to the original one, with W(I,J) containing entry
  !  (I, I-1+J) of the coefficient matrix.  Solve this system by 
  !  back substitution, taking into account its block structure.
  !
  !  I-loop over the blocks, in reverse order.
  !
  i = nbloks

  do while ( 0 < i )

     lasti = integs(2,i)
     jmax = ncols - lasti

     do icount = 1, lasti

        sum1 = dot_product ( x(ipvteq+1:ipvteq+jmax), w(ipvteq,2:jmax+1) )

        x(ipvteq) = ( b(ipvteq) - sum1 ) / w(ipvteq,1)
        jmax = jmax + 1
        ipvteq = ipvteq - 1

     end do

     i = i - 1

  end do

  return
end subroutine cwidth