HP OpenVMS Systems Documentation 
OpenVMS VAX RTL Mathematics (MTH$) Manual
Chapter 2

To call the scalar BLAS from a program that runs on scalar hardware, specify the routine name preceded by BLAS1$ (for example, BLAS1$xCOPY). To call the vector BLAS from a program that runs on vector hardware, specify the routine name preceded by BLAS1$V (for example, BLAS1$VxCOPY). 
This manual describes both the scalar and vector versions of BLAS Level 1, but for simplicity the vector prefix (BLAS1$V) is used exclusively. Remember to remove the letter V from the routine prefix when you want to call the scalar version.
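As a minimal sketch of the naming rule above, the following Python helper derives the scalar name from a vector name by dropping the V from the prefix. The helper name `scalar_name` is hypothetical and is not part of the MTH$ facility; it only manipulates strings.

```python
VECTOR_PREFIX = "BLAS1$V"
SCALAR_PREFIX = "BLAS1$"


def scalar_name(vector_name):
    """Return the scalar entry-point name for a vector BLAS Level 1 name."""
    if not vector_name.startswith(VECTOR_PREFIX):
        raise ValueError("not a vector BLAS Level 1 name: %r" % vector_name)
    # Keep everything after the vector prefix and attach the scalar prefix.
    return SCALAR_PREFIX + vector_name[len(VECTOR_PREFIX):]


print(scalar_name("BLAS1$VSCOPY"))  # BLAS1$SCOPY
```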
If you are a Compaq Fortran programmer, do not specify BLAS vector routines explicitly. Specify the Fortran intrinsic function name only. The Compaq Fortran 77 for OpenVMS VAX Systems compiler determines whether the vector or scalar version of a BLAS routine should be used. The Fortran /BLAS=([NO]INLINE,[NO]MAPPED) qualifier controls how the compiler processes calls to BLAS Level 1. If /NOBLAS is specified, all BLAS calls are treated as ordinary external routines. The default, INLINE, means that calls to BLAS Level 1 routines are treated as known language constructs: VAX object code is generated to compute the corresponding operations at the call site, rather than to call a user-supplied routine. If the Fortran qualifier /VECTOR or /PARALLEL=AUTO is in effect, the generated code for the loops may use vector instructions or be decomposed to run on multiple processors. If MAPPED is specified, these calls are treated as calls to the optimized implementations of these routines in the BLAS1$ and BLAS1$V portions of the MTH$ facility. For more information on the Fortran /BLAS qualifier, refer to the DEC Fortran Performance Guide for OpenVMS VAX Systems.
Ten families of routines form BLAS Level 1. (BLAS1$VxCOPY is one family of routines, for example.) These routines operate at the vector-vector operation level. This means that BLAS Level 1 performs operations on one or two vectors. The level of complexity of the computations (in other words, the number of operations being performed in a BLAS Level 1 routine) is of the order n (the length of the vector).
Each family of routines in BLAS Level 1 contains routines coded in single precision, double precision (D and G formats), single precision complex, and double precision complex (D and G formats). BLAS Level 1 can be broadly classified into three groups:
Table 2-1 lists the functions and corresponding routines of BLAS Level 1.
Function  Routine  Data Type

Copy a vector to another vector  BLAS1$VSCOPY  Single
BLAS1$VDCOPY  Double (D-floating or G-floating)
BLAS1$VCCOPY  Single complex
BLAS1$VZCOPY  Double complex (D-floating or G-floating)
Swap the elements of two vectors  BLAS1$VSSWAP  Single
BLAS1$VDSWAP  Double (D-floating or G-floating)
BLAS1$VCSWAP  Single complex
BLAS1$VZSWAP  Double complex (D-floating or G-floating)
Scale the elements of a vector  BLAS1$VSSCAL  Single
BLAS1$VDSCAL  Double (D-floating)
BLAS1$VGSCAL  Double (G-floating)
BLAS1$VCSCAL  Single complex with complex scale
BLAS1$VCSSCAL  Single complex with real scale
BLAS1$VZSCAL  Double complex with complex scale (D-floating)
BLAS1$VWSCAL  Double complex with complex scale (G-floating)
BLAS1$VZDSCAL  Double complex with real scale (D-floating)
BLAS1$VWGSCAL  Double complex with real scale (G-floating)
Multiply a vector by a scalar and add a vector  BLAS1$VSAXPY  Single
BLAS1$VDAXPY  Double (D-floating)
BLAS1$VGAXPY  Double (G-floating)
BLAS1$VCAXPY  Single complex
BLAS1$VZAXPY  Double complex (D-floating)
BLAS1$VWAXPY  Double complex (G-floating)
Obtain the index of the first element of a vector having the largest absolute value  BLAS1$VISAMAX  Single
BLAS1$VIDAMAX  Double (D-floating)
BLAS1$VIGAMAX  Double (G-floating)
BLAS1$VICAMAX  Single complex
BLAS1$VIZAMAX  Double complex (D-floating)
BLAS1$VIWAMAX  Double complex (G-floating)
Obtain the sum of the absolute values of the elements of a vector  BLAS1$VSASUM  Single
BLAS1$VDASUM  Double (D-floating)
BLAS1$VGASUM  Double (G-floating)
BLAS1$VSCASUM  Single complex
BLAS1$VDZASUM  Double complex (D-floating)
BLAS1$VGWASUM  Double complex (G-floating)
Obtain the inner product of two vectors  BLAS1$VSDOT  Single
BLAS1$VDDOT  Double (D-floating)
BLAS1$VGDOT  Double (G-floating)
BLAS1$VCDOTU  Single complex unconjugated
BLAS1$VCDOTC  Single complex conjugated
BLAS1$VZDOTU  Double complex unconjugated (D-floating)
BLAS1$VWDOTU  Double complex unconjugated (G-floating)
BLAS1$VZDOTC  Double complex conjugated (D-floating)
BLAS1$VWDOTC  Double complex conjugated (G-floating)
Obtain the Euclidean norm of the vector  BLAS1$VSNRM2  Single
BLAS1$VDNRM2  Double (D-floating)
BLAS1$VGNRM2  Double (G-floating)
BLAS1$VSCNRM2  Single complex
BLAS1$VDZNRM2  Double complex (D-floating)
BLAS1$VGWNRM2  Double complex (G-floating)
Generate the elements for a Givens plane rotation  BLAS1$VSROTG  Single
BLAS1$VDROTG  Double (D-floating)
BLAS1$VGROTG  Double (G-floating)
BLAS1$VCROTG  Single complex
BLAS1$VZROTG  Double complex (D-floating)
BLAS1$VWROTG  Double complex (G-floating)
Apply a Givens plane rotation  BLAS1$VSROT  Single
BLAS1$VDROT  Double (D-floating)
BLAS1$VGROT  Double (G-floating)
BLAS1$VCSROT  Single complex
BLAS1$VZDROT  Double complex (D-floating)
BLAS1$VWGROT  Double complex (G-floating)
For a detailed description of these routines, refer to the Vector MTH$ Reference Section of this manual.
The following sections provide some guidelines for using BLAS Level 1.
The vector BLAS produces unpredictable results when any element of the input argument shares a memory location with an element of the output argument. (An exception is a special case found in the BLAS1$VxCOPY routines.)
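A caller can test for shared memory locations before making a call. The following Python sketch assumes that both vectors live in the same one-dimensional array and that each is described by a 1-based starting subscript, a length n, and an increment. The names `touched` and `overlaps` are hypothetical helpers, not MTH$ routines.

```python
def touched(start, n, inc):
    """Set of 1-based subscripts occupied by a vector of length n.

    The sign of inc changes the order in which elements are visited,
    not the set of locations, so only its magnitude is used here.
    """
    return {start + k * abs(inc) for k in range(n)}


def overlaps(x_start, incx, y_start, incy, n):
    """True if any element of X shares a location with an element of Y."""
    return not touched(x_start, n, incx).isdisjoint(touched(y_start, n, incy))


# X in the odd subscripts 1..9, Y in the even subscripts 2..10: no overlap.
print(overlaps(1, 2, 2, 2, 5))  # False
# Y occupies subscripts 5..9, which include X's 5, 7, and 9.
print(overlaps(1, 2, 5, 1, 5))  # True
```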
The vector BLAS and the scalar BLAS can yield different results when the input argument overlaps the output array.
For some of the routines in BLAS Level 1, the final result is independent of the order in which the operations are performed. However, in other cases (for example, some of the reduction operations), efficiency dictates that the order of operations on a vector machine be different from the natural order of operations. Because roundoff errors are dependent upon the order in which the operations are performed, some of the routines will not return results that are bit-for-bit identical to the results obtained by performing the operations in natural order.
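The effect of operation order on roundoff can be illustrated with a small Python sketch. It is only an analogy: Python floats are IEEE double precision rather than VAX floating-point formats, and the two-lane partial-sum scheme is an assumed stand-in for a vector reduction, not the algorithm the MTH$ routines actually use.

```python
def natural_sum(values):
    """Add the elements one after another, in storage order."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    """Accumulate interleaved partial sums, then combine them."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return natural_sum(partial)


data = [1e16, 1.0, -1e16, 1.0]
print(natural_sum(data))  # 1.0  (the first 1.0 is lost when added to 1e16)
print(lane_sum(data, 2))  # 2.0  (the large terms cancel in their own lane)
```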
Where performance can be increased by the use of a backup data type, this has been done. This is the case for BLAS1$VSNRM2, BLAS1$VSCNRM2, BLAS1$VSROTG, and BLAS1$VCROTG. The use of a backup data type can also yield a gain in accuracy over the scalar BLAS.
In accordance with LINPACK convention, underflow, when it occurs, is replaced by a zero. A system message informs you of overflow. Because the order of operations for some routines is different from the natural order, overflow might not occur at the same array element in both the scalar and vector versions of the routines.
The vector BLAS (except the BLAS1$VxROTG routines) perform operations on vectors. These vectors are defined in terms of three quantities:
Suppose x is a real array of dimension ndim, n is the length of the vector, and incx is the increment used to access the elements of a vector X. The elements of vector X, X_{i}, i=1,...,n, are stored in x. If incx is greater than or equal to 0, then X_{i} is stored in the following location:
x(1+(i-1)*incx)
However, if incx is less than 0, then X_{i} is stored in the following location:
x(1+(n-i)*|incx|)
It therefore follows that the following condition must be satisfied:
ndim >= 1+(n-1)*|incx|
A positive value for incx is referred to as forward indexing, and a negative value is referred to as backward indexing. A value of zero implies that all of the elements of the vector are at the same location, x_{1}.
Suppose ndim = 20 and n = 5. In this case, incx = 2 implies that X_{1}, X_{2}, X_{3}, X_{4}, and X_{5} are located in array elements x_{1}, x_{3}, x_{5}, x_{7}, and x_{9}.
If, however, incx = -2, then X_{1}, X_{2}, X_{3}, X_{4}, and X_{5} are located in array elements x_{9}, x_{7}, x_{5}, x_{3}, and x_{1}. In other words, when incx is negative, the subscript of x decreases as i increases.
For some of the routines in BLAS Level 1, incx = 0 is not permitted. In the cases where a zero value for incx is permitted, it means that x_{1} is broadcast into each element of the vector X of length n.
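The storage rule for forward, backward, and zero increments can be sketched in Python as follows. The function names are hypothetical. The negative branch uses the magnitude of incx, which is what reproduces the x_{9}, ..., x_{1} placement in the example above.

```python
def storage_subscripts(n, incx):
    """1-based subscripts of x that hold X_1 through X_n."""
    if incx >= 0:
        # Forward indexing (or broadcast of x(1) when incx is 0).
        return [1 + (i - 1) * incx for i in range(1, n + 1)]
    # Backward indexing: X_1 is stored at the highest subscript.
    return [1 + (n - i) * abs(incx) for i in range(1, n + 1)]


def min_dimension(n, incx):
    """Smallest ndim that can hold the vector."""
    return 1 + (n - 1) * abs(incx)


print(storage_subscripts(5, 2))   # [1, 3, 5, 7, 9]
print(storage_subscripts(5, -2))  # [9, 7, 5, 3, 1]
print(storage_subscripts(5, 0))   # [1, 1, 1, 1, 1]
print(min_dimension(5, -2))       # 9
```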
You can operate on vectors that are embedded in other vectors or matrices by choosing a suitable starting point of the vector. For example, if A is an n1 by n2 matrix, column j is referenced with a length of n1, starting point A(1,j), and increment 1. Similarly, row i is referenced with a length of n2, starting point A(i,1), and increment n1.
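As an illustration, the Python sketch below models an n1 by n2 matrix stored in column-major order (the Fortran convention) as a flat list, then extracts a column and a row using only a starting point, a length, and an increment. The helper names and the sample values are assumptions made for the example.

```python
def strided(flat, start, n, inc):
    """Elements of a vector of length n beginning at 1-based subscript start."""
    return [flat[(start - 1) + k * inc] for k in range(n)]


n1, n2 = 3, 4
# Column-major storage: A(i,j) holds the value 10*i + j.
flat = [10 * i + j for j in range(1, n2 + 1) for i in range(1, n1 + 1)]


def column(j):
    # Starting point A(1,j), length n1, increment 1.
    return strided(flat, (j - 1) * n1 + 1, n1, 1)


def row(i):
    # Starting point A(i,1), length n2, increment n1.
    return strided(flat, i, n2, n1)


print(column(2))  # [12, 22, 32]
print(row(3))     # [31, 32, 33, 34]
```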
^{1} For more information, see Basic Linear Algebra Subprograms for FORTRAN Usage in ACM Transactions on Mathematical Software, Vol. 5, No. 3, September 1979. 
The MTH$ FOLR routines provide a vectorized algorithm for first-order linear recurrence relations. A linear recurrence uses the result of a previous pass through a loop as an operand in subsequent passes, which prevents the loop from being vectorized in the ordinary way.
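The loop-carried dependency can be seen in a generic scalar sketch. The recurrence below, a_i = a_{i-1} * b_i + c_i, is an assumed illustrative form; the exact recurrence computed by each FOLR family is given in the routine descriptions, and this Python function is not one of those routines.

```python
def first_order_recurrence(a0, b, c):
    """Return [a_1, ..., a_n] where a_i = a_(i-1) * b_i + c_i."""
    result = []
    prev = a0
    for b_i, c_i in zip(b, c):
        # Each pass needs the value produced by the pass before it,
        # so the iterations cannot simply be executed side by side.
        prev = prev * b_i + c_i
        result.append(prev)
    return result


print(first_order_recurrence(1, [2, 2, 2], [1, 1, 1]))  # [3, 7, 15]
```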
The only error checking performed by the FOLR routines is for a reserved operand.
There are four families of FOLR routines in the MTH$ facility. Each family accepts each of four data types (longword integer, F-floating, D-floating, and G-floating). However, all of the arrays you specify in a single FOLR call must be of the same data type.
For a detailed description of these routines, see Part 3.
The four families of FOLR routines are as follows:
MTH$VxFOLRy_MA_V15
MTH$VxFOLRy_z_V8
MTH$VxFOLRLy_MA_V5
MTH$VxFOLRLy_z_V2
where:
x  =  J for longword integer, F for F-floating, D for D-floating, or G for G-floating
y  =  P for a positive recursion element, or N for a negative recursion element 
z  =  M for multiplication, or A for addition 
The FOLR entry points end with _Vn, where n is an integer between 0 and 15 that denotes the vector registers that the FOLR routine uses. For example, MTH$VxFOLRy_z_V8 uses vector registers V0 through V8.
To determine which group of routines you should use, find the task that you need the routine to perform in the left column of Table 2-2, then find the column for the method of storage that you need the routine to employ. The cell where the row and column meet shows the FOLR routine you should call.
Tasks  Save each iteration in an array  Save only last result in a variable 

Multiplication AND addition  MTH$VxFOLRy_MA_V15  MTH$VxFOLRLy_MA_V5 
Multiplication OR addition  MTH$VxFOLRy_z_V8  MTH$VxFOLRLy_z_V2 
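The selection in the table above can be sketched as a lookup in Python. The function name, the task and storage keywords, and the dictionary are hypothetical; the entry-point strings are produced purely by substituting x, y, and z into the patterns shown in the table.

```python
# Patterns from the table, keyed by (task, storage method).
FOLR_PATTERNS = {
    ("mul_and_add", "array"): "MTH$V{x}FOLR{y}_MA_V15",
    ("mul_and_add", "last"): "MTH$V{x}FOLRL{y}_MA_V5",
    ("mul_or_add", "array"): "MTH$V{x}FOLR{y}_{z}_V8",
    ("mul_or_add", "last"): "MTH$V{x}FOLRL{y}_{z}_V2",
}


def folr_entry_point(task, storage, x, y, z=None):
    """Build an entry-point name from the table pattern.

    x: J, F, D, or G (data type); y: P or N (recursion element sign);
    z: M or A (operation), needed only for the "mul_or_add" task.
    """
    if x not in "JFDG" or y not in "PN":
        raise ValueError("x must be J, F, D, or G; y must be P or N")
    if task == "mul_or_add" and z not in ("M", "A"):
        raise ValueError("z must be M or A for a single-operation task")
    return FOLR_PATTERNS[(task, storage)].format(x=x, y=y, z=z)


print(folr_entry_point("mul_or_add", "array", "D", "P", "M"))  # MTH$VDFOLRP_M_V8
print(folr_entry_point("mul_and_add", "last", "F", "N"))       # MTH$VFFOLRLN_MA_V5
```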
Save the contents of V0 through Vn before calling a FOLR routine if you need them after the call. The variable n can be 2, 5, 8, or 15, depending on the FOLR routine entry point. (The OpenVMS Calling Standard specifies that a called procedure may modify all of the vector registers. The FOLR routines modify only the vector registers V0 through Vn.)
The MTH$ FOLR routines assume that all of the arrays are of the same data type.