HP OpenVMS Systems Documentation
OpenVMS VAX RTL Mathematics (MTH$) Manual
To call the scalar BLAS from a program that runs on scalar hardware, specify the routine name preceded by BLAS1$ (for example, BLAS1$xCOPY). To call the vector BLAS from a program that runs on vector hardware, specify the routine name preceded by BLAS1$V (for example, BLAS1$VxCOPY).
This manual describes both the scalar and vector versions of BLAS Level 1, but for simplicity the vector prefix (BLAS1$V) is used exclusively. Remember to remove the letter V from the routine prefix when you want to call the scalar version.
If you are a Compaq Fortran programmer, do not specify BLAS vector routines explicitly. Specify the Fortran intrinsic function name only. The Compaq Fortran 77 for OpenVMS VAX Systems compiler determines whether the vector or scalar version of a BLAS routine should be used. The Fortran /BLAS=([NO]INLINE,[NO]MAPPED) qualifier controls how the compiler processes calls to BLAS Level 1. If /NOBLAS is specified, then all BLAS calls are treated as ordinary external routines. The default of INLINE means that calls to BLAS Level 1 routines will be treated as known language constructs, and VAX object code will be generated to compute the corresponding operations at the call site, rather than call a user-supplied routine. If the Fortran qualifier /VECTOR or /PARALLEL=AUTO is in effect, the generated code for the loops may use vector instructions or be decomposed to run on multiple processors. If MAPPED is specified, these calls will be treated as calls to the optimized implementations of these routines in the BLAS1$ and BLAS1$V portions of the MTH$ facility. For more information on the Fortran /BLAS qualifier, refer to the DEC Fortran Performance Guide for OpenVMS VAX Systems.
Ten families of routines form BLAS Level 1. (BLAS1$VxCOPY is one family of routines, for example.) These routines operate at the vector-vector operation level. This means that BLAS Level 1 performs operations on one or two vectors. The level of complexity of the computations (in other words, the number of operations being performed in a BLAS Level 1 routine) is of the order n (the length of the vector).
Each family of routines in BLAS Level 1 contains routines coded in single precision, double precision (D and G formats), single precision complex, and double precision complex (D and G formats). BLAS Level 1 can be broadly classified into three groups:
Table 2-1 lists the functions and corresponding routines of BLAS Level 1.
|Function||Routine||Data Type|
|Copy a vector to another vector||BLAS1$VSCOPY||Single|
| ||BLAS1$VDCOPY||Double (D-floating or G-floating)|
| ||BLAS1$VZCOPY||Double complex (D-floating or G-floating)|
|Swap the elements of two vectors||BLAS1$VSSWAP||Single|
| ||BLAS1$VDSWAP||Double (D-floating or G-floating)|
| ||BLAS1$VZSWAP||Double complex (D-floating or G-floating)|
|Scale the elements of a vector||BLAS1$VSSCAL||Single|
| ||BLAS1$VCSCAL||Single complex with complex scale|
| ||BLAS1$VCSSCAL||Single complex with real scale|
| ||BLAS1$VZSCAL||Double complex with complex scale (D-floating)|
| ||BLAS1$VWSCAL||Double complex with complex scale (G-floating)|
| ||BLAS1$VZDSCAL||Double complex with real scale (D-floating)|
| ||BLAS1$VWGSCAL||Double complex with real scale (G-floating)|
|Multiply a vector by a scalar and add a vector||BLAS1$VSAXPY||Single|
| ||BLAS1$VZAXPY||Double complex (D-floating)|
| ||BLAS1$VWAXPY||Double complex (G-floating)|
|Obtain the index of the first element of a vector having the largest absolute value||BLAS1$VISAMAX||Single|
| ||BLAS1$VIZAMAX||Double complex (D-floating)|
| ||BLAS1$VIWAMAX||Double complex (G-floating)|
|Obtain the sum of the absolute values of the elements of a vector||BLAS1$VSASUM||Single|
| ||BLAS1$VDZASUM||Double complex (D-floating)|
| ||BLAS1$VGWASUM||Double complex (G-floating)|
|Obtain the inner product of two vectors||BLAS1$VSDOT||Single|
| ||BLAS1$VCDOTU||Single complex unconjugated|
| ||BLAS1$VCDOTC||Single complex conjugated|
| ||BLAS1$VZDOTU||Double complex unconjugated (D-floating)|
| ||BLAS1$VWDOTU||Double complex unconjugated (G-floating)|
| ||BLAS1$VZDOTC||Double complex conjugated (D-floating)|
| ||BLAS1$VWDOTC||Double complex conjugated (G-floating)|
|Obtain the Euclidean norm of the vector||BLAS1$VSNRM2||Single|
| ||BLAS1$VDZNRM2||Double complex (D-floating)|
| ||BLAS1$VGWNRM2||Double complex (G-floating)|
|Generate the elements for a Givens plane rotation||BLAS1$VSROTG||Single|
| ||BLAS1$VZROTG||Double complex (D-floating)|
| ||BLAS1$VWROTG||Double complex (G-floating)|
|Apply a Givens plane rotation||BLAS1$VSROT||Single|
| ||BLAS1$VZDROT||Double complex (D-floating)|
| ||BLAS1$VWGROT||Double complex (G-floating)|
The following sections provide some guidelines for using BLAS Level 1.
Memory Overlap
The vector BLAS produces unpredictable results when any element of the input argument shares a memory location with an element of the output argument. (An exception is a special case found in the BLAS1$VxCOPY routines.)
The vector BLAS and the scalar BLAS can yield different results when
the input argument overlaps the output array.
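The effect of overlap can be sketched with a toy model. The code below is plain Python, not MTH$ code: the function names and the chunk size of 4 are assumptions made for illustration, and real vector hardware works on much longer chunks. It only shows why an element-at-a-time copy and a load-then-store copy can disagree when input and output share storage.

```python
def copy_elementwise(buf, src, dst, n):
    """Scalar model: load and store one element at a time."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def copy_chunked(buf, src, dst, n, chunk=4):
    """Vector model: load a whole chunk, then store the whole chunk."""
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        loaded = [buf[src + i] for i in range(start, stop)]  # "vector load"
        for k, value in enumerate(loaded):                   # "vector store"
            buf[dst + start + k] = value


scalar_buf = list(range(1, 9))  # [1, 2, ..., 8]
vector_buf = list(range(1, 9))

# Copy 6 elements from offset 0 to offset 1, so input and output overlap.
copy_elementwise(scalar_buf, 0, 1, 6)
copy_chunked(vector_buf, 0, 1, 6)

print(scalar_buf)
print(vector_buf)
```

In this model the two copies leave different contents in the buffer, and neither matches a copy made through a separate temporary array.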
Round-Off Effects
For some of the routines in BLAS Level 1, the final result is independent of the order in which the operations are performed. However, in other cases (for example, some of the reduction operations), efficiency dictates that the order of operations on a vector machine be different from the natural order of operations. Because round-off errors are dependent upon the order in which the operations are performed, some of the routines will not return results that are bit-for-bit identical to the results obtained by performing the operations in natural order.
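As a rough illustration of this order dependence, the following Python sketch sums the same four numbers in natural order and in an interleaved order that mimics accumulating partial sums in separate lanes. The lane count and function names are assumptions for the example; Python floats are IEEE double precision, not a VAX floating-point format, so only the general effect carries over.

```python
def sum_natural(values):
    """Add the elements strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_partial_lanes(values, lanes):
    """Accumulate element i into lane i % lanes, then combine the lanes."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sum_natural(partial)


values = [1.0e16, 1.0, -1.0e16, 1.0]

natural = sum_natural(values)
reordered = sum_partial_lanes(values, lanes=2)
print(natural, reordered)
```

With these inputs the natural order loses the first 1.0 to rounding and returns 1.0, while the two-lane order returns 2.0.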
Where performance can be increased by the use of a backup data type,
this has been done. This is the case for BLAS1$VSNRM2, BLAS1$VSCNRM2,
BLAS1$VSROTG, and BLAS1$VCROTG. The use of a backup data type can also
yield a gain in accuracy over the scalar BLAS.
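The idea of a backup data type can be sketched as follows, assuming that it means accumulating intermediate results in a wider format than the arguments. The Python below uses IEEE single precision as a stand-in for F-floating and Python's double-precision float as the wider type; the helper names are hypothetical, and this is not how the MTH$ routines are implemented.

```python
import math
import struct

SINGLE_MAX = 3.4028234663852886e38  # largest finite IEEE single value


def to_single(x):
    """Round x to IEEE single precision (a stand-in for F-floating)."""
    if abs(x) > SINGLE_MAX:
        return math.copysign(math.inf, x)  # model overflow as infinity
    return struct.unpack("<f", struct.pack("<f", x))[0]


def nrm2_working_precision(values):
    """Sum of squares with every intermediate rounded to single precision."""
    total = 0.0
    for v in values:
        total = to_single(total + to_single(v * v))
    return to_single(math.sqrt(total))


def nrm2_backup_type(values):
    """Sum of squares kept in double; only the result is rounded to single."""
    total = 0.0
    for v in values:
        total += v * v
    return to_single(math.sqrt(total))


x = [to_single(3.0e20), to_single(4.0e20)]
naive = nrm2_working_precision(x)
backup = nrm2_backup_type(x)
print(naive, backup)
```

Here the squares overflow single precision, so the working-precision version cannot return a finite norm without rescaling, whereas the wider accumulator returns a value close to 5.0e20 directly.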
Underflow and Overflow
In accordance with LINPACK convention, underflow, when it occurs, is
replaced by a zero. A system message informs you of overflow. Because
the order of operations for some routines is different from the natural
order, overflow might not occur at the same array element in both the
scalar and vector versions of the routines.
Notational Definitions
The vector BLAS (except the BLAS1$VxROTG routines) perform operations on vectors. These vectors are defined in terms of three quantities:
Suppose x is a real array of dimension ndim, n is its vector length, and incx is the increment used to access the elements of a vector X. The elements of vector X, Xi, i=1,...,n, are stored in x. If incx is greater than or equal to 0, then Xi is stored in the following location:
x(1+(i-1)*incx)
However, if incx is less than 0, then Xi is stored in the following location:
x(1+(n-i)*|incx|)
It therefore follows that the following condition must be satisfied:
ndim >= 1+(n-1)*|incx|
A positive value for incx is referred to as forward indexing, and a negative value is referred to as backward indexing. A value of zero implies that all of the elements of the vector are at the same location, x1.
Suppose ndim = 20 and n = 5. In this case, incx = 2 implies that X1, X2, X3, X4, and X5 are located in array elements x1, x3, x5, x7, and x9.
If, however, incx = -2, then X1, X2, X3, X4, and X5 are located in array elements x9, x7, x5, x3, and x1. In other words, when incx is negative, the subscript of x decreases as i increases.
For some of the routines in BLAS Level 1, incx = 0 is not permitted. In the cases where a zero value for incx is permitted, it means that x1 is broadcast into each element of the vector X of length n.
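The indexing rules in this section can be checked with a short Python sketch. The function names are hypothetical, and the subscript formulas are the conventional BLAS storage rule, which is consistent with the ndim = 20, n = 5 example above; subscripts are 1-based, as in Fortran.

```python
def element_subscript(i, n, incx):
    """Return the 1-based subscript of x that holds X_i (1 <= i <= n)."""
    if not 1 <= i <= n:
        raise ValueError("i must lie between 1 and n")
    if incx >= 0:
        return 1 + (i - 1) * incx      # forward indexing (or incx = 0)
    return 1 + (n - i) * abs(incx)     # backward indexing


def min_dimension(n, incx):
    """Smallest ndim that can hold a vector of length n with stride incx."""
    return 1 + (n - 1) * abs(incx)


n = 5
forward = [element_subscript(i, n, 2) for i in range(1, n + 1)]
backward = [element_subscript(i, n, -2) for i in range(1, n + 1)]
broadcast = [element_subscript(i, n, 0) for i in range(1, n + 1)]

print(forward, backward, broadcast, min_dimension(n, 2))
```

For n = 5, an increment of 2 needs an array of at least 9 elements, so ndim = 20 is sufficient.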
You can operate on vectors that are embedded in other vectors or matrices by choosing a suitable starting point of the vector. For example, if A is an n1 by n2 matrix, column j is referenced with a length of n1, starting point A(1,j), and increment 1. Similarly, row i is referenced with a length of n2, starting point A(i,1), and increment n1.
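A minimal Python sketch of the row and column case follows. It assumes Fortran (column-major) storage, which is what makes the row increment equal to n1; the helper names and the 3 by 4 test matrix are made up for the example, and offsets into the Python list are 0-based.

```python
def strided_view(storage, start, n, inc):
    """Return n elements beginning at 0-based offset start, inc apart."""
    return [storage[start + k * inc] for k in range(n)]


n1, n2 = 3, 4

# Column-major storage: A(i,j) lives at 0-based offset (i-1) + (j-1)*n1.
# Each entry is encoded as 10*i + j so that its position is easy to read.
a = [10 * i + j for j in range(1, n2 + 1) for i in range(1, n1 + 1)]


def column(j):
    """Column j: length n1, starting point A(1,j), increment 1."""
    return strided_view(a, (j - 1) * n1, n1, 1)


def row(i):
    """Row i: length n2, starting point A(i,1), increment n1."""
    return strided_view(a, i - 1, n2, n1)


print(column(2))
print(row(3))
```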
1 For more information, see "Basic Linear Algebra Subprograms for FORTRAN Usage" in ACM Transactions on Mathematical Software, Vol. 5, No. 3, September 1979.
The MTH$ FOLR routines provide a vectorized algorithm for the linear recurrence relation. A linear recurrence uses the result of a previous pass through a loop as an operand for subsequent passes through the loop; this dependence between passes ordinarily prevents the loop from being vectorized.
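To make the loop dependence concrete, here is a generic first-order linear recurrence written as an ordinary Python loop. The function name and the arguments a0, b, and c are illustrative; the exact recurrence computed by each FOLR family is given in the individual routine descriptions, so this should be read as a sketch of the general pattern only.

```python
def first_order_recurrence(a0, b, c):
    """Compute a[i] = a[i-1] * b[i] + c[i], starting from a0.

    Each pass needs the value produced by the previous pass, so the
    passes cannot simply be executed side by side.
    """
    a = [a0]
    for bi, ci in zip(b, c):
        a.append(a[-1] * bi + ci)
    return a


result = first_order_recurrence(1.0, [2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
print(result)
```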
There are four families of FOLR routines in the MTH$ facility. Each family accepts each of four data types (longword integer, F-floating, D-floating, and G-floating). However, all of the arrays you specify in a single FOLR call must be of the same data type.
The four families of FOLR routines are as follows:
|x||=||J for longword integer, F for F-floating, D for D-floating, or G for G-floating|
|y||=||P for a positive recursion element, or N for a negative recursion element|
|z||=||M for multiplication, or A for addition|
The FOLR entry points end with _Vn, where n is an integer between 0 and 15 that denotes the highest-numbered vector register that the FOLR routine uses. For example, MTH$VxFOLRy_z_V8 uses vector registers V0 through V8.
To determine which group of routines you should use, find the row in Table 2-2 for the task that you need the routine to perform and the column for the method of storage that you need the routine to employ. The cell where the row and column meet shows the FOLR routine you should call.
|Tasks||Save each iteration in an array||Save only last result in a variable|
|Multiplication AND addition||MTH$VxFOLRy_MA_V15||MTH$VxFOLRLy_MA_V5|
|Multiplication OR addition||MTH$VxFOLRy_z_V8||MTH$VxFOLRLy_z_V2|
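The selection in Table 2-2 can be expressed as a small lookup. The Python helper below is hypothetical (it is not part of the MTH$ facility) and only assembles an entry-point name from the x, y, and z letters and the register suffixes shown in the table.

```python
def folr_entry_point(data_type, sign, operation, keep_all):
    """Build a FOLR entry-point name following Table 2-2.

    data_type: "J", "F", "D", or "G"            (the x in the name)
    sign:      "P" or "N"                       (the y in the name)
    operation: "MA" for multiplication AND addition,
               "M" or "A" for multiplication OR addition (the z in the name)
    keep_all:  True to save each iteration in an array,
               False to save only the last result in a variable
    """
    if data_type not in ("J", "F", "D", "G"):
        raise ValueError("data_type must be J, F, D, or G")
    if sign not in ("P", "N"):
        raise ValueError("sign must be P or N")
    if operation not in ("MA", "M", "A"):
        raise ValueError("operation must be MA, M, or A")

    if operation == "MA":
        highest_register = 15 if keep_all else 5
    else:
        highest_register = 8 if keep_all else 2

    last_only = "" if keep_all else "L"
    return f"MTH$V{data_type}FOLR{last_only}{sign}_{operation}_V{highest_register}"


print(folr_entry_point("F", "P", "MA", True))
print(folr_entry_point("G", "N", "A", False))
```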
Save the contents of V0 through Vn before calling a FOLR routine if you need them after the call. The variable n can be 2, 5, 8, or 15, depending on the FOLR routine entry point. (The OpenVMS Calling Standard specifies that a called procedure may modify all of the vector registers. The FOLR routines modify only the vector registers V0 through Vn.)