HP OpenVMS Systems Documentation

OpenVMS VAX RTL Mathematics (MTH$) Manual

Chapter 2
Vector Routines in MTH$

This chapter discusses four sets of routines provided by the RTL MTH$ facility that support vector processing. These routines are as follows:

  • Basic Linear Algebra Subroutines (BLAS) Level 1
  • First Order Linear Recurrence (FOLR) routines
  • Vector versions of existing scalar routines
  • Fast-Vector math routines

2.1 BLAS --- Basic Linear Algebra Subroutines Level 1

BLAS Level 1 routines perform vector operations, such as copying one vector to another, swapping vectors, and so on. These routines help you take advantage of vector processing speed. BLAS Level 1 routines form an integral part of many mathematical libraries, such as LINPACK and EISPACK (see footnote 1). Because these routines usually occur in the innermost loops of user code, the Run-Time Library provides versions of BLAS Level 1 that are tuned to take best advantage of the VAX vector processors.

Two versions of BLAS Level 1 are provided. To use either of these libraries, link in the appropriate shareable image. The libraries are:

  • Scalar BLAS --- contained in the shareable image BLAS1RTL
  • Vector BLAS (routines that take advantage of vectorization) --- contained in the shareable image VBLAS1RTL


To call the scalar BLAS from a program that runs on scalar hardware, specify the routine name preceded by BLAS1$ (for example, BLAS1$xCOPY). To call the vector BLAS from a program that runs on vector hardware, specify the routine name preceded by BLAS1$V (for example, BLAS1$VxCOPY).

This manual describes both the scalar and vector versions of BLAS Level 1, but for simplicity the vector prefix (BLAS1$V) is used exclusively. Remember to remove the letter V from the routine prefix when you want to call the scalar version.

If you are a Compaq Fortran programmer, do not specify BLAS vector routines explicitly. Specify the Fortran intrinsic function name only. The Compaq Fortran 77 for OpenVMS VAX Systems compiler determines whether the vector or scalar version of a BLAS routine should be used.

The Fortran /BLAS=([NO]INLINE,[NO]MAPPED) qualifier controls how the compiler processes calls to BLAS Level 1:

  • If /NOBLAS is specified, all BLAS calls are treated as ordinary external routines.
  • INLINE (the default) means that calls to BLAS Level 1 routines are treated as known language constructs. VAX object code is generated to compute the corresponding operations at the call site, rather than to call a user-supplied routine. If the Fortran qualifier /VECTOR or /PARALLEL=AUTO is in effect, the generated code for the loops may use vector instructions or be decomposed to run on multiple processors.
  • If MAPPED is specified, these calls are treated as calls to the optimized implementations of these routines in the BLAS1$ and BLAS1$V portions of the MTH$ facility.

For more information on the Fortran /BLAS qualifier, refer to the DEC Fortran Performance Guide for OpenVMS VAX Systems.

Ten families of routines form BLAS Level 1. (BLAS1$VxCOPY is one family of routines, for example.) These routines operate at the vector-vector operation level. This means that BLAS Level 1 performs operations on one or two vectors. The level of complexity of the computations (in other words, the number of operations being performed in a BLAS Level 1 routine) is of the order n (the length of the vector).

Each family of routines in BLAS Level 1 contains routines coded in single precision, double precision (D and G formats), single precision complex, and double precision complex (D and G formats). BLAS Level 1 can be broadly classified into three groups:

  • BLAS1$VxCOPY, BLAS1$VxSWAP, BLAS1$VxSCAL and BLAS1$VxAXPY: These routines return vector outputs for vector inputs. The results of all these routines are independent of the order in which the elements of the vector are processed. The scalar and vector versions of these routines return the same results.
  • BLAS1$VxDOT, BLAS1$VIxAMAX, BLAS1$VxASUM, and BLAS1$VxNRM2: These routines are all reduction operations that return a scalar value. The results of these routines (except BLAS1$VIxAMAX) are dependent upon the order in which the elements of the vector are processed. The scalar and vector versions of BLAS1$VxDOT, BLAS1$VxASUM, and BLAS1$VxNRM2 can return different results. The scalar and vector versions of BLAS1$VIxAMAX return the same results.
  • BLAS1$VxROTG and BLAS1$VxROT: These routines are used for a particular application (plane rotations), unlike the routines in the previous two categories. The results of BLAS1$VxROTG and BLAS1$VxROT are independent of the order in which the elements of the vector are processed. The scalar and vector versions of these routines return the same results.

Table 2-1 lists the functions and corresponding routines of BLAS Level 1.

Table 2-1 Functions of BLAS Level 1
Function
  Routine         Data Type

Copy a vector to another vector
  BLAS1$VSCOPY    Single
  BLAS1$VDCOPY    Double (D-floating or G-floating)
  BLAS1$VCCOPY    Single complex
  BLAS1$VZCOPY    Double complex (D-floating or G-floating)

Swap the elements of two vectors
  BLAS1$VSSWAP    Single
  BLAS1$VDSWAP    Double (D-floating or G-floating)
  BLAS1$VCSWAP    Single complex
  BLAS1$VZSWAP    Double complex (D-floating or G-floating)

Scale the elements of a vector
  BLAS1$VSSCAL    Single
  BLAS1$VDSCAL    Double (D-floating)
  BLAS1$VGSCAL    Double (G-floating)
  BLAS1$VCSCAL    Single complex with complex scale
  BLAS1$VCSSCAL   Single complex with real scale
  BLAS1$VZSCAL    Double complex with complex scale (D-floating)
  BLAS1$VWSCAL    Double complex with complex scale (G-floating)
  BLAS1$VZDSCAL   Double complex with real scale (D-floating)
  BLAS1$VWGSCAL   Double complex with real scale (G-floating)

Multiply a vector by a scalar and add a vector
  BLAS1$VSAXPY    Single
  BLAS1$VDAXPY    Double (D-floating)
  BLAS1$VGAXPY    Double (G-floating)
  BLAS1$VCAXPY    Single complex
  BLAS1$VZAXPY    Double complex (D-floating)
  BLAS1$VWAXPY    Double complex (G-floating)

Obtain the index of the first element of a vector having the largest absolute value
  BLAS1$VISAMAX   Single
  BLAS1$VIDAMAX   Double (D-floating)
  BLAS1$VIGAMAX   Double (G-floating)
  BLAS1$VICAMAX   Single complex
  BLAS1$VIZAMAX   Double complex (D-floating)
  BLAS1$VIWAMAX   Double complex (G-floating)

Obtain the sum of the absolute values of the elements of a vector
  BLAS1$VSASUM    Single
  BLAS1$VDASUM    Double (D-floating)
  BLAS1$VGASUM    Double (G-floating)
  BLAS1$VSCASUM   Single complex
  BLAS1$VDZASUM   Double complex (D-floating)
  BLAS1$VGWASUM   Double complex (G-floating)

Obtain the inner product of two vectors
  BLAS1$VSDOT     Single
  BLAS1$VDDOT     Double (D-floating)
  BLAS1$VGDOT     Double (G-floating)
  BLAS1$VCDOTU    Single complex unconjugated
  BLAS1$VCDOTC    Single complex conjugated
  BLAS1$VZDOTU    Double complex unconjugated (D-floating)
  BLAS1$VWDOTU    Double complex unconjugated (G-floating)
  BLAS1$VZDOTC    Double complex conjugated (D-floating)
  BLAS1$VWDOTC    Double complex conjugated (G-floating)

Obtain the Euclidean norm of the vector
  BLAS1$VSNRM2    Single
  BLAS1$VDNRM2    Double (D-floating)
  BLAS1$VGNRM2    Double (G-floating)
  BLAS1$VSCNRM2   Single complex
  BLAS1$VDZNRM2   Double complex (D-floating)
  BLAS1$VGWNRM2   Double complex (G-floating)

Generate the elements for a Givens plane rotation
  BLAS1$VSROTG    Single
  BLAS1$VDROTG    Double (D-floating)
  BLAS1$VGROTG    Double (G-floating)
  BLAS1$VCROTG    Single complex
  BLAS1$VZROTG    Double complex (D-floating)
  BLAS1$VWROTG    Double complex (G-floating)

Apply a Givens plane rotation
  BLAS1$VSROT     Single
  BLAS1$VDROT     Double (D-floating)
  BLAS1$VGROT     Double (G-floating)
  BLAS1$VCSROT    Single complex
  BLAS1$VZDROT    Double complex (D-floating)
  BLAS1$VWGROT    Double complex (G-floating)

For a detailed description of these routines, refer to the Vector MTH$ Reference Section of this manual.
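As a rough guide to what several of these families compute, the following Python sketch models the standard BLAS Level 1 definitions for real vectors with unit increment. It is an illustration only: the function names are hypothetical, Python floats are not VAX floating-point formats, and nothing here reflects how the MTH$ routines are implemented or called.

```python
# Illustrative models of a few BLAS Level 1 operations on real vectors.
# Hypothetical helper names; these are not the BLAS1$ entry points.

def scal(a, x):
    """Model of xSCAL: scale every element of x by a."""
    return [a * xi for xi in x]

def axpy(a, x, y):
    """Model of xAXPY: multiply x by the scalar a and add y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Model of xDOT: inner product, accumulated in natural order."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total

def asum(x):
    """Model of xASUM: sum of the absolute values of the elements."""
    total = 0.0
    for xi in x:
        total += abs(xi)
    return total

def iamax(x):
    """Model of IxAMAX: 1-based index of the first element having the
    largest absolute value (0 for an empty vector)."""
    if not x:
        return 0
    best = 0
    for i in range(1, len(x)):
        if abs(x[i]) > abs(x[best]):   # strict '>' keeps the first maximum
            best = i
    return best + 1
```

Note that `iamax` returns the first of several equal maxima, which matches the description in Table 2-1.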

2.1.1 Using BLAS Level 1

The following sections provide some guidelines for using BLAS Level 1.

Memory Overlap

The vector BLAS produces unpredictable results when any element of the input argument shares a memory location with an element of the output argument. (An exception is a special case found in the BLAS1$VxCOPY routines.)
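To see why overlap matters, consider a generic scaling operation whose output is shifted one element past its input within the same buffer. The Python sketch below is a simplified model, not the behavior of any particular BLAS1$ routine: `scale_elementwise` processes one element at a time, while `scale_chunked` loads a whole block before storing it, roughly as a vector load followed by a vector store would. The `chunk` size is an arbitrary stand-in.

```python
def scale_elementwise(buf, a, src, dst, n):
    """Write a * buf[src + i] to buf[dst + i], one element at a time."""
    for i in range(n):
        buf[dst + i] = a * buf[src + i]
    return buf

def scale_chunked(buf, a, src, dst, n, chunk=64):
    """Same operation, but load a whole chunk before storing any of it.

    `chunk` is a hypothetical block size standing in for a vector length.
    """
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        loaded = buf[src + start:src + stop]               # load the block
        buf[dst + start:dst + stop] = [a * v for v in loaded]  # then store
    return buf

data = [1, 1, 1, 1, 1]

# Input elements 0..3 and output elements 1..4 overlap in the same buffer.
elementwise = scale_elementwise(list(data), 2, src=0, dst=1, n=4)
chunked = scale_chunked(list(data), 2, src=0, dst=1, n=4)

print(elementwise)  # [1, 2, 4, 8, 16]: each store feeds the next load
print(chunked)      # [1, 2, 2, 2, 2]: all loads happen before the stores
```

With non-overlapping input and output, both functions produce the same result.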

The vector BLAS and the scalar BLAS can yield different results when the input argument overlaps the output array.

Round-Off Effects

For some of the routines in BLAS Level 1, the final result is independent of the order in which the operations are performed. However, in other cases (for example, some of the reduction operations), efficiency dictates that the order of operations on a vector machine be different from the natural order of operations. Because round-off errors are dependent upon the order in which the operations are performed, some of the routines will not return results that are bit-for-bit identical to the results obtained by performing the operations in natural order.
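The effect of operation order on a reduction can be reproduced in a few lines. The Python sketch below compares a sum taken in natural order with a sum that accumulates interleaved partial sums and combines them at the end. The `lanes` parameter and both function names are hypothetical, and Python uses IEEE double precision rather than VAX floating-point formats, so this shows the principle only, not the results of any MTH$ routine.

```python
def sum_natural(x):
    """Add the elements in index order: ((x1 + x2) + x3) + ..."""
    total = 0.0
    for xi in x:
        total += xi
    return total

def sum_lanes(x, lanes):
    """Accumulate `lanes` interleaved partial sums, then combine them."""
    partial = [0.0] * lanes
    for i, xi in enumerate(x):
        partial[i % lanes] += xi
    return sum_natural(partial)

values = [1e20, 1.0, -1e20, 1.0]

print(sum_natural(values))   # 1.0: the first 1.0 is absorbed by 1e20
print(sum_lanes(values, 2))  # 2.0: the large terms cancel within one lane
```

Neither answer is wrong in the sense of a bug; each is the correctly rounded result of a different order of operations.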

Where performance can be increased by the use of a backup data type, this has been done. This is the case for BLAS1$VSNRM2, BLAS1$VSCNRM2, BLAS1$VSROTG, and BLAS1$VCROTG. The use of a backup data type can also yield a gain in accuracy over the scalar BLAS.

Underflow and Overflow

In accordance with LINPACK convention, underflow, when it occurs, is replaced by a zero. A system message informs you of overflow. Because the order of operations for some routines is different from the natural order, overflow might not occur at the same array element in both the scalar and vector versions of the routines.

Notational Definitions

The vector BLAS (except the BLAS1$VxROTG routines) perform operations on vectors. These vectors are defined in terms of three quantities:

  • A vector length, specified as n
  • An array or a starting element in an array, specified as x
  • An increment or spacing parameter to indicate the distance in number of array elements to skip between successive vector elements, specified as incx

Suppose x is a real array of dimension ndim, n is its vector length, and incx is the increment used to access the elements of a vector X. The elements of vector X, Xi, i=1,...,n, are stored in x. If incx is greater than or equal to 0, then Xi is stored in the following location:

x(1+(i-1)*incx)

However, if incx is less than 0, then Xi is stored in the following location:

x(1+(n-i)*|incx|)

It therefore follows that the following condition must be satisfied:

ndim >= 1+(n-1)*|incx|

A positive value for incx is referred to as forward indexing, and a negative value is referred to as backward indexing. A value of zero implies that all of the elements of the vector are at the same location, x1.

Suppose ndim = 20 and n = 5. In this case, incx = 2 implies that X1, X2, X3, X4, and X5 are located in array elements x1, x3, x5, x7, and x9.

If, however, incx = -2, then X1, X2, X3, X4, and X5 are located in array elements x9, x7, x5, x3, and x1. In other words, when incx is negative, the subscript of x decreases as i increases.
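The mapping from vector element to array subscript can be checked with a short Python sketch. It encodes the mapping implied by the example above, which is also the standard BLAS convention; the function names are hypothetical and subscripts are 1-based to match the text.

```python
def element_subscripts(n, incx):
    """Return the 1-based subscripts of x that hold X1, X2, ..., Xn."""
    if incx >= 0:
        # Forward indexing (or incx = 0, where every Xi maps to x1).
        return [1 + (i - 1) * incx for i in range(1, n + 1)]
    # Backward indexing: X1 sits at the highest subscript used.
    return [1 + (n - i) * abs(incx) for i in range(1, n + 1)]

def required_dimension(n, incx):
    """Smallest ndim that can hold a vector of length n with increment incx."""
    return 1 + (n - 1) * abs(incx)

print(element_subscripts(5, 2))    # [1, 3, 5, 7, 9]
print(element_subscripts(5, -2))   # [9, 7, 5, 3, 1]
print(required_dimension(5, 2))    # 9, which fits in ndim = 20
```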

For some of the routines in BLAS Level 1, incx = 0 is not permitted. In the cases where a zero value for incx is permitted, it means that x1 is broadcast into each element of the vector X of length n.

You can operate on vectors that are embedded in other vectors or matrices by choosing a suitable starting point of the vector. For example, if A is an n1 by n2 matrix, column j is referenced with a length of n1, starting point A(1,j), and increment 1. Similarly, row i is referenced with a length of n2, starting point A(i,1), and increment n1.
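The row and column case can be sketched in Python, assuming Fortran-style column-major storage, in which A(i,j) occupies position (i-1) + (j-1)*n1 counting from zero. The helper names `strided_vector` and `offset` are hypothetical, and the matrix entries are chosen so that A(i,j) holds the value 10*i + j.

```python
def strided_vector(storage, start, n, inc):
    """Return n elements of storage, beginning at 0-based offset start
    and stepping by inc elements (forward indexing only)."""
    return [storage[start + k * inc] for k in range(n)]

n1, n2 = 3, 4  # A is an n1 by n2 matrix

# Column-major layout: all of column 1, then all of column 2, and so on.
storage = [10 * i + j for j in range(1, n2 + 1) for i in range(1, n1 + 1)]

def offset(i, j):
    """0-based position of A(i,j) in column-major storage."""
    return (i - 1) + (j - 1) * n1

# Column 2: length n1, starting point A(1,2), increment 1.
column_2 = strided_vector(storage, offset(1, 2), n1, 1)

# Row 3: length n2, starting point A(3,1), increment n1.
row_3 = strided_vector(storage, offset(3, 1), n2, n1)

print(column_2)  # [12, 22, 32]
print(row_3)     # [31, 32, 33, 34]
```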


1 For more information, see "Basic Linear Algebra Subprograms for FORTRAN Usage" in ACM Transactions on Mathematical Software, Vol. 5, No. 3, September 1979.

2.2 FOLR --- First Order Linear Recurrence Routines

The MTH$ FOLR routines provide a vectorized algorithm for the first order linear recurrence relation. A linear recurrence uses the result of a previous pass through a loop as an operand for subsequent passes through the loop. Because each pass depends on the one before it, this dependence prevents the loop from being vectorized directly.
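As an illustration of the dependence involved, the Python sketch below evaluates a generic first order linear recurrence, x(i) = a(i)*x(i-1) + b(i), with an ordinary scalar loop. The function name and argument layout are hypothetical; the exact forms computed by the MTH$ FOLR routines, including their sign conventions, are given in the routine descriptions.

```python
def linear_recurrence(a, b, x0):
    """Return every term of x(i) = a(i) * x(i-1) + b(i), starting from x0."""
    terms = []
    previous = x0
    for ai, bi in zip(a, b):
        # This pass cannot start until the previous pass has produced
        # its result, which is what blocks straightforward vectorization.
        previous = ai * previous + bi
        terms.append(previous)
    return terms

terms = linear_recurrence([2, 2, 2], [1, 1, 1], 1)
print(terms)      # [3, 7, 15]
print(terms[-1])  # 15, the last result only
```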

The only error checking performed by the FOLR routines is for a reserved operand.

There are four families of FOLR routines in the MTH$ facility. Each family accepts each of four data types (longword integer, F-floating, D-floating, and G-floating). However, all of the arrays you specify in a single FOLR call must be of the same data type.

For a detailed description of these routines, see Part 3.

2.2.1 FOLR Routine Name Format

The four families of FOLR routines are as follows:

  • MTH$VxFOLRy_MA_V15
  • MTH$VxFOLRy_z_V8
  • MTH$VxFOLRLy_MA_V5
  • MTH$VxFOLRLy_z_V2

where:

x = J for longword integer, F for F-floating, D for D-floating, or G for G-floating
y = P for a positive recursion element, or N for a negative recursion element
z = M for multiplication, or A for addition

The FOLR entry points end with _Vn, where n is an integer between 0 and 15 that denotes the highest-numbered vector register that the FOLR routine uses. For example, MTH$VxFOLRy_z_V8 uses vector registers V0 through V8.

To determine which group of routines you should use, find the row in Table 2-2 for the task that you need the routine to perform and the column for the method of storage that you need the routine to employ. The entry where the row and column meet shows the FOLR routine you should call.

Table 2-2 Determining the FOLR Routine You Need
Tasks                         Save each iteration in an array   Save only last result in a variable
Multiplication AND addition   MTH$VxFOLRy_MA_V15                MTH$VxFOLRLy_MA_V5
Multiplication OR addition    MTH$VxFOLRy_z_V8                  MTH$VxFOLRLy_z_V2
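The selection in Table 2-2, combined with the x, y, and z codes, can be expressed as a small lookup. The Python helper below is a hypothetical convenience for composing an entry-point name from the naming pattern. It is not part of the MTH$ facility and does not call any routine.

```python
def folr_routine_name(data_type, sign, operations, save_all):
    """Compose a FOLR entry-point name from the naming pattern.

    data_type:  'J', 'F', 'D', or 'G'            (the x code)
    sign:       'P' or 'N'                       (the y code)
    operations: 'MA' for multiplication AND addition,
                'M' or 'A' for one of them       (the z code)
    save_all:   True to save each iteration in an array,
                False to save only the last result in a variable
    """
    if data_type not in ("J", "F", "D", "G"):
        raise ValueError("data_type must be J, F, D, or G")
    if sign not in ("P", "N"):
        raise ValueError("sign must be P or N")
    if operations not in ("MA", "M", "A"):
        raise ValueError("operations must be MA, M, or A")

    last = "" if save_all else "L"   # 'L' marks the last-result-only forms
    if operations == "MA":
        highest_register = 15 if save_all else 5
    else:
        highest_register = 8 if save_all else 2

    return f"MTH$V{data_type}FOLR{last}{sign}_{operations}_V{highest_register}"

print(folr_routine_name("F", "P", "MA", save_all=True))   # MTH$VFFOLRP_MA_V15
print(folr_routine_name("D", "N", "A", save_all=False))   # MTH$VDFOLRLN_A_V2
```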

2.2.2 Calling a FOLR Routine

Save the contents of V0 through Vn before calling a FOLR routine if you need them after the call. The variable n can be 2, 5, 8, or 15, depending on the FOLR routine entry point. (The OpenVMS Calling Standard specifies that a called procedure may modify all of the vector registers. The FOLR routines modify only the vector registers V0 through Vn.)

The MTH$ FOLR routines assume that all of the arrays are of the same data type.
