HP OpenVMS Systems Documentation
OpenVMS MACRO-32 Porting and User's Guide
2.10.2 Preserving Granularity
To preserve the granularity of a VAX MACRO memory write instruction on a byte, word, or unaligned longword on Alpha means to guarantee that the instruction executes successfully on the specified data and preserves the integrity of the surrounding data.
The VAX architecture includes instructions that perform independent access to byte, word, and unaligned longword locations in memory so two processes can write simultaneously to different bytes of the same aligned longword without interfering with each other.
The Alpha architecture defines instructions that can address only aligned longword and quadword operands. On Alpha, code that writes a data field to memory that is less than a longword in length or is not aligned can do so only by using an interruptible instruction sequence that involves a quadword load, an insertion of the modified data into the quadword, and a quadword store. In this case, two processes that intend to write to different bytes in the same quadword will actually load, perform operations on, and store the whole quadword. Depending on the timing of the load and store operations, one of the byte writes could be lost.
The compiler provides the /PRESERVE=GRANULARITY option to guarantee the integrity of byte, word, and unaligned longword writes. The /PRESERVE=GRANULARITY option causes the compiler to generate Alpha instructions that provide granularity preservation for any VAX instructions that write to bytes, words, or unaligned longwords. Alternatively, you can insert the .PRESERVE GRANULARITY and .NOPRESERVE GRANULARITY directives in sections of VAX MACRO source code as required to enable and disable granularity preservation.
For example, the instruction MOVB R1, (R2) generates the following Alpha code sequence:
If any other code thread modifies part of the data pointed to by (R2) between the LDQ_U and the STQ_U instructions, that data will be overwritten and lost.
If you have specified that granularity be preserved for the same instruction, by either the command qualifier or the directive, the Alpha command sequence becomes the following:
In this case, if the data pointed to by (R2) is modified by another code thread, the operation will be retried.
For a MOVW R1,(R2) instruction, the code generated to preserve granularity depends on whether the register R2 is currently assumed to be aligned by the compiler's register alignment tracking. If R2 is assumed to be aligned, the compiler generates essentially the same code as in the preceding MOVB example, except that it uses INSWL and MSKWL instructions instead of INSBL and MSKBL, and it uses #^B0110 in the BIC of the R2 address. If R2 is assumed to be unaligned, the compiler generates two separate LDQ_L/STQ_C pairs to ensure that the word is correctly written even if it crosses a quadword boundary.
To preserve the granularity of a MOVL R1,(R2) instruction, the compiler always writes whole longwords with a STL instruction, even if the address to which it is writing is assumed to be unaligned. If the address is unaligned, the STL instruction will cause an unaligned memory reference fault. The PALcode unaligned fault handler will then do the loads, masks, and stores necessary to write the unaligned longword. However, since PALcode is noninterruptible, this ensures that the surrounding memory locations are not corrupted.
When porting an application to an Alpha system, you should determine whether the application performs byte, word, or unaligned longword writes to memory that is shared either with processes executing on the local processor, or with processes executing on another processor in the system, or with an AST routine or condition handler. See Migrating to an OpenVMS AXP System: Recompiling and Relinking Applications for a more complete discussion of the programming issues involved in granularity operations in an Alpha system.
2.10.3 Precedence of Atomicity Over Granularity
If you enable the preservation of both granularity and atomicity, and the compiler encounters VAX code that requires that both be preserved, atomicity takes precedence over granularity.
For example, the instruction INCW 1(R0), when compiled with .PRESERVE=GRANULARITY, retries the write of the new word value, if it is interrupted. However, when compiled with .PRESERVE=ATOMICITY, it will also refetch the initial value and increment it, if interrupted. If both options are specified, it will do the latter.
In addition, while the compiler can successfully generate code for
unaligned words and longwords that preserves granularity, it cannot
generate code for unaligned words or longwords that preserves
atomicity. If both options are specified, all memory references must be
to aligned addresses.
Because compiler atomicity guarantees only affect memory modification operands in VAX instructions, you should take special care in examining VAX MACRO sources for coding problems /PRESERVE=ATOMICITY cannot resolve. For instance, consider the following VAX instruction:
For this instruction, the compiler generates an Alpha code sequence such as the following, when /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified:
Note that, in this Alpha code sequence, when the STL_C fails, only the modify operand is reread before the add. The data (R1) is not reread. This behavior differs slightly from VAX behavior. In a VAX system, the entire instruction would execute without interruption; in an Alpha system, only the modify operand is updated atomically.
As a result, code that requires the read of the data (R1) to be atomic must use another method, such as a lock, to obtain that level of synchronization.
Consider another VAX instruction:
The VAX instruction in this example is atomic on a single VAX CPU, but the Alpha instruction sequence is not atomic on a single Alpha CPU. Because the 4(R1) operand is a write operand and not a modify operand, the operation is not made atomic by the use of the LDL_L and STL_C.
Finally, consider a more complex VAX INCL instruction:
Here, only the update of the modify data is atomic. The fetch required
to obtain the address of the modify data is not part of the atomic
When preserving atomicity, the compiler must assume the modify data is aligned. An update of a field spanning a quadword boundary cannot occur atomically since this would require two read-modify-write sequences. Since software cannot handle an unaligned LDx_L or STx_C instruction as it can a normal load or store instruction, a LDx_L or STx_C instruction to an unaligned address will generate a fatal reserved operand fault.
When /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified, an INCL (R1) instruction generates LDL_L and STL_C instructions so R1 must be longword aligned.
For an INCW (R1) instruction, the compiler generates an Alpha code sequence such as the following:
An INCB instruction uses #^B0111 to generate the aligned address since
all bytes are aligned.
The compiler's methods of preserving atomicity have an interesting side effect in compiled VAX MACRO code. On VAX systems, only the interlocked instructions will work correctly to synchronize access to shared data in multiprocessor systems. On Alpha multiprocessing systems, the code resulting from a compilation of modify instructions (with atomicity preserved) and interlocked instructions would both work correctly, because the LDx_L and STx_C which the compiler generates for both sets of instructions operate correctly across multiple processors.
Because this compiler side effect is specific to Alpha systems and does not port back to VAX systems, you should avoid relying on it when porting VAX MACRO code to Alpha if you intend to run the code on both systems.
However, interlocked instructions must still be used if the memory modification is being used as an interlock for other instructions for which atomicity is not preserved. This is because the Alpha architecture does not guarantee strict write ordering. For example, consider the following VAX MACRO code sequence:
This code sequence will generate the following Alpha code sequence:
Because of the data prefetching of the Alpha architecture, the data from (R2) may be read before the store to (R1) is processed. If the INCL (R1) instruction is being used as a lock to prevent the data at (R2) from being accessed before the lock is set, the read of (R2) may occur before the increment of (R1) and thus is not protected.
The VAX interlocked instructions generate Alpha MB (memory barrier) instructions before and after the interlocked sequence. This prevents memory loads from being moved across the interlocked instruction.
For example, consider the following code sequence:
This code sequence will generate the following Alpha code sequence:
The MB instructions cause all memory operations before the MB
instruction to complete before any memory operations after the MB
instruction are allowed to begin.
When you compile your code, the compiler automatically checks STARLET.MLB for definitions of compiler directives. Similarly, when you link your code, the linker links against STARLET.OLB to resolve undefined symbols.
The following is an example of a command procedure used to compile the MACRO-32 module [SYS]SYSSNDJBC.MAR:
Not all modules need both libraries and many modules need component-specific libraries, but this example shows the basic approach to using the compiler.
2.11.1 Line Numbering in Listing File
The macro expansion line numbering scheme in the listing file is Xnn/mmm, where Xnn shows the nesting depth and mmm is the line number relative to the outermost macro, as shown in the following example.
2.11.2 Linking an Object Module
To link the object files produced by the compiler, use the following commands as a basis:
For certain VAX instructions (such as the divide instructions and
others described in this manual), the compiler produces object code
that issues a call to the OpenVMS General-Purpose Run-Time Library
(OTS$ RTL). By default, the linker links against the library that that
contains these routines.
The compiler provides full debugger support. The debug session for
compiled VAX MACRO code is similar to that for assembled VAX MACRO
code. However, there are some important differences that are described
in this section. For a complete description of debugging, see the
OpenVMS Debugger Manual.
One major difference is that the code is compiled rather than assembled. On a VAX system, each VAX MACRO instruction is a single machine instruction. On an Alpha system, each VAX MACRO instruction may be compiled into many Alpha machine instructions. A major side effect of this difference is the relocation and rescheduling of code if you do not specify /NOOPTIMIZE in your compile command.
By default, several optimizations are performed that cause the movement
of generated code across source boundaries (see Section 1.2,
Section 4.3, and Appendix A). For most code modules, debugging is
simplified if you compile with /NOOPTIMIZE, which prevents this
relocation from happening. After you have debugged your code, you can
recompile without /NOOPTIMIZE to improve performance.
Another major difference between debugging compiled code and debugging assembled code is a new concept to VAX MACRO, the definition of symbolic variables for examining routine arguments. On VAX systems, when you are debugging a routine and want to examine the arguments, you typically do something like the following:
On Alpha systems, the arguments do not reside in a vector in memory as they do on VAX systems. Furthermore, there is no AP register on Alpha systems. If you type EXAMINE @AP when debugging VAX MACRO compiled code, the debugger reports that AP is an undefined symbol.
In the compiled code, the arguments can reside in some combination of:
The compiler does not require that you figure out where the arguments
are by reading the generated code. Instead, it provides $ARGn
symbols that point to the correct argument locations. The $ARG0 symbol
is the same as @AP+0 is on VAX systems, that is, the argument count.
The $ARG1 symbol is the first argument, $ARG2 is the second argument,
and so forth. These symbols are defined in CALL_ENTRY and JSB_ENTRY
directives, but not in EXCEPTION_ENTRY directives.
There may be additional arguments in your code for which the compiler did not generate a $ARGn symbol. The number of $ARGn symbols defined for a .CALL_ENTRY routine is the maximum number detected by the compiler (either by automatic detection or as specified by MAX_ARGS) or 16, whichever is less. For a .JSB_ENTRY routine, since the arguments are homed in the caller's stack frame and the compiler cannot detect the actual number, it always creates eight $ARGn symbols.