HP OpenVMS Systems Documentation
OpenVMS MACRO-32 Porting and User's Guide
4.2.2 Changing the Compiler's Branch Prediction
The compiler provides two directives, .BRANCH_LIKELY and .BRANCH_UNLIKELY, to change its assumptions about branch prediction. The directive .BRANCH_LIKELY is for use with forward conditional branches when the probability that the branch is taken is high, say 75 percent or more. The directive .BRANCH_UNLIKELY is for use with backward conditional branches when the probability that the branch is taken is less than 25 percent.
These directives should be used only in performance-sensitive code. Furthermore, be more cautious when adding .BRANCH_UNLIKELY, because it introduces an additional branch indirection when the branch is actually taken. That is, the backward branch is changed to a forward branch to an unconditional branch instruction, which in turn branches to the original branch target.
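The thresholds above amount to a simple decision rule. The following Python sketch encodes it as a hypothetical helper, `suggest_directive`, assuming (as described for the Alpha hardware later in this section) that forward conditional branches are predicted not taken and backward ones taken. It is only a planning aid: it does not weigh the extra indirection that .BRANCH_UNLIKELY adds, so apply its suggestions only to performance-sensitive code.

```python
from typing import Optional


def suggest_directive(direction: str, p_taken: float) -> Optional[str]:
    """Suggest a branch-prediction directive using the rule-of-thumb thresholds.

    direction: "forward" or "backward" (target after or before the branch).
    p_taken:   estimated probability that the branch is taken (0.0 to 1.0).
    Returns ".BRANCH_LIKELY", ".BRANCH_UNLIKELY", or None (keep the default).
    """
    if not 0.0 <= p_taken <= 1.0:
        raise ValueError("p_taken must be between 0 and 1")
    if direction == "forward":
        # Assumed default: forward branches predicted not taken.
        # Override only when the branch is usually taken (75 percent or more).
        return ".BRANCH_LIKELY" if p_taken >= 0.75 else None
    if direction == "backward":
        # Assumed default: backward branches predicted taken.
        # Override only when the branch is rarely taken (under 25 percent).
        return ".BRANCH_UNLIKELY" if p_taken < 0.25 else None
    raise ValueError("direction must be 'forward' or 'backward'")
```

For example, `suggest_directive("forward", 0.9)` suggests .BRANCH_LIKELY, while `suggest_directive("backward", 0.25)` suggests nothing, because 25 percent is not below the threshold.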
There is no directive to tell the compiler not to follow an unconditional branch. However, if you want the compiler to generate code that does not follow the branch, you can change the unconditional branch to a conditional branch that you know will always be taken. For example, if you know that in the current code section R3 always contains the address of a data structure (and is therefore never zero), you could change a BRB instruction to a TSTL R3 followed by a BNEQ instruction. This branch will always be taken, but the compiler will fall through and continue code generation with the next instruction. The branch will therefore always be mispredicted when executed, but this may be useful in some situations.
The compiler will follow the branch and will modify the code flow as described in the previous example, moving all the code that deals with the missing structure out of line to the end of the module.
If your code has backward conditional branches that you know will most likely not be taken, you can instruct the compiler to generate code using that assumption by inserting the directive .BRANCH_UNLIKELY immediately before the branch instruction. For example:
The .BRANCH_UNLIKELY directive is used here because the Alpha hardware would predict a backward branch to 10$ as likely to be taken. The programmer knows that taking this branch is rare, so the directive is used to change it to a forward branch, which is predicted not taken.
There is an unconditional branch instruction at the forward branch destination that branches back to the original destination. Again, this code fragment is moved to a point beyond the normal routine exit point. The code that would be generated by the previous VAX MACRO code follows:
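To see why this trade-off usually pays off, the following Python sketch models the static rule (backward branches predicted taken, forward branches predicted not taken) and counts mispredictions and extra trampoline branches for a trace of branch outcomes. The addresses and the 5-in-100 trace are hypothetical, and any dynamic prediction the hardware may perform is ignored.

```python
from dataclasses import dataclass
from typing import Iterable


def predicted_taken(branch_addr: int, target_addr: int) -> bool:
    # Static rule: a backward branch is predicted taken, a forward one not taken.
    return target_addr < branch_addr


@dataclass
class Cost:
    mispredicts: int = 0
    extra_branches: int = 0  # unconditional branches executed in the trampoline


def run(outcomes: Iterable[bool], use_branch_unlikely: bool) -> Cost:
    """outcomes: True each time the branch to 10$ is actually taken."""
    # Hypothetical addresses: 10$ at 0x100, the conditional branch at 0x140,
    # and (with .BRANCH_UNLIKELY) an out-of-line trampoline at 0x400.
    branch_addr, loop_target, trampoline = 0x140, 0x100, 0x400
    target = trampoline if use_branch_unlikely else loop_target
    guess = predicted_taken(branch_addr, target)
    cost = Cost()
    for taken in outcomes:
        if taken != guess:
            cost.mispredicts += 1
        if taken and use_branch_unlikely:
            # The trampoline branches back to the original target, 10$.
            cost.extra_branches += 1
    return cost


trace = [True] * 5 + [False] * 95  # branch taken 5 times in 100
print(run(trace, use_branch_unlikely=False))  # Cost(mispredicts=95, extra_branches=0)
print(run(trace, use_branch_unlikely=True))   # Cost(mispredicts=5, extra_branches=5)
```

For a rarely taken branch, the directive trades 95 mispredictions for 5 mispredictions plus 5 extra unconditional branches; for a frequently taken branch the balance reverses, which is why the directive calls for caution.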
4.2.5 Forward Jumps into Loops
Because of the way that the compiler follows the code flow, a particular case that may not compile well is a forward unconditional branch into a loop. The code generated for this case usually splits the loop into two widely separated pieces. For example, consider the following macro coding construct:
The macro compiler will follow the BRB instruction when generating the code flow and will then fall through the subsequent conditional branch to 10$. However, because the code at 10$ was skipped over by the BRB instruction, it will not be generated until after the end of the routine. This will convert the conditional branch into a forward branch instead of a backward branch. The generated code layout will look like the following:
This results in the loop being very slow because the branch to 10$ is always predicted not taken, and the code flow has to keep going back and forth between the two locations. This situation can be fixed by inserting a .BRANCH_LIKELY directive before the conditional branch back to 10$. This will result in the following code flow:
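The cost of the split loop can be counted with a small Python model. It assumes only that, without the directive, the loop-closing branch to 10$ is a forward branch predicted not taken, and that with .BRANCH_LIKELY it is effectively predicted taken; it does not try to reproduce the compiler's actual code layout.

```python
def loop_mispredicts(iterations: int, branch_likely: bool) -> int:
    """Count mispredictions of the loop-closing conditional branch to 10$.

    After the BRB into the loop, 10$ is placed out of line, so the branch
    back to it is a forward branch (predicted not taken) unless
    .BRANCH_LIKELY marks it as likely to be taken.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    predicted = branch_likely  # True means "predicted taken"
    mispredicts = 0
    for i in range(iterations):
        taken = i < iterations - 1  # branch back on every pass except the last
        if taken != predicted:
            mispredicts += 1
    return mispredicts


print(loop_mispredicts(100, branch_likely=False))  # 99
print(loop_mispredicts(100, branch_likely=True))   # 1 (at loop exit)
```

Without the directive, every pass except the last is mispredicted; with it, only the final exit from the loop is.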
4.3 Code Optimization
The MACRO-32 compiler performs several optimizations on the generated code. By default, it performs all of them except VAXREGS. You can change these defaults with the /OPTIMIZE qualifier on the command line. The valid options are:
4.3.1 Using the VAXREGS Optimization
To use the VAXREGS optimization, you must ensure that all routines correctly declare their register usage in their .CALL_ENTRY, .JSB_ENTRY, or .JSB32_ENTRY routine declarations. In addition, you must identify any VAX registers that are required or modified by any routines that are called. By default, the compiler assumes that no VAX registers are required as input to any called routine, and that all VAX registers except R0 and R1 are preserved across the call. To declare this usage, use the READ and WRITTEN qualifiers to the compiler directive .SET_REGISTERS. For example:
In this example, the compiler will assume that R3 and R4 are required inputs to the routine DO_SOMETHING_USEFUL, and that R5 is overwritten by the routine. You can derive these values from the entry declaration of DO_SOMETHING_USEFUL: use its input mask as the READ qualifier, and the combined output and scratch masks as the WRITTEN qualifier.
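The derivation of the qualifiers from an entry declaration can be sketched in Python. The helper names are hypothetical, and the generated `.SET_REGISTERS READ=<...>, WRITTEN=<...>` text assumes the usual MACRO-32 angle-bracket list syntax, so check it against the directive's reference description. The sketch also shows which written registers the compiler's default assumption (only R0 and R1 modified) would miss.

```python
from typing import FrozenSet, Iterable, Tuple

# Default assumption when nothing is declared: no inputs, only R0/R1 modified.
DEFAULT_WRITTEN: FrozenSet[str] = frozenset({"R0", "R1"})


def set_registers_for(
    input_mask: Iterable[str],
    output_mask: Iterable[str],
    scratch_mask: Iterable[str],
) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """READ is the input mask; WRITTEN is the union of output and scratch."""
    read = frozenset(input_mask)
    written = frozenset(output_mask) | frozenset(scratch_mask)
    return read, written


def format_set_registers(read: FrozenSet[str], written: FrozenSet[str]) -> str:
    def fmt(regs: FrozenSet[str]) -> str:
        # Sort Rn names numerically so R10 follows R9, not R1.
        return "<" + ",".join(sorted(regs, key=lambda r: int(r[1:]))) + ">"

    return f".SET_REGISTERS READ={fmt(read)}, WRITTEN={fmt(written)}"


# Hypothetical masks matching the DO_SOMETHING_USEFUL description.
read, written = set_registers_for(["R3", "R4"], ["R5"], [])
print(format_set_registers(read, written))
# .SET_REGISTERS READ=<R3,R4>, WRITTEN=<R5>
print(sorted(written - DEFAULT_WRITTEN))  # ['R5'] would be wrongly assumed preserved
```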
4.4 Common-Based Referencing
On an Alpha system, references to data cells generally require two memory references: one to load the data cell address from the linkage section and another to the data cell itself. If several data cells are located in proximity to one another, and the ADDRESSES optimization is used, the compiler can load a register with a common base address and then reference the individual data cells as offsets from that base address. This eliminates the load of each individual data cell address and is known as common-based referencing.
The compiler performs this optimization automatically for local data psects when the ADDRESSES optimization is turned on. The compiler generates symbols of the form $PSECT_BASEn to use as the base of a local psect.
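The saving is simple arithmetic. This Python sketch counts linkage-section entries and memory references needed to touch each of N data cells once, assuming every cell lies within displacement range of the common base.

```python
from typing import Dict


def memory_refs(num_cells: int, common_base: bool) -> Dict[str, int]:
    """Count linkage entries and memory references for num_cells data cells.

    Without common-based referencing: each cell needs one load of its address
    from the linkage section plus one reference to the cell itself.
    With it: one load of the common base address, then one reference per cell
    as an offset from that base.
    """
    if num_cells < 0:
        raise ValueError("num_cells must be non-negative")
    if common_base:
        linkage_entries = 1 if num_cells else 0
    else:
        linkage_entries = num_cells
    return {"linkage_entries": linkage_entries,
            "memory_refs": linkage_entries + num_cells}


print(memory_refs(4, common_base=False))  # {'linkage_entries': 4, 'memory_refs': 8}
print(memory_refs(4, common_base=True))   # {'linkage_entries': 1, 'memory_refs': 5}
```

For four cells, the model gives 8 references without common-based referencing and 5 with it, a saving of three.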
To use common-based referencing for external data psects, you must create a prefix file which defines symbols as offsets from a common base. The prefix file cannot be used when assembling the module for OpenVMS VAX because the VAX MACRO assembler does not allow symbols to be defined as offsets from external symbols.
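Because a prefix file is just a list of symbol definitions, it can be generated from the layout of the data psect. This Python sketch is a hypothetical generator: it assumes the cells are laid out contiguously from the base symbol in the given order with the given sizes (no alignment padding) and emits MACRO-32 direct-assignment lines of the form `A = BASE+0`. The four-longword layout in the usage line is illustrative only.

```python
from typing import List, Tuple


def prefix_file_lines(base: str, cells: List[Tuple[str, int]]) -> List[str]:
    """Emit one 'SYMBOL = BASE+offset' line per data cell.

    cells: (symbol, size_in_bytes) pairs in the order they are laid out
    in the data psect, starting at the base symbol. Offsets are decimal.
    """
    lines: List[str] = []
    offset = 0
    for name, size in cells:
        if size <= 0:
            raise ValueError(f"invalid size for {name}: {size}")
        lines.append(f"{name} = {base}+{offset}")
        offset += size  # next cell starts right after this one
    return lines


for line in prefix_file_lines("BASE", [("A", 4), ("B", 4), ("C", 4), ("D", 4)]):
    print(line)
# A = BASE+0
# B = BASE+4
# C = BASE+8
# D = BASE+12
```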
The following example illustrates the benefits of creating a prefix file to use common-based referencing. It shows:
Consider the following simple code section (CODE.MAR), which refers to data cells in another module (DATA.MAR):
When compiling CODE.MAR without using common-based referencing, the following code is generated:
In the linkage section:
In the code section (not including the prologue/epilogue code):
By creating a prefix file that defines external data cells as offsets from a common base address, you can cause the compiler to use common-based referencing for external references. A prefix file for this example, which defines A, B, C, and D in terms of BASE, follows:
When compiling CODE.MAR using this prefix file and the ADDRESSES optimization, the following code is generated:
In the linkage section:
In the code section (not including the prologue/epilogue code):
In this example, common-based referencing shrinks the size of both the code and the linkage sections and eliminates three memory references. This method of creating a prefix file to enable common-based referencing of external data cells can be useful if you have one large, separate module that defines a data area used by many modules.
| Feature | Description |
|---|---|
| $SETUP_CALL64 | New macro that initializes the call sequence. |
| $PUSH_ARG64 | New macro that does the equivalent of argument pushes. |
| $CALL64 | New macro that invokes the target routine. |
| $IS_32BITS | New macro for checking the sign extension of the low 32 bits of a 64-bit value. |
| $IS_DESC64 | New macro for determining whether a descriptor is a 64-bit format descriptor. |
| QUAD=NO/YES | New parameter for page macros to support 64-bit virtual addresses. |
| /ENABLE=QUADWORD | The QUADWORD parameter was extended to include 64-bit address computations. |
| .CALL_ENTRY QUAD_ARGS=TRUE\|FALSE | QUAD_ARGS=TRUE\|FALSE is a new parameter that indicates the presence (or absence) of quadword references to the argument list. |
| .ENABLE QUADWORD/.DISABLE QUADWORD | The QUADWORD parameter was extended to include 64-bit address computations. |
| EVAX_SEXTL | New built-in for sign extending the low 32 bits of a 64-bit value into a destination. |
| EVAX_CALLG_64 | New built-in to support 64-bit calls with variable-size argument lists. |
| $RAB64 and $RAB64_STORE | New RMS macros for using buffers in 64-bit address space. |
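Both $IS_32BITS and EVAX_SEXTL rest on one operation: sign-extending the low 32 bits of a quadword. The Python model below illustrates that operation on unsigned 64-bit integers; it mirrors the descriptions in the table, not the actual macro or built-in expansions, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sextl(value: int) -> int:
    """Sign-extend the low 32 bits of value to 64 bits (returned unsigned)."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:      # bit 31 set: the 32-bit value is negative
        low -= 1 << 32
    return low & MASK64


def is_32bits(value: int) -> bool:
    """True if the quadword equals the sign extension of its low 32 bits."""
    v = value & MASK64
    return sextl(v) == v


print(hex(sextl(0x80000000)))         # 0xffffffff80000000
print(is_32bits(0xFFFFFFFF80000000))  # True
print(is_32bits(0x0000000080000000))  # False: bit 31 set but upper bits clear
```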
The method that you use for passing 64-bit values depends on whether the size of the argument list is fixed or variable. These methods are described in the following sections.
5.3.1 Calls with a Fixed-Size Argument List
For calls with a fixed-size argument list, use the new macros shown in Table 5-2.
| Step | Macro |
|---|---|
| 1. Initialize the call sequence | $SETUP_CALL64 |
| 2. "Push" the call arguments | $PUSH_ARG64 |
| 3. Invoke the target routine | $CALL64 |
An example of using these macros follows. Note that the arguments are pushed in reverse order, the same order in which PUSHL instructions are used for a 32-bit call.
```
        MOVL    8(AP), R5               ; fetch a longword to be passed
        $SETUP_CALL64 3                 ; Specify three arguments in call
        $PUSH_ARG64 8(R0)               ; Push argument #3
        $PUSH_ARG64 R5                  ; Push argument #2
        $PUSH_ARG64 #8                  ; Push argument #1
        $CALL64 some_routine            ; Call the routine
```
The $SETUP_CALL64 macro initializes the state for a 64-bit call. It is required before $PUSH_ARG64 or $CALL64 can be used. If the number of arguments is greater than six, this macro creates a local JSB routine, which is invoked to perform the call. Otherwise, the argument loads and the call are inline and very efficient. Note that the argument count specified in $SETUP_CALL64 does not include a pound sign (#). (The standard call sequence requires octaword alignment of the stack with its arguments at the top. The JSB routine facilitates this alignment.)
The inline option can be used to force a call with more than six arguments to be done without a local JSB routine. However, there are restrictions on its use (see Appendix E).
The $PUSH_ARG64 macro moves the argument directly to the correct argument register or stack location. It is not actually a stack push, but it is the analog of the PUSHL instructions used in a 32-bit call.
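The following Python sketch models where each argument ends up and when $SETUP_CALL64 would generate a local JSB routine. The register assignment (arguments 1 to 6 in R16 to R21, later arguments in quadwords starting at 0(SP)) and the rounding of the stack area to an octaword are assumptions based on the OpenVMS Alpha calling standard, not details stated in this section; `plan_call64` and its `inline` flag are hypothetical names.

```python
from typing import Dict, List, Union

# Assumed OpenVMS Alpha argument registers for arguments 1 through 6.
ARG_REGS = ["R16", "R17", "R18", "R19", "R20", "R21"]


def plan_call64(nargs: int, inline: bool = False) -> Dict[str, Union[bool, int, List]]:
    """Model argument placement for a fixed-size 64-bit call."""
    if nargs < 0:
        raise ValueError("argument count must be non-negative")
    locations: List[str] = []
    for n in range(1, nargs + 1):
        if n <= len(ARG_REGS):
            locations.append(ARG_REGS[n - 1])
        else:
            # Argument 7 at 0(SP), argument 8 at 8(SP), and so on (assumed).
            locations.append(f"{8 * (n - 7)}(SP)")
    stack_args = max(0, nargs - len(ARG_REGS))
    stack_bytes = (stack_args * 8 + 15) // 16 * 16  # round up to an octaword
    return {
        "uses_jsb_routine": nargs > 6 and not inline,
        "locations": locations,
        "push_order": list(range(nargs, 0, -1)),  # pushed last argument first
        "stack_bytes": stack_bytes,
    }


p = plan_call64(3)
print(p["locations"], p["push_order"], p["uses_jsb_routine"])
# ['R16', 'R17', 'R18'] [3, 2, 1] False
```

For a three-argument call such as the some_routine example, the model keeps everything in registers and needs no JSB routine; with eight arguments, two quadwords go on the stack and the stack area is rounded up to 16 bytes.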
The $CALL64 macro sets up the argument count register and invokes the target routine. If a JSB routine was created, $CALL64 also ends that JSB routine. It reports an error if the number of arguments pushed does not match the count specified in $SETUP_CALL64. Both $CALL64 and $PUSH_ARG64 check that $SETUP_CALL64 has been invoked prior to their use.
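The checks that $CALL64 and $PUSH_ARG64 perform at assembly time can be modeled as a small state machine. This Python sketch is hypothetical (the class and method names are illustrative); it reports the same two kinds of error described above: use before $SETUP_CALL64, and a mismatch between the pushed arguments and the declared count.

```python
from typing import List, Optional


class Call64Error(Exception):
    """Stands in for the assembly-time error reported by the macros."""


class Call64Sequence:
    """Model of the bookkeeping done by $SETUP_CALL64, $PUSH_ARG64, $CALL64."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None  # None means no $SETUP_CALL64 yet
        self.operands: List[str] = []

    def setup(self, count: int) -> None:
        # $SETUP_CALL64 count: start a sequence expecting `count` arguments.
        if count < 0:
            raise Call64Error("argument count must be non-negative")
        self.expected = count
        self.operands = []

    def push(self, operand: str) -> None:
        # $PUSH_ARG64 operand: legal only after $SETUP_CALL64.
        if self.expected is None:
            raise Call64Error("$PUSH_ARG64 used before $SETUP_CALL64")
        self.operands.append(operand)

    def call(self, routine: str) -> str:
        # $CALL64 routine: verify setup and argument count, then end the sequence.
        if self.expected is None:
            raise Call64Error("$CALL64 used before $SETUP_CALL64")
        pushed = len(self.operands)
        if pushed != self.expected:
            raise Call64Error(f"{pushed} argument(s) pushed, {self.expected} expected")
        self.expected = None  # the next call needs a new $SETUP_CALL64
        return f"call {routine} with {pushed} argument(s)"


seq = Call64Sequence()
seq.setup(3)
for op in ("8(R0)", "R5", "#8"):  # reverse order, as in the example above
    seq.push(op)
print(seq.call("some_routine"))  # call some_routine with 3 argument(s)
```

Branching around a $PUSH_ARG64 would show up in this model as a count mismatch at the $CALL64, which is one reason the macros must form a straight-line sequence.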
5.3.1.1 Usage Notes for $SETUP_CALL64, $PUSH_ARG64, and $CALL64
Keep these points in mind when using $SETUP_CALL64, $PUSH_ARG64, and $CALL64:
The $SETUP_CALL64, $PUSH_ARG64, and $CALL64 macros are intended to be used in an inline sequence. That is, you cannot branch into the middle of a $SETUP_CALL64/$PUSH_ARG64/$CALL64 sequence, nor can you branch around $PUSH_ARG64 macros or branch out of the sequence to avoid the $CALL64.
For more information about $SETUP_CALL64, $PUSH_ARG64, and $CALL64, see Appendix E.