HP OpenVMS Systems Documentation
Getting the Most out of Your Processor
This white paper tells image-providers (ISVs or product developers) how they can take advantage of some of the performance-enhancing features of newer Alpha processors.
The paper contains the following major sections:
1 Taking Advantage of Performance-Enhancing Features
Each generation of Alpha processors has different characteristics. Applications built to run on one generation of processors continue to run on newer generations. However, newer generations of processors are usually more capable and perform better than previous generations. By rebuilding applications for newer processors, you can take advantage of their new performance-enhancing capabilities and features.
The following two sections contain general descriptions of some of these new features. For more details, refer to the Alpha Architecture Handbook, which is available at the following web site:
Starting with the Alpha EV56 CPU, Alpha processors have added new groups of instructions, called extended instructions. The first group of extended instructions was implemented on EV56. EV6 and EV67 have additional groups of extended instructions. This paper will refer to processors with extended instructions as being "more capable" than processors with fewer or no extended instructions.
Extended instructions can improve performance for different types of workloads. For example:
1 The common names and chip types of the processors referred to in this document are as follows: EV4: 21064, EV5: 21164, EV56: 21164A, EV6: 21264, EV67: 21264A
If you execute an extended instruction on a processor that does not implement it, the processor generates an exception. Depending on the instruction, the exception either causes the program to fail or causes OpenVMS to emulate the instruction in software, which is much slower.
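As an illustration of the kind of code that benefits from extended instructions, consider a loop that loads and stores individual bytes. The EV56 added byte and word load and store instructions; on EV4 and EV5 processors, a compiler must build each byte access from quadword loads and stores plus extract, insert, and mask instructions. The C function below is a minimal sketch with a hypothetical name; compiled with /ARCHITECTURE=EV56 or later, its single-byte accesses can use the byte instructions directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: convert an ASCII buffer to upper case in place.
 * Each iteration does one byte load and at most one byte store.  Without
 * byte instructions (EV4, EV5), each byte access is built from quadword
 * loads/stores plus extract, insert, and mask instructions; with
 * /ARCHITECTURE=EV56 or later the compiler can use byte loads/stores. */
void ascii_upcase(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (c >= 'a' && c <= 'z')
            buf[i] = (char)(c - 'a' + 'A');
    }
}
```

The source is identical for every target; only the generated instructions differ with the /ARCHITECTURE= value.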
Different generations of Alpha processors operate more efficiently with different sequences of instructions. For example, EV5 and EV6 processors can work on different numbers of independent instructions at the same time.
When you rebuild an application for a specific processor, the compiler
automatically optimizes the instruction sequences for that processor.
An instruction sequence optimized for an EV6 processor, for example,
gathers and orders independent instructions in larger groups than for
an EV5 processor. Ordering of instructions is called
instruction scheduling. A program scheduled for one processor still
runs correctly on a different processor, but it might run more slowly
there than a program scheduled for that processor.
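The C fragment below, a hypothetical example, shows the kind of independent work a scheduler can reorder. Within each iteration, the four additions do not depend on one another, so the compiler is free to group them; how it orders the resulting Alpha instructions depends on the processor it schedules for, while the source stays the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum an array with four independent partial sums.
 * The four additions in each iteration have no dependencies on one
 * another, so an instruction scheduler may issue them together. */
long sum4(const long *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    assert(a != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

For example, sum4 applied to {1, 2, 3, 4, 5, 6, 7} returns 28 whichever processor the code was scheduled for; only the speed differs.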
To take advantage of the performance gains possible with extended instructions and instruction scheduling, image-providers can tell Alpha compilers to generate code using extended instructions or instruction scheduling, or both. Thus, they can ship images compiled for specific processors.
The following descriptions are very general. For more detailed explanations, refer to specific compiler manuals.
When compiling an application, you have three choices:
You select one of these methods with a switch on the compile
command. The switches are explained in the next section.
The compiler makes all the changes needed to optimize an image for a particular processor. Most Alpha compilers, including Compaq C, C++, COBOL, and Fortran, have switches that determine how the compiler optimizes an image for a processor. The following table summarizes the performance of applications compiled with these switches.
1 Acceptable values for processor include GENERIC, EV4, EV5, EV56, EV6, EV67, and so on. You can also specify HOST to indicate the processor on which the compiler is currently running. For all acceptable values, refer to the documentation for specific compilers.
2 OpenVMS supplies an instruction emulator for the most common extended instructions, so some extended instructions work on older platforms, but much more slowly. (Compaq recommends using the methods described in this document to avoid executing extended instructions on less capable processors.)
The following sections describe the three switch options in more detail.
If you use the /ARCHITECTURE=GENERIC switch when you rebuild an
application, the compiler creates a generic object module. When linked
into an image, this module runs well on any processor. Because you have
only a single image, applications are less complex and kits are
smaller. You often see performance improvements even when you compile
generically, simply because you are using a newer version of the
compiler.
When you use the /OPTIMIZE=TUNE= switch, the compiler optimizes the generated code for the specified processor, while still allowing for acceptable performance on less capable processors. If you tune an application for an EV6 processor, for example, the application also performs well on an EV5 processor. If the compiler generates any extended instructions, it provides alternative code for processors that do not have the instructions.
You still produce only one set of images, which results in less
complexity and a smaller kit size. However, in some cases different
processors execute different code paths within the image, so testing
on only one processor does not exercise all of them.
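The compiler generates and selects the alternative code paths itself, so there is nothing to write by hand. Purely to illustrate why one tuned image can contain several paths, the following C sketch models the idea: one entry point, a path that touches one byte at a time (standing in for byte instructions), and a fallback that loads eight bytes at once and extracts each byte with shifts. The cpu_has_bwx flag and all function names are hypothetical; the flag replaces whatever check the compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL flag standing in for the processor check in tuned code.
 * It is not an OpenVMS or compiler interface. */
int cpu_has_bwx = 1;

/* Path 1: one byte load per element, as with byte instructions. */
static size_t count_bytes_bwx(const unsigned char *p, size_t n, unsigned char b)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (p[i] == b);
    return count;
}

/* Path 2: load 8 bytes at a time and extract each byte with shifts,
 * roughly what code without byte instructions must do. */
static size_t count_bytes_quad(const unsigned char *p, size_t n, unsigned char b)
{
    size_t count = 0, i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t q;
        memcpy(&q, p + i, sizeof q);          /* one 8-byte load */
        for (int k = 0; k < 8; k++)
            count += (((q >> (8 * k)) & 0xFF) == b);
    }
    for (; i < n; i++)                         /* remaining tail bytes */
        count += (p[i] == b);
    return count;
}

/* One entry point; the path taken depends on the processor. */
size_t count_bytes(const unsigned char *p, size_t n, unsigned char b)
{
    assert(p != NULL || n == 0);
    return cpu_has_bwx ? count_bytes_bwx(p, n, b)
                       : count_bytes_quad(p, n, b);
}
```

A test run that leaves the flag set one way exercises only one path, which mirrors the testing caution above: both settings must be checked to cover the whole image.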
By using the /ARCHITECTURE=<processor> switch, you specify the least capable processor that the compiler will consider when it generates instructions. Code compiled with this switch might run poorly, or not at all, on less capable processors. On the specified processor and on more capable processors, it performs the same as or better than code compiled with the /ARCHITECTURE=GENERIC switch.
You can think of the /ARCHITECTURE=<processor> switch as a way of telling the compiler that it can use the new instructions provided by the specified processor. If you specify this switch either without /OPTIMIZE=TUNE= or with /OPTIMIZE=TUNE= for the same processor, you produce the fastest possible code for that processor; the tradeoff is that the code might run poorly, or not at all, on less capable processors.
If you decide to use the /ARCHITECTURE=<processor> switch, you usually need to supply several different images, and the kit you supply will be larger. You also have more images and more combinations of images to test.
You do not need to supply separate images for every processor on which
the application might run. If the application uses many byte
instructions, for example, you could build images for the EV56
processor and also run them on an EV6 processor, which implements the
same byte instructions.
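The rule in this section reduces to a comparison of capability ranks. The sketch below assumes the simplified linear ordering EV4, EV5, EV56, EV6, EV67 that this paper uses, treats GENERIC as equivalent to EV4, and uses hypothetical function names. It answers whether an image built with /ARCHITECTURE=<built_for> uses only instructions that <cpu> implements.

```c
#include <assert.h>
#include <string.h>

/* Simplified ordering from less to more capable, as used in this paper. */
static const char *const procs[] = { "EV4", "EV5", "EV56", "EV6", "EV67" };

/* Hypothetical helper: capability rank of a processor name, -1 if unknown.
 * GENERIC code uses no extended instructions, so it ranks with EV4. */
static int proc_rank(const char *name)
{
    if (strcmp(name, "GENERIC") == 0)
        return 0;
    for (int i = 0; i < (int)(sizeof procs / sizeof procs[0]); i++)
        if (strcmp(name, procs[i]) == 0)
            return i;
    return -1;
}

/* Nonzero if code built with /ARCHITECTURE=built_for uses only
 * instructions that cpu implements, i.e. cpu is at least as capable. */
int can_run(const char *built_for, const char *cpu)
{
    int need = proc_rank(built_for), have = proc_rank(cpu);
    assert(need >= 0 && have >= 0);
    return have >= need;
}
```

With this model, can_run("EV56", "EV6") is true, matching the example above, while can_run("EV6", "EV56") is false.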
The following table summarizes the tradeoffs of using different switch options.
1.2.5 Combining the /ARCHITECTURE=<processor> and /OPTIMIZE=TUNE=<processor> Switches
So far, this paper has implied that you would specify either the /ARCHITECTURE= or the /OPTIMIZE=TUNE= switch. In fact, you can use these two switches together. When you use both switches, the /OPTIMIZE=TUNE=<processor1> switch specifies that the compiled code will perform best on <processor1>, and /ARCHITECTURE=<processor2> acts as a qualifier to /OPTIMIZE=TUNE=, saying that the compiler does not need to consider a processor less capable than <processor2>.
If you do not specify /ARCHITECTURE=, the switch value defaults to GENERIC. If you do not specify /OPTIMIZE=TUNE= and do specify /ARCHITECTURE=<processor>, the processor for /OPTIMIZE=TUNE= defaults to the processor specified for /ARCHITECTURE=. The following table illustrates these defaults:
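The two defaulting rules can be expressed as a small function. In this hypothetical sketch a NULL argument means the switch was omitted; we assume that when neither switch is given, TUNE also ends up as GENERIC, which follows from applying the two rules in order.

```c
#include <assert.h>
#include <stddef.h>

/* Effective values of the two switches after defaults are applied. */
struct switches {
    const char *arch;   /* /ARCHITECTURE= value */
    const char *tune;   /* /OPTIMIZE=TUNE= value */
};

/* Hypothetical helper.  NULL means "switch not specified".
 * Rule 1: /ARCHITECTURE= defaults to GENERIC.
 * Rule 2: /OPTIMIZE=TUNE= defaults to the /ARCHITECTURE= value. */
struct switches apply_defaults(const char *arch, const char *tune)
{
    struct switches s;
    s.arch = (arch != NULL) ? arch : "GENERIC";
    s.tune = (tune != NULL) ? tune : s.arch;
    return s;
}
```

For instance, specifying only /ARCHITECTURE=EV6 yields arch EV6 and tune EV6, while specifying only /OPTIMIZE=TUNE=EV6 yields arch GENERIC and tune EV6.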
Combining switches provides additional options if you know, or are willing to assume, the target processors on which your application will run. Two examples follow.
The discussions that follow seldom mention combining switches. The
discussion of /OPTIMIZE=TUNE= or tuned images assumes that the
/ARCHITECTURE= switch is not specified or is specified as
/ARCHITECTURE=GENERIC. The discussion of the /ARCHITECTURE= switch or
of Extension-Specific Images assumes that /OPTIMIZE=TUNE= is not
specified or is specified as the same processor. However, it is
possible, and sometimes helpful, to tune for a processor more capable
than the one specified in /ARCHITECTURE=. Remember that whenever you
specify an architecture other than GENERIC, you create an
Extension-Specific Image (ESI), even if you also specify /OPTIMIZE=TUNE=.
The following table describes the three types of images produced by using the switches described in the last section.
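Following the rule that any architecture other than GENERIC produces an ESI, the type of image depends only on the effective switch values. The classification can be sketched as follows; the names are hypothetical, and defaults are assumed to be applied already, so GENERIC is passed explicitly.

```c
#include <assert.h>
#include <string.h>

enum image_type { IMAGE_GENERIC, IMAGE_TUNED, IMAGE_ESI };

/* Hypothetical helper: classify a compilation by its effective
 * /ARCHITECTURE= and /OPTIMIZE=TUNE= values. */
enum image_type classify(const char *arch, const char *tune)
{
    assert(arch != NULL && tune != NULL);
    if (strcmp(arch, "GENERIC") != 0)
        return IMAGE_ESI;      /* regardless of the TUNE value */
    if (strcmp(tune, "GENERIC") != 0)
        return IMAGE_TUNED;    /* generic architecture, tuned scheduling */
    return IMAGE_GENERIC;
}
```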
To help you decide the types of images you need to supply to your customers, ask yourself these questions:
You must combine the "biases" that the diagrams indicate for
your applications to decide whether to provide a generic image, a tuned
image, ESIs, or a combination of /OPTIMIZE=TUNE= and /ARCHITECTURE=
switches as described in Section 1.2.5. If you decide to create ESIs,
the following sections provide guidelines for creating them.
The following illustration shows an overview of the operations you must perform at your site and the customer site to create and run ESIs for EV56 or EV6 processors as well as generic images for less capable processors.
The example uses EV56 and EV6 processors. However, other processor types could be substituted for these two types.
The numbers in the following list correspond to numbers in the figure. Actions 1 and 2 occur at the image-provider's site. Action 3 takes place at the customer's site.
In this section, any reference to generic also applies to tuned.
Once you decide to use ESIs, you might have the following questions.
2 Creating and Activating ESIs
Providers of ESIs need to decide which ESIs to supply and then use OpenVMS tools to define which image customers run on each type of system. Image-providers need to follow the steps in the following table.
The following sections explain these steps.
Review the reasons for using compiler switches in Section 1.2. To compile code modules that use the extended instructions of a specific processor, enter a command similar to the following, which compiles a C module for an EV6 processor:
You do not need to use a special link command in order for an image to be an ESI. You simply link the object modules that you have compiled as you usually do. Remember that even if you include only one extended object module with a large number of generic or tuned object modules, the result is still an ESI.
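Because a single extended object module is enough to make the image an ESI, the processor an image requires is the most demanding /ARCHITECTURE= value among its modules. The sketch below models only that rule, not how the linker records the information; the names and the simplified ordering are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* /ARCHITECTURE= values from least to most demanding (simplified). */
static const char *const order[] = { "GENERIC", "EV4", "EV5", "EV56", "EV6", "EV67" };

static int arch_rank(const char *arch)
{
    for (int i = 0; i < (int)(sizeof order / sizeof order[0]); i++)
        if (strcmp(arch, order[i]) == 0)
            return i;
    return -1;
}

/* Hypothetical helper: the /ARCHITECTURE= value that governs a linked
 * image is the most demanding value among its object modules.  Any
 * result other than GENERIC means the image is an ESI. */
const char *image_arch(const char *const *modules, size_t n)
{
    int best = 0;                         /* start at GENERIC */
    for (size_t i = 0; i < n; i++) {
        int r = arch_rank(modules[i]);
        assert(r >= 0);                   /* unknown value: caller error */
        if (r > best)
            best = r;
    }
    return order[best];
}
```

Linking three generic modules with one EV6 module, for example, gives an image that requires an EV6.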
If you are explicitly linking against shareable images, you must link against the generic shareable image and not against the shareable ESI. If you do not link against the generic shareable image, the OpenVMS system will be unable to select the correct image to run on the customer's system.
Suppose, for example, that you want to link TEST.OBJ against TESTSHR.EXE, and you have the following shareable images on your (the image-provider's) system:
Examples of correct link commands against shareable images follow.
To distinguish among multiple images for different processors, image-providers need to add a short, distinguishing string to the end of the name; for example:
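A helper for forming such names might look like the following. The underscore separator and processor-name suffix are assumptions taken from the TEST_EV6 and TESTSHR_EV6 names used elsewhere in this paper, and the function name is hypothetical. It builds the name that a logical name would point to; the image file itself would carry the .EXE file type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build the ESI name BASE_PROC (e.g. TEST_EV6)
 * into buf; the image file would be BASE_PROC.EXE.  Returns 0 on
 * success, or -1 if the name does not fit (buf is then truncated). */
int esi_name(char *buf, size_t size, const char *base, const char *proc)
{
    assert(buf != NULL && base != NULL && proc != NULL);
    int n = snprintf(buf, size, "%s_%s", base, proc);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```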
Here are some rules for naming images:
Here are rules for supplying images:
2.4 Choosing the Correct Image to Activate
No new functionality has been added to OpenVMS systems to support ESIs. Instead, image-providers must define logical names on the customer's system. The logical names you define depend on the customer's processor type.
Defining logical names causes OpenVMS to activate the correct ESI when a customer runs an application. (Typically, image-providers define logical names in a startup command file in SYS$STARTUP, which runs either when the system is booted or when the product is initialized.)
For example, to ensure that OpenVMS activates TEST_EV6.EXE when a customer runs TEST on an EV6 processor, define the logical names for the EV6 processor as follows:
2.4.1 Selection of the Correct Image
Defining system-wide ESI logical names like the ones described in the last section causes OpenVMS to select the correct ESI under most circumstances. For example, OpenVMS selects the correct ESI under the following circumstances:
In the last three cases, you must specify the image alone, without a directory specification. If you need to include a directory specification, do so in the logical name definition. Then use only the image name in the RUN command, for example:
If you define a logical name for a shareable image that might be called
either from a protected shareable image or from a main image installed
with privileges, you must define the logical name in executive mode.
Be careful when you build images, or any other part of your product, on an OpenVMS system where logical names for ESIs are already defined. (Section 2.4 explains how to define logical names.) If you specify only a name, without the other components of a file specification, OpenVMS tries to perform a logical name translation on the name. Thus, when you link a generic image on a system that has an ESI logical name defined for that image, you must specify more than just the base name so that OpenVMS does not substitute the equivalence name. For example, the following syntax is correct:
If, instead of TEST.EXE, you enter TEST, OpenVMS attempts to translate
TEST and might create an image with the wrong name.
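The following C sketch is a simplified model of the behavior just described, not of the actual RMS parsing rules: a specification containing device, directory, file type, or version punctuation is used as typed, while a bare name is looked up in a hypothetical table of logical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL stand-in for system logical names defined by a startup
 * procedure on an EV6 system. */
struct logical { const char *name; const char *equiv; };

static const struct logical table[] = {
    { "TEST",    "TEST_EV6" },
    { "TESTSHR", "TESTSHR_EV6" },
};

/* Simplified model: only a bare name is subject to translation. */
const char *resolve(const char *spec)
{
    assert(spec != NULL);
    if (strpbrk(spec, ":[<.;") != NULL)
        return spec;                  /* has other components: use as typed */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(spec, table[i].name) == 0)
            return table[i].equiv;    /* bare name matched a logical */
    return spec;
}
```

In this model resolve("TEST") yields TEST_EV6, which is how a link command naming only TEST could pick up the ESI name, whereas resolve("TEST.EXE") is left unchanged.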
The following DCL code tests for the processor type and defines the appropriate ESI logicals. You can place code like this in the product's startup command file in SYS$STARTUP.
This code compares the current processor against a list of possible processors. The order of processors in the list is from less to more capable, such that an ESI for one processor works on subsequent processors in the list.
After the code tests for a processor on the list, it creates a logical name for each ESI that you have supplied for that processor. If you have supplied ESIs for processors earlier in the list, the code overwrites their logicals.
Thus, when the code completes, the logical for each image points to the most capable corresponding ESI that can run on the current processor.
To customize this code for your application, for each processor section delimited by a "Do defines" comment, place a define command for every image you supplied for that processor. The example code assumes that you have supplied ESIs for the processors shown in the following table. (Note that only one ESI has been supplied for EV67.)
When you reach the label "END" in the code, a logical will have been defined to point to the most capable image that will run well on the current processor. For example, if you run the code on an EV67 processor, the example code defines TEST to be TEST_EV67 and TESTSHR to be TESTSHR_EV6.
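The selection logic of the DCL procedure can be modeled in C to make the overwrite order explicit. This is a sketch, not the startup procedure itself, and the kit contents are hypothetical, chosen to reproduce the result just described: TEST_EV6 and TESTSHR_EV6 for the EV6, and a single ESI, TEST_EV67, for the EV67. select_image walks the processor list from less to more capable, applies each processor's defines, and stops after the current processor; a NULL result means no logical is defined, so the generic image runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Processors from less to more capable, as in the startup code. */
static const char *const order[] = { "EV4", "EV5", "EV56", "EV6", "EV67" };
#define NPROC (sizeof order / sizeof order[0])

/* HYPOTHETICAL kit contents: which ESI each processor's defines point to. */
struct esi { const char *proc; const char *logical; const char *image; };
static const struct esi supplied[] = {
    { "EV6",  "TEST",    "TEST_EV6"    },
    { "EV6",  "TESTSHR", "TESTSHR_EV6" },
    { "EV67", "TEST",    "TEST_EV67"   },
};
#define NESI (sizeof supplied / sizeof supplied[0])

/* Return the image a logical points to on cpu, or NULL (generic image).
 * A later processor's define overwrites an earlier one. */
const char *select_image(const char *cpu, const char *logical)
{
    const char *result = NULL;

    assert(cpu != NULL && logical != NULL);
    for (size_t p = 0; p < NPROC; p++) {
        for (size_t e = 0; e < NESI; e++)     /* "Do defines" for order[p] */
            if (strcmp(supplied[e].proc, order[p]) == 0 &&
                strcmp(supplied[e].logical, logical) == 0)
                result = supplied[e].image;
        if (strcmp(order[p], cpu) == 0)
            return result;                    /* reached current processor */
    }
    return NULL;                              /* unknown processor: no ESI */
}
```

On an EV67 this gives TEST_EV67 for TEST and TESTSHR_EV6 for TESTSHR; on an EV56 it defines nothing, so both generic images run.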