The Question is:
Analysing exception information for HPARITH
We are using mathematical models written in Fortran on OpenVMS AXP. To be able
to continue execution after an exception, we have implemented a condition
handler using LIB$ESTABLISH and LIB$REVERT. In the condition handler we check
the actual exception (e.g. SIGNAL_ARRAY(2) .EQ. SS$_HPARITH) and, if it is
valid for continuation (in the routine in which we established the handler),
write a specific message via SYS$GETMSG and SYS$FAOL and then unwind the stack
to the establishing routine -
Y.CHF$IS_MCH_DEPTH) , ).
The message we got is as follows:
Message: %SYSTEM-F-HPARITH, high performance arithmetic trap, Imask=00000000,
Fmask=00002000, summary=04, PC=000000000004CD80, PS=0000001B
Checking the summary bits, we found: "Division by Zero: An attempt was made to
perform a floating divide operation with a divisor of 0."
Using the linker map file and the exception PC, we identified the routine
causing the exception:
Part of the linker map file:
Psect Name  Module Name  Base      End       Length
----------  -----------  ----      ---       ------
$CODE$                   00030000  00069327  00039328 ( 234280.)  OCTA 4  PIC,CON,REL,LCL,SHR,EXE,NOWRT,NOVEC,MOD
                         0004C7E0  0004DDDB  000015FC (   5628.)  OCTA 4
We now want to know the statement causing the exception (as reported by the
traceback handler) or the variable (by using the floating register write mask,
as shown below).
Assuming 64-bit addresses, we can work out the offset of the exception PC from
the base address of the routine we found, and then calculate a PC offset from
the $CODE$ base address - but this does not seem to be exact (because the
compiler generates machine code unknown to us?). We assume that the exception
is caused by TETA = 0.0
Part of the list file:
10932 > = RHO*CP_TOT (ACT_STRIP, I_LEN, I_THICK)
10933 > + RHO*(H_AUSTENITE(ACT_STRIP, I_LEN, I_THICK)-H_FERRITE(ACT_STRIP, I_LEN, I_THICK))
10934 > * DP_DTEMP_TOT (ACT_STRIP, I_LEN, I_THICK)
10936 TETA_L = LAMBDA(ACT_STRIP, I_LEN, I_THICK)/TETA
10937 TETA_V = RHO*CP_TOT (ACT_STRIP, I_LEN, I_THICK)/TETA
10938 TETA_P = RHO*( H_AUSTENITE (ACT_STRIP, I_LEN,
10939 > - H_FERRITE (ACT_STRIP, I_LEN,
If we then use the register addresses shown in the list file
Address Type Name
** R*4 TETA
REG-00000023 R*4 TETA_L
REG-00000024 R*4 TETA_V
REG-00000026 R*4 TETA_P
we are NOT able to see any relation to the floating register write mask.
The questions now are:
- Is the way we are trying to find the variable or statement causing the
exception correct?
- Why are we not able to do the last step?
- Is there any other way to find the variable or statement causing the
exception?
Thanks in advance
Currently we switch off LIB$ESTABLISH and LIB$REVERT via a mail message to
get a traceback in case of an exception (but the program then ends).
The Answer is:
On Alpha, the processing of arithmetic exceptions is delayed for
performance reasons, so the exception PC does not directly identify
which instruction caused the exception. This is why additional
information is captured. In this example, "Fmask=00002000" means
that the divide instruction that incurred the exception wrote its
result into register F13.
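The mask arithmetic above can be checked with a few lines of Python. This is
only a sketch: the function names are made up for illustration, and the
summary bit names assume the usual Alpha exception summary layout (bit 0 = SWC
through bit 6 = IOV); check the Alpha Architecture Reference Manual before
relying on them.

```python
# Assumed Alpha exception summary bit assignments (verify against the
# Alpha Architecture Reference Manual).
SUMMARY_BITS = {
    0: "SWC (software completion)",
    1: "INV (invalid operation)",
    2: "DZE (division by zero)",
    3: "OVF (overflow)",
    4: "UNF (underflow)",
    5: "INE (inexact result)",
    6: "IOV (integer overflow)",
}


def set_bits(mask_hex):
    """Return the positions of the bits that are set in a hex mask string."""
    value = int(mask_hex, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def decode_hparith(imask, fmask, summary):
    """Translate the three HPARITH fields into register and condition names."""
    return {
        "integer_registers": [f"R{n}" for n in set_bits(imask)],
        "float_registers": [f"F{n}" for n in set_bits(fmask)],
        "summary": [SUMMARY_BITS.get(n, f"bit {n} (unknown)")
                    for n in set_bits(summary)],
    }


# Values taken from the message in the question.
decoded = decode_hparith("00000000", "00002000", "04")
print(decoded)
```

Bit n of Fmask corresponds to register Fn, so 00002000 (bit 13) gives F13, and
summary=04 (bit 2) matches the division-by-zero text quoted in the question.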
If you wish to track this manually, you should compile your code
with the /LIST/MACHINE_CODE qualifiers to determine the actual sequence
of instructions generated.
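To find the reported PC in such a listing, the PC first has to be turned into
an offset within the module. A minimal sketch in Python, using the PC from the
message and the module's base and end addresses from the linker map in the
question. It assumes that the offsets printed in the machine-code listing are
relative to the start of the module's contribution to $CODE$; the function
name is hypothetical.

```python
INSTRUCTION_BYTES = 4  # every Alpha instruction is 32 bits wide


def locate_pc(pc_hex, base_hex, end_hex):
    """Return (byte offset, instruction index) of a PC within a module."""
    pc, base, end = (int(x, 16) for x in (pc_hex, base_hex, end_hex))
    if not base <= pc <= end:
        raise ValueError("PC lies outside this module's code range")
    offset = pc - base
    return offset, offset // INSTRUCTION_BYTES


# PC from the HPARITH message; base and end from the linker map excerpt.
offset, index = locate_pc("000000000004CD80", "0004C7E0", "0004DDDB")
print(f"offset into module code: {offset:X} hex, instruction #{index}")
```

For the values in the question this gives offset 5A0 hex. Because of the
delayed exception reporting, this is where to start looking, not necessarily
the failing instruction itself.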
You are already following the right procedure to find where in the
code to look, but you now need to look at the instructions executed
prior to the place where the exception was reported. The exception
for a divide might be delayed quite a few cycles, depending on which
Alpha model you have, so you might have to examine instructions for
some distance prior to the exception. Look specifically for a
"DIVx Fa,Fb,Fc" instruction where Fc is F13. When that instruction
was executed, Fb contained zero. (It does indeed seem likely that
Fb represents the variable TETA.)
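The search described above can be roughly automated. The following Python
sketch scans listing text for divide instructions whose destination is F13 and
whose offset lies before the reported PC offset. The listing line format
(hex offset, mnemonic, three operands) and the sample lines are invented for
illustration; a real /MACHINE_CODE listing will look different and the pattern
would need adjusting.

```python
import re

# Hypothetical line format: "<hex offset>  DIVx[/qual]  Fa, Fb, Fc"
DIV_RE = re.compile(
    r"^\s*(?P<offset>[0-9A-Fa-f]+)\s+(?P<op>DIV[FGST](?:/\w+)?)\s+"
    r"(?P<fa>F\d+),\s*(?P<fb>F\d+),\s*(?P<fc>F\d+)"
)


def candidate_divides(listing_lines, target, reported_offset):
    """List divides writing `target` before `reported_offset`, nearest first."""
    hits = []
    for line in listing_lines:
        m = DIV_RE.match(line)
        if not m:
            continue
        offset = int(m["offset"], 16)
        if m["fc"].upper() == target and offset < reported_offset:
            hits.append((offset, m["op"], m["fb"]))
    return sorted(hits, reverse=True)


# Invented sample lines, not taken from the real listing.
SAMPLE = """\
  0570  MULS    F10, F11, F12
  0578  DIVS    F14, F12, F13
  0580  DIVS    F15, F12, F16
  05A4  DIVS    F1, F2, F13
"""

hits = candidate_divides(SAMPLE.splitlines(), "F13", 0x5A0)
for offset, op, divisor in hits:
    print(f"{offset:04X}: {op} writes F13, divisor register {divisor}")
```

Treat the offset filter as a heuristic: inside a loop, a divide at a higher
offset may also have executed shortly before the exception was reported.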
If you wish finer granularity on exceptions, particularly if you need
to identify the specific failing instruction, the Alpha architecture
requires you to use a construct known as a trap barrier (TRAPB) or an
exception barrier (EXCB). On Alpha, floating point traps can be
delivered at any time up to the next TRAPB (or CALL_PAL, which
implicitly includes TRAPB) operation -- and thus, without barriers, the
exception can usually be localized only to the program unit.
If you deem it necessary and appropriate, you can explicitly request
the compiler option /SYNCHRONOUS_EXCEPTIONS, which causes the compiler
to insert TRAPB instructions. The presence of the TRAPB instructions
ensures that any arithmetic exception is delivered immediately
after the instruction that caused it. Use of this technique will
reduce the performance of your application program, however.
For details on traps, exceptions, and on floating point, you will want
to acquire the Alpha Architecture Reference Manual. (Copies of this
manual and of hardware-related documentation are available for
downloading; please see the OpenVMS FAQ for pointers.)