The Question is:
How can I perform VMS hardware monitoring? Can I use DECevent for this? If
so, how can I configure it to monitor single-bit and double-bit memory errors?
The Answer is:
DECevent and Compaq Analyze are the usual tools for translating the
logged errors into text. The error log entries themselves are generated
by OpenVMS kernel-mode components, by OpenVMS components such as RMS,
and by layered products.
OpenVMS tends to suppress the logging of recoverable (correctable)
memory errors until specified thresholds are exceeded, as the
incidence of some number of single-bit (correctable) memory errors
is entirely normal and expected. On various systems, the memory
hardware is explicitly designed to detect and to potentially correct
memory errors -- the two most common techniques being parity and ECC.
There are, however, memory errors and memory error patterns that can
be undetectable and thus uncorrectable -- these are multi-bit errors.
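
As a rough illustration of why some multi-bit errors can slip past
detection, here is a minimal Python sketch of a single even-parity bit
protecting one 8-bit word. This is not how any particular OpenVMS
platform implements its memory checking; the function names and the
sample bit patterns are made up for the example. It shows that one
flipped bit is caught, while two flipped bits leave the parity unchanged
and so go unnoticed.

```python
def parity_bit(word):
    """Return the even-parity bit for word: 1 if the count of set bits is odd."""
    return bin(word).count("1") % 2


def store(word):
    """Model a memory write: keep the data word together with its parity bit."""
    return word, parity_bit(word)


def check(word, stored_parity):
    """Model a memory read: True if the word still matches its stored parity."""
    return parity_bit(word) == stored_parity


# Hypothetical sample data, chosen only for illustration.
data, parity = store(0b10110010)

one_flip = data ^ 0b00000100    # a single-bit error
two_flips = data ^ 0b00010100   # a double-bit error

print(check(data, parity))       # intact word passes the check
print(check(one_flip, parity))   # single-bit error is detected
print(check(two_flips, parity))  # double-bit error passes the check unnoticed
```

Parity alone can only detect an error, not correct it. ECC schemes add
more check bits so that single-bit errors can be corrected as well, but
they too have patterns of multiple flipped bits that they cannot handle.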
There are (undocumented) kernel-mode data cells containing various
error counts; these are the cells used by tools such as SHOW ERROR.
The details of the internal counts that are maintained, and the
details of the error correction mechanisms and of the machine check
mechanisms, depend on the specific OpenVMS platform and the specific
model in use.
There is at least one ECO kit in this area, for the AlphaServer GS60
and AlphaServer GS140 series. Please see the ECO kit VMS712_UPDATE
for additional details.