HP OpenVMS Systems

ask the wizard

Detecting, correcting, and reporting memory errors

The Question is:

How can I perform VMS hardware monitoring? Can I use DECevent for this? If
 so, how can I configure it to monitor single-bit and double-bit memory errors?

The Answer is :

  DECevent and Compaq Analyze are the usual tools for translating
  logged errors into text; they do not generate the entries.  The error
  log entries themselves are written by OpenVMS kernel-mode components,
  by other OpenVMS components such as RMS, and by layered products.
  OpenVMS tends to suppress the logging of recoverable (correctable)
  memory errors until specified thresholds are exceeded, because some
  number of single-bit (correctable) memory errors is entirely normal
  and expected.  On various systems, the memory hardware is explicitly
  designed to detect and, where possible, correct memory errors; the
  two most common techniques are parity and ECC.  There are, however,
  multi-bit memory errors and error patterns that can be detected but
  not corrected, and some that cannot be detected at all.
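
  To illustrate how ECC can correct a single-bit error, flag a
  double-bit error as uncorrectable, and still be fooled by some wider
  error patterns, here is a minimal Python sketch.  It assumes a
  textbook Hamming(7,4) code with one extra overall parity bit (SECDED)
  and is not the ECC scheme of any particular OpenVMS platform.  The
  function names (parity, encode, decode) are invented for this
  illustration.

```python
def parity(bits):
    """Return 0 if the number of 1 bits is even, otherwise 1."""
    p = 0
    for b in bits:
        p ^= b
    return p


def encode(data):
    """Encode 4 data bits as an 8-bit word: Hamming(7,4) plus overall parity."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # check bit covering positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # check bit covering positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # check bit covering positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]  # positions 1..7
    return word + [parity(word)]         # position 8: overall parity


def decode(word):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    # The syndrome is the 1-based position of a single flipped bit
    # among positions 1..7, or 0 if those seven bits look consistent.
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    overall = parity(w)  # 0 when all eight bits have even parity

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treated as a single-bit error.
        if syndrome != 0:
            w[syndrome - 1] ^= 1  # flip the indicated bit back
        # syndrome == 0 here means only the overall parity bit flipped
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    return status, [w[2], w[4], w[5], w[6]]
```

  With this sketch, flipping any one of the eight bits decodes as
  'corrected' with the original data, and flipping any two bits decodes
  as 'uncorrectable'.  Flipping three bits (for example positions 1, 2
  and 3) can decode as 'corrected' while returning the wrong data, which
  is the kind of undetected multi-bit pattern described above.  A lone
  parity bit, by comparison, can only detect an odd number of flipped
  bits and cannot correct any of them.
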
  There are (undocumented) kernel-mode data cells containing various
  error counts; these are the cells used by tools such as SHOW ERROR.
  The details of the internal counts that are maintained, and the
  details of the error correction and machine check mechanisms, depend
  on the specific OpenVMS platform and the specific model in use.
  There is at least one ECO kit in this area, for the AlphaServer GS60
  and AlphaServer GS140 series.  Please see the ECO kit VMS712_UPDATE
  for additional details.

answer written or last revised on ( 8-AUG-2001 )
