HP OpenVMS Systems

ask the wizard

Transient application failures?


The Question is:

We are having problems getting two very large C programs to talk to each
other.
The two processes read from and write to an area of shared memory.
The processes need to synchronise before a write or read. A process does
this by writing a request into a separate area of the same shared page. It
then sets a waiting flag to true, which is in another area of the same
shared page, and uses the sys$hiber() call to wait for the other process to
synchronise. If the waiting flag for the other process is set, these steps
will be ignored. The other process checks for any requests in the
communication area of shared memory every cycle. If a request is there, it
will run until the time specified in the request is reached, then it will
wake the other process using the sys$wake call.
The lagging process, which picks up the requests, is crashing in random
places, and we have found that it is setting its waiting flag to true when
the code to do this is not even executed. A stack overflow problem is
therefore very likely. However, we have checked that the synchronisation is
working properly. There are no shared event flags. We have tried increasing
the stack size at link time and increasing the quotas.
Since the shared memory area is in P0 space, is it possible that with such a
big program the shared memory page and the program are overlapping?

The Answer is:

  Writing information into shared memory requires correct application
  synchronization, as discussed in detail in topics 1661 and 2681.
  Topic 1661 also details a variety of common programming bugs.
  sys$hiber is not primarily intended as a synchronization tool, and
  applications using it are required to manage any spurious wakeup
  requests that can and do arise.  (As referenced in topic 1661.)  A
  periodic call to sys$wake (or sys$schdwk) is often useful in an
  application using sys$hiber -- deliberately creating a spurious
  wakeup -- as it can help drain pending activity if a sys$wake call
  is missed.
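
  The spurious-wakeup handling described above can be sketched as a
  loop that re-tests a flag in shared memory after every wakeup.  This
  is only an illustration under stated assumptions: wake_area,
  wait_for_peer, hibernate_fn and the fake_peer test double are
  hypothetical names, C11 atomics stand in for whatever the target
  compiler provides, and the hibernate callback is a stand-in that an
  OpenVMS build would implement with sys$hiber.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag kept in the shared page. */
typedef struct {
    atomic_int request_done;   /* peer sets this to 1 before it wakes us */
} wake_area;

/* Stand-in for the hibernate call so the sketch runs anywhere.
   On OpenVMS this callback would wrap sys$hiber. */
typedef void (*hibernate_fn)(void *ctx);

/* Sleep until the peer has really completed the request.
   Returns how many wakeups (spurious or real) were consumed. */
static int wait_for_peer(wake_area *area, hibernate_fn hibernate, void *ctx)
{
    int wakeups = 0;

    assert(area != NULL && hibernate != NULL);

    /* A wakeup alone proves nothing: re-test the flag every time. */
    while (atomic_load(&area->request_done) == 0) {
        hibernate(ctx);
        wakeups++;
    }

    /* Consume the completion so the next wait starts clean. */
    atomic_store(&area->request_done, 0);
    return wakeups;
}

/* Test double: the first two wakeups are spurious, the third is real. */
typedef struct {
    wake_area *area;
    int calls;
} fake_peer;

static void fake_hibernate(void *ctx)
{
    fake_peer *peer = ctx;

    if (++peer->calls == 3)
        atomic_store(&peer->area->request_done, 1);
}
```

  Because the loop depends only on the flag, an extra or periodic
  wakeup is harmless: the process simply checks the flag and goes back
  to sleep.
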
  A simple example of using shared memory from C is posted at the
  OpenVMS Ask The Wizard area.
  Care must be taken when operating in a multiprocessing environment,
  as the behaviour of the processor memory caches must be considered
  when accessing memory.  On OpenVMS Alpha, this includes the use of
  memory barriers and interlocked operations as required -- see topic
  2681 for details.
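
  As a rough illustration of the ordering concern, the sketch below
  publishes a request through a flag using release and acquire
  ordering, so the payload is visible before the flag is seen.  It
  assumes one writer and one reader.  C11 atomics are used here as a
  portable stand-in and are not the OpenVMS Alpha mechanisms that
  topic 2681 describes; request_slot, publish_request and take_request
  are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request slot living in the shared page. */
typedef struct {
    long run_until;      /* payload: plain data */
    atomic_int ready;    /* 0 = slot empty, 1 = payload is valid */
} request_slot;

/* Writer side. Returns 1 if the request was published, 0 if the
   previous request has not been taken yet. */
static int publish_request(request_slot *slot, long run_until)
{
    assert(slot != NULL);

    if (atomic_load_explicit(&slot->ready, memory_order_acquire) != 0)
        return 0;

    slot->run_until = run_until;
    /* Release: the payload store is ordered before the flag store. */
    atomic_store_explicit(&slot->ready, 1, memory_order_release);
    return 1;
}

/* Reader side. Returns 1 and copies the payload if a request is
   pending, otherwise returns 0 and leaves *run_until untouched. */
static int take_request(request_slot *slot, long *run_until)
{
    assert(slot != NULL && run_until != NULL);

    /* Acquire: once the flag is seen, the payload is visible too. */
    if (atomic_load_explicit(&slot->ready, memory_order_acquire) == 0)
        return 0;

    *run_until = slot->run_until;
    /* Hand the slot back only after the payload has been copied. */
    atomic_store_explicit(&slot->ready, 0, memory_order_release);
    return 1;
}
```
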
  Access to shared memory should generally be consolidated into as
  few routines as possible, and then integrated into a shareable image
  or similar -- this approach permits easier debugging and better control
  over the shared memory accesses.  (Details of the creation and use of
  shareable images are available at the Ask The Wizard website.)  This
  approach also eases the introduction of logging and debugging support
  into the application environment, provides a place to add a condition
  handler for errors related to the shared memory, and eases the
  application-specific upgrade path(s) available to the programmer(s).
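
  One way to picture this consolidation is a small set of accessor
  routines that own the pointer to the shared area, so every read and
  write passes through a single range check and a single place to
  count or log failures.  The names below (shm_region, shm_attach,
  shm_write, shm_read, shm_rejected) and the 64-byte size are
  assumptions made for the sketch, not part of any OpenVMS interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHM_AREA_SIZE 64   /* hypothetical size of the data area */

typedef struct {
    unsigned char data[SHM_AREA_SIZE];
} shm_region;

static shm_region *shm_base;            /* set once by shm_attach() */
static unsigned long shm_rejected_count; /* process-local debug counter */

/* Remember where the mapped section lives; callers never touch it directly. */
static void shm_attach(shm_region *region)
{
    assert(region != NULL);
    shm_base = region;
    shm_rejected_count = 0;
}

/* True if [offset, offset + len) lies inside the data area. */
static int shm_range_ok(size_t offset, size_t len)
{
    return offset <= SHM_AREA_SIZE && len <= SHM_AREA_SIZE - offset;
}

/* Copy len bytes into the shared area. Returns 0, or -1 if out of range. */
static int shm_write(size_t offset, const void *src, size_t len)
{
    assert(shm_base != NULL && src != NULL);
    if (!shm_range_ok(offset, len)) {
        shm_rejected_count++;   /* a natural place for logging */
        return -1;
    }
    memcpy(shm_base->data + offset, src, len);
    return 0;
}

/* Copy len bytes out of the shared area. Returns 0, or -1 if out of range. */
static int shm_read(size_t offset, void *dst, size_t len)
{
    assert(shm_base != NULL && dst != NULL);
    if (!shm_range_ok(offset, len)) {
        shm_rejected_count++;
        return -1;
    }
    memcpy(dst, shm_base->data + offset, len);
    return 0;
}

/* Number of accesses refused so far in this process. */
static unsigned long shm_rejected(void)
{
    return shm_rejected_count;
}
```

  With every access funnelled through these routines, a stray write
  to the shared page from elsewhere in the program becomes much easier
  to recognise, since no legitimate code path writes to it directly.
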
  As for your question on memory overlaps: yes, this is possible, and
  rogue pointers are certainly one possible cause.  That said, as pages
  of memory containing executable code and constants are protected (by
  default) against any write access, this is unlikely.
  The most common causes of these problems tend to involve memory
  pool (heap) corruptions, writing too much to a variable on the
  stack (thus corrupting the stack), and writing to a variable that
  is no longer in an active stack frame.  See topic 1661 for a rather
  more lengthy discussion of potential problems.
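
  As a small illustration of avoiding two of the stack problems named
  above, the sketch below writes into a buffer that the caller owns
  (so nothing points into a frame that has already returned) and
  bounds the write with snprintf (so the buffer cannot be overrun).
  format_request is a hypothetical helper, and the sketch assumes a C
  library that provides the C99 snprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats "name#id" into a buffer owned by the caller.
   Returns 0 on success, or -1 if the text would not fit; on failure
   the buffer is left holding an empty string. */
static int format_request(char *out, size_t out_size, const char *name, int id)
{
    int needed;

    assert(out != NULL && out_size > 0 && name != NULL);

    /* snprintf writes at most out_size bytes, terminator included,
       and returns the length the full text would have needed. */
    needed = snprintf(out, out_size, "%s#%d", name, id);
    if (needed < 0 || (size_t)needed >= out_size) {
        out[0] = '\0';
        return -1;
    }
    return 0;
}
```
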

answer written or last revised on ( 15-FEB-2000 )
