The Question is:
We are having problems getting two very large C programs to talk to
each other.
The two processes read from and write to an area of shared memory.
The processes need to synchronise before a write or read. A process does
this by writing a request into a separate area of the same shared page. It
then sets a waiting flag, held in another area of the same shared page, to
true, and calls sys$hiber() to wait for the other process to synchronise.
If the waiting flag for the other process is already set, these steps are
skipped. The other process checks for requests in the communication area of
shared memory every cycle. If a request is present, it runs until the time
specified in the request is reached, then wakes the first process using the
sys$wake call.
The lagging process, which picks up the requests, is crashing in random
places, and we have found that it is setting its waiting flag to true when
the code to do this is not even executed. A stack overflow problem is
therefore very likely. However, we have checked that the synchronisation is
working properly, and there are no shared event flags. We have tried
increasing the stack size at link time and increasing the quotas.
Since the shared memory area is in P0 space, is it possible that, with such
a big program, the shared memory page and the program are overlapping?
The Answer is:
Writing information into shared memory requires correct application
synchronization, as discussed in topics 1661 and 2681 in detail.
Topic 1661 also details a variety of common programming bugs.
sys$hiber is not primarily intended as a synchronization tool, and
applications using it are required to manage any spurious wakeup
requests that can and do arise (as referenced in topic 1661). A
periodic call to sys$wake (or sys$schdwk), which deliberately creates
a spurious wakeup, is often useful in an application using sys$hiber,
as it can help drain pending activity if a sys$wake call is missed.
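
As a rough illustration of managing spurious wakeups, the sketch below
re-tests a completion flag every time the hibernate call returns, rather
than assuming that a wakeup means the request was serviced. The names
sync_area, done, hibernate_fn and wait_for_peer are hypothetical and do
not come from the application described above. The blocking call is
passed in as a function pointer so the loop can be exercised anywhere;
on OpenVMS it would presumably be a thin wrapper around sys$hiber.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout of the synchronisation area in the shared page. */
typedef struct {
    atomic_bool waiting;  /* set by the waiter before it hibernates */
    atomic_bool done;     /* set by the peer once the request is serviced */
} sync_area;

/* Blocking call; on OpenVMS this would wrap sys$hiber (assumption). */
typedef void (*hibernate_fn)(void *ctx);

/* Hibernate until the peer marks the request done.
   Returns the number of wakeups that turned out to be spurious. */
static int wait_for_peer(sync_area *area, hibernate_fn hibernate, void *ctx)
{
    int spurious = 0;

    assert(area != NULL && hibernate != NULL);
    atomic_store(&area->waiting, true);
    while (!atomic_load(&area->done)) {
        hibernate(ctx);                  /* may return for unrelated reasons */
        if (!atomic_load(&area->done))
            spurious++;                  /* woken, but nothing to do yet */
    }
    atomic_store(&area->waiting, false);
    return spurious;
}

/* Stand-in hibernate used only to exercise the loop: the first two
   returns are spurious, the third follows real completion. */
typedef struct {
    sync_area *area;
    int calls;
} fake_ctx;

static void fake_hibernate(void *p)
{
    fake_ctx *c = p;

    if (++c->calls == 3)
        atomic_store(&c->area->done, true);
}

static int demo_spurious_wakeups(void)
{
    sync_area area;
    fake_ctx ctx = { &area, 0 };

    atomic_init(&area.waiting, false);
    atomic_init(&area.done, false);
    return wait_for_peer(&area, fake_hibernate, &ctx);
}
```

The point of the design is that the wakeup itself carries no meaning;
only the flag in shared memory does, so an extra or periodic wakeup is
harmless.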
A simple example of using shared memory from C is posted at the
OpenVMS Ask The Wizard area.
Care must be taken when operating in a multiprocessing environment,
as the behaviour of the processor memory caches must be considered
when accessing memory. On OpenVMS Alpha, this includes the use of
memory barriers and interlocked operations as required -- see topic
2681 for details.
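
A minimal sketch of the ordering requirement follows. It uses portable
C11 atomics as a stand-in, not the OpenVMS Alpha memory barrier or
interlocked primitives discussed in topic 2681, and the names
request_area, post_request and take_request are hypothetical. The writer
stores the payload first and then publishes the flag with release
ordering; the reader tests the flag with acquire ordering before it
touches the payload.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request slot in the shared page. */
typedef struct {
    double      run_until;      /* payload: time the peer should run to */
    atomic_bool request_ready;  /* published only after the payload */
} request_area;

/* Writer side: fill in the payload, then publish the flag. The release
   store keeps the payload write from being reordered after the flag. */
static void post_request(request_area *a, double run_until)
{
    assert(a != NULL);
    a->run_until = run_until;
    atomic_store_explicit(&a->request_ready, true, memory_order_release);
}

/* Reader side: the payload is read only after an acquire load has
   observed the flag. Returns false when no request is pending. */
static bool take_request(request_area *a, double *run_until)
{
    assert(a != NULL && run_until != NULL);
    if (!atomic_load_explicit(&a->request_ready, memory_order_acquire))
        return false;
    *run_until = a->run_until;
    atomic_store_explicit(&a->request_ready, false, memory_order_release);
    return true;
}
```

Without the ordering on the flag, another processor could see the flag
set while still reading a stale payload.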
Access to shared memory should generally be consolidated into as
few routines as possible, and then integrated into a shareable image
or similar -- this approach permits easier debugging and better control
over the shared memory accesses. (Details of the creation and use of
shareable images are available at the Ask The Wizard website.) This
approach also eases the introduction of logging and debugging support
into the application environment, provides a way to add a condition
handler for errors related to the shared memory, and eases the
application-specific upgrade path(s) available to the programmer(s).
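
One possible shape for such a consolidated access layer is sketched
below. Everything here is hypothetical: the shm_page layout, the guard
words and the shm_* routine names are illustrative assumptions, not the
layout used by the application in the question. Memory barriers and the
shareable image packaging are left out to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SHM_MAGIC 0x53484D31u  /* arbitrary guard value (assumption) */
#define SHM_NPROC 2            /* two cooperating processes */

/* Hypothetical shared page layout with a guard word at each end. */
typedef struct {
    unsigned guard_head;
    int      request;
    bool     waiting[SHM_NPROC];
    unsigned guard_tail;
} shm_page;

static shm_page *shm_base = NULL;  /* set once by shm_attach */

/* Record where the shared page is mapped; optionally initialise it. */
static void shm_attach(shm_page *page, bool initialise)
{
    assert(page != NULL);
    shm_base = page;
    if (initialise) {
        page->guard_head = page->guard_tail = SHM_MAGIC;
        page->request = 0;
        for (int i = 0; i < SHM_NPROC; i++)
            page->waiting[i] = false;
    }
}

/* True when both guard words still hold the expected value. */
static bool shm_intact(void)
{
    return shm_base != NULL &&
           shm_base->guard_head == SHM_MAGIC &&
           shm_base->guard_tail == SHM_MAGIC;
}

/* The single routine that changes a waiting flag: one place to log,
   validate arguments, or set a debugger breakpoint. */
static bool shm_set_waiting(int who, bool value)
{
    if (who < 0 || who >= SHM_NPROC || !shm_intact()) {
        fprintf(stderr, "shm_set_waiting(%d, %d) rejected\n",
                who, (int)value);
        return false;
    }
    shm_base->waiting[who] = value;
    return true;
}

/* Read a waiting flag; reports false if the page looks damaged. */
static bool shm_get_waiting(int who)
{
    assert(who >= 0 && who < SHM_NPROC);
    return shm_intact() && shm_base->waiting[who];
}
```

Guard words only catch overwrites that happen to run across them, so
this does not detect a rogue pointer that lands directly on a flag. What
it does give is a single place from which every legitimate write
originates, which narrows the search when a flag changes unexpectedly.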
As for your question on memory overlaps: yes, this is possible, as
rogue pointers are certainly one potential cause. That said, pages of
memory containing executable code and constants are protected (by
default) against any write access, so this is unlikely.
The most common causes of these problems tend to involve memory
pool (heap) corruptions, writing too much to a variable on the
stack (thus corrupting the stack), and writing to a variable that
is no longer in an active stack frame. See 1661 for a rather more
lengthy discussion of potential problems.
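
Two of the stack-related causes above can be illustrated with a short
sketch. The routine name format_request and the request text format are
hypothetical. The routine writes into a buffer owned by the caller, so
nothing points into a stack frame that has already returned, and it
passes the buffer length to snprintf so that an oversized value is
truncated and reported instead of overwriting adjacent stack storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Anti-pattern, shown only as a comment: the returned pointer refers
 * to a stack frame that is no longer active once the routine returns.
 *
 *     char *bad_format(int id) {
 *         char buf[16];
 *         sprintf(buf, "REQ %d", id);   // also unbounded
 *         return buf;                   // dangling pointer
 *     }
 */

/* Format a request into a caller-owned buffer.
   Returns true only if the whole text fit, terminator included. */
static bool format_request(char *out, size_t outlen, int id,
                           double run_until)
{
    int n;

    assert(out != NULL && outlen > 0);
    n = snprintf(out, outlen, "REQ %d UNTIL %.1f", id, run_until);
    return n >= 0 && (size_t)n < outlen;
}
```

A caller that checks the return value learns about the truncation at
the point where it happens, rather than through a crash in an unrelated
routine later on.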