The Question is:
batch jobs periodically 'stop'. no error messages in logs, no console messages.
(and yessir, i've poked around the FAQ's for a while)
i have an Alphaserver 4100, VMS 7.1-2, 2.5 G memory, disk-shadowing in use. on
a very unpredictable schedule, batch-jobs running DEC-Basic .EXE images
against Prolog-3 indexed files simply.... stop. no indications of problems are
detectable. upon re-start, the jobs will run to completion normally, and will
run to completion without modification for many more cycles. the system has
been checked for disk errors, IO contention, process quotas, and
a number of other 'obvious' problems. no luck. this problem has persisted for
the better part of a two year period. ANY assistance will be heartily
appreciated. Additional info: the software is 'off the shelf' thirdparty stuff
that runs quite normally at
the support-engineers from this thirdparty provider have tried a number of
fixes - no good.
i have moved files from one-disk to another (attempt to reduce head contention)
- no good.
i have implemented a schedule of 'file rebuilds' using ANALYZE/RMS and
CONVERT/FDL on all files that are involved - no good.
"hopefully awaiting a blow from the magic stick"
thanks in advance.
The Answer is:
The fact that application software runs at one site has relatively
little bearing on whether or not the application will run at another
(and different) site -- site-specific latent application problems
and site-specific coding dependencies are surprisingly common within
For some of the typical programming bugs that can lead to unpredictable
behaviour, please see topics (1661) and (2681).
As a suggestion, establish a signal handler within the application
images, and code the handler to report details of any errors.
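
As a rough illustration of the handler suggestion above, here is a
minimal sketch in portable C. It is an analogue only: on OpenVMS the
native mechanism is a condition handler (typically established with
LIB$ESTABLISH or a language-specific equivalent, such as the error
handlers in BASIC), and the images in question are BASIC, not C. The
names format_report, report_and_exit and install_handlers are
hypothetical, the list of signals is an assumption, and the availability
of <unistd.h> on a given compiler version is also assumed.

```c
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: build "batch job caught signal N\n" in buf without
   using stdio, which is not safe to call from inside a signal handler.
   Returns the number of bytes written, or 0 if buf is too small. */
static size_t format_report(int signo, char *buf, size_t cap)
{
    static const char prefix[] = "batch job caught signal ";
    const size_t prefix_len = sizeof prefix - 1;
    char digits[10];                 /* enough for a 32-bit unsigned value */
    size_t n = 0, d = 0, i;
    unsigned int v = (signo < 0) ? 0u : (unsigned int)signo;

    if (cap < prefix_len + sizeof digits + 1)
        return 0;
    for (i = 0; i < prefix_len; i++)
        buf[n++] = prefix[i];
    do {                             /* collect digits, least significant first */
        digits[d++] = (char)('0' + v % 10u);
        v /= 10u;
    } while (v != 0u);
    while (d > 0)                    /* emit them most significant first */
        buf[n++] = digits[--d];
    buf[n++] = '\n';
    return n;
}

/* Handler: report which signal arrived, then terminate with a failure status. */
static void report_and_exit(int signo)
{
    char buf[64];
    size_t n = format_report(signo, buf, sizeof buf);

    if (n > 0 && write(STDERR_FILENO, buf, n) < 0) {
        /* Nothing more can safely be done if the report cannot be written. */
    }
    _exit(EXIT_FAILURE);
}

/* Install the handler for a few common fatal signals (an assumed list).
   Returns 0 on success, -1 if any registration fails. */
static int install_handlers(void)
{
    static const int sigs[] = { SIGSEGV, SIGFPE, SIGILL, SIGABRT, SIGTERM };
    size_t i;

    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (signal(sigs[i], report_and_exit) == SIG_ERR)
            return -1;
    return 0;
}
```

A program would call install_handlers() once at the start of main(). In a
batch job, whatever the handler writes to the error stream would normally
end up in the batch log file, which is the point of the exercise: a job
that otherwise just stops would leave a trace.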
Compare the PQL system parameter settings (the default and minimum
process quotas) against those at a site where the application runs
normally.
Check the default mailbox quota parameters (DEFMBXBUFQUO and
DEFMBXMXMSG).
Check the disk fragmentation levels.
Check the OpenVMS system error log for any RMS bugchecks.
Ensure you have all current mandatory ECOs for OpenVMS applied.
Check the auditing logs for any unexpected use of WORLD privilege,
and for unexpected use of the $forcex or $delprc system services.
The $delprc call is used by the DCL command STOP/ID. (You may well
have to enable these audits.)
If there is privileged-mode code involved, consider setting the
parameter BUGCHECKFATAL to cause non-fatal system bugchecks to
be elevated to fatal OpenVMS system bugchecks -- rather than
simply having the process terminate, a non-fatal bugcheck will
then cause the OpenVMS system to crash (and to write a dumpfile).