The Question is:
We noticed, several times, that batch jobs scheduled to start at Day1 00:00:00 were
actually starting at Day0 23:59:26.94 (as seen in the log file
characteristics and in the program results).
We rely on the OpenVMS scheduler and the standard queueing system. Is there
anything already known about such timing problems?
The Answer is:
In an OpenVMS Cluster, the clocks on individual nodes can and do
drift (apart). Only one node within the cluster is the "timekeeper"
for jobs scheduled on queues. If the current timekeeper's clock is
running fast(er), and the job being released is executed on a node
with a slow(er) clock, it will appear that the job has started
before its due time.
Remember that the scheduled time is "/AFTER". Queue management does
not guarantee exactly when the job will start, just that it will be
AFTER the scheduled time, as determined by the clock on the node
that is serving and controlling the queues, that is, the timekeeping node.
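To make the arithmetic concrete, here is a small Python model of the situation described above. It is an illustration only, not anything OpenVMS provides, and it assumes that the queue manager releases the job the instant the timekeeper's clock reaches the /AFTER time, that there is no queue latency, and that the executing node stamps the start time with its own clock. The per-node clock errors in the usage lines are hypothetical; only their difference (33.06 seconds) comes from the question.

```python
from datetime import datetime, timedelta


def observed_start(scheduled: datetime,
                   timekeeper_error_s: float,
                   executor_error_s: float) -> datetime:
    """Start time as recorded by the node that runs the job.

    Each *_error_s is that node's clock error relative to true time, in
    seconds (positive means the clock runs ahead). Simplified model:
    the job is released the instant the timekeeper's clock reads
    `scheduled`, there is no queue latency, and the executing node
    stamps the start time with its own clock.
    """
    # True instant at which the (fast) timekeeper's clock reaches `scheduled`.
    true_release = scheduled - timedelta(seconds=timekeeper_error_s)
    # The executing node reads its own clock at that same true instant.
    return true_release + timedelta(seconds=executor_error_s)


# Hypothetical clock errors: timekeeper 20 s fast, executing node 13.06 s slow.
scheduled = datetime(2024, 1, 2, 0, 0, 0)   # "Day1 00:00:00"
start = observed_start(scheduled, 20.0, -13.06)
print(start.time())          # 23:59:26.940000
print(scheduled - start)     # 0:00:33.060000
```

In this model the apparent early start equals the timekeeper's error minus the executing node's error, so neither clock has to be badly wrong on its own for the effect to be noticeable.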
Please see the OpenVMS FAQ for the architectural hardware clock
accuracy (drift) specifications, and please realize that the system
software clock can drift further as high-IPL activity can block
clock interrupt processing. This means that the clocks within a
cluster can potentially drift apart over time, increasing the
likelihood that the situation described will arise. Again, the
FAQ has technical details on this.
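A constant-rate model shows how quickly small drifts add up. The rates used below are hypothetical (see the FAQ for the actual hardware specifications), and real drift is not constant, so treat the results as a rough estimate rather than a prediction.

```python
SECONDS_PER_DAY = 86_400


def accumulated_skew_s(rate_a_ppm: float, rate_b_ppm: float,
                       elapsed_s: float, initial_skew_s: float = 0.0) -> float:
    """Clock A minus clock B, in seconds, after `elapsed_s` of real time.

    Rates are each node's frequency error in parts per million (positive
    means the clock gains time). Assumes constant rates; in practice
    temperature changes and blocked clock interrupts make drift vary.
    """
    return initial_skew_s + (rate_a_ppm - rate_b_ppm) * 1e-6 * elapsed_s


def days_until_skew_reaches(limit_s: float, rate_a_ppm: float,
                            rate_b_ppm: float) -> float:
    """Days, starting from synchronized clocks, until |skew| reaches limit_s."""
    per_day = abs(rate_a_ppm - rate_b_ppm) * 1e-6 * SECONDS_PER_DAY
    return float("inf") if per_day == 0 else limit_s / per_day


# Hypothetical nodes: one gains 20 ppm, the other loses 20 ppm.
print(accumulated_skew_s(20, -20, SECONDS_PER_DAY))   # about 3.456 s per day
print(days_until_skew_reaches(33.06, 20, -20))         # about 9.57 days
```

With these illustrative rates, two clocks that start perfectly synchronized are more than half a minute apart in under two weeks, which is one reason periodic resynchronization matters.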
Remember that your OpenVMS system is a computer, and not a
chronometer. Customer and business requirements, particular
application mechanisms, and arcane topics such as the temperature
stability of the reference source crystal all dictate the clock
accuracy. Higher accuracy is certainly
technically possible and is available through a variety of
optional external hardware and/or software, as is described
in the FAQ.
What can be done? There are several approaches, either to
synchronize the clocks more closely or to minimize the effects of clock drift:
1) Correcting drift
a) Keep nodes in sync with SET TIME/CLUSTER (manually or periodically)
(The SET TIME/CLUSTER command uses the same system mechanisms
as the SYSMAN CONFIGURATION SET TIME command that some folks
will reference, but SET TIME/CLUSTER has the benefit of being
a directly accessible DCL command.)
b) Employ a time service such as DTSS or NTP to keep nodes in sync
c) Purchase external chronometric hardware and/or software to
maintain time to the accuracy that your business requires.
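Time services such as NTP work by estimating each node's offset from a reference clock and then adjusting the local clock. The core estimate, as defined for NTP, uses four timestamps from a single request/response exchange. The Python sketch below shows only that arithmetic; the function and field names and the example numbers are illustrative, and it assumes the network delay is the same in both directions.

```python
from typing import NamedTuple


class OffsetEstimate(NamedTuple):
    offset_s: float  # how far the reference clock is ahead of the local clock
    delay_s: float   # round-trip network delay, excluding reference processing time


def ntp_offset(t0: float, t1: float, t2: float, t3: float) -> OffsetEstimate:
    """Clock offset from one request/response exchange (NTP-style arithmetic).

    t0: request sent      (local clock)
    t1: request received  (reference clock)
    t2: reply sent        (reference clock)
    t3: reply received    (local clock)
    Assumes the network delay is the same in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return OffsetEstimate(offset, delay)


# Local clock 5 s behind the reference, 0.1 s each way, 0.02 s to reply.
est = ntp_offset(t0=100.00, t1=105.10, t2=105.12, t3=100.22)
print(round(est.offset_s, 3), round(est.delay_s, 3))   # 5.0 0.2
```

Any asymmetry in the network path shows up directly as an error in the offset estimate, which is one reason real time services combine many such samples.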
2) Reducing the impact
a) Don't schedule jobs for exactly midnight; consider
using /AFTER="TOMORROW+00:05" or another similar
combination time instead. (The offset should be
larger than the largest expected cluster time skew.)
b) Learn the typical clock drift of your systems and make
sure programs and algorithms don't expect higher accuracy than that
c) Put a short delay, say 1 minute, at the start of jobs to
allow for cluster time drift
d) Do not depend on queue manager timing for high accuracy
events. If you need something to happen at an accurate,
specific time, use a permanent job or job scheduler,
running at a higher or real-time priority, and using
direct system service calls (e.g., $CREPRC) rather than
SUBMIT or $SNDJBC calls
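Options a) and b) fit together: once you know the skews your cluster actually shows, the /AFTER offset can be derived from them. The Python helpers below are a hypothetical sketch, not an OpenVMS utility. Because any node may be the timekeeper, they take the worst skew between any two nodes, add a safety margin of your choosing (the 60-second default is an assumption, not a FAQ recommendation), pick a whole-minute offset strictly larger than that, and format it in the TOMORROW+hh:mm style used above.

```python
def after_offset_minutes(measured_skews_s, margin_s: float = 60.0) -> int:
    """Whole minutes strictly larger than the worst skew plus a safety margin."""
    worst = max((abs(s) for s in measured_skews_s), default=0.0)
    return int((worst + margin_s) // 60) + 1


def after_time(minutes: int) -> str:
    """Format a TOMORROW+hh:mm combination time for use with SUBMIT/AFTER."""
    if not 0 < minutes < 24 * 60:
        raise ValueError("offset must be between 1 minute and 23:59")
    hours, mins = divmod(minutes, 60)
    return f"TOMORROW+{hours:02d}:{mins:02d}"


# Hypothetical skews (seconds) measured between nodes of a cluster.
skews = [33.06, -4.2, 12.5]
print(after_time(after_offset_minutes(skews)))                 # TOMORROW+00:02
print(after_time(after_offset_minutes(skews, margin_s=240)))   # TOMORROW+00:05
```

Remeasure the skews from time to time; if the offset you chose no longer clears them, that is a sign the clocks need resynchronizing rather than a larger offset.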
Note that you cannot depend on particular node(s) to be the queue
manager timekeeper, nor can you (nor should you) predict, or even
(easily) determine which node is the timekeeper at a particular
moment, so don't even think about it!
For additional details on timekeeping and on clock synchronization
techniques and tools, please see the OpenVMS FAQ.