HP OpenVMS Systems

ask the wizard

Batch Jobs, Clusters, and /AFTER Time?

The Question is:

We have noticed several times that batch jobs scheduled to start at Day1 00:00
 were actually starting at Day0 23:59:26.94 (as seen in the log file
 characteristics and in the program results).
We rely on the OpenVMS scheduler and the standard queueing system. Is anything
 already known about such timing problems?

The Answer is:

    In an OpenVMS Cluster, the clocks on individual nodes can and do
    drift (apart).  Only one node within the cluster is the "timekeeper"
    for jobs scheduled on queues. If the current timekeeper's clock is
    running fast(er), and the job being released is executed on a node
    with a slow(er) clock, it will appear that the job has started
    before its due time.
    Remember that the scheduled time is "/AFTER".  Queue management does
    not guarantee exactly when the job will start, just that it will be
    AFTER the scheduled time, as determined by the clock on the
    timekeeping node: the node that is serving and controlling the queues.
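    The arithmetic behind the early-looking start can be sketched as
    follows.  This is only an illustration: the function name, the date,
    and the assumption that the timekeeper runs 33.06 seconds fast (with
    the executing node exactly on time) are hypothetical, chosen to
    reproduce the 23:59:26.94 start time reported in the question.

```python
from datetime import datetime, timedelta

def apparent_start(scheduled, timekeeper_offset, executor_offset):
    """Time shown by the executing node's clock when the job is released.

    Each offset is that clock's error against a common reference
    (positive means the clock runs fast).
    """
    # The timekeeper releases the job when ITS clock reaches the scheduled
    # time, so the reference time at release is earlier by its offset.
    release_reference = scheduled - timekeeper_offset
    # The executing node stamps the log file with its own clock.
    return release_reference + executor_offset

# Hypothetical values: the job is due at midnight, the timekeeper is
# 33.06 s fast, and the executing node has no error.
scheduled = datetime(2003, 5, 20, 0, 0, 0)
timekeeper_fast = timedelta(seconds=33, milliseconds=60)
start = apparent_start(scheduled, timekeeper_fast, timedelta(0))
print(start.time())  # 23:59:26.940000
```

    If both clocks carry the same offset, the function returns the
    scheduled time unchanged; only the difference between the two
    clocks is visible in the log.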
    Please see the OpenVMS FAQ for the architectural hardware clock
    accuracy (drift) specifications, and please realize that the system
    software clock can drift further as high-IPL activity can block
    clock interrupt processing.  This means that the clocks within a
    cluster can potentially drift apart over time, increasing the
    likelihood that the situation described will arise.  Again, the
    FAQ has technical details on this.
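    As a rough illustration of how two clocks drift apart, the sketch
    below accumulates a constant frequency error over time.  The drift
    rates are hypothetical placeholders, not the architectural
    specifications from the FAQ, and the function names are invented for
    this example.  Real drift is not constant, so treat the result as an
    order-of-magnitude estimate.

```python
SECONDS_PER_DAY = 86_400

def drift_seconds(ppm, elapsed_seconds):
    """Clock error accumulated at a constant rate given in parts per million."""
    return ppm * 1e-6 * elapsed_seconds

def days_until_skew(ppm_a, ppm_b, target_skew_seconds):
    """Days for two clocks with constant drift rates to differ by the target."""
    relative_ppm = abs(ppm_a - ppm_b)
    if relative_ppm == 0:
        return float("inf")  # identical rates never diverge
    per_day = drift_seconds(relative_ppm, SECONDS_PER_DAY)
    return target_skew_seconds / per_day

# Hypothetical drift rates: one node runs fast, the other slow.
node_a_ppm = 50.0
node_b_ppm = -50.0

daily_skew = drift_seconds(node_a_ppm - node_b_ppm, SECONDS_PER_DAY)
days = days_until_skew(node_a_ppm, node_b_ppm, 33.06)
print(f"skew after one day: {daily_skew:.2f} s")
print(f"days to reach 33.06 s of skew: {days:.1f}")
```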
    Remember that your OpenVMS system is a computer, and not a
    chronometer.  Customer and business requirements, arcane topics
    such as the temperature stability of the reference source crystal,
    and particular application mechanisms all dictate the clock
    accuracy.  Higher accuracy is certainly
    technically possible and is available through a variety of
    optional external hardware and/or software, as is described
    in the FAQ.
    What can be done? There are several approaches, either to
    synchronize the clocks more closely or to minimize the effects of
    clock drift:
    1) Correcting drift
        a) Keep nodes in synch with SET TIME/CLUSTER (manual or periodic).
           (The SET TIME/CLUSTER command uses the same system mechanisms
           as the SYSMAN CONFIGURATION SET TIME command that some folks
           will reference, but SET TIME/CLUSTER has the benefit of being
           a directly accessible DCL command.)
        b) Employ a time service like DTSS or NTP to keep nodes in synch.
        c) Purchase external chronometric hardware and/or software to
           maintain time to the accuracy that your business requires.
    2) Reducing the impact
        a) Don't schedule jobs for exactly midnight - consider
           using /AFTER="TOMORROW+00:05" or another similar
           combination time instead.  (The offset should be
           larger than the largest expected cluster time skew.)
        b) Learn the typical clock drift of your systems and make
           sure programs and algorithms don't expect higher accuracy.
        c) Put a short delay, say 1 minute, at the start of jobs to
           allow for cluster time drift.
        d) Do not depend on queue manager timing for high-accuracy
           events. If you need something to happen at an accurate,
           specific time, use a permanent job or job scheduler,
           running at a higher or real-time priority, and using
           direct system service calls (e.g., $CREPRC) rather than
           SUBMIT or $SNDJBC calls.
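    One way to pick the offset suggested in 2a is to measure each node's
    clock error, take the worst-case spread, and pad it.  The sketch
    below does that arithmetic; the node names, the measured offsets,
    the safety factor, and the helper functions are all hypothetical,
    and the formatting assumes an offset of less than 24 hours.

```python
import math

def worst_case_skew(offsets_seconds):
    """Largest difference between any two node clocks, in seconds."""
    values = list(offsets_seconds)
    return max(values) - min(values)

def after_offset_minutes(offsets_seconds, safety_factor=2.0):
    """Whole minutes to add to the scheduled time, never less than one."""
    padded = worst_case_skew(offsets_seconds) * safety_factor
    return max(1, math.ceil(padded / 60))

def after_qualifier(minutes):
    """Format the offset as a TOMORROW+hh:mm combination time."""
    hours, mins = divmod(minutes, 60)
    return f'/AFTER="TOMORROW+{hours:02d}:{mins:02d}"'

# Hypothetical measured clock errors per node (seconds, positive = fast).
offsets = {"NODEA": 33.06, "NODEB": 0.0, "NODEC": -4.5}

minutes = after_offset_minutes(offsets.values())
print(after_qualifier(minutes))  # /AFTER="TOMORROW+00:02"
```

    With these sample numbers the worst-case skew is 37.56 seconds, so
    a two-minute offset is enough; the five-minute offset in 2a simply
    leaves a wider margin.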
    Note that you cannot depend on particular node(s) to be the queue
    manager timekeeper, nor can you (nor should you) predict, or even
    (easily) determine which node is the timekeeper at a particular
    moment, so don't even think about it!
    For additional details on timekeeping and on clock synchronization
    techniques and tools, please see the OpenVMS FAQ.

answer written or last revised on ( 19-MAY-2003 )
