HP OpenVMS Cluster Systems
2.4.2 Losing a Member
Table 2-3 describes the phases of a transition caused by the failure
of a current OpenVMS Cluster member.
Table 2-3 Transitions Caused by Loss of a Cluster Member
Failure detection
The duration of this phase depends on the cause of the failure and on
how the failure is detected.
During normal cluster operation, messages sent from one computer to
another are acknowledged when received.
- If a message is not acknowledged within a period determined by the
OpenVMS Cluster communications software, the repair attempt phase
begins.
- If a cluster member is shut down or fails, the operating system
causes datagrams to be sent from the computer shutting down to the
other members. These datagrams state the computer's intention to sever
communications and to stop sharing resources. The failure detection and
repair attempt phases are bypassed, and the reconfiguration phase
begins immediately.
Repair attempt
If the virtual circuit to an OpenVMS Cluster member is broken, attempts
are made to repair the path. Repair attempts continue for an interval
specified by the PAPOLLINTERVAL system parameter. (System managers can
adjust the value of this parameter to suit local conditions.)
Thereafter, the path is considered irrevocably broken, and steps must
be taken to reconfigure the OpenVMS Cluster system so that all
computers can once again communicate with each other and so that
computers that cannot communicate are removed from the OpenVMS Cluster.
Reconfiguration
If a cluster member is shut down or fails, the cluster must be
reconfigured. One of the remaining computers acts as coordinator and
exchanges messages with all other cluster members to determine an
optimal cluster configuration with the most members and the most votes.
This phase, during which all user (application) activity is blocked,
usually lasts less than 3 seconds, although the actual time depends on
the configuration.
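The coordinator's goal of finding an optimal configuration "with the most members and the most votes" can be pictured with a small selection function. This is an illustrative model only, not OpenVMS connection manager code; the node names are hypothetical, and the assumption that votes are compared before member count is mine.

```python
# Illustrative model (not OpenVMS code): pick the surviving cluster
# configuration with the most votes, breaking ties by member count.

def optimal_configuration(candidates):
    """candidates: list of dicts mapping member name -> votes."""
    return max(candidates, key=lambda c: (sum(c.values()), len(c)))

# Two hypothetical surviving configurations after a communication failure:
a = {"NODEA": 1, "NODEB": 1, "NODEC": 1}   # 3 members, 3 votes
b = {"NODEA": 1, "NODED": 3}               # 2 members, 4 votes

best = optimal_configuration([a, b])       # b wins on votes
```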
OpenVMS Cluster system recovery
Recovery includes the following stages, some of which can take place in
parallel:
I/O completion
When a computer is removed from the cluster, OpenVMS Cluster software
ensures that all I/O operations that are started prior to the
transition complete before I/O operations that are generated after the
transition. This stage usually has little or no effect on applications.
Lock database rebuild
Because the lock database is distributed among all members, some
portion of the database might need rebuilding. A rebuild is performed
as follows:
- A computer leaves the OpenVMS Cluster: a rebuild is always performed.
- A computer is added to the OpenVMS Cluster:
A rebuild is performed when the LOCKDIRWT system parameter is greater
than 1.
Caution: Setting the LOCKDIRWT system parameter to different values on
the same model or type of computer can cause the distributed lock
manager to use the computer with the higher value. This could cause
undue resource usage on that computer.
Disk mount verification
This stage occurs only when the failure of a voting member causes
quorum to be lost. To protect data integrity, all I/O activity is
blocked until quorum is regained. Mount verification is the mechanism
used to block I/O during this phase.
Quorum disk votes validation
If, when a computer is removed, the remaining members can determine
that it has shut down or failed, the votes contributed by the quorum
disk are included without delay in quorum calculations that are
performed by the remaining members. However, if the quorum watcher
cannot determine that the computer has shut down or failed (for
example, if a console halt, power failure, or communications failure
has occurred), the votes are not included for a period (in seconds)
equal to four times the value of the QDSKINTERVAL system parameter.
This period is sufficient to determine that the failed computer is no
longer using the quorum disk.
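The withholding period described above is simple arithmetic; the sketch below merely restates it, using a hypothetical QDSKINTERVAL value:

```python
# Seconds that quorum disk votes are withheld when the remaining
# members cannot confirm that the removed computer has shut down:
# four times the QDSKINTERVAL system parameter (per the text above).

def quorum_disk_vote_delay(qdskinterval):
    return 4 * qdskinterval

delay = quorum_disk_vote_delay(3)   # hypothetical QDSKINTERVAL of 3 -> 12 s
```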
Disk rebuild
If the transition is the result of a computer rebooting after a
failure, the disks are marked as improperly dismounted.
Reference: See Sections 6.5.5 and 6.5.6
for information about rebuilding disks.
XFC cache change
If the XFC cache is active on this node, a check is made to determine
if there are any nodes in the cluster that do not support the XFC
cache. If so, any XFC cache data must be flushed before continuing with
the cluster transition.
Clusterwide logical name recovery
This stage ensures that all nodes in the cluster have matching
clusterwide logical name information.
Application recovery
When you assess the effect of a state transition on application users,
consider that the application recovery phase includes activities such
as replaying a journal file, cleaning up recovery units, and users
logging in again.
2.5 OpenVMS Cluster Membership
OpenVMS Cluster systems based on a LAN or an IP network use a cluster group
number and a cluster password to allow multiple independent OpenVMS
Cluster systems to coexist on the same extended LAN or IP network and
to prevent accidental access to a cluster by unauthorized computers.
When using an IP network for cluster communication, the remote node's
IP address must be present in the local SYS$SYSTEM:PE$IP_CONFIG.DAT
file.
2.5.1 Cluster Group Number
The cluster group number uniquely identifies each OpenVMS Cluster
system on a LAN or an IP network, or one that communicates through a
shared memory region (that is, using SMCI). This group number must be
either from 1 to 4095 or from 61440 to 65535.
Rule: If you plan to have more than one OpenVMS
Cluster system on a LAN or an IP network, you must coordinate the
assignment of cluster group numbers among system managers.
2.5.2 Cluster Password
The cluster password prevents an unauthorized computer that is using the
cluster group number from joining the cluster. The password
must be from 1 to 31 characters; valid characters are letters, numbers,
the dollar sign ($), and the underscore (_).
The cluster group number and cluster password are maintained in the
cluster authorization file, SYS$COMMON:[SYSEXE]CLUSTER_AUTHORIZE.DAT.
This file is created during the installation of the operating system,
if you indicate that you want to set up a cluster that utilizes the
shared memory or the LAN. The installation procedure then prompts you
for the cluster group number and password.
If you convert an OpenVMS Cluster that uses only the CI or DSSI
interconnect to one that includes a LAN or shared memory interconnect,
the SYS$COMMON:[SYSEXE]CLUSTER_AUTHORIZE.DAT file is created when you
execute the CLUSTER_CONFIG.COM command procedure, as described in
Chapter 8.
Reference: For information about OpenVMS Cluster group
data in the CLUSTER_AUTHORIZE.DAT file, see Sections 8.4 and
If all nodes in the OpenVMS Cluster do not have the same cluster
password, an error report similar to the following is logged in the
error log file.
**** V3.4 ********************* ENTRY 343 ********************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version XC56-BL2
Event sequence number 102.
Timestamp of occurrence 16-SEP-2009 16:47:48
Time since reboot 0 Day(s) 1:04:52
Host name PERK
System Model AlphaServer ES45 Model 2
Entry Type 98. Asynchronous Device Attention
---- Device Profile ----
Product Name NI-SCA Port
---- NISCA Port Data ----
Error Type and SubType x0600 Channel Error, Invalid Cluster Password
Datalink Device Name EIA8:
Remote Node Name CHBOSE
Remote Address x000064A9000400AA
Local Address x000063B4000400AA
Error Count 1. Error Occurrences This Entry
----- Software Info -----
UCB$x_ERRCNT 6. Errors This Unit
2.6 Synchronizing Cluster Functions by the Distributed Lock Manager
The distributed lock manager is an OpenVMS feature for
synchronizing functions required by the distributed file system, the
distributed job controller, device allocation, user-written OpenVMS
Cluster applications, and other OpenVMS products and software
components.
The distributed lock manager uses the connection manager and SCS to
communicate information between OpenVMS Cluster computers.
2.6.1 Distributed Lock Manager Functions
The functions of the distributed lock manager include the following:
- Synchronizes access to shared clusterwide resources, including:
- Records in files
- Any user-defined resources, such as databases and memory
Each resource is managed clusterwide by an OpenVMS Cluster computer.
- Implements the $ENQ and $DEQ system services to provide
clusterwide synchronization of access to resources by allowing the
locking and unlocking of resource names.
Reference: For detailed information about system
services, refer to the HP OpenVMS System Services Reference Manual.
- Queues process requests for access to a locked resource. This
queuing mechanism allows processes to be put into a wait state until a
particular resource is available. As a result, cooperating processes
can synchronize their access to shared objects, such as files and
records.
- Releases all locks that an OpenVMS Cluster computer holds if the
computer fails. This mechanism allows processing to continue on the
remaining computers.
- Supports clusterwide deadlock detection.
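The queuing behavior described above can be sketched with a toy model. This is an illustrative Python sketch, not the OpenVMS implementation: real $ENQ/$DEQ calls take resource names and lock modes, operate clusterwide, and return system service status values, none of which is modeled here.

```python
# Toy model of a per-resource lock with a wait queue, sketching the
# grant/queue/release ordering that $ENQ and $DEQ provide.

from collections import deque

class ResourceLock:
    def __init__(self, name):
        self.name = name
        self.holder = None          # process currently holding the lock
        self.waiters = deque()      # processes queued in a wait state

    def enq(self, process):
        """Grant the lock if free; otherwise queue the requester
        (the requesting process would enter a wait state)."""
        if self.holder is None:
            self.holder = process
            return "granted"
        self.waiters.append(process)
        return "queued"

    def deq(self, process):
        """Release the lock and grant it to the next waiter, if any."""
        if self.holder == process:
            self.holder = self.waiters.popleft() if self.waiters else None

lock = ResourceLock("FILE_A")
lock.enq("P1")          # granted
lock.enq("P2")          # queued behind P1
lock.deq("P1")          # P2 now holds the lock
```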
2.6.2 System Management of the Lock Manager
The lock manager is fully automated and usually requires no explicit
system management. However, the LOCKDIRWT and LOCKRMWT system
parameters can be used to adjust the distribution of activity and
control of lock resource trees across the cluster.
A lock resource tree is an abstract entity on which locks can be
placed. Multiple lock resource trees can exist within a cluster. For
every resource tree, there is one node known as the directory node and
another node known as the lock resource master node.
A lock resource master node controls a lock resource tree and is aware
of all the locks on the lock resource tree. All locking operations on
the lock tree must be sent to the resource master. These locks can come
from any node in the cluster. All other nodes in the cluster only know
about their specific locks on the tree.
Furthermore, all nodes in the cluster have many locks on many different
lock resource trees, which can be mastered on different nodes. When
creating a new lock resource tree, the directory node must first be
queried to determine whether a resource master already exists.
The LOCKDIRWT parameter allocates a node as the directory node for a
lock resource tree. The higher a node's LOCKDIRWT setting, the higher
the probability that it will be the directory node for a given lock
resource tree.
For most configurations, large computers and boot nodes perform
optimally when LOCKDIRWT is set to 1 and satellite nodes have LOCKDIRWT
set to 0. These values are set automatically by the CLUSTER_CONFIG.COM
procedure. Nodes with a LOCKDIRWT of 0 will not be the directory node
for any resources unless all nodes in the cluster have a LOCKDIRWT of 0.
In some circumstances, you may want to change the values of the
LOCKDIRWT parameter across the cluster to control the extent to which
nodes participate as directory nodes.
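As a rough illustration of how such a weight could bias directory node selection, here is a hypothetical Python model. The real algorithm inside the distributed lock manager is not documented here, so the hash-based spreading, the node names, and the all-zero fallback below are all assumptions; only the two documented properties are preserved (higher weight means higher probability, and zero-weight nodes are skipped unless every node is zero).

```python
# Hypothetical model (not OpenVMS internals): LOCKDIRWT weights a
# node's chance of being the directory node for a resource tree, and
# nodes with LOCKDIRWT 0 are skipped unless every node has 0.

def directory_node(resource_name, lockdirwt):
    """lockdirwt: dict mapping node name -> LOCKDIRWT value."""
    eligible = {n: w for n, w in lockdirwt.items() if w > 0}
    if not eligible:                     # all nodes have LOCKDIRWT 0
        eligible = {n: 1 for n in lockdirwt}
    # Spread resource names over nodes in proportion to their weights
    # (a deterministic stand-in for the real directory lookup).
    slots = [n for n, w in sorted(eligible.items()) for _ in range(w)]
    return slots[sum(map(ord, resource_name)) % len(slots)]

# A satellite with LOCKDIRWT 0 never becomes the directory node here:
node = directory_node("SOME$RESOURCE", {"SAT1": 0, "BIG1": 1})   # "BIG1"
```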
LOCKRMWT influences which node is chosen to remaster a lock resource
tree. Because there is a performance advantage for nodes mastering a
lock resource tree (as no communication is required when performing a
locking operation), the lock resource manager supports remastering lock
trees to other nodes in the cluster. Remastering a lock resource tree
means to designate another node in the cluster as the lock resource
master for that lock resource tree and to move the lock resource tree
to that node.
A node is eligible to be a lock resource master node if it has locks on
that lock resource tree. The selection of the new lock resource master
node from the eligible nodes is based on each node's LOCKRMWT system
parameter setting and each node's locking activity.
LOCKRMWT can contain a value between 0 and 10; the default is 5. The
following list describes how the value of the LOCKRMWT system parameter
affects resource tree mastery and how lock activity can affect the
outcome:
- Any node that has a LOCKRMWT value of 0 will attempt to remaster a
lock tree to another node which has locks on that tree, as long as the
other node has a LOCKRMWT greater than 0.
- Nodes with a LOCKRMWT value of 10 will be given resource trees
from other nodes that have a LOCKRMWT less than 10.
- Otherwise, the difference in LOCKRMWT is computed between the
master and the eligible node. The higher the difference, the more
activity is required by the eligible node for the lock tree to move.
In most cases, maintaining the default value of 5 for LOCKRMWT is
appropriate, but there may be cases where assigning some nodes a higher
or lower LOCKRMWT is useful for determining which nodes master a lock
tree. The LOCKRMWT parameter is dynamic; hence, it can be adjusted if
necessary.
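The three rules above can be condensed into a small decision function. This is an illustrative sketch, not lock manager code: the activity measurement and the scaling of the LOCKRMWT difference into an activity threshold are hypothetical placeholders.

```python
# Illustrative sketch of the LOCKRMWT remastering rules described above
# (not the actual lock manager code).

def should_remaster(master_rmwt, eligible_rmwt, activity_advantage):
    """Decide whether a lock tree should move from the current master
    to an eligible node (one that holds locks on the tree).

    activity_advantage: how much more locking activity the eligible
    node has than the master (arbitrary, hypothetical units)."""
    if master_rmwt == 0 and eligible_rmwt > 0:
        return True                 # a node with LOCKRMWT 0 gives trees away
    if eligible_rmwt == 10 and master_rmwt < 10:
        return True                 # a node with LOCKRMWT 10 takes trees
    # Otherwise: the larger the LOCKRMWT difference in the master's
    # favor, the more extra activity the eligible node needs.
    required = (master_rmwt - eligible_rmwt) * 10   # hypothetical scaling
    return activity_advantage > required
```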
2.6.3 Large-Scale Locking Applications
The Enqueue process limit (ENQLM), which is set in the SYSUAF.DAT file
and which controls the number of locks that a process can own, can be
adjusted to meet the demands of large-scale databases and other server
applications.
Prior to OpenVMS Version 7.1, the limit was 32767. This limit was
removed to enable the efficient operation of large scale databases and
other server applications. A process can now own up to 16,776,959
locks, the architectural maximum. By setting ENQLM in SYSUAF.DAT to
32767 (using the Authorize utility), the lock limit is automatically
extended to the maximum of 16,776,959 locks. $CREPRC can pass large
quotas to the target process if it is initialized from a process with
the SYSUAF Enqlm quota of 32767.
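The ENQLM rule above amounts to a simple mapping, sketched here for clarity (the constant names are mine):

```python
# Rule from the text: setting ENQLM to 32767 in SYSUAF automatically
# extends the process lock limit to the architectural maximum.

ENQLM_TRIGGER = 32767
ARCH_MAX_LOCKS = 16_776_959

def effective_lock_limit(enqlm):
    """Return the effective per-process lock limit for a SYSUAF ENQLM
    value; exactly 32767 triggers the extension (per the text)."""
    return ARCH_MAX_LOCKS if enqlm == ENQLM_TRIGGER else enqlm
```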
Reference: See the HP OpenVMS Programming Concepts Manual for additional
information about the distributed lock manager and resource trees. See
the HP OpenVMS System Manager's Manual for more information about Enqueue Quota.
2.7 Resource Sharing
Resource sharing in an OpenVMS Cluster system is enabled by the
distributed file system, RMS, and the distributed lock manager.
2.7.1 Distributed File System
The OpenVMS Cluster distributed file system allows all
computers to share mass storage and files. The distributed file system
provides the same access to disks, tapes, and files across the OpenVMS
Cluster that is provided on a standalone computer.
2.7.2 RMS and the Distributed Lock Manager
The distributed file system and OpenVMS Record Management Services
(RMS) use the distributed lock manager to coordinate clusterwide file
access. RMS files can be shared to the record level.
Almost any disk or tape device can be made available to the entire
OpenVMS Cluster system. The devices can be:
- Connected to a supported storage subsystem
- A local device that is served to the OpenVMS Cluster
All cluster-accessible devices appear as if they are connected to every
computer.
2.8 Disk Availability
Locally connected disks can be served across an OpenVMS Cluster by the
MSCP server.
2.8.1 MSCP Server
The MSCP server makes locally connected disks,
including the following, available across the cluster:
- DSA disks local to OpenVMS Cluster members using SDI
- HSG and HSV disks in an OpenVMS Cluster using mixed interconnects
- SCSI and HSZ disks
- SAS, LSI 1068 SAS and LSI Logic 1068e SAS disks
- FC and HSG disks
- Disks on boot servers and disk servers located anywhere in the
OpenVMS Cluster
In conjunction with the disk class driver (DUDRIVER), the MSCP server
implements the storage server portion of the MSCP protocol on a
computer, allowing the computer to function as a storage controller.
The MSCP protocol defines conventions for the format and timing of
messages sent and received for certain families of mass storage
controllers and devices designed by HP. The MSCP server decodes and
services MSCP I/O requests sent by remote cluster nodes.
Note: The MSCP server is not used by a computer to
access files on locally connected disks.
2.8.2 Device Serving
Once a device is set up to be served:
- Any cluster member can submit I/O requests to it.
- The local computer can decode and service MSCP I/O requests sent by
remote OpenVMS Cluster computers.
2.8.3 Enabling the MSCP Server
The MSCP server is controlled by the MSCP_LOAD and MSCP_SERVE_ALL
system parameters. The values of these parameters are set initially by
answers to questions asked during the OpenVMS installation procedure
(described in Section 8.4), or during the CLUSTER_CONFIG.COM procedure
(described in Chapter 8).
The default values for these parameters are as follows:
- MSCP is not loaded on satellites.
- MSCP is loaded on boot server and disk server nodes.
Reference: See Section 6.3 for more information about
setting system parameters for MSCP serving.
2.9 Tape Availability
Locally connected tapes can be served across an OpenVMS Cluster by the
TMSCP server.
2.9.1 TMSCP Server
The TMSCP server makes locally connected tapes, including the
following, available across the cluster:
- HSG and HSV tapes
- SCSI tapes
- SAS tapes
The TMSCP server implements the TMSCP protocol, which is used to
communicate with a controller for TMSCP tapes. In conjunction with the
tape class driver (TUDRIVER), the TMSCP protocol is implemented on a
processor, allowing the processor to function as a storage controller.
The processor submits I/O requests to locally accessed tapes, and
accepts the I/O requests from any node in the cluster. In this way, the
TMSCP server makes locally connected tapes available to all nodes in
the cluster. The TMSCP server can also make HSG and HSV tapes accessible
to OpenVMS Cluster satellites.
2.9.2 Enabling the TMSCP Server
The TMSCP server is controlled by the TMSCP_LOAD system parameter. The
value of this parameter is set initially by answers to questions asked
during the OpenVMS installation procedure (described in Section 4.2.3)
or during the CLUSTER_CONFIG.COM procedure (described in Section 8.4).
By default, the setting of the TMSCP_LOAD parameter does not load the
TMSCP server and does not serve any tapes.
2.10 Queue Availability
The distributed queue manager makes queues available
across the cluster to achieve the following:
Permit users on any OpenVMS Cluster computer to submit batch and print
jobs to queues that execute on any computer in the OpenVMS Cluster
Users can submit jobs to any queue in the cluster, provided that the
necessary mass storage volumes and peripheral devices are accessible to
the computer on which the job executes.
Distribute the batch and print processing work load over OpenVMS
Cluster computers
System managers can set up generic batch and print queues that
distribute processing work loads among computers. The distributed queue
manager directs batch and print jobs either to the execution queue with
the lowest ratio of jobs-to-queue limit or to the next available
printer.
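The "lowest ratio of jobs-to-queue limit" rule can be illustrated with a small sketch; the queue names and counts are hypothetical, and real generic queues consider more state than this:

```python
# Illustrative model of how a generic queue might pick an execution
# queue: choose the queue with the lowest ratio of current jobs to its
# job limit, as described above.

def pick_execution_queue(queues):
    """queues: dict mapping queue name -> (current_jobs, job_limit)."""
    return min(queues, key=lambda q: queues[q][0] / queues[q][1])

queues = {
    "BATCH_NODE1": (4, 10),   # ratio 0.4
    "BATCH_NODE2": (1, 5),    # ratio 0.2
    "BATCH_NODE3": (6, 6),    # ratio 1.0
}
best = pick_execution_queue(queues)   # "BATCH_NODE2"
```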
The distributed queue manager uses the distributed lock manager to
signal other computers in the OpenVMS Cluster to examine the batch and
print queue jobs to be processed.