HP OpenVMS Systems Documentation

OpenVMS Cluster Systems


Chapter 10
Maintaining an OpenVMS Cluster System

Once your cluster is up and running, you can implement routine, site-specific maintenance operations. For example, you can back up disks, add user accounts, perform software upgrades and installations, run AUTOGEN with the feedback option on a regular basis, and monitor system performance.

You should also maintain records of current configuration data, especially any changes to hardware or software components. If you are managing a cluster that includes satellite nodes, it is important to monitor LAN activity.

From time to time, conditions may occur that require the following special maintenance operations:

  • Restoring cluster quorum after an unexpected computer failure
  • Executing conditional shutdown operations
  • Performing security functions in LAN and mixed-interconnect clusters

10.1 Backing Up Data and Files

As a part of the regular system management procedure, you should copy operating system files, application software files, and associated files to an alternate device using the OpenVMS Backup utility.

Some backup operations, such as an incremental backup of a disk while it is in use or the backup of a nonshared disk, are the same in an OpenVMS Cluster as they are on a single OpenVMS system.
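For instance, a minimal sketch of such an incremental backup from a running system to a tape save set might look like the following (the device names DUA1: and MUA0: and the save-set name are placeholders for your own configuration):

    $ ! Back up only files modified since the last /RECORD backup.
    $ ! /IGNORE=INTERLOCK copies files even if they are open; see the
    $ ! caution in Table 10-1 about files open for writing.
    $ BACKUP/RECORD/SINCE=BACKUP/IGNORE=INTERLOCK -
    _$ DUA1:[*...]*.*;* MUA0:INCR.BCK/SAVE_SET/LABEL=INCR01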

Backup tools for use in a cluster include those listed in Table 10-1.

Table 10-1 Backup Methods
Tool Usage
Online backup Use from a running system to back up:
  • The system's local disks
  • Cluster-shareable disks other than system disks
  • The system disk or disks

Caution: Files open for writing at the time of the backup procedure may not be backed up correctly.

Menu-driven or +standalone BACKUP Use one of the following methods:
  • If you have access to the OpenVMS Alpha or VAX distribution CD-ROM, back up your system using the menu system provided on that disc. This menu system, which is displayed automatically when you boot the CD-ROM, allows you to:
    • Enter a DCL environment, from which you can perform backup and restore operations on the system disk (instead of using standalone BACKUP).
    • Install or upgrade the operating system and layered products, using the POLYCENTER Software Installation utility.

    Reference: For more detailed information about using the menu-driven procedure, see the OpenVMS Upgrade and Installation Manual and the OpenVMS System Manager's Manual.

  • If you do not have access to the OpenVMS VAX distribution CD-ROM, you should use standalone BACKUP to back up and restore your system disk. Standalone BACKUP:
    • Should be used with caution because it does not:
      1. Participate in the cluster
      2. Synchronize volume ownership or file I/O with other systems in the cluster
    • Can boot from the system disk instead of the console media. Standalone BACKUP is built in the reserved root on any system disk.

    Reference: For more information about standalone BACKUP, see the OpenVMS System Manager's Manual.

+VAX specific
++Alpha specific

Plan to perform the backup process regularly, according to a schedule that is consistent with application and user needs. This may require creative scheduling so that you can coordinate backups with times when user and application system requirements are low.
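One way to coordinate backups with off-peak hours is to submit the backup procedure as a batch job. In this sketch, the procedure name BACKUP_SYSTEM.COM is a placeholder for your site's own backup command procedure:

    $ ! Run the site backup procedure at 2:00 tomorrow morning;
    $ ! /KEEP retains the batch log file for later review.
    $ SUBMIT/AFTER="TOMORROW+2:00"/KEEP BACKUP_SYSTEM.COM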

Reference: See the OpenVMS System Management Utilities Reference Manual: A--L for complete information about the OpenVMS Backup utility.

10.2 Updating the OpenVMS Operating System

When updating the OpenVMS operating system, follow the steps in Table 10-2.

Table 10-2 Upgrading the OpenVMS Operating System
Step Action
1 Use standalone BACKUP to back up the system disk.
2 Perform the update procedure once for each system disk.
3 Install any mandatory updates.
4 Run AUTOGEN on each node that boots from that system disk.
5 Run the user environment test package (UETP) to test the installation.
6 Use the OpenVMS Backup utility to make a copy of the new system volume.
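Step 4, for example, might be performed on each node with an AUTOGEN invocation such as the following, which recomputes system parameters using collected feedback data and then reboots so that the new values take effect:

    $ @SYS$UPDATE:AUTOGEN GETDATA REBOOT FEEDBACK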

Reference: See the appropriate OpenVMS upgrade and installation manual for complete instructions.

10.2.1 Rolling Upgrades

The OpenVMS operating system allows an OpenVMS Cluster system running on multiple system disks to continue to provide service while the system software is being upgraded. This process is called a rolling upgrade because each node is upgraded and rebooted in turn, until all the nodes have been upgraded.

If you must first migrate your system from running on one system disk to running on two or more system disks, follow these steps:

Step Action
1 Follow the procedures in Section 8.5 to create a duplicate disk.
2 Follow the instructions in Section 5.10 for information about coordinating system files.

These sections help you add a system disk and prepare a common user environment on multiple system disks to make the shared system files such as the queue database, rightslists, proxies, mail, and other files available across the OpenVMS Cluster system.

10.3 LAN Network Failure Analysis

The OpenVMS operating system provides a sample program to help you analyze OpenVMS Cluster network failures on the LAN. You can edit and use the SYS$EXAMPLES:LAVC$FAILURE_ANALYSIS.MAR program to detect and isolate failed network components. Using the network failure analysis program can help reduce the time required to detect and isolate a failed network component, thereby providing a significant increase in cluster availability.

Reference: For a description of the network failure analysis program, refer to Appendix D.
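Assuming you edit a local copy of the program in your default directory, the build sequence is the usual MACRO assemble-and-link; this is only a sketch, and the exact editing and startup steps are described in Appendix D:

    $ COPY SYS$EXAMPLES:LAVC$FAILURE_ANALYSIS.MAR []
    $ ! Edit the local copy to describe your LAN components, then:
    $ MACRO LAVC$FAILURE_ANALYSIS.MAR
    $ LINK LAVC$FAILURE_ANALYSIS
    $ RUN LAVC$FAILURE_ANALYSIS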

10.4 Recording Configuration Data

To maintain an OpenVMS Cluster system effectively, you must keep accurate records about the current status of all hardware and software components and about any changes made to those components. Changes to cluster components can have a significant effect on the operation of the entire cluster. If a failure occurs, you may need to consult your records to aid problem diagnosis.

Maintaining current records for your configuration is necessary both for routine operations and for eventual troubleshooting activities.

10.4.1 Record Information

At a minimum, your configuration records should include the following information:

  • A diagram of your physical cluster configuration. (Appendix D includes a discussion of keeping a LAN configuration diagram.)
  • SCSNODE and SCSSYSTEMID parameter values for all computers.
  • VOTES and EXPECTED_VOTES parameter values.
  • DECnet names and addresses for all computers.
  • Current values for cluster-related system parameters, especially ALLOCLASS and TAPE_ALLOCLASS values for HSC subsystems and computers.
    Reference: Cluster system parameters are described in Appendix A.
  • Names and locations of default bootstrap command procedures for all computers connected with the CI.
  • Names of cluster disk and tape devices.
  • In LAN and mixed-interconnect clusters, LAN hardware addresses for satellites.
  • Names of LAN adapters.
  • Names of LAN segments or rings.
  • Names of LAN bridges.
  • Names of wiring concentrators or of DELNI or DEMPR adapters.
  • Serial numbers of all hardware components.
  • Changes to any hardware or software components (including site-specific command procedures), along with dates and times when changes were made.
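One way to capture many of the parameter values listed above is with the SYSGEN utility. For example, the following interactive session displays the cluster-related parameters on one node so that they can be recorded:

    $ RUN SYS$SYSTEM:SYSGEN
    SYSGEN> USE CURRENT
    SYSGEN> SHOW SCSNODE
    SYSGEN> SHOW SCSSYSTEMID
    SYSGEN> SHOW VOTES
    SYSGEN> SHOW EXPECTED_VOTES
    SYSGEN> SHOW ALLOCLASS
    SYSGEN> SHOW TAPE_ALLOCLASS
    SYSGEN> EXIT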

10.4.2 Satellite Network Data

The first time you execute CLUSTER_CONFIG.COM to add a satellite, the procedure creates the file NETNODE_UPDATE.COM in the boot server's SYS$SPECIFIC:[SYSMGR] directory. (For a common-environment cluster, you must rename this file to the SYS$COMMON:[SYSMGR] directory, as described in Section 5.10.2.) This file, which is updated each time you add or remove a satellite or change its Ethernet or FDDI hardware address, contains all essential network configuration data for the satellite.

If an unexpected condition at your site causes configuration data to be lost, you can use NETNODE_UPDATE.COM to restore it. You can also read the file when you need to obtain data about individual satellites. Note that you may want to edit the file occasionally to remove obsolete entries.

Example 10-1 shows the contents of the file after satellites EUROPA and GANYMD have been added to the cluster.

Example 10-1 Sample NETNODE_UPDATE.COM File

    define node EUROPA address 2.21
    define node EUROPA hardware address 08-00-2B-03-51-75
    define node EUROPA load assist agent sys$share:niscs_laa.exe
    define node EUROPA load assist parameter $1$DJA11:<SYS10.>
    define node EUROPA tertiary loader sys$system:tertiary_vmb.exe
    define node GANYMD address 2.22
    define node GANYMD hardware address 08-00-2B-03-58-14
    define node GANYMD load assist agent sys$share:niscs_laa.exe
    define node GANYMD load assist parameter $1$DJA11:<SYS11.>
    define node GANYMD tertiary loader sys$system:tertiary_vmb.exe
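Because the file contains bare NCP subcommands, one way to replay these definitions after configuration data has been lost might be from within NCP; the path below assumes a common-environment cluster in which the file has been moved to SYS$COMMON:[SYSMGR]:

    $ RUN SYS$SYSTEM:NCP
    NCP> @SYS$COMMON:[SYSMGR]NETNODE_UPDATE.COM
    NCP> EXIT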

Reference: See the DECnet-Plus documentation for equivalent NCL command information.

10.5 Cross-Architecture Satellite Booting

Cross-architecture satellite booting permits VAX boot nodes to provide boot service to Alpha satellites and Alpha boot nodes to provide boot service to VAX satellites. For some OpenVMS Cluster configurations, cross-architecture boot support can simplify day-to-day system operation and reduce the complexity of managing OpenVMS Cluster systems that include both VAX and Alpha systems.

Note: Compaq will continue to provide cross-architecture boot support while it is technically feasible. This support may be removed in future releases of the OpenVMS operating system.

10.5.1 Sample Configurations

The sample configurations that follow show how you might configure an OpenVMS Cluster to include both Alpha and VAX boot nodes and satellite nodes. Note that each architecture must include a system disk that is used for installations and upgrades.

Caution: The OpenVMS operating system and layered product installations and upgrades cannot be performed across architectures. For example, OpenVMS Alpha software installations and upgrades must be performed using an Alpha system. When configuring OpenVMS Cluster systems that use the cross-architecture booting feature, configure at least one system of each architecture with a disk that can be used for installations and upgrades. In the configurations shown in Figure 10-1 and Figure 10-2, one of the workstations has been configured with a local disk for this purpose.

In Figure 10-1, several Alpha workstations have been added to an existing VAXcluster configuration that contains two VAX boot nodes based on the DSSI interconnect and several VAX workstations. For high availability, the Alpha system disk is located on the DSSI for access by multiple boot servers.

Figure 10-1 VAX Nodes Boot Alpha Satellites

In Figure 10-2, the configuration originally consisted of a VAX boot node and several VAX workstations. The VAX boot node has been replaced with a new, high-performance Alpha boot node. Some Alpha workstations have also been added. The original VAX workstations remain in the configuration and still require boot service. The new Alpha boot node can perform this service.

Figure 10-2 Alpha and VAX Nodes Boot Alpha and VAX Satellites

10.5.2 Usage Notes

Consider the following guidelines when using the cross-architecture booting feature:

  • The OpenVMS software installation and upgrade procedures are architecture specific. The operating system must be installed and upgraded on a disk that is directly accessible from a system of the appropriate architecture. Configuring a boot server with a system disk of the opposite architecture involves three distinct system management procedures:
    • Installation of the operating system on a disk that is directly accessible from a system of the same architecture.
    • Moving the resulting system disk so that it is accessible by the target boot server. Depending on the specific configuration, this can be done using the Backup utility or by physically relocating the disk.
    • Setting up the boot server's network database to service satellite boot requests. Sample procedures for performing this step are included in Section 10.5.3.
  • System disks can contain only a single version of the OpenVMS operating system and are architecture specific. For example, OpenVMS VAX Version 7.1 cannot coexist on a system disk with OpenVMS Alpha Version 7.1.
  • The CLUSTER_CONFIG command procedure can be used only to manage cluster nodes of the same architecture as the node executing the procedure. For example, when run from an Alpha system, CLUSTER_CONFIG can manipulate only Alpha system disks and perform node management procedures for Alpha systems.
  • No support is provided for cross-architecture installation of layered products.

10.5.3 Configuring DECnet

The following examples show how to configure DECnet databases to perform cross-architecture booting. Note that this feature is available for systems running DECnet for OpenVMS (Phase IV) only.

Customize the command procedures in Examples 10-2 and 10-3 according to the following instructions.

Replace... With...
alpha_system_disk or vax_system_disk The appropriate disk name on the server
label The appropriate label name for the disk on the server
ccc-n The server circuit name
alpha or vax The DECnet node name of the satellite
xx.yyyy The DECnet area.address of the satellite
aa-bb-cc-dd-ee-ff The hardware address of the LAN adapter on the satellite over which the satellite is to be loaded
satellite_root The root on the system disk (for example, SYS10) of the satellite

Example 10-2 shows how to set up a VAX system to serve a locally mounted Alpha system disk.

Example 10-2 Defining an Alpha Satellite in a VAX Boot Node

$! VAX system to load Alpha satellite
$!  On the VAX system:
$!  -----------------
$!  Mount the system disk for MOP server access.
$ MOUNT /SYSTEM alpha_system_disk: label ALPHA$SYSD
$!  Enable MOP service for this server and
$!  configure MOP service for the ALPHA satellite.
$ MCR NCP
NCP> DEFINE CIRCUIT ccc-n SERVICE ENABLED
NCP> DEFINE NODE alpha ADDRESS xx.yyyy
NCP> DEFINE NODE alpha HARDWARE ADDRESS aa-bb-cc-dd-ee-ff
NCP> DEFINE NODE alpha LOAD FILE APB.EXE
NCP> DEFINE NODE alpha LOAD ASSIST AGENT sys$share:niscs_laa.exe
NCP> DEFINE NODE alpha LOAD ASSIST PARAMETER ALPHA$SYSD:[satellite_root.]
NCP> EXIT

Example 10-3 shows how to set up an Alpha system to serve a locally mounted VAX system disk.

Example 10-3 Defining a VAX Satellite in an Alpha Boot Node

$! Alpha system to load VAX satellite
$!  On the Alpha system:
$!  --------------------
$!  Mount the system disk for MOP server access.
$ MOUNT /SYSTEM vax_system_disk: label VAX$SYSD
$!  Enable MOP service for this server and
$!  configure MOP service for the VAX satellite.
$ MCR NCP
NCP> DEFINE CIRCUIT ccc-n SERVICE ENABLED
NCP> DEFINE NODE vax ADDRESS xx.yyyy
NCP> DEFINE NODE vax HARDWARE ADDRESS aa-bb-cc-dd-ee-ff
NCP> DEFINE NODE vax TERTIARY LOADER sys$system:tertiary_vmb.exe
NCP> DEFINE NODE vax LOAD ASSIST AGENT sys$share:niscs_laa.exe
NCP> DEFINE NODE vax LOAD ASSIST PARAMETER VAX$SYSD:[satellite_root.]
NCP> EXIT

Then, to boot the satellite, perform these steps:

  1. Execute the appropriate command procedure from a privileged account on the server.
  2. Boot the satellite over the adapter represented by the hardware address you entered into the command procedure earlier.
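On an Alpha satellite, for example, the console boot command in step 2 might look like the following, where EZA0 is a placeholder for the satellite's own Ethernet adapter device name (console syntax varies by system model):

    >>> B -FLAGS 0,0 EZA0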

10.6 Controlling OPCOM Messages

When a satellite joins the cluster, the Operator Communications Manager (OPCOM) has the following default states:

  • For all systems in an OpenVMS Cluster configuration except workstations:
    • OPA0: is enabled for all message classes.
    • The log file SYS$MANAGER:OPERATOR.LOG is opened for all classes.
  • For workstations in an OpenVMS Cluster configuration, even though the OPCOM process is running:
    • OPA0: is not enabled.
    • No log file is opened.

10.6.1 Overriding OPCOM Defaults

Table 10-3 shows how to define the following system logical names in the command procedure SYS$MANAGER:SYLOGICALS.COM to override the OPCOM default states.

Table 10-3 OPCOM System Logical Names
System Logical Name Function
OPC$OPA0_ENABLE If defined to be true, OPA0: is enabled as an operator console. If defined to be false, OPA0: is not enabled as an operator console. DCL considers any string beginning with T or Y, or any odd integer, to be true; all other values are false.
OPC$OPA0_CLASSES Defines the operator classes to be enabled on OPA0:. The logical name can be a search list of the allowed classes, a comma-separated list of classes, or a combination of the two. For example:

    $ DEFINE/SYSTEM OPC$OPA0_CLASSES CENTRAL,DISKS,TAPES

You can define OPC$OPA0_CLASSES even if OPC$OPA0_ENABLE is not defined. In this case, the classes are used for any operator consoles that are enabled, but the default is used to determine whether to enable the operator console.

OPC$LOGFILE_ENABLE If defined to be true, an operator log file is opened. If defined to be false, no log file is opened.
OPC$LOGFILE_CLASSES Defines the operator classes to be enabled for the log file. The logical name can be a search list of the allowed classes, a comma-separated list, or a combination of the two. You can define this system logical even when the OPC$LOGFILE_ENABLE system logical is not defined. In this case, the classes are used for any log files that are open, but the default is used to determine whether to open the log file.
OPC$LOGFILE_NAME Supplies information that is used in conjunction with the default name SYS$MANAGER:OPERATOR.LOG to define the name of the log file. If the log file is directed to a disk other than the system disk, you should include commands to mount that disk in the SYLOGICALS.COM command procedure.
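For example, the following lines added to SYS$MANAGER:SYLOGICALS.COM would enable OPA0: and direct the operator log file to a non-system disk. The device name $1$DUA5:, the volume label LOGVOL, and the logical name LOGDISK are placeholders for your own configuration:

    $ ! Mount the disk that will hold the operator log file,
    $ ! then enable the console and the log file.
    $ MOUNT/SYSTEM $1$DUA5: LOGVOL LOGDISK
    $ DEFINE/SYSTEM OPC$OPA0_ENABLE TRUE
    $ DEFINE/SYSTEM OPC$LOGFILE_ENABLE TRUE
    $ DEFINE/SYSTEM OPC$LOGFILE_NAME LOGDISK:[OPERATOR]OPERATOR.LOG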

10.6.2 Example

The following example shows how to use the OPC$OPA0_CLASSES system logical to define the operator classes to be enabled. This command enables every operator class except SECURITY, thereby preventing SECURITY class messages from being displayed on OPA0:

    $ DEFINE/SYSTEM OPC$OPA0_CLASSES CENTRAL,PRINTER,TAPES,DISKS,DEVICES, -
            CARDS,NETWORK,CLUSTER,LICENSE,OPER1,OPER2,OPER3,OPER4,OPER5, -
            OPER6,OPER7,OPER8,OPER9,OPER10,OPER11,OPER12
In large clusters, state transitions (computers joining or leaving the cluster) generate many multiline OPCOM messages on a boot server's console device. You can avoid such messages by including the DCL command REPLY/DISABLE=CLUSTER in the appropriate site-specific startup command file or by entering the command interactively from the system manager's account.

10.7 Shutting Down a Cluster

In addition to the default shutdown option NONE, the OpenVMS Alpha and OpenVMS VAX operating systems provide the following options for shutting down OpenVMS Cluster computers:

Option Description
REMOVE_NODE Shuts down the computer and adjusts cluster quorum downward so that the remaining computers can continue to operate without it.
CLUSTER_SHUTDOWN Shuts down the entire cluster; each computer suspends its final shutdown until all other computers have reached the same point in the procedure.
REBOOT_CHECK Checks that the files necessary to reboot the computer exist on the system disk.
SAVE_FEEDBACK Records the AUTOGEN feedback data collected since the computer was last booted.
In addition, in response to the "Shutdown options [NONE]:" prompt, you can specify the DISABLE_AUTOSTART=n option, where n is the number of minutes before autostart queues are disabled in the shutdown sequence.

Reference: See Section 7.13 for more information.

If you do not select any of these options (that is, if you select the default SHUTDOWN option NONE), the shutdown procedure performs the normal operations for shutting down a standalone computer. If you want to shut down a computer that you expect will rejoin the cluster shortly, you can specify the default option NONE. In that case, cluster quorum is not adjusted because the operating system assumes that the computer will soon rejoin the cluster.
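For instance, a run of the shutdown procedure that removes the node from the cluster and disables autostart queues two minutes into the sequence might look like this; the responses shown are illustrative, and the earlier prompts are answered as usual:

    $ @SYS$SYSTEM:SHUTDOWN
       ...
    Shutdown options [NONE]: REMOVE_NODE,DISABLE_AUTOSTART=2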
