HP OpenVMS Systems Documentation

Guidelines for OpenVMS Cluster Configurations

Appendix D
Multiple-Site OpenVMS Clusters

This appendix describes multiple-site OpenVMS Cluster configurations in which multiple nodes are located at sites separated by relatively long distances, from approximately 25 to 125 miles, depending on the technology used. This configuration was introduced in OpenVMS Version 6.2. The appendix provides general configuration guidelines, discusses the three technologies for connecting multiple sites, cites the benefits of multiple-site clusters, and points to additional documentation.

The information in this appendix supersedes the Multiple-Site VMScluster Systems addendum manual.

D.1 What is a Multiple-Site OpenVMS Cluster System?

A multiple-site OpenVMS Cluster system is an OpenVMS Cluster system in which the member nodes are located in geographically separate sites. Depending on the technology used, the distances can be as great as 150 miles.

When an organization has geographically dispersed sites, a multiple-site OpenVMS Cluster system allows the organization to realize the benefits of OpenVMS Cluster systems (for example, sharing data among sites while managing data center operations at a single, centralized location).

Figure D-1 illustrates the concept of a multiple-site OpenVMS Cluster system for a company with a manufacturing site located in Washington, D.C., and corporate headquarters in Philadelphia. This configuration spans a geographical distance of approximately 130 miles (210 km).

Figure D-1 Site-to-Site Link Between Philadelphia and Washington

D.1.1 ATM, DS3, and FDDI Intersite Links

The following link technologies between sites are approved for OpenVMS VAX and OpenVMS Alpha systems:

  • Asynchronous transfer mode (ATM)
  • DS3
  • FDDI

High-performance local area network (LAN) technology combined with the ATM, DS3, and FDDI interconnects allows you to use wide area network (WAN) communication services in your OpenVMS Cluster configuration. OpenVMS Cluster systems configured with the GIGAswitch crossbar switch and ATM, DS3, or FDDI interconnects are approved for use with nodes located miles apart. (The actual distance between any two sites is determined by the physical intersite cable-route distance, not the straight-line distance between the sites.) Section D.3 describes OpenVMS Cluster systems and the WAN communications services in more detail.


To gain the benefits of disaster tolerance across a multiple-site OpenVMS Cluster, use Disaster Tolerant Cluster Services for OpenVMS, a system management and software package from Compaq.

Consult your Compaq Services representative for more information.

D.1.2 Benefits of Multiple-Site OpenVMS Cluster Systems

Some of the benefits you can realize with a multiple-site OpenVMS Cluster system include the following:

  • Remote satellites and nodes: A few systems can be remotely located at a secondary site and can benefit from centralized system management and other resources at the primary site, as shown in Figure D-2. For example, a main office data center could be linked to a warehouse or a small manufacturing site that could have a few local nodes with directly attached site-specific devices. Alternatively, some engineering workstations could be installed in an office park across the city from the primary business site.
  • Data center management consolidation: A single management team can manage nodes located in data centers at multiple sites.
  • Physical resource sharing: Multiple sites can readily share devices such as high-capacity computers, tape libraries, disk archives, or phototypesetters.
  • Remote archiving: Backups can be made to archival media at any site in the cluster. A common example would be to use disk or tape at a single site to back up the data for all sites in the multiple-site OpenVMS Cluster. Backups of data from remote sites can be made transparently (that is, without any intervention required at the remote site).
  • Increased availability: In general, a multiple-site OpenVMS Cluster provides all of the availability advantages of a LAN OpenVMS Cluster. Additionally, by connecting multiple, geographically separate sites, multiple-site OpenVMS Cluster configurations can increase the availability of a system or elements of a system in a variety of ways:
    • Logical volume/data availability: Volume shadowing or redundant arrays of independent disks (RAID) can be used to create logical volumes with members at both sites. If one of the sites becomes unavailable, data can remain available at the other site.
    • Site failover: By adjusting the VOTES system parameter, you can select a preferred site to continue automatically if the other site fails or if communications with the other site are lost.
    • Disaster tolerance: Combined with the software, services, and management procedures provided by Disaster Tolerant Cluster Services for OpenVMS, a multiple-site OpenVMS Cluster can achieve a high level of disaster tolerance. Consult your Compaq Services representative for further information.
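
The site failover benefit can be illustrated with a little vote arithmetic. The following is a minimal sketch, not a configuration procedure. It assumes the usual OpenVMS quorum rule, quorum = (EXPECTED_VOTES + 2) / 2 with the remainder discarded, which this appendix does not state. The site names and vote totals are hypothetical.

```python
def quorum(expected_votes: int) -> int:
    """Quorum under the assumed rule (EXPECTED_VOTES + 2) // 2."""
    return (expected_votes + 2) // 2


def site_survives(site_votes: int, expected_votes: int) -> bool:
    """True if a site that has lost contact with the other site still holds quorum."""
    return site_votes >= quorum(expected_votes)


# Hypothetical two-site cluster: the preferred site is given more votes.
votes = {"primary": 3, "secondary": 2}
expected = sum(votes.values())

for site, site_votes in votes.items():
    if site_survives(site_votes, expected):
        state = "continues"
    else:
        state = "suspends activity until quorum is restored"
    print(f"{site}: {site_votes} of {expected} votes, "
          f"quorum {quorum(expected)}: {state}")
```

With these hypothetical values, only the site holding the larger share of votes keeps quorum when the intersite link is lost, which is the effect of selecting a preferred site.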

Figure D-2 shows an OpenVMS Cluster system with satellites accessible from a remote site.

Figure D-2 Multiple-Site OpenVMS Cluster Configuration with Remote Satellites

D.1.3 General Configuration Guidelines

The same configuration rules that apply to OpenVMS Cluster systems on a LAN also apply to a multiple-site OpenVMS Cluster configuration that includes ATM, DS3, or FDDI intersite interconnect. General LAN configuration rules are stated in the following documentation:

  • OpenVMS Cluster Software Software Product Description (SPD 29.78.xx)
  • Chapter 8 of this manual

Some configuration guidelines are unique to multiple-site OpenVMS Clusters; these guidelines are described in Section D.3.4.

D.2 Using FDDI to Configure Multiple-Site OpenVMS Cluster Systems

Since VMS Version 5.4-3, FDDI has been the most common method of connecting two distant OpenVMS Cluster sites. Using high-speed FDDI fiber-optic cables, you can connect sites with an intersite cable-route distance of up to 25 miles (40 km).

You can connect sites using these FDDI methods:

  • To obtain maximum performance, use a full-duplex FDDI link running at 100 Mb/s in each direction between GIGAswitch/FDDI bridges at each site.
  • To obtain maximum availability, use a dual FDDI ring at 100 Mb/s between dual attachment station (DAS) ports of wiring concentrators or GIGAswitch/FDDI bridges.
  • To obtain both maximum performance and maximum availability, use two disjoint FDDI LANs, each with dedicated host adapters and full-duplex FDDI intersite links connected to GIGAswitch/FDDI bridges at each site.

Refer to the GIGAswitch/FDDI ATM Linecard Reference Manual for configuration information. Additional OpenVMS Cluster configuration guidelines and system management information can be found in this manual and in OpenVMS Cluster Systems. See the New Features and Documentation Overview Manual for information about ordering the current version of these manuals.

The inherent flexibility of OpenVMS Cluster systems and improved OpenVMS Cluster LAN protocols also allow you to connect multiple OpenVMS Cluster sites using the ATM communications service, the DS3 communications service, or both.

D.3 Using WAN Services to Configure Multiple-Site OpenVMS Cluster Systems

This section provides an overview of the ATM and DS3 wide area network (WAN) services, describes how you can bridge an FDDI interconnect to the ATM service, the DS3 service, or both, and provides guidelines for using these services to configure multiple-site OpenVMS Cluster systems.

The ATM and DS3 services provide long-distance, point-to-point communications that you can configure into your OpenVMS Cluster system to gain WAN connectivity. The ATM and DS3 services are available from most common telephone service carriers and other sources.


DS3 is not available in Europe and some other locations. Also, ATM is a new and evolving standard, and ATM services might not be available in all localities.

ATM and DS3 services are approved for use with the following OpenVMS versions:

| Service | Approved Versions of OpenVMS |
|---------|------------------------------|
| ATM     | OpenVMS Version 6.2 or later |
| DS3     | OpenVMS Version 6.1 or later |

The following sections describe the ATM and DS3 communication services and how to configure these services into multiple-site OpenVMS Cluster systems.

D.3.1 The ATM Communications Service

The ATM communications service that uses the SONET physical layer (ATM/SONET) provides full-duplex communications (that is, the bit rate is available simultaneously in both directions as shown in Figure D-3). ATM/SONET is compatible with multiple standard bit rates. The SONET OC-3 service at 155 Mb/s full-duplex rate is the best match to FDDI's 100 Mb/s bit rate. ATM/SONET OC-3 is a standard service available in most parts of the world. In Europe, ATM/SONET is a high-performance alternative to the older E3 standard.

Figure D-3 ATM/SONET OC-3 Service

To transmit data, ATM frames (packets) are broken into cells for transmission by the ATM service. Each cell has 53 bytes, of which 5 bytes are reserved for header information and 48 bytes are available for data. At the destination of the transmission, the cells are reassembled into ATM frames. The use of cells permits ATM suppliers to multiplex and demultiplex multiple data streams efficiently at differing bit rates. This conversion of frames into cells and back is transparent to higher layers.
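
To give a feel for the cost of the cell format, the following sketch computes how many cells a frame occupies and what fraction of the transmitted bytes is frame data. It uses only the 53-byte cell, 5-byte header, and 48-byte payload figures given above. It deliberately ignores any adaptation-layer trailer or other framing overhead, so the real efficiency is somewhat lower. The function names and the sample frame sizes are illustrative assumptions.

```python
CELL_BYTES = 53                              # total cell size
HEADER_BYTES = 5                             # header bytes per cell
PAYLOAD_BYTES = CELL_BYTES - HEADER_BYTES    # 48 data bytes per cell


def cells_for_frame(frame_bytes: int) -> int:
    """Cells needed to carry a frame; the last cell may be partly empty."""
    return -(-frame_bytes // PAYLOAD_BYTES)  # integer ceiling division


def wire_efficiency(frame_bytes: int) -> float:
    """Fraction of transmitted cell bytes that are frame data."""
    return frame_bytes / (cells_for_frame(frame_bytes) * CELL_BYTES)


# Illustrative frame sizes in bytes (hypothetical traffic mix).
for size in (64, 1500, 4500):
    print(f"{size}-byte frame: {cells_for_frame(size)} cells, "
          f"efficiency {wire_efficiency(size):.1%}")
```

Small frames pay proportionally more, because the final cell is padded out to a full 48-byte payload.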

D.3.2 The DS3 Communications Service (T3 Communications Service)

The DS3 communications service provides full-duplex communications as shown in Figure D-4. DS3 (also known as T3) provides the T3 standard bit rate of 45 Mb/s. T3 is the standard service available in North America and many other parts of the world.

Figure D-4 DS3 Service

D.3.3 FDDI-to-WAN Bridges

You can use FDDI-to-WAN bridges (for example, FDDI-to-ATM bridges, FDDI-to-DS3 bridges, or both) to configure an OpenVMS Cluster with nodes in geographically separate sites, such as the one shown in Figure D-5. In this figure, the OpenVMS Cluster nodes at each site communicate as though the two sites are connected by FDDI. The FDDI-to-WAN bridges make the existence of ATM and DS3 transparent to the OpenVMS Cluster software.

Figure D-5 Multiple-Site OpenVMS Cluster Configuration Connected by DS3

In Figure D-5, the FDDI-to-DS3 bridges and DS3 operate as follows:

  1. The local FDDI-to-DS3 bridge receives FDDI packets addressed to nodes at the other site.
  2. The bridge converts the FDDI packets into DS3 packets and sends the packets to the other site via the DS3 link.
  3. The receiving FDDI-to-DS3 bridge converts the DS3 packets into FDDI packets and transmits them on an FDDI ring at that site.

Compaq recommends using the GIGAswitch/FDDI system to construct FDDI-to-WAN bridges. The GIGAswitch/FDDI, combined with the DEFGT WAN T3/SONET option card, was used during qualification testing of the ATM and DS3 communications services in multiple-site OpenVMS Cluster systems.

D.3.4 Guidelines for Configuring ATM and DS3 in an OpenVMS Cluster System

When configuring a multiple-site OpenVMS Cluster, you must ensure that the intersite link's delay, bandwidth, availability, and bit error rate characteristics meet application needs. This section describes the requirements and provides recommendations for meeting those requirements.

D.3.4.1 Requirements

To be a configuration approved by Compaq, a multiple-site OpenVMS Cluster must comply with the following rules:

  • Maximum intersite link route distance: The total intersite link cable route distance between members of a multiple-site OpenVMS Cluster cannot exceed 150 miles (242 km). You can obtain exact distance measurements from your ATM or DS3 supplier. This distance restriction can be exceeded when using Disaster Tolerant Cluster Services for OpenVMS, a system management and software package for configuring and managing OpenVMS disaster tolerant clusters.
  • Maximum intersite link utilization: Average intersite link utilization in either direction must be less than 80% of the link's bandwidth in that direction for any 10-second interval. Exceeding this utilization is likely to result in intolerable queuing delays or packet loss.
  • Intersite link specifications: The intersite link must meet the OpenVMS Cluster requirements specified in Table D-3.
  • OpenVMS Cluster LAN configuration rules: Apply the configuration rules for OpenVMS Cluster systems on a LAN to a configuration. Documents describing configuration rules are referenced in Section D.1.3.

D.3.4.2 Recommendations

When configuring the DS3 interconnect, apply the configuration guidelines for OpenVMS Cluster systems interconnected by LAN that are stated in the OpenVMS Cluster Software SPD (SPD 29.78.nn) and in this manual. OpenVMS Cluster members at each site can include any mix of satellites, systems, and other interconnects, such as CI and DSSI.

This section provides additional recommendations for configuring a multiple-site OpenVMS Cluster system.

DS3 link capacity/protocols

The GIGAswitch with the WAN T3/SONET option card provides a full-duplex, 155 Mb/s ATM/SONET link. The entire bandwidth of the link is dedicated to the WAN option card. However, the GIGAswitch/FDDI's internal design is based on full-duplex extensions to FDDI. Thus, the GIGAswitch/FDDI's design limits the ATM/SONET link's capacity to 100 Mb/s in each direction.

The GIGAswitch with the WAN T3/SONET option card provides several protocol options that can be used over a DS3 link. Use the DS3 link in clear channel mode, which dedicates its entire bandwidth to the WAN option card. The DS3 link capacity varies with the protocol option selected. Protocol options are described in Table D-1.

Table D-1 DS3 Protocol Options

| Protocol Option                                  | Link Capacity |
|--------------------------------------------------|---------------|
| ATM [1] AAL-5 [2] mode with PLCP [3] disabled    | 39 Mb/s       |
| ATM AAL-5 mode with PLCP enabled                 | 33 Mb/s       |
| HDLC [4] mode (not currently available)          | 43 Mb/s       |

[1] Asynchronous transfer mode
[2] ATM Adaptation Layer
[3] Physical Layer Convergence Protocol
[4] High-Level Data Link Control

For maximum link capacity, Compaq recommends configuring the WAN T3/SONET option card to use ATM AAL-5 mode with PLCP disabled.

Intersite bandwidth

The intersite bandwidth can limit application locking and I/O performance (including volume shadowing or RAID set copy times) and the performance of the lock manager.

To promote reasonable response time, Compaq recommends that average traffic in either direction over an intersite link not exceed 60% of the link's bandwidth in that direction for any 10-second interval. Otherwise, queuing delays within the FDDI-to-WAN bridges can adversely affect application performance.

Remember to account for both OpenVMS Cluster communications (such as locking and I/O) and network communications (such as TCP/IP, LAT, and DECnet) when calculating link utilization.
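
The following sketch shows one way to check a 10-second traffic sample against both thresholds: the 80% limit that an approved configuration must stay under, and the 60% level recommended here. It assumes that Mb/s means 10^6 bits per second and that the byte count already combines OpenVMS Cluster and network traffic in one direction. The function names and the sample figures are hypothetical.

```python
REQUIRED_MAX = 0.80      # utilization must be less than this (requirement)
RECOMMENDED_MAX = 0.60   # utilization should not exceed this (recommendation)
INTERVAL_S = 10          # measurement interval in seconds


def utilization(bytes_sent: int, link_mbps: float,
                interval_s: float = INTERVAL_S) -> float:
    """Fraction of one direction's link capacity used during the interval."""
    return (bytes_sent * 8) / (link_mbps * 1_000_000 * interval_s)


def classify(util: float) -> str:
    """Compare a utilization figure against the two thresholds."""
    if util >= REQUIRED_MAX:
        return "violates requirement"
    if util > RECOMMENDED_MAX:
        return "above recommendation"
    return "ok"


# Hypothetical sample: 30 MB of combined traffic in one direction over 10 s
# on a DS3 link running ATM AAL-5 with PLCP disabled (39 Mb/s, Table D-1).
sample = utilization(bytes_sent=30_000_000, link_mbps=39)
print(f"{sample:.1%}: {classify(sample)}")
```

Run the check separately for each direction, because the limits apply to each direction's bandwidth independently.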

Intersite delay

An intersite link introduces a one-way delay of up to 1 ms per 100 miles of intersite cable route distance plus the delays through the FDDI-to-WAN bridges at each end. Compaq recommends that you consider the effects of intersite delays on application response time and throughput.

For example, intersite link one-way path delays have the following components:

  • Cable route one-way delays of 1 ms/100 miles (0.01 ms/mile) for both ATM and DS3.
  • FDDI-to-WAN bridge delays (approximately 0.5 ms per bridge, and two bridges per one-way trip)

Calculate the delays for a round trip as follows:

WAN round-trip delay = 2 x (N miles x 0.01 ms per mile + 2 x 0.5 ms per FDDI-WAN bridge)

An I/O write operation that is MSCP served requires a minimum of two round-trip packet exchanges:

WAN I/O write delay = 2 x WAN round-trip delay

Thus, an I/O write over a 100-mile WAN link takes at least 8 ms longer than the same I/O write over a short, local FDDI.

Similarly, a lock operation typically requires a round-trip exchange of packets:

WAN lock operation delay = WAN round-trip delay

An I/O operation with N locks to synchronize it incurs the following delay due to WAN:

WAN locked I/O operation delay = (N x WAN lock operation delay) + WAN I/O delay
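
The delay formulas above can be collected into a short calculation. This is a sketch using the approximate figures from this section (0.01 ms per mile and 0.5 ms per bridge). The function names are illustrative, and the results are added delay relative to a short local FDDI, not total operation times.

```python
MS_PER_MILE = 0.01       # one-way cable-route delay (1 ms per 100 miles)
BRIDGE_DELAY_MS = 0.5    # approximate delay through one FDDI-to-WAN bridge
BRIDGES_PER_TRIP = 2     # one bridge at each end of the intersite link


def round_trip_ms(miles: float) -> float:
    """WAN round-trip delay for the given cable-route distance."""
    one_way = miles * MS_PER_MILE + BRIDGES_PER_TRIP * BRIDGE_DELAY_MS
    return 2 * one_way


def io_write_ms(miles: float) -> float:
    """MSCP-served write: a minimum of two round-trip exchanges."""
    return 2 * round_trip_ms(miles)


def lock_ms(miles: float) -> float:
    """Lock operation: typically one round-trip exchange."""
    return round_trip_ms(miles)


def locked_io_ms(miles: float, locks: int) -> float:
    """I/O write synchronized by the given number of lock operations."""
    return locks * lock_ms(miles) + io_write_ms(miles)


for distance in (25, 100, 150):
    print(f"{distance} miles: round trip {round_trip_ms(distance):.1f} ms, "
          f"write {io_write_ms(distance):.1f} ms, "
          f"write with 2 locks {locked_io_ms(distance, 2):.1f} ms")
```

For 100 miles this reproduces the 8 ms write penalty quoted above. Note that the bridge delays alone contribute 2 ms per round trip, so shortening the cable route does not remove the penalty entirely.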

Bit error ratio

The bit error ratio (BER) parameter is an important measure of how frequently bit errors are likely to occur on the intersite link. You should consider the effects of bit errors on application throughput and responsiveness when configuring a multiple-site OpenVMS Cluster. Intersite link bit errors can result in packets being lost and retransmitted, with consequent delays in application I/O response time (see Section D.3.6). You can expect application delays ranging from a few hundred milliseconds to a few seconds each time a bit error causes a packet to be lost.
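
As a rough illustration of what a BER figure means in practice, the following sketch converts a BER into an expected number of errored bits per hour. It assumes bit errors are independent and evenly spread, that Mb/s means 10^6 bits per second, and that the stated fraction of the link's bits carries traffic. It is an upper-bound style estimate: not every bit error costs a cluster packet. The BER values are the OpenVMS Cluster Requirement and Goal values from Table D-3, and the function name is illustrative.

```python
def bit_errors_per_hour(ber: float, link_mbps: float,
                        utilization: float = 1.0) -> float:
    """Expected errored bits per hour among the bits that carry traffic."""
    bits_per_hour = link_mbps * 1_000_000 * utilization * 3600
    return ber * bits_per_hour


# DS3 at 45 Mb/s, compared at the requirement and goal BER values.
for label, ber in (("requirement", 1e-9), ("goal", 6e-12)):
    errors = bit_errors_per_hour(ber, link_mbps=45)
    print(f"BER {ber:g} ({label}): about {errors:.2f} errored bits per hour")
```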

Intersite link availability

Interruptions of intersite link service can result in the resources at one or more sites becoming unavailable until connectivity is restored (see Section D.3.5).

System disks

Sites with nodes contributing quorum votes should have a local system disk or disks for those nodes.

System management

A large, multiple-site OpenVMS Cluster requires a system management staff trained to support an environment that consists of a large number of diverse systems that are used by many people performing varied tasks.

Microwave DS3 links

You can provide portions of a DS3 link with microwave radio equipment. The specifications in Section D.3.6 apply to any DS3 link. The BER and availability of microwave radio portions of a DS3 link are affected by local weather and the length of the microwave portion of the link. Consider working with a microwave consultant who is familiar with your local environment if you plan to use microwaves as portions of a DS3 link.

D.3.5 Availability Considerations

If the FDDI-to-WAN bridges and the link that connects multiple sites become temporarily unavailable, the following events could occur:

  • Intersite link failures can result in the resources at one or more sites becoming unavailable until intersite connectivity is restored.
  • Intersite link bit errors (and ATM cell losses) and unavailability can affect:
    • System responsiveness
    • System throughput (or bandwidth)
    • Virtual circuit (VC) closure rate
    • OpenVMS Cluster transition and site failover time

Many communication service carriers offer availability-enhancing options, such as path diversity, protective switching, and other options that can significantly increase the intersite link's availability.

D.3.6 Specifications

This section describes the requirements for successful communications and performance with the WAN communications services.

To assist you in communicating your requirements to a WAN service supplier, this section uses WAN specification terminology and definitions commonly used by telecommunications service providers. These requirements and goals are derived from a combination of Bellcore Communications Research specifications and a Digital analysis of error effects on OpenVMS Clusters.

Table D-2 describes terminology that will help you understand the Bellcore and OpenVMS Cluster requirements and goals used in Table D-3.

Use the Bellcore and OpenVMS Cluster requirements for ATM/SONET - OC3 and DS3 service error performance (quality) specified in Table D-3 to help you assess the impact of the service supplier's service quality, availability, down time, and service-interruption frequency goals on the system.


To ensure that the OpenVMS Cluster system meets your application response-time requirements, you might need to establish WAN requirements that exceed the Bellcore and OpenVMS Cluster requirements and goals stated in Table D-3.

Table D-2 Bellcore and OpenVMS Cluster Requirements and Goals Terminology

Bellcore Communications Research

  • Requirements: Bellcore specifications are the recommended "generic error performance requirements and objectives" documented in the Bellcore Technical Reference TR-TSY-000499 TSGR: Common Requirements. These specifications are adopted by WAN suppliers as their service guarantees. The FCC has also adopted them for tariffed services between common carriers. However, some suppliers will contract to provide higher service-quality guarantees at customer request. Other countries have equivalents to the Bellcore specifications and parameters.
  • Goals: These are the recommended minimum values. Bellcore calls these goals their "objectives" in the TSGR: Common Requirements document.

OpenVMS Cluster

  • Requirements: In order for Compaq to approve a configuration, parameters must meet or exceed the values shown in the OpenVMS Cluster Requirements column in Table D-3. If these values are not met, OpenVMS Cluster performance will probably be unsatisfactory because of interconnect errors/error recovery delays, and VC closures that may produce OpenVMS Cluster state transitions or site failover or both. If these values are met or exceeded, then interconnect bit error-related recovery delays will not significantly degrade average OpenVMS Cluster throughput. OpenVMS Cluster response time should be generally satisfactory. Note that if the requirements are only being met, there may be several application pauses per hour. [1]
  • Goals: For optimal OpenVMS Cluster operation, all parameters should meet or exceed the OpenVMS Cluster Goal values. Note that if these values are met or exceeded, then interconnect bit errors and bit error recovery delays should not significantly degrade average OpenVMS Cluster throughput. OpenVMS Cluster response time should be generally satisfactory, although there may be brief application pauses a few times per day. [2]

[1] Application pauses may occur every hour or so (similar to what is described under OpenVMS Cluster Requirements) because of packet loss caused by bit error.
[2] Pauses are due to a virtual circuit retransmit timeout resulting from a lost packet on one or more NISCA transport virtual circuits. Each pause might last from a few hundred milliseconds to a few seconds.

Table D-3 OpenVMS Cluster DS3 and SONET OC3 Error Performance Requirements

| Parameter | Bellcore Requirement | Bellcore Goal | OpenVMS Cluster Requirement [1] | OpenVMS Cluster Goal [1] | Units |
|-----------|----------------------|---------------|---------------------------------|--------------------------|-------|
| Errored seconds (% ES) | <1.0% | <0.4% | <1.0% | <0.028% | % ES/24 hr |
| Errored seconds (count) | <864 | <345 | <864 | <24 | ES per 24-hr period |
| Burst errored seconds (BES) [2] | <= 4 | -- | <= 4 | Bellcore Goal | BES/day |
| Bit error ratio (BER) [3] | 1 x 10^-9 | 2 x 10^-10 | 1 x 10^-9 | 6 x 10^-12 | Errored bits/bit |
| DS3 channel unavailability | None | <= 97 @ 250 miles, linearly decreasing to 24 @ <= 25 miles | None | Bellcore Goal | Min/yr |
| SONET channel unavailability | None | <= 105 @ 250 miles, linearly decreasing to 21 @ <= 50 miles | None | Bellcore Goal | Min/yr |
| Channel-unavailable event [4] | None | None | None | 1 to 2 | Events/year |

The ES parameter can also be expressed as a count of errored seconds, as shown in the second row.

[1] Application requirements might need to be more rigorous than those shown in the OpenVMS Cluster Requirements column.
[2] Averaged over many days.
[3] Does not include any burst errored seconds occurring in the measurement period.
[4] The average number of channel down-time periods occurring during a year. This parameter is useful for specifying how often a channel might become unavailable.
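
The two ways of expressing the errored-seconds limits in Table D-3 are related by simple arithmetic over the 86,400 seconds in a 24-hour period. The following sketch shows the conversion. The function name is illustrative, and the whole-number counts in the table appear to be these results with the fractional part dropped.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a 24-hour period


def es_count_limit(percent_es: float) -> float:
    """Errored seconds per 24 hours that correspond to a % ES limit."""
    return SECONDS_PER_DAY * percent_es / 100


for percent in (1.0, 0.4, 0.028):
    print(f"{percent}% ES is about {es_count_limit(percent):.1f} "
          f"errored seconds per 24 hours")
```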

Table Key
  • Availability: The long-term fraction or percentage of time that a transmission channel performs as intended. Availability is frequently expressed in terms of unavailability or down time.
  • BER (bit error ratio): "The BER is the ratio of the number of bits in error to the total number of bits transmitted during a measurement period, excluding all burst errored seconds (defined below) in the measurement period. During a burst errored second, neither the number of bit errors nor the number of bits is counted."
  • BES (burst errored second): "A burst errored second is any errored second containing at least 100 errors."
  • Channel: The term for a link that is used in the Bellcore TSGR: Common Requirements document for a SONET or DS3 link.
  • Down time: The long-term average amount of time (for example, minutes) that a transmission channel is not available during a specified period of time (for example, 1 year).

    "...unavailability or downtime of a channel begins when the first of 10 [or more] consecutive Severely Errored Seconds (SESs) occurs, and ends when the first of 10 consecutive non-SESs occurs." The unavailable time is counted from the first SES in the 10-SES sequence. "The time for the end of unavailable time is counted from the first fault-free second in the [non-SES] sequence."

  • ES (errored second): "An errored second is any one-second interval containing at least one error."
  • SES (severely errored second): "...an SES is a second in which the BER is greater than 10^-3."
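
The down-time definition in the table key amounts to a small state machine over per-second measurements. The following sketch is one possible reading of that definition, not a Bellcore reference implementation. It takes a hypothetical list of per-second flags (True where the second is an SES) and returns the number of unavailable seconds. A trace that ends while the channel is still down is counted up to the end of the trace, which is an assumption of this sketch.

```python
def unavailable_seconds(ses_flags, window=10):
    """Count unavailable seconds in a per-second SES trace.

    Down time starts at the first second of a run of `window` consecutive
    SESs and ends at the first second of a run of `window` consecutive
    non-SESs.
    """
    unavailable = False
    run_start = 0      # index where the current qualifying run began
    run_len = 0        # length of the current qualifying run
    down_start = 0     # index where the current outage began
    total = 0

    for i, is_ses in enumerate(ses_flags):
        # While up, look for SES runs; while down, look for non-SES runs.
        qualifies = (not is_ses) if unavailable else is_ses
        if not qualifies:
            run_len = 0
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == window:
            if unavailable:
                total += run_start - down_start   # outage ended at run start
            else:
                down_start = run_start            # outage began at run start
            unavailable = not unavailable
            run_len = 0

    if unavailable:                               # trace ended during an outage
        total += len(ses_flags) - down_start
    return total


# Hypothetical trace: 5 good seconds, 12 SESs, 10 good, 3 SESs, 5 good.
trace = [False] * 5 + [True] * 12 + [False] * 10 + [True] * 3 + [False] * 5
print(unavailable_seconds(trace))
```

In the hypothetical trace, the 12-second SES run counts as down time, while the later 3-second run is too short to make the channel unavailable.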
