HP OpenVMS Systems Documentation

HP OpenVMS Availability Manager User's Guide




Chapter 6
Performing Fixes on OpenVMS Nodes

Fixes allow you to resolve resource availability problems and improve system availability.

This chapter discusses the following topics:

  • Understanding fixes
  • Performing fixes

Caution

Performing certain fixes can have serious repercussions, including possible system failure. Therefore, only experienced system managers should perform fixes.

6.1 Understanding Fixes

When you suspect or detect a resource availability problem, in many cases you can use the Availability Manager Data Analyzer to analyze the problem and to perform a fix to improve the situation.

Data Analyzer fixes fall into the following categories:

  • Node fixes
  • Process fixes
  • Disk fixes
  • Cluster interconnect fixes

You can access fixes, by category, from the pages listed in Table 6-1.

Table 6-1 Accessing Availability Manager Fixes

Node fixes
  Fixes: Crash Node, Adjust Quorum
  Available from these pages: Node Summary, CPU, Memory Summary, I/O Process, SCA Port, SCA Circuit, LAN Virtual Circuit, LAN Path (Channel), LAN Device

Process fixes
  General process fixes: Delete Process, Exit Image, Suspend Process, Resume Process, Process Priority
  Process memory fixes: Purge Working Set (WS), Adjust Working Set (WS)
  Process limits fixes: Direct I/O, Buffered I/O, AST, Open file, Lock, Timer, Subprocess, I/O Byte, Pagefile Quota
  All of the process fixes are available from these pages: Memory Summary, I/O Process, CPU Process, Single Process

Disk fixes
  Fixes: Cancel disk MV, Cancel SSM MV
  All of the disk fixes are available from these pages: Disk Status Summary, Disk Volume Summary

Cluster interconnect fixes
  These fixes are available from the following lines of data on the Cluster Summary page (Figure 4-1):

  SCA Port
    Fix: Adjust Priority
    Access: Right-click a data item on the Local Port Data display line to display a menu. Then select Port Fix....

  SCA Circuit
    Fix: Adjust Priority
    Access: Right-click a data item on the Circuits Data display line to display a menu. Then select Circuit Fix....

  LAN Virtual Circuit Summary
    Fixes: Maximum Transmit Window Size, Maximum Receive Window Size, Checksumming, Compression, ECS Maximum Delay
    Access: Right-click a data item on the LAN Virtual Circuit Summary line to display a menu. Then select VC LAN Fix.... Alternatively, you can use the Fix menu on the LAN VC Details page.

  LAN Path (Channel) Summary
    Fixes: Adjust Priority, Hops
    Access: Right-click a data item on the LAN Path (Channel) Summary line to display a menu. Then select Fixes.... Alternatively, you can use the Fix menu on the Channel Details page.

  LAN Device Details
    Fixes: Adjust Priority, Set Maximum Buffer Size, Start LAN Device, Stop LAN Device
    Access: You can access these fixes in the following ways:
      • Right-click an item in the LAN Path (Channel) Summary category to display a menu. Then select LAN Device Details... to display pages containing Fix options.
      • Right-click an item in the LAN Device Summary page and then select LAN Device Fixes....
      • Select Fixes... on the LAN Device Details page.

Table 6-2 summarizes various problems, recommended fixes, and the expected results of fixes.

Table 6-2 Summary of Problems and Matching Fixes
Problem | Fix | Result
Node resource hanging cluster | Crash Node | Node fails with operator-requested shutdown. See Section 6.2.2 for the crash dump footprint for this type of shutdown.
Cluster hung | Adjust Quorum | Quorum for cluster is adjusted.
Process looping, intruder | Delete Process | Process no longer exists.
Endless process loop in same PC range | Exit Image | Exits from current image.
Runaway process, unwelcome intruder | Suspend Process | Process is suspended from execution.
Process previously suspended | Resume Process | Process resumes from the point at which it was suspended.
Runaway process or process that is overconsuming | Process Priority | Base priority changes to selected setting.
Low node memory | Purge Working Set (WS) | Frees memory on node; page faulting might occur for process affected.
Working set too high or low | Adjust Working Set (WS) | Removes unused pages from working set; page faulting might occur.
Process quota has reached its limit and has entered RWAIT state | Adjust Process Limits | Process limit is increased, which in many cases frees the process to continue execution.
Process has exhausted its pagefile quota | Adjust Pagefile Quota | Pagefile quota limit of the process is adjusted.
Disk volume is in mount verify state | Cancel disk MV | Disk volume is taken out of the mount verify state and put into the mount verify timeout state. The disk can now be dismounted with the $ DISMOUNT/ABORT command.
Shadow set is in mount verify state due to a shadow set member being in a mount verify state | Cancel SSM MV | The shadow set member is ejected from the shadow set, enabling the shadow set to return to a mounted state. This is equivalent to the $ SET SHADOW/FORCE_REMOVAL command.

Most process fixes correspond to an OpenVMS system service call, as shown in the following table:

Process Fix | System Service Call
Delete Process | $DELPRC
Exit Image | $FORCEX
Suspend Process | $SUSPND
Resume Process | $RESUME
Process Priority | $SETPRI
Purge Working Set (WS) | $PURGWS
Adjust Working Set (WS) | $ADJWSL
Adjust process limits of the following: Direct I/O (DIO), Buffered I/O (BIO), Asynchronous system trap (AST), Open file (FIL), Lock queue (ENQ), Timer queue entry (TQE), Subprocess (PRC), I/O byte (BYT) | None

Note

Each fix that uses a system service call requires that the process execute the system service. A hung process has the fix queued to it, and the fix does not execute until the process is operational again.
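The relationship between the table and the note above can be sketched in a few lines of Python. This is only an illustration, not part of the Availability Manager: the dictionary copies the table, and the function name fix_is_deferred is hypothetical. It also assumes, as an inference from the note, that a fix with no system service call is not queued to the target process.

```python
# Process fixes and their system service calls, copied from the table above.
# None means the table lists no system service for that fix.
FIX_TO_SYSTEM_SERVICE = {
    "Delete Process": "$DELPRC",
    "Exit Image": "$FORCEX",
    "Suspend Process": "$SUSPND",
    "Resume Process": "$RESUME",
    "Process Priority": "$SETPRI",
    "Purge Working Set (WS)": "$PURGWS",
    "Adjust Working Set (WS)": "$ADJWSL",
    "Adjust Process Limits": None,
}


def fix_is_deferred(fix_name, process_is_hung):
    """Return True if the fix would wait until a hung process runs again.

    Assumption: only fixes that map to a system service must be executed
    by the target process itself, so only those can be held up by a hang.
    """
    service = FIX_TO_SYSTEM_SERVICE[fix_name]  # KeyError for unknown fixes
    return service is not None and process_is_hung
```

For example, fix_is_deferred("Suspend Process", True) is True, while fix_is_deferred("Adjust Process Limits", True) is False under the stated assumption.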

Be aware of the following facts before you perform a fix:

  • You must have write access to perform a fix. To perform LAN fixes, you must have control access.
  • You cannot undo many fixes. For example, after using the Crash Node fix, the node must be rebooted (either by the node if the node reboots automatically, or by a person performing a manual boot).
  • Do not apply the Exit Image, Delete Process, or Suspend Process fix to system processes. Doing so might require you to reboot the node.
  • Whenever you exit an image, you cannot return to that image.
  • You cannot delete processes that have exceeded their job or process quota.
  • The Availability Manager Data Collector ignores fixes applied to the SWAPPER process.

How to Perform Fixes

Standard OpenVMS privileges restrict users' write access. When you run the Data Analyzer, you must have the CMKRNL privilege to send a write (fix) instruction to a node with a problem.

The following options are displayed at the bottom of all fix pages:

Option | Description
OK | Applies the fix and then exits the page. Any message associated with the fix is displayed in the Event pane.
Cancel | Cancels the fix.
Apply | Applies the fix and does not exit the page. Any message associated with the fix is displayed in the Return Status section of the page and in the Event pane.

The following sections explain how to perform node, process and disk fixes.

Note

Node, process and disk fixes generate an event when they are executed. The events are entered into the event log on the system that is running the Data Analyzer. See the "Events generated by fixes" section in Table C-2 for a list of these events.

6.2 Performing Node Fixes

Node fixes fall into the following categories:

  • Fixes that allow you to deliberately fail (or crash) a node
  • A fix that allows you to adjust cluster quorum

To perform a node fix, follow these steps:

  1. On the Node Summary, CPU, Memory, or I/O page, select the Fix menu.
  2. Select Fix Options.

6.2.1 Adjust Quorum

The default node fix displayed is the Adjust Quorum fix, which forces a node to recalculate the quorum value. This fix is the equivalent of the Interrupt Priority level C (IPC) mechanism used at system consoles for the same purpose. The fix forces the adjustment for the entire cluster so that each node in the cluster has the same new quorum value.

The Adjust Quorum fix is useful when the number of votes in a cluster falls below the quorum set for that cluster. This fix allows you to readjust the quorum so that it corresponds to the current number of votes in the cluster.
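The arithmetic behind a hung cluster can be sketched as follows. The formula (votes + 2) / 2, truncated to an integer, is the one OpenVMS Cluster documentation gives for deriving quorum from expected votes. Applying it to the current vote total to model the result of the fix is an assumption made for illustration, and the function names are hypothetical.

```python
def quorum_from_votes(votes):
    """Quorum derived from a vote total: (votes + 2) / 2, truncated."""
    if votes < 1:
        raise ValueError("a cluster needs at least one vote")
    return (votes + 2) // 2


def has_quorum(current_votes, quorum):
    """A cluster continues to operate only while votes >= quorum."""
    return current_votes >= quorum


# Five one-vote nodes: quorum is 3.
original_quorum = quorum_from_votes(5)

# Three nodes fail, leaving 2 votes: below quorum, so the cluster hangs.
hung = not has_quorum(2, original_quorum)

# Recalculating quorum from the 2 remaining votes lets the cluster continue.
adjusted_quorum = quorum_from_votes(2)
recovered = has_quorum(2, adjusted_quorum)
```

In this sketch the original quorum is 3, the two surviving votes fall short of it, and the recalculated quorum of 2 is satisfied by the remaining nodes.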

The Adjust Quorum page is shown in Figure 6-1.

Figure 6-1 Adjust Quorum


6.2.2 Crash Node

Caution

The Crash Node fix is an operator-requested bugcheck from the Data Collector. It takes place as soon as you click OK in the Crash Node fix. After you perform this fix, the node cannot be restored to its previous state. After a crash, the node must be rebooted.

When you select the Crash Node option, the Data Analyzer displays the Crash Node page, shown in Figure 6-2.

Figure 6-2 Crash Node


Note

Because the node cannot report a confirmation when a Crash Node fix is successful, the crash success message is displayed after the timeout period for the fix confirmation has expired.

Recognizing a System Failure Forced by the Availability Manager

Because a user with suitable privileges can force a node to fail from the Data Analyzer by using the Crash Node fix, system managers have requested a method for recognizing these particular failure footprints so that they can distinguish them from other failures. These failures all have identical footprints: they are operator-induced system failures in kernel mode at IPL 8. The top of the kernel stack is similar to the following display:


                SP => Quadword system address 
                      Quadword data 
                      1BE0DEAD.00000000 
                      00000000.00000000 
                      Quadword data            TRAP$CRASH 
                      Quadword data            SYS$RMDRIVER + offset 

6.3 Performing Process Fixes

Process fixes fall into the following categories:

  • Fixes that allow you to affect the process; for instance, to change its priority, suspend it, or resume it
  • A fix that allows you to adjust the memory of a process
  • A fix that allows you to adjust the quotas or limits of a process

To perform a process fix, follow these steps:

  1. On the Memory or I/O page, right-click a process name.
  2. Click Fix Options.
    The Data Analyzer displays these Process tabs:
    Process General
    Process Memory
    Process Limits
  3. Click one of these tabs to bring it to the front.
  4. Click the down arrow to display the process fixes in this group, as shown in Figure 6-3, where the Process General tab has been chosen.

    Figure 6-3 Process General Options


  5. Select a process fix (for example, Process Priority, shown in Figure 6-3), to display a fix page.

Some of the fixes, such as Process Priority, require you to use a slider to change the default value. When you finish setting a new process priority, click Apply at the bottom of the page to apply that fix.

6.3.1 General Process Fixes

The following sections describe Data Analyzer general process fixes. These fixes include instructions telling how to delete, suspend, and resume a process.

6.3.1.1 Delete Process

In most cases, a Delete Process fix deletes a process. However, if a process is waiting for disk I/O or is in a resource wait state (RWAST), this fix might not delete the process. In this situation, it is useless to repeat the fix. Instead, depending on the resource the process is waiting for, a Process Limit fix might free the process. As a last resort, reboot the node to delete the process.

Caution

Deleting a system process can cause the system to hang or become unstable.

When you select the Delete Process option, the Data Analyzer displays the page shown in Figure 6-4.

Figure 6-4 Delete Process


After reading the explanation, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.1.2 Exit Image

Exiting an image on a node can stop an application that a user requires. Make sure you check the Single Process page before you exit an image to determine which image is running on the node.

Caution

Exiting an image on a system process could cause the system to hang or become unstable.

When you select the Exit Image option, the Data Analyzer displays the page shown in Figure 6-5.

Figure 6-5 Exit Image Page


After reading the explanation on the page, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.1.3 Suspend Process

Suspending a process that is consuming excess CPU time can improve perceived CPU performance on the node by freeing the CPU for other processes to use. (Conversely, resuming a process that was using excess CPU time while running might reduce perceived CPU performance on the node.)

Caution

Do not suspend system processes, especially JOB_CONTROL, because this might make your system unusable. (For more information, see HP OpenVMS Programming Concepts Manual, Volume I.)

When you select the Suspend Process option, the Data Analyzer displays the page shown in Figure 6-6.

Figure 6-6 Suspend Process


After reading the explanation, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.1.4 Resume Process

Resuming a process that was using excess CPU time while running might reduce perceived CPU performance on the node. (Conversely, suspending a process that is consuming excess CPU time can improve perceived CPU performance by freeing the CPU for other processes to use.)

When you select the Resume Process option, the Data Analyzer displays the page shown in Figure 6-7.

Figure 6-7 Resume Process


After reading the explanation, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.1.5 Process Priority

If the priority of a compute-bound process is too high, the process can consume all the CPU cycles on the node, affecting performance dramatically. On the other hand, if the priority of a process is too low, the process might not obtain enough CPU cycles to do its job, also affecting performance.

When you select the Process Priority option, the Data Analyzer displays the page shown in Figure 6-8.

Figure 6-8 Process Priority


To change the base priority for a process, drag the slider on the scale to the number you want. The current priority number is displayed in a small box above the slider. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new base priority, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.2 Process Memory Fixes

The following sections describe the Availability Manager fixes you can use to correct process memory problems: the Purge Working Set and Adjust Working Set fixes.

6.3.2.1 Purge Working Set

This fix purges the working set to a minimal size. You can use this fix to reclaim a process's pages that are not in active use. If the process is in a wait state, the working set remains at a minimal size, and the purged pages become available for other uses. If the process becomes active, pages the process needs are page-faulted back into memory, and the unneeded pages are available for other uses.

Be careful not to repeat this fix too often: a process that continually reclaims needed pages can cause excessive page faulting, which can affect system performance.

When you select the Purge Working Set option, the Data Analyzer displays the page shown in Figure 6-9.

Figure 6-9 Purge Working Set


After reading the explanation on the page, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.2.2 Adjust Working Set

Adjusting the working set of a process might prove to be useful in a variety of situations. Two of these situations are described in the following list.
  • If a process is page-faulting because of insufficient memory, you can reclaim unused memory from other processes by decreasing the working set of one or more of them.
  • If a process is page-faulting too frequently because its working set is too small, you can increase its working set.

Caution

If the automatic working set adjustment is enabled for the system, a fix to adjust the working set size disables the automatic adjustment for the process. For more information, see OpenVMS online help for SET WORKING_SET/ADJUST, which includes /NOADJUST.

When you select the Adjust Working Set fix, the Data Analyzer displays the page shown in Figure 6-10.

Figure 6-10 Adjust Working Set


To perform this fix, use the slider to adjust the working set to the limit you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new working set limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3 Process Limits Fixes

If a process is waiting for a resource, you can use a Process Limits fix to increase the resource limit so that the process can continue. The increased limit is in effect only for the life of the process, however; any new process is assigned the quota that was set in the UAF.

When you click the Process Limits tab, you can select any of the following options:

Direct I/O
Buffered I/O
AST
Open File
Lock
Timer
Subprocess
I/O Byte
Pagefile Quota

These fix options are described in the following sections.

6.3.3.1 Direct I/O Count Limit

You can use this fix to adjust the direct I/O count limit of a process. When you select the Direct I/O option, the Data Analyzer displays the page shown in Figure 6-11.

Figure 6-11 Direct I/O Count Limit


To perform this fix, use the slider to adjust the direct I/O count to the limit you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new direct I/O count limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3.2 Buffered I/O Count Limit

You can use this fix to adjust the buffered I/O count limit of a process. When you select the Buffered I/O option, the Data Analyzer displays the page shown in Figure 6-12.

Figure 6-12 Buffered I/O Count Limit


To perform this fix, use the slider to adjust the buffered I/O count to the limit you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new buffered I/O count limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3.3 AST Queue Limit

You can use this fix to adjust the AST queue limit of a process. When you select the AST option, the Data Analyzer displays a page similar to the one shown in Figure 6-13.

Figure 6-13 AST Queue Limit


To perform this fix, use the slider to adjust the AST queue limit to the number you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new AST queue limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3.4 Open File Limit

You can use this fix to adjust the open file limit of a process. When you select the Open File option, the Data Analyzer displays a page similar to the one shown in Figure 6-14.

Figure 6-14 Open File Limit


To perform this fix, use the slider to adjust the open file limit to the number you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new open file limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3.5 Lock Queue Limit

You can use this fix to adjust the lock queue limit of a process. When you select the Lock option, the Data Analyzer displays a page that is similar to the one shown in Figure 6-15.

Figure 6-15 Lock Queue Limit


To perform this fix, use the slider to adjust the lock queue limit to the number you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new lock queue limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3.6 Timer Queue Entry Limit

You can use this fix to adjust the timer queue entry limit of a process. When you select the Timer option, the Data Analyzer displays the page shown in Figure 6-16.

Figure 6-16 Timer Queue Entry Limit


To perform this fix, use the slider to adjust the timer queue entry limit to the number you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new timer queue entry limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3.7 Subprocess Creation Limit

You can use this fix to adjust the subprocess creation limit of a process. When you select the Subprocess option, the Data Analyzer displays the page shown in Figure 6-17.

Figure 6-17 Subprocess Creation Limit


To perform this fix, use the slider to adjust the subprocess creation limit of a process to the number you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new subprocess creation limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3.8 I/O Byte

You can use this fix to adjust the I/O byte limit of a process. When you select the I/O Byte option on the movable bar, the Data Analyzer displays a page similar to the one shown in Figure 6-18.

Figure 6-18 I/O Byte


To perform this fix, use the slider to adjust the I/O byte limit to the number you want. You can also click the line above or below the slider to adjust the number by 1.

When you are satisfied with the new I/O byte limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.3.3.9 Pagefile Quota

You can use this fix to adjust the pagefile quota limit of a process. This quota is shared among all the processes in a job and is measured in pagelets (512-byte units). When you select the Pagefile Quota option, the Data Analyzer displays the page shown in Figure 6-19.

Figure 6-19 Pagefile Quota


To perform this fix, use the slider to adjust the pagefile quota limit to the number you want. You can also click above or below the slider to adjust the fix value by 1 on VAX systems, or by the number of pagelets in a page for Alpha and I64 systems.

When you are satisfied with the new pagefile quota limit, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.
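The step size described above for the Pagefile Quota slider can be worked out with a short Python sketch. A pagelet is 512 bytes, and a VAX page is also 512 bytes, so the step there is 1. The 8192-byte page size used as the default for Alpha and I64 is an assumption (it is the common configuration, but check the page size of your system), and the function names are hypothetical.

```python
PAGELET_BYTES = 512  # one pagelet, as stated in this section


def pagelets_per_page(page_size_bytes):
    """Number of 512-byte pagelets in one page of the given size."""
    if page_size_bytes <= 0 or page_size_bytes % PAGELET_BYTES:
        raise ValueError("page size must be a positive multiple of 512")
    return page_size_bytes // PAGELET_BYTES


def slider_step(architecture, page_size_bytes=8192):
    """Amount by which one click changes the pagefile quota value.

    Assumption: 8192-byte pages on Alpha and I64 unless told otherwise.
    """
    if architecture == "VAX":
        return 1
    return pagelets_per_page(page_size_bytes)
```

Under that assumption, one click changes the value by 16 pagelets on Alpha and I64 systems and by 1 on VAX systems.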

6.4 Performing Disk Fixes

Disk fixes fall into the following categories:
  • Forcing a disk volume out of a mount verify state
  • Forcing a shadow set member out of a shadow set, allowing the shadow set to come out of a mount verify state and resume normal operations

To perform a disk fix, follow these steps:

  1. On the Disk Status Summary or Disk Volume Summary page, select the Fix menu.
  2. Select Fix Options.

6.4.1 Cancel Disk Volume Mount Verification

The default disk fix displayed is the Cancel Disk Mount Verification (MV) fix, which forces a disk volume that is in a mount verify state into a mount verify timeout state. This fix is the equivalent of the Interrupt Priority level C (IPC) mechanism used at system consoles for the same purpose.

The Cancel Disk Mount Verification (MV) fix is useful where disk volumes are mounted cluster-wide, and the host node for the disk volume fails. Once this fix is used on a disk volume, the disk then can be dismounted with a $ DISMOUNT/ABORT command.

The Cancel Disk MV page is shown in Figure 6-20.

Figure 6-20 Cancel Disk MV


After reading the explanation on the page, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.4.2 Cancel Shadow Set Mount Verification

The Cancel Shadow Set Mount Verification (SSM MV) fix forces the ejection of an unavailable shadow set member from a shadow set that is in a mount verify state.

The Cancel SSM MV fix is useful to regain use of a shadow set that is in a mount verify state because a shadow set member resides on a host node that has failed. This is especially useful where the shadow set contains the System Authorization file, and having the shadow set in a mount verify state prevents logins to the node or cluster.

This fix is equivalent to the $ SET SHADOW/FORCE_REMOVAL command.

The Cancel SSM MV page is shown in Figure 6-21.

Figure 6-21 Cancel SSM MV


After reading the explanation on the page, click Apply at the bottom of the page to apply the fix. A message displayed on the page indicates that the fix has been successful.

6.5 Performing Cluster Interconnect Fixes

Note

All cluster interconnect fixes require that managed objects be enabled.

The following are categories of cluster interconnect fixes:

  • Port adjust priority fix
  • Circuit adjust priority fix
  • LAN virtual circuit (VC) summary fixes
  • LAN channel (path) fixes
  • LAN device fixes

The following sections describe these types of fixes. The descriptions also indicate whether or not the fix is currently available.

6.5.1 Port Adjust Priority Fix

To access the Port Adjust Priority fix, right-click a data item in the Local Port Data display line (see Figure 4-3). The Data Analyzer displays a shortcut menu with the Port Fix option.

This page (Figure 6-22) allows you to change the cost associated with this port, which, in turn, affects the routing of cluster traffic.

Figure 6-22 Port Adjust Priority


6.5.2 Circuit Adjust Priority Fix

To access the Circuit Adjust Priority fix, right-click a data item in the circuits data display line (see Figure 4-4). The Data Analyzer displays a shortcut menu with the Circuit Fix option.

This page (Figure 6-23) allows you to change the cost associated with this circuit, which, in turn, affects the routing of cluster traffic. Note that Figures 6-23 through 6-34, as they apply to a Cluster over IP interface, are to be updated in the next documentation update.

Figure 6-23 Circuit Adjust Priority


6.5.3 LAN Virtual Circuit Fixes

To access LAN virtual circuit fixes, right-click a data item in the LAN Virtual Circuit Summary category (see Figure 4-6), or use the Fix menu on the LAN VC Details page.

The Data Analyzer displays a shortcut menu with the following options:

  • Channel Summary
  • VC LAN Details...
  • VC LAN Fix...

When you select VC LAN Fix..., the Data Analyzer displays the first of several fix pages. Use the Fix Type box to select one of the following LAN VC fixes:

  • Maximum Transmit Window Size
  • Maximum Receive Window Size
  • Checksumming
  • Compression
  • ECS Maximum Delay

These fixes are described in the following sections.

6.5.3.1 LAN VC Checksumming Fix

The LAN VC Checksumming fix (Figure 6-24) allows you to turn checksumming on or off for the virtual circuit.

Figure 6-24 LAN VC Checksumming


6.5.3.2 LAN VC Maximum Transmit Window Size Fix

The LAN VC Maximum Transmit Window Size fix (Figure 6-25) allows you to adjust the maximum transmit window size for the virtual circuit.

Figure 6-25 LAN VC Maximum Transmit Window Size


6.5.3.3 LAN VC Maximum Receive Window Size Fix

The LAN VC Maximum Receive Window Size fix (Figure 6-26) allows you to adjust the maximum receive window size for the virtual circuit.

Figure 6-26 LAN VC Maximum Receive Window Size


6.5.3.4 LAN VC Compression Fix

The LAN VC Compression fix (Figure 6-27) allows you to turn compression on or off for the virtual circuit. This fix, however, might not be available on all target systems.

Figure 6-27 LAN VC Compression


6.5.3.5 LAN VC ECS Maximum Delay Fix

The LAN VC ECS Maximum Delay fix (Figure 6-28) sets a management-specific limit on the maximum delay (in microseconds) an ECS member channel can have. You can set a value between 0 and 3000000. Zero disables a prior management delay setting.

You can use this fix to override the delay thresholds that PEdriver calculates automatically. This ensures that all channels with delays less than the value supplied are included in the VC's ECS.

Figure 6-28 LAN VC ECS Maximum Delay


On the sample page shown in Figure 6-28, you cannot read the following text (which is displayed when you move the slider down): "The fix operates as follows: Whenever at least one tight peer channel has a delay of less than the management-supplied value, all tight peer channels with delays less than the management-supplied value are automatically included in the ECS. When all tight peer channels have delays equal to or greater than the management setting, the ECS membership delay thresholds are automatically calculated and used.

You must determine an appropriate value for your configuration by experimentation. An initial value of 2000 (2ms) to 5000 (5ms) is suggested."

On this page, the following note of caution is also displayed:

Caution

By overriding the automatic delay calculations, you can include a channel in the ECS whose average delay is consistently greater than 1.5 to 2 times the average delay of the fastest channels. When this occurs, the overall VC throughput becomes the speed of the slowest ECS member channel. An extreme example is when the management delay permits a 10Mb/sec Ethernet channel to be included with multiple 1Gb/sec channels. The resultant VC throughput drops to 10Mb/sec.
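The selection rule quoted from the fix page can be sketched in Python as follows. This is a model of the described behavior, not PEdriver code. The function automatic_members is a hypothetical placeholder for the automatic threshold calculation, which this guide does not specify; its 1.5 factor is borrowed loosely from the caution above purely so the example can run. Channel names and delays are invented.

```python
MAX_MANAGEMENT_DELAY_US = 3_000_000  # upper bound stated for this fix


def automatic_members(delays_us, factor=1.5):
    """Hypothetical stand-in for PEdriver's automatic thresholds."""
    fastest = min(delays_us.values())
    return {name for name, delay in delays_us.items()
            if delay <= fastest * factor}


def ecs_members(delays_us, management_delay_us):
    """Return the tight peer channels that would be included in the ECS.

    delays_us maps a channel name to its delay in microseconds.
    """
    if not 0 <= management_delay_us <= MAX_MANAGEMENT_DELAY_US:
        raise ValueError("management delay must be between 0 and 3000000")
    if management_delay_us > 0:  # zero disables the management setting
        below = {name for name, delay in delays_us.items()
                 if delay < management_delay_us}
        if below:  # at least one channel is under the management value
            return below
    # No management setting, or every channel is at or above it.
    return automatic_members(delays_us)


# Invented example delays, in microseconds.
delays = {"CH_A": 300, "CH_B": 1800, "CH_C": 6000}
with_override = ecs_members(delays, 2000)
without_override = ecs_members(delays, 0)
```

With a management delay of 2000, both CH_A and CH_B are selected even though CH_B is six times slower than CH_A, which is the situation the caution warns about. With a setting of 0, only the placeholder automatic calculation applies.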

6.5.4 LAN Channel Fixes

To access LAN path fixes, right-click an item on a LAN Path (Channel) Summary line (see Figure 4-6). The Data Analyzer displays a shortcut menu with the following options:

  • Channel Details...
  • LAN Device Details...
  • Fixes...

Click Fixes... or use the Fix menu on the Channel Details page. The Data Analyzer displays a page with the following Fix Types:

  • Adjust Priority
  • Hops
  • Max Packet Size

These fixes are described in the following sections.

6.5.4.1 LAN Path (Channel) Adjust Priority Fix

The LAN Path (Channel) Adjust Priority fix (Figure 6-29) allows you to change the cost associated with this channel by adjusting its priority. This, in turn, affects the routing of cluster traffic.

Figure 6-29 LAN/IP Path (Channel) Adjust Priority


6.5.4.2 LAN Path (Channel) Hops Fix

The LAN Path (Channel) Hops fix (Figure 6-30) allows you to change the hops for the channel. This change, in turn, affects the routing of cluster traffic.

Figure 6-30 LAN/IP Path (Channel) Hops


6.5.5 LAN Device Fixes

To access LAN device fixes, right-click an item in the LAN Path (Channel) Summary category (see Figure 4-6). The Data Analyzer displays a shortcut menu with the following options:
  • Channel Details...
  • LAN Device Details...
  • Fixes...

Select LAN Device Details to display the LAN Device Details window. From the Device Details window, select Fix... from the Fix menu. (These fixes are also accessible from the LAN Device Summary page.)

The Data Analyzer displays the first of several pages, each of which contains a fix option:

Adjust Priority
Set Max Buffer Size
Start LAN Device
Stop LAN Device

These fixes are described in the following sections.

6.5.5.1 LAN Device Adjust Priority Fix

The LAN Device Adjust Priority fix (Figure 6-31) allows you to adjust the management priority for the device. This fix changes the cost associated with this device, which, in turn, affects the routing of cluster traffic.

Starting with OpenVMS Version 7.3-2, a channel whose priority is -128 is not used for cluster communications. The priority of a channel is the sum of the management priority assigned to the local LAN device and the channel itself. Therefore, you can assign any combination of channel and LAN device management priority values to arrive at a total of -128.
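The priority arithmetic above can be sketched as follows. This is only an illustration of the rule as stated (channel priority is the sum of the two management priorities, and a total of -128 removes the channel from cluster use); the function and constant names are hypothetical, and the sketch does not model any valid range for the individual values.

```python
UNUSED_PRIORITY = -128  # total at which a channel is not used (OpenVMS V7.3-2 and later)


def channel_priority(device_mgmt_priority, channel_mgmt_priority):
    """Sum the LAN device and channel management priorities."""
    return device_mgmt_priority + channel_mgmt_priority


def is_used_for_cluster_traffic(device_mgmt_priority, channel_mgmt_priority):
    """Return False when the combined priority equals -128."""
    total = channel_priority(device_mgmt_priority, channel_mgmt_priority)
    return total != UNUSED_PRIORITY


# Any combination that sums to -128 has the same effect.
print(is_used_for_cluster_traffic(-128, 0))
print(is_used_for_cluster_traffic(-100, -28))
print(is_used_for_cluster_traffic(0, 0))
```

The first two calls describe different device and channel settings that both total -128, so both report that the channel is not used.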

Figure 6-31 LAN/IP Device Adjust Priority


6.5.5.2 LAN Device Set Maximum Buffer Fix

The LAN Device Set Maximum Buffer fix (Figure 6-32) allows you to set the maximum packet size for the device, which changes the maximum packet size associated with this channel. This change, in turn, affects the routing of cluster traffic.

Figure 6-32 LAN Device Set Maximum Buffer Size


6.5.5.3 LAN Device Start Fix

The LAN Device Start fix (Figure 6-33) starts the use of this particular LAN device. This fix allows you, at the same time, to enable this device for cluster traffic.

Figure 6-33 LAN/IP Device Start


6.5.5.4 LAN Device Stop Fix

The LAN Device Stop fix (Figure 6-34) stops the use of this particular LAN device. At the same time, this fix disables this device for cluster traffic.

Caution

This fix could result in interruption of cluster communications for this node. The node might exit the cluster (CLUEXIT crash).

Figure 6-34 LAN/IP Device Stop



Chapter 7
Customizing the Availability Manager Data Analyzer

This chapter explains how to customize the following Availability Manager Data Analyzer features:

Feature Description
Nodes or node groups You can select one or more groups or individual nodes to monitor.
Data collection For OpenVMS nodes, you can choose the types of data you want to collect as well as set several types of collection intervals. (On Windows nodes, specific types of data are collected by default.)
Data filters For OpenVMS nodes, you can specify a number of parameters and values that limit the amount of data that is collected.
Event escalation You can customize the way events are displayed in the Event pane of the System Overview window (Figure 2-25), and you can configure events to be signaled to OPCOM and OpenView.
Event filters You can specify the severity of events that are displayed as well as several other filter settings for events.
Security On Data Analyzer and Data Collector nodes, you can change passwords. On OpenVMS Data Collector nodes, you can edit a file that contains security triplets.
Watch process You can specify up to eight processes for the Data Analyzer to monitor. The Data Analyzer reports when a watched process exits and when it is subsequently created again.

In addition, you can change the group membership of nodes, as explained in Section 7.4.1 and Section 7.4.2.

Table 7-1 shows the levels of customization the Data Analyzer provides and the features that you can customize at each level.

Table 7-1 Levels of Customization
Customizable Features   Application   Operating System   Group   Node
Nodes or node groups    X
Data collection                       X                  X       X
Data filters                          X                  X       X
Event escalation        X             X                  X       X
Event filters                         X                  X       X
Security                              X                  X       X
Watch process                         X                  X       X

7.1 Understanding Levels of Customization

You can customize each feature at one or more of the following levels, as shown in Table 7-1:

  • Application
  • Operating System
  • Group
  • Node

In addition to the four levels of customization, there are Availability Manager Data Analyzer Defaults (AM Defaults), which are top-level, built-in values that are preset (hardcoded) within the Availability Manager Data Analyzer. Users cannot change these settings. If no customizations are made at any of the four levels, the AM Default values are used.

The following list describes the four levels of customization.

  • Application values override AM Defaults for nodes and groups of nodes as well as event escalation (unless overriding customizations are made at the operating system, group, or node levels).
  • Operating system values override Application values for event escalation. Operating System values override AM Defaults for the remaining features shown in Table 7-1.
  • Group values override Operating System and Application values as well as AM Defaults.
  • Node values override Group, Operating System, and Application values, as well as AM Defaults.

Any of these four levels of customization overrides AM Defaults. Also, customizing values at any successive level overrides the value set at the previous level. For example, customizing values for Data filters at the Group level overrides values for Data filters set at the Operating System level. Similarly, customizing values for Data filters at the Node level overrides values for Data filters set at the Group level.
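The override order described above can be sketched as a simple lookup from the most specific level to the least specific one. This is a minimal model under stated assumptions, not the Data Analyzer's implementation: the dictionary layout, the function name, and the example filter values are all hypothetical.

```python
# Most specific level first; AM Defaults are the final fallback.
PRECEDENCE = ["Node", "Group", "Operating System", "Application"]


def effective_value(feature, customizations, am_defaults):
    """Return (level, value) for the setting that is actually in effect.

    customizations maps a level name to a dict of {feature: value}
    holding only the values customized at that level.
    """
    for level in PRECEDENCE:
        level_values = customizations.get(level, {})
        if feature in level_values:
            return level, level_values[feature]
    return "AM Defaults", am_defaults[feature]


am_defaults = {"Data filters": "built-in filter"}
customizations = {
    "Operating System": {"Data filters": "OS-level filter"},
    "Group": {"Data filters": "group-level filter"},
}

# The Group value wins over the Operating System value.
print(effective_value("Data filters", customizations, am_defaults))

# A Node customization, once made, wins over the Group value.
customizations["Node"] = {"Data filters": "node-level filter"}
print(effective_value("Data filters", customizations, am_defaults))
```

Storing only the customized values at each level keeps the fallback explicit: removing an entry at one level automatically exposes the value from the level above it.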

7.1.1 Recognizing Levels of Customization

The customization levels for various Data Analyzer values are displayed as icons on some pages. The OpenVMS Data Collection Customization page (Figure 7-1) displays several of these icons.

Figure 7-1 OpenVMS Data Collection Customization


The icons preceding each data item in Figure 7-1 indicate the current customization level for each collection choice. Table 7-2 describes these icons and tells where each appears in Figure 7-1.

Table 7-2 Customization Icons in Figure 7-1
Icon Location Meaning
Graph Before "Disk volume" Current setting is from the built-in AM Defaults.
Magnifying glass Bottom left of window Current setting is from the Application level.
Swoosh Before "Disk status" Current setting has been modified at the OpenVMS Operating System Level.
Double monitors Before "Cluster summary" Current setting has been modified at the group level.
Single monitor Before "Memory" Current setting has been modified at the node level.

7.1.2 Setting Levels of Customization

When you customize values, the Data Analyzer keeps track of the next higher level of each value. This means that you can reset a value to the value set at the next higher level.

To return to the values set at the preceding level, click the Use default values button at the top of a customization page. The icon on the "Use default values" button and explanation at the bottom of the page indicate the previous customization level.
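The reset behavior described above can be sketched for a single data item. The sketch assumes that "Use default values" discards the value customized at the current level and falls back to the nearest previous level that holds a value; the function name and the dictionary representation are hypothetical.

```python
# Levels from least specific to most specific.
LEVELS = ["AM Defaults", "Application", "Operating System", "Group", "Node"]


def use_default_values(settings, level):
    """Discard the customization at `level` and return (source_level, value).

    settings maps a level name to the value customized at that level for
    one data item. It always contains an "AM Defaults" entry.
    """
    if level == "AM Defaults":
        raise ValueError("AM Defaults are built in and cannot be reset")
    settings.pop(level, None)
    # Walk back through the previous levels until one holds a value.
    for previous in reversed(LEVELS[:LEVELS.index(level)]):
        if previous in settings:
            return previous, settings[previous]
    raise KeyError("settings must contain an 'AM Defaults' entry")


settings = {"AM Defaults": "collect off", "Operating System": "collect on", "Node": "collect off"}

# Resetting at the Node level skips Group (no value there) and
# falls back to the Operating System value.
print(use_default_values(settings, "Node"))
```

The returned level name corresponds to the icon that the page shows on the "Use default values" button to identify where the fallback value comes from.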

In the main System Overview window (see Figure 2-25), you can select the customization levels that are shown in Table 7-1. The following sections explain levels of customization in more detail.

7.1.3 Knowing the Number of Nodes Affected by Each Customization Level

Another way of looking at Data Analyzer customization is to consider the number of nodes affected by each level of customization. Depending on which customization menu you use and your choice of menu items, your customizations can affect one or more nodes, as indicated in the following table.

Nodes Affected Action
All nodes Select Customize Application... on the menu shown in Figure 7-2.
All Windows nodes Select Operating Systems --> Customize Windows NT... on the menu shown in Figure 7-2.
All OpenVMS nodes Select Operating Systems --> Customize OpenVMS... on the menu shown in Figure 7-2.
Nodes in a group Select Customize... on the shortcut menu shown in Figure 7-7. The customization options you choose affect only the group of nodes that you select.
One node Select Customize... on the shortcut menu shown in Figure 7-8 or on the Customize shortcut menu on the Node page. The customization options you choose affect only the node that you select.

7.2 Customizing Settings at the Application and Operating System Levels

In the System Overview window menu bar, select Customize. The Data Analyzer displays the shortcut menu shown in Figure 7-2.

Figure 7-2 Application and Operating System Customization Menu


7.2.1 Customizing Application Settings

When you select Customize Application..., by default the Data Analyzer displays the Group/Nodes Lists page (Figure 7-3), where the Inclusion lists tab is the default.

Note

The Event Escalation tab displayed on the Application Settings page (Figure 7-3) is explained in Section 7.7.

7.2.1.1 Application Settings---Groups/Nodes Inclusion Page

On the Groups/Nodes Inclusion page (Figure 7-3) you can select groups of nodes or individual nodes to be displayed.

Figure 7-3 Application Settings---Groups/Nodes Inclusion


On the Groups/Nodes Inclusion page, you have the following choices:

  • Group List
    Select the Group List check box. Then enter the names of the groups of nodes you want to monitor. (The names are case-sensitive, so be sure to enter the correct case.)
    For instructions for changing the group membership of a node, see Section 7.4.1 and Section 7.4.2.
  • Node List
    Select the Node List check box. Then enter the names of individual nodes you want to monitor. (The names are case-sensitive, so be sure to enter the correct case.)
  • Both Group List and Node List
    If you select both check boxes, you can enter the names of groups of nodes as well as individual nodes you want to monitor. (If you enter the name of an individual node, the Data Analyzer displays the name of the group that the node is in, but no additional nodes in that group.)
  • Neither list
    The Group List and Node List are not used; all groups and all nodes are monitored.
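The four choices above can be summarized as a small decision function. This is a sketch under stated assumptions: a cleared check box is modeled as `None`, matching is case-sensitive as the text describes, and the function name and example node names are hypothetical.

```python
def is_monitored(node, group, group_list=None, node_list=None):
    """Decide whether a node is monitored under the inclusion lists.

    group_list / node_list are sets of names, or None when the
    corresponding check box is cleared.
    """
    if group_list is None and node_list is None:
        # Neither list is used: all groups and all nodes are monitored.
        return True
    in_group_list = group_list is not None and group in group_list
    in_node_list = node_list is not None and node in node_list
    return in_group_list or in_node_list


# Group List only: every node in DECAMDS is monitored.
print(is_monitored("NODE1", "DECAMDS", group_list={"DECAMDS"}))

# Names are case-sensitive, so "decamds" does not match "DECAMDS".
print(is_monitored("NODE1", "DECAMDS", group_list={"decamds"}))

# Both lists: NODE9 is monitored by name, but NODE8 in the same group is not.
print(is_monitored("NODE9", "FINANCE", group_list={"DECAMDS"}, node_list={"NODE9"}))
print(is_monitored("NODE8", "FINANCE", group_list={"DECAMDS"}, node_list={"NODE9"}))
```

The last two calls mirror the note in the text: naming an individual node brings in that node (and displays its group) without bringing in the other nodes of that group.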

If you decide to return to the default (Group List: DECAMDS) or to enter names again, select Use default values.

After you enter a list of nodes or groups of nodes, click one of the following buttons at the bottom of the page:

Option Description
OK Accepts the choice of names you have entered and exits the page.
Cancel Cancels the choice of names and does not exit the page.
Apply Accepts the choice of names you have entered but does not exit the page.

If nodes were previously selected for monitoring, their names are not removed from the display even if you click OK or Apply. They are filtered out the next time the Data Analyzer is started.

7.2.1.2 Application Settings---Groups/Nodes Exclusion Lists

As an alternative to the Inclusion lists on the Groups/Nodes Inclusion page, you can click the Exclusion lists tab in Figure 7-4, where you can select groups of nodes or individual nodes to be excluded from display.

Figure 7-4 Application Settings---Groups/Nodes Exclusion Lists


On the Groups/Nodes Exclusion Lists page, you have the following choices:

  • Group List
    Select the Group List check box. Then enter the names of the groups of nodes you want to exclude from monitoring. (The names are case-sensitive, so be sure to enter the correct case.)
    For instructions on changing the group membership of a node, see Section 7.4.1 and Section 7.4.2.
  • Node List
    Select the Node List check box. Then enter the names of individual nodes you want to exclude from monitoring. (The names are case-sensitive, so be sure to enter the correct case.)
  • Both Group List and Node List
    If you select both check boxes, you can enter the names of groups of nodes as well as individual nodes you want to exclude from monitoring. (If you enter the name of an individual node, the Data Analyzer displays the name of the group that the node is in, but no additional nodes in that group.)
  • Neither box
    The Group List and Node List are not used; all groups and all nodes are monitored.

After you enter a list of nodes or groups of nodes, click one of the buttons at the bottom of the page:

Option Description
OK Accepts the choice of names you have entered and exits the page.
Cancel Cancels the choice of names and does not exit the page.
Apply Accepts the choice of names you have entered but does not exit the page.

If nodes were previously selected for monitoring, their names are not removed from the display even if you click OK or Apply to exclude them from monitoring.

7.2.2 Customizing Windows Operating System Settings

When you select Customize Windows NT..., the Data Analyzer displays a page similar to the one shown in Figure 7-5.

Figure 7-5 Windows Operating System Customization


The default page displayed is the Event Customization page. Instructions for using this page are in Section 7.8.1. The other tabs displayed are the Event Escalation page, which is explained in Section 7.7, and the Windows Security Customization page, which is explained in Section 7.9.2.2.

7.2.3 Customizing OpenVMS Operating System Settings

When you select Customize OpenVMS..., the Data Analyzer displays the pages shown in Figure 7-6, which contains tabs for the last six types of customization listed in Table 7-1. (Instructions for making these types of customizations are later in this chapter, beginning in Section 7.5.)

Figure 7-6 OpenVMS Operating System Customization


7.3 Customizing Settings at the Group Level

To perform customizations at the group level, right-click a group name in the System Overview window. The Data Analyzer displays a small menu similar to the one shown in Figure 7-7.

Figure 7-7 Group Customization Menu


When you select Customize, the Data Analyzer displays a page similar to the one shown in Figure 7-6.

7.4 Customizing Settings at the Node Level

To customize a specific node, do either of the following:

  • Select the Customize option at the top of the Group/Node page.
  • Right-click a node name in the Node pane of the System Overview window (see Figure 2-25).
    The Data Analyzer displays the shortcut menu shown in Figure 7-8.

Note

You can customize nodes in any state.

Figure 7-8 Node Customization Menu


When you select Customize, the Data Analyzer displays a customization page similar to the one shown in Figure 7-6.

7.4.1 Changing the Group of an OpenVMS Node

Each Availability Manager Data Collector node is assigned to the DECAMDS group by default.

Note

You need to place nodes that are in the same cluster in the same group. If such nodes are placed in different groups, some of the data collected might be misleading.

You need to edit a logical on each Data Collector node to change the group for that node. To do this, follow these steps:

  1. Assign a unique name of up to 15 alphanumeric characters to the AMDS$GROUP_NAME logical name in the AMDS$AM_SYSTEM:AMDS$LOGICALS.COM file. For example:


    $ AMDS$DEF AMDS$GROUP_NAME FINANCE ! Group FINANCE; OpenVMS Cluster alias 
    
  2. Apply the logical name by restarting the Data Collector:


    $ @SYS$STARTUP:AMDS$STARTUP RESTART 
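Before editing the logical name, it can be useful to check a candidate group name against the constraint in step 1. The following is a minimal sketch that interprets "up to 15 alphanumeric characters" as 1 to 15 ASCII letters or digits; that interpretation and the function name are assumptions, not a documented validation rule.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_GROUP_NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,15}")


def is_valid_group_name(name):
    """Return True if name is 1 to 15 ASCII alphanumeric characters."""
    return _GROUP_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["FINANCE", "DECAMDS", "FINANCE-EAST", "A" * 16]:
    print(candidate, is_valid_group_name(candidate))
```

Because group names are case-sensitive in the Data Analyzer's inclusion and exclusion lists, use the same case here that you intend to enter there.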
    

7.4.2 Changing the Group of a Windows Node

Note

These instructions apply to versions prior to Version 2.0-1.

You need to edit the Registry to change the group of a Windows node. To edit the Registry, follow these steps:

  1. Click the Windows Start button. On the menu displayed, first select Programs, then Accessories, and then Command Prompt.
  2. Type REGEDIT after the angle prompt (>).
    The system displays a screen for the Registry Editor, with a list of entries under My Computer.
  3. On the list displayed, expand the HKEY_LOCAL_MACHINE entry.
  4. Double-click SYSTEM.
  5. Click CurrentControlSet.
  6. Click Services.
  7. Click damdrvr.
  8. Click Parameters.
  9. Double-click Group Name. Then type a new group name of 15 alphanumeric characters or fewer, and click OK to make the change.
  10. On the Control Panel, select Services, and then select Stop for "PerfServ."
  11. Again on the Control Panel, select Devices, and then select Stop for "damdrvr."
  12. First restart damdrvr under "Devices," and then restart PerfServ under "Services."
    This step completes the change of groups for this node.

7.5 Customizing OpenVMS Data Collection

Note

Before you start this section, be sure to read the explanation of data collection, events, thresholds, and occurrences in Chapter 1. Also, be sure you understand background and foreground data collection.

When you choose the Customize OpenVMS menu option in the System Overview window (see Figure 7-2), by default the Data Analyzer displays the OpenVMS Data Collection Customization page (Figure 7-9) where you can select types of data you want to collect for all of the OpenVMS nodes you are currently monitoring. You can also change the default Data Analyzer intervals at which data is collected or updated.

Figure 7-9 OpenVMS Data Collection Customization


Table 7-3 identifies the page on which each type of data collected and displayed in Figure 7-9 appears and indicates whether or not background data collection is turned on for that type of data collection. See Chapter 1 for information about background data collection. (You can also customize data collection at the group and node levels, as explained in Section 7.1.)

Note

When you select a type of data collection, an icon appears on the "Use default values" button indicating the previous (higher) level of customization where customizations might have been made. Pressing the "Use default values" button followed by the "Apply" button causes any customizations made at the current level to be discarded and the values from the previous level to be used.

You can select more than one collection choice using the Shift and/or Ctrl keys. In this case, none of the icons appear on the "Use default values" button. Pressing the "Use default values" button causes each selected collection choice to be reset to the value at its own previous level of customization.

Table 7-3 Data Collection Choices
Data Collected Background Data Collection Default Page Where Data Is Displayed
Cluster summary No Cluster Summary page
CPU mode No CPU Modes Summary page
CPU summary No CPU Process States page
Disk status No Disk Status Summary page
Disk volume No Disk Volume Summary page
I/O data No I/O Summary page
Lock contention No Lock Contention page
Memory No Memory Summary page
Node summary Yes Node pane, Node Summary page, and the top pane of the CPU, Memory, and I/O pages
Page/Swap file No I/O Page Faults page
Single disk Yes 1 Single Disk Summary page
Single process Yes 2 Data collection for the Process Information page

1 Data is collected by default when you open a Single Disk Summary page.
2 Data is collected by default when you open a Single Process page.

You can choose additional types of background data collection by selecting the Collect check box for each one on the Data Collection Customization page of the Customize OpenVMS... menu (Figure 7-6). A check mark indicates that data is to be collected at the intervals described in Table 7-4.

Note

For accurate evaluation of events that require cluster-wide data collection (lock contention, disk status, and disk volume), it is recommended that you enable background data collection for these data types at the OpenVMS Group level. This is described in Section 7.3.