HP OpenVMS Systems Documentation
Availability Manager User's Guide
Order Number: AA-RNSJA-TE
This guide explains how to use Availability Manager software to detect and correct system availability problems.
Revision/Update Information: This is a new manual.
Data Analyzer: Windows NT 4.0, SP 3 or higher; Windows
Software Version: Availability Manager Version 1.4
Compaq Computer Corporation
© 2001 Compaq Computer Corporation
Compaq, VAX, VMS, and the Compaq logo Registered in U.S. Patent and Trademark Office.
OpenVMS is a trademark of Compaq Information Technologies Group, L.P. in the United States and other countries.
Microsoft, Windows, Windows NT, and Windows 95 are trademarks of Microsoft Corporation in the United States and other countries.
Motif, OSF/1, and UNIX are trademarks of The Open Group in the United States and other countries.
All other product names mentioned herein may be the trademarks of their respective companies.
Confidential computer software. Valid license from Compaq required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.
Compaq shall not be liable for technical or editorial errors or omissions contained herein. The information in this document is provided "as is" without warranty of any kind and is subject to change without notice. The warranties for Compaq products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.
The Compaq OpenVMS documentation set is available on CD-ROM.
This guide is intended for system managers who install and use Compaq Availability Manager software. It is assumed that the system managers who use this product are familiar with Windows terms and functions.
This guide contains the following chapters and appendixes:
The following manuals provide additional information:
For additional information about Compaq OpenVMS products and services, access the Compaq website at the following location:
Compaq welcomes your comments on this manual. Please send comments to either of the following addresses:
How to Order Additional Documentation
Use the following World Wide Web address to order additional documentation:
If you need help deciding which documentation best meets your needs, call 800-282-6672.
The following conventions are used in this guide:
|Availability||Alerts users to resource availability problems; provides capabilities to improve availability.|
|Centralized management||Provides centralized management of remote nodes within an extended local area network (LAN).|
|Intuitive interface||Provides an easy-to-learn and easy-to-use graphical user interface (GUI).|
|Correction capability||Allows real-time intervention, including adjustment of node and process parameters, even when remote nodes are hung.|
|Customization||Adjusts to site-specific requirements through a wide range of customization options.|
|Scalability||Makes it easier to monitor multiple OpenVMS nodes.|
The Data Analyzer and Data Collector nodes communicate over an extended LAN using an IEEE 802.3 Extended Packet format protocol. Once a secure connection is established, the Data Analyzer instructs the Data Collector to gather specific system and process data.
Although you can run the Data Analyzer as a member of a monitored cluster, it is typically run on a system that is not a member of the cluster being monitored. More than one Data Analyzer can run in a LAN at the same time, but run only one Data Analyzer at a time on any single system.
Figure 1-1 shows a possible configuration of Data Analyzer and Data Collector nodes.
Figure 1-1 Availability Manager Node Configuration
In Figure 1-1, the Data Analyzer can monitor nodes A, B, and C across the network. The password on node D does not match the password of the Data Analyzer; therefore, the Data Analyzer cannot monitor node D.
For information about password security, see Section 1.4.
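The password rule illustrated in Figure 1-1 can be sketched in a few lines of Python. This is a simplified illustration only, not the product's actual security mechanism: the function name `monitorable_nodes` and the password strings are made up for the example, and the only behavior assumed is that a Data Collector node is monitored when its password matches that of the Data Analyzer.

```python
def monitorable_nodes(analyzer_password, collector_passwords):
    """Return the names of the nodes whose password matches the Data Analyzer's.

    collector_passwords maps a node name to the password set on that node.
    Both the function and the data layout are hypothetical.
    """
    return sorted(
        node
        for node, password in collector_passwords.items()
        if password == analyzer_password
    )


# Made-up passwords mirroring Figure 1-1: node D does not match.
collectors = {
    "A": "HOMETOWN",
    "B": "HOMETOWN",
    "C": "HOMETOWN",
    "D": "OTHERPWD",
}

print(monitorable_nodes("HOMETOWN", collectors))
```

With these sample values, nodes A, B, and C are returned and node D is left out, as in the figure.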
After installing the Availability Manager software, you can begin to request information from one or more Data Collector nodes.
Requesting and receiving information requires the Availability Manager to perform a number of steps, which are shown in Figure 1-2 and explained after the figure.
Figure 1-2 Requesting and Receiving Information
The following steps correspond to the numbers in Figure 1-2.
In step 4, the Availability Manager also checks the data for any events that should be signaled. The following section explains in more detail how data analysis and event detection work.
1.3 How Does the Availability Manager Identify Performance Problems?
When the Availability Manager detects problems on your system, it uses a combination of methods to bring these problems to the attention of the system manager. If no data display is open for a particular node, the Availability Manager reduces the data collection interval so that data can be analyzed more closely. Performance events are also signaled in the Events pane in the lower portion of the Application window (Figure 1-3).
The following topics are related to detecting and signaling problems:
This section explains how the Availability Manager collects and analyzes data. It also defines terms related to data collection and analysis.
Types of Data Collection
Figure 1-4 Data Collection Page
Figure 1-5 Sample Node Summary Page
An event is a problem or potential problem associated with resource availability. Users can customize criteria for events. Events are associated with types of data collected. For example, collection of CPU data is associated with the PRCCUR, PRCMWT, and PRCPWT events. (Appendix B describes events, and Appendix C describes the events that each type of data collection can signal.)
As data is collected, the Availability Manager evaluates it and signals an event whenever the data meets the user-specified criteria. These criteria are called thresholds and occurrences and are explained later in this chapter.
Data collection intervals, which are displayed on the Data Collection page (Figure 1-4), specify the frequency of data collection.
Table 1-1 describes each interval.
|Interval (in seconds)||Description|
|Display||How often data should be collected as a foreground activity.|
|Event||How often data should be collected as a background activity if any events have been posted for that type of data.|
|NoEvent||How often data should be collected as a background activity if no events have been posted for that type of data.|
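Read as a decision rule, the descriptions in Table 1-1 suggest the following selection logic for one type of data. This is a minimal sketch based only on the table: the names `Intervals` and `choose_interval` are hypothetical, the interval values are invented for the example, and "foreground activity" is assumed to mean that a display for that type of data is open.

```python
from dataclasses import dataclass


@dataclass
class Intervals:
    """Collection intervals, in seconds, for one type of data (Table 1-1)."""
    display: float
    event: float
    no_event: float


def choose_interval(intervals, display_open, events_posted):
    """Pick the collection interval for one type of data.

    display_open:  data is being collected as a foreground activity
                   (assumed here to mean a display for this data is open).
    events_posted: at least one event is posted for this type of data.
    """
    if display_open:
        return intervals.display
    if events_posted:
        return intervals.event
    return intervals.no_event


# Invented example values, not product defaults.
cpu = Intervals(display=5.0, event=10.0, no_event=30.0)

print(choose_interval(cpu, display_open=True, events_posted=False))
print(choose_interval(cpu, display_open=False, events_posted=True))
print(choose_interval(cpu, display_open=False, events_posted=False))
```

The foreground check comes first in this sketch, so the Display interval applies whether or not events have been posted.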
The following list indicates how the Availability Manager determines which collection interval to use for a particular type of data:
The Availability Manager posts an event when data values exceed a user-defined threshold for a user-defined number of consecutive data collections (the occurrence value). Threshold and occurrence values are displayed on event customization pages similar to the one shown in Figure 1-6.
Figure 1-6 Sample Event Customization Page
The Availability Manager uses the threshold value as a criterion for posting an event. In many cases, if a condition exceeds that value, the Availability Manager displays a message in the Events pane of the Application window (see Figure 1-3). Some thresholds are used in more complex tests.
An occurrence (or trigger) for a specific event is the number of consecutive data collections that must exceed the event threshold before the Availability Manager signals the event in the Events pane of the Application window and logs it in the Event Log file.
For example, the disk status data that the Availability Manager collects includes the error count on a disk. If you select the Disk Status check box on the Data Collection page (Figure 1-4) and the error count exceeds the threshold value of 15 on the Event Customization page (Figure 1-6) for more than one data collection, an event is posted.
Chapter 6 explains how users can change default values for event thresholds and occurrences.
The Availability Manager evaluates every data collection for events. Any time a data value in a data collection exceeds a threshold, an occurrence counter is incremented. Whenever the occurrence count matches the Occurrence value on the Event Customization page (Figure 1-6), the event is signaled.
If, at any time during data collection, the data does not exceed the threshold, the occurrence counter is set to 0, and the event is removed from the Events pane. Figure 1-7 depicts this sequence.
Figure 1-7 Flow Chart of Event Testing
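The event-testing sequence described above can be sketched as a small counter. This is an illustration under stated assumptions, not the product's code: the class name `EventTester` is hypothetical, the threshold of 15 is the disk error count value used in the earlier example, and the occurrence value of 2 and the sample error counts are invented.

```python
class EventTester:
    """Tracks one event's occurrence counter across data collections."""

    def __init__(self, threshold, occurrence):
        self.threshold = threshold    # value the data must exceed
        self.occurrence = occurrence  # consecutive collections required
        self.count = 0                # occurrence counter
        self.signaled = False         # True while the event is in the Events pane

    def evaluate(self, value):
        """Evaluate one data collection; return True if the event is signaled."""
        if value > self.threshold:
            self.count += 1
            if self.count >= self.occurrence:
                self.signaled = True
        else:
            # Data no longer exceeds the threshold: reset the counter
            # and remove the event from the Events pane.
            self.count = 0
            self.signaled = False
        return self.signaled


# Invented disk error counts from six consecutive data collections.
tester = EventTester(threshold=15, occurrence=2)
results = [tester.evaluate(errors) for errors in [10, 16, 18, 20, 12, 17]]
print(results)  # [False, False, True, True, False, False]
```

In this run the event is signaled at the third collection, the second consecutive one above 15. It is cleared at the fifth collection, when the error count drops to 12, and the counter then starts again from 0.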