HP OpenVMS Systems Documentation
OpenVMS Performance Management
4.4 Creating, Maintaining, and Interpreting MONITOR Summaries
Consider the following guidelines when using MONITOR:
See the OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems and the OpenVMS System Management Utilities Reference Manual: M-Z for information about using MONITOR.
4.4.2 MONITOR Modes of Operation
Use live mode to collect data on a running system and to generate one or more of the following types of MONITOR output: ASCII screen images, binary recording files, or formatted ASCII summary files.
Use live mode to display data about a remote system connected to your system with DECnet for OpenVMS.
Use playback mode to read a binary recording file and produce one or
more of the following types of MONITOR output: ASCII screen images,
binary recording files, or formatted ASCII summary files.
When MONITOR data is recorded continuously, a summary report can cover
any contiguous time segment.
The two multifile summary reports are not saved as files. To keep them, you must do either of the following:
4.4.5 Customizing Your Reports
The report you require for the evaluation procedure is one that covers a period that best represents the typical operation of your system. You might want, for example, to evaluate your system only during hours of peak activity.
To generate a summary of the appropriate time segment, edit the
MONSUM.COM command procedure and change the beginning and ending times
on one of the two MONITOR commands that produce the summary reports.
The summary reports produced by MONSUM.COM are in the multifile summary format: there is one column of averages for each node in a VMScluster, as well as some overall row statistics. For noncluster systems, the row statistics can be ignored.
If you prefer to use a report in the standard summary format (which
includes current, minimum, and maximum statistics), execute a MONITOR
playback summary command referencing the input data file of interest as
the only file in the /INPUT list. Note that a new data file is created
for each system whenever it reboots. Remember to use the /BEGINNING and
/ENDING qualifiers to select the desired time period.
You are encouraged to observe current system activity regularly by running MONITOR in live mode. In live mode, always begin an analysis with the MONITOR CLUSTER and MONITOR SYSTEM classes to obtain an overview of system performance.
Then, monitor other classes to examine components of particular interest.
4.4.8 More About Multifile Reports
In multifile reports, a page or more is devoted to each MONITOR class. Each column represents one node, and is headed by the node name and beginning and ending times of the segment requested. In most cases, time segments for all nodes will be roughly the same. Differences of a few minutes are typical, because data collection on the various nodes is not synchronized.
In some cases, one or more time segments will be shorter than others; in these cases, some of the requested data was not recorded (probably because the nodes were unavailable). Note that if data is unavailable for some period within the bounds of a request, the report does not indicate this explicitly.
However, such a gap can occur only when the column of data uses more than one input file. If multiple files contributed to the column, the number of files is shown in parentheses to the right of the node name. Where a time segment is missing, this number must be greater than 1. If no number appears, there is only one input data file for that column, and the column includes no missing time segments.
To summarize, if the beginning and ending times are not all roughly
the same, or if a parenthesized number appears, some data may be
unavailable, and you may want to base your evaluation on a different
time segment that includes more complete data. Whenever the multifile
report is based on incomplete data, the Row Average statistic can be
weighted unfairly in favor of one or more nodes.
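The weighting effect can be illustrated with a small sketch, written in Python purely for illustration. It assumes that a row average is the plain mean of the per-node column averages; the exact MONITOR computation is not stated here, so treat that as an assumption. The node names and figures are hypothetical.

```python
# Hypothetical per-node figures: (node name, average rate over the data
# actually recorded, minutes of data recorded within the requested segment).
nodes = [
    ("NODEA", 40.0, 60.0),  # complete data for the hour
    ("NODEB", 10.0, 60.0),  # complete data for the hour
    ("NODEC", 90.0, 5.0),   # only 5 minutes recorded
]


def plain_row_average(rows):
    """Mean of the per-node averages, ignoring how much data backs each one.

    Assumption: this stands in for the Row Average statistic.
    """
    return sum(avg for _, avg, _ in rows) / len(rows)


def duration_weighted_average(rows):
    """Mean of the per-node averages, weighted by minutes of recorded data."""
    total_minutes = sum(minutes for _, _, minutes in rows)
    return sum(avg * minutes for _, avg, minutes in rows) / total_minutes


plain = plain_row_average(nodes)
weighted = duration_weighted_average(nodes)
print(f"plain row average:  {plain:.2f}")
print(f"duration-weighted:  {weighted:.2f}")
```

With these figures the plain average is about 46.7, while weighting by recorded time gives 27.6. The node with only five minutes of data pulls the plain average well above what the hour as a whole supports.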
While interpreting MONITOR statistics, keep in mind that the collection interval has no effect on the accuracy of MONITOR rates. It does, however, affect levels, because they represent sampled data. In other words, the smaller the collection interval, the more accurate MONITOR level statistics will be. (For more information on MONITOR rates and levels, refer to the OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.)
Although the interval value supplied with MONITOR.COM is adequate for most purposes, it does represent a trade-off between statistical accuracy and the consumption of disk space. Thus, before you base major decisions on MONITOR level statistics, be sure to verify them by running MONITOR for a time with a much smaller collection interval while carefully observing disk space usage.
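The difference between rates and levels can be sketched with a small simulation, written in Python purely for illustration. It assumes that rates are derived from the difference between running counters and that levels are averaged from the values seen at each collection point. The workload below is synthetic and hypothetical.

```python
DURATION = 120  # seconds of simulated activity (hypothetical workload)

# Level: a queue length that is 10 for the first 3 seconds of every
# 30-second period and 0 otherwise (short, regular bursts).
level = [10 if t % 30 < 3 else 0 for t in range(DURATION + 1)]

# Rate: a cumulative event counter; second s contributes s % 4 events.
counter = [0]
for s in range(DURATION):
    counter.append(counter[-1] + s % 4)


def sampled_level(interval):
    """Average of the level values seen at each collection point."""
    samples = level[::interval]
    return sum(samples) / len(samples)


def counter_rate(interval):
    """Events per second, from the first and last counter readings."""
    times = list(range(0, DURATION + 1, interval))
    first, last = times[0], times[-1]
    return (counter[last] - counter[first]) / (last - first)


for interval in (1, 30):
    print(interval, round(sampled_level(interval), 3), counter_rate(interval))
```

In this sketch the rate is 1.5 events per second at either interval, because nothing counted is ever lost between samples. The level average is about 1.07 with a 1-second interval but 10.0 with a 30-second interval, because every 30-second sample happens to land on a burst.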
|If you observe...||Then you...|
||Can rule out memory limitations.|
||Should investigate memory limitations further. (See Chapter 7.)|
You can also determine memory limitations by using SHOW SYSTEM to
check for processes in the RW_FPG and RW_MPG wait states. If either
state is displayed consistently, there is a serious shortage of memory.
Very little improvement can be made by tuning the system. Compaq
recommends buying more memory.
5.2.2 I/O Limitations
I/O limitations occur when the number or speed of devices is insufficient. You will also find an I/O limitation when application design errors either place inappropriate demand on particular devices or do not employ sufficiently large blocking factors or numbers of buffers.
To determine if you may have an I/O limitation, enter the DCL command MONITOR IO or MONITOR SYSTEM and observe the rates for direct I/O and buffered I/O.
|If...||Then you...|
|Your system is not performing any direct I/O||Do not have a disk I/O limitation.|
|You observe that there is no buffered I/O||Do not have a terminal I/O limitation.|
|Either or both operations are occurring||Cannot rule out the possibility of an I/O limitation. (See Chapter 8.)|
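The table above amounts to a simple decision rule. The sketch below, written in Python purely for illustration, takes the direct and buffered I/O rates you observed and returns the corresponding conclusions. The function name and return strings are hypothetical.

```python
def io_triage(direct_io_rate, buffered_io_rate):
    """Return the conclusions the table draws from the two observed rates."""
    findings = []
    if direct_io_rate == 0:
        findings.append("no disk I/O limitation")
    if buffered_io_rate == 0:
        findings.append("no terminal I/O limitation")
    if direct_io_rate > 0 or buffered_io_rate > 0:
        findings.append("I/O limitation cannot be ruled out (see Chapter 8)")
    return findings


# Example: some direct I/O observed, no buffered I/O.
for finding in io_triage(direct_io_rate=12.5, buffered_io_rate=0):
    print(finding)
```

Note that a nonzero rate never confirms a limitation by itself; it only means that further investigation is warranted.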
The CPU can become the binding resource when the workload places extensive demand on it. Perhaps all the work becomes heavily computational, or there is some condition that gives unfair advantages to certain users.
To determine if there is a CPU limitation, use the DCL command MONITOR STATES.
You might also use the DCL command MONITOR MODES to observe the amount of user mode time. The MONITOR MODES display also reveals the amount of idle time, which is sometimes called the null time.
|Many of your processes are in the computable state||There is a CPU limitation.|
|Many of your processes are in the computable outswapped state||Be sure to address the issue of a memory limitation first. (See Section 9.2.4.)|
|The user mode time is high||A CPU limitation is likely.|
|There is almost no idle time||The CPU is being heavily used.|
A final indicator of a CPU limitation that the MONITOR MODES display
provides is the amount of kernel mode time. A high percentage of time
in kernel mode can indicate excessive consumption of the CPU resource
by the operating system. This problem is more likely the result of a
memory limitation but could indicate a CPU limitation as well. If you
decide to investigate the CPU limitation further, proceed through the
steps in Chapter 9.
5.3 After the Preliminary Investigation
When you have completed your preliminary investigation, you are ready to:
Once you take the appropriate remedial action, monitor the effectiveness of the changes and, if you do not obtain sufficient improvement, try again. In some cases, you will need to repeat the same steps, but either increase or decrease the magnitude of the changes you made. In other cases, you will proceed further in the investigation and uncover some other underlying cause of the problem and take corrective steps.
The diagrams and text do not attempt to depict this looping. Rather, repetition is always implied, pending the outcome of the changes. Therefore, tuning is frequently an iterative process. The approach to tuning presented by this chapter and Chapter 10 assumes that you can uncover multiple causes of performance problems by repeating the steps shown until you achieve satisfactory performance.
Effective tuning requires that you can observe the undesirable performance behavior while you test.
You will find it especially helpful to keep a listing of the current values of all your system parameters nearby as you conduct the following investigations. Running SYSGEN and specifying a file name is one method for obtaining this listing. (See the OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.)
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SET/OUTPUT=filename
SYSGEN> SHOW/ALL
SYSGEN> SHOW/SPECIAL
SYSGEN> EXIT
$ PRINT/DELETE filename
Overall responsiveness of a system depends largely on the
responsiveness of its CPU, memory, and disk I/O resources. If each
resource responds satisfactorily, then so will the entire system.
6.1 Understanding System Responsiveness
Each resource must operate efficiently by itself and it must also interact with other resources.
An important aspect of your evaluation is to distinguish between resources that might be performing poorly because they are overcommitted and those that might be doing so because one or both of the following conditions has occurred:
A binding resource or bottleneck is an overcommitted resource that causes the others to be blocked or burdened with overhead operations. Proper identification of such a resource is critical to correction of a performance problem. Upgrading a nonbinding resource will do nothing to improve a bottlenecked system.
Detecting bottlenecks is particularly important for analyzing interactions of the CPU with each of the other resources.
For example, CPU blockage occurs when CPU capacity, though it appears
sufficient to meet demand, cannot be used because the CPU must wait for
disk I/O to complete or memory to be allocated.
6.1.2 Balancing Resource Capacities
Because of the potential for bottlenecks, it is especially important to maintain balance among the capacities of your system's resources.
For example, when upgrading to a faster CPU, consider the effect the
additional CPU power will have on the other primary resources. Because
the faster CPU can initiate more I/O requests per unit of time, you
must ensure that the disk I/O subsystem has sufficient capacity to
handle the increased traffic.
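The effect of a binding resource can be sketched with a deliberately simplified model, written in Python purely for illustration. It assumes that every request needs all three resources and that sustainable throughput is set by the resource with the least capacity. The capacity figures are hypothetical.

```python
def sustainable_throughput(capacities):
    """Requests per second the system can sustain under the simplified model."""
    return min(capacities.values())


def binding_resource(capacities):
    """Name of the resource that limits throughput."""
    return min(capacities, key=capacities.get)


# Hypothetical capacities, in requests per second each resource can serve.
before = {"cpu": 100.0, "memory": 150.0, "disk_io": 50.0}
faster_cpu = dict(before, cpu=200.0)        # upgrade a nonbinding resource
faster_disk = dict(before, disk_io=120.0)   # upgrade the binding resource

for label, caps in (("before", before),
                    ("faster CPU", faster_cpu),
                    ("faster disk", faster_disk)):
    print(label, binding_resource(caps), sustainable_throughput(caps))
```

In this model, doubling CPU capacity leaves throughput at 50 because disk I/O is still binding. Upgrading disk I/O raises throughput to 100, at which point the CPU becomes the binding resource, which is why balance matters after any upgrade.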
6.2 Evaluating Responsiveness of System Resources
For each resource, key MONITOR statistics help you answer such questions as:
Two prime measures of resource responsiveness include:
For each resource, you can use MONITOR summaries to examine or estimate
one or both of these quantities.
6.3 Improving Responsiveness of System Resources
You can investigate four main ways to improve responsiveness:
Excess memory capacity is often used to reduce the demand on an overworked disk I/O subsystem by increasing the size of each I/O transfer, thereby reducing the total number of I/O operations.
The CPU benefits as well, because it needs to do less work executing system services and device driver software.
The primary means of offloading I/O to memory is the extensive use of caches (page caches, XQP caches, virtual I/O or extended file caching, RMS blocking) to reduce the number of I/O operations.
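The arithmetic behind larger transfers can be sketched as follows, in Python purely for illustration. The file size and transfer sizes are hypothetical, and the model ignores caching: it only counts how many I/O operations are needed to move a given number of blocks.

```python
def io_operations(total_blocks, blocks_per_transfer):
    """Number of I/O operations needed to move total_blocks (rounded up)."""
    if blocks_per_transfer <= 0:
        raise ValueError("blocks_per_transfer must be positive")
    # Integer ceiling division.
    return -(-total_blocks // blocks_per_transfer)


# Hypothetical example: reading a 1000-block file sequentially.
for size in (16, 64, 127):
    print(f"{size:>3} blocks per transfer -> {io_operations(1000, size)} operations")
```

Going from 16-block to 64-block transfers cuts the operation count from 63 to 16 for the same data, at the cost of the extra memory needed for the larger buffers.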