Ask the Wizard Questions
Modified Page Writer tuning
The Question is:
I have a serious problem at our V5.2-2 VAX/VMS site. In order to try
to improve system performance, I increased the MPW parameters. The
changes did not prove to be beneficial. So, I tried to restore them
to their original values but I have not been able to. To make
matters worse, VMS seems to have further augmented them.
We have a VAX 3100/90, equipped with 128 MB memory and 5 RZ28 and 1
RZ25 disk drives, that is the boot node for a cluster that includes
one other machine--a VAXStation 4000/60. The 3100/90 has been
'super-charged' with a Nemonix accelerator board. (The Nemonix
board, which doubles the clock speed and increases the cache from 256
KB to 2 MB has halved our transaction time while providing and extra
The application is a student registration system that was developed
using Powerhouse from Cognos. The typical user working set is
approximately 5000 pages. The user's WSEXTENT is 5500 while the
WSDEF and WSQUOTA are set at 2048. We have 53 users during peak
operation. While memory is scarce, we have more than ample disk
capacity and we incur approximately 500 faults per second. Of this
number, 10 to 20 are hard faults. I have implemented four 100,000
block page files with each on a separate disk.
MPW_IOLIMIT is at the default of 4.
The original MPW parameters are as follows:
                   Original   Modified   Present
MPW_WAITLIMIT         10605      15900     34580
MPW_IOLIMIT               4          4         4
MPW_WRTCLUSTER          120        120       120
MPW_HILIMIT           10485       1573     31437
MPW_LOWAITLIMIT       10365      15548     15718
MPW_LOLIMIT             120        120     13099
MPW_THRESH              200        200       128
VMS also changed MMG_CTLFLAGS from 3 to 1.
If I perform a conversational boot, I can see that the parameters in
VAXVMSSYS.PAR are used, but when the system is actually up and running,
the values are changed. I can change the values using SYSGEN and
WRITE ACTIVE (one must be very careful doing this--a mistake in the
order of changes can result in a hung system) but if I reboot or run AUTOGEN
and reboot the wrong parameters are ACTIVE while the desired ones are
CURRENT. Our page fault rate has more than doubled and our
transaction time has increased.
One solution may be to restore the system disk but I feel that this
may not help. Any ideas?
The Answer is:
The rule of thumb for MPW_IOLIMIT is 3 + (number of page files) +
(number of swap files).
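As a quick sketch of that rule of thumb, the arithmetic can be written
out as below. The function name is made up for illustration, and the
swap file count of 1 is an assumption, since the question mentions four
page files but does not say how many swap files are installed.

```python
def suggested_mpw_iolimit(page_files: int, swap_files: int) -> int:
    """Rule of thumb from the answer: 3 + page files + swap files."""
    if page_files < 0 or swap_files < 0:
        raise ValueError("file counts cannot be negative")
    return 3 + page_files + swap_files


# Four page files are stated in the question; one swap file is assumed.
print(suggested_mpw_iolimit(page_files=4, swap_files=1))  # prints 8
```

Under that assumption the rule suggests a value around 8, double the
default of 4 that is currently in use.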
If parameters are changing dynamically, look for tools such as Dynamic
Load Balancer or ROBOtune. These dynamically change parameters and may
not be helping you as much as just setting the parameters correctly in
the first place might.
MPW parameters look ok except someone messed with MPW_LOLIMIT which
is seldom used anymore except under pool expansion and such.
AUTOGEN usually wants to set MMG_CTLFLAGS to 3 or 255 -- never just 1.
Sounds like MODPARAMS.DAT may need to be cleaned up a bit too.
Remove all entries pertaining to MPW_* or FREELIM, FREEGOAL, BORROWLIM,
and GROWLIM, WSINC, WSDEC, PFRATL, PFRATH and re-AUTOGEN with FEEDBACK.
If WSINC is at its default, it can be increased to a higher value such
as 512 or 1024 on your system.
On PowerHouse systems, it is sometimes better to keep a few extra pages
on the free list by increasing FREEGOAL from its default of 12000 (in
your config) to something closer to 24000 to encourage more trimming.
Be careful not to force swapping by lowering BALSETCNT. Swapping in
PowerHouse systems can be very painful. It is better to use proactive
swapping and trimming rather than forcing swapping using BALSETCNT.
Establish a baseline for parameters and performance then make changes
sparingly, measuring the benefit as you go. Back out changes that
exhibit no apparent benefit.
In addition to the comments around dynamic changes to the values
of SYSGEN parameters -- something that OpenVMS does not do once the
system has passed the basic parameter sanity checks -- it's distinctly
possible these systems are simply overloaded and need to be upgraded.
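A rough way to check the overload theory is to compare peak working set
demand with physical memory, using the figures from the question (53
users, about 5000 pages each, 128 MB of memory). The sketch below
assumes 512-byte VAX pages and that 128 MB means 128 x 2^20 bytes. It
ignores shared pages and the memory used by VMS itself, so treat it as
an estimate only. The function name is made up for illustration.

```python
VAX_PAGE_BYTES = 512  # a VAX page is 512 bytes


def working_set_demand_bytes(users: int, pages_per_user: int) -> int:
    """Total bytes needed if every user holds a full working set."""
    return users * pages_per_user * VAX_PAGE_BYTES


demand = working_set_demand_bytes(users=53, pages_per_user=5000)
physical = 128 * 1024 * 1024  # assumed meaning of "128 MB"

print(f"demand:   {demand / 2**20:.1f} MB")
print(f"physical: {physical / 2**20:.1f} MB")
print("demand exceeds physical memory:", demand > physical)
```

By this estimate the user working sets alone ask for slightly more than
the installed memory, before the operating system takes its share.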
There is a guide to system tuning provided in the performance management
manual -- system tuning requires a systematic and system-wide approach.
If there are definite intervals of inactivity on individual processes,
I would consider the SWPOUTPGCNT parameter, in an attempt to force
an inactive process to swap out during intervals of inactivity, rather
than incrementally trimming it back.
You have mentioned fault rates, but not CPU loading and CPU modes -- I
would look at these factors and see if a system upgrade is warranted.
I'd also make certain the I/O queues did not indicate a performance
restriction on a particular disk spindle. (All of this is covered in
the performance manual.)
Using SWPOUTPGCNT is an old approach that seldom applies unless we're
satisfying a free list starvation and the swapper has reached critical
mode. If MMG_CTLFLAGS is enabled, you will almost never swap the old
way and that's a good thing. The old way of swapping caused processes
to actually have their lists trimmed, forcing them to rejustify their
list sizes when they became active again...
Proactive swapping will begin when the free list hits FREEGOAL, and it
swaps processes at their current size...
PowerHouse applications have been known to connect to RMS indexed files
multiple times, often requesting 5 buffers of 4 - 10 KB each per user.
In a particular case I analyzed, each user had an average of 20 files
with 20 buffers of 4 KB each: some 1.5 MB, or about 3000 pagelets, of
private pages per user. The solution for that customer was to give some
20 major files 50 - 200 global buffers each:
$SET FILE/GLO=42 HOT:*.DAT
This was combined with a reduction of the per-connect number of buffers
from its default (5 for many files) to an explicit 2 by issuing the
following command before entering PowerHouse:
$SET RMS /IND/BUF=2
Note, this is counterintuitive tuning... normally the more buffers
the merrier, but when memory is scarce and sharing is possible...
In that case, each process ended up using some 1000 pagelets less of
virtual memory, and touched far fewer of the pages, giving much better
paging.
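The private buffer figure quoted for that case (20 files, 20 buffers of
4 KB each) can be reproduced with the small sketch below. It assumes
4 KB means 4096 bytes and counts in 512-byte units. The function name
is made up for illustration, and the result is an upper estimate of
buffer space rather than a measurement of any real process.

```python
UNIT_BYTES = 512  # size of one page (VAX) or pagelet


def private_buffer_units(files: int, buffers_per_file: int,
                         buffer_bytes: int) -> int:
    """512-byte units consumed by per-process RMS buffers."""
    return files * buffers_per_file * buffer_bytes // UNIT_BYTES


units = private_buffer_units(files=20, buffers_per_file=20,
                             buffer_bytes=4096)
print(units)                      # prints 3200
print(units * UNIT_BYTES / 2**20) # prints 1.5625 (MB)
```

That matches the "some 1.5 MB or 3000" figure given above, and it shows
how quickly per-connect buffers add up when every user opens the same
files privately.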
As always, your mileage may vary. To find out whether this technique
might be of use to your application I suggest you use SDA as follows:
SDA> SET PROC "Some_powerhouse_user_process_name"
SDA> SHOW PROC/CHAN ---> see which dat files are open
SDA> SHOW PROC/RMS=BDBSUM ---> see one data line per buffer used.