The Question is:
We are running OpenVMS V7.1 on a cluster with two Alpha systems (BERT and
ERNIE). We have in-house programs that we INSTALL so they are shared.
Modified executables are created almost daily, often multiple times a day and
INSTALLed (/OPEN/HEADER/SHARED)
using REPLACE. Occasionally, the INSTALL fails on ERNIE due to insufficient
global pages/sections. This happens when previous versions of the executables
appear open and users are accessing them because they have not yet exited and
restarted with the new executable. By restarting users, we are eventually
able to INSTALL the new
executable. While investigating the issue, we encountered the same scenario
stated in Wizard Entry #6781. However, the solution indicated in the answer
to #6781 did not correct the problem; only a reboot of ERNIE removed the
extra files from the $ SHOW DEVICE/FILES/SYSTEM output.
For example,
Executables in URS directory:
AUMENU.EXE;2812
AUMENU.EXE;2811
AUMENU.EXE;2810
AUMENU.EXE;2809
URMENU.EXE;2538
URMENU.EXE;2537
URMENU.EXE;2536
ERNIE
$ INSTALL LIST
URDATA4:<UNIRAD>.EXE
AUMENU;2810 Open Hdr Shar
(could not get ;2812 installed)
URMENU;2538 Open Hdr Shar
BERT
$INSTALL LIST
URDATA:<UNIRAD>.EXE
AUMENU;2812 Open Hdr Shar
URMENU;2538 Open Hdr Shar
ERNIE
$ SHOW DEV/FILES/SYS URS
PID=00000000
AUMENU;2750
AUMENU;2764
AUMENU;2766
AUMENU;2775
AUMENU;2795
AUMENU;2797
AUMENU;2797
AUMENU;2797
AUMENU;2797
AUMENU;2797
AUMENU;2797
AUMENU;2797
AUMENU;2797
AUMENU;2798
AUMENU;2802
AUMENU;2803
AUMENU;2805
AUMENU;2805
AUMENU;2806
AUMENU;2806
AUMENU;2806
AUMENU;2806
AUMENU;2806
AUMENU;2806
AUMENU;2806
AUMENU;2806
AUMENU;2806
AUMENU;2806
AUMENU;2809
AUMENU;2810
AUMENU;2811
URMENU;2488
URMENU;2494
URMENU;2503
URMENU;2528
URMENU;2529
URMENU;2534
URMENU;2535
URMENU;2535
URMENU;2536
URMENU;2537
URMENU;2538
BERT
$ SHOW DEV/FILES/SYS URS
PID=00000000
AUMENU;2809
AUMENU;2811
AUMENU;2812
URMENU;2537
URMENU;2538
ERNIE
Open images via F$GETJPI
AUMENU;2809
AUMENU;2811
AUMENU;2812
URMENU;2537
URMENU;2538
BERT
Open images via F$GETJPI
AUMENU;2809
AUMENU;2811
AUMENU;2812
URMENU;2537
URMENU;2538
Using the solution to #6781, on ERNIE I attempted to do an INSTALL DELETE of
the obsolete/old versions of AUMENU and received the error "Known File Entry
not found".
After rebooting ERNIE, only the latest versions of AUMENU and URMENU were
shown with $ INSTALL LIST, $ SHOW DEV/FILES/SYS URS and F$GETJPI. After our
nightly backup, which logs off all users first, BERT likewise showed only the
latest versions.
Since the reboot of ERNIE, we have started a similar pattern:
This is for AUMENU.EXE on node ERNIE only.
URS:*.EXEs are purged to three versions each night.
This table shows the number of open instances of each version reported by
$ SHOW DEV/FILES/SYS URS, and whether F$GETJPI shows a user running it.
Version#:              ;2817  ;2818  ;2820  ;2821  ;2822  ;2823  ;2824  ;2825
Created:               20sep  01oct  04oct  05oct  05oct  05oct  06oct  06oct
                                            8:42   10:45  12:35  10:24  11:08
01oct                  1      2      -      -      -      -      -      -
02oct
 $show before stop/id  1      2      -      -      -      -      -      -
 f$getjpi              No     yes    -      -      -      -      -      -
 $show after stop/id   0      2      -      -      -      -      -      -
04oct
 $show before stop/id  -      2      -      -      -      -      -      -
 f$getjpi              -      yes    -      -      -      -      -      -
 $show after stop/id   -      2      -      -      -      -      -      -
05oct
 $show before stop/id  -      2      1      -      -      -      -      -
 f$getjpi              -      yes    yes    -      -      -      -      -
 $show after stop/id   -      1      1      -      -      -      -      -
06oct
 $show before stop/id  -      1      1      0      1      3      -      -
 f$getjpi              -      no     no     yes    no     yes    -      -
 $show after stop/id   -      1      0      0      1      3      -      -
07oct
 $show before stop/id  -      1      -      -      -      2      -      1
 f$getjpi              -      no     -      -      -      yes    yes    yes
 $show after stop/id   -      1      -      -      -      2      -      1
10/13/04 16:00
Executables in URS directory (the latest version of each is INSTALLed):
AUMENU.EXE;2832
AUMENU.EXE;2831
AUMENU.EXE;2830
URMENU.EXE;2572
URMENU.EXE;2571
URMENU.EXE;2570
ERNIE
$ SHOW DEV/FILES/SYSTEM URS
PID=00000000
AUMENU;2818
AUMENU;2823
AUMENU
The Answer is :
As you have discovered, delete-pending global sections are not actually
deleted until all current accessors close the section. So, the only way
to ensure the section is deleted is to close the application.
Since this is clearly (very!) volatile code, consider adding a mechanism
to the application to inform active processes that the section has been
updated. Perhaps a regular timer AST that samples a system logical
name? Set the logical name to the version number of the current
image. If a particular instance sees a change, issue a message to the
user to exit, or just force an exit. Set the timer interval long
enough so it isn't a performance drain, but short enough so you get
your resource back in a reasonable time (say, 30 mins?). Better
would be a lock-based communications mechanism, indicating the
application should be restarted.
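
As a rough illustration of the logical-name idea, here is a DCL sketch of the
publishing step, plus an operator-side check for processes still running an
older version. The timer AST itself would live in the application code and is
not shown. The logical name AUMENU_CURRENT_VERSION is hypothetical. The sketch
assumes that F$GETJPI's IMAGNAME item returns the full image specification
including the version, as your F$GETJPI listings suggest. DEFINE/SYSTEM and
F$PID are both node-local, so this would need to be run on BERT and on ERNIE.

```dcl
$! After a successful INSTALL REPLACE, publish the new version number.
$! AUMENU_CURRENT_VERSION is a hypothetical logical name (needs SYSNAM).
$ DEFINE/SYSTEM AUMENU_CURRENT_VERSION "2812"
$!
$! List local processes still running an older AUMENU (needs WORLD).
$ SET NOON                       ! a process may exit while we scan
$ current = F$TRNLNM("AUMENU_CURRENT_VERSION","LNM$SYSTEM")
$ context = ""
$NEXT:
$ pid = F$PID(context)
$ IF pid .EQS. "" THEN GOTO DONE
$ image = ""
$ image = F$GETJPI(pid,"IMAGNAME")
$ IF F$PARSE(image,,,"NAME","SYNTAX_ONLY") .NES. "AUMENU" THEN GOTO NEXT
$ version = F$PARSE(image,,,"VERSION","SYNTAX_ONLY") - ";"
$ IF version .NES. current THEN -
     WRITE SYS$OUTPUT pid, " still running AUMENU;", version
$ GOTO NEXT
$DONE:
$ EXIT
```

The processes this reports are the ones holding the delete-pending sections
open; they are the ones to notify or restart.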
GBLSECTIONS and GBLPAGES are both relatively "cheap", so given your
usage pattern, it shouldn't be too much of a problem to increase both
parameters to take into account your worst case consumption. Adding
(say) an extra 100 GBLSECTIONS is not unreasonable. It's difficult to
suggest a value for GBLPAGES without knowing the sizes of the images,
but consider that each increase of 262144 GBLPAGES costs only 1 page of
physical memory. As long as the resources are eventually returned, all
you need to do is allocate sufficient resource to make sure you never
hit the limit. Consider it as a cost of your usage requirements, and
of your current relatively high frequency of software updates.
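
To see how close ERNIE is running to the limit before an INSTALL REPLACE, a
check along these lines may help. It assumes the F$GETSYI items shown are
available on your OpenVMS version. The MODPARAMS.DAT values in the comments
are placeholders taken from the figures above, not a recommendation for your
images.

```dcl
$! Current headroom on this node
$ WRITE SYS$OUTPUT "Free GBLPAGES:    ", F$GETSYI("FREE_GBLPAGES")
$ WRITE SYS$OUTPUT "Contig GBLPAGES:  ", F$GETSYI("CONTIG_GBLPAGES")
$ WRITE SYS$OUTPUT "Free GBLSECTIONS: ", F$GETSYI("FREE_GBLSECTS")
$!
$! Illustrative additions for SYS$SYSTEM:MODPARAMS.DAT on ERNIE:
$!   ADD_GBLSECTIONS = 100
$!   ADD_GBLPAGES    = 262144    ! placeholder; size this from your images
$! Then let AUTOGEN apply them; a reboot is typically needed afterwards:
$!   @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK
```

INSTALL LIST/GLOBAL/SUMMARY gives a similar view of global section and global
page usage from within INSTALL.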
Do also consider what is being updated. If it is data stored in a COMMON,
for instance, then approaches based on mapping and remapping global
sections, and on shared data, may apply. (For related information on
COMMONs, please see topic (2486).) Such an approach avoids the image
restart: the application can be updated to map the data as required, or
could be modified to update the data directly, avoiding the restart, the
remapping and the reinstallation entirely.
Topic (6781) is specific to specification of the version number on
an installed image, and that discussion has no particular relationship
to this delete-pending environment.