The Question is:
I have created an indexed file as follows:
- variable length, maximum 100 bytes
- single primary key (no duplicates), a 19-byte string,
starting at beginning of the record
- the nature of primary key is a string of timestamp in format
- initially the file is empty (as to be created daily in
future production environment during day start processing)
I have used EDIT/FDL and then created and tested the file using
SYS$PUT in a toy program. To eliminate cumulative effects, an
empty file is created from scratch using the FDL file for each
The following setup was prepared for this test:
- the entire OpenVMS is solely used by one single user for
- pre-allocation is used to ensure the file is large enough
during file creation
The performance result is shown below.
- 200 records are written, 1 second is used, average 200 msg per sec
- 400 records are written, 2 seconds is used, average 200 msg per sec
- 800 records are written, 5 seconds is used, average 160 msg per sec
- 1600 records are written, 9 seconds is used, average 177 msg per sec
- 3200 records are written, 20 seconds is used, average 160 msg per sec
- 6400 records are written, 42 seconds is used, average 152 msg per sec
- 12800 records are written, 81 seconds is used, average 158 msg per sec
- 25600 records are written, 166 seconds is used, average 154 msg per sec
In real life, data will be received from socket at (max) 200 messages
per second. For the duration of 4 trading hours, there are 2.88 million
messages to be written. I have to ensure that SYS$PUT sustains a
constant rate of at least 200 records per second.
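
The arithmetic behind this requirement can be checked with a short
sketch. It uses only the figures quoted above; the names are
illustrative, and because elapsed times were measured in whole
seconds, the rates for the small runs are coarse.

```python
# (records written, elapsed seconds) pairs from the test runs above.
RUNS = [(200, 1), (400, 2), (800, 5), (1600, 9),
        (3200, 20), (6400, 42), (12800, 81), (25600, 166)]

TARGET_RATE = 200          # required messages per second
TRADING_SECONDS = 4 * 3600  # four trading hours


def rate(records, seconds):
    """Average records per second for one run."""
    return records / seconds


# Total messages that must be written during the trading day.
total_messages = TARGET_RATE * TRADING_SECONDS

# Runs whose average rate falls below the required rate.
shortfalls = [(n, rate(n, s)) for n, s in RUNS if rate(n, s) < TARGET_RATE]

if __name__ == "__main__":
    print("messages per day:", total_messages)
    for n, r in shortfalls:
        print(f"{n:6d} records: {r:6.1f} per second")
```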
I understand that the ascending pattern of the primary keys
causes RMS to rebuild the index and split buckets frequently
(please correct me if I use the terminology incorrectly).
How can I avoid the degrading performance of SYS$PUT?
The following changes were attempted, but no improvement was seen:
- changing data_key_compression, data_record_compression, key_compression
from YES to NO (actually I do not know whether it should be YES or NO,
- changing SYS$OPEN to reduce from SHRUPD|SHRDEL|SHRPUT|SHRGET to only
SHRGET (since it is expected another program will read records by
- enlarging bucket size (from 3 to 10, then to 30, finally to max 63)
What else can I consider? Process quotas? Deferred write?
Enlarging the index bucket size (how is this done)?
Many thanks for your time and assistance
The Answer is :
Please contact HP consulting services, as this certainly appears
to be a non-trivial application environment. This RMS performance
discussion is well beyond the assistance that can be reasonably
offered here in Ask The Wizard, as well.
A text-based time value is certainly a reasonable key, and it
will compress to about eight characters. That said, the
OpenVMS Wizard would more likely use a quadword time value as
Beware the time change for daylight saving time -- the keys
can and often should be in UTC or similar, and thus the TDF
(Timezone Differential Factor) information is often needed.
Also consider using UTC-format time itself and the associated
system service and RTL routines, particularly if you do not
wish to run the system time in UTC.
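
As a rough illustration of a binary time key, the sketch below
converts a text timestamp to a 64-bit count of 100 ns units since
the OpenVMS base date of 17-NOV-1858. The text layout in KEY_FORMAT
is a hypothetical 19-character format (the actual layout is not
given here), and the text is assumed to already be in UTC. It is
written in Python only to show the arithmetic; it does not call any
OpenVMS system service.

```python
import struct
from datetime import datetime, timezone

# OpenVMS base date, 17-NOV-1858 00:00:00, expressed in UTC.
VMS_BASE = datetime(1858, 11, 17, tzinfo=timezone.utc)

# Hypothetical 19-character layout; an assumption for this sketch.
KEY_FORMAT = "%Y-%m-%d %H:%M:%S"


def text_key_to_quadword(text):
    """Return the number of 100 ns units since the OpenVMS base date.

    The text is assumed to be UTC, so no TDF adjustment is applied.
    """
    stamp = datetime.strptime(text, KEY_FORMAT).replace(tzinfo=timezone.utc)
    delta = stamp - VMS_BASE
    seconds = delta.days * 86400 + delta.seconds
    return seconds * 10_000_000 + delta.microseconds * 10


def pack_quadword(ticks):
    """Pack the value as an 8-byte little-endian unsigned integer."""
    return struct.pack("<Q", ticks)
```

A one-second resolution key like this only stays unique if at most
one record arrives per second, so a real key would need a finer
time value or an additional sequence field.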
When using EDIT/FDL, you must input the final size and not the
initial size of the file.
You will want to set the file allocation and file extension
sizes appropriately for file activity. Often an extension
size of 500 blocks is reasonable, though you will want to
investigate pre-sizing the file as appropriate.
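
A first estimate of the final size can be sketched as follows. It
assumes every record is the full 100 bytes and ignores bucket,
record, and index overhead unless an overhead percentage is
supplied, so treat the result as a lower bound; the function names
are illustrative.

```python
BLOCK_BYTES = 512  # size of an OpenVMS disk block


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def blocks_needed(records, record_bytes, overhead_percent=0):
    """Blocks required to hold the data, padded by overhead_percent.

    overhead_percent is a guess supplied by the caller; it is not
    derived from the RMS file structure.
    """
    data_bytes = records * record_bytes * (100 + overhead_percent)
    return ceil_div(data_bytes, 100 * BLOCK_BYTES)


def extends_needed(total_blocks, extension_blocks):
    """Number of file extensions if the file grows from empty."""
    return ceil_div(total_blocks, extension_blocks)


if __name__ == "__main__":
    blocks = blocks_needed(2_880_000, 100)
    print("data blocks:", blocks)
    print("extends at 500 blocks each:", extends_needed(blocks, 500))
```

With 2.88 million records of 100 bytes this gives 562,500 blocks,
which would take 1,125 extensions of 500 blocks if the file were
not pre-sized.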
Be sure to select a non-default number of buffers. The number
of buffers is based on the index depth, and the index depth
on the prototype file will not be realistic. You will likely
see a value around 4, but you probably want to use 10 to 20.
See RAB$B_MBF or SET RMS_DEFAULT/INDEXED/BUFFER_COUNT.
When measuring your performance during your testing, make sure
to insert the keys in ascending order. You should see a second
index level at 1000 and a third at 100,000 records with small
You can use the rms_tools Freeware spreadsheet as a tool to
predict the index level. See:
You will also want to consider what other caches are active,
as a controller or block cache may not be caching appropriate
data. (It is possible that you are exceeding the cache, and
causing blocks to be flushed.)
The OpenVMS Wizard will assume few or no read I/O operations.
You will want to consider the write I/O activity, using tools
such as SET FILE/STATISTICS and MONITOR RMS. Also see the
ANALYZE/SYSTEM command SHOW PROCESS/RMS=FSB, or use the
RMS_STATS tool from the Freeware area mentioned earlier.
You will want to consider whether you can coalesce multiple record
operations into one I/O; you will want to weigh the risk of
data loss during a failure against the cost of the I/O.
Larger I/Os tend to prefer larger bucket sizes, while smaller
and more frequent I/Os tend to prefer smaller buckets.
If you can coalesce records, enable deferred writes and
flush every 100 ms or so (assuming 10 I/Os per second) or
flush based on the number of records stacked. Or let RMS
manage the buffers, writing them when they overflow.
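
The flush policy described above (flush after a time limit or
after a number of stacked records, whichever comes first) can be
sketched as follows. This is a generic illustration, not RMS code:
write_batch is a hypothetical callable standing in for whatever
performs the actual flush, and the limits shown are placeholders.

```python
import time


class BatchedWriter:
    """Stack records and flush by count or by age of the oldest one."""

    def __init__(self, write_batch, max_records=20, max_delay=0.1,
                 clock=time.monotonic):
        self._write_batch = write_batch  # hypothetical flush routine
        self._max_records = max_records
        self._max_delay = max_delay      # seconds, e.g. 0.1 for 100 ms
        self._clock = clock
        self._pending = []
        self._oldest = None

    def put(self, record):
        """Stack one record, flushing if a limit has been reached."""
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(record)
        self._flush_if_due()

    def poll(self):
        """Call periodically so a quiet stream still gets flushed."""
        self._flush_if_due()

    def flush(self):
        """Hand all stacked records to write_batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._write_batch(batch)

    def _flush_if_due(self):
        if not self._pending:
            return
        too_many = len(self._pending) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_delay
        if too_many or too_old:
            self.flush()
```

Records still stacked when a failure occurs are lost, which is the
risk to weigh against the I/O savings.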
If you cannot group records, and thus every $PUT must become
a write I/O operation, you will want to select smaller
buckets, as RMS always writes entire buckets. (Too small,
however, and the number of index writes will increase.)
Typical (small) bucket sizes should fit between six and
roughly thirty records, and the index buckets should be
sized for thirty to a thousand data buckets per index bucket.
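
To see how these ratios affect index depth, a simplified model can
be sketched as below. It assumes every data bucket holds the same
number of records and every index bucket points to the same number
of lower-level buckets; it is not the calculation used by the
rms_tools spreadsheet, and the function name is illustrative.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def index_levels(records, records_per_bucket, fanout):
    """Estimate the number of index levels above the data level.

    records_per_bucket: records held by one data bucket.
    fanout: lower-level buckets addressed by one index bucket.
    """
    data_buckets = ceil_div(max(records, 1), records_per_bucket)
    buckets = ceil_div(data_buckets, fanout)  # level 1 index buckets
    levels = 1
    while buckets > 1:
        # Add a level until a single root bucket covers everything.
        buckets = ceil_div(buckets, fanout)
        levels += 1
    return levels


if __name__ == "__main__":
    for per_bucket, fanout in [(6, 30), (30, 30), (30, 1000)]:
        depth = index_levels(2_880_000, per_bucket, fanout)
        print(per_bucket, fanout, depth)
```

Under this model, 2.88 million records need four index levels with
six records per bucket and a fan-out of thirty, but only two with
thirty records per bucket and a fan-out of a thousand.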
As for the bucket splits, if your data is input in ascending
order as you indicate, there will be no bucket splits. The
records are written to the end, and the indices will extend.
As for your performance questions in general, you will want
to measure and explain all RMS activity. Use the available
tools. Understand all of the components of the I/O path,
including the block and controller caches, the interconnect
speeds, and the spiral transfer rates. For instance, run
the application and determine what RMS has buffered after
10, then 1,000, then 100,000 records. Is this what you
expected to be buffered? (Use ANALYZE/SYSTEM with the
SHOW PROCESS/RMS=(RAB,BOBSUM) command.)
Key compression is normally enabled, though index compression
is normally disabled as this provides for binary searches.
Any file sharing will enable full locking.
Also consider a hardware upgrade, as faster processors and
particularly Fibre Channel I/O can dramatically improve file
system throughput. (Tuning is an on-going and expensive
process, as well.)
As for how to enlarge index buckets, create a file with multiple
AREAS using EDIT/FDL or other tools, and create an area with a
larger bucket size for the index.