HP OpenVMS Systems Documentation
Guide to OpenVMS File Applications
10.3 Optimizing and Redesigning File Characteristics
To maintain your files properly, you must occasionally tune them. Tuning involves adjusting and readjusting the characteristics of the file, generally to make the file run faster or more efficiently, and then reorganizing the file to reflect those changes.
There are two ways to tune files. First, you can redesign your FDL file to change file characteristics or parameters, either interactively with the Edit/FDL utility (the preferred method) or with a text editor. You can then create a new data file from the redesigned FDL file.
Second, you can optimize your data file by using ANALYZE/RMS_FILE with the /FDL qualifier. Rather than redesigning your FDL file, this method produces an FDL file that contains statistics about the file's use, which you can then apply to tune your existing data file.
Figure 10-8 shows how to use the RMS utilities to perform the tuning cycle.
Figure 10-8 RMS Tuning Cycle
There are many ways to redesign an FDL file. If you want to make small changes, you can use the ADD, DELETE, and MODIFY commands at the main menu (main editor) level.
However, if you want to make substantial changes to an FDL file, you should invoke the Touch-up script. Because sequential and relative files are simple in design, the Touch-up script works only with FDL files that describe indexed files. To redesign sequential and relative files, you can use the commands listed above (ADD, DELETE, and MODIFY), or you can go through the design phase again, using the scripts for those organizations.
To completely redesign an existing FDL file that describes an indexed sequential file, use the following command syntax:
To improve the performance of a data file, use a 3-step procedure that includes analysis, FDL optimization, and conversion of the file. If used periodically during the life of a data file, this procedure yields a file that performs optimally.
For the analysis, use the ANALYZE/RMS_FILE/FDL command to create an output file (analysis-fdl-file) that reflects the current state of the data file. The command syntax for creating the analysis-fdl-file follows:
The output file analysis-fdl-file contains all of the information and statistics about the data file, including create-time attributes and information that reflects changes made to the structure and contents of the data file over its life.
For FDL optimization, use the Edit/FDL utility to produce an optimized output file (optimized-fdl-file). You can do this by modifying either the original FDL file (original-fdl-file), if available, or the FDL output of the file analysis (analysis-fdl-file).
Modification of an FDL file can be performed either interactively using a terminal dialogue or noninteractively by allowing the Edit/FDL utility to calculate optimal values based on analysis information.
To optimize the file interactively using an OPTIMIZE script, use a command with the following format:
Conversion is the process of applying the optimized FDL file to the original data file. You use the Convert utility to do this using a command with the following syntax:
If your file has been used for some time or if it is extremely volatile, the numerous deletions and insertions of records may have caused the optimal design of the file to deteriorate. For example, numerous extensions will degrade performance by causing window-turn operations. In indexed files, deletions can cause empty but unusable buckets to accumulate.
If additions or insertions to a file cause too many extensions, the file's performance will also deteriorate. To improve performance, you could increase the file's window size, but this uses an expensive system resource and at some point may itself hurt performance. A better method is to make the file contiguous again.
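The cost of fragmentation can be illustrated with a deliberately simplified model. The Python sketch below assumes that a file's window maps a fixed number of extents at a time and that a sequential pass through the file needs one window turn each time it moves past the extents currently mapped. The function name `window_turns` and the default of 7 mapped extents are assumptions made for illustration; they are not values taken from this guide.

```python
def window_turns(extent_count: int, window_size: int = 7) -> int:
    """Estimate window turns for one sequential pass over a file.

    Toy model (an assumption, not the actual file system logic): the
    window maps `window_size` extents at a time, and each turn remaps
    it to the next group of extents.
    """
    if extent_count < 1 or window_size < 1:
        raise ValueError("extent_count and window_size must be at least 1")
    # Number of extent groups the pass walks through, rounded up.
    groups = (extent_count + window_size - 1) // window_size
    # The first group is already mapped when the file is opened.
    return groups - 1


if __name__ == "__main__":
    for extents in (1, 7, 8, 50):
        print(extents, "extents ->", window_turns(extents), "window turns")
```

In this model a contiguous file (one extent) never needs a window turn, whatever the window size. Enlarging the window to cover every extent of a fragmented file also brings the count to zero, but at the cost of the system resource mentioned above.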
This section presents techniques for cleaning up your files. These
techniques include using the Copy utility, the Convert utility, and the
Convert/Reclaim utility.
To use the COPY command with the /CONTIGUOUS qualifier, use the following command syntax:
If you do not want to rename the file, use the same name for input-filespec and output-filespec.
By default, if the input file is contiguous, COPY likewise tries to create a contiguous output file. By using the /CONTIGUOUS qualifier, you ensure that the output file is copied to consecutive physical disk blocks.
The /CONTIGUOUS qualifier can only be used when you copy disk files; it
does not apply to tape files. For more information, see the COPY
command in the OpenVMS DCL Dictionary.
To use the Convert utility to make a file contiguous, use the following command syntax:
If you do not want to rename the file, use the same name for
input-filespec and output-filespec.
If you delete a number of records from a Prolog 3 indexed file, it is possible that you deleted all of the data entries in a particular bucket. RMS generally cannot use such empty buckets to write new records.
With Prolog 3 indexed files, you can reclaim such buckets by using the Convert/Reclaim utility. This utility allows you to reclaim the buckets without incurring the overhead of reorganizing the file with CONVERT.
As the data buckets are reclaimed, the pointers to them in the index buckets are deleted. If as a result any of the index buckets become empty, they too are reclaimed.
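The reclamation order described above can be sketched with a toy model in Python. It assumes a single index level in which each index bucket holds a list of pointers to data buckets; the dictionaries, the bucket names, and the `reclaim` function are hypothetical and do not reflect the on-disk Prolog 3 structures or the way the Convert/Reclaim utility is implemented.

```python
def reclaim(index_buckets: dict, data_buckets: dict) -> list:
    """Reclaim empty data buckets, then index buckets left without pointers.

    index_buckets maps an index bucket name to a list of data bucket names.
    data_buckets maps a data bucket name to its list of records.
    Both dictionaries are modified in place; the names of the reclaimed
    buckets are returned in the order they were reclaimed.
    """
    reclaimed = []

    # Step 1: reclaim every data bucket that holds no records and
    # delete the pointers to it from the index buckets.
    empty_data = [name for name, records in data_buckets.items() if not records]
    for name in empty_data:
        del data_buckets[name]
        reclaimed.append(name)
        for pointers in index_buckets.values():
            if name in pointers:
                pointers.remove(name)

    # Step 2: reclaim any index bucket that no longer points to anything.
    empty_index = [name for name, pointers in index_buckets.items() if not pointers]
    for name in empty_index:
        del index_buckets[name]
        reclaimed.append(name)

    return reclaimed


if __name__ == "__main__":
    index = {"I1": ["D1", "D2"], "I2": ["D3"]}
    data = {"D1": ["record"], "D2": [], "D3": []}
    print(reclaim(index, data))
    print(index, data)
```

In the sample data, index bucket I2 is reclaimed only because its sole data bucket was empty; I1 survives because it still points to D1.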
To use CONVERT/RECLAIM, use the following command syntax, in which filespec specifies a Prolog 3 indexed file:
Please note that the file cannot be open for shared access at the time
that you give the CONVERT/RECLAIM command.
Using the Convert utility is the easiest way to reorganize a file. In addition, CONVERT cleans up split buckets in indexed files. Also, because the file is completely reorganized, buckets in which all the records were deleted will disappear. (Note that this is not the same as bucket reclamation. With CONVERT, the file becomes a new file and records receive new RFAs.)
To use the Convert utility to reorganize a file, use the following command syntax:
If you do not want to rename the file, use the same name for
input-filespec and output-filespec.
Another part of maintaining files is making sure that you protect the data in them. You should keep duplicates of your files in another place in case something happens to the originals. In other words, you need to back up your files. Then, if something does happen to your original data, you can restore the duplicate files.
The Backup utility (BACKUP) allows you to create backup copies of files and directories, and to restore them as well. These backup copies are called save sets, and they can reside on either disk or magnetic tape. Save sets are also written in BACKUP format; only BACKUP can interpret the data.
Unlike the DCL command COPY, which makes new copies of files (updating the revision dates and assigning protection from the defaults that apply), BACKUP makes copies that are identical in all respects to the originals, including dates and protection.
To use the Backup utility to create a save set of your file, use the following command syntax:
You need the /SAVE_SET qualifier only when the save set is written to disk. You can omit the qualifier when the save set is written to magnetic tape.
For more information about BACKUP, see the description of the Backup utility in the OpenVMS System Management Utilities Reference Manual.
If you specify a separate bucket size for the Level 1 index, it should match the bucket size assigned to the rest of the index.
The bucket size is always a multiple of disk cluster size. The
ANALYZE/RMS_FILE primary attribute ANALYSIS_OF_KEY now has a new
secondary attribute called LEVEL1_RECORD_COUNT that represents the
index level immediately above the data. It makes the tuning algorithm
more accurate when duplicate key values are specified.
A.4 Global Buffers
The global buffer count is the number of I/O buffers that two or more
processes can access. The algorithm that sets this count tries to cache,
or "map," the whole Key 0 index (at least up to a point) into memory for
quicker and more efficient access.
A.5 Index Depth
The indexed design routines simulate the loading of data buckets with records based on your data regarding key sizes, key positions, record sizes (mean and maximum), compression values, load method, and fill factors.
When the Edit/FDL utility finds the number of required data buckets, it can determine the actual number of index records in the next level up (each of which points to a data bucket). The process is repeated until all the required index records for a level can fit in one bucket, the root bucket. When a file exceeds 32 levels, the Edit/FDL utility issues an error message.
With a line_plot, the design calculations are performed up to 63 times---once for each legal bucket size. With a surface_plot, each line of the plot is equivalent to a line_plot with a different value for the variable on the Y-axis.
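The level-by-level calculation and the line_plot repetition can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the Edit/FDL algorithm: it ignores bucket overhead, compression, load method, and fill factors, and it treats every record and index record as a fixed size. The names `index_depth` and `line_plot` and their parameters are hypothetical.

```python
BLOCK_BYTES = 512        # bytes in one disk block
MAX_BUCKET_BLOCKS = 63   # one calculation per legal bucket size
MAX_INDEX_LEVELS = 32    # deeper indexes are reported as an error


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def index_depth(data_buckets: int, index_records_per_bucket: int) -> int:
    """Count index levels until one bucket (the root) holds a whole level."""
    if data_buckets < 1 or index_records_per_bucket < 2:
        raise ValueError("need at least 1 data bucket and 2 index records per bucket")
    levels = 0
    buckets_below = data_buckets
    while True:
        levels += 1
        if levels > MAX_INDEX_LEVELS:
            raise ValueError("index would exceed 32 levels")
        # Each bucket in the level below needs one index record in this level.
        buckets_below = ceil_div(buckets_below, index_records_per_bucket)
        if buckets_below == 1:
            return levels


def line_plot(record_count: int, record_bytes: int, index_record_bytes: int) -> dict:
    """Return {bucket size in blocks: index depth} for each usable bucket size."""
    depths = {}
    for blocks in range(1, MAX_BUCKET_BLOCKS + 1):
        bucket_bytes = blocks * BLOCK_BYTES
        records_per_data_bucket = bucket_bytes // record_bytes
        index_records_per_bucket = bucket_bytes // index_record_bytes
        if records_per_data_bucket < 1 or index_records_per_bucket < 2:
            continue  # bucket too small for this record or index record size
        data_buckets = ceil_div(record_count, records_per_data_bucket)
        depths[blocks] = index_depth(data_buckets, index_records_per_bucket)
    return depths


if __name__ == "__main__":
    plot = line_plot(record_count=100_000, record_bytes=200, index_record_bytes=20)
    for blocks in (1, 8, 63):
        print(f"bucket size {blocks:2d} blocks -> index depth {plot[blocks]}")
```

A surface_plot would repeat `line_plot` once for each value of a second variable, such as the record size, producing one line per value.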
This glossary defines terms used in this manual.
accessor: A process that accesses a file or a record
stream that accesses a record.
alternate key: An optional key within the data records
in an indexed file; used by RMS to build an alternate index. See also
key (indexed file) and primary key.
area: An RMS-maintained region of an indexed file. It
allows you to specify placement or specific bucket sizes, or both, for
particular portions of a file. An area consists of any number of
buckets, and there may be from 1 to 255 areas in a file.
asynchronous record operation: An operation in which
your program may possibly regain control before the completion of a
record retrieval or storage request. Completion ASTs and the Wait
service are the mechanisms provided by RMS for programs to synchronize
with asynchronous record operations. See also synchronous record
operation.
bits per inch: The recording density of a magnetic
tape. Indicates how many characters can fit on one inch of the
recording surface. See also density.
block: The smallest number of consecutive bytes that
RMS transfers during read and write operations. A block is 512 8-bit
bytes on a Files--11 On-Disk Structure disk; on magnetic tape, a block
may be anywhere from 8 to 8192 bytes.
block I/O: The set of RMS procedures that allows you
direct access to the blocks of a file regardless of file organization.
block spanning: In a sequential file, the option for
records to cross block boundaries.
bootstrap block: A block in the index file of a system
disk. Can contain a program that loads the operating system into memory.
bucket: A storage structure, consisting of 1 to 63
blocks, used for building and processing relative and indexed files. A
bucket contains one or more records or record cells. Buckets are the
units of contiguous transfer between RMS buffers and the disk.
bucket split: The result of inserting records into a
full bucket. To minimize bucket splits, RMS attempts to keep half of
the records in the original bucket and transfer the remaining records
to a newly created bucket.
buffer: A memory area used to temporarily store data.
Buffers are generally categorized as being either user buffers or I/O
buffers.
cluster: The basic unit of space allocation on a
Files--11 On-Disk Structure volume. Consists of one or more contiguous
blocks, with the number being specified when the volume is initialized.
contiguous area: A group of physically adjacent blocks.
count field: A 2-byte field prefixed to a
variable-length record that specifies the number of data bytes in the
record. This field may be formatted in either LSB or MSB format.
cylinder: The tracks at the same radius on all
recording surfaces of a disk.
density: The number of bits per inch (bpi) of magnetic
tape. Typical values are 800 bpi and 1600 bpi. See also bits per
inch.
directory: A file used to locate files on a volume. A
directory file contains a list of files and their unique internal
identifications.
directory tree: The subdirectories created beneath a
directory and the subdirectories within the subdirectories (and so
on).
disk: See volume (disk).
extent: One or more adjacent clusters allocated to a
file or to a portion of a file.
FDL: See File Definition Language.
file: An organized collection of related items
(records) maintained in an accessible storage area, such as disk or
tape.
File Definition Language: A special-purpose language
used to write file creation and run-time specifications for data files.
These specifications are written in text files called FDL files; they
are then used by the RMS utilities and library routines to create the
actual data files.
file header: A block in the index file describing a
file on a Files--11 On-Disk Structure disk, including the location of
the file's extents. There is at least one file header for every file on
the disk.
file organization: The physical arrangement of data in
the file. You select the specific organization from those offered by
RMS, based on your individual needs for efficient data storage and
retrieval. See also indexed file organization, relative
file organization, and sequential file organization.
Files--11 On-Disk Structure: The standard physical
disk structure used by RMS.
fixed-length control field: A fixed-size area,
prefixed to a VFC record, containing additional information that can be
processed separately and that may have no direct relationship to the
other contents of the record. For example, the fixed-length control
field might contain line sequence numbers for use in editing operations.
fixed-length record format: Property of a file in
which all records are the same length. This format provides simplicity
in determining the exact location of a record in the file and
eliminates the need to prefix a record size field to each record.
global buffer: A buffer that many processes share.
home block: A block in the index file, normally next
to the bootstrap block, that identifies the volume as a Files--11
On-Disk Structure volume and provides specific information about the
volume, such as volume label and protection.
index: The structure that allows retrieval of records
in an indexed file by key value. See also key (indexed file).
index file: A file on each Files--11 On-Disk Structure
volume that provides the means for identification and initial access to
the volume. Contains the access information for all files (including
itself) on the volume: bootstrap block, home block, file headers.
indexed file organization: A file organization that
allows random retrieval of records by key value and sequential
retrieval of records in sorted order by key value. See also key
(indexed file).
interrecord gap (IRG): An interval of blank space
between data records on the recording surface of a magnetic tape. The
IRG enables the tape unit to decelerate, stop if necessary, and
accelerate between record operations.
I/O buffer: A buffer used for performing input/output
operations.
IRG: See interrecord gap.
key (indexed file): A character string, a packed
decimal number, a 2- or 4-byte unsigned binary number, or a 2- or
4-byte signed integer within each data record in an indexed file. You
define the length and location within the records; RMS uses the key to
build an index. See also primary key, alternate key,
and random access by key value.
key (relative file): The relative record number of
each data record cell in a data file; RMS uses the relative record
numbers to identify and access data records in a relative file in
random access mode. See also relative record number.
local buffer: A buffer that is dedicated to one
process.
locate mode: Technique used for a record input
operation in which the data records are not copied from the I/O buffer,
but a pointer is returned to the record in the I/O buffer. See also
move mode.
move mode: Technique used for a record transfer in
which the data records are copied between the I/O buffer and your
program buffer for calculations or operations on the record. See also
locate mode.
multiblock: An I/O unit that includes up to 127
blocks. Use is restricted to sequential files.
multiple-extent file: A disk file having two or more
extents.
native mode: The processor's primary execution mode in
which the programmed instructions are interpreted as byte-aligned,
variable-length instructions that operate on the following data types:
byte, word, longword, and quadword integers; floating and double
floating; character strings; packed decimals; and variable-length bit
fields. The other instruction execution mode is compatibility mode.
OpenVMS RMS: See RMS (Record Management
Services).
primary key: The mandatory key within the data records
of an indexed file; used to determine the placement of records within
the file and to build the primary index. See also key (indexed
file) and alternate key.
random access by key (indexed file): Retrieval of a
data record in an indexed file by either a primary or alternate key
within the data record. See also key (indexed file).
random access by key (relative file): Retrieval of a
data record in a relative file by the relative record number of the
record. See also key (relative file).
random access by record file address (RFA): Retrieval
of a record by the record's unique address, which RMS returns to you.
This record access mode is the only means of randomly accessing a
sequential file containing variable-length records.
random access by relative record number: Retrieval of
a record by its relative record number. For relative files and
sequential files (on disk devices) that contain fixed-length records,
random access by relative record number is synonymous with random
access by key. See also random access by key (relative file)
and relative record number.