OpenVMS Alpha Guide to Upgrading Privileged-Code Applications
This approach is easily generalized to more buffers and IRPEs. The only thing omitted from this example is the code that allocates and links together the IRPEs. The following example shows the associated error callback routine in its entirety; it can handle an arbitrary number of IRPEs.
2.2.2 Impact of MMG_STD$IOLOCK, MMG_STD$UNLOCK Changes
The interface changes to the MMG_STD$IOLOCK and MMG_STD$UNLOCK routines are described in Appendix B. The general approach to these changes is to use the corresponding replacement routines and the new DIOBM data structure described below.
OpenVMS device drivers that perform data transfers using direct I/O functions do so by locking the buffer into memory while still in process context, that is, in a driver FDT routine. The PTE address of the first page that maps the buffer is obtained and the byte offset within the page to the start of the buffer is computed. These values are saved in the IRP (irp$l_svapte and irp$l_boff). The rest of the driver then uses the values in the irp$l_svapte and irp$l_boff cells and the byte count in irp$l_bcnt to perform the transfer. Eventually, when the transfer has completed and the request returns to process context for I/O postprocessing, the buffer is unlocked using the irp$l_svapte value and not the original process buffer address.
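The following fragment is a minimal model of the buffer description that an FDT routine records in the IRP. The structure, field names, and the 8 KB page size used here are illustrative assumptions, not the OpenVMS definitions.

    /* Simplified model of the direct I/O buffer description saved in the IRP.
     * All names, types, and the 8 KB page size are illustrative only.
     */
    #include <stdint.h>

    #define PAGE_SIZE 8192u

    typedef struct {
        void     *irp_svapte;   /* models irp$l_svapte: PTE address for first buffer page */
        uint32_t  irp_boff;     /* models irp$l_boff: byte offset within the first page   */
        uint32_t  irp_bcnt;     /* models irp$l_bcnt: byte count of the transfer          */
    } model_irp;

    /* Record the locked buffer in the model IRP; first_pte is the address of
     * the PTE that maps the first page of the buffer.
     */
    static void model_save_buffer(model_irp *irp, const void *buf,
                                  uint32_t len, void *first_pte)
    {
        irp->irp_boff   = (uint32_t)((uintptr_t)buf & (PAGE_SIZE - 1));
        irp->irp_bcnt   = len;
        irp->irp_svapte = first_pte;
    }

    /* Number of pages (and therefore PTEs) that map the buffer. */
    static uint32_t model_pte_count(const model_irp *irp)
    {
        return (irp->irp_boff + irp->irp_bcnt + PAGE_SIZE - 1) / PAGE_SIZE;
    }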
To support 64-bit buffer addresses on a direct I/O function, you need only ensure that the buffer address is handled properly within the FDT routine.
Almost all device drivers that perform data transfers via a direct I/O function use OpenVMS-supplied FDT support routines to lock the buffer into memory. Because these routines obtain the buffer address either indirectly from the IRP or directly from a parameter that is passed by value, the interfaces for these routines can easily be enhanced to support 64-bit wide addresses.
However, various OpenVMS Alpha memory management infrastructure changes made to support 64-bit addressing have a potentially major impact on the use of the 32-bit irp$l_svapte cell by device drivers prior to OpenVMS Alpha Version 7.0. In general, there are two problems: the process PTEs that map the buffer now reside in page table space and have 64-bit virtual addresses that cannot be expressed in the 32-bit irp$l_svapte cell, and, because the process page tables no longer reside in the balance slot, those PTEs cannot be accessed directly from the context of another process.
In most cases, both of these PTE access problems are solved by copying the PTEs that map the buffer into nonpaged pool and setting irp$l_svapte to point to the copies. This copy is done immediately after the buffer has been successfully locked. A copy of the PTE values is acceptable because device drivers only read the PTE values and are not allowed to modify them. These PTE copies are held in a new nonpaged pool data structure, the Direct I/O Buffer Map (DIOBM) structure. A standard DIOBM structure (also known as a fixed-size primary DIOBM) contains enough room for a vector of 9 (DIOBM$K_PTECNT_FIX) PTE values. This is sufficient for a buffer size up to 64 KB on a system with 8 KB pages. It is expected that most I/O requests are handled by this mechanism and that the overhead to copy a small number of PTEs is acceptable, especially given that these PTEs have been recently accessed to lock the pages.
The standard IRP contains an embedded fixed-size DIOBM structure. When the PTEs that map a buffer fit into the embedded DIOBM, the irp$l_svapte cell is set to point to the start of the PTE copy vector within the embedded DIOBM structure in that IRP.
If the buffer requires more than 9 PTEs, a separate, variably sized "secondary" DIOBM structure is allocated to hold the PTE copies, and the original, or "primary," DIOBM structure is set to point to it. In this case, the irp$l_svapte cell is set to point into the PTE vector in the secondary DIOBM structure. The secondary DIOBM structure is deallocated during I/O postprocessing when the buffer pages are unlocked. The secondary DIOBM requires only 8 bytes of nonpaged pool for each page in the buffer. Its allocation is not charged against the process BYTLM quota, but it is controlled by the process direct I/O limit (DIOLM). This is the same approach used for the other internal data structures that are required to support the I/O, including the kernel process block, kernel process stack, and the IRP itself.
However, as the size of the buffer increases, the run-time overhead to copy the PTEs into the DIOBM becomes noticeable. At some point it becomes less expensive to create a temporary window in S0/S1 space to the process PTEs that map the buffer. The PTE window method has a fixed cost, but the cost is relatively high because it requires PTE allocation and TB invalidates. For this reason, the PTE window method is not used for moderately sized buffers.
The transition point from the PTE copy method with a secondary DIOBM to the PTE window method is determined by a new system data cell, ioc$gl_diobm_ptecnt_max, which contains the maximum desirable PTE count for a secondary DIOBM. The PTE window method is used if the buffer is mapped by more than ioc$gl_diobm_ptecnt_max PTEs.
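The choice among the embedded DIOBM, a secondary DIOBM, and the PTE window can be summarized by the following sketch. The enumeration and routine are illustrative; DIOBM_PTECNT_FIX models DIOBM$K_PTECNT_FIX, and diobm_ptecnt_max models the ioc$gl_diobm_ptecnt_max data cell.

    /* Illustrative sketch (not OpenVMS source) of selecting the PTE handling
     * method from the number of PTEs that map the buffer.
     */
    #include <stdint.h>

    #define DIOBM_PTECNT_FIX 9u      /* models DIOBM$K_PTECNT_FIX */

    typedef enum {
        METHOD_EMBEDDED_DIOBM,       /* copy PTEs into the DIOBM embedded in the IRP    */
        METHOD_SECONDARY_DIOBM,      /* copy PTEs into a separate, variably sized DIOBM */
        METHOD_PTE_WINDOW            /* map the process PTEs through an S0/S1 window    */
    } pte_method;

    static pte_method choose_method(uint32_t ptecnt, uint32_t diobm_ptecnt_max)
    {
        if (ptecnt <= DIOBM_PTECNT_FIX)
            return METHOD_EMBEDDED_DIOBM;
        if (ptecnt <= diobm_ptecnt_max)
            return METHOD_SECONDARY_DIOBM;
        /* also used as the fallback if pool for a secondary DIOBM is unavailable */
        return METHOD_PTE_WINDOW;
    }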
When a PTE window is used, irp$l_svapte is set to the S0/S1 virtual address within the allocated PTE window that maps the first PTE of the buffer. This S0/S1 address is computed by taking the S0/S1 address that is mapped by the first PTE allocated for the window and adding the byte-within-page offset of the address, in page table space, of the first PTE that maps the buffer. A PTE window created this way is removed during I/O postprocessing.
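The irp$l_svapte value for the PTE window method can be modeled by the following computation; the names and the 8 KB page size are illustrative.

    /* Sketch of the irp$l_svapte computation when a PTE window is used.
     * window_base is the S0/S1 address mapped by the first window PTE, and
     * first_buf_pte_va is the page-table-space address of the first PTE that
     * maps the buffer.
     */
    #include <stdint.h>

    #define PAGE_SIZE 8192u

    static void *window_svapte(uintptr_t window_base, uintptr_t first_buf_pte_va)
    {
        /* add only the byte-within-page offset of the buffer's first PTE */
        return (void *)(window_base + (first_buf_pte_va & (PAGE_SIZE - 1)));
    }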
The PTE window method is also used if the attempt to allocate the required secondary DIOBM structure fails due to insufficient contiguous nonpaged pool. With an 8 KB page size, the PTE window requires a set of contiguous system page table entries equal to the number of 8 MB regions in the buffer plus 1; for example, a buffer that spans three 8 MB regions requires four SPTEs. Failure to create a PTE window as a result of insufficient SPTEs is unusual. However, in the unlikely event of such a failure, if the process has not disabled resource wait mode, the $QIO request is backed out and the requesting process is put into a resource wait state for nonpaged pool (RSN$_NPDYNMEM). When the process is resumed, the I/O request is retried. If the process has disabled resource wait mode, a failure to allocate the PTE window results in the failure of the I/O request.
When the PTE window method is used, the level-3 process page table pages that contain the PTEs that map the user buffer are locked into memory as well. However, these level-3 page table pages are not locked when the PTEs are copied into nonpaged pool or when the SPT window is used.
The new IOC_STD$FILL_DIOBM routine is used to set irp$l_svapte by one of the three methods described above. The OpenVMS-supplied FDT support routines EXE_STD$MODIFYLOCK, EXE_STD$READLOCK, and EXE_STD$WRITELOCK call the IOC_STD$FILL_DIOBM routine to do this once the buffer has been successfully locked.
Device drivers that call MMG_STD$IOLOCK directly need to examine their use of the returned values and might need to call the corresponding replacement routine described in Appendix B.
Prior to OpenVMS Alpha Version 7.0, the MMG_STD$SVAPTECHK and MMG$SVAPTECHK routines compute a 32-bit svapte for either a process or system space address. As of OpenVMS Alpha Version 7.0, these routines are restricted to an S0/S1 system space address and no longer accept an address in P0/P1 space. The MMG_STD$SVAPTECHK and MMG$SVAPTECHK routines declare a bugcheck for an input address in P0/P1 space. These routines return a 32-bit system virtual address through the SPT window for an input address in S0/S1 space.
The MMG_STD$SVAPTECHK and MMG$SVAPTECHK routines are used by a number of OpenVMS Alpha device drivers and privileged components. In most instances, no source changes are required because the input address is in nonpaged pool.
The 64-bit process-private virtual address of the level 3 PTE that maps a P0/P1 virtual address can be obtained using the new PTE_VA macro. Unfortunately, this macro is not a general solution because it does not address the cross-process PTE access problem. Therefore, the necessary source changes depend on the manner in which the svapte output from MMG_STD$SVAPTECHK is used.
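As a rough illustration of the arithmetic involved, the following fragment models how the level-3 PTE address for a virtual address is located in page table space. The base address, page size, and PTE size used here are assumptions; this is not the definition of the PTE_VA macro.

    /* Model of the arithmetic for locating the level-3 PTE that maps a virtual
     * address: page table space is treated as a linear array of 8-byte PTEs
     * indexed by virtual page number.
     */
    #include <stdint.h>

    #define PAGE_SHIFT 13u     /* assumes 8 KB pages */
    #define PTE_SIZE   8u      /* bytes per PTE      */

    static uint64_t l3pte_va(uint64_t pt_space_base, uint64_t va)
    {
        uint64_t vpn = va >> PAGE_SHIFT;           /* virtual page number        */
        return pt_space_base + vpn * PTE_SIZE;     /* address of the level-3 PTE */
    }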
The INIT_CRAM routine uses the MMG$SVAPTECHK routine in its computation of the physical address of the hardware I/O mailbox structure within a CRAM that is in P0/P1 space. If you need to obtain a physical address, use the new IOC_STD$VA_TO_PA routine.
If you call MMG$SVAPTECHK and IOC$SVAPTE_TO_PA, use the new IOC_STD$VA_TO_PA routine instead.
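A physical address for a valid page is formed from the page frame number in the PTE that maps the page and the byte offset within the page, roughly as sketched below. The names and constants are illustrative; this is not the IOC_STD$VA_TO_PA implementation.

    /* Sketch of forming a physical address from a page frame number (PFN) and
     * the byte offset of the virtual address within its page.  Assumes an
     * 8 KB page size.
     */
    #include <stdint.h>

    #define PAGE_SHIFT 13u
    #define PAGE_SIZE  (1ull << PAGE_SHIFT)

    static uint64_t model_va_to_pa(uint64_t pfn, uint64_t va)
    {
        return (pfn << PAGE_SHIFT) | (va & (PAGE_SIZE - 1));
    }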
The PTE address placed in the target cell by the BUILD_DCB routine must be expressible using 32 bits and must be valid regardless of process context. Fortunately, the caller's address is within the buffer that was locked down earlier in the CONV_TO_DIO routine via a call to EXE_STD$WRITELOCK, and the EXE_STD$WRITELOCK routine derived a value for the irp$l_svapte cell using the DIOBM in the IRP. Therefore, instead of calling the MMG$SVAPTECHK routine, the BUILD_DCB routine has been changed to call the new EXE_STD$SVAPTE_IN_BUF routine, which computes the required value based on the caller's address, the original buffer address, and the address in the irp$l_svapte cell.
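The following fragment sketches the kind of computation such a routine must perform: given an address within the locked buffer, the original buffer address, and the irp$l_svapte value, it locates the corresponding PTE copy. The names, the page size, and the assumption of a contiguous PTE vector are illustrative; this is not the EXE_STD$SVAPTE_IN_BUF implementation.

    /* Sketch: locate the PTE (copy) for the page containing an address within
     * an already locked buffer, given the original buffer address and the
     * irp$l_svapte value for that buffer.
     */
    #include <stdint.h>

    #define PAGE_SHIFT 13u     /* assumes 8 KB pages         */
    #define PTE_SIZE   8u      /* bytes per page table entry */

    static void *svapte_in_buf(void *irp_svapte, uintptr_t buf_start, uintptr_t addr)
    {
        /* number of whole pages between the start of the buffer and addr */
        uintptr_t page_delta = (addr >> PAGE_SHIFT) - (buf_start >> PAGE_SHIFT);
        return (void *)((uintptr_t)irp_svapte + page_delta * PTE_SIZE);
    }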
There are changes to the use of the PFN database entry cells containing the page reference count and back link pointer.
All source code references to the irp$l_ast, irp$l_astprm, and irp$l_iosb cells have been changed. These IRP cells were removed and replaced by new cells.
For more information, see Appendix A.
This section describes OpenVMS Alpha Version 7.0 changes to the memory management subsystem that might affect privileged-code applications.
For complete information about OpenVMS Alpha support for 64-bit addresses, see the OpenVMS Alpha Guide to 64-Bit Addressing and VLM Features.
The process page tables no longer reside in the balance slot. Each process references its own page tables within page table space using 64-bit pointers.
Three macros (located in VMS_MACROS.H in SYS$LIBRARY:SYS$LIB_C.TLB) are available to obtain the address of a PTE in page table space.
Two macros, MAP_PTE and UNMAP_PTE (located in SYS$LIBRARY:LIB.MLB), are available to map and unmap a PTE in another process's page table space.
Note that use of MAP_PTE and UNMAP_PTE requires the caller to hold the
MMG spinlock across the use of these macros and that UNMAP_PTE must be
invoked before another MAP_PTE can be issued.
As of OpenVMS Alpha Version 7.0, the global and process section table indexes, stored primarily in the PHD and PTEs, have different meanings. The section table index is now a positive index into the array of section table entries found in the process section table or in the global section table. The first section table index in both tables is now 1.
To obtain the address of a given section table entry, use the section table index to index the appropriate section table.
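Purely as an illustration of indexing with a positive section table index, the following sketch computes an entry address. The table base, the entry size, and the index-to-offset mapping are hypothetical; consult the OpenVMS definitions for the actual layout.

    /* Hypothetical arithmetic for locating a section table entry by its
     * positive index; SEC_ENTRY_SIZE and the offset computation are assumed,
     * not the OpenVMS definitions.  The first valid index is 1.
     */
    #include <stdint.h>

    #define SEC_ENTRY_SIZE 32u     /* assumed size of one section table entry */

    static void *section_entry(void *table_base, uint32_t index)
    {
        return (void *)((char *)table_base + (uint64_t)index * SEC_ENTRY_SIZE);
    }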
2.3.3 Location of Process and System Working Set Lists
The base address of the working set list can no longer be found within the process PHD or the system PHD. To obtain the address of the process working set list, use the 64-bit data cell CTL$GQ_WSL. To obtain the address of the system working set list, use the 64-bit data cell MMG$GQ_SYSWSL.
Note that pointers to working set list entries must be 64-bit addresses in order to be compatible with future versions of OpenVMS Alpha.
Each working set list entry (WSLE) residing in the process or the
system working set list is now 64 bits in size. Thus, a working set
list index must be interpreted as an index into an array of quadwords
instead of an array of longwords, and working set list entries must be
interpreted as 64 bits in size. Note that the layout of the low bits in
the WSLE is unchanged.
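For example, a working set list entry can be fetched by indexing an array of quadwords, as in the following sketch. The names are illustrative; wsl_base would be obtained from CTL$GQ_WSL or MMG$GQ_SYSWSL.

    /* Sketch of fetching a working set list entry by index now that each WSLE
     * is a quadword.
     */
    #include <stdint.h>

    static uint64_t read_wsle(const uint64_t *wsl_base, uint64_t wslx)
    {
        return wsl_base[wslx];    /* the index selects a quadword, not a longword */
    }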
Due to the support for larger physical memory systems in OpenVMS Alpha Version 7.0, the PFN database has been moved to S2 space, which can only be accessed with 64-bit pointers. Privileged routine interfaces within OpenVMS Alpha that pass PFN database entry addresses by reference have been renamed to force compile-time or link-time errors.
Privileged code that references the PFN database must be inspected and
possibly modified to ensure that 64-bit pointers are used.
The offset PFN$L_REFCNT in the PFN database entry has been replaced with a different-sized offset that is packed together with other fields in the PFN database.
References to the field PFN$L_REFCNT should be modified to use the macros provided in PFN_MACROS.H within SYS$LIBRARY:SYS$LIB_C.TLB.
As of OpenVMS Alpha Version 7.0, the offset PFN$L_PTE in the PFN database entry has been replaced with a new PTE backpointer mechanism. This mechanism can support page table entries that reside in 64-bit virtual address space.
References to the field PFN$L_PTE should be modified to use one of the macros provided in PFN_MACROS.H within SYS$LIBRARY:SYS$LIB_C.TLB.
Note that pointers to PFN database entries must be 64 bits.
Prior to OpenVMS Alpha Version 7.0, the process header contained two internal arrays of information that were used to help manage the balance slot contents (specifically, page table pages) during process swapping. These two arrays, the working set list index (WSLX) array and the backing storage (BAK) array, are no longer required for page table pages.
The swapper process now uses the upper-level page table entries and the
working set list itself to manage the swapping of page table pages. A
smaller version of the BAK array, now used only for backing storage
information for balance slot pages, is located at the end of the fixed
portion of the process header at the offset PHD$Q_BAK_ARRAY.
The format of a free page table entry in S0/S1 space has been changed
to use index values from the base of page table space instead of the
base of S0/S1 space. The free S0/S1 PTE list also uses page table space
index values. The list header has been renamed to LDR$GQ_FREE_S0S1_PT.
In order to support larger global sections and larger numbers of global sections in OpenVMS Alpha Version 7.0, the global page table has been moved to S2 space, which can be accessed only with 64-bit pointers.
Privileged code that references entries within the GPT must be inspected and possibly modified to ensure that 64-bit pointers are used.
The format of the free GPT entry has been changed to use index values from the base of the global page table instead of using the free pool list structure. The free GPT entry format is now similar to the free S0/S1 PTE format.
Note that pointers to GPT entries must be 64 bits.
As of OpenVMS Alpha Version 7.0, each process virtual addressing region is described by a region descriptor entry (RDE). The program region (P0) and control region (P1) have region descriptor entries that contain attributes of the region and describe the current state of the region. The program region RDE is located at offset PHD$Q_P0_RDE within the process's PHD. The control region RDE is located at offset PHD$Q_P1_RDE, also within the process's PHD.
Many internal OpenVMS Alpha memory management routines accept a pointer to the region's RDE associated with the virtual address also passed to the routine.
Two functions (located in MMG_FUNCTIONS.H in SYS$LIBRARY:SYS$LIB_C.TLB) are available to obtain the address of an RDE.
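As a rough model of an RDE lookup for a P0 or P1 address, the following sketch selects between the two RDEs named above. The structure layout, field names, and address boundary are assumptions, not the functions provided in MMG_FUNCTIONS.H.

    /* Illustrative model of selecting the RDE for a P0 or P1 virtual address.
     * The PHD layout, the RDE contents, and the P0/P1 boundary used here are
     * assumptions for the sake of the example.
     */
    #include <stdint.h>

    typedef struct { uint64_t contents[4]; } model_rde;   /* opaque placeholder  */

    typedef struct {
        model_rde p0_rde;                                  /* models PHD$Q_P0_RDE */
        model_rde p1_rde;                                  /* models PHD$Q_P1_RDE */
    } model_phd;

    #define P1_SPACE_BASE 0x40000000ull                    /* assumed P0/P1 boundary */

    static model_rde *lookup_rde(model_phd *phd, uint64_t va)
    {
        return (va < P1_SPACE_BASE) ? &phd->p0_rde : &phd->p1_rde;
    }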
This section describes the OpenVMS Alpha kernel threads features that
might require changes to privileged-code applications.
The CPU$L_CURKTB field in each CPU database contains the address of the kernel thread currently executing on that CPU. If kernel-mode code owns the SCHED spinlock, the current KTB address can be obtained from this field. Before the kernel threads implementation, this field contained the current PCB address.
No changes are necessary to kernel-mode code that locks mutexes. The SCH$LOCK*, SCH$UNLOCK*, and SCH$IOLOCK* routines all determine the correct kernel thread to place into a wait state if the mutex is already owned.