You may be able to improve Tru64 UNIX performance by optimizing your memory resources. Usually, the best way to improve performance is to eliminate or reduce paging and swapping. This can be done by increasing memory resources.
This chapter describes how to perform the following tasks:
Understand how the operating system allocates virtual memory to processes and to file system caches, and how memory is reclaimed (Section 6.1)
Configure swap space for high performance (Section 6.2)
Obtain information about memory usage (Section 6.3)
Provide more memory resources to processes (Section 6.4)
Modify paging and swapping operation (Section 6.5)
Reserve physical memory for shared memory (Section 6.6)
The operating system allocates physical memory in 8-KB units called pages. The virtual memory subsystem tracks and manages all the physical pages in the system and efficiently distributes the pages among three areas:
Static wired memory
Static wired memory is allocated at boot time and is used for operating system data and text and for system tables. It is also used by the metadata buffer cache, which holds recently accessed UNIX File System (UFS) and CD-ROM File System (CDFS) metadata.
You can reduce the amount of static wired memory only by removing subsystems or by decreasing the size of the metadata buffer cache (see Section 6.1.2.1).
Dynamically wired memory
Dynamically wired memory is allocated at boot time and used for dynamically allocated data structures, such as system hash tables. User processes also allocate dynamically wired memory for address space by using virtual memory locking interfaces, including the mlock function. The amount of dynamically wired memory varies according to the demand. The vm subsystem attribute vm_syswiredpercent specifies the maximum amount of memory that a user process can wire (80 percent of physical memory, by default).
Physical memory for processes and data caching
Physical memory that is not wired is referred to as pageable memory. It is used for processes' most-recently accessed anonymous memory (modifiable virtual address space) and file-backed memory (memory that is used for program text or shared libraries). Pageable memory is also used to cache the most-recently accessed UFS file system data for reads and writes and for page faults from mapped file regions, in addition to AdvFS metadata and file data.
The virtual memory subsystem allocates physical pages according to the process and file system demand. Because processes and file systems compete for a limited amount of physical memory, the virtual memory subsystem periodically reclaims the oldest pages by writing their contents to swap space or disk (paging). Under heavy loads, entire processes may be suspended to free large amounts of memory (swapping). You can control virtual memory operation by tuning various vm subsystem attributes, as described in this chapter.
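For example, you can inspect the current vm subsystem attribute settings with the sysconfig command (a minimal sketch; see Section 3.6 for the supported ways to change attribute values):

# /sbin/sysconfig -q vm
# /sbin/sysconfig -q vm vm_syswiredpercent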
You must understand memory operation to determine which tuning guidelines will improve performance for your workload. The following sections describe how the virtual memory subsystem:
Tracks physical pages (Section 6.1.1)
Allocates memory to file system buffer caches (Section 6.1.2)
Allocates memory to processes (Section 6.1.3)
Reclaims pages (Section 6.1.4)
The virtual memory subsystem tracks all the physical memory pages in the system. Page lists are used to identify the location and age of each page. The oldest pages are the first to be reclaimed. At any one time, each physical page can be found on one of the following lists:
Free list -- Pages that are clean and are not being used
Page reclamation begins when the size of the free list decreases to a tunable limit.
Active list -- Pages that are currently being used by processes or the UBC
To determine which active pages should be reclaimed first, the page-stealer daemon identifies the oldest pages on the active list. The oldest pages that are being used by processes are designated inactive pages. The oldest pages that are being used by the UBC are designated UBC LRU (Unified Buffer Cache least-recently used) pages.
Use the vmstat command to determine the number of pages that are on the page lists. Remember that pages on the active list (the act field in the vmstat output) include both inactive and UBC LRU pages.
6.1.2 File System Buffer Cache Memory Allocation
The operating system uses three caches to store file system user data and metadata. If the cached data is later reused, a disk I/O operation is avoided, which improves performance because data can be retrieved from memory faster than from disk.
The following sections describe these file system caches:
Metadata buffer cache (Section 6.1.2.1)
Unified Buffer Cache (Section 6.1.2.2)
AdvFS buffer cache (Section 6.1.2.3)
6.1.2.1 Metadata Buffer Cache Memory Allocation
At boot time, the kernel allocates wired memory for the metadata buffer cache. The cache acts as a layer between the operating system and disk by storing recently accessed UFS and CDFS metadata, which includes file header information, superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries. Performance is improved if the data is later reused and a disk operation is avoided.
The metadata buffer cache uses bcopy routines to move data in and out of memory. Memory in the metadata buffer cache is not subject to page reclamation. The size of the metadata buffer cache is specified by the value of the vfs subsystem attribute bufcache. See Section 6.4.6 and Section 9.4.3.1 for tuning information.
6.1.2.2 Unified Buffer Cache Memory Allocation
The physical memory that is not wired is available to processes and to the Unified Buffer Cache (UBC), which compete for this memory.
The UBC functions as a layer between the operating system and disk by storing recently accessed UFS file system data for reads and writes from conventional file activity and holding page faults from mapped file sections. File system performance is improved if the cached data is later reused and a disk I/O operation is avoided.
In addition, AdvFS wires UBC pages for its metadata and file data, although the AdvFS buffer cache actively manages this data. See Section 6.1.2.3 for information about the AdvFS buffer cache.
Figure 6-1 shows how the virtual memory subsystem allocates physical memory to the UBC and for processes.
Figure 6-1: UBC Memory Allocation
The amount of memory that the UBC can utilize is determined by three vm subsystem attributes:

ubc_minpercent attribute

Specifies the minimum percentage of virtual memory that only the UBC can utilize. The remaining memory is shared with processes. The default is 10 percent.

ubc_maxpercent attribute

Specifies the maximum percentage of virtual memory that the UBC can utilize. The default is 100 percent.

ubc_borrowpercent attribute

Specifies the UBC borrowing threshold. The default is 20 percent.
Between the value of the ubc_borrowpercent attribute and the value of the ubc_maxpercent attribute, the memory that is allocated to the UBC is considered borrowed from processes. When paging begins, these borrowed pages are reclaimed, until the amount of memory allocated to the UBC decreases to the value of the ubc_borrowpercent attribute.
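For example, a quick way to check the current UBC thresholds on a running system is to filter the vm subsystem attribute listing (a minimal sketch):

# /sbin/sysconfig -q vm | grep ubc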
At any one time, the amount of memory allocated to the UBC and to processes depends on file system and process demands. For example, if file system activity is heavy and process demand is low, most of the pages will be allocated to the UBC, as shown in Figure 6-2.
Figure 6-2: Memory Allocation During High File System Activity and No Paging Activity
In contrast, heavy process activity, such as large increases in the working sets for large executables, will cause the virtual memory subsystem to reclaim UBC borrowed pages, down to the value of the ubc_borrowpercent attribute, as shown in Figure 6-3.
Figure 6-3: Memory Allocation During Low File System Activity and High Paging Activity
The UBC uses a hashed list to quickly locate the physical pages that it is holding. A hash table contains file and offset information that is used to speed lookup operations.
The UBC also uses a buffer to facilitate the movement of data between memory and disk. The vm subsystem attribute vm_ubcbuffers specifies the maximum file system device I/O queue depth for writes (that is, the number of UBC I/O requests that can be outstanding).
6.1.2.3 AdvFS Buffer Cache Memory Allocation
The AdvFS buffer cache functions as a layer between the operating system and disk by storing recently accessed file data and metadata. Performance is improved if the cached data is later reused and a disk operation is avoided.
AdvFS wires UBC pages for both file data and metadata, although AdvFS actively manages the cache by using its own management routines and data structures. For example, AdvFS performs its own hashing of cached data, and uses its own cache lookup routines and modified data flushing routines. AdvFS interacts with the UBC by using the UBC to hold the actual data content for metadata and file system user data. When an AdvFS I/O operation occurs, AdvFS searches the AdvFS buffer cache for the data before querying the UBC for the cache page.
You can tune the AdvFS buffer cache by modifying the values of some advfs attributes. Because AdvFS manages its own buffer cache, tuning the UBC will not have a great effect on AdvFS.
At boot time, the kernel determines the amount of physical memory that is available for AdvFS buffer cache headers, and allocates a buffer cache header for each possible page. Buffer headers are maintained in a global array and temporarily assigned a buffer handle that refers to an actual page. The number of AdvFS buffer headers depends on the number of 8-KB pages that can be obtained from the amount of memory specified by the advfs subsystem attribute AdvfsCacheMaxPercent. The default value is 7 percent of physical memory.
The AdvFS buffer cache is organized as a fixed-size hash chain table, which uses a file page offset, fileset handle, and domain handle to calculate the hash key that is used to look up a page. The size of the hash chain table depends on the number of buffer cache headers. However, you can override the AdvFS table size calculation by changing the AdvfsCacheHashSize attribute.
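For example, a sketch of an /etc/sysconfigtab entry that sets the AdvFS cache size explicitly (the value shown is the default and is illustrative only; the attribute takes effect at the next reboot):

advfs: AdvfsCacheMaxPercent=7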
When a page of data is requested, AdvFS searches the hash chain table for a match. If the entry is already in memory, AdvFS returns the buffer handle and a pointer to the page of data to the requester.
If no entry is found, AdvFS obtains a free buffer header and initializes it to represent the requested page. AdvFS performs a read operation to obtain the page from disk and attaches the buffer header to a UBC page. The UBC page is then wired into memory. AdvFS buffer cache pages remain wired until the buffer needs to be recycled, the file is deleted, or the fileset is unmounted.
See Section 6.4.4, Section 9.3.6.1, and Section 9.3.6.2 for information about tuning the AdvFS buffer cache.
6.1.3 Process Memory Allocation
Physical memory that is not wired is available to processes and the UBC, which compete for this memory. The virtual memory subsystem allocates memory resources to processes and to the UBC according to the demand, and reclaims the oldest pages if the demand depletes the number of available free pages.
The following sections describe how the virtual memory subsystem allocates memory to processes.
6.1.3.1 Process Virtual Address Space Allocation
The fork system call creates new processes. When you invoke a process, the fork system call:

Creates a UNIX process body, which includes a set of data structures that the kernel uses to track the process and a set of resource limitations. See fork(2) for more information.
Establishes a contiguous block of virtual address space for the process. Virtual address space is the array of virtual pages that the process can use to map into actual physical memory. Virtual address space is used for anonymous memory (memory that holds data elements and structures that are modified during process execution) and for file-backed memory (memory used for program text or shared libraries).
Because physical memory is limited, a process' entire virtual address space cannot be in physical memory at one time. However, a process can execute when only a portion of its virtual address space (its working set) is mapped to physical memory. Pages of anonymous memory and file-backed memory are paged in only when needed. If the memory demand increases and pages must be reclaimed, the pages of anonymous memory are paged out and their contents moved to swap space, while the pages of file-backed memory are simply released.
Creates one or more threads of execution. The default is one thread for each process. Multiprocessing systems support multiple process threads.
Although the virtual memory subsystem allocates a large amount of virtual address space for each process, it uses only part of this space. Only 4 TB is allocated for user space. User space is generally private and maps to a nonshared physical page. An additional 4 TB of virtual address space is used for kernel space. Kernel space usually maps to shared physical pages. The remaining space is not used for any purpose.
Figure 6-4 shows the use of process virtual address space.
Figure 6-4: Virtual Address Space Usage
6.1.3.2 Virtual Address to Physical Address Translation
When a virtual page is touched (accessed), the virtual memory subsystem must locate the physical page and then translate the virtual address into a physical address. Each process has a page table, which is an array containing an entry for each current virtual-to-physical address translation. Page table entries have a direct relation to virtual pages (that is, virtual address 1 corresponds to page table entry 1) and contain a pointer to the physical page and protection information.
Figure 6-5 shows the translation of a virtual address into a physical address.
Figure 6-5: Virtual-to-Physical Address Translation
A process resident set is the complete set of all the virtual addresses that have been mapped to physical addresses (that is, all the pages that have been accessed during process execution). Resident set pages may be shared among multiple processes.
A process working set is the set of virtual addresses that are currently mapped to physical addresses. The working set is a subset of the resident set, and it represents a snapshot of the process resident set at one point in time.
6.1.3.3 Page Faults
When an anonymous (nonfile-backed) virtual address is requested, the virtual memory subsystem must locate the physical page and make it available to the process. This occurs at different speeds, depending on whether the page is in memory or on disk (see Figure 1-1).
If a requested address is currently being used (that is, the address is in the active page list), it will have an entry in the page table. In this case, the PAL code loads the physical address into the translation lookaside buffer, which then passes the address to the CPU. Because this is a memory operation, it occurs quickly.
If a requested address is not active in the page table, the PAL lookup code issues a page fault, which instructs the virtual memory subsystem to locate the page and make the virtual-to-physical address translation in the page table.
There are four different types of page faults:
If a requested virtual address is being accessed for the first time, a zero-filled-on-demand page fault occurs. The virtual memory subsystem performs the following tasks:
Allocates an available page of physical memory.
Fills the page with zeros.
Enters the virtual-to-physical address translation in the page table.
If a requested virtual address has already been accessed and is located in the memory subsystem's internal data structures, a short page fault occurs. For example, if the physical address is located in the hash queue list or the page queue list, the virtual memory subsystem passes the address to the CPU and enters the virtual-to-physical address translation in the page table. This occurs quickly because it is a memory operation.
If a requested virtual address has already been accessed, but the physical page has been reclaimed, the page contents will be found either on the free page list or in swap space. If a page is located on the free page list, it is removed from the hash queue and the free list and then reclaimed. This operation occurs quickly, and does not require disk I/O.
If a page is found in swap space, a page-in page fault occurs. The virtual memory subsystem copies the contents of the page from swap space into the physical address and enters the virtual-to-physical address translation in the page table. Because this requires a disk I/O operation, it requires more time than a memory operation.
If a process needs to modify a read-only virtual page, a copy-on-write page fault occurs. The virtual memory subsystem allocates an available page of physical memory, copies the read-only page into the new page, and enters the translation in the page table.
The virtual memory subsystem uses several techniques to improve process execution time and decrease the number of page faults:
Mapping additional pages
The virtual memory subsystem attempts to anticipate which pages the task will need next. Using an algorithm that checks which pages were most recently used, the number of available pages, and other factors, the subsystem maps additional pages along with the page that contains the requested address.
If possible, the virtual memory subsystem maps a process' entire resident set into the secondary cache and executes the entire task, text, and data within the cache.
The vm subsystem attribute private_cache_percent specifies the percentage of the secondary cache that is reserved for anonymous memory. This attribute is used only for benchmarking. The default is to reserve 50 percent of the cache for anonymous memory and 50 percent for file-backed memory (shared). To cache more anonymous memory, increase the value of the private_cache_percent attribute.
Because memory resources are limited, the virtual memory subsystem must periodically reclaim pages. The free page list contains clean pages that are available to processes and the UBC. As the demand for memory increases, the list may become depleted. If the number of pages falls below a tunable limit, the virtual memory subsystem will replenish the free list by reclaiming the least-recently used pages from processes and the UBC.
To reclaim pages, the virtual memory subsystem:
Prewrites modified pages to swap space, in an attempt to forestall a memory shortage. See Section 6.1.4.1 for more information.
Begins paging if the demand for memory is not satisfied, as follows:
Reclaims pages that the UBC has borrowed and puts them on the free list.
Reclaims the oldest inactive and UBC LRU pages from the active page list, moves the contents of the modified pages to swap space or disk, and puts the clean pages on the free list.
If needed, more aggressively reclaims pages from the active list.
See Section 6.1.4.2 for more information about reclaiming memory by paging.
Begins swapping if the demand for memory is not met. The virtual memory subsystem temporarily suspends processes and moves entire resident sets to swap space, which frees large numbers of pages. See Section 6.1.4.3 for information about swapping.
The point at which paging and swapping start and stop depends on the values of some vm subsystem attributes. Figure 6-6 shows some of the attributes that control paging and swapping.
Figure 6-6: Paging and Swapping Attributes
Detailed descriptions of the attributes are as follows:
vm_page_free_target -- Paging begins when the number of pages on the free list is less than this value. Paging stops when the number of pages is equal to or more than this value. The default value of the vm_page_free_target attribute is based on the amount of memory in the system. Use Table 6-1 to determine the default value for your system.
Table 6-1: Default Values for vm_page_free_target Attribute
Size of Memory | Value of vm_page_free_target |
Up to 512 MB | 128 |
513 MB to 1024 MB | 256 |
1025 MB to 2048 MB | 512 |
2049 MB to 4096 MB | 768 |
More than 4096 MB | 1024 |
vm_page_free_min -- Specifies the threshold at which a page must be reclaimed for each page allocated. The default value is twice the value of the vm_page_free_reserved attribute.
vm_page_free_reserved -- Only privileged tasks can get memory when the number of pages on the free list is less than this value. The default value is 10 pages.
vm_page_free_swap -- Idle task swapping begins when the number of pages on the free list is less than this value for a period of time. The default value of the vm_page_free_swap attribute is based on the values of the vm_page_free_target and vm_page_free_min attributes by using this formula:

vm_page_free_min + ((vm_page_free_target - vm_page_free_min) / 2)
vm_page_free_optimal -- Hard swapping begins when the number of pages on the free list is less than this value for five seconds. The first processes to be swapped out include those with the lowest scheduling priority and those with the largest resident set size. The default value of the vm_page_free_optimal attribute is based on the values of the vm_page_free_target and vm_page_free_min attributes by using this formula:

vm_page_free_min + ((vm_page_free_target - vm_page_free_min) / 2)
vm_page_free_hardswap -- Swapping stops when the number of pages on the free list is equal to or more than this value. The default value is the value of the vm_page_free_target attribute multiplied by 16.
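As a worked example of these defaults, assuming a system with 2 GB of memory and the default vm_page_free_reserved value of 10 pages:

vm_page_free_target   = 512 (from Table 6-1)
vm_page_free_min      = 2 * 10 = 20
vm_page_free_swap     = 20 + ((512 - 20) / 2) = 266
vm_page_free_optimal  = 20 + ((512 - 20) / 2) = 266
vm_page_free_hardswap = 512 * 16 = 8192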
See Section 6.5 for information about modifying paging and swapping attributes.
The following sections describe the page reclamation procedure in detail.
6.1.4.1 Modified Page Prewriting
The virtual memory subsystem attempts to prevent memory shortages by prewriting modified inactive and UBC LRU pages to disk. To reclaim a page that has been prewritten, the virtual memory subsystem only needs to validate the page, which can improve performance. See Section 6.1.1 for information about page lists.
When the virtual memory subsystem anticipates that the pages on the free list will soon be depleted, it prewrites to disk the oldest modified (dirty) pages that are currently being used by processes or the UBC.
The value of the vm subsystem attribute vm_page_prewrite_target determines the number of inactive pages that the subsystem will prewrite and keep clean. The default value is vm_page_free_target * 2.
The vm_ubcdirtypercent attribute specifies the modified UBC LRU page threshold. When the number of modified UBC LRU pages is more than this value, the virtual memory subsystem prewrites to disk the oldest modified UBC LRU pages. The default value of the vm_ubcdirtypercent attribute is 10 percent of the total UBC LRU pages.
In addition, the sync function periodically flushes (writes to disk) system metadata and data from all unwritten memory buffers. For example, the data that is flushed includes, for UFS, modified inodes and delayed block I/O. Commands such as the shutdown command also issue their own sync functions. To minimize the impact of I/O spikes caused by the sync function, the value of the vm subsystem attribute ubc_maxdirtywrites specifies the maximum number of disk writes that the kernel can perform each second. The default value is five I/O operations per second.
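For example, a minimal sketch of checking the current prewriting settings with the sysconfig command:

# /sbin/sysconfig -q vm vm_page_prewrite_target
# /sbin/sysconfig -q vm vm_ubcdirtypercent
# /sbin/sysconfig -q vm ubc_maxdirtywrites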
6.1.4.2 Reclaiming Memory by Paging
When the memory demand is high and the number of pages on the free page list falls below the value of the vm subsystem attribute vm_page_free_target, the virtual memory subsystem uses paging to replenish the free page list. The page-out daemon and task swapper daemon are extensions of the page reclamation code, which controls paging and swapping.
The paging process is as follows:
The page reclamation code activates the page-stealer daemon, which first reclaims the clean pages that the UBC has borrowed from the virtual memory subsystem, until the size of the UBC reaches the borrowing threshold that is specified by the value of the ubc_borrowpercent attribute (the default is 20 percent). Freeing borrowed UBC pages is a fast way to reclaim pages, because UBC pages are usually not modified. If the reclaimed pages are dirty (modified), their contents must be written to disk before the pages can be moved to the free page list.
If freeing clean UBC borrowed memory does not sufficiently replenish the free list, a page out occurs. The page-stealer daemon reclaims the oldest inactive and UBC LRU pages from the active page list, moves the contents of the modified pages to disk, and puts the clean pages on the free list.
Paging becomes increasingly aggressive if the number of free pages continues to decrease. If the number of pages on the free page list falls below the value of the vm subsystem attribute vm_page_free_min (the default is 20 pages), a page must be reclaimed for each page taken from the list.
Figure 6-7 shows the movement of pages during paging operations.
Figure 6-7: Paging Operation
Paging stops when the number of pages on the free list increases to the limit specified by the vm subsystem attribute vm_page_free_target. However, if paging individual pages does not sufficiently replenish the free list, swapping is used to free a large amount of memory (see Section 6.1.4.3).
6.1.4.3 Reclaiming Memory by Swapping
If there is a continuously high demand for memory, the virtual memory subsystem may be unable to replenish the free page list by reclaiming single pages. To dramatically increase the number of clean pages, the virtual memory subsystem uses swapping to suspend processes, which reduces the demand for physical memory.
The task swapper will swap out a process by suspending the process, writing its resident set to swap space, and moving the clean pages to the free page list. Swapping has a serious impact on system performance because a swapped out process cannot execute, and should be avoided on VLM systems and systems running large programs.
The point at which swapping begins and ends is controlled by a number of vm subsystem attributes, as follows:
Idle task swapping begins when the number of pages on the free list falls below the value of the vm_page_free_swap attribute for a period of time. The task swapper suspends all tasks that have been idle for 30 seconds or more.
Hard task swapping begins when the number of pages on the free page list falls below the value of the vm_page_free_optimal attribute for more than five seconds. The task swapper suspends, one at a time, the tasks with the lowest priority and the largest resident set size.
Swapping stops when the number of pages on the free list increases to the value of the vm_page_free_hardswap attribute.
A swap in occurs when the number of pages on the free list increases to the value of the vm_page_free_optimal attribute for a period of time. The value of the vm_inswappedmin attribute specifies the minimum amount of time, in seconds, that a task must remain in the inswapped state before it can be moved out of swap space. The default value is 1 second. The task's working set is then paged in from swap space, and the task can then execute. You can modify the value of the vm_inswappedmin attribute without rebooting the system.
You may be able to improve system performance by modifying the attributes that control when swapping begins and ends, as described in Section 6.5. Large-memory systems or systems running large programs should avoid paging and swapping, if possible.
Increasing the rate of swapping (swapping earlier during page reclamation) may increase throughput. As more processes are swapped out, fewer processes are actually executing and more work is done. Although increasing the rate of swapping moves long-sleeping threads out of memory and frees memory, it may degrade interactive response time because when an outswapped process is needed, it will have a long latency period.
Decreasing the rate of swapping (by swapping later during page reclamation) may improve interactive response time, but at the cost of throughput. See Section 6.5.2 and Section 6.5.3 for more information about changing the rate of swapping.
To facilitate the movement of data between memory and disk, the virtual memory subsystem uses synchronous and asynchronous swap buffers. The virtual memory subsystem uses these two types of buffers to immediately satisfy a page-in request without having to wait for the completion of a page-out request, which is a relatively slow process.
Synchronous swap buffers are used for page-in page faults and for swap outs. Asynchronous swap buffers are used for asynchronous page outs and for prewriting modified pages. See Section 6.5.9, Section 6.5.10, Section 6.5.11, and Section 6.5.12 for swap buffer tuning information.
6.2 Configuring Swap Space for High Performance
Use the swapon command to display swap space, and to configure additional swap space after system installation. To make this additional swap space permanent, use the vm subsystem attribute swapdevice to specify swap devices in the /etc/sysconfigtab file. For example:

vm: swapdevice=/dev/disk/dsk0b,/dev/disk/dsk0d
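To add a swap device for the current session only, you can run the swapon command with the device name (a sketch; the device shown is illustrative):

# /usr/sbin/swapon /dev/disk/dsk0d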
See Section 3.6 for information about modifying kernel subsystem attributes.
See Section 2.3.2.2 and Section 2.3.2.3 for information about swap space allocation modes and swap space requirements.
The following list describes how to configure swap space for high performance:
Ensure that all your swap devices are configured when you boot the system, instead of adding swap space while the system is running.
Use fast disks for swap space to decrease page-fault latency.
Use disks that are not busy for swap space.
Spread out swap space across multiple disks; do not put multiple swap partitions on the same disk. This makes paging and swapping more efficient and helps to prevent any single adapter, disk, or bus from becoming a bottleneck. The page reclamation code uses a form of disk striping (known as swap space interleaving) that improves performance when data is written to multiple disks.
Spread out your swap disks across multiple I/O buses to prevent a single bus from becoming a bottleneck.
Configure multiple swap devices as individual devices (or LSM volumes) instead of striping the devices and configuring only one logical swap device.
If you are paging heavily and cannot increase the amount of memory in your system, consider using RAID 5 for swap devices. See Chapter 8 for more information about RAID 5.
See the System Administration manual for more information about adding swap devices. See Chapter 8 for more information about configuring and tuning disks for high performance and availability.
6.3 Gathering Memory Information
Table 6-2 describes the tools that you can use to gather information about memory usage.
Table 6-2: Virtual Memory and UBC Monitoring Tools
Name | Use | Description |
sys_check | Analyzes system configuration and displays statistics (Section 4.3) | Creates an HTML file that describes the system configuration, and can be used to diagnose problems. |
| Displays total system memory | |
vmstat | Displays virtual memory and CPU usage statistics (Section 6.3.1) | Displays information about process threads, virtual memory usage (page lists, page faults, page ins, and page outs), interrupts, and CPU usage (percentages of user, system, and idle times). The first report contains the statistics since boot time; subsequent reports contain the statistics for a specified interval of time. |
ps | Displays CPU and virtual memory usage by processes (Section 6.3.2) | Displays current statistics for running processes, including CPU usage, the processor and processor set, and the scheduling priority. |
ipcs | Displays IPC statistics | Displays interprocess communication (IPC) statistics for currently active message queues, shared-memory segments, semaphores, remote queues, and local queue headers. |
swapon | Displays information about swap space utilization (Section 6.3.3) | Displays the total amount of allocated swap space, swap space in use, and free swap space for each swap device. |
dbx | Reports UBC statistics (Section 6.3.4) | You can check the UBC by using the dbx print command to examine the ufs_getapage_stats data structure. |
The following sections describe some of these tools in detail.
6.3.1 Monitoring Memory by Using the vmstat Command
The vmstat command shows the virtual memory, process, and CPU statistics for a specified time interval. The first line of the output is for all time since a reboot, and each subsequent report is for the last interval.
An example of the vmstat command is as follows; output is provided in one-second intervals:

# /usr/ucb/vmstat 1

Virtual Memory Statistics: (pagesize = 8192)
  procs      memory          pages                                  intr        cpu
  r  w  u    act  free  wire  fault  cow  zero react  pin pout    in   sy   cs  us sy id
  2 66 25   6417  3497  1570   155K  38K   50K     0  46K    0     4  290  165   0  2 98
  4 65 24   6421  3493  1570    120    9    81     0    8    0   585  865  335  37 16 48
  2 66 25   6421  3493  1570     69    0    69     0    0    0   570  968  368   8 22 69
  4 65 24   6421  3493  1570     69    0    69     0    0    0   554  768  370   2 14 84
  4 65 24   6421  3493  1570     69    0    69     0    0    0   865   1K  404   4 20 76
 [1]        [2]                [3]                                [4]         [5]
The vmstat command includes information that you can use to diagnose CPU and virtual memory problems. Examine the following fields:
Process information (procs):

r -- Number of threads that are running or can run.

w -- Number of threads that are waiting interruptibly (waiting for an event or a resource, but can be interrupted or suspended). For example, the thread can accept user signals or be swapped out of memory.

u -- Number of threads that are waiting uninterruptibly (waiting for an event or a resource, but cannot be interrupted or suspended). For example, the thread cannot accept user signals; it must come out of the wait state to take a signal. Processes that are waiting uninterruptibly cannot be stopped by the kill command.
Virtual memory information (memory):

act -- Number of pages on the active list, including inactive pages and UBC LRU pages.

free -- Number of pages on the free list.

wire -- Number of pages on the wired list. Pages on the wired list cannot be reclaimed.

See Section 6.1.1 for more information on page lists.
Paging information (pages):

fault -- Number of address translation faults.

cow -- Number of copy-on-write page faults. These page faults occur if the requested page is shared by a parent process and a child process, and one of these processes needs to modify the page. If a copy-on-write page fault occurs, the virtual memory subsystem loads a new address into the translation buffer, and then copies the contents of the requested page into this address, so that the process can modify it.

zero -- Number of zero-filled-on-demand page faults. These page faults occur if a requested page is not located in the internal data structures and has never been referenced. If a zero-filled-on-demand page fault occurs, the virtual memory subsystem allocates an available page of physical memory, fills the page with zeros, and then enters the address into the page table.

react -- Number of pages that have been faulted (touched) while on the inactive page list.

pin -- Number of requests for pages from the page-stealer daemon.

pout -- Number of pages that have been paged out to disk.

Interrupt information (intr):

in -- Number of nonclock device interrupts per second.

sy -- Number of system calls called per second.

cs -- Number of task and thread context switches per second.
CPU usage information (cpu):

us -- Percentage of user time for normal and priority processes. User time includes the time the CPU spent executing library routines.

sy -- Percentage of system time. System time includes the time the CPU spent executing system calls.

id -- Percentage of idle time.

See Section 7.1.2 for information about using the vmstat command to monitor CPU usage.
To use the vmstat command to diagnose a memory performance problem:

Check the size of the free page list (free). Compare the number of free pages to the values for the active pages (act) and the wired pages (wire). The sum of the free, active, and wired pages should be close to the amount of physical memory in your system (a quick check is sketched after this list). Although the value for free should be small, if the value is consistently small (less than 128 pages) and accompanied by excessive paging and swapping, you may not have enough physical memory for your workload.

Examine the pout field. If the number of page outs is consistently high, you may have insufficient memory.
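The following sketch adds the act, free, and wire columns so that you can compare the sum to physical memory; the awk field positions assume the default vmstat column order shown in the previous example:

# /usr/ucb/vmstat 1 2 | tail -1 | awk '{ printf "act+free+wire = %d pages\n", $4 + $5 + $6 }'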
The following command output may indicate that the size of the UBC is too small for your configuration:
The output of the vmstat or monitor command shows excessive file system page-in activity, but little or no page-out activity, or shows a very low free page count.
The output of the iostat command shows little or no swap disk I/O activity or shows excessive file system I/O activity.
See Section 8.2 for more information.
Excessive paging also can increase the miss rate for the secondary cache, and may be indicated by the following output:
The output of the ps command shows high task swapping activity. See Section 6.3.2 for more information.
The output of the swapon command shows excessive use of swap space. See Section 6.3.3 for more information.
You can also use the vmstat -P command to display statistics about physical memory use. For example:

# vmstat -P

Total Physical Memory = 512.00 M = 65536 pages

Physical Memory Clusters:

start_pfn   end_pfn  type       size_pages / size_bytes
        0       256  pal               256 / 2.00M
      256     65527  os              65271 / 509.93M
    65527     65536  pal                 9 / 72.00k

Physical Memory Use:

start_pfn   end_pfn  type       size_pages / size_bytes
      256       280  unixtable          24 / 192.00k
      280       287  scavenge            7 / 56.00k
      287       918  text              631 / 4.93M
      918      1046  data              128 / 1.00M
     1046      1209  bss               163 / 1.27M
     1210      1384  kdebug            174 / 1.36M
     1384      1390  cfgmgmt             6 / 48.00k
     1390      1392  locks               2 / 16.00k
     1392      1949  unixtable         557 / 4.35M
     1949      1962  pmap               13 / 104.00k
     1962      2972  vmtables         1010 / 7.89M
     2972     65527  managed         62555 / 488.71M
                     ============================
Total Physical Memory Use:          65270 / 509.92M

Managed Pages Break Down:

      free pages = 1207
    active pages = 25817
  inactive pages = 20103
     wired pages = 15434
       ubc pages = 15992
  ==================
           Total = 78553

WIRED Pages Break Down:

   vm wired pages = 1448
  ubc wired pages = 4550
  meta data pages = 1958
     malloc pages = 5469
     contig pages = 159
   user ptepages = 1774
 kernel ptepages = 67
   free ptepages = 9
  ==================
            Total = 15434
See Section 6.4 for information about increasing memory resources.
6.3.2 Monitoring Memory by Using the ps Command
The ps command displays the current status of the system processes. You can use it to determine the current running processes (including users), their state, and how they utilize system memory. The command lists processes in order of decreasing CPU usage, so you can identify which processes are using the most CPU time.

The ps command provides only a snapshot of the system; by the time the command finishes executing, the system state has probably changed. In addition, one of the first lines of the command output may refer to the ps command itself.

An example of the ps command is as follows:

# /usr/ucb/ps aux

USER   PID %CPU %MEM   VSZ   RSS TTY S    STARTED         TIME COMMAND
chen  2225  5.0  0.3 1.35M  256K  p9 U   13:24:58      0:00.36 cp /vmunix /tmp
root  2236  3.0  0.5 1.59M  456K  p9 R + 13:33:21      0:00.08 ps aux
sorn  2226  1.0  0.6 2.75M  552K  p9 S + 13:25:01      0:00.05 vi met.ps
root   347  1.0  4.0 9.58M  3.72  ?? S     Nov 07     01:26:44 /usr/bin/X11/X -a
root  1905  1.0  1.1 6.10M  1.01  ?? R   16:55:16      0:24.79 /usr/bin/X11/dxpa
mat   2228  0.0  0.5 1.82M  504K  p5 S + 13:25:03      0:00.02 more
mat   2202  0.0  0.5 2.03M  456K  p5 S   13:14:14      0:00.23 -csh (csh)
root     0  0.0 12.7  356M  11.9  ?? R <   Nov 07   3-17:26:13 [kernel idle]
           [1]  [2]   [3]   [4]     [5]                 [6]    [7]
The ps command output includes the following information that you can use to diagnose CPU and virtual memory problems:
Percentage of CPU time usage (%CPU).

Percentage of real memory usage (%MEM).

Process virtual address size (VSZ) -- This is the total amount of anonymous memory allocated to the process (in bytes).

Real memory (resident set) size of the process (RSS) -- This is the total amount of physical memory (in bytes) mapped to virtual pages (that is, the total amount of memory that the application has physically used). Shared memory is included in the resident set size figures; as a result, the total of these figures may exceed the total amount of physical memory available on the system.

Process status or state (S) -- This specifies whether a process is in one of the following states:

Runnable (R)

Sleeping (S) -- Process has been waiting for an event or a resource for less than 20 seconds, but it can be interrupted or suspended. For example, the process can accept user signals or be swapped out.

Uninterruptible sleeping (U) -- Process is waiting for an event or a resource, but cannot be interrupted or suspended. You cannot use the kill command to stop these processes; they must come out of the wait state to accept the signal.

Idle (I) -- Process has been sleeping for more than 20 seconds.

Stopped (T) -- Process has been stopped.

Halted (H) -- Process has been halted.

Swapped out (W) -- Process has been swapped out of memory.

Locked into memory (L) -- Process has been locked into memory and cannot be swapped out.

Has exceeded the soft limit on memory requirements (>)

A process group leader with a controlling terminal (+)

Has a reduced priority (N)

Has a raised priority (<)

Current CPU time used (TIME), in the format hh:mm:ss.ms.

The command that is running (COMMAND).
From the output of the ps command, you can determine which processes are consuming most of your system's CPU time and memory resources, and whether processes are swapped out. Concentrate on processes that are running or paging. Here are some concerns to keep in mind:
If a process is using a large amount of memory (see the RSS and VSZ fields), the process may have excessive memory requirements. See Section 11.2 for information about decreasing an application's use of memory.
Are duplicate processes running? Use the kill command to terminate any unnecessary processes. See kill(1) for more information.
If a process is using a large amount of CPU time, it may be in an infinite loop. You may have to use the kill command to terminate the process and then correct the problem by making changes to its source code.

You can also use the Class Scheduler to allocate a percentage of CPU time to a specific task or application (see Section 7.2.2), or lower the process' priority by using either the nice or renice command. These commands have no effect on memory usage by a process. See nice(8) or renice(8) for more information.
Check the processes that are swapped out. Examine the S (state) field. A W entry indicates a process that has been swapped out. If processes are continually being swapped out, this could indicate a lack of memory resources. See Section 6.4 for information about increasing memory resources.
6.3.3 Monitoring Swap Space Usage by Using the swapon Command
Use the swapon -s command to display your swap device configuration. For each swap partition, the command displays the total amount of allocated swap space, the amount of swap space that is being used, and the amount of free swap space. This information can help you determine how your swap space is being utilized.

An example of the swapon command is as follows:

# /usr/sbin/swapon -s

Swap partition /dev/disk/dsk1b (default swap):
    Allocated space:        16384 pages (128MB)
    In-use space:           10452 pages ( 63%)
    Free space:              5932 pages ( 36%)

Swap partition /dev/disk/dsk4c:
    Allocated space:       128178 pages (1001MB)
    In-use space:           10242 pages (  7%)
    Free space:            117936 pages ( 92%)

Total swap allocation:
    Allocated space:       144562 pages (1.10GB)
    Reserved space:         34253 pages ( 23%)
    In-use space:           20694 pages ( 14%)
    Available space:       110309 pages ( 76%)
You can configure swap space when you first install the operating system, or you can add swap space at a later date. Application messages, such as the following, usually indicate that not enough swap space is configured into the system or that a process limit has been reached:
"unable to obtain requested swap space"

"swap space below 10 percent free"
See Section 2.3.2.3 for information about swap space requirements. See Section 6.2 for information about adding swap space and distributing swap space for high performance.
6.3.4 Monitoring the UBC by Using the dbx Debugger
If you have not disabled read-ahead, you can monitor the UBC by using the dbx print command to examine the ufs_getapage_stats data structure. For example:

# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print ufs_getapage_stats

struct {
    read_looks = 2059022
    read_hits = 2022488
    read_miss = 36506
    alloc_error = 0
    alloc_in_cache = 0
}
(dbx)
To calculate the hit rate, divide the value of the read_hits field by the value of the read_looks field. A good hit rate is a rate above 95 percent. In the previous example, the hit rate is approximately 98 percent.
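Using the values from the previous example:

hit rate = read_hits / read_looks = 2022488 / 2059022 = 0.982 (approximately 98 percent)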
6.4 Tuning to Provide More Memory to Processes
If your system is paging or swapping, you may be able to increase the memory that is available to processes by tuning various kernel subsystem attributes.
Table 6-3 shows the guidelines for increasing memory resources to processes and lists the performance benefits as well as the tradeoffs. Some of the guidelines for increasing the memory available to processes may affect UBC operation and file system caching. Adding physical memory to your system is the best way to stop paging or swapping.
Table 6-3: Memory Resource Tuning Guidelines
Guideline | Performance Benefit | Tradeoff |
Reduce the number of processes running at the same time (Section 6.4.1) | Decreases CPU load and demand for memory | System performs less work |
Reduce the static size of the kernel (Section 6.4.2) | Frees memory | Not all functionality may be available |
Decrease the borrowed memory threshold (Section 6.4.3) | Improves system response time when memory is low | May degrade file system performance |
Decrease the memory allocated to the AdvFS buffer cache (Section 6.4.4) | Provides more memory resources to processes | May degrade AdvFS performance on systems that open and reuse files |
Decrease the memory allocated to AdvFS access-structures (Section 6.4.5) | Provides more memory resources to processes | May degrade AdvFS performance on low-memory systems that use AdvFS |
Decrease the size of the metadata buffer cache (Section 6.4.6) | Provides more memory resources to processes | May degrade UFS performance on small systems |
Decrease the size of the namei cache (Section 6.4.7) | Frees memory | May slow lookup operations and degrade file system performance |
Increase the percentage of memory reserved for kernel malloc allocations (Section 6.4.8) | Improves network throughput under a heavy load | Consumes memory |
Reduce process memory requirements (Section 11.2.6) | Frees memory | Program may not run optimally |
The following sections describe in detail the guidelines that will increase the memory available to processes.
6.4.1 Reducing the Number of Processes Running Simultaneously
You can improve performance and reduce the demand for memory by running fewer applications simultaneously. Use the at or the batch command to run applications at offpeak hours. See at(1) for more information.
6.4.2 Reducing the Static Size of the Kernel
You can reduce the static size of the kernel by deconfiguring any unnecessary subsystems. Use the sysconfig command to display the configured subsystems and to delete subsystems. Be sure not to remove any subsystems or functionality that is vital to your environment. See Section 3.6 for information about modifying kernel subsystem attributes.
6.4.3 Decreasing the Borrowed Memory Threshold
You may be able to prevent paging by decreasing the borrowed memory threshold. If you do this, less memory remains in the UBC when page reclamation begins. The ubc_borrowpercent attribute specifies the UBC borrowing threshold. See Section 6.1.2.2 for information about borrowed memory.
Performance Benefit and Tradeoff
Decreasing the borrowed memory threshold may improve the system response time when memory is low, but may also reduce UBC effectiveness.
You can modify the ubc_borrowpercent attribute without rebooting the system.
When to Tune
If your workload does not use file systems heavily, you may want to decrease the borrowed memory threshold.
Recommended Values
The default value of the ubc_borrowpercent attribute is 20 percent.
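For example, to lower the borrowing threshold on a running system (the value 10 is illustrative only, not a recommendation):

# /sbin/sysconfig -r vm ubc_borrowpercent=10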
See Section 3.6 for information about modifying kernel subsystem attributes.
6.4.4 Decreasing the Size of the AdvFS Buffer Cache
The advfs subsystem attribute AdvfsCacheMaxPercent determines the maximum amount of memory that can be used for the AdvFS buffer cache. See Section 6.1.2.3 for information about the AdvFS buffer cache.
Performance Benefit and Tradeoff
To free memory resources, you may want to decrease the amount of memory allocated to the AdvFS buffer cache. Decreasing the cache size also decreases the overhead associated with managing the cache.
However, decreasing the size of the AdvFS buffer cache may degrade AdvFS performance if you reuse many AdvFS files.
You cannot modify the AdvfsCacheMaxPercent attribute without rebooting the system.
When to Tune
If you are not using AdvFS, decrease the size of the AdvFS buffer cache.
If you are using AdvFS, but most of your data is read or written only once, reducing the size of the AdvFS buffer cache may improve performance. The cache lookup time is decreased because the cache contains fewer entries to search. If you are using AdvFS, but you have a large-memory system, you also may want to decrease the size of the AdvFS cache.
Recommended Values
The default value of the AdvfsCacheMaxPercent attribute is 7 percent of physical memory. The minimum is 1 percent; the maximum is 30 percent. If you are not using AdvFS, decrease the cache size to 1 percent.
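For example, a sketch of an /etc/sysconfigtab entry for a system that does not use AdvFS (takes effect at the next reboot):

advfs: AdvfsCacheMaxPercent=1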
See Section 3.6 for information about modifying kernel subsystem attributes.
6.4.5 Decreasing the Memory for AdvFS Access Structures
AdvFS access structures are in-memory data structures that AdvFS uses to cache low-level information about files that are currently open and files that were opened but are now closed. Caching open file information can enhance AdvFS performance if the open files are later reused.
At boot time, the system reserves for AdvFS access structures a portion of the physical memory that is not wired by the kernel. The memory reserved is either twice the value of the AdvfsMinFreeAccess attribute or the value of the AdvfsAccessMaxPercent attribute, whichever is smaller. Access structures are then placed on the access structure free list, and are allocated and deallocated according to the kernel configuration and workload demands.
There are three attributes that control the allocation of AdvFS access structures:
The AdvfsAccessMaxPercent attribute controls the maximum percentage of pageable memory that can be allocated for AdvFS access structures.

The AdvfsMinFreeAccess attribute controls the allocation of AdvFS access structures. At boot time, and when the number of access structures on the free list is less than the value of the AdvfsMinFreeAccess attribute, AdvFS allocates access structures, until the number of access structures on the free list is either twice the value of the AdvfsMinFreeAccess attribute or the value of the AdvfsAccessMaxPercent attribute, whichever is smaller.

The AdvfsMaxFreeAccessPercent attribute controls when access structures are deallocated from the free list. When the percentage of access structures on the free list is more than the value of the AdvfsMaxFreeAccessPercent attribute, and the number of access structures on the free list is more than twice the value of the AdvfsMinFreeAccess attribute, AdvFS deallocates access structures.
See Section 9.3.3 for more information about access structures.
Performance Benefit and Tradeoff
Decreasing the amount of memory allocated for access structures makes more memory available to processes and file system buffer caching, but it may degrade performance on low-memory systems that use AdvFS or systems that reuse AdvFS files.
You can modify the AdvfsAccessMaxPercent attribute without rebooting the system.
When to Tune
If you do not use AdvFS, you may want to decrease the value of the AdvfsMinFreeAccess attribute to minimize the memory allocated to AdvFS access structures at boot time.

If your workload does not reuse AdvFS files, you may want to decrease the value of the AdvfsMaxFreeAccessPercent attribute. This will cause the system to aggressively deallocate free access structures.

If you have a large-memory system, you may want to decrease the value of the AdvfsAccessMaxPercent attribute. This is because the number of open files does not scale with the size of system memory as efficiently as UBC memory usage and process memory usage.
Recommended Values
The default value of the AdvfsAccessMaxPercent attribute is 25 percent of pageable memory. The minimum value is 5 percent; the maximum value is 95 percent.

The default value of the AdvfsMinFreeAccess attribute is 128. The minimum value is 1; the maximum value is 100,000.
The default value of the AdvfsMaxFreeAccessPercent attribute is 80 percent. The minimum value is 5 percent; the maximum value is 95 percent.
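For example, a sketch of an /etc/sysconfigtab entry that lowers the access-structure memory limit on a large-memory system (the value 10 is illustrative only):

advfs: AdvfsAccessMaxPercent=10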
See Section 3.6 for information about modifying kernel subsystem attributes.
6.4.6 Decreasing the Size of the Metadata Buffer Cache
The metadata buffer cache contains recently accessed UFS and CD-ROM File System (CDFS) metadata. The vfs subsystem attribute bufcache specifies the percentage of physical memory that the kernel wires for the metadata buffer cache.
Performance Benefit and Tradeoff
Decreasing the size of the metadata buffer cache will increase the amount of memory that is available to processes and for file system buffer caching. However, decreasing the size of the cache may degrade UFS performance.
You cannot modify the bufcache attribute without rebooting the system.
When to Tune
If you have a high cache hit rate or if you use only AdvFS, you may want to decrease the size of the metadata buffer cache.
Recommended Values
The default size of the metadata buffer cache is 3 percent of physical memory. You can decrease the value of the bufcache attribute to a minimum of 1 percent. For VLM systems and systems that use only AdvFS, set the value of the bufcache attribute to 1 percent.
You can reduce the memory allocated for the metadata buffer cache below 1 percent by decreasing the value of the vfs subsystem attribute bufpages, which specifies the number of pages in the cache. The default value is 1958 pages.
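For example, a sketch of an /etc/sysconfigtab entry for a VLM system or a system that uses only AdvFS (takes effect at the next reboot):

vfs: bufcache=1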
See Section 3.6 for information about modifying kernel subsystem attributes.
6.4.7 Decreasing the Size of the namei Cache
The namei cache is used by UFS, AdvFS, CDFS, and NFS to store information about recently used file names, parent directory vnodes, and file vnodes. The number of vnodes determines the number of open files. The namei cache also stores vnode information for files that were referenced but not found. Having this information in the cache substantially reduces the amount of searching that is needed to perform pathname translations.
The vfs subsystem attribute name_cache_size specifies the number of elements in the namei cache.
Performance Benefit and Tradeoff
Decreasing the size of the namei cache can free memory resources. However, this may degrade file system performance by reducing the number of cache hits.
You cannot modify the name_cache_size attribute without rebooting.
When to Tune
Monitor the namei cache by using the dbx print command to examine the nchstats data structure. If the hit rate is low, you may want to decrease the cache size. See Section 9.1.2 for information.
Recommended Values
Decrease the number of elements in the namei cache by decreasing the value of the name_cache_size attribute. The default value is:

2 * (148 + 10 * maxusers) * 11 / 10

The maximum value is:

2 * max_vnodes * 11 / 10
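As a worked example, assuming maxusers is set to 512, the default works out to:

2 * (148 + 10 * 512) * 11 / 10 = 11,589.6 (approximately 11,590 elements)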
Make sure that decreasing the size of the namei cache does not degrade file system performance.
See Section 3.6 for information about modifying kernel subsystem attributes.
6.4.8 Increasing the Memory Reserved for Kernel malloc Allocations
If you are running a large Internet application, you may need to increase the amount of memory reserved for the kernel malloc subsystem. To do this, increase the value of the generic subsystem attribute kmemreserve_percent, which increases the percentage of physical memory reserved for kernel memory allocations that are less than or equal to the page size (8 KB).
Performance Benefit and Tradeoff
Increasing the value of the kmemreserve_percent attribute improves network throughput by reducing the number of packets that are dropped while the system is under a heavy network load. However, increasing this value consumes memory. You can modify the kmemreserve_percent attribute without rebooting.
When to Tune
You may want
to increase the value of the
kmemreserve_percent
attribute
if the output of the
netstat
command shows dropped packets,
or if the output of the
vmstat -M
command shows dropped
packets under the
fail_nowait
heading.
This may occur under
a heavy network load.
Recommended Values
The default value of the
kmemreserve_percent
attribute
is 0, which means that the reserved physical memory will be
0.4 percent of available memory or 256, whichever is smaller.
Increase the value of the
kmemreserve_percent
attribute
(up to the maximum of 75) by small increments until the output of the
vmstat -M
command shows no entries under the
fail_nowait
heading.
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.5 Tuning Paging and Swapping Operation
You may be able to improve performance by modifying paging and swapping operations. Very-large memory (VLM) systems should avoid paging and swapping.
Table 6-4
describes the guidelines for controlling
paging and swapping and lists the performance benefits and any tradeoffs.
Table 6-4: Paging and Swapping Tuning Guidelines
Guideline | Performance Benefit | Tradeoff |
Increase the paging threshold (Section 6.5.1) | Maintains performance when free memory is exhausted | May waste memory |
Increase the rate of swapping (Section 6.5.2) | Increases process throughput | Decreases interactive response performance |
Decrease the rate of swapping (Section 6.5.3) | Improves process interactive response performance | Decreases process throughput |
Enable aggressive swapping (Section 6.5.4) | Improves system throughput | Degrades interactive response performance |
Limit process resident set size (Section 6.5.5) | Prevents a process from being swapped out because of a large resident set size | May increase paging activity |
Use memory locking (Section 11.2.7) | Prevents a process from being swapped out | Improves process throughput |
Increase the rate of dirty page prewriting (Section 6.5.6) | Prevents drastic performance degradation when memory is exhausted | Decreases peak workload performance |
Decrease the rate of dirty page prewriting (Section 6.5.7) | Improves peak workload performance | May cause drastic performance degradation when memory is exhausted |
Increase the size of the page-in and page-out clusters (Section 6.5.8) | Improves peak workload performance | Decreases total system workload performance |
Increase the swap device I/O queue depth for page ins and swap outs (Section 6.5.9) | Increases overall system throughput | Consumes memory |
Decrease the swap device I/O queue depth for page ins and swap outs (Section 6.5.10) | Improves the interactive response time and frees memory | Decreases system throughput |
Increase the swap device I/O queue depth for page outs (Section 6.5.11) | Frees memory and increases throughput | Decreases interactive response performance |
Decrease the swap device I/O queue depth for page outs (Section 6.5.12) | Improves interactive response time | Consumes memory |
The following sections describe the guidelines for controlling paging
and swapping in detail.
6.5.1 Increasing the Paging Threshold
The
vm
subsystem attribute
vm_page_free_target
specifies
the minimum number of pages on the free list before paging begins.
Increasing
the paging threshold may prevent performance problems when a severe memory
shortage occurs.
See
Section 6.1.4
for information about
paging and swapping attributes.
Performance Benefit and Tradeoff
Increasing the value of the
vm_page_free_target
attribute
(the paging threshold) may improve performance when free memory is exhausted.
However, this may increase paging activity on a low-memory system.
In addition,
an excessively high value can waste memory.
You can modify the
vm_page_free_target
attribute
without rebooting the system.
When to Tune
You may want to increase the value of the
vm_page_free_target
attribute if you have sufficient memory resources, and your system
experiences performance problems when a severe memory shortage occurs.
Do
not increase the value if the system is not paging.
Recommended Values
The default value of the
vm_page_free_target
attribute
is based on the amount of memory in the system.
Use the following table to
determine the default value for your system:
Size of Memory | Value of vm_page_free_target |
Up to 512 MB | 128 |
513 MB to 1024 MB | 256 |
1025 MB to 2048 MB | 512 |
2049 MB to 4096 MB | 768 |
More than 4096 MB | 1024 |
If you want to increase the value of the
vm_page_free_target
attribute, start at the default value and then double the value.
Do not specify a value that is more than 1024 pages or 8 MB.
Do not decrease the value of the
vm_page_free_target
attribute.
If you increase the default value of the
vm_page_free_target
attribute, you may also want to increase the value of the
vm_page_free_min
attribute.
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.5.2 Increasing the Rate of Swapping
Swapping
has a drastic impact on system performance.
You can modify kernel subsystem
attributes to control when swapping begins and ends.
VLM systems and systems
running large programs should avoid swapping.
Hard swapping begins when the
number of pages on the free list is less than the value of the
vm
subsystem attribute
vm_page_free_optimal
for
five seconds.
See
Section 6.1.4
for more information about
paging and swapping attributes.
Performance Benefit and Tradeoff
Increasing the rate of swapping (swapping earlier during page reclamation)
by raising the value of the
vm_page_free_optimal
attribute
moves long-sleeping threads out of memory, frees memory, and increases throughput.
As more processes are swapped out, fewer processes are actually executing
and more work is done.
However, when an outswapped process is needed, it will
have a long latency, so increasing the rate of swapping may degrade interactive
response time.
You can modify the
vm_page_free_optimal
attribute
without rebooting the system.
When to Tune
You do not need to modify task swapping if the system is not paging.
Recommended Values
The default value of the
vm_page_free_optimal
attribute
is based on the values of the
vm_page_free_target
and
vm_page_free_min
attributes.
Increase the value of the
vm_page_free_optimal
attribute only by 2 pages at a time.
Do not specify a
value that is more than the value of the
vm
subsystem attribute
vm_page_free_target
.
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.5.3 Decreasing the Rate of Swapping
Swapping
has a drastic impact on system performance.
You can modify kernel subsystem
attributes to control when swapping begins and ends.
VLM systems and systems
running large programs should avoid swapping.
Hard swapping begins when the
number of pages on the free list is less than the value of the
vm
subsystem attribute
vm_page_free_optimal
for
five seconds.
See
Section 6.1.4
for more information about
paging and swapping attributes.
Performance Benefit and Tradeoff
Decreasing the rate of swapping (swapping later during page reclamation)
by decreasing the value of the
vm_page_free_optimal
attribute
improves interactive response time, but at the cost of throughput.
You can modify the
vm_page_free_optimal
attribute
without rebooting the system.
When to Tune
You do not need to modify task swapping if the system is not paging.
Recommended Values
The default value of the
vm_page_free_optimal
attribute
is based on the values of the
vm_page_free_target
and
vm_page_free_min
attributes.
Decrease the value of the
vm_page_free_optimal
attribute by 2 pages at a time.
Do not specify
a value that is less than the value of the
vm
subsystem
attribute
vm_page_free_min
.
See
Section 3.6
for information about modifying
kernel attributes.
6.5.4 Enabling Aggressive Task Swapping
Swapping
begins when the free page list falls below the swapping threshold, as specified
by the
vm
subsystem attribute
vm_page_free_swap
.
Tasks are swapped in when the demand for memory decreases.
You
can use the
vm
subsystem attribute
vm_aggressive_swap
to enable aggressive task swapping, which causes the virtual memory
subsystem to swap in processes at a rate that is slower than normal task swapping.
Performance Benefit and Tradeoff
Aggressive task swapping improves system throughput, but it degrades the interactive response performance.
You can modify the
vm_aggressive_swap
attribute without
rebooting.
When to Tune
Usually, you do not need to enable aggressive task swapping.
Recommended Values
By default, the
vm_aggressive_swap
attribute is disabled
(set to 0).
To enable aggressive task swapping, set the value of the
vm_aggressive_swap
attribute to 1.
See
Section 3.6
for information about modifying
kernel attributes.
6.5.5 Limiting the Resident Set Size to Avoid Swapping
By default, Tru64 UNIX does not limit the resident set size for a process. If the number of free pages cannot keep up with the demand for memory, processes with large resident set sizes are likely candidates for swapping. To avoid swapping a process because it has a large resident set size, you can specify process-specific and system-wide limits for resident set sizes.
To set a limit on the resident set size for a specific process, use
the
setrlimit
system call to specify a value (in bytes)
for the
RLIMIT_RSS
resource parameter.
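As an illustration only (this example is not from this manual), the following C fragment shows one way an application could limit its own resident set size with the setrlimit system call; the 64-MB limit is an arbitrary example value:
/* Sketch: set a per-process resident set size limit.
 * The 64-MB value is illustrative only. */
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>

int
main(void)
{
    struct rlimit rl;

    rl.rlim_cur = 64 * 1024 * 1024;   /* soft limit, in bytes */
    rl.rlim_max = 64 * 1024 * 1024;   /* hard limit, in bytes */

    if (setrlimit(RLIMIT_RSS, &rl) == -1) {
        perror("setrlimit(RLIMIT_RSS)");
        return 1;
    }

    /* ... memory-intensive work runs under the limit ... */
    return 0;
}
Because resource limits are inherited across fork and exec, a small wrapper program that makes this call before starting an application can impose the limit without modifying the application itself.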
To set a system-wide limit, use the
vm
subsystem
attribute
vm_rss_maxpercent
, which specifies the maximum
percentage of managed pages that can be used for a resident set.
If you limit the resident set size, either for a specific process or
system-wide, you must also use the
vm
subsystem attribute
anon_rss_enforce
to set either a soft or hard limit on the size
of a resident set.
If you enable a hard limit, a task's resident set cannot
exceed the limit.
If a task reaches the hard limit, pages of the task's anonymous
memory are moved to swap space to keep the resident set size within the limit.
If you enable a soft limit, anonymous memory paging will start when the following conditions are met:
A task's resident set exceeds the system-wide or per-process limit.
The number of pages of the free page list is less than the
value of the
vm
subsystem attribute
vm_rss_block_target
.
A task that has exceeded its soft limit remains blocked until the number
of pages on the free page list reaches the value of the
vm
subsystem attribute
vm_rss_wakeup_target
.
Performance Benefit and Tradeoff
Limiting resident set sizes will prevent a process from being swapped out because of a large resident set size.
You cannot modify the
anon_rss_enforce
attribute
without rebooting the system.
You can modify the
vm_rss_maxpercent
,
vm_rss_block_target
, and
vm_rss_wakeup_target
attributes without rebooting the system.
When to Tune
You do not need to limit resident set sizes if the system is not paging.
Recommended Values
To set a system-wide limit, use the
vm
subsystem
attribute
vm_rss_maxpercent
to specify the maximum percentage
of managed pages that can be used for a resident set.
The minimum value of
the attribute is 1; the maximum and default values are 100.
Decrease the default
value by decrements of 10 percent.
Use the attribute
anon_rss_enforce
to set either
a soft or hard limit on the size of a resident set.
If set to 0 (zero), the
default, there is no limit on the size of a process' resident set.
Set the
attribute to 1 to enable a soft limit; set the attribute to 2 to enable a
hard limit.
If you enable a soft limit, use the
vm_rss_block_target
attribute to specify the free page list threshold at which anonymous paging begins.
The default value of the
vm_rss_block_target
attribute
is the same as the default value of the
vm_page_free_optimal
attribute, which specifies the swapping threshold.
You can increase the default
value of the
vm_rss_block_target
attribute to delay paging
anonymous memory.
You can decrease the default value to start paging earlier.
The minimum value of the
vm_rss_block_target
attribute
is 0; the maximum value is 2 GB.
If you enable a soft limit, use the
vm
subsystem
attribute
vm_rss_wakeup_target
to specify the
free page list threshold at which a task that has exceeded its soft limit
becomes unblocked.
The default value of the
vm_rss_wakeup_target
attribute is the same as the default value of the
vm_page_free_optimal
attribute, which specifies the swapping threshold.
You can increase
the value of the
vm_rss_wakeup_target
attribute to free
more memory before unblocking the task.
You can decrease the value so that
the task is unblocked sooner, but less memory is freed.
The minimum value
of the
vm_rss_wakeup_target
attribute is 0; the maximum
value is 2 GB.
6.5.6 Increasing Modified Page Prewriting
The virtual memory subsystem attempts to prevent a memory shortage by prewriting modified (dirty) pages to disk. To reclaim a page that has been prewritten, the virtual memory subsystem only needs to validate the page, which can improve performance.
When the virtual memory subsystem anticipates that the pages on the free list will soon be depleted, it prewrites to disk the oldest inactive and UBC LRU pages.
The value of the
vm
subsystem attribute
vm_page_prewrite_target
determines the number of inactive pages
that the subsystem will prewrite and keep clean.
The
vm_ubcdirtypercent
attribute specifies the modified
UBC LRU page threshold.
When the number of modified UBC LRU pages is more
than this value, the virtual memory subsystem prewrites to disk the oldest
modified UBC LRU pages.
See Section 6.1.4.1 for more information about modified page prewriting.
Performance Benefit and Tradeoff
Increasing the rate of modified page prewriting will prevent a drastic performance degradation when memory is exhausted, but will also reduce peak workload performance. Increasing the rate of modified page prewriting will also increase the amount of continuous disk I/O, but will provide better file system integrity if a system crash occurs.
You can modify the
vm_page_prewrite_target
or
vm_ubcdirtypercent
attribute without rebooting the system.
When to Tune
You do not need to modify dirty page prewriting if the system is not paging.
Recommended Values
To increase the rate of inactive dirty page prewriting, increase the
value of the
vm_page_prewrite_target
attribute by increments
of 64 pages.
The default value is
vm_page_free_target
* 2.
The default value of the
vm_ubcdirtypercent
attribute
is 10 percent of the total UBC
LRU pages must be dirty before the oldest UBC LRU pages are prewritten).
To
increase the rate of UBC LRU dirty page prewriting, decrease the value of
the
vm_ubcdirtypercent
attribute by decrements of 1 percent.
See
Section 3.6
for information about modifying
kernel attributes.
6.5.7 Decreasing Modified Page Prewriting
The virtual memory subsystem attempts to prevent a memory shortage by prewriting modified (dirty) pages to disk. To reclaim a page that has been prewritten, the virtual memory subsystem only needs to validate the page, which can improve performance.
When the virtual memory subsystem anticipates that the pages on the free list will soon be depleted, it prewrites to disk the oldest inactive and UBC LRU pages.
The value of the
vm
subsystem attribute
vm_page_prewrite_target
determines the number of inactive pages
that the subsystem will prewrite and keep clean.
The
vm_ubcdirtypercent
attribute specifies the modified
UBC LRU page threshold.
When the number of modified UBC LRU pages is more
than this value, the virtual memory subsystem prewrites to disk the oldest
modified UBC LRU pages.
See Section 6.1.4.1 for more information about modified page prewriting.
Performance Benefit and Tradeoff
Decreasing the rate of modified page prewriting will improve peak workload performance, but it will cause a drastic performance degradation when memory is exhausted.
You can modify the
vm_page_prewrite_target
and
vm_ubcdirtypercent
attributes without rebooting the system.
When to Tune
You do not need to modify inactive dirty page writing if the system is not paging. Decrease UBC LRU dirty page prewriting only for benchmarking.
Recommended Values
To decrease the rate of inactive dirty page prewriting, decrease the
default value of the
vm_page_prewrite_target
attribute.
The default value is
vm_page_free_target
* 2.
The default value of the
vm_ubcdirtypercent
attribute
is 10 percent of the total UBC LRU pages (that is, 10 percent of the UBC LRU
pages must be dirty before the UBC LRU pages are prewritten).
To decrease
the rate of UBC LRU dirty page prewriting, increase the value of the
vm_ubcdirtypercent
attribute.
See
Section 3.6
for information about modifying
kernel attributes.
6.5.8 Increasing the Size of the Page-In and Page-Out Clusters
The virtual memory subsystem reads in and writes
out additional pages in an attempt to anticipate pages that it will need.
The
vm
subsystem attribute
vm_max_rdpgio_kluster
specifies the maximum size of an anonymous page-in cluster.
The
vm
subsystem attribute
vm_max_wrpgio_kluster
specifies the maximum size of an anonymous page-out cluster.
Performance Benefit and Tradeoff
If you increase the value of the
vm_max_rdpgio_kluster
attribute, the system will spend less time page faulting because more pages
will be in memory.
This will increase the peak workload performance, but will
consume more memory and decrease the total system workload performance.
Increasing the value of the
vm_max_wrpgio_kluster
attribute improves the peak workload performance and conserves memory, but
may cause more page ins and decrease the total system workload performance.
You cannot modify the
vm_max_rdpgio_kluster
and
vm_max_wrpgio_kluster
attributes without rebooting the system.
When to Tune
You may want to increase the size of the page-in clusters if you have a large-memory system and you are swapping processes. You may want to increase the size of the page-out clusters if you are paging, and you are swapping processes.
Recommended Values
The default value of the
vm_max_rdpgio_kluster
attribute
is 16384 bytes (2 pages).
The default value of the
vm_max_wrpgio_kluster
attribute is 32768 bytes (4 pages).
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.5.9 Increasing the Swap I/O Queue Depth for Page Ins and Swap Outs
Synchronous swap buffers
are used for page-in page faults and for swap outs.
For each swap device,
the
vm
subsystem attribute
vm_syncswapbuffers
specifies the maximum swap device I/O queue depth for page ins
and swap outs.
Performance Benefit and Tradeoff
Increasing the value of the
vm_syncswapbuffers
attribute
increases overall system throughput, but it consumes memory.
You can modify the
vm_syncswapbuffers
attribute without
rebooting the system.
When to Tune
Usually, you do not need to increase the swap I/O queue depth.
Recommended Values
The default value of the
vm_syncswapbuffers
attribute
is 128.
The value should be equal to the approximate number of simultaneously
running processes that the system can easily handle.
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.5.10 Decreasing the Swap I/O Queue Depth for Page Ins and Swap Outs
Synchronous swap buffers
are used for page-in page faults and for swap outs.
The
vm
subsystem attribute
vm_syncswapbuffers
specifies the maximum
swap device I/O queue depth for page ins and swap outs.
Performance Benefit and Tradeoff
Decreasing the value of the
vm_syncswapbuffers
attribute
decreases memory demands and improves interactive response time, but it decreases
overall system throughput.
You can modify the
vm_syncswapbuffers
attribute without
rebooting the system.
When to Tune
Usually, you do not have to decrease the swap I/O queue depth.
Recommended Values
The default value of the
vm_syncswapbuffers
attribute
is 128.
The value should be equal to the approximate number of simultaneously
running processes that the system can easily handle.
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.5.11 Increasing the Swap I/O Queue Depth for Page Outs
Asynchronous swap
buffers are used for asynchronous page outs and for prewriting modified pages.
The
vm
subsystem attribute
vm_asyncswapbuffers
controls the maximum depth of the swap device I/O queue for page
outs.
Performance Benefit and Tradeoff
Increasing the value of the
vm_asyncswapbuffers
attribute
will free memory and increase the overall system throughput.
You can modify the
vm_asyncswapbuffers
attribute
without rebooting the system.
When to Tune
If you are using LSM, you may want to increase the page-out rate.
Be careful if you increase the value of the
vm_asyncswapbuffers
attribute, because this will cause page-in requests to lag asynchronous page-out
requests.
Recommended Values
The default value of the
vm_asyncswapbuffers
attribute
is 4.
You can specify a value that is the approximate number of I/O transfers
that a swap device can handle at one time.
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.5.12 Decreasing the Swap I/O Queue Depth for Page Outs
Asynchronous swap
buffers are used for asynchronous page outs and for prewriting modified pages.
The
vm
subsystem attribute
vm_asyncswapbuffers
controls the maximum depth of the swap device I/O queue for page
outs.
Performance Benefit and Tradeoff
Decreasing the
vm_asyncswapbuffers
attribute will
use more memory, but it will improve the interactive response time.
You can modify the
vm_asyncswapbuffers
attribute
without rebooting the system.
When to Tune
Usually, you do not need to decrease the swap I/O queue depth.
Recommended Values
The default value of the
vm_asyncswapbuffers
attribute
is 4.
You can specify a value that is the approximate number of I/O transfers
that a swap device can handle at one time.
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.6 Reserving Physical Memory for Shared Memory
Granularity hints allow you to reserve a portion of dynamically wired physical memory at boot time for shared memory. This functionality allows the translation lookaside buffer to map more than a single page, and enables shared page table entry functionality, which may result in more cache hits.
On some database servers, using granularity hints provides a 2 to 4 percent run-time performance gain that reduces the shared memory detach time. See your database application documentation to determine if you should use granularity hints.
For most applications, use the Segmented Shared Memory (SSM) functionality (the default) instead of granularity hints.
To enable granularity hints, you must
specify a value for the
vm
subsystem attribute
gh_chunks
.
In addition, to make granularity hints more effective,
modify applications to ensure that both the shared memory segment starting
address and size are aligned on an 8-MB boundary.
Section 6.6.1
and
Section 6.6.2
describe
how to enable granularity hints.
6.6.1 Tuning the Kernel to Use Granularity Hints
To use granularity hints, you must specify the number of 4-MB chunks of physical memory to reserve for shared memory at boot time. This memory cannot be used for any other purpose and cannot be returned to the system or reclaimed.
To reserve memory for shared memory, specify a nonzero value for the
gh_chunks
attribute.
For example, if you want to reserve 4 GB of
memory, specify 1024 for the value of
gh_chunks
(1024 *
4 MB = 4 GB).
If you specify a value of 512, you will reserve 2 GB of memory.
The value you specify for the
gh_chunks
attribute
depends on your database application.
Do not reserve an excessive amount of
memory, because this decreases the memory available to processes and the UBC.
Note
If you enable granularity hints, disable the use of segmented shared memory by setting the value of the
ipc
subsystem attribute
ssm_threshold
to zero.
You can determine if you have reserved the appropriate amount of memory.
For example, you can initially specify 512 for the value of the
gh_chunks
attribute.
Then, invoke the following sequence of
dbx
commands while running the application that allocates shared
memory:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) px &gh_free_counts
0xfffffc0000681748
(dbx) 0xfffffc0000681748/4X
fffffc0000681748:  0000000000000402 0000000000000004
fffffc0000681758:  0000000000000000 0000000000000002
(dbx)
The previous output shows the following:
The first number (402) specifies the number of 512-page chunks (4 MB).
The second number (4) specifies the number of 64-page chunks.
The third number (0) specifies the number of 8-page chunks.
The fourth number (2) specifies the number of 1-page chunks.
To save memory, you can reduce the value of the
gh_chunks
attribute until only one or two 512-page chunks are free while the application
that uses shared memory is running.
The following
vm
subsystem attributes also affect
granularity hints:
gh_min_seg_size
Specifies the shared memory segment size above which memory is allocated from the memory reserved by the gh_chunks attribute. The default is 8 MB.
gh_fail_if_no_mem
When set to 1 (the default), the shmget function returns a failure if the requested segment size is larger than the value specified by the gh_min_seg_size attribute and there is insufficient memory in the gh_chunks area to satisfy the request. If the value of the gh_fail_if_no_mem attribute is 0, the entire request will be satisfied from the pageable memory area if the request is larger than the amount of memory reserved by the gh_chunks attribute.
gh_keep_sorted
Specifies whether the memory reserved for granularity hints is sorted. The default does not sort reserved memory.
gh_front_alloc
Specifies whether the memory reserved for granularity hints is allocated from low physical memory addresses (the default). This functionality is useful if you have an odd number of memory boards.
In addition, messages indicating unaligned size and attach address requests are displayed on the system console. The unaligned attach messages are limited to one per shared memory segment.
See
Section 3.6
for information about modifying
kernel subsystem attributes.
6.6.2 Modifying Applications to Use Granularity Hints
You can make granularity hints more effective by making both the shared memory segment starting address and size aligned on an 8-MB boundary.
To share third-level page table entries, the shared memory segment attach
address (specified by the
shmat
function) and the shared
memory segment size (specified by the
shmget
function)
must be aligned on an 8-MB boundary.
This means that the lowest 23 bits of
both the address and the size must be zero.
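As an illustration only (this example is not from this manual; the key, segment size, and attach address are arbitrary example values), the following C fragment creates and attaches a shared memory segment whose size and attach address both have their lowest 23 bits clear:
/* Sketch: 8-MB-aligned shared memory segment size and attach address.
 * SEG_SIZE and ATTACH_ADDR are illustrative values only. */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>

#define GH_ALIGN    (8UL * 1024 * 1024)            /* 8-MB boundary */
#define SEG_SIZE    (64UL * 1024 * 1024)           /* 64 MB, a multiple of 8 MB */
#define ATTACH_ADDR ((void *)0x0000040000000000UL) /* example 8-MB-aligned address */

int
main(void)
{
    int shmid;
    void *addr;

    /* Both values must have their lowest 23 bits clear. */
    if ((SEG_SIZE & (GH_ALIGN - 1UL)) != 0 ||
        ((unsigned long)ATTACH_ADDR & (GH_ALIGN - 1UL)) != 0) {
        fprintf(stderr, "segment size or attach address is not 8-MB aligned\n");
        return 1;
    }

    shmid = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }

    addr = shmat(shmid, ATTACH_ADDR, 0);
    if (addr == (void *)-1) {
        perror("shmat");
        return 1;
    }

    /* ... use the segment ... */
    return 0;
}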
The attach address and the shared memory segment size are specified by
the application.
In addition, System V shared memory semantics allow a maximum
shared memory segment size of 2 GB minus 1 byte.
Applications that need shared
memory segments larger than 2 GB can construct these regions by using multiple
segments.
In this case, the total shared memory size specified by the user
to the application must be 8-MB aligned.
In addition, the value of the
shm_max
attribute, which specifies the maximum size of a System
V shared memory segment, must be 8-MB aligned.
If the total shared memory size specified to the application is greater
than 2 GB, you can specify a value of 2139095040 (or 0x7f800000) for the value
of the
shm_max
attribute.
This is the maximum value (2
GB minus 8 MB) that you can specify for the
shm_max
attribute
and still share page table entries.
Use the following
dbx
command sequence to determine
if page table entries are being shared:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) p *(vm_granhint_stats *)&gh_stats_store
struct {
    total_mappers = 21
    shared_mappers = 21
    unshared_mappers = 0
    total_unmappers = 21
    shared_unmappers = 21
    unshared_unmappers = 0
    unaligned_mappers = 0
    access_violations = 0
    unaligned_size_requests = 0
    unaligned_attachers = 0
    wired_bypass = 0
    wired_returns = 0
}
(dbx)
For the best performance, the
shared_mappers
kernel
variable should be equal to the number of shared memory segments, and the
unshared_mappers
,
unaligned_attachers
, and
unaligned_size_requests
variables should be zero.
Because of how shared memory is divided into shared memory segments,
there may be some unshared segments.
This occurs when the starting address
or the size is not aligned on an 8-MB boundary.
This condition may be unavoidable
in some cases.
In many cases, the value of
total_unmappers
will be greater than the value of
total_mappers
.
Shared memory locking uses a hashed array of locks rather than a single lock.
You can change the size of this hashed array by modifying the value of the
vm
subsystem attribute
vm_page_lock_count
.
The default value is zero.