This chapter describes how the DIGITAL UNIX operating system uses the physical memory installed in the system. This chapter also describes how to configure and tune virtual memory, swap space, and buffer caches. Many of the tuning tasks described in this chapter require you to modify system attributes. See Section 2.11 for more information.
The total amount of physical memory is determined by the capacity of the memory boards installed in your system. The system distributes this memory in 8-KB units called pages.
The system distributes pages of physical memory among three areas:
Wired memory
At boot time, the operating system and the Privileged Architecture Library (PAL) code wire a contiguous portion of physical memory in order to perform basic system operations. Static wired memory is reserved for operating system data and text, system tables, the metadata buffer cache, which temporarily holds recently accessed UNIX File System (UFS) and CD-ROM File System (CDFS) metadata, and the Advanced File System (AdvFS) buffer cache. Static wired memory cannot be reclaimed through paging. You can reduce the amount of static wired memory only by removing subsystems.
In addition, the kernel uses dynamically wired memory for dynamically allocated data structures, and user processes also wire memory for address space. The amount of dynamically wired memory varies according to the demand. The maximum amount is specified by the value of the vm-syswiredpercent attribute (the default is 80 percent of physical memory). Memory that is dynamically wired cannot be reclaimed through paging. You increase the amount of dynamically wired memory by allocating more kernel resources to processes (for example, by increasing the value of the maxusers attribute). An example of checking wired memory appears after this list.
Virtual memory
The virtual memory subsystem uses a portion of physical memory to cache processes' most-recently accessed anonymous memory and file-backed memory. The subsystem efficiently allocates memory to competing processes and tracks the distribution of all the physical pages. This memory can be reclaimed through paging.
Unified Buffer Cache
The Unified Buffer Cache (UBC) uses a portion of physical memory to cache most-recently accessed file system data. The UBC contains actual file data for reads and writes and for page faults from mapped file regions and also AdvFS metadata. By functioning as a layer between the operating system and the storage subsystem, the UBC can decrease the number of disk operations. This memory can be reclaimed through paging.
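On a running system you can see how much memory is currently wired and what limit applies to dynamic wiring. The following is a minimal sketch (the wire column of the vmstat output reports wired pages in 8-KB units, and the sysconfig query assumes the attribute name shown above; verify both on your system):

    # vmstat 3 2
    # sysconfig -q vm vm-syswiredpercent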
Figure 4-1 shows how physical memory is used.
The virtual memory subsystem and the UBC compete for the physical pages that are not wired. Pages are allocated to processes and to the UBC, as needed. When the demand for memory increases, the oldest (least-recently used) pages are reclaimed from the virtual memory subsystem and the UBC and reused. Various attributes control the amount of memory available to the virtual memory subsystem and the UBC and the rate of page reclamation. Wired pages are not reclaimed.
System performance depends on the total amount of physical memory and also the distribution of memory resources. DIGITAL UNIX allows you to control the allocation of memory (other than static wired memory) by modifying the values of system attributes. Tuning memory usually involves the following tasks:
Increasing system resource allocation to improve application performance
Modifying how the system allocates memory and the rate of page reclamation
Modifying how file system data is cached in memory
You can also configure your swap space for optimal performance. However, to determine how to obtain the best performance, you must understand your workload characteristics, as described in Chapter 1.
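Most of the attributes discussed in this chapter are examined and modified with the sysconfig command or made permanent in the /etc/sysconfigtab file, as described in Section 2.11. A minimal sketch, assuming the attribute supports run-time reconfiguration (the value shown is only an illustration):

    # sysconfig -q vm ubc-maxpercent
    # sysconfig -r vm ubc-maxpercent=70

To preserve a change across reboots, place an equivalent stanza in /etc/sysconfigtab:

    vm:
        ubc-maxpercent = 70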
When programs are executed, the system moves data and instructions among various caches, physical memory, and disk swap space. Accessing the data and instructions occurs at different speeds, depending on the location. Table 4-1 describes the various hardware resources (in the order of fastest to slowest access time).
Resource | Description |
CPU caches | Various caches reside in the CPU chip and vary in size up to a maximum of 64 KB (depending on the type of processor). These caches include the translation lookaside buffer, the high-speed internal virtual-to-physical translation cache, the high-speed internal instruction cache, and the high-speed internal data cache. |
Secondary cache | The secondary direct-mapped physical data cache is external to the CPU, but usually resides on the main processor board. Block sizes for the secondary cache vary from 32 bytes to 256 bytes (depending on the type of processor). The size of the secondary cache ranges from 128 KB to 8 MB. |
Tertiary cache | The tertiary cache is not available on all Alpha CPUs; otherwise, it is identical to the secondary cache. |
Physical memory | The actual amount of physical memory varies. |
Swap space | Swap space consists of one or more disks or disk partitions (block special devices). |
The hardware logic and the PAL code control much of the movement of addresses and data among the CPU cache, the secondary and tertiary caches, and physical memory. This movement is transparent to the operating system. Figure 4-2 shows an overview of how instructions and data are moved among various hardware components during program execution.
Movement between caches and physical memory is significantly faster than movement between disk and physical memory, because of the relatively slow speed of disk I/O. Therefore, avoid paging and swapping operations, and ensure that applications utilize the caches whenever possible. Figure 4-3 shows the amount of time that it takes to access data and instructions from various hardware locations.
For more information on the CPU, secondary cache, and tertiary cache, see the Alpha Architecture Reference Manual.
The virtual memory subsystem performs the following functions:
Allocates memory to processes
Tracks and manages all the pages in the system
Uses paging and swapping to ensure that there is enough memory for processes to run and to cache file system I/O
The following sections describe these functions in detail.
For each process, the fork system call performs the following tasks:
Creates a UNIX process body, which includes a set of data structures that the kernel uses to track the process and a set of resource limitations. See fork(2) for more information.
Allocates a contiguous block of virtual address space, which is the array of pages that an application can map into physical memory. Virtual address space is used for anonymous memory (memory used for the stack, heap, or malloc function) and for file-backed memory (memory used for program text or shared libraries). Pages of anonymous memory are paged in when needed and paged out when pages must be reclaimed. Pages of file-backed memory are paged in when needed and released when pages must be reclaimed.
Creates one or more threads of execution. The default is one thread for each process. Multiprocessing systems support multiple process threads.
Because memory is limited, a process' entire virtual address space cannot be in physical memory at one time. However, a process can execute when only a portion of its virtual address space (its working set) is mapped to physical memory.
For each process, the virtual memory subsystem allocates a large amount of virtual address space but uses only part of this space. Only 4 TB is allocated for user space. User space is generally private and maps to a nonshared physical page. An additional 4 TB of virtual address space is used for kernel space. Kernel space usually maps to shared physical pages. The remaining space is not used for any purpose.
In addition, user space is sparsely populated with valid pages. Only valid pages are able to map to physical pages. The vm-maxvas attribute specifies the maximum amount of valid virtual address space for a process (that is, the sum of all the valid pages). The default is 131072 pages (1 GB).
Figure 4-4 shows the use of process virtual address space.
When a virtual page is touched or accessed, the virtual memory subsystem must locate the physical page and then translate the virtual address into a physical address. Each process has a page table, which is an array containing an entry for each current virtual-to-physical address translation. Page table entries have a direct relation to virtual pages (that is, virtual address 1 corresponds to page table entry 1) and contain a pointer to the physical page and protection information.
Figure 4-5 shows the translation of a virtual address into a physical address.
A process' resident set is the complete set of all the virtual addresses that have been mapped to physical addresses (that is, all the pages that have been accessed during process execution). Resident set pages may be shared among multiple processes. A process' working set is the set of virtual addresses that are currently mapped to physical addresses. The working set is a subset of the resident set and represents a snapshot of the process' resident set.
When a nonfile-backed virtual address is requested, the virtual memory subsystem locates the physical page and makes it available to the process. This process occurs at different speeds, depending on the location of the page (see Figure 4-3).
If a requested address is currently being used (active), it will have an entry in the page table. In this case, the PAL code loads the physical address into the translation lookaside buffer, which then passes the address to the CPU.
If a requested address is not active in the page table, the PAL lookup code issues a page fault, which instructs the virtual memory subsystem to locate the page and make the virtual-to-physical address translation in the page table.
If a requested virtual address is being accessed for the first time, the virtual memory subsystem performs the following tasks:
Allocates an available page of physical memory.
Fills the page with zeros.
Enters the virtual-to-physical address translation in the page table.
This is called a zero-filled-on-demand page fault.
If a requested virtual address has already been accessed, it will be in one of the following locations:
The virtual memory subsystem's internal data structures
If the physical address is located in the internal data structures (for example, the hash queue list or the page queue list), the virtual memory subsystem enters the virtual-to-physical address translation in the page table. This is called a short page fault.
Swap space
If the virtual address has already been accessed, but the physical page has been reclaimed, the page contents will be found in swap space. The virtual memory subsystem copies the contents of the page from swap space into the physical address and enters the virtual-to-physical address translation in the page table. This is called a page-in page fault.
If a process needs to modify a read-only virtual page, the virtual memory subsystem allocates an available page of physical memory, copies the read-only page into the new page, and enters the translation in the page table. This is called a copy-on-write page fault.
To improve process execution time and decrease the number of page faults, the virtual memory subsystem attempts to anticipate which pages the task will need next. Using an algorithm that checks which pages were most recently used, the number of available pages, and other factors, the subsystem maps additional pages, along with the page that contains the requested address.
The virtual memory subsystem also uses page coloring to reduce execution time. If possible, the subsystem attempts to map a process' entire resident set into the secondary cache. If the entire task (its text and data) executes within the cache, addresses do not have to be fetched from physical memory.
The private-cache-percent attribute specifies the percentage of the cache that is reserved for anonymous (nonshared) memory. The default is to reserve 50 percent of the cache for anonymous memory and 50 percent for file-backed (shared) memory. To cache more anonymous memory, increase the value of the private-cache-percent attribute. This attribute is primarily used for benchmarking.
The virtual memory subsystem allocates physical pages to processes and the UBC, as needed. Because physical memory is limited, these pages must be periodically reclaimed so that they can be reused.
The virtual memory subsystem uses page lists to track the location and age of all the physical memory pages. At any one time, each physical page can be found on one of the following lists:
Free list--Pages that are clean and are not being used (the size of this list controls when page reclamation occurs)
Active list--Pages that are being used by the virtual memory subsystem or the UBC
To determine which pages should be reclaimed first, the page-stealer daemon identifies the oldest pages on the active list and designates these least-recently used (LRU) pages as follows:
Inactive pages are the oldest pages that are being used by the virtual memory subsystem.
UBC LRU pages are the oldest pages that are being used by the UBC.
Use the vmstat command or dbx to determine the number of pages that are on the page lists. Remember that pages on the active list (the act field in the vmstat output) include both inactive and UBC LRU pages.
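For example, a minimal sketch of watching the page lists (in the vmstat output, the act, free, and wire columns report active, free, and wired pages; the dbx invocation is the kernel-debugger form used elsewhere in this manual):

    # vmstat 3 5
    # dbx -k /vmunix /dev/mem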
As physical pages are allocated to processes and the UBC, the free list becomes depleted, and pages must be reclaimed in order to replenish the list. To reclaim pages, the virtual memory subsystem does the following:
Prewrites the oldest dirty (modified) pages to swap space
Uses paging to reclaim individual pages
Uses swapping to suspend processes and reclaim a large number of pages
See Section 4.3.5, Section 4.3.6, Section 4.3.8, and Section 4.3.9 for more information about prewriting pages, paging, and swapping.
The virtual memory subsystem attempts to prevent a memory shortage by prewriting modified pages to swap space.
When the virtual memory subsystem anticipates that the pages on the free list will soon be depleted, it prewrites to swap space the oldest modified (dirty) inactive pages. The value of the vm-page-prewrite-target attribute determines the number of pages that the subsystem will prewrite and keep clean. The default value is 256 pages.
In addition, when the number of modified UBC LRU pages exceeds the value of the vm-ubcdirtypercent attribute, the virtual memory subsystem prewrites the oldest modified UBC LRU pages. The default value of the vm-ubcdirtypercent attribute is 10 percent of the total UBC LRU pages.
To minimize the impact of sync (steady state flushes) when prewriting UBC pages, the ubc-maxdirtywrites attribute specifies the maximum number of disk writes that the kernel can perform each second. The default value is 5.
See Section 4.7.13 for more information about prewriting dirty pages.
When the demand for memory depletes the free list, paging begins. The virtual memory subsystem takes the oldest inactive and UBC LRU pages, moves the contents of the modified pages to swap space, and puts the clean pages on the free list, where they can be reused.
If the free page list cannot be replenished by reclaiming individual pages, swapping begins. Swapping temporarily suspends processes and moves entire resident sets to swap space, which frees large amounts of physical memory.
The point at which paging and swapping start and stop depends on the values of some virtual memory subsystem attributes. Figure 4-6 shows the default values of these attributes.
Detailed descriptions of the attributes are as follows:
vm-page-free-target--Paging starts when the number of pages on the free list is less than this value (the default is 128 pages).
vm-page-free-min--Specifies the threshold at which a page must be reclaimed for each page allocated (the default is 20 pages).
vm-page-free-swap--Idle task swapping starts when the number of pages on the free list is less than this value for a period of time (the default is 74 pages).
vm-page-free-optimal--Hard swapping starts when the number of pages on the free list is less than this value for five seconds (the default is 74 pages). The first processes to be swapped out include those with the lowest scheduling priority and those with the largest resident set size.
vm-page-free-hardswap--Swapping stops when the number of pages on the free list is more than this value (the default is 1280 pages).
vm-page-free-reserved--Only privileged tasks can get memory when the number of pages on the free list is less than this value (the default is 10 pages).
See Section 4.3.8 and Section 4.3.9 for information about paging and swapping operations.
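A quick way to review the current paging and swapping thresholds is to query the vm subsystem. The following is a minimal sketch (the attribute names are those described above; the exact output format may vary):

    # sysconfig -q vm vm-page-free-target vm-page-free-min vm-page-free-swap \
        vm-page-free-optimal vm-page-free-hardswap vm-page-free-reserved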
Because the UBC shares with the virtual memory subsystem the physical pages that are not wired by the kernel, the allocation of memory to the UBC can affect file system performance and paging and swapping activity. The UBC is dynamic and consumes varying amounts of memory in order to respond to changing file system demands.
Figure 4-7 shows how memory is allocated to the UBC.
The following attributes control the amount of memory available to the UBC:
ubc-minpercent attribute
Specifies the minimum percentage of memory that the UBC can utilize. The default is 10 percent.
ubc-maxpercent attribute
Specifies the maximum percentage of memory that the UBC can utilize. The default is 100 percent.
ubc-borrowpercent attribute
Specifies the UBC borrowing threshold. The default is 20 percent. From the value of the ubc-borrowpercent attribute to the value of the ubc-maxpercent attribute, the UBC is only borrowing memory from the virtual memory subsystem. When paging starts, pages are first reclaimed from the UBC until the amount of memory allocated to the UBC reaches the value of the ubc-borrowpercent attribute.
When the memory demand is high and the number of pages on the free page list reaches the value of the vm-page-free-target attribute, the virtual memory subsystem uses paging to replenish the free page list. The page reclamation code controls paging and swapping. The page-out daemon and task swapper daemon are extensions of the page reclamation code. See Section 4.3.6 for more information about the attributes that control paging and swapping.
The page reclamation code activates the page-stealer daemon, which first reclaims the pages that the UBC has borrowed from the virtual memory subsystem, until the size of the UBC reaches the borrowing threshold (the default is 20 percent). If the reclaimed pages are dirty (modified), their contents must be written to disk before the pages can be moved to the free page list. Freeing borrowed UBC pages is a fast way to reclaim pages, because UBC pages are usually unmodified. See Section 4.3.7 for more information about UBC borrowed pages.
If freeing UBC borrowed memory does not sufficiently replenish the free list, a pageout occurs. The page-stealer daemon reclaims the oldest inactive and UBC LRU pages.
Paging becomes increasingly aggressive if the number of free pages continues to decrease. If the number of pages on the free page list falls below the value of the vm-page-free-min attribute (the default is 20 pages), a page must be reclaimed for each page allocated. To prevent deadlocks, if the number of pages on the free page list falls below the value of the vm-page-free-reserved attribute (the default is 10 pages), only privileged tasks can get memory until the free page list is replenished. Paging stops when the number of pages on the free list reaches the value of the vm-page-free-target attribute.
If paging individual pages does not replenish the free list, swapping is used to free a large amount of memory. See Section 4.3.9 for more information.
Figure 4-8 shows the movement of pages during paging operations.
If there is a high demand for memory, the virtual memory subsystem may be unable to replenish the free list by reclaiming pages. Swapping reduces the demand for physical memory by suspending processes, which dramatically increases the number of pages on the free list. To swap out a process, the task swapper suspends the process, writes its resident set to swap space, and moves the clean pages to the free list.
Idle task swapping begins when the number of pages on the free list falls below the value of the vm-page-free-swap attribute for a period of time (the default is 74 pages). The task swapper suspends all tasks that have been idle for 30 seconds or more.
If the number of pages on the free list falls below the value of the vm-page-free-optimal attribute (the default is 74 pages) for more than five seconds, hard swapping begins. The task swapper suspends, one at a time, the tasks with the lowest priority and the largest resident set size. Swapping stops when the number of pages on the free list reaches the value of the vm-page-free-hardswap attribute (the default is 1280 pages).
A swapin occurs when the number of pages on the free list reaches the value of the vm-page-free-optimal attribute for a period of time. The task's working set is paged in from swap space, and the task can then execute. The value of the vm-inswappedmin attribute specifies the minimum amount of time, in seconds, that a task must remain in the inswapped state before it can be outswapped. The default value is 1 second.
Swapping has a serious impact on system performance. You can modify the attributes described in Section 4.3.6 to control when swapping starts and stops.
Increasing the rate of swapping (swapping earlier during page reclamation) increases throughput: as more processes are swapped out, fewer processes compete for memory, so the processes that remain resident can accomplish more work. Although increasing the rate of swapping moves long-sleeping threads out of memory and frees memory, it degrades interactive response time, because an outswapped process incurs a long latency when it is needed again.
If you decrease the rate of swapping (swap later during page reclamation), you will improve interactive response time, but at the cost of throughput.
To facilitate the movement of data between memory and disk, the virtual memory subsystem uses synchronous and asynchronous swap buffers. The virtual memory subsystem uses these two types of buffers to immediately satisfy a page-in request without having to wait for the completion of a page-out request, which is a relatively slow process.
Synchronous swap buffers are used for page-in page faults and for swap outs. Asynchronous swap buffers are used for asynchronous pageouts and for prewriting modified pages. See Section 4.7.15 and Section 4.7.16 for tuning information.
The DIGITAL UNIX operating system uses the Unified Buffer Cache (UBC) as a layer between the operating system and disk. The UBC holds actual file data, which includes reads and writes from conventional file activity and page faults from mapped file sections, and AdvFS metadata. The cache can improve I/O performance by decreasing the number of disk I/O operations.
The UBC shares with the virtual memory subsystem the physical pages that are not wired by the kernel. The maximum and minimum percentages of memory that the UBC can utilize are specified by the ubc-maxpercent attribute (the default is 100 percent) and the ubc-minpercent attribute (the default is 10 percent). In addition, the ubc-borrowpercent attribute specifies the percentage of memory allocated to the UBC above which the memory is only borrowed from the virtual memory subsystem. The default is 20 percent of physical memory. See Section 4.3.7 for more information.
The UBC is dynamic and consumes varying amounts of memory in order to respond to changing file system demands. For example, if file system activity is heavy, pages will be allocated to the UBC up to the value of the ubc-maxpercent attribute. In contrast, heavy process activity, such as large increases in the working sets for large executables, will cause the virtual memory subsystem to reclaim UBC borrowed pages. Figure 4-7 shows the allocation of physical memory to the UBC.
The UBC uses a hashed list to quickly locate the physical pages that it is holding. A hash table contains file and offset information that is used to speed lookup operations.
The UBC also uses a buffer to facilitate the movement of data between memory and disk. The vm-ubcbuffers attribute specifies the maximum file system device I/O queue depth for writes (that is, the number of UBC I/O requests that can be outstanding). See Section 4.7.17 for tuning information.
The metadata buffer cache is part of kernel wired memory and is used to cache only UFS and CDFS metadata, which includes file header information, superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries. The DIGITAL UNIX operating system uses the metadata buffer cache as a layer between the operating system and disk. The cache can improve I/O performance by decreasing disk I/O operations.
The metadata buffer cache is configured at boot time and uses bcopy routines to move data in and out of memory. The size of the metadata buffer cache is specified by the value of the bufcache attribute. See Section 4.9 for tuning information.
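Because the metadata buffer cache is sized at boot time, the bufcache attribute must be set in /etc/sysconfigtab and takes effect only after a reboot. A minimal sketch, assuming the attribute belongs to the vfs subsystem stanza (verify the subsystem name with sysconfig -q; the value shown is only an illustration):

    vfs:
        bufcache = 3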
The following sections describe how to configure memory and swap space, which includes the following tasks:
Determining how much physical memory your system requires (Section 4.6.1)
Determining how much swap space you need (Section 4.6.2)
Choosing a swap space allocation mode (Section 4.6.3)
This section describes how to determine your system's memory requirements. The amount of memory installed in your system must be able to provide an acceptable level of user and application performance.
To determine your system's memory requirements, you must gather the following information:
The amount of memory that will be wired
The amount of memory that the virtual memory subsystem requires to cache the anonymous regions of process data
The amount of memory that the UBC requires to cache file system data
See Section 4.6.2 for information about swap space requirements.
Your system's performance depends on the swap space configuration. DIGITAL recommends a minimum of 128 MB for swap space.
To calculate the swap space required by your system and workload, compare the total modifiable virtual address space (anonymous memory) required by your processes with the total amount of physical memory. Modifiable virtual address space holds data elements and structures that are modified during process execution, such as heap space, stack space, and data space.
To calculate swap space requirements if you are using immediate mode, total the anonymous memory requirements for all processes and then add 10 percent of that value. If you are using deferred mode, total the anonymous memory requirements for all processes and then divide by two.
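For example, a hypothetical sizing calculation (the 2048-MB figure is an assumption used only for illustration):

    # echo "2048 + 2048 / 10" | bc
    2252
    # echo "2048 / 2" | bc
    1024

With 2048 MB of anonymous memory, immediate mode calls for roughly 2252 MB of swap space, and deferred mode for roughly 1024 MB.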
Application messages, such as the following, usually indicate that not enough swap space is configured into the system or that a process limit has been reached:
lack of paging space
swap space below 10 percent free
Use multiple disks for swap space. The page reclamation code uses a form of disk striping (known as swap space interleaving) so that pages can be written to the multiple disks. To optimize swap space, ensure that all your swap disks are configured when you boot the system, instead of adding swap space while the system is running.
Use the swapon -s command to display your swap space configuration. The first line displayed is the total allocated swap space. Use the iostat command to display disk usage.
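For example (a minimal sketch; the exact report format depends on your configuration):

    # swapon -s
    # iostat 5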
The following list describes how to configure swap space for high performance:
Configure all of your swap space at boot time
Use fast disks for swap space to decrease page fault latency
Do not use busy disks for swap space
Spread out your swap space across multiple disks (never put multiple swap partitions on the same disk)
Spread out your swap disks across multiple I/O buses to prevent a single bus from becoming a bottleneck
Use the Logical Storage Manager (LSM) to stripe your swap disks
See Chapter 5 for more information about configuring and tuning swap disks for high performance and availability.
There are two methods that you can use to allocate swap space. The methods differ in the point in time at which the virtual memory subsystem reserves swap space for a process. There is no performance benefit attached to either method; however, deferred mode is recommended for very-large memory/very-large database (VLM/VLDB) systems. The swap allocation methods are as follows:
Immediate mode--Swap space is reserved when modifiable virtual address space is created. Immediate mode is often referred to as eager mode and is the default swap space allocation mode.
Anonymous memory is memory that is not backed by a file, but is backed by swap space (for example, stack space, heap space, and memory allocated by the malloc or sbrk routines). When anonymous memory is allocated, the operating system reserves swap space for the memory. Usually, this results in an unnecessary amount of reserved swap space. Immediate mode requires more swap space than deferred mode, but it ensures that the swap space will be available to processes when it is needed.
Deferred mode--Swap space is not reserved until the virtual memory subsystem needs to write a modified virtual page to swap space. Deferred mode is sometimes referred to as lazy mode.
Deferred mode requires less swap space than immediate mode and causes the system to run faster because it requires less swap space bookkeeping. It postpones the reservation and allocation of swap space for anonymous memory until it is needed. However, because deferred mode does not reserve swap space in advance, the swap space may not be available when a task needs it, and the process may be killed asynchronously.
You can enable the deferred swap space allocation mode by removing or moving the /sbin/swapdefault file.
See the System Administration manual for more information on swap space allocation methods.
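For example, a minimal sketch of switching to deferred mode (the .eager suffix is a hypothetical name; moving the file instead of deleting it makes the change easy to reverse, and the new mode takes effect at the next reboot):

    # mv /sbin/swapdefault /sbin/swapdefault.eager
    # shutdown -r now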
The virtual memory subsystem is a primary source of performance problems. Performance may degrade if the virtual memory subsystem cannot keep up with the demand for memory and excessive paging and swapping occurs. A memory bottleneck may cause a disk I/O bottleneck, because excessive paging and swapping decreases performance and indicates that the natural working set size has exceeded the available memory. The virtual memory subsystem runs at a high priority when servicing page faults, which blocks the execution of other processes.
If you have excessive page-in and page-out activity from a swap partition, the system may have a high physical memory commitment ratio. Excessive paging also can increase the miss rate for the secondary cache, and may be indicated by the following output:
The output of the vmstat command shows a very low free page count or shows high page-in and page-out activity. See Section 2.4.2 for more information.
The output of the ps command shows high task swapping activity. See Section 2.4.1 for more information.
The output of the iostat command shows excessive swap disk I/O activity. See Section 2.5.1 for more information.
The tuning recommendations that will provide the best performance benefit involve the following two areas:
System resource allocation
Increasing the available address space
Increasing the kernel resources available to processes
Memory allocation and page reclamation
Modifying the percentage of memory allocated to the UBC
Changing the rate of swapping
Changing how the system prewrites modified inactive pages
Table 4-2 describes the primary tuning guidelines and lists the performance benefits as well as the tradeoffs.
Action | Performance Benefit | Tradeoff |
Reduce the number of processes running at the same time (Section 4.7.1) | Reduces demand for memory | None |
Reduce the static size of the kernel (Section 4.7.2) | Reduces demand for memory | None |
Increase the available address space (Section 4.7.3) | Improves performance for memory-intensive processes | Slightly increases the demand for memory |
Increase the available system resources (Section 4.7.4) | Improves performance for memory-intensive processes | Increases wired memory |
Increase the maximum number of memory-mapped files that are available to a process (Section 4.7.5) | Increases file mapping and improves performance for memory-intensive processes, such as Internet servers | Consumes memory |
Increase the maximum number of virtual pages within a process' address space that can have individual protection attributes (Section 4.7.6) | Improves performance for memory-intensive processes and for Internet servers that maintain large tables or resident images | Consumes memory |
Increase the size of a System V message and queue (Section 4.7.7) | Improves performance for memory-intensive processes | Consumes memory |
Increase the maximum size of a single System V shared memory region (Section 4.7.8) | Improves performance for memory-intensive processes | Consumes memory |
Increase the minimum size of a System V shared memory segment (Section 4.7.9) | Improves performance for VLM and VLDB systems | Consumes memory |
Reduce process memory requirements (Section 4.7.10) | Reduces demand for memory | None |
Reduce the amount of physical memory available to the UBC (Section 4.7.11) | Provides more memory resources to processes | May degrade file system performance |
Increase the rate of swapping (Section 4.7.12) | Frees memory and increases throughput | Decreases interactive response performance |
Decrease the rate of swapping (Section 4.7.12) | Improves interactive response performance | Decreases throughput |
Increase the rate of dirty page prewriting (Section 4.7.13) | Prevents drastic performance degradation when memory is exhausted | Decreases peak workload performance |
Decrease the rate of dirty page prewriting (Section 4.7.13) | Improves peak workload performance | May cause drastic performance degradation when memory is exhausted |
If the previous tasks do not sufficiently improve performance, there are advanced tuning tasks that you can perform. The advanced tuning tasks include the following:
Modify the sizes of the page-in and page-out clusters
Modify the swap device I/O queue depth
Modify the amount of memory the UBC uses to cache large files
Increase the paging threshold
Enable aggressive task swapping
Decrease the size of the file system caches
Reserve memory at boot time for shared memory
Table 4-3 describes the advanced tuning guidelines and lists the performance benefits as well as the tradeoffs.
Action | Performance Benefit | Tradeoff |
Increase the size of the page-in and page-out clusters (Section 4.7.14) | Improves peak workload performance | Decreases total system workload performance |
Decrease the size of the page-in and page-out clusters (Section 4.7.14) | Improves total system workload performance | Decreases peak workload performance |
Increase the swap device I/O queue depth for pageins and swapouts (Section 4.7.15) | Increases overall system throughput | Consumes memory |
Decrease the swap device I/O queue depth for pageins and swapouts (Section 4.7.15) | Improves the interactive response time and frees memory | Decreases system throughput |
Increase the swap device I/O queue depth for pageouts (Section 4.7.16) | Frees memory and increases throughput | Decreases interactive response performance |
Decrease the swap device I/O queue depth for pageouts (Section 4.7.16) | Improves interactive response time | Consumes memory |
Increase the UBC write device queue depth (Section 4.7.17) | Increases overall file system throughput and frees memory | Decreases interactive response performance |
Decrease the UBC write device queue depth (Section 4.7.17) | Improves interactive response time | Consumes memory |
Increase the amount of UBC memory used to cache a large file (Section 4.7.18) | Improves large file performance | May allow a large file to consume all the pages on the free list |
Decrease the amount of UBC memory used to cache a large file (Section 4.7.18) | Prevents a large file from consuming all the pages on the free list | May degrade large file performance |
Increase the paging threshold (Section 4.7.19) | Maintains performance when free memory is exhausted | May waste memory |
Enable aggressive swapping (Section 4.7.20) | Improves system throughput | Degrades interactive response performance |
Decrease the size of the metadata buffer cache (Section 4.7.21) | Provides more memory resources to processes on large systems | May degrade UFS performance |
Decrease the size of the namei cache (Section 4.7.22) | Decreases demand for memory | May slow lookup operations and degrade file system performance |
Decrease the amount of memory allocated to the AdvFS cache (Section 4.7.23) | Provides more memory resources to processes | May degrade AdvFS performance |
Reserve physical memory for shared memory (Section 4.7.24) | Improves shared memory detach time | Decreases the memory available to the virtual memory subsystem and the UBC |
The following sections describe these guidelines in detail.
You can improve performance and reduce the demand for memory by running fewer applications simultaneously. Use the at or the batch command to run applications at offpeak hours.
You can reduce the static size of the kernel by deconfiguring any unnecessary subsystems. Use the setld command to display the installed subsets and to delete subsets. Use the sysconfig command to display the configured subsystems and to delete subsystems.
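For example, a minimal sketch of listing what is installed and configured before deciding what to remove:

    # setld -i | more
    # sysconfig -s | more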
If your applications are memory-intensive, you may want to increase the available address space. Increasing the address space will cause only a small increase in the demand for memory. However, you may not want to increase the address space if your applications use many forked processes.
The following attributes determine the available address space for processes:
vm-maxvas
This attribute controls the maximum amount of virtual address space available to a process. The default value is 1 GB (1073741824 bytes). For Internet servers, you may want to increase this value to 10 GB.
per-proc-address-space and max-per-proc-address-size
These attributes control the maximum amount of user process address space, which is the maximum number of valid virtual regions. The default value for both attributes is 1 GB.
per-proc-stack-size and max-per-proc-stack-size
These attributes control the maximum size of a user process stack. The default value of the per-proc-stack-size attribute is 2097152 bytes. The default value of the max-per-proc-stack-size attribute is 33554432 bytes. You may need to increase these values if you receive cannot grow stack messages.
per-proc-data-size and max-per-proc-data-size
These attributes control the maximum size of a user process data segment. The default value of the per-proc-data-size attribute is 134217728 bytes. The default value of the max-per-proc-data-size attribute is 1 GB.
You can use the setrlimit function to control the consumption of system resources by a parent process and its child processes. See setrlimit(2) for information.
If your applications are memory-intensive, you may want to increase the system resources that are available to processes. Be careful when increasing the system resources, because this will increase the amount of wired memory in the system.
The following attributes affect system resources:
The maxusers attribute specifies the number of simultaneous users that a system can support without straining system resources. System algorithms use the maxusers attribute to size various system data structures and to determine the amount of space allocated to system tables, such as the system process table, which is used to determine how many active processes can be running at one time. The default value assigned to the maxusers attribute depends on the size of your system.
Increasing the value of the maxusers attribute allocates more system resources for use by the kernel. However, this also increases the amount of physical memory consumed by the kernel. Decreasing the value of the maxusers attribute reduces kernel memory usage, but allocates fewer system resources to processes.
If your system experiences a lack of resources (for example, Out of processes or No more processes messages), you can increase the value of the maxusers attribute to 512. If you have sufficient memory on a heavily loaded system (for example, more than 96 MB), you can increase the value of the maxusers attribute to 1024.
task-max
The task-max attribute specifies the maximum number of tasks that can run simultaneously. The default value is 20 + 8 * maxusers.
thread-max
The thread-max attribute specifies the maximum number of threads. The default value is 2 * task-max.
The max-proc-per-user attribute specifies the maximum number of processes that can be allocated at any one time to each user, except superuser. The default value of the max-proc-per-user attribute is 64. If your system experiences a lack of processes, you can increase the value of the max-proc-per-user attribute. The value must be more than the maximum number of processes that will be started by your system. If you have a Web server, these processes include CGI processes. If you plan to run more than 64 Web server daemons simultaneously, increase the attribute value to 512. On a very busy server with sufficient memory, you can use a higher value. Increasing this value can improve the performance of multiprocess Web servers.
The max-threads-per-user attribute specifies the maximum number of threads that can be allocated at any one time to each user, except superuser. The default value is 256. If your system, especially a Web server, experiences a lack of threads, you can increase the value of the max-threads-per-user attribute. The value must be more than the maximum number of threads that will be started by your system. You can increase the value of the max-threads-per-user attribute to 512. On a very busy server with sufficient memory, you can use a higher value, such as 4096. Increasing this value can improve the performance of multithreaded Web servers.
You can use the setrlimit function to control the consumption of system resources by a parent process and its child processes. See setrlimit(2) for information.
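For example, a minimal /etc/sysconfigtab sketch that raises several of these limits for a busy Web server. This assumes the attributes belong to the proc subsystem stanza and that the values shown (taken from the guidelines above) suit your workload; verify the subsystem name and current values with sysconfig -q before editing, and note that a change to maxusers takes effect only after a reboot:

    proc:
        maxusers = 512
        max-proc-per-user = 512
        max-threads-per-user = 512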
The vm-mapentries attribute specifies the maximum number of memory-mapped files in a user address space. Each map entry describes one unique disjoint portion of a virtual address space. The default value is 200. You may want to increase the value of the vm-mapentries attribute for VLM systems. Because Web servers map files into memory, you may want to increase the value to 20000 for busy systems running multithreaded Web server software. This will increase the limit on file mapping. This attribute affects all processes, and increasing its value will increase the demand for memory.
The vm-vpagemax attribute specifies the maximum number of virtual pages within a process' address space that can be given individual protection attributes. These protection attributes differ from the protection attributes associated with the other pages in the address space.
Changing the protection attributes of a single page within a virtual memory region causes all pages within that region to be treated as though they had individual protection attributes. For example, each thread of a multithreaded task has a user stack in the stack region for the process in which it runs. Because multithreaded tasks have guard pages (that is, pages that do not have read/write access) inserted between the user stacks for the threads, all pages in the stack region for the process are treated as though they have individual protection attributes.
The default value of the vm-vpagemax attribute is determined by dividing the value of the vm-maxvas attribute (the address space size in bytes) by 8192. If a stack region for a multithreaded task exceeds 16 KB pages, you may want to increase the value of the vm-vpagemax attribute. For example, if the value of the vm-maxvas attribute is 1 GB (the default), set the value of vm-vpagemax to 131072 pages (1073741824/8192 = 131072). This value improves the efficiency of Web servers that maintain large tables or resident images. You may want to increase the value of the vm-vpagemax attribute for VLM systems. However, this attribute affects all processes, and increasing its value will increase the demand for memory.
If your applications are memory-intensive or you have a VLM system, you may want to increase the value of the msg-max attribute. This attribute specifies the maximum size of a single System V message. However, increasing the value of this attribute will increase the demand for memory. The default value is 8192 bytes (1 page).
In addition, you may want to increase the value of the msg-tql attribute. This attribute specifies the maximum number of messages that can be queued to a single System V message queue at one time. However, increasing the value of this attribute will increase the demand for memory. The default value is 40.
If your applications are memory-intensive or you have a VLM system, you may want to increase the value of the shm-max attribute. This attribute specifies the maximum size of a single System V shared memory region. However, increasing the value of this attribute will increase the demand for memory. The default value is 4194304 bytes (512 pages).
In addition, you may want to increase the value of the shm-seg attribute. This attribute specifies the maximum number of System V shared memory regions that can be attached to a single process at any point in time. However, increasing the value of this attribute will increase the demand for memory. The default value is 32.
If your applications are memory-intensive, you may want to increase the value of the ssm-threshold attribute. Page table sharing occurs when the size of a System V shared memory segment reaches the value specified by this attribute. However, increasing the value of this attribute will increase the demand for memory.
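For example, a minimal /etc/sysconfigtab sketch for a large database system. This assumes the System V IPC attributes belong to the ipc subsystem stanza and that the values shown are only illustrations; verify the subsystem name and the limits your application actually needs before editing:

    ipc:
        msg-max = 16384
        msg-tql = 128
        shm-max = 268435456
        shm-seg = 128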
You may want to reduce your applications' use of memory to free memory for other purposes. Follow these coding considerations to reduce your applications' use of memory:
Configure and tune applications according to the guidelines provided by the application's installation procedure. For example, you may be able to reduce an application's anonymous memory requirements, set parallel/concurrent processing attributes, size shared global areas and private caches, and set the maximum number of open/mapped files.
Look for data cache collisions between heavily used data structures, which occur when the distance between two data structures allocated in memory is equal to the size of the primary (internal) data cache. If your data structures are small, you can avoid collisions by allocating them contiguously in memory. To do this, use a single malloc call instead of multiple calls.
If an application uses large amounts of data for a short time, allocate the data dynamically with the malloc function instead of declaring it statically. When you have finished using dynamically allocated memory, it is freed for use by other data structures that occur later in the program. If you have limited memory resources, dynamically allocating data reduces an application's memory usage and can substantially improve performance.
If an application uses the malloc function extensively, you may be able to improve its processing speed or decrease its memory utilization by using the function's control variables to tune memory allocation. See malloc(3) for details on tuning memory allocation.
If your application fits in a 32-bit address space and allocates large amounts of dynamic memory by using structures that contain many pointers, you may be able to reduce memory usage by using the -xtaso flag. The -xtaso flag is supported by all versions of the C compiler (the -newc, -migrate, and -oldc versions). To use the -xtaso flag, modify your source code with a C-language pragma that controls pointer size allocations. See cc(1) for details.
See the Programmer's Guide for more information on process memory allocation.
You may be able to improve performance by reducing the maximum percentage of memory available for the UBC. If you decrease the maximum size of the UBC, you increase the amount of memory available to the virtual memory subsystem, which may reduce the paging and swapping rate. However, reducing the memory allocated to the UBC may adversely affect I/O performance because the UBC will hold less file system data, which results in more disk I/O operations. Therefore, do not significantly decrease the maximum size of the UBC.
The maximum amount of memory that can be allocated to the UBC is specified by the ubc-maxpercent attribute. The default is 100 percent. The minimum amount of memory that can be allocated to the UBC is specified by the ubc-minpercent attribute. The default is 10 percent. If you have an Internet server, use these default values.
If the page-out rate is high and you are not using the file system heavily, decreasing the value of the ubc-maxpercent attribute may reduce the rate of paging and swapping. Start with the default value of 100 percent and decrease the value in increments of 10.
If the values of the ubc-maxpercent and ubc-minpercent attributes are close together, you may seriously degrade I/O performance or cause the system to page excessively. Use the vmstat command to determine whether the system is paging excessively.
Using dbx, periodically examine the vpf_pgiowrites and vpf_ubcalloc fields of the vm_perfsum kernel structure. The page-out rate may shrink if pageouts greatly exceed UBC allocations.
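For example, a minimal sketch of examining these counters with the kernel debugger (the invocation is the form used elsewhere in this manual; the structure and field names are those cited above):

    # dbx -k /vmunix /dev/mem
    (dbx) print vm_perfsum.vpf_pgiowrites
    (dbx) print vm_perfsum.vpf_ubcalloc
    (dbx) quit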
You also may be able to prevent paging by increasing the percentage of memory that the UBC borrows from the virtual memory subsystem. To do this, decrease the value of the ubc-borrowpercent attribute. Decreasing the value of the ubc-borrowpercent attribute allows less memory to remain in the UBC when page reclamation begins. This can reduce the UBC effectiveness, but may improve the system response time when a low-memory condition occurs. The value of the ubc-borrowpercent attribute can range from 0 to 100. The default value is 20 percent.
Swapping has a drastic impact on system performance. You can modify attributes to control when swapping begins and ends. Increasing the rate of swapping (swapping earlier during page reclamation) moves long-sleeping threads out of memory, frees memory, and increases throughput; as more processes are swapped out, fewer processes compete for memory, and the processes that remain resident can accomplish more work. However, when an outswapped process is needed, it will have a long latency, so increasing the rate of swapping will degrade interactive response time.
In contrast, if you decrease the rate of swapping (swap later during page reclamation), you will improve interactive response time, but at the cost of throughput.
To increase the rate of swapping, increase the value of the vm-page-free-optimal attribute (the default is 74 pages). Increase the value only by 2 pages at a time. Do not specify a value that is more than the value of the vm-page-free-target attribute (the default is 128 pages).
To decrease the rate of swapping, decrease the value of the vm-page-free-optimal attribute by 2 pages at a time. Do not specify a value that is less than the value of the vm-page-free-min attribute (the default is 20 pages).
The virtual memory subsystem attempts to prevent a memory shortage by prewriting modified pages to swap space. When the virtual memory subsystem anticipates that the pages on the free list will soon be depleted, it prewrites to swap space the oldest modified (dirty) pages on the inactive list. To reclaim a page that has been prewritten, the virtual memory subsystem only needs to validate the page.
Increasing the rate of dirty page prewriting will reduce peak workload performance, but it will prevent a drastic performance degradation when memory is exhausted. Decreasing the rate will improve peak workload performance, but it will cause a drastic performance degradation when memory is exhausted.
You can control the rate of dirty page prewriting by modifying the values of the vm-page-prewrite-target attribute and the vm-ubcdirtypercent attribute.
The vm-page-prewrite-target attribute specifies the number of virtual memory pages that the subsystem will prewrite and keep clean. The default value is 256 pages. To increase the rate of virtual memory dirty page prewriting, increase the value of the vm-page-prewrite-target attribute from the default value (256) by increments of 64 pages.
The vm-ubcdirtypercent attribute specifies the percentage of UBC LRU pages that can be modified before the virtual memory subsystem prewrites the dirty UBC LRU pages. The default value is 10 percent of the total UBC LRU pages (that is, 10 percent of the UBC LRU pages must be dirty before the UBC LRU pages are prewritten). To increase the rate of UBC LRU dirty page prewriting, decrease the value of the vm-ubcdirtypercent attribute by increments of 1 percent.
In addition, you may want to minimize the impact of I/O spikes caused by the sync function when prewriting UBC LRU dirty pages. The value of the ubc-maxdirtywrites attribute specifies the maximum number of disk writes that the kernel can perform each second. The default value of the ubc-maxdirtywrites attribute is 5 I/O operations per second. To minimize the impact of sync (steady state flushes) when prewriting dirty UBC LRU pages, increase the value of the ubc-maxdirtywrites attribute.
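For example, a minimal sketch of adjusting the prewrite behavior at run time (the values follow the increments recommended above but are otherwise illustrative; confirm that your system accepts run-time reconfiguration of these attributes):

    # sysconfig -r vm vm-page-prewrite-target=320
    # sysconfig -r vm vm-ubcdirtypercent=9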
The virtual memory subsystem reads in and writes out additional pages in an attempt to anticipate pages that it will need.
The vm-max-rdpgio-kluster attribute specifies the maximum size of an anonymous page-in cluster. The default value is 16 KB (2 pages). If you increase the value of this attribute, the system will spend less time page faulting because more pages will be in memory. This will increase the peak workload performance, but will consume more memory and decrease the total system workload performance. Decreasing the value of the vm-max-rdpgio-kluster attribute will conserve memory and increase the total system workload performance, but will increase paging and decrease the peak workload performance.
The vm-max-wrpgio-kluster attribute specifies the maximum size of an anonymous page-out cluster. The default value is 32 KB (4 pages). Increasing the value of this attribute improves the peak workload performance and conserves memory, but causes more pageins and decreases the total system workload performance. Decreasing the value of the vm-max-wrpgio-kluster attribute improves the total system workload performance and decreases the number of pageins, but decreases the peak workload performance and consumes more memory.
Synchronous swap buffers are used for page-in page faults and for swapouts. The vm-syncswapbuffers attribute specifies the maximum swap device I/O queue depth for pageins and swapouts. You can modify the value of the vm-syncswapbuffers attribute. The value should be equal to the approximate number of simultaneously running processes that the system can easily handle. The default is 128.
Increasing the swap device I/O queue depth increases overall system throughput, but consumes memory.
Decreasing the swap device I/O queue depth decreases memory demands and improves interactive response time, but decreases overall system throughput.
Asynchronous swap buffers are used for asynchronous pageouts and for prewriting modified pages. The vm-asyncswapbuffers attribute controls the maximum depth of the swap device I/O queue for pageouts. The value of the vm-asyncswapbuffers attribute should be the approximate number of I/O transfers that a swap device can handle at one time. The default value is 4.
Increasing the queue depth will free memory and increase the overall system throughput. Decreasing the queue depth will use more memory, but will improve the interactive response time. If you are using LSM, you may want to increase the page-out rate. Be careful if you increase the value of the vm-asyncswapbuffers attribute, because this will cause page-in requests to lag asynchronous page-out requests.
The UBC uses a buffer to facilitate the movement of data between memory and disk. The vm-ubcbuffers attribute specifies the maximum file system device I/O queue depth for writes. The default value is 256.
Increasing the UBC write device queue depth frees memory and increases the overall file system throughput.
Decreasing the UBC write device queue depth increases memory demands, but improves the interactive response time.
If a large file completely fills the UBC, it may take all of the pages
on the free page list, which may cause the system to page excessively.
The
vm-ubcseqpercent
attribute specifies the
maximum amount of memory allocated to the UBC that can be used to cache a
file.
The default value is 10 percent of memory allocated to the UBC.
The
vm-ubcseqstartpercent
attribute specifies the
size of the UBC, as a percentage of physical memory, at which the
virtual memory subsystem starts stealing
UBC LRU pages for a file to satisfy the demand for pages.
The default is 50 percent of physical memory.
Increasing the value of the
vm-ubcseqpercent
attribute will improve the performance of a large single file,
but decrease the remaining amount of memory.
Decreasing the value of the
vm-ubcseqpercent
attribute will increase the available memory, but will degrade the
performance of a large single file.
To force the system to reuse the pages in the UBC instead of taking pages from the free list, perform the following tasks:
Make the maximum size of the UBC greater than the size, as a percentage of physical
memory, at which the virtual memory subsystem starts stealing UBC LRU pages.
That is, the value of the
ubc-maxpercent
attribute (the default is 100 percent)
must be greater than the value of the
vm-ubcseqstartpercent
attribute (the default is 50 percent).
Make the referenced file larger than the maximum portion of the UBC that a single
file can consume.
That is, the file must be larger than the percentage of the UBC specified by the
vm-ubcseqpercent
attribute.
The default value of the
vm-ubcseqpercent
attribute is 10 percent.
For example, using the default values, the UBC would have to be larger than 50 percent of all memory and a file would have to be larger than 10 percent of the UBC (that is, the file size would have to be at least 5 percent of all memory) in order for the system to reuse the pages in the UBC.
On large-memory systems that are doing a lot of file system operations,
you may want to lower the
vm-ubcseqstartpercent
value
to 30 percent.
Do not specify a lower value unless you decrease
the size of the UBC.
In this case, do not change the value of the
vm-ubcseqpercent
attribute.
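A minimal /etc/sysconfigtab sketch of that large-memory adjustment follows; the vm subsystem placement is an assumption, and vm-ubcseqpercent is deliberately left at its default of 10 percent.
vm:
    vm-ubcseqstartpercent = 30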
The
vm-page-free-target
attribute specifies the minimum
number of pages on the free list before paging starts.
The default
value is 128 pages.
Increasing the value of the
vm-page-free-target
attribute will increase the
paging activity but may improve performance when free memory is exhausted.
If you increase the value, start at the default value (128 pages, or 1 MB)
and double it (for example, 256, 512, and then 1024 pages).
Do not specify a value above 1024 pages (8 MB).
A high value can waste memory.
Do not decrease the value of the
vm-page-free-target
attribute unless you have a lot of memory or you experience a serious
performance degradation when free memory is exhausted.
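For example, stepping the value from the default of 128 pages to the next doubling (256 pages, or 2 MB) could be done at run time as shown below. The vm subsystem placement is an assumption, and 256 is only the first step in the doubling sequence, not a recommendation.
# sysconfig -q vm vm-page-free-target
# sysconfig -r vm vm-page-free-target=256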
You can enable the
vm-aggressive
attribute (set the value
to 1) to allow the virtual memory subsystem to aggressively swap out processes
when memory is needed.
This improves system throughput, but degrades the
interactive response performance.
By default, the
vm-aggressive
attribute is disabled
(set
to 0), which results in less aggressive swapping.
In this case,
processes are swapped in at a faster rate than if aggressive swapping is
enabled.
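A sketch of enabling aggressive swapping on a throughput-oriented system follows. The vm subsystem placement is an assumption; if the attribute cannot be changed at run time on your release, set it in /etc/sysconfigtab instead.
# sysconfig -r vm vm-aggressive=1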
The metadata buffer cache contains recently accessed UFS and CDFS metadata. On large-memory systems with a high cache hit rate, you may want to decrease the size of the metadata buffer cache. This will increase the amount of memory that is available to the virtual memory subsystem. However, decreasing the size of the cache may degrade UFS performance.
The
bufcache
attribute specifies the percentage of
physical memory that the kernel wires for the metadata buffer cache.
The default size of the metadata buffer cache is 3 percent of physical
memory.
You can decrease the value of the
bufcache
attribute to a minimum of 1 percent.
For systems that use only AdvFS, set the value of the
bufcache
attribute to 1 percent.
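Because the metadata buffer cache is wired at boot time, the bufcache attribute is normally set in /etc/sysconfigtab and takes effect at the next reboot. The stanza below sketches the AdvFS-only setting; the vfs subsystem placement is an assumption to verify with sysconfig -q.
vfs:
    bufcache = 1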
The namei cache is used by all file systems to map file pathnames to
inodes.
Use
dbx
to monitor the cache by examining the
nchstats
structure.
To free memory resources, decrease the number of elements in the namei
cache by decreasing the value of the
name-cache-size
attribute.
The default values are 2 * nvnode * 11 / 10 (for systems with 32 MB or more of memory)
and 150 (for 24-MB systems).
The maximum value is 2 * max-vnodes * 11 / 10.
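Before lowering the name-cache-size attribute, it may help to confirm that the cache hit rate is high. The dbx invocation below matches the other examples in this chapter; the field names in the nchstats structure (for example, ncs_goodhits and ncs_miss in the traditional BSD layout) are assumptions to check against the output on your system, and a cast similar to the gh_stats_store example may be needed.
# dbx -k /vmunix /dev/mem
(dbx) print nchstats
If good hits dominate total lookups, reducing the cache size is less likely to hurt pathname lookup performance.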
To free memory resources, you may want to decrease the percentage of physical memory allocated to the AdvFS buffer cache.
The
AdvfsCacheMaxPercent
attribute determines the maximum amount of physical memory that can be used
for the AdvFS buffer cache.
The default is 7 percent of memory.
However, decreasing the size of the AdvFS buffer cache may adversely affect
AdvFS I/O performance.
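An /etc/sysconfigtab sketch of trimming the AdvFS buffer cache from the default of 7 percent to 5 percent follows. The advfs subsystem name and the value of 5 are assumptions for illustration; the attribute takes effect at boot time.
advfs:
    AdvfsCacheMaxPercent = 5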
Granularity hints allow you to reserve a portion of dynamically wired physical memory at boot time for shared memory. Granularity hints allow the translation lookaside buffer to map more than a single page and enable shared page table entry functionality, which results in fewer translation lookaside buffer misses.
On typical database servers, using granularity hints provides a 2 to 4 percent run-time performance gain and reduces the shared memory detach time. In most cases, use the Segmented Shared Memory (SSM) functionality (the default) instead of the granularity hints functionality.
To enable granularity hints, you must specify a value for the
gh-chunks
attribute.
To make granularity hints more
effective, modify applications to ensure that both the shared
memory segment starting address and size are
aligned on an 8-MB boundary.
Section 4.7.24.1 and Section 4.7.24.2 describe how to enable granularity hints.
To use granularity hints, you must specify the number of 4-MB chunks of physical memory to reserve for shared memory at boot time. This memory cannot be used for any other purpose and cannot be returned to the system or reclaimed.
To reserve memory for shared memory, specify a nonzero value for the
gh-chunks
attribute.
For example, if you want to reserve
4 GB of memory, specify 1024 for the value of
gh-chunks
(1024 * 4 MB = 4 GB).
If you specify a value of 512, you will reserve
2 GB of memory.
The value you specify for the
gh-chunks
attribute
depends on your database application.
Do not reserve an excessive
amount of memory, because reserving memory decreases the memory available
to the virtual memory subsystem and the UBC.
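Because reserved chunks cannot be reclaimed, the gh-chunks attribute is set in /etc/sysconfigtab and takes effect at the next boot. The stanza below sketches the 512-chunk (2-GB) starting point described above; the subsystem shown (vm) is an assumption to confirm with sysconfig -q on your release.
vm:
    gh-chunks = 512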
You can determine if you have reserved the appropriate amount of memory.
For example, you can initially specify 512 for the value of the
gh-chunks
attribute.
Then, invoke the
following sequence of
dbx
commands while running the
application that allocates shared memory:
# dbx -k /vmunix /dev/mem
(dbx) px &gh_free_counts
0xfffffc0000681748
(dbx) 0xfffffc0000681748/4X
fffffc0000681748:  0000000000000402 0000000000000004
fffffc0000681758:  0000000000000000 0000000000000002
(dbx)
The output shows the following:
The first number (402) specifies the number of 512-page chunks (4 MB).
The second number (4) specifies the number of 64-page chunks.
The third number (0) specifies the number of 8-page chunks.
The fourth number (2) specifies the number of 1-page chunks.
To save memory, you can reduce the value of the
gh-chunks
attribute until only one or two 512-page chunks are free while the application
that uses shared memory is running.
The following attributes also affect granularity hints:
gh-min-seg-size
Specifies the shared memory segment size above which memory is allocated
from the memory reserved by the
gh-chunks
attribute.
The default is 8 MB.
gh-fail-if-no-mem
When set to 1 (the default), the
shmget
function
returns a failure if the requested segment size is larger than the value
specified by the
gh-min-seg-size
attribute, and if there
is insufficient memory in the
gh-chunks
area to satisfy
the request.
If the value of the
gh-fail-if-no-mem
attribute is
0, the entire request will be satisfied from the pageable memory area if the
request is larger than the amount of memory reserved by the
gh-chunks
attribute.
In addition, messages will display on the system console indicating unaligned size and attach address requests. The unaligned attach messages are limited to one per shared memory segment.
You can make granularity hints more effective by making both the shared memory segment starting address and size aligned on an 8-MB boundary.
To share Level 3 page table entries, the shared memory segment attach
address (specified by the
shmat
function) and the shared
memory segment size (specified by the
shmget
function)
must be aligned on an 8-MB boundary.
This means that the lowest 23 bits
of both the address and the size must be zero.
The attach address and the shared memory segment size are specified
by the application.
In addition, System V shared memory semantics allow a
maximum shared memory segment size of 2 GB minus 1 byte.
Applications that
need shared memory segments larger than 2 GB can construct these regions by
using multiple segments.
In this case, the total shared memory size specified
by the user to the application must be 8-MB aligned.
In addition, the value
of the
shm-max
attribute, which specifies the maximum size
of a System V shared memory segment, must be 8-MB aligned.
If the total shared memory size specified to the application is greater
than 2 GB, you can specify a value of 2139095040 (or 0x7f800000) for the
value of the
shm-max
attribute.
This is the maximum value
(2 GB minus 8 MB) that you can specify for the
shm-max
attribute and still share page table entries.
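As a sketch, the following /etc/sysconfigtab stanza sets that 8-MB-aligned maximum; the ipc subsystem placement is an assumption to verify on your release.
ipc:
    shm-max = 2139095040
The value 2139095040 is 0x7f800000 (2 GB minus 8 MB), so its low 23 bits are zero and Level 3 page table entries can still be shared.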
Use the following
dbx
command sequence to determine
if page table entries are being shared:
# dbx -k /vmunix /dev/mem
(dbx) p *(vm_granhint_stats *)&gh_stats_store
struct {
    total_mappers = 21
    shared_mappers = 21
    unshared_mappers = 0
    total_unmappers = 21
    shared_unmappers = 21
    unshared_unmappers = 0
    unaligned_mappers = 0
    access_violations = 0
    unaligned_size_requests = 0
    unaligned_attachers = 0
    wired_bypass = 0
    wired_returns = 0
}
(dbx)
For the best performance, the
shared_mappers
kernel variable should be equal to the number of shared memory segments,
and the
unshared_mappers
,
unaligned_attachers
, and
unaligned_size_requests
variables should
be 0 (zero).
Because of how shared memory is divided into shared memory
segments, there may be some unshared segments.
This occurs when the
starting address or the size is not aligned on an 8-MB boundary.
This condition may be unavoidable in some cases.
In many cases, the
value of
total_unmappers
will be greater than
the value of
total_mappers
.
Shared memory locking replaces what was a single lock with a hashed
array of locks.
The size of the hashed array of locks can be modified by
modifying the value of the
vm-page-lock-count
attribute.
The default value is 64.
The UBC and the virtual memory subsystem compete for the physical memory that is not wired by the kernel. You may be able to improve file system performance by tuning the UBC. However, increasing the amount of memory available to the UBC will affect the virtual memory subsystem and may increase the rate of paging and swapping.
The amount of memory allocated to the UBC is determined by the
ubc-maxpercent,
ubc-minpercent,
and
ubc-borrowpercent
attributes.
You may be able to improve performance
by modifying the value of these attributes, which are described in
Section 4.4.
The following output may indicate that the size of the UBC is too small for your configuration:
The output of the
vmstat
or
monitor
command shows excessive file system page-in activity but little
or no page-out activity, or shows a very low free page count.
The output of the
iostat
command shows
little or no swap
disk I/O activity or shows excessive file system I/O activity.
The UBC is flushed by the
update
daemon.
You can monitor the UBC usage lookup hit ratio by using
dbx
.
You
can view UBC statistics by using
dbx
and checking
the
vm_perfsum
structure.
You can also monitor the UBC by using
dbx -k
and examining the
ufs_getapage_stats
structure.
See
Chapter 2
for information about
monitoring the UBC.
You can improve UBC performance by following the guidelines described in Table 4-4. You can also improve file system performance by following the guidelines described in Chapter 5.
Action | Performance Benefit | Tradeoff |
Increase the memory allocated to the UBC (Section 4.8.1) | Improves file system performance | May cause excessive paging and swapping |
Decrease the amount of memory borrowed by the UBC (Section 4.8.2) | Improves file system performance | Decreases the memory available for processes and may decrease system response time |
Increase the minimum size of the UBC (Section 4.8.3) | Improves file system performance | Decreases the memory available for processes |
Modify the application to use mmap (Section 4.8.4) | Decreases memory requirements | None |
Increase the UBC write device queue depth (Section 4.7.17) | Increases overall file system throughput and frees memory | Decreases interactive response performance |
Decrease the UBC write device queue depth (Section 4.7.17) | Improves interactive response time | Consumes memory |
The following sections describe these guidelines in detail.
If there is an insufficient amount of memory allocated to the UBC, I/O performance may be degraded. If you allocate more memory to the UBC, you will improve the chance that data will be found in the cache. By preventing the system from having to copy data from a disk, you may improve I/O performance. However, allocating more memory to the UBC may cause excessive paging and swapping.
To increase the maximum amount of memory allocated to the UBC, you
can increase the value of the
ubc-maxpercent
attribute.
The default value is 100 percent.
However, the performance of an application that generates a lot of
random I/O will not be improved by enlarging the UBC because the next
access location for random I/O cannot be predetermined.
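For example, if the ubc-maxpercent attribute was previously lowered to favor the virtual memory subsystem, you might raise it back toward the default at run time. The vm subsystem placement and the value of 90 are assumptions for illustration only.
# sysconfig -q vm ubc-maxpercent
# sysconfig -r vm ubc-maxpercent=90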
See
Section 4.3.7
for information about UBC memory allocation.
If
vmstat
output shows excessive paging but few or
no pageouts, you may want to increase the value of the
ubc-borrowpercent
attribute.
This situation can occur on low-memory systems (24-MB systems) because
they reclaim UBC pages more aggressively than systems with more memory.
The UBC borrows all physical memory above the
value of the
ubc-borrowpercent
attribute and
up to the value of the
ubc-maxpercent
attribute.
Increasing the value of the
ubc-borrowpercent
attribute
allows more memory to remain in the UBC when page reclamation
begins.
This can increase the UBC cache effectiveness,
but may degrade system response time when a low-memory condition
occurs.
The value of the
ubc-borrowpercent
attribute
can range from 0 to 100.
The default value is 20 percent.
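For example, raising the attribute from the default of 20 percent to 30 percent leaves more memory in the UBC when page reclamation begins. The command below is a sketch; the vm subsystem placement and the value of 30 are assumptions, not recommendations.
# sysconfig -r vm ubc-borrowpercent=30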
See
Section 4.3.7
for information about UBC memory allocation.
Increasing the value of the
ubc-minpercent
attribute
will prevent large programs from completely filling the UBC.
For I/O servers, you may want to raise the value of the
ubc-minpercent
attribute to ensure that memory is
available for the UBC.
The default value is 10 percent.
To ensure that the value of the
ubc-minpercent
attribute is
appropriate, use the
vmstat
command to examine
the page-out rate.
If the values of the
ubc-maxpercent
and
ubc-minpercent
attributes are close together, you may
degrade I/O performance or cause the system to page excessively.
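A sketch of the check-and-adjust sequence follows: watch the page-out columns in vmstat output (pout on most releases), and raise the floor only if page-out activity stays low. The vm subsystem placement and the value of 20 are assumptions for illustration.
# vmstat 5
# sysconfig -r vm ubc-minpercent=20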
See
Section 4.3.7
for information about UBC memory allocation.
You may want to use the
mmap
function instead of the
read
or
write
function in your
applications.
The
read
and
write
system calls require a page of buffer memory and a page of UBC memory, but
mmap
requires only one page of memory.
A portion of physical memory is wired for use by the metadata buffer cache, which is the traditional BSD buffer cache. The file system code that deals with UFS metadata, which includes directories, indirect blocks, and inodes, uses this cache.
You may be able to improve UFS performance by following the guidelines described in Table 4-5.
Action | Performance Benefit | Tradeoff |
Increase the memory allocated to the metadata buffer cache (Section 4.9.1) | Improves UFS performance | Reduces the memory available to the virtual memory subsystem and the UBC |
Increase the size of the hash chain table (Section 4.9.2) | Improves lookup speed | Consumes memory |
The following sections describe these guidelines in detail.
The
bufcache
attribute specifies the size of the
kernel's metadata buffer cache as a percentage of physical memory.
The default is 3 percent.
You may want to increase the size of the
metadata buffer cache if you have a high cache miss rate (low hit rate).
In general, you do not have to increase the cache size.
Never increase the value of the
bufcache
attribute to more than 10 percent.
To determine whether to
increase the size of the metadata buffer cache, use
dbx
to examine the
bio_stats
structure.
The miss rate (block misses divided by the sum of the block misses and block
hits) should not be more than 3 percent.
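A sketch of the check follows. The dbx invocation matches the other examples in this chapter (a cast similar to the gh_stats_store example may be needed), and the hit and miss figures mentioned afterward are hypothetical numbers used only to show the arithmetic.
# dbx -k /vmunix /dev/mem
(dbx) print bio_stats
If the structure reported, for example, 9700 block hits and 300 block misses, the miss rate would be 300 / (300 + 9700), or 3 percent, which is at the limit suggested above; a higher miss rate argues for a larger bufcache value, up to the 10 percent ceiling.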
Allocating additional memory to the metadata buffer cache reduces the
amount of memory available to the virtual memory subsystem and the UBC.
In general, you do not have to increase the value of the
bufcache
attribute.
The hash chain table for the metadata buffer cache stores the heads of the hashed buffer queues. Increasing the size of the hash chain table spreads out the buffers and may reduce linear searches, which improves lookup speeds.
The
buffer-hash-size
attribute specifies the size
of
the hash chain table for the metadata buffer cache.
The default
hash chain table size is 512 slots.
You can modify the value of the
buffer-hash-size
attribute so that each hash chain has 3 or 4 buffers.
To determine a value
for the
buffer-hash-size
attribute, use
dbx
to examine the value of
nbuf
, then divide
the value by 3 or 4, and finally round the result to a power of 2.
For example, if
nbuf
has a value of 360, dividing
360 by 3 gives you a value of 120.
Based on this calculation,
specify 128 (2 to the power of 7) as the value of the
buffer-hash-size
attribute.
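The calculation can be made directly against the running kernel, as in the nbuf example above. The dbx usage matches the other examples in this chapter; the vfs subsystem placement of the buffer-hash-size attribute is an assumption, and because the attribute takes effect at boot time it belongs in /etc/sysconfigtab.
# dbx -k /vmunix /dev/mem
(dbx) print nbuf
360
(dbx) quit
Given that value, a stanza such as the following sets the rounded result:
vfs:
    buffer-hash-size = 128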