This chapter describes how the DIGITAL UNIX operating system uses the physical memory installed in the system. This chapter also describes how to configure and tune virtual memory, swap space, and buffer caches. Many of the tuning tasks described in this chapter require you to modify system attributes. See Section 2.11 for more information.
The total amount of physical memory is determined by the capacity of the memory boards installed in your system. The system distributes this memory in 8-KB units called pages.
The system distributes pages of physical memory among three areas:
Wired memory
At boot time, the operating system and the Privileged Architecture Library (PAL) code wire a contiguous portion of physical memory in order to perform basic system operations. Static wired memory is reserved for operating system data and text, system tables, the metadata buffer cache, which temporarily holds recently accessed UNIX File System (UFS) and CD-ROM File System (CDFS) metadata, and the Advanced File System (AdvFS) buffer cache. Static wired memory cannot be reclaimed through paging. You can reduce the amount of static wired memory only by removing subsystems.
In addition, the kernel uses dynamically wired memory for dynamically allocated data structures, and user processes also wire memory for address space. The amount of dynamically wired memory varies according to the demand. The maximum amount is specified by the value of the vm-syswiredpercent attribute (the default is 80 percent of physical memory). Memory that is dynamically wired cannot be reclaimed through paging. You increase the amount of dynamically wired memory by allocating more kernel resources to processes (for example, by increasing the value of the maxusers attribute). An example of checking wired memory appears after this list.
Virtual memory
The virtual memory subsystem uses a portion of physical memory to cache processes' most-recently accessed anonymous memory and file-backed memory. The subsystem efficiently allocates memory to competing processes and tracks the distribution of all the physical pages. This memory can be reclaimed through paging.
Unified Buffer Cache
The Unified Buffer Cache (UBC) uses a portion of physical memory to cache most-recently accessed file system data. The UBC contains actual file data for reads and writes and for page faults from mapped file regions and also AdvFS metadata. By functioning as a layer between the operating system and the storage subsystem, the UBC can decrease the number of disk operations. This memory can be reclaimed through paging.
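On a running system you can see how much memory is currently wired and what limit applies to dynamic wiring. The following is a minimal sketch (the wire column of the vmstat output reports wired pages in 8-KB units, and the sysconfig query assumes the attribute name shown above; verify both on your system):

    # vmstat 3 2
    # sysconfig -q vm vm-syswiredpercent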
Figure 4-1 shows how physical memory is used.
The virtual memory subsystem and the UBC compete for the physical pages that are not wired. Pages are allocated to processes and to the UBC, as needed. When the demand for memory increases, the oldest (least-recently used) pages are reclaimed from the virtual memory subsystem and the UBC and reused. Various attributes control the amount of memory available to the virtual memory subsystem and the UBC and the rate of page reclamation. Wired pages are not reclaimed.
System performance depends on the total amount of physical memory and also the distribution of memory resources. DIGITAL UNIX allows you to control the allocation of memory (other than static wired memory) by modifying the values of system attributes. Tuning memory usually involves the following tasks:
Increasing system resource allocation to improve application performance
Modifying how the system allocates memory and the rate of page reclamation
Modifying how file system data is cached in memory
You can also configure your swap space for optimal performance. However, to determine how to obtain the best performance, you must understand your workload characteristics, as described in Chapter 1.
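Most of the attributes discussed in this chapter are examined and modified with the sysconfig command or made permanent in the /etc/sysconfigtab file, as described in Section 2.11. A minimal sketch, assuming the attribute supports run-time reconfiguration (the value shown is only an illustration):

    # sysconfig -q vm ubc-maxpercent
    # sysconfig -r vm ubc-maxpercent=70

To preserve a change across reboots, place an equivalent stanza in /etc/sysconfigtab:

    vm:
        ubc-maxpercent = 70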
When programs are executed, the system moves data and instructions among various caches, physical memory, and disk swap space. Accessing the data and instructions occurs at different speeds, depending on the location. Table 4-1 describes the various hardware resources (in the order of fastest to slowest access time).
Resource | Description |
CPU caches | Various caches reside in the CPU chip and vary in size up to a maximum of 64 KB (depending on the type of processor). These caches include the translation lookaside buffer, the high-speed internal virtual-to-physical translation cache, the high-speed internal instruction cache, and the high-speed internal data cache. |
Secondary cache | The secondary direct-mapped physical data cache is external to the CPU, but usually resides on the main processor board. Block sizes for the secondary cache vary from 32 bytes to 256 bytes (depending on the type of processor). The size of the secondary cache ranges from 128 KB to 8 MB. |
Tertiary cache | The tertiary cache is not available on all Alpha CPUs; otherwise, it is identical to the secondary cache. |
Physical memory | The actual amount of physical memory varies. |
Swap space | Swap space consists of one or more disks or disk partitions (block special devices). |
The hardware logic and the PAL code control much of the movement of addresses and data among the CPU cache, the secondary and tertiary caches, and physical memory. This movement is transparent to the operating system. Figure 4-2 shows an overview of how instructions and data are moved among various hardware components during program execution.
Movement between caches and physical memory is significantly faster than movement between disk and physical memory, because of the relatively slow speed of disk I/O. Therefore, avoid paging and swapping operations, and ensure that applications utilize the caches whenever possible. Figure 4-3 shows the amount of time that it takes to access data and instructions from various hardware locations.
For more information on the CPU, secondary cache, and tertiary cache, see the Alpha Architecture Reference Manual.
The virtual memory subsystem performs the following functions:
Allocates memory to processes
Tracks and manages all the pages in the system
Uses paging and swapping to ensure that there is enough memory for processes to run and to cache file system I/O
The following sections describe these functions in detail.
For each process, the fork system call performs the following tasks:
Creates a UNIX process body, which includes a set of data structures that the kernel uses to track the process and a set of resource limitations. See fork(2) for more information.
Allocates a contiguous block of virtual address space, which is the array of pages that an application can map into physical memory. Virtual address space is used for anonymous memory (memory used for the stack, heap, or malloc function) and for file-backed memory (memory used for program text or shared libraries). Pages of anonymous memory are paged in when needed and paged out when pages must be reclaimed. Pages of file-backed memory are paged in when needed and released when pages must be reclaimed.
Creates one or more threads of execution. The default is one thread for each process. Multiprocessing systems support multiple process threads.
Because memory is limited, a process' entire virtual address space cannot be in physical memory at one time. However, a process can execute when only a portion of its virtual address space (its working set) is mapped to physical memory.
For each process, the virtual memory subsystem allocates a large amount of virtual address space but uses only part of this space. Only 4 TB is allocated for user space. User space is generally private and maps to a nonshared physical page. An additional 4 TB of virtual address space is used for kernel space. Kernel space usually maps to shared physical pages. The remaining space is not used for any purpose.
In addition, user space is sparsely populated with valid pages. Only valid pages are able to map to physical pages. The vm-maxvas attribute specifies the maximum amount of valid virtual address space for a process (that is, the sum of all the valid pages). The default is 131072 pages (1 GB).
Figure 4-4 shows the use of process virtual address space.
When a virtual page is touched or accessed, the virtual memory subsystem must locate the physical page and then translate the virtual address into a physical address. Each process has a page table, which is an array containing an entry for each current virtual-to-physical address translation. Page table entries have a direct relation to virtual pages (that is, virtual address 1 corresponds to page table entry 1) and contain a pointer to the physical page and protection information.
Figure 4-5 shows the translation of a virtual address into a physical address.
A process' resident set is the complete set of all the virtual addresses that have been mapped to physical addresses (that is, all the pages that have been accessed during process execution). Resident set pages may be shared among multiple processes. A process' working set is the set of virtual addresses that are currently mapped to physical addresses. The working set is a subset of the resident set and represents a snapshot of the process' resident set.
When a nonfile-backed virtual address is requested, the virtual memory subsystem locates the physical page and makes it available to the process. This process occurs at different speeds, depending on the location of the page (see Figure 4-3).
If a requested address is currently being used (active), it will have an entry in the page table. In this case, the PAL code loads the physical address into the translation lookaside buffer, which then passes the address to the CPU.
If a requested address is not active in the page table, the PAL lookup code issues a page fault, which instructs the virtual memory subsystem to locate the page and make the virtual-to-physical address translation in the page table.
If a requested virtual address is being accessed for the first time, the virtual memory subsystem performs the following tasks:
Allocates an available page of physical memory.
Fills the page with zeros.
Enters the virtual-to-physical address translation in the page table.
This is called a zero-filled-on-demand page fault.
If a requested virtual address has already been accessed, it will be in one of the following locations:
The virtual memory subsystem's internal data structures
If the physical address is located in the internal data structures (for example, the hash queue list or the page queue list), the virtual memory subsystem enters the virtual-to-physical address translation in the page table. This is called a short page fault.
Swap space
If the virtual address has already been accessed, but the physical page has been reclaimed, the page contents will be found in swap space. The virtual memory subsystem copies the contents of the page from swap space into the physical address and enters the virtual-to-physical address translation in the page table. This is called a page-in page fault.
If a process needs to modify a read-only virtual page, the virtual memory subsystem allocates an available page of physical memory, copies the read-only page into the new page, and enters the translation in the page table. This is called a copy-on-write page fault.
To improve process execution time and decrease the number of page faults, the virtual memory subsystem attempts to anticipate which pages the task will need next. Using an algorithm that checks which pages were most recently used, the number of available pages, and other factors, the subsystem maps additional pages, along with the page that contains the requested address.
The virtual memory subsystem also uses page coloring to reduce execution time. If possible, the subsystem attempts to map a process' entire resident set into the secondary cache. If the entire task (its text and data) executes within the cache, addresses do not have to be fetched from physical memory.
The private-cache-percent attribute specifies the percentage of the cache that is reserved for anonymous (nonshared) memory. The default is to reserve 50 percent of the cache for anonymous memory and 50 percent for file-backed (shared) memory. To cache more anonymous memory, increase the value of the private-cache-percent attribute. This attribute is primarily used for benchmarking.
The virtual memory subsystem allocates physical pages to processes and the UBC, as needed. Because physical memory is limited, these pages must be periodically reclaimed so that they can be reused.
The virtual memory subsystem uses page lists to track the location and age of all the physical memory pages. At any one time, each physical page can be found on one of the following lists:
Free list--Pages that are clean and are not being used (the size of this list controls when page reclamation occurs)
Active list--Pages that are being used by the virtual memory subsystem or the UBC
To determine which pages should be reclaimed first, the page-stealer daemon identifies the oldest pages on the active list and designates these least-recently used (LRU) pages as follows:
Inactive pages are the oldest pages that are being used by the virtual memory subsystem.
UBC LRU pages are the oldest pages that are being used by the UBC.
Use the vmstat command or dbx to determine the number of pages that are on the page lists. Remember that pages on the active list (the act field in the vmstat output) include both inactive and UBC LRU pages.
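For example, a minimal sketch of watching the page lists (in the vmstat output, the act, free, and wire columns report active, free, and wired pages; the dbx invocation is the kernel-debugger form used elsewhere in this manual):

    # vmstat 3 5
    # dbx -k /vmunix /dev/mem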
As physical pages are allocated to processes and the UBC, the free list becomes depleted, and pages must be reclaimed in order to replenish the list. To reclaim pages, the virtual memory subsystem does the following:
Prewrites the oldest dirty (modified) pages to swap space
Uses paging to reclaim individual pages
Uses swapping to suspend processes and reclaim a large number of pages
See Section 4.3.5, Section 4.3.6, Section 4.3.8, and Section 4.3.9 for more information about prewriting pages, paging, and swapping.
The virtual memory subsystem attempts to prevent a memory shortage by prewriting modified pages to swap space.
When the virtual memory subsystem anticipates that the pages on the free list will soon be depleted, it prewrites to swap space the oldest modified (dirty) inactive pages. The value of the vm-page-prewrite-target attribute determines the number of pages that the subsystem will prewrite and keep clean. The default value is 256 pages.
In addition, when the number of modified UBC LRU pages exceeds the value of the vm-ubcdirtypercent attribute, the virtual memory subsystem prewrites the oldest modified UBC LRU pages. The default value of the vm-ubcdirtypercent attribute is 10 percent of the total UBC LRU pages.
To minimize the impact of sync (steady state flushes) when prewriting UBC pages, the ubc-maxdirtywrites attribute specifies the maximum number of disk writes that the kernel can perform each second. The default value is 5.
See Section 4.7.13 for more information about prewriting dirty pages.
When the demand for memory depletes the free list, paging begins. The virtual memory subsystem takes the oldest inactive and UBC LRU pages, moves the contents of the modified pages to swap space, and puts the clean pages on the free list, where they can be reused.
If the free page list cannot be replenished by reclaiming individual pages, swapping begins. Swapping temporarily suspends processes and moves entire resident sets to swap space, which frees large amounts of physical memory.
The point at which paging and swapping start and stop depends on the values of some virtual memory subsystem attributes. Figure 4-6 shows the default values of these attributes.
Detailed descriptions of the attributes are as follows:
vm-page-free-target--Paging starts when the number of pages on the free list is less than this value (the default is 128 pages).
vm-page-free-min--Specifies the threshold at which a page must be reclaimed for each page allocated (the default is 20 pages).
vm-page-free-swap--Idle task swapping starts when the number of pages on the free list is less than this value for a period of time (the default is 74 pages).
vm-page-free-optimal--Hard swapping starts when the number of pages on the free list is less than this value for five seconds (the default is 74 pages). The first processes to be swapped out include those with the lowest scheduling priority and those with the largest resident set size.
vm-page-free-hardswap--Swapping stops when the number of pages on the free list is more than this value (the default is 1280 pages).
vm-page-free-reserved--Only privileged tasks can get memory when the number of pages on the free list is less than this value (the default is 10 pages).
See Section 4.3.8 and Section 4.3.9 for information about paging and swapping operations.
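A quick way to review the current paging and swapping thresholds is to query the vm subsystem. The following is a minimal sketch (the attribute names are those described above; the exact output format may vary):

    # sysconfig -q vm vm-page-free-target vm-page-free-min vm-page-free-swap \
        vm-page-free-optimal vm-page-free-hardswap vm-page-free-reserved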
Because the UBC shares with the virtual memory subsystem the physical pages that are not wired by the kernel, the allocation of memory to the UBC can affect file system performance and paging and swapping activity. The UBC is dynamic and consumes varying amounts of memory in order to respond to changing file system demands.
Figure 4-7 shows how memory is allocated to the UBC.
The following attributes control the amount of memory available to the UBC:
ubc-minpercent attribute
Specifies the minimum percentage of memory that the UBC can utilize. The default is 10 percent.
ubc-maxpercent attribute
Specifies the maximum percentage of memory that the UBC can utilize. The default is 100 percent.
ubc-borrowpercent attribute
Specifies the UBC borrowing threshold. The default is 20 percent. From the value of the ubc-borrowpercent attribute to the value of the ubc-maxpercent attribute, the UBC is only borrowing memory from the virtual memory subsystem. When paging starts, pages are first reclaimed from the UBC until the amount of memory allocated to the UBC reaches the value of the ubc-borrowpercent attribute.
When the memory demand is high and the number of pages on the free page list reaches the value of the vm-page-free-target attribute, the virtual memory subsystem uses paging to replenish the free page list. The page reclamation code controls paging and swapping. The page-out daemon and task swapper daemon are extensions of the page reclamation code. See Section 4.3.6 for more information about the attributes that control paging and swapping.
The page reclamation code activates the page-stealer daemon, which first reclaims the pages that the UBC has borrowed from the virtual memory subsystem, until the size of the UBC reaches the borrowing threshold (the default is 20 percent). If the reclaimed pages are dirty (modified), their contents must be written to disk before the pages can be moved to the free page list. Freeing borrowed UBC pages is a fast way to reclaim pages, because UBC pages are usually unmodified. See Section 4.3.7 for more information about UBC borrowed pages.
If freeing UBC borrowed memory does not sufficiently replenish the free list, a pageout occurs. The page-stealer daemon reclaims the oldest inactive and UBC LRU pages.
Paging becomes increasingly aggressive if the number of free pages continues to decrease. If the number of pages on the free page list falls below the value of the vm-page-free-min attribute (the default is 20 pages), a page must be reclaimed for each page allocated. To prevent deadlocks, if the number of pages on the free page list falls below the value of the vm-page-free-reserved attribute (the default is 10 pages), only privileged tasks can get memory until the free page list is replenished. Paging stops when the number of pages on the free list reaches the value of the vm-page-free-target attribute.
If paging individual pages does not replenish the free list, swapping is used to free a large amount of memory. See Section 4.3.9 for more information.
Figure 4-8 shows the movement of pages during paging operations.
If there is a high demand for memory, the virtual memory subsystem may be unable to replenish the free list by reclaiming pages. Swapping reduces the demand for physical memory by suspending processes, which dramatically increases the number of pages on the free list. To swap out a process, the task swapper suspends the process, writes its resident set to swap space, and moves the clean pages to the free list.
Idle task swapping begins when the number of pages on the free list falls below the value of the vm-page-free-swap attribute for a period of time (the default is 74 pages). The task swapper suspends all tasks that have been idle for 30 seconds or more.
If the number of pages on the free list falls below the value of the vm-page-free-optimal attribute (the default is 74 pages) for more than five seconds, hard swapping begins. The task swapper suspends, one at a time, the tasks with the lowest priority and the largest resident set size. Swapping stops when the number of pages on the free list reaches the value of the vm-page-free-hardswap attribute (the default is 1280 pages).
A swapin occurs when the number of pages on the free list reaches the value of the vm-page-free-optimal attribute for a period of time. The task's working set is paged in from swap space, and the task can then execute. The value of the vm-inswappedmin attribute specifies the minimum amount of time, in seconds, that a task must remain in the inswapped state before it can be outswapped. The default value is 1 second.
Swapping has a serious impact on system performance. You can modify the attributes described in Section 4.3.6 to control when swapping starts and stops.
Increasing the rate of swapping (swapping earlier during page reclamation) increases throughput: as more processes are swapped out, fewer processes compete for memory, so the processes that remain resident can accomplish more work. Although increasing the rate of swapping moves long-sleeping threads out of memory and frees memory, it degrades interactive response time, because an outswapped process incurs a long latency when it is needed again.
If you decrease the rate of swapping (swap later during page reclamation), you will improve interactive response time, but at the cost of throughput.
To facilitate the movement of data between memory and disk, the virtual memory subsystem uses synchronous and asynchronous swap buffers. The virtual memory subsystem uses these two types of buffers to immediately satisfy a page-in request without having to wait for the completion of a page-out request, which is a relatively slow process.
Synchronous swap buffers are used for page-in page faults and for swap outs. Asynchronous swap buffers are used for asynchronous pageouts and for prewriting modified pages. See Section 4.7.15 and Section 4.7.16 for tuning information.
The DIGITAL UNIX operating system uses the Unified Buffer Cache (UBC) as a layer between the operating system and disk. The UBC holds actual file data, which includes reads and writes from conventional file activity and page faults from mapped file sections, and AdvFS metadata. The cache can improve I/O performance by decreasing the number of disk I/O operations.
The UBC shares with the virtual memory subsystem the physical pages that are not wired by the kernel. The maximum and minimum percentages of memory that the UBC can utilize are specified by the ubc-maxpercent attribute (the default is 100 percent) and the ubc-minpercent attribute (the default is 10 percent). In addition, the ubc-borrowpercent attribute specifies the percentage of memory allocated to the UBC above which the memory is only borrowed from the virtual memory subsystem. The default is 20 percent of physical memory. See Section 4.3.7 for more information.
The UBC is dynamic and consumes varying amounts of memory in order to respond to changing file system demands. For example, if file system activity is heavy, pages will be allocated to the UBC up to the value of the ubc-maxpercent attribute. In contrast, heavy process activity, such as large increases in the working sets for large executables, will cause the virtual memory subsystem to reclaim UBC borrowed pages. Figure 4-7 shows the allocation of physical memory to the UBC.
The UBC uses a hashed list to quickly locate the physical pages that it is holding. A hash table contains file and offset information that is used to speed lookup operations.
The UBC also uses a buffer to facilitate the movement of data between memory and disk. The vm-ubcbuffers attribute specifies the maximum file system device I/O queue depth for writes (that is, the number of UBC I/O requests that can be outstanding). See Section 4.7.17 for tuning information.
The metadata buffer cache is part of kernel wired memory and is used to cache only UFS and CDFS metadata, which includes file header information, superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries. The DIGITAL UNIX operating system uses the metadata buffer cache as a layer between the operating system and disk. The cache can improve I/O performance by decreasing disk I/O operations.
The metadata buffer cache is configured at boot time and uses bcopy routines to move data in and out of memory. The size of the metadata buffer cache is specified by the value of the bufcache attribute. See Section 4.9 for tuning information.
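Because the metadata buffer cache is sized at boot time, the bufcache attribute must be set in /etc/sysconfigtab and takes effect only after a reboot. A minimal sketch, assuming the attribute belongs to the vfs subsystem stanza (verify the subsystem name with sysconfig -q; the value shown is only an illustration):

    vfs:
        bufcache = 3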
The following sections describe how to configure memory and swap space, which includes the following tasks:
Determining how much physical memory your system requires (Section 4.6.1)
Determining how much swap space you need (Section 4.6.2)
Choosing a swap space allocation mode (Section 4.6.3)
This section describes how to determine your system's memory requirements. The amount of memory installed in your system must be able to provide an acceptable level of user and application performance.
To determine your system's memory requirements, you must gather the following information:
The amount of memory that will be wired
The amount of memory that the virtual memory subsystem requires to cache the anonymous regions of process data
The amount of memory that the UBC requires to cache file system data
See Section 4.6.2 for information about swap space requirements.
Your system's performance depends on the swap space configuration. DIGITAL recommends a minimum of 128 MB for swap space.
To calculate the swap space required by your system and workload, compare the total modifiable virtual address space (anonymous memory) required by your processes with the total amount of physical memory. Modifiable virtual address space holds data elements and structures that are modified during process execution, such as heap space, stack space, and data space.
To calculate swap space requirements if you are using immediate mode, total the anonymous memory requirements for all processes and then add 10 percent of that value. If you are using deferred mode, total the anonymous memory requirements for all processes and then divide by two.
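For example, a hypothetical sizing calculation (the 2048-MB figure is an assumption used only for illustration):

    # echo "2048 + 2048 / 10" | bc
    2252
    # echo "2048 / 2" | bc
    1024

With 2048 MB of anonymous memory, immediate mode calls for roughly 2252 MB of swap space, and deferred mode for roughly 1024 MB.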
Application messages, such as the following, usually indicate that not enough swap space is configured into the system or that a process limit has been reached:
lack of paging space
swap space below 10 percent free
Use multiple disks for swap space. The page reclamation code uses a form of disk striping (known as swap space interleaving) so that pages can be written to the multiple disks. To optimize swap space, ensure that all your swap disks are configured when you boot the system, instead of adding swap space while the system is running.
Use the swapon -s command to display your swap space configuration. The first line displayed is the total allocated swap space. Use the iostat command to display disk usage.
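For example (a minimal sketch; the exact report format depends on your configuration):

    # swapon -s
    # iostat 5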
The following list describes how to configure swap space for high performance:
Configure all of your swap space at boot time
Use fast disks for swap space to decrease page fault latency
Do not use busy disks for swap space
Spread out your swap space across multiple disks (never put multiple swap partitions on the same disk)
Spread out your swap disks across multiple I/O buses to prevent a single bus from becoming a bottleneck
Use the Logical Storage Manager (LSM) to stripe your swap disks
See Chapter 5 for more information about configuring and tuning swap disks for high performance and availability.
There are two methods that you can use to allocate swap space. The methods differ in the point in time at which the virtual memory subsystem reserves swap space for a process. There is no performance benefit attached to either method; however, deferred mode is recommended for very-large memory/very-large database (VLM/VLDB) systems. The swap allocation methods are as follows:
Immediate mode--Swap space is reserved when modifiable virtual address space is created. Immediate mode is often referred to as eager mode and is the default swap space allocation mode.
Anonymous memory is memory that is not backed by a file, but is backed by swap space (for example, stack space, heap space, and memory allocated by the malloc or sbrk routines). When anonymous memory is allocated, the operating system reserves swap space for the memory. Usually, this results in an unnecessary amount of reserved swap space. Immediate mode requires more swap space than deferred mode, but it ensures that the swap space will be available to processes when it is needed.
Deferred mode--Swap space is not reserved until the virtual memory subsystem needs to write a modified virtual page to swap space. Deferred mode is sometimes referred to as lazy mode.
Deferred mode requires less swap space than immediate mode and causes the system to run faster because it requires less swap space bookkeeping. It postpones the reservation and allocation of swap space for anonymous memory until it is needed. However, because deferred mode does not reserve swap space in advance, the swap space may not be available when a task needs it, and the process may be killed asynchronously.
You can enable the deferred swap space allocation mode by removing or moving the /sbin/swapdefault file.
See the System Administration manual for more information on swap space allocation methods.
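For example, a minimal sketch of switching to deferred mode (the .eager suffix is a hypothetical name; moving the file instead of deleting it makes the change easy to reverse, and the new mode takes effect at the next reboot):

    # mv /sbin/swapdefault /sbin/swapdefault.eager
    # shutdown -r now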
The virtual memory subsystem is a primary source of performance problems. Performance may degrade if the virtual memory subsystem cannot keep up with the demand for memory and excessive paging and swapping occurs. A memory bottleneck may cause a disk I/O bottleneck, because excessive paging and swapping decreases performance and indicates that the natural working set size has exceeded the available memory. The virtual memory subsystem runs at a high priority when servicing page faults, which blocks the execution of other processes.
If you have excessive page-in and page-out activity from a swap partition, the system may have a high physical memory commitment ratio. Excessive paging also can increase the miss rate for the secondary cache, and may be indicated by the following output:
The output of the vmstat command shows a very low free page count or shows high page-in and page-out activity. See Section 2.4.2 for more information.
The output of the ps command shows high task swapping activity. See Section 2.4.1 for more information.
The output of the iostat command shows excessive swap disk I/O activity. See Section 2.5.1 for more information.
The tuning recommendations that will provide the best performance benefit involve the following two areas:
System resource allocation
Increasing the available address space
Increasing the kernel resources available to processes
Memory allocation and page reclamation
Modifying the percentage of memory allocated to the UBC
Changing the rate of swapping
Changing how the system prewrites modified inactive pages
Table 4-2 describes the primary tuning guidelines and lists the performance benefits as well as the tradeoffs.
Action | Performance Benefit | Tradeoff |
Reduce the number of processes running at the same time (Section 4.7.1) | Reduces demand for memory | None |
Reduce the static size of the kernel (Section 4.7.2) | Reduces demand for memory | None |
Increase the available address space (Section 4.7.3) | Improves performance for memory-intensive processes | Slightly increases the demand for memory |
Increase the available system resources (Section 4.7.4) | Improves performance for memory-intensive processes | Increases wired memory |
Increase the maximum number of memory-mapped files that are available to a process (Section 4.7.5) | Increases file mapping and improves performance for memory-intensive processes, such as Internet servers | Consumes memory |
Increase the maximum number of virtual pages within a process' address space that can have individual protection attributes (Section 4.7.6) | Improves performance for memory-intensive processes and for Internet servers that maintain large tables or resident images | Consumes memory |
Increase the size of a System V message and queue (Section 4.7.7) | Improves performance for memory-intensive processes | Consumes memory |
Increase the maximum size of a single System V shared memory region (Section 4.7.8) | Improves performance for memory-intensive processes | Consumes memory |
Increase the minimum size of a System V shared memory segment (Section 4.7.9) | Improves performance for VLM and VLDB systems | Consumes memory |
Reduce process memory requirements (Section 4.7.10) | Reduces demand for memory | None |
Reduce the amount of physical memory available to the UBC (Section 4.7.11) | Provides more memory resources to processes | May degrade file system performance |
Increase the rate of swapping (Section 4.7.12) | Frees memory and increases throughput | Decreases interactive response performance |
Decrease the rate of swapping (Section 4.7.12) | Improves interactive response performance | Decreases throughput |
Increase the rate of dirty page prewriting (Section 4.7.13) | Prevents drastic performance degradation when memory is exhausted | Decreases peak workload performance |
Decrease the rate of dirty page prewriting (Section 4.7.13) | Improves peak workload performance | May cause drastic performance degradation when memory is exhausted |
If the previous tasks do not sufficiently improve performance, there are advanced tuning tasks that you can perform. The advanced tuning tasks include the following:
Modify the sizes of the page-in and page-out clusters
Modify the swap device I/O queue depth
Modify the amount of memory the UBC uses to cache large files
Increase the paging threshold
Enable aggressive task swapping
Decrease the size of the file system caches
Reserve memory at boot time for shared memory
Table 4-3 describes the advanced tuning guidelines and lists the performance benefits as well as the tradeoffs.
Action | Performance Benefit | Tradeoff |
Increase the size of the page-in and page-out clusters (Section 4.7.14) | Improves peak workload performance | Decreases total system workload performance |
Decrease the size of the page-in and page-out clusters (Section 4.7.14) | Improves total system workload performance | Decreases peak workload performance |
Increase the swap device I/O queue depth for pageins and swapouts (Section 4.7.15) | Increases overall system throughput | Consumes memory |
Decrease the swap device I/O queue depth for pageins and swapouts (Section 4.7.15) | Improves the interactive response time and frees memory | Decreases system throughput |
Increase the swap device I/O queue depth for pageouts (Section 4.7.16) | Frees memory and increases throughput | Decreases interactive response performance |
Decrease the swap device I/O queue depth for pageouts (Section 4.7.16) | Improves interactive response time | Consumes memory |
Increase the UBC write device queue depth (Section 4.7.17) | Increases overall file system throughput and frees memory | Decreases interactive response performance |
Decrease the UBC write device queue depth (Section 4.7.17) | Improves interactive response time | Consumes memory |
Increase the amount of UBC memory used to cache a large file (Section 4.7.18) | Improves large file performance | May allow a large file to consume all the pages on the free list |
Decrease the amount of UBC memory used to cache a large file (Section 4.7.18) | Prevents a large file from consuming all the pages on the free list | May degrade large file performance |
Increase the paging threshold (Section 4.7.19) | Maintains performance when free memory is exhausted | May waste memory |
Enable aggressive swapping (Section 4.7.20) | Improves system throughput | Degrades interactive response performance |
Decrease the size of the metadata buffer cache (Section 4.7.21) | Provides more memory resources to processes on large systems | May degrade UFS performance |
Decrease the size of the namei cache (Section 4.7.22) | Decreases demand for memory | May slow lookup operations and degrade file system performance |
Decrease the amount of memory allocated to the AdvFS cache (Section 4.7.23) | Provides more memory resources to processes | May degrade AdvFS performance |
Reserve physical memory for shared memory (Section 4.7.24) | Improves shared memory detach time | Decreases the memory available to the virtual memory subsystem and the UBC |
The following sections describe these guidelines in detail.
You can improve performance and reduce the demand for memory by running fewer applications simultaneously. Use the at or the batch command to run applications at offpeak hours.
You can reduce the static size of the kernel by deconfiguring any unnecessary subsystems. Use the setld command to display the installed subsets and to delete subsets. Use the sysconfig command to display the configured subsystems and to delete subsystems.
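For example, a minimal sketch of listing what is installed and configured before deciding what to remove:

    # setld -i | more
    # sysconfig -s | more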
If your applications are memory-intensive, you may want to increase the available address space. Increasing the address space will cause only a small increase in the demand for memory. However, you may not want to increase the address space if your applications use many forked processes.
The following attributes determine the available address space for processes:
vm-maxvas
This attribute controls the maximum amount of virtual address space available to a process. The default value is 1 GB (1073741824 bytes). For Internet servers, you may want to increase this value to 10 GB.
per-proc-address-space and max-per-proc-address-size
These attributes control the maximum amount of user process address space, which is the maximum number of valid virtual regions. The default value for both attributes is 1 GB.
per-proc-stack-size and max-per-proc-stack-size
These attributes control the maximum size of a user process stack. The default value of the per-proc-stack-size attribute is 2097152 bytes. The default value of the max-per-proc-stack-size attribute is 33554432 bytes. You may need to increase these values if you receive cannot grow stack messages.
per-proc-data-size and max-per-proc-data-size
These attributes control the maximum size of a user process data segment. The default value of the per-proc-data-size attribute is 134217728 bytes. The default value of the max-per-proc-data-size attribute is 1 GB.
You can use the setrlimit function to control the consumption of system resources by a parent process and its child processes. See setrlimit(2) for information.
If your applications are memory-intensive, you may want to increase the system resources that are available to processes. Be careful when increasing the system resources, because this will increase the amount of wired memory in the system.
The following attributes affect system resources:
The maxusers attribute specifies the number of simultaneous users that a system can support without straining system resources. System algorithms use the maxusers attribute to size various system data structures and to determine the amount of space allocated to system tables, such as the system process table, which is used to determine how many active processes can be running at one time. The default value assigned to the maxusers attribute depends on the size of your system.
Increasing the value of the maxusers attribute allocates more system resources for use by the kernel. However, this also increases the amount of physical memory consumed by the kernel. Decreasing the value of the maxusers attribute reduces kernel memory usage, but allocates fewer system resources to processes.
If your system experiences a lack of resources (for example, Out of processes or No more processes messages), you can increase the value of the maxusers attribute to 512. If you have sufficient memory on a heavily loaded system (for example, more than 96 MB), you can increase the value of the maxusers attribute to 1024.
task-max
The task-max attribute specifies the maximum number of tasks that can run simultaneously. The default value is 20 + 8 * maxusers.
thread-max
The thread-max attribute specifies the maximum number of threads. The default value is 2 * task-max.
The max-proc-per-user attribute specifies the maximum number of processes that can be allocated at any one time to each user, except superuser. The default value of the max-proc-per-user attribute is 64. If your system experiences a lack of processes, you can increase the value of the max-proc-per-user attribute. The value must be more than the maximum number of processes that will be started by your system. If you have a Web server, these processes include CGI processes. If you plan to run more than 64 Web server daemons simultaneously, increase the attribute value to 512. On a very busy server with sufficient memory, you can use a higher value. Increasing this value can improve the performance of multiprocess Web servers.
The max-threads-per-user attribute specifies the maximum number of threads that can be allocated at any one time to each user, except superuser. The default value is 256. If your system, especially a Web server, experiences a lack of threads, you can increase the value of the max-threads-per-user attribute. The value must be more than the maximum number of threads that will be started by your system. You can increase the value of the max-threads-per-user attribute to 512. On a very busy server with sufficient memory, you can use a higher value, such as 4096. Increasing this value can improve the performance of multithreaded Web servers.
You can use the setrlimit function to control the consumption of system resources by a parent process and its child processes. See setrlimit(2) for information.
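For example, a minimal /etc/sysconfigtab sketch that raises several of these limits for a busy Web server. This assumes the attributes belong to the proc subsystem stanza and that the values shown (taken from the guidelines above) suit your workload; verify the subsystem name and current values with sysconfig -q before editing, and note that a change to maxusers takes effect only after a reboot:

    proc:
        maxusers = 512
        max-proc-per-user = 512
        max-threads-per-user = 512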
The vm-mapentries attribute specifies the maximum number of memory-mapped files in a user address space. Each map entry describes one unique disjoint portion of a virtual address space. The default value is 200. You may want to increase the value of the vm-mapentries attribute for VLM systems. Because Web servers map files into memory, you may want to increase the value to 20000 for busy systems running multithreaded Web server software. This will increase the limit on file mapping. This attribute affects all processes, and increasing its value will increase the demand for memory.
The vm-vpagemax attribute specifies the maximum number of virtual pages within a process' address space that can be given individual protection attributes. These protection attributes differ from the protection attributes associated with the other pages in the address space.
Changing the protection attributes of a single page within a virtual memory region causes all pages within that region to be treated as though they had individual protection attributes. For example, each thread of a multithreaded task has a user stack in the stack region for the process in which it runs. Because multithreaded tasks have guard pages (that is, pages that do not have read/write access) inserted between the user stacks for the threads, all pages in the stack region for the process are treated as though they have individual protection attributes.
The default value of the vm-vpagemax attribute is determined by dividing the value of the vm-maxvas attribute (the address space size in bytes) by 8192. If a stack region for a multithreaded task exceeds 16 KB pages, you may want to increase the value of the vm-vpagemax attribute. For example, if the value of the vm-maxvas attribute is 1 GB (the default), set the value of vm-vpagemax to 131072 pages (1073741824/8192 = 131072). This value improves the efficiency of Web servers that maintain large tables or resident images. You may want to increase the value of the vm-vpagemax attribute for VLM systems. However, this attribute affects all processes, and increasing its value will increase the demand for memory.
If your applications are memory-intensive or you have a VLM system, you may want to increase the value of the msg-max attribute. This attribute specifies the maximum size of a single System V message. However, increasing the value of this attribute will increase the demand for memory. The default value is 8192 bytes (1 page).
In addition, you may want to increase the value of the msg-tql attribute. This attribute specifies the maximum number of messages that can be queued to a single System V message queue at one time. However, increasing the value of this attribute will increase the demand for memory. The default value is 40.
If your applications are memory-intensive or you have a VLM system, you may want to increase the value of the shm-max attribute. This attribute specifies the maximum size of a single System V shared memory region. However, increasing the value of this attribute will increase the demand for memory. The default value is 4194304 bytes (512 pages).
In addition, you may want to increase the value of the shm-seg attribute. This attribute specifies the maximum number of System V shared memory regions that can be attached to a single process at any point in time. However, increasing the value of this attribute will increase the demand for memory. The default value is 32.
If your applications are memory-intensive, you may want to increase the value of the ssm-threshold attribute. Page table sharing occurs when the size of a System V shared memory segment reaches the value specified by this attribute. However, increasing the value of this attribute will increase the demand for memory.
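For example, a minimal /etc/sysconfigtab sketch for a large database system. This assumes the System V IPC attributes belong to the ipc subsystem stanza and that the values shown are only illustrations; verify the subsystem name and the limits your application actually needs before editing:

    ipc:
        msg-max = 16384
        msg-tql = 128
        shm-max = 268435456
        shm-seg = 128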
You may want to reduce your applications' use of memory to free memory for other purposes. Follow these coding considerations to reduce your applications' use of memory:
Configure and tune applications according to the guidelines provided by the application's installation procedure. For example, you may be able to reduce an application's anonymous memory requirements, set parallel/concurrent processing attributes, size shared global areas and private caches, and set the maximum number of open/mapped files.
Look for data cache collisions between heavily used data structures, which occur when the distance between two data structures allocated in memory is equal to the size of the primary (internal) data cache. If your data structures are small, you can avoid collisions by allocating them contiguously in memory. To do this, use a single malloc call instead of multiple calls.
If an application uses large amounts of data for a short time, allocate the data dynamically with the malloc function instead of declaring it statically. When you have finished using dynamically allocated memory, it is freed for use by other data structures that occur later in the program. If you have limited memory resources, dynamically allocating data reduces an application's memory usage and can substantially improve performance.
If an application uses the malloc function extensively, you may be able to improve its processing speed or decrease its memory utilization by using the function's control variables to tune memory allocation. See malloc(3) for details on tuning memory allocation.
If your application fits in a 32-bit address space and allocates large amounts of dynamic memory by using structures that contain many pointers, you may be able to reduce memory usage by using the -xtaso flag. The -xtaso flag is supported by all versions of the C compiler (the -newc, -migrate, and -oldc versions). To use the -xtaso flag, modify your source code with a C-language pragma that controls pointer size allocations. See cc(1) for details.
See the Programmer's Guide for more information on process memory allocation.
You may be able to improve performance by reducing the maximum percentage of memory available for the UBC. If you decrease the maximum size of the UBC, you increase the amount of memory available to the virtual memory subsystem, which may reduce the paging and swapping rate. However, reducing the memory allocated to the UBC may adversely affect I/O performance because the UBC will hold less file system data, which results in more disk I/O operations. Therefore, do not significantly decrease the maximum size of the UBC.
The maximum amount of memory that can be allocated to the UBC is specified by the ubc-maxpercent attribute. The default is 100 percent. The minimum amount of memory that can be allocated to the UBC is specified by the ubc-minpercent attribute. The default is 10 percent. If you have an Internet server, use these default values.
If the page-out rate is high and you are not using the file system heavily, decreasing the value of the ubc-maxpercent attribute may reduce the rate of paging and swapping. Start with the default value of 100 percent and decrease the value in increments of 10.
If the values of the ubc-maxpercent and ubc-minpercent attributes are close together, you may seriously degrade I/O performance or cause the system to page excessively. Use the vmstat command to determine whether the system is paging excessively.
Using dbx, periodically examine the vpf_pgiowrites and vpf_ubcalloc fields of the vm_perfsum kernel structure. The page-out rate may shrink if pageouts greatly exceed UBC allocations.
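For example, a minimal sketch of examining these counters with the kernel debugger (the invocation is the form used elsewhere in this manual; the structure and field names are those cited above):

    # dbx -k /vmunix /dev/mem
    (dbx) print vm_perfsum.vpf_pgiowrites
    (dbx) print vm_perfsum.vpf_ubcalloc
    (dbx) quit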
You also may be able to prevent paging by increasing the percentage of memory that the UBC borrows from the virtual memory subsystem. To do this, decrease the value of the ubc-borrowpercent attribute. Decreasing the value of the ubc-borrowpercent attribute allows less memory to remain in the UBC when page reclamation begins. This can reduce the UBC effectiveness, but may improve the system response time when a low-memory condition occurs. The value of the ubc-borrowpercent attribute can range from 0 to 100. The default value is 20 percent.
Swapping has a drastic impact on system performance. You can modify attributes to control when swapping begins and ends. Increasing the rate of swapping (swapping earlier during page reclamation) moves long-sleeping threads out of memory, frees memory, and increases throughput; as more processes are swapped out, fewer processes compete for memory, and the processes that remain resident can accomplish more work. However, when an outswapped process is needed, it will have a long latency, so increasing the rate of swapping will degrade interactive response time.
In contrast, if you decrease the rate of swapping (swap later during page reclamation), you will improve interactive response time, but at the cost of throughput.
To increase the rate of swapping, increase the value of the vm-page-free-optimal attribute (the default is 74 pages). Increase the value only by 2 pages at a time. Do not specify a value that is more than the value of the vm-page-free-target attribute (the default is 128 pages).
To decrease the rate of swapping, decrease the value of the vm-page-free-optimal attribute by 2 pages at a time. Do not specify a value that is less than the value of the vm-page-free-min attribute (the default is 20 pages).
The virtual memory subsystem attempts to prevent a memory shortage by prewriting modified pages to swap space. When the virtual memory subsystem anticipates that the pages on the free list will soon be depleted, it prewrites to swap space the oldest modified (dirty) pages on the inactive list. To reclaim a page that has been prewritten, the virtual memory subsystem only needs to validate the page.
Increasing the rate of dirty page prewriting will reduce peak workload performance, but it will prevent a drastic performance degradation when memory is exhausted. Decreasing the rate will improve peak workload performance, but it will cause a drastic performance degradation when memory is exhausted.
You can control the rate of dirty page prewriting by modifying the values of the vm-page-prewrite-target attribute and the vm-ubcdirtypercent attribute.
The vm-page-prewrite-target attribute specifies the number of virtual memory pages that the subsystem will prewrite and keep clean. The default value is 256 pages. To increase the rate of virtual memory dirty page prewriting, increase the value of the vm-page-prewrite-target attribute from the default value (256) by increments of 64 pages.
The vm-ubcdirtypercent attribute specifies the percentage of UBC LRU pages that can be modified before the virtual memory subsystem prewrites the dirty UBC LRU pages. The default value is 10 percent of the total UBC LRU pages (that is, 10 percent of the UBC LRU pages must be dirty before the UBC LRU pages are prewritten). To increase the rate of UBC LRU dirty page prewriting, decrease the value of the vm-ubcdirtypercent attribute by increments of 1 percent.
In addition, you may want to minimize the impact of I/O spikes caused by the sync function when prewriting UBC LRU dirty pages. The value of the ubc-maxdirtywrites attribute specifies the maximum number of disk writes that the kernel can perform each second. The default value of the ubc-maxdirtywrites attribute is 5 I/O operations per second. To minimize the impact of sync (steady state flushes) when prewriting dirty UBC LRU pages, increase the value of the ubc-maxdirtywrites attribute.
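For example, a minimal sketch of adjusting the prewrite behavior at run time (the values follow the increments recommended above but are otherwise illustrative; confirm that your system accepts run-time reconfiguration of these attributes):

    # sysconfig -r vm vm-page-prewrite-target=320
    # sysconfig -r vm vm-ubcdirtypercent=9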
The virtual memory subsystem reads in and writes out additional pages in an attempt to anticipate pages that it will need.
The vm-max-rdpgio-kluster attribute specifies the maximum size of an anonymous page-in cluster. The default value is 16 KB (2 pages). If you increase the value of this attribute, the system will spend less time page faulting because more pages will be in memory. This will increase the peak workload performance, but will consume more memory and decrease the total system workload performance. Decreasing the value of the vm-max-rdpgio-kluster attribute will conserve memory and increase the total system workload performance, but will increase paging and decrease the peak workload performance.
The vm-max-wrpgio-kluster attribute specifies the maximum size of an anonymous page-out cluster. The default value is 32 KB (4 pages). Increasing the value of this attribute improves the peak workload performance and conserves memory, but causes more pageins and decreases the total system workload performance. Decreasing the value of the vm-max-wrpgio-kluster attribute improves the total system workload performance and decreases the number of pageins, but decreases the peak workload performance and consumes more memory.
Synchronous swap buffers are used for page-in page faults and for swapouts. The vm-syncswapbuffers attribute specifies the maximum swap device I/O queue depth for pageins and swapouts. You can modify the value of the vm-syncswapbuffers attribute. The value should be equal to the approximate number of simultaneously running processes that the system can easily handle. The default is 128.
Increasing the swap device I/O queue depth increases overall system throughput, but consumes memory.
Decreasing the swap device I/O queue depth decreases memory demands and improves interactive response time, but decreases overall system throughput.
Asynchronous swap buffers are used for asynchronous pageouts and for prewriting modified pages. The vm-asyncswapbuffers attribute controls the maximum depth of the swap device I/O queue for pageouts. The value of the vm-asyncswapbuffers attribute should be the approximate number of I/O transfers that a swap device can handle at one time. The default value is 4.
Increasing the queue depth will free memory and increase the overall system throughput. Decreasing the queue depth will use more memory, but will improve the interactive response time. If you are using LSM, you may want to increase the page-out rate. Be careful if you increase the value of the vm-asyncswapbuffers attribute, because this will cause page-in requests to lag asynchronous page-out requests.
The UBC uses a buffer to facilitate the movement of data between memory and disk. The vm-ubcbuffers attribute specifies the maximum file system device I/O queue depth for writes. The default value is 256.
Increasing the UBC write device queue depth frees memory and increases the overall file system throughput.
Decreasing the UBC write device queue depth increases memory demands, but improves the interactive response time.
If a large file completely fills the UBC, it may take all of the pages
on the free page list, which may cause the system to page excessively.
The
vm-ubcseqpercent
attribute specifies the
maximum amount of memory allocated to the UBC that can be used to cache a
file.
The default value is 10 percent of memory allocated to the UBC.
The
vm-ubcseqstartpercent
attribute specifies the
size of the UBC, as a percentage of physical memory, at which the
virtual memory subsystem starts stealing
UBC LRU pages for a file to satisfy the demand for pages.
The default is 50 percent of physical memory.
Increasing the value of the
vm-ubcseqpercent
attribute will improve the performance of a large single file,
but decrease the remaining amount of memory.
Decreasing the value of the
vm-ubcseqpercent
attribute will increase the available memory, but will degrade the
performance of a large single file.
To force the system to reuse the pages in the UBC instead of taking pages from the free list, perform the following tasks:
Make the maximum size of the UBC greater than the size, as a percentage of physical
memory, at which the virtual memory subsystem starts stealing UBC LRU pages.
That is, the value of the
ubc-maxpercent
attribute (the default is 100 percent)
must be greater than the value of the
vm-ubcseqstartpercent
attribute (the default is 50 percent).
Make the referenced file larger than the maximum portion of the UBC that a single
file can consume.
That is, the file must be larger than the percentage of the UBC specified by the
vm-ubcseqpercent
attribute.
The default value of the
vm-ubcseqpercent
attribute is 10 percent.
For example, using the default values, the UBC would have to be larger than 50 percent of all memory and a file would have to be larger than 10 percent of the UBC (that is, the file size would have to be at least 5 percent of all memory) in order for the system to reuse the pages in the UBC.
On large-memory systems that are doing a lot of file system operations,
you may want to lower the
vm-ubcseqstartpercent
value
to 30 percent.
Do not specify a lower value unless you decrease
the size of the UBC.
In this case, do not change the value of the
vm-ubcseqpercent
attribute.
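A minimal /etc/sysconfigtab sketch of that large-memory adjustment follows; the vm subsystem placement is an assumption, and vm-ubcseqpercent is deliberately left at its default of 10 percent.
vm:
    vm-ubcseqstartpercent = 30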
The
vm-page-free-target
attribute specifies the minimum
number of pages on the free list before paging starts.
The default
value is 128 pages.
Increasing the value of the
vm-page-free-target
attribute will increase the
paging activity but may improve performance when free memory is exhausted.
If you increase the value, start at the default value (128 pages, or 1 MB)
and double it (for example, 256, 512, and then 1024 pages).
Do not specify a value above 1024 pages (8 MB).
A high value can waste memory.
Do not decrease the value of the
vm-page-free-target
attribute unless you have a lot of memory or you experience a serious
performance degradation when free memory is exhausted.
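For example, stepping the value from the default of 128 pages to the next doubling (256 pages, or 2 MB) could be done at run time as shown below. The vm subsystem placement is an assumption, and 256 is only the first step in the doubling sequence, not a recommendation.
# sysconfig -q vm vm-page-free-target
# sysconfig -r vm vm-page-free-target=256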
You can enable the
vm-aggressive
attribute (set the value
to 1) to allow the virtual memory subsystem to aggressively swap out processes
when memory is needed.
This improves system throughput, but degrades the
interactive response performance.
By default, the
vm-aggressive
attribute is disabled
(set
to 0), which results in less aggressive swapping.
In this case,
processes are swapped in at a faster rate than if aggressive swapping is
enabled.
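A sketch of enabling aggressive swapping on a throughput-oriented system follows. The vm subsystem placement is an assumption; if the attribute cannot be changed at run time on your release, set it in /etc/sysconfigtab instead.
# sysconfig -r vm vm-aggressive=1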
The metadata buffer cache contains recently accessed UFS and CDFS metadata. On large-memory systems with a high cache hit rate, you may want to decrease the size of the metadata buffer cache. This will increase the amount of memory that is available to the virtual memory subsystem. However, decreasing the size of the cache may degrade UFS performance.
The
bufcache
attribute specifies the percentage of
physical memory that the kernel wires for the metadata buffer cache.
The default size of the metadata buffer cache is 3 percent of physical
memory.
You can decrease the value of the
bufcache
attribute to a minimum of 1 percent.
For systems that use only AdvFS, set the value of the
bufcache
attribute to 1 percent.
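Because the metadata buffer cache is wired at boot time, the bufcache attribute is normally set in /etc/sysconfigtab and takes effect at the next reboot. The stanza below sketches the AdvFS-only setting; the vfs subsystem placement is an assumption to verify with sysconfig -q.
vfs:
    bufcache = 1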
The namei cache is used by all file systems to map file pathnames to
inodes.
Use
dbx
to monitor the cache by examining the
nchstats
structure.
To free memory resources, decrease the number of elements in the namei
cache by decreasing the value of the
name-cache-size
attribute.
The default values are 2 * nvnode * 11 / 10 (for systems with 32 MB or more of memory)
and 150 (for 24-MB systems).
The maximum value is 2 * max-vnodes * 11 / 10.
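Before lowering the name-cache-size attribute, it may help to confirm that the cache hit rate is high. The dbx invocation below matches the other examples in this chapter; the field names in the nchstats structure (for example, ncs_goodhits and ncs_miss in the traditional BSD layout) are assumptions to check against the output on your system, and a cast similar to the gh_stats_store example may be needed.
# dbx -k /vmunix /dev/mem
(dbx) print nchstats
If good hits dominate total lookups, reducing the cache size is less likely to hurt pathname lookup performance.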
To free memory resources, you may want to decrease the percentage of physical memory allocated to the AdvFS buffer cache.
The
AdvfsCacheMaxPercent
attribute determines the maximum amount of physical memory that can be used
for the AdvFS buffer cache.
The default is 7 percent of memory.
However, decreasing the size of the AdvFS buffer cache may adversely affect
AdvFS I/O performance.
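An /etc/sysconfigtab sketch of trimming the AdvFS buffer cache from the default of 7 percent to 5 percent follows. The advfs subsystem name and the value of 5 are assumptions for illustration; the attribute takes effect at boot time.
advfs:
    AdvfsCacheMaxPercent = 5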
Granularity hints allow you to reserve a portion of dynamically wired physical memory at boot time for shared memory. Granularity hints allow the translation lookaside buffer to map more than a single page and enable shared page table entry functionality, which results in fewer translation lookaside buffer misses.
On typical database servers, using granularity hints provides a 2 to 4 percent run-time performance gain and reduces the shared memory detach time. In most cases, use the Segmented Shared Memory (SSM) functionality (the default) instead of the granularity hints functionality.
To enable granularity hints, you must specify a value for the
gh-chunks
attribute.
To make granularity hints more
effective, modify applications to ensure that both the shared
memory segment starting address and size are
aligned on an 8-MB boundary.
Section 4.7.24.1 and Section 4.7.24.2 describe how to enable granularity hints.
To use granularity hints, you must specify the number of 4-MB chunks of physical memory to reserve for shared memory at boot time. This memory cannot be used for any other purpose and cannot be returned to the system or reclaimed.
To reserve memory for shared memory, specify a nonzero value for the
gh-chunks
attribute.
For example, if you want to reserve
4 GB of memory, specify 1024 for the value of
gh-chunks
(1024 * 4 MB = 4 GB).
If you specify a value of 512, you will reserve
2 GB of memory.
The value you specify for the
gh-chunks
attribute
depends on your database application.
Do not reserve an excessive
amount of memory, because reserving memory decreases the memory available
to the virtual memory subsystem and the UBC.
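Because reserved chunks cannot be reclaimed, the gh-chunks attribute is set in /etc/sysconfigtab and takes effect at the next boot. The stanza below sketches the 512-chunk (2-GB) starting point described above; the subsystem shown (vm) is an assumption to confirm with sysconfig -q on your release.
vm:
    gh-chunks = 512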
You can determine if you have reserved the appropriate amount of memory.
For example, you can initially specify 512 for the value of the
gh-chunks
attribute.
Then, invoke the
following sequence of
dbx
commands while running the
application that allocates shared memory:
# dbx -k /vmunix /dev/mem
(dbx) px &gh_free_counts
0xfffffc0000681748
(dbx) 0xfffffc0000681748/4X
fffffc0000681748:  0000000000000402 0000000000000004
fffffc0000681758:  0000000000000000 0000000000000002
(dbx)
The output shows the following:
The first number (402) specifies the number of 512-page chunks (4 MB).
The second number (4) specifies the number of 64-page chunks.
The third number (0) specifies the number of 8-page chunks.
The fourth number (2) specifies the number of 1-page chunks.
To save memory, you can reduce the value of the
gh-chunks
attribute until only one or two 512-page chunks are free while the application
that uses shared memory is running.
The following attributes also affect granularity hints:
gh-min-seg-size
Specifies the shared memory segment size above which memory is allocated
from the memory reserved by the
gh-chunks
attribute.
The default is 8 MB.
gh-fail-if-no-mem
When set to 1 (the default), the
shmget
function
returns a failure if the requested segment size is larger than the value
specified by the
gh-min-seg-size
attribute, and if there
is insufficient memory in the
gh-chunks
area to satisfy
the request.
If the value of the
gh-fail-if-no-mem
attribute is
0, the entire request will be satisfied from the pageable memory area if the
request is larger than the amount of memory reserved by the
gh-chunks
attribute.
In addition, messages will display on the system console indicating unaligned size and attach address requests. The unaligned attach messages are limited to one per shared memory segment.
You can make granularity hints more effective by making both the shared memory segment starting address and size aligned on an 8-MB boundary.
To share Level 3 page table entries, the shared memory segment attach
address (specified by the
shmat
function) and the shared
memory segment size (specified by the
shmget
function)
must be aligned on an 8-MB boundary.
This means that the lowest 23 bits
of both the address and the size must be zero.
The attach address and the shared memory segment size are specified
by the application.
In addition, System V shared memory semantics allow a
maximum shared memory segment size of 2 GB minus 1 byte.
Applications that
need shared memory segments larger than 2 GB can construct these regions by
using multiple segments.
In this case, the total shared memory size specified
by the user to the application must be 8-MB aligned.
In addition, the value
of the
shm-max
attribute, which specifies the maximum size
of a System V shared memory segment, must be 8-MB aligned.
If the total shared memory size specified to the application is greater
than 2 GB, you can specify a value of 2139095040 (or 0x7f800000) for the
value of the
shm-max
attribute.
This is the maximum value
(2 GB minus 8 MB) that you can specify for the
shm-max
attribute and still share page table entries.
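As a sketch, the following /etc/sysconfigtab stanza sets that 8-MB-aligned maximum; the ipc subsystem placement is an assumption to verify on your release.
ipc:
    shm-max = 2139095040
The value 2139095040 is 0x7f800000 (2 GB minus 8 MB), so its low 23 bits are zero and Level 3 page table entries can still be shared.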
Use the following
dbx
command sequence to determine
if page table entries are being shared:
# dbx -k /vmunix /dev/mem
(dbx) p *(vm_granhint_stats *)&gh_stats_store
struct {
    total_mappers = 21
    shared_mappers = 21
    unshared_mappers = 0
    total_unmappers = 21
    shared_unmappers = 21
    unshared_unmappers = 0
    unaligned_mappers = 0
    access_violations = 0
    unaligned_size_requests = 0
    unaligned_attachers = 0
    wired_bypass = 0
    wired_returns = 0
}
(dbx)
For the best performance, the
shared_mappers
kernel variable should be equal to the number of shared memory segments,
and the
unshared_mappers
,
unaligned_attachers
, and
unaligned_size_requests
variables should
be 0 (zero).
Because of how shared memory is divided into shared memory
segments, there may be some unshared segments.
This occurs when the
starting address or the size is not aligned on an 8-MB boundary.
This condition may be unavoidable in some cases.
In many cases, the
value of
total_unmappers
will be greater than
the value of
total_mappers
.
Shared memory locking replaces what was a single lock with a hashed
array of locks.
The size of the hashed array of locks can be modified by
modifying the value of the
vm-page-lock-count
attribute.
The default value is 64.
The UBC and the virtual memory subsystem compete for the physical memory that is not wired by the kernel. You may be able to improve file system performance by tuning the UBC. However, increasing the amount of memory available to the UBC will affect the virtual memory subsystem and may increase the rate of paging and swapping.
The amount of memory allocated to the UBC is determined by the
ubc-maxpercent,
ubc-minpercent,
and
ubc-borrowpercent
attributes.
You may be able to improve performance
by modifying the value of these attributes, which are described in
Section 4.4.
The following output may indicate that the size of the UBC is too small for your configuration:
The output of the
vmstat
or
monitor
command shows excessive file system page-in activity but little
or no page-out activity, or shows a very low free page count.
The output of the
iostat
command shows
little or no swap
disk I/O activity or shows excessive file system I/O activity.
The UBC is flushed by the
update
daemon.
You can monitor the UBC usage lookup hit ratio by using
dbx
.
You
can view UBC statistics by using
dbx
and checking
the
vm_perfsum
structure.
You can also monitor the UBC by using
dbx -k
and examining the
ufs_getapage_stats
structure.
See
Chapter 2
for information about
monitoring the UBC.
You can improve UBC performance by following the guidelines described in Table 4-4. You can also improve file system performance by following the guidelines described in Chapter 5.
Action | Performance Benefit | Tradeoff |
Increase the memory allocated to the UBC (Section 4.8.1) | Improves file system performance | May cause excessive paging and swapping |
Decrease the amount of memory borrowed by the UBC (Section 4.8.2) | Improves file system performance | Decreases the memory available for processes and may decrease system response time |
Increase the minimum size of the UBC (Section 4.8.3) | Improves file system performance | Decreases the memory available for processes |
Modify the application to use mmap (Section 4.8.4) | Decreases memory requirements | None |
Increase the UBC write device queue depth (Section 4.7.17) | Increases overall file system throughput and frees memory | Decreases interactive response performance |
Decrease the UBC write device queue depth (Section 4.7.17) | Improves interactive response time | Consumes memory |
The following sections describe these guidelines in detail.
If there is an insufficient amount of memory allocated to the UBC, I/O performance may be degraded. If you allocate more memory to the UBC, you will improve the chance that data will be found in the cache. By preventing the system from having to copy data from a disk, you may improve I/O performance. However, allocating more memory to the UBC may cause excessive paging and swapping.
To increase the maximum amount of memory allocated to the UBC, you
can increase the value of the
ubc-maxpercent
attribute.
The default value is 100 percent.
However, the performance of an application that generates a lot of
random I/O will not be improved by enlarging the UBC because the next
access location for random I/O cannot be predetermined.
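For example, if the ubc-maxpercent attribute was previously lowered to favor the virtual memory subsystem, you might raise it back toward the default at run time. The vm subsystem placement and the value of 90 are assumptions for illustration only.
# sysconfig -q vm ubc-maxpercent
# sysconfig -r vm ubc-maxpercent=90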
See
Section 4.3.7
for information about UBC memory allocation.
If
vmstat
output shows excessive paging but few or
no pageouts, you may want to increase the value of the
ubc-borrowpercent
attribute.
This situation can occur on low-memory systems (24-MB systems) because
they reclaim UBC pages more aggressively than systems with more memory.
The UBC borrows all physical memory above the
value of the
ubc-borrowpercent
attribute and
up to the value of the
ubc-maxpercent
attribute.
Increasing the value of the
ubc-borrowpercent
attribute
allows more memory to remain in the UBC when page reclamation
begins.
This can increase the UBC cache effectiveness,
but may degrade system response time when a low-memory condition
occurs.
The value of the
ubc-borrowpercent
attribute
can range from 0 to 100.
The default value is 20 percent.
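For example, raising the attribute from the default of 20 percent to 30 percent leaves more memory in the UBC when page reclamation begins. The command below is a sketch; the vm subsystem placement and the value of 30 are assumptions, not recommendations.
# sysconfig -r vm ubc-borrowpercent=30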
See
Section 4.3.7
for information about UBC memory allocation.
Increasing the value of the
ubc-minpercent
attribute
will prevent large programs from completely filling the UBC.
For I/O servers, you may want to raise the value of the
ubc-minpercent
attribute to ensure that memory is
available for the UBC.
The default value is 10 percent.
To ensure that the value of the
ubc-minpercent
attribute is
appropriate, use the
vmstat
command to examine
the page-out rate.
If the values of the
ubc-maxpercent
and
ubc-minpercent
attributes are close together, you may
degrade I/O performance or cause the system to page excessively.
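A sketch of the check-and-adjust sequence follows: watch the page-out columns in vmstat output (pout on most releases), and raise the floor only if page-out activity stays low. The vm subsystem placement and the value of 20 are assumptions for illustration.
# vmstat 5
# sysconfig -r vm ubc-minpercent=20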
See
Section 4.3.7
for information about UBC memory allocation.
You may want to use the
mmap
function instead of the
read
or
write
function in your
applications.
The
read
and
write
system calls require a page of buffer memory and a page of UBC memory, but
mmap
requires only one page of memory.
A portion of physical memory is wired for use by the metadata buffer cache, which is the traditional BSD buffer cache. The file system code that deals with UFS metadata, which includes directories, indirect blocks, and inodes, uses this cache.
You may be able to improve UFS performance by following the guidelines described in Table 4-5.
Action | Performance Benefit | Tradeoff |
Increase the memory allocated to the metadata buffer cache (Section 4.9.1) | Improves UFS performance | Reduces the memory available to the virtual memory subsystem and the UBC |
Increase the size of the hash chain table (Section 4.9.2) | Improves lookup speed | Consumes memory |
The following sections describe these guidelines in detail.
The
bufcache
attribute specifies the size of the
kernel's metadata buffer cache as a percentage of physical memory.
The default is 3 percent.
You may want to increase the size of the
metadata buffer cache if you have a high cache miss rate (low hit rate).
In general, you do not have to increase the cache size.
Never increase the value of the
bufcache
attribute to more than 10 percent.
To determine whether to
increase the size of the metadata buffer cache, use
dbx
to examine the
bio_stats
structure.
The miss rate (block misses divided by the sum of the block misses and block
hits) should not be more than 3 percent.
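A sketch of the check follows. The dbx invocation matches the other examples in this chapter (a cast similar to the gh_stats_store example may be needed), and the hit and miss figures mentioned afterward are hypothetical numbers used only to show the arithmetic.
# dbx -k /vmunix /dev/mem
(dbx) print bio_stats
If the structure reported, for example, 9700 block hits and 300 block misses, the miss rate would be 300 / (300 + 9700), or 3 percent, which is at the limit suggested above; a higher miss rate argues for a larger bufcache value, up to the 10 percent ceiling.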
Allocating additional memory to the metadata buffer cache reduces the
amount of memory available to the virtual memory subsystem and the UBC.
In general, you do not have to increase the value of the
bufcache
attribute.
The hash chain table for the metadata buffer cache stores the heads of the hashed buffer queues. Increasing the size of the hash chain table spreads out the buffers and may reduce linear searches, which improves lookup speeds.
The
buffer-hash-size
attribute specifies the size
of
the hash chain table for the metadata buffer cache.
The default
hash chain table size is 512 slots.
You can modify the value of the
buffer-hash-size
attribute so that each hash chain has 3 or 4 buffers.
To determine a value
for the
buffer-hash-size
attribute, use
dbx
to examine the value of
nbuf
, then divide
the value by 3 or 4, and finally round the result to a power of 2.
For example, if
nbuf
has a value of 360, dividing
360 by 3 gives you a value of 120.
Based on this calculation,
specify 128 (2 to the power of 7) as the value of the
buffer-hash-size
attribute.
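The calculation can be made directly against the running kernel, as in the nbuf example above. The dbx usage matches the other examples in this chapter; the vfs subsystem placement of the buffer-hash-size attribute is an assumption, and because the attribute takes effect at boot time it belongs in /etc/sysconfigtab.
# dbx -k /vmunix /dev/mem
(dbx) print nbuf
360
(dbx) quit
Given that value, a stanza such as the following sets the rounded result:
vfs:
    buffer-hash-size = 128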