This chapter describes how you can tune your system to use resources most efficiently under a variety of system load conditions. Tuning your system can include changing system configuration file parameters or sysconfigtab attributes, increasing resources such as CPU or cache memory, and changing the system configuration, such as adding disks, spreading out file systems, or adding swap space.
Note
If you have a performance problem on your system, never attempt to tune your system until you have confirmed that the problem is not caused by an application that is either broken or in need of further optimization. For information on application optimization, see the Programmer's Guide.
As a general rule, performance tuning consists of performing several of the following tasks:
Section 3.3 explains the mechanisms that you can use to tune your system.
Prior to tuning your system, you need to understand how your system is being used. For example, is your system used primarily as a file server? Are users running many small applications or are they running mostly large applications? Without this understanding, your attempts at tuning may cause more harm than good. If you understand the system's intended use and you perceive a performance problem, keep the following tuning rules in mind:
The following text provides an example of an analytical path you could follow to determine where your system needs tuning - after first confirming that your applications are not causing the performance problem:
Check the number of pages on the free list. If the number is less than 128, you may have a virtual memory problem. Possible solutions to a virtual memory problem include the following:
After you determine where your performance problem exists, you can then begin your effort to remedy it. The following sections describe the various tuning possibilities.
When applications are operating correctly but are experiencing CPU saturation (that is, a lack of available CPU cycles), your options for correcting the situation are limited. If the CPU overload condition is expected to continue indefinitely, the best long-term solution is to add more memory, add an additional processor (on multiprocessor systems), or replace your system with a larger one. In the short term, before you are able to increase CPU capacity, you can make a number of adjustments to temporarily increase performance on a system that is compute bound.
The following sections describe the effects of expanding CPU capacity on a Symmetrical Multiprocessing system (SMP) (Section 3.2.1) and temporary changes that you can make to reduce CPU load (Section 3.2.2).
SMP systems allow you to expand the computing power of a system by adding additional processors. In most cases, increasing computing power in this way improves the performance of a system. Adding additional processors can be an effective solution for performance problems on SMP systems that are compute bound (only nominal idle time) and have multiple processes with the ability to run concurrently. Note that your system's ability to take advantage of the increase in computing capacity provided by additional processors may be limited if you do not also increase your system's I/O capacity.
Workloads that lend themselves well to SMP include DBMS servers, mail servers, and compute servers, to name a few. Basically, most workloads that have multiple processes or multiple threads of execution that can run concurrently can benefit from SMP. It is important to note that the gating factor for these workloads in some cases is not computing power, and these types of workloads may require additional tuning. For example, workloads that are metadata intensive and that involve a limited number of directories may benefit more if NVRAM is also added to your system whenever an additional processor is added. (Metadata intensive applications open large numbers of small files and access them repetitively.)
The operating system is designed to ensure that the user load is balanced across the available processors. Factored into the load-balancing algorithm are the user load, system load, and the interrupt rate. The algorithm is tuned to attempt to allow threads that have recently run on a given processor to continue to run on that processor (to take advantage of data retained in caches). Users can optionally choose to bind a particular process to a particular processor. This can be done using either the runon command or the bind_to_cpu system call. See runon(1) or bind_to_cpu(3) for details.
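For example, the following is a minimal sketch of binding a process to processor 2 with the runon command (the program name is hypothetical):

% runon 2 ./my_application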
The utilities iostat and vmstat allow you to monitor the memory, CPU, and I/O consumption on your system. The cpustat extension to kdbx also allows application developers to monitor the time spent in user mode, system mode, and kernel mode on each of the processors. This information can help application developers determine how effectively they are achieving parallelism across the system. To enhance parallelism, application developers working in Fortran or C should consider using the Kuck & Associates Preprocessor (KAP), which can have a significant impact on SMP performance. See the Programmer's Guide for details on KAP.
In general, the following adjustments can be made to improve CPU processing on a temporary basis:
See the manual System Administration for details on how you can adjust these limits. (Note that job scheduling can also be a very important consideration in determining how you are going to handle large programs. Also note that the limit and unlimit commands can affect several of these size limits.)
The system vnode table limits the number of active files. If your system is heavily loaded but does not have a shortage of memory, increasing the size of the system vnode table may improve performance. You can increase its size by giving a higher value to the maxusers attribute.
System parameters are global variables that you can tune in a variety of ways:
The latter method is preferred because it requires only a system reboot to put new values permanently into effect. The other methods are either only temporary solutions (the dbx method) or require a kernel rebuild and a system reboot (the system configuration file or param.c method). Kernel rebuilds can be difficult and time consuming and should be avoided whenever possible.
Using the attribute method entails establishing attribute values by interacting with the Kernel Tuner (dxkerneltuner) provided by the Common Desktop Environment (CDE) graphical user interface or by issuing the sysconfigdb or sysconfig -r commands from a terminal window:
As indicated earlier in this section, the value of a global variable can be reset by a variety of mechanisms. As a result, global variables in the system configuration file and the param.c file can have values that differ from those of their corresponding attributes in the sysconfigtab file. To understand how one value of a global variable is overridden by another, it is necessary to understand the levels at which global variables are controlled in a system. From lowest to highest, the control levels are as follows:
Each global variable can have a different value at each level of control. As a result, the following precedence rules, from highest to lowest, apply in a running system:
See Section 2.2.9 for information on how to monitor the values of configuration attributes. For descriptions of the tunable configuration attributes in sysconfigtab, see Appendix B. (Note that not all subsystems displayed by a sysconfig -r command are covered in Appendix B. Only those subsystems that have tunable attributes affecting performance are covered.)
The memory subsystem is one of the first places where a performance problem can occur. Performance can be degraded when the virtual memory subsystem cannot keep up with the demand for pages.
Memory tuning can be divided into the following two areas of concern:
You can limit the amount of memory that the UBC uses for the file system buffer cache. This increases the amount of memory available to the virtual memory subsystem, but decreases I/O performance. (See Section 3.4.1 for details on UBC tuning.)
You can tune several sysconfigtab attributes to improve the performance of the virtual memory subsystem. Another method of improving its performance is to configure additional swap space or spread out your disk I/O. Adding more physical memory is an option that will always improve performance. (See Section 3.4.2 for details on virtual memory tuning.)
Table 3-1 lists some sysconfigtab attributes that can have a significant impact on virtual memory (including paging and swapping) and on the UBC. Reboot the system if you change any of these attributes.
Attribute/Parameter | Default | Description |
Parameters: | ||
dfldsiz | 134217728 | Default data segment size limit. |
dflssiz | 1048576 | Default stack size limit. |
maxdsiz | 1073741824 | Maximum data segment size limit. |
maxusers | 32 | See maxusers attribute. |
Attributes: | ||
bufcache | 3 | Percentage of memory dedicated to the file system buffer cache. |
buffer-hash-size | 512 | The size of the buffer cache hash chain table used to store the heads of the hashed buffer queues. |
max-proc-per-user | 64 | Maximum number of processes one user can run simultaneously. |
maxusers | 32 | Number of users the system can support simultaneously. |
msg-mnb | 16384 | Maximum number of bytes on a System V message queue. |
msg-mni | 50 | Number of System V message queue identifiers. |
msg-tql | 40 | Number of System V message headers. |
name-cache-hash-size | 256 | The size of the hash chain table for the namei cache. |
ubc-borrowpercent | 20 | The percentage of physical memory above which the UBC is borrowing memory from the system. |
ubc-maxpercent | 100 | Maximum percentage of memory that the UBC can consume. |
ubc-minpercent | 10 | Percentage of memory at which page stealing from the UBC is prohibited. |
vm-aggressive-swap | 0 (off) | Controls whether the task swapper should be more aggressive in swapping out idle tasks to prevent the system from reaching a low-memory condition. |
vm-asyncswapbuffers | 4 | The total number of asynchronous I/O requests by the page stealing daemon that can be outstanding, per swap partition, at any one time. |
vm-clustermap | 1024*1024*1 | Cluster duplication map size. |
vm-clustersize | 1024*64 | Maximum cluster duplication for each bp. |
vm-cowfaults | 4 | Copy point. |
vm-csubmapsize | 1024*1024 | Size of kernel copy map. (The kernel copy map is the address space for copying data into and out of the kernel.) |
vm-heappercent | 7 | Percent of kernel virtual address space to allocate for use by the heap. |
vm-inswappedmin | 1 | Minimum amount of time, in seconds, that a task must remain in memory (inswapped) before it can be swapped out. |
vm-mapentries | 200 | Maximum number of virtual memory map entries that a user map can have. Map entries are allocated when the user maps an object into address space that is not adjacent to another object that has the same protection and that can grow. |
vm-maxvas | 1L<<30 | Maximum virtual address space for user maps (see vm-mapentries). |
vm-maxwire | 1L<<24 | Maximum amount of memory that can be wired by a user process. |
vm-page-free-min | 20 | The free list's low watermark (below which physical memory reclamation begins). |
vm-page-free-optimal | 74 | The number of pages on the free list below which the system may begin swapping out entire tasks to reduce memory demand. |
vm-page-free-reserved | 10 | The number of pages on the free list that are reserved for the kernel. |
vm-page-free-target | 128 | The free list's high watermark (above which physical memory reclamation stops). |
vm-page-prewrite-target | 256 | The number of pages that the virtual memory subsystem attempts to keep clean. |
vm-segmentation | 1 (on) | Enables shared page tables. |
vm-syncswapbuffers | 128 | Number of synchronous swap buffers. |
vm-syswiredpercent | 80 | Maximum percentage of system-wide wired memory. |
vm-ubcbuffers | 256 | Minimum number of buffers that the UBC can contain. |
vm-ubcdirtypercent | 10 | Maximum percentage of UBC pages that can be modified ("dirtied"). |
vm-ubcpagesteal | 24 | Number of pages that the UBC can have for a file before the UBC will begin to take pages from the file to satisfy the file's own demands. |
vm-ubcseqpercent | 10 | The size of a file, as a percentage of the UBC, above which the UBC reuses its own pages rather than taking pages from the free list. |
vm-ubcseqstartpercent | 50 | The size of the UBC, as a percentage of total memory, above which the UBC reuses its own pages rather than taking pages from the free list when caching a large file. |
vm-vpagemax | 16384 | The maximum number of individually protected pages in a user address space. |
vm-zone_size | 67108864 | Amount of kernel virtual address space that is available for many of the system's dynamic data structures. |
Detailed descriptions of the attributes are provided in Section B.21. For information about the parameters listed in the table, see the manual System Administration.
In some cases, an I/O-intensive process may degrade the performance of other processes by using a major portion of the UBC. If you need more memory for the virtual memory subsystem, you can reduce the amount of memory that is available to the UBC. Note that reducing the memory available to the UBC may adversely affect file system I/O because less file system data will be cached in the UBC and increasing amounts of data will have to be accessed from disk.
The UBC is flushed by the update daemon. UBC statistics can be viewed by using dbx and checking the vm_perfsum structure. You can also monitor the UBC by using the dbx -k command and examining the ufs_getapage_stats kernel data structure.
The size of the UBC is influenced by the values of the following configuration attributes in the sysconfigtab file:
By default, the UBC will use at least 10 percent of all memory and can use up to 100 percent of all memory. If you want to reduce the amount of memory that can be allocated to the UBC, you could set ubc-maxpercent to 50 percent of all memory. This ensures that the UBC will not adversely affect the virtual memory subsystem. Note that the performance of an application that generates a lot of random I/O will not be improved by enlarging the UBC because the next access location for random I/O cannot be predetermined.
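For example, the following hypothetical sysconfigtab stanza limits the UBC to 50 percent of memory. The vm subsystem name shown here is an assumption; confirm where the UBC attributes live on your system (for example, with sysconfig -s or the Kernel Tuner) before merging the stanza with sysconfigdb, and reboot afterward as Table 3-1 indicates:

vm:
    ubc-maxpercent = 50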
If vmstat output shows excessive paging but few or no page outs, the value of ubc-borrowpercent is probably set too low. It is particularly important to watch for this on low-memory systems (24-MB systems) because they tend to reclaim UBC pages much more aggressively than systems with more memory, and this condition can have an adverse effect on system performance.
Typically, the UBC borrows all physical memory above ubc-borrowpercent (up to the ubc-maxpercent limit). Increasing this value allows more memory to remain in the UBC before global page reclamation begins (that is, before the number of free pages in the system equals the value of the vm-page-free-min attribute). This typically increases the UBC cache effectiveness, but decreases the system response time when a low memory condition occurs. The range of values for this parameter is 0 to 100.
If the page-out rate is high and you are not using the file system heavily, you could decrease the value of ubc-maxpercent to reduce the rate of paging. Use the vmstat command to determine whether the system is paging excessively. Using dbx, periodically examine the vpf_pgiowrites and vpf_ubcalloc fields of the vm_perfsum kernel structure; if page outs greatly exceed UBC allocations, decreasing ubc-maxpercent may shrink the page-out rate.
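A sketch of the dbx check described above, using the standard kernel and memory image paths:

# dbx -k /vmunix /dev/mem
(dbx) print vm_perfsum.vpf_pgiowrites
(dbx) print vm_perfsum.vpf_ubcalloc
(dbx) quit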
For I/O servers, you may want to raise the value of the ubc-minpercent attribute in the sysconfigtab file to ensure that more memory is available for the UBC. If you do this, large programs that run occasionally will not completely deplete the UBC. To check that you did not raise ubc-minpercent too high, use the vmstat command to examine the page-out rate.
If you change the values of the ubc-maxpercent and ubc-minpercent attributes in the sysconfigtab file, do not make the values so close together that you degrade I/O performance or cause the system to page excessively.
The Digital UNIX operating system uses some configuration attributes in the sysconfigtab file to prevent a large file from completely filling the UBC, thus limiting the amount of memory available to the virtual memory subsystem. The system will reuse the pages in the UBC instead of taking pages from the free page list when both of the following conditions are met:
The vm-ubcseqstartpercent and vm-ubcseqpercent attributes in the sysconfigtab file are used to ensure that a large file does not take all of the pages on the free page list and cause the system to page excessively.
For example, using the default values, the UBC would have to be larger than 50 percent of all memory and a file would have to be larger than 10 percent of the UBC (that is, the file size would have to be at least 5 percent of all memory) in order for the system to reuse the pages in the UBC.
To determine the values of the vm-ubcseqstartpercent and vm-ubcseqpercent attributes, use the sysconfig -q command.
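For example (the vm subsystem name is an assumption; if you omit the attribute names, sysconfig -q displays all of the subsystem's attributes):

# sysconfig -q vm vm-ubcseqstartpercent vm-ubcseqpercent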
On large-memory systems that are doing a lot of file system operations, you may want to lower the vm-ubcseqstartpercent value to 30 percent. Do not specify a lower value unless you decrease the size of the UBC. You probably do not want to change the value of vm-ubcseqpercent.
Although all memory is shared between the virtual memory subsystem and the UBC, the file system code that deals with the UNIX file system's metadata - including directories, indirect blocks, and inodes - still uses the traditional BSD buffer cache. The following configuration attributes in the sysconfigtab file affect file system buffer cache usage:
The value of the buffer-hash-size attribute can be changed so that each hash chain has 3 or 4 buffers. To determine a value to assign to the buffer-hash-size attribute, use dbx to examine the value of nbuf, then divide the value by 3 or 4, and finally round the result to a power of two. For example, if dbx shows that nbuf has a value of 360, dividing 360 by 3 gives you a value of 120. Based on that value, 128 (that is, 2**7) would be a good value to use for the buffer-hash-size attribute. (See Section 2.2.10 for information on how to use dbx to examine system variables such as nbuf.)
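A sketch of the dbx portion of this procedure, using the standard kernel and memory image paths and the example value from the text:

# dbx -k /vmunix /dev/mem
(dbx) print nbuf
360
(dbx) quit

Dividing 360 by 3 and rounding to a power of two gives 128, the value to assign to the buffer-hash-size attribute in the sysconfigtab file.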
You can change the value of the name-cache-hash-size attribute so that each hash chain has three or four name cache entries. To determine a value to assign to the name-cache-hash-size attribute, divide the value of name-cache-size attribute by three or four and then round the result to a power of two. For example, if the value of name-cache-size is 1029, dividing 1029 by four produces a value of 257. Based on this calculation, 256 (that is, 2**8) would be a good value for the name-cache-hash-size attribute.
If your system has adequate physical memory, you can also improve performance by increasing the value of the name-cache-size attribute instead of adjusting the hash size. Select which method to use depending on whether physical memory is the resource that is constraining the performance of your system.
To determine whether you should change the value of bufcache, use dbx to examine the bio_stats structure (see Section 2.2.10.5). The miss rate (block misses divided by the sum of the block misses and block hits) should not be more than 3 percent. If you have a high miss rate (low hit rate), you may want to raise the value of bufcache. Note that any additional memory that you allocate to the metadata buffer cache is taken away from the rest of the system. This may cause system performance to decline because it reduces the amount of memory that is available to the UBC and the virtual memory subsystem. In most cases, it is not advisable to modify the bufcache value. If you need to raise the value, never raise it to more than 10 percent.
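As a worked sketch with hypothetical counts: if bio_stats showed 300 block misses and 12000 block hits, the miss rate would be 300 / (300 + 12000), or about 2.4 percent. Because that is below the 3 percent threshold, you would leave bufcache unchanged.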
You can decrease the value of bufcache on large memory systems if the hit rate is high and you want to increase the amount of memory that is available to the virtual memory subsystem.
Excessive paging, which is sometimes called thrashing, decreases performance and indicates that the natural working set size has exceeded the available memory. Because the virtual memory subsystem runs at a higher priority, it blocks out other processes and spends all system resources on servicing page faults for the currently running processes.
You can determine whether a system has memory problems by examining the output of the vmstat command. The pout column lists the number of page outs. The free column lists the number of pages on the free page list. Fewer than 128 pages on the free list combined with a consistently high number of page outs may indicate that excessive paging and swapping is occurring.
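For example, to sample virtual memory statistics every five seconds:

# vmstat 5

Watch the free column for values that remain below 128 and the pout column for consistently high counts.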
Some general solutions for reducing excessive paging and swapping are as follows:
See Table 3-1 for a list of parameters and attributes that can be used to tune virtual memory. Detailed descriptions of the attributes are provided in Section B.21. For information about the parameters listed in the table, see the manual System Administration.
To optimize the use of your swap space, use your fastest disks for swap devices and spread out your swap space across multiple devices. Use the swapon -s command to display your swap space configuration. Use the iostat command to determine which disks are being used the most.
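For example:

# swapon -s
# iostat 5

The first command summarizes the configuration and use of each swap partition; the second, sampling every five seconds, shows which disks are handling the most I/O.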
To ensure the best performance, place each swap partition on its own disk (instead of placing multiple swap partitions on the same disk). The page reclamation code uses a form of disk striping (known as swap space interleaving) so that pages can be written to the multiple disks. In addition, configure all of your swap devices at boot time to optimize swap space. See the manual System Administration for details on how to perform these operations.
To increase performance, you can change your swap mode from immediate mode (the default) to deferred mode (overcommitment mode) by removing or moving the /sbin/swapdefault file. Deferred mode requires less swap space than immediate mode and causes the system to run faster because less swap bookkeeping is required. However, because deferred mode does not reserve swap space in advance, the swap space may not be available when it is needed by a task and the process may be killed asynchronously. (See Appendix A for information about special considerations associated with low-memory systems (24-MB systems) operating in overcommitment mode.)
Application messages such as the following usually indicate that not enough swap space is configured into the system or that a process limit has been reached:
See the manual System Administration for information on how to fix these problems.
You may be able to improve IPC performance by tuning the following configuration attributes in the sysconfigtab file:
The process will be unable to send a message to a queue if doing so would make the total number of bytes in that queue greater than the limit specified by msg-mnb. When the limit is reached, the process sleeps, waiting for this condition to be false.
The process will be unable to send a message if doing so would make the total number of message headers currently in the system greater than the limit specified by msg-tql. If the limit is reached, the process sleeps, waiting for a message header to be freed.
You can track the use of IPC facilities with the ipcs -a command (see ipcs(1)). By looking at the current number of bytes and message headers in the queues, you can then determine whether you need to increase the values of the msg-mnb and msg-tql attributes in the sysconfigtab file to diminish waiting.
You may also want to consider tuning several other IPC attributes in the sysconfigtab file. How you tune the following attributes depends on what you are trying to do in your application:
(Note: As a design consideration, consider whether you would be better off using threads instead of shared memory.)
I/O tuning can be divided into the following three areas of concern:
You can improve disk I/O performance by changing file system fragment sizes and other parameters that control the layout of the file systems.
You can improve network performance by reducing the number of network applications, redesigning the network, or adding more memory.
In addition to improving NFS performance by using techniques that improve the performance of other file systems, you can also improve NFS performance by using Prestoserve and adjusting a few of its parameters. (See the Guide to Prestoserve for details on Prestoserve.)
The operating system includes several configuration parameters and attributes that can affect the I/O subsystem. As specified in Table 3-2, they are set in either the system configuration file or the sysconfigtab file or by using dbx. Reboot the system if you change any configuration parameters or attributes to place the new values in effect.
Attribute/Parameter | Default | Description |
Read parameters: | ||
cluster_consec_incr | 1 | The increment for determining the number of blocks that should be combined on the next read-ahead request after the first read-ahead request. (Set with dbx.) |
cluster_consec_init | 2 | The number of blocks that should be combined for the first read-ahead request. See Section 3.6.1.2 for more details on this parameter. (Set with dbx.) |
cluster_lastr_init | -1 | The number of contiguous reads that need to be detected before read-ahead is requested. The default value will start read-ahead on the very first contiguous read request. (Set with dbx.) |
cluster_max_read_ahead | 8 | The maximum number of clusters that can be used in read-ahead operations. See Section 3.6.1.2 for more details on this parameter. (Set with dbx.) |
cluster_read_all | 1 | This variable is either on (!= 0) or off (==0). By default (on), perform cluster read operations on nonread-ahead blocks and read-ahead blocks. If off, perform cluster read operations only on read-ahead blocks. See Section 3.6.1.2 for more details on this parameter. (Set with dbx.) |
Write parameters: | ||
cluster_maxcontig | 8 | The number of blocks that will be combined into a single write request. The default tries to combine eight 8K-byte blocks into a 64K-byte cluster. This variable controls all mounted UNIX file systems (UFS). See Section 3.6.1.2 for more details on this parameter. (Set with dbx.) |
cluster_write_one | 1 | This variable is either on (!=0) or off (==0). By default (on), when a cluster needs to be written (that is, 64KB of data has been dirtied) but the cluster is made up of blocks that are not logically contiguous, only the contiguous data is written; the remaining data may be combined into future cluster requests. If off, 64KB of data will be written regardless of the number of write requests required to do so. (Set with dbx.) |
Other parameters that influence I/O: | ||
delay_wbuffers | 0 | This variable applies only to UFS. It is either on (!=0) or off (==0). By default (off), write-behind is turned on. If on, flushing full buffers to disk is delayed until a sync call is issued. See Section 3.6.1.2 for more details on this parameter. (Set with dbx.) |
maxcontig parameter | 8 | The maximum number of contiguous blocks that will be laid out before forcing a rotational delay. (Set with tunefs or newfs.) |
maxusers | 32 | See maxusers attribute. |
rotdelay | 4 | The expected time (in milliseconds) to service a transfer completion interrupt and initiate a new transfer on the same disk. It is used to decide how much rotational spacing to place between successive blocks in a file. If zero, blocks are allocated contiguously. (Set with tunefs or newfs.) |
Attributes that influence I/O: | ||
maxusers | 32 | The number of users that your system can support simultaneously without straining system resources. See Section 3.6.1 for more details on this attribute. |
max-vnodes | Varies | The maximum number of vnodes that can be allocated on a system. On a 32-MB or larger system, the maximum is the number of vnodes that will fit into 5 percent of available memory; on a 24-MB system, the default is 1000. (Set in the sysconfigtab file.) |
min-free-vnodes | Varies | The minimum number of free vnodes that will be kept on the free list. On a 32-MB or larger system, the default is the value of nvnode; on a 24-MB system, the default is 150. (Set in the sysconfigtab file.) |
namei-cache-valid-time | 1200 | The amount of time, in seconds, that governs when vnodes are deallocated. (Set in the sysconfigtab file.) |
open-max-hard | 4096 | Hard limit for the number of files that a process can have open at one time. (Set in the sysconfigtab file.) |
open-max-soft | 4096 | Soft limit for the number of file descriptors that a process may have open. This value is the default for all processes, and it must be less than or equal to the value of open-max-hard. See Section 3.6.1.2 for more details on this attribute. (Set in the sysconfigtab file.) |
vnode-age | 120 | The amount of time, in seconds, that a vnode is guaranteed to be kept on the free list before it is recycled. (Set in the sysconfigtab file.) |
For information about the parameters listed in the table, see the manual System Administration. Detailed descriptions of the attributes are provided in Section B.8.
Disk throughput is the gating performance factor for most applications. Data transfers to and from disk are much slower than data transfers involving the CPU or main memory. As a result, configuring and tuning disk subsystems to maximize the efficiency of I/O operations can have a critical impact on an application's performance.
The size of the disk operation is also important. In doing I/O to a disk, most of the time is usually taken up with the seek followed by the rotational delay. This is called the access time. For small I/O requests, access time is more important than the transfer rate. For large I/O requests, the transfer rate is more critical than the access time. Access time is also important for workstation, timeshare, and server environments.
Most performance problems occur because of disk saturation, that is, because demands for disk I/O exceed the capacity of your system. Before you attempt to fix a disk saturation problem by tuning the UFS and the AdvFS file systems and the Common Access Method (CAM) subsystem, try to improve performance by making some of the following adjustments:
The following sections describe how to tune the Virtual File System (VFS), UFS, and AdvFS file systems and the CAM subsystem.
Depending on system requirements, you can change Virtual File System (VFS) limits by tuning the following configuration attributes in the sysconfigtab file:
Allocation and deallocation of vnodes is handled dynamically by the system using the values set for these four attributes. With the exception of namei-cache-valid-time, the values of all of these attributes can be changed with the sysconfig -r command while the kernel is executing. Tuning considerations associated with these configuration attributes are as follows:
The maximum number of vnodes that can be allocated on a system can be set by the max-vnodes attribute. If max-vnodes is not set in the sysconfigtab file, the default value for the maximum number of vnodes for 24-MB systems is 1000; for 32-MB or larger systems, the default value is calculated from the following values:
The default value for the percentage of memory that can be used for vnodes is defined by a global element named vn_conf.percent_mem_for_vnodes (5 percent by default).
The system allocates vnodes based on demand, up to the maximum number of vnodes, and later deallocates them when their demand goes down. On a very busy system, if the number of active vnodes tends to exceed the maximum number of vnodes, you can adjust the maximum number upward by modifying the value associated with max-vnodes. For example:
# sysconfig -r vfs max-vnodes=15000
# sysconfig -q vfs max-vnodes
max-vnodes: 15000
Increasing the maximum number of vnodes puts an extra demand on the available memory on the system and should only be done if the system reports that it is out of vnodes. If the number of users on the system exceeds the value of maxusers, increase the value of max-vnodes proportionally.
You can change the maximum number of vnodes in either of the following ways:
When vnode deallocation is in progress, the value of min-free-vnodes determines the minimum number of free vnodes that will be kept on the free list. Vnode deallocation stops when the number of free vnodes reaches this value. A larger value for min-free-vnodes caches more free vnodes in the system and improves performance when free vnodes are reactivated as a result of vnode cache lookup operations. However, a larger value also proportionally increases the demand on memory because more vnodes are retained.
On 24-MB systems, the default value of the min-free-vnodes attribute is 150. On 32-MB or larger systems, the default value depends on the value of the maxusers attribute. It is possible to set maxusers so high that the value of min-free-vnodes is close to or larger than the value of the max-vnodes attribute. These conditions can have the following effects:
On systems that need to reclaim the memory used by vnodes, you should ensure that the value of min-free-vnodes is significantly lower than max-vnodes.
If the value of min-free-vnodes needs to be close to the value of max-vnodes, it is recommended that you turn off vnode deallocation by using the sysconfig -r command to set the value of min-free-vnodes to be larger than max-vnodes. (You can also turn off vnode deallocation by setting the vnode-deallocation-enable attribute to zero (0). However, this method is not recommended because it causes the system to be very conservative in allocating vnodes.)
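A sketch of the recommended method, using illustrative values (first check the current max-vnodes value, then set min-free-vnodes above it):

# sysconfig -q vfs max-vnodes
max-vnodes: 15000
# sysconfig -r vfs min-free-vnodes=16000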
The value of min-free-vnodes can be changed on a running kernel. For example:
# sysconfig -q vfs min-free-vnodes
min-free-vnodes: 10388
# sysconfig -r vfs min-free-vnodes=468
min-free-vnodes: 468
A decision to recycle a vnode on the free list is based on its age and the value of the vnode-age attribute in the sysconfigtab file. The value of the vnode-age attribute represents the time in seconds that a vnode is guaranteed to be kept on the free list before it is recycled. If a vnode selected for recycling from the LRU vnode free list is not older than vnode-age, a new vnode is allocated.
The default value for vnode-age is 120 seconds on 32-MB or larger systems (two seconds on 24-MB systems). A larger vnode-age value retains free vnodes on the free list for a longer time and improves the chances that a vnode will be successfully looked up and reused before it is recycled. The value of vnode-age can be changed with the sysconfigdb command (which takes effect when the system is rebooted) or with the sysconfig -r command (which takes effect immediately on a running system).
Vnodes are deallocated only if they have not been looked up within the amount of time set by the namei-cache-valid-time attribute in the sysconfigtab file. Its default value is 1200 seconds on 32-MB or larger systems and 30 seconds on 24-MB systems.
Use the sysconfigdb command to change the value of this configuration attribute in the sysconfigtab file.
Increasing the value of namei-cache-valid-time delays deallocation of vnodes. Decreasing the value causes faster deallocations but reduces the efficiency of the vnode cache.
If you need to optimize processing time, you can disable vnode deallocation by setting the value of the attribute vnode-deallocation-enable to zero (0). Disabling vnode deallocation increases memory usage because memory used by the vnodes is not returned back to the system.
Use the dumpfs command to display file system information.
To tune the UNIX File System (UFS), you can use one or more of the following options and techniques:
LSM mirroring can improve read performance, but it slows down write performance.
LSM striping improves performance by evenly distributing the I/O load across a number of disk drives.
See the manual Logical Storage Manager for details.
Prestoserve can dramatically improve synchronous write performance. (See the Guide to Prestoserve for details.)
You can determine whether a disk is fragmented by determining how effectively the system is clustering. You can do this by using dbx to examine the ufs_clusterstats, ufs_clusterstats_read, and ufs_clusterstats_write structures. UFS block clustering is usually reasonably efficient. If the numbers from the UFS clustering kernel structures show that clustering is not being particularly effective, the disk may be heavily fragmented.
Currently, the operating system does not have an online disk defragmenter for UFS. However, you can perform a defragmentation procedure to take care of a heavily fragmented disk as follows:
You can do this using the newfs command. The fragment size is 1KB by default. The UFS file system block size is fixed at 8KB. A block size/fragment size of 8KB/1KB is usually sufficient.
You can use a larger fragment size if the file system is used for executable files. Note that a large fragment size can waste disk space.
You can use the default fragment size (1KB) if the file system is used for small files or code development. A small fragment size uses disk space more efficiently than a large fragment size.
If you want to increase disk speed and most of the files are greater than two blocks (16KB), make the file system fragment size equal to the block size (8KB/8KB). This results in less overhead for the system, but it requires more space on the disk.
If the file system has many large files, reduce the density of inodes by using the newfs -i command.
Set the rotdelay parameter to 0 (zero) to allocate blocks contiguously. This can be done using either the tunefs command or the newfs command. A rotational delay of zero will allocate logically contiguous blocks, which aids UFS block clustering.
Use the tunefs command or the newfs command to change the value of maxcontig, which specifies the maximum number of contiguous blocks that will be laid out before forcing a rotational delay (that is, the number of blocks that can be combined into a cluster). The default is 8. This causes the file system to attempt I/O read requests in a size that is defined by the value of maxcontig multiplied by the block size (64KB by default).
Use the tunefs or the newfs command to change the value of maxbpg, which is the maximum number of file blocks that any single file can allocate per cylinder group. Usually, this value is set to about one-quarter of the total blocks in a cylinder group.
The maxbpg parameter is used to prevent a single file from using all of the blocks in a single cylinder group, which could degrade access times for all files subsequently allocated in that cylinder group. By limiting the number of file blocks, large files must perform long seeks more frequently than if they were allowed to allocate all the blocks in a cylinder group before seeking elsewhere.
If your file system contains only large files, you can set the maxbpg parameter higher than the default value. To get the performance benefit on an existing file system, you must lay out the files on the disk again.
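For example, a hedged sketch that applies the rotdelay, maxcontig, and maxbpg adjustments described above to a file system holding mostly large files (the partition name and the maxcontig and maxbpg values are illustrative; see tunefs(8) for the exact option syntax on your system):

# tunefs -d 0 -a 16 -e 4096 /dev/rrz3c

Here -d sets rotdelay to 0, -a sets maxcontig, and -e sets maxbpg.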
When a block of data in the UBC is scheduled to be written, it is sent asynchronously to disk. The default operating system behavior prevents the block from being read while the write is in progress. An application that reads a block immediately after it is written could improve its performance if the write was delayed so that the block could be read from memory instead of disk.
You can control when full UBC buffers are flushed to disk by using dbx to modify the value of the delay_wbuffers kernel parameter: a value of 0 (the default) leaves write-behind turned on, and a nonzero value delays flushing of full buffers to disk until a sync call is issued.
Enabling delay_wbuffers is useful when many small files are created or when files are written and immediately reread. Delaying the operation of writing out the data increases the chances of having the data immediately available in memory. Applications that generate a lot of intermediate (temporary) files can often benefit from enabling delay_wbuffers.
Note
Enabling delay_wbuffers causes an increase in the number of dirty (modified) pages in the buffer cache and makes it more likely that data will be lost if the system is shut down abnormally. In addition, enabling delay_wbuffers induces an I/O pattern with more spikes in it because of the inactivity between sync calls. This I/O pattern could negatively affect real workloads (that is, nonbenchmark workloads).
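A minimal dbx sketch for toggling this behavior on a running kernel, using the standard kernel and memory image paths:

# dbx -k /vmunix /dev/mem
(dbx) assign delay_wbuffers = 1
(dbx) quit

A nonzero value delays flushing of full buffers until a sync call is issued; assigning 0 restores the default write-behind behavior.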
The kernel parameter cluster_max_read_ahead defines the maximum number of read-ahead clusters that the kernel can schedule. The default for cluster_max_read_ahead is 8. You can make the open algorithm faster by setting cluster_read_all to 1 and cluster_consec_init to the value of cluster_max_read_ahead. You can change these global variables with dbx. (See Section 1.5.3.1 for a general description of read and write clustering.)
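A hedged dbx sketch of the adjustment just described, assuming cluster_max_read_ahead is at its default of 8:

# dbx -k /vmunix /dev/mem
(dbx) assign cluster_read_all = 1
(dbx) assign cluster_consec_init = 8
(dbx) quit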
The cluster_maxcontig parameter is the number of blocks that will be combined into a single write. This variable controls all UFS file systems. The default value for cluster_maxcontig is 8 (which is optimal when using the default block size). You can change the value with dbx. (See Section 1.5.3.1 for a general description of read and write clustering.)
The open-max-soft and open-max-hard attributes in the sysconfigtab file control the maximum number of open file descriptors for each process. When the open-max-soft limit is reached, a warning message is issued, and when the open-max-hard limit is reached, the process is stopped. These attributes prevent runaway allocations, for example, allocations within a loop that cannot be exited because of an error condition.
The open-max-soft and open-max-hard attributes both have the value 4096 as a default. You can modify the values of the attributes by means of the sysconfigdb command.
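For example, the following hypothetical stanza doubles both limits. The proc subsystem name is an assumption; confirm which subsystem owns these attributes on your system before merging the stanza with sysconfigdb:

proc:
    open-max-soft = 8192
    open-max-hard = 8192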
Mount structures are dynamically allocated when a mount request is made and subsequently deallocated when an unmount request is made. However, the max-ufs-mounts configuration attribute in the sysconfigtab file has a default value of 1000 as an upper limit on the maximum number of mounts. If there is a need to mount more than 1000 UFS or MFS file systems on a system, this value has to be increased. You can use either of the following methods to change the value:
The POLYCENTER Advanced File System (AdvFS) is a file system option available on the Digital UNIX operating system. It provides rapid crash recovery, high performance, and a flexible structure that enables you to manage your file system while the system is on line. Optional AdvFS utilities further enhance file system management capabilities. In particular, the defragment, stripe, and migrate utilities provide online performance tuning. The AdvFS utilities are available as a separately licensed layered product.
Methods for improving AdvFS performance include the following:
LSM mirroring can improve read performance, but it slows down write performance. See the manual Logical Storage Manager for details.
Enhance AdvFS performance by dedicating an entire disk (usually partition C) to one file domain. This avoids I/O scheduling contention.
If you do not have the optional AdvFS utilities, you can defragment your disks by backing up and restoring the filesets:
Fileset quotas apply to the fileset, not to individual users or groups. By establishing quotas you can limit the amount of disk storage and number of files consumed by a fileset. This is useful when a file domain contains several filesets. Without fileset quotas, all filesets have access to all disk space in a file domain, allowing one fileset to use all of the disk space in a file domain.
You can use the defragment utility to defragment your file system frequently without reducing system availability.
File fragmentation can reduce the read/write performance of the file because it results in more I/O operations to access the file. The defragment utility reduces the amount of file fragmentation in a file domain by attempting to move files and parts of files together so that the number of file extents is reduced.
You do not need to dismount the filesets in a file domain or otherwise take the domain offline in order to run the defragment utility. You can perform all normal I/O operations while the defragment utility is running.
You can use the migrate utility in conjunction with the showfile command to improve file performance by monitoring and altering the way that large files are mapped on the disk. This method of defragmenting files is useful for defragmenting specific files. (Use the defragment utility to defragment all files in a domain.) Use the following procedure as a guideline for this method of improving file performance:
If several files in the file system are fragmented, you can add a new volume to the file domain and remove the volume containing the fragmented files. This action prompts AdvFS to automatically migrate all of the files to the new volume and defragment each file during the process.
You can use the stripe utility to distribute segments of a file across specific disks (or volumes) within a file domain. File striping provides load balancing and a higher transfer rate.
File striping increases contiguous read/write performance by allocating storage in segments across more than one disk or volume without preconfiguring the disks. AdvFS determines the number of pages per stripe segment, and the segments alternate among the disks in a sequential pattern. For instance, the file system allocates the first segment of a three-disk striped file on the first disk; the next segment on the second disk; and the next segment on the third disk. This completes one sequence, or stripe. The next stripe starts on the first disk, and so on.
The operating system uses the Common Access Method (CAM) as the operating system interface to the hardware. CAM maintains pools of buffers that are used to perform I/O. Each buffer takes approximately 1KB of physical memory. You should monitor these pools and tune them if necessary.
The following attributes can be checked with the dbx debugger and modified in the param.c file or the sysconfigtab file:
If the I/O pattern associated with your system tends to have intermittent bursts of I/O operations (I/O spikes), increasing the values of the cam_ccb_pool_size and cam_ccb_increment attributes may result in improved performance. See Section 3.3 for information on how to modify param.c parameters or sysconfigtab attributes.
Most resources used by the network subsystems are allocated and adjusted dynamically, so tuning is typically not an issue with the network itself. NFS tuning, however, can be critically important because NFS is the heaviest user of the network (see Section 3.6.3).
The one network subsystem resource that may require tuning is the number of network threads configured in your system. If the netstat -m command shows that the number of network threads configured in your system exceeds the peak number of currently active threads, your system may be configured with too many threads, thereby consuming system memory unnecessarily. To adjust the number of threads configured in your system, modify the netisrthreads attribute in the sysconfigtab file. Adjust the number downward to free up system memory.
Network performance is affected only when the supply of resources is unable to keep up with the demand for resources. Two types of conditions can cause this congestion to occur:
Neither of these problems is a network tuning issue. In the case of a problem on the network, you must isolate the problem and fix it (which may involve tuning some other components of the system). In the case of an overloaded network (for example, when the kernel regularly issues a can't get mbuf message), you must either redesign the network, reduce the number of network applications, or increase the physical memory (RAM). See the Network Programmer's Guide or the manual Network Administration for information on how to resolve network problems.
The Network File System (NFS) shares the unified buffer cache with the virtual memory subsystem and local file systems. Much of what is described in Section 3.6.1.2 also applies to NFS. For example, adding more disks on a server and spreading the I/O across spindles can greatly enhance NFS performance.
Most performance problems with NFS can be attributed to bottlenecks in the file system, network, or disk subsystems. For example, NFS performance is severely degraded by lost packets on the network. Packets can be lost as a result of a variety of network problems. Such problems include congestion in the server, corruption of packets during transmission (which can be caused by bad electrical connections, noisy environments, babbling Ethernet interfaces, and other problems), and routers that abandon forwarding attempts too readily.
Apart from adjustments to the file system, network, and disk subsystems, NFS performance can be directly enhanced in the following ways:
Prestoserve greatly improves write performance for servers that are using NFS Version 2. An NFS Version 2 server must write a client's write data to stable storage before responding to the client's write request. With Prestoserve, this write data can be stored in the NVRAM. Storing the data in this way is much faster than writing it to disk.
Prestoserve can also help improve write performance for NFS Version 3 servers, but not as much as with NFS Version 2 because NFS Version 3 servers can reliably write data to volatile storage without risking loss of data in the event of failure. NFS Version 3 clients detect server failures and resend write data that the server may have lost in volatile storage.
See the Guide to Prestoserve for details.
To determine whether performance is being degraded by an insufficient number of nfsiod and nfsd daemons, issue the following command:
% ps alxww | grep nfs
This command displays the nfsiod and nfsd daemons that have been established to service client and server requests. If only one or two nfsiod or nfsd daemons are idle, increasing their numbers may improve NFS performance. See the nfsiod(8) and nfsd(8) reference pages for details.
You can also use the netstat -s command to verify the existence of a timeout problem; a nonzero count for "fragments dropped after timeout" in the "ip" section of the netstat output is a reliable indicator that the problem exists. See Section 2.2.11 for sample netstat output.
If fragment drops are a problem, use the mount command to set the size of the NFS read and write buffers to 1KB. For example:
mount -o rsize=1024,wsize=1024 server:/dir /mnt
Also, when evaluating NFS performance, be aware that NFS does not perform well if any file-locking mechanisms are in use on an NFS file. The locks prevent the file from being cached on the client.