4    Improving System Performance

You may be able to improve Tru64 UNIX performance by tuning the operating system or performing other tasks. You may need to tune the system under various circumstances, for example, if your workload changes or if performance does not meet your needs.

To help you improve system performance, this chapter describes the following:

  Steps for configuring and tuning systems (Section 4.1)

  Tuning special configurations (Section 4.2)

  Checking the configuration by using the sys_check utility (Section 4.3)

  Solving common performance problems (Section 4.4)

  Using the advanced tuning guidelines (Section 4.5)

4.1    Steps for Configuring and Tuning Systems

Before you configure and tune a system, you must become familiar with the terminology and concepts relating to performance and availability. See Chapter 1 for information.

In addition, you must understand how your applications utilize system resources, because not all configurations and tuning guidelines are appropriate for all types of workloads. For example, you must determine if your applications are memory-intensive or CPU-intensive, or if they perform many disk or network operations. See Section 2.1 for information about identifying a resource model for your configuration.

To help you configure and tune a system that will meet your performance and availability needs, follow these steps:

  1. Ensure that your hardware and software configuration is appropriate for your workload resource model and your performance and availability goals. See Chapter 2.

  2. Make sure that you have adhered to the configuration guidelines for your CPU, memory, disk storage, and network resources (see Chapter 2).

  3. Perform the following initial tuning tasks:

    1. If you have a large-memory system, Internet server, or NFS server, follow the tuning guidelines that are described in Section 4.2.

    2. Apply any tuning recommendations described in your application documentation.

    3. Make sure that you have sufficient system resources for large applications or for large-memory systems. See Chapter 5 for information about resource tuning.

    4. Run sys_check and consider following its configuration and tuning recommendations (see Section 4.3).

  4. Monitor the system and evaluate its performance, identifying any areas in which performance can be improved. Section 3.4 describes the tools that you can use to monitor performance.

  5. If performance is deficient, see Section 4.4 for information about solving common performance problems, and see Section 4.5 for information about using the advanced tuning guidelines.

System tuning usually involves modifying kernel subsystem attributes. See Section 3.6 for information.

4.2    Tuning Special Configurations

Large configurations or configurations that run memory-intensive or network-intensive applications may require special tuning. The following sections provide information about tuning these special configurations:

  Internet servers (Section 4.2.1)

  Large-memory systems (Section 4.2.2)

  NFS servers (Section 4.2.3)

In addition, your application product documentation may include specific configuration and tuning guidelines that you should follow.

4.2.1    Tuning Internet Servers

Internet servers (including Web, proxy, firewall, and gateway servers) run network-intensive applications that usually require significant system resources. If you have an Internet server, you should modify the default values of some kernel attributes.

Follow the guidelines in Table 4-1 to help you tune an Internet server.

Table 4-1:  Internet Server Tuning Guidelines

Guideline Reference
Increase the system resources available to processes. Section 5.1
Increase the available address space. Section 5.3
Ensure that the Unified Buffer Cache (UBC) has sufficient memory. Section 9.2.4
Increase the size of the hash table that the kernel uses to look up TCP control blocks. Section 10.2.1
Increase the number of TCP hash tables. Section 10.2.2
Increase the limits for partial TCP connections on the socket listen queue. Section 10.2.3
For proxy servers only, increase the maximum number of concurrent nonreserved, dynamically allocated ports. Section 10.2.4
Disable use of a path maximum transmission unit (PMTU). Section 10.2.6
Increase the number of IP input queues. Section 10.2.7
For proxy servers only, enable mbuf cluster compression. Section 10.2.8
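
Most of these guidelines involve kernel subsystem attributes that you examine and modify with the sysconfig and sysconfigdb commands. The following is a minimal sketch of the general pattern, using the inet subsystem attributes tcbhashsize and somaxconn, which correspond to the hash table and listen queue guidelines above; the values shown are illustrative only, so confirm the recommended settings in Section 10.2.1 and Section 10.2.3 before applying them:

    # Display the current values of two inet subsystem attributes:
    sysconfig -q inet tcbhashsize
    sysconfig -q inet somaxconn

    # Change a dynamic attribute at run time (the change does not
    # persist across reboots):
    sysconfig -r inet somaxconn=1024

    # To make a change permanent, create a stanza file (for example,
    # /tmp/inet.stanza) containing:
    #
    #   inet:
    #       somaxconn = 1024
    #
    # and merge it into /etc/sysconfigtab (see sysconfigdb(8) for the
    # -a, -m, and -u options):
    sysconfigdb -m -f /tmp/inet.stanza inet

Static attributes take effect only after the stanza is added to /etc/sysconfigtab and the system is rebooted.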

4.2.2    Tuning Large-Memory Systems

Large-memory systems often run memory-intensive applications, such as database programs, that usually require significant system resources. If you have a large-memory system, you should modify the default values of some kernel attributes.

Follow the guidelines in Table 4-2 to help you tune a large-memory system.

Table 4-2:  Large-Memory System Tuning Guidelines

Guideline Reference
Increase the system resources available to processes. Section 5.1
Increase the maximum size of a System V message and message queue. Section 5.4.1
Increase the maximum size of a single System V shared memory region. Section 5.4.4
Increase the minimum size of a System V shared memory segment. Section 5.4.6
Increase the available address space. Section 5.3
Reduce the size of the AdvFS buffer cache. Section 6.4.4
Increase the number of AdvFS buffer hash chains, if you are using AdvFS. Section 9.3.6.2
Increase the memory reserved for AdvFS access structures, if you are using AdvFS. Section 9.3.6.3
Increase the size of the metadata buffer cache to more than 3 percent of main memory, if you are using UFS. Section 9.4.3.1
Increase the size of the metadata hash chain table, if you are using UFS. Section 9.4.3.2
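
For example, the System V shared memory limits listed above live in the ipc subsystem. A minimal sketch, assuming the shm-max attribute (the maximum size, in bytes, of a single shared memory region) and a purely illustrative value; see Section 5.4.4 for the recommended setting:

    # Display the current maximum size of a shared memory region:
    sysconfig -q ipc shm-max

    # Example /etc/sysconfigtab stanza (512 MB here is illustrative
    # only; size it for your database or application):
    #
    #   ipc:
    #       shm-max = 536870912

A change such as this typically takes effect at the next reboot, unless the attribute supports run-time reconfiguration with sysconfig -r.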

4.2.3    Tuning NFS Servers

NFS servers typically run only a few small user-level programs, which consume few system resources. Because most of the CPU and wall-clock time on an NFS server is spent in the kernel processing NFS requests, file system tuning is particularly important. See Chapter 9 for information about file system tuning.

In addition, if you are running NFS over TCP, tuning the TCP subsystem may improve performance when many clients are active. See Section 10.2 for information about network subsystem tuning. If you are running NFS over UDP, network subsystem tuning is not needed.

Follow the guidelines in Table 4-3 to help you tune a system that is only serving NFS.

Table 4-3:  NFS Server Tuning Guidelines

Guideline Reference
Set the value of the maxusers attribute to the number of server NFS operations that are expected to occur each second. Section 5.1
Increase the size of the namei cache. Section 9.2.1
Increase the memory reserved for AdvFS access structures, if you are using AdvFS. Section 9.3.6.3
Increase the size of the metadata buffer cache, if you are using UFS. Section 9.4.3.1
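
As an example of the first guideline, maxusers is a static attribute that is set in /etc/sysconfigtab and takes effect after a reboot. A minimal sketch, assuming maxusers resides in the proc subsystem (as on typical Tru64 UNIX systems) and a server expected to handle roughly 1024 NFS operations each second:

    # /etc/sysconfigtab stanza; reboot for the new value to take effect:
    #
    #   proc:
    #       maxusers = 1024

    # Confirm the running value after the reboot:
    sysconfig -q proc maxusers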

4.3    Checking the Configuration by Using the sys_check Utility

After you apply any configuration-specific tuning guidelines, as described in Section 4.2, run the sys_check utility to check your system configuration.

The sys_check utility creates an HTML file that describes the system configuration and that can be used to diagnose problems. The utility checks kernel attribute settings and memory and CPU resources, provides performance data and lock statistics for SMP systems and for kernel profiles, and outputs any warnings and tuning guidelines.

Consider applying the sys_check utility's configuration and tuning guidelines before applying any advanced tuning guidelines.

Note

You may experience impaired system performance while running the sys_check utility. Invoke the utility during off-peak hours to minimize the performance impact.

You can invoke the sys_check utility from the SysMan graphical user interface or from the command line. If you specify sys_check without any command-line options, it performs a basic system analysis and creates an HTML file with configuration and tuning guidelines. Command-line options let you extend or restrict the scope of the analysis.

See sys_check(8) for more information.
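
For example, a basic run might look like the following; sys_check writes its HTML report to standard output, so redirect it to a file (the path shown is arbitrary):

    # Run a basic analysis during off-peak hours and save the report:
    sys_check > /var/tmp/sys_check.html

You can then view the report in any web browser and compare its warnings and tuning suggestions with the guidelines in this chapter.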

4.4    Solving Common Performance Problems

The following sections provide examples of some common performance problems and solutions:

  Application completes slowly (Section 4.4.1)

  Insufficient memory or excessive paging (Section 4.4.2)

  Insufficient swap space (Section 4.4.3)

  Processes swapped out (Section 4.4.4)

  Insufficient CPU cycles (Section 4.4.5)

  Disk bottleneck (Section 4.4.6)

  Poor disk I/O performance (Section 4.4.7)

  Poor AdvFS performance (Section 4.4.8)

  Poor UFS performance (Section 4.4.9)

  Poor NFS performance (Section 4.4.10)

  Poor network performance (Section 4.4.11)

Each section describes how to detect the problem, the possible causes of the problem, and how to eliminate or diminish the problem.

4.4.1    Application Completes Slowly

Use the following table to detect a slow application completion time and to diagnose the performance problem:

How to detect:

  Check application log files.

  Use the ps command to display information about application processing times and whether an application is swapped out. See Section 6.3.2.

  Use process accounting commands to obtain information about process completion times. See accton(8).

Cause: Application is inefficient.
Solution: Rewrite the application so that it runs more efficiently (see Chapter 7). Use profiling and debugging commands to analyze applications and identify inefficient areas of code (see Section 11.1).

Cause: Application is not optimized.
Solution: Optimize the application. See Chapter 7.

Cause: Application is being swapped out.
Solution:

  Delay swapping processes. See Section 6.5.3.

  Increase the memory available to processes. See Section 6.4.

  Reduce an application's use of memory. See Section 11.2.6.

Cause: Application requires more memory resources.
Solution:

  Increase the memory available to processes. See Section 6.4.

  Reduce an application's use of memory. See Section 11.2.6.

Cause: Insufficient swap space.
Solution: Increase the swap space and distribute it across multiple disks. See Section 4.4.3.

Cause: Application requires more CPU resources.
Solution: Provide more CPU resources to processes. See Section 4.4.5.

Cause: Disk I/O bottleneck.
Solution: Distribute disk I/O efficiently. See Section 4.4.6.
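
For example, you might start the diagnosis with ps. A minimal sketch (the sort key assumes the standard ps aux column order, in which %CPU is the third column):

    # Show the busiest processes; the TIME column reveals how much CPU
    # time each application has accumulated:
    ps aux | sort -rn -k 3 | head -n 10

See Section 6.3.2 for how to interpret the process state field, including whether a process has been swapped out.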

4.4.2    Insufficient Memory or Excessive Paging

A high rate of paging or a low free page count may indicate that you have inadequate memory for the workload. In particular, avoid paging on a large-memory system. Use the following table to detect insufficient memory and to diagnose the performance problem:

How to detect:

  Use the vmstat command to display information about paging and memory consumption. See Section 6.3.1 for more information.

Cause: Insufficient memory resources are available to processes.
Solution:

  Reduce an application's use of memory. See Section 11.2.6.

  Increase the memory resources that are available to processes. See Section 6.4.

  Add physical memory.
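
A typical check is to sample vmstat over an interval; the columns to watch are described in Section 6.3.1:

    # Report virtual memory statistics every 5 seconds. A persistently
    # low free page count combined with sustained page-out activity
    # suggests that memory is inadequate for the workload:
    vmstat 5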

4.4.3    Insufficient Swap Space

If you consume all the available swap space, the system will display messages on the console indicating the problem. Use the following table to detect if you have insufficient swap space and to diagnose the performance problem:

How to detect:

  Invoke the swapon -s command while you are running a normal workload. See Section 6.3.3.

Cause: Insufficient swap space for your configuration.
Solution: Configure enough swap space for your configuration and workload. See Section 2.3.2.3.

Cause: Swap space is not distributed across devices.
Solution: Distribute the swap load across multiple swap devices to improve performance. See Section 6.2.

Cause: Applications are using excessive memory resources.
Solution:

  Increase the memory available to processes. See Section 6.4.

  Reduce an application's use of memory. See Section 11.2.6.
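
For example:

    # Display total and per-device swap space allocation and usage
    # while the system runs a normal workload:
    swapon -s

If the in-use figures approach the totals during normal operation, add swap space or redistribute it as described above.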

4.4.4    Processes Swapped Out

Swapped-out (suspended) processes decrease system response time and application completion time. Avoid swapping if you have a large-memory system or large applications. Use the following table to detect whether processes are being swapped out and to diagnose the performance problem:

How to detect:

  Use the ps command to determine if your system is swapping processes. See Section 6.3.2.

Cause: Insufficient memory resources.
Solution:

  Increase the memory available to processes. See Section 6.4.

  Reduce an application's use of memory. See Section 11.2.6.

Cause: Swapping occurs too early during page reclamation.
Solution: Decrease the rate of swapping. See Section 6.5.3.
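
For example:

    # Review the state field of each process; Section 6.3.2 explains
    # which state values indicate a swapped-out process:
    ps aux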

4.4.5    Insufficient CPU Cycles

Although a low CPU idle time can indicate that the CPU is being fully utilized, performance can suffer if the system cannot provide a sufficient number of CPU cycles to processes. Use the following table to detect insufficient CPU cycles and to diagnose the performance problem:

How to detect:

  Use the vmstat command to display information about CPU system, user, and idle times. See Section 6.3.1 for more information.

  Use the kdbx cpustat extension to check CPU usage. See Section 7.1.4.

Cause: Excessive CPU demand from applications.
Solution:

  Optimize applications. See Section 11.2.4.

  Use hardware RAID to relieve the CPU of disk I/O overhead. See Section 8.5.

  Add processors.
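
For example:

    # Report statistics every 5 seconds; the final columns show user
    # (us), system (sy), and idle (id) CPU time. Idle time that stays
    # near zero while the run queue remains long indicates that
    # processes are competing for CPU cycles:
    vmstat 5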

4.4.6    Disk Bottleneck

Excessive I/O to only one or a few disks may cause a bottleneck at the overutilized disks. Use the following table to detect an uneven distribution of disk I/O and to diagnose the performance problem:

How to detect:

  Use the iostat command to display which disks are being used the most. See Section 8.2.

  Use the swapon -s command to display the utilization of swap disks. See Section 6.3.3.

  Use the volstat command to display information about the LSM I/O workload. See Section 8.4.7.2 for more information.

  Use the advfsstat command to display AdvFS disk usage information. See Section 9.3.5.1.

Cause: Disk I/O is not evenly distributed.
Solution:

  Use disk striping. See Section 2.5.2.

  Distribute disk, swap, and file system I/O across different disks and, optimally, multiple buses. See Section 8.1.
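
For example:

    # Report disk transfer statistics every 5 seconds; one device that
    # is consistently far busier than the others marks the bottleneck:
    iostat 5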

4.4.7    Poor Disk I/O Performance

Because disk I/O operations are much slower than memory operations, the disk I/O subsystem is often the source of performance problems. Use the following table to detect poor disk I/O performance and to diagnose the performance problem:

How to detect:

  Monitor the memory allocated to the UBC by using the dbx print command to examine the ufs_getapage_stats and vm_tune data structures. See Section 6.3.4.

  Use the iostat command to determine if you have a bottleneck at a disk. See Section 8.2 for more information.

  Check for disk fragmentation. See Section 9.3.7.1 and Section 9.4.3.7.

  Check the hit rate of the namei cache with the dbx nchstats data structure. See Section 9.1.2.

  Use the advfsstat command to monitor the performance of AdvFS domains and filesets. See Section 9.3.5.1.

  Check UFS clustering with the dbx ufs_clusterstats data structure. See Section 6.3.4.

  Check the hit rate of the metadata buffer cache by using the dbx bio_stats data structure. See Section 9.4.2.3.

Cause: Disk I/O is not efficiently distributed.
Solution:

  Use disk striping. See Section 2.5.2.

  Distribute disk, swap, and file system I/O across different disks and, optimally, multiple buses. See Section 8.1.

Cause: File systems are fragmented.
Solution: Defragment file systems. See Section 9.3.7.1 and Section 9.4.3.7.

Cause: Maximum open file limit is too small.
Solution: Increase the maximum number of open files. See Section 5.5.1.

Cause: The namei cache is too small.
Solution: Increase the size of the namei cache. See Section 9.2.1.
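
For example, you can check the namei cache hit rate with the kernel debugger. A minimal sketch; the counters to compare are described in Section 9.1.2:

    # Attach dbx to the running kernel (as root):
    dbx -k /vmunix /dev/mem

    # Then, at the (dbx) prompt, display the namei cache statistics;
    # a high proportion of misses suggests the cache is too small:
    (dbx) print nchstats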

4.4.8    Poor AdvFS Performance

Use the following table to detect poor AdvFS performance and to diagnose the performance problem:

How to detect:

  Use the advfsstat command to monitor the performance of AdvFS domains and filesets. See Section 9.3.5.1.

  Check for disk fragmentation by using the AdvFS defragment command with the -v and -n options. See Section 9.3.7.1.

Cause: Single-volume domains are being used.
Solution: Use multiple-volume file domains. See Section 9.3.4.1.

Cause: File system is fragmented.
Solution: Defragment the file system. See Section 9.3.7.1.

Cause: There are too few AdvFS buffer cache hits.
Solution:

  Allocate sufficient memory to the AdvFS buffer cache. See Section 9.3.6.1.

  Increase the number of AdvFS buffer hash chains. See Section 9.3.6.2.

  Increase the dirty data caching threshold. See Section 9.3.6.4.

  Modify the AdvFS device queue limit. See Section 9.3.6.6.

Cause: The advfsd daemon is running unnecessarily.
Solution: Stop the daemon. See Section 7.2.5.
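
For example, the fragmentation check mentioned above does not modify the domain. A minimal sketch, where usr_domain is a placeholder domain name:

    # Report fragmentation statistics without defragmenting anything
    # (-n: report only, -v: verbose output):
    defragment -v -n usr_domain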

4.4.9    Poor UFS Performance

Use the following table to detect poor UFS performance and to diagnose the performance problem:

How to detect:

  Monitor the memory allocated to the UBC by using the dbx ufs_getapage_stats data structure. See Section 6.3.4.

  Check the hit rate of the namei cache with the dbx nchstats data structure. See Section 9.1.2.

  Use the dumpfs command to display UFS information. See Section 9.4.2.1.

  Check how effectively the system is clustering and check fragmentation by using the dbx print command to examine the ufs_clusterstats, ufs_clusterstats_read, and ufs_clusterstats_write data structures. See Section 9.4.2.2.

  Check the hit rate of the metadata buffer cache by using the dbx bio_stats data structure. See Section 9.4.2.3.

Cause: The UBC is too small.
Solution: Increase the amount of memory allocated to the UBC. See Section 9.2.4.

Cause: The metadata buffer cache is too small.
Solution: Increase the size of the metadata buffer cache. See Section 9.4.3.1.

Cause: The file system fragment size is incorrect.
Solution: Make the file system fragment size equal to the block size. See Section 9.4.1.1.

Cause: File system is fragmented.
Solution: Defragment the file system. See Section 9.4.3.7.
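
For example, you can verify the block and fragment sizes with dumpfs. A minimal sketch, where /usr is a placeholder for a UFS mount point:

    # The superblock summary at the top of the output includes the
    # block (bsize) and fragment (fsize) sizes:
    dumpfs /usr | head -n 20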

4.4.10    Poor NFS Performance

Use the following table to detect poor NFS performance and to diagnose the performance problem:

How to detect:

  Use the dbx print nfs_sv_active_hist command to display a histogram of the active NFS server threads. See Section 3.6.7.

  Use the dbx print nchstats command to determine the namei cache hit rate. See Section 9.1.2.

  Use the dbx print bio_stats command to determine the metadata buffer cache hit rate. See Section 9.4.2.3.

  Use the nfsstat command to display the number of NFS requests and other information. See Section 9.5.1.1.

  Use the ps axlmp 0 | grep nfs command to display the number of idle threads. See Section 9.5.2.3.

Cause: All NFS server threads are busy.
Solution: Reconfigure the server to run more threads. See Section 9.5.2.2.

Cause: Memory resources are not focused on file system caching.
Solution:

  Increase the amount of memory allocated to the UBC. See Section 9.2.4.

  If you are using AdvFS, increase the memory allocated for AdvFS buffer caching. See Section 9.3.6.1.

  If you are using AdvFS, increase the memory reserved for AdvFS access structures. See Section 9.3.6.3 for information.

Cause: System resource allocation is not adequate.
Solution: Set the value of the maxusers attribute to the number of server NFS operations that are expected to occur each second. See Section 5.1 for information.

Cause: UFS metadata buffer cache hit rate is low.
Solution:

  Increase the size of the metadata buffer cache. See Section 9.4.3.1.

  Increase the size of the namei cache. See Section 9.2.1.

Cause: CPU idle time is low.
Solution: Use UFS instead of AdvFS. See Section 9.4.
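
For example:

    # Display server-side NFS statistics, including the mix of
    # operations handled (see Section 9.5.1.1 for interpretation):
    nfsstat -s

    # Display the NFS server threads and count the idle ones
    # (see Section 9.5.2.3):
    ps axlmp 0 | grep nfs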

4.4.11    Poor Network Performance

Use the following table to detect poor network performance and to diagnose the performance problem:

How to detect:

  Use the netstat command to display information about network collisions and dropped network connections. See Section 10.1.1.

  Check the socket listen queue statistics for the number of pending requests and the number of times the system dropped a received SYN packet. See Section 10.1.2.

Cause: The TCP hash table is too small.
Solution: Increase the size of the hash table that the kernel uses to look up TCP control blocks. See Section 10.2.1.

Cause: The limit for the socket listen queue is too low.
Solution: Increase the limit for partial TCP connections on the socket listen queue. See Section 10.2.3.

Cause: There are too few outgoing network ports.
Solution: Increase the maximum number of concurrent nonreserved, dynamically allocated ports. See Section 10.2.4.

Cause: Network connections are becoming inactive too quickly.
Solution: Enable TCP keepalive functionality. See Section 10.2.9.
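
For example:

    # Check per-interface error and collision counts:
    netstat -i

    # Count connections stuck in the SYN_RCVD state; a large number
    # can indicate an overflowing socket listen queue (Section 10.1.2):
    netstat -an | grep SYN_RCVD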

4.5    Using the Advanced Tuning Guidelines

If system performance is still deficient after applying the initial tuning recommendations (Section 4.1) and considering the solutions to common performance problems (Section 4.4), you may be able to improve performance by using the advanced tuning guidelines. Advanced tuning requires an in-depth knowledge of Tru64 UNIX and the applications running on the system, and should be performed by an experienced system administrator.

Before implementing any advanced tuning guideline, you must ensure that it is appropriate for your configuration and workload, and you must consider its benefits and tradeoffs. Use the advanced tuning guidelines shown in Table 4-4 to help you tune your system.

Table 4-4:  Advanced Tuning Guidelines

If your workload consists of:                        You can improve performance by:
Applications requiring extensive system resources    Increasing resource limits (Chapter 5)
Memory-intensive applications                        Increasing the memory available to processes (Section 6.4), modifying paging and swapping operations (Section 6.5), and reserving shared memory (Section 6.6)
CPU-intensive applications                           Freeing CPU resources (Section 7.2)
Disk I/O-intensive applications                      Distributing the disk I/O load (Section 8.1)
File system-intensive applications                   Modifying AdvFS, UFS, or NFS operation (Chapter 9)
Network-intensive applications                       Modifying network operation (Section 10.2)
Nonoptimized or poorly written applications          Optimizing or rewriting the applications (Chapter 11)