4    Improving System Performance

You may be able to improve Tru64 UNIX performance by tuning the operating system or performing other tasks. You may need to tune the system under various circumstances, for example, if your workload changes or if performance does not meet your needs.

To help you improve system performance, this chapter describes the following:

  Steps for configuring and tuning systems (Section 4.1)

  Tuning special configurations (Section 4.2)

  Checking the configuration by using the sys_check utility (Section 4.3)

  Solving common performance problems (Section 4.4)

  Using the advanced tuning guidelines (Section 4.5)

4.1    Steps for Configuring and Tuning Systems

Before you configure and tune a system, you must become familiar with the terminology and concepts relating to performance and availability. See Chapter 1 for information.

In addition, you must understand how your applications utilize system resources, because not all configurations and tuning guidelines are appropriate for all types of workloads. For example, you must determine if your applications are memory-intensive or CPU-intensive, or if they perform many disk or network operations. See Section 2.1 for information about identifying a resource model for your configuration.

To help you configure and tune a system that will meet your performance and availability needs, follow these steps:

  1. Ensure that your hardware and software configuration is appropriate for your workload resource model and your performance and availability goals. See Chapter 2.

  2. Make sure that you have adhered to the configuration guidelines for your CPU, memory, disk storage, and network resources (see Chapter 2).

  3. Perform the following initial tuning tasks:

    1. If you have a large-memory system, Internet server, or NFS server, follow the tuning guidelines that are described in Section 4.2.

    2. Apply any tuning recommendations described in your application documentation.

    3. Make sure that you have sufficient system resources for large applications or for large-memory systems. See Chapter 5 for information about resource tuning.

    4. Run sys_check and consider following its configuration and tuning recommendations (see Section 4.3).

  4. Monitor the system and evaluate its performance, identifying any areas in which performance can be improved. Section 3.4 describes the tools that you can use to monitor performance.

  5. If performance is deficient, see Section 4.4 for information about solving common performance problems, and see Section 4.5 for information about using the advanced tuning guidelines.

System tuning usually involves modifying kernel subsystem attributes. See Section 3.6 for information.

4.2    Tuning Special Configurations

Large configurations or configurations that run memory-intensive or network-intensive applications may require special tuning. The following sections provide information about tuning these special configurations:

  Internet servers (Section 4.2.1)

  Large-memory systems (Section 4.2.2)

  NFS servers (Section 4.2.3)

In addition, your application product documentation may include specific configuration and tuning guidelines that you should follow.

4.2.1    Tuning Internet Servers

Internet servers (including Web, proxy, firewall, and gateway servers) run network-intensive applications that usually require significant system resources. If you have an Internet server, you should modify the default values of some kernel attributes.

Follow the guidelines in Table 4-1 to help you tune an Internet server.

Table 4-1:  Internet Server Tuning Guidelines

Guideline Reference
Increase the system resources available to processes. Section 5.1
Increase the available address space. Section 5.3
Ensure that the Unified Buffer Cache (UBC) has sufficient memory. Section 9.2.4
Increase the size of the hash table that the kernel uses to look up TCP control blocks. Section 10.2.1
Increase the number of TCP hash tables. Section 10.2.2
Increase the limits for partial TCP connections on the socket listen queue. Section 10.2.3
For proxy servers only, increase the maximum number of concurrent nonreserved, dynamically allocated ports. Section 10.2.4
Disable use of a path maximum transmission unit (PMTU). Section 10.2.6
Increase the number of IP input queues. Section 10.2.7
For proxy servers only, enable mbuf cluster compression. Section 10.2.8
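
Most of these guidelines involve kernel subsystem attributes that you examine and modify with the sysconfig and sysconfigdb commands. The following is a minimal sketch of the general pattern, using the inet subsystem attributes tcbhashsize and somaxconn, which correspond to the hash table and listen queue guidelines above; the values shown are illustrative only, so confirm the recommended settings in Section 10.2.1 and Section 10.2.3 before applying them:

    # Display the current values of two inet subsystem attributes:
    sysconfig -q inet tcbhashsize
    sysconfig -q inet somaxconn

    # Change a dynamic attribute at run time (the change does not
    # persist across reboots):
    sysconfig -r inet somaxconn=1024

    # To make a change permanent, create a stanza file (for example,
    # /tmp/inet.stanza) containing:
    #
    #   inet:
    #       somaxconn = 1024
    #
    # and merge it into /etc/sysconfigtab (see sysconfigdb(8) for the
    # -a, -m, and -u options):
    sysconfigdb -m -f /tmp/inet.stanza inet

Static attributes take effect only after the stanza is added to /etc/sysconfigtab and the system is rebooted.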

4.2.2    Tuning Large-Memory Systems

Large-memory systems often run memory-intensive applications, such as database programs, that usually require significant system resources. If you have a large-memory system, you should modify the default values of some kernel attributes.

Follow the guidelines in Table 4-2 to help you tune a large-memory system.

Table 4-2:  Large-Memory System Tuning Guidelines

Guideline Reference
Increase the system resources available to processes. Section 5.1
Increase the maximum size of a System V message and message queue. Section 5.4.1
Increase the maximum size of a single System V shared memory region. Section 5.4.4
Increase the minimum size of a System V shared memory segment. Section 5.4.6
Increase the available address space. Section 5.3
Reduce the size of the AdvFS buffer cache. Section 6.4.4
Increase the number of AdvFS buffer hash chains, if you are using AdvFS. Section 9.3.6.2
Increase the memory reserved for AdvFS access structures, if you are using AdvFS. Section 9.3.6.3
Increase the size of the metadata buffer cache to more than 3 percent of main memory, if you are using UFS. Section 9.4.3.1
Increase the size of the metadata hash chain table, if you are using UFS. Section 9.4.3.2
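
For example, the System V shared memory limits listed above live in the ipc subsystem. A minimal sketch, assuming the shm-max attribute (the maximum size, in bytes, of a single shared memory region) and a purely illustrative value; see Section 5.4.4 for the recommended setting:

    # Display the current maximum size of a shared memory region:
    sysconfig -q ipc shm-max

    # Example /etc/sysconfigtab stanza (512 MB here is illustrative
    # only; size it for your database or application):
    #
    #   ipc:
    #       shm-max = 536870912

A change such as this typically takes effect at the next reboot, unless the attribute supports run-time reconfiguration with sysconfig -r.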

4.2.3    Tuning NFS Servers

NFS servers typically run only a few small user-level programs, which consume few system resources. Because most of the CPU and wall-clock time on an NFS server is spent in the kernel processing NFS requests, file system tuning is particularly important. See Chapter 9 for information about file system tuning.

In addition, if you are running NFS over TCP, tuning the TCP subsystem may improve performance when many clients are active. See Section 10.2 for information about network subsystem tuning. If you are running NFS over UDP, network subsystem tuning is not needed.

Follow the guidelines in Table 4-3 to help you tune a system that is only serving NFS.

Table 4-3:  NFS Server Tuning Guidelines

Guideline Reference
Set the value of the maxusers attribute to the number of server NFS operations that are expected to occur each second. Section 5.1
Increase the size of the namei cache. Section 9.2.1
Increase the memory reserved for AdvFS access structures, if you are using AdvFS. Section 9.3.6.3
Increase the size of the metadata buffer cache, if you are using UFS. Section 9.4.3.1
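
As an example of the first guideline, maxusers is a static attribute that is set in /etc/sysconfigtab and takes effect after a reboot. A minimal sketch, assuming maxusers resides in the proc subsystem (as on typical Tru64 UNIX systems) and a server expected to handle roughly 1024 NFS operations each second:

    # /etc/sysconfigtab stanza; reboot for the new value to take effect:
    #
    #   proc:
    #       maxusers = 1024

    # Confirm the running value after the reboot:
    sysconfig -q proc maxusers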

4.3    Checking the Configuration by Using the sys_check Utility

After you apply any configuration-specific tuning guidelines, as described in Section 4.2, run the sys_check utility to check your system configuration.

The sys_check utility creates an HTML file that describes the system configuration and that can be used to diagnose problems. The utility checks kernel attribute settings and memory and CPU resources, provides performance data and lock statistics for SMP systems and for kernel profiles, and outputs any warnings and tuning guidelines.

Consider applying the sys_check utility's configuration and tuning guidelines before applying any advanced tuning guidelines.

Note

You may experience impaired system performance while running the sys_check utility. Invoke the utility during off-peak hours to minimize the performance impact.

You can invoke the sys_check utility from the SysMan graphical user interface or from the command line. If you specify sys_check without any command-line options, it performs a basic system analysis and creates an HTML file with configuration and tuning guidelines. Command-line options let you extend or restrict the scope of the analysis.

See sys_check(8) for more information.
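
For example, a basic run might look like the following; sys_check writes its HTML report to standard output, so redirect it to a file (the path shown is arbitrary):

    # Run a basic analysis during off-peak hours and save the report:
    sys_check > /var/tmp/sys_check.html

You can then view the report in any web browser and compare its warnings and tuning suggestions with the guidelines in this chapter.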

4.4    Solving Common Performance Problems

The following sections provide examples of some common performance problems and solutions:

  Application completes slowly (Section 4.4.1)

  Insufficient memory or excessive paging (Section 4.4.2)

  Insufficient swap space (Section 4.4.3)

  Processes swapped out (Section 4.4.4)

  Insufficient CPU cycles (Section 4.4.5)

  Disk bottleneck (Section 4.4.6)

  Poor disk I/O performance (Section 4.4.7)

  Poor AdvFS performance (Section 4.4.8)

  Poor UFS performance (Section 4.4.9)

  Poor NFS performance (Section 4.4.10)

  Poor network performance (Section 4.4.11)

Each section describes how to detect the problem, the possible causes of the problem, and how to eliminate or diminish the problem.

4.4.1    Application Completes Slowly

Use the following table to detect a slow application completion time and to diagnose the performance problem:

How to detect:

  Check application log files.

  Use the ps command to display information about application processing times and whether an application is swapped out. See Section 6.3.2.

  Use process accounting commands to obtain information about process completion times. See accton(8).

Cause: Application is inefficient.
Solution: Rewrite the application so that it runs more efficiently (see Chapter 7). Use profiling and debugging commands to analyze applications and identify inefficient areas of code (see Section 11.1).

Cause: Application is not optimized.
Solution: Optimize the application. See Chapter 7.

Cause: Application is being swapped out.
Solution:

  Delay swapping processes. See Section 6.5.3.

  Increase the memory available to processes. See Section 6.4.

  Reduce an application's use of memory. See Section 11.2.6.

Cause: Application requires more memory resources.
Solution:

  Increase the memory available to processes. See Section 6.4.

  Reduce an application's use of memory. See Section 11.2.6.

Cause: Insufficient swap space.
Solution: Increase the swap space and distribute it across multiple disks. See Section 4.4.3.

Cause: Application requires more CPU resources.
Solution: Provide more CPU resources to processes. See Section 4.4.5.

Cause: Disk I/O bottleneck.
Solution: Distribute disk I/O efficiently. See Section 4.4.6.
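
For example, you might start the diagnosis with ps. A minimal sketch (the sort key assumes the standard ps aux column order, in which %CPU is the third column):

    # Show the busiest processes; the TIME column reveals how much CPU
    # time each application has accumulated:
    ps aux | sort -rn -k 3 | head -n 10

See Section 6.3.2 for how to interpret the process state field, including whether a process has been swapped out.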

4.4.2    Insufficient Memory or Excessive Paging

A high rate of paging or a low free page count may indicate that you have inadequate memory for the workload. In particular, avoid paging on a large-memory system. Use the following table to detect insufficient memory and to diagnose the performance problem:

How to detect:

  Use the vmstat command to display information about paging and memory consumption. See Section 6.3.1 for more information.

Cause: Insufficient memory resources are available to processes.
Solution:

  Reduce an application's use of memory. See Section 11.2.6.

  Increase the memory resources that are available to processes. See Section 6.4.

  Add physical memory.
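
A typical check is to sample vmstat over an interval; the columns to watch are described in Section 6.3.1:

    # Report virtual memory statistics every 5 seconds. A persistently
    # low free page count combined with sustained page-out activity
    # suggests that memory is inadequate for the workload:
    vmstat 5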

4.4.3    Insufficient Swap Space

If you consume all the available swap space, the system will display messages on the console indicating the problem. Use the following table to detect if you have insufficient swap space and to diagnose the performance problem:

How to detect:

  Invoke the swapon -s command while you are running a normal workload. See Section 6.3.3.

Cause: Insufficient swap space for your configuration.
Solution: Configure enough swap space for your configuration and workload. See Section 2.3.2.3.

Cause: Swap space is not distributed across devices.
Solution: Distribute the swap load across multiple swap devices to improve performance. See Section 6.2.

Cause: Applications are using excessive memory resources.
Solution:

  Increase the memory available to processes. See Section 6.4.

  Reduce an application's use of memory. See Section 11.2.6.
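
For example:

    # Display total and per-device swap space allocation and usage
    # while the system runs a normal workload:
    swapon -s

If the in-use figures approach the totals during normal operation, add swap space or redistribute it as described above.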

4.4.4    Processes Swapped Out

Swapped-out (suspended) processes decrease system response time and application completion time. Avoid swapping if you have a large-memory system or large applications. Use the following table to detect whether processes are being swapped out and to diagnose the performance problem:

How to detect:

  Use the ps command to determine if your system is swapping processes. See Section 6.3.2.

Cause: Insufficient memory resources.
Solution:

  Increase the memory available to processes. See Section 6.4.

  Reduce an application's use of memory. See Section 11.2.6.

Cause: Swapping occurs too early during page reclamation.
Solution: Decrease the rate of swapping. See Section 6.5.3.
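
For example:

    # Review the state field of each process; Section 6.3.2 explains
    # which state values indicate a swapped-out process:
    ps aux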

4.4.5    Insufficient CPU Cycles

Although a low CPU idle time can indicate that the CPU is being fully utilized, performance can suffer if the system cannot provide a sufficient number of CPU cycles to processes. Use the following table to detect insufficient CPU cycles and to diagnose the performance problem:

How to detect:

  Use the vmstat command to display information about CPU system, user, and idle times. See Section 6.3.1 for more information.

  Use the kdbx cpustat extension to check CPU usage. See Section 7.1.4.

Cause: Excessive CPU demand from applications.
Solution:

  Optimize applications. See Section 11.2.4.

  Use hardware RAID to relieve the CPU of disk I/O overhead. See Section 8.5.

  Add processors.
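
For example:

    # Report statistics every 5 seconds; the final columns show user
    # (us), system (sy), and idle (id) CPU time. Idle time that stays
    # near zero while the run queue remains long indicates that
    # processes are competing for CPU cycles:
    vmstat 5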

4.4.6    Disk Bottleneck

Excessive I/O to only one or a few disks may cause a bottleneck at the overutilized disks. Use the following table to detect an uneven distribution of disk I/O and to diagnose the performance problem:

How to detect:

  Use the iostat command to display which disks are being used the most. See Section 8.2.

  Use the swapon -s command to display the utilization of swap disks. See Section 6.3.3.

  Use the volstat command to display information about the LSM I/O workload. See Section 8.4.7.2 for more information.

  Use the advfsstat command to display AdvFS disk usage information. See Section 9.3.5.1.

Cause: Disk I/O is not evenly distributed.
Solution:

  Use disk striping. See Section 2.5.2.

  Distribute disk, swap, and file system I/O across different disks and, optimally, multiple buses. See Section 8.1.
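
For example:

    # Report disk transfer statistics every 5 seconds; one device that
    # is consistently far busier than the others marks the bottleneck:
    iostat 5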

4.4.7    Poor Disk I/O Performance

Because disk I/O operations are much slower than memory operations, the disk I/O subsystem is often the source of performance problems. Use the following table to detect poor disk I/O performance and to diagnose the performance problem:

How to detect:

  Monitor the memory allocated to the UBC by using the dbx print command to examine the ufs_getapage_stats and vm_tune data structures. See Section 6.3.4.

  Use the iostat command to determine if you have a bottleneck at a disk. See Section 8.2 for more information.

  Check for disk fragmentation. See Section 9.3.7.1 and Section 9.4.3.7.

  Check the hit rate of the namei cache with the dbx nchstats data structure. See Section 9.1.2.

  Use the advfsstat command to monitor the performance of AdvFS domains and filesets. See Section 9.3.5.1.

  Check UFS clustering with the dbx ufs_clusterstats data structure. See Section 6.3.4.

  Check the hit rate of the metadata buffer cache by using the dbx bio_stats data structure. See Section 9.4.2.3.

Cause: Disk I/O is not efficiently distributed.
Solution:

  Use disk striping. See Section 2.5.2.

  Distribute disk, swap, and file system I/O across different disks and, optimally, multiple buses. See Section 8.1.

Cause: File systems are fragmented.
Solution: Defragment file systems. See Section 9.3.7.1 and Section 9.4.3.7.

Cause: Maximum open file limit is too small.
Solution: Increase the maximum number of open files. See Section 5.5.1.

Cause: The namei cache is too small.
Solution: Increase the size of the namei cache. See Section 9.2.1.
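
For example, you can check the namei cache hit rate with the kernel debugger. A minimal sketch; the counters to compare are described in Section 9.1.2:

    # Attach dbx to the running kernel (as root):
    dbx -k /vmunix /dev/mem

    # Then, at the (dbx) prompt, display the namei cache statistics;
    # a high proportion of misses suggests the cache is too small:
    (dbx) print nchstats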

4.4.8    Poor AdvFS Performance

Use the following table to detect poor AdvFS performance and to diagnose the performance problem:

How to detect:

  Use the advfsstat command to monitor the performance of AdvFS domains and filesets. See Section 9.3.5.1.

  Check for disk fragmentation by using the AdvFS defragment command with the -v and -n options. See Section 9.3.7.1.

Cause: Single-volume domains are being used.
Solution: Use multiple-volume file domains. See Section 9.3.4.1.

Cause: File system is fragmented.
Solution: Defragment the file system. See Section 9.3.7.1.

Cause: There are too few AdvFS buffer cache hits.
Solution:

  Allocate sufficient memory to the AdvFS buffer cache. See Section 9.3.6.1.

  Increase the number of AdvFS buffer hash chains. See Section 9.3.6.2.

  Increase the dirty data caching threshold. See Section 9.3.6.4.

  Modify the AdvFS device queue limit. See Section 9.3.6.6.

Cause: The advfsd daemon is running unnecessarily.
Solution: Stop the daemon. See Section 7.2.5.
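
For example, the fragmentation check mentioned above does not modify the domain. A minimal sketch, where usr_domain is a placeholder domain name:

    # Report fragmentation statistics without defragmenting anything
    # (-n: report only, -v: verbose output):
    defragment -v -n usr_domain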

4.4.9    Poor UFS Performance

Use the following table to detect poor UFS performance and to diagnose the performance problem:

How to detect:

  Monitor the memory allocated to the UBC by using the dbx ufs_getapage_stats data structure. See Section 6.3.4.

  Check the hit rate of the namei cache with the dbx nchstats data structure. See Section 9.1.2.

  Use the dumpfs command to display UFS information. See Section 9.4.2.1.

  Check how effectively the system is clustering and check fragmentation by using the dbx print command to examine the ufs_clusterstats, ufs_clusterstats_read, and ufs_clusterstats_write data structures. See Section 9.4.2.2.

  Check the hit rate of the metadata buffer cache by using the dbx bio_stats data structure. See Section 9.4.2.3.

Cause: The UBC is too small.
Solution: Increase the amount of memory allocated to the UBC. See Section 9.2.4.

Cause: The metadata buffer cache is too small.
Solution: Increase the size of the metadata buffer cache. See Section 9.4.3.1.

Cause: The file system fragment size is incorrect.
Solution: Make the file system fragment size equal to the block size. See Section 9.4.1.1.

Cause: File system is fragmented.
Solution: Defragment the file system. See Section 9.4.3.7.
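
For example, you can verify the block and fragment sizes with dumpfs. A minimal sketch, where /usr is a placeholder for a UFS mount point:

    # The superblock summary at the top of the output includes the
    # block (bsize) and fragment (fsize) sizes:
    dumpfs /usr | head -n 20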

4.4.10    Poor NFS Performance

Use the following table to detect poor NFS performance and to diagnose the performance problem:

How to detect:

  Use the dbx print nfs_sv_active_hist command to display a histogram of the active NFS server threads. See Section 3.6.7.

  Use the dbx print nchstats command to determine the namei cache hit rate. See Section 9.1.2.

  Use the dbx print bio_stats command to determine the metadata buffer cache hit rate. See Section 9.4.2.3.

  Use the nfsstat command to display the number of NFS requests and other information. See Section 9.5.1.1.

  Use the ps axlmp 0 | grep nfs command to display the number of idle threads. See Section 9.5.2.3.

Cause: All NFS server threads are busy.
Solution: Reconfigure the server to run more threads. See Section 9.5.2.2.

Cause: Memory resources are not focused on file system caching.
Solution:

  Increase the amount of memory allocated to the UBC. See Section 9.2.4.

  If you are using AdvFS, increase the memory allocated for AdvFS buffer caching. See Section 9.3.6.1.

  If you are using AdvFS, increase the memory reserved for AdvFS access structures. See Section 9.3.6.3 for information.

Cause: System resource allocation is not adequate.
Solution: Set the value of the maxusers attribute to the number of server NFS operations that are expected to occur each second. See Section 5.1 for information.

Cause: UFS metadata buffer cache hit rate is low.
Solution:

  Increase the size of the metadata buffer cache. See Section 9.4.3.1.

  Increase the size of the namei cache. See Section 9.2.1.

Cause: CPU idle time is low.
Solution: Use UFS instead of AdvFS. See Section 9.4.
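
For example:

    # Display server-side NFS statistics, including the mix of
    # operations handled (see Section 9.5.1.1 for interpretation):
    nfsstat -s

    # Display the NFS server threads and count the idle ones
    # (see Section 9.5.2.3):
    ps axlmp 0 | grep nfs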

4.4.11    Poor Network Performance

Use the following table to detect poor network performance and to diagnose the performance problem:

How to detect:

  Use the netstat command to display information about network collisions and dropped network connections. See Section 10.1.1.

  Check the socket listen queue statistics for the number of pending requests and the number of times the system dropped a received SYN packet. See Section 10.1.2.

Cause: The TCP hash table is too small.
Solution: Increase the size of the hash table that the kernel uses to look up TCP control blocks. See Section 10.2.1.

Cause: The limit for the socket listen queue is too low.
Solution: Increase the limit for partial TCP connections on the socket listen queue. See Section 10.2.3.

Cause: There are too few outgoing network ports.
Solution: Increase the maximum number of concurrent nonreserved, dynamically allocated ports. See Section 10.2.4.

Cause: Network connections are becoming inactive too quickly.
Solution: Enable TCP keepalive functionality. See Section 10.2.9.
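
For example:

    # Check per-interface error and collision counts:
    netstat -i

    # Count connections stuck in the SYN_RCVD state; a large number
    # can indicate an overflowing socket listen queue (Section 10.1.2):
    netstat -an | grep SYN_RCVD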

4.5    Using the Advanced Tuning Guidelines

If system performance is still deficient after applying the initial tuning recommendations (Section 4.1) and considering the solutions to common performance problems (Section 4.4), you may be able to improve performance by using the advanced tuning guidelines. Advanced tuning requires an in-depth knowledge of Tru64 UNIX and the applications running on the system, and should be performed by an experienced system administrator.

Before implementing any advanced tuning guideline, you must ensure that it is appropriate for your configuration and workload, and you must consider its benefits and tradeoffs. Use the advanced tuning guidelines shown in Table 4-4 to help you tune your system.

Table 4-4:  Advanced Tuning Guidelines

If your workload consists of:                        You can improve performance by:
Applications requiring extensive system resources    Increasing resource limits (Chapter 5)
Memory-intensive applications                        Increasing the memory available to processes (Section 6.4), modifying paging and swapping operations (Section 6.5), and reserving shared memory (Section 6.6)
CPU-intensive applications                           Freeing CPU resources (Section 7.2)
Disk I/O-intensive applications                      Distributing the disk I/O load (Section 8.1)
File system-intensive applications                   Modifying AdvFS, UFS, or NFS operation (Chapter 9)
Network-intensive applications                       Modifying network operation (Section 10.2)
Nonoptimized or poorly written applications          Optimizing or rewriting the applications (Chapter 11)