You may be able to improve performance by optimizing CPU resources. This chapter describes how to perform the following tasks:
Obtain information about CPU performance (Section 7.1)
Improve CPU performance (Section 7.2)
7.1 Gathering CPU Performance Information
Table 7-1
describes the
tools you can use to gather information about CPU usage.
Table 7-1: CPU Monitoring Tools
Name | Use | Description |
sys_check | Analyzes system configuration and displays statistics (Section 4.3) | Creates an HTML file that describes the system configuration, and can be used to diagnose problems. This utility checks kernel variable settings and memory and CPU resources, and provides performance data and lock statistics for SMP systems and kernel profiles. |
ps | Displays CPU and virtual memory usage by processes (Section 6.3.2 and Section 7.1.1) | Displays current statistics for running processes, including CPU usage, the processor and processor set, and the scheduling priority. |
dxproctuner (Process Tuner) | Displays CPU and virtual memory usage by processes | Displays current statistics for running processes. Invoke the Process Tuner graphical user interface (GUI) from the CDE Application Manager to display a list of processes and their characteristics, display the processes running for yourself or all users, display and modify process priorities, or send a signal to a process. While monitoring processes, you can select the parameters to view (for example, percent of CPU usage, virtual memory size, and state). |
vmstat | Displays virtual memory and CPU usage statistics (Section 7.1.2) | Displays information about process threads, virtual memory usage (page lists, page faults, page ins, and page outs), interrupts, and CPU usage (percentages of user, system, and idle times). First reported are the statistics since boot time; subsequent reports are the statistics since a specified interval of time. |
collect | Collects performance data | Collects a variety of performance data on a running system and either displays the information in a graphical format or saves it to a binary file. |
top | Provides continuous reports on the system | Provides continuous reports on the state of the system, including a list of the processes using the most CPU resources. |
ipcs | Displays IPC statistics | Displays interprocess communication (IPC) statistics for currently active message queues, shared-memory segments, semaphores, remote queues, and local queue headers. |
uptime | Displays the system load average (Section 7.1.3) | Displays the number of jobs in the run queue for the last 5 seconds, the last 30 seconds, and the last 60 seconds. |
w | Displays the system load average and user information | Displays the current time, the amount of time since the system was last started, the users logged in to the system, and the number of jobs in the run queue for the last 5 seconds, 30 seconds, and 60 seconds. |
xload | Monitors the system load average | Displays the system load average in a histogram that is periodically updated. |
kdbx cpustat extension | Reports CPU statistics (Section 7.1.4) | Displays CPU statistics, including the percentages of time the CPU spends in various states. |
kdbx lockstats extension | Reports lock statistics (Section 7.1.5) | Displays lock statistics for each lock class on each CPU in the system. |
The following sections describe some of these commands in detail.
7.1.1 Monitoring CPU Usage by Using the ps Command
The
ps
command displays
a snapshot of the current status of the system processes.
You can use it to
determine the current running processes (including users), their state, and
how they utilize system memory.
The command lists processes in order of decreasing
CPU usage so you can identify which processes are using the most CPU time.
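For example, you might use the following command to show the processes currently consuming the most CPU time (only the first lines of the sorted output are of interest):
#
/usr/ucb/ps aux | head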
See
Section 6.3.2
for detailed information about using the
ps
command to diagnose CPU performance problems.
7.1.2 Monitoring CPU Statistics by Using the vmstat Command
The
vmstat
command shows the virtual
memory, process, and CPU statistics for a specified time interval.
The first
line of output displays statistics since reboot time; each subsequent line
displays statistics since the specified time interval.
An example of the
vmstat
command is as follows; output
is provided in one-second intervals:
#
/usr/ucb/vmstat 1
Virtual Memory Statistics: (pagesize = 8192)
  procs      memory         pages                           intr        cpu
  r  w  u   act  free wire  fault  cow zero react  pin pout  in  sy  cs us sy id
  2 66 25  6417  3497 1570   155K  38K  50K     0  46K    0   4 290 165  0  2 98
  4 65 24  6421  3493 1570    120    9   81     0    8    0 585 865 335 37 16 48
  2 66 25  6421  3493 1570     69    0   69     0    0    0 570 968 368  8 22 69
  4 65 24  6421  3493 1570     69    0   69     0    0    0 554 768 370  2 14 84
  4 65 24  6421  3493 1570     69    0   69     0    0    0 865  1K 404  4 20 76
The following fields are particularly important for CPU monitoring:
Process information (procs):
r -- Number of threads that are running or can run.
w -- Number of threads that are waiting interruptibly (waiting for an event or a resource, but can be interrupted or suspended). For example, the thread can accept user signals or be swapped out of memory.
u -- Number of threads that are waiting uninterruptibly (waiting for an event or a resource, but cannot be interrupted or suspended). For example, the thread cannot accept user signals; it must come out of the wait state to take a signal. Processes that are waiting uninterruptibly cannot be stopped by the kill command.
CPU usage information (cpu):
us -- Percentage of user time for normal and priority processes. User time includes the time the CPU spent executing library routines.
sy -- Percentage of system time. System time includes the time the CPU spent executing system calls.
id -- Percentage of idle time.
See
Section 6.3.1
for detailed information about using
the
vmstat
command to diagnose performance problems.
To use the vmstat command to diagnose a CPU performance problem, check the user (us), system (sy), and idle (id) time split.
You must understand how your
applications use the system to determine the appropriate values for these
times.
The goal is to keep the CPU as productive as possible.
Idle CPU cycles
occur when no runnable processes exist or when the CPU is waiting to complete
an I/O or memory request.
The following list describes how to interpret the values for user, system, and idle time:
System time (sy)--A high percentage of system time may indicate a system bottleneck, which can be caused by excessive system calls, device interrupts, context switches, soft page faults, lock contention, or cache misses.
A high percentage of system time and a low percentage of idle time may indicate that something in the application load is stimulating the system with high overhead operations. Such overhead operations could consist of high system call frequencies, high interrupt rates, large numbers of small I/O transfers, or large numbers of IPCs or network transfers.
A high percentage of system time and low percentage of idle time may
also be caused by failing hardware.
Use the
uerf
command
to check your hardware.
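For example, you might review recent error log entries in reverse chronological order (one possible invocation):
#
uerf -R | more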
A high percentage of system time may also indicate that the system is thrashing; that is, the amount of memory available to the virtual memory subsystem has gotten so low that the system is spending all its time paging and swapping in an attempt to regain memory. A system that spends more than 50 percent of its time in system mode and idle mode may not have enough memory resources. See Section 6.4 for information about increasing memory resources.
Idle time (id)--A high percentage of idle time on one or more processors indicates either:
Threads are blocked, waiting for an event or a resource (for example, memory or I/O)
No threads are ready to run, so the CPU is not busy
If you have a high idle time and poor response time, and you are sure that your system has a typical load, one or more of the following problems may exist:
The hardware may have reached its capacity
A kernel data structure is exhausted
You may have a memory, disk I/O, or network bottleneck
If the idle time percentage is very low but performance is acceptable, your system is utilizing its CPU resources efficiently.
User time (us)--A high percentage of user time can be a characteristic of a well-performing system. However, if the system has poor performance, a high percentage of user time may indicate a user code bottleneck, which can be caused by inefficient user code, insufficient CPU processing power, or excessive memory latency or cache misses.
See
Section 7.2
for information on optimizing CPU resources.
Use profiling to determine which sections of code consume the most processing time. See Section 11.1 and the Programmer's Guide for more information on profiling.
A high percentage of user time and a low percentage of idle time may indicate that your application code is consuming most of the CPU. You can optimize the application, or you may need a more powerful processor. See Section 7.2 for information on optimizing CPU resources.
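For example, one common approach (shown here as a sketch; myapp is a hypothetical program name, and your compiler and profiler options may differ) is to build the application with profiling enabled, run it under a representative workload, and then examine the profile:
#
cc -p -o myapp myapp.c
#
./myapp
#
prof myapp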
7.1.3 Monitoring the Load Average by Using the uptime Command
The
uptime
command shows
how long a system has been running and the load average.
The load average
counts the jobs that are waiting for disk I/O, and applications whose priorities
have been changed with either the
nice
or the
renice
command.
The load average numbers give the average number
of jobs in the run queue for the last 5 seconds, the last 30 seconds, and
the last 60 seconds.
An example of the
uptime
command is as follows:
#
/usr/ucb/uptime
1:48pm up 7 days, 1:07, 35 users, load average: 7.12, 10.33, 10.31
The command output displays the current time, the amount of time since the system was last started, the number of users logged into the system, and the load averages for the last 5 seconds, the last 30 seconds, and the last 60 seconds.
From the command output, you can determine whether the load is increasing or decreasing. An acceptable load average depends on your type of system and how it is being used. In general, for a large system, a load of 10 is high, and a load of 3 is low. Workstations should have a load of 1 or 2.
If the load is high, look at what processes are running with the
ps
command.
You may want to run some applications during offpeak
hours.
See
Section 6.3.2
for information about the
ps
command.
You can also lower the priority of applications with the
nice
or
renice
command to conserve CPU cycles.
See
nice
(1)
and
renice
(8)
for more information.
7.1.4 Checking CPU Usage by Using the kdbx Debugger
The
kdbx
debugger
cpustat
extension displays CPU statistics, including the percentages
of time the CPU spends in the following states:
Running user-level code
Running system-level code
Running at a priority set with the
nice
function
Idle
Waiting (idle with input or output pending)
The
cpustat
extension to the
kdbx
debugger can help application developers determine how effectively they are
achieving parallelism across the system.
By default, the
kdbx cpustat
extension displays statistics
for all CPUs in the system.
For example:
#
/usr/bin/kdbx -k /vmunix /dev/mem
(kdbx)
cpustat
 Cpu    User (%)   Nice (%)  System (%)   Idle (%)   Wait (%)
=====  ==========  =========  ==========  =========  =========
    0        0.23       0.00        0.08      99.64       0.05
    1        0.21       0.00        0.06      99.68       0.05
See the
Kernel Debugging
manual and
kdbx
(8)
for more information.
7.1.5 Checking Lock Usage by Using the kdbx Debugger
The
kdbx
debugger
lockstats
extension displays lock statistics for each lock class
on each CPU in the system, including the following information:
Address of the structure
Class of the lock for which lock statistics are being recorded
CPU for which the lock statistics are being recorded
Number of instances of the lock
Number of times that processes have tried to get the lock
Number of times that processes have tried to get the lock and missed
Percentage of time that processes miss the lock
Total time that processes have spent waiting for the lock
Maximum amount of time that a single process has waited for the lock
Minimum amount of time that a single process has waited for the lock
For example:
#
/usr/bin/kdbx -k /vmunix /dev/mem
(kdbx)
lockstats
See the
Kernel Debugging
manual and
kdbx
(8)
for more information.
7.2 Improving CPU Performance
A system must be able to efficiently allocate the available CPU cycles among competing processes to meet the performance needs of users and applications. You may be able to improve performance by optimizing CPU usage.
Table 7-2
describes the guidelines for improving
CPU performance.
Table 7-2: Primary CPU Performance Improvement Guidelines
Guideline | Performance Benefit | Tradeoff |
Add processors (Section 7.2.1) | Increases CPU resources | Applicable only for multiprocessing systems, and may affect virtual memory performance |
Use the Class Scheduler (Section 7.2.2) | Allocates CPU resources to critical applications | None |
Prioritize jobs (Section 7.2.3) | Ensures that important applications have the highest priority | None |
Schedule jobs at offpeak hours (Section 7.2.4) | Distributes the system load | None |
Stop the advfsd daemon (Section 7.2.5) | Decreases demand for CPU power | Applicable only if you are not using the AdvFS graphical user interface |
Use hardware RAID (Section 7.2.6) | Relieves the CPU of disk I/O overhead and provides disk I/O performance improvements | Increases costs |
The following sections describe how to optimize your CPU resources.
If optimizing CPU resources does not solve the performance problem, you may
have to upgrade your CPU to a faster processor.
7.2.1 Adding Processors
Multiprocessing systems allow you to expand the computing power of a system by adding processors. Workloads that benefit most from multiprocessing have multiple processes or multiple threads of execution that can run concurrently, such as database management system (DBMS) servers, Internet servers, mail servers, and compute servers.
You may be able to improve the performance of a multiprocessing system that has only a small percentage of idle time by adding processors. See Section 7.1.2 for information about checking idle time.
Before you add processors, you must ensure that a performance problem is not caused by the virtual memory or I/O subsystems. For example, increasing the number of processors will not improve performance in a system that lacks sufficient memory resources.
In addition, increasing the number of processors may increase the demands on your I/O and memory subsystems and could cause bottlenecks.
If you add processors and your system is metadata-intensive (that is,
it opens large numbers of small files and accesses them repeatedly), you can
improve the performance of synchronous write operations by using Prestoserve
(see
Section 2.4.8), or by using a RAID controller with
a write-back cache (see
Section 8.5).
7.2.2 Using the Class Scheduler
Use the Class Scheduler to allocate a percentage of CPU time to specific tasks or applications. This allows you to reserve CPU time for important processes, while limiting CPU usage by less critical processes.
To use class scheduling, group together processes into classes and assign each class a percentage of CPU time. You can also manually assign a class to any process.
The Class Scheduler allows you to display statistics on the actual CPU usage for each class.
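For example, the following hypothetical sequence creates a class, allocates it a percentage of CPU time, and places a running process in that class (the class name and process ID are placeholders, and the exact subcommand syntax may differ; see class_admin(8) for the subcommands supported on your system):
#
class_admin create dbclass 50
#
class_admin add dbclass pid 12345
#
class_admin enable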
See the
System Administration
manual and
class_scheduling
(4),
class_admin
(8),
runclass
(1),
and
classcntl
(2)
for more information about the Class Scheduler.
7.2.3 Prioritizing Jobs
You can prioritize
jobs so that important applications are run first.
Use the
nice
command to specify the priority for a command.
Use the
renice
command to change the priority of a running process.
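For example, you might start a low-priority job and then lower the priority of a process that is already running (the command path and process ID shown here are placeholders):
#
nice -n 10 /usr/local/bin/report_job
#
renice 5 -p 12345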
See
nice
(1)
and
renice
(8)
for more information.
7.2.4 Scheduling Jobs at Offpeak Hours
You
can schedule jobs so that they run at offpeak hours (use the
at
and
cron
commands) or when the load level permits (use
the
batch
command).
This can relieve the load on the CPU
and the memory and disk I/O subsystems.
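For example, you might submit a job to run at 2:00 a.m. (the script name is a placeholder):
#
at -f /usr/local/bin/nightly_report 0200
To run the same job every night, you could add a line such as the following to a crontab file:
0 2 * * * /usr/local/bin/nightly_report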
See
at
(1)
and
cron
(8)
for more information.
7.2.5 Stopping the advfsd Daemon
The
advfsd
daemon allows Simple Network Management
Protocol (SNMP) clients such as Netview or Performance Manager (PM) to request
AdvFS file system information.
If you are not using the AdvFS graphical user
interface (GUI), you can free CPU resources and prevent the
advfsd
daemon from periodically scanning disks by stopping the
advfsd
daemon.
To prevent the
advfsd
daemon from starting at boot
time, rename
/sbin/rc3.d/S53advfsd
to
/sbin/rc3.d/T53advfsd
.
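For example, you might rename the startup script as follows:
#
mv /sbin/rc3.d/S53advfsd /sbin/rc3.d/T53advfsd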
To immediately stop the daemon, use the following command:
#
/sbin/init.d/advfsd stop
7.2.6 Using Hardware RAID to Relieve the CPU of I/O Overhead
RAID controllers can relieve the CPU of the disk I/O overhead, in addition to providing many disk I/O performance-enhancing features. See Section 8.5 for more information about hardware RAID.