You may be able to improve performance by optimizing CPU resources. This chapter describes how to perform the following tasks:
Obtain information about CPU performance (Section 7.1)
Improve CPU performance (Section 7.2)
7.1 Gathering CPU Performance Information
Table 7-1
describes the
tools you can use to gather information about CPU usage.
Table 7-1: CPU Monitoring Tools
Name | Use | Description |
sys_check | Analyzes system configuration and displays statistics (Section 4.3) | Creates an HTML file that describes the system configuration, and can be used to diagnose problems. This utility checks kernel variable settings and memory and CPU resources, and provides performance data and lock statistics for SMP systems and kernel profiles. |
ps | Displays CPU and virtual memory usage by processes (Section 6.3.2 and Section 7.1.1) | Displays current statistics for running processes, including CPU usage, the processor and processor set, and the scheduling priority. |
dxproctuner (Process Tuner) | Displays CPU and virtual memory usage by processes | Displays current statistics for running processes. Invoke the Process Tuner graphical user interface (GUI) from the CDE Application Manager to display a list of processes and their characteristics, display the processes running for yourself or all users, display and modify process priorities, or send a signal to a process. While monitoring processes, you can select the parameters to view (for example, percent of CPU usage, virtual memory size, and state). |
vmstat | Displays virtual memory and CPU usage statistics (Section 7.1.2) | Displays information about process threads, virtual memory usage (page lists, page faults, page ins, and page outs), interrupts, and CPU usage (percentages of user, system, and idle times). First reported are the statistics since boot time; subsequent reports are the statistics since a specified interval of time. |
collect | Collects performance data | Collects a variety of performance data on a running system and either displays the information in a graphical format or saves it to a binary file. |
top | Provides continuous reports on the system | Provides continuous reports on the state of the system, including a list of the processes using the most CPU resources. |
ipcs | Displays IPC statistics | Displays interprocess communication (IPC) statistics for currently active message queues, shared-memory segments, semaphores, remote queues, and local queue headers. |
uptime | Displays the system load average (Section 7.1.3) | Displays the number of jobs in the run queue for the last 5 seconds, the last 30 seconds, and the last 60 seconds. |
w | Displays the system load average and user information | Displays the current time, the amount of time since the system was last started, the users logged in to the system, and the number of jobs in the run queue for the last 5 seconds, 30 seconds, and 60 seconds. |
xload | Monitors the system load average | Displays the system load average in a histogram that is periodically updated. |
kdbx cpustat extension | Reports CPU statistics (Section 7.1.4) | Displays CPU statistics, including the percentages of time the CPU spends in various states. |
kdbx lockstats extension | Reports lock statistics (Section 7.1.5) | Displays lock statistics for each lock class on each CPU in the system. |
The following sections describe some of these commands in detail.
7.1.1 Monitoring CPU Usage by Using the ps Command
The
ps
command displays
a snapshot of the current status of the system processes.
You can use it to
determine the current running processes (including users), their state, and
how they utilize system memory.
The command lists processes in order of decreasing
CPU usage so you can identify which processes are using the most CPU time.
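For example, you might use the following command to show the processes currently consuming the most CPU time (only the first lines of the sorted output are of interest):
#
/usr/ucb/ps aux | head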
See
Section 6.3.2
for detailed information about using the
ps
command to diagnose CPU performance problems.
7.1.2 Monitoring CPU Statistics by Using the vmstat Command
The
vmstat
command shows the virtual
memory, process, and CPU statistics for a specified time interval.
The first
line of output displays statistics since reboot time; each subsequent line
displays statistics since the specified time interval.
An example of the
vmstat
command is as follows; output
is provided in one-second intervals:
#
/usr/ucb/vmstat 1
Virtual Memory Statistics: (pagesize = 8192)
  procs      memory         pages                           intr        cpu
  r  w  u   act  free wire  fault  cow zero react  pin pout  in  sy  cs us sy id
  2 66 25  6417  3497 1570   155K  38K  50K     0  46K    0   4 290 165  0  2 98
  4 65 24  6421  3493 1570    120    9   81     0    8    0 585 865 335 37 16 48
  2 66 25  6421  3493 1570     69    0   69     0    0    0 570 968 368  8 22 69
  4 65 24  6421  3493 1570     69    0   69     0    0    0 554 768 370  2 14 84
  4 65 24  6421  3493 1570     69    0   69     0    0    0 865  1K 404  4 20 76
The following fields are particularly important for CPU monitoring:
Process information (procs):
r -- Number of threads that are running or can run.
w -- Number of threads that are waiting interruptibly (waiting for an event or a resource, but can be interrupted or suspended). For example, the thread can accept user signals or be swapped out of memory.
u -- Number of threads that are waiting uninterruptibly (waiting for an event or a resource, but cannot be interrupted or suspended). For example, the thread cannot accept user signals; it must come out of the wait state to take a signal. Processes that are waiting uninterruptibly cannot be stopped by the kill command.
CPU usage information (cpu):
us -- Percentage of user time for normal and priority processes. User time includes the time the CPU spent executing library routines.
sy -- Percentage of system time. System time includes the time the CPU spent executing system calls.
id -- Percentage of idle time.
See
Section 6.3.1
for detailed information about using
the
vmstat
command to diagnose performance problems.
To use the vmstat command to diagnose a CPU performance problem, check the user (us), system (sy), and idle (id) time split.
You must understand how your
applications use the system to determine the appropriate values for these
times.
The goal is to keep the CPU as productive as possible.
Idle CPU cycles
occur when no runnable processes exist or when the CPU is waiting to complete
an I/O or memory request.
The following list describes how to interpret the values for user, system, and idle time:
System time (sy)--A high percentage of system time may indicate a system bottleneck, which can be caused by excessive system calls, device interrupts, context switches, soft page faults, lock contention, or cache misses.
A high percentage of system time and a low percentage of idle time may indicate that something in the application load is stimulating the system with high overhead operations. Such overhead operations could consist of high system call frequencies, high interrupt rates, large numbers of small I/O transfers, or large numbers of IPCs or network transfers.
A high percentage of system time and low percentage of idle time may
also be caused by failing hardware.
Use the
uerf
command
to check your hardware.
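For example, you might review recent error log entries in reverse chronological order (one possible invocation):
#
uerf -R | more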
A high percentage of system time may also indicate that the system is thrashing; that is, the amount of memory available to the virtual memory subsystem has gotten so low that the system is spending all its time paging and swapping in an attempt to regain memory. A system that spends more than 50 percent of its time in system mode and idle mode may not have enough memory resources. See Section 6.4 for information about increasing memory resources.
Idle time (id)--A high percentage of idle time on one or more processors indicates either:
Threads are blocked, waiting for an event or a resource (for example, memory or I/O)
No threads are ready to run, so the CPU is not busy
If you have a high idle time and poor response time, and you are sure that your system has a typical load, one or more of the following problems may exist:
The hardware may have reached its capacity
A kernel data structure is exhausted
You may have a memory, disk I/O, or network bottleneck
If the idle time percentage is very low but performance is acceptable, your system is utilizing its CPU resources efficiently.
User time (us)--A high percentage of user time can be a characteristic of a well-performing system. However, if the system has poor performance, a high percentage of user time may indicate a user code bottleneck, which can be caused by inefficient user code, insufficient CPU processing power, or excessive memory latency or cache misses.
See
Section 7.2
for information on optimizing CPU resources.
Use profiling to determine which sections of code consume the most processing time. See Section 11.1 and the Programmer's Guide for more information on profiling.
A high percentage of user time and a low percentage of idle time may indicate that your application code is consuming most of the CPU. You can optimize the application, or you may need a more powerful processor. See Section 7.2 for information on optimizing CPU resources.
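For example, one common approach (shown here as a sketch; myapp is a hypothetical program name, and your compiler and profiler options may differ) is to build the application with profiling enabled, run it under a representative workload, and then examine the profile:
#
cc -p -o myapp myapp.c
#
./myapp
#
prof myapp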
7.1.3 Monitoring the Load Average by Using the uptime Command
The
uptime
command shows
how long a system has been running and the load average.
The load average
counts the jobs that are waiting for disk I/O, and applications whose priorities
have been changed with either the
nice
or the
renice
command.
The load average numbers give the average number
of jobs in the run queue for the last 5 seconds, the last 30 seconds, and
the last 60 seconds.
An example of the
uptime
command is as follows:
#
/usr/ucb/uptime
1:48pm up 7 days, 1:07, 35 users, load average: 7.12, 10.33, 10.31
The command output displays the current time, the amount of time since the system was last started, the number of users logged into the system, and the load averages for the last 5 seconds, the last 30 seconds, and the last 60 seconds.
From the command output, you can determine whether the load is increasing or decreasing. An acceptable load average depends on your type of system and how it is being used. In general, for a large system, a load of 10 is high, and a load of 3 is low. Workstations should have a load of 1 or 2.
If the load is high, look at what processes are running with the
ps
command.
You may want to run some applications during offpeak
hours.
See
Section 6.3.2
for information about the
ps
command.
You can also lower the priority of applications with the
nice
or
renice
command to conserve CPU cycles.
See
nice
(1)
and
renice
(8)
for more information.
7.1.4 Checking CPU Usage by Using the kdbx Debugger
The
kdbx
debugger
cpustat
extension displays CPU statistics, including the percentages
of time the CPU spends in the following states:
Running user-level code
Running system-level code
Running at a priority set with the
nice
function
Idle
Waiting (idle with input or output pending)
The
cpustat
extension to the
kdbx
debugger can help application developers determine how effectively they are
achieving parallelism across the system.
By default, the
kdbx cpustat
extension displays statistics
for all CPUs in the system.
For example:
#
/usr/bin/kdbx -k /vmunix /dev/mem
(kdbx)
cpustat
 Cpu    User (%)   Nice (%)  System (%)   Idle (%)   Wait (%)
=====  ==========  =========  ==========  =========  =========
    0        0.23       0.00        0.08      99.64       0.05
    1        0.21       0.00        0.06      99.68       0.05
See the
Kernel Debugging
manual and
kdbx
(8)
for more information.
7.1.5 Checking Lock Usage by Using the kdbx Debugger
The
kdbx
debugger
lockstats
extension displays lock statistics for each lock class
on each CPU in the system, including the following information:
Address of the structure
Class of the lock for which lock statistics are being recorded
CPU for which the lock statistics are being recorded
Number of instances of the lock
Number of times that processes have tried to get the lock
Number of times that processes have tried to get the lock and missed
Percentage of time that processes miss the lock
Total time that processes have spent waiting for the lock
Maximum amount of time that a single process has waited for the lock
Minimum amount of time that a single process has waited for the lock
For example:
#
/usr/bin/kdbx -k /vmunix /dev/mem
(kdbx)
lockstats
See the
Kernel Debugging
manual and
kdbx
(8)
for more information.
7.2 Improving CPU Performance
A system must be able to efficiently allocate the available CPU cycles among competing processes to meet the performance needs of users and applications. You may be able to improve performance by optimizing CPU usage.
Table 7-2
describes the guidelines for improving
CPU performance.
Table 7-2: Primary CPU Performance Improvement Guidelines
Guideline | Performance Benefit | Tradeoff |
Add processors (Section 7.2.1) | Increases CPU resources | Applicable only for multiprocessing systems, and may affect virtual memory performance |
Use the Class Scheduler (Section 7.2.2) | Allocates CPU resources to critical applications | None |
Prioritize jobs (Section 7.2.3) | Ensures that important applications have the highest priority | None |
Schedule jobs at offpeak hours (Section 7.2.4) | Distributes the system load | None |
Stop the advfsd daemon (Section 7.2.5) | Decreases demand for CPU power | Applicable only if you are not using the AdvFS graphical user interface |
Use hardware RAID (Section 7.2.6) | Relieves the CPU of disk I/O overhead and provides disk I/O performance improvements | Increases costs |
The following sections describe how to optimize your CPU resources.
If optimizing CPU resources does not solve the performance problem, you may
have to upgrade your CPU to a faster processor.
7.2.1 Adding Processors
Multiprocessing systems allow you to expand the computing power of a system by adding processors. Workloads that benefit most from multiprocessing have multiple processes or multiple threads of execution that can run concurrently, such as database management system (DBMS) servers, Internet servers, mail servers, and compute servers.
You may be able to improve the performance of a multiprocessing system that has only a small percentage of idle time by adding processors. See Section 7.1.2 for information about checking idle time.
Before you add processors, you must ensure that a performance problem is not caused by the virtual memory or I/O subsystems. For example, increasing the number of processors will not improve performance in a system that lacks sufficient memory resources.
In addition, increasing the number of processors may increase the demands on your I/O and memory subsystems and could cause bottlenecks.
If you add processors and your system is metadata-intensive (that is,
it opens large numbers of small files and accesses them repeatedly), you can
improve the performance of synchronous write operations by using Prestoserve
(see
Section 2.4.8), or by using a RAID controller with
a write-back cache (see
Section 8.5).
7.2.2 Using the Class Scheduler
Use the Class Scheduler to allocate a percentage of CPU time to specific tasks or applications. This allows you to reserve CPU time for important processes, while limiting CPU usage by less critical processes.
To use class scheduling, group together processes into classes and assign each class a percentage of CPU time. You can also manually assign a class to any process.
The Class Scheduler allows you to display statistics on the actual CPU usage for each class.
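For example, the following hypothetical sequence creates a class, allocates it a percentage of CPU time, and places a running process in that class (the class name and process ID are placeholders, and the exact subcommand syntax may differ; see class_admin(8) for the subcommands supported on your system):
#
class_admin create dbclass 50
#
class_admin add dbclass pid 12345
#
class_admin enable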
See the
System Administration
manual and
class_scheduling
(4),
class_admin
(8),
runclass
(1),
and
classcntl
(2)
for more information about the Class Scheduler.
7.2.3 Prioritizing Jobs
You can prioritize
jobs so that important applications are run first.
Use the
nice
command to specify the priority for a command.
Use the
renice
command to change the priority of a running process.
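For example, you might start a low-priority job and then lower the priority of a process that is already running (the command path and process ID shown here are placeholders):
#
nice -n 10 /usr/local/bin/report_job
#
renice 5 -p 12345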
See
nice
(1)
and
renice
(8)
for more information.
7.2.4 Scheduling Jobs at Offpeak Hours
You
can schedule jobs so that they run at offpeak hours (use the
at
and
cron
commands) or when the load level permits (use
the
batch
command).
This can relieve the load on the CPU
and the memory and disk I/O subsystems.
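For example, you might submit a job to run at 2:00 a.m. (the script name is a placeholder):
#
at -f /usr/local/bin/nightly_report 0200
To run the same job every night, you could add a line such as the following to a crontab file:
0 2 * * * /usr/local/bin/nightly_report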
See
at
(1)
and
cron
(8)
for more information.
7.2.5 Stopping the advfsd Daemon
The
advfsd
daemon allows Simple Network Management
Protocol (SNMP) clients such as Netview or Performance Manager (PM) to request
AdvFS file system information.
If you are not using the AdvFS graphical user
interface (GUI), you can free CPU resources and prevent the
advfsd
daemon from periodically scanning disks by stopping the
advfsd
daemon.
To prevent the
advfsd
daemon from starting at boot
time, rename
/sbin/rc3.d/S53advfsd
to
/sbin/rc3.d/T53advfsd
.
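For example, you might rename the startup script as follows:
#
mv /sbin/rc3.d/S53advfsd /sbin/rc3.d/T53advfsd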
To immediately stop the daemon, use the following command:
#
/sbin/init.d/advfsd stop
7.2.6 Using Hardware RAID to Relieve the CPU of I/O Overhead
RAID controllers can relieve the CPU of the disk I/O overhead, in addition to providing many disk I/O performance-enhancing features. See Section 8.5 for more information about hardware RAID.