This chapter describes how to optimize CPU resources and applications for high performance.
You must configure enough CPU power in your system to meet the performance needs of your users and applications. In addition, you may be able to improve performance by optimizing the CPU and your applications.
A system must be able to efficiently allocate the available CPU cycles among competing processes. In addition to single-CPU systems, DIGITAL supports multiprocessing systems and processors with different speeds.
Multiprocessing systems allow you to expand the computing power of a system by adding processors. Workloads that benefit most from multiprocessing have multiple processes or multiple threads of execution that can run concurrently, such as database management system (DBMS) servers, World Wide Web (WWW) servers, mail servers, and compute servers.
You may be able to improve the performance of a multiprocessing system that has only a small percentage of idle time by adding processors. However, increasing the number of processors may increase the demands on your I/O and memory subsystems and could cause bottlenecks. If your system is metadata-intensive (that is, it opens large numbers of small files and accesses them repeatedly), you may gain an additional performance benefit if you add Prestoserve or use a write-back cache when you add more processors. See Chapter 5 for information about Prestoserve and write-back caches.
Before you add processors, you must ensure that a performance problem is not caused by the virtual memory or I/O subsystems. For example, increasing the number of processors will not improve performance in a system that lacks sufficient memory resources.
The iostat and vmstat commands let you monitor the memory, CPU, and I/O consumption on your system. The cpustat extension to the kdbx debugger allows application developers to monitor the time spent in user mode, system mode, and kernel mode on each of the processors. This information can help application developers determine how effectively they are achieving parallelism across the system. See Chapter 2 for information about using tools to monitor performance.
Use the vmstat command to determine CPU usage as follows:
A high percentage of idle time on one or more processors indicates either:
Threads are blocked because the CPU is waiting for some event or resource (for example, memory or I/O)
Threads are idle because the CPU is not busy
A low percentage of idle time is the primary indication of a CPU bottleneck.
A high percentage of system time may indicate a system bottleneck, which can be caused by excessive system calls, device interrupts, context switches, soft page faults, lock contention, or cache misses.
A high percentage of user time can be a characteristic of a well-performing system. However, if the system has poor performance, a high percentage of user time may indicate a user code bottleneck, which can be caused by inefficient user code, insufficient processing power, excessive memory latency, or cache misses.
Use profiling to determine which sections of code consume the most processing time. See the Programmer's Guide for more information on profiling.
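The interpretation rules above can be sketched as a small classifier. This is only an illustration: the function name interpret_cpu and the 70 percent and 10 percent thresholds are assumptions chosen for the example, not values from this guide, and the input percentages are expected to be read by hand from vmstat output.

```python
def interpret_cpu(user_pct, system_pct, idle_pct, high=70.0, low_idle=10.0):
    """Map vmstat-style CPU percentages to the hints described in the text.

    The thresholds `high` and `low_idle` are illustrative assumptions.
    """
    hints = []
    if idle_pct <= low_idle:
        hints.append("low idle time: possible CPU bottleneck")
    if idle_pct >= high:
        hints.append("high idle time: threads blocked on memory or I/O, "
                     "or the CPU is not busy")
    if system_pct >= high:
        hints.append("high system time: possible system bottleneck")
    if user_pct >= high:
        hints.append("high user time: normal if performance is good, "
                     "otherwise a possible user code bottleneck")
    return hints


if __name__ == "__main__":
    # Hypothetical sample: 85% user, 10% system, 5% idle.
    for hint in interpret_cpu(85, 10, 5):
        print(hint)
```

A hint is only a starting point. Confirm it with the memory and I/O statistics before you add or upgrade processors.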
Use the kdbx cpustat extension to display statistics about CPU use, especially for multiprocessing systems. Statistics include the percentages of time the CPU spends in the following states:
Running user level code
Running system level code
Running at a priority set with the nice function
Idle
Waiting (idle with input or output pending)
See Chapter 2 for information about monitoring systems.
After you configure the appropriate number of CPUs in your system, you may be able to improve system performance by optimizing your CPU resources. Before optimizing the CPU, ensure that the virtual memory or I/O subsystems are not the cause of poor performance. If optimizing the CPU does not solve the performance problem, you must upgrade your CPU to a faster processor or use multiprocessing.
To optimize your CPU resources, use the following methods:
Use the Class Scheduler to allocate CPU resources
The Class Scheduler allows you to allocate a percentage of CPU time to a task or application. This allows you to reserve a majority of CPU time for important processes, while limiting CPU usage by less critical processes.
To use class scheduling, group together processes into classes and assign each class a percentage of CPU time. You can display statistics on the actual CPU usage for a class. You can also manually assign a class to any process.
See the Release Notes, class_scheduling(4), class_admin(8), runclass(1), and classcntl(2) for more information about the Class Scheduler.
Prioritize jobs so that important applications are run first
Use the nice command to specify the priority for a command. Use the renice command to change the priority of a running process.
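A program can also adjust its own scheduling priority. The following sketch uses Python's os.nice and os.setpriority wrappers as stand-ins for the nice and renice commands. The helper names lower_own_priority and renice_process are hypothetical, and the sketch assumes a UNIX system on which an unprivileged process may raise, but not lower, its nice value.

```python
import os


def lower_own_priority(increment=5):
    """Add `increment` to this process's nice value; return the new value."""
    if increment < 0:
        raise ValueError("lowering the nice value usually requires privileges")
    return os.nice(increment)


def renice_process(pid, niceness):
    """Set the nice value of a process, similar in spirit to renice."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    print("nice value before:", os.nice(0))
    print("nice value after: ", lower_own_priority(5))
```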
Schedule jobs at different times (use the at and cron commands) or when the load level permits (use the batch command)
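The idea of running work only when the load level permits can be sketched in a few lines. This is an illustration of the concept, not a description of how the batch command is implemented. The function name load_permits and the threshold of 1.5 are assumptions made for the example.

```python
import os


def load_permits(threshold=1.5, load=None):
    """Return True if the 1-minute load average is below `threshold`.

    `load` can be supplied directly (for example, in tests); otherwise
    it is read from the system with os.getloadavg().
    """
    if load is None:
        load = os.getloadavg()[0]
    return load < threshold


if __name__ == "__main__":
    if load_permits():
        print("load is low enough: start the job now")
    else:
        print("system is busy: defer the job")
```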
Increase the program size limits
Extremely large programs may run more efficiently if you increase the values of the following system configuration file parameters that control program size limits:
dfldsiz--Default data segment size limit
maxdsiz--Maximum data segment size limit
dflssiz--Default stack size limit
maxssiz--Maximum stack size limit
Some extremely large programs may not run unless these parameters are adjusted. For example, an inadequate maxdsiz size limit may produce the following error:
Out of process memory...
The limit and unlimit commands can affect program size limits. See the System Administration manual for information on changing these parameter values.
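A process can inspect its own data and stack size limits, and raise the soft limit up to the hard limit, which is similar in effect to the limit and unlimit shell commands. The sketch below uses Python's standard resource module. The helper names are hypothetical, and the sketch does not change the dfldsiz, maxdsiz, dflssiz, or maxssiz configuration parameters themselves.

```python
import resource


def show_limits():
    """Return the (soft, hard) data and stack size limits for this process."""
    return {
        "data": resource.getrlimit(resource.RLIMIT_DATA),
        "stack": resource.getrlimit(resource.RLIMIT_STACK),
    }


def raise_soft_to_hard(which):
    """Raise the soft limit for `which` to its hard limit; return the result."""
    _soft, hard = resource.getrlimit(which)
    resource.setrlimit(which, (hard, hard))
    return resource.getrlimit(which)


if __name__ == "__main__":
    for name, (soft, hard) in show_limits().items():
        # resource.RLIM_INFINITY means that no limit is imposed.
        print(name, "soft:", soft, "hard:", hard)
```

Raising the hard limit itself normally requires privileges or a change to the system configuration.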
You can reduce the static size of the kernel by deconfiguring any unnecessary subsystems. To do this, use the setld -d command. You can also minimize the number of kernel options for your system. See the Installation Guide for details.
Ensure that lockmode is set to the appropriate value
Specify 0 for UP (uniprocessing), 2 for symmetrical multiprocessing (SMP), and 1 or 3 for realtime. This can prevent system bottlenecks in the CPU.
Optimize your applications
You can use various compiler and linker optimization levels to generate more efficient user code. See the Programmer's Guide for more information on application optimization.
If an application is degrading system performance, use profiling to identify sections of code that consume large portions of execution time. In a typical program, most execution time is spent in relatively few sections of code. To improve performance, concentrate on improving the coding efficiency of those time-intensive sections. See the Programmer's Guide for more information on profiling.
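The profiling workflow can be illustrated with a small, self-contained example. This sketch uses Python's cProfile module rather than the profiling tools described in the Programmer's Guide, and the functions slow_sum, fast_setup, and workload are hypothetical stand-ins for application code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately expensive loop: the hot spot in this example."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_setup():
    """Cheap setup work that should barely register in the profile."""
    return list(range(10))


def workload():
    fast_setup()
    return slow_sum(200_000)


def profile_report(func, top=5):
    """Run `func` under the profiler; return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

In the report, slow_sum should account for nearly all of the time spent in workload, which identifies it as the section of code to optimize first.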
Well-written applications use CPU, memory, and I/O resources efficiently. You may be able to improve system and application performance by following these recommendations:
Use the latest version of the operating system, compiler, firmware, and patches
Check the software on your system to ensure that you are using the latest versions of the compiler and the operating system to build your application program. In general, new versions of a compiler perform more advanced optimizations, and new versions of the operating system operate more efficiently.
Use parallelism
To enhance parallelism, application developers working in Fortran or C should consider using the Kuck & Associates Preprocessor (KAP), which can have a significant impact on SMP performance. See the Programmer's Guide for details on KAP.
Ensure that the application runs without error
Test your application program to ensure that it runs without errors.
Whether you are porting an application from a 32-bit system to DIGITAL UNIX or developing a new application, never attempt to optimize an application until it has been thoroughly debugged and tested. If you are porting an application written in C, use the lint command with the -Q flag or compile your program using the C compiler's -check flag to identify possible portability problems that you may need to resolve.
Optimize applications
Optimizing an application program can involve modifying the build process or modifying the source code. Various compiler and linker optimization levels can be used to generate more efficient user code. See the Programmer's Guide for more information on optimization.
Prioritize applications
Prioritize jobs so that important applications are run first.
Use the nice command to specify the priority for a command. Use the renice command to change the priority of a running process.
Use shared libraries
Using shared libraries reduces the need for memory and disk space. When multiple programs are linked to a single shared library, the amount of physical memory used by each process can be significantly reduced. However, a program that uses shared libraries initially executes more slowly than the same program linked with static libraries.
Interprocess communication (IPC) is the exchange of information between two or more processes. Some examples of IPC include messages, shared memory, semaphores, pipes, signals, process tracing, and processes communicating with other processes over a network. IPC depends on several operating system subsystems, including scheduling and networking.
In single-process programming, modules within a single process communicate with each other using global variables and function calls, with data passing between the functions and the callers. If you are programming by using separate processes with images in separate address spaces, use additional communication mechanisms.
The DIGITAL UNIX operating system provides the following facilities for interprocess communication:
Pipes--See the Guide to Realtime Programming for information about pipes.
Signals--See the Guide to Realtime Programming for information about signals.
Sockets--See the Network Programmer's Guide for information about sockets.
Streams--See the Programmer's Guide: STREAMS for information about streams.
X/Open Transport Interface (XTI)--See the Network Programmer's Guide for information about XTI.
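As a minimal illustration of the first facility in this list, the following sketch passes a message from a child process to its parent through a pipe. It uses Python's os.pipe and os.fork wrappers and assumes a UNIX system. The function name pipe_round_trip is hypothetical.

```python
import os


def pipe_round_trip(message: bytes) -> bytes:
    """Fork a child that writes `message` to a pipe; return what the parent reads."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write the message, then exit without running parent code.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent: close the unused write end so that read() can see end-of-file.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(pipe_round_trip(b"hello from the child"))
```

Because the two processes have separate address spaces, the data must be copied through the kernel. This copy is the cost that the separate-process design adds compared with function calls inside a single process.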
You may be able to improve IPC performance by modifying the following attributes:
msg-mnb (maximum number of bytes on queue)
A process will be unable to send a message to a queue if the message will make the total number of bytes in that queue greater than the limit specified by msg-mnb. When the limit is reached, the process sleeps and waits for this condition to be resolved.
msg-tql (number of system message headers)
A process will be unable to send a message if the message will make the total number of message headers currently in the system greater than the limit specified by msg-tql. If the limit is reached, the process sleeps and waits for a message header to be freed.
You can track the use of IPC facilities with the ipcs -a command (see ipcs(1)). By looking at the current number of bytes and message headers in the queues, you can then determine whether you need to increase the values of the msg-mnb and msg-tql attributes to diminish waiting.
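The two limits described above can be expressed as a simple check. The sketch below is a model of the rule only: the function name send_would_block is hypothetical, and the byte counts, header counts, and attribute values in the example are illustrative figures that you would replace with numbers read from ipcs -a output and from your own system configuration.

```python
def send_would_block(queue_bytes, msg_size, headers_in_use, msg_mnb, msg_tql):
    """Return the attributes whose limits a new message would exceed.

    queue_bytes    -- bytes currently on the target queue
    msg_size       -- size in bytes of the message to send
    headers_in_use -- message headers currently in use system-wide
    msg_mnb        -- maximum number of bytes allowed on a queue
    msg_tql        -- maximum number of system message headers
    """
    exceeded = []
    if queue_bytes + msg_size > msg_mnb:
        exceeded.append("msg-mnb")
    if headers_in_use + 1 > msg_tql:
        exceeded.append("msg-tql")
    return exceeded


if __name__ == "__main__":
    # Illustrative values only.
    print(send_would_block(16000, 1000, 10, msg_mnb=16384, msg_tql=40))
```

If senders are frequently found to be in the blocking case, the attribute named in the result is the one to consider increasing.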
You may also want to consider modifying several of the following IPC attributes:
Message attributes:
Semaphore attributes:
Shared memory attributes:
As a design consideration, consider whether you will get better performance by using threads instead of shared memory.