3 Optimizing Applications and CPU Performance

This chapter describes how to optimize CPU resources and applications for high performance.

3.1 Configuring CPU Resources

You must configure enough CPU power in your system to meet the performance needs of your users and applications. In addition, you may be able to improve performance by optimizing the CPU and your applications.

A system must be able to efficiently allocate the available CPU cycles among competing processes. In addition to single-CPU systems, DIGITAL supports multiprocessing systems and processors with different speeds.

Multiprocessing systems allow you to expand the computing power of a system by adding processors. Workloads that benefit most from multiprocessing have multiple processes or multiple threads of execution that can run concurrently, such as database management system (DBMS) servers, World Wide Web (WWW) servers, mail servers, and compute servers.
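As a rough illustration of why workloads with concurrent processes or threads benefit most, the following sketch applies Amdahl's law, a general rule of thumb that this chapter does not itself cite. The 90 percent parallel fraction is a hypothetical value chosen only for the example.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only part of the work can run concurrently."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not shortened; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time can be spread across CPUs.
    for cpus in (1, 2, 4, 8):
        print(cpus, round(amdahl_speedup(0.9, cpus), 2))
```

The estimate ignores memory and I/O contention, so actual gains from adding processors are likely to be lower.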

You may be able to improve the performance of a multiprocessing system that has only a small percentage of idle time by adding processors. However, increasing the number of processors may increase the demands on your I/O and memory subsystems and could cause bottlenecks. If your system is metadata-intensive (that is, it opens large numbers of small files and accesses them repeatedly), you may gain an additional performance benefit if you add Prestoserve or use a write-back cache when you add more processors. See Chapter 5 for information about Prestoserve and write-back caches.

Before you add processors, you must ensure that a performance problem is not caused by the virtual memory or I/O subsystems. For example, increasing the number of processors will not improve performance in a system that lacks sufficient memory resources.

The iostat and vmstat commands let you monitor the memory, CPU, and I/O consumption on your system. The cpustat extension to the kdbx debugger allows application developers to monitor the time spent in user mode, system mode, and kernel mode on each of the processors. This information can help application developers determine how effectively they are achieving parallelism across the system. See Chapter 2 for information about using tools to monitor performance.

3.2 Identifying CPU Bottlenecks

Use the vmstat command to determine CPU usage as follows:

A high percentage of idle time on one or more processors indicates either:
- Threads are blocked because the CPU is waiting for some event or resource (for example, memory or I/O)
- Threads are idle because the CPU is not busy

A low percentage of idle time is the primary indication of a CPU bottleneck.

A high percentage of system time may indicate a system bottleneck, which can be caused by excessive system calls, device interrupts, context switches, soft page faults, lock contention, or cache misses.

A high percentage of user time can be a characteristic of a well-performing system. However, if the system has poor performance, a high percentage of user time may indicate a user code bottleneck, which can be caused by inefficient user code, insufficient processing power, excessive memory latency, or cache misses.
Use profiling to determine which sections of code consume the most processing time. See the Programmer's Guide for more information on profiling.
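The guidelines above can be sketched as a small helper that turns user, system, and idle percentages into hints. The function name and both thresholds (10 percent and 50 percent) are assumptions made for illustration; the text gives no numeric cutoffs, and the sketch does not parse actual vmstat output.

```python
def classify_cpu(user_pct, system_pct, idle_pct, low_idle=10.0, high_share=50.0):
    """Return a list of hints for one set of CPU percentages.

    low_idle and high_share are hypothetical thresholds, not documented values.
    """
    hints = []
    if idle_pct <= low_idle:
        hints.append("low idle time: possible CPU bottleneck")
    if system_pct >= high_share:
        hints.append("high system time: possible system bottleneck")
    if user_pct >= high_share:
        hints.append("high user time: fine if performance is good, "
                     "otherwise profile the user code")
    if idle_pct >= high_share:
        hints.append("high idle time: threads may be blocked on memory or I/O, "
                     "or the CPU has little work")
    return hints


if __name__ == "__main__":
    # Hypothetical sample: 85% user, 10% system, 5% idle.
    for hint in classify_cpu(85.0, 10.0, 5.0):
        print(hint)
```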

Use the kdbx cpustat extension to display statistics about CPU use, especially for multiprocessing systems. Statistics include the percentages of time the CPU spends in the following states:

Running user level code

Running system level code

Running at a priority set with the nice function

Idle

Waiting (idle with input or output pending)

See Chapter 2 for information about monitoring systems.
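Percentages like those listed above are typically derived from the difference between two samples of per-state tick counters. The following minimal sketch shows that arithmetic. The dictionary layout and sample values are hypothetical and do not represent the cpustat output format.

```python
STATES = ("user", "nice", "system", "idle", "wait")


def state_percentages(before, after):
    """Convert two snapshots of per-state tick counters into percentages."""
    deltas = {state: after[state] - before[state] for state in STATES}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between the two samples")
    return {state: 100.0 * deltas[state] / total for state in STATES}


if __name__ == "__main__":
    # Hypothetical counters for one processor, sampled twice.
    first = {"user": 100, "nice": 0, "system": 50, "idle": 800, "wait": 50}
    second = {"user": 160, "nice": 0, "system": 70, "idle": 810, "wait": 60}
    print(state_percentages(first, second))
```

On a multiprocessing system, the same calculation would be repeated for each processor.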

3.3 Optimizing CPU Resources

After you configure the appropriate number of CPUs in your system, you may be able to improve system performance by optimizing your CPU resources. Before optimizing the CPU, ensure that the virtual memory or I/O subsystems are not the cause of poor performance. If optimizing the CPU does not solve the performance problem, you may need to upgrade to a faster processor or use multiprocessing.

To optimize your CPU resources, use the following methods:

Use the Class Scheduler to allocate CPU resources
The Class Scheduler allows you to allocate a percentage of CPU time to a task or application. This allows you to reserve a majority of CPU time for important processes, while limiting CPU usage by less critical processes.
To use class scheduling, group together processes into classes and assign each class a percentage of CPU time. You can display statistics on the actual CPU usage for a class. You can also manually assign a class to any process.
See the Release Notes, class_scheduling(4), class_admin(8), runclass(1), and classcntl(2) for more information about the Class Scheduler.

Prioritize jobs so that important applications are run first
Use the nice command to specify the priority for a command. Use the renice command to change the priority of a running process.

Schedule jobs at different times (use the at and cron commands) or when the load level permits (use the batch command)

Increase the program size limits
Extremely large programs may run more efficiently if you increase the values of the following system configuration file parameters that control program size limits:
- dfldsiz--Default data segment size limit
- maxdsiz--Maximum data segment size limit
- dflssiz--Default stack size limit
- maxssiz--Maximum stack size limit
Some extremely large programs may not run unless these parameters are adjusted. For example, an inadequate maxdsiz size limit may produce the following error:
```
Out of process memory...
```
The limit and unlimit commands can affect program size limits. See the System Administration manual for information on changing these parameter values.

Reduce the size of the kernel
You can reduce the static size of the kernel by deconfiguring any unnecessary subsystems. To do this, use the setld -d command. You can also minimize the number of kernel options for your system. See the Installation Guide for details.

Ensure that lockmode is set to the appropriate value
Specify 0 for UP (uniprocessing), 2 for symmetrical multiprocessing (SMP), and 1 or 3 for realtime. This can prevent system bottlenecks in the CPU.

Optimize your applications
You can use various compiler and linker optimization levels to generate more efficient user code. See the Programmer's Guide for more information on application optimization.
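For the program size limits described earlier in this list, a process can inspect the data and stack limits that currently apply to it. The following sketch uses Python's standard resource module on a UNIX system. Treating the soft limit as the counterpart of the default parameters (dfldsiz, dflssiz) and the hard limit as the counterpart of the maximum parameters (maxdsiz, maxssiz) is an assumption made for illustration.

```python
import resource


def describe_program_size_limits():
    """Return the (soft, hard) data and stack limits of the current process."""
    return {
        "data": resource.getrlimit(resource.RLIMIT_DATA),
        "stack": resource.getrlimit(resource.RLIMIT_STACK),
    }


def format_limit(value):
    """Render a limit in bytes, or 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


if __name__ == "__main__":
    for name, (soft, hard) in describe_program_size_limits().items():
        print(f"{name}: soft={format_limit(soft)} hard={format_limit(hard)}")
```

The sketch only reads the limits. Changing the system-wide parameters is still done as described in the System Administration manual.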

3.4 Identifying Application Bottlenecks

If an application is degrading system performance, use profiling to identify sections of code that consume large portions of execution time. In a typical program, most execution time is spent in relatively few sections of code. To improve performance, concentrate on improving the coding efficiency of those time-intensive sections. See the Programmer's Guide for more information on profiling.
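The idea that a few sections of code dominate execution time can be illustrated with any profiler. The following sketch uses Python's cProfile module rather than the profiling tools covered in the Programmer's Guide. The workload functions are hypothetical and exist only to give the profiler something to measure.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately expensive loop; expected to dominate the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def quick_length(n):
    """Cheap by comparison."""
    return len(range(n))


def workload():
    return slow_sum_of_squares(200_000) + quick_length(200_000)


def profile_top(func, limit=5):
    """Run func under the profiler and return a report of the costliest calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(workload))
```

In the report, slow_sum_of_squares should account for nearly all of the cumulative time, which marks it as the section to improve first.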

3.5 Improving Application Performance

Well-written applications use CPU, memory, and I/O resources efficiently. You may be able to improve system and application performance by following these recommendations:

Use the latest version of the operating system, compiler, firmware, and patches
Check the software on your system to ensure that you are using the latest versions of the compiler and the operating system to build your application program. In general, new versions of a compiler perform more advanced optimizations, and new versions of the operating system operate more efficiently.

Use parallelism
To enhance parallelism, application developers working in Fortran or C should consider using the Kuck & Associates Preprocessor (KAP), which can have a significant impact on SMP performance. See the Programmer's Guide for details on KAP.

Ensure that the application runs without error
Test your application program to ensure that it runs without errors. Whether you are porting an application from a 32-bit system to DIGITAL UNIX or developing a new application, never attempt to optimize an application until it has been thoroughly debugged and tested. If you are porting an application written in C, use the lint command with the -Q flag or compile your program using the C compiler's -check flag to identify possible portability problems that you may need to resolve.

Optimize applications
Optimizing an application program can involve modifying the build process or modifying the source code. Various compiler and linker optimization levels can be used to generate more efficient user code. See the Programmer's Guide for more information on optimization.

Prioritize applications
Prioritize jobs so that important applications are run first. Use the nice command to specify the priority for a command. Use the renice command to change the priority of a running process.

Use shared libraries
Using shared libraries reduces the need for memory and disk space. When multiple programs are linked to a single shared library, the amount of physical memory used by each process can be significantly reduced. However, a program that uses shared libraries may initially run more slowly than one linked with static libraries.
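For the recommendation to prioritize applications, a process can also lower its own priority, which is similar in effect to starting it with the nice command. The following minimal sketch uses Python's os.nice function on a UNIX system. The function names and the default increment of 5 are arbitrary choices for illustration.

```python
import os


def current_niceness():
    """Return the current nice value; an increment of 0 leaves it unchanged."""
    return os.nice(0)


def lower_own_priority(increment=5):
    """Raise this process's nice value (lower its priority) and return the result.

    Only non-negative increments are accepted because lowering the nice value
    normally requires privileges.
    """
    if increment < 0:
        raise ValueError("increment must be non-negative")
    return os.nice(increment)


if __name__ == "__main__":
    print("nice value before:", current_niceness())
    print("nice value after: ", lower_own_priority(5))
```

An unprivileged process usually cannot undo the change, so a background job should apply it once, near startup.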

3.6 Interprocess Communications Facilities

Interprocess communication (IPC) is the exchange of information between two or more processes. Some examples of IPC include messages, shared memory, semaphores, pipes, signals, process tracing, and processes communicating with other processes over a network. IPC depends on the interaction of several operating system subsystems, including scheduling and networking.

In single-process programming, modules within a single process communicate with each other using global variables and function calls, with data passing between the functions and the callers. If your program consists of separate processes, each with an image in its own address space, the processes need additional communication mechanisms.

The DIGITAL UNIX operating system provides the following facilities for interprocess communication:

Pipes--See the Guide to Realtime Programming for information about pipes.

Signals--See the Guide to Realtime Programming for information about signals.

Sockets--See the Network Programmer's Guide for information about sockets.

Streams--See the Programmer's Guide: STREAMS for information about streams.

X/Open Transport Interface (XTI)--See the Network Programmer's Guide for information about XTI.

You may be able to improve IPC performance by modifying the following attributes:

msg-mnb (maximum number of bytes on queue)
A process will be unable to send a message to a queue if the message will make the total number of bytes in that queue greater than the limit specified by msg-mnb. When the limit is reached, the process sleeps and waits for this condition to be resolved.

msg-tql (number of system message headers)
A process will be unable to send a message if the message will make the total number of message headers currently in the system greater than the limit specified by msg-tql. If the limit is reached, the process sleeps and waits for a message header to be freed.

You can track the use of IPC facilities with the ipcs -a command (see ipcs(1)). By looking at the current number of bytes and message headers in the queues, you can then determine whether you need to increase the values of the msg-mnb and msg-tql attributes to diminish waiting.
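The two conditions described above can be expressed as a simple check against figures such as those reported by ipcs -a. The following sketch is a model of the rules only, not a call into the message facility. The QueueState type, the function name, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    bytes_on_queue: int      # bytes currently held in the target queue
    headers_in_system: int   # message headers currently in use system-wide


def send_would_sleep(state, message_size, msg_mnb, msg_tql):
    """Return the names of the limits that would make a sender sleep."""
    reasons = []
    if state.bytes_on_queue + message_size > msg_mnb:
        reasons.append("msg-mnb")
    if state.headers_in_system + 1 > msg_tql:
        reasons.append("msg-tql")
    return reasons


if __name__ == "__main__":
    # Hypothetical limits and usage figures.
    state = QueueState(bytes_on_queue=16000, headers_in_system=10)
    print(send_would_sleep(state, message_size=1000, msg_mnb=16384, msg_tql=40))
```

If a check like this frequently reports one of the limits, increasing the corresponding attribute may reduce waiting.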

You may also want to consider modifying several of the following IPC attributes:

Message attributes:
- msg-max (maximum message size)
- msg-mni (number of message queue identifiers)

Semaphore attributes:
- sem-mni (number of semaphore identifiers)
- sem-msl (number of semaphores per ID)
- sem-opm (maximum number of operations per semop call)
- sem-ume (maximum number of undo entries per process)
- sem-vmx (semaphore maximum value)
- sem-aem (adjust on exit maximum value)

Shared memory attributes:
- shm-max (maximum shared memory segment size)
- shm-min (minimum shared memory segment size)
- shm-mni (number of shared memory identifiers)
- shm-seg (maximum attached shared memory segments per process)
When you design an application, consider whether using threads would give better performance than using shared memory.