11    Managing Application Performance

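You may be able to improve overall Tru64 UNIX performance by improving application performance. This chapter describes how to gather profiling and debugging information (Section 11.1) and how to improve application performance (Section 11.2).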

11.1    Gathering Profiling and Debugging Information

You can use profiling to identify sections of application code that consume large portions of execution time. To improve performance, concentrate on improving the coding efficiency of those time-intensive sections.

Table 11-1 describes the commands you can use to obtain information about applications. Detailed information about these tools is located in the Programmer's Guide and the Kernel Debugging manual.

In addition, prof_intro(1) provides an overview of application profilers, profiling, optimization, and performance analysis.

Table 11-1:  Application Profiling and Debugging Tools

Name Use Description

atom

Profiles applications

Consists of a set of prepackaged tools (third, hiprof, and pixie) that can be used to instrument applications for profiling or debugging purposes. The atom toolkit also includes a command interface and a collection of instrumentation routines that you can use to create custom tools for instrumenting applications. See the Programmer's Guide and atom(1) for more information.

third

Checks memory access and detects memory leaks in applications

Performs memory access checks and memory leak detection of C and C++ programs at run time, by using the atom tool to add code to executable and shared objects. The Third Degree tool instruments the entire program, including its referenced libraries. See third(1) for more information.

hiprof

Produces a profile of procedure execution times in an application

An atom-based program profiling tool that produces a flat profile, which shows the execution time spent in any given procedure, and a hierarchical profile, which shows the time spent in a given procedure and all of its descendants.

The hiprof tool uses code instrumentation instead of program counter (PC) sampling to gather statistics. The gprof command is usually used to filter and merge output files and to format profile reports. See hiprof(1) for more information.

pixie

Profiles basic blocks in an application

Produces a profile showing the number of times each instruction was executed in a program. The information can be reported as tables or can be used to automatically direct later optimizations by using the -feedback, -om, or -cord options in the C compiler (see cc(1)).

The pixie profiler reads an executable program, partitions it into basic blocks, and writes an equivalent program containing additional code that counts the execution of each basic block.

The pixie utility also generates a file containing the address of each of the basic blocks. When you run this pixie-generated program, it generates a file containing the basic block counts. The prof and pixstats commands can analyze these files. See pixie(1) for more information.

prof

Analyzes profiling data and displays a profile of statistics for each procedure in an application

Analyzes profiling data and produces statistics showing which portions of code consume the most time and where the time is spent (for example, at the routine level, the basic block level, or the instruction level).

The prof command uses as input one or more data files generated by the kprofile, uprofile, or pixie profiling tools. The prof command also accepts profiling data files generated by programs linked with the -p switch of compilers such as cc.

The information produced by prof allows you to determine where to concentrate your efforts to optimize source code. See prof(1) for more information.

gprof

Analyzes profiling data and displays procedure call information and statistical program counter sampling in an application

Analyzes profiling data and allows you to determine which routines are called most frequently, and the source of the routine call, by gathering procedure call information and performing statistical program counter (PC) sampling.

The gprof tool produces a flat profile of the routines' CPU usage. To produce a graphical execution profile of a program, the tool uses data from PC sampling profiles, which are produced by programs compiled with the cc -pg command, or from instrumented profiles, which are produced by programs modified by the atom -tool hiprof command. See gprof(1) for more information.

uprofile

Profiles user code in an application

Profiles user code using performance counters in the Alpha chip. The uprofile tool allows you to profile only the executable part of a program. The uprofile tool does not collect information on shared libraries. You process the performance data collected by the tool with the prof command. See the Kernel Debugging manual or uprofile(1) for more information.

Visual Threads

Identifies bottlenecks and performance problems in multithreaded applications

Enables you to analyze and refine your multithreaded applications. You can use Visual Threads to identify bottlenecks and performance problems, and to debug potential thread-related logic problems. Visual Threads uses rule-based analysis and statistics capabilities and visualization techniques. Visual Threads is licensed as part of the Developers' Toolkit for Tru64 UNIX.

dbx

Debugs running kernels, programs, and crash dumps, and examines and temporarily modifies kernel variables

Provides source-level debugging for C, Fortran, Pascal, assembly language, and machine code. The dbx debugger allows you to analyze crash dumps, trace problems in a program object at the source-code level or at the machine code level, control program execution, trace program logic and flow of control, and monitor memory locations.

Use dbx to debug kernels, debug stripped images, examine memory contents, debug multiple threads, analyze user code and applications, display the value and format of kernel data structures, and temporarily modify the values of some kernel variables. See dbx(8) for more information.

ladebug

Debugs kernels and applications

Debugs programs and the kernel and helps locate run-time programming errors. The ladebug symbolic debugger is an alternative to the dbx debugger and provides both command-line and graphical user interfaces and support for debugging multithreaded programs. See the Ladebug Debugger Manual and ladebug(1) for more information.

lsof

Displays open files

Displays information about files that are currently opened by the running processes. The lsof tool is available on the Tru64 UNIX Freeware CD-ROM.
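As a rough illustration of what a profile is meant to reveal, the following C sketch contains one deliberately expensive routine and one cheap one. The function names are hypothetical and are not taken from any Tru64 UNIX tool or manual.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical hot spot: counts the pairs of elements that sum to zero
 * by comparing every element with every later element (quadratic time).
 */
unsigned long count_zero_pairs(const int *v, size_t n)
{
    unsigned long count = 0;
    size_t i, j;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (v[i] + v[j] == 0)
                count++;
    return count;
}

/*
 * Hypothetical cheap routine: a single linear pass over the same data.
 */
long sum_values(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

If these routines are called from a driver program on a large array, and that program is compiled with the cc -pg command and run, a gprof report would be expected to attribute nearly all of the CPU time to count_zero_pairs. That is the kind of time-intensive section on which to concentrate optimization effort.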

11.2    Improving Application Performance

Well-written applications use CPU, memory, and I/O resources efficiently. Table 11-2 describes some guidelines to improve application performance.

Table 11-2:  Application Performance Improvement Guidelines

Guideline | Performance Benefit | Tradeoff
Install the latest operating system patches (Section 11.2.1) | Provides the latest optimizations | None
Use the latest version of the compiler (Section 11.2.2) | Provides the latest optimizations | None
Use parallelism (Section 11.2.3) | Improves SMP performance | None
Optimize applications (Section 11.2.4) | Generates more efficient code | None
Use shared libraries (Section 11.2.5) | Frees memory | May increase execution time
Reduce application memory requirements (Section 11.2.6) | Frees memory | Program may not run optimally
Use memory locking as part of real-time program initialization (Section 11.2.7) | Allows you to lock and unlock memory as needed | Reduces the memory available to processes and the UBC

The following sections describe how to improve application performance.

11.2.1    Using the Latest Operating System Patches

Always install the latest operating system patches, which often contain performance enhancements.

Check the /etc/motd file to determine which patches you are running. See your customer service representative for information about installing patches.

11.2.2    Using the Latest Version of the Compiler

Always use the latest version of the compiler to build your application program. Usually, new versions include advanced optimizations.

Check the software on your system to ensure that you are using the latest version of the compiler.

11.2.3    Using Parallelism

To enhance parallelism, application developers working in Fortran or C should consider using the Kuck & Associates Preprocessor (KAP), which can have a significant impact on SMP performance. See the Programmer's Guide for details on KAP.

11.2.4    Optimizing Applications

Optimizing an application program can involve modifying the build process or modifying the source code. Various compiler and linker optimization levels can be used to generate more efficient user code. See the Programmer's Guide for more information on optimization.

Whether you are porting an application from a 32-bit system to Tru64 UNIX or developing a new application, never attempt to optimize an application until it has been thoroughly debugged and tested. If you are porting an application written in C, use the lint command with the -Q option or compile your program using the C compiler's -check option to identify possible portability problems that you may need to resolve.
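One common kind of portability problem when moving C code from a 32-bit system is the assumption that int, long, and pointers all have the same size. On Tru64 UNIX, int is 32 bits, while long and pointers are 64 bits. The following is a minimal sketch of that issue; the function names are hypothetical, and the sketch makes no claim about exactly which diagnostics lint or the compiler emit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns nonzero if a pointer fits in an int on this platform.
 * On a 64-bit system with a 32-bit int this returns 0, which means that
 * code such as
 *     int addr = (int)ptr;
 * silently discards the upper half of the address.
 */
int pointer_fits_in_int(void)
{
    return sizeof(void *) <= sizeof(int);
}

/*
 * Portable alternative for pointer arithmetic results: keep the
 * difference in ptrdiff_t instead of assuming that it fits in an int.
 * Both pointers must refer to the same array.
 */
ptrdiff_t byte_offset(const char *base, const char *p)
{
    assert(base != NULL && p != NULL);
    return p - base;
}
```

Code that keeps addresses and pointer differences in pointer-sized types behaves the same on 32-bit and 64-bit systems, so it does not need to be revisited after the port.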

11.2.5    Using Shared Libraries

Using shared libraries reduces the need for memory and disk space. When multiple programs are linked to a single shared library, the amount of physical memory used by each process can be significantly reduced.

However, a program that uses shared libraries may initially run more slowly than the same program linked with static libraries.

11.2.6    Reducing Application Memory Requirements

You may be able to reduce an application's use of memory, which provides more memory resources for other processes or for file system caching. Follow these coding considerations to reduce your application's use of memory:

See the Programmer's Guide for detailed information on process memory allocation.

11.2.7    Controlling Memory Locking

Real-time application developers should consider memory locking as a required part of program initialization. Many real-time applications remain locked for the duration of execution, but some may want to lock and unlock memory as the application runs. Memory-locking functions allow you to lock the entire process at the time of the function call and throughout the life of the application. Locked pages of memory cannot be used for paging and the process cannot be swapped out.

Memory locking applies to a process's address space. Only the pages mapped into a process's address space can be locked into memory. When the process exits, pages are removed from the address space and the locks are removed.

Use the mlockall function to lock all of a process's address space. Locked memory remains locked until either the process exits or the application calls the munlockall function. Use the ps command to determine if a process is locked into memory and cannot be swapped out. See Section 6.3.2.

Memory locks are not inherited across a fork, and all memory locks associated with a process are unlocked on a call to the exec function or when the process terminates. See the Guide to Realtime Programming manual and mlockall(3) for more information.
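The following is a minimal sketch of how a real-time program might lock its address space during initialization, using the POSIX mlockall and munlockall functions. The wrapper names lock_process_memory and unlock_process_memory are hypothetical. The calls can fail, for example if the process lacks the required privilege, so the sketch reports the error instead of assuming success.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Lock the pages of the calling process into memory.
 * flags may contain MCL_CURRENT (pages mapped now) and/or
 * MCL_FUTURE (pages mapped later).
 * Returns 0 on success, or the errno value on failure.
 */
int lock_process_memory(int flags)
{
    assert((flags & ~(MCL_CURRENT | MCL_FUTURE)) == 0);

    if (mlockall(flags) == -1) {
        int err = errno;
        fprintf(stderr, "mlockall: %s\n", strerror(err));
        return err;
    }
    return 0;
}

/*
 * Remove all memory locks held by the calling process.
 * Returns 0 on success, or the errno value on failure.
 */
int unlock_process_memory(void)
{
    if (munlockall() == -1) {
        int err = errno;
        fprintf(stderr, "munlockall: %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

A program that should stay locked for its whole run could call lock_process_memory(MCL_CURRENT | MCL_FUTURE) once at startup. Because locks are not inherited across a fork, a child process that also needs locked memory would have to make the call itself.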