You may be able to improve overall Tru64 UNIX performance by improving application performance. This chapter describes how to:
Profile and debug applications (Section 11.1)
Improve application performance (Section 11.2)
11.1 Gathering Profiling and Debugging Information
You can use profiling to identify sections of application code that consume large portions of execution time. To improve performance, concentrate on improving the coding efficiency of those time-intensive sections.
Table 11-1 describes the commands you can use to obtain information about applications. Detailed information about these tools is located in the Programmer's Guide and the Kernel Debugging manual.
In addition, prof_intro(1) provides an overview of application profilers, profiling, optimization, and performance analysis.
Table 11-1: Application Profiling and Debugging Tools
Name | Use | Description |
atom | Profiles applications | Consists of a set of prepackaged tools (hiprof, pixie, and third) that instrument and profile applications. |
third | Checks memory access and detects memory leaks in applications | Performs memory access checks and memory leak detection of C and C++ programs at run time, by using the Third Degree atom tool. |
hiprof | Produces a profile of procedure execution times in an application | An atom tool that produces a profile of the execution times of the procedures in an application. |
pixie | Profiles basic blocks in an application | An atom tool that produces a profile showing the number of times each instruction was executed in a program. The information can be reported as tables or can be used to automatically direct later optimizations. |
prof | Analyzes profiling data and displays a profile of statistics for each procedure in an application | Analyzes profiling data and produces statistics showing which portions of code consume the most time and where the time is spent (for example, at the routine level, the basic block level, or the instruction level). |
gprof | Analyzes profiling data and displays procedure call information and statistical program counter sampling in an application | Analyzes profiling data and allows you to determine which routines are called most frequently, and the source of the routine call, by gathering procedure call information and performing statistical program counter (PC) sampling. |
uprofile | Profiles user code in an application | Profiles user code by using the performance counters in the Alpha chip. |
Visual Threads | Identifies bottlenecks and performance problems in multithreaded applications | Enables you to analyze and refine your multithreaded applications. You can use Visual Threads to identify bottlenecks and performance problems, and to debug potential thread-related logic problems. Visual Threads uses rule-based analysis, statistics capabilities, and visualization techniques. Visual Threads is licensed as part of the Developers' Toolkit for Tru64 UNIX. |
dbx | Debugs running kernels, programs, and crash dumps, and examines and temporarily modifies kernel variables | Provides source-level debugging for C, Fortran, Pascal, assembly language, and machine code. |
ladebug | Debugs kernels and applications | Debugs programs and the kernel and helps locate run-time programming errors. |
lsof | Displays open files | Displays information about files that are currently opened by the running processes. |
11.2 Improving Application Performance
Well-written applications use CPU, memory, and I/O resources efficiently.
Table 11-2
describes some guidelines to improve application
performance.
Table 11-2: Application Performance Improvement Guidelines
Guideline | Performance Benefit | Tradeoff |
Install the latest operating system patches (Section 11.2.1) | Provides the latest optimizations | None |
Use the latest version of the compiler (Section 11.2.2) | Provides the latest optimizations | None |
Use parallelism (Section 11.2.3) | Improves SMP performance | None |
Optimize applications (Section 11.2.4) | Generates more efficient code | None |
Use shared libraries (Section 11.2.5) | Frees memory | May increase execution time |
Reduce application memory requirements (Section 11.2.6) | Frees memory | Program may not run optimally |
Use memory locking as part of real-time program initialization (Section 11.2.7) | Allows you to lock and unlock memory as needed | Reduces the memory available to processes and the UBC |
The following sections describe how to improve application performance.
11.2.1 Using the Latest Operating System Patches
Always install the latest operating system patches, which often contain performance enhancements.
Check the /etc/motd file to determine which patches you are running. See your customer service representative for information about installing patches.
11.2.2 Using the Latest Version of the Compiler
Always use the latest version of the compiler to build your application program. Usually, new versions include advanced optimizations.
Check the software on your system to ensure that you are using the latest
version of the compiler.
11.2.3 Using Parallelism
To enhance parallelism, application developers working in Fortran or C should consider using the Kuck & Associates Preprocessor (KAP), which can have a significant impact on SMP performance. See the Programmer's Guide for details on KAP.
11.2.4 Optimizing Applications
Optimizing an application program can involve modifying the build process or modifying the source code. Various compiler and linker optimization levels can be used to generate more efficient user code. See the Programmer's Guide for more information on optimization.
Whether you are porting an application from a 32-bit system to Tru64 UNIX or developing a new application, never attempt to optimize an application until it has been thoroughly debugged and tested. If you are porting an application written in C, use the lint command with the -Q option, or compile your program using the C compiler's -check option, to identify possible portability problems that you may need to resolve.
11.2.5 Using Shared Libraries
Using shared libraries reduces the need for memory and disk space. When multiple programs are linked to a single shared library, the amount of physical memory used by each process can be significantly reduced.
However, shared libraries initially result in an execution time that is slower than if you had used static libraries.
11.2.6 Reducing Application Memory Requirements
You may be able to reduce an application's use of memory, which provides more memory resources for other processes or for file system caching. Follow these coding considerations to reduce your application's use of memory:
Configure and tune applications according to the guidelines provided by the application's installation procedure. For example, you may be able to reduce an application's anonymous memory requirements, set parallel/concurrent processing attributes, size shared global areas and private caches, and set the maximum number of open/mapped files.
You may want to use the mmap function instead of the read or write function in your applications. The read and write system calls require a page of buffer memory and a page of UBC memory, but mmap requires only one page of memory.
Look for data cache collisions between heavily used data structures, which occur when the distance between two data structures allocated in memory is equal to the size of the primary (internal) data cache. If your data structures are small, you can avoid collisions by allocating them contiguously in memory. To do this, use a single malloc call instead of multiple calls.
If an application uses large amounts of data for a short time, allocate the data dynamically with the malloc function instead of declaring it statically. When you have finished using dynamically allocated memory, it is freed for use by other data structures that occur later in the program. If you have limited memory resources, dynamically allocating data reduces an application's memory usage and can substantially improve performance.
If an application uses the malloc function extensively, you may be able to improve its processing speed or decrease its memory utilization by using the function's control variables to tune memory allocation. See malloc(3) for details on tuning memory allocation.
If your application fits in a 32-bit address space and allocates large amounts of dynamic memory by using structures that contain many pointers, you may be able to reduce memory usage by using the -xtaso option. The -xtaso option is supported by all versions of the C compiler (-newc, -migrate, and -oldc versions). To use the -xtaso option, modify your source code with a C-language pragma that controls pointer size allocations. See cc(1) for details.
See the
Programmer's Guide
for detailed information on process memory allocation.
11.2.7 Controlling Memory Locking
Real-time application developers should consider memory locking as a required part of program initialization. Many real-time applications remain locked for the duration of execution, but some may want to lock and unlock memory as the application runs. Memory-locking functions allow you to lock the entire process at the time of the function call and throughout the life of the application. Locked pages of memory cannot be used for paging and the process cannot be swapped out.
Memory locking applies to a process's address space. Only the pages mapped into a process's address space can be locked into memory. When the process exits, pages are removed from the address space and the locks are removed.
Use the mlockall function to lock all of a process's address space. Locked memory remains locked until either the process exits or the application calls the munlockall function.
Use the ps command to determine whether a process is locked into memory and cannot be swapped out. See Section 6.3.2.
Memory locks are not inherited across a fork, and all memory locks associated with a process are unlocked on a call to the exec function or when the process terminates.
See the Guide to Realtime Programming manual and mlockall(3) for more information.