10    Optimizing Techniques

Optimizing an application program can involve modifying the build process, modifying the source code, or both.

In many instances, optimizing an application program can result in major improvements in run-time performance. Two preconditions should be met, however, before you begin measuring the run-time performance of an application program and analyzing how to improve the performance:

After you verify that these conditions have been met, you can begin the optimization process.

The process of optimizing an application can be divided into two separate, but complementary, activities: tuning the build process (see Section 10.1) and modifying the source code (see Section 10.2).

The following sections provide details that relate to these two aspects of the optimization process.


10.1    Guidelines for Building an Application Program

Opportunities for improving an application's run-time performance exist in all phases of the build process. The following sections identify some of the major opportunities that exist in the areas of compiling, linking and loading, preprocessing and postprocessing, and library selection.

See Appendix D for additional optimization information that pertains only to the -oldc version of the C compiler. Appendix D contains information on uopt, the global optimizer (which is not used by the -migrate or -newc versions of the C compiler).


10.1.1    Compilation Considerations

Compile your application with the highest optimization level possible, that is, the level that produces the best performance and the correct results. In general, applications that conform to language-usage standards should tolerate the highest optimization levels, and applications that do not conform to such standards may have to be built at lower optimization levels. For details, see cc(1) or Chapter 2.

If your application will tolerate it, compile all of the source files together in a single compilation. Compiling multiple source files together increases the amount of code that the compiler can examine for possible optimizations. This can have the following effects:

To take advantage of these optimizations, use the following compilation flags:

See cc(1) or Chapter 2 for information on when to use which version of the C compiler.

Note that some routines may not tolerate a high level of optimization; such routines will have to be compiled separately.

Other compilation considerations that can have a significant impact on run-time performance include the following:

Flag              Description

-ansi_alias       Specifies whether source code observes ANSI C aliasing rules. ANSI C aliasing rules allow for more aggressive optimizations.
-ansi_args        Specifies whether source code observes ANSI C rules about arguments. If ANSI C rules are observed, special argument-cleaning code does not have to be generated.
-fast             Turns on the optimizations for the following flags for increased performance.

                  For -newc, -migrate, and -oldc versions of the C compiler:

                      -D_INTRINSICS
                      -D_INLINE_INTRINSICS
                      -D_FASTMATH
                      -float
                      -fp_reorder
                      -O3 (-O4 for -migrate)

                  For only -newc or -migrate versions of the C compiler:

                      -ansi_alias
                      -ansi_args
                      -assume trusted_short_alignment
                      -ifo
                      -readonly_strings

-feedback         Specifies the name of a previously created feedback file. Information in the file can be used by the compiler when performing optimizations.
-fp_reorder       Specifies whether certain code transformations that affect floating-point operations are allowed.
-G                Specifies the maximum byte size of data items in the small data sections (sbss or sdata).
-inline           Specifies whether to perform inline expansion of functions.
-ifo              Provides improved optimization (interfile optimization) and code generation across file boundaries that would not be possible if the files were compiled separately.
-O                Specifies the level of optimization that is to be achieved by the compilation.
-Olimit           Specifies the maximum size, in basic blocks, of a routine that will be optimized by the global optimizer (uopt). (This flag can be used only with the -oldc flag.)
-om               Performs a variety of code optimizations for programs compiled with the -non_shared flag.
-preempt_module   Supports symbol preemption on a module-by-module basis.
-speculate        Enables work (for example, load or computation operations) to be done in running programs on execution paths before the paths are taken.
-tune             Selects processor-specific instruction tuning for specific implementations of the Alpha architecture.
-unroll           Controls loop unrolling done by the optimizer at levels -O2 and above. (This flag can be used only with the -newc or -migrate flags.)

Note that using some of the preceding flags may reduce accuracy and adherence to standards. See cc(1) for details on these flags.
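The -ansi_alias entry above assumes that the source code observes ANSI C aliasing rules. As a minimal sketch of what that means in practice (the function name is hypothetical, and the example assumes a 32-bit IEEE float and a C99 <stdint.h>), reading a float through a uint32_t pointer violates those rules, whereas copying the bytes with memcpy obtains the same bit pattern in a conforming way:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Nonconforming alternative (violates ANSI C aliasing rules, because a
 * float object is read through an lvalue of type uint32_t):
 *
 *     return *(uint32_t *)&f;
 *
 * Conforming version: copy the bytes instead of aliasing them.
 */
uint32_t float_bits_copy(float f)
{
    uint32_t bits;

    assert(sizeof bits == sizeof f);   /* sketch assumes a 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code written in the commented-out style is the kind that may need to be built without -ansi_alias or at a lower optimization level; rewriting it in the memcpy style removes that dependence.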


10.1.2    Linking and Loading Considerations

If your application does not use many large libraries, consider linking it nonshared. This allows the linker to optimize calls into the library, thus decreasing your application's startup time and improving run-time performance (if calls are made frequently). Nonshared applications, however, can use more system resources than call-shared applications. If you are running a large number of applications simultaneously and the applications have a set of libraries in common (for example, libX11 or libc), you may increase total system performance by linking them as call-shared. See Chapter 4 for details.

For applications that use shared libraries, ensure that those libraries can be quickstarted. Quickstarting is a Digital UNIX capability that can greatly reduce an application's load time. For many applications, load time is a significant percentage of the total time that it takes to start and run the application. If an object cannot be quickstarted, it still runs, but startup time is slower. See Section 4.7 for details.


10.1.2.1    Using the Postlink Optimizer

You perform postlink optimizations by using the -om flag on the cc command line. This flag must be used with the -non_shared flag and must be specified when performing the final link, for example:

cc -om -non_shared prog.c

The postlink optimizer performs the following code optimizations:

When you use the -om flag, you get the full range of postlink optimizations. To request an individual postlink optimization, use the -WL compiler flag followed by -om_option, where option is one of the following:

compress_lita
This option removes unused .lita entries after optimization, then compresses the .lita section.

dead_code
This option removes dead code (unreachable instructions) generated after optimizations have been applied. The .lita section is not compressed by this option.

ireorg_feedback,file
This option directs the postlink optimizer to use the pixie-produced information in file.Counts and file.Addrs to reorganize the instructions to reduce cache thrashing.

no_inst_sched
This option turns off instruction scheduling.

no_align_labels
This option turns off alignment of labels. Normally, the -om flag will align the targets of all branches on quadword boundaries to improve loop performance.

Gcommon,num
This option sets the size threshold of "common" symbols. Every "common" symbol whose size is less than or equal to num will be allocated close together.

For more information, see the cc(1) reference page.


10.1.3    Preprocessing and Postprocessing Considerations

Preprocessing options and postprocessing (run-time) options that can affect performance include the following:


10.1.4    Library Routine Selection

Library routine options that can affect performance include the following:


10.2    Application Coding Guidelines

If you are willing to modify your application, use the profiling tools to determine where your application spends most of its time. Many applications spend most of their time in a few routines. Concentrate your efforts on improving the speed of those heavily used routines.

Digital provides several profiling tools that work for programs written in C and other languages. See Chapter 8, atom(1), gprof(1), hiprof(5), pixie(5), and prof(1) for more details.

After you identify the heavily used portions of your application, consider the algorithms used by that code. Is it possible to replace a slow algorithm with a more efficient one? Replacing a slow algorithm with a faster one often produces a larger performance gain than tweaking an existing algorithm.
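As a generic illustration (not taken from this guide), replacing a linear search with a binary search over sorted data reduces the cost of each lookup from about n comparisons to about log2(n), a gain that no amount of tuning of the linear loop can match. The function names below are hypothetical; bsearch and qsort are standard C library routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison for bsearch/qsort; avoids overflow of x - y. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Slow algorithm: up to n comparisons per lookup. */
int contains_linear(const int *v, size_t n, int key)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (v[i] == key)
            return 1;
    return 0;
}

/* Faster algorithm: about log2(n) comparisons per lookup, but v must
   already be sorted in ascending order (for example, by calling qsort
   once before performing many lookups). */
int contains_sorted(const int *v, size_t n, int key)
{
    assert(v != NULL);
    return bsearch(&key, v, n, sizeof *v, compare_ints) != NULL;
}
```

The one-time sorting cost pays off only when the same data is searched many times, which is typical of the heavily used routines that profiling identifies.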

When you are satisfied with the efficiency of your algorithms, consider making code changes to help the compiler optimize the object code that it generates for your application. High Performance Computing by Kevin Dowd (O'Reilly & Associates, Inc., ISBN 1-56592-032-5) is a good source of general information on how to write source code that maximizes optimization opportunities for compilers.

The following sections identify performance opportunities involving data types, cache usage and data alignment, and general coding issues.


10.2.1    Data Type Considerations

Data type considerations that can affect performance include the following:


10.2.2    Cache Usage and Data Alignment Considerations

Cache usage patterns can have a critical impact on performance:

Data alignment can also affect performance. By default, the C compiler aligns each data item on its natural boundary; that is, it positions each data item so that its starting address is a multiple of the size of the data type used to declare it. Data not aligned on natural boundaries is called misaligned data. Misaligned data can slow performance because the processor's normal load and store instructions require aligned addresses, so each misaligned access must be completed by extra code or by the kernel at run time.
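A minimal sketch of natural alignment, assuming a C11 compiler for <stdalign.h>; the structure and function names are hypothetical. The compiler inserts padding after c so that i and d each start on a multiple of their own alignment, and is_aligned tests an address against a power-of-two boundary. (Converting a pointer to uintptr_t this way works on typical flat-address platforms but is implementation-defined.)

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if addr is a multiple of align, which must be a power of two. */
int is_aligned(const void *addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)addr % align) == 0;
}

/* Default layout: padding follows c so that i and d start on
   their natural boundaries. */
struct sample {
    char   c;
    int    i;
    double d;
};
```

For example, offsetof(struct sample, i) is a multiple of alignof(int), so is_aligned(&s.i, alignof(int)) returns 1 for any struct sample s.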

In C programs, misalignment can occur when you cast a pointer from one data type to a type with stricter alignment; for example, casting a char pointer (1-byte alignment) to an int pointer (4-byte alignment) and then dereferencing the new pointer may cause an unaligned access. Also in C, creating packed structures with the #pragma pack directive can cause unaligned access. (See Chapter 3 for details on the #pragma pack directive.)
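The effect of #pragma pack can be observed with offsetof. In this sketch (hypothetical structure names; #pragma pack(1) and #pragma pack() as accepted by most current C compilers, including GCC and Clang, plus C11 static_assert), packing removes the padding so that value starts at offset 1, which is not a multiple of sizeof(int):

```c
#include <assert.h>
#include <stddef.h>

#pragma pack(1)          /* pack the following structure: no padding */
struct packed_rec {
    char tag;
    int  value;          /* starts at offset 1, so it is misaligned */
};
#pragma pack()           /* restore the default packing */

struct default_rec {
    char tag;
    int  value;          /* padded so value starts on its natural boundary */
};

/* Compile-time check of the default layout (C11). */
static_assert(offsetof(struct default_rec, value) % _Alignof(int) == 0,
              "default_rec.value is naturally aligned");
```

Every access to packed_rec.value then requires extra work at run time, which is the cost the packed layout trades for its smaller size.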

To correct alignment problems in C programs, you can use the -align flag or you can make necessary modifications to the source code. If instances of misalignment are required by your program for some reason, use the __unaligned data-type qualifier in any pointer definitions that involve the misaligned data. When data is accessed through the use of a pointer declared __unaligned, the compiler generates the additional code necessary to copy or store the data without generating alignment errors. (Alignment errors have a much more costly impact on performance than the additional code that is generated.)
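One portable source modification, shown here as a sketch rather than as this compiler's documented method, is to copy possibly misaligned data with memcpy instead of dereferencing a cast pointer. memcpy makes no alignment assumption about its arguments, and compilers typically expand a small fixed-size memcpy into an access sequence that is safe for unaligned addresses. The helper names load_int and store_int are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable replacement for *(int *)p when p may be misaligned. */
int load_int(const unsigned char *p)
{
    int v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Portable replacement for *(int *)p = v when p may be misaligned. */
void store_int(unsigned char *p, int v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

Unlike the __unaligned qualifier, which is specific to particular compilers, this form compiles unchanged with any ANSI C compiler.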

Warning messages identifying misaligned data are not issued during the compilation of C programs by any version of the C compiler (-newc, -migrate, or -oldc).

During execution of any program, the kernel issues warning messages ("unaligned access") for most instances of misaligned data. The messages include the program counter (pc) value, that is, the address of the instruction that caused the unaligned access. You can use the machine code debugging capabilities of the dbx or ladebug debugger to map these pc values to source code locations.

For additional information on data alignment, see Appendix A in the Alpha Architecture Reference Manual. See cc(1) for details on alignment-control flags that you can specify on compilation command lines.


10.2.3    General Coding Considerations

General coding considerations specific to C applications include the following:

You should also avoid aliases where possible by introducing local variables to store dereferenced results. (A dereferenced result is the value obtained from a specified address.) A value reached through a pointer can be changed by indirect stores and by function calls, so the compiler must reload it from memory after such operations; a local variable whose address is never taken cannot be changed that way and can be kept in a register. Example 10-1 shows how the proper placement of pointers and the elimination of aliasing enable the compiler to produce better code.

Example 10-1: Pointers and Optimization

Source Code:

int len = 10;
char a[10];

/* Set a[0] through a[len - 1] to zero. */
void zero(void)
{
    char *p;
    for (p = a; p != a + len; )
        *p++ = 0;
}

Consider the use of pointers in Example 10-1. Because a store through a char pointer may modify an object of any type, the compiler cannot rule out that the statement *p++ = 0 modifies len. It must therefore load len from memory and add it to the address of a on each pass through the loop, instead of computing a + len in a register once outside the loop.

Two different methods can be used to increase the efficiency of the code used in Example 10-1:
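A minimal sketch of the local-variable technique described at the start of this section, applied to the zero() routine of Example 10-1 (the names zero_local and end are hypothetical). The loop bound is computed once into a local pointer whose address is never taken, so the store *p++ = 0 cannot change it and the compiler is free to keep it in a register.

```c
#include <assert.h>

int len = 10;
char a[10];

/* Hypothetical variant of zero() from Example 10-1. */
void zero_local(void)
{
    char *p;
    char *end;

    assert(len >= 0 && len <= (int)sizeof a);
    end = a + len;          /* computed once, outside the loop */
    for (p = a; p != end; )
        *p++ = 0;
}
```

The result is identical to zero() as long as p stays within a, which the loop guarantees; only the repeated load of len is eliminated.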