F    Optimizing Programs with -om and cord

This appendix describes how to optimize programs with the cc command's -om and -cord options. The -om option performs postlink optimization. The -cord option reorders the procedures in an executable program or shared library to improve cache utilization.

Note

The spike tool (see Section 10.1.3) is replacing the cc command's -om and -cord options, because it provides better control and more effective optimization. See also Chapter 10 and Chapter 8 for complete information about optimizing and profiling techniques, respectively.

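This appendix is organized as follows: Section F.1 describes the -om postlink optimizer, including its use with feedback files, and Section F.2 describes profile-directed procedure reordering with -cord.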

F.1    Using the -om Postlink Optimizer

You can perform postlink optimizations by using the cc command's -om option. Section F.1.1 provides an overview of -om. Section F.1.2 shows how to use -om with feedback files for profile-directed optimization.

F.1.1    Overview

The -om postlink optimizer performs a range of code optimizations on the linked program.

When you use the -om option by itself, you get the full range of postlink optimizations. You can request a particular optimization by specifying -om followed by one of the following options (for more information, see cc(1)):

-WL,-om_compress_lita
-WL,-om_dead_code
-WL,-om_feedback
-WL,-om_Gcommon,num
-WL,-om_ireorg_feedback,file
-WL,-om_no_inst_sched
-WL,-om_no_align_labels
-WL,-om_split_procedures

The -om option is most effective when used with the -non_shared option -- that is, with executables. For example:

% cc -non_shared -O3 -om prog.c

The -om option should be specified in the final link operation.

F.1.2    Profile Directed Optimization with -om

The pixie profiler (see pixie(1)) provides profile information that the cc command's -om and -feedback options can use to tune the generated instruction sequences to the demands placed on the program by particular sets of input data. This technique is most effective with executables. For shared libraries, you can also use cord as described in Section F.2, or omit the -om option.

The following example shows the three basic steps in this process: (1) preparing the program for profile-directed optimization, (2) creating an instrumented version of the program and running it to collect profiling statistics, and (3) feeding that information back to the compiler and linker to help them optimize the executable code. Later examples show how to extend these steps to accommodate ongoing changes during development and data from multiple profiling runs.

% cc -feedback prog -o prog -non_shared -O3 *.c  [1]

% pixie -update prog [2]

% cc -feedback prog -o prog -non_shared -om -O3 *.c  [3]

  1. When the program is compiled with the -feedback option for the first time, a special augmented executable file is created. It contains information that the compiler uses to relate the executable to the source files. It also contains a section that is used later to store profiling feedback information for the compiler. This section remains empty after the first compilation, because the pixie profiler has not yet generated any feedback information (step 2). Make sure that the file name specified with the -feedback option is the same as the executable file name, which in this example is prog (from the -o option). By default, the -feedback option applies the -g1 option, which provides optimum symbolization for profiling. You need to experiment with the -On option to find the level of optimization that provides the best run-time performance for your program and compiler.

  2. The pixie command creates an instrumented version of the program (prog.pixie) and then runs it (because a prof option, -update, is specified). Execution statistics and address mapping data are automatically collected in an instruction-counts file (prog.Counts) and an instruction-addresses file (prog.Addrs). The -update option puts this profiling information in the augmented executable.

  3. In the second compilation with the -feedback option, the profiling information in the augmented executable guides the compiler and (through the -om option) the postlink optimizer. This customized feedback enhances any automatic optimization that the -O3 and -om options provide. You can make compiler optimizations even more effective by using the -ifo and/or -assume whole_program options in conjunction with the -feedback option. However, as noted in Section 10.1.1, the compiler may be unable to compile very large programs as if there were only one source file.
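As a rough illustration, the three commands above can be wrapped in a small shell function so that the name given to -feedback always matches the -o executable name. This is a minimal sketch, not part of the cc or pixie tools: the names om_feedback_build and run and the DRY_RUN variable are hypothetical, it assumes a POSIX shell with the Tru64 cc and pixie commands on the search path, and it hard-codes -O3 as the examples here do. With DRY_RUN=1 it only prints the commands it would run.

```sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the three-step -om feedback build.
# Usage: om_feedback_build PROG SOURCE...
om_feedback_build() {
    [ $# -ge 2 ] || { echo "usage: om_feedback_build PROG SOURCE..." >&2; return 2; }
    prog=$1
    shift
    # Step 1: build the augmented executable; -feedback names the same file as -o.
    run cc -feedback "$prog" -o "$prog" -non_shared -O3 "$@" || return 1
    # Step 2: instrument and run the program, storing the profile in PROG.
    run pixie -update "$prog" || return 1
    # Step 3: rebuild so the profile guides the compiler and the -om optimizer.
    run cc -feedback "$prog" -o "$prog" -non_shared -om -O3 "$@"
}
```

For example, om_feedback_build prog *.c performs the same three steps as the example above, and DRY_RUN=1 om_feedback_build prog *.c lists them without running anything.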

During a typical development process, steps 2 and 3 of the previous example are repeated as needed to reflect the impact of any changes to the source code. For example:

% cc -feedback prog -o prog -non_shared -O3 *.c 
% pixie -update prog
% cc -feedback prog -o prog -non_shared -O3 *.c 
[modify source code]
% cc -feedback prog -o prog -non_shared -O3 *.c 
.....
[modify source code]
% cc -feedback prog -o prog -non_shared -O3 *.c 
% pixie -update prog
% cc -feedback prog -o prog -non_shared -om -O3 *.c 

Because the profiling information in the augmented executable persists from compilation to compilation, you do not have to repeat the pixie step that updates it every time you modify and recompile a source module. However, each modification makes the old feedback information less relevant to the actual code and, depending on the exact change, can degrade the quality of the optimization. Running pixie again after the last modification and recompilation guarantees that the feedback information is correctly updated for the last compilation.

The profiling information in an augmented executable file makes it larger than a normal executable, typically by 3 to 5 percent. After development is complete, you can use the strip command to remove any profiling and symbol table information. For example:

% strip prog

You might want to run your instrumented program several times with different inputs to get an accurate picture of its profile. The following example shows how to merge profiling statistics from two runs of a program, prog, whose output varies from run to run with different sets of input:

% cc -feedback prog -o prog -non_shared -O3 *.c [1]

% pixie -pids prog [2]

% prog.pixie [3] 
(input set 1) 
% prog.pixie 
(input set 2) 

% prof -pixie -update prog prog.Counts.* [4]

% cc -feedback prog -o prog -non_shared -om -O3 *.c [5]

  1. The first compilation produces an augmented executable, as explained in the previous example.

  2. By default, each run of the instrumented program (prog.pixie) produces a profiling data file called prog.Counts. The -pids option adds the process ID of each of the instrumented program's test runs to the name of the profiling data file that is produced (prog.Counts.pid). Thus, the data files that subsequent runs produce do not overwrite each other.

  3. The instrumented program is run twice, producing a uniquely named data file each time -- for example, prog.Counts.371 and prog.Counts.422.

  4. The prof -pixie command merges the two data files. The -update option updates the executable, prog, with the combined information.

  5. The second compilation step uses the combined profiling information from the two runs of the program to guide the optimization.
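When there are more than two input sets, the instrument, run, and merge steps can be scripted. The following shell function is a hypothetical sketch (profile_runs, run, and DRY_RUN are illustrative names). Beyond what this appendix states, it assumes that the program reads each input set from standard input and that each set is stored in its own file. It uses only the pixie and prof options shown in the example above, and it runs the instrumented program as ./prog.pixie rather than relying on the current directory being in the search path.

```sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: profile PROG once per input file, then merge.
# Usage: profile_runs PROG INPUT_FILE...
profile_runs() {
    [ $# -ge 2 ] || { echo "usage: profile_runs PROG INPUT_FILE..." >&2; return 2; }
    prog=$1
    shift
    # Instrument only; -pids gives each later run its own PROG.Counts.pid file.
    run pixie -pids "$prog" || return 1
    # One run of the instrumented program per input set (assumed to use stdin).
    for input in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "./$prog.pixie < $input"
        else
            "./$prog.pixie" < "$input"
        fi
    done
    # Merge all per-run counts files and update the augmented executable.
    run prof -pixie -update "$prog" "$prog".Counts.*
}
```

After profile_runs prog set1 set2 set3, the final compilation (step 5) is the same as in the example.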

F.2    Profile Directed Reordering with -cord

The cc command's -cord option invokes the cord utility, which reorders the procedures in an executable program or shared library to improve instruction cache behavior. As input to -cord, you use a feedback file that contains data from an actual run of your application; however, this is a different kind of feedback file (created with the pixie or prof -pixie command) from the kind discussed in Section F.1.2. The following example shows how to create a feedback file and then use the -cord option to compile an executable with the feedback file as input:

% cc -O3 -o prog *.c  
% pixie -feedback prog.fb prog [1]
% cc -O3 -cord -feedback prog.fb -o prog *.c  [2]

  1. The pixie command creates an instrumented version of the program and also runs it (because a prof option, -feedback, is specified). The -feedback option creates a feedback file (prog.fb) that collects execution statistics to be used by the compiler in the next step.

  2. The cc command's -feedback option accepts the feedback file as input. The -cord option invokes the cord utility.
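The same sequence can be captured in a short shell function that derives the feedback file name from the program name and uses the same optimization level for both compilations. This is a hypothetical sketch (cord_build, run, and DRY_RUN are illustrative names, not part of the toolchain), assuming a POSIX shell and the Tru64 cc and pixie commands on the search path.

```sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the -cord build of an executable.
# Usage: cord_build PROG SOURCE...
cord_build() {
    [ $# -ge 2 ] || { echo "usage: cord_build PROG SOURCE..." >&2; return 2; }
    prog=$1
    shift
    fb=$prog.fb
    # Ordinary optimized build that will be profiled.
    run cc -O3 -o "$prog" "$@" || return 1
    # Instrument and run once; -feedback writes the procedure feedback file.
    run pixie -feedback "$fb" "$prog" || return 1
    # Rebuild at the same -O level; -cord reorders procedures using the file.
    run cc -O3 -cord -feedback "$fb" -o "$prog" "$@"
}
```

Keeping -O3 in one place for both compilations helps keep the feedback file consistent with the final build's optimization level.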

Compiling a shared library with feedback is similar. Profile the shared library with one or more programs that exercise the library code that most needs optimizing. For example:

% cc -o libexample.so -shared -g1 -O3 lib*.c [1]
% cc -o exerciser -O3 exerciser.c -L. -lexample [2]
% pixie -L. -incobj libexample.so -run exerciser [3]
% prof -pixie -feedback libexample.fb libexample.so exerciser.Counts [4]
% cc -cord -feedback libexample.fb -o libexample.so -shared -g1 -O3 lib*.c [5]

  1. The shared library is compiled with the -g1 option to give feedback data for each source line.

  2. A program to exercise the important parts of the library is built.

  3. The shared library and program are instrumented and run to profile them.

  4. A feedback file is generated for just the shared library.

  5. The shared library is recompiled, relinked, and reordered to optimize the performance of the code that the profile shows is most heavily used.
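The five steps can also be sketched as a shell function. The names cord_shared_lib, run, and DRY_RUN are hypothetical; the executable name exerciser and the file names libNAME.so and libNAME.fb follow the example above. The sketch assumes a POSIX shell, the Tru64 cc, pixie, and prof commands, and an exerciser program that you write yourself to call the library routines that matter most.

```sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the -cord build of a shared library.
# Usage: cord_shared_lib NAME EXERCISER_SOURCE LIB_SOURCE...
#   NAME "example" produces libexample.so and libexample.fb.
cord_shared_lib() {
    [ $# -ge 3 ] || { echo "usage: cord_shared_lib NAME EXERCISER_SOURCE LIB_SOURCE..." >&2; return 2; }
    lib=$1
    exsrc=$2
    shift 2
    so=lib$lib.so
    fb=lib$lib.fb
    # 1. Build the library with -g1 so feedback data maps to source lines.
    run cc -o "$so" -shared -g1 -O3 "$@" || return 1
    # 2. Build the exerciser program against the library in this directory.
    run cc -o exerciser -O3 "$exsrc" -L. -l"$lib" || return 1
    # 3. Instrument the library and program, then run the exerciser.
    run pixie -L. -incobj "$so" -run exerciser || return 1
    # 4. Generate a feedback file for the shared library only.
    run prof -pixie -feedback "$fb" "$so" exerciser.Counts || return 1
    # 5. Rebuild with the same options and let cord reorder the procedures.
    run cc -cord -feedback "$fb" -o "$so" -shared -g1 -O3 "$@"
}
```

For example, cord_shared_lib example exerciser.c lib*.c reproduces the commands in the example above.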

Use a feedback file generated from a build compiled at the same optimization level (for example, -O3 in the preceding examples) as the final -cord build.

If you have produced a feedback file and plan to compile your program with the -non_shared option, it is better to use the feedback file with the -om option than with -cord.

You can also use cord with the runcord utility. For more information, see pixie(1), prof(1), cord(1), and runcord(1).