This appendix describes how to optimize programs with the cc command's -om and -cord options. The -om option performs postlink optimization. The -cord option reorders the procedures in an executable program or shared library to improve cache utilization.
Note
The spike tool (see Section 10.1.3) is replacing the cc command's -om and -cord options because it provides better control and more effective optimization. See also Chapter 10 and Chapter 8 for complete information about optimizing and profiling techniques, respectively.
This appendix is organized as follows:
Using the -om postlink optimizer (Section F.1)
Reordering procedures with -cord and feedback files (Section F.2)
F.1 Using the -om Postlink Optimizer
You can perform postlink optimizations by using the cc command's -om option. Section F.1.1 provides an overview of -om. Section F.1.2 shows how to use -om with feedback files for profile-directed optimization.
F.1.1 Overview
The -om postlink optimizer performs the following code optimizations:
Removal of nop (no operation) instructions; that is, those instructions that have no effect on machine state.
Removal of .lita data; that is, that portion of the data section of an executable image that holds address literals for 64-bit addressing. Using available options, you can remove unused .lita entries after optimization and then compress the .lita section.
Reallocation of common symbols according to a size that you determine.
When you use the -om option by itself, you get the full range of postlink optimizations. You can request a particular optimization by specifying -om followed by one of the optimization-specific options described in cc(1).
The -om option is most effective when used with the -non_shared option; that is, with executables. For example:
% cc -non_shared -O3 -om prog.c
The -om option should be specified in the final link operation.
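When a program is built from several separately compiled source files, the advice above suggests that -om belongs on the final cc invocation that links the objects. The following is a minimal dry-run sketch of that arrangement. It only prints commands and does not invoke the compiler. The function name om_build_plan and the file names used to call it are hypothetical, and passing -non_shared and -O3 on the compile steps is an assumption carried over from the single-command example.

```shell
#!/bin/sh
# Dry-run sketch: print a separate-compilation build in which -om
# appears only on the final link. Nothing here invokes the compiler.
# om_build_plan is a hypothetical helper, not part of any toolkit.
om_build_plan() {
    out=$1
    shift
    objs=""
    for src in "$@"; do
        obj="${src%.c}.o"
        # Compile each source file to an object file (no -om here).
        echo "cc -non_shared -O3 -c $src"
        objs="$objs $obj"
    done
    # Final link: the only step that carries -om.
    echo "cc -non_shared -O3 -om -o $out$objs"
}
```

Calling om_build_plan prog main.c util.c prints one compile command per source file followed by a single link command. Review the printed commands before running them on a system that provides these cc options.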
F.1.2 Profile-Directed Optimization with -om
The pixie profiler (see pixie(1)) provides profile information that the cc command's -om and -feedback options can use to tune the generated instruction sequences to the demands placed on the program by particular sets of input data. This technique is most effective with executables. For shared libraries, you can also use cord as described in Section F.2, or omit the -om option.
The following example shows the three necessary basic steps in this process, which consist of (1) preparing the program for profile-directed optimization, (2) creating an instrumented version of the program and running it to collect profiling statistics, and (3) feeding that information back to the compiler and linker to help them optimize the executable code. Later examples show how to elaborate on these steps to accommodate ongoing changes during development and data from multiple profiling runs.
% cc -feedback prog -o prog -non_shared -O3 *.c [1]
% pixie -update prog [2]
% cc -feedback prog -o prog -non_shared -om -O3 *.c [3]
[1] When the program is compiled with the -feedback option for the first time, a special augmented executable file is created. It contains information that the compiler uses to relate the executable to the source files. It also contains a section that is used later to store profiling feedback information for the compiler. This section remains empty after the first compilation, because the pixie profiler has not yet generated any feedback information (step 2). Make sure that the file name specified with the -feedback option is the same as the executable file name, which in this example is prog (from the -o option). By default, the -feedback option applies the -g1 option, which provides optimum symbolization for profiling. You need to experiment with the -On option to find the level of optimization that provides the best run-time performance for your program and compiler.
[2] The pixie command creates an instrumented version of the program (prog.pixie) and then runs it (because a prof option, -update, is specified). Execution statistics and address mapping data are automatically collected in an instruction-counts file (prog.Counts) and an instruction-addresses file (prog.Addrs). The -update option puts this profiling information in the augmented executable.
[3] In the second compilation with the -feedback option, the profiling information in the augmented executable guides the compiler and (through the -om option) the postlink optimizer. This customized feedback enhances any automatic optimization that the -O3 and -om options provide. You can make compiler optimizations even more effective by using the -ifo and/or -assume whole_program options in conjunction with the -feedback option. However, as noted in Section 10.1.1, the compiler may be unable to compile very large programs as if there were only one source file.
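As a parameterized restatement of the three steps above, the following dry-run sketch prints the command sequence for a given program name and optimization level, which can be convenient when experimenting with different -On settings. It only prints commands and runs nothing. The function name pdo_steps and its arguments are hypothetical; the command lines themselves are taken from the example.

```shell
#!/bin/sh
# Dry-run sketch: print the three profile-directed optimization steps
# for a given program name. Nothing is executed here.
# pdo_steps and its arguments are hypothetical names for illustration.
pdo_steps() {
    prog=$1
    opt=${2:--O3}   # optimization level; defaults to -O3 as in the example
    # Step 1: build the augmented executable (feedback section still empty).
    echo "cc -feedback $prog -o $prog -non_shared $opt *.c"
    # Step 2: instrument, run, and store the profile in the executable.
    echo "pixie -update $prog"
    # Step 3: rebuild with the stored profile, adding -om for postlink optimization.
    echo "cc -feedback $prog -o $prog -non_shared -om $opt *.c"
}
```

For instance, pdo_steps prog -O2 prints the same sequence with -O2 substituted, so that the same optimization level is used in both compilations.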
During a typical development process, steps 2 and 3 of the previous example are repeated as needed to reflect the impact of any changes to the source code. For example:
% cc -feedback prog -o prog -non_shared -O3 *.c
% pixie -update prog
% cc -feedback prog -o prog -non_shared -O3 *.c
[modify source code]
% cc -feedback prog -o prog -non_shared -O3 *.c
.....
[modify source code]
% cc -feedback prog -o prog -non_shared -O3 *.c
% pixie -update prog
% cc -feedback prog -o prog -non_shared -om -O3 *.c
Because the profiling information in the augmented executable persists from compilation to compilation, the pixie processing step that updates the information does not have to be repeated every time that a source module is modified and recompiled. But each modification reduces the relevance of the old feedback information to the actual code and degrades the potential quality of the optimization, depending on the exact modification. The pixie processing step after the last modification and recompilation ensures that the feedback information is correctly updated for the last compilation.
The profiling information in an augmented executable file makes it typically 3-5 percent larger than a normal executable. After development is completed, you can use the strip command to remove any profiling and symbol table information. For example:
% strip prog
You might want to run your instrumented program several times with different inputs to get an accurate picture of its profile. The following example shows how to merge profiling statistics from two runs of a program, prog, whose output varies from run to run with different sets of input:
% cc -feedback prog -o prog -non_shared -O3 *.c [1]
% pixie -pids prog [2]
% prog.pixie [3]
(input set 1)
% prog.pixie
(input set 2)
% prof -pixie -update prog prog.Counts.* [4]
% cc -feedback prog -o prog -non_shared -om -O3 *.c [5]
[1] The first compilation produces an augmented executable, as explained in the previous example.
[2] By default, each run of the instrumented program (prog.pixie) produces a profiling data file called prog.Counts. The -pids option adds the process ID of each of the instrumented program's test runs to the name of the profiling data file that is produced (prog.Counts.pid). Thus, the data files that subsequent runs produce do not overwrite each other.
[3] The instrumented program is run twice, producing a uniquely named data file each time; for example, prog.Counts.371 and prog.Counts.422.
[4] The prof -pixie command merges the two data files. The -update option updates the executable, prog, with the combined information.
[5] The second compilation step uses the combined profiling information from the two runs of the program to guide the optimization.
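The same procedure extends to any number of input sets. The following dry-run sketch prints the command sequence with one instrumented run per input file. It only prints commands and executes nothing. The function name merged_profile_plan and the input file names are hypothetical, and supplying each input set on standard input is an assumption about how the program reads its data.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for profiling one program against
# several input files and merging the results. Nothing is executed.
# merged_profile_plan is a hypothetical helper; feeding each input on
# standard input is an assumption about how the program reads its data.
merged_profile_plan() {
    prog=$1
    shift
    # Build the augmented executable.
    echo "cc -feedback $prog -o $prog -non_shared -O3 *.c"
    # Instrument; -pids makes each run write its own counts file.
    echo "pixie -pids $prog"
    # One instrumented run per input set.
    for input in "$@"; do
        echo "$prog.pixie < $input"
    done
    # Merge all counts files and store the result in the executable.
    echo "prof -pixie -update $prog $prog.Counts.*"
    # Rebuild with the merged profile and postlink optimization.
    echo "cc -feedback $prog -o $prog -non_shared -om -O3 *.c"
}
```

Calling merged_profile_plan prog set1.in set2.in prints six commands: the first compilation, the pixie step, two instrumented runs, the merge, and the final compilation.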
F.2 Reordering Procedures with -cord and Feedback Files
The cc command's -cord option invokes the cord utility, which reorders the procedures in an executable program or shared library to improve instruction cache behavior. You use a feedback file that contains data from an actual run of your application as input to -cord; however, this is a different kind of feedback file (created with the pixie or prof -pixie command) than the kind discussed in Section F.1.2. The following example shows how to create a feedback file and then use the -cord option to compile an executable with the feedback file as input:
% cc -O3 -o prog *.c
% pixie -feedback prog.fb prog [1]
% cc -O3 -cord -feedback prog.fb -o prog *.c [2]
[1] The pixie command creates an instrumented version of the program and also runs it (because a prof option, -feedback, is specified). The -feedback option creates a feedback file (prog.fb) that collects execution statistics to be used by the compiler in the next step.
[2] The cc command's -feedback option accepts the feedback file as input. The -cord option invokes the cord utility.
Compiling a shared library with feedback is similar. Profile the shared library with one or more programs that exercise the library code that most needs optimizing. For example:
% cc -o libexample.so -shared -g1 -O3 lib*.c [1]
% cc -o exerciser -O3 exerciser.c -L. -lexample [2]
% pixie -L. -incobj libexample.so -run exerciser [3]
% prof -pixie -feedback libexample.fb libexample.so exerciser.Counts [4]
% cc -cord -feedback libexample.fb -o libexample.so -shared -g1 -O3 lib*.c [5]
[1] The shared library is compiled with the -g1 option to give feedback data for each source line.
[2] A program to exercise the important parts of the library is built.
[3] The shared library and program are instrumented and run to profile them.
[4] A feedback file is generated for just the shared library.
[5] The shared library is recompiled, relinked, and reordered to optimize the performance of the code that the profile shows is most heavily used.
Use a feedback file generated with the same optimization level.
If you have produced a feedback file and plan to compile your program with the -non_shared option, it is better to use the feedback file with the -om option than with -cord.
You can also use cord with the runcord utility. For more information, see pixie(1), prof(1), cord(1), and runcord(1).