D Optimizing Techniques (MIPS-Based C Compiler)

This appendix describes the optimization phases of the -oldc version of the C compiler and their benefits.

D.1 Global Optimizer

The global optimizer (uopt) is a single program that improves the performance of object programs by transforming existing code into more efficient coding sequences. Although the same optimizer processes optimizations for all languages, it does distinguish between the various languages to take advantage of the different language semantics involved.

The primary benefits of optimization are faster-running programs and smaller object code. However, the optimizer can also shorten development time. For example, coding time can be reduced by leaving it to the optimizer to relate programming details to execution-time efficiency, which allows you to focus on the more crucial global structure of your program. In addition, programs often yield optimizable code sequences regardless of how well they are written.

D.2 Optimizer Effects on Debugging

Optimize your programs only after they are fully developed and debugged. Although the optimizer does not alter the flow of control within a program, it may move operations so that the object code does not correspond to the source code. These changed sequences of code may create confusion when using the debugger.

D.3 Loop Optimization by the Optimizer

Optimizations are most useful in code that contains loops. The optimizer moves loop-invariant code sequences outside loops so that they are performed only once instead of multiple times. Apart from loop-invariant code, loops often contain loop-induction expressions that can be replaced with simple increments. In programs composed of many loops, global optimization can often reduce the run time by half.
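The two transformations described above can be sketched by hand in C. The following example is an illustration only and is not actual uopt output; the function names and parameters are hypothetical. The first function recomputes an invariant product and a multiplied index on every iteration; the second shows the equivalent code after the invariant is moved out of the loop and the induction expression is replaced by a simple increment.

```c
#include <stddef.h>

/* Before: scale * offset is recomputed on every iteration, and the
 * index i * stride requires a multiplication each time. */
long sum_column_naive(const long *row, size_t n, size_t stride,
                      long scale, long offset)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += row[i * stride] + scale * offset;
    return sum;
}

/* After: the invariant product is computed once before the loop, and
 * the induction expression i * stride is replaced by an index that is
 * advanced with a simple addition. */
long sum_column_optimized(const long *row, size_t n, size_t stride,
                          long scale, long offset)
{
    const long bias = scale * offset;   /* loop-invariant, hoisted */
    size_t idx = 0;                     /* always equals i * stride */
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += row[idx] + bias;
        idx += stride;                  /* simple increment */
    }
    return sum;
}
```

Both functions return the same result for the same arguments; only the amount of work done inside the loop differs.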

D.4 Register Allocation by the Optimizer

Register usage has a significant impact on program performance. For example, fetching a value from a register is significantly faster than fetching a value from storage. Thus, to perform its intended function, the optimizer must make the best possible use of registers.

In allocating registers, the optimizer selects those data items that are most suited for placement in registers, taking into account their frequency of use and their location in the program structure. In addition, the optimizer assigns values to registers so that their contents move minimally within loops and during procedure invocations.
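As a hedged illustration of what makes a data item well suited to a register, consider the following C sketch. The function names are hypothetical, and this appendix does not state how uopt treats any particular program. In the first function, the running total is accessed through a pointer that might refer to an element of data, so a compiler generally has to load and store it on each iteration. In the second, the total is a local variable whose address is never taken and which is used on every iteration, making it a natural candidate for a register.

```c
#include <stddef.h>

/* The running total lives in memory and is accessed through a pointer
 * on every iteration of the loop. */
void total_in_memory(const long *data, size_t n, long *total)
{
    size_t i;

    *total = 0;
    for (i = 0; i < n; i++)
        *total += data[i];
}

/* The running total is a local variable whose address is never taken,
 * so it can be held in a register for the whole loop. */
long total_in_register(const long *data, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += data[i];
    return total;
}
```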

D.5 Optimizing Separate Compilation Units

The optimizer processes one procedure at a time. Large procedures offer more opportunities for optimization because more interrelationships are exposed in terms of constructs and regions.

The uld and umerge phases of the compiler permit global optimization among separate units in the same compilation. Often, programs are divided into separate files that are compiled separately and referred to as modules or compilation units. Compiling them separately saves time during program development because a change requires recompilation of only one module, not the entire program.

Traditionally, program modularity restricted the optimization of code to a single compilation unit at a time. For example, calls to procedures that reside in other modules could not be fully optimized with the code that called them. The uld and umerge phases of the compiler system overcome this deficiency. The uld phase links multiple compilation units into a single compilation unit. Then, umerge orders the procedures for optimal processing by the global optimizer (uopt).
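To illustrate why merging compilation units matters, the following sketch places what would normally be two source files in a single listing. The file names in the comments and the functions clamp_to_byte and sum_clamped are hypothetical. When each file is compiled alone, the optimizer sees only a call to an unknown procedure inside the loop. Once the units are linked into a single compilation unit, the body of the called procedure is visible together with its caller.

```c
#include <stddef.h>

/* ---- util.c (hypothetical module) ---- */
int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* ---- caller.c (hypothetical module) ---- */
int clamp_to_byte(int v);   /* defined in the other module */

long sum_clamped(const int *values, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += clamp_to_byte(values[i]);   /* cross-module call */
    return sum;
}
```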

D.6 Optimization Options

Table D-1 summarizes the functions of each of the -O options to the cc -oldc command.

Table D-1: Compiler Optimization Options

Option	Result
`-O3`	The `uld` and `umerge` phases process the output from the compilation phase of the compiler, which produces symbol table information and the program text in an internal format called ucode. The `uld` phase combines all the ucode files and symbol tables, and passes control to `umerge`. The `umerge` phase reorders the ucode for optimal processing by `uopt`. Upon completion, `umerge` passes control to `uopt`, which performs global optimizations on the program.
`-O2`	The `uld` and `umerge` phases are bypassed and only the global optimizer (`uopt`) phase executes. It performs optimization only within the bounds of individual compilation units.
`-O1`	The `uld`, `umerge`, and `uopt` phases are bypassed. However, the code generator and the assembler perform basic optimizations in a more limited scope.
`-O0`	The `uld`, `umerge`, and `uopt` phases are bypassed, and the assembler bypasses certain optimizations that it normally performs.

D.7 Full Optimization (-O3)

The following examples assume that the program prog1 consists of three files: a.c, b.c, and c.c.

To perform procedure-merging optimizations (-O3) on all three files, enter the following command:

% cc -oldc -O3 -o prog1 a.c b.c c.c

If you normally use the -c option to compile each source file into an object file (.o), follow these steps:

Compile each file separately using the -j option by entering the following commands:

% cc -oldc -j a.c
% cc -oldc -j b.c
% cc -oldc -j c.c

The -j option causes the compiler driver to produce a .u file. None of the remaining compiler phases are executed. The .u file contains the standard output of the first pass of the compiler (which is referred to as the front end of the compiler). The file is written in ucode, an internal language used by the compiler.

Enter the following command to perform optimization and complete the compilation process:

% cc -oldc -O3 -o prog1 a.u b.u c.u

D.8 Optimizing Large Procedures

To ensure that all procedures are optimized regardless of size, specify the -Olimit option at compilation time.

Because compilation time increases as the square of the procedure size (doubling the size of a procedure roughly quadruples the time needed to optimize it), the compiler system enforces an upper limit on the size of a procedure that can be optimized. This limit was set for the convenience of users who place a higher priority on compilation turnaround time than on optimizing an entire procedure. The -Olimit option removes the upper limit and allows users who do not mind a long compilation to fully optimize their procedures.

D.9 Optimizing Frequently Used Modules

You may want to optimize modules that are frequently called from other programs to reduce the compilation and optimization time required for programs calling these modules.

In the examples that follow, b.c and c.c represent two frequently used modules to be optimized, retaining all information necessary to link them with future programs; future.c represents one such program.

The following steps show how to optimize frequently called modules:

Compile b.c and c.c separately by entering the following commands:

% cc -oldc -j b.c
% cc -oldc -j c.c

The -j option causes the front end, or first pass, of the compiler to produce two ucode files, b.u and c.u.

Use an editor to create a file containing the external symbols in b.c and c.c to which future.c will refer. The symbolic names must be separated by at least one space. Consider the following skeletal contents of b.c and c.c:


 

b.c                        c.c

proc1()                    x()
{                          {
    .                          .
    .                          .
}                          }

proc2()                    help()
{                          {
    .                          .
    .                          .
}                          }

proc3()                    struct
{                          {
    .                          .
    .                          .
}                          } ddata;

struct                     y()
{                          {
    .                          .
    .                          .
} work;                    }

In this example, future.c calls or references only proc1, proc2, x, ddata, and y in the two modules (b.c and c.c). Thus, a file (named extern for this example) must be created containing the following symbolic names:

proc1 proc2 x ddata y

The structure work and the procedures help and proc3 are used internally only by b.c and c.c, and thus are not included in extern.
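The distinction between preserved and internal symbols can be made concrete with a filled-in version of the skeleton. In the following sketch, the procedure bodies, argument types, structure members, and the function future_total are all invented for illustration; only the symbol names come from the example above. The three modules are shown in one listing for brevity.

```c
/* ---- b.c (bodies are hypothetical) ---- */
struct { int scratch; } work;       /* internal: not listed in extern */

int proc3(int v)                    /* internal: not listed in extern */
{
    work.scratch = v;
    return v + 1;
}

int proc1(int v) { return proc3(v) * 2; }   /* listed in extern */
int proc2(int v) { return proc3(v) - 1; }   /* listed in extern */

/* ---- c.c (bodies are hypothetical) ---- */
struct { int count; } ddata;        /* listed in extern */

int help(int v) { return v * v; }   /* internal: not listed in extern */

int x(int v)                        /* listed in extern */
{
    ddata.count++;
    return help(v);
}

int y(void) { return ddata.count; } /* listed in extern */

/* ---- future.c (hypothetical caller) ---- */
/* Refers only to proc1, proc2, x, and y directly; proc3, help, and
 * work are reached only from inside b.c and c.c. */
int future_total(int v)
{
    int total = proc1(v);

    total += proc2(v);
    total += x(v);
    total += y();
    return total;
}
```

If future_total were changed to call proc3 directly, proc3 would also have to be added to the extern file.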

If you omit an external symbolic name, an error message is generated (see step 4).

Optimize the b.u and c.u modules using the extern file as follows:

% cc -oldc -O3 -kp extern b.u c.u -o keep.o

The -kp option designates that the -p linker option is to be passed to the ucode loader.

Create a ucode file and an optimized object code file (test_opt) for future.c, as follows:

% cc -oldc -j future.c
% cc -oldc -O3 future.u keep.o -o test_opt

The following message may appear. It means that the code in future.c is using a symbol from the code in b.c or c.c that was not specified in the file extern.
```
proc3: multiply defined hidden external (should have been preserved)
```
If the preceding message appears, include proc3 in the file extern and recompile as follows:

% cc -oldc -O3 -kp extern b.u c.u -o keep.o
% cc -oldc -O3 future.u keep.o -o test_opt

D.10 Building a ucode Object Library

Building a ucode object library is similar to building a COFF object library. First, compile the source files into ucode object files using the -j option:

% cc -oldc -j a.c
% cc -oldc -j b.c
% cc -oldc -j c.c

Then, enter the following command to build a ucode library (libtest_opt.b) containing object files for a.c, b.c, and c.c:

% ar -crs libtest_opt.b a.u b.u c.u

The names of ucode libraries should have the suffix .b.

D.11 Using ucode Object Libraries

Using ucode object libraries is similar to using COFF object libraries. To load from a ucode library, specify the -klx option to the compiler driver or the ucode loader (in the example that follows, -kltest_opt selects the library libtest_opt.b). To load from the ucode library file created in the previous example, enter the following command:

% cc -oldc -O3 file1.u file2.u -kltest_opt -o output

Libraries are searched as they are encountered on the command line, so the order in which they are specified on the command line is important. If a library is made from both assembly and high-level language routines, the ucode object library contains code only for the high-level language routines, not all of the routines as the COFF object library does. In this case, to ensure that all modules are loaded from the proper library, you must specify both the ucode object library and the COFF object library to the ucode loader.

If the compiler driver is to perform both a ucode load step and a final load step, the object file created after the ucode load step is placed in the position of the first ucode file specified or created on the command line in the final load step.