8 Profiling Programs to Improve Performance

Profiling is a method of identifying sections of code that consume large portions of execution time. In a typical program, most execution time is spent in relatively few sections of code. To improve performance, the greatest gains result from improving coding efficiency in time-intensive sections.

This chapter discusses the following topics:

Using the prof program
Using the gprof program
Using the pixie and hiprof Atom tools
Using the uprofile and kprofile tools
Selecting profiling information to display
Using feedback files
Using profiling environment variables
Using monitor routines
Profiling multithreaded applications

8.1 Profiling Methods

Profiling methods include:

Program counter (PC) sampling, a technique that periodically interrupts your program and logs the value of the PC. The prof and gprof tools use PC sampling to produce a statistical sample showing which portions of code consume the most time. The gprof tool also produces call graphs, which show the relationship of calling and called routines.
Basic block counting, a technique that inserts profiling code at key points in your program. It produces a count of the number of times each instruction executes.
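As a rough illustration of what PC sampling yields, the following C sketch attributes a list of sampled program counter values to procedures by address range. The `proc_range` structure, the `tally_pc_samples` function, and any addresses passed to it are hypothetical and are not part of prof or gprof; the real tools obtain this information from the profile data file and the program's symbol table.

```c
#include <stddef.h>

/* Hypothetical description of one procedure's address range: [start, end). */
struct proc_range {
    const char   *name;
    unsigned long start;
    unsigned long end;
};

/*
 * Attribute each sampled PC value to the procedure whose range contains it.
 * hits[j] receives the number of samples that fell in procs[j].
 * Returns the number of samples that matched no procedure.
 */
size_t tally_pc_samples(const unsigned long *pcs, size_t npcs,
                        const struct proc_range *procs, size_t nprocs,
                        size_t *hits)
{
    size_t i, j, unmatched = 0;

    for (j = 0; j < nprocs; j++)
        hits[j] = 0;

    for (i = 0; i < npcs; i++) {
        int found = 0;

        for (j = 0; j < nprocs; j++) {
            if (pcs[i] >= procs[j].start && pcs[i] < procs[j].end) {
                hits[j]++;
                found = 1;
                break;
            }
        }
        if (!found)
            unmatched++;
    }
    return unmatched;
}
```

Procedures with the largest entries in `hits` are the ones where the sampled program spent the most time. Because the samples are taken periodically, the counts are statistical estimates rather than exact measurements.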

To select an appropriate profiling method for an application, you must take into consideration the following factors:

The statistics that you want to collect and examine (for example, CPU usage, call counts, call cost, memory usage, and I/O operations)
The level at which you need to collect these statistics (for example, at a procedure level or at an instruction level).
Whether you must profile the shared libraries used by the application as well as its executable.
The method that you use to collect the profiling data. Certain collection methods require that you compile and/or link the application's sources in a special way. Others allow you to run a utility that inserts instrumentation code into an existing program. Still others retrieve information from the CPU's performance counters while the uninstrumented program is running.
The tool that you use to display the profiling data. Depending on the information that you need, you can choose from three tools that display previously collected profiling information. Each tool supports multiple data collection methods.

The profiling data display tools, and their respective data collection methods, include the following:

prof

Prints a profile of statistics per procedure.

The prof tool supports the following data collection methods:

Compiling or linking with the -p flag
The -p flag supports the profiling of shared libraries, but requires you to at least relink the program. It collects only CPU statistics using PC sampling.
Using the uprofile tool
The uprofile tool profiles user code. It does not support the profiling of shared libraries. It does not require you to relink the program and collects either CPU statistics or other information.
Using the kprofile tool
The kprofile tool profiles the running operating system kernel. It does not require you to relink the program and collects either CPU statistics or other information.

prof -pixie

Prints a profile showing the number of times each procedure, source line, or instruction is executed. The prof -pixie tool supports the following basic block counting profiling data collection method:

Using the pixie Atom tool (that is, the atom -tool pixie command) to instrument the program's basic blocks.
The pixie Atom tool supports the profiling of shared libraries and does not require you to relink the program. It supports the prof tool's instruction-level profiling and true cycle-count estimation.

gprof

Produces call-graph profile data showing the effects of calling routines on called routines as well as other information.

The gprof tool supports the following data collection methods:

Compiling with the -pg flag
The -pg flag does not allow the profiling of shared libraries. It requires that you recompile the program's sources and uses an apportioned call cost method to determine a given procedure's cost to its callers.
Using the hiprof Atom tool (that is, the atom -tool hiprof command) to instrument the program
The hiprof Atom tool supports the profiling of shared libraries and does not require you to recompile or relink. To determine a given procedure's cost to its callers, it supports both the apportioned call cost method and the measured call cost method.

You can also use the monitor routines to perform PC-sampling on a specified address range in a program. For more information on using monitor routines, see Section 8.13 and monitor(3).

8.2 Profiling Tools Overview

Table 8-1 provides a concise overview of the profiling tools available in the Digital UNIX operating system.

Table 8-1: Profiling Tools

Tool	Use
PC-sampling/ `prof`	Link application with `-p`; analyze results with `prof`; see `prof`(1) and `monitor`(3).
Call-arcs/ `gprof`	Compile and link with `-pg`; analyze results with `gprof`; see `gprof`(1) and `monitor`(3).
`pixstats`	Additional postprocessor for pixified program output; see `pixstats`(1).
`uprofile`/ `kprofile`	Run application under `uprofile` or `kprofile`; requires `pfm` driver to be installed; analyze results with `prof`; see `uprofile`(1), `kprofile`(1), and `pfm`(7).
Atom toolkit	Programmable debug/performance analysis tool. Example tools are contained in `/usr/lib/cmplrs/atom/examples`; see `atom`(1) and other Atom reference pages for programming interface.
`pixie`	Atom-based basic block profiler; analyze results with `prof`; see `pixie`(5).
`hiprof`	Atom-based call-arc analyzer; analyze results with `gprof`; see `hiprof`(5).
`third`	Atom-based memory error/leak detection tool, Third Degree; generates text output. See `third`(5).

All profiling tools work on call-shared and nonshared applications.

8.2.1 PC-Sampling

Statistical PC-sampling is useful for identifying the procedures in a program that consume the most CPU time. It supports both threads and shared libraries.

Interface summary:

% cc -p *.o -o program    # Link with libprof1.a
% program                 # Run program to collect data
% prof program            # Process the mon.out file

8.2.2 gprof

The gprof tool provides procedure call information coupled with statistical PC-sampling. This is useful for determining which routines are called most frequently and from where. The gprof tool also gives a flat profile for CPU-usage on the routines. It supports threads and call-shared programs, but does not support shared libraries.

Using the gprof tool, you can retrieve information from libc.a and libm.a because these two libraries are compiled with the -pg flag. Other Digital-supplied libraries are not compiled with -pg, so calling information on these other system libraries is not available.

Interface summary:

% cc -pg *.c -o program   # Compile and link with -pg
% program                 # Run program to collect data
% gprof program           # Process the gmon.out file

8.2.3 uprofile and kprofile

The uprofile and kprofile tools use the performance counters on the Alpha chip. They do not collect information on shared libraries. By default, both tools collect cycles for the program. The performance data produced by these tools is processed with the prof command. See uprofile(1) and kprofile(1) for more information.

8.2.4 Atom Toolkit

The Atom toolkit consists of a programmable instrumentation tool and several packaged tools. Examples are included in the /usr/lib/cmplrs/atom/examples directory that demonstrate how to develop instrumentation and analysis code. The instrumentation part of the tool instructs Atom on where to insert calls to analysis routines in the program. When the program is run, the analysis routines are entered and data collection is performed as prescribed by the Atom tool specified on the atom command.

Atom does not work on programs built with the -om flag.

Interface summary:

% atom -tool toolname program
% program.tool

Postprocessing is tool-dependent. See Chapter 9 for details on Atom.

8.2.5 pixie Atom tool

The Atom-based pixie is a basic block profiler that supports shared libraries and threaded applications.

Interface summary:

% atom -tool pixie [-env threads] program
% program.pixie[.threads]
% prof -pixie program

8.2.6 hiprof Atom tool

The hiprof Atom tool collects call-arc information on a program. By default, it operates like the gprof support provided by the -pg flag, but has flag-selectable options that are more powerful. The hiprof Atom tool supports shared libraries and threaded applications.

Interface summary:

% atom -tool hiprof [-env threads] program
% program.hiprof[.threads]
% gprof program program.hiout

8.2.7 Third Degree

Third Degree is a memory-leak and memory-overwrite detection tool, also based on Atom. Third Degree generates text output to a file called program.3log. The log contains the diagnostics that Third Degree detected (for example, reads of uninitialized heap or stack, memory overwrites, and memory leaks).

Interface summary:

% atom -tool third [-env threads] program
% program.third[.threads]
% cat program.3log

8.3 Profiling Sample Program

The examples in the remainder of this chapter refer to the sample program, profsample.c, shown in Example 8-1.

Example 8-1: Profiling Sample Program

#include <math.h>
#include <stdio.h>

#define LEN     100

void mult_by_scalar(double ary[], int len, double num);
void add_vector(double arya[], double aryb[], int len);
double value;
void printit(double value);

int main(void)
{
    double ary1[LEN];
    double ary2[LEN];
    int i;

    for (i=0;  i<LEN;  i++) {
        ary1[i] = 0.0;
        ary2[i] = sqrt((double)i);
    }
    mult_by_scalar(ary1, LEN, 3.14159);
    mult_by_scalar(ary2, LEN, 2.71828);
    for (i=0;  i<20;  i++)
        add_vector(ary1, ary2, LEN);
}

void mult_by_scalar(double ary[], int len, double num)
{
    int i;

    for (i=0;  i<len;  i++)
    {
        ary[i] *= num;
        value = ary[i];
        printit(value);
    }
}

void add_vector(double arya[], double aryb[], int len)
{
    int i;

    for (i=0;  i<len;  i++)
    {
        arya[i] += aryb[i];
        value = arya[i];
        printit(value);
    }
}

void printit(double value)
{
    printf("Value = %f\n", value);
}

8.4 Using prof to Produce Program Counter Sampling Data

To use prof to obtain PC sampling data on a program, follow these steps:

Compile and link (or just link) using the -p option, as follows:
% cc -c profsample.c
% cc -p -o profsample profsample.o -lm

You must specify the -p profiling option during the link step to obtain PC sampling information. If you have an existing application, you will not need to recompile to profile the executable program; simply relink the program using the -p option with the cc command.
If you are building an application for the first time, you can compile and link in the same step. In the preceding example, the -lm option ensures that libm.{a,so} is used to resolve symbols that refer to math library functions.
You might also consider compiling with one of the optimization flags to help improve the efficiency of your code, compiling with a debug flag to provide more symbolic information for the profile report, or compiling with both types of flags.
If you are profiling a multithreaded application, use the -threads flag with the cc command. For more information on profiling multithreaded applications, see Section 8.14.
Execute the profiled program:
% profsample

You can run the program several times, altering the input data (if any) to create multiple profile data files.
During execution, profiling data is saved in a profile data file. The default name for the profile data file is mon.out, unless you have set the environment variable PROFDIR. For more information on using PROFDIR, see Section 8.12.1.
Run the profile formatting program prof, which extracts information from one or more profile data files and produces a tabular report:
% prof profsample mon.out

Example 8-2 shows output produced by the prof command on the profsample.c program.

Example 8-2: Profiler Listing for PC Sampling

Profile listing generated Thu May 26 13:36:14 1994 with:
   prof profsample mon.out

--------------------------------------------------------------
*  -p[rocedures] using pc-sampling; sorted in descending     *
*  order by total time spent in each procedure;              *
*  unexecuted procedures excluded                            *
--------------------------------------------------------------

Each sample covers 4.00 byte(s) for 14% of 0.0068 seconds

%time   seconds  cum %   cum sec  procedure (file)

 42.9    0.0029   42.9      0.00 printit (profsample.c)
 42.9    0.0029   85.7      0.01 add_vector (profsample.c)  [1]
 14.3    0.0010  100.0      0.01 mult_by_scalar (profsample.c)

The line marked [1] in this sample output presents the following information:
- 42.9 percent of execution time was spent in add_vector.
- 85.7 percent of total execution time was spent cumulatively in the printit and add_vector routines.
- The name of the source file for mult_by_scalar is profsample.c.
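The %time and cum % columns in Example 8-2 can be reproduced from per-procedure sample counts. The following C sketch shows the arithmetic. The function name `sample_percentages` is hypothetical, and the sample counts of 3, 3, and 1 mentioned in the note below are an assumption inferred from the listing (each sample is reported as 14% of the total, which suggests about seven samples); prof does not print them directly.

```c
#include <stddef.h>

/*
 * Fill pct[i] with the share of samples in procedure i and cum[i] with the
 * running total, both in percent. hits[] is assumed to be sorted in
 * descending order, as in the prof listing.
 */
void sample_percentages(const unsigned int *hits, size_t n,
                        double *pct, double *cum)
{
    unsigned int total = 0;
    unsigned int running = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += hits[i];

    for (i = 0; i < n; i++) {
        running += hits[i];
        pct[i] = total ? 100.0 * hits[i] / total : 0.0;
        cum[i] = total ? 100.0 * running / total : 0.0;
    }
}
```

With assumed counts of 3, 3, and 1, the function yields approximately 42.9, 42.9, and 14.3 percent, with cumulative values of approximately 42.9, 85.7, and 100.0 percent, which agrees with the listing.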

Because the prof program works by periodically sampling the program counter, you might see different output when you profile the same program multiple times. For example, a different profiling run of the sample program produced the following output:

Profile listing generated Thu May 26 13:34:00 1994 with:
   prof -procedures profsample mon.out

--------------------------------------------------------------
*  -p[rocedures] using pc-sampling; sorted in descending     *
*  order by total time spent in each procedure;              *
*  unexecuted procedures excluded                            *
--------------------------------------------------------------

Each sample covers 4.00 byte(s) for 17% of 0.0059 seconds

%time     seconds  cum %   cum sec  procedure (file)

 66.7      0.0039   66.7      0.00 add_vector (profsample.c)
 33.3      0.0020  100.0      0.01 printit (profsample.c)

8.5 Using gprof to Display Call Graph Information

To determine the manner in which routines call, or are called by, other routines, use the gprof profiling tool.

The gprof tool postprocesses both hiprof output and -pg output.

To use this tool, follow these steps:

Use the hiprof Atom tool to produce an instrumented version of the program:
% atom -tool hiprof profsample
Execute the instrumented version of profsample:
% profsample.hiprof
Examine the profiling data as follows:
% gprof profsample profsample.hiout

During execution, profiling data is saved in the data file profsample.hiout, unless you have set the -dirname flag in the HIPROF_ARGS environment variable or on the command line.

Alternatively, you can use the following procedure to collect profiling data for the gprof tool:

Compile and link using the -pg option, as follows:
% cc -pg -c profsample.c
% cc -pg -o profsample profsample.o -lm

You must specify the -pg flag with the cc command during both the compile and link steps to obtain call graph information.
Execute the program:
% profsample

When this method is used, profiling data is saved during execution in the data file gmon.out, unless you have set the PROFDIR environment variable. For more information on using this variable, see Section 8.12.1.
Run the formatting program gprof, which extracts information from the data file:
% gprof profsample gmon.out

The output produced by the gprof utility comprises three sections:

Call graph profile
Timing profile, similar to the profile produced by prof
Index

You can control gprof profiling on a per-file basis by using the -no_pg flag with the cc command. When you use this flag, gprof profiling is disabled for all objects that follow the flag on the command line. You cannot use the -no_pg flag with the -r and -shared flags to the ld command.

Example 8-3 shows output for gprof profiling of the sample program. The -b flag was used with gprof to suppress printing of the description of each output field. The descriptions are valuable, but they are lengthy and were left out due to space considerations. To see these descriptions, follow the steps to produce gprof output and write the output to a file or pipe the output through the more utility.

In the call graph profile section, each routine in the program has its own subsection that is contained within dashed lines and identified by the index number in the first column. Note that for the purpose of this example output, the three sections have been separated by rows of asterisks that do not appear in the output produced by gprof. Each row of asterisks includes the name of the section. For more information on gprof flags, see the gprof(1) reference page.

Example 8-3: Sample gprof Output

*********************** call graph profile *******************

granularity: each sample hit covers 4 byte(s) for 10.00%
of 0.01 seconds

                                called/total       parents
index %time  self descendents   called+self   name          index
                                called/total       children

                                                   <spontaneous>
[1]   100.0  0.00        0.01                 main [1]
             0.00        0.00    20/20          add_vector [2]
             0.00        0.00     2/2           mult_by_scalar [4]

-----------------------------------------------

             0.00        0.00     20/20         main [1]      [1]
[2]    75.5  0.00        0.00     20          add_vector [2]  [2]
             0.00        0.00   2000/2200       printit [3]   [3]

-----------------------------------------------

             0.00        0.00     200/2200      mult_by_scalar [4]
             0.00        0.00   2000/2200       add_vector [2]
[3]    50.0  0.00        0.00   2200          printit [3]

-----------------------------------------------

             0.00        0.00      2/2           main [1]
[4]     4.5  0.00        0.00      2           mult_by_scalar [4]
             0.00        0.00    200/2200        printit [3]

-----------------------------------------------

*********************** timing profile section ***************

granularity: each sample hit covers 4 byte(s) for 10.00%
of 0.01 seconds

  %   cumulative   self          self     total
time  seconds  seconds  calls  ms/call  ms/call  name
50.0    0.00    0.00   2200     0.00     0.00   printit [3]
30.0    0.01    0.00     20     0.15     0.37   add_vector [2]
20.0    0.01    0.00                            main [1]
 0.0    0.01    0.00      2     0.00     0.22   mult_by_scalar[4]

*********************** index section ************************
Index by function name

   [2] add_vector            [4] mult_by_scalar
   [1] main                  [3] printit

[1] This line describes the relationship of the main routine to the add_vector routine. Because main is listed above the add_vector routine in the final column of this section, main is identified as the parent of add_vector. The fraction 20/20 indicates that of the 20 calls to add_vector (the denominator of the fraction), all 20 came from main (the numerator of the fraction).
[2] This line describes the add_vector routine, which is the subject of this portion of the call graph profile because it is the leftmost routine in the rightmost column of this section. The index number [2] in the first column corresponds to the index number [2] in the index section at the end of the output. The 75.5% in the second column reports the total amount of time in the sample that is accounted for by the add_vector routine and its descendent, in this case the printit routine. The 20 in the called column indicates the total number of times that the add_vector routine is called.
[3] This line describes the relationship of the printit routine to the add_vector routine. Because the printit routine is below the add_vector routine in this section, printit is identified as the child of add_vector. The fraction 2000/2200 indicates that of the total of 2200 calls to printit, 2000 of these calls came from add_vector.
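The called/total fractions are what an apportioned call cost method works from: a child's total time is divided among its parents in proportion to the number of calls each parent made. The following C sketch shows that proportion. The function name `apportioned_cost` is hypothetical and the code is an illustration of the idea, not gprof's implementation.

```c
/*
 * Apportion a child's total time to one parent in proportion to the
 * number of calls that came from that parent.
 */
double apportioned_cost(double child_total_time,
                        unsigned long calls_from_parent,
                        unsigned long total_calls)
{
    if (total_calls == 0)
        return 0.0;
    return child_total_time * (double)calls_from_parent / (double)total_calls;
}
```

Using the call counts from Example 8-3, add_vector would be charged 2000/2200 of the time spent in printit and mult_by_scalar the remaining 200/2200, regardless of how long the individual calls actually took. A measured call cost method, such as the one that hiprof also supports, avoids that approximation.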

8.6 Using pixie for Basic Block Counting

A basic block is a sequence of instructions with one entry point and one exit point. The pixie Atom tool provides execution counts for the basic blocks of a program. With prof, the execution counts can be viewed at the instruction level.
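To make the idea of basic block counts concrete, the following C sketch instruments a loop by hand, with one counter per block. It is only an analogy: pixie instruments the executable's machine code rather than the source, and the block names, the `bb_count` array, and the functions shown here are hypothetical.

```c
enum { BB_ENTRY, BB_LOOP_BODY, BB_EXIT, BB_NBLOCKS };

/* One counter per basic block; a stand-in for the counts pixie records. */
static unsigned long bb_count[BB_NBLOCKS];

void counted_add_vector(double arya[], const double aryb[], int len)
{
    int i;

    bb_count[BB_ENTRY]++;            /* straight-line code before the loop */
    for (i = 0; i < len; i++) {
        bb_count[BB_LOOP_BODY]++;    /* loop body: one entry, one exit */
        arya[i] += aryb[i];
    }
    bb_count[BB_EXIT]++;             /* code after the loop */
}

/* Return the number of times the given block has executed. */
unsigned long block_count(int block)
{
    return bb_count[block];
}
```

Calling `counted_add_vector` 20 times with a length of 100, as main does with add_vector in Example 8-1, would leave the loop-body counter at 2000 and the entry and exit counters at 20. Unlike PC sampling, these counts are exact and repeatable for the same input.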

To obtain data for basic block counting, follow these steps:

Compile and link. For example:
% cc -c profsample.c
% cc -o profsample profsample.o -lm
Run the pixie Atom tool. You do not have to specify a name for the output because, by default, pixie produces an output file with the same name as the original executable file, but with .pixie appended. For example, the following command causes pixie to create two files, profsample.pixie and profsample.Addrs:
% atom -tool pixie profsample

The profsample.pixie file is equivalent to profsample but contains additional code that counts the execution of each basic block. To create an output file with a name other than pname.pixie, use the -o flag followed by the name you assign to the output file.
The profsample.Addrs file contains the address of each of the basic blocks. For more information, see pixie(5).
Execute the profsample.pixie file:
% profsample.pixie

This command generates the file profsample.Counts, which contains the basic block counts. Each time you execute the profsample.pixie file, you create a new profsample.Counts file.
Run the profile formatting program prof, with the -pixie flag over the profsample executable file:
% prof -pixie profsample

This command extracts information from profsample.Addrs and profsample.Counts and displays information in an easily readable format. Note that you do not need to specify the .Addrs and .Counts file suffixes because prof searches by default for files containing them.

You can also run the pixstats program on the executable file profsample to generate a detailed report on opcode frequencies, interlocks, a miniprofile, and more. For more information, see pixstats(1).

Note
The pixie profiling tool provided in the current version of the Digital UNIX operating system is the pixie Atom tool. If you use the syntax provided in earlier versions of the operating system to invoke pixie, a script transforms the call into a call to the pixie Atom tool. The previous version of the pixie tool can be found at /usr/opt/obsolete/usr/bin/pixie.

8.7 Selecting Profiling Information to Display

Depending on the size of the application and the type of profiling you request, prof may generate a very large amount of output. However, you are often only interested in profiling data about a particular portion of your application.

8.7.1 Limiting Profiling Display to Specific Procedures

The prof program provides the following flags to display information selectively by procedure:

-only
-exclude
-Only
-Exclude
-totals

The -only option tells prof to print only profiling information for a particular procedure. You can specify the -only option multiple times on the command line. For example, the following command displays profiling information for procedures mult_by_scalar and add_vector from the sample program:

% prof -only mult_by_scalar -only add_vector profsample

The -exclude option tells prof to print profiling information for all procedures except the specified procedure. You can use multiple -exclude flags on the command line.

The following command displays profiling information for all procedures except add_vector:

% prof -exclude add_vector profsample

Do not use the -only and -exclude flags on the same command line.

Many of the prof utility's profiling flags print output as percentages, for example, the percentage of total execution time attributed to a particular procedure.

By default, the -only and -exclude flags cause prof to calculate percentages based on all of the procedures in the application even if they were omitted from the listing. You can change this behavior with the -Only and -Exclude flags. These flags work the same as -only and -exclude, but cause prof to calculate percentages based only on those procedures that appear in the listing. For example, the following command omits the add_vector procedure from both the listing and from percentage calculations:

% prof -Exclude add_vector profsample
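The difference between the lowercase and uppercase flags is only a matter of which denominator is used. The following C sketch shows both calculations. The function `listed_percentage` and its parameters are hypothetical, and the sample counts used in the note below (3 for printit, 3 for add_vector, 1 for mult_by_scalar) are assumed values chosen to be consistent with Example 8-2.

```c
#include <stddef.h>

/*
 * Percentage reported for procedure `which`.
 *   hits[]    sample counts for every procedure in the program
 *   listed[]  nonzero if the procedure appears in the listing
 *   rebase    0: divide by all samples (like -only and -exclude)
 *             1: divide by listed samples only (like -Only and -Exclude)
 */
double listed_percentage(const unsigned int *hits, const int *listed,
                         size_t n, size_t which, int rebase)
{
    unsigned int denom = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (!rebase || listed[i])
            denom += hits[i];

    return denom ? 100.0 * hits[which] / denom : 0.0;
}
```

With the assumed counts and add_vector excluded, printit would be reported at about 42.9 percent (3 of 7 samples) in the -exclude style and at 75 percent (3 of 4 samples) in the -Exclude style.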

The -totals flag, used with the -procedures and -invocations listings, prints cumulative statistics for the entire object file instead of for each procedure in the object.

8.7.2 Including Shared Libraries in the Profiling Information

The -all, -incobj, and -excobj flags allow you to display profiling information for shared libraries used by the program:

The -all flag causes the profiles for all shared libraries (if any) described in the data file(s) to be displayed, in addition to the profile for the executable.
The -incobj flag causes the profile for the named shared library to be printed, in addition to the profile for the executable.
The -excobj flag causes the profile for the named executable or shared library not to be printed.

8.7.3 Using pixie to Display Profiling Information for Each Source Line

The -heavy and -lines flags cause prof to display the total number of machine cycles executed by each source line in your application. Both of these flags require you to use basic block counting (the -pixie option); they do not work in PC-sampling mode.

The -heavy option prints an entry for every source line that was executed by your application. Each entry shows the total number of machine cycles executed by that line. Entries are sorted from the line with the most machine cycles to the line with the least machine cycles. Because this option often prints a huge number of entries, you might want to use one of the -quit, -only, or -exclude flags to reduce output to a manageable size.

Example 8-4 shows output generated by the following command:

% prof -pixie -heavy -only add_vector -only mult_by_scalar \
    -only main profsample

For example, you can see in Example 8-4 that line 47 of profsample.c in the procedure add_vector() accounts for over 12 percent of the application's total execution time. The listing also shows the size in bytes of each source line.

Example 8-4: Prof Output by Source Line with -heavy Flag

Profile listing generated Fri May 27 14:09:10 1994 with:
  prof -pixie -heavy -only add_vector -only mult_by_scalar
  -only main profsample

------------------------------------------------------------------
*  -h[eavy] using basic-block counts;                            *
*  sorted in descending order by the number of cycles executed   *
*  in each                                                       *
*  line; unexecuted lines are excluded                           *
------------------------------------------------------------------

procedure (file)              line bytes     cycles      %  cum %

add_vector (profsample.c)       48    44      22000  23.26  23.26
add_vector (profsample.c)       46    40      20000  21.15  44.41
add_vector (profsample.c)       47    24      12000  12.69  57.10
mult_by_scalar (profsample.c)   36    44       2200   2.33  59.43
main (profsample.c)             20    60       1500   1.59  61.02
mult_by_scalar (profsample.c)   34    28       1400   1.48  62.50
mult_by_scalar (profsample.c)   35    24       1200   1.27  63.77
main (profsample.c)             19    12        300   0.32  64.08
main (profsample.c)             25    48        240   0.25  64.34
add_vector (profsample.c)       41    28        140   0.15  64.48
add_vector (profsample.c)       44    12         60   0.06  64.55
add_vector (profsample.c)       50    12         60   0.06  64.61
mult_by_scalar (profsample.c)   29    28         14   0.01  64.63
main (profsample.c)             23    32          8   0.01  64.63
main (profsample.c)             22    32          8   0.01  64.64
mult_by_scalar (profsample.c)   38    12          6   0.01  64.65
mult_by_scalar (profsample.c)   32    12          6   0.01  64.66
main (profsample.c)             26    16          4   0.00  64.66
main (profsample.c)             13    16          4   0.00  64.66
main (profsample.c)             18     8          2   0.00  64.67
main (profsample.c)             24     8          2   0.00  64.67

The -lines option is similar to -heavy, but it sorts the output differently. This option prints the lines for each procedure in the order that they occur in the source file. Even lines that never executed are printed. The procedures themselves are sorted from those procedures that execute the most machine cycles to those that execute the least.
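The two orderings can be sketched with the standard `qsort` routine. In the following C fragment, the `line_entry` structure and the function names are hypothetical. For brevity, the source-order sort handles the lines of a single procedure only; it does not reproduce the way -lines also ranks the procedures themselves by cycles.

```c
#include <stdlib.h>

struct line_entry {
    const char   *proc;    /* procedure name */
    int           line;    /* source line number */
    unsigned long cycles;  /* cycles attributed to the line */
};

/* -heavy style: descending by cycles. */
static int by_cycles_desc(const void *a, const void *b)
{
    const struct line_entry *x = a;
    const struct line_entry *y = b;

    if (x->cycles != y->cycles)
        return x->cycles < y->cycles ? 1 : -1;
    return 0;
}

/* Source order within one procedure: ascending by line number. */
static int by_line_asc(const void *a, const void *b)
{
    const struct line_entry *x = a;
    const struct line_entry *y = b;

    return (x->line > y->line) - (x->line < y->line);
}

void sort_heavy(struct line_entry *e, size_t n)
{
    qsort(e, n, sizeof *e, by_cycles_desc);
}

void sort_source_order(struct line_entry *e, size_t n)
{
    qsort(e, n, sizeof *e, by_line_asc);
}
```

Applied to the add_vector entries for lines 46, 47, 48, and 41 from Example 8-4, `sort_heavy` orders them 48, 46, 47, 41, and `sort_source_order` orders them 41, 46, 47, 48.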

Example 8-5 shows the same information as Example 8-4, but in a different format as generated by the following command:

% prof -pixie -lines -only add_vector -only mult_by_scalar \
    -only main profsample

Example 8-5: Prof Output by Source Line with -lines Flag

Profile listing generated Fri May 27 14:07:28 1994 with:
   prof -pixie -lines -only add_vector -only mult_by_scalar
   -only main profsample

------------------------------------------------------------------
*  -l[ines] using basic-block counts;                            *
*  grouped by procedure, sorted by cycles executed per procedure;*
*  '?' means that line number information is not available.      *
------------------------------------------------------------------

procedure (file)              line bytes     cycles      %  cum %

add_vector (profsample.c)       41    28        140   0.15   0.15
                                44    12         60   0.06   0.21
                                46    40      20000  21.15  21.36
                                47    24      12000  12.69  34.05
                                48    44      22000  23.26  57.32
                                50    12         60   0.06  57.38
mult_by_scalar (profsample.c)   29    28         14   0.01  57.39
                                32    12          6   0.01  57.40
                                34    28       1400   1.48  58.88
                                35    24       1200   1.27  60.15
                                36    44       2200   2.33  62.48
                                38    12          6   0.01  62.48
main (profsample.c)             13    16          4   0.00  62.49
                                18     8          2   0.00  62.49
                                19    12        300   0.32  62.81
                                20    60       1500   1.59  64.39
                                22    32          8   0.01  64.40
                                23    32          8   0.01  64.41
                                24     8          2   0.00  64.41
                                25    48        240   0.25  64.66
                                26    16          4   0.00  64.67

8.7.4 Limiting Profiling Display by Line

The -quit option reduces the amount of profiling output displayed. The -quit option affects the output from the -procedures, -heavy, and -lines profiling modes.

The -quit option provides three versions:

-quit n
The n refers to an integer. All lines after the nth line are truncated.
-quit n%
The n is an integer followed by a percent sign (%). All lines after the line containing n% calls in the %calls column of the display are truncated.
-quit ncum%
The ncum% refers to an integer n followed by the characters cum (for cumulative) and a percent sign (%). All lines after the line containing ncum% calls in the cum% column of the display are truncated.

If you specify several modes on the same command line, the -quit option affects the output from each mode. For example, the -quit option in the following command reduces the output from both the -procedures and -heavy modes:

% prof -pixie -procedures -heavy -quit 20 profsample

This command prints only the 20 most time-consuming procedures and the 20 most time-consuming source lines. The -quit n option has no effect on the -lines profiling mode.

The -quit n% option restricts the output to those entries that account for at least n% of the total. Depending on the profiling mode, the total can refer to the total amount of time, the total number of machine cycles, or the total number of invocation counts. For example, the following command prints only those source lines that account for at least 2 percent of the application's total number of machine cycles:

% prof -pixie -lines -quit 2% profsample

The -quit ncum% option truncates the output after n% of the total has been accounted for. The definition of total depends on the profiling mode, as described in the preceding paragraph. For example, the following command prints the most heavily used source lines and stops after 30 percent of the application's total number of machine cycles has been accounted for:

% prof -pixie -heavy -quit 30cum% profsample
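
The three forms of -quit can be read as three truncation rules applied to a listing that is already sorted from most to least expensive. The following C fragment is a minimal sketch of those rules, not the prof implementation. The names quit_mode and quit_keep are hypothetical, and the sketch assumes that the line on which the cumulative percentage reaches n% is still displayed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three forms of -quit. */
enum quit_mode {
    QUIT_LINES,    /* -quit n     : keep the first n lines            */
    QUIT_PERCENT,  /* -quit n%    : keep lines worth at least n%      */
    QUIT_CUM       /* -quit ncum% : stop once n% has been accounted   */
};

/*
 * Return how many leading entries of pct[] would be displayed.
 * pct[] holds each entry's share of the total, in percent, sorted in
 * descending order.
 */
size_t quit_keep(const double *pct, size_t count,
                 enum quit_mode mode, double n)
{
    size_t i;
    double cum = 0.0;

    assert(pct != NULL || count == 0);

    switch (mode) {
    case QUIT_LINES:
        if (n <= 0.0)
            return 0;
        return (n < (double)count) ? (size_t)n : count;
    case QUIT_PERCENT:
        /* Entries are sorted, so stop at the first one below n%. */
        for (i = 0; i < count && pct[i] >= n; i++)
            ;
        return i;
    case QUIT_CUM:
        for (i = 0; i < count; i++) {
            cum += pct[i];
            if (cum >= n)
                return i + 1;   /* include the line that reaches n% */
        }
        return count;
    }
    return count;
}
```

With shares of 40, 25, 15, 10, 5, and 5 percent, the rule for -quit 2 keeps two entries, the rule for -quit 10% keeps four, and the rule for -quit 70cum% keeps three.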

8.8 Using pixie to Average prof Results

A single run of a program may not produce the desired results. You can repeatedly run the version of the program created by pixie, varying the input with each run, and then use the resulting .Counts files to produce a consolidated report. For example:

1. Compile and link. Do not use the -p option when linking to produce an executable file for pixie:
% cc -c profsample.c
% cc -o profsample profsample.o -lm
2. Run the profiling utility pixie, as follows:
% atom -tool pixie -toolargs=-pids profsample

This command produces the profsample.Addrs file to be used in step 4, as well as the modified program profsample.pixie.
3. Delete any existing .Counts files, set the PIXIE_ARGS environment variable to "-pids", and run the executable program produced by pixie. For example:
% profsample.pixie

The -pids option specified with the atom -tool pixie command in step 2 appends the process ID of the process running the executable program to the name of the profsample.Counts file, for example, profsample.Counts.1753.
Run the profiled program as many times as desired. Each time the program is run, a profsample.Counts.<pid> file is created.
4. Run prof to create the report as follows:
% prof -pixie profsample profsample.Addrs profsample.Counts.*

If you had run profsample.pixie three times, the prof utility would have averaged the basic block data in the three files generated by the executable (profsample.Counts.<pid1>, profsample.Counts.<pid2>, and profsample.Counts.<pid3>) to produce the profile report.
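
The averaging described above amounts to taking, for each basic block, the mean of its execution counts over all runs. The following C fragment is a sketch of that arithmetic only. The function name average_block_counts and the in-memory layout (one array of counts per run) are assumptions made for illustration; they do not reflect the .Counts file format or the way prof is implemented.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Average per-basic-block execution counts over several runs.
 * runs[r][b] is the count for block b in run r; avg[b] receives the
 * mean over all runs.  The layout is hypothetical, not the .Counts
 * file format.
 */
void average_block_counts(const unsigned long *const *runs, size_t nruns,
                          size_t nblocks, double *avg)
{
    size_t r, b;

    assert(nruns > 0);

    for (b = 0; b < nblocks; b++) {
        double sum = 0.0;

        for (r = 0; r < nruns; r++)
            sum += (double)runs[r][b];
        avg[b] = sum / (double)nruns;
    }
}
```

A block that executes 10, 20, and 30 times in three runs is reported with an average of 20 executions.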

8.9 Analyzing Test Coverage

When you are writing a test suite for an application, you might want to know how effectively your suite tests the application. The prof utility provides two flags that can help you determine this. The -zero option prints the names of procedures that were never executed by your application. The -testcoverage option lists all of the source lines that were never executed by your application. Both of these flags require basic block counting.

Typically, you would perform the following steps to make use of these flags.

1. Run the pixie Atom tool on your application.
2. Run the results of step 1 through your test suite, saving any .Counts files.
3. Profile your application with the -zero or -testcoverage flags and specify all of the .Counts files produced when you ran the test suite.
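
Conceptually, a source line (or procedure) is reported as untested when its execution count is zero in every run of the test suite. The following C fragment sketches that check under stated assumptions: the function name find_unexecuted and the array-per-run layout are hypothetical and are not how prof stores or reads its data.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Collect the indexes of entries that were never executed in any run.
 * runs[r][i] is the execution count of entry i in run r.  never_run[]
 * must have room for nentries indexes.  Returns the number of indexes
 * stored.
 */
size_t find_unexecuted(const unsigned long *const *runs, size_t nruns,
                       size_t nentries, size_t *never_run)
{
    size_t r, i, found = 0;

    assert(never_run != NULL || nentries == 0);

    for (i = 0; i < nentries; i++) {
        int executed = 0;

        for (r = 0; r < nruns && !executed; r++)
            executed = (runs[r][i] != 0);
        if (!executed)
            never_run[found++] = i;
    }
    return found;
}
```

An entry counts as covered as soon as one run executes it, which is why every .Counts file from the test suite must be supplied.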

8.10 Merging Data Files

If the application you are profiling is fairly complicated, you may want to run it several times with different inputs to get an accurate picture of its profile. If you are using PC sampling, each run of your application produces a new mon.out file, or a pid.progname file if you have set the PROFDIR environment variable. If you are using basic block counting, each run produces a new .Counts file.

You have two ways of displaying profiling information that is based on an average of all of these output files.

The first way is to specify the names of each profiling data file explicitly on the command line. For example, the following command prints profiling information from two profile data files:

% prof -procedures profsample 1510.profsample 1522.profsample

Keeping track of many different profiling data files, however, can be difficult. Therefore, prof provides the -merge option to combine several data files into a single merged file. When prof operates in -pixie mode, the -merge option combines the .Counts files. When prof operates in PC-sampling mode, this option combines the mon.out or other profile data files.

The following example combines two profile data files into a single data file named total.out:

% prof -merge total.out profsample 1773.profsample \
  1777.profsample

At a later time, you can then display profiling data using the combined file, just as you would use a normal mon.out file. For example:

% prof -procedures profsample total.out

The merge process is similar for -pixie mode. You must specify the executable file's name, the .Addrs file, and each .Counts file:

% prof -pixie -merge total.Counts a.out a.out.Addrs \
  a.out.Counts.1866 a.out.Counts.1868

8.11 Using Feedback Files

Feedback files are useful in identifying portions of a large executable program in which significant percentages of the execution occur. Without feedback, the compiler must make assumptions about call frequency based on nesting levels. These assumptions are almost never as good as actual data from a sample run. The following sections describe how to use feedback files by using the cc command and the atom -tool pixie and prof commands.

8.11.1 Generating and Using Feedback Information

Follow these steps to generate feedback information that can be used to optimize subsequent compilations:

1. Compile the source code:
% cc -O2 -o profsample profsample.c -lm
2. Run the pixie Atom tool on the executable file:
% atom -tool pixie -toolargs=-o profsample.pixie profsample

This step creates an output executable file named profsample.pixie and a prof input file named profsample.Addrs.
3. Execute the program you just created:
% profsample.pixie

This step creates a file named profsample.Counts, which contains execution statistics.
4. Use prof to create a feedback file from the execution statistics:
% prof -pixie -feedback profsample.feedback profsample
5. Use the feedback file as input to a compilation at the -O2 or -O3 optimization level by specifying the -feedback option with the cc command, as shown in the following example:
% cc -O3 -feedback profsample.feedback -o \
  profsample profsample.c -lm

The feedback file provides the compiler with actual execution information that can be used to improve certain optimizations, such as inlining function calls. Use a feedback file generated from a -O2 compilation for any subsequent compilations with -O2 or -O3 flags.

8.11.2 Using a Feedback File for Input to cord

You can also use a feedback file as input to the cord utility. The cord utility orders the procedures in an executable program to improve execution time. The following example shows how to use the -cord option as part of a compilation command with a feedback file as input:

% cc -O2 -cord -feedback profsample.feedback \
  -o profsample profsample.c -lm

Use a feedback file generated with the same optimization level as the level you use in subsequent compilations.

You can also use cord with the runcord utility. For more information, see runcord(1).

8.12 Using Environment Variables to Control PC-Sample Profiling

By default, the -p and -pg flags to the cc command provide the following:

A single profile covering the whole text segment and all threads. To profile specific portions of the program, use the monitor utilities, as described in Section 8.13 and monitor(3).
A single data file called mon.out (for -p) or gmon.out (for -pg) placed in the current directory.

The -p flag supports the profiling of shared libraries. The -pg flag and uprofile tool support the profiling of only the part of a program that is in the executable. When using these tools to generate profiling information for library routines, link your object file with the -non_shared flag to the cc command.

You can use the following environment variables to control profiling behavior:

PROFDIR
PROFFLAGS

By using these variables, you can override the default profiling behavior in the following ways:

Changing the name and path of profiling data files
Controlling when profiling begins
Controlling profiling of multithreaded applications

You can use the PROFFLAGS and PROFDIR environment variables together.

Note that these environment variables have no effect on the prof and gprof post-processors; they affect the profiling behavior of a program during its execution. These environment variables have no effect when you use the pixie Atom tool.

8.12.1 PROFDIR Environment Variable

By default, profiling data is collected in a data file named [g]mon.out. When you do multiple profiling runs, each run overwrites the existing [g]mon.out file. Use the PROFDIR environment variable when you want to collect PC sampling data in files with unique names. Set this environment variable as follows:

C Shell:
setenv PROFDIR path
Bourne Shell:
PROFDIR=path; export PROFDIR

The results are saved in the file path/pid.progname, which resolves as follows:

path: The directory path, specified with PROFDIR, identifying an existing directory.

pid: The process ID of the executing program.

progname: The program name.

When you set PROFDIR to a null string, no profiling occurs.
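
The naming rule above can be summarized in a few lines of C. The following fragment is an illustrative sketch, not the profiling library's code. The function name profile_data_name is hypothetical, and the sketch covers only the -p case, in which the default file is mon.out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the name of the PC-sampling data file (hypothetical helper).
 *   profdir == NULL : PROFDIR is not set; use the default name mon.out.
 *   profdir == ""   : PROFDIR is a null string; no profiling occurs.
 *   otherwise       : path/pid.progname
 * Returns 1 if a name was written to buf, or 0 if no data file applies
 * or the name does not fit in buf.
 */
int profile_data_name(char *buf, size_t len, const char *profdir,
                      long pid, const char *progname)
{
    int n;

    assert(buf != NULL && progname != NULL);

    if (profdir == NULL)
        n = snprintf(buf, len, "mon.out");
    else if (strlen(profdir) == 0)
        return 0;
    else
        n = snprintf(buf, len, "%s/%ld.%s", profdir, pid, progname);

    return n >= 0 && (size_t)n < len;
}
```

In a real program, the profdir argument would typically be the result of getenv("PROFDIR") and the pid argument the result of getpid(). With PROFDIR set to /tmp/prof, process 1510 running profsample would produce the name /tmp/prof/1510.profsample.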

8.12.2 PROFFLAGS Environment Variable

By default, the profiling library libprof1.a (or libprof1_r.a, for multithreaded programs) allocates one buffer per process to record your profiling data and places the data output file in your current directory.

To disable this default behavior, set the PROFFLAGS environment variable as follows:

C Shell:
setenv PROFFLAGS "-disable_default"
Bourne Shell:
PROFFLAGS="-disable_default"; export PROFFLAGS

When you have set PROFFLAGS to -disable_default, the default profiling support is disabled, allowing you to use the monitor calls to profile specific sections of your program for both nonthreaded and multithreaded programs. See monitor(3) and Section 8.13 for more information on using the monitor, monstartup, and moncontrol routines.

For multithreaded programs, you can allocate one buffer per thread by setting the PROFFLAGS environment variable as follows:

C Shell:
setenv PROFFLAGS "-threads"
Bourne Shell:
PROFFLAGS="-threads"; export PROFFLAGS

When you have set PROFFLAGS to -threads, a separate file is produced for each thread and is named pid.sid.progname, which is resolved as follows:

pid: The process ID of the program.

sid: The sequence number of the thread, which depends on the order in which the threads were created.

progname: The name of the program being profiled.

You can use the -threads and -disable_default flags together to control profiling of your program when you use the monitor routines.
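
Because both flags can appear in a single PROFFLAGS value, the value is best thought of as a whitespace-separated list of words. The following C fragment sketches how such a value could be interpreted. It is not the profiling library's parser; the names prof_flags, has_flag, and parse_profflags are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record of the two flags discussed in this section. */
struct prof_flags {
    int disable_default;   /* nonzero if -disable_default was given */
    int threads;           /* nonzero if -threads was given         */
};

/* Return 1 if the whitespace-separated word flag appears in value. */
static int has_flag(const char *value, const char *flag)
{
    size_t flen;
    const char *p = value;

    assert(value != NULL && flag != NULL);
    flen = strlen(flag);

    while (*p != '\0') {
        size_t wlen;

        p += strspn(p, " \t");        /* skip leading blanks  */
        wlen = strcspn(p, " \t");     /* length of next word  */
        if (wlen == flen && strncmp(p, flag, flen) == 0)
            return 1;
        p += wlen;
    }
    return 0;
}

/* Interpret a PROFFLAGS value; a NULL value means the variable is unset. */
struct prof_flags parse_profflags(const char *value)
{
    struct prof_flags f = {0, 0};

    if (value != NULL) {
        f.disable_default = has_flag(value, "-disable_default");
        f.threads = has_flag(value, "-threads");
    }
    return f;
}
```

Matching whole words rather than substrings keeps a value such as "-threads" from being mistaken for a longer flag that merely contains it.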

You can also set the PROFFLAGS environment variable to include or exclude profiling information:

setenv PROFFLAGS "-all": Causes the profiles for all shared libraries (if any) described in the data file(s) to be displayed, in addition to the profile for the executable.

setenv PROFFLAGS "-incobj lib_name": Causes the profile for the named shared library to be printed, in addition to the profile for the executable.

setenv PROFFLAGS "-excobj lib_name": Causes the profile for the named executable or shared library not to be printed.

8.13 Using monitor Routines to Control Profiling

The default profiling behavior on Digital UNIX systems is to profile the entire text segment of your program and place the profiling data in mon.out for prof profiling or in gmon.out for gprof profiling. For large programs, you might not need to profile the entire text segment. The monitor routines provide the ability to profile portions of your program specified by the lower and upper address boundaries of a function address range.

The monitor routines are:

monitor( ): Use this routine to gain control of explicit profiling by turning profiling on and off for a specific text range. This routine is not supported for gprof profiling.

monstartup( ): Similar to monitor, except that it specifies the address range only and is supported for gprof profiling.

moncontrol( ): Use this routine with monitor and monstartup to turn PC sampling on or off during program execution for a specific process or thread.

monitor_signal( ): Use this routine to profile nonterminating programs, such as daemons.

You can use monitor and monstartup to profile an address range in each shared library as well as in the static executable.

For more information on these functions, see monitor(3).

By default, profiling begins as soon as your program starts to execute. You can set the PROFFLAGS environment variable to -disable_default to prevent profiling from beginning when your program executes. Then, you can use the monitor routines to begin profiling after the first call to monitor or monstartup.

You can disable the default naming of the profiling data file by using the PROFDIR environment variable. For more information on using this environment variable, see Section 8.12.1.

Example 8-6 demonstrates how to use the monstartup and monitor routines within a program to begin and end profiling.

Example 8-6: Using monstartup() and monitor()


 

   /*  Profile the domath() routine using monstartup.
    *  This example allocates a buffer for the entire program.
    *  Compile command: cc -p foo.c -o foo -lm
    *  Before running the executable, enter the following
    *  from the command line to disable default profiling support:
    *  setenv PROFFLAGS -disable_default
    */

    #include <stdio.h>
    #include <math.h>               /* sqrt() */
    #include <sys/syslimits.h>

    char dir[PATH_MAX];

    extern void __start();
    extern unsigned long _etext;

    void domath(void);

    int main(void)
    {
        int i;
        int a = 1;

        /* Start profiling between __start (beginning of text)
         * and _etext (end of text).  The profiling library
         * routines will allocate the buffer.
         */
        monstartup(__start, &_etext);

        for (i = 0; i < 10; i++)
            domath();

        /* Stop profiling and write the profiling output file. */
        monitor(0);
        return 0;
    }

    void domath(void)
    {
        int i;
        double d1, d2;

        d2 = 3.1415;
        for (i = 0; i < 1000000; i++)
            d1 = sqrt(d2)*sqrt(d2);
    }

The external name _etext lies just above all the program text. See end(3) for more information.

When you set the PROFFLAGS environment variable to -disable_default, you disable default profiling buffer support. You can allocate buffers within your program, as shown in Example 8-7.

Example 8-7: Allocating Profiling Buffers Within a Program


 

   /*  Profile the domath routine using monitor().
    *  Compile command: cc -p foo.c -o foo -lm
    *  Before running the executable, enter the following
    *  from the command line to disable default profiling support:
    *  setenv PROFFLAGS -disable_default
    */

   #include <stdlib.h>          /* calloc() */
   #include <math.h>            /* sqrt() */
   #include <sys/types.h>
   #include <sys/syslimits.h>

   void domath(void);
   void nextproc(void);

   #define INST_SIZE 4          /* Instruction size on Alpha */
   char dir[PATH_MAX];

   int main(void)
   {
        int i;
        char *buffer;
        size_t bufsize;

        /*  Allocate one counter for each instruction to
         *  be sampled. Each counter is an unsigned short.
         */
        bufsize = (((char *)nextproc - (char *)domath)/INST_SIZE)
                  * sizeof(unsigned short);

        /*  Use calloc() to ensure that the buffer is clean
         *  before sampling begins.
         */
        buffer = calloc(bufsize, 1);
        if (buffer == NULL)
             return 1;

        /*  Start sampling.  */
        monitor(domath, nextproc, buffer, bufsize, 0);
        for (i = 0; i < 10; i++)
             domath();

        /* Stop sampling and write out profiling buffer.  */
        monitor(0);
        return 0;
   }

   void domath(void)
   {
        int i;
        double d1, d2;

        d2 = 3.1415;
        for (i = 0; i < 1000000; i++)
             d1 = sqrt(d2)*sqrt(d2);
   }

   void nextproc(void)
   {}
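
The buffer-size arithmetic in Example 8-7 can be isolated and checked on its own. The following C fragment is a sketch under two assumptions: the helper names profile_bufsize and counter_index are hypothetical, and counter i is taken to cover the instruction at lowpc + i * INST_SIZE, which follows from the one-counter-per-instruction layout that the example describes but is not confirmed by the monitor documentation quoted here.

```c
#include <assert.h>
#include <stddef.h>

#define INST_SIZE 4   /* instruction size on Alpha, as in Example 8-7 */

/* Bytes needed for one unsigned short counter per instruction. */
size_t profile_bufsize(size_t text_bytes)
{
    return (text_bytes / INST_SIZE) * sizeof(unsigned short);
}

/*
 * Index of the counter that a sampled PC would increment, assuming
 * counter i covers the instruction at lowpc + i * INST_SIZE.
 */
size_t counter_index(unsigned long pc, unsigned long lowpc)
{
    assert(pc >= lowpc);
    return (size_t)((pc - lowpc) / INST_SIZE);
}
```

For a 400-byte address range, this yields 100 counters. A PC sample 16 bytes past the start of the range falls in counter 4.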

Use the monitor_signal( ) routine to profile programs that do not terminate. Declare this routine as a signal handler in your program and build the program for prof or gprof profiling. While the program is executing, send a signal from the shell by using the kill command.

When the signal is received, monitor_signal is invoked and writes profiling data to the data file. If the program receives another signal, the data file is overwritten.

Example 8-8 illustrates how to use the monitor_signal routine.

Example 8-8: Using monitor_signal() to Profile Non-Terminating Programs


 

/* From the shell, start up the program in background.
 * Send a signal to the process, for example: kill -30 <pid>
 * Process the [g]mon.out file normally using gprof or prof
 */

#include <signal.h>
#include <math.h>               /* sqrt() */

extern int monitor_signal();

int main(void)
{
    int i;
    double d1, d2;

    /*
     * Declare monitor_signal() as signal handler for SIGUSR1
     */
    signal(SIGUSR1, monitor_signal);
    d2 = 3.1415;
    /*
     * Loop infinitely (absurd example of non-terminating process)
     */
    for (;;)
        d1 = sqrt(d2)*sqrt(d2);
}

8.14 Profiling Multithreaded Applications

Profiling multithreaded applications is essentially the same as profiling non-threaded applications. However, to profile multithreaded applications, you must compile your program with the -pthread or -threads flag to the cc command. Specifying one of these flags and either the -p or -pg flag enables the thread profiling library, libprof1_r.a.

The default case for profiling multithreaded applications is to provide one sampling buffer for all threads. In this case, you get sampling across the entire process and you get one output file comprising sampling data from all threads. Depending on whether you use the -p or -pg flag, your output file will be named mon.out or gmon.out, respectively.

To get a separate buffer and a separate output file for each thread in your program, use the environment variable PROFFLAGS. Set PROFFLAGS to -threads, as shown in the following example:

setenv PROFFLAGS "-threads"

The profiling data file will be named according to the following convention:

pid.sid.progname

In the preceding example, pid is the process ID of the program, sid corresponds to the order in which the thread was created, and progname is your program name.

If the application controls profiling by using the monitor routines, sid corresponds to the order in which profiling was started for the thread.

If you use the monitor( ) or monstartup( ) calls in a threaded program, you must first set PROFFLAGS to "-disable_default -threads", giving you complete control of profiling the application.

If the application uses monitor( ) and allocates separate buffers for each thread profiled, you must first set PROFFLAGS to "-disable_default -threads", because this setting affects the file naming conventions that are used. Without the -threads flag, the buffer and address range used as a result of the first monitor or monstartup call would be applied to every thread that subsequently requests profiling. In this case, a single data file that covers all threads being profiled would be created.

Each thread in a process must call the monitor( ) or monstartup( ) routines to initiate profiling for itself.