

8    Profiling Programs to Improve Performance

Profiling is a method of identifying sections of code that consume large portions of execution time. In a typical program, most execution time is spent in relatively few sections of code. To improve performance, the greatest gains result from improving coding efficiency in time-intensive sections.

This chapter discusses the following topics:




8.1    Profiling Methods

Profiling methods include:

To select an appropriate profiling method for an application, you must take into consideration the following factors:

The profiling data display tools, and their respective data collection methods, include the following:

prof
Prints a profile of statistics per procedure.

The prof tool supports the following data collection methods:

prof -pixie
Prints a profile showing the number of times each procedure, source line, or instruction is executed. The prof -pixie tool supports the following basic block counting profiling data collection method:

gprof
Produces call-graph profile data showing the effects of calling routines on called routines as well as other information.

The gprof tool supports the following data collection methods:

You can also use the monitor routines to perform PC-sampling on a specified address range in a program. For more information on using monitor routines, see Section 8.13 and monitor(3).




8.2    Profiling Tools Overview

Table 8-1 provides a concise overview of the profiling tools available in the Digital UNIX operating system.

Table 8-1: Profiling Tools

Tool                 Use
-------------------  ------------------------------------------------------------
PC-sampling/prof     Link application with -p; analyze results with prof; see prof(1) and monitor(3).
Call-arcs/gprof      Compile and link with -pg; analyze results with gprof; see gprof(1) and monitor(3).
pixstats             Additional postprocessor for pixified program output; see pixstats(1).
uprofile/kprofile    Run application under uprofile or kprofile; requires pfm driver to be installed; analyze results with prof; see uprofile(1), kprofile(1), and pfm(7).
Atom toolkit         Programmable debug/performance analysis tool. Example tools are contained in /usr/lib/cmplrs/atom/examples; see atom(1) and other Atom reference pages for programming interface.
pixie                Atom-based basic block profiler; analyze results with prof; see pixie(5).
hiprof               Atom-based call-arc analyzer; analyze results with gprof; see hiprof(5).
third                Atom-based memory error/leak detection tool, Third Degree; generates text output. See third(5).

All profiling tools work on call-shared and nonshared applications.




8.2.1    PC-Sampling

Statistical PC-sampling is useful for identifying the procedures in a program that consume the most CPU time. It supports both threads and shared libraries.

Interface summary:

cc -p *.o -o program     # Link with libprof1.a
 
program                  # Run program to collect data
 
prof program             # Process the mon.out file




8.2.2    gprof

The gprof tool provides procedure call information coupled with statistical PC-sampling. This is useful for determining which routines are called most frequently and from where. The gprof tool also gives a flat profile of CPU usage by routine. It supports threads and call-shared programs, but does not support shared libraries.

Using the gprof tool, you can retrieve information from libc.a and libm.a because these two libraries are compiled with the -pg flag. Other Digital-supplied libraries are not compiled with -pg, so calling information on these other system libraries is not available.

Interface summary:

cc -pg *.c -o program    # Compile and link with -pg
 
program                  # Run program to collect data
 
gprof program            # Process the gmon.out file




8.2.3    uprofile and kprofile

The uprofile and kprofile tools use the performance counters on the Alpha chip. They do not collect information on shared libraries. By default, both tools collect cycles for the program. The performance data produced by these tools is processed with the prof command. See uprofile(1) and kprofile(1) for more information.




8.2.4    Atom Toolkit

The Atom toolkit consists of a programmable instrumentation tool and several packaged tools. Examples are included in the /usr/lib/cmplrs/atom/examples directory that demonstrate how to develop instrumentation and analysis code. The instrumentation part of the tool instructs Atom on where to insert calls to analysis routines in the program. When the program is run, the analysis routines are entered and data collection is performed as prescribed by the Atom tool specified on the atom command.

Atom does not work on programs built with the -om flag.

Interface summary:

atom -tool toolname program
 
program.tool

Postprocessing is tool-dependent. See Chapter 9 for details on Atom.




8.2.5    pixie Atom tool

The Atom-based pixie is a basic block profiler that supports shared libraries and threaded applications.

Interface summary:

atom -tool pixie [-env threads] program
 
program.pixie[.threads]
 
prof -pixie program




8.2.6    hiprof Atom tool

The hiprof Atom tool collects call-arc information on a program. By default, it operates like the gprof support provided by the -pg flag, but has flag-selectable options that are more powerful. The hiprof Atom tool supports shared libraries and threaded applications.

Interface summary:

atom -tool hiprof [-env threads] program
 
program.hiprof[.threads]
 
gprof program program.hiout




8.2.7    Third Degree

Third Degree is a memory-leak and memory-overwrite detection tool, also based on Atom. Third Degree generates text output to a file called program.3log. The log contains the diagnostics that Third Degree detected (for example, reads of uninitialized heap or stack, memory overwrites, and memory leaks).

Interface summary:

atom -tool third [-env threads] program
 
program.third[.threads]
 
cat program.3log




8.3    Profiling Sample Program

The examples in the remainder of this chapter refer to the sample program, profsample.c, shown in Example 8-1.

Example 8-1: Profiling Sample Program

#include <math.h>
#include <stdio.h>


#define LEN 100

void mult_by_scalar(double ary[], int len, double num);
void add_vector(double arya[], double aryb[], int len);
double value;
void printit(double value);

main()
{
    double ary1[LEN];
    double ary2[LEN];
    int i;

    for (i=0; i<LEN; i++) {
        ary1[i] = 0.0;
        ary2[i] = sqrt((double)i);
    }
    mult_by_scalar(ary1, LEN, 3.14159);
    mult_by_scalar(ary2, LEN, 2.71828);
    for (i=0; i<20; i++)
        add_vector(ary1, ary2, LEN);
}

void mult_by_scalar(double ary[], int len, double num)
{
    int i;

    for (i=0; i<len; i++)
    {
        ary[i] *= num;
        value = ary[i];
        printit(value);
    }
}

void add_vector(double arya[], double aryb[], int len)
{
    int i;

    for (i=0; i<len; i++)
    {
        arya[i] += aryb[i];
        value = arya[i];
        printit(value);
    }
}

void printit(double value)
{
    printf("Value = %f\n", value);
}





8.4    Using prof to Produce Program Counter Sampling Data

To use prof to obtain PC sampling data on a program, follow these steps:

  1. Compile and link (or just link) using the -p option, as follows:

    cc -c profsample.c
    cc -p -o profsample profsample.o -lm

    You must specify the -p profiling option during the link step to obtain PC sampling information. If you have an existing application, you will not need to recompile to profile the executable program; simply relink the program using the -p option with the cc command.

    If you are building an application for the first time, you can compile and link in the same step. In the preceding example, the -lm option ensures that libm.{a,so} is used to resolve symbols that refer to math library functions.

    You might also consider compiling with one of the optimization flags to help improve the efficiency of your code, compiling with a debug flag to provide more symbolic information for the profile report, or compiling with both types of flags.

    If you are profiling a multithreaded application, use the -threads flag with the cc command. For more information on profiling multithreaded applications, see Section 8.14.

  2. Execute the profiled program:

    profsample

    You can run the program several times, altering the input data (if any) to create multiple profile data files.

    During execution, profiling data is saved in a profile data file. The default name for the profile data file is mon.out, unless you have set the environment variable PROFDIR. For more information on using PROFDIR, see Section 8.12.1.

  3. Run the profile formatting program prof, which extracts information from one or more profile data files and produces a tabular report:

    prof profsample mon.out

Example 8-2 shows output produced by the prof command on the profsample.c program.

Example 8-2: Profiler Listing for PC Sampling

Profile listing generated Thu May 26 13:36:14 1994 with:
   prof profsample mon.out

 
--------------------------------------------------------------
* -p[rocedures] using pc-sampling; sorted in descending      *
* order by total time spent in each procedure;               *
* unexecuted procedures excluded                             *
--------------------------------------------------------------

Each sample covers 4.00 byte(s) for 14% of 0.0068 seconds

%time     seconds     cum %   cum sec  procedure (file)

 42.9      0.0029      42.9      0.00  printit (profsample.c)
 42.9      0.0029      85.7      0.01  add_vector (profsample.c)    [1]
 14.3      0.0010     100.0      0.01  mult_by_scalar (profsample.c)

  1. This sample line of output presents the following information:

Because the prof program works by periodically sampling the program counter, you might see different output when you profile the same program multiple times. A different profiling run of the sample program produced the following output:

Profile listing generated Thu May 26 13:34:00 1994 with:
   prof -procedures profsample mon.out

 
--------------------------------------------------------------
* -p[rocedures] using pc-sampling; sorted in descending      *
* order by total time spent in each procedure;               *
* unexecuted procedures excluded                             *
--------------------------------------------------------------

Each sample covers 4.00 byte(s) for 17% of 0.0059 seconds

%time     seconds     cum %   cum sec  procedure (file)

 66.7      0.0039      66.7      0.00  add_vector (profsample.c)
 33.3      0.0020     100.0      0.01  printit (profsample.c)




8.5    Using gprof to Display Call Graph Information

To determine the manner in which routines call, or are called by, other routines, use the gprof profiling tool.

The gprof tool postprocesses both hiprof output and -pg output.

To use this tool, follow these steps:

  1. Use the hiprof Atom tool to produce an instrumented version of the program:

    atom -tool hiprof profsample

  2. Execute the instrumented version of profsample:

    profsample.hiprof

  3. Examine the profiling data as follows:

    gprof profsample profsample.hiout

During execution, profiling data is saved in the data file profsample.hiout, unless you have set the -dirname flag in the HIPROF_ARGS environment variable or on the command line.

Alternatively, you can use the following procedure to collect profiling data for the gprof tool:

  1. Compile and link using the -pg option, as follows:

    cc -pg -c profsample.c
    cc -pg -o profsample profsample.o -lm

    You must specify the -pg flag with the cc command during both the compile and link steps to obtain call graph information.

  2. Execute the program:

    profsample

    When this method is used, profiling data is saved during execution in the data file gmon.out, unless you have set the PROFDIR environment variable. For more information on using this variable, see Section 8.12.1.

  3. Run the formatting program gprof, which extracts information from the data file:

    gprof profsample gmon.out


The output produced by the gprof utility comprises three sections:

Call graph profile
Timing profile
Index

You can control gprof profiling on a per-file basis by using the -no_pg flag with the cc command. When you use this flag, you disable gprof profiling for all objects that follow the flag on the command line. You cannot use the -no_pg flag with the -r and -shared flags to the ld command.

Example 8-3 shows output for gprof profiling of the sample program. The -b flag was used with gprof to suppress printing of the description of each output field. The descriptions are valuable, but they are lengthy and were left out due to space considerations. To see these descriptions, follow the steps to produce gprof output and write the output to a file or pipe the output through the more utility.

In the call graph profile section, each routine in the program has its own subsection that is contained within dashed lines and identified by the index number in the first column. Note that for the purpose of this example output, the three sections have been separated by rows of asterisks that do not appear in the output produced by gprof. Each row of asterisks includes the name of the section. For more information on gprof flags, see the gprof(1) reference page.

Example 8-3: Sample gprof Output

*********************** call graph profile *******************

granularity: each sample hit covers 4 byte(s) for 10.00% of 0.01 seconds

                                  called/total      parents
index  %time    self descendents  called+self    name               index
                                  called/total      children

                                                    <spontaneous>
[1]    100.0    0.00        0.01                 main [1]
                0.00        0.00      20/20         add_vector [2]
                0.00        0.00       2/2          mult_by_scalar [4]

-----------------------------------------------

                0.00        0.00      20/20         main [1]              [1]
[2]     75.5    0.00        0.00      20         add_vector [2]           [2]
                0.00        0.00    2000/2200       printit [3]           [3]

-----------------------------------------------

                0.00        0.00     200/2200       mult_by_scalar [4]
                0.00        0.00    2000/2200       add_vector [2]
[3]     50.0    0.00        0.00    2200         printit [3]

-----------------------------------------------

                0.00        0.00       2/2          main [1]
[4]      4.5    0.00        0.00       2         mult_by_scalar [4]
                0.00        0.00     200/2200       printit [3]

-----------------------------------------------

*********************** timing profile section ***************

granularity: each sample hit covers 4 byte(s) for 10.00% of 0.01 seconds

  %   cumulative     self               self     total
 time    seconds  seconds    calls   ms/call   ms/call  name
 50.0       0.00     0.00     2200      0.00      0.00  printit [3]
 30.0       0.01     0.00       20      0.15      0.37  add_vector [2]
 20.0       0.01     0.00                               main [1]
  0.0       0.01     0.00        2      0.00      0.22  mult_by_scalar [4]

*********************** index section ************************

Index by function name

   [2] add_vector          [4] mult_by_scalar
   [1] main                [3] printit

  1. This line describes the relationship of the main routine to the add_vector routine. Because main is listed above the add_vector routine in the final column of this section, main is identified as the parent of add_vector. The fraction 20/20 indicates that of the 20 times that add_vector (the denominator of the fraction) was called, it was called 20 times by main (the numerator of this fraction).

  2. This line describes the add_vector routine, which is the subject of this portion of the call graph profile because it is the leftmost routine in the rightmost column of this section. The index number [2] in the first column corresponds to the index number [2] in the index section at the end of the output. The 75.5% in the second column reports the total amount of time in the sample that is accounted for by the add_vector routine and its descendent, in this case the printit routine. The 20 in the called column indicates the total number of times that the add_vector routine is called.

  3. This line describes the relationship of the printit routine to the add_vector routine. Because the printit routine is below the add_vector routine in this section, printit is identified as the child of add_vector. The fraction 2000/2200 indicates that of the total of 2200 calls to printit, 2000 of these calls came from add_vector.




8.6    Using pixie for Basic Block Counting

A basic block is a sequence of instructions with a single entry point and a single exit point; once the first instruction executes, all of the others execute in order. The pixie Atom tool provides execution counts for the basic blocks of a program. With prof, the execution counts can be viewed at the instruction level.

To obtain data for basic block counting, follow these steps:

  1. Compile and link. For example:

    cc -c profsample.c
    cc -o profsample profsample.o -lm

  2. Run the pixie Atom tool. You do not have to specify a name for the output because, by default, pixie produces an output file with the same name as the original executable file, but with .pixie appended. For example, the following command causes pixie to create two files, profsample.pixie and profsample.Addrs:

    atom -tool pixie profsample

    The profsample.pixie file is equivalent to profsample but contains additional code that counts the execution of each basic block. To create an output file with a name other than pname.pixie, use the -o flag followed by the name you assign to the output file.

    The profsample.Addrs file contains the address of each of the basic blocks. For more information, see pixie(5).

  3. Execute the profsample.pixie file:

    profsample.pixie

    This command generates the file profsample.Counts, which contains the basic block counts. Each time you execute the profsample.pixie file, you create a new profsample.Counts file.

  4. Run the profile formatting program prof, with the -pixie flag over the profsample executable file:

    prof -pixie profsample

    This command extracts information from profsample.Addrs and profsample.Counts and displays it in an easily readable format. Note that you do not need to specify the .Addrs and .Counts file suffixes because prof searches by default for files containing them.

You can also run the pixstats program on the executable file profsample to generate a detailed report on opcode frequencies, interlocks, a miniprofile, and more. For more information, see pixstats(1).

Note

The pixie profiling tool provided in the current version of the Digital UNIX operating system is the pixie Atom tool. If you use the syntax provided in earlier versions of the operating system to invoke pixie, a script transforms the call into a call to the pixie Atom tool. The previous version of the pixie tool can be found at /usr/opt/obsolete/usr/bin/pixie.




8.7    Selecting Profiling Information to Display

Depending on the size of the application and the type of profiling you request, prof may generate a very large amount of output. However, you are often only interested in profiling data about a particular portion of your application.




8.7.1    Limiting Profiling Display to Specific Procedures

The prof program provides the following flags to display information selectively by procedure:

-only
-exclude
-Only
-Exclude
-totals

The -only option tells prof to print only profiling information for a particular procedure. You can specify the -only option multiple times on the command line. For example, the following command displays profiling information for procedures mult_by_scalar and add_vector from the sample program:

prof -only mult_by_scalar -only add_vector profsample

The -exclude option tells prof to print profiling information for all procedures except the specified procedure. You can use multiple -exclude flags on the command line.


The following command displays profiling information for all procedures except add_vector:

 prof -exclude add_vector profsample

Do not use the -only and -exclude flags on the same command line.

Many of the prof utility's profiling flags print output as percentages, for example, the percentage of total execution time attributed to a particular procedure.

By default, the -only and -exclude flags cause prof to calculate percentages based on all of the procedures in the application even if they were omitted from the listing. You can change this behavior with the -Only and -Exclude flags. These flags work the same as -only and -exclude, but cause prof to calculate percentages based only on those procedures that appear in the listing. For example, the following command omits the add_vector procedure from both the listing and from percentage calculations:

prof -Exclude add_vector profsample

The -totals flag, used with the -procedures and -invocations listings, prints cumulative statistics for the entire object file instead of for each procedure in the object.




8.7.2    Including Shared Libraries in the Profiling Information

The -all, -incobj, and -excobj flags allow you to display profiling information for shared libraries used by the program:




8.7.3    Using pixie to Display Profiling Information for Each Source Line

The -heavy and -lines flags cause prof to display the total number of machine cycles executed by each source line in your application. Both of these flags require you to use basic block counting (the -pixie option); they do not work in PC-sampling mode.

The -heavy option prints an entry for every source line that was executed by your application. Each entry shows the total number of machine cycles executed by that line. Entries are sorted from the line with the most machine cycles to the line with the least machine cycles. Because this option often prints a huge number of entries, you might want to use one of the -quit, -only, or -exclude flags to reduce output to a manageable size.

Example 8-4 shows output generated by the following command:

prof -pixie -heavy -only add_vector -only mult_by_scalar \
  -only main profsample

For example, you can see in Example 8-4 that line 47 of profsample.c in the procedure add_vector( ) accounts for over 12 percent of the application's total execution time. The listing also shows the size in bytes of each source line.

Example 8-4: Prof Output by Source Line with -heavy Flag

Profile listing generated Fri May 27 14:09:10 1994 with:
  prof -pixie -heavy -only add_vector -only mult_by_scalar
  -only main profsample

 
------------------------------------------------------------------
* -h[eavy] using basic-block counts;                             *
* sorted in descending order by the number of cycles executed    *
* in each                                                        *
* line; unexecuted lines are excluded                            *
------------------------------------------------------------------

procedure (file)                 line  bytes   cycles      %   cum %

add_vector (profsample.c)          48     44    22000  23.26   23.26
add_vector (profsample.c)          46     40    20000  21.15   44.41
add_vector (profsample.c)          47     24    12000  12.69   57.10
mult_by_scalar (profsample.c)      36     44     2200   2.33   59.43
main (profsample.c)                20     60     1500   1.59   61.02
mult_by_scalar (profsample.c)      34     28     1400   1.48   62.50
mult_by_scalar (profsample.c)      35     24     1200   1.27   63.77
main (profsample.c)                19     12      300   0.32   64.08
main (profsample.c)                25     48      240   0.25   64.34
add_vector (profsample.c)          41     28      140   0.15   64.48
add_vector (profsample.c)          44     12       60   0.06   64.55
add_vector (profsample.c)          50     12       60   0.06   64.61
mult_by_scalar (profsample.c)      29     28       14   0.01   64.63
main (profsample.c)                23     32        8   0.01   64.63
main (profsample.c)                22     32        8   0.01   64.64
mult_by_scalar (profsample.c)      38     12        6   0.01   64.65
mult_by_scalar (profsample.c)      32     12        6   0.01   64.66
main (profsample.c)                26     16        4   0.00   64.66
main (profsample.c)                13     16        4   0.00   64.66
main (profsample.c)                18      8        2   0.00   64.67
main (profsample.c)                24      8        2   0.00   64.67

The -lines option is similar to -heavy, but it sorts the output differently. This option prints the lines for each procedure in the order that they occur in the source file. Even lines that never executed are printed. The procedures themselves are sorted from those procedures that execute the most machine cycles to those that execute the least.

Example 8-5 shows the same information as Example 8-4, but in a different format as generated by the following command:

prof -pixie -lines -only add_vector -only mult_by_scalar \
 -only main profsample

Example 8-5: Prof Output by Source Line with -lines Flag


 
Profile listing generated Fri May 27 14:07:28 1994 with:
  prof -pixie -lines -only add_vector -only mult_by_scalar
  -only main profsample

------------------------------------------------------------------
* -l[ines] using basic-block counts;                             *
* grouped by procedure, sorted by cycles executed per procedure; *
* '?' means that line number information is not available.       *
------------------------------------------------------------------

procedure (file)                 line  bytes   cycles      %   cum %

add_vector (profsample.c)          41     28      140   0.15    0.15
                                   44     12       60   0.06    0.21
                                   46     40    20000  21.15   21.36
                                   47     24    12000  12.69   34.05
                                   48     44    22000  23.26   57.32
                                   50     12       60   0.06   57.38
mult_by_scalar (profsample.c)      29     28       14   0.01   57.39
                                   32     12        6   0.01   57.40
                                   34     28     1400   1.48   58.88
                                   35     24     1200   1.27   60.15
                                   36     44     2200   2.33   62.48
                                   38     12        6   0.01   62.48
main (profsample.c)                13     16        4   0.00   62.49
                                   18      8        2   0.00   62.49
                                   19     12      300   0.32   62.81
                                   20     60     1500   1.59   64.39
                                   22     32        8   0.01   64.40
                                   23     32        8   0.01   64.41
                                   24      8        2   0.00   64.41
                                   25     48      240   0.25   64.66
                                   26     16        4   0.00   64.67
 




8.7.4    Limiting Profiling Display by Line

The -quit option reduces the amount of profiling output displayed. The -quit option affects the output from the -procedures, -heavy, and -lines profiling modes.

The -quit option has three forms:

-quit n
-quit n%
-quit ncum%

If you specify several modes on the same command line, the -quit option affects the output from each mode. For example, the -quit option in the following command reduces the output from both the -procedures and -heavy modes:

prof -pixie -procedures -heavy -quit 20 profsample

This command prints only the 20 most time-consuming procedures and the 20 most time-consuming source lines. The -quit n option has no effect on the -lines profiling mode.

The -quit n% option restricts the output to those entries that account for at least n% of the total. Depending on the profiling mode, the total can refer to the total amount of time, the total number of machine cycles, or the total number of invocation counts. For example, the following command prints only those source lines that account for at least 2 percent of the application's total number of machine cycles:

prof -pixie -lines -quit 2% profsample

The -quit ncum% option truncates the output after n% of the total has been accounted for. The definition of total depends on the profiling mode, as described in the preceding paragraph. For example, the following command prints the most heavily used source lines and stops after 30 percent of the application's total number of machine cycles has been accounted for:

prof -pixie -heavy -quit 30cum% sample




8.8    Using pixie to Average prof Results

A single run of a program may not produce the desired results. You can repeatedly run the version of the program created by pixie, varying the input with each run, and then use the resulting .Counts files to produce a consolidated report. For example:

  1. Compile and link. Do not use the -p option when linking to produce an executable file for pixie:

    cc -c profsample.c
    cc -o profsample profsample.o -lm

  2. Run the profiling utility pixie, as follows:

    atom -tool pixie -toolargs=-pids profsample

    This command produces the profsample.Addrs file to be used in step 4, as well as the modified program profsample.pixie.

  3. Delete any existing .Counts files, set the PIXIE_ARGS environment variable to "-pids", and run the executable program produced by pixie. For example:

    profsample.pixie

    The -pids option specified with the atom -tool pixie command in step 2 appends the process ID of the process running the executable program to the name of the profsample.Counts file, for example, profsample.Counts.1753.

  4. Run the profiled program as many times as desired. Each time the program is run, a profsample.Counts.<pid> file is created.

  5. Run prof to create the report as follows:

    prof -pixie profsample profsample.Addrs profsample.Counts.*

    If you had run profsample.pixie three times, the prof utility would have averaged the basic block data in the three files generated by the executable (profsample.Counts.<pid1>, profsample.Counts.<pid2>, and profsample.Counts.<pid3>) to produce the profile report.




8.9    Analyzing Test Coverage

When you are writing a test suite for an application, you might want to know how effectively your suite tests the application. The prof utility provides two flags that can help you determine this. The -zero option prints the names of procedures that were never executed by your application. The -testcoverage option lists all of the source lines that were never executed by your application. Both of these flags require basic block counting.

Typically, you would perform the following steps to make use of these flags.

  1. Run the pixie Atom tool on your application.

  2. Run the instrumented program produced in step 1 through your test suite, saving any .Counts files.

  3. Profile your application with the -zero or -testcoverage flags and specify all of the .Counts files produced when you ran the test suite.




8.10    Merging Data Files

If the application you are profiling is fairly complicated, you may want to run it several times with different inputs to get an accurate picture of its profile. If you are using PC sampling, each run of your application produces a new mon.out file, or a pid.program file (for example, 1510.profsample) if you have set the PROFDIR environment variable. If you are using basic block counting, each run produces a new .Counts file.

You have two ways of displaying profiling information that is based on an average of all of these output files.

The first way is to specify the names of each profiling data file explicitly on the command line. For example, the following command prints profiling information from two profile data files:

prof -procedures profsample 1510.profsample 1522.profsample

Keeping track of many different profiling data files, however, can be difficult. Therefore, prof provides the -merge flag to combine several data files into a single merged file. When prof operates in -pixie mode, the -merge flag combines the .Counts files. When prof operates in PC-sampling mode, the flag combines the mon.out or other profile data files.

The following example combines two profile data files into a single data file named total.out:

prof -merge total.out profsample 1773.profsample \
    1777.profsample

At a later time, you can then display profiling data using the combined file, just as you would use a normal mon.out file. For example:

 prof -procedures profsample total.out

The merge process is similar for -pixie mode. You must specify the executable file's name, the .Addrs file, and each .Counts file:

prof -pixie -merge total.Counts a.out a.out.Addrs \
  a.out.Counts.1866 a.out.Counts.1868
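
The averaging that a profile over several runs is based on can be illustrated with a short C sketch. The layout used here (a flat array holding one count per basic block per run) and the average_counts() name are assumptions made for the example; the actual format of the .Counts and mon.out files is not described in this chapter.

```c
#include <stddef.h>

/* Average execution counts over several runs.
 * counts[r * nblocks + b] holds the count for basic block b in run r
 * (hypothetical in-memory layout). The result for block b is
 * written to avg[b]. */
void average_counts(const unsigned long *counts, size_t nruns,
                    size_t nblocks, double *avg)
{
    size_t b, r;

    for (b = 0; b < nblocks; b++) {
        unsigned long total = 0;

        for (r = 0; r < nruns; r++)
            total += counts[r * nblocks + b];
        /* Guard against division by zero when no runs are given. */
        avg[b] = nruns ? (double)total / (double)nruns : 0.0;
    }
}
```

For two runs in which a block executed 10 and 20 times, the averaged count for that block is 15.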




8.11    Using Feedback Files

Feedback files are useful in identifying portions of a large executable program in which significant percentages of the execution occur. Without feedback, the compiler must make assumptions about call frequency based on nesting levels. These assumptions are almost never as good as actual data from a sample run. The following sections describe how to use feedback files with the cc, atom -tool pixie, and prof commands.




8.11.1    Generating and Using Feedback Information

Follow these steps to generate feedback information that can be used to optimize subsequent compilations:

  1. Compile the source code:

    cc -O2 -o profsample profsample.c -lm

  2. Run the pixie Atom tool on the executable file:

    atom -tool pixie -toolargs=-o profsample.pixie profsample

    This step creates an output executable file named profsample.pixie and a prof input file named profsample.Addrs.

  3. Execute the program you just created:

    profsample.pixie

    This step creates a file named profsample.Counts, which contains execution statistics.

  4. Use prof to create a feedback file from the execution statistics:

    prof -pixie -feedback profsample.feedback profsample

  5. You can use a feedback file as input to a compilation at -O2 or -O3 optimization levels when you use the -feedback option with the cc command, as shown in the following example:

    cc -O3 -feedback profsample.feedback -o \
      profsample profsample.c -lm

    The feedback file provides the compiler with actual execution information that can be used to improve certain optimizations, such as inlining function calls. Use a feedback file generated from a -O2 compilation for any subsequent compilations with -O2 or -O3 flags.




8.11.2    Using a Feedback File for Input to cord

You can also use a feedback file as input to the cord utility. The cord utility orders the procedures in an executable program to improve execution time. The following example shows how to use the -cord option as part of a compilation command with a feedback file as input:

cc -O2 -cord -feedback profsample.feedback \
  -o profsample profsample.c -lm

Use a feedback file generated with the same optimization level as the level you use in subsequent compilations.

You can also use cord with the runcord utility. For more information, see runcord(1).




8.12    Using Environment Variables to Control PC-Sample Profiling

By default, the -p and -pg flags to the cc command provide the following:

The -p flag supports the profiling of shared libraries. The -pg flag and uprofile tool support the profiling of only the part of a program that is in the executable. When using these tools to generate profiling information for library routines, link your object file with the -non_shared flag to the cc command.

You can use one of the following environment variables to control profiling behavior:

PROFDIR
See Section 8.12.1.

PROFFLAGS
See Section 8.12.2.

By using these variables, you can disable aspects of default profiling behavior, including:

You can use the PROFFLAGS and PROFDIR environment variables together.

Note that these environment variables have no effect on the prof and gprof post-processors; they affect the profiling behavior of a program during its execution. These environment variables have no effect when you use the pixie Atom tool.




8.12.1    PROFDIR Environment Variable

By default, profiling data is collected in a data file named [g]mon.out. When you do multiple profiling runs, each run overwrites the existing [g]mon.out file. Use the PROFDIR environment variable when you want to collect PC sampling data in files with unique names. Set this environment variable as follows (C shell syntax):

setenv PROFDIR path

The results are saved in the file path/pid.progname, which resolves as follows:

path
The directory path, specified with PROFDIR, identifying an existing directory.

pid
The process ID of the executing program.

progname
The program name.

When you set PROFDIR to a null string, no profiling occurs.
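
The naming convention can be expressed as a small C helper. This is a sketch for illustration, for example for a script or tool that needs to locate the data file after a run. The profdir_name() function is a hypothetical name and is not part of the profiling library.

```c
#include <stdio.h>
#include <string.h>

/* Build "path/pid.progname" in buf.
 * Returns 0 on success, or -1 if path is null or empty (no profiling
 * occurs in that case) or if the name does not fit in buf. */
int profdir_name(char *buf, size_t size, const char *path,
                 long pid, const char *progname)
{
    int n;

    if (path == NULL || strlen(path) == 0)
        return -1;

    n = snprintf(buf, size, "%s/%ld.%s", path, pid, progname);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With PROFDIR set to /tmp/prof, a run of profsample with process ID 1510 would be expected to produce /tmp/prof/1510.profsample.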




8.12.2    PROFFLAGS Environment Variable

By default, the profiling library libprof1.a (or libprof1_r.a, for multithreaded programs) allocates one buffer per process to record your profiling data and places the data output file in your current directory.

To disable this default behavior, set the PROFFLAGS environment variable as follows:

setenv PROFFLAGS "-disable_default"

When you have set PROFFLAGS to -disable_default, the default profiling support is disabled, allowing you to use the monitor calls to profile specific sections of your program for both nonthreaded and multithreaded programs. See monitor(3) and Section 8.13 for more information on using the monitor, monstartup, and moncontrol routines.

For multithreaded programs, you can allocate one buffer per thread by setting the PROFFLAGS environment variable as follows:

setenv PROFFLAGS "-threads"

When you have set PROFFLAGS to -threads, a separate file is produced for each thread and is named pid.sid.progname, which is resolved as follows:

pid
The process identification of the program.

sid
The sequence number of the thread, which depends on the order in which the threads were created.

progname
The name of the program being profiled.

You can use the -threads and -disable_default flags together to control profiling of your program when you use the monitor routines.

You can also set the PROFFLAGS environment variable to include or exclude profiling information:

setenv PROFFLAGS "-all"
Causes the profiles for all shared libraries (if any) described in the data file(s) to be displayed, in addition to the profile for the executable.

setenv PROFFLAGS "-incobj lib_name"
Causes the profile for the named shared library to be printed, in addition to the profile for the executable.

setenv PROFFLAGS "-excobj lib_name"
Causes the profile for the named executable or shared library not to be printed.




8.13    Using monitor Routines to Control Profiling

The default profiling behavior on Digital UNIX systems is to profile the entire text segment of your program and place the profiling data in mon.out for prof profiling or in gmon.out for gprof profiling. For large programs, you might not need to profile the entire text segment. The monitor routines provide the ability to profile portions of your program specified by the lower and upper address boundaries of a function address range.

The monitor routines are:

monitor( )
Use this routine to gain control of explicit profiling by turning profiling on and off for a specific text range. This routine is not supported for gprof profiling.

monstartup( )
Similar to monitor, except that it specifies only the address range and is supported for gprof profiling.

moncontrol( )
Use this routine with monitor and monstartup to turn PC sampling on or off during program execution for a specific process or thread.

monitor_signal( )
Use this routine to profile nonterminating programs, such as daemons.

You can use monitor and monstartup to profile an address range in each shared library as well as in the static executable.

For more information on these functions, see monitor(3).

By default, profiling begins as soon as your program starts to execute. You can set the PROFFLAGS environment variable to -disable_default to prevent profiling from beginning when your program starts. Profiling then begins at the first call to monitor or monstartup.

You can disable the default naming of the profiling data file by using the PROFDIR environment variable. For more information on using this environment variable, see Section 8.12.1.

Example 8-6 demonstrates how to use the monstartup and monitor routines within a program to begin and end profiling.

Example 8-6: Using monstartup() and monitor()


 
/* Profile the domath() routine using monstartup.
 * This example allocates a buffer for the entire program.
 * Compile command: cc -p foo.c -o foo -lm
 * Before running the executable, enter the following
 * from the command line to disable default profiling support:
 * setenv PROFFLAGS -disable_default
 */

#include <stdio.h>
#include <math.h>
#include <sys/syslimits.h>

char dir[PATH_MAX];

extern void __start();
extern unsigned long _etext;

main()
{
    int i;
    int a = 1;

    /* Start profiling between __start (beginning of text)
     * and _etext (end of text). The profiling library
     * routines will allocate the buffer.
     */

    monstartup(__start,&_etext);

    for(i=0;i<10;i++)
        domath();

    /* Stop profiling and write the profiling output file. */

    monitor(0);

}
domath()
{
    int i;
    double d1, d2;

    d2 = 3.1415;
    for (i=0; i<1000000; i++)
        d1 = sqrt(d2)*sqrt(d2);
}

The external name _etext lies just above all the program text. See end(3) for more information.

When you set the PROFFLAGS environment variable to -disable_default, you disable default profiling buffer support. You can allocate buffers within your program, as shown in Example 8-7.

Example 8-7: Allocating Profiling Buffers Within a Program


 
/* Profile the domath routine using monitor().
 * Compile command: cc -p foo.c -o foo -lm
 * Before running the executable, enter the following
 * from the command line to disable default profiling support:
 * setenv PROFFLAGS -disable_default
 */

#include <math.h>
#include <sys/types.h>
#include <sys/syslimits.h>

extern char *calloc();

void domath(void);
void nextproc(void);

#define INST_SIZE 4 /* Instruction size on Alpha */
char dir[PATH_MAX];

main()
{
    int i;
    char *buffer;
    size_t bufsize;

    /* Allocate one counter for each instruction to
     * be sampled. Each counter is an unsigned short.
     */

    bufsize = (((char *)nextproc - (char *)domath)/INST_SIZE)
              * sizeof(unsigned short);

    /* Use calloc() to ensure that the buffer is clean
     * before sampling begins.
     */

    buffer = calloc(bufsize,1);

    /* Start sampling. */
    monitor(domath,nextproc,buffer,bufsize,0);
    for(i=0;i<10;i++)
        domath();

    /* Stop sampling and write out profiling buffer. */
    monitor(0);
}
void domath(void)
{
    int i;
    double d1, d2;

    d2 = 3.1415;
    for (i=0; i<1000000; i++)
        d1 = sqrt(d2)*sqrt(d2);
}

void nextproc(void)
{}
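
The buffer-size arithmetic in Example 8-7 can be isolated in a portable helper to check the calculation. The monitor_bufsize() name is hypothetical, and the sketch assumes, as the example does, 4-byte instructions and one unsigned short counter per instruction.

```c
#include <stddef.h>

#define INST_SIZE 4 /* Bytes per instruction, as in Example 8-7 */

/* Return the number of bytes needed to hold one unsigned short
 * counter for each instruction in the range [low, high).
 * Returns 0 if the range is empty or reversed. */
size_t monitor_bufsize(const char *low, const char *high)
{
    size_t ninst;

    if (high <= low)
        return 0;

    ninst = (size_t)(high - low) / INST_SIZE;
    return ninst * sizeof(unsigned short);
}
```

A 400-byte address range contains 100 instructions and therefore needs 100 counters. Where an unsigned short occupies 2 bytes, that is a 200-byte buffer.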

You use the monitor_signal( ) routine to profile programs that do not terminate. Declare this routine as a signal handler in your program and build the program for prof or gprof profiling. While the program is executing, send a signal from the shell by using the kill command.

When the signal is received, monitor_signal is invoked and writes profiling data to the data file. If the program receives another signal, the data file is overwritten.

Example 8-8 illustrates how to use the monitor_signal routine.

Example 8-8: Using monitor_signal() to Profile Non-Terminating Programs


 
/* From the shell, start up the program in background.
 * Send a signal to the process, for example: kill -30 <pid>
 * Process the [g]mon.out file normally using gprof or prof
 */

#include <math.h>
#include <signal.h>

extern int monitor_signal();

main()
{
    int i;
    double d1, d2;

    /*
     * Declare monitor_signal() as signal handler for SIGUSR1
     */
    signal(SIGUSR1,monitor_signal);
    d2 = 3.1415;
    /*
     * Loop infinitely (absurd example of non-terminating process)
     */
    for (;;)
        d1 = sqrt(d2)*sqrt(d2);
}
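
Because monitor_signal() is available only in the profiling library, the handler-installation pattern cannot be tried on other systems as written. The following sketch shows the same pattern with a stand-in handler that merely counts the requests it receives. The names dump_handler(), install_dump_handler(), and dump_request_count() are hypothetical, and the stand-in writes no profiling data.

```c
/* Requests the POSIX signal names (SIGUSR1) under strict ISO C modes. */
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Written by the handler; sig_atomic_t can be updated safely
 * from within a signal handler. */
static volatile sig_atomic_t dump_requests = 0;

/* Stand-in for monitor_signal(): records that a dump was requested. */
static void dump_handler(int signo)
{
    (void)signo;
    dump_requests++;
}

/* Install the stand-in handler for SIGUSR1.
 * Returns 0 on success, -1 on failure. */
int install_dump_handler(void)
{
    return signal(SIGUSR1, dump_handler) == SIG_ERR ? -1 : 0;
}

/* Return the number of SIGUSR1 signals received so far. */
int dump_request_count(void)
{
    return (int)dump_requests;
}
```

In a real program built for profiling, the call to signal() would name monitor_signal instead of the stand-in handler, as Example 8-8 shows.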




8.14    Profiling Multithreaded Applications

Profiling multithreaded applications is essentially the same as profiling non-threaded applications. However, to profile multithreaded applications, you must compile your program with the -pthread or -threads flag to the cc command. Specifying one of these flags and either the -p or -pg flag enables the thread profiling library, libprof1_r.a.

The default case for profiling multithreaded applications is to provide one sampling buffer for all threads. In this case, you get sampling across the entire process and you get one output file comprising sampling data from all threads. Depending on whether you use the -p or -pg flag, your output file will be named mon.out or gmon.out, respectively.

To get a separate buffer and a separate output file for each thread in your program, use the environment variable PROFFLAGS. Set PROFFLAGS to -threads, as shown in the following example:

setenv PROFFLAGS "-threads"

The profiling data file will be named according to the following convention:

pid.sid.progname

In the preceding example, pid is the process ID of the program, sid corresponds to the order in which the thread was created, and progname is your program name.

If the application controls profiling by using the monitor routines, sid corresponds to the order in which profiling was started for the thread.

If you use the monitor( ) or monstartup( ) calls in a threaded program, you must first set PROFFLAGS to "-disable_default -threads", giving you complete control of profiling the application.

If the application uses monitor( ) and allocates separate buffers for each thread profiled, you must first set PROFFLAGS to "-disable_default -threads", because this setting affects the file naming conventions that are used. Without the -threads flag, the buffer and address range used as a result of the first monitor or monstartup call would be applied to every thread that subsequently requests profiling. In this case, a single data file that covers all threads being profiled would be created.

Each thread in a process must call the monitor( ) or monstartup( ) routines to initiate profiling for itself.