Profiling is a method of identifying sections of code that consume large portions of execution time. In a typical program, most execution time is spent in relatively few sections of code. To improve performance, the greatest gains result from improving coding efficiency in time-intensive sections.
This chapter discusses the following topics:
Profiling methods include:
To select an appropriate profiling method for an application, you must take into consideration the following factors:
The profiling data display tools, and their respective data collection methods, include the following:
The prof tool supports the following data collection methods:
The -p flag supports the profiling of shared libraries, but requires you to at least relink the program. It collects only CPU statistics using PC sampling.
The uprofile tool profiles user code. It does not support the profiling of shared libraries. It does not require you to relink the program and collects either CPU statistics or other information.
The kprofile tool profiles the running operating system kernel. It does not require you to relink the program and collects either CPU statistics or other information.
The pixie Atom tool supports the profiling of shared libraries and does not require you to relink the program. It supports the prof tool's instruction-level profiling and true cycle-count estimation.
The gprof tool supports the following data collection methods:
The -pg flag does not allow the profiling of shared libraries. It requires that you recompile the program's sources and uses an apportioned call cost method to determine a given procedure's cost to its callers.
The hiprof Atom tool supports the profiling of shared libraries and does not require you to recompile or relink. To determine a given procedure's cost to its callers, it supports both the apportioned call cost method and the measured call cost method.
You can also use the monitor routines to perform PC-sampling on a specified address range in a program. For more information on using monitor routines, see Section 8.13 and monitor(3).
Table 8-1 provides a concise overview of the profiling tools available in the Digital UNIX operating system.
Tool | Use |
PC-sampling/ prof | Link application with -p; analyze results with prof; see prof(1) and monitor(3). |
Call-arcs/ gprof | Compile and link with -pg; analyze results with gprof; see gprof(1) and monitor(3). |
pixstats | Additional postprocessor for pixified program output; see pixstats(1). |
uprofile/ kprofile | Run application under uprofile or kprofile; requires pfm driver to be installed; analyze results with prof; see uprofile(1), kprofile(1), and pfm(7). |
Atom toolkit | Programmable debug/performance analysis tool. Example tools are contained in /usr/lib/cmplrs/atom/examples; see atom(1) and other Atom reference pages for programming interface. |
pixie | Atom-based basic block profiler; analyze results with prof; see pixie(5). |
hiprof | Atom-based call-arc analyzer; analyze results with gprof; see hiprof(5). |
third | Atom-based memory error/leak detection tool, Third Degree; generates text output. See third(5). |
All profiling tools work on call-shared and nonshared applications.
Statistical PC-sampling for the program is useful for diagnosing high CPU-usage procedures in the program and it supports both threads and shared libraries.
Interface summary:
% cc -p *.o -o program      # Link with libprof1.a
% program                   # Run program to collect data
% prof program              # Process the mon.out file
The gprof tool provides procedure call information coupled with statistical PC-sampling. This is useful for determining which routines are called most frequently and from where. The gprof tool also gives a flat profile for CPU-usage on the routines. It supports threads and call-shared programs, but does not support shared libraries.
Using the gprof tool, you can retrieve information from libc.a and libm.a because these two libraries are compiled with the -pg flag. Other Digital-supplied libraries are not compiled with -pg, so calling information on these other system libraries is not available.
Interface summary:
% cc -pg *.c -o program     # Compile and link with -pg
% program                   # Run program to collect data
% gprof program             # Process the gmon.out file
The uprofile and kprofile tools use the performance counters on the Alpha chip. They do not collect information on shared libraries. By default, both tools collect cycles for the program. The performance data produced by these tools is processed with the prof command. See uprofile(1) and kprofile(1) for more information.
The Atom toolkit consists of a programmable instrumentation tool and several packaged tools. Examples are included in the /usr/lib/cmplrs/atom/examples directory that demonstrate how to develop instrumentation and analysis code. The instrumentation part of the tool instructs Atom on where to insert calls to analysis routines in the program. When the program is run, the analysis routines are entered and data collection is performed as prescribed by the Atom tool specified on the atom command.
Atom does not work on programs built with the -om flag.
Interface summary:
% atom -tool toolname program
% program.tool
Postprocessing is tool-dependent. See Chapter 9 for details on Atom.
The Atom-based pixie is a basic block profiler that supports shared libraries and threaded applications.
Interface summary:
% atom -tool pixie [-env threads] program
% program.pixie[.threads]
% prof -pixie program
The hiprof Atom tool collects call-arc information on a program. By default, it operates like the gprof support provided by the -pg flag, but has flag-selectable options that are more powerful. The hiprof Atom tool supports shared libraries and threaded applications.
Interface summary:
% atom -tool hiprof [-env threads] program
% program.hiprof[.threads]
% gprof program program.hiout
Third Degree is a memory-leak and memory-overwrite detection tool, also based on Atom. Third Degree generates text output to a file called program.3log. The log contains the diagnostics that Third Degree detected (for example, reads of uninitialized heap or stack, memory overwrites, and memory leaks).
Interface summary:
% atom -tool third [-env threads] program
% program.third[.threads]
% cat program.3log
The examples in the remainder of this chapter refer to the sample program, profsample.c, shown in Example 8-1.
#include <math.h>
#include <stdio.h>

#define LEN 100

void mult_by_scalar(double ary[], int len, double num);
void add_vector(double arya[], double aryb[], int len);
double value;
void printit(double value);

main()
{
    double ary1[LEN];
    double ary2[LEN];
    int i;

    for (i=0; i<LEN; i++) {
        ary1[i] = 0.0;
        ary2[i] = sqrt((double)i);
    }
    mult_by_scalar(ary1, LEN, 3.14159);
    mult_by_scalar(ary2, LEN, 2.71828);
    for (i=0; i<20; i++)
        add_vector(ary1, ary2, LEN);
}

void mult_by_scalar(double ary[], int len, double num)
{
    int i;

    for (i=0; i<len; i++) {
        ary[i] *= num;
        value = ary[i];
        printit(value);
    }
}

void add_vector(double arya[], double aryb[], int len)
{
    int i;

    for (i=0; i<len; i++) {
        arya[i] += aryb[i];
        value = arya[i];
        printit(value);
    }
}

void printit(double value)
{
    printf("Value = %f\n", value);
}
To use prof to obtain PC sampling data on a program, follow these steps:
% cc -c profsample.c
% cc -p -o profsample profsample.o -lm
You must specify the -p profiling option during the link step to obtain PC sampling information. If you have an existing application, you will not need to recompile to profile the executable program; simply relink the program using the -p option with the cc command.
If you are building an application for the first time, you can compile and link in the same step. In the preceding example, the -lm option ensures that libm.{a,so} is used to resolve symbols that refer to math library functions.
You might also consider compiling with one of the optimization flags to help improve the efficiency of your code, compiling with a debug flag to provide more symbolic information for the profile report, or compiling with both types of flags.
If you are profiling a multithreaded application, use the -threads flag with the cc command. For more information on profiling multithreaded applications, see Section 8.14.
% profsample
You can run the program several times, altering the input data (if any) to create multiple profile data files.
During execution, profiling data is saved in a profile data file. The default name for the profile data file is mon.out, unless you have set the environment variable PROFDIR. For more information on using PROFDIR, see Section 8.12.1.
% prof profsample mon.out
Example 8-2 shows output produced by the prof command on the profsample.c program.
Profile listing generated Thu May 26 13:36:14 1994 with:
   prof profsample mon.out

--------------------------------------------------------------
*  -p[rocedures] using pc-sampling; sorted in descending     *
*  order by total time spent in each procedure;              *
*  unexecuted procedures excluded                            *
--------------------------------------------------------------

Each sample covers 4.00 byte(s) for 14% of 0.0068 seconds

%time     seconds     cum %   cum sec  procedure (file)

 42.9      0.0029      42.9      0.00  printit (profsample.c)
 42.9      0.0029      85.7      0.01  add_vector (profsample.c) [1]
 14.3      0.0010     100.0      0.01  mult_by_scalar (profsample.c)
Because the prof program works by periodic sampling of the program counter, you might see different output when you profile the same program multiple times. A different profiling run than the preceding example of the sample program produced the following output:
Profile listing generated Thu May 26 13:34:00 1994 with:
   prof -procedures profsample mon.out

--------------------------------------------------------------
*  -p[rocedures] using pc-sampling; sorted in descending     *
*  order by total time spent in each procedure;              *
*  unexecuted procedures excluded                            *
--------------------------------------------------------------

Each sample covers 4.00 byte(s) for 17% of 0.0059 seconds

%time     seconds     cum %   cum sec  procedure (file)

 66.7      0.0039      66.7      0.00  add_vector (profsample.c)
 33.3      0.0020     100.0      0.01  printit (profsample.c)
To determine the manner in which routines call, or are called by, other routines, use the gprof profiling tool.
The gprof tool postprocesses both hiprof output and -pg output.
To use this tool, follow these steps:
% atom -tool hiprof profsample
% profsample.hiprof
% gprof profsample profsample.hiout
During execution, profiling data is saved in the data file profsample.hiout, unless you have set the -dirname flag in the HIPROF_ARGS environment variable or on the command line.
Alternatively, you can use the following procedure to collect profiling data for the gprof tool:
% cc -pg -c profsample.c
% cc -pg -o profsample profsample.o -lm
You must specify the -pg flag with the cc command during both the compile and link steps to obtain call graph information.
% profsample
When this method is used, profiling data is saved during execution in the data file gmon.out, unless you have set the PROFDIR environment variable. For more information on using this variable, see Section 8.12.1.
% gprof profsample gmon.out
The output produced by the gprof utility comprises three sections:
You can control gprof profiling by file by using the -no_pg flag to the cc command. When you use this flag, you disable gprof profiling for all objects that follow the flag on the command line. You cannot use the -no_pg flag with the -r and -shared flags to the ld command.
Example 8-3 shows output for gprof profiling of the sample program. The -b flag was used with gprof to suppress printing of the description of each output field. The descriptions are valuable, but they are lengthy and were left out due to space considerations. To see these descriptions, follow the steps to produce gprof output and write the output to a file or pipe the output through the more utility.
In the call graph profile section, each routine in the program has its own subsection that is contained within dashed lines and identified by the index number in the first column.
Note that for the purpose of this example output, the three sections have been separated by rows of asterisks that do not appear in the output produced by gprof. Each row of asterisks includes the name of the section.
For more information on gprof flags, see the gprof(1) reference page.
*********************** call graph profile *******************

granularity: each sample hit covers 4 byte(s) for 10.00% of 0.01 seconds

                                  called/total      parents
index  %time    self descendents  called+self    name           index
                                  called/total      children

                                                 <spontaneous>
[1]    100.0    0.00        0.01                 main [1]
                0.00        0.00      20/20          add_vector [2]
                0.00        0.00       2/2           mult_by_scalar [4]
-----------------------------------------------
                0.00        0.00      20/20          main [1] [1]
[2]     75.5    0.00        0.00      20         add_vector [2] [2]
                0.00        0.00    2000/2200        printit [3] [3]
-----------------------------------------------
                0.00        0.00     200/2200        mult_by_scalar [4]
                0.00        0.00    2000/2200        add_vector [2]
[3]     50.0    0.00        0.00    2200         printit [3]
-----------------------------------------------
                0.00        0.00       2/2           main [1]
[4]      4.5    0.00        0.00       2         mult_by_scalar [4]
                0.00        0.00     200/2200        printit [3]
-----------------------------------------------

*********************** timing profile section ***************

granularity: each sample hit covers 4 byte(s) for 10.00% of 0.01 seconds

  %   cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call  name
 50.0      0.00      0.00     2200     0.00     0.00  printit [3]
 30.0      0.01      0.00       20     0.15     0.37  add_vector [2]
 20.0      0.01      0.00                             main [1]
  0.0      0.01      0.00        2     0.00     0.22  mult_by_scalar[4]

*********************** index section ************************

Index by function name

   [2] add_vector          [4] mult_by_scalar
   [1] main                [3] printit
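The called/total column in the call graph profile is what the apportioned call cost method works from: a callee's time is divided among its callers in proportion to the number of calls each caller made. The following is a minimal sketch of that arithmetic only. The function name apportioned_cost is hypothetical and is not part of gprof or hiprof.

```c
/*
 * Apportioned call cost (illustrative sketch, not gprof source code).
 * Charges a caller with the share of the callee's time that matches
 * the caller's share of the calls, for example 2000 of 2200 calls.
 */
double apportioned_cost(double callee_seconds, long calls_from_parent,
                        long total_calls)
{
    if (total_calls <= 0)
        return 0.0;     /* callee was never called; nothing to charge */
    return callee_seconds * (double)calls_from_parent / (double)total_calls;
}
```

For instance, if printit had consumed 11.0 seconds in total, apportioned_cost(11.0, 2000, 2200) would charge 10.0 seconds to add_vector and apportioned_cost(11.0, 200, 2200) would charge 1.0 second to mult_by_scalar. The measured call cost method supported by hiprof does not rely on this proportional assumption.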
A basic block is a set of instructions with one entry and one exit. The pixie Atom tool provides execution counts for the basic blocks of a program. With prof, the execution counts can be viewed at the instruction level.
To obtain data for basic block counting, follow these steps:
% cc -c profsample.c
% cc -o profsample profsample.o -lm
% atom -tool pixie profsample
The profsample.pixie file is equivalent to profsample but contains additional code that counts the execution of each basic block. To create an output file with a name other than pname.pixie, use the -o flag followed by the name you assign to the output file.
The profsample.Addrs file contains the address of each of the basic blocks. For more information, see pixie(5).
% profsample.pixie
This command generates the file profsample.Counts, which contains the basic block counts. Each time you execute the profsample.pixie file, you create a new profsample.Counts file.
% prof -pixie profsample
This command extracts information from profsample.Addrs and profsample.Counts and displays information in an easily readable format. Note that you do not need to specify the .Addrs and .Counts file suffixes because pixie searches by default for files containing them.
You can also run the pixstats program on the executable file profsample to generate a detailed report on opcode frequencies, interlocks, a miniprofile, and more. For more information, see pixstats(1).
Note
The pixie profiling tool provided in the current version of the Digital UNIX operating system is the pixie Atom tool. If you use the syntax provided in earlier versions of the operating system to invoke pixie, a script transforms the call into a call to the pixie Atom tool. The previous version of the pixie tool can be found at /usr/opt/obsolete/usr/bin/pixie.
Depending on the size of the application and the type of profiling you request, prof may generate a very large amount of output. However, you are often only interested in profiling data about a particular portion of your application.
The prof program provides the following flags to display information selectively by procedure:
The -only option tells prof to print only profiling information for a particular procedure. You can specify the -only option multiple times on the command line. For example, the following command displays profiling information for procedures mult_by_scalar and add_vector from the sample program:
% prof -only mult_by_scalar -only add_vector profsample
The -exclude option tells prof to print profiling information for all procedures except the specified procedure. You can use multiple -exclude flags on the command line.
The following command displays profiling information for all procedures except add_vector:
% prof -exclude add_vector profsample
Do not use the -only and -exclude flags on the same command line.
Many of the prof utility's profiling flags print output as percentages, for example, the percentage of total execution time attributed to a particular procedure.
By default, the -only and -exclude flags cause prof to calculate percentages based on all of the procedures in the application even if they were omitted from the listing. You can change this behavior with the -Only and -Exclude flags. These flags work the same as -only and -exclude, but cause prof to calculate percentages based only on those procedures that appear in the listing. For example, the following command omits the add_vector procedure from both the listing and from percentage calculations:
% prof -Exclude add_vector profsample
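The difference between the lowercase and uppercase forms of these flags comes down to which total is used as the base of each percentage. The following sketch illustrates that choice. The helper names sum_seconds and percent_of are hypothetical and are not part of prof.

```c
#include <stddef.h>

/* Add up the seconds recorded for n procedures. */
double sum_seconds(const double *seconds, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += seconds[i];
    return total;
}

/*
 * Percentage of base_total taken by one procedure.
 * Pass the total over all procedures to mimic -only and -exclude,
 * or the total over the listed procedures only to mimic
 * -Only and -Exclude.
 */
double percent_of(double seconds, double base_total)
{
    if (base_total <= 0.0)
        return 0.0;     /* avoid dividing by zero on an empty profile */
    return 100.0 * seconds / base_total;
}
```

With two procedures that take 3.0 and 1.0 seconds, listing only the second one gives 25 percent when the base is the whole application and 100 percent when the base is the listing alone.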
The -totals flag, used with the -procedures and -invocations listings, prints cumulative statistics for the entire object file instead of for each procedure in the object.
The -all, -incobj, and -excobj flags allow you to display profiling information for shared libraries used by the program:
The -heavy and -lines flags cause prof to display the total number of machine cycles executed by each source line in your application. Both of these flags require you to use basic block counting (the -pixie option); they do not work in PC-sampling mode.
The -heavy option prints an entry for every source line that was executed by your application. Each entry shows the total number of machine cycles executed by that line. Entries are sorted from the line with the most machine cycles to the line with the least machine cycles. Because this option often prints a huge number of entries, you might want to use one of the -quit, -only, or -exclude flags to reduce output to a manageable size.
Example 8-4 shows output generated by the following command:
% prof -pixie -heavy -only add_vector -only mult_by_scalar \
     -only main profsample
For example, you can see in Example 8-4 that line 47 of profsample.c in the procedure add_vector( ) accounts for over 12 percent of the application's total execution time. The listing also shows the size in bytes of each source line.
Profile listing generated Fri May 27 14:09:10 1994 with:
   prof -pixie -heavy -only add_vector -only mult_by_scalar -only main profsample

------------------------------------------------------------------
*  -h[eavy] using basic-block counts;                            *
*  sorted in descending order by the number of cycles executed   *
*  in each                                                       *
*  line; unexecuted lines are excluded                           *
------------------------------------------------------------------

procedure (file)                 line  bytes   cycles      %   cum %

add_vector (profsample.c)          48     44    22000  23.26   23.26
add_vector (profsample.c)          46     40    20000  21.15   44.41
add_vector (profsample.c)          47     24    12000  12.69   57.10
mult_by_scalar (profsample.c)      36     44     2200   2.33   59.43
main (profsample.c)                20     60     1500   1.59   61.02
mult_by_scalar (profsample.c)      34     28     1400   1.48   62.50
mult_by_scalar (profsample.c)      35     24     1200   1.27   63.77
main (profsample.c)                19     12      300   0.32   64.08
main (profsample.c)                25     48      240   0.25   64.34
add_vector (profsample.c)          41     28      140   0.15   64.48
add_vector (profsample.c)          44     12       60   0.06   64.55
add_vector (profsample.c)          50     12       60   0.06   64.61
mult_by_scalar (profsample.c)      29     28       14   0.01   64.63
main (profsample.c)                23     32        8   0.01   64.63
main (profsample.c)                22     32        8   0.01   64.64
mult_by_scalar (profsample.c)      38     12        6   0.01   64.65
mult_by_scalar (profsample.c)      32     12        6   0.01   64.66
main (profsample.c)                26     16        4   0.00   64.66
main (profsample.c)                13     16        4   0.00   64.66
main (profsample.c)                18      8        2   0.00   64.67
main (profsample.c)                24      8        2   0.00   64.67
The -lines option is similar to -heavy, but it sorts the output differently. This option prints the lines for each procedure in the order that they occur in the source file. Even lines that never executed are printed. The procedures themselves are sorted from those procedures that execute the most machine cycles to those that execute the least.
Example 8-5 shows the same information as Example 8-4, but in a different format as generated by the following command:
% prof -pixie -lines -only add_vector -only mult_by_scalar \
     -only main profsample
Profile listing generated Fri May 27 14:07:28 1994 with:
   prof -pixie -lines -only add_vector -only mult_by_scalar -only main profsample

------------------------------------------------------------------
*  -l[ines] using basic-block counts;                            *
*  grouped by procedure, sorted by cycles executed per procedure;*
*  '?' means that line number information is not available.      *
------------------------------------------------------------------

procedure (file)                 line  bytes   cycles      %   cum %

add_vector (profsample.c)          41     28      140   0.15    0.15
                                   44     12       60   0.06    0.21
                                   46     40    20000  21.15   21.36
                                   47     24    12000  12.69   34.05
                                   48     44    22000  23.26   57.32
                                   50     12       60   0.06   57.38
mult_by_scalar (profsample.c)      29     28       14   0.01   57.39
                                   32     12        6   0.01   57.40
                                   34     28     1400   1.48   58.88
                                   35     24     1200   1.27   60.15
                                   36     44     2200   2.33   62.48
                                   38     12        6   0.01   62.48
main (profsample.c)                13     16        4   0.00   62.49
                                   18      8        2   0.00   62.49
                                   19     12      300   0.32   62.81
                                   20     60     1500   1.59   64.39
                                   22     32        8   0.01   64.40
                                   23     32        8   0.01   64.41
                                   24      8        2   0.00   64.41
                                   25     48      240   0.25   64.66
                                   26     16        4   0.00   64.67
The -quit option reduces the amount of profiling output displayed. The -quit option affects the output from the -procedures, -heavy, and -lines profiling modes.
The -quit option provides three versions:
-quit n: The n refers to an integer. All lines after line n are truncated.
-quit n%: The n% refers to an integer n followed by a percent sign (%). All lines after the line showing n% in the %calls column of the display are truncated.
-quit ncum%: The ncum% refers to an integer n followed by the characters cum (for cumulative) and a percent sign (%). All lines after the line showing n% in the cum% column of the display are truncated.
If you specify several modes on the same command line, the -quit option affects the output from each mode. For example, the -quit option in the following command reduces the output from both the -procedures and -heavy modes:
% prof -pixie -procedures -heavy -quit 20 profsample
This command prints only the 20 most time-consuming procedures and the 20 most time-consuming source lines. The -quit n option has no effect on the -lines profiling mode.
The -quit n% option restricts the output to those entries that account for at least n% of the total. Depending on the profiling mode, the total can refer to the total amount of time, the total number of machine cycles, or the total number of invocation counts. For example, the following command prints only those source lines that account for at least 2 percent of the application's total number of machine cycles:
% prof -pixie -lines -quit 2% profsample
The -quit ncum% option truncates the output after n% of the total has been accounted for. The definition of total depends on the profiling mode, as described in the preceding paragraph. For example, the following command prints the most heavily used source lines and stops after 30 percent of the application's total number of machine cycles have been accounted for:
% prof -pixie -heavy -quit 30cum% profsample
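The three forms of -quit can be read as three truncation rules applied to a listing that is already sorted in descending order, as in the -heavy mode. The following sketch expresses those rules as functions that return the number of rows kept. The function names are hypothetical, and the choice to keep the row on which the cumulative total first reaches the limit is an assumption based on the description above, not on the prof source.

```c
#include <stddef.h>

/* -quit n: keep at most n rows. */
size_t rows_quit_n(size_t nrows, size_t n)
{
    return nrows < n ? nrows : n;
}

/* -quit n%: keep the leading rows whose own percentage is at least limit. */
size_t rows_quit_pct(const double *pct, size_t nrows, double limit)
{
    size_t kept = 0;

    while (kept < nrows && pct[kept] >= limit)
        kept++;
    return kept;
}

/*
 * -quit ncum%: keep rows until the running total reaches limit.
 * Assumption: the row that reaches the limit is still printed.
 */
size_t rows_quit_cum(const double *pct, size_t nrows, double limit)
{
    double cum = 0.0;
    size_t kept = 0;

    while (kept < nrows && cum < limit) {
        cum += pct[kept];
        kept++;
    }
    return kept;
}
```

Applied to the first five rows of Example 8-4 (23.26, 21.15, 12.69, 2.33, and 1.59 percent), rows_quit_pct with a limit of 2 keeps four rows, and rows_quit_cum with a limit of 30 keeps two rows.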
A single run of a program may not produce the desired results. You can repeatedly run the version of the program created by pixie, varying the input with each run, and then use the resulting .Counts files to produce a consolidated report. For example:
% cc -c profsample.c
% cc -o profsample profsample.o -lm
% atom -tool pixie -toolargs=-pids profsample
This command produces the profsample.Addrs file to be used in step 4, as well as the modified program profsample.pixie.
% profsample.pixie
The -pids option specified with the atom -tool pixie command in step 2 appends the process ID of the process running the executable program to the name of the profsample.Counts file, for example, profsample.Counts.1753.
% prof -pixie profsample profsample.Addrs profsample.Counts.*
If you had run profsample.pixie three times, the prof utility would have averaged the basic block data in the three files generated by the executable (profsample.Counts.<pid1>, profsample.Counts.<pid2>, and profsample.Counts.<pid3>) to produce the profile report.
When you are writing a test suite for an application, you might want to know how effectively your suite tests the application. The prof utility provides two flags that can help you determine this. The -zero option prints the names of procedures that were never executed by your application. The -testcoverage option lists all of the source lines that were never executed by your application. Both of these flags require basic block counting.
Typically, you would perform the following steps to make use of these flags.
If the application you are profiling is fairly complicated, you may want to run it several times with different inputs to get an accurate picture of its profile. If you are using PC sampling, each run of your application produces a new mon.out file, or a pid.progname file if you have set the PROFDIR environment variable. If you are using basic block counting, each run produces a new .Counts file.
You have two ways of displaying profiling information that is based on an average of all of these output files.
The first way is to specify the names of each profiling data file explicitly on the command line. For example, the following command prints profiling information from two profile data files:
% prof -procedures profsample 1510.profsample 1522.profsample
Keeping track of many different profiling data files, however, can be difficult. Therefore, prof provides the -merge option to combine several data files into a single merged file. When prof operates in -pixie mode, the -merge flag combines the .Counts files. When prof operates in PC-sampling mode, this switch combines the mon.out or other profile data files.
The following example combines two profile data files into a single data file named total.out:
% prof -merge total.out profsample 1773.profsample \
     1777.profsample
At a later time, you can then display profiling data using the combined file, just as you would use a normal mon.out file. For example:
% prof -procedures profsample total.out
The merge process is similar for -pixie mode. You must specify the executable file's name, the .Addrs file, and each .Counts file:
% prof -pixie -merge total.Counts a.out a.out.Addrs \
     a.out.Counts.1866 a.out.Counts.1868
Feedback files are useful in identifying portions of a large executable program in which significant percentages of the execution occur. Without feedback, the compiler must make assumptions about call frequency based on nesting levels. These assumptions are almost never as good as actual data from a sample run. The following sections describe how to use feedback files by using the cc command and the atom -tool pixie and prof commands.
Follow these steps to generate feedback information that can be used to optimize subsequent compilations:
% cc -O2 -o profsample profsample.c -lm
% atom -tool pixie -toolargs=-o profsample.pixie profsample
This step creates an output executable file named profsample.pixie and a prof input file named profsample.Addrs.
% profsample.pixie
This step creates a file named profsample.Counts, which contains execution statistics.
% prof -pixie -feedback profsample.feedback profsample
% cc -O3 -feedback profsample.feedback -o \
     profsample profsample.c -lm
The feedback file provides the compiler with actual execution information that can be used to improve certain optimizations, such as inlining function calls. Use a feedback file generated from a -O2 compilation for any subsequent compilations with -O2 or -O3 flags.
You can also use a feedback file as input to the cord utility. The cord utility orders the procedures in an executable program to improve execution time. The following example shows how to use the -cord option as part of a compilation command with a feedback file as input:
% cc -O2 -cord -feedback profsample.feedback \
     -o profsample profsample.c -lm
Use a feedback file generated with the same optimization level as the level you use in subsequent compilations.
You can also use cord with the runcord utility. For more information, see runcord(1).
By default, the -p and -pg flags to the cc command provide the following:
The -p flag supports the profiling of shared libraries. The -pg flag and uprofile tool support the profiling of only the part of a program that is in the executable. When using these tools to generate profiling information for library routines, link your object file with the -non_shared flag to the cc command.
You can use one of the following environment variables to control profiling behavior:
By using these variables, you can disable aspects of default profiling behavior, including:
You can use the PROFFLAGS and PROFDIR environment variables together.
Note that these environment variables have no effect on the prof and gprof post-processors; they affect the profiling behavior of a program during its execution. These environment variables have no effect when you use the pixie Atom tool.
By default, profiling data is collected in a data file named [g]mon.out. When you do multiple profiling runs, each run overwrites the existing [g]mon.out file. Use the PROFDIR environment variable when you want to collect PC sampling data in files with unique names. Set this environment variable as follows:
C shell:       setenv PROFDIR path
Bourne shell:  PROFDIR=path; export PROFDIR
The results are saved in the file path/pid.progname, which resolves as follows:
When you set PROFDIR to a null string, no profiling occurs.
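The path/pid.progname naming scheme can be illustrated with a short sketch that builds such a name from its three parts. The function profile_file_name is hypothetical and is shown only to make the format concrete. The profiling library constructs the name itself, and its actual implementation is not shown in this chapter.

```c
#include <stdio.h>

/*
 * Build "dir/pid.progname" in buf (illustrative only).
 * Returns the value returned by snprintf: the length of the full
 * name, which is larger than or equal to size if buf was too small.
 */
int profile_file_name(char *buf, size_t size, const char *dir,
                      long pid, const char *progname)
{
    return snprintf(buf, size, "%s/%ld.%s", dir, pid, progname);
}
```

For a PROFDIR of /tmp/prof, a process ID of 1773, and the program profsample, this produces /tmp/prof/1773.profsample, which matches the form of the data file names used in the prof -merge examples in this chapter.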
By default, the profiling library libprof1.a (or libprof1_r.a, for multithreaded programs) allocates one buffer per process to record your profiling data, as well as placing the data output file in your current directory.
To disable this default behavior, set the PROFFLAGS environment variable as follows:
C shell:       setenv PROFFLAGS "-disable_default"
Bourne shell:  PROFFLAGS="-disable_default"; export PROFFLAGS
When you have set PROFFLAGS to -disable_default, the default profiling support is disabled, allowing you to use the monitor calls to profile specific sections of your program for both nonthreaded and multithreaded programs. See monitor(3) and Section 8.13 for more information on using the monitor, monstartup, and moncontrol routines.
For multithreaded programs, you can allocate one buffer per thread by setting the PROFFLAGS environment variable as follows:
C shell:       setenv PROFFLAGS "-threads"
Bourne shell:  PROFFLAGS="-threads"; export PROFFLAGS
When you have set PROFFLAGS to -threads, a separate file is produced for each thread and is named pid.sid.progname, which is resolved as follows:
You can use the -threads and -disable_default flags together to control profiling of your program when you use the monitor routines.
You can also set the PROFFLAGS environment variable to include or exclude profiling information:
The default profiling behavior on Digital UNIX systems is to profile the entire text segment of your program and place the profiling data in mon.out for prof profiling or in gmon.out for gprof profiling. For large programs, you might not need to profile the entire text segment. The monitor routines provide the ability to profile portions of your program specified by the lower and upper address boundaries of a function address range.
The monitor routines are:
You can use monitor and monstartup to profile an address range in each shared library as well as in the static executable.
For more information on these functions, see monitor(3).
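When you profile an address range, PC sampling needs a counter for each location that can be sampled in that range. The following sketch shows the arithmetic under two assumptions taken from the monitor() example in this chapter: instructions are 4 bytes long on Alpha, and each counter is an unsigned short. The function names are hypothetical, and the sketch ignores any header or scaling that the real monitor routines may apply to the buffer.

```c
#include <stddef.h>

#define INST_SIZE 4     /* Instruction size on Alpha, in bytes */

/* Bytes needed for one unsigned short counter per instruction. */
size_t counter_bytes(unsigned long lowpc, unsigned long highpc)
{
    return ((highpc - lowpc) / INST_SIZE) * sizeof(unsigned short);
}

/*
 * Index of the counter that a sampled program counter value would
 * increment, or -1 if pc lies outside the range [lowpc, highpc).
 */
long counter_index(unsigned long lowpc, unsigned long highpc,
                   unsigned long pc)
{
    if (pc < lowpc || pc >= highpc)
        return -1;
    return (long)((pc - lowpc) / INST_SIZE);
}
```

For a range from 0x1000 to 0x1400, counter_bytes reserves room for 256 counters, and a sample taken at address 0x1008 maps to counter index 2. A sample outside the range is not counted, which is why restricting the range reduces both buffer size and the amount of profile output.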
By default, profiling begins as soon as your program starts to execute. You can set the PROFFLAGS environment variable to -disable_default to prevent profiling from beginning when your program executes. Then, you can use the monitor routines to begin profiling after the first call to monitor or monstartup.
You can disable the default naming of the profiling data file by using the PROFDIR environment variable. For more information on using this environment variable, see Section 8.12.1.
Example 8-6 demonstrates how to use the monstartup and monitor routines within a program to begin and end profiling.
/* Profile the domath() routine using monstartup.
 * This example allocates a buffer for the entire program.
 * Compile command: cc -p foo.c -o foo -lm
 * Before running the executable, enter the following
 * from the command line to disable default profiling support:
 * setenv PROFFLAGS -disable_default
 */

#include <stdio.h>
#include <math.h>
#include <sys/syslimits.h>

char dir[PATH_MAX];

extern void __start();
extern unsigned long _etext;

main()
{
    int i;
    int a = 1;

    /* Start profiling between __start (beginning of text)
     * and _etext (end of text). The profiling library
     * routines will allocate the buffer.
     */
    monstartup(__start,&_etext);

    for(i=0;i<10;i++)
        domath();

    /* Stop profiling and write the profiling output file. */
    monitor(0);
}

domath()
{
    int i;
    double d1, d2;

    d2 = 3.1415;
    for (i=0; i<1000000; i++)
        d1 = sqrt(d2)*sqrt(d2);
}
The external name _etext lies just above all the program text. See end(3) for more information.
When you set the PROFFLAGS environment variable to -disable_default, you disable default profiling buffer support. You can allocate buffers within your program, as shown in Example 8-7.
/* Profile the domath routine using monitor().
 * Compile command: cc -p foo.c -o foo -lm
 * Before running the executable, enter the following
 * from the command line to disable default profiling support:
 * setenv PROFFLAGS -disable_default
 */

#include <sys/types.h>
#include <math.h>
#include <sys/syslimits.h>

extern char *calloc();

void domath(void);
void nextproc(void);

#define INST_SIZE 4 /* Instruction size on Alpha */
char dir[PATH_MAX];

main()
{
    int i;
    char *buffer;
    size_t bufsize;

    /* Allocate one counter for each instruction to
     * be sampled. Each counter is an unsigned short.
     */
    bufsize = (((char *)nextproc - (char *)domath)/INST_SIZE)
              * sizeof(unsigned short);

    /* Use calloc() to ensure that the buffer is clean
     * before sampling begins.
     */
    buffer = calloc(bufsize,1);

    /* Start sampling. */
    monitor(domath,nextproc,buffer,bufsize,0);
    for(i=0;i<10;i++)
        domath();

    /* Stop sampling and write out profiling buffer. */
    monitor(0);
}

void domath(void)
{
    int i;
    double d1, d2;

    d2 = 3.1415;
    for (i=0; i<1000000; i++)
        d1 = sqrt(d2)*sqrt(d2);
}

void nextproc(void)
{}
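The buffer-size arithmetic in Example 8-7 can be isolated in a small helper. The following is a minimal sketch and is not part of the profiling library. The names sample_bufsize and sample_index are hypothetical. The sketch assumes, as the example does, one unsigned short counter for each 4-byte instruction. How monitor actually maps a sampled program counter to a counter is determined by the library; see monitor(3).

```c
#include <stddef.h>
#include <stdint.h>

#define INST_SIZE 4   /* instruction size in bytes, as in Example 8-7 */

/* Hypothetical helper: bytes needed for one unsigned short counter
 * per instruction in the address range [lowpc, highpc).
 * Returns 0 for an empty or reversed range. */
size_t sample_bufsize(uintptr_t lowpc, uintptr_t highpc)
{
    if (highpc <= lowpc)
        return 0;
    return ((highpc - lowpc) / INST_SIZE) * sizeof(unsigned short);
}

/* Hypothetical helper: index of the counter that covers pc, under the
 * assumed one-counter-per-instruction layout.
 * Returns -1 if pc lies outside [lowpc, highpc). */
long sample_index(uintptr_t lowpc, uintptr_t highpc, uintptr_t pc)
{
    if (pc < lowpc || pc >= highpc)
        return -1;
    return (long)((pc - lowpc) / INST_SIZE);
}
```

Under these assumptions, a 256-byte range holds 64 instructions and therefore needs 64 counters. A program counter that falls outside the range is not counted, which is why a narrow range keeps the buffer small for a large program.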
You use the monitor_signal( ) routine to profile programs that do not terminate. Declare this routine as a signal handler in your program and build the program for prof or gprof profiling. While the program is executing, send a signal from the shell by using the kill command.
When the signal is received, monitor_signal is invoked and writes profiling data to the data file. If the program receives another signal, the data file is overwritten.
Example 8-8 illustrates how to use the monitor_signal routine.
/* From the shell, start up the program in background.
 * Send a signal to the process, for example: kill -30 <pid>
 * Process the [g]mon.out file normally using gprof or prof
 */

#include <signal.h>
#include <math.h>

extern int monitor_signal();

main()
{
    int i;
    double d1, d2;

    /*
     * Declare monitor_signal() as signal handler for SIGUSR1
     */
    signal(SIGUSR1,monitor_signal);
    d2 = 3.1415;
    /*
     * Loop infinitely (absurd example of non-terminating process)
     */
    for (;;)
        d1 = sqrt(d2)*sqrt(d2);
}
Profiling multithreaded applications is essentially the same as profiling nonthreaded applications. However, to profile multithreaded applications, you must compile your program with the -pthread or -threads flag to the cc command. Specifying one of these flags and either the -p or -pg flag enables the thread profiling library, libprof1_r.a.
The default case for profiling multithreaded applications is to provide one sampling buffer for all threads. In this case, you get sampling across the entire process and you get one output file comprising sampling data from all threads. Depending on whether you use the -p or -pg flag, your output file will be named mon.out or gmon.out, respectively.
To get a separate buffer and a separate output file for each thread in your program, use the environment variable PROFFLAGS. Set PROFFLAGS to -threads, as shown in the following example:
setenv PROFFLAGS "-threads"
The profiling data file will be named according to the following convention:
pid.sid.progname
In the preceding example, pid is the process ID of the program, sid corresponds to the order in which the thread was created, and progname is the name of your program.
If the application controls profiling by using the monitor routines, sid corresponds to the order in which profiling was started for the thread.
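To locate the per-thread data files from a script or a test harness, it can help to build the expected file name. The following is a minimal sketch. The helper name thread_profile_name is hypothetical, and the decimal formatting of pid and sid is an assumption that this chapter does not confirm.

```c
#include <stdio.h>

/* Hypothetical helper: build the per-thread data file name in the
 * form pid.sid.progname.  Formatting pid and sid as decimal numbers
 * is an assumption.  Returns the snprintf() result, which is the
 * length that the complete name requires. */
int thread_profile_name(char *buf, size_t buflen,
                        long pid, int sid, const char *progname)
{
    return snprintf(buf, buflen, "%ld.%d.%s", pid, sid, progname);
}
```

Under these assumptions, a thread with a sid of 2 in process 1234 of a program named foo would write its data to 1234.2.foo.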
If you use the monitor( ) or monstartup( ) calls in a threaded program, you must first set PROFFLAGS to "-disable_default -threads", giving you complete control of profiling the application.
If the application uses monitor( ) and allocates separate buffers for each thread profiled, you must first set PROFFLAGS to "-disable_default -threads", because this setting affects the file naming conventions that are used. Without the -threads flag, the buffer and address range used as a result of the first monitor or monstartup call would be applied to every thread that subsequently requests profiling. In this case, a single data file that covers all threads being profiled would be created.
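Because a missing or mistyped flag silently changes how data files are produced, a threaded program that calls the monitor routines might check PROFFLAGS before it starts profiling. The following is a minimal sketch. The helper names profflags_has and thread_profiling_controlled are hypothetical, and the whitespace-delimited matching is an assumption about the format of PROFFLAGS, not a description of how the profiling library parses it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if flag appears in flags as a
 * complete word delimited by spaces or tabs, otherwise 0.
 * flags may be NULL; flag must be a non-NULL string. */
int profflags_has(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (flags == NULL || len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t';
        if (starts && ends)
            return 1;
        p += 1;   /* partial match; keep searching */
    }
    return 0;
}

/* Hypothetical check: return 1 only if PROFFLAGS contains both
 * -disable_default and -threads. */
int thread_profiling_controlled(void)
{
    const char *flags = getenv("PROFFLAGS");

    return profflags_has(flags, "-disable_default") &&
           profflags_has(flags, "-threads");
}
```

A program could call thread_profiling_controlled at startup and print a warning if it returns 0. A value such as "disable_default -threads", with the leading hyphen omitted, fails this check.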
Each thread in a process must call the monitor( ) or monstartup( ) routines to initiate profiling for itself.