Profiling is a method of identifying sections of code that consume large portions of execution time. In a typical program, most execution time is spent in relatively few sections of code. To improve performance, focus on those time-intensive sections, because improving their efficiency yields the greatest gains.
This chapter discusses the following topics:
Using the prof program
Using the gprof program
Using the pixie and hiprof Atom tools
Using the uprofile and kprofile tools
Selecting profiling information to display
Using feedback files
Using profiling environment variables
Using monitor routines
Profiling multithreaded applications
In addition to these tools, you can use Visual Threads (available on the Associated Products CD) to analyze multithreaded applications for potential logic and performance problems. You can use Visual Threads with DECthreads applications that use POSIX threads (Pthreads) and with Java applications.
Profiling methods include:
Program counter (PC) sampling, a technique that periodically interrupts your program and logs the value of the PC. The prof and gprof tools use PC sampling to produce a statistical sample showing which portions of code consume the most time. The gprof tool also produces call graphs, which show the relationship of calling and called routines.
Basic block counting, a technique that inserts profiling code at key points in your program. It produces a count of the number of times each instruction executes.
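The following C fragment makes the PC-sampling idea concrete. It sketches how a set of sampled PC values can be turned into per-procedure counts: each sample is charged to the procedure whose address range contains it. This is an illustration only, not the implementation used by prof or gprof. The proc_range structure, the attribute_samples function, and the addresses in the usage note are hypothetical.

```c
#include <stddef.h>

/* Hypothetical description of one procedure's code: [start, end). */
struct proc_range {
    const char   *name;
    unsigned long start;   /* first address of the procedure */
    unsigned long end;     /* one past its last address */
};

/* Charge each sampled PC value to the procedure whose address range
 * contains it.  counts[i] receives the number of samples that fell in
 * procs[i]; the return value is the number of samples that matched no
 * procedure (for example, PCs outside the described code). */
size_t attribute_samples(const unsigned long *pcs, size_t npcs,
                         const struct proc_range *procs, size_t nprocs,
                         unsigned long *counts)
{
    size_t unmatched = 0;

    for (size_t i = 0; i < nprocs; i++)
        counts[i] = 0;

    for (size_t s = 0; s < npcs; s++) {
        size_t p;
        for (p = 0; p < nprocs; p++) {
            if (pcs[s] >= procs[p].start && pcs[s] < procs[p].end) {
                counts[p]++;
                break;
            }
        }
        if (p == nprocs)
            unmatched++;   /* sample landed outside every known range */
    }
    return unmatched;
}
```

Consider main at [0x1000, 0x1100), add_vector at [0x1100, 0x1180), and printit at [0x1180, 0x11c0). The samples 0x1104, 0x1190, 0x1110, 0x11a0, and 0x2000 then give two samples each to add_vector and printit, none to main, and one unmatched sample. Because the samples are statistical, a short routine can receive no samples at all in a given run.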
To select an appropriate profiling method for an application, you must take into consideration the following factors:
The statistics that you want to collect and examine (for example, CPU usage, call counts, call cost, memory usage, and I/O operations).
The level at which you need to collect these statistics (for example, at a procedure level or at an instruction level).
Whether you must profile the shared libraries used by the application as well as its executable.
The method that you use to collect the profiling data. Certain collection methods require that you compile and/or link the application's sources in a special way. Others allow you to run a utility that inserts instrumentation code into an existing program. Still others retrieve information from the CPU's performance counters while the uninstrumented program is running.
The tool that you use to display the profiling data. Depending on the information that you need, you can choose from three tools that display previously collected profiling information. Each tool supports multiple data collection methods.
The profiling data display tools, and their respective data collection methods, include the following:
prof
Prints a profile of statistics per procedure. The prof tool supports the following data collection methods:
Compiling or linking with the -p option
The -p option supports the profiling of shared libraries, but requires you to at least relink the program. It collects only CPU statistics, using PC sampling.
Using the uprofile tool
The uprofile tool profiles user code. It does not support the profiling of shared libraries. It does not require you to relink the program, and it collects either CPU statistics or other information from the CPU's performance counters.
Using the kprofile tool
The kprofile tool profiles the running operating system kernel. It does not require you to relink the program, and it collects either CPU statistics or other information from the CPU's performance counters.
prof -pixie
Prints a profile showing the number of times each procedure, source line, or instruction is executed. The prof -pixie tool supports the following basic block counting profiling data collection method:
Using the pixie Atom tool (that is, the atom -tool pixie command) to instrument the program's basic blocks
The pixie Atom tool supports the profiling of shared libraries and does not require you to relink the program. It supports the prof tool's instruction-level profiling and true cycle-count estimation.
gprof
Produces call-graph profile data showing the effects of calling routines on called routines as well as other information. The gprof tool supports the following data collection methods:
Compiling with the -pg option
The -pg option does not allow the profiling of shared libraries. It requires that you recompile the program's sources, and uses an apportioned call cost method to determine a given procedure's cost to its callers.
Using the hiprof Atom tool (that is, the atom -tool hiprof command) to instrument the program
The hiprof Atom tool supports the profiling of shared libraries and does not require you to recompile or relink. To determine a given procedure's cost to its callers, it supports both the apportioned call cost method and the measured call cost method.
You can also use the monitor routines to perform PC sampling on a specified address range in a program. For more information on using monitor routines, see Section 8.13 and monitor(3).
Table 8-1 provides a concise overview of the profiling tools available in the Tru64 UNIX operating system.
| Tool | Use |
| --- | --- |
| PC sampling/prof | Link application with -p; analyze results with prof; see prof(1) and monitor(3). |
| Call-arcs/gprof | Compile and link with -pg; analyze results with gprof; see gprof(1) and monitor(3). |
| prof -pixstats | Additional postprocessor for pixie program output; see prof(1). |
| uprofile/kprofile | Run application under uprofile or kprofile; requires pfm driver to be installed; analyze results with prof; see uprofile(1), kprofile(1), and pfm(7). |
| Atom toolkit | Programmable debug/performance analysis tool. Example tools are contained in /usr/lib/cmplrs/atom/examples; see atom(1) and other Atom reference pages for the programming interface. |
| pixie | Atom-based basic block profiler; analyze results with prof; see pixie(5). |
| hiprof | Atom-based call-arc analyzer; analyze results with gprof; see hiprof(5). |
| third | Atom-based memory error/leak detection tool, Third Degree; generates text output. See third(5). |
All profiling tools work on call-shared and nonshared applications. The following sections describe these profiling tools in more detail.
Statistical PC sampling with prof is useful for identifying the procedures in a program that have high CPU usage. It supports both threads and shared libraries.
The interface summary is as follows:
% cc -p *.o -o program    # Link with libprof1.a
% program                 # Run program to collect data
% prof program            # Process the mon.out file
The gprof tool provides procedure call information coupled with statistical PC sampling. This is useful for determining which routines are called most frequently and from where. The gprof tool also gives a flat profile of CPU usage for the routines. It supports threads and call-shared programs but does not support shared libraries. Using the gprof tool, you can retrieve information from libc.a and libm.a because these two libraries are compiled with the -pg option. Other Compaq-supplied libraries are not compiled with -pg, so calling information on these other system libraries is not available.
The interface summary is as follows:
% cc -pg *.c -o program   # Compile and link with -pg
% program                 # Run program to collect data
% gprof program           # Process the gmon.out file
The uprofile and kprofile tools use the performance counters on the Alpha chip. They do not collect information on shared libraries. By default, both tools count machine cycles for the program. The performance data produced by these tools is processed with the prof command. See uprofile(1) and kprofile(1) for more information.
The Atom toolkit consists of a programmable instrumentation tool and several packaged tools. Examples that demonstrate how to develop instrumentation and analysis code are included in the /usr/lib/cmplrs/atom/examples directory. The instrumentation part of the tool instructs Atom on where to insert calls to analysis routines in the program. When the program is run, the analysis routines are entered and data collection is performed as prescribed by the Atom tool specified on the atom command. Atom does not work on programs built with the -om, -pg, or -p options.
The interface summary is as follows:
% atom -tool toolname program
% program.tool
# Postprocessing is tool-dependent.
See Chapter 9 for details on Atom.
The Atom-based pixie tool is a basic block profiler that supports shared libraries and threaded applications.
The interface summary is as follows:
% atom -tool pixie [-env threads] program
% program.pixie[.threads]
% prof -pixie program
The hiprof Atom tool collects call-arc information on a program. By default, it operates like the gprof support provided by the -pg option, but it has more powerful option-selectable features. The hiprof Atom tool supports shared libraries and threaded applications.
The interface summary is as follows:
% atom -tool hiprof [-env threads] program
% program.hiprof[.threads]
% gprof program program.hiout
Third Degree is a memory-leak and memory-overwrite detection tool, also based on Atom. Third Degree generates text output to a file called program.3log. The log contains the diagnostics that Third Degree detected (for example, reads of uninitialized heap or stack, memory overwrites, and memory leaks).
The interface summary is as follows:
% atom -tool third [-env threads] program
% program.third[.threads]
% cat program.3log
The examples in the remainder of this chapter refer to the sample program, profsample.c, shown in Example 8-1.
#include <math.h>
#include <stdio.h>
#define LEN 100
void mult_by_scalar(double ary[], int len, double num);
void add_vector(double arya[], double aryb[], int len);
double value;
void printit(double value);
int main(void)
{
    double ary1[LEN];
    double ary2[LEN];
    int i;
    for (i=0; i<LEN; i++) {
        ary1[i] = 0.0;
        ary2[i] = sqrt((double)i);
    }
    mult_by_scalar(ary1, LEN, 3.14159);
    mult_by_scalar(ary2, LEN, 2.71828);
    for (i=0; i<20; i++)
        add_vector(ary1, ary2, LEN);
}
void mult_by_scalar(double ary[], int len, double num)
{
    int i;
    for (i=0; i<len; i++)
    {
        ary[i] *= num;
        value = ary[i];
        printit(value);
    }
}
void add_vector(double arya[], double aryb[], int len)
{
    int i;
    for (i=0; i<len; i++)
    {
        arya[i] += aryb[i];
        value = arya[i];
        printit(value);
    }
}
void printit(double value)
{
    printf("Value = %f\n", value);
}
To use prof to obtain PC sampling data on a program, follow these steps:
Compile and link (or just link) using the -p option, as follows:
% cc -c profsample.c
% cc -p -o profsample profsample.o -lm
You must specify the -p profiling option during the link step to obtain PC sampling information. If you have an existing application, you do not need to recompile it to profile the executable program; simply relink the program using the -p option with the cc command. If you are building an application for the first time, you can compile and link in the same step.
In the preceding example, the -lm option ensures that libm.{a,so} is used to resolve symbols that refer to math library functions.
You might also consider compiling with one of the optimization options to help improve the efficiency of your code, compiling with a debug option to provide more symbolic information for the profile report, or both.
If you are profiling a multithreaded application, use the -threads option with the cc command. For more information on profiling multithreaded applications, see Section 8.14.
Execute the profiled program:
% profsample
You can run the program several times, altering the input data (if any) to create multiple profile data files.
During execution, profiling data is saved in a profile data file. The default name for the profile data file is mon.out, unless you have set the environment variable PROFDIR. For more information on using PROFDIR, see Section 8.12.1.
Run the profile formatting program prof, which extracts information from one or more profile data files and produces a tabular report:
% prof profsample mon.out
Example 8-2 shows output produced by the prof command on the profsample.c program.
Profile listing generated Thu May 26 13:36:14 1998 with:
   prof profsample mon.out
--------------------------------------------------------------
*  -p[rocedures] using pc-sampling; sorted in descending     *
*  order by total time spent in each procedure;              *
*  unexecuted procedures excluded                            *
--------------------------------------------------------------
Each sample covers 4.00 byte(s) for 14% of 0.0068 seconds
%time   seconds   cum %   cum sec  procedure (file)
 42.9    0.0029    42.9      0.00  printit (profsample.c)
 42.9    0.0029    85.7      0.01  add_vector (profsample.c)      [1]
 14.3    0.0010   100.0      0.01  mult_by_scalar (profsample.c)
This sample line of output presents the following information:
42.9 percent of execution time was spent in add_vector.
85.7 percent of total execution time was spent cumulatively in the printit and add_vector routines.
The name of the source file for mult_by_scalar is profsample.c.
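The %time and cum % columns follow from simple arithmetic on the sample counts. The header line of Example 8-2 says that each sample accounts for about 14% of 0.0068 seconds. That suggests seven samples, split 3, 3, and 1 among printit, add_vector, and mult_by_scalar. The following sketch reproduces the 42.9, 85.7, and 100.0 values under that assumption. The time_percentages function is hypothetical, and it assumes the counts are already sorted in descending order.

```c
#include <stddef.h>

/* Given per-procedure sample counts already sorted in descending order,
 * fill pct[] with each procedure's share of all samples (the "%time"
 * column) and cum[] with the running total (the "cum %" column). */
void time_percentages(const unsigned long *samples, size_t n,
                      double *pct, double *cum)
{
    unsigned long total = 0;
    double running = 0.0;

    for (size_t i = 0; i < n; i++)
        total += samples[i];

    for (size_t i = 0; i < n; i++) {
        pct[i] = total ? 100.0 * samples[i] / total : 0.0;
        running += pct[i];
        cum[i] = running;
    }
}
```

With the counts {3, 3, 1}, the percentages are about 42.9, 42.9, and 14.3, and the cumulative values are about 42.9, 85.7, and 100.0.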
Because the prof program works by periodic sampling of the program counter, you might see different output when you profile the same program several times. A different profiling run of the sample program produced the following output:
Profile listing generated Thu May 26 13:34:00 1994 with:
   prof -procedures profsample mon.out
--------------------------------------------------------------
*  -p[rocedures] using pc-sampling; sorted in descending     *
*  order by total time spent in each procedure;              *
*  unexecuted procedures excluded                            *
--------------------------------------------------------------
Each sample covers 4.00 byte(s) for 17% of 0.0059 seconds
%time   seconds   cum %   cum sec  procedure (file)
 66.7    0.0039    66.7      0.00  add_vector (profsample.c)
 33.3    0.0020   100.0      0.01  printit (profsample.c)
To determine how routines call, or are called by, other routines, use the gprof profiling tool. The gprof tool postprocesses both hiprof output and -pg output. To use this tool, follow these steps:
Use the hiprof Atom tool to produce an instrumented version of the program:
% atom -tool hiprof profsample
Execute the instrumented version of profsample:
% profsample.hiprof
Examine the profiling data as follows:
% gprof profsample profsample.hiout
During execution, profiling data is saved in the data file profsample.hiout, unless you have set the -dirname option in the HIPROF_ARGS environment variable or on the command line.
Alternatively, you can use the following procedure to collect profiling data for the gprof tool:
Compile and link using the -pg option, as follows:
% cc -pg -c profsample.c
% cc -pg -o profsample profsample.o -lm
You must specify the -pg option with the cc command during both the compile and link steps to obtain call graph information.
Execute the program:
% profsample
When this method is used, profiling data is saved during execution in the data file gmon.out, unless you have set the PROFDIR environment variable. For more information on using this variable, see Section 8.12.1.
Run the formatting program gprof, which extracts information from the data file:
% gprof profsample gmon.out
The output produced by the gprof utility comprises three sections:
Call graph profile
Timing profile, similar to the profile produced by prof
Index
You can control gprof profiling on a per-file basis with the -no_pg option to the cc command. When you use this option, you disable gprof profiling for all objects that follow the option on the command line. You cannot use the -no_pg option with the -r and -shared options to the ld command.
Example 8-3 shows output for gprof profiling of the sample program. The -b option was used with gprof to suppress printing of the description of each output field. The descriptions are valuable, but they are lengthy and were left out due to space considerations. To see these descriptions, follow the steps to produce gprof output and write the output to a file or pipe the output through the more utility.
In the call graph profile section, each routine in the program has its own subsection that is contained within dashed lines and identified by the index number in the first column. For the purpose of this example output, the three sections have been separated by rows of asterisks that do not appear in the output produced by gprof. Each row of asterisks includes the name of the section. For more information on gprof options, see gprof(1).
*********************** call graph profile *******************
granularity: each sample hit covers 4 byte(s) for 10.00% of 0.01 seconds
                                   called/total       parents
index  %time    self  descendents  called+self    name           index
                                   called/total       children
                                                      <spontaneous>
[1]    100.0    0.00         0.01                 main [1]
                0.00         0.00      20/20          add_vector [2]
                0.00         0.00       2/2           mult_by_scalar [4]
-----------------------------------------------
                0.00         0.00      20/20          main [1]             [1]
[2]     75.5    0.00         0.00      20         add_vector [2]           [2]
                0.00         0.00    2000/2200        printit [3]          [3]
-----------------------------------------------
                0.00         0.00     200/2200        mult_by_scalar [4]
                0.00         0.00    2000/2200        add_vector [2]
[3]     50.0    0.00         0.00    2200         printit [3]
-----------------------------------------------
                0.00         0.00       2/2           main [1]
[4]      4.5    0.00         0.00       2         mult_by_scalar [4]
                0.00         0.00     200/2200        printit [3]
-----------------------------------------------
*********************** timing profile section ***************
granularity: each sample hit covers 4 byte(s) for 10.00% of 0.01 seconds
  %   cumulative    self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 50.0      0.00      0.00     2200     0.00     0.00  printit [3]
 30.0      0.01      0.00       20     0.15     0.37  add_vector [2]
 20.0      0.01      0.00                             main [1]
  0.0      0.01      0.00        2     0.00     0.22  mult_by_scalar [4]
*********************** index section ************************
Index by function name
   [2] add_vector            [4] mult_by_scalar
   [1] main                  [3] printit
This line describes the relationship of the main routine to the add_vector routine. Because main is listed above the add_vector routine in the final column of this section, main is identified as the parent of add_vector. The fraction 20/20 indicates that add_vector was called 20 times in total (the denominator of the fraction) and that main made 20 of those calls (the numerator).
This line describes the add_vector routine, which is the subject of this portion of the call graph profile because it is the leftmost routine in the rightmost column of this section. The index number [2] in the first column corresponds to the index number [2] in the index section at the end of the output. The 75.5% in the second column reports the percentage of the total sampled time that is accounted for by the add_vector routine and its descendant, in this case the printit routine. The 20 in the called column indicates the total number of times that the add_vector routine is called.
This line describes the relationship of the printit routine to the add_vector routine. Because the printit routine is below the add_vector routine in this section, printit is identified as the child of add_vector. The fraction 2000/2200 indicates that of the total of 2200 calls to printit, 2000 of these calls came from add_vector.
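The %time values in the call graph can be reproduced with the apportioned call cost method mentioned earlier. Under that method, a child routine's time is charged to each parent in proportion to the calls that parent made. The self percentages come from the timing profile section: 30.0% for add_vector and 0.0% for mult_by_scalar. printit accounts for 50.0%. Of printit's time, add_vector therefore receives 2000/2200 and mult_by_scalar receives 200/2200. The following C sketch shows the arithmetic. The apportioned_cost function is a hypothetical helper, and gprof itself works from sampled seconds rather than from the rounded percentages used here.

```c
/* Apportioned call cost: a child's total time (self plus descendants) is
 * charged to one of its callers in proportion to the number of calls that
 * caller made out of all calls to the child. */
double apportioned_cost(double child_total, unsigned long calls_from_parent,
                        unsigned long total_calls)
{
    if (total_calls == 0)
        return 0.0;
    return child_total * calls_from_parent / total_calls;
}
```

Here 30.0 + apportioned_cost(50.0, 2000, 2200) is about 75.45, which gprof reports as 75.5%. Likewise, 0.0 + apportioned_cost(50.0, 200, 2200) is about 4.545, which is reported as 4.5%.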
A basic block is a sequence of instructions with a single entry point and a single exit point. The pixie Atom tool provides execution counts for the basic blocks of a program. With prof, the execution counts can be viewed at the instruction level.
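Conceptually, basic block counting adds a counter increment at the start of every basic block. The following C sketch does by hand, for a simplified add_vector loop from Example 8-1 (without the printit call), what pixie does automatically to the executable. The bb_counts array, the block names, and add_vector_counted are hypothetical. pixie instruments machine code, so its actual block boundaries depend on the generated instructions.

```c
/* Hypothetical counters, one per basic block of add_vector_counted.
 * An instrumenting tool such as pixie inserts equivalent increments
 * into the executable; here they are written by hand. */
enum { BB_ENTRY, BB_LOOP_TEST, BB_LOOP_BODY, BB_EXIT, BB_COUNT };
unsigned long bb_counts[BB_COUNT];

void add_vector_counted(double arya[], const double aryb[], int len)
{
    int i;

    bb_counts[BB_ENTRY]++;            /* entry block: once per call */
    for (i = 0; ; i++) {
        bb_counts[BB_LOOP_TEST]++;    /* loop test: len + 1 times per call */
        if (!(i < len))
            break;
        bb_counts[BB_LOOP_BODY]++;    /* loop body: len times per call */
        arya[i] += aryb[i];
    }
    bb_counts[BB_EXIT]++;             /* exit block: once per call */
}
```

Calling add_vector_counted twice with a length of 5 has the following effect:

The entry and exit counters each reach 2.
The loop body counter reaches 10.
The loop test counter reaches 12, because the test runs one extra time per call to end the loop.

prof -pixie works from counts of this kind, together with the block addresses recorded in the .Addrs file.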
To obtain data for basic block counting, follow these steps:
Compile and link. For example:
% cc -c profsample.c
% cc -o profsample profsample.o -lm
Run the pixie Atom tool. You do not have to specify a name for the output because, by default, pixie produces an output file with the same name as the original executable file, but with .pixie appended. For example, the following command causes pixie to create two files, profsample.pixie and profsample.Addrs:
% atom -tool pixie profsample
The profsample.pixie file is equivalent to profsample but contains additional code that counts the execution of each basic block. To create an output file with a name other than pname.pixie, use the -o option followed by the name you assign to the output file. The profsample.Addrs file contains the address of each of the basic blocks. For more information, see pixie(5).
Execute the profsample.pixie file:
% profsample.pixie
This command generates the file profsample.Counts, which contains the basic block counts. Each time you execute the profsample.pixie file, you create a new profsample.Counts file.
Run the profile formatting program prof with the -pixie option over the profsample executable file:
% prof -pixie profsample
This command extracts information from profsample.Addrs and profsample.Counts and displays it in an easily readable format. You do not need to specify the .Addrs and .Counts files because, in -pixie mode, prof searches by default for files with those suffixes.
You can also run the pixstats program on the executable file profsample to generate a detailed report on opcode frequencies, interlocks, a miniprofile, and more. For more information, see pixstats(1).
Note
The pixie profiling tool provided in the current version of the Tru64 UNIX operating system is the pixie Atom tool. If you use the syntax provided in earlier versions of the operating system to invoke pixie, a script transforms the call into a call to the pixie Atom tool. The previous version of the pixie tool can be found at /usr/opt/obsolete/usr/bin/pixie.
Depending on the size of the application and the type of profiling you request, prof may generate a very large amount of output. However, you are often interested only in profiling data about a particular portion of your application. The prof program provides the following options to display information selectively by procedure:
-only
-exclude
-Only
-Exclude
-totals
The -only option tells prof to print profiling information only for a particular procedure. You can specify the -only option several times on the command line. For example, the following command displays profiling information for procedures mult_by_scalar and add_vector from the sample program:
% prof -only mult_by_scalar -only add_vector profsample
The -exclude option tells prof to print profiling information for all procedures except the specified procedure. You can use several -exclude options on the command line. The following command displays profiling information for all procedures except add_vector:
% prof -exclude add_vector profsample
Do not use the -only and -exclude options on the same command line.
Many of the prof utility's profiling options print output as percentages, for example, the percentage of total execution time attributed to a particular procedure.
By default, the -only and -exclude options cause prof to calculate percentages based on all of the procedures in the application, even if they were omitted from the listing. You can change this behavior with the -Only and -Exclude options. These options work the same as -only and -exclude, but cause prof to calculate percentages based only on those procedures that appear in the listing. For example, the following command omits the add_vector procedure from both the listing and the percentage calculations:
% prof -Exclude add_vector profsample
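The difference between -exclude and -Exclude lies only in the denominator used for the percentages. The following C sketch illustrates this with assumed sample counts of 3 for printit, 3 for add_vector, and 1 for mult_by_scalar. It is built around a hypothetical listed_percent function and is not prof's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Percentage reported for procedure `which`.  When listed_only is false
 * (like -exclude), the denominator is the samples of every procedure;
 * when it is true (like -Exclude), only procedures still in the listing
 * contribute to the denominator. */
double listed_percent(const unsigned long *samples, const bool *listed,
                      size_t n, size_t which, bool listed_only)
{
    unsigned long denom = 0;

    for (size_t i = 0; i < n; i++)
        if (!listed_only || listed[i])
            denom += samples[i];

    return denom ? 100.0 * samples[which] / denom : 0.0;
}
```

With add_vector excluded, printit is reported at about 42.9 percent under -exclude, using a denominator of 7 samples. Under -Exclude it is reported at 75 percent, using a denominator of 4 samples.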
The -totals option, used with the -procedures and -invocations listings, prints cumulative statistics for the entire object file instead of for each procedure in the object.
The -all, -incobj, and -excobj options allow you to display profiling information for shared libraries used by the program as follows:
The -all option causes the profiles for all shared libraries (if any) described in the data file(s) to be displayed, in addition to the profile for the executable.
The -incobj option causes the profile for the named shared library to be printed, in addition to the profile for the executable.
The -excobj option causes the profile for the named executable or shared library not to be printed.
The -heavy and -lines options cause prof to display the total number of machine cycles executed by each source line in your application.
The -heavy option prints an entry for every source line that was executed by your application. Each entry shows the total number of machine cycles executed by that line. Entries are sorted from the line with the most machine cycles to the line with the least machine cycles. Because this option often prints a huge number of entries, you might want to use one of the -quit, -only, or -exclude options to reduce output to a manageable size.
Example 8-4 shows output generated by the following command:
% prof -pixie -heavy -only add_vector -only mult_by_scalar \
       -only main profsample
For example, you can see in Example 8-4 that line 47 of profsample.c in the procedure add_vector() accounts for over 12 percent of the application's total execution time. The listing also shows the size in bytes of each source line.
Profile listing generated Fri May 27 14:09:10 1998 with:
   prof -pixie -heavy -only add_vector -only mult_by_scalar -only main profsample
------------------------------------------------------------------
*  -h[eavy] using basic-block counts;                            *
*  sorted in descending order by the number of cycles executed   *
*  in each line; unexecuted lines are excluded                   *
------------------------------------------------------------------
procedure (file)                line bytes  cycles      %   cum %
add_vector (profsample.c)         48    44   22000  23.26   23.26
add_vector (profsample.c)         46    40   20000  21.15   44.41
add_vector (profsample.c)         47    24   12000  12.69   57.10
mult_by_scalar (profsample.c)     36    44    2200   2.33   59.43
main (profsample.c)               20    60    1500   1.59   61.02
mult_by_scalar (profsample.c)     34    28    1400   1.48   62.50
mult_by_scalar (profsample.c)     35    24    1200   1.27   63.77
main (profsample.c)               19    12     300   0.32   64.08
main (profsample.c)               25    48     240   0.25   64.34
add_vector (profsample.c)         41    28     140   0.15   64.48
add_vector (profsample.c)         44    12      60   0.06   64.55
add_vector (profsample.c)         50    12      60   0.06   64.61
mult_by_scalar (profsample.c)     29    28      14   0.01   64.63
main (profsample.c)               23    32       8   0.01   64.63
main (profsample.c)               22    32       8   0.01   64.64
mult_by_scalar (profsample.c)     38    12       6   0.01   64.65
mult_by_scalar (profsample.c)     32    12       6   0.01   64.66
main (profsample.c)               26    16       4   0.00   64.66
main (profsample.c)               13    16       4   0.00   64.66
main (profsample.c)               18     8       2   0.00   64.67
main (profsample.c)               24     8       2   0.00   64.67
The -lines option is similar to -heavy, but it sorts the output differently. This option prints the lines for each procedure in the order that they occur in the source file. Even lines that never executed are printed. The procedures themselves are sorted from those procedures that execute the most machine cycles to those that execute the least.
Example 8-5 shows the same information as Example 8-4, but in a different format as generated by the following command:
% prof -pixie -lines -only add_vector -only mult_by_scalar \
       -only main profsample
Profile listing generated Fri May 27 14:07:28 1998 with:
   prof -pixie -lines -only add_vector -only mult_by_scalar -only main profsample
-----------------------------------------------------------------
* -l[ines] using basic-block counts;                              *
* grouped by procedure, sorted by cycles executed per procedure;  *
* '?' means that line number information is not available.        *
-----------------------------------------------------------------
procedure (file)                line bytes  cycles      %   cum %
add_vector (profsample.c)         41    28     140   0.15    0.15
                                  44    12      60   0.06    0.21
                                  46    40   20000  21.15   21.36
                                  47    24   12000  12.69   34.05
                                  48    44   22000  23.26   57.32
                                  50    12      60   0.06   57.38
mult_by_scalar (profsample.c)     29    28      14   0.01   57.39
                                  32    12       6   0.01   57.40
                                  34    28    1400   1.48   58.88
                                  35    24    1200   1.27   60.15
                                  36    44    2200   2.33   62.48
                                  38    12       6   0.01   62.48
main (profsample.c)               13    16       4   0.00   62.49
                                  18     8       2   0.00   62.49
                                  19    12     300   0.32   62.81
                                  20    60    1500   1.59   64.39
                                  22    32       8   0.01   64.40
                                  23    32       8   0.01   64.41
                                  24     8       2   0.00   64.41
                                  25    48     240   0.25   64.66
                                  26    16       4   0.00   64.67
The -quit option reduces the amount of profiling output displayed. The -quit option affects the output from the -procedures, -heavy, and -lines profiling modes. The -quit option provides three versions:
-quit n
The n refers to an integer. All lines after the nth line are truncated.
-quit n%
The n is an integer followed by a percent sign (%). All lines after the line containing n% calls in the %calls column of the display are truncated.
-quit ncum%
The ncum% refers to an integer n followed by the characters cum (for cumulative) and a percent sign (%). All lines after the line containing ncum% calls in the cum% column of the display are truncated.
If you specify several modes on the same command line, the -quit option affects the output from each mode. For example, the -quit option in the following command reduces the output from both the -procedures and -heavy modes:
% prof -pixie -procedures -heavy -quit 20 profsample
This command prints only the 20 most time-consuming procedures and the 20 most time-consuming source lines.
The -quit n option has no effect on the -lines profiling mode.
The -quit n% option restricts the output to those entries that account for at least n% of the total. Depending on the profiling mode, the total can refer to the total amount of time, the total number of machine cycles, or the total number of invocation counts. For example, the following command prints only those source lines that account for at least 2 percent of the application's total number of machine cycles:
% prof -pixie -lines -quit 2% profsample
The -quit ncum% option truncates the output after n% of the total has been accounted for. The definition of total depends on the profiling mode, as described in the preceding paragraph. For example, the following command prints the most heavily used source lines and stops after 30 percent of the application's total number of machine cycles have been accounted for:
% prof -pixie -heavy -quit 30cum% profsample
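The three forms of -quit can be read as rules for how many rows of a sorted listing to keep. The following C sketch encodes one interpretation of those rules for a listing sorted in descending order, such as the -heavy listing. The quit_mode enumeration and the quit_keep function are hypothetical. The sample values are taken from the first five rows of Example 8-4.

```c
#include <stddef.h>

enum quit_mode { QUIT_LINES, QUIT_PERCENT, QUIT_CUM_PERCENT };

/* Return how many leading entries of a listing survive a -quit limit.
 * pct[] holds each entry's percentage (sorted in descending order) and
 * cum[] the running cumulative percentage. */
size_t quit_keep(const double *pct, const double *cum, size_t n,
                 enum quit_mode mode, double limit)
{
    size_t k = 0;

    switch (mode) {
    case QUIT_LINES:            /* -quit n: keep the first n entries */
        k = (size_t)limit < n ? (size_t)limit : n;
        break;
    case QUIT_PERCENT:          /* -quit n%: keep entries worth at least n% */
        while (k < n && pct[k] >= limit)
            k++;
        break;
    case QUIT_CUM_PERCENT:      /* -quit ncum%: stop once n% is accounted for */
        while (k < n) {
            k++;
            if (cum[k - 1] >= limit)
                break;
        }
        break;
    }
    return k;
}
```

With those five rows, the three forms behave as follows:

-quit 3 keeps three rows.
-quit 2% keeps the four rows at or above 2 percent.
-quit 30cum% keeps two rows, because the second row is the first at which the cumulative total reaches 30 percent.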
A single run of a program may not produce the desired results. You can repeatedly run the version of the program created by pixie, varying the input with each run, and then use the resulting .Counts files to produce a consolidated report. For example:
Compile and link. Do not use the -p option when linking to produce an executable file for pixie:
% cc -c profsample.c
% cc -o profsample profsample.o -lm
Run the profiling utility pixie, as follows:
% atom -tool pixie -toolargs=-pids profsample
This command produces the profsample.Addrs file to be used in step 4, as well as the modified program profsample.pixie.
Delete any existing .Counts files, set the PIXIE_ARGS environment variable to "-pids", and run the executable program produced by pixie. For example:
% profsample.pixie
The -pids option specified with the atom -tool pixie command in step 2 appends the PID of the process running the executable program to the name of the profsample.Counts file, for example, profsample.Counts.1753.
Run the profiled program as many times as desired. Each time the program is run, a profsample.Counts.<pid> file is created.
Run prof to create the report as follows:
% prof -pixie profsample profsample.Addrs profsample.Counts.*
If you had run profsample.pixie three times, the prof utility would have averaged the basic block data in the three files generated by the executable (profsample.Counts.<pid1>, profsample.Counts.<pid2>, and profsample.Counts.<pid3>) to produce the profile report.
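As described above, prof averages the basic block data from multiple .Counts files. The following C sketch shows that averaging for counts already read into memory, with one row per run. The average_counts function and its run-major layout are hypothetical, and the sketch does not model the real .Counts file format.

```c
#include <stddef.h>

/* Average basic-block counts over several runs.  counts is laid out
 * run-major: counts[r * nblocks + b] is the count for block b in run r.
 * avg[b] receives the mean count for block b across all runs. */
void average_counts(const unsigned long *counts, size_t nruns,
                    size_t nblocks, double *avg)
{
    for (size_t b = 0; b < nblocks; b++) {
        unsigned long sum = 0;
        for (size_t r = 0; r < nruns; r++)
            sum += counts[r * nblocks + b];
        avg[b] = nruns ? (double)sum / nruns : 0.0;
    }
}
```

Suppose three runs produce counts of {10, 0, 4}, {20, 0, 8}, and {30, 3, 0} for three blocks. The per-block averages are then 20, 1, and 4.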
When you are writing a test suite for an application, you might want to know how effectively your suite tests the application. The prof utility provides two options that can help you determine this. The -zero option prints the names of procedures that were never executed by your application. The -testcoverage option lists all of the source lines that were never executed by your application. Both of these options require basic block counting.
Typically, you would perform the following steps to make use of these options:
Run the pixie Atom tool on your application.
Run the instrumented program produced in step 1 through your test suite, saving any .Counts files.
Profile your application with the -zero or -testcoverage options and specify all of the .Counts files produced when you ran the test suite.
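The idea behind -zero can be sketched in C as follows. A procedure is reported only if its count is zero in every run of the test suite, so execution in any one run is enough to drop it from the report. The never_executed function and the run-major count layout are hypothetical. prof itself works from the .Counts files you supply.

```c
#include <stddef.h>

/* Find procedures that no run of a test suite executed.  counts is laid
 * out run-major: counts[r * nprocs + p] is the entry count of procedure p
 * in run r.  Indices of never-executed procedures are written to out[],
 * and the number of such procedures is returned. */
size_t never_executed(const unsigned long *counts, size_t nruns,
                      size_t nprocs, size_t *out)
{
    size_t nout = 0;

    for (size_t p = 0; p < nprocs; p++) {
        int hit = 0;
        for (size_t r = 0; r < nruns && !hit; r++)
            hit = counts[r * nprocs + p] != 0;
        if (!hit)
            out[nout++] = p;   /* zero in every run: report it */
    }
    return nout;
}
```

Consider two runs over four procedures, with counts of {1, 0, 5, 0} and {0, 0, 2, 7}. Only procedure 1 is reported as never executed.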
If the application you are profiling is fairly complicated, you may
want to run it several times with different inputs to get an accurate picture
of its profile.
If you are using PC sampling, each run of your application
produces a new
mon.out
file, or a
pid.program
file if you have set the
PROFDIR
environment
variable.
If you are using basic block counting, each run produces a new
.Counts
file.
You can display profiling information based on an average of all of these output files in two ways.
The first way is to specify the names of each profiling data file explicitly on the command line. For example, the following command prints profiling information from two profile data files:
% prof -procedures profsample 1510.profsample 1522.profsample
Keeping track of many different profiling data files, however, can be
difficult.
Therefore,
prof
provides the
-merge
option to combine several data files into a single merged file.
When
prof
operates in
-pixie
mode, the
-merge
option combines the
.Counts
files.
When
prof
operates in PC-sampling mode,
this option combines the
mon.out
or other profile data
files.
The following example combines two profile data files into a single
data file named
total.out:
% prof -merge total.out profsample 1773.profsample \
1777.profsample
At a later time, you can then display profiling data using the combined
file, just as you would use a normal
mon.out
file.
For
example:
% prof -procedures profsample total.out
The merge process is similar for
-pixie
mode.
You must specify the executable file's name, the
.Addrs
file, and each
.Counts
file:
% prof -pixie -merge total.Counts a.out a.out.Addrs \
a.out.Counts.1866 a.out.Counts.1868
Feedback files
are useful in identifying portions of a large executable program in which
significant percentages of the execution occur.
Without feedback, the compiler
must make assumptions about call frequency based on nesting levels.
These
assumptions are almost never as good as actual data from a sample run.
The
following sections describe how to use feedback files with the
cc
command and the
atom -tool pixie
and
prof
commands.
To generate feedback information that can be used to optimize subsequent compilations, follow these steps:
Compile the source code:
% cc -O2 -o profsample profsample.c -lm
Run the
pixie
Atom tool on the executable
file:
% atom -tool pixie -toolargs=-o profsample.pixie profsample
This step creates an output executable file named
profsample.pixie
and a
prof
input file named
profsample.Addrs.
Execute the program you just created:
% profsample.pixie
This step
creates a file named
profsample.Counts, which contains
execution statistics.
Use
prof
to create a feedback file from
the execution statistics:
% prof -pixie -feedback profsample.feedback profsample
You can use a feedback file as input to a compilation at
-O2
or
-O3
optimization levels when
you use the
-feedback
option with the
cc
command, as shown in the following example:
% cc -O3 -feedback profsample.feedback -o \
profsample profsample.c -lm
The feedback
file provides the compiler with actual execution information that can be used
to improve certain optimizations, such as inlining function calls.
Use a feedback
file generated from a
-O2
compilation for any subsequent
compilations with
-O2
or
-O3
options.
You can also use a feedback file as input to the
cord
utility.
The
cord
utility orders the procedures in an executable
program to improve execution time.
The following example shows how to use
the
-cord
option as part of a compilation command
with a feedback file as input:
% cc -O2 -cord -feedback profsample.feedback \
-o profsample profsample.c -lm
Use a feedback file generated with the same optimization level as the level you use in subsequent compilations.
You can also use
cord
with the
runcord
utility.
For more information, see
runcord(1).
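The idea behind cord can be illustrated by sorting procedures so that the most frequently executed ones are placed together. The following C sketch is a deliberate simplification: the proc_freq structure and its counts are hypothetical, and the ordering heuristic that cord actually applies is not described in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-procedure execution count, for example from feedback data. */
struct proc_freq {
    const char *name;
    unsigned long calls;
};

/* qsort comparator: higher counts first; ties are broken by name so the
 * resulting order is deterministic. */
static int hotter_first(const void *a, const void *b)
{
    const struct proc_freq *pa = a;
    const struct proc_freq *pb = b;

    if (pa->calls != pb->calls)
        return pa->calls < pb->calls ? 1 : -1;
    return strcmp(pa->name, pb->name);
}

/* Reorder procs in place so the most frequently executed come first. */
void order_hot_first(struct proc_freq procs[], size_t n)
{
    assert(procs != NULL || n == 0);
    if (n > 1)
        qsort(procs, n, sizeof procs[0], hotter_first);
}
```

Grouping hot procedures this way tends to keep frequently executed code close together in memory, which is the kind of gain a procedure-ordering tool aims for.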
By default, the
-p
and
-pg
options to the
cc
command provide the following:
A single profile covering the whole text segment and all threads.
To profile specific portions of the program, use the
monitor
utilities, as described in
Section 8.13
and
monitor(3).
A single data file called
mon.out
(for
-p) or
gmon.out
(for
-pg) placed in the current directory.
The
-p
option supports the profiling
of shared libraries.
The
-pg
option and
uprofile
tool support the profiling of only the part of a program
that is in the executable.
When using these tools to generate profiling information
for library routines, link your object file with the
-non_shared
option to the
cc
command.
You can use the following environment variables to control profiling behavior:
PROFDIR
PROFFLAGS
By using these variables, you can change aspects of the default profiling behavior, including:
Changing the name and path of profiling data files
Controlling when profiling begins
Controlling profiling of multithreaded applications
You can use the
PROFFLAGS
and
PROFDIR
environment variables together.
These environment variables have no effect on the
prof
and
gprof
post-processors; they affect the profiling behavior
of a program during its execution.
These environment variables have no effect
when you use the
pixie
Atom tool.
By default, profiling data is collected in a data file named
[g]mon.out.
When you do multiple profiling runs, each run overwrites
the existing
[g]mon.out
file.
Use the
PROFDIR
environment variable when you want to collect PC sampling data
in files with unique names.
Set this environment variable as follows:
C shell:
setenv PROFDIR path
Bourne shell:
PROFDIR=path; export PROFDIR
The results are saved in the file path/pid.progname, which resolves as follows:
The directory path, specified with
PROFDIR,
identifying an existing directory.
The PID of the executing program.
The program name.
When you set
PROFDIR
to a null string, no profiling
occurs.
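The PROFDIR naming rule, including the null-string case, can be sketched in C. The function profdir_filename and its result codes are hypothetical illustrations of the rule stated above; they are not part of the profiling library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical result codes for this sketch only. */
enum profdir_result {
    PROFDIR_UNSET,     /* PROFDIR not set: default [g]mon.out naming */
    PROFDIR_DISABLED,  /* PROFDIR set to the null string: no profiling */
    PROFDIR_NAMED,     /* buf holds path/pid.progname */
    PROFDIR_TOO_LONG   /* the name did not fit in buf */
};

/* Build the data file name path/pid.progname described in the text. */
enum profdir_result profdir_filename(const char *progname, pid_t pid,
                                     char *buf, size_t len)
{
    const char *dir = getenv("PROFDIR");
    int n;

    assert(progname != NULL && buf != NULL);
    if (dir == NULL)
        return PROFDIR_UNSET;
    if (strlen(dir) == 0)
        return PROFDIR_DISABLED;
    n = snprintf(buf, len, "%s/%ld.%s", dir, (long)pid, progname);
    if (n < 0 || (size_t)n >= len)
        return PROFDIR_TOO_LONG;
    return PROFDIR_NAMED;
}
```

A program would normally pass getpid( ) (declared in <unistd.h>) as the pid argument. With PROFDIR set to /tmp/prof, PID 1510, and the program name profsample, the sketch produces /tmp/prof/1510.profsample.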
By default, the profiling library
libprof1.a
(or
libprof1_r.a, for multithreaded
programs) allocates one buffer per process to record your profiling data
and places the data output file in your current directory.
To disable this default behavior, set the
PROFFLAGS
environment variable as follows:
C shell:
setenv PROFFLAGS "-disable_default"
Bourne shell:
PROFFLAGS="-disable_default"; export PROFFLAGS
When you have set
PROFFLAGS
to
-disable_default, the default profiling support is disabled, allowing you to use
the
monitor
calls to profile specific sections of your
program for both nonthreaded and multithreaded programs.
See
monitor(3)
and
Section 8.13
for more information on using the
monitor,
monstartup, and
moncontrol
routines.
For multithreaded programs, you can allocate one buffer per thread by
setting the
PROFFLAGS
environment variable as follows:
C shell:
setenv PROFFLAGS "-threads"
Bourne shell:
PROFFLAGS="-threads"; export PROFFLAGS
When you have set
PROFFLAGS
to
-threads, a separate file is produced for each thread and is named
pid.sid.progname, which is resolved as follows:
The PID of the program.
The sequence number of the thread, which depends on the order in which the threads were created.
The name of the program being profiled.
You can use the
-threads
and
-disable_default
options together to control profiling of your program when you
use the
monitor
routines.
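As a sketch of how the two PROFFLAGS options combine, the following C code splits the variable on blanks and records whether -disable_default and -threads are present. The prof_flags structure and read_prof_flags function are hypothetical; the profiling library's own parsing is not documented here, and this sketch ignores every other option.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings decoded from PROFFLAGS. */
struct prof_flags {
    int disable_default;  /* "-disable_default": no automatic profiling */
    int per_thread;       /* "-threads": one buffer and file per thread */
};

/* Tokenize a copy of PROFFLAGS (truncated to 255 characters in this
 * sketch) and record the two options discussed in the text. */
struct prof_flags read_prof_flags(void)
{
    struct prof_flags f = {0, 0};
    const char *env = getenv("PROFFLAGS");
    char copy[256];
    char *tok, *save;

    if (env == NULL)
        return f;
    strncpy(copy, env, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';
    for (tok = strtok_r(copy, " \t", &save); tok != NULL;
         tok = strtok_r(NULL, " \t", &save)) {
        if (strcmp(tok, "-disable_default") == 0)
            f.disable_default = 1;
        else if (strcmp(tok, "-threads") == 0)
            f.per_thread = 1;
    }
    return f;
}
```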
You can also set the
PROFFLAGS
environment variable
to include or exclude profiling information:
setenv PROFFLAGS "-all"
Causes the profiles for all shared libraries (if any) described in the data file(s) to be displayed, in addition to the profile for the executable.
setenv PROFFLAGS "-incobj"
Causes the profile for the named shared library to be printed, in addition to the profile for the executable.
setenv PROFFLAGS "-excobj"
Causes the profile for the named executable or shared library not to be printed.
The default profiling behavior
on Tru64 UNIX systems is to profile the entire text segment of your program
and place the profiling data in
mon.out
for
prof
profiling or in
gmon.out
for
gprof
profiling.
For large programs, you might not need to profile the
entire text segment.
The
monitor
routines provide the ability
to profile portions of your program specified by the lower and upper address
boundaries of a function address range.
The
monitor
routines are:
monitor( )
Use this routine to gain control of explicit profiling by turning profiling on and off for a specific text range. This routine is not supported for gprof profiling.
monstartup
Similar to monitor, except that it specifies only an address range and is supported for gprof profiling.
moncontrol
Use this routine with monitor and monstartup to turn PC sampling on or off during program execution for a specific process or thread.
monitor_signal
Use this routine to profile nonterminating programs, such as daemons.
You can use
monitor
and
monstartup
to profile an address range in each shared library as well as in the static
executable.
For more information on these functions, see
monitor(3).
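The following C sketch models what profiling a bounded address range means for PC sampling: a sampled PC inside [lowpc, highpc) increments the counter for its instruction, and a sample outside the range is discarded. The pc_histogram structure and record_sample function are hypothetical, and the 4-byte instruction size is the Alpha value used in Example 8-7; the real monitor implementation may differ, for example in how it handles counter overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define INST_SIZE 4  /* bytes per instruction on Alpha (from Example 8-7) */

/* Hypothetical histogram covering [lowpc, highpc): one unsigned short
 * counter per instruction. */
struct pc_histogram {
    uintptr_t lowpc;
    uintptr_t highpc;
    unsigned short *counters;
};

/* Count one sampled PC. Returns 1 if the sample fell inside the range,
 * 0 if it was outside and therefore ignored. Counters saturate at
 * USHRT_MAX in this sketch instead of wrapping. */
int record_sample(struct pc_histogram *h, uintptr_t pc)
{
    size_t idx;

    assert(h != NULL && h->highpc > h->lowpc);
    if (pc < h->lowpc || pc >= h->highpc)
        return 0;
    idx = (size_t)((pc - h->lowpc) / INST_SIZE);
    if (h->counters[idx] < USHRT_MAX)
        h->counters[idx]++;
    return 1;
}
```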
By default, profiling begins as soon as your program starts to execute.
You can set the
PROFFLAGS
environment variable to
-disable_default
to prevent profiling from beginning when
your program executes.
Profiling then begins only when your program first calls
monitor
or
monstartup.
You can disable the default naming of the profiling data file by using
the
PROFDIR
environment variable.
For more information
on using this environment variable, see
Section 8.12.1.
Example 8-6
demonstrates how to use
the
monstartup
and
monitor
routines
within a program to begin and end profiling.
/* Profile the domath( ) routine using monstartup.
 * This example allocates a buffer for the entire program.
 * Compile command: cc -p foo.c -o foo -lm
 * Before running the executable, enter the following
 * from the command line to disable default profiling support:
 * setenv PROFFLAGS -disable_default
 */
#include <stdio.h>
#include <math.h>               /* sqrt( ) */
#include <sys/syslimits.h>
char dir[PATH_MAX];
extern void __start( );         /* beginning of program text */
extern unsigned long _etext;    /* just above the end of program text */
void domath(void);
int main(void)
{
    int i;
    /* Start profiling between __start (beginning of text)
     * and _etext (end of text). The profiling library
     * routines will allocate the buffer.
     */
    monstartup(__start, &_etext);
    for (i = 0; i < 10; i++)
        domath( );
    /* Stop profiling and write the profiling output file. */
    monitor(0);
    return 0;
}
void domath(void)
{
    int i;
    double d1, d2;
    d2 = 3.1415;
    for (i = 0; i < 1000000; i++)
        d1 = sqrt(d2)*sqrt(d2);
}
The external name
_etext
lies just above all the
program text.
See
end(3)
for more information.
When you set the
PROFFLAGS
environment variable to
-disable_default, you disable default profiling buffer support.
You can allocate buffers within your program, as shown in
Example 8-7.
/* Profile the domath routine using monitor().
 * Compile command: cc -p foo.c -o foo -lm
 * Before running the executable, enter the following
 * from the command line to disable default profiling support:
 * setenv PROFFLAGS -disable_default
 */
#include <stdio.h>
#include <stdlib.h>             /* calloc( ) */
#include <math.h>               /* sqrt( ) */
#include <sys/types.h>
#include <sys/syslimits.h>
void domath(void);
void nextproc(void);
#define INST_SIZE 4 /* Instruction size on Alpha */
char dir[PATH_MAX];
int main(void)
{
    int i;
    char *buffer;
    size_t bufsize;
    /* Allocate one counter for each instruction to
     * be sampled. Each counter is an unsigned short.
     * This assumes nextproc( ) immediately follows
     * domath( ) in the text segment.
     */
    bufsize = (((char *)nextproc - (char *)domath)/INST_SIZE)
        * sizeof(unsigned short);
    /* Use calloc( ) to ensure that the buffer is clean
     * before sampling begins.
     */
    buffer = calloc(bufsize, 1);
    if (buffer == NULL) {
        fprintf(stderr, "cannot allocate profiling buffer\n");
        return 1;
    }
    /* Start sampling. */
    monitor(domath, nextproc, buffer, bufsize, 0);
    for (i = 0; i < 10; i++)
        domath( );
    /* Stop sampling and write out profiling buffer. */
    monitor(0);
    return 0;
}
void domath(void)
{
    int i;
    double d1, d2;
    d2 = 3.1415;
    for (i = 0; i < 1000000; i++)
        d1 = sqrt(d2)*sqrt(d2);
}
void nextproc(void)
{}
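The buffer-size calculation in Example 8-7 is plain arithmetic: one unsigned short counter for each 4-byte instruction in the profiled range. The following C helper, sample_buffer_size, is a hypothetical restatement of that computation.

```c
#include <assert.h>
#include <stddef.h>

#define INST_SIZE 4  /* Alpha instruction size, as in Example 8-7 */

/* Bytes needed for one unsigned short counter per instruction in the
 * half-open range [lowpc, highpc), mirroring the bufsize expression in
 * Example 8-7. Integer division drops any trailing partial instruction. */
size_t sample_buffer_size(const char *lowpc, const char *highpc)
{
    assert(highpc > lowpc);
    return (size_t)((highpc - lowpc) / INST_SIZE) * sizeof(unsigned short);
}
```

For a 4000-byte range, the helper reserves 1000 counters, which is 2000 bytes on systems where unsigned short is 2 bytes.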
Use the
monitor_signal( )
routine to profile
programs that do not terminate.
Declare this routine as a signal handler in
your program and build the program for
prof
or
gprof
profiling.
While the program is executing, send a signal from
the shell by using the
kill
command.
When the signal is received,
monitor_signal
is invoked
and writes profiling data to the data file.
If the program receives another
signal, the data file is overwritten.
Example 8-8
demonstrates how to use
the
monitor_signal
routine.
/* From the shell, start up the program in background.
 * Send a signal to the process, for example: kill -30 <pid>
 * Process the [g]mon.out file normally using gprof or prof
 */
#include <signal.h>
#include <math.h>               /* sqrt( ) */
extern int monitor_signal();
int main(void)
{
    double d1, d2;
    /*
     * Declare monitor_signal() as signal handler for SIGUSR1
     */
    signal(SIGUSR1, monitor_signal);
    d2 = 3.1415;
    /*
     * Loop infinitely (absurd example of non-terminating process)
     */
    for (;;)
        d1 = sqrt(d2)*sqrt(d2);
}
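The signal-driven pattern behind monitor_signal can be sketched portably in C. This is not monitor_signal itself: in this hypothetical version the SIGUSR1 handler only sets a flag, because calling stdio from a signal handler is not async-signal-safe, and the work loop writes the data when it notices the flag. Reopening the output file with mode "w" on each dump reproduces the overwrite behavior described above. The names request_dump, install_dump_handler, and maybe_dump are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler, read by the main loop. */
static volatile sig_atomic_t dump_requested = 0;

/* SIGUSR1 handler: only records the request (async-signal-safe). */
static void request_dump(int sig)
{
    (void)sig;
    dump_requested = 1;
}

/* Install request_dump as the SIGUSR1 handler; returns 0 on success. */
int install_dump_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_dump;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call from the work loop. If a dump was requested, overwrite path with
 * the current data (here just a sample count) and return 1; return 0 if
 * no dump was pending, or -1 if the file could not be opened. */
int maybe_dump(const char *path, unsigned long samples)
{
    FILE *fp;

    assert(path != NULL);
    if (!dump_requested)
        return 0;
    dump_requested = 0;
    fp = fopen(path, "w");   /* "w" truncates: a later dump replaces this one */
    if (fp == NULL)
        return -1;
    fprintf(fp, "%lu\n", samples);
    fclose(fp);
    return 1;
}
```

In a long-running program, maybe_dump would be called on each pass through the main loop; sending SIGUSR1 with kill then triggers a dump.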
Profiling
multithreaded applications is essentially the same as profiling nonthreaded
applications.
However, to profile multithreaded applications, you must compile
your program with the
-pthread
or
-threads
option to the
cc
command.
Specifying one of
these options and either the
-p
or
-pg
option enables the thread profiling library,
libprof1_r.a.
The default case for profiling multithreaded applications is to provide
one sampling buffer for all threads.
In this case, you get sampling across
the entire process and you get one output file comprising sampling data from
all threads.
Depending on whether you use the
-p
or
-pg
option, your output file will be named
mon.out
or
gmon.out, respectively.
To get a separate buffer and a separate output file for each thread
in your program, use the environment variable
PROFFLAGS.
Set
PROFFLAGS
to
-threads, as
shown in the following example:
setenv PROFFLAGS "-threads"
The profiling data file will be named according to the following convention:
pid.sid.progname
In this name, pid is the PID of the program, sid corresponds to the order in which the thread was created, and progname is the name of your program.
If the application controls profiling by using the
monitor
routines,
sid
corresponds to the order in which
profiling was started for the thread.
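The sid numbering rule can be sketched with POSIX threads: each thread that starts profiling takes the next sequence number under a mutex, so the numbers follow the order in which profiling started rather than any thread ID. The functions take_sid, profiled_worker, and thread_data_filename are hypothetical, and numbering from 0 is an assumption; this section does not say where the sequence starts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

static pthread_mutex_t sid_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_sid = 0;   /* starting at 0 is an assumption */

/* Hand out the next sequence number; the mutex makes the order of
 * calls, not thread identity, decide the number. */
int take_sid(void)
{
    int sid;

    pthread_mutex_lock(&sid_lock);
    sid = next_sid++;
    pthread_mutex_unlock(&sid_lock);
    return sid;
}

/* Example thread body: take a sid as this thread "starts profiling"
 * and store it in the int that arg points to. */
void *profiled_worker(void *arg)
{
    *(int *)arg = take_sid();
    return NULL;
}

/* Format pid.sid.progname into buf; returns 0 on success, -1 if the
 * name was truncated. */
int thread_data_filename(pid_t pid, int sid, const char *progname,
                         char *buf, size_t len)
{
    int n;

    assert(progname != NULL && buf != NULL);
    n = snprintf(buf, len, "%ld.%d.%s", (long)pid, sid, progname);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```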
If you use the
monitor( )
or
monstartup( )
calls in a threaded program, you must first set
PROFFLAGS
to
"-disable_default -threads", giving you complete
control of profiling the application.
If the application uses
monitor( )
and allocates
separate buffers for each thread profiled, you must first set
PROFFLAGS
to
"-disable_default -threads"
because this setting
affects the file naming conventions that are used.
Without the
-threads
option, the buffer and address range used as a result of the first
monitor
or
monstartup
call is applied to
every thread that subsequently requests profiling.
In this case, a single
data file that covers all threads being profiled is created.
Each thread in a process must call the
monitor( )
or
monstartup( )
routines to initiate profiling
for itself.