9    Using and Developing Atom Tools

Program analysis tools are extremely important for computer architects and software engineers. Computer architects use them to test and measure new architectural designs, and software engineers use them to identify critical pieces of code in programs or to examine how well a branch prediction or instruction scheduling algorithm is performing. Program analysis tools are needed for problems ranging from basic block counting to data cache simulation. Although the tools that accomplish these tasks may appear quite different, each can be implemented simply and efficiently through code instrumentation.

Atom provides a flexible code instrumentation interface that is capable of building a wide variety of tools. Atom separates the common part in all problems from the problem-specific part by providing machinery for instrumentation and object-code manipulation, and allowing the tool designer to specify what points in the program are to be instrumented. Atom is independent of any compiler and language as it operates on object modules that make up the complete program.

This chapter discusses the following topics: running Atom tools (Section 9.1) and developing Atom tools (Section 9.2).

9.1    Running Atom Tools

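The following sections describe how to use installed Atom tools (Section 9.1.1), test Atom tools under development (Section 9.1.2), and use the options of the atom command (Section 9.1.3).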

9.1.1    Using Installed Tools

The Tru64 UNIX operating system provides a number of example Atom tools, listed in Table 9-1, to help you develop your own custom-designed Atom tools. These tools are distributed in source form to illustrate Atom's programming interfaces -- they are not intended for production use. Section 9.2 describes some of the tools in more detail.

Table 9-1:  Example Prepackaged Atom Tools

Tool      Description
branch    Instruments all conditional branches to determine how many are predicted correctly.
cache     Determines the cache miss rate if an application runs in an 8-KB direct-mapped cache.
dtb       Determines the number of dtb (data translation buffer) misses if the application uses 8-KB pages and a fully associative translation buffer.
dyninst   Provides fundamental dynamic counts of instructions, loads, stores, blocks, and procedures.
inline    Identifies potential candidates for inlining.
iprof     Prints the number of times each procedure is called as well as the number of instructions executed by each procedure.
malloc    Records each call to the malloc function and prints a summary of the application's allocated memory.
prof      Prints the number of instructions executed by each procedure in pthread programs.
ptrace    Prints the name of each procedure as it is called.
trace     Generates an address trace, logs the effective address of every load and store operation, and logs the address of the start of every basic block as it is executed.

The example tools can be found in the /usr/lib/cmplrs/atom/examples directory. Each one has three files: an instrumentation file, an analysis file, and a description file.

Atom tools that are put into production use or that are delivered to customers as products usually have .o object modules installed instead of their proprietary sources. The Tru64 UNIX hiprof(1), pixie(1), and third(1) commands and the Visual Threads product include Atom tools that are delivered, installed, and run this way. By convention, their instrumentation, analysis, and description files are in /usr/lib/cmplrs/atom/tools.

To run an installed Atom tool or example on an application program, use the following form of the atom(1) command:

atom application_program -tool toolname [-env environment] [ options... ]

This form of the atom command requires the -tool option and accepts the -env option.

The -tool option identifies the installed Atom tool to be used. By default, Atom searches for installed tools in the /usr/lib/cmplrs/atom/tools and /usr/lib/cmplrs/atom/examples directories. You can add directories to the search path by supplying a colon-separated list of additional directories to the ATOMTOOLPATH environment variable.

The -env option indicates that an alternative version of the tool is desired. For example, some Tru64 UNIX tools require -env threads to run the thread-safe version. The atom(1) command searches for a toolname.env.desc file instead of the default toolname.desc file. It prints an error message if a description file for the specified environment cannot be found.
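The naming rule for description files can be sketched in plain C. The helper below, desc_file_name, is a hypothetical name used only for illustration and is not part of Atom; it shows how the toolname.desc and toolname.env.desc names follow from the -tool and -env values, and it says nothing about how Atom itself performs the search.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Atom): forms the description file
 * name that corresponds to the given -tool and -env values.
 * Pass env == NULL (or an empty string) when no -env option is given.
 * Returns 0 on success, or -1 if the name does not fit in buf.
 */
int desc_file_name(const char *tool, const char *env, char *buf, size_t len)
{
    int n;

    if (env != NULL && strlen(env) > 0)
        n = snprintf(buf, len, "%s.%s.desc", tool, env);   /* toolname.env.desc */
    else
        n = snprintf(buf, len, "%s.desc", tool);           /* toolname.desc */

    /* snprintf reports the length it needed; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a tool named mytool (an invented name), the helper yields mytool.desc by default and mytool.threads.desc when the environment is threads.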

9.1.2    Testing Tools Under Development

A second form of the atom(1) command makes it easy to compile and run a new Atom tool that you are developing. You name the instrumentation and analysis files directly on the command line:

atom application_program instrumentation_file [ analysis_file ] [ options...]

This form of the command requires the instrumentation_file parameter and accepts the analysis_file parameter, but not the -tool or -env options.

The instrumentation_file parameter specifies the name of a C source file or an object module that contains the Atom tool's instrumentation procedures. If the instrumentation procedures are in more than one file, the .o of each file may be linked together into one file using the ld command with a -r option. By convention, most instrumentation files have the suffix .inst.c or .inst.o.

If you pass an object module for this parameter, consider compiling the module with either the -g1 or -g option. If there are errors in your instrumentation procedures, Atom can issue more complete diagnostic messages when the instrumentation procedures are compiled this way.

The analysis_file parameter specifies the name of a C source file or an object module that contains the Atom tool's analysis procedures. If the analysis routines are in more than one file, the .o of each file may be linked together into one file using the ld command with a -r option. Note that you do not need to specify an analysis file if the instrumentation file does not add calls to analysis procedures to the application it instruments. By convention, most analysis files have the suffix .anal.c or .anal.o.

Analysis routines may perform better if they are compiled as a single compilation unit.

You can have multiple instrumentation and analysis source files. The following example creates composite instrumentation and analysis objects from several source files:

% cc -c file1.c file2.c
% cc -c file7.c file8.c
% ld -r -o tool.inst.o file1.o file2.o
% ld -r -o tool.anal.o file7.o file8.o
% atom hello tool.inst.o tool.anal.o -o hello.atom

Note

You can also write analysis procedures in C++. You must assign a type of extern "C" to each procedure to allow it to be called from the application. You must also compile and link the analysis files before entering the atom command. For example:


% cxx -c tool.a.C
% ld -r -o tool.anal.o tool.a.o -lcxx -lexc
% atom hello tool.inst.c tool.anal.o -o hello.atom

9.1.3    Atom Options

With the exception of the -tool and -env options, both forms of the atom command accept any of the remaining options described in atom(1). The following options deserve special mention:

-A1

Causes Atom to optimize calls to analysis routines by reducing the number of registers that need to be saved and restored. For some tools, specifying this option increases the performance of the instrumented application by a factor of two (at the expense of some increase in application size). The default behavior is for Atom not to apply these optimizations.

-debug

Lets you debug instrumentation routines by causing Atom to transfer control to the symbolic debugger at the start of the instrumentation routine. In the following example, the ptrace sample tool is run under the dbx debugger. The instrumentation is stopped at line 12, and the procedure name is printed.

% atom hello ptrace.inst.c ptrace.anal.c -o hello.ptrace -debug
dbx version 3.11.8
Type 'help' for help.
Stopped in InstrumentAll
(dbx) stop at 12
[4] stop at "/udir/test/scribe/atom.user/tools/ptrace.inst.c":12
(dbx) c
[3] [InstrumentAll:12 ,0x12004dea8] if (name == NULL) name = "UNKNOWN";
(dbx) p name
0x2a391 = "__start"

-ladebug

Lets you debug instrumentation routines with the optional ladebug debugger, if installed on your system. Atom transfers control to ladebug, which stops at the start of the instrumentation routine. Use ladebug if the instrumentation routines contain C++ code. See the Ladebug Debugger Manual for more information.

-ga (-g)

Produces the instrumented program with debugging information. This option lets you debug analysis routines with a symbolic debugger. The default -A0 option (not -A1) is recommended with -ga (or -g). For example:

% atom hello ptrace.inst.c ptrace.anal.c -o hello.ptrace -ga
% dbx hello.ptrace
dbx version 3.11.8
Type 'help' for help.
(dbx) stop in ProcTrace
[2] stop in ProcTrace
(dbx) r
[2] stopped at [ProcTrace:5 ,0x120005574] fprintf (stderr,"%s\n",name);
(dbx) n
__start
     [ProcTrace:6 ,0x120005598] }

-gp

Produces the instrumented program with debugging information. This option lets you debug application routines with a symbolic debugger.

-pthread

Specifies that thread-safe support is required. This option should be used when instrumenting threaded applications.

-toolargs

Passes arguments to the Atom tool's instrumentation routine. Atom passes the arguments in the same way that they are passed to C programs, using the argc and argv arguments to the main program. For example:

#include <stdio.h>
unsigned InstrumentAll(int argc, char **argv) {
     int i;
     /* Print each argument with its index, as in the sample run. */
     for (i = 0; i < argc; i++) {
       fprintf(stderr,"argv[%d]: %s\n",i,argv[i]);
     }
     return(0);
}

The following example shows how Atom passes the -toolargs arguments:

% atom hello args.inst.c -toolargs="8192 4"
argv[0]: hello
argv[1]: 8192
argv[2]: 4
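Because the arguments arrive as strings, an instrumentation routine usually converts them before use. The following is a minimal sketch in standard C, assuming a hypothetical tool that treats the two values as a cache size and an associativity; the names parse_positive and parse_cache_args are invented for this illustration and are not Atom routines.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: converts a decimal string to a positive long.
 * Returns 0 on success, -1 if the string is empty, has trailing
 * characters, or is not greater than zero.
 */
static int parse_positive(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0' || v <= 0)
        return -1;
    *out = v;
    return 0;
}

/*
 * Hypothetical helper: reads argv[1] and argv[2], skipping argv[0],
 * which holds the application name as in the sample run above.
 * Returns 0 on success, -1 if an argument is missing or malformed.
 */
int parse_cache_args(int argc, char **argv, long *size, long *assoc)
{
    if (argc < 3)
        return -1;
    if (parse_positive(argv[1], size) != 0)
        return -1;
    return parse_positive(argv[2], assoc);
}
```

An instrumentation routine could call parse_cache_args(argc, argv, &size, &assoc) first and report a usage error when it returns -1.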

9.2    Developing Atom Tools

The remainder of this chapter describes how to develop Atom tools.

9.2.1    Atom's View of an Application

Atom views an application as a hierarchy of components:

  1. The program, including the executable and all shared libraries.

  2. A collection of objects. An object can be either the main executable or any shared library. An object has its own set of attributes (such as its name) and consists of a collection of procedures.

  3. A collection of procedures, each of which consists of a collection of basic blocks.

  4. A collection of basic blocks, each of which consists of a collection of instructions.

  5. A collection of instructions.

Atom tools insert instrumentation points in an application program at procedure, basic block, or instruction boundaries. For example, basic block counting tools instrument the beginning of each basic block, data cache simulators instrument each load and store instruction, and branch prediction analyzers instrument each conditional branch instruction.

At any instrumentation point, Atom allows a tool to insert a procedure call to an analysis routine. The tool can specify that the procedure call be made before or after an object, procedure, basic block, or instruction.
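The hierarchy and the choice of instrumentation points can be modeled in plain C. The types below (ModelProgram, ModelObj, and so on) are stand-ins invented for this sketch; they are not Atom's own types, and the code does not call the Atom API. It only counts how many instrumentation points two kinds of tool would select.

```c
#include <stddef.h>

/* Stand-in types for illustration only; not Atom's data structures. */
typedef struct { int is_load_or_store; } ModelInst;
typedef struct { const ModelInst *insts;   size_t ninsts;  } ModelBlock;
typedef struct { const ModelBlock *blocks; size_t nblocks; } ModelProc;
typedef struct { const ModelProc *procs;   size_t nprocs;  } ModelObj;
typedef struct { const ModelObj *objs;     size_t nobjs;   } ModelProgram;

/* A basic block counting tool instruments the start of every block. */
size_t count_block_points(const ModelProgram *prog)
{
    size_t o, p, total = 0;

    for (o = 0; o < prog->nobjs; o++)
        for (p = 0; p < prog->objs[o].nprocs; p++)
            total += prog->objs[o].procs[p].nblocks;
    return total;
}

/* A data cache simulator instruments every load and store instruction. */
size_t count_memory_points(const ModelProgram *prog)
{
    size_t o, p, b, i, total = 0;

    for (o = 0; o < prog->nobjs; o++) {
        const ModelObj *obj = &prog->objs[o];
        for (p = 0; p < obj->nprocs; p++) {
            const ModelProc *proc = &obj->procs[p];
            for (b = 0; b < proc->nblocks; b++) {
                const ModelBlock *blk = &proc->blocks[b];
                for (i = 0; i < blk->ninsts; i++)
                    if (blk->insts[i].is_load_or_store)
                        total++;
            }
        }
    }
    return total;
}
```

The nesting of the loops mirrors the program, object, procedure, basic block, and instruction levels listed above.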

9.2.2    Atom Instrumentation Routine

A tool's instrumentation routine contains the code that traverses the application's objects, procedures, basic blocks, and instructions to locate instrumentation points; adds calls to analysis procedures; and builds the instrumented version of an application.

As described in atom_instrumentation_routines(5), an instrumentation routine can employ one of the following interfaces based on the needs of the tool:

Instrument (int iargc, char **iargv, Obj *obj)

Atom calls the Instrument routine for each object in the application program. As a result, an Instrument routine does not need to use the object navigation routines (such as GetFirstObj). Because Atom automatically writes each modified object before passing the next to the Instrument routine, the Instrument routine should never call the BuildObj, WriteObj, or ReleaseObj routine. When using the Instrument interface, you can define an InstrumentInit routine to perform tasks required before Atom calls Instrument for the first object (such as defining analysis routine prototypes, adding program level instrumentation calls, and performing global initializations). You can also define an InstrumentFini routine to perform tasks required after Atom calls Instrument for the last object (such as global cleanup).

InstrumentAll (int iargc, char **iargv)

Atom calls the InstrumentAll routine once for the entire application program, which allows a tool's instrumentation code itself to determine how to traverse the application's objects. With this method, there are no InstrumentInit or InstrumentFini routines. An InstrumentAll routine must call the Atom object navigation routines and use the BuildObj, WriteObj, or ReleaseObj routine to manage the application's objects.

Regardless of the instrumentation routine interface, Atom passes the arguments specified in the -toolargs option to the routine. In the case of the Instrument interface, Atom also passes a pointer to the current object.

9.2.3    Atom Instrumentation Interfaces

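Atom provides a comprehensive interface for instrumenting applications. The interface supports the following types of activities: navigating within a program, building objects, obtaining information about an application's components, resolving procedure names and call targets, and adding calls to analysis routines to a program. Sections 9.2.3.1 through 9.2.3.5 describe each in turn.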

9.2.3.1    Navigating Within a Program

The Atom application navigation routines, described in atom_application_navigation(5), allow an Atom tool's instrumentation routine to find locations in an application at which to add calls to analysis procedures.

9.2.3.2    Building Objects

The Atom object management routines, described in atom_object_management(5), allow an Atom tool's InstrumentAll routine to build, write, and release objects.

The BuildObj routine builds the internal data structures Atom requires to manipulate the object. An InstrumentAll routine must call the BuildObj routine before traversing the procedures in the object and adding analysis routine calls to the object. The WriteObj routine writes the instrumented version of the specified object, deallocating the internal data structures the BuildObj routine previously created. The ReleaseObj routine deallocates the internal data structures for the given object, but it does not write out the instrumented version of the object.

The IsObjBuilt routine returns a nonzero value if the specified object has been built with the BuildObj routine but not yet written with the WriteObj routine or unbuilt with the ReleaseObj routine.
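The build, write, and release rules amount to a small state machine, sketched here in plain C. The model_* functions and the ModelObjState type are invented for this illustration; they imitate the behavior described above and are not the Atom routines themselves.

```c
/* Stand-in for the per-object state described above; not an Atom type. */
typedef struct {
    int built;    /* nonzero between a build and the next write or release */
    int written;  /* nonzero once an instrumented version has been written */
} ModelObjState;

/* Models BuildObj: the internal data structures now exist. */
void model_build(ModelObjState *o)
{
    o->built = 1;
}

/* Models WriteObj: writes the instrumented object, then deallocates. */
void model_write(ModelObjState *o)
{
    if (o->built) {
        o->written = 1;
        o->built = 0;
    }
}

/* Models ReleaseObj: deallocates without writing anything. */
void model_release(ModelObjState *o)
{
    o->built = 0;
}

/* Models IsObjBuilt: nonzero only while the object is built. */
int model_is_built(const ModelObjState *o)
{
    return o->built;
}
```

In this model, an object that is built and then released is left unwritten, which corresponds to a tool that only inspects an object and does not instrument it.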

9.2.3.3    Obtaining Information About an Application's Components

The Atom application query routines, described in atom_application_query(5), allow an instrumentation routine to obtain static information about a program and its objects, procedures, basic blocks, and instructions.

The GetAnalName routine returns the name of the analysis file, as passed to the atom command. This routine is useful for tools that have a single instrumentation file and multiple analysis files. For example, multiple cache simulators might share a single instrumentation file but each have a different analysis file.

The GetProgInfo routine returns the number of objects in a program.

Table 9-2 lists the routines that provide information about a program's objects.

Table 9-2:  Atom Object Query Routines

Routine           Description
GetObjInfo        Returns information about an object's text, data, and bss segments; the number of procedures, basic blocks, or instructions it contains; its object ID; or a Boolean hint as to whether the given object should be excluded from instrumentation.
GetObjInstArray   Returns an array consisting of the 32-bit instructions included in the object.
GetObjInstCount   Returns the number of instructions in the array returned by the GetObjInstArray routine.
GetObjName        Returns the original file name of the specified object.
GetObjOutName     Returns the name of the instrumented object.

The following instrumentation routine, which prints statistics about the program's objects, demonstrates the use of Atom object query routines:

#include <stdio.h>
#include <cmplrs/atom.inst.h>

unsigned InstrumentAll(int argc, char **argv)
{
   Obj *o;
   const unsigned int *textSection;
   long textStart;

   /* Visit every object: the main executable and each shared library. */
   for (o = GetFirstObj(); o != NULL; o = GetNextObj(o)) {
     BuildObj(o);
     textSection = GetObjInstArray(o);
     textStart = GetObjInfo(o,ObjTextStartAddress);
     printf("Object %ld\n", GetObjInfo(o,ObjID));
     printf("  Object Name: %s\n", GetObjName(o));
     printf("  Start Address: 0x%lx\n", textStart);
     printf("  Text Size: %ld\n", GetObjInfo(o,ObjTextSize));
     printf("  Second instruction: 0x%x\n", textSection[1]);
     /* Nothing was instrumented, so release the object without writing it. */
     ReleaseObj(o);
   }
   return(0);
}

Because the instrumentation routine adds no calls to analysis procedures to the executable, there is no need for an analysis procedure. The following example demonstrates the process of compiling and instrumenting a program with this tool. As it instruments the program, the tool prints each object's identifier and name, the compile-time starting address of the text segment, the size of the text segment, and the binary for the second instruction. The disassembler provides a convenient method for finding the corresponding instructions.


% cc hello.c -o hello
% atom hello info.inst.c -o hello.info
Object 0
  Object Name: hello
  Start Address: 0x120000000
  Text Size: 8192
  Second instruction: 0x239f001d
Object 1
  Object Name: /usr/shlib/libc.so
  Start Address: 0x3ff80080000
  Text Size: 901120
  Second instruction: 0x239f09cb
% dis hello | head -3
  0x120000fe0: a77d8010     ldq t12, -32752(gp)
  0x120000fe4: 239f001d     lda at, 29(zero)
  0x120000fe8: 279c0000     ldah at, 0(at)
% dis /usr/shlib/libc.so | head -3
  0x3ff800bd9b0: a77d8010   ldq t12,-32752(gp)
  0x3ff800bd9b4: 239f09cb   lda at,2507(zero)
  0x3ff800bd9b8: 279c0000   ldah at, 0(at)

Table 9-3 lists the routines that provide information about an object's procedures.

Table 9-3:  Atom Procedure Query Routines

Routine        Description
GetProcInfo    Returns information pertaining to the procedure's stack frame, register-saving, register-usage, and prologue characteristics as defined in the Calling Standard for Alpha Systems and the Assembly Language Programmer's Guide. Such values are important to tools, like Third Degree, that monitor the stack for access to uninitialized variables. It can also return such information about the procedure as the number of basic blocks or instructions it contains, its procedure ID, its lowest or highest source line number, or an indication if its address has been taken.
ProcFileName   Returns the name of the source file that contains the procedure.
ProcName       Returns the procedure's name.
ProcPC         Returns the compile-time program counter (PC) of the first instruction in the procedure.

Table 9-4 lists the routines that provide information about a procedure's basic blocks.

Table 9-4:  Atom Basic Block Query Routines

Routine          Description
BlockPC          Returns the compile-time program counter (PC) of the first instruction in the basic block.
GetBlockInfo     Returns the number of instructions in the basic block or the block ID. The block ID is unique to this basic block within its containing object.
IsBranchTarget   Indicates if the block is the target of a branch instruction.

Table 9-5 lists the routines that provide information about a basic block's instructions.

Table 9-5:  Atom Instruction Query Routines

Routine           Description
GetInstBinary     Returns a 32-bit binary representation of the assembly language instruction.
GetInstClass      Returns the instruction class (for example, floating-point load or integer store) as defined by the Alpha Architecture Reference Manual. An Atom tool uses this information to determine instruction scheduling and dual-issue rules.
GetInstInfo       Parses the entire 32-bit instruction and obtains all or a portion of that instruction.
GetInstRegEnum    Returns the register type (floating-point or integer) from an instruction field as returned by the GetInstInfo routine.
GetInstRegUsage   Returns a bit mask with one bit set for each possible source register and one bit set for each possible destination register.
InstPC            Returns the compile-time program counter (PC) of the instruction.
InstLineNo        Returns the instruction's source line number.
IsInstType        Indicates whether the instruction is of the specified type (load instruction, store instruction, conditional branch, or unconditional branch).

9.2.3.4    Resolving Procedure Names and Call Targets

Resolving procedure names and subroutine targets is trivial for nonshared programs because all procedures are contained in the same object. However, the target of a subroutine branch in a call-shared program could be in any object.

The Atom application procedure name and call target resolution routines, described in atom_application_resolvers(5), allow an Atom tool's instrumentation routine to find a procedure by name and to find a target procedure for a call site.

9.2.3.5    Adding Calls to Analysis Routines to a Program

The Atom application instrumentation routines, described in atom_application_instrumentation(5), add arbitrary procedure calls at various points in the application.

9.2.4    Atom Description File

An Atom tool's description file, as described in atom_description_file(5), identifies and describes the tool's instrumentation and analysis files. It can also specify the options to be used by the cc, ld, and atom commands when the tool is compiled, linked, and invoked. Each Atom tool must supply at least one description file.

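There are two types of Atom description file: the default description file, named toolname.desc, and an environment-specific description file, named toolname.env.desc.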

The names supplied for the tool and environment portions of these description file names correspond to values the user specifies to the -tool and -env options of an atom command when invoking the tool.

An Atom description file is a text file containing a series of tags and values. See atom_description_file(5) for a complete description of the file's syntax.

9.2.5    Writing Analysis Procedures

An instrumented application calls analysis procedures to perform the specific functions defined by an Atom tool. An analysis procedure can use system calls or library functions, even if the same call or function is instrumented within the application. The routines used by the analysis routine and the instrumented application are physically distinct. The following is a list of library routines that can and cannot be called:

Thread Local Storage (TLS) is not supported in analysis routines.

9.2.5.1    Input/Output

The standard I/O library provided to analysis routines does not automatically flush and close streams when the instrumented program terminates, so the analysis code must flush or close them explicitly when all output has been completed. Also, the stdout and stderr streams that are provided to analysis routines will be closed when the application calls exit(), so analysis code may need to duplicate one or both of these streams if they need to be used after application exit (for example, in a ProgramAfter or ObjAfter analysis routine -- see AddCallProto(5)).

For output to stderr (or a duplicate of stderr) to appear immediately, analysis code should call setbuf(stream,NULL) to make the stream unbuffered or call fflush after each set of fprintf calls. Similarly, analysis routines using C++ streams can call cerr.flush().
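Both approaches can be sketched in standard C. The helper names analysis_unbuffer and analysis_report are hypothetical and exist only for this illustration; an analysis routine would apply the same calls to stderr or to a duplicate of it.

```c
#include <stdio.h>

/*
 * Hypothetical helper: makes a stream unbuffered so that each write
 * appears immediately. Call it once, before the first write to the
 * stream.
 */
void analysis_unbuffer(FILE *stream)
{
    setbuf(stream, NULL);
}

/*
 * Hypothetical helper: writes one line of results and flushes it
 * explicitly, which is the alternative to an unbuffered stream.
 * Returns 0 on success, -1 on a write or flush error.
 */
int analysis_report(FILE *stream, const char *name, long count)
{
    if (fprintf(stream, "%s: %ld\n", name, count) < 0)
        return -1;
    return fflush(stream) == 0 ? 0 : -1;
}
```

Because the library does not flush or close streams at program termination, a tool that writes to its own output file should also call fclose on that file once its last result has been written.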

9.2.5.2    fork and exec System Calls

If a process calls a fork function but does not call an exec function, the process is cloned and the child inherits an exact copy of the parent's state. In many cases, this is exactly the behavior that an Atom tool expects. For example, an instruction-address tracing tool sees references for both the parent and the child, interleaved in the order in which the references occurred.

In the case of an instruction-profiling tool (for example, the iprof tool listed in Table 9-1), the output file is opened at a ProgramBefore instrumentation point and, as a result, the output file descriptor is shared between the parent and the child processes. If the results are printed at a ProgramAfter instrumentation point, the output file contains the parent's data, followed by the child's data (assuming that the parent process finishes first).

For tools that count events, the data structures that hold the counts should be returned to zero in the child process after the fork call because the events occurred in the parent, not the child. This type of Atom tool can support correct handling of fork calls by instrumenting the fork library procedure and calling an analysis procedure with the return value of the fork routine as an argument. If the analysis procedure is passed a return value of 0 (zero) in the argument, it knows that it was called from a child process. It can then reset the counts variable or other data structures so that they tally statistics only for the child process.
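The reset logic can be sketched in standard C as follows. The names CountEvent, AfterFork, and GetCount and the counter layout are assumptions made for this illustration; the sketch shows only the analysis side and omits the instrumentation code that would add the call after the fork procedure.

```c
#include <string.h>

#define NCOUNTS 4                 /* hypothetical number of event counters */
static long counts[NCOUNTS];      /* tallies kept by the analysis code */

/* Hypothetical analysis procedure: records one occurrence of an event. */
void CountEvent(int which)
{
    if (which >= 0 && which < NCOUNTS)
        counts[which]++;
}

/*
 * Hypothetical analysis procedure: called with the return value of
 * fork. A value of 0 means the caller is the child process, so the
 * counts inherited from the parent are cleared.
 */
void AfterFork(long fork_result)
{
    if (fork_result == 0)
        memset(counts, 0, sizeof(counts));
}

/* Hypothetical accessor used when the results are printed. */
long GetCount(int which)
{
    return (which >= 0 && which < NCOUNTS) ? counts[which] : -1;
}
```

In the parent process, where fork returns the child's process ID, AfterFork leaves the counts untouched.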

9.2.6    Determining the Instrumented PC from an Analysis Routine

The Atom Xlate routines, described in Xlate(5), allow you to determine the instrumented program counter (PC) for selected instructions. You can use these functions to build a table that translates an instruction's PC in the instrumented application to its PC in the uninstrumented application.
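The idea of such a translation table can be modeled in plain C. The ModelXlate type and the model_translate function are stand-ins invented for this sketch; they do not use the Xlate routines and only illustrate a lookup from an instrumented PC to the matching uninstrumented PC.

```c
#include <stddef.h>

/* Stand-in for a translation table; not Atom's XLATE type. */
typedef struct {
    const unsigned long *inst_addrs;  /* PCs in the instrumented program */
    const unsigned long *pure_addrs;  /* matching PCs in the original program */
    size_t               n;           /* number of entries in each array */
} ModelXlate;

/*
 * Looks up an instrumented PC. On a match, stores the original PC in
 * *addr_pure and returns 1. Returns 0 if the address is not in the
 * table.
 */
int model_translate(const ModelXlate *t, unsigned long addr_inst,
                    unsigned long *addr_pure)
{
    size_t i;

    for (i = 0; i < t->n; i++) {
        if (t->inst_addrs[i] == addr_inst) {
            *addr_pure = t->pure_addrs[i];
            return 1;
        }
    }
    return 0;
}
```

A linear search keeps the sketch short; a tool with many entries might sort the table and use a binary search instead.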

To enable analysis code to determine the instrumented PC of an instruction at run time, an Atom tool's instrumentation routine must select the instruction and place it into an address translation buffer (XLATE).

An Atom tool's instrumentation routine creates and fills the address translation buffer by calling the CreateXlate and AddXlateAddress routines, respectively. An address translation buffer can only hold instructions from a single object.

The AddXlateAddress routine adds the specified instruction to an existing address translation buffer.

An Atom tool's instrumentation passes an address translation buffer to an analysis routine by passing it as a parameter of type XLATE *, as indicated in the analysis routine's prototype definition in an AddCallProto call.

Another way to determine an instrumented PC is to specify a formal parameter type of REGV in an analysis routine's prototype and pass the REG_IPC value.

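An Atom tool's analysis routine uses the Xlate access interfaces to read an address translation buffer passed to it. The analysis listing in this section uses XlateNum, XlateAddr, XlateInstTextStart, and XlateInstTextSize.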

The following example shows the instrumentation and analysis files of a tool that uses the Xlate routines. This tool prints the target address of every jump instruction. To use it, enter the following command:


% atom progname xlate.inst.c xlate.anal.c -all

The following source listing (xlate.inst.c) contains the instrumentation for the xlate tool:

#include <stdlib.h>
#include <stdio.h>
#include <alpha/inst.h>
#include <cmplrs/atom.inst.h>
 
static void             address_add(unsigned long);
static unsigned         address_num(void);
static unsigned long *  address_paddrs(void);
static void             address_free(void);
 
void InstrumentInit(int iargc, char **iargv)
{
    /* Create analysis prototypes. */
    AddCallProto("RegisterNumObjs(int)");
    AddCallProto("RegisterXlate(int, XLATE *, long[0])");
    AddCallProto("JmpLog(long, REGV)");
 
    /* Pass the number of objects to the analysis routines. */
    AddCallProgram(ProgramBefore, "RegisterNumObjs",
        GetProgInfo(ProgNumberObjects));
}
 
Instrument(int iargc, char **iargv, Obj *obj)
{
    Proc *                      p;
    Block *                     b;
    Inst *                      i;
    Xlate *                     pxlt;
    union alpha_instruction     bin;
    ProcRes                     pres;
    unsigned long               pc;
    char                        proto[128];
 
    /*
     * Create an XLATE structure for this Obj.  We use this to translate
     * instrumented jump target addresses to pure jump target addresses.
     */
    pxlt = CreateXlate(obj, XLATE_NOSIZE);
 
    for (p = GetFirstObjProc(obj);  p;  p = GetNextProc(p)) {
        for (b = GetFirstBlock(p);  b;  b = GetNextBlock(b)) {
            /*
             * If the first instruction in this basic block has had its
             * address taken, it's a potential jump target.  Add the
             * instruction to the XLATE and keep track of the pure address
             * too.
             */
            i = GetFirstInst(b);
            if (GetInstInfo(i, InstAddrTaken)) {
                AddXlateAddress(pxlt, i);
                address_add(InstPC(i));
            }
 
            for (;  i;  i = GetNextInst(i)) {
                bin.word = GetInstInfo(i, InstBinary);
                if (bin.common.opcode == op_jsr &&
                    bin.j_format.function == jsr_jmp)
                {
                    /*
                     * This is a jump instruction.  Instrument it.
                     */
                    AddCallInst(i, InstBefore, "JmpLog",  InstPC(i),
                        GetInstInfo(i, InstRB));
                }
            }
        }
    }
 
    /*
     * Re-prototype the RegisterXlate() analysis routine now that we
     * know the size of the pure address array.
     */
    sprintf(proto, "RegisterXlate(int, XLATE *, long[%d])", address_num());
    AddCallProto(proto);
 
    /*
     * Pass the XLATE and the pure address array to this object.
     */
    AddCallObj(obj, ObjBefore, "RegisterXlate", GetObjInfo(obj, ObjID),
        pxlt, address_paddrs());
 
    /*
     * Deallocate the pure address array.
     */
    address_free();
}
 
/*
** Maintains a dynamic array of pure addresses.
*/
static unsigned long *  pAddrs;
static unsigned         maxAddrs = 0;
static unsigned         nAddrs = 0;
 
/*
** Add an address to the array.
*/
static void address_add(
    unsigned long       addr)
{
    /*
     * If there's not enough room, expand the array.
     */
    if (nAddrs >= maxAddrs) {
        maxAddrs = (nAddrs + 100) * 2;
        pAddrs = realloc(pAddrs, maxAddrs * sizeof(*pAddrs));
        if (!pAddrs) {
            fprintf(stderr, "Out of memory\n");
            exit(1);
        }
    }
 
    /*
     * Add the address to the array.
     */
    pAddrs[nAddrs++] = addr;
}
 
 
/*
** Return the number of elements in the address array.
*/
static unsigned address_num(void)
{
    return(nAddrs);
}
 
 
/*
** Return the array of addresses.
*/
static unsigned long *address_paddrs(void)
{
    return(pAddrs);
}
 
/*
** Deallocate the address array.
*/
static void address_free(void)
{
    free(pAddrs);
    pAddrs = 0;
    maxAddrs = 0;
    nAddrs = 0;
}

The following source listing (xlate.anal.c) contains the analysis routine for the xlate tool:

#include <stdlib.h>
#include <stdio.h>
#include <cmplrs/atom.anal.h>
 
/*
 * Each object in the application gets one of the following data
 * structures.  The XLATE contains the instrumented addresses for
 * all possible jump targets in the object.  The array contains
 * the matching pure addresses.
 */
typedef struct {
    XLATE *             pXlt;
    unsigned long *     pAddrsPure;
} ObjXlt_t;
 
/*
 * An array with one ObjXlt_t structure for each object in the
 * application.
 */
static ObjXlt_t *       pAllXlts;
static unsigned         nObj;
static int      translate_addr(unsigned long, unsigned long *);
static int      translate_addr_obj(ObjXlt_t *, unsigned long, 
                    unsigned long *);
 
/*
**  Called at ProgramBefore.  Registers the number of objects in
**  this application.
*/
void RegisterNumObjs(
    unsigned    nobj)
{
    /*
     * Allocate an array with one element for each object.  The
     * elements are initialized as each object is loaded.
     */
    nObj = nobj;
    pAllXlts = calloc(nobj, sizeof(*pAllXlts));
    if (!pAllXlts) {
        fprintf(stderr, "Out of Memory\n");
        exit(1);
    }
}
 
/*
**  Called at ObjBefore for each object.  Registers an XLATE with
**  instrumented addresses for all possible jump targets.  Also
**  passes an array of pure addresses for all possible jump targets.
*/
void RegisterXlate(
    unsigned            iobj,
    XLATE *             pxlt,
    unsigned long *     paddrs_pure)
{
    /*
     * Initialize this object's element in the pAllXlts array.
     */
    pAllXlts[iobj].pXlt = pxlt;
    pAllXlts[iobj].pAddrsPure = paddrs_pure;
}
 
/*
**  Called at InstBefore for each jump instruction.  Prints the pure
**  target address of the jump.
*/
void JmpLog(
    unsigned long       pc,
    REGV                targ)
{
    unsigned long       addr;
 
    printf("0x%lx jumps to - ", pc);
    if (translate_addr(targ, &addr))
        printf("0x%lx\n", addr);
    else
        printf("unknown\n");
}
 
/*
**  Attempt to translate the given instrumented address to its pure
**  equivalent.  Set '*paddr_pure' to the pure address and return 1
**  on success.  Return 0 on failure.
**
**  Will always succeed for jump target addresses.
*/
static int translate_addr(
    unsigned long       addr_inst,
    unsigned long *     paddr_pure)
{
    unsigned long       start;
    unsigned long       size;
    unsigned            i;
 
    /*
     * Find out which object contains this instrumented address.
     */
    for (i = 0;  i < nObj;  i++) {
        start = XlateInstTextStart(pAllXlts[i].pXlt);
        size = XlateInstTextSize(pAllXlts[i].pXlt);
        if (addr_inst >= start && addr_inst < start + size) {
            /*
             * Found the object, translate the address using that
             * object's data.
             */
            return(translate_addr_obj(&pAllXlts[i], addr_inst,
                paddr_pure));
        }
    }
 
    /*
     * No object contains this address.
     */
    return(0);
}
 
/*
**  Attempt to translate the given instrumented address to its
**  pure equivalent using the given object's translation data.
**  Set '*paddr_pure' to the pure address and return 1 on success.
**  Return 0 on failure.
*/
static int translate_addr_obj(
    ObjXlt_t *          pObjXlt,
    unsigned long       addr_inst,
    unsigned long *     paddr_pure)
{
    unsigned    num;
    unsigned    i;
 
    /*
     * See if the instrumented address matches any element in the XLATE.
     */
    num = XlateNum(pObjXlt->pXlt);
    for (i = 0;  i < num;  i++) {
        if (XlateAddr(pObjXlt->pXlt, i) == addr_inst) {
            /*
             * Matches this XLATE element, return the matching pure
             * address.
             */
            *paddr_pure = pObjXlt->pAddrsPure[i];
            return(1);
        }
    }
 
    /*
     * No match found, must not be a possible jump target.
     */
    return(0);
}

9.2.7    Sample Tools

This section describes the basic tool-building interface by using three simple examples: procedure tracing, instruction profiling, and data cache simulation.

9.2.7.1    Procedure Tracing

The ptrace tool prints the names of procedures in the order in which they are executed. The implementation adds a call to each procedure in the application. By convention, the instrumentation for the ptrace tool is placed in the file ptrace.inst.c. For example:

 1  #include <stdio.h>
 2  #include <cmplrs/atom.inst.h>  [1]
 3
 4  unsigned InstrumentAll(int argc, char **argv)  [2]
 5  {
 6     Obj *o; Proc *p;
 7     AddCallProto("ProcTrace(char *)");  [3]
 8     for (o = GetFirstObj(); o != NULL; o = GetNextObj(o)) {  [4]
 9       if (BuildObj(o)) return 1;  [5]
10       for (p = GetFirstObjProc(o); p != NULL; p = GetNextProc(p)) {  [6]
11         const char *name = ProcName(p);  [7]
12         if (name == NULL) name = "UNKNOWN";  [8]
13         AddCallProc(p,ProcBefore,"ProcTrace",name);  [9]
14       }
15       WriteObj(o);  [10]
16     }
17     return(0);
18  }

  1. Includes the definitions for Atom instrumentation routines and data structures.

  2. Defines the InstrumentAll procedure. This instrumentation routine defines the interface to each analysis procedure and inserts calls to those procedures at the correct locations in the applications it instruments.

  3. Calls the AddCallProto routine to define the ProcTrace analysis procedure. ProcTrace takes a single argument of type char *.

  4. Calls the GetFirstObj and GetNextObj routines to cycle through each object in the application. If the program was linked nonshared, there is only a single object. If the program was linked call-shared, it contains multiple objects: one for the main executable and one for each dynamically linked shared library. The main program is always the first object.

  5. Builds the current object. Objects must be built before they can be used. In very rare circumstances, the object cannot be built. The InstrumentAll routine reports this condition to Atom by returning a nonzero value.

  6. Calls the GetFirstObjProc and GetNextProc routines to step through each procedure in the object.

  7. For each procedure, calls the ProcName procedure to find the procedure name. Depending on the amount of symbol table information that is available in the application, some procedure names, such as those defined as static, may not be available. (Compiling applications with the -g1 option provides this level of symbol information.) In these cases, Atom returns NULL.

  8. Converts the NULL procedure name string to UNKNOWN.

  9. Calls the AddCallProc routine to add a call to the procedure pointed to by p. The ProcBefore argument indicates that the analysis procedure is to be added before all other instructions in the procedure. The name of the analysis procedure to be called at this instrumentation point is ProcTrace. The final argument is to be passed to the analysis procedure. In this case, it is the procedure name obtained on line 11.

  10. Writes the instrumented object file to disk.

The instrumentation file added calls to the ProcTrace analysis procedure. This procedure is defined in the analysis file ptrace.anal.c as shown in the following example:

     1  #include <stdio.h>
     2
     3  void ProcTrace(char *name)
     4  {
     5    fprintf(stderr, "%s\n",name);
     6  }

The ProcTrace analysis procedure prints, to stderr, the character string passed to it as an argument. Note that an analysis procedure cannot return a value.

After the instrumentation and analysis files are specified, the tool is complete. To demonstrate the application of this tool, compile and link the following application:

        #include <stdio.h>
        main()
        {
           printf("Hello world!\n");
        }

The following example builds a nonshared executable, applies the ptrace tool, and runs the instrumented executable. This simple program calls almost 30 procedures.

% cc -non_shared hello.c -o hello
% atom hello ptrace.inst.c ptrace.anal.c -o hello.ptrace
% hello.ptrace
     __start
     main
     printf
     _doprnt
     __getmbcurmax
     strchr
     strlen
     memcpy
     .
     .
     .

The following example repeats this process with the application linked call-shared. The major difference is that the LD_LIBRARY_PATH environment variable must be set to the current directory because Atom creates an instrumented version of the libc.so shared library in the local directory.

% cc hello.c -o hello
% atom hello ptrace.inst.c ptrace.anal.c -o hello.ptrace -all
% setenv LD_LIBRARY_PATH `pwd`
% hello.ptrace
     __start
     _call_add_gp_range
     __exc_add_gp_range
     malloc
     cartesian_alloc
     cartesian_growheap2
     __getpagesize
     __sbrk
     .
     .
     .

The call-shared version of the application calls almost twice the number of procedures that the nonshared version calls.

Note that only calls in the original application program are instrumented. Because the call to the ProcTrace analysis procedure did not occur in the original application, it does not appear in a trace of the instrumented application procedures. Likewise, the standard library calls that print the names of each procedure are also not included. If the application and the analysis program both call the printf function, Atom would link into the instrumented application two copies of the function. Only the copy in the application program would be instrumented. Atom also correctly instruments procedures that have multiple entry points.

9.2.7.2    Profile Tool

The iprof example tool counts the number of instructions a program executes. It is useful for finding critical sections of code. Each time the application is executed, iprof creates a file called iprof.out that contains a profile of the number of instructions that are executed in each procedure and the number of times each procedure is called.

The most efficient place to compute instruction counts is inside each basic block. Each time a basic block is executed, a fixed number of instructions are executed. The following example shows how the iprof tool's instrumentation procedure (iprof.inst.c) performs these tasks:

 1 #include <stdio.h>
   #include <cmplrs/atom.inst.h>
 2 static int n = 0;
 3
 4 static const char *     SafeProcName(Proc *);
 5
 6 void InstrumentInit(int argc, char **argv)
 7 {
 8    AddCallProto("OpenFile(int)");  [1]
 9    AddCallProto("ProcedureCalls(int)");
10    AddCallProto("ProcedureCount(int,int)");
11    AddCallProto("ProcedurePrint(int,char*)");
12    AddCallProto("CloseFile()");
13    AddCallProgram(ProgramAfter,"CloseFile");  [2]
14 }
15 
16 Instrument(int argc, char **argv, Obj *obj)
17 {
18    Proc *p; Block *b;
19 
20      for (p = GetFirstObjProc(obj); p != NULL; p = GetNextProc(p)) {  [3]
21        AddCallProc(p,ProcBefore,"ProcedureCalls",n);
22        for (b = GetFirstBlock(p); b != NULL; b = GetNextBlock(b)) {  [4]
23          AddCallBlock(b,BlockBefore,"ProcedureCount",  [5]
24                       n,GetBlockInfo(b,BlockNumberInsts));
25        }
26        AddCallObj(obj, ObjAfter,"ProcedurePrint",n,SafeProcName(p));  [6]
27        n++;  [7]
28      }
29 }
30 
31 void InstrumentFini(void)
32 {
33   AddCallProgram(ProgramBefore,"OpenFile",n);  [8]
34 }
35 
36 static const char *SafeProcName(Proc *p)
37 {
38     const char *        name;
39     static char         buf[128];
40 
41     name = ProcName(p);  [9]
42     if (name)
43         return(name);
44     sprintf(buf, "proc_at_0x%lx", ProcPC(p));
45     return(buf);
46 }
 
 

  1. Defines the interface to the analysis procedures.

  2. Adds a call to the CloseFile analysis procedure to the end of the program.

  3. Loops through each procedure in the object.

  4. Loops through each basic block in the procedure.

  5. Adds a call to the ProcedureCount analysis procedure before any of the instructions in this basic block are executed. The argument types of ProcedureCount are defined in the prototype on line 10. The first argument is a procedure index of type int; the second argument, also an int, is the number of instructions in the basic block. The ProcedureCount analysis procedure adds the number of instructions in the basic block to a per-procedure data structure. Similarly, the ProcedureCalls analysis procedure increments a procedure's call count before each call begins executing the called procedure.

  6. Adds a call to the ProcedurePrint analysis procedure to the end of the program. The ProcedurePrint analysis procedure prints a line summarizing this procedure's instruction use and call count.

  7. Increments the procedure index.

  8. Adds a call to the OpenFile analysis procedure to the beginning of the program, passing it an int representing the number of procedures in the application. The OpenFile procedure allocates the per-procedure data structure that tallies instructions and opens the output file.

  9. Determines the procedure name.

The analysis procedures used by the iprof tool are defined in the iprof.anal.c file as shown in the following example:

 1 #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>
 2 long instrTotal = 0;
 3 long *instrPerProc;
 4 long *callsPerProc;
 5 
 6 FILE *OpenUnique(char *fileName, char *type)
 7 {
 8  FILE *file;
 9   char Name[200];
10 
11   if (getenv("ATOMUNIQUE") != NULL)
12     sprintf(Name,"%s.%d",fileName,getpid());
13   else
14     strcpy(Name,fileName);
15 
16   file = fopen(Name,type);
17   if (file == NULL)
18     {
19       fprintf(stderr,"Atom: can't open %s for %s\n",Name, type);
20       exit(1);
21     }
22   return(file);
23 }
24 
25 static FILE *file;
26 void OpenFile(int number)
27 {
28   file = OpenUnique("iprof.out","w");
29   fprintf(file,"%30s %15s %15s %12s\n","Procedure","Calls",
30           "Instructions","Percentage");
31   instrPerProc = (long *) calloc(sizeof(long), number);  [1]
32   callsPerProc = (long *) calloc(sizeof(long), number);
33   if (instrPerProc == NULL || callsPerProc == NULL) {
34     fprintf(stderr,"Malloc failed\n");
35     exit(1);
36   }
37 }
38 
39 void ProcedureCalls(int number)
40 {
41   callsPerProc[number]++;
42 }
43   
44 void ProcedureCount(int number, int instructions)
45 {
46   instrTotal += instructions;
47   instrPerProc[number] += instructions;
48 }
49   
50 
51 void ProcedurePrint(int number, char *name)
52 {
53   if (instrPerProc[number] > 0) {  [2]
54     fprintf(file,"%30s %15ld %15ld %12.3f\n",
55             name, callsPerProc[number], instrPerProc[number], 
56             100.0 * instrPerProc[number] / instrTotal);
57   }
58 }
59 
60 void CloseFile()  [3]
61 {
62   fprintf(file,"\n%30s %15s %15ld\n", "Total", "", instrTotal);
63   fclose(file);
64 }
 
 

  1. Allocates the counts data structure. The calloc function zero-fills the counts data.

  2. Filters procedures that are never called.

  3. Closes the output file. Tools must explicitly close files that are opened in the analysis procedures.

After the instrumentation and analysis files are specified, the tool is complete. To demonstrate the application of this tool, compile and link the "Hello" application as follows:

        #include <stdio.h>
        main()
        {
           printf("Hello world!\n");
        }

The following example builds a call-shared executable, applies the iprof tool, and runs the instrumented executable. In contrast to the ptrace tool described in Section 9.2.7.1, the iprof tool sends its output to a file instead of stderr.

% cc hello.c -o hello
% atom hello iprof.inst.c iprof.anal.c -o hello.iprof -all
% setenv LD_LIBRARY_PATH `pwd`
% hello.iprof
Hello world!
% more iprof.out
Procedure           Calls    Instructions   Percentage
__start               1              92        1.487
   main               1              15        0.242
   .
   .
   .
 printf               1              81        0.926
   .
   .
   .
 
  Total                            8750
% unsetenv LD_LIBRARY_PATH

9.2.7.3    Data Cache Simulation Tool

Instruction and data address tracing has been used for many years as a technique to capture and analyze cache behavior. Unfortunately, current machine speeds make this increasingly difficult. For example, the Alvinn SPEC92 benchmark executes 961,082,150 loads, 260,196,942 stores, and 73,687,356 basic blocks, for a total of 2,603,010,614 Alpha instructions. Storing the address of each basic block and the effective address of all the loads and stores would take in excess of 10 GB and slow down the application by a factor of over 100.

The cache tool uses on-the-fly simulation to determine the cache miss rates of an application running in an 8-KB, direct-mapped cache. The following example shows its instrumentation routine:

 1  #include <stdio.h>
 2  #include <cmplrs/atom.inst.h>
 3
 4  unsigned InstrumentAll(int argc, char **argv)
 5  {
 6    Obj *o; Proc *p;  Block *b;  Inst *i;
 7
 8    AddCallProto("Reference(VALUE)");
 9    AddCallProto("Print()");
10    for (o = GetFirstObj(); o != NULL; o = GetNextObj(o)) {
11      if (BuildObj(o)) return (1);
12      for (p = GetFirstObjProc(o); p != NULL; p = GetNextProc(p)) {
13        for (b = GetFirstBlock(p); b != NULL;  b = GetNextBlock(b)) {
14          for (i = GetFirstInst(b); i != NULL; i = GetNextInst(i)) {  [1]
15            if (IsInstType(i,InstTypeLoad) || IsInstType(i,InstTypeStore)) {
16              AddCallInst(i,InstBefore,"Reference",EffAddrValue);  [2]
17            }
18          }
19        }
20      }
21      WriteObj(o);
22    }
23    AddCallProgram(ProgramAfter,"Print");
24    return (0);
25  }

  1. Examines each instruction in the current basic block.

  2. If the instruction is a load or a store, adds a call to the Reference analysis procedure, passing the effective address of the data reference.

The analysis procedures used by the cache tool are defined in the cache.anal.c file as shown in the following example:

 1  #include <stdio.h>
 2  #include <assert.h>
 3  #define CACHE_SIZE 8192
 4  #define BLOCK_SHIFT 5
 5  long tags[CACHE_SIZE >> BLOCK_SHIFT];
 6  long references, misses;
 7
 8  void Reference(long address) {
 9    int index = (address & (CACHE_SIZE-1)) >> BLOCK_SHIFT;
10    long tag = address >> BLOCK_SHIFT;
11    if (tags[index] != tag) {
12      misses++;
13      tags[index] = tag;
14    }
15    references++;
16  }
17  void Print() {
18    FILE *file = fopen("cache.out","w");
19    assert(file != NULL);
20    fprintf(file,"References: %ld\n", references);
21    fprintf(file,"Cache Misses: %ld\n", misses);
22    fprintf(file,"Cache Miss Rate: %f\n", (100.0 * misses) / references);
23    fclose(file);
24  }

After the instrumentation and analysis files are specified, the tool is complete. To demonstrate the application of this tool, compile and link the "Hello" application as follows:

        #include <stdio.h>
        main()
        {
           printf("Hello world!\n");
        }

The following example applies the cache tool to both the call-shared and nonshared versions of the application, in that order:


% cc hello.c -o hello
% atom hello cache.inst.c cache.anal.c -o hello.cache -all
% setenv LD_LIBRARY_PATH `pwd`
% hello.cache
Hello world!
% more cache.out
References: 1091
Cache Misses: 225
Cache Miss Rate: 20.623281
% cc -non_shared hello.c -o hello
% atom hello cache.inst.c cache.anal.c -o hello.cache -all
% hello.cache
Hello world!
% more cache.out
References: 382
Cache Misses: 93
Cache Miss Rate: 24.345550