

7    Debugging Programs with Third Degree

Third Degree is an Atom tool. It performs memory access checks and memory leak detection of C and C++ programs at run time. It accomplishes this by using Atom to instrument executable objects. Instrumentation is the process of inserting instructions into existing executable objects to perform program analysis. See Chapter 9 or atom(1) for details on Atom.

Third Degree instruments the entire program, adding code to perform run-time checks for all of its data references. The instrumented program locates many occurrences of the worst types of bugs in C and C++ programs: array overflows, memory smashing, and errors in the use of the malloc and free functions. It also helps you determine the allocation habits of your application by listing the heap and finding memory leaks.

Except for being larger and running slower than the original application and having its uninitialized data filled with a special pattern, the instrumented program runs like the original. The Atom instrumentation code logs all specified errors and generates the requested reports.

You can use Third Degree for the following types of applications:




7.1    Running Third Degree on an Application

You invoke the Third Degree tool by using the atom command, as follows:

atom app -tool third

In this example, app is the name of an application. When it is run, the instrumented version of the application (app.third) behaves exactly like the original application (app), with the following exceptions:

The instrumented version of the application generates a log file (app.3log) containing information about allocated objects and potential leaks.

Note

Third Degree writes .3log messages in a format similar to that used by the C compiler. If you use emacs or a similar editor that automatically points, in sequence, to each compilation error, you can use the same editor to follow Third Degree errors. In emacs, compile with a command such as cat app.3log, and step through the Third Degree errors as if they were compilation errors.

You can control the name used for the output log file by specifying one of the following flags to the -toolargs flag on the atom command line that invokes the Third Degree tool:

-pids
Appends the process identification number to the log file name.

-nopids
Does not append the process identification number to the log file name. This is the default.

-dirname fname
Specifies the directory path in which Third Degree creates its log file.

Depending upon the flag supplied to Third Degree in the atom command's -toolargs flag, the log file's name will be as follows:
Flag                  Filename               Use
-nopids               app.3log               Default
-pids                 app.12345.3log         Include pid
-dirname /tmp         /tmp/app.3log          Set directory
-dirname /tmp -pids   /tmp/app.12345.3log    Set directory and pid
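
The naming scheme in the table can be sketched in C. This is only an illustration of the rule, not Third Degree's own code: the function build_log_name and its parameters are hypothetical, and the sketch assumes that the directory and process ID are simply joined to the application name as the table shows.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of Third Degree): builds the log file
 * name that the table describes.
 *   dir - directory given with -dirname, or NULL when the flag is absent
 *   pid - process ID to append for -pids, or -1 for -nopids (the default)
 * Returns the value of the final snprintf call.
 */
int build_log_name(char *out, size_t out_size,
                   const char *app, const char *dir, long pid)
{
    char base[256];

    /* The file name is app.3log, with the pid inserted when requested. */
    if (pid >= 0)
        snprintf(base, sizeof base, "%s.%ld.3log", app, pid);
    else
        snprintf(base, sizeof base, "%s.3log", app);

    /* Prefix the directory only when one was given. */
    if (dir != NULL)
        return snprintf(out, out_size, "%s/%s", dir, base);
    return snprintf(out, out_size, "%s", base);
}
```

For example, calling build_log_name with app set to "app", dir set to "/tmp", and pid set to 12345 produces the name shown in the last row of the table.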




7.1.1    Using Third Degree with Shared Libraries

Errors in an application, such as passing too small a buffer to the strcpy function, are often caught in library routines. Third Degree supports the instrumentation of shared libraries; it instruments programs linked with the -non_shared or -call_shared flags.

The atom command provides the following flags to allow you to determine which shared libraries are instrumented by Third Degree:

-all
Instruments all statically loaded shared libraries in the shared executable.

-excobj objname
Excludes the named shared library from instrumentation. You can use the -excobj flag more than once to specify several shared libraries.

-incobj objname
Instruments the named shared library. You can use the -incobj flag more than once to specify several shared libraries.

When Atom finishes instrumenting the application, the current directory contains an instrumented version of each specified shared library. The instrumented application uses these versions of the libraries. Define the LD_LIBRARY_PATH environment variable to tell the instrumented application where the instrumented shared libraries reside.

By default, Third Degree does not instrument any of the shared libraries used by the application; this makes the instrumentation operation much faster and causes the instrumented application to run faster as well. Third Degree detects and reports errors in the instrumented portion normally, but terminates stack traces at the first uninstrumented procedure. It does not detect errors in the uninstrumented libraries. If your partially instrumented application crashes or malfunctions and you have fixed all of the errors reported by Third Degree, reinstrument the application with all of its shared libraries and run the new instrumented version.




7.1.2    Using Third Degree with Threaded Applications

Third Degree supports applications that use threads. To instrument a threaded application, add the -env threads flag to the atom command line that invokes the Third Degree tool.




7.2    Step-by-Step Example

Assume that you must debug the small application represented by the following source code (ex.c):

     1  /* ex.c */
     2  #include <assert.h>
     3
     4  int Bug() {
     5      int q;
     6      return q;           /* q is uninitialized */
     7  }
     8
     9  long* Booboo(int n) {
    10      long* t = (long*) malloc(n * sizeof(long));
    11      t[0] = Bug();
    12      t[0] = t[1]+1;      /* t[1] is uninitialized */
    13      t[1] = -1;
    14      t[n] = n;           /* array bounds error*/
    15      if (n<10) free(t);  /* may be a leak */
    16      return t;
    17  }
    18
    19  main() {
    20      long* t = Booboo(20);
    21      t = Booboo(4);
    22      free(t);            /* already freed */
    23      exit(0);
    24  }




7.2.1    Customizing Third Degree

An optional customization file named .third is used to turn on and off various capabilities of the Third Degree tool and to set the tool's internal parameters. Third Degree looks for a .third file first in the local directory, then in your home directory. The .third customization file is further discussed throughout this chapter and its syntax is described in the third(5) reference page.

If you do not specify a .third customization file, Third Degree uses its default settings:




7.2.2    Modifying the Makefile

Add the following entry to the application's Makefile:

ex.third: ex
        atom ex -tool third -o ex.third

Build ex.third as follows:

make ex.third

atom ex -tool third -o ex.third
ex.third

Now run the instrumented application ex.third and check the log ex.3log.




7.2.3    Examining the Third Degree Log File

The ex.3log file contains several parts, each of which is described in one of the following sections.




7.2.3.1    Copy of the .third File

If you supplied a .third customization file, Third Degree copies it to the log file. The short customization file used in this example requests a summary of the contents of heap-allocated memory blocks when the program finishes:

//////////////  begin .3rd  ///////////////////
-----------------------------------------------
heap_history    yes
-----------------------------------------------
//////////////  end  .3rd ///////////////////




7.2.3.2    List of Runtime Memory Access Errors

The types of errors that Third Degree can detect at runtime include such conditions as reading uninitialized memory, reading or writing unallocated memory, freeing invalid memory, and certain serious errors likely to cause an exception. For each error, an error entry is generated with the following items:

The following examples show entries from the log file:




7.2.3.3    Memory Leaks

The following excerpt shows the report generated when leak detection on program exit, the default, is selected. The report shows a list of memory leaks sorted by importance and by call stack.

---------------------------------------------------------------
---------------------------------------------------------------
Searching for new leaks in heap after program exit

160 bytes in 1 object were found:

160 bytes in 1 leak (including 1 super leak) created at:
    malloc                             malloc.c, line 585
    Booboo                             ex.c, line 10
    main                               ex.c, line 20
    __start                            crt0.s, line 370

Upon examining the source, it is clear that the first call of Booboo did not free the memory object, nor was it freed anywhere else in the program. Moreover, no pointer to this object exists anywhere in the program, so it qualifies as a super leak. The distinction is often useful to find the real culprit for large memory leaks.

Consider a large tree structure and assume that the pointer to the root has been erased. Every object in the structure is a leak, but losing the pointer to the root is the real cause of the leak. Because all objects but the root still have pointers to them, albeit only from other leaks, only the root will be identified as a super leak, and therefore the likely cause of the memory loss.




7.2.3.4    Heap History

When heap history is enabled, Third Degree collects information about dynamically allocated memory. It collects this information for every object that is freed by the application and for every object that still exists (including memory leaks) at the end of the program's execution. The following excerpt shows a heap allocation history report:

----------------------------------------------------------------
----------------------------------------------------------------
                Heap Allocation History for parent process

Legend for object contents:
    There is one character for each 32-bit word of contents.
    There are 64 characters, representing 256 bytes of memory per line.
    '.' : word never written in any object.
    'z' : zero in every object.
    'i' : a non-zero non-pointer value in at least one object.
    'pp': a valid pointer or zero in every object.
    'ss': a valid pointer or zero in some but not all objects.

192 bytes in 2 objects were allocated during program execution:

----------------------------------------------------------------
160 bytes allocated (5% written) in 1 objects created at:
    malloc                             malloc.c, line 585
    Booboo                             ex.c, line 10
    main                               ex.c, line 20
    __start                            crt0.s, line 370

Contents:
    0: ..ii....................................

----------------------------------------------------------------
32 bytes allocated (25% written) in 1 objects created at:
    malloc                             malloc.c, line 585
    Booboo                             ex.c, line 10
    main                               ex.c, line 21
    __start                            crt0.s, line 370

Contents:
    0: ..ii....

The sample program allocated two objects, for a total of 192 bytes (8*(20+4)). Because each object was allocated from a different call stack, there are two entries in the history. Only one long (8 bytes) in each array was set to a valid value, resulting in the written ratios of 8/160=5% and 8/32=25% shown. The character map, with one character for each 32-bit word in the object, shows that the initialized value was the second long in each of the arrays.
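
The written ratios can be checked against the character maps themselves. The following C sketch is an illustration only: the function written_percent is hypothetical, and it assumes that every character other than a dot marks a 32-bit word that was written. Third Degree may compute its figures differently.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given one contents map from a heap history entry
 * (one character per 32-bit word), return the percentage of words that
 * were written, that is, whose character is not '.'.
 */
int written_percent(const char *map)
{
    size_t total = 0;
    size_t written = 0;

    for (const char *p = map; *p != '\0'; p++) {
        total++;
        if (*p != '.')
            written++;
    }
    if (total == 0)
        return 0;               /* an empty map has nothing written */
    return (int)(100 * written / total);
}
```

For the 32-byte object, the map ..ii.... has 2 written words out of 8, which gives the 25% shown in the report. For the 160-byte object, 2 written words out of 40 give 5%.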

If the sample program were a real application, the fact that so little of the dynamic memory was ever initialized would be a warning that it was probably using memory inefficiently.




7.2.3.5    Memory Layout

The memory layout section of the report summarizes the memory used by the program by size and address range. The following excerpt shows a memory layout section. The first two entries give the final (maximum) sizes of the heap and stack at the end of the program. The last two entries give the text and static data areas for the program and any shared libraries.

-----------------------------------------------------------------
-----------------------------------------------------------------
  memory layout at program exit
              heap      81920 bytes [0x38000000000-0x38000014000]
             stack       2224 bytes [0x11ffff750-0x120000000]
           ex data      23168 bytes [0x140000000-0x140005a80]
           ex text     262144 bytes [0x120000000-0x120040000]




7.3    Interpreting Third Degree Error Messages

Third Degree reports both fatal errors and memory access errors.

Fatal errors include the following:

A fatal error causes the instrumented application to crash after flushing the log file. If the application crashes, first check the log file and then rerun it under a debugger.

Memory errors include the following (as represented by a three-letter abbreviation):
Name    Error
ror     Reading out of range: not in heap, stack, or static area
ris     Reading invalid data in stack: probably an array bound error
rus     Reading an uninitialized (but valid) location in stack
rih     Reading invalid data in heap: probably an array bound error
ruh     Reading an uninitialized (but valid) location in heap
wor     Writing out of range: not in heap, stack, or static area
wis     Writing invalid data in stack: probably an array bound error
wih     Writing invalid data in heap: probably an array bound error
for     Freeing out of range: not in heap or stack
fis     Freeing an address in the stack
fih     Freeing an invalid address in the heap: no valid object there
fof     Freeing an already freed object
fon     Freeing a null pointer (really just a warning)
mrn     malloc returned null

You can suppress the reporting of specific memory errors by providing a .third customization file containing the ignore option. This is often useful when the errors occur within library functions for which you do not have the source. Third Degree allows you to suppress specific memory errors in individual procedures and files, and at particular line numbers. See third(5) for more details.




7.3.1    Fixing Errors and Retrying an Application

If Third Degree reports many write errors from your instrumented program, you should fix the first few errors and reinstrument the program. Not only can write errors compound, but they can also corrupt Third Degree's internal data structures.




7.3.2    Detecting Uninitialized Values

Third Degree's technique for detecting the use of uninitialized values can cause programs that have worked to fail when instrumented. For example, if a program depends on the fact that the first call to the malloc function returns a block initialized to zero, the instrumented version of the program will fail because Third Degree initializes all blocks to a nonzero value.

When it detects a signal, perhaps caused by dereferencing or otherwise using this uninitialized value, Third Degree displays a message of the following form:

*** Fatal signal SIGSEGV detected.
*** This can be caused by the use of uninitialized data.
*** Please check all errors reported in app.3log.

Using uninitialized data is the most likely reason for an instrumented program to crash. To determine the cause of the problem, first examine the log file for reading-uninitialized-stack and reading-uninitialized-heap errors. Very often, one of the last errors in the log file reports the cause of the problem.
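
The kind of defect described in this section can be illustrated with a short C sketch. The function names are hypothetical and are not taken from any real application. The first function assumes that malloc returns zero-filled memory, which fails under Third Degree because the block is filled with a nonzero pattern. The second function uses calloc, which guarantees zero-filled memory.

```c
#include <stdlib.h>

/* Fragile: the caller must not assume that the counters start at zero.
 * Under Third Degree the block is filled with a nonzero pattern. */
long *make_counters_fragile(size_t n)
{
    return malloc(n * sizeof(long));    /* contents are indeterminate */
}

/* Robust: calloc returns memory with all bits zero, so every counter
 * starts at 0 whether or not the program is instrumented. */
long *make_counters(size_t n)
{
    return calloc(n, sizeof(long));
}

/* Adds up the counters; reading a block from make_counters_fragile
 * here before writing it is the error that Third Degree reports. */
long sum_counters(const long *counters, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += counters[i];
    return sum;
}
```

Replacing a call of the first kind with the second, or initializing the block explicitly after malloc, removes both the reported error and the dependence on the allocator's behavior.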

If you have trouble pinpointing the source of the error, you can confirm that it is indeed due to reading uninitialized data by supplying a .third customization file containing the uninit_heap no and uninit_stack no options. The uninit_stack no option disables the initialization of newly allocated stack memory that Third Degree normally performs on each procedure entry. Similarly, the uninit_heap no option disables the initialization of heap memory performed on each dynamic memory allocation. By using one or both options, you can alter the behavior of the instrumented program and will likely get it to complete successfully. This helps you determine which type of error is causing the instrumented program to crash and, as a result, helps you focus on specific messages in the log file.

Notes

Do not use the uninit_heap no and uninit_stack no options under normal operation. They hamper Third Degree's ability to detect a program's use of uninitialized data.

If your program establishes signal handlers, there is a small chance that Third Degree's changing of the default signal handler may interfere with it. Third Degree defines signal handlers only for those signals that normally cause program crashes (including SIGILL, SIGTRAP, SIGABRT, SIGEMT, SIGFPE, SIGBUS, SIGSEGV, SIGSYS, SIGXCPU, and SIGXFSZ). You can disable Third Degree's signal handling by supplying a .third customization file including the signals no option.




7.3.3    Locating Source Files

Third Degree prefixes each error message with a file and line number in the style used by compilers. For example:

----------------------------------------------------- fof -- 3 --
ex.c: 21: freeing already freed heap at byte 0 of 32-byte block
    free                               malloc.c
    main                               ex.c, line 21
    __start                            crt0.s

Third Degree tries to point as closely as possible to the source of the error, and it usually gives the file and line number of a procedure near the top of the call stack when the error occurred, as in this example. However, Third Degree may not be able to find this source file, either because it is in a library or because it is not in the current directory. In this case, Third Degree moves down the call stack until it finds a source file to which it can point. Usually, this is the point of call of the library routine.

In order to tag these error messages, Third Degree must determine the location of the program's source files. If you are running Third Degree in the directory containing the source files, Third Degree will locate the source files there. If not, to add directories to Third Degree's search path, supply a .third customization file including a use option. This allows Third Degree to find the source files contained in other directories. Specifying the use option with no arguments clears the search path. The location of each source file is the first directory on the search path in which it is found.




7.4    Examining an Application's Heap Usage

In addition to run-time checks that ensure that only properly allocated memory is accessed and freed, Third Degree provides two ways to understand an application's heap usage:

By default, Third Degree checks for leaks when the program exits.

This section discusses how to use the information provided by Third Degree to analyze an application's heap usage.




7.4.1    Detecting Memory Leaks

A memory leak is an object in the heap to which no pointer exists. The object can no longer be accessed and can no longer be used or freed. It is useless and will never go away.

Third Degree finds memory leaks by using a simple trace-and-sweep algorithm. Starting from a set of roots (the currently active stack and static area), Third Degree finds pointers to objects in the heap and marks these objects as visited. It then recursively finds all potential pointers inside these objects and, finally, sweeps the heap and reports all unmarked objects. These unmarked objects are leaks.
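
The trace-and-sweep idea can be sketched in C on a toy heap. This is a minimal illustration under stated assumptions, not Third Degree's implementation: the ToyHeap table, heap_add, and count_leaks are hypothetical names, the roots are passed in as a plain array of words, and any word whose value falls inside a recorded block is treated as a pointer to that block.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 32

/* One heap block known to the (hypothetical) checker. */
typedef struct {
    uintptr_t start;    /* address of the first byte */
    size_t    size;     /* size in bytes */
    int       marked;   /* set when reached from the roots */
} HeapObject;

typedef struct {
    HeapObject objs[MAX_OBJECTS];
    size_t     count;
} ToyHeap;

/* Records a block; returns 0 on success or -1 if the table is full. */
int heap_add(ToyHeap *h, void *p, size_t size)
{
    if (h->count >= MAX_OBJECTS)
        return -1;
    h->objs[h->count].start = (uintptr_t)p;
    h->objs[h->count].size = size;
    h->objs[h->count].marked = 0;
    h->count++;
    return 0;
}

/* Returns the block that contains addr, or NULL if there is none. */
static HeapObject *find_object(ToyHeap *h, uintptr_t addr)
{
    for (size_t i = 0; i < h->count; i++) {
        HeapObject *o = &h->objs[i];
        if (addr >= o->start && addr - o->start < o->size)
            return o;
    }
    return NULL;
}

/* Trace phase: mark every block reachable from the given words, then
 * scan the contents of each newly marked block for more pointers. */
static void trace(ToyHeap *h, const uintptr_t *words, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        HeapObject *o = find_object(h, words[i]);
        if (o != NULL && !o->marked) {
            o->marked = 1;
            trace(h, (const uintptr_t *)o->start,
                  o->size / sizeof(uintptr_t));
        }
    }
}

/* Sweep phase: every block left unmarked is counted as a leak. */
size_t count_leaks(ToyHeap *h, const uintptr_t *roots, size_t nroots)
{
    size_t leaks = 0;

    for (size_t i = 0; i < h->count; i++)
        h->objs[i].marked = 0;
    trace(h, roots, nroots);
    for (size_t i = 0; i < h->count; i++)
        if (!h->objs[i].marked)
            leaks++;
    return leaks;
}
```

In this sketch, two blocks that point only to each other are both counted as leaks when neither is reachable from the roots, because marking starts only from the root words.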

The trace-and-sweep algorithm finds all leaks, including circular structures. This algorithm is conservative: in the absence of type information, any 64-bit pattern that is properly aligned and pointing inside a valid object in the heap is treated as a pointer. This assumption can infrequently lead to the following problems:




7.4.2    Reading Heap and Leak Reports

You can supply .third configuration file options that tell Third Degree to generate heap and leak reports incrementally, listing only new heap objects or leaks since the last report or listing all heap objects or leaks. You can request these reports when the program terminates, or before or after every nth call to a user-specified function (see third(5) for details).

Third Degree lists memory objects and leaks in the report by decreasing importance, based on the number of bytes involved. It groups together objects allocated with identical call stacks. For example, if the same call sequence allocates a million one-byte objects, Third Degree reports them as a one-megabyte group containing a million allocations.

To tell Third Degree when objects or leaks are the same and should be grouped in the report (or when objects or leaks are different and should not be thus grouped), specify a .third configuration file containing the object_stack_depth or leak_stack_depth option. (See third(5) for further description of the .third configuration file.) These options set the depth of the call stack that Third Degree uses to differentiate leaks or objects. For example, if you specify a depth of 1 for objects, Third Degree groups valid objects in the heap by the function and line number that allocated them, no matter what function was the caller. Conversely, if you specify a very large depth for leaks, Third Degree groups only leaks allocated at points with identical call stacks from main upwards.
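
The effect of the stack depth on grouping can be sketched in C. The representation is an assumption made for illustration: each call stack is an array of "file:line" strings that starts at the routine that called the allocator, and same_group is a hypothetical function, not part of Third Degree.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical comparison: returns 1 when two call stacks agree in
 * their first `depth` frames, and 0 otherwise. Frame 0 is the routine
 * that called the allocator; later frames are its callers.
 */
int same_group(const char *const *a, size_t na,
               const char *const *b, size_t nb, size_t depth)
{
    for (size_t i = 0; i < depth; i++) {
        /* If a stack ends before the depth is reached, the stacks
         * match only when both end at the same frame. */
        if (i >= na || i >= nb)
            return na == nb;
        if (strcmp(a[i], b[i]) != 0)
            return 0;
    }
    return 1;
}
```

With the two allocations of the sample program, the stacks are { "ex.c:10", "ex.c:20" } and { "ex.c:10", "ex.c:21" }. At a depth of 1 the stacks fall into the same group, because both blocks were allocated at line 10 of Booboo. At a depth of 2 they are separate, because the calls from main differ.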

In most heap reports, the first few entries account for most of the storage, but there is a very long list of small entries. To limit the length of the report, you can use the .third configuration file object_min_percent or leak_min_percent option. (See third(5) for further description of the .third configuration file.) These options define a percentage of the total memory leaked or in use by an object as a threshold. When all smaller remaining leaks or objects amount to less than this threshold, Third Degree groups them together under a single final entry.
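
The threshold rule can be sketched in C. This is an illustration under stated assumptions rather than Third Degree's code: entries_listed is a hypothetical function, and the sizes are assumed to be sorted in decreasing order, as they are in the report.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given entry sizes sorted in decreasing order,
 * returns how many entries are listed individually. The remaining
 * entries together amount to less than min_percent of the total and
 * would be grouped under a single final entry.
 * Note: the multiplication by 100 can overflow for very large totals.
 */
size_t entries_listed(const unsigned long long *sizes, size_t n,
                      unsigned min_percent)
{
    unsigned long long total = 0;
    unsigned long long remaining;

    for (size_t i = 0; i < n; i++)
        total += sizes[i];

    remaining = total;
    for (size_t k = 0; k < n; k++) {
        /* Stop once everything not yet listed is below the threshold. */
        if (remaining * 100 < (unsigned long long)min_percent * total)
            return k;
        remaining -= sizes[k];
    }
    return n;
}
```

For example, with sizes of 700, 200, 60, 30, and 10 bytes and a threshold of 5 percent, the first three entries are listed individually and the last two, which total 40 bytes out of 1000, are grouped together.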

Notes

Because the realloc function always allocates a new object (by involving calls to malloc, copy, and free), its use can make interpretation of a Third Degree report counterintuitive. When an object is allocated, listed, or shrunk through a call to the realloc function, it can be listed twice under different identities.

Leaks and objects are mutually exclusive: an object must be reachable from the roots.
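
The first note can be illustrated with a C sketch that models realloc as the note describes it: a new allocation, a copy, and a free of the old block. The function realloc_as_new_object is hypothetical, and it takes the old size as a parameter because a sketch outside the allocator has no other way to know it.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of the behavior the note describes: the resized
 * block is always a new object, so a report can show the old and the
 * new block under different identities.
 */
void *realloc_as_new_object(void *old, size_t old_size, size_t new_size)
{
    void *fresh = malloc(new_size);

    if (fresh == NULL)
        return NULL;            /* the old block is left untouched */
    if (old != NULL) {
        /* Copy only as many bytes as both blocks can hold. */
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return fresh;
}
```

Because the old block is freed and a different block is returned, a heap history can contain one entry for the original allocation and another for the resized one.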




7.4.3    Searching for Leaks

It may not always be obvious when to search for memory leaks. By default, Third Degree checks for leaks after program exit, but this may not always be what you want.

Leak detection is best done as near as possible to the end of the program while all used data structures are still in scope. Remember, though, that the roots for leak detection are the contents of the stack and static areas. If your program terminates by returning from main and the only pointer to one of its data structures was kept on the stack, this pointer will not be seen as a root during the leak search, leading to false reporting of leaked memory. For example:

#include <stdlib.h>

int main(int argc, char* argv[]) {
    char* bytes = (char*) malloc(100);  /* only pointer is on main's stack */
    exit(0);                            /* main's frame is still active */
}

If you instrument this program and supply a .third configuration file that contains the option line all leaks before exit every 1, Third Degree does not find any leaks. When the program calls the exit function, all of main's variables are still in scope, so the pointer held in bytes is still seen as a root.

However, consider the following example:

#include <stdlib.h>

int main(int argc, char* argv[]) {
    char* bytes = (char*) malloc(100);  /* pointer is lost when main returns */
    return 0;
}

When you instrument this program, providing the same (or no) .third configuration file, Third Degree's leak check may report a storage leak because main has returned by the time the check happens. Either of these two behaviors may be correct, depending on whether bytes was a true leak or simply a data structure still in use when main returned.

Rather than reading the program carefully to understand when leak detection should be performed, you can check for new leaks after a specified number of memory allocations. The number of allocations depends on the characteristics of the application being instrumented. Use a .third configuration file specifying the following options:

no leaks at_exit
new leaks before proc_name every 10000

See third(5) for further description of the .third configuration file.




7.4.4    Interpreting the Heap History

When you instrument a program, a .third configuration file that contains the heap_history yes option line tells Third Degree to generate a heap history for the program. A heap history allows you to see how the program used dynamic memory during its execution. You can use this feature, for instance, to eliminate unused fields in data structures or to pack active fields to use memory more efficiently. The heap history also shows memory blocks that are allocated but never used by the application.

When heap history is enabled, Third Degree collects information about each dynamically allocated object at the time it is freed by the application. When program execution completes, Third Degree assembles this information for every object that is still alive (including memory leaks). For each object, Third Degree looks at the contents of the object and categorizes each word as never written by the application, zero, a valid pointer, or some other value.

Third Degree next merges the information for each object with what it has gathered for all other objects allocated at the same call stack in the program. The result provides you with a cumulative picture of the use of all objects of a given type.

Third Degree provides a summary of all objects allocated during the life of the program and the purposes for which their contents were used. The report shows one entry per allocation point (for example, a call stack where an allocator function such as malloc or new was called). Entries are sorted by decreasing volume of allocation.

Each entry provides the following:

The contents part of each entry describes how the objects allocated at this point were used. If the allocated objects are not all the same size, Third Degree considers only the minimum size common to all objects. For very large allocations, it summarizes the contents of only the beginning of the objects (by default, the first kilobyte). You can adjust the maximum size value by specifying the history_size option in the .third configuration file.


In the contents portion of an entry, Third Degree uses one of the following characters to represent each 32-bit longword that it examines:

Dot (.)
Indicates a longword that was never written in any of the objects, a definite sign of wasted memory. Further analysis is generally required to see if it is simply a deficiency of a test that never used this field; if it is a padding problem solved by swapping fields or choosing better types; or if this field is obsolete.

z
Indicates a field whose value was always 0 (zero) in every object.

pp
Indicates a pointer: that is, a 64-bit quantity that was a valid pointer into the stack, the static data area, or the heap; or was zero in every object.

ss
Indicates a sometime pointer. This longword looked like a pointer in at least one of the objects, but not in all objects. It could be a pointer that is not initialized in some instances, or a union. However, it could also be the sign of a serious programming error.

i
Indicates a longword that was written with some nonzero value in at least one object and that never contained a pointer value in any object.

Even if an entry is listed as allocating 100MB, it does not mean that at any point in time 100MB of heap storage were used by the allocated objects. It is a cumulative figure; it indicates that this point has allocated 100MB over the lifetime of the program. This 100MB may have been freed, may have leaked, or may still be in the heap. The figure simply indicates that this allocator has been quite active.

Ideally, the fraction of the bytes actually written should always be close to 100%. If it is much lower, some of what is allocated is never used. The common reasons why a low percentage is given include the following:





7.5    Using Third Degree on Programs with Insufficient Symbolic Information

If the executable you instrumented contains too little symbolic information for Third Degree to pinpoint some program locations, Third Degree prints messages in which procedure names or file names or line numbers are unknown. For example:

------------------------------------------------------ rus -- 0 --
reading uninitialized stack at byte 40 of 176 in frame of main
    proc_at_0x1200286f0                libc.so
    pc = 0x12004a268                   libc.so
    main                               app
    __start                            app

Third Degree tries to print the procedure name in the stack trace, but if the procedure name is missing (because this is a static procedure), Third Degree prints the program counter in the instrumented program. This information enables you to find the location with a debugger. If the program counter is unavailable, Third Degree prints the address of the unnamed procedure.

More frequently, the file name or line number is unavailable because the program's symbol table is incomplete. In this case, Third Degree prints the name of the object in which the procedure was found. This object may be either the main application or a shared library.

If the lack of symbolic information is hampering your debugging, consider recompiling the program with more symbolic information. For C and C++ programs, recompile with the -g flag and link without the -x flag.




7.6    Validating Third Degree Error Reports

The following spurious errors may occur in rare instances:

If you think that you have found a false positive, you can verify it by using the disassembler (dis) on the procedure in which the error was reported. All errors reported by Third Degree are detected at loads and stores in the application, and the line numbers shown in the error report match those shown in the disassembly output.




7.7    Undetected Errors

Third Degree can fail to detect real errors, such as the following: