12 Optimized Debugging

Version Note
The optimized debugging information described here is primarily supported by the ladebug debugger in Tru64 UNIX V5.1 and greater. The default compilers for Tru64 UNIX V5.1 do not generate this information.

Information to assist debugging of optimized code can be stored in the optimization symbol table. This information is generated by compilers, partitioned by procedure, and is not modified at link time. It includes information on semantic events, discontiguous scopes, inlined calls, and split lifetimes of variables.

For background information (including descriptions of the terms discussed here), see "Debugging Optimized Code: Concepts and Implementation on DIGITAL Alpha Systems," Brender, Nelson, and Arsenault, Digital Technical Journal, Vol 10, No 1, December 1998.

In general each type of optimized debugging information is stored in a PPODE (see Section 6.3.3) for the procedure to which it applies. The data makes frequent use of LEB and SLEB fields (see Section 1.4.6) to minimize the overall size of the optimized debugging information. Due to the variable length nature of the data, some records contained within the optimized debugging information sections cannot be accessed through fixed-length structures. Variable length records will be described in the following sections in terms of the fields they contain and the order in which those fields occur as the data is read in sequential byte order.
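As an illustration of reading such variable-length fields in sequential byte order, the following is a minimal sketch of LEB and SLEB decoders. It assumes that the formats defined in Section 1.4.6 follow the familiar LEB128 convention (7 data bits per byte, least significant group first, bit 7 set on every byte except the last). The function names read_leb and read_sleb are hypothetical and do not come from sym.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB value at *pp and advance *pp past it.
 * ASSUMPTION: LEB128 layout (7 data bits per byte, low group first,
 * bit 7 is the continuation flag). */
static uint64_t read_leb(const unsigned char **pp)
{
    uint64_t result = 0;
    unsigned shift = 0;
    unsigned char byte;

    assert(pp != NULL && *pp != NULL);
    do {
        byte = *(*pp)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

/* Decode a signed LEB (SLEB) value: same layout, sign-extended from
 * bit 6 of the final byte. */
static int64_t read_sleb(const unsigned char **pp)
{
    uint64_t result = 0;
    unsigned shift = 0;
    unsigned char byte;

    assert(pp != NULL && *pp != NULL);
    do {
        byte = *(*pp)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;    /* sign-extend */
    return (int64_t)result;
}
```

Both functions advance the caller's cursor, which suits records whose fields can only be located by decoding everything that precedes them.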

12.1 New or Changed Optimized Debugging Features

No changes have been introduced to optimized debugging information.

12.2 Structures, Fields, and Values for Optimized Debugging

12.2.1 OPTRNDX

        +------------+--------------+
        | RFD (LEB)  | INDEX (LEB)  |
        +------------+--------------+

An OPTRNDX is interpreted exactly like the RNDXR auxiliary described in Section 11.2.2.2, except that (because of the variable length value representation) no RFD_ESCAPE convention is needed or applies.
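A minimal sketch of reading an OPTRNDX follows. The optrndx_t structure and the function names are hypothetical (they are not part of sym.h), and the LEB decoder assumes the LEB128 convention for the format described in Section 1.4.6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of a decoded OPTRNDX (not from sym.h). */
typedef struct {
    uint64_t rfd;    /* relative file descriptor index */
    uint64_t index;  /* symbol index within that file  */
} optrndx_t;

/* ASSUMPTION: LEB fields use the LEB128 layout. */
static uint64_t read_leb(const unsigned char **pp)
{
    uint64_t result = 0;
    unsigned shift = 0;
    unsigned char byte;

    assert(pp != NULL && *pp != NULL);
    do {
        byte = *(*pp)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

/* Read the RFD field, then the INDEX field, advancing *pp past both. */
static optrndx_t read_optrndx(const unsigned char **pp)
{
    optrndx_t r;

    r.rfd = read_leb(pp);
    r.index = read_leb(pp);
    return r;
}
```

Because each field grows to as many bytes as its value needs, no reserved escape value is required for large RFD indexes.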

12.3 Optimized Debugging Usage

12.3.1 Semantic Events

Semantic events are those points in a program where the user-visible and user-relevant semantic actions of a program actually occur. For example, for an assignment statement, the instruction that stores into a user-declared variable is generally the location of a semantic event. (The event temporally occurs when that instruction is executed.) Semantic event locations are generally divided into these kinds:

assignments

control points (conditional transfers)

calls (and returns, including PALcalls)

labels

Semantic events are represented using a Per Procedure Optimization Data Entry of type PPODE_SEM_EVENT. For a given procedure there will be, at most, one instance of this PPODE type that describes the semantic event information for the entire procedure.

A semantic event PPODE consists of an array of Semantic Event Entries where:

The length of the array is specified by the ppode_len field in the PPODHDR.

Each element of the array is a PPODSE, a byte consisting of two 4-bit fields. The type definition and macros for accessing the fields can be found in the sym.h header file.

typedef struct {
    coff_ubyte  sem_event;  /* count:4, event:4 */
} PPODSE, *pPPODSE;
 
typedef PPODSE*    PPODSE_ARRAY;
 
#define PPODSE_count(ppode)             ( (ppode)       & 0x0f)
#define PPODSE_event(ppode)             (((ppode) >> 4) & 0x0f)
#define PPODSE_make(count, event) \
                ((((event) & 0x0f) << 4) + ((count) & 0x0f))

The event field is a 4-bit code that indicates the event being described. These codes are listed in Table 12-1.

The count field is a 4-bit value in the range 0 to 15 indicating the number of executable instructions following the previous event description to which this event applies. If more than 15 instructions separate events, then multiple event entries that indicate the null event are used to add up to the required separation. If more than one event applies to the same instruction, then the first event is encoded with the appropriate count, and subsequent events are encoded using a count of 0.

Note

The encoding of this field is not identical to the encoding of the count field of a line number entry. This count encodes the values from 0 to 15 rather than 1 to 16.
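The padding rule for separations greater than 15 can be sketched as a small encoder. The names SE_MAKE, SE_NONE, and emit_sem_event are local stand-ins for illustration (SE_MAKE packs the count into the low 4 bits and the event into the high 4 bits, as the sym.h accessor macros expect); this is one possible producer-side approach, not a required one.

```c
#include <assert.h>
#include <stddef.h>

#define SE_NONE 0   /* stand-in for the null event code (value 0) */
#define SE_MAKE(count, event) \
    ((unsigned char)((((event) & 0x0f) << 4) + ((count) & 0x0f)))

/* Append the entries for one event that lies `gap` instructions after
 * the previous event.  Null-event entries with a count of 15 are
 * emitted until the remaining gap fits in 4 bits.
 * The caller must provide room for gap / 15 + 1 bytes.
 * Returns the number of bytes written. */
static size_t emit_sem_event(unsigned char *out, unsigned gap,
                             unsigned event)
{
    size_t n = 0;

    assert(out != NULL && event <= 0x0f);
    while (gap > 15) {
        out[n++] = SE_MAKE(15, SE_NONE);
        gap -= 15;
    }
    out[n++] = SE_MAKE(gap, event);
    return n;
}
```

For example, an event of code 3 that lies 40 instructions after the previous event is written as two null entries with a count of 15 followed by one entry with a count of 10. A second event on the same instruction is written with a gap of 0.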

The first semantic event of each procedure must be a Label event with a count of zero. The address in the text section for this first instruction is specified in the procedure descriptor that points to the containing optimization section.

Typically (but not necessarily), the last semantic event entry will consist of the value 0x3n corresponding to the last RET instruction of the routine. There is no need to describe any out-of-line code or padding NOP instructions that may occur at the end of a routine following the last RET so long as they contain no semantic event locations.

Table 12-1: Semantic Event Codes

Name	Value	Description
`PPODSE_NONE`	0	No event (used for a count of 16 or more)
`PPODSE_WRITE`	1	Write event
`PPODSE_CONTROL`	2	Control (branch) event
`PPODSE_CALL`	3	Call or return event
`PPODSE_LABEL`	4	Label event
`PPODSE_INST_ONLY`	5	Instruction only event
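
A consumer can recover event addresses by accumulating the count fields, as in the following sketch. The sem_event_t structure and the names used here are hypothetical; the code relies on Alpha instructions being 4 bytes long and takes the procedure's base address (from the procedure descriptor) as an input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SE_COUNT(b) ((b) & 0x0f)
#define SE_EVENT(b) (((b) >> 4) & 0x0f)
#define SE_NONE     0               /* null event, used only as padding */

/* Hypothetical decoded form of one semantic event. */
typedef struct {
    uint64_t pc;     /* address of the instruction carrying the event */
    unsigned event;  /* event code from Table 12-1                    */
} sem_event_t;

/* Walk `len` PPODSE bytes and store up to max_out decoded events.
 * Returns the total number of non-null events found. */
static size_t decode_sem_events(const unsigned char *data, size_t len,
                                uint64_t proc_base,
                                sem_event_t *out, size_t max_out)
{
    uint64_t inst = 0;   /* instruction index within the procedure */
    size_t n = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        inst += SE_COUNT(data[i]);
        if (SE_EVENT(data[i]) == SE_NONE)
            continue;                    /* padding entry only */
        if (n < max_out) {
            out[n].pc = proc_base + 4 * inst;
            out[n].event = SE_EVENT(data[i]);
        }
        n++;
    }
    return n;
}
```

Entries with a count of 0 leave the instruction index unchanged, so several events on one instruction decode to the same address.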

12.3.2 Split Lifetime Variables

The split lifetime variable description is designed to supplement an existing symbol description. There are several reasons why split lifetime information needs to supplement, not entirely replace, a symbol's description. The most important is that the variable may be split in a compilation unit that is independent from the compilation unit that declares the variable. For example, consider a global variable. It is declared once, but there are potentially many independent compilation units that manipulate the variable. Because each compilation unit is independent, it is not possible to replace the global definition, because each compilation would have to know about the others in order to give a complete replacement definition.

The split lifetime description can be skipped by consumers who choose to ignore it. Those consumers will still have some understanding of the variable (its name, type, and the scope in which it appears), though a less accurate understanding of the symbol's location(s).

In addition to LEB encoding of integers (see Section 1.4.6), split lifetime information makes use of another key building block representation. A pointer into the local symbol table is represented as an OPTRNDX (see Section 12.2.1). An OPTRNDX consists of a relative file descriptor (RFD) index followed by a symbol index within the given file (INDEX). Both the RFD and INDEX components are represented as LEB integers.

Split lifetime variable information is represented using a Per Procedure Optimization Data Entry of type PPODE_SPLIT. For a given procedure there will be, at most, one instance of this PPODE type that describes all of the split lifetime variable information for a procedure.

The PPODE_SPLIT data consists of a sequence of split lifetime descriptions. Note that the end of the sequence is implied by the ppode_len field of the PPODHDR.

The split lifetime description for each variable consists of:

A target variable identifier

A child description scheme code

A counted sequence of child descriptions

12.3.2.1 Target Variable Identifier

A target variable identifier is a byte consisting of two 4-bit codes followed by either a null terminated string or an OPTRNDX. The type definition and macros for accessing the fields can be found in the sym.h header file.

typedef struct {
    coff_ubyte    target;    /* type:4, scheme:4 */
} PPODE_SPLIT_DESC, *pPPODE_SPLIT_DESC;
 
#define PPODE_SPLIT_DESC_type(ppode)    ( (ppode)       & 0x0f)
#define PPODE_SPLIT_DESC_scheme(ppode)  (((ppode) >> 4) & 0x0f)
#define PPODE_SPLIT_DESC_make(type, scheme) \
            ((((scheme) & 0x0f) << 4) + ((type) & 0x0f))

The type field indicates whether the target variable is found in the local or external symbol table. This will determine the manner in which the target variable is identified. Values for the type field can be found in Table 12-2.

The scheme field indicates how tools should interpret the target variable's default location and its child descriptions. Values for the scheme field can be found in Table 12-3. A description of how this field is used can be found in Section 12.3.2.2.

For target variables in the external symbol table, the symbol table entry is identified by name. The name is encoded as a null terminated character string. (For C++, these names are the mangled form.)

For target variables in the local symbol table, the symbol table entry is identified using an OPTRNDX (see Section 12.2.1).

Table 12-2: Split Lifetime Target Type Codes

Name	Value	Description
`PPODE_SPLIT_TYPE_EXT`	1	Target is in external symbol table
`PPODE_SPLIT_TYPE_LCL`	2	Target is in local symbol table
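
The two identification forms can be read as sketched below. The split_target_t structure and the function names are hypothetical, and the LEB decoder assumes the LEB128 convention for the format described in Section 1.4.6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPLIT_TYPE(b)    ((b) & 0x0f)
#define SPLIT_SCHEME(b)  (((b) >> 4) & 0x0f)
#define SPLIT_TYPE_EXT   1   /* Table 12-2 */
#define SPLIT_TYPE_LCL   2   /* Table 12-2 */

/* Hypothetical decoded form of a target variable identifier. */
typedef struct {
    unsigned type, scheme;
    const char *name;       /* external: points into the PPODE data */
    uint64_t rfd, index;    /* local: the decoded OPTRNDX           */
} split_target_t;

/* ASSUMPTION: LEB fields use the LEB128 layout. */
static uint64_t read_leb(const unsigned char **pp)
{
    uint64_t result = 0;
    unsigned shift = 0;
    unsigned char byte;

    do {
        byte = *(*pp)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

/* Returns 0 on success, -1 if the type code is not recognized. */
static int read_split_target(const unsigned char **pp, split_target_t *t)
{
    unsigned char b;

    assert(pp != NULL && *pp != NULL && t != NULL);
    b = *(*pp)++;
    t->type = SPLIT_TYPE(b);
    t->scheme = SPLIT_SCHEME(b);
    t->name = NULL;
    t->rfd = t->index = 0;

    if (t->type == SPLIT_TYPE_EXT) {
        t->name = (const char *)*pp;        /* null-terminated name */
        *pp += strlen(t->name) + 1;
    } else if (t->type == SPLIT_TYPE_LCL) {
        t->rfd = read_leb(pp);              /* OPTRNDX: RFD, INDEX  */
        t->index = read_leb(pp);
    } else {
        return -1;
    }
    return 0;
}
```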

12.3.2.2 Child Description Scheme

There are three schemes used to identify split lifetime variable locations: the normal scheme, the normal but promoted scheme, and the duplicate scheme.

Normal Scheme

Normally each separate lifetime of a split lifetime variable may be allocated to a different location in memory (different registers and/or on the stack). In this scheme, each child description includes location information.

If there is a default location that is valid whenever none of the split lifetime children apply, that location is encoded directly in the target variable. This is typically the case for static and global variables.

If there is no default location, the storage class of the target variable is set to scUnallocated.

Normal But Promoted Scheme

This scheme is identical to the normal scheme, except for the interpretation of the default location. Tools should assume that there is no default location, regardless of the target variable's storage class. Prior to the introduction of split lifetime variable information, the default compilers for Tru64 UNIX set the location of a split lifetime variable to the location of its children when they are all the same. This practice is continued to allow debuggers that do not read split lifetime information to access split lifetime variables with varying degrees of accuracy.

Duplicate Scheme

There are two cases where the duplicate scheme is used:

When a Fortran subprogram with alternate entry points has a parameter that is passed in more than one entry, the local symbol table for the procedure will contain duplicate symbol table entries for the procedure's parameters.

More generally, any time that two user variables are allocated to exactly the same storage locations.

The entry that occurs first in the local symbol table will be the target of the appropriate normal split lifetime description. Subsequent symbols that share the identical split lifetime description use a child description that consists of a single OPTRNDX that points to the first symbol table entry of the set.

Table 12-3: Split Lifetime Target Scheme Codes

Name	Value	Description
`PPODE_SPLIT_SCHEME_DEF`	1	Default or normal scheme
`PPODE_SPLIT_SCHEME_DEF_PROMOTED`	2	Normal but promoted scheme
`PPODE_SPLIT_SCHEME_DUP`	3	Duplicate scheme
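
The way a consumer might act on the scheme code can be summarized as a small decision function. This is only a sketch: the enumerators and the function name are hypothetical, and the caller is assumed to have already checked whether the target symbol's storage class is scUnallocated.

```c
#include <assert.h>

enum { SCHEME_DEF = 1, SCHEME_DEF_PROMOTED = 2, SCHEME_DUP = 3 };

/* Hypothetical result codes for the decision below. */
typedef enum {
    USE_DEFAULT_LOCATION,   /* symbol's own location applies outside
                               all child ranges                      */
    NO_DEFAULT_LOCATION,    /* variable has no location outside the
                               child ranges                          */
    FOLLOW_DUPLICATE        /* child data is one OPTRNDX naming the
                               first symbol of the set               */
} split_default_t;

static split_default_t split_default_rule(unsigned scheme,
                                          int sc_is_unallocated)
{
    switch (scheme) {
    case SCHEME_DEF:
        return sc_is_unallocated ? NO_DEFAULT_LOCATION
                                 : USE_DEFAULT_LOCATION;
    case SCHEME_DEF_PROMOTED:
        /* Ignore the storage class: the recorded location only
         * mirrors the children for older debuggers. */
        return NO_DEFAULT_LOCATION;
    case SCHEME_DUP:
        return FOLLOW_DUPLICATE;
    default:
        assert(!"unknown scheme code");
        return NO_DEFAULT_LOCATION;
    }
}
```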

12.3.2.3 Child Descriptions

Child descriptions (that are not a duplicate of some other variable) are represented as a count of the number of children and a description of each child. The count is a LEB integer. Each child is a triplet composed of:

A location

The PC range for which the given location is valid

The list of definition points that potentially assign a value to the split child

The child's location is represented as a sequence of three values:

The symbol type (see Table 11-1), represented as a LEB integer

The storage class (see Table 11-2), represented as a LEB integer

The value, represented as an SLEB integer, that would otherwise be used to represent a variable allocated in that same location in a normal symbol table entry

The child's PC range consists of an SLEB/LEB pair of values that represent instruction counts (not bytes).

For the first child description of a split lifetime variable, the first (SLEB) value gives the beginning of the range relative to the base address of the containing procedure as determined from the appropriate procedure descriptor. For subsequent children, the beginning of the range is relative to the instruction following the highest instruction specified in the previous range.

The second (LEB) value of each pair gives the number of instructions that are included in the range.

The child's definition points are represented as a LEB integer giving the number of definition points followed by an SLEB integer for each definition point giving its instruction count delta. For the child's first definition point, the SLEB value gives the address of an instruction relative to the instruction preceding the beginning of the range of instructions for this child. For subsequent definition points, the SLEB value gives the address of an instruction relative to the previous definition point address.

Object consumers may not make assumptions about the order in which child descriptions appear or about the address ranges they cover; the order of child descriptions carries no significance. In particular, the address ranges of two or more split children may overlap. (If this occurs, then more than one split child of the same variable is active within the overlapping range.)
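
The relative encodings described above can be resolved to instruction indexes (counted from the procedure's base address) as in the following sketch. The split_child_t structure, the MAX_DEFS cap, and the function names are hypothetical, and the LEB and SLEB decoders assume the LEB128 convention for the formats described in Section 1.4.6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEFS 8   /* illustrative cap on stored definition points */

typedef struct {                 /* hypothetical decoded child */
    uint64_t st, sc;             /* symbol type, storage class */
    int64_t  value;
    int64_t  first_inst;         /* first instruction of the range */
    uint64_t inst_count;         /* instructions in the range */
    uint64_t ndefs;
    int64_t  def_inst[MAX_DEFS];
} split_child_t;

/* ASSUMPTION: LEB and SLEB use the LEB128 layout. */
static uint64_t read_leb(const unsigned char **pp)
{
    uint64_t r = 0; unsigned s = 0; unsigned char b;
    do {
        b = *(*pp)++;
        if (s < 64) r |= (uint64_t)(b & 0x7f) << s;
        s += 7;
    } while (b & 0x80);
    return r;
}

static int64_t read_sleb(const unsigned char **pp)
{
    uint64_t r = 0; unsigned s = 0; unsigned char b;
    do {
        b = *(*pp)++;
        if (s < 64) r |= (uint64_t)(b & 0x7f) << s;
        s += 7;
    } while (b & 0x80);
    if (s < 64 && (b & 0x40)) r |= ~(uint64_t)0 << s;
    return (int64_t)r;
}

/* Decode a counted sequence of child descriptions; returns the count. */
static uint64_t read_children(const unsigned char **pp,
                              split_child_t *out, size_t max_out)
{
    uint64_t n;
    int64_t range_base = 0;      /* first child: procedure base */

    assert(pp != NULL && *pp != NULL);
    n = read_leb(pp);
    for (uint64_t i = 0; i < n; i++) {
        split_child_t c = {0};
        int64_t def;

        c.st = read_leb(pp);
        c.sc = read_leb(pp);
        c.value = read_sleb(pp);
        c.first_inst = range_base + read_sleb(pp);
        c.inst_count = read_leb(pp);
        /* Next child is relative to the instruction after this range. */
        range_base = c.first_inst + (int64_t)c.inst_count;

        c.ndefs = read_leb(pp);
        def = c.first_inst - 1;  /* instruction preceding the range */
        for (uint64_t d = 0; d < c.ndefs; d++) {
            def += read_sleb(pp);
            if (d < MAX_DEFS) c.def_inst[d] = def;
        }
        if (i < max_out) out[i] = c;
    }
    return n;
}
```

Multiplying an instruction index by 4 (the Alpha instruction size) and adding the procedure's base address yields the corresponding PC.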

12.3.2.4 Split Lifetime Variable Example

Suppose that a Fortran parameter N is passed by reference in register R17, to a routine whose base address is 0x120001800. Suppose further that there are three split children as follows:

    from PC 0x120001818 to PC 0x120001818
        N is a VarRegister parameter in register 0x11
    from PC 0x12000181c to PC 0x12000181c
        N is a VarRegister parameter in register 0x00
    from PC 0x120001838 to PC 0x120001840
        N is a VarRegister parameter in register 0x00

This variable would be represented in PPODE_SPLIT optimization data as follows:

    .byte  2+1^4                             ! Local sym, normal scheme
    .byte  LEB(rfd_of_N), LEB(index_of_N)    ! OPTRNDX for variable N
    .byte  LEB(3)                            ! 3 split children
    .byte  LEB(stLocal),
           LEB(scVarRegister),
           SLEB(0x11)                        ! Var reg 17
    .byte      SLEB(6), LEB(1)               ! Range: 6 instructions
                                             !   (24 bytes) past base
    .byte      LEB(1), SLEB(-5)              ! 1 definition at entry
    .byte  LEB(stLocal),
           LEB(scVarRegister),
           SLEB(0x00)                        ! Var reg 0
    .byte      SLEB(0), LEB(1)               ! Range
    .byte      LEB(1), SLEB(-6)              ! 1 definition at entry
    .byte  LEB(stLocal),
           LEB(scVarRegister),
           SLEB(0x00)                        ! Var reg 0
    .byte      SLEB(6), LEB(3)               ! Range
    .byte      LEB(1), SLEB(0)               ! 1 definition at
                                             !   preceding instruction

12.3.3 Discontiguous Scopes

A discontiguous scope is a scope whose set of addresses cannot be represented as a single contiguous range of address values. Discontiguous scopes arise frequently in optimized code, notably as a result of inlining; however, they can also occur in non-optimized code (compiled with -O0). Discontiguous scopes may also result from the use of post-link optimization tools such as om and spike.

The single contiguous range of addresses described by a pair of stBegin/stEnd symbol table entries is artificially normalized in two ways that can misrepresent the actual scope:

If two addresses are in the same scope, then all intermediate addresses are considered to be contained in that same scope -- even if they are not.

Some addresses, notably for out-of-line code sequences, are not considered to be part of any scope (other than the outermost scope for a complete routine.)

Together these normalizations imply that a scope may both contain addresses that it should not as well as not contain addresses that it should. In the absence of the ability to represent discontiguous scopes, these normalizations are helpful and desirable to support reasonable debugging of non-optimized code. However, for optimized code, they can lead to misleading scope descriptions.

Discontiguous scope information is intended to replace, or supersede, the range described by the stBegin/stEnd pair of symbol table entries. The stBegin/stEnd entries themselves remain in place for compatibility with debuggers that do not read this information.

Discontiguous scope information is represented using a Per Procedure Optimization Data Entry of type PPODE_DISCONTIG_SCOPE. For a given procedure there will be, at most, one instance of this PPODE type that describes all of the discontiguous scope information for a procedure. Note, however, that not every scope in a procedure necessarily has a discontiguous range just because some do. If the range of a scope is correctly described by a single contiguous range using the stBegin/stEnd pair of symbol table entries, then a redundant description for that scope is not required in the discontiguous scope information.

The PPODE_DISCONTIG_SCOPE optimization data consists of a sequence of discontiguous scope descriptions. Note that the end of the sequence is implied by the ppode_len field of the PPODHDR.

Each discontiguous scope description consists of a target scope identifier and a counted sequence of address ranges. The target scope identifier consists of an OPTRNDX (see Section 12.2.1) that points to the applicable scope. This is followed by a LEB integer representing the count of address ranges.

Each address range consists of an SLEB/LEB pair of values that represent instruction counts (not bytes).

For the first range of a discontiguous scope, the first (SLEB) value gives the beginning of the range relative to the beginning address of the target scope. (This beginning address may not be exact, but it will usually be close.) For subsequent ranges, the beginning of the range is relative to the second instruction following the highest instruction specified in the previous range.

The second (LEB) value of each pair gives the number of instructions that are included in the range.

Consumers may not make any assumptions about the order in which address ranges occur.
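
One description can be decoded as sketched below, following the range rules exactly as stated in this section (in particular, subsequent ranges are taken relative to the second instruction after the previous range). The scope_range_t structure and the function names are hypothetical, the beginning of the target scope is supplied by the caller as an instruction index, and the LEB and SLEB decoders assume the LEB128 convention for the formats described in Section 1.4.6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {                /* hypothetical decoded range */
    int64_t  first_inst;        /* first instruction of the range */
    uint64_t inst_count;        /* instructions in the range */
} scope_range_t;

/* ASSUMPTION: LEB and SLEB use the LEB128 layout. */
static uint64_t read_leb(const unsigned char **pp)
{
    uint64_t r = 0; unsigned s = 0; unsigned char b;
    do {
        b = *(*pp)++;
        if (s < 64) r |= (uint64_t)(b & 0x7f) << s;
        s += 7;
    } while (b & 0x80);
    return r;
}

static int64_t read_sleb(const unsigned char **pp)
{
    uint64_t r = 0; unsigned s = 0; unsigned char b;
    do {
        b = *(*pp)++;
        if (s < 64) r |= (uint64_t)(b & 0x7f) << s;
        s += 7;
    } while (b & 0x80);
    if (s < 64 && (b & 0x40)) r |= ~(uint64_t)0 << s;
    return (int64_t)r;
}

/* Decode one description: OPTRNDX, range count, then the ranges.
 * scope_begin_inst is the target scope's beginning address expressed
 * as an instruction index.  Returns the number of ranges present. */
static uint64_t read_scope_desc(const unsigned char **pp,
                                int64_t scope_begin_inst,
                                uint64_t *rfd, uint64_t *index,
                                scope_range_t *out, size_t max_out)
{
    uint64_t n;
    int64_t base = scope_begin_inst;

    assert(pp != NULL && *pp != NULL && rfd != NULL && index != NULL);
    *rfd = read_leb(pp);
    *index = read_leb(pp);
    n = read_leb(pp);
    for (uint64_t i = 0; i < n; i++) {
        int64_t first = base + read_sleb(pp);
        uint64_t count = read_leb(pp);

        if (i < max_out) {
            out[i].first_inst = first;
            out[i].inst_count = count;
        }
        /* Highest instruction is first + count - 1; the next range is
         * relative to the second instruction following it. */
        base = first + (int64_t)count + 1;
    }
    return n;
}
```

A consumer would repeat this for each description until ppode_len bytes have been read.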