5    Symbol Table

One of the chief tasks of the compilation process is the production of a symbol table, which is a collection of data structures whose purpose is to store type, scope, and address information about program data. Compilers and assemblers create the symbol table. It is read and may be modified by linkers, profiling tools, and assorted object manipulation tools. It also contains information required for debugging.

For large applications, a single compilation can involve many program components, including source files, header files, and libraries. Data from all of these files must be described in the symbol table.

The Tru64 UNIX eCOFF symbol table, when present, comprises a large portion of the physical object file and is often considered a stand-alone entity. It is divided into numerous sections, including a header section that is used for navigation. The contents of the symbol table are shown in Figure 5-1.

Figure 5-1:  Symbol Table Sections

The symbol table has a hierarchical design. The sections storing local symbols, local strings, relative file descriptors, procedure descriptors, line numbers, auxiliary symbols, and optimization symbols are divided into subtables and organized by file. Local symbols, local strings, and optimization symbols are further broken down by procedure. Figure 5-2 depicts this hierarchy.

Figure 5-2:  Symbol Table Hierarchy

A particular symbol table may not contain all sections, for one of the following reasons:

The function of each symbol table section is summarized below:

Several tools are available to view the contents of the symbol table. See the stdump(1), odump(1), and nm(1) man pages.

This chapter covers symbol table organization and usage, concentrating on debugging issues in particular. The current version of the symbol table is V3.13. The dynamic symbol table built by the linker is discussed separately in Section 6.3.3.

5.1    New or Changed Symbol Table Features

Tru64 UNIX V5.1 includes the following new or changed features:

Version 3.13 of the symbol table includes the following new or changed features:

5.2    Structures, Fields and Values for Symbol Tables

Unless otherwise specified, all structures described in this section are declared in the header file sym.h, and all constants are defined in the header file symconst.h.

5.2.1    Symbolic Header (HDRR)

typedef struct {
        coff_ushort     magic;          
        coff_ushort     vstamp;         
        coff_int        ilineMax;       
        coff_int        idnMax;         
        coff_int        ipdMax;         
        coff_int        isymMax;        
        coff_int        ioptMax;        
        coff_int        iauxMax;        
        coff_int        issMax;         
        coff_int        issExtMax;      
        coff_int        ifdMax;         
        coff_int        crfd;           
        coff_int        iextMax;        
        coff_long       cbLine;         
        coff_off        cbLineOffset;   
        coff_off        cbDnOffset;     
        coff_off        cbPdOffset;     
        coff_off        cbSymOffset;    
        coff_off        cbOptOffset;    
        coff_off        cbAuxOffset;    
        coff_off        cbSsOffset;     
        coff_off        cbSsExtOffset;  
        coff_off        cbFdOffset;     
        coff_off        cbRfdOffset;    
        coff_off        cbExtOffset;    
} HDRR, *pHDRR;

SIZE - 144 bytes, ALIGNMENT - 8 bytes

Symbolic Header Fields

magic

To verify validity of the symbol table, this field must contain the constant magicSym, defined as 0x1992.

vstamp

Symbol table version stamp. This value consists of a major version number and a minor version number, as defined in the stamp.h header file:

Symbol          Value   Description
MAJ_OBJ_STAMP   3       Current major object format version
MIN_OBJ_STAMP   13      Current minor object format version

See Section 1.4.5 for a description of object and symbol table versioning.

ilineMax

Number of line number entries (if expanded).

idnMax

Obsolete.

ipdMax

Number of procedure descriptors.

isymMax

Number of local symbols.

ioptMax

Byte size of optimization symbol table.

iauxMax

Number of auxiliary symbols.

issMax

Byte size of local string table.

issExtMax

Byte size of external string table.

ifdMax

Number of file descriptors.

crfd

Number of relative file descriptors.

iextMax

Number of external symbols.

cbLine

Byte size of (packed) line number entries.

cbLineOffset

Byte offset to start of (packed) line numbers.

cbDnOffset

Obsolete.

cbPdOffset

Byte offset to start of procedure descriptors.

cbSymOffset

Byte offset to start of local symbols.

cbOptOffset

Byte offset to start of optimization entries.

cbAuxOffset

Byte offset to start of auxiliary symbols.

cbSsOffset

Byte offset to start of local strings.

cbSsExtOffset

Byte offset to start of external strings.

cbFdOffset

Byte offset to start of file descriptors.

cbRfdOffset

Byte offset to start of relative file descriptors.

cbExtOffset

Byte offset to start of external symbols.

General Notes:

The size and offset fields describing symbol table sections must be set to zero if the section described is not present.

The cb*Offset fields are byte offsets from the beginning of the object file.

The i*Max fields contain the number of entries for a symbol table section. Legal index values for a symbol table section will range from 0 to the value of the associated i*Max field minus one.

For an explanation of packed and expanded line number entries, see the discussion in Section 5.3.2.2.
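The notes above can be turned into a few basic sanity checks. The following is a minimal sketch, not the system's own validation code: MiniHdr is a hypothetical stand-in that carries only three HDRR fields, and the helper names are invented for illustration. A real tool would include sym.h and use HDRR directly.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAGIC_SYM 0x1992  /* value of magicSym as given in the magic field description */

/* Hypothetical stand-in for the few HDRR fields used in this sketch. */
typedef struct {
    uint16_t magic;        /* must equal magicSym */
    int32_t  isymMax;      /* number of local symbols */
    uint64_t cbSymOffset;  /* file offset of local symbols, 0 if absent */
} MiniHdr;

/* True if the header carries the expected magic number. */
static bool hdr_is_valid(const MiniHdr *h)
{
    return h->magic == MAGIC_SYM;
}

/* True if the local symbol section is present; an absent section
 * has both its size field and its offset field set to zero. */
static bool hdr_has_local_syms(const MiniHdr *h)
{
    return h->isymMax != 0 && h->cbSymOffset != 0;
}

/* True if isym is a legal local symbol index: 0 .. isymMax - 1. */
static bool hdr_isym_in_range(const MiniHdr *h, int32_t isym)
{
    return isym >= 0 && isym < h->isymMax;
}
```

The same pattern would apply to the other i*Max and cb*Offset pairs.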

5.2.2    File Descriptor Entry (FDR)

typedef struct fdr {
        coff_addr       adr;    
        coff_long       cbLineOffset;   
        coff_long       cbLine;         
        coff_long       cbSs;           
        coff_int        rss;            
        coff_int        issBase;        
        coff_int        isymBase;      
        coff_int        csym;         
        coff_int        ilineBase;    
        coff_int        cline;       
        coff_int        ioptBase;   
        coff_int        copt;      
        coff_int        ipdFirst;       
        coff_int        cpd;            
        coff_int        iauxBase;      
        coff_int        caux;        
        coff_int        rfdBase;    
        coff_int        crfd;           
        coff_uint       lang : 5;      
        coff_uint       fMerge : 1;  
        coff_uint       fReadin : 1;
        coff_uint       fBigendian : 1;
        coff_uint       glevel : 2;    
        coff_uint       fTrim : 1;    
#ifndef TANDEMSYM
        coff_uint       reserved : 5;  
#else
        coff_uint       platform : 3;   /* (not supported) */
        coff_uint       reserved : 2;
#endif
        coff_ushort     vstamp;         /* (SV3.13 - ) */
        coff_uint       reserved2;
} FDR, *pFDR;

SIZE - 96 bytes, ALIGNMENT - 8 bytes

See Section 5.3.2.1 for related information.

File Descriptor Table Entry Fields

adr

Address of first instruction generated from this source file, which should be the same value as found in the PDR.adr field of the first procedure descriptor for this file. If no instructions are associated with this source file, this field should be set to 0. File descriptors that have been merged by source language in locally-stripped objects will have this field set to addressNil (-1).


Version Note

This use of addressNil is supported in symbol table format V3.13 and greater.


cbLineOffset

Byte offset from start of packed line numbers to start of entries for this file.

cbLine

Byte size of packed line numbers for this file.

cbSs

Byte size of local string table entries for this file.

rss

Byte offset from start of file's local string table entries to source file name; set to issNil (-1) to indicate the source file name is unknown.

issBase

Start of local strings for this file.

isymBase

Starting index of local symbol entries for this file.

csym

Count of local symbol entries for this file.

ilineBase

Debuggers and other tools expand the packed line numbers, producing an array of line numbers with an entry for each machine instruction in the program. This field is an index for this file's first line number entry in the expanded line number array.

cline

See the preceding description of ilineBase. This field is a count of this file's entries in the expanded line number array.

ioptBase

Byte offset from start of optimization symbol table to optimization symbol entries for this file.

copt

Byte size of optimization symbol entries for this file.

ipdFirst

Starting index of procedure descriptors for this file.

cpd

Count of procedure descriptors for this file.

iauxBase

Starting index of auxiliary symbol entries for this file.

caux

Count of auxiliary symbol entries for this file.

rfdBase

Starting index of relative file descriptors for this file.

crfd

Count of relative file descriptors for this file.

lang

Source language for this file (see Table 5-1).

fMerge

Informs linker whether this file can be merged.

fReadin

True if file was read in (as opposed to just created).

fBigendian

Unused.

glevel

Symbolic information level with which this file was compiled. This value is not the same as the user's idea of debugging levels. The value mapping from the user level -g option to the symbol table value is:

Debug switch    glevel contents
-g0             2
-g1             1
-g2             0
-g3             3

fTrim

Unused.

platform

Identifies the platform associated with the file descriptor. Set to platUndef, platGuard, platOss, or platPc.


Version Note

The platform field is reserved for use on Tandem big-endian systems. It is not supported on Tru64 UNIX.


vstamp

Symbol table version stamp (HDRR.vstamp) value from the original object module (.o file) that is recorded by the linker. The linker may combine objects that were compiled at different times and potentially contain different versions of the symbol table. In post-link objects, this value may or may not match the version stamp in the symbolic header. For pre-link objects, the value in this field will either be zero or the same as the symbolic header stamp.


Version Note

The vstamp field is supported on Tru64 UNIX V5.0 and greater for symbol table version V3.13 and greater.


reserved

Must be zero.

reserved2

Must be zero.
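Because the glevel encoding does not follow the numeric order of the -g options, a small lookup helps avoid mistakes. The sketch below encodes only the mapping given in the glevel description above; the function names are hypothetical and are not part of sym.h.

```c
/* Map the user-level -g option (0..3) to the FDR.glevel value.
 * Returns -1 for an option level outside 0..3. */
static int glevel_from_debug_option(int g)
{
    static const int map[4] = { 2, 1, 0, 3 };  /* -g0, -g1, -g2, -g3 */
    return (g >= 0 && g <= 3) ? map[g] : -1;
}

/* Inverse mapping: FDR.glevel value (0..3) back to the -g option level.
 * The table is its own inverse, so the same values appear here. */
static int debug_option_from_glevel(int glevel)
{
    static const int map[4] = { 2, 1, 0, 3 };  /* glevel 0, 1, 2, 3 */
    return (glevel >= 0 && glevel <= 3) ? map[glevel] : -1;
}
```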

General Notes:

The i*Base fields provide the starting indices of this file's subtables within the symbol table sections. If the associated count fields are set to 0, the base fields will also be set to zero.

For an explanation of packed and expanded line number entries, see the discussion in Section 5.3.2.2.
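As an illustration of how a base and count pair selects a file's subtable, the following sketch looks up a file-relative local symbol. MiniFdr and MiniSym are hypothetical stand-ins for the FDR and SYMR fields involved, and the function name is invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FDR and SYMR fields used here. */
typedef struct {
    int32_t isymBase;  /* starting index of this file's local symbols */
    int32_t csym;      /* count of this file's local symbols */
} MiniFdr;

typedef struct {
    int64_t value;
} MiniSym;

/* Return a pointer to local symbol i of the given file, where i is
 * file-relative (0 .. csym - 1), or NULL if i is out of range.
 * all_syms points at the start of the whole local symbol section. */
static const MiniSym *file_local_sym(const MiniSym *all_syms,
                                     const MiniFdr *fd, int32_t i)
{
    if (i < 0 || i >= fd->csym)
        return NULL;
    return &all_syms[fd->isymBase + i];
}
```

The other subtables (procedure descriptors, auxiliary symbols, relative file descriptors) can be sliced the same way with their own base and count fields.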

Table 5-1:  Source Language (lang) Constants

Name Value Comment
langC 0  
langPascal 1  
langFortran 2  
langAssembler 3  
langMachine 4  
langNil 5  
langAda 6  
langPl1 7  
langCobol 8  
langStdc 9  
langMIPSCxx 10 Unused.
langDECCxx 11  
langCxx 12  
langFortran90 13 Not used by all compilers; langFortran might be used instead for both f77 and f90
langBliss 14  
langPTAL 15 (not supported)
langCplusplusV1 16 (not supported)
langCplusplusV2 17 (not supported)
langMax 31 Number of language codes available


Version Note

The language constants langPTAL, langCplusplusV1, and langCplusplusV2 are reserved for use on Tandem big-endian systems. They are not supported on Tru64 UNIX.


5.2.3    Procedure Descriptor Entry (PDR)

#ifndef TANDEMSYM
struct pdr {
#else
struct pdrv4 {
#endif
        coff_addr       adr;    
        coff_long       cbLineOffset;   
        coff_int        isym;          
        coff_int        iline;        
        coff_uint       regmask;     
        coff_int        regoffset;  
        coff_int        iopt;          
        coff_uint       fregmask;     
        coff_int        fregoffset;  
        coff_int        frameoffset;
        coff_int        lnLow;          
        coff_int        lnHigh;        
        coff_uint       gp_prologue : 8; 
        coff_uint       gp_used : 1;   
        coff_uint       reg_frame : 1;
        coff_uint       prof : 1;      
        coff_uint       gp_tailcall : 1;   /* (V5.1 - ) */
#ifndef TANDEMSYM
        coff_uint       reserved : 12;
#else
        coff_uint       optlevel : 4;      /* (not supported) */
        coff_uint       reserved : 8;
#endif
        coff_uint       localoff : 8; 
        coff_ushort     framereg;     
        coff_ushort     pcreg;         
#ifdef TANDEMSYM
        coff_uint       proctype : 16;     /* (not supported) */
        coff_uint       reserved2 : 48;
} PDRV4, *pPDRV4;
#else
} PDR, *pPDR;
#endif

SIZE - 64 bytes (72 bytes for Tandem), ALIGNMENT - 8 bytes

See Section 5.3.4 for related information.

Procedure Descriptor Table Entry Fields

adr

The start address of this procedure. Set to addressNil (-1) for procedures with no text.


Version Note

Prior to symbol table format V3.13 this field may not be updated by the linker. To determine the procedure start address for symbol table formats V3.10 - V3.12, use the algorithm described in Section 5.3.4.1.


cbLineOffset

Byte offset to the start of this procedure's packed line numbers from the start of the file descriptor entry (FDR.cbLineOffset).

isym

Start of local symbols for this procedure. This symbol is the symbol for the procedure (symbol type stProc). The name of the procedure can be obtained from the iss field of the symbol table entry.

If the object is stripped of local symbol information, this field contains an external symbol table index for the procedure symbol's entry.

If this procedure has no symbols associated with it, this field should be set to isymNil (-1). This situation occurs for a static procedure in an object stripped of local symbol information.

iline

Start of line number entries (if expanded) for this procedure. Set to ilineNil (-1) to indicate that this procedure does not have line numbers.

regmask

Saved general register mask.

regoffset

Offset from the virtual frame pointer to the general register save area in the stack frame.

iopt

Start of procedure's optimization symbol entries. Set to ioptNil (-1) to indicate that this procedure does not have optimization symbol entries.

fregmask

Saved floating-point register mask.

fregoffset

Offset from the virtual frame pointer to the floating-point register save area in the stack frame.

frameoffset

Size of the fixed part of the stack frame. The actual frame size can exceed this value. A routine can extend its own frame size for frame sizes larger than 2 GB or for dynamic stack allocation requests.

lnLow

Lowest source line number within this file for the procedure. This is typically the line number of the first instruction in the procedure, but not always. Code optimizations can rearrange or remove instructions making the first instruction map to a different line number.

lnHigh

Highest source line number within this file for the procedure. This field contains a value of -1 for alternate entry points, which is how an alternate entry point is identified.

gp_prologue

Byte size of GP prologue.

gp_used

Flag set if the procedure uses GP.

reg_frame

True if the procedure is a light-weight or null-weight procedure. See the General Notes section following these definitions for more details on procedure weights.

prof

True if the procedure has been compiled with -pg for gprof profiling.

gp_tailcall

Indicates that a call to this procedure may result in a tail call return from a different GP domain. This bit is used exclusively for tail call optimizations.


Version Note

The gp_tailcall field is supported in Tru64 UNIX V5.1 and greater.


optlevel

Optimization level. Set to 0 for unknown or 1 through 6 for optimization levels 0 through 5 respectively.


Version Note

The optlevel field is used on Tandem big-endian systems. It is not supported on Tru64 UNIX.


reserved

Must be zero.

localoff

Bias value for accessing local symbols on the stack at run time.

framereg

Frame pointer register number.

pcreg

PC (Program Counter) register number.

proctype

Procedure attribute flags. See Table 5-2 for flag descriptions.


Version Note

The proctype field and the associated flag values in Table 5-2 are reserved for use on Tandem big-endian systems. They are not supported on Tru64 UNIX.


Table 5-2:  Procedure Attribute Flags

Flag Value Description
TNDM_MAIN 0x0001 Main entry point
TNDM_RESIDENT 0x0002 Resident routine
TNDM_PRIVILEGED 0x0004 Privileged routine
TNDM_CALLABLE 0x0008 Callable routine
TNDM_ENTRY 0x0010 Alternate entry, procedure, or subprocedure
TNDM_SUBPROC 0x0020 Subprocedure
TNDM_INTERRUPT 0x0040 Interrupt routine
TNDM_SHELL 0x0080 Shell routine
TNDM_COMPILER_GENERATED 0x0200 Procedure can have multiple copies
TNDM_EXTENSIBLE 0x0800 Extensible procedure
TNDM_EDITLINE 0x8000 Edit line numbers

General Notes:

For more information on call frames, see Section 5.3.4.2.

If the value of gp_prologue is zero and gp_used is 1, a GP prologue is present but was scheduled into the procedure prologue. Otherwise, the gp_prologue field gives the number of bytes occupied by the GP prologue instructions at the procedure's start address.

If there is a chain of tail call procedures, some of which are in the same GP domain, and some that are in a different GP domain, then gp_tailcall must be set for all procedures in the chain. For example, suppose there is a tail call from A to B, and a tail call from B to C. A and B are in the same GP domain, but C is in a different GP domain. In this case gp_tailcall must be set in both A's and B's PDR, because callers can't rely on the standard definition of GP after calling A. See the Alpha Architecture Reference Manual for additional details.

For an explanation of packed and expanded line number entries, see the discussion in Section 5.3.2.2.

A procedure may be heavy-, light-, or null-weight. The weight of a procedure can be determined from its descriptor by using the following guidelines:

Weight  Indications
Heavy   reg_frame is 0 and bit 26 of the register mask (regmask) is on
Light   reg_frame is 1 and regoffset is ra_save
Null    reg_frame is 1 and regoffset is 26

See the Calling Standard for Alpha Systems for details on the calling conventions for different weight procedures. Note that a calling routine does not need to know the weight of the routine being called.
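The weight guidelines above might be coded roughly as follows. This is a sketch under stated assumptions: MiniPdr is a hypothetical stand-in for three PDR fields, and "regoffset is ra_save" is interpreted here as regoffset holding any register number other than 26. Descriptors that match none of the three rows are reported as unknown rather than guessed.

```c
#include <stdint.h>

typedef enum {
    WEIGHT_HEAVY,
    WEIGHT_LIGHT,
    WEIGHT_NULL,
    WEIGHT_UNKNOWN
} ProcWeight;

/* Hypothetical stand-in for the PDR fields used in this sketch. */
typedef struct {
    uint32_t regmask;        /* saved general register mask */
    int32_t  regoffset;      /* see the weight table */
    unsigned reg_frame : 1;  /* 1 for light- or null-weight procedures */
} MiniPdr;

#define RA_REG 26  /* register number named in the weight table */

static ProcWeight proc_weight(const MiniPdr *p)
{
    if (p->reg_frame == 0) {
        /* Heavy: bit 26 of regmask is on. */
        if (p->regmask & (UINT32_C(1) << RA_REG))
            return WEIGHT_HEAVY;
        return WEIGHT_UNKNOWN;
    }
    /* reg_frame is 1: 26 means null-weight; any other value is
     * assumed to be ra_save, which means light-weight. */
    return (p->regoffset == RA_REG) ? WEIGHT_NULL : WEIGHT_LIGHT;
}
```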

5.2.4    Line Number Entry (LINER)

Line numbers are represented using two formats: packed and expanded. The packed format is a byte stream that can be interpreted as described in Section 5.3.2.2 to build an expanded table that maps instructions to source line numbers. The LINER type is used to refer to a single entry in the expanded table. It is declared as:

typedef int LINER, *pLINER;

A second, newer form of line number information is located in the optimization symbols section. See Section 5.2.10 and Section 5.3.2.2.

5.2.5    Local Symbol Entry (SYMR)

typedef struct {
        coff_long       value;       
        coff_int        iss;           
        coff_uint       st : 6;      
        coff_uint       sc  : 5;    
        coff_uint       reserved : 1; 
        coff_uint       index : 20; 
} SYMR, *pSYMR;

SIZE - 16 bytes, ALIGNMENT - 8 bytes

See Section 5.2.11, Section 5.3.4, and Section 5.3.8 for related information.

Local Symbol Table Entry Fields

value

A field that can contain an address, size, offset, or index. Its interpretation is determined by the symbol type and storage class combination, as explained in Section 5.2.11.

iss

Byte offset from the issBase field of a file descriptor table entry to the name of the symbol. If the symbol does not have a name, this field is set to issNil (-1). Generally, all user-defined symbols have names. A symbol without a name is one that has been created by the compilation system for its own use.

st

Symbol type (see Table 5-3).

sc

Storage class (see Table 5-4).

reserved

Must be zero.

index

An index into either the local symbol table or auxiliary symbol table, depending on the symbol type and class. The index is used as an offset from the isymBase field in the file descriptor entry for an entry in the local symbol table or an offset from the iauxBase field for an entry in the auxiliary symbol table.

The index field may have a value of indexNil, which is defined as (long)0xfffff. This value is used to indicate that the index is not a valid reference.
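The iss and index fields are both relative to the owning file descriptor, so resolving them requires the FDR as well as the symbol. The following sketch shows one way to do that. MiniFdr and MiniSym are hypothetical stand-ins, the helper names are invented, and the case shown for index is the one where it refers to the local symbol table.

```c
#include <stddef.h>
#include <stdint.h>

#define ISS_NIL   (-1)     /* issNil */
#define INDEX_NIL 0xfffff  /* indexNil */

/* Hypothetical stand-ins for the FDR and SYMR fields used here. */
typedef struct {
    int32_t issBase;   /* start of this file's local strings */
    int32_t isymBase;  /* start of this file's local symbols */
} MiniFdr;

typedef struct {
    int32_t  iss;         /* offset from issBase to the name */
    unsigned index : 20;  /* file-relative index, or indexNil */
} MiniSym;

/* Return the symbol's name, or NULL if the symbol has no name.
 * local_strings points at the start of the local string table. */
static const char *local_sym_name(const char *local_strings,
                                  const MiniFdr *fd, const MiniSym *s)
{
    if (s->iss == ISS_NIL)
        return NULL;
    return local_strings + fd->issBase + s->iss;
}

/* Convert a file-relative index into an index into the whole local
 * symbol section. Returns -1 if the index field is indexNil. */
static int64_t sym_index_as_isym(const MiniFdr *fd, const MiniSym *s)
{
    if (s->index == INDEX_NIL)
        return -1;
    return (int64_t)fd->isymBase + (int64_t)s->index;
}
```

When the symbol type and storage class call for an auxiliary table reference instead, the same arithmetic would use iauxBase in place of isymBase.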

The next two tables contain all defined values for the st and sc constants, along with short descriptions. However, these fields must be considered as pairs that have a limited number of possible pairings as explained in Section 5.2.11.

Table 5-3:  Symbol Type (st) Constants

Constant Value Description
stNil 0 Dummy entry
stGlobal 1 Global variable
stStatic 2 Static variable
stParam 3 Procedure argument
stLocal 4 Local variable
stLabel 5 Label
stProc 6 Global procedure
stBlock 7 Start of block
stEnd 8 End of block, file, or procedure
stMember 9 Member of class, structure, union, or enumeration
stTypedef 10 User-defined type definition
stFile 11 Source file name
stStaticProc 14 Static procedure
stConstant 15 Constant data
stBase 17 Base class (for example, C++)
stVirtBase 18 Virtual base class (for example, C++)
stTag 19 Data structure tag value (for example, C++ class or struct)
stInter 20 Interlude (for example, C++)
stModule 22 (not yet implemented) Fortran90 module definition.
stNamespace 22 (V5.0 - ) Namespace definition (for example, C++)
stModview 23 (not yet implemented) Modifiers for current view of given module.
stUsing 23 (V5.0 - ) Namespace use (for example, C++ "using").
stAlias 24 (V5.0 - ) Defines an alias for another symbol. Currently, only used for namespace aliases.
stDefine 25 (not supported) Macro definition
stObjinfo 26 (not supported) Name/data object info
stToolinfo 27 (not supported) Compiler info
stSrcinfo 28 (not supported) Source data info
stEquivRel 29 (not supported) Equivalence variable
stMax 64 Maximum number of symbol types

General Notes:

Symbol type codes with more than one interpretation are identified by the lang field in the associated file descriptor. This applies to the stModule/stNamespace and stModview/stUsing symbol types.


Version Note

The symbol types: stDefine, stObjinfo, stToolinfo, stSrcinfo, and stEquivRel are reserved for use on Tandem big-endian systems. They are not supported on Tru64 UNIX.


Table 5-4:  Storage Class (sc) Constants

Constant Value Description
scNil 0 Dummy entry
scText 1 Symbol allocated in the .text section
scData 2 Symbol allocated in the .data section
scBss 3 Symbol allocated in the .bss section
scRegister 4 Symbol allocated in a register
scAbs 5 Symbol value is absolute
scUndefined 6 Symbol referenced but not defined in the current module
scUnallocated 7 Storage not allocated for this symbol
scResText 8 (not supported) Resident text
scTlsUndefined 9 TLS symbol referenced but not defined in the current module
scInfo 11 Symbol contains debugger information
scSData 13 Symbol allocated in the .sdata section
scSBss 14 Symbol allocated in the .sbss section
scRData 15 Symbol allocated in the .rdata section
scVar 16 Parameter passed by reference (for example, Fortran or Pascal)
scCommon 17 Common symbol
scSCommon 18 Small common symbol
scVarRegister 19 Parameter passed by reference in a register
scVariant 20 Variant record (for example, Pascal or Ada)
scFileDesc 20 File descriptor (for example, COBOL)
scSUndefined 21 Small undefined symbol
scInit 22 Symbol allocated in the .init section
scReportDesc 23 Report descriptor (for example, COBOL)
scXData 24 Symbol allocated in the .xdata section
scPData 25 Symbol allocated in the .pdata section
scFini 26 Symbol allocated in the .fini section
scRConst 27 Symbol allocated in the .rconst section
scTlsCommon 29 TLS common symbol
scTlsData 30 Symbol allocated in the .tlsdata section
scTlsBss 31 Symbol allocated in the .tlsbss section
scMax 32 Maximum number of storage classes


Version Note

The scResText storage class is reserved for use on Tandem big-endian systems. It is not supported on Tru64 UNIX.


5.2.6    External Symbol Entry (EXTR)

typedef struct {
        SYMR          asym;     
        coff_uint     jmptbl : 1;    
        coff_uint     cobol_main : 1;  
        coff_uint     weakext : 1;
        coff_uint     alignment : 4;   /* (V5.1 - ) */
#ifdef TANDEMSYM
        coff_uint     xport : 1;       /* (not supported) */
        coff_uint     multiext : 1;    /* (not supported) */
        coff_uint     reserved : 23;
#else
        coff_uint     reserved:25; 
#endif
        coff_int      ifd;         
} EXTR, *pEXTR;

SIZE - 24 bytes, ALIGNMENT - 8 bytes

External Symbol Table Entry Fields

asym

External symbol table entry. This structure has the same format as a local symbol entry. The field interpretations differ as described in the following entries.

asym.value

Contains the symbol address for most defined symbols. See Section 5.2.11 for details.

asym.iss

Byte offset in external string table to symbol name. Set to issNil (-1) if there is no name for this symbol.

asym.st

Symbol type. See Table 5-3 for possible values.

asym.sc

Storage class. See Table 5-4 for possible values.

asym.reserved

Must be zero.

asym.index

Contains either an index into the auxiliary symbol table for a type description or an index into the local symbol table pointing to a related symbol.

The index field may have a value of indexNil, which is defined as (long)0xfffff. This value is used to indicate that the index is not a valid reference.

jmptbl

Unused.

cobol_main

Flag set to indicate that the symbol is a COBOL main procedure.

weakext

Flag set to identify the symbol as a weak external. See Section 6.3.4.2 for more details on weak symbols.

alignment

Power of two byte alignment for common storage class symbols biased by 2^3 (8). Supported values range from 0 through 13 yielding a minimum alignment of 8 bytes and a maximum alignment of 64K bytes. For symbols with storage classes other than scCommon and scSCommon this field should be ignored.


Version Note

The alignment field is supported on Tru64 UNIX V5.1 and greater.


xport

Flag set to indicate the symbol is to be exported from a shared library.


Version Note

The xport field is reserved for use on Tandem big-endian systems. It is not supported on Tru64 UNIX.


multiext

Flag set to indicate that multiple definitions of the symbol are allowed.


Version Note

The multiext field is reserved for use on Tandem big-endian systems. It is not supported on Tru64 UNIX.


reserved

Must be zero.

ifd

Index of the file descriptor where the symbol is defined. Set to ifdNil (-1) for undefined symbols and for some compiler system symbols.
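The biased encoding of the alignment field described above can be checked with a short calculation: a field value a corresponds to 2^(a + 3) bytes, so 0 gives 8 bytes and 13 gives 64K bytes. The helpers below are an illustrative sketch with hypothetical names; they are not declared in sym.h.

```c
#include <stdint.h>

/* Decode an EXTR.alignment value (0..13) into a byte alignment of
 * 2^(alignment + 3). Returns 0 for values outside the supported range. */
static uint32_t common_alignment_bytes(unsigned alignment)
{
    if (alignment > 13)
        return 0;
    return UINT32_C(1) << (alignment + 3);
}

/* Encode a byte alignment as an EXTR.alignment value. Returns -1 if
 * bytes is not a power of two between 8 and 64K inclusive. */
static int common_alignment_field(uint32_t bytes)
{
    for (unsigned a = 0; a <= 13; a++) {
        if (bytes == (UINT32_C(1) << (a + 3)))
            return (int)a;
    }
    return -1;
}
```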

5.2.7    Relative File Descriptor Entry (RFDT)

The relative file descriptor table provides a post-link mapping of file descriptor indices. The purpose of this table is to minimize work for the linker, which does not update symbol table references to local symbols. This information is used to obtain the file offset used to bias local symbol indices. Because this table is also known as the File Indirect Table, two declarations are included in the sym.h header file, as shown here.

typedef int RFDT, *pRFDT;
typedef int FIT, *pFIT;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

See Section 5.3.2.1 for related information.

5.2.8    Auxiliary Symbol Table Entry (AUXU)

The auxiliary symbol table entry is a 32-bit union. It is either interpreted as a TIR or RNDXR structure or as an integer value. See Section 5.3.7.3 for detailed instructions on reading the auxiliary symbols.

typedef union {
        TIR             ti;             
        RNDXR           rndx;          
        coff_int        dnLow;        
        coff_int        dnHigh;      
        coff_int        isym;       
        coff_int        iss;       
        coff_int        width;    
        coff_int        count;   
        coff_int        slice;    /* (V5.0a) */
} AUXU, *pAUXU;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

See Section 5.3.7.3 for related information.

Auxiliary Symbol Table Entry Fields

ti

Type information record (TIR), as defined in Section 5.2.8.1.

rndx

Relative index into local or auxiliary symbols (RNDXR), as defined in Section 5.2.8.2.

dnLow

Lower bound of range or array dimension. For large structures, two of these fields can be used together to form one 64-bit number.

dnHigh

Upper bound of range or array dimension. For large structures, two of these fields can be used together to form one 64-bit number.

isym

For procedures (stProc or stStaticProc symbols), this field is an index into the local symbols. It is also used as an index into the relative file descriptors.

iss

Unused.

width

Width of a bit field or array stride in bits. Fortran compilers set the array stride to the array element size in bits. Two of these fields can be used together to form one 64-bit number.

count

Count of ranges for variant arm. This field name is only used within the type description of a variant block (stBlock, scVariant).

slice

Reserved.

General Notes:

The fields dnLow, dnHigh, and width must all use either the 32-bit or 64-bit representation when used together. For example, an array dimension cannot be specified with a 32-bit dnLow and a 64-bit dnHigh.

5.2.8.1    Type Information Record (TIR)

typedef struct {
        coff_uint       fBitfield : 1;
        coff_uint       continued : 1;
        coff_uint       bt  : 6;     
        coff_uint       tq4 : 4;
        coff_uint       tq5 : 4;
        coff_uint       tq0 : 4;
        coff_uint       tq1 : 4;    
        coff_uint       tq2 : 4;
        coff_uint       tq3 : 4;
} TIR, *pTIR;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

Type Information Record Entry Fields

fBitfield

Flag set if bit width is specified.

continued

Flag set to indicate that the type description is continued in another TIR record. This will happen if the type is represented with more than six type qualifiers.

bt

Basic type (see Table 5-5 and Section 5.3.7.1).

tq0, tq1, tq2, tq3, tq4, tq5

Type qualifiers (see Table 5-6 and Section 5.3.7.2). The lower-numbered tq fields must be used first, and all unneeded fields must be set to tqNil (0).
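Note that the tq fields are not declared in numeric order within the structure, so a reader should collect them by name rather than by position. The following sketch gathers the qualifiers of a single TIR in logical order and stops at the first tqNil. MiniTir mirrors the field order of the declaration above using plain unsigned bit fields, and the TQ_ constants are local copies of the values in Table 5-6; both are stand-ins for the real declarations. Records chained through the continued flag are not followed here.

```c
#include <stddef.h>

#define TQ_NIL   0  /* tqNil */
#define TQ_PTR   1  /* tqPtr */
#define TQ_ARRAY 3  /* tqArray */

/* Hypothetical stand-in that mirrors the TIR field order. */
typedef struct {
    unsigned fBitfield : 1;
    unsigned continued : 1;
    unsigned bt  : 6;
    unsigned tq4 : 4;
    unsigned tq5 : 4;
    unsigned tq0 : 4;
    unsigned tq1 : 4;
    unsigned tq2 : 4;
    unsigned tq3 : 4;
} MiniTir;

/* Copy the qualifiers of one TIR into out[] in logical order
 * (tq0 first), stopping at the first tqNil. Returns the number of
 * qualifiers stored (0..6). */
static size_t tir_qualifiers(const MiniTir *t, unsigned out[6])
{
    const unsigned tq[6] = {
        t->tq0, t->tq1, t->tq2, t->tq3, t->tq4, t->tq5
    };
    size_t n = 0;

    while (n < 6 && tq[n] != TQ_NIL) {
        out[n] = tq[n];
        n++;
    }
    return n;
}
```

A caller that finds all six qualifiers in use would then check the continued flag to decide whether another TIR follows.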

Table 5-5:  Basic Type (bt) Constants

Constant Value Description
btNil 0 Undefined or void
btAdr32 1 Address (32 bits)
btChar 2 Character
btUChar 3 Unsigned character
btShort 4 Short (16 bits)
btUShort 5 Unsigned short (16 bits)
btInt 6 Integer (32 bits)
btUInt 7 Unsigned integer (32 bits)
btLong32 8 Long (32 bits)
btULong32 9 Unsigned long (32 bits)
btFloat 10 Floating point
btDouble 11 Double-precision floating point
btStruct 12 Structure or record
btUnion 13 Union
btEnum 14 Enumeration
btTypedef 15 Defined by means of a user-defined type definition
btRange 16 Range of values (for example, Pascal subrange)
btSet 17 Sets (for example, Pascal)
btComplex 18 Single complex (for example, Fortran COMPLEX*8)
btDComplex 19 Double complex (for example, Fortran COMPLEX*16)
btIndirect 20 Indirect definition; following rndx points to an entry in the auxiliary symbol table that contains a TIR (type information record)
btFixedBin 21 Fixed binary (for example, COBOL)
btDecimal 22 Packed or unpacked decimal (for example, COBOL)
btPicture 25 Picture (for example, COBOL)
btVoid 26 Void
btPtrMem 27 Currently unused
btScaledBin 27 Scaled binary (for example, COBOL)
btVptr 28 Virtual function table (for example, C++)
btArrayDesc 28 Array descriptor (for example, Fortran, Pascal)
btClass 29 Class (for example, C++)
btLong64 30 Long (64 bits)
btLong 30 Long (64 bits)
btULong64 31 Unsigned long (64 bits)
btULong 31 Unsigned long (64 bits)
btLongLong 32 Long long (64 bits)
btULongLong 33 Unsigned long long (64 bits)
btAdr64 34 Address (64 bits)
btAdr 34 Address (64 bits)
btInt64 35 Integer (64 bits)
btUInt64 36 Unsigned integer (64 bits)
btLDouble 37 Long double floating point (128 bits)
btInt8 38 Integer (64 bits)
btUInt8 39 Unsigned integer (64 bits)
btRange_64 41 (V5.0 - ) 64-bit range
btProc 42 (V5.0 - ) Procedure or function
btCobolIndex 43 (not supported) COBOL index variables
btReal32 44 (not supported) Tandem float
btReal64 45 (not supported) Tandem double
btQComplex 46 (V5.1 - ) Quad complex (for example Fortran COMPLEX*32)
btChecksum 63 Symbol table checksum value stored in auxiliary record
btMax 64 Number of basic type codes

Table Notes:

  1. btInt and btLong32 are synonymous.

  2. btUInt and btULong32 are synonymous.

  3. btLong, btLong64, btLongLong, btInt64, and btInt8 are synonymous.

  4. btULong64, btULongLong, btUInt64, and btUInt8 are synonymous.


Version Note

The basic type constants btCobolIndex, btReal32, and btReal64 are reserved for use on Tandem big-endian systems. They are not supported on Tru64 UNIX.


Table 5-6:  Type Qualifier (tq) Constants

Constant Value Description
tqNil 0 No qualifier (placeholder)
tqPtr 1 Pointer
tqProc 2 (obsolete) Procedure or function
tqArray 3 Array
tqFar 4 32-bit pointer; used with the -xtaso emulation
tqVol 5 Volatile
tqConst 6 Constant
tqRef 7 Reference
tqArray_64 8 (V5.0 - ) Large array
tqHasLen 9 (not supported) Length for buffer parameters
tqShar 10 (V5.0a - ) Reserved
tqSharArr_64 11 (V5.0a - ) Reserved
tqMax 16 Number of type qualifier codes


Version Note

The tqHasLen type qualifier is reserved for use on Tandem big-endian systems. It is not supported on Tru64 UNIX.
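The ordering rule for the tq fields (lower-numbered fields first, unused fields set to tqNil) can be checked mechanically. The following is a minimal sketch, not part of any system header. It assumes the qualifier fields have been copied into an array in tq0-first order and that a type record carries six such fields; the names TQ_FIELD_COUNT and tq_fields_well_formed are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from Table 5-6. */
enum { tqNil = 0, tqPtr = 1, tqArray = 3, tqVol = 5, tqConst = 6 };

/* Assumption: a type record carries six qualifier slots (tq0..tq5). */
#define TQ_FIELD_COUNT 6

/* Returns 1 if no qualifier appears after the first tqNil, else 0. */
int tq_fields_well_formed(const unsigned tq[TQ_FIELD_COUNT])
{
    int seen_nil = 0;

    assert(tq != NULL);
    for (size_t i = 0; i < TQ_FIELD_COUNT; i++) {
        if (tq[i] == tqNil)
            seen_nil = 1;
        else if (seen_nil)
            return 0;   /* qualifier stored above an empty slot */
    }
    return 1;
}
```

For example, the sequence tqPtr, tqArray, tqNil, ... passes the check, while tqPtr, tqNil, tqConst, ... does not.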


5.2.8.2    Relative Symbol Record (RNDXR)

typedef struct {
        coff_uint       rfd : 12;    
        coff_uint       index : 20; 
} RNDXR, *pRNDXR;

SIZE - 4 bytes, ALIGNMENT - 4 bytes

Relative Symbol Record Fields

rfd

Index into relative file descriptor table if it exists; otherwise, index into file descriptor table.

This field may have a value of ST_RFDESCAPE, defined as 0xfff in the header file cmplrs/stsupport.h. This value is used to indicate that the next auxiliary entry, interpreted as an isym, contains the actual rfd index.

index

Symbol index. Used as an offset from either FDR.isymbase or FDR.iauxbase, depending on context.
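The ST_RFDESCAPE convention can be illustrated with a short sketch. The AuxEntry union below is a simplified stand-in for an auxiliary table entry (an assumption for illustration only), and rndx_file_index is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

#define ST_RFDESCAPE 0xfff   /* value given in the text above */

typedef unsigned int coff_uint;

typedef struct {
    coff_uint rfd   : 12;
    coff_uint index : 20;
} RNDXR;

/* Simplified stand-in for an auxiliary table entry (assumption). */
typedef union {
    RNDXR     rndx;
    coff_uint isym;
} AuxEntry;

/*
 * Reads the RNDXR at aux[i] and returns the actual rfd index.
 * *consumed is set to the number of auxiliary entries used (1 or 2).
 */
coff_uint rndx_file_index(const AuxEntry *aux, size_t i, size_t *consumed)
{
    assert(aux != NULL && consumed != NULL);
    if (aux[i].rndx.rfd == ST_RFDESCAPE) {
        *consumed = 2;
        return aux[i + 1].isym;   /* next entry holds the full index */
    }
    *consumed = 1;
    return aux[i].rndx.rfd;
}
```

A caller would use the consumed count to step past the extra entry before reading whatever follows the RNDXR.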

5.2.9    String Table

Objects can contain two string tables: the local string table (corresponding to local symbols) and the external string table (corresponding to external symbols). The local string table is present only for objects created with full debugging information; it is removed if an object is locally stripped.

The storage format for the string tables is a list of null-terminated character strings. It is correctly considered as one long character array, not an array of strings. Fields in the symbolic header and file headers represent string table sizes and offsets in bytes.
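Because the table is one character array, a string is located by its byte offset rather than by an element index. The following sketch shows one way to fetch and count strings with bounds checking; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns the string at byte offset `off` in a string table of `size`
 * bytes, or NULL if the offset is out of range or the string is not
 * terminated inside the table.
 */
const char *strtab_get(const char *tab, size_t size, size_t off)
{
    assert(tab != NULL);
    if (off >= size)
        return NULL;
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;
    return tab + off;
}

/* Counts the null-terminated strings stored in the table. */
size_t strtab_count(const char *tab, size_t size)
{
    size_t n = 0, off = 0;
    const char *s;

    while ((s = strtab_get(tab, size, off)) != NULL) {
        n++;
        off += strlen(s) + 1;   /* step over the string and its terminator */
    }
    return n;
}
```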

5.2.10    Optimization Symbol Entry (PPODHDR)

The optimization symbol table contains information for optimized debugging, basic block profiling, and other miscellaneous procedure-specific data. Each procedure's associated optimization symbol table data begins with an array of PPODHDR structures. See Section 5.3.3 for a description of the optimization symbol table.


Version Note

The following structure definition is for Tru64 UNIX V5.0 and greater. It is used for symbol table format V3.13 and greater.


typedef struct {
        coff_uint       ppode_tag;
        coff_uint       ppode_len;
        coff_ulong      ppode_val;
} PPODHDR, *pPPODHDR;

SIZE - 16 bytes, ALIGNMENT - 8 bytes

Optimization Symbol Entry Fields

ppode_tag

Identifies the kind of data described by this entry.

ppode_len

Indicates the size in bytes of the data that is found in the raw data area for this entry. When this field is zero, the entry's only data is the value stored in the ppode_val field.

ppode_val

This field is either a pointer to the entry's data or is itself the data. If ppode_len is nonzero, this field is a relative file offset from the beginning of the current PPOD (Per-Procedure Optimization Descriptor) to the applicable data area. If ppode_len is zero, this field contains the data for the entry.

A PPOD contains multiple PPODHDRs. A PPODHDR and its associated data are collectively referred to as a PPODE (Per-Procedure Optimization Descriptor Entry). Figure 5-10 in Section 5.3.3 shows several PPODs with multiple PPODHDRs and their data.

Table 5-7:  Optimization Tag Values

Name Value Description
PPODE_STAMP 1 Version number of the PPOD stored in ppode_val. The current PPOD_VERSION value is 1.
PPODE_END 2 End of entries for this PPOD.
PPODE_EXT_SRC 3 Extended source line information.
PPODE_SEM_EVENT 4 Semantic event information. (Reserved for future use.)
PPODE_SPLIT 5 Split lifetime information. (Reserved for future use.)
PPODE_DISCONTIG_SCOPE 6 Discontiguous scope information. (Reserved for future use.)
PPODE_INLINED_CALL 7 Inlined procedure call information. (Reserved for future use.)
PPODE_PROFILE_INFO 8 Profile feedback information.
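As an illustration of how ppode_len and ppode_val work together, the following sketch searches one PPOD for an entry with a given tag. It assumes the PPOD is already in memory, that its header array is terminated by a PPODE_END entry, and that coff_uint and coff_ulong are 32 and 64 bits wide; ppod_find is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t coff_uint;    /* assumed widths for this sketch */
typedef uint64_t coff_ulong;

typedef struct {
    coff_uint  ppode_tag;
    coff_uint  ppode_len;
    coff_ulong ppode_val;
} PPODHDR;

#define PPODE_END 2   /* Table 5-7 */

/*
 * Searches one PPOD for the first entry with the given tag.
 * On success returns the header; *data points at the entry's raw data,
 * or is NULL when the value is stored directly in ppode_val.
 */
const PPODHDR *ppod_find(const void *ppod, coff_uint tag, const void **data)
{
    const PPODHDR *h = (const PPODHDR *)ppod;

    assert(ppod != NULL && data != NULL);
    for (; h->ppode_tag != PPODE_END; h++) {
        if (h->ppode_tag != tag)
            continue;
        /* Nonzero length: ppode_val is an offset from the PPOD start. */
        *data = h->ppode_len ? (const char *)ppod + h->ppode_val : NULL;
        return h;
    }
    return NULL;
}
```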

5.2.11    Symbol Type and Class (st/sc) Combinations

Entries in the symbol table are primarily identified by the combination of their symbol type (st) and storage class (sc) values. Not all combinations are valid. Figure 5-3 indicates which combinations are currently in use.

Figure 5-3:  st/sc Combination Matrix

             sc  |     |     |     |     |     | T   |
                 |     |     |     |     |     | lU  | V
                 |     |     |     | R   |S    | sn U| a
                 |     |     |     | e   |U   T| UaUs| r
                 | B   | F   |    R|Rp   |n   l| nlne| R
                 | a   | i   |    e|eo S |d   s|Tdldr| eV
                 | s  C| l   |  R g|gr C |eS TC|leoeS| ga
                 | e  o| e   | PCRI|it oS|fy lo|sfcft| irX
                 | dB m|DDFII| DoDm|sDSmD|imTsm|Diair| siD
                 |AViBm|aeinn|Nanaa|teBma|nreBm|antnu|Vtaa
                 |batso|tsnfi|itstg|essot|eexso|teeec|aent
      st         |srssn|aciot|latae|rcsna|dftsn|adddt|rrta
      -----------+-----+-----+-----+-----+-----+-----+----
      Alias      |     |   X |     |     |     |     |
      Base       |     |   X |     |     |     |     |
      Block      |    X| X X |     | X   |  X  |     |  X
      Constant   |X  X |X    |  X  |  X X|     |     |
      End        |    X| X X |     | X   |  X  |     |  X
      Expr       |     |     |     |     |     |     |
      File       |     |     |     |     |  X  |     |
      Forward    |     |     |     |     |     |     |
      Global     |X  XX|X    |  XX |  XXX|X  XX|XX X |
      Inter      |     |   X |     |     |     |     |
      Label      |X  X |X X X| XXX |  X X|  XX |X X  |   X
      Local      |X  X |X X X| XXX |X X X|  XX |X X  |XX X
      Member     |     | X X |     | X   |     |     |
      Module     |     |     |     |     |     |     |
      Modview    |     |     |     |     |     |     |
      Namespace  |     |   X |     |     |     |     |
      Nil        |     |     |     |     |     |     |
      Number     |     |     |     |     |     |     |
      Param      |X  X |X  X |  XX |X X X|     |  X  |XX
      Proc       |     |   X |X    |     |  X  |   X |
      RegReloc   |     |     |     |     |     |     |
      Split      |     |     |     |     |     |     |
      StaParam   |     |     |     |     |     |     |
      Static     |X  XX|X  X |  XX |  X X|   X |X    |
      StaticProc |     |  X X|     |     |  X  |     |
      Str        |     |     |     |     |     |     |
      Tag        |     |   X |     |     |     |     |
      Type       |     |     |     |     |     |     |
      Typedef    |     |   X |     |     |     |     |
      Using      |     |   X |     |     |     |     |
      VirtBase   |     |   X |     |     |     |     |

A symbol's type and class, taken together, determine the interpretation of other fields in the symbol table entry. The same combination can be used for different purposes in different contexts. As a result, to understand the symbol entry, it may also be necessary to access type information in the auxiliary table or the source language information in the file descriptor.

The contents of the value and index fields for each combination, with a brief explanation of the symbol's use, are described in the following list of combinations. For many combinations, greater detail can be found in Section 5.3.7 and Section 5.3.8.

stGlobal/scAbs

stGlobal/scSData, stGlobal/scData, stGlobal/scSBss, stGlobal/scBss, stGlobal/scRData, stGlobal/scRConst

stGlobal/scTlsData, stGlobal/scTlsBss

stGlobal/scSCommon, stGlobal/scCommon, stGlobal/scTlsCommon

stGlobal/scSUndefined, stGlobal/scUndefined, stGlobal/scTlsUndefined

stStatic/scAbs

stStatic/scSData, stStatic/scData, stStatic/scSBss, stStatic/scBss, stStatic/scRData, stStatic/scRConst

stStatic/scTlsData, stStatic/scTlsBss

stStatic/scCommon

stStatic/scInfo

stParam/scAbs

stParam/scRegister

stParam/scVar

stParam/scVarRegister

stParam/scInfo

stParam/scSData, stParam/scData, stParam/scSBss, stParam/scBss, stParam/scRData, stParam/scRConst


Version Note

Static parameters are supported in symbol table format V3.13 and greater.


stParam/scUnallocated

stLocal/scAbs

stLocal/scRegister

stLocal/scVar

stLocal/scVarRegister

stLocal/scUnallocated


Version Note

The use of scUnallocated is supported in symbol table format V3.13 and greater.


stLocal/scText, stLocal/scInit, stLocal/scFini, stLocal/scSData, stLocal/scData, stLocal/scSBss, stLocal/scBss, stLocal/scRData, stLocal/scRConst, stLocal/scTlsData, stLocal/scTlsBss

stLabel/scAbs

stLabel/scText, stLabel/scInit, stLabel/scFini, stLabel/scSData, stLabel/scData, stLabel/scXData, stLabel/scPData, stLabel/scSBss, stLabel/scBss, stLabel/scRData, stLabel/scRConst, stLabel/scTlsData, stLabel/scTlsBss

stLabel/scUnallocated

stProc/scNil

stProc/scText

stProc/scUndefined

stProc/scInfo

stBlock/scText

stBlock/scInfo

stBlock/scCommon

stBlock/scVariant

stBlock/scFileDesc, stBlock/scReportDesc

stEnd/scText

stEnd/scInfo

stEnd/scCommon

stEnd/scVariant

stEnd/scFileDesc, stEnd/scReportDesc

stMember/scInfo

stMember/scFileDesc, stMember/scReportDesc

stTypedef/scInfo

stFile/scText

stStaticProc/scText

stStaticProc/scInit, stStaticProc/scFini

stConstant/scAbs

stConstant/scSData, stConstant/scData, stConstant/scSBss, stConstant/scBss, stConstant/scRData, stConstant/scRConst

stBase/scInfo

stVirtBase/scInfo

stTag/scInfo

stInter/scInfo

stNamespace/scInfo


Version Note

Namespace symbols are supported in symbol table format V3.13 and greater.


stUsing/scInfo


Version Note

Namespace USING directives are supported in symbol table format V3.13 and greater.


stAlias/scInfo


Version Note

Namespace aliases are supported in symbol table format V3.13 and greater.


Combinations may be valid in the local symbol table, the external symbol table, or both. Table 5-8 shows which combinations are valid in which table, based on the symbol type value and, where necessary, the storage class value. Where the storage class is shown as the wildcard character '*', the entry applies only to the combinations previously listed as valid.

Table 5-8:  Valid Placement for st/sc Combinations

st/sc Combination              External Symbol Table   Local Symbol Table
stNil, sc*                     X                       X
stGlobal, sc*                  X
stStatic, sc*                                          X
stParam, sc*                                           X
stLocal, scSCN (1)             X
stLocal, not scSCN (1)                                 X
stLabel, sc*                   X                       X
stProc, scInfo                                         X
stProc, scText                 X                       X
stProc, scUndefined            X
stBlock, sc*                                           X
stEnd, sc*                                             X
stMember, sc*                                          X
stTypedef, sc*                                         X
stFile, sc*                                            X
stStaticProc, scText                                   X
stStaticProc, scInit/scFini    X
stConstant, sc*                X                       X
stBase, sc*                                            X
stVirtBase, sc*                                        X
stTag, sc*                                             X
stInter, sc*                                           X
stNamespace, sc*                                       X
stUsing, sc*                                           X
stAlias, sc*                                           X

Table Notes:

  1. scSCN is a section storage class: scData, scSData, scBss, scSBss, scRConst, scRData, scInit, scFini, scText, scXData, scPData, scTlsData, scTlsBss

5.3    Symbol Table Usage

5.3.1    Levels of Symbolic Information

Different levels of symbolic information can be stored with an object file. Compilers often provide options that allow the user to choose the desired level of symbolic information for their program. This choice may be influenced by size considerations and debugging needs. A trade-off exists between the benefit of saving space in the object file and the amount of information available to tools that consume symbolic information.

It is also possible to change the amount of symbolic information present in a program that has already been compiled and linked. Information can be added or deleted. Two of the most common and useful operations are locally stripping and fully stripping the symbol tables in executable files. Tools that modify linked executables, such as instrumentation tools and code optimizers, may rewrite parts of the symbol table to reflect changes that they made.

5.3.1.1    Compilation Levels

The representation of symbolic information supported by compilers can be broken down into four levels:

  1. Minimal: Only information required for linking

  2. Limited: Source file and line number information for profiling and limited debugging (stack-tracing)

  3. Full: Complete debugging information for non-optimized code

  4. Optimized: Debugging information for optimized code

These levels correspond to the system compiler switches -g0 (minimal), -g1 (limited), -g2 (full), and -g3 (optimized). Table 5-9 shows the symbol table sections that are produced by system compilers at each compilation level.

Table 5-9:  Symbol Table Sections Produced at Various Compilation Levels

Compilation Level
Symbol Table Section Minimal Limited Full Optimized
Symbolic header Yes Yes Yes Yes
File Descriptors Yes Yes Yes Yes
External Symbols Yes Yes Yes Yes
External Strings Yes Yes Yes Yes
Procedure Descriptors Yes Yes Yes Yes
Line Numbers No Yes Yes Yes
Relative File Descriptors No No Yes Yes
Optimization Symbols No Partial Yes Yes
Local Symbols No Partial Yes Yes
Local Strings No Partial Yes Yes
Auxiliary Symbols No Partial Yes Yes

The minimal level of symbolic information that may be produced during compilation includes only the symbol information required for the linker to function properly. This includes external symbol information that is needed to perform symbol resolution and relocation.

If the limited level of symbolic information is requested, line number entries are generated, as well as external symbol information and procedure descriptors. In addition, local symbols for procedures (and the corresponding auxiliary symbols, optimization symbols, and local strings) are present. Limited symbolic information is sufficient to meet the needs of profiling tools. The information present at this level is a subset of that required for full debugger support.

If full symbolic information is included, all symbol table sections are produced in full. This level enables full debugging support with complete type descriptions for local and external symbols. Optimization is disabled.

Optimized symbolic information is designed to balance the aims of performance and debugging capabilities. This level supplies the same information as the full debugging option, but it also allows all compiler optimizations. As a result, some of the correlation is lost between the source code and the executable program.

On Tru64 UNIX systems, users can choose to compile their programs with any one of the four levels of symbolic information. The options -g0, -g1, and -g2 specify increasing levels of symbolic information. The system compiler's default is to produce the minimal level (-g0). Currently, debugging of optimized code (-g3) is not fully supported. See cc(1) for more details.

5.3.1.2    Locally Stripped Images

Objects can be produced with only global symbolic information stored in the symbol table. Selection of the -x option causes the linker to create a locally-stripped object. Reasons for stripping local symbolic information include reducing file size and limiting the amount of symbolic information available to end users of an application.

A locally-stripped object is very similar to an object produced with minimal symbolic information (see Section 5.3.1.1). The difference is the consolidation of file descriptors, which the linker does only for locally-stripped objects.

In a locally-stripped image, the file descriptors are included solely for the purpose of identifying source file languages. One file descriptor is present for each source language involved in the compilation. These file descriptors have their adr field set to addressNil, indicating that they cannot be used to identify text addresses.


Version Note

The preceding use of addressNil is supported in symbol table format V3.13 and greater. In symbol table formats less than V3.13, the file descriptor adr value should be ignored.


The procedure descriptor table is present in full but is rearranged to group procedures by source language. All procedure descriptors for procedures written in a particular source language are thus contiguous, and they reflect the file descriptor's information.

External symbols are also present in a locally-stripped image. The file indices (ifd field) of the external symbols are updated to identify the generic file descriptor for the appropriate source language. The index fields are set to zero to indicate that no type information is available. External symbols with the storage class scNil are removed. These are debugging symbols that are not normally produced for minimal symbol tables.

Limited debugging is possible with locally-stripped objects. Because the procedure descriptors are retained, stack traces are possible. External symbol information can also be viewed, and language-dependent handling of symbols (for example, C++ name demangling) is preserved.

A linked executable file can be locally stripped at any time after its creation using the command ostrip -x. The output is the same as described above. This operation may also alter the raw data of the .comment section. See Chapter 7 for details.

5.3.1.3    (Fully) Stripped Images

Executable files may be fully stripped at any time after creation using either the strip command or the command ostrip -s. Stripping an executable will result in complete removal of the symbol table, including the symbolic header. The file header fields f_symptr and f_nsyms are set to zero to indicate that the file has been stripped.

This operation may also alter the raw data of the .comment section. See Chapter 7 for details.

5.3.2    Source Information

The final executable image for a program bears little resemblance to the source code files from which it was created. One of the principal functions of the symbol table is to track the relationship between the two so that the debugger is able to describe the resulting program in a way that the programmer can recognize.

5.3.2.1    Source Files

Much of the complication of source information stems from the "include" system. When a compilation involves several source files, there may be duplication of the header files included in each source file, or of the source files themselves. To avoid repetition of header file information in the linked object, the linker merges the input objects' included files wherever possible. Compilers mark file descriptors as mergeable or unmergeable. The linker then examines the input file descriptors and performs the merge whenever possible.

The linker considers two file descriptors to be mergeable if all of the following criteria are met:

  1. The file descriptor fMerge bit is set in both (marked as mergeable by compiler).

  2. Files have the same name.

  3. Files are written in the same language.

  4. Files contain the same number of local and auxiliary symbols.

  5. Checksums match.

    The checksums match if either of the following is true:

    1. Neither file's first auxiliary record is a btChecksum.

    2. Both files' first auxiliary record is a btChecksum and they are identical.
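The merge criteria above can be expressed as a single predicate. The sketch below uses a simplified per-file summary whose field names (other than fMerge) are hypothetical; it illustrates the listed rules and is not the linker's actual implementation.

```c
#include <assert.h>
#include <string.h>

/* Simplified per-file summary; field names are hypothetical. */
typedef struct {
    int           fMerge;        /* compiler marked the file mergeable */
    const char   *name;          /* file name */
    int           lang;          /* source language code */
    unsigned long csym;          /* number of local symbols */
    unsigned long caux;          /* number of auxiliary symbols */
    int           has_checksum;  /* first aux record is a btChecksum */
    unsigned long checksum;      /* its value, if present */
} FileSummary;

/* Criterion 5: both lack a checksum, or both have identical ones. */
static int checksums_match(const FileSummary *a, const FileSummary *b)
{
    if (!a->has_checksum && !b->has_checksum)
        return 1;
    return a->has_checksum && b->has_checksum && a->checksum == b->checksum;
}

/* Returns 1 if all five criteria hold, else 0. */
int files_mergeable(const FileSummary *a, const FileSummary *b)
{
    assert(a != NULL && b != NULL);
    return a->fMerge && b->fMerge
        && strcmp(a->name, b->name) == 0
        && a->lang == b->lang
        && a->csym == b->csym
        && a->caux == b->caux
        && checksums_match(a, b);
}
```

Note that a file with a checksum record never merges with one that lacks it, even if every other criterion is met.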

The role of the relative file descriptor (RFD) tables is to track file-relative information after merging. A relative file descriptor table entry maps the index of each file at compile time to its index after linking. After linking, local or auxiliary symbols must be accessed through the RFD table to obtain the updated file descriptor index. This mechanism is necessary because the indices in the local symbol table are not updated when files are merged.

Figure 5-4 is an example of the use of the relative file descriptor table.

Figure 5-4:  Relative File Descriptor Table Example

For a symbol reference composed of a file index and a symbol index (an offset within the file), the relative file descriptor table is used as follows:

  1. Look up the given file index in the RFD table to get the updated file index.

  2. Look up the updated file index in the (merged) file descriptor table to get the base of symbols for that file.

  3. Add the symbol index to the file's base to access the symbol entry.

See Section 5.3.7.3 for the representation of relative indices in the auxiliary symbol table.
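The three lookup steps can be sketched as follows. The FileDesc type is a simplified stand-in that carries only the symbol base (named after the FDR.isymbase field mentioned earlier), the RFD table is modeled as a plain array of indices, and resolve_symbol is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Only the file descriptor field used here; all others are omitted. */
typedef struct {
    unsigned long isymbase;   /* first local symbol belonging to the file */
} FileDesc;

/*
 * rfd: relative file descriptor table of the referencing file
 *      (compile-time file index -> post-link file index)
 * fdt: merged file descriptor table
 * Returns the index of the symbol in the local symbol table.
 */
unsigned long resolve_symbol(const unsigned long *rfd, const FileDesc *fdt,
                             unsigned long file_index,
                             unsigned long sym_index)
{
    unsigned long merged;

    assert(rfd != NULL && fdt != NULL);
    merged = rfd[file_index];                 /* step 1 */
    return fdt[merged].isymbase + sym_index;  /* steps 2 and 3 */
}
```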

5.3.2.2    Line Number Information

For a debugger to be effective, a connection must be made between high-level-language statements in source files and the executable machine instructions in object files. Line number entries map executable instructions to source lines. This mapping allows a debugger to present to a programmer the line of source code that corresponds to the code being executed. The line number information is produced by the compiler and should be rewritten if an application such as an instrumentation tool or an optimizer modifies code.

Line number information is emitted in two forms, one found in the line number table and one in the optimization symbol table (see Section 5.3.3).

The line number information found in the optimization symbol table is referred to as ESLI (extended source location information). This is a new form of line number that augments the information in the line number table. ESLI will only be present for procedures that cannot be described accurately by entries in the line number table.


Version Note

In symbol table formats less than V3.13, line number information is found exclusively in the line number table.


5.3.2.2.1    The Line Number Table

Line number information is generated for each source file that contributes executable code to a program. Within each source file, line numbers are organized by procedure, in the order of appearance in the file. The line number symbol table section is produced only when a program is compiled with limited or greater symbolic information (see Section 5.3.1.1).

Figure 5-5 illustrates the organization of the line number table.

Figure 5-5:  Line Number Table

The order outlined in Figure 5-5 is not guaranteed to match the ordering of file descriptors or procedure descriptors in those tables. The starting offset for a procedure's line table entries can be computed by adding the procedure descriptor's cbLineOffset to the containing file descriptor's cbLineOffset. The count of line number entries for a specific procedure can only be determined by finding the starting offset of the next procedure's entries in the line number table. This calculation is illustrated by the proc_pline_count() function in the packed line number programming example in Section 10.1.
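The offset arithmetic described above can be sketched as follows. This is not the proc_pline_count() function from Section 10.1. It uses simplified descriptor types, assumes the file descriptor also records the size in bytes of the file's line data (the cbLine field here is an assumption), and assumes the procedure array contains no alternate entry points.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified descriptors; only the fields used by this sketch. */
typedef struct {
    unsigned long cbLineOffset;  /* file's offset in the line number table */
    unsigned long cbLine;        /* assumed: bytes of line data for the file */
} FileDesc;

typedef struct {
    unsigned long cbLineOffset;  /* procedure's offset within the file's data */
} ProcDesc;

/* Byte offset in the line number table of a procedure's first entry. */
unsigned long line_start(const FileDesc *fd, const ProcDesc *pd)
{
    assert(fd != NULL && pd != NULL);
    return fd->cbLineOffset + pd->cbLineOffset;
}

/*
 * Number of packed bytes belonging to procedure `which`: the distance to
 * the closest following start offset among the file's procedures, or to
 * the end of the file's line data for the last procedure.
 */
unsigned long line_bytes(const FileDesc *fd, const ProcDesc *pds,
                         size_t nproc, size_t which)
{
    unsigned long start = line_start(fd, &pds[which]);
    unsigned long end = fd->cbLineOffset + fd->cbLine;

    for (size_t i = 0; i < nproc; i++) {
        unsigned long s = line_start(fd, &pds[i]);
        if (s > start && s < end)
            end = s;
    }
    return end - start;
}
```

The search over all procedures, rather than a look at the next array element, reflects the point that line table order need not match procedure descriptor order.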

Alternate entry points have a starting line number, but they have no specific ending line number. Procedure descriptors for a procedure and each of its associated alternate entry points share a common end offset in the line number table. See Section 5.3.6.7 for more information on alternate entry points.

The line number table has two forms. The "packed" form is used in the object file. The "expanded" form is a more useful representation to programmers and can be derived algorithmically (or by API) from the packed form.

The packed line numbers are stored as bytes. Each packed entry is normally a single byte that consists of two parts: count and delta. The count is the number of instructions generated from a source line. The delta is the number of source lines between the current source line and the previous one that generated executable instructions.

Figure 5-6 shows how these two values are represented.

Figure 5-6:  Line Number Byte Format

The four-bit count is interpreted as an unsigned value between 1 and 16 (0 means 1, 1 means 2, and so forth). A count of zero never needs to be encoded, because a source line that generates no instructions has no line number entry.

The four-bit delta is interpreted as a signed value in the range -7 to +7. Code generators may produce instructions that are not in the same order as the corresponding source lines. Therefore, the offset to the "next" source line may be a forward or backward jump.

Either of these quantities may fall outside the representable range. For a delta outside the range, an extended format exists (as shown in Figure 5-7). This extended format can represent delta values in the range -32768 to 32767. Delta values outside of this range are not representable. This is a permanent restriction of the packed line number format.

Figure 5-7:  Line Number 3-Byte Extended Format

For a count outside the range, one or more additional entries follow, with the delta set to zero.

If both fields are out of range, the delta is handled first. An extended-format delta representation is followed by an entry with the delta bits set to zero and the remainder of the count contained in the count value.

The packed line number format can be expanded to produce the instruction-to-source-line mapping that is needed for debugging. A sample program is provided in Section 10.1 to illustrate interpretation of packed line numbers.

The following source listing of a file named lines.c provides an example that shows how the compiler assigns line numbers:

1   #include <stdio.h>
2   main()
3   {
4       char c;
5
6       printf("this program just prints input\n");
7       for (;;) {
8          if ((c =fgetc(stdin)) != EOF) break;
9       /*   this is a greater than 7-line comment
10           * 1
11           * 2
12           * 3
13           * 4
14           * 5
15           * 6
16           * 7
17           */
18           printf("%c", c);
19      } /* end for */
20  } /* end main */

The compiler generates line numbers only for lines 2, 6, 8, 18, 19, and 20; the remaining lines are blank, contain only comments, or generate no instructions of their own.

Table 5-10 shows the packed entries' interpretation for each source line.

Table 5-10:  Line Number Example

Source Line LINER contents Interpretation
2 03 Delta 0, count 4
6 44 Delta 4, count 5
8 29 Delta 2, count 10
18 (1) 88 00 0a Delta 10, count 9
19 10 Delta 1, count 1
20 14 Delta 1, count 5

Table Note:

  1. Extended format (delta is greater than 7 lines).
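The byte values in Table 5-10 can be expanded with a short routine such as the one below. This is a sketch, not the sample program from Section 10.1. The nibble layout (delta in the high four bits, count in the low four bits), the use of a delta nibble of -8 as the escape to the extended format, and the high-byte-first order of the 16-bit delta are inferred from the table entries; the starting line is assumed to be supplied by the caller, and expand_lines is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Expands packed line number bytes into one source line per instruction.
 * start_line is the line in effect before the first entry.
 * Returns the number of instructions written, or 0 if `out` is too
 * small or the input ends in the middle of an extended entry.
 */
size_t expand_lines(const unsigned char *packed, size_t len,
                    long start_line, long *out, size_t out_cap)
{
    size_t n = 0, i = 0;
    long line = start_line;

    assert(packed != NULL && out != NULL);
    while (i < len) {
        unsigned char b = packed[i++];
        long delta = b >> 4;                    /* high nibble: delta */
        size_t count = (size_t)(b & 0x0f) + 1;  /* low nibble: count - 1 */

        if (delta >= 8)
            delta -= 16;                        /* sign-extend four bits */
        if (delta == -8) {                      /* escape: 16-bit delta */
            if (i + 2 > len)
                return 0;
            delta = ((long)packed[i] << 8) | packed[i + 1];
            if (delta >= 0x8000)
                delta -= 0x10000;
            i += 2;
        }
        line += delta;
        if (n + count > out_cap)
            return 0;
        while (count-- > 0)
            out[n++] = line;
    }
    return n;
}
```

Given the bytes 03 44 29 88 00 0a 10 14 and a starting line of 2, the routine assigns lines 2, 6, 8, 18, 19, and 20 with the counts shown in the Interpretation column of Table 5-10. A count larger than 16 needs no special handling: the follow-on entries carry a delta of zero, so the line simply stays the same.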

The compiler generates the following instructions for the example program:

  [lines.c:   2] 0x0:     ldah     gp, 1(t12)
  [lines.c:   2] 0x4:     lda      gp, -32592(gp)
  [lines.c:   2] 0x8:     lda      sp, -16(sp)
  [lines.c:   2] 0xc:     stq      ra, 0(sp)
  [lines.c:   6] 0x10:    ldq      a0, -32720(gp)
  [lines.c:   6] 0x14:    ldq      t12, -32728(gp)
  [lines.c:   6] 0x18:    jsr      ra, (t12), printf
  [lines.c:   6] 0x1c:    ldah     gp, 1(ra)
  [lines.c:   6] 0x20:    lda      gp, -32620(gp)
  [lines.c:   8] 0x24:    ldq      a0, -32736(gp)
  [lines.c:   8] 0x28:    ldq      t12, -32744(gp)
  [lines.c:   8] 0x2c:    jsr      ra, (t12), fgetc
  [lines.c:   8] 0x30:    ldah     gp, 1(ra)
  [lines.c:   8] 0x34:    lda      gp, -32640(gp)
  [lines.c:   8] 0x38:    and      v0, 0xff, t0
  [lines.c:   8] 0x3c:    stq      v0, 8(sp)
  [lines.c:   8] 0x40:    xor      t0, 0xff, t0
  [lines.c:   8] 0x44:    bne      t0, 0x6c
  [lines.c:  18] 0x48:    ldq      t2, 8(sp)
  [lines.c:  18] 0x4c:    sll      t2, 0x38, t2
  [lines.c:  18] 0x50:    sra      t2, 0x38, a1
  [lines.c:  18] 0x54:    ldq      a0, -32752(gp)
  [lines.c:  18] 0x58:    ldq      t12, -32728(gp)
  [lines.c:  18] 0x5c:    jsr      ra, (t12), printf
  [lines.c:  18] 0x60:    ldah     gp, 1(ra)
  [lines.c:  18] 0x64:    lda      gp, -32688(gp)
  [lines.c:  19] 0x68:    br       zero, 0x24
  [lines.c:  20] 0x6c:    bis      zero, zero, v0
  [lines.c:  20] 0x70:    ldq      ra, 0(sp)
  [lines.c:  20] 0x74:    lda      sp, 16(sp)
  [lines.c:  20] 0x78:    ret      zero, (ra), 1
  [lines.c:  20] 0x7c:    call_pal halt

After the packed line numbers are expanded, odump with the -l option displays the following instruction-to-source mapping (each entry has the form instruction number, period, source line number):

           0.    2         1.    2         2.    2
           3.    2         4.    6         5.    6
           6.    6         7.    6         8.    6
           9.    8        10.    8        11.    8
          12.    8        13.    8        14.    8
          15.    8        16.    8        17.    8
          18.   18        19.   18        20.   18
          21.   18        22.   18        23.   18
          24.   18        25.   18        26.   19
          27.   20        28.   20        29.   20
          30.   20        31.   20

Header files included in an object have no associated line numbers recorded in the symbol table. Line number information for included files containing source code is not supported by the packed line number format. The following section describes a more comprehensive line number representation that includes line number information for header files.

5.3.2.2.2    Extended Source Location Information (ESLI)


Version Note

ESLI is supported for symbol table format V3.13 and greater.


The line number table does not correctly describe optimized code or programs with nontraditional source files, resulting in images that are difficult to debug. Extended Source Location Information (ESLI) is intended to provide more information to enable debugging of optimized programs, including PC and line number changes, file transitions, and line and column ranges. ESLI is essentially a superset of the older line number table.

ESLI is stored in the optimization symbols section. This information is accessible on a per-procedure basis from the procedure descriptors. See Section 5.3.3 for more detail on accessing information in the optimization symbols section.

ESLI is a byte stream that can be interpreted in two modes: data mode or command mode. Currently, two formats are defined for data mode. These are designated as "Data Mode 1" and "Data Mode 2". Additional data modes may be defined as needed.

Figure 5-8:  ESLI Data Mode Bytes

Data Mode 1 is the initial mode for a procedure's ESLI. Data Mode 1 is identical to the packed line number format with the exception of the interpretation of the delta PC escape value 0x80 (which indicates a switch to command mode).

In Data Mode 2, each entry consists of two bytes. The first byte is identical to the encoding and interpretation of Data Mode 1. The second byte is an absolute column number (from 0 to 255), where column number 0 indicates that column information is missing or not meaningful for this entry. The escape from Data Mode 2 to command mode consists of a delta PC escape value set to 0x80 and column number set to 0.

In command mode, each byte is either a command or a command parameter. For a command byte, the low-order six bits are a command code, and the two high bits are used as flags, as shown in Figure 5-9. The "mark" flag, if set, announces that a new state has been established. Several commands may be required to fully describe a new state. The "resume" flag, if set, indicates the end of command mode. The next byte following a command with "resume" set will be a data mode byte. The effective data mode can be changed by SET_DATA_MODE commands in command mode; otherwise, the data mode that was in effect prior to the escape to command mode is resumed. See Table 5-11 for a complete list of commands.

Figure 5-9:  ESLI Command Byte

Command parameters are stored in LEB (Little Endian Byte) 128 format. See Section 1.4.6 for a description of this data representation. PC deltas are always expressed as machine instruction offsets and must be scaled by the size of a machine instruction before adding to the current PC. No other deltas need to be scaled.
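The following sketch decodes unsigned (LEB) and signed (SLEB) parameters. It assumes that the representation described in Section 1.4.6 matches the common LEB128 scheme: seven payload bits per byte, least significant group first, with the high bit set on every byte except the last. The function names are hypothetical, and the 4-byte instruction size used for scaling is taken from the address spacing in the instruction listing earlier in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes an unsigned LEB128 value; returns the number of bytes consumed. */
size_t leb_decode(const unsigned char *p, uint64_t *value)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    unsigned char b;

    assert(p != NULL && value != NULL);
    do {
        b = p[n++];
        if (shift < 64)
            result |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    *value = result;
    return n;
}

/* Decodes a signed LEB128 value; returns the number of bytes consumed. */
size_t sleb_decode(const unsigned char *p, int64_t *value)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    unsigned char b;

    assert(p != NULL && value != NULL);
    do {
        b = p[n++];
        if (shift < 64)
            result |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    if (shift < 64 && (b & 0x40))
        result |= ~(uint64_t)0 << shift;   /* sign-extend from last group */
    *value = (int64_t)result;
    return n;
}

/* PC deltas count instructions; each instruction is assumed to be 4 bytes. */
int64_t scale_pc_delta(int64_t delta)
{
    return delta * 4;
}
```

Only PC deltas pass through scale_pc_delta; line and column parameters are used as decoded.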

Table 5-11 shows how to interpret the bytes in command mode. These definitions can be found in the system header file linenum.h.

Table 5-11:  ESLI Commands

Name               Value   Parameters by Type
ADD_PC             1       SLEB
ADD_LINE           2       SLEB
SET_COL            3       LEB
SET_FILE           4       LEB
SET_DATA_MODE      5       LEB
ADD_LINE_PC        6       SLEB, SLEB
ADD_LINE_PC_COL    7       SLEB, SLEB, LEB
SET_LINE           8       LEB
SET_LINE_COL       9       LEB, LEB
SEQUENCE_BREAK     10      SLEB
SET_EXP            11      LEB

ADD_PC

Parameter is a signed value to add to the current PC value.

ADD_LINE

Parameter is a signed value to add to the current line number.

SET_COL

Parameter is an unsigned value that represents a new column number. The column number is used to associate the PC with a particular location within a source line. Column number parameters use a zero-based representation that must be adjusted by adding 1.

SET_FILE

Parameter is an unsigned value used to switch file context. This command is typically followed by a SET_LINE command.

SET_DATA_MODE

Parameter is an unsigned value used to set the data mode that will be in effect when data mode is resumed. The only parameter values that are currently accepted are 1 and 2. Additional data modes may be defined in future releases.

ADD_LINE_PC

Both parameters are signed values. The first is added to the line number and the second is added to the PC.

ADD_LINE_PC_COL

The first two parameters are signed values and the third is an unsigned value. The first two are added to the line number and PC respectively. The third is used to set the column number.

SET_LINE

Parameter is an unsigned value that sets the current line number.

SET_LINE_COL

Both parameters are unsigned values. The first represents the line number and the second represents the column number.

SEQUENCE_BREAK

Indicates the end of a contiguous sequence of address descriptions. The value of the parameter is added to the current address, and the resulting address becomes the starting address of the next sequence of address descriptions. The current file and line number continue to apply as the current values for the new sequence as well. (These can, however, be changed using the appropriate commands.)


Version Note

The SEQUENCE_BREAK command is supported in Tru64 UNIX V5.1 and greater for symbol table format V3.13 and greater.


SET_EXP

Set exponent for Tandem edit line numbers. The value of the parameter is an unsigned integer from 0 through 7 representing a power of 10 from -3 through 4.


Version Note

The SET_EXP command is reserved for use on Tandem big-endian systems. It is not supported on Tru64 UNIX.


A tool reading the ESLI must maintain the current PC value, file number, line number, and column. Taken together, these four values represent the current "state". Consumers must also keep track of the mode in effect to interpret the data properly. A sample program is provided in Section 10.2 to illustrate consumption of ESLI.

Data encoded in ESLI can be represented in tabular format. The PC value and file, line, and column numbers can be stored as a state table. The following example shows how to build this state table.

In this example ESLI will record line numbers for a routine that includes text from a header file.

Source listing for line1.c:

1   /* ESLI example using included source lines */
2   
3   main() {
4      char *msg;
5   
6      msg = (char *)0;
7   
8   #include "line2.h"
9   
10     printf("%s", msg);
11  }

Source listing for line2.h:

1   msg = (char *)malloc(20);
2   /*
3    *
4    *
5    *
6    *
7    *
8    *
9    *
10   */
11  strcpy(msg, "Hello\n");

The compiler generates the following instructions for the example program:

      main:
[line1.c:   3] 0x1200011d0:     ldah    gp, 8192(t12)
[line1.c:   3] 0x1200011d4:     lda     gp, 28336(gp)
[line1.c:   3] 0x1200011d8:     lda     sp, -16(sp)
[line1.c:   3] 0x1200011dc:     stq     ra, 0(sp)
[line1.c:   3] 0x1200011e0:     stq     s0, 8(sp)
[line1.c:   6] 0x1200011e4:     bis     zero, zero, s0
[line2.h:   1] 0x1200011e8:     bis     zero, 0x14, a0
[line2.h:   1] 0x1200011ec:     ldq     t12, -32560(gp)
[line2.h:   1] 0x1200011f0:     jsr     ra, (t12)
[line2.h:   1] 0x1200011f4:     ldah    gp, 8192(ra)
[line2.h:   1] 0x1200011f8:     lda     gp, 28300(gp)
[line2.h:   1] 0x1200011fc:     bis     zero, v0, s0
[line2.h:  11] 0x120001200:     bis     zero, s0, a0
[line2.h:  11] 0x120001204:     lda     a1, -32768(gp)
[line2.h:  11] 0x120001208:     ldq     t12, -32600(gp)
[line2.h:  11] 0x12000120c:     jsr     ra, (t12)
[line2.h:  11] 0x120001210:     ldah    gp, 8192(ra)
[line2.h:  11] 0x120001214:     lda     gp, 28272(gp)
[line1.c:  10] 0x120001218:     ldq_u   zero, 0(sp)
[line1.c:  10] 0x12000121c:     lda     a0, -32760(gp)
[line1.c:  10] 0x120001220:     bis     zero, s0, a1
[line1.c:  10] 0x120001224:     ldq     t12, -32552(gp)
[line1.c:  10] 0x120001228:     jsr     ra, (t12)
[line1.c:  10] 0x12000122c:     ldah    gp, 8192(gp)
[line1.c:  10] 0x120001230:     lda     gp, 28244(gp)
[line1.c:  11] 0x120001234:     bis     zero, zero, v0
[line1.c:  11] 0x120001238:     ldq     ra, 0(sp)
[line1.c:  11] 0x12000123c:     ldq     s0, 8(sp)
[line1.c:  11] 0x120001240:     lda     sp, 16(sp)
[line1.c:  11] 0x120001244:     ret     zero, (ra)

The ESLI and its interpretation for the generated code are shown in the following table.

Table 5-12:  ESLI Example

M = mark flag, R = resume flag (command flags); F = file, L = line, C = column (state)

ESLI bytes (hex)          Mode   Code               M  R  PC (hex)   F  L   C
Initial State (from PDR)  Data1                           1200011d0  0  3   0
04                        Data1                           1200011e4  0  3   0
30                        Data1                           1200011e8  0  6   0
80                        Data1  Escape
04 01                     Cmd    set_file(1)                         1
48 01                     Cmd    set_line(1)           R                1
05                        Data1                           120001200  1  1   0
80                        Data1  Escape
86 0a 06                  Cmd    add_line_pc(10,6)  M     120001218  1  11  0
04 00                     Cmd    set_file(0)                         0
48 0a                     Cmd    set_line(10)          R                10
06                        Data1                           120001234  0  10  0
16                        Data1                           120001250  0  11  0
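The state transitions in Table 5-12 can be replayed with a small interpreter. The following C sketch is not the sample program from Section 10.2, and all names in it are hypothetical. It rests on several assumptions: a data byte holds a signed line delta in its high four bits and an instruction count minus one in its low four bits (inferred from Table 5-12), the resume flag is bit 0x40 of a command byte, instructions are four bytes long as in the listing above, and column parameters are adjusted by adding 1. It does not handle SEQUENCE_BREAK, SET_EXP, or the extended line delta form of the packed format, and it performs no bounds checking on command parameters.

```c
#include <stddef.h>
#include <stdint.h>

#define INSN_SIZE 4   /* bytes per machine instruction in the listing */

struct esli_state {
    uint64_t pc;
    long file, line, col;
    int data_mode;    /* 1 or 2 */
};

/* ESLI bytes from Table 5-12, in order. */
static const uint8_t table_5_12_bytes[] = {
    0x04, 0x30, 0x80, 0x04, 0x01, 0x48, 0x01, 0x05, 0x80,
    0x86, 0x0a, 0x06, 0x04, 0x00, 0x48, 0x0a, 0x06, 0x16
};

/* Read one LEB128 value and advance *pp; sign-extend when is_signed. */
static int64_t get_leb(const uint8_t **pp, int is_signed)
{
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t b;
    do {
        b = *(*pp)++;
        if (shift < 64)
            v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    if (is_signed && shift < 64 && (b & 0x40))
        v |= ~(uint64_t)0 << shift;
    return (int64_t)v;
}

/* Replay an ESLI byte stream, updating *st. Returns 0 on success and
 * -1 on input that this sketch does not handle. */
static int esli_replay(const uint8_t *p, const uint8_t *end,
                       struct esli_state *st)
{
    int command_mode = 0;
    while (p < end) {
        if (!command_mode) {
            uint8_t b = *p++;
            uint8_t col = 0;
            int delta = b >> 4;
            if (st->data_mode == 2 && p < end)
                col = *p++;               /* Data Mode 2: column byte */
            if (delta == 8) {             /* escape nibble */
                if ((b & 0x0f) != 0 || col != 0)
                    return -1;            /* extended form: not handled */
                command_mode = 1;
                continue;
            }
            if (delta > 7)
                delta -= 16;              /* sign-extend 4-bit delta */
            st->line += delta;
            st->pc += (uint64_t)((b & 0x0f) + 1) * INSN_SIZE;
            if (st->data_mode == 2)
                st->col = col;
            continue;
        }
        uint8_t cmd = *p++;
        switch (cmd & 0x3f) {
        case 1: st->pc += (uint64_t)(get_leb(&p, 1) * INSN_SIZE); break;
        case 2: st->line += (long)get_leb(&p, 1); break;
        case 3: st->col = (long)get_leb(&p, 0) + 1; break;
        case 4: st->file = (long)get_leb(&p, 0); break;
        case 5: st->data_mode = (int)get_leb(&p, 0); break;
        case 6:
        case 7:
            st->line += (long)get_leb(&p, 1);
            st->pc += (uint64_t)(get_leb(&p, 1) * INSN_SIZE);
            if ((cmd & 0x3f) == 7)
                st->col = (long)get_leb(&p, 0) + 1;
            break;
        case 8: st->line = (long)get_leb(&p, 0); break;
        case 9:
            st->line = (long)get_leb(&p, 0);
            st->col = (long)get_leb(&p, 0) + 1;
            break;
        default:
            return -1;                    /* SEQUENCE_BREAK, SET_EXP, unknown */
        }
        if (cmd & 0x40)                   /* resume flag: back to data mode */
            command_mode = 0;
    }
    return 0;
}
```

Starting from the initial state in the table (PC 0x1200011d0, file 0, line 3, column 0, Data Mode 1), replaying all of table_5_12_bytes leaves the state at the last row of Table 5-12.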

The handling of alternate entry points differs from the handling of main entry points. Procedure descriptors for alternate entry points are identified by a PDR.lnHigh value of -1. If the PC for an instruction maps to an alternate entry point, the following steps should be taken:

5.3.3    Optimization Symbols


Version Note

Optimization symbols are supported for symbol table format V3.13 and greater.


The optimization symbols section gives individual producers and consumers the ability to communicate information about any aspect of the object file, in any form they choose. New information can be generated at any time with minimal coordination between all producers and consumers.

The optimization section is organized on a per-procedure basis. Each procedure descriptor has a pointer to the optimization symbols in the field PDR.iopt. If no optimization symbols are associated with the procedure, the field contains ioptNil. Otherwise, it contains the index of the first optimization symbol entry for this procedure. Consumers should access the optimization symbols through the procedure descriptors. The optimization section is not present in a locally-stripped object.

This section consists of a sequence of zero or more Per-Procedure Optimization Descriptions (PPODs), as shown in Figure 5-10. Each PPOD's internal structure consists of two parts:

  1. A leading sequence of structured entries using a Tag-Length-Value model to describe subsequent raw data. The structure of the PPOD entry can be found in Section 5.2.10.

  2. The raw data area.

Figure 5-10:  Optimization Symbols Section

This section has the following alignment requirements:

Object file producers must produce either an empty optimization symbols section or a valid one. An empty one has the symbolic header fields cbOptOffset and ioptMax set to zero. If an optimization section is present, but a particular file does not contribute to it, the file descriptor field copt is set to zero. In this case, all procedure descriptors belonging to the file must have their iopt fields set to ioptNil.

Tools that both read and write object files must consume a valid optimization symbols section (if present in the input file) and produce an equivalent and valid section in their output files. If a tool does not know how to process the section contents, the section must be omitted from the output file. If a tool does know how to process portions of the optimization symbols, those portions may be modified and the rest should be removed. The linker concatenates input optimization symbols sections into one output section without reading or modifying any of the entries.

The format and flexible nature of this section are similar by design to the .comment section. The structures are the same size and contain the same fields (with different names), and the rules of navigation are the same. The primary difference is that the optimization section contains procedure-specific information, whereas the comment section contains object-specific information.

5.3.4    Run-Time Information

The symbol table contains information that debuggers must interpret to find symbols at run time. This section describes the information that the static symbol table structures provide. Algorithms for determining run-time symbol addresses are included.

5.3.4.1    Procedure Addresses

The following pseudocode describes an algorithm for determining the procedure start address:

if (HDRR.vstamp >= 0x30D || PDR.isym == isymNil) 
    return(PDR.adr)
else
    foreach FDR in HDRR
        foreach PDR in FDR
            if PDR matches
                if (FDR.csym == 0)  /* Use external symbol */
                    return (EXTR[PDR.isym].asym.value)
                else                /* Use local symbol */
                    return (SYMR[FDR.isymbase + PDR.isym].value)

If local symbol information is present for the given PDR, the isym field identifies the local symbol table entry that contains the start address of the procedure. If no local symbol information is present, the isym field identifies the external symbol table entry containing the start address of the procedure. If no symbol information is present for the PDR, the isym field is set to isymNil and the adr field will contain a reliable start address.


Version Note

The PDR.adr field is reliably updated by the linker for symbol table format V3.13 and greater. The preceding algorithm is recommended for determining procedure addresses in symbol table formats earlier than V3.13.


5.3.4.2    Stack Frames

A stack frame is a run-time memory structure that is created whenever a procedure is called. The Calling Standard for Alpha Systems specifies the stack frame format and related code requirements. This section explains how to interpret procedure descriptor fields related to the stack frame.

Two types of stack frames are supported: fixed-size frames and variable-size frames. The variable frame format is used for procedures that dynamically allocate memory and for those with very large frames. Figure 5-11 shows a fixed-size frame and Figure 5-12 shows a variable-size frame.

From the procedure descriptor, you can determine which type of stack frame the procedure has. The field PDR.framereg stores the frame pointer register number. If this field has a value of 30 ($sp), the stack frame is a fixed-size frame. If it has a value of 15 ($fp), the stack frame is a variable-size frame.

Figure 5-11:  Fixed-Size Stack Frame

Figure 5-12:  Variable-Size Stack Frame

For both types of stack frames, the value of PDR.frameoffset is the size of the fixed part of the stack frame. In the case of a fixed-size frame, it is the entire frame size. For a variable-size frame, the entire frame size cannot be determined from the symbol table. The code may dynamically increase and decrease the size of the frame multiple times during procedure execution.

The virtual frame pointer represents the contents of the frame pointer register at procedure entry, prior to prologue execution. The (real) frame pointer is the contents of the frame pointer register after prologue execution. The difference between the virtual and real frame pointer values is the fixed frame size, which is subtracted from the $sp contents during the procedure prologue. Note that stack offsets recorded in the symbol table are relative to the virtual frame pointer, not the real value used at run time.

The contents of the frame pointer register are used at run time as the base address for accessing data, such as parameters and local variables, on the stack. See Section 5.3.4.3 for details.

5.3.4.3    Local Symbol Addresses

Local variables and parameters may be stored in registers or on the stack. Those stored in registers (identified by a storage class of scRegister) do not have addresses. For local variables and parameters with addresses, this section explains how to calculate their run-time locations from the symbol table information.

To calculate the run-time address for a local variable (stLocal) based on its symbol table value:

Frame pointer - PDR.localoff + SYMR.value

To calculate the run-time address for a parameter (stParam) based on its symbol table value:

Frame pointer - argument_home_area_size + SYMR.value

The argument home area is a portion of the stack frame designated for parameter storage. See Figure 5-11 for an illustration. For historical reasons, the size of this area is always 48 bytes.

The calculations above must be performed at run time when the actual frame pointer value is known. Note that the value becomes valid only after the procedure prologue has executed.

To calculate the locations based on static information, convert the symbol's value to an offset from the real frame pointer:

Local:

PDR.frameoffset - PDR.localoff + SYMR.value

Parameter:

PDR.frameoffset - 48 + SYMR.value

The resulting offsets are always positive values because the frame pointer contains the address of the lowest memory in the fixed part of the stack frame at run time.
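The static conversions above can be written as a few small C helpers. This is a minimal sketch: the structure below is a hypothetical, simplified stand-in for the relevant procedure descriptor fields rather than the PDR declaration from the system headers, and the function names are illustrative only.

```c
#include <stdint.h>

#define ARG_HOME_AREA_SIZE 48   /* size of the argument home area in bytes */
#define REG_SP 30               /* $sp: fixed-size frame */
#define REG_FP 15               /* $fp: variable-size frame */

/* Hypothetical subset of the procedure descriptor used in this sketch. */
struct pdr_frame_info {
    int     framereg;      /* frame pointer register number */
    int64_t frameoffset;   /* size of the fixed part of the frame */
    int64_t localoff;      /* PDR.localoff */
};

/* A frame is fixed-size when the frame pointer register is $sp. */
static int is_fixed_size_frame(const struct pdr_frame_info *pdr)
{
    return pdr->framereg == REG_SP;
}

/* Offset of a local variable (stLocal) from the real frame pointer. */
static int64_t local_offset_from_real_fp(const struct pdr_frame_info *pdr,
                                         int64_t sym_value)
{
    return pdr->frameoffset - pdr->localoff + sym_value;
}

/* Offset of a parameter (stParam) from the real frame pointer. */
static int64_t param_offset_from_real_fp(const struct pdr_frame_info *pdr,
                                         int64_t sym_value)
{
    return pdr->frameoffset - ARG_HOME_AREA_SIZE + sym_value;
}

/* Run-time address, valid only after the prologue has executed. */
static uint64_t address_from_real_fp(uint64_t real_fp, int64_t offset)
{
    return real_fp + (uint64_t)offset;
}
```

A debugger would compute the offset once from the static information and then add it to the frame pointer register contents it reads from the stopped process.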

5.3.4.4    Uplevel Links


Version Note

Uplevel links are supported in symbol table format V3.13 and greater.


An uplevel link is the real frame pointer of an ancestor of a nested routine. The routine nesting may be a feature of the language (such as Pascal), or the nesting may occur in optimized code which has been decomposed for parallel execution into smaller routines. Uplevel links provide debuggers a method of finding all local symbols associated with the ancestor routine.

When a procedure is passed a static link, that static link will be represented within the scope of the procedure definition as a local automatic symbol with a special name beginning with "__StaticLink.". The lifetime of this symbol begins after the procedure prologue has been executed.

The static link symbol will occur between the procedure's parameter definitions and the first stBlock symbol.

The full name of the symbol will be "__StaticLink." followed by a positive decimal integer with no leading zeros. This integer value identifies the number of levels up the ancestor tree the static link points to.

For example, a symbol named "__StaticLink.3" contains the static link of the procedure in which it is defined. That static link points to a stack frame that is three levels up in the procedure's ancestor tree, which is the frame of the procedure's great-grandfather.

Figure 5-13:  Representation of Uplevel Reference

Debuggers of Tru64 UNIX object files need to use the uplevel link information to determine which symbols are visible at a location in the program and to compute the addresses of local symbols in ancestor routines. When the debugger needs the current value or address of a name that might be defined as an uplevel reference, two separate actions may be required: finding the procedure that defines the currently visible instance of that name, and finding the address of the currently visible instance of that name. If only type information is required, finding the procedure that defines the name may be sufficient.

Finding the defining procedure is accomplished by repeatedly looking up the name in the local symbol table of a chain of procedures that extends from the current procedure through its chain of ancestors until either the name is found in a procedure or the end of the chain of ancestors is reached without finding the name. If this search terminates without finding the name, the debugger should conclude that the name is not visible by uplevel reference at the current location in the program.

When searching for the desired procedure, the debugger should count how many levels in the ancestor chain were traversed before finding the name. If zero levels were traversed, the name is defined within the current procedure and is not an uplevel reference. The number of levels traversed is assumed to be in the variable LevelsToGo in the algorithm below.

Finding the address for the name involves locating static link values and dereferencing them with appropriate offsets. Basically, while the number of levels to be traversed is greater than zero, find the static link symbol for the current level and obtain its value. Finally, add the desired symbol's offset from the real frame pointer to the final static link value.

The recommended algorithm for finding the address is as follows:

LevelsToGo = <from name lookup above>
NewProc = CurrentProcedure
NewFrame = FramePointerValue(CurrentProcedure)
Failed = false
while (LevelsToGo > 0 && !Failed)
    StaticLink = FindStaticLinkSym(NewProc)
    if (StaticLink == NULL)
        Failed = true
    else
        NewFrame = *(NewFrame + StaticLink->symbol.offset)
        Levels = StaticLinkLevels(StaticLink)
        LevelsToGo = LevelsToGo - Levels
        for (; Levels > 0; Levels--)
            NewProc = NewProc->proc.parent

If Failed is true after executing this algorithm, required information about static links is missing in the symbol table, and an error has occurred. If LevelsToGo ends up less than zero, the optimizer's static link optimization has eliminated a static link level that would be needed to compute the address of the name. It is recommended that debuggers inform the user that optimization prevents the debugger from computing the address of the name.

If Failed is false and LevelsToGo is equal to zero, the address for the currently visible instance of the name is NewFrame plus the offset of the name with respect to the real frame pointer for NewProc.

The function StaticLinkLevels returns the integer at the end of the name for the indicated static link symbol.
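A possible shape for StaticLinkLevels is sketched below in C. The function name and the convention of returning -1 for names that are not static link symbols are assumptions made for this illustration. The sketch takes the symbol name as a string, whereas the algorithm above passes the symbol itself.

```c
#include <stdlib.h>
#include <string.h>

/* Return the level count encoded in a static link symbol name such as
 * "__StaticLink.3", or -1 if the name does not have that form. */
static long static_link_levels(const char *name)
{
    static const char prefix[] = "__StaticLink.";
    const size_t prefix_len = sizeof(prefix) - 1;
    char *end;
    long levels;

    if (strncmp(name, prefix, prefix_len) != 0)
        return -1;

    /* The suffix is a positive decimal integer with no leading zeros,
     * so its first character must be a digit from 1 through 9. */
    if (name[prefix_len] < '1' || name[prefix_len] > '9')
        return -1;

    levels = strtol(name + prefix_len, &end, 10);
    if (*end != '\0')
        return -1;          /* trailing characters after the integer */

    return levels;
}
```

The same prefix test could also serve a FindStaticLinkSym routine when it scans the symbols between a procedure's parameter definitions and its first stBlock symbol.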

5.3.4.5    Finding Thread Local Storage (TLS) Symbols

This section explains how to interpret symbolic information for TLS symbols (identified by a storage class of scTlsData or scTlsBss). See Section 3.3.9 or the Programmer's Guide for general information on TLS.

A TLS symbol's value contains its offset from the start of the TLS region for that object. This offset can be used at process execution time to determine the address of the TLS symbol for a particular thread.

A debugger can calculate TLS symbol addresses by looking up the address of the TLS region using run-time structures and adding the offset of the TLS symbol to that address. The following formula can be used to calculate TLS symbol addresses.

TLS sym address = *(TEB.TSD + __tlskey) + SYMR.value

A detailed description of this formula follows:

  1. Get the address of the Thread Environment Block (TEB).

  2. Get the address of the Thread Specific Data (TSD) array from the TEB structure.

  3. Get the offset of the TLS pointer in the TSD array.

    This offset is normally stored in a .lita or .got entry. This value should be accessed using the symbol __tlskey. In spite of the fact that __tlskey is a label symbol, no ampersand is used in this context because the value that the label points to is being retrieved. The address of __tlskey will need to be adjusted by the address mapping displacement in the same manner that the debugger adjusts addresses of text and data symbols.

    For static executables, the .lita entry contains the constant offset (2048). This offset identifies the first and only TSD slot (256) that will be allocated for the TLS pointer.

    For shared objects, the .got entry labeled by __tlskey is initially 0, indicating that the TSD slot has not been allocated yet. After the object's initialization routines have run, a TSD key will be allocated and the .got entry will contain its offset.

  4. Get the TLS pointer value. The TLS pointer is a 64-bit address set to the start of the TLS Region.

  5. Calculate the address of the TLS symbol by adding the offset of the TLS symbol to the TLS pointer value.
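Steps 3 through 5 can be sketched in C as follows. This is an illustration under stated assumptions, not debugger code: the TSD array is modeled as a local byte buffer, read_u64 is a hypothetical stand-in for whatever mechanism a debugger uses to read 64 bits of process memory, and the function names are invented for this example. Locating the TEB and the TSD array (steps 1 and 2) is outside the scope of the sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical memory reader: here it copies from a local buffer. A
 * debugger would read from the address space of the target process. */
static uint64_t read_u64(const unsigned char *addr)
{
    uint64_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}

/* tsd_base:  start of the TSD array obtained from the TEB
 * tlskey:    byte offset of the TLS pointer in the TSD array, that is,
 *            the value stored in the entry labeled by __tlskey
 * sym_value: SYMR.value, the symbol's offset within the TLS region
 *
 * Returns 0 if the TSD slot has not been allocated yet (tlskey == 0). */
static uint64_t tls_symbol_address(const unsigned char *tsd_base,
                                   uint64_t tlskey, uint64_t sym_value)
{
    uint64_t tls_pointer;

    if (tlskey == 0)
        return 0;                       /* shared object not initialized */

    tls_pointer = read_u64(tsd_base + tlskey);  /* start of TLS region */
    return tls_pointer + sym_value;
}
```

For a static executable, tlskey would be the constant 2048 described above.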

TLS common symbols (scTlsCommon) should not occur in linked objects, so debuggers should not need to support them. Executables and shared libraries can only reference TLS symbols that they define, so successfully linked objects should have no TLS undefined or TLS common symbols.

5.3.5    Profile Feedback Data


Version Note

Profile feedback data is supported in symbol table format V3.13 and greater.


Profile feedback data is stored in entries in the optimization symbols table with tag type PPODE_PROFILE_INFO. The data contained in this section is intended for Compaq internal use only. It contains execution profiling feedback used by compilers and the om utility.

Profile feedback data contains relative file descriptor and local symbol table indexes. If an object tool removes, adds, or rearranges relative file descriptors or local symbol table entries, it must also remove all optimization symbol table entries, including the profile feedback data.

5.3.6    Scopes

From a user program's point of view, an identifier's scope determines its visibility in different parts of the program. Programming languages provide facilities for declaring and defining names of procedures, variables, and other program components inside various scoping levels. This section briefly discusses the concept of scope and then explains how it is represented in the symbol table. References are made to structures in the auxiliary symbol table; see Section 5.3.7.3 for details.

Generally speaking, the four main scoping levels in a program are block scope, procedure scope, file scope, and program scope. Most programming languages have constructs to implement at least these scoping levels. Figure 5-14 shows the hierarchy of these scopes.

Figure 5-14:  Basic Scopes

Names with block scope can only be referenced inside the declaring block. Blocks are delimited by begin and end markers, the syntax of which varies among languages.

Names with procedure scope are only recognized inside their enclosing subroutines. For instance, the names of formal parameters and local variables declared inside a procedure are accessible only to that procedure's executable statements.

Names with file scope can be referenced by any instruction within the file where they are declared. A file can be composed of procedures and data external to any procedure. Both external data names and procedure names can have file scope or program scope. Note that in a compilation involving only a single file or in a compilation for a programming language with no separate-compilation facilities, file scope and program scope are equivalent.

Names with program scope are visible everywhere in the program, even when the executable program is built from many source and header files. The linker must resolve these names or pass them to the dynamic loader to resolve. See Section 5.3.10 for more information about symbol resolution.

In the symbol table, procedure scope, file scope, and program scope correspond to local, static, and global symbols, respectively. Block scope names are also local symbols. Local and static symbols appear in the local symbol table, and global symbols are in the external symbol table.

5.3.6.1    Procedure Scope

Although procedure symbols can only be global or static (with symbol types stProc and stStaticProc, respectively), procedure entries appear in the local symbol table to identify the containing scope of their local data. The set of symbols appearing in the local symbol table to describe a procedure scope and their associated auxiliary entries is shown in Figure 5-15. Global procedures also have entries in the external symbol table. As illustrated, the indices of these external entries point to the scoping entries in the local symbol table.

Note

In this chapter, all diagrams of symbol table representations use arrows to show that one entry contains an index to another entry. For external and local symbol table entries, the index used is contained in the index field. For auxiliary symbols, the isym or RNDXR field is the index used. Any exceptions to this general rule are noted in the diagrams.

Figure 5-15:  Procedure Representation

A special instance of a procedure definition occurs for a procedure with no text. This type of procedure occurs only in the local symbol table and is very similar to the representation of other procedures. It is generally used for procedures that have been optimized away that still need to be represented for debugging or profiling information.

Figure 5-16:  Procedure with No Text

A procedure with no code can contain only nested procedures that also have no code associated with them. If a procedure with no code does not contain any nested procedures, the stBlock/stEnd symbol pair can be omitted from the representation.

The stProc symbol included in this representation is distinguished from similar stProc symbols by its value field that is set to addressNil (-1).


Version Note

Procedures with no code are supported in symbol table format V3.13 and greater.


5.3.6.2    File Scope

As in the case of procedures, file name entries appear in the local symbol table to define the file's scope. This representation is shown in Figure 5-17. Note that file symbols appear in the local symbol table only.

Figure 5-17:  File Representation

5.3.6.3    Block Scope

In general, the local symbol table denotes scoping levels with stBlock and stEnd pairs, as shown in Figure 5-18.

All symbols contained between these two entries belong to the scope they describe. Nested blocks are possible, and stEnd symbols match the most recent occurrences of stBlock (or other opening symbol entries such as stProc or stTag).
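The matching rule for nested scopes can be illustrated with a small stack-based pass over a symbol sequence. This is a sketch only: the enumeration below is a hypothetical classification of symbols into scope-opening entries (such as stBlock, stProc, or stTag), stEnd entries, and everything else. It does not use the st* constants from the system headers, and the function name and depth limit are assumptions.

```c
#include <stddef.h>

/* Hypothetical classification of local symbol table entries. */
enum sym_kind {
    SK_OTHER,   /* any symbol that neither opens nor closes a scope */
    SK_OPEN,    /* scope-opening entry, for example stBlock or stProc */
    SK_END      /* stEnd */
};

#define MAX_DEPTH 64

/* For each symbol i, owner[i] receives the index of the opening entry
 * whose scope directly contains it, or -1 at the outermost level. For
 * an SK_END entry, owner[i] is the index of the opening entry that it
 * closes. Returns 0 on success, or -1 if the entries are unbalanced or
 * nested more deeply than MAX_DEPTH. */
static int match_scopes(const enum sym_kind *syms, size_t n, long *owner)
{
    size_t stack[MAX_DEPTH];
    size_t depth = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (syms[i] == SK_END) {
            if (depth == 0)
                return -1;              /* stEnd with no open scope */
            owner[i] = (long)stack[--depth];
            continue;
        }
        owner[i] = depth ? (long)stack[depth - 1] : -1;
        if (syms[i] == SK_OPEN) {
            if (depth == MAX_DEPTH)
                return -1;
            stack[depth++] = i;         /* most recent open scope */
        }
    }
    return depth == 0 ? 0 : -1;
}
```

Each SK_END entry pops the most recently opened scope, which mirrors the rule that an stEnd symbol matches the most recent unmatched opening entry.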

Figure 5-18:  Block Representation

Block scopes occur in many languages. In C, they take the form of lexical blocks. In C++, declarations can occur anywhere in the code. In Pascal and Ada, nested procedures are possible, with local variables at any or all levels.

5.3.6.4    Namespaces (C++)


Version Note

Namespaces are supported in symbol table format V3.13 and greater.


A C++ namespace is a mechanism that allows the partitioning of the program global name space. This partitioning is intended to reduce name clashing and provide greater program manageability to C++ developers.

Figure 5-19:  C++ Namespace Representation

A namespace definition may exist only at the global scope or within another namespace. The namespace representation in Figure 5-19 shows a single contribution to a namespace. This representation may be replicated many times in the symbol table for a single namespace. A namespace definition may be continued within the same file or over multiple source files.

A single namespace contribution that spans multiple source files is represented as if it were contained entirely within the source file in which it began.

Namespaces may be aliased, allowing a single namespace to be referred to by multiple names. Namespace components may also be referenced without their namespace qualification if they are included within a scope by a using directive or using declaration. The representations of namespace aliases, using directives, and using declarations are shown in Figure 5-19. Namespace definitions, namespace component declarations, namespace aliases, using directives, and using declarations occur only in the local symbol table. Namespace component definitions may occur in the local or external symbol table.

5.3.6.4.1    Namespace Components

The components of a namespace are represented in two parts: declarations and definitions. Namespace components that do not require definition must be declared in the namespace definition. Namespace components that are referenced by a using declaration must be declared in the namespace definition. All other namespace component declarations may be omitted from the namespace definition.

Namespace component names are mangled only as needed. Function and data definitions have mangled name definitions in the local or external symbol table. These entries are mangled for type-safe linkage and as a method of matching components with the namespaces to which they belong. Names of component declarations within a namespace definition may or may not be mangled. They are not required to include the namespace name in their mangled form.

Empty namespace contributions can be omitted, but at least one instance of a namespace definition must occur somewhere in the local symbol table. This definition is required because name mangling rules do not distinguish namespace component definitions from class member definitions.

5.3.6.4.2    Namespace Aliases

Namespace aliases can occur in namespace, file, procedure, or block scope in the local symbol table. The index value for the stAlias entry is an auxiliary table index. The auxiliary entry is a RNDXR record containing the local symbol table index of the stNamespace symbol in the first instance of a namespace definition within a compilation unit. For an alias of an alias, the RNDXR record can also contain the index of another stAlias symbol in the local symbol table. Section 9.2.5 provides an example of a namespace alias.

The stAlias symbol type may be used in future versions of the symbol table format as a general purpose symbol alias representation. The semantic interpretation of the stAlias symbol depends on the type of the symbol it aliases.

5.3.6.4.3    Unnamed Namespace

An unnamed namespace can be declared at the global scope or within another namespace. An unnamed namespace is unique within a compilation unit. Multiple contributions to a unique unnamed namespace are not allowed. Unnamed namespace contributions are included in the non-mergeable portion of a C++ header file.

Unnamed namespace components are subject to the same rules as named namespaces for declarations and definitions.

The stNamespace symbol for an unnamed namespace has a compiler generated name starting with __N1. This same name is used to identify the unnamed namespace in the mangled names of components of that namespace. (See the unnamed namespace example in Section 9.2.4.)

5.3.6.4.4    Usage of Namespaces

A C++ using directive or a using declaration is represented by a symbol of type stUsing. It may occur in any scope in the local symbol table. The index value for the stUsing entry is an auxiliary table index. If the stUsing entry represents a using declaration for a single namespace component, the auxiliary entry is a RNDXR record containing the local symbol table index of a namespace component declaration. If the stUsing entry represents a using directive, its RNDXR auxiliary contains the local symbol table index of the stNamespace symbol in the first definition of that namespace in the compilation unit.

A using directive for a namespace alias is represented with a RNDXR auxiliary that directly references the aliased namespace. This representation contains no record of the alias referenced by the using directive.

Names are not required for stUsing entries, but they can be set to match the namespace or namespace component to which they refer.

Namespace components that are referenced by an stUsing symbol must be declared in the namespace definition.

Section 9.2.3 provides an example of namespace definitions and uses.

5.3.6.5    Exception Handling Blocks (C++)

In C++, a special scoping mechanism is introduced to expand user-defined exception-handling capabilities. Exception handlers are defined to "catch" exceptions that are "thrown" by other functions. The symbol table must contain sufficient information to recognize the scope of a handler. The compiler generates special symbols to identify where exception handlers are valid.

Figure 5-20:  C++ Exception Handler Representation

5.3.6.6    Fortran Common Blocks

Fortran common blocks constitute another scoping level. Fortran uses common blocks as a way of specifying data that is global or shared between program units. A common block is global storage that can be named, allocated, accessed, and used by various subroutines. The block can be named or unnamed; unnamed blocks are known as "blank commons". Internal to the symbol table, blank commons are named _BLNK__.

Figure 5-21 shows the symbolic representation of Fortran common blocks.

Figure 5-21:  Fortran Common Block Representation

Because a Fortran common is represented as a synthesized file, it also has an entry in the file descriptor table. Furthermore, a global symbol with the same name is also present in the external symbol table.

An example of a Fortran common block can be found in Section 9.3.1.

5.3.6.7    Alternate Entry Points

Fortran also has a facility for creating alternate entry points in procedures. An alternate entry point is represented using an stProc/scText symbol. In the procedure descriptor table, an alternate entry point is identified by a lnHigh field with a value of -1. Procedure descriptors for alternate entry points follow the procedure descriptor for the primary entry point. In the local symbol table, an alternate entry point has an entry inside the scope of the procedure's primary entry.
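The procedure descriptor convention can be sketched in C as follows. SketchPDR is a simplified stand-in that carries only the lnHigh field, and both function names are hypothetical; the sketch assumes that every alternate entry is preceded somewhere in the table by its primary entry, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a procedure descriptor. Only the field needed
 * by this sketch is present; the real descriptor has many more. */
typedef struct {
    long lnHigh;   /* -1 marks an alternate entry point */
} SketchPDR;

/* Returns nonzero if the descriptor describes an alternate entry point. */
int is_alternate_entry(const SketchPDR *pdr)
{
    assert(pdr != NULL);
    return pdr->lnHigh == -1;
}

/* Starting at descriptor i, walk backward past alternate entries to the
 * primary entry that precedes them. Returns the index of the primary
 * entry, or -1 if no primary entry is found. */
long primary_entry_index(const SketchPDR *table, long count, long i)
{
    assert(table != NULL && i >= 0 && i < count);
    while (i >= 0 && is_alternate_entry(&table[i]))
        i--;
    return i;
}
```

Called on a primary entry, primary_entry_index returns the index it was given.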

The representation of a procedure with an alternate entry point is shown in Figure 5-22.


Version Note

The stBlock symbol that follows the alternate entry's stProc symbol in Figure 5-22 is supported in symbol table format V3.13 and greater. In symbol table formats less than V3.13, alternate entries do not have a start block symbol, and their prologue size is unknown.


Figure 5-22:  Alternate Entry Point Representation

An example of Fortran alternate entries can be found in Section 9.3.2.

5.3.7    Data Types in the Symbol Table

A data element's type dictates its size and interpretation in a programming environment. One of the symbol table's most important tasks is to represent data types in a compact and complete manner.

Type information is stored in the local and auxiliary symbol tables. This section provides guidelines for understanding the type information plus specific examples for depicting a range of types.

5.3.7.1    Basic Types

All programming languages have a set of simple types that are built into the language and from which other data types can be derived. Examples of simple types are integer, character, and floating point. Languages also provide constructs for creating user-defined types based on the simple types. For example, a C++ class can be built using any simple type or previously defined user-defined type and the language facility for declaring classes.

Similarly, a basic type in the symbol table is a building block from which each language constructs its type information. Basic type (bt) values directly represent many of the simple types for supported languages; for instance, the value btChar indicates a character. Other bt values represent language constructs for building aggregate types; a value of btStruct may be used, for example, to represent a C structure or Pascal record.

The symbol table uses approximately forty basic type values. The interpretation of some of these values is language dependent. See Table 5-5 for a list of all values.

5.3.7.2    Type Qualifiers

Type qualifiers can be applied to basic types to create other data types. Examples are "pointer to", "array of", and "function returning". Generally the number and order of type qualifiers is unrestricted.

See Table 5-6 for a list of type qualifiers and their meanings.

5.3.7.3    Interpreting Type Descriptions in the Auxiliary Table

This section explains in detail the encoding of type descriptions in the symbol table. To fully describe the type of a symbol, the auxiliary symbol table must be created and referenced. Compilation with full symbolic information (-g option on system compilers) results in the creation of this table.

To correctly decode the type information, proceed sequentially, beginning with the symbol table entry. Several fields may be required from other symbol table structures:

The first step is to determine whether the symbol contains an index of an auxiliary table description.

Table 5-13:  Symbols with Auxiliary Type Descriptions

Symbol Type     Storage Class   Conditions                  AUXU Index Field
stGlobal        Any             None                        index
stStatic        Any             None                        index
stParam         Any             None                        index
stLocal         Any             Local symbol table          index
stProc          Any             Local symbol table          index
stBlock         scInfo          Inside an scVariant block   value
stMember        scInfo          None                        index
stTypedef       scInfo          None                        index
stStaticProc    Any             Local symbol table          index
stConstant      Any             None                        index
stBase          scInfo          None                        index
stVirtBase      scInfo          None                        index
stTag           scInfo          None                        index
stInter         scInfo          None                        index
stNamespace     scInfo          None                        index
stUsing         scInfo          None                        index
stAlias         scInfo          None                        index

If the index does represent a record in the auxiliary symbol table, the interpretation of the first auxiliary entry (AUXU) depends on the type of the symbol:

The next task is to examine the contents of the TIR. The TIR contains constants representing the basic type of the symbol and up to six type qualifiers, labeled tq0-tq5. If a type has more than one qualifier, they are ordered from lowest to highest. Lower qualifiers are applied to the basic type before higher qualifiers. All unused tq fields are set to tqNil, and no tqNil fields are present before or between other type qualifiers.

In addition to the basic type and type qualifiers, the TIR contains two flags: an fBitfield flag to mark whether the size of the type is explicitly recorded, and a continued flag to indicate that the type description is continued in another TIR. If fBitfield is set, the TIR is immediately followed by a width entry. If more than six type qualifiers are required for the current definition, the description is continued, and the continued flag is set. If exactly six type qualifiers are needed, all six fields are used and the continued flag is cleared.

To illustrate, consider the type "array of pointers to integers". The basic type is "integer" and has two qualifiers, "array of" and "pointer to". Each element of the array is a "pointer to integer". Therefore, the qualifier "pointer to" must be applied first to the basic type "integer". In this example, the qualifier "pointer to" is lower than the qualifier "array of". The contents of the TIR are as follows:

        bt: btInt
        tq0: tqPtr
        tq1: tqArray
        tq2: tqNil
        tq3: tqNil
        tq4: tqNil
        tq5: tqNil
        continued: 0
        fBitfield: 0
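
The ordering rule can be sketched in C as follows. The SketchTIR structure, the SK_ constants, and the describe_type function are simplified stand-ins invented for this illustration; the numeric values are placeholders rather than the values in the system headers, and the flag fields are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative constants only; the numeric values are placeholders. */
enum sketch_bt { SK_BT_INT, SK_BT_CHAR, SK_BT_FLOAT };
enum sketch_tq { SK_TQ_NIL, SK_TQ_PTR, SK_TQ_PROC, SK_TQ_ARRAY };

typedef struct {
    enum sketch_bt bt;
    enum sketch_tq tq[6];   /* tq0 through tq5 */
} SketchTIR;

/* Writes an English description of the type into out. The lowest
 * qualifier binds most tightly to the basic type, so the qualifiers are
 * printed from highest to lowest, followed by the basic type. */
void describe_type(const SketchTIR *tir, char *out, size_t outsize)
{
    static const char *bt_names[] = { "integer", "character", "floating point" };
    static const char *tq_names[] = { "", "pointer to ", "function returning ", "array of " };
    size_t used = 0;
    int n = 0;

    assert(tir != NULL && out != NULL && outsize > 0);
    out[0] = '\0';

    /* Qualifiers are contiguous; the first tqNil ends the list. */
    while (n < 6 && tir->tq[n] != SK_TQ_NIL)
        n++;

    for (int i = n - 1; i >= 0; i--) {
        int w = snprintf(out + used, outsize - used, "%s", tq_names[tir->tq[i]]);
        if (w < 0 || (size_t)w >= outsize - used)
            return;                     /* error or output truncated */
        used += (size_t)w;
    }
    snprintf(out + used, outsize - used, "%s", bt_names[tir->bt]);
}
```

For the TIR contents listed above (btInt, tqPtr, tqArray), this sketch produces the text "array of pointer to integer".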

The contents of the TIR dictate how to interpret any subsequent records. The records appear in a prescribed order:

For a type description containing more than one TIR, the fields of all TIR records are interpreted in the same way. When a TIR is reached with the continued flag cleared and any records associated with that TIR have been decoded, the type description is complete.

As an example, consider an array of structures with the fBitfield flag set. A total of seven auxiliary records can be used to describe the type:

  1. The TIR with a basic type of btStruct and with tq0 set to tqArray.

  2. A width record. The size of the basic type.

  3. A RNDXR record. A pointer to the structure definition in the local symbol table.

  4. A RNDXR record. A pointer to the array index type description elsewhere in the auxiliary table.

  5. A dnlow record. The lower bound of the array's range.

  6. A dnhigh record. The upper bound of the array's range.

  7. A width record. The distance in bits between each element in the array.

If the continued flag of the TIR is cleared, the width record corresponding to the array qualifier is the final AUXU for this type description.
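The record sequence in this example can be sketched in C. The sketch generalizes only from the seven-record example above and is not a complete decoder: the enum and function names are hypothetical, it assumes a single TIR, 32-bit array entries, and no ST_RFDESCAPE escape (which would add an isym entry after a RNDXR), and the caller states whether the basic type requires a RNDXR.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of auxiliary records that appear in the example. */
enum sketch_aux { AUX_TIR, AUX_WIDTH, AUX_RNDXR, AUX_DNLOW, AUX_DNHIGH };

/* Fills out[] with the expected record kinds, in order, for one TIR.
 * Returns the number of records, or 0 if out[] is too small. */
size_t aux_layout(int fBitfield, int bt_needs_rndxr, int n_arrays,
                  enum sketch_aux *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL && n_arrays >= 0);
    size_t need = 1 + (fBitfield ? 1 : 0) + (bt_needs_rndxr ? 1 : 0)
                  + 4 * (size_t)n_arrays;
    if (need > cap)
        return 0;

    out[n++] = AUX_TIR;
    if (fBitfield)
        out[n++] = AUX_WIDTH;      /* size of the basic type */
    if (bt_needs_rndxr)
        out[n++] = AUX_RNDXR;      /* for example, the structure definition */
    for (int i = 0; i < n_arrays; i++) {
        out[n++] = AUX_RNDXR;      /* array index type description */
        out[n++] = AUX_DNLOW;      /* lower bound */
        out[n++] = AUX_DNHIGH;     /* upper bound */
        out[n++] = AUX_WIDTH;      /* distance in bits between elements */
    }
    return n;
}
```

With fBitfield set, a basic type that requires a RNDXR, and one array qualifier, the function returns 7, matching the list above.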

For another view of this process, see Figure 5-23. Each box represents one auxiliary entry belonging to the symbol's type description. Using the flowchart, an ordered list of entries can be assembled.

Figure 5-23:  Auxiliary Table Interpretation

Figure 5-24:  Auxiliary Table "ti" Interpretation

Figure 5-25:  Auxiliary Table "bt vals" Interpretation

Figure 5-26:  Auxiliary Table "arrays" Interpretation

Figure 5-27:  Auxiliary Table Range Interpretation

Figure 5-28:  Auxiliary Table RNDXR Interpretation

The final step is to decode the RNDXR records. The basic types that are followed by RNDXR records require reference to another local or auxiliary symbol to complete the type description. Interpret the RNDXR records as follows:

Additionally, the index of every RNDXR used as a pointer must be mapped through the relative file descriptor table (see Section 5.3.2.1), if the table exists. The rfd field of the record controls this mapping. The following algorithm can be used to locate the symbol referenced by the relative index record:

if (RNDXR.rfd == ST_RFDESCAPE)
    RFD = (++AUXU).isym
else 
    RFD = RNDXR.rfd 
if (HDRR.crfd) /* RFD table exists */
    IFD = (current FDR's RFD table)[RFD]
else
    IFD = RFD
 
if (SYMR needed)
    SYMBASE = FDR[IFD].isymBase
    SYMR = SYMBASE[RNDXR.index]
else if (AUXU needed)
    AUXBASE = FDR[IFD].iauxBase
    AUXU = AUXBASE[RNDXR.index]
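
The same algorithm can be sketched in compilable C. The structure types below are simplified stand-ins that carry only the fields used by the algorithm, SK_RFDESCAPE is a placeholder for ST_RFDESCAPE (take the real value from the system headers), and the function names are hypothetical. The caller passes the isym value of the AUXU that follows the RNDXR; it is used only when the escape value is present.

```c
#include <assert.h>
#include <stddef.h>

#define SK_RFDESCAPE 0xfffu   /* placeholder for ST_RFDESCAPE */

typedef struct { unsigned rfd; unsigned index; } SketchRNDXR;

typedef struct {
    long isymBase;              /* start of this file's local symbols */
    long iauxBase;              /* start of this file's auxiliary entries */
    const unsigned *rfd_table;  /* this file's RFD entries, or NULL */
} SketchFDR;

/* Maps the rfd of a RNDXR to a file descriptor index (IFD). */
unsigned resolve_ifd(const SketchFDR *cur, int have_rfd_table,
                     SketchRNDXR rn, unsigned next_isym)
{
    unsigned rfd = (rn.rfd == SK_RFDESCAPE) ? next_isym : rn.rfd;

    assert(cur != NULL);
    if (have_rfd_table) {
        assert(cur->rfd_table != NULL);
        return cur->rfd_table[rfd];   /* map through the RFD table */
    }
    return rfd;                       /* no RFD table: use rfd directly */
}

/* Absolute index of the referenced local symbol. */
long resolve_sym_index(const SketchFDR *fdrs, unsigned ifd, SketchRNDXR rn)
{
    assert(fdrs != NULL);
    return fdrs[ifd].isymBase + (long)rn.index;
}

/* Absolute index of the referenced auxiliary entry. */
long resolve_aux_index(const SketchFDR *fdrs, unsigned ifd, SketchRNDXR rn)
{
    assert(fdrs != NULL);
    return fdrs[ifd].iauxBase + (long)rn.index;
}
```

The sketch returns table indices rather than pointers so that it does not depend on how a consumer has mapped the symbol table into memory.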

5.3.8    Individual Type Representations

This section provides sketches of type representations in the local and auxiliary symbol tables. The connections between the two tables are depicted for each type. This form of representation is only possible when full symbolic information is present.

Note that external symbols as well as local symbols reference the auxiliary table, although the examples in this chapter use local symbols only.

5.3.8.1    Pointer Type

A pointer is a variable containing the address of another variable. A pointer is represented by a tqPtr type qualifier modifying another type. A pointer is represented by a single symbol with an entry in the auxiliary table, as shown in Figure 5-29.

Note that if the pointer referenced a user-defined type, such as a class or structure, the TIR would be followed by an RNDXR (and possibly an isym).

Figure 5-29:  Pointer Representation

The combination of type qualifiers tqFar and tqPtr is used to represent a short (32-bit) pointer. This pointer type is used with the XTASO emulation.

5.3.8.2    Array Type

An array is a list of elements that all have the same type. Arrays may be fixed size and allocated at compile time or dynamically sized and allocated at run time. This section describes the fixed-size array symbol table representation. For information on Fortran dynamic arrays, see Section 5.3.8.9. For conformant arrays in Pascal and Ada, see Section 5.3.8.10.

An array is represented by a tqArray or tqArray_64 type qualifier applied to another type. This second type describes the type of all elements in the array. In the local or external symbol table, a single entry represents an array. Figure 5-30 shows the symbol table description for an array.

Figure 5-30:  Array Representation

Note that for an array of elements of a user-defined type, such as a class or structure, another RNDXR (and possibly an isym) would be inserted between the TIR and the RNDXR describing the subscript type.

If an array has multiple dimensions, the symbols describing the dimension appear in the order of innermost to outermost. For example, the following declaration produces a TIR with the tqArray qualifier followed by the RNDXR and range description for 0-1 followed by the entries for the dimension 0-99:

float floattable[100][2];
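
The ordering can be sketched in C. The SketchRange type and the aux_dimension_order function are hypothetical names used only for this illustration; the sketch assumes zero-based C bounds and omits the RNDXR and width entries that accompany each range.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long low; long high; } SketchRange;

/* dims[] holds the extents in C declaration order (outermost first).
 * out[] receives the ranges in the order they are described in the
 * auxiliary table: innermost dimension first. */
void aux_dimension_order(const long *dims, size_t ndims, SketchRange *out)
{
    assert(dims != NULL && out != NULL);
    for (size_t i = 0; i < ndims; i++) {
        long extent = dims[ndims - 1 - i];
        out[i].low = 0;
        out[i].high = extent - 1;
    }
}
```

For the extents {100, 2} of floattable, the first range produced is 0-1 and the second is 0-99.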

Some arrays may have dimensions too large to represent in the 32-bit format shown in Figure 5-30. Such arrays are represented using a 64-bit format in which two auxiliary entries are used for the dimension bounds and size. Figure 5-31 illustrates the 64-bit representation.


Version Note

The 64-bit representation of arrays is supported in symbol table format V3.13 and greater.


Figure 5-31:  64-Bit Array Representation

5.3.8.3    Structure, Union, and Enumerated Types

This section applies to data structures in languages other than C++. For the C++ structure, union, or enumerated type representation, see Section 5.3.8.6.

Structures, unions, and enumerated types have a common representation. All three are identified using "tags" and contain zero or more fields. In the symbol table, the tag is the name associated with the starting stBlock symbol for the structure's set of local symbols. Note that it may be empty because the tag is optional. Symbols for fields follow. The definition is completed by a block-end symbol matching the block-start symbol.

Figure 5-32 contains a graphical depiction of this set of symbols.

Figure 5-32:  Structure Representation

The structure members have auxiliary table indices pointing to their type descriptions.

Untagged structures and unions are represented with a NULL tag name. Unnamed structures can be embedded in other structures and are represented as a NULL-named member of the outer structure. See Section 9.1.1 for an example of an unnamed structure.


Version Note

Unnamed member structures are supported in symbol table format V3.13 and greater. As of Tru64 UNIX V5.1, dbx will display structures with unnamed member structures, but neither dbx nor ladebug provides specific access to members of unnamed member structures.


A structure can contain a field that is a pointer to itself. This field is represented by an stMember symbol with an auxiliary table entry that references the beginning of the structure's block of local symbols, as shown in Figure 5-33.

Figure 5-33:  Recursive Structure Representation

When a field within a structure is itself a structure, the compiler may choose to generate the structure definitions either sequentially or embedded, as shown in Figure 5-34.

Figure 5-34:  Nested Structure Representation

The following declaration might result in the nested structure representation:

struct line { 
        struct point { 
            float x, y;
        }  p1, p2;
};

5.3.8.4    Typedef Type

Most languages allow programmers to choose alternate names, or aliases, for data types. The alias created by such a facility (such as C's typedef) is represented as a single local symbol entry that has a pointer to its type description in the auxiliary table. The auxiliary entry contains a pointer to the definition of the type name, as shown in Figure 5-35.

Figure 5-35:  Typedef Representation

5.3.8.5    Function Pointer Type


Version Note

The following function pointer representation is the preferred representation for symbol table format V3.13 and greater.


Languages such as C and C++, which allow pointers to functions, represent the type of the function pointer using a special stProc/scInfo block describing the parameters and return value for the function, as shown in Figure 5-36.

Figure 5-36:  Function Pointer Representation

The stProc/scInfo entry has its value set to -2, which distinguishes it from similar entries used to represent procedures with no text and C++ member functions. The stProc/scInfo and stEnd/scInfo entries have null names in the function pointer representation. The parameters are optional and may or may not be named.


Version Note

For symbol table formats less than V3.13 the preceding representation for function pointers is not supported, and the following alternate representation is used exclusively.


An alternate representation of function pointers is shown in Figure 5-37. This representation describes the return type of the function pointer but not its parameters, and it is valid for all symbol table format versions. The combination of type qualifiers tqPtr and tqProc is interpreted as "pointer to function returning". The function return type may be the base type (bt) in the TIR or it may be constructed from the base type augmented by additional type qualifiers.

Figure 5-37:  Function Pointer Alternate Representation

5.3.8.6    Class Type (C++)

A C++ class resembles an extended C structure. One major distinction is that class fields (referred to as "members") can be functions as well as variables. The set of symbols created for a class is organized as follows:

Another characteristic of classes is that symbols are defined implicitly. For example, all classes have an operator= operator-overloading function included in the class definition and a this pointer to its own type as a parameter to all member functions. These symbols are always included explicitly in the symbol table description.

Figure 5-38 is a graphical representation of the set of symbols for a class.

Figure 5-38:  Class Representation

Class members, including member functions, have auxiliary references that point to their type descriptions. Note that member functions are represented as prototypes. The set of symbols defining the member function is elsewhere in the symbol table. To locate the definition of a member function, a name lookup can be performed using the mangled name of the member function with its class name qualifier. See Section 5.3.10.3 for information on name mangling.

C++ structures, unions, and enumerated types are represented the same way as classes. The different data structures are distinguished by basic type value.

The symbol table does not represent class member access attributes.

Examples of base and derived classes can be found in Section 9.2.1.

5.3.8.6.1    Empty Class or Structure (C++)

The representation of an empty class in C++ is shown in Figure 5-39. Empty structures in C++ are represented in a similar manner with the TIR.bt set to btStruct.

Figure 5-39:  Empty Class or Structure (C++)


Version Note

This empty class or structure representation is supported in Tru64 UNIX V5.1. Prior to Tru64 UNIX V5.1, the default compilers did not distinguish empty classes and structures from opaque classes and structures. See Section 5.3.8.6.2 for more details.


5.3.8.6.2    Opaque Class or Structure (C++)

Opaque classes and structures are incomplete types. They have no member information, and they are distinguished from empty classes and structures that have no members. The representation of an opaque class in C++ is shown in Figure 5-40. Opaque structures in C++ are represented in a similar manner with TIR.bt set to btStruct.

Figure 5-40:  Opaque Class or Structure (C++)


Version Note

Prior to Tru64 UNIX V5.1 the default compilers used the preceding representation for empty classes and structures as well as opaque classes and structures.


5.3.8.6.3    Base and Derived Classes (C++)

Hierarchical groups of classes can be designed in C++. A base class serves as a wider classification for its derived classes, and a derived class has all of the members and methods of the base class, plus additional members of its own. In the symbol table, the set of symbols denoting a derived class is nearly identical to that for a non-derived class. The derived class includes an additional stBase or stVirtBase symbol that identifies its corresponding base class, and it does not need to duplicate the definitions for the base class members. This representation is shown in Figure 5-41.

Figure 5-41:  Base Class Representation

The representation of virtual base classes for C++ relies on the definition of a special symbol that identifies the virtual base table. The name for this symbol is derived from the name of the class to which it belongs. For example, the virtual base table symbol for class C5 would be named "__btbl_2C5". This table contains entries for base class run-time descriptions.

A class can include the special member __bptr. This class member is a pointer to the virtual base table for that class.

The value field for a virtual base class symbol (stVirtBase/scInfo) serves as an index (starting at 1) into the virtual base class table.

5.3.8.7    Template Type (C++)

Templates are a C++-specific language construct allowing the parameterization of types. C++ class templates are represented in the symbol table for each instantiation, but not for the template itself. The set of class symbols is unchanged from the set shown in Figure 5-38.

5.3.8.8    Interlude Type (C++)

Interludes are compiler-generated functions in C++. They are represented in the local symbol table with special names starting with the "__INTER__" prefix. Their representation in the symbol table makes use of two RNDXR aux entries to identify the related member function and the actual interlude function, both of which are local symbol table entries.

Figure 5-42:  Interlude Representation

5.3.8.9    Array Descriptor Type (Fortran90)

A Fortran90 array descriptor is a structure that describes an array: its location, dimensions, bounds, sizes, and other attributes. Array descriptors are described in detail in the Fortran 90 User Manual for Tru64 UNIX. Fortran90 includes several types of arrays for which the dimensions or dimension bounds are determined at run time: allocatable arrays, assumed shape arrays, and array pointers.

Two symbol table representations have been used for array descriptors. The current representation describes the array descriptor itself. The retired representation described attributes of the array known at compile time.

For both representations, symbols of this type point to a data location at which the array descriptor is allocated. One of the array descriptor fields contains a pointer to the actual array. Other fields are used to describe the attributes of the array. Fields that describe the number of dimensions and upper and lower bounds are filled in at run time.

By default, array descriptors are described by a structure tag representation. Most of the array descriptor fields are represented as structure members. (Excluded fields are not needed by debuggers.) Special tag names are used to identify array descriptor structure definitions: $f90$f90_array_desc (assumed-shape array), $f90$f90_ptr_desc (pointer to array) and $f90$f90_alloc_desc (allocatable array). Figure 5-43 shows the format of this representation.

Some compilers may emit other fields in addition to those shown in Figure 5-43. A consumer's ability to interpret additional fields depends on its knowledge of the producing compiler.

Figure 5-43:  Array Descriptor Representation

An example of the default Fortran array descriptor representation can be found in Section 9.3.3.


Version Note

The following representation of Fortran array descriptors is supported in symbol table formats less than V3.13. It is not supported in symbol table format V3.13 and greater.


This retired representation of Fortran array descriptors is substantially more compact in the local symbol table, but it provides no way to distinguish between the different array descriptor types.

The overloaded basic type value 28 indicates an array descriptor in the TIR, and dimension bounds are set to [1:1], indicating that their true size is unknown. The retired representation does not provide any information describing the contents of the array descriptor itself, so debuggers must assume a static representation for the descriptor and look up the fields at their expected offsets.

Figure 5-44 shows this representation of array descriptors.

Figure 5-44:  Array Descriptor Representation (retired)

5.3.8.10    Conformant Array Type (Pascal)

Full details are not currently available for Pascal's conformant array representation. A Pascal conformant array is very similar to Fortran's assumed shape arrays. It is an array parameter with upper and lower dimension bounds that are determined by the input argument. A conformant array is represented by an array descriptor. The special names used and the format of the array descriptor differ from those used for Fortran. The DEC Pascal release notes contain additional information on conformant arrays.

5.3.8.11    Variant Record Type (Pascal and Ada)

A variant record is an extension to the record data type, which is a Pascal or Ada data structure akin to a C structure and is represented in the same manner in the symbol table. The variant part of the record consists of sets of one or more fields associated with a range of values. Only one such set is part of the record, and it is selected based on the value of another record field. Any number of variant parts can be embedded in a single record.


Version Note

The following variant record representation is for symbol table format V3.13 and greater.


The local symbol table entries for the variant part of a record are contained within a block with the storage class (sc value) scVariant. The value field of the stBlock entry contains the index of the local symbol entry for the member of the record whose value determines which variant arm is used. The variant block contains multiple inner blocks, each representing a variant arm. The value field of each of these block entries is an auxiliary table index. Each auxiliary table entry starts with a count, which indicates how many range entries follow. The range entries describe the values associated with the block.
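
Selecting a variant arm from these entries can be sketched in C. The SketchRange and SketchVariantArm types and the select_variant_arm function are hypothetical; the sketch assumes that each range entry reduces to an inclusive pair of bounds and does not model the on-disk encoding of the count or range entries.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long low; long high; } SketchRange;   /* inclusive bounds */

typedef struct {
    size_t count;               /* number of range entries that follow */
    const SketchRange *ranges;  /* values associated with this arm */
} SketchVariantArm;

/* Returns the index of the first arm with a range that contains tag,
 * the value of the record member named by the scVariant block.
 * Returns -1 if no arm matches. */
long select_variant_arm(const SketchVariantArm *arms, size_t narms, long tag)
{
    assert(arms != NULL || narms == 0);
    for (size_t a = 0; a < narms; a++) {
        for (size_t r = 0; r < arms[a].count; r++) {
            if (tag >= arms[a].ranges[r].low && tag <= arms[a].ranges[r].high)
                return (long)a;
        }
    }
    return -1;
}
```

A debugger could use a lookup of this kind to decide which arm's members to display for a given record value.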

Figure 5-45 is a graphical representation of a variant record.

Figure 5-45:  Variant Record Representation


Version Note

The following variant record representation is for symbol table formats less than V3.13. It is not supported in symbol table format V3.13 and greater.


The representation of variant records depicted in Figure 5-46 does not include TIR auxiliaries.

Figure 5-46:  Variant Record Representation (retired)

An example of a Pascal variant record can be found in Section 9.4.3.

5.3.8.12    Subrange Type (Pascal and Ada)

A subrange data type defines a subset of the values associated with a particular ordinal type (the "base type" of the subrange). Ordinal types in Pascal include integers, characters, and enumerated types. The symbol table representation of a subrange uses the btRange or btRange_64 type followed by an auxiliary index identifying the base type and entries providing the bounds of the subrange. The 32-bit representation is shown in Figure 5-47 and the 64-bit representation is shown in Figure 5-48.

Figure 5-47:  Subrange Representation

Figure 5-48:  64-bit Range Representation


Version Note

The 64-bit range representation is supported in symbol table format V3.13 and greater.


An example of a Pascal subrange can be found in Section 9.4.2.

5.3.8.13    Set Type (Pascal)

A set is a data type that groups ordinal elements in an unordered list. The arithmetic and logical operators are overloaded in Pascal; this enables them to be used with set variables to perform classic set operations such as union and intersection. A special auxiliary type definition btSet exists to identify this type. The symbol table representation is depicted in Figure 5-49.

Figure 5-49:  Set Representation

The element type for a set is typically a range or an enumeration. An example of a Pascal set can be found in Section 9.4.1.

5.3.9    Special Debug Symbols

A variety of special symbols are used throughout the symbol table to convey call frame information, special type semantics, or other language specific information. These names are reserved for use by compilers and other tools that produce Tru64 UNIX object files.

Table 5-14:  Special Debug Symbols

Name Purpose
__StaticLink.* (SV3.13 - ) Uplevel link. See Section 5.3.4.4.
_BLNK__ Fortran unnamed common block. See Section 5.3.6.6.
MAIN__ Fortran alias for main program unit. See Section 5.3.10.4.
ARGNAME.len Generated parameter for Fortran routines. It contains the length of ARGNAME, a parameter of character type.

.lb_<ARRAY>.<dim>, .ub_<ARRAY>.<dim> Lower and upper bounds of particular dimensions of arrays. Used when the array has an explicit shape but some bounds come from non-constant specification expressions (array arguments in Pascal and Fortran routines).
$f90$f90_array_desc, $f90$f90_alloc_desc, $f90$f90_ptr_desc Variants of Fortran-90 described arrays (assumed shape, ALLOCATABLE, and POINTER, respectively). See Section 5.3.8.9.
cray pointee Fortran-generated typedef describing the type of a variable pointed to by a CRAY pointer.
pointer Fortran-generated typedef describing the type of a scalar with the POINTER attribute.
_DECCXX_generated_name_* DEC C++ compiler-inserted name for unnamed classes and enumerations.
this Hidden parameter in C++ member functions that is a pointer to the current instance of the class. See Section 5.3.8.6.
__vptr Hidden C++ class member containing the virtual function table. See example in Section 9.2.2.
__bptr Hidden C++ class member containing the virtual base class table. See example in Section 9.2.2.
__vtbl_* Global symbols for C++ virtual function tables. See example in Section 9.2.2.
__btbl_* Global symbols for C++ virtual base class tables. See example in Section 9.2.2.
__control Hidden argument to C++ constructors controlling descent (in the face of virtual base classes).
__t*__evdf Structure used to maintain a list of C++ global destructors.
t*__iviw C++ static procedure used for global constructors.
t*__evdw C++ static procedure used for global destructors.
__t*_thunk C++ static procedure used to provide a defaulted argument value.
__INTER__* C++ interlude. See example in Section 9.2.2.
__N1* C++ unnamed namespaces. See example in Section 9.2.4.

5.3.10    Symbol Resolution

Among the linker's chief tasks is symbol resolution. Because most compilations involve multiple source files and virtually all programs rely on system libraries, a process is necessary to resolve conflicting uses of global symbol names. The linker must decide which symbol is referenced by a given name. This section highlights the major issues involved in that decision. Related information is contained in Section 6.3.4 and the Programmer's Guide.

Symbol table entries provide information relevant to performing symbol resolution. External symbols with a storage class of sc(S)Undefined, sc(S)Common, or scTlsCommon must be resolved before they are referenced. By default, the linker will not mark an object file with unresolved symbols as executable. However, linker options give programmers a fair measure of control over its symbol resolution behavior. See ld(1) for more information.

5.3.10.1    Library Search

Symbols that are referenced but not defined in the main executable of an application must be matched with definitions in linked-in libraries. The linker combines objects, archives, and shared libraries while attempting to resolve all references to undefined symbols. The Programmer's Guide covers related topics in detail, such as how to specify libraries during compilation and the search order of libraries.

In general, main executable objects and shared libraries are searched before archive libraries. If no undefined external symbols remain, archive libraries in the library list do not have to be searched, because archive members are only loaded to resolve external references. Archives are not used to find "better" common definitions (see Section 5.3.10.2), and no archive definitions preempt symbol definitions from the main object or shared libraries.

5.3.10.2    Resolution of Symbols with Common Storage Class

Symbols with common storage class are a special category of global symbols that have a size but no allocated storage. Symbols with common storage class should not be confused with Fortran common symbols, which are not represented by a single symbol table entry. (See Section 5.3.6.6 for a description of Fortran common symbols.) Common storage classes are scCommon, scSCommon, and scTlsCommon.

The symbol definition model used by Tru64 UNIX allows an unlimited number of common storage class symbols with the same name. Ultimately, the "best" of these must be selected (by the linker or the loader) during symbol resolution. The criteria used to select the best symbol definition include the symbol's allocation status and size.

The symbol table does not provide an "allocated common" storage class. Common storage class symbols adopt a new storage class when they are allocated. Typically, their new storage class is scBss, scSBss, or scTlsBss. On the other hand, the dynamic symbol table does explicitly distinguish common storage class symbols that have been allocated. See Section 6.3.4 for more information on dynamic symbol resolution.

A symbol reference is resolved according to the following precedence rules:

  1. Find a symbol definition that does not have a common storage class and is not identified as an allocated common in the dynamic symbol table.

  2. Find the largest allocated common identified in the dynamic symbol table.

  3. Find the largest common storage class symbol and allocate it. This step will be skipped when the linker produces a relocatable object file.
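
The three precedence rules can be sketched as a small selection function. This is an illustrative model only: the Definition record, its allocated_common flag (standing in for the marking in the dynamic symbol table), and the resolve function are assumptions, not part of the symbol table format or of the linker. Allocation itself is not modeled; the sketch only reports which definition would be chosen.

```python
from dataclasses import dataclass

# Common storage classes named in this section.
COMMON_CLASSES = {"scCommon", "scSCommon", "scTlsCommon"}


@dataclass
class Definition:
    source: str                     # object or library providing the definition
    storage_class: str              # for example "scData", "scBss", "scCommon"
    size: int = 0
    allocated_common: bool = False  # hypothetical stand-in for the dynamic symbol table marking


def resolve(candidates, relocatable_output=False):
    """Pick the definition favored by the precedence rules, or None."""
    # Rule 1: a definition that is neither common nor an allocated common.
    for d in candidates:
        if d.storage_class not in COMMON_CLASSES and not d.allocated_common:
            return d
    # Rule 2: the largest allocated common.
    allocated = [d for d in candidates if d.allocated_common]
    if allocated:
        return max(allocated, key=lambda d: d.size)
    # Rule 3: the largest common storage class symbol, skipped for relocatable output.
    commons = [d for d in candidates if d.storage_class in COMMON_CLASSES]
    if commons and not relocatable_output:
        return max(commons, key=lambda d: d.size)
    return None


defs = [
    Definition("a.o", "scCommon", 8),
    Definition("libx.so", "scBss", 16, allocated_common=True),
    Definition("b.o", "scData", 4),
]
print(resolve(defs).source)      # rule 1 applies
print(resolve(defs[:2]).source)  # rule 2 applies
```

Returning None when relocatable_output is set models the case in which the symbol simply remains a common storage class symbol in the output object.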

Precedence is given to symbol definitions with storage allocation to minimize load-time common allocation and redundant storage allocations in shared objects. The loader is capable of allocating space for common storage class symbols, but this should be necessary only when a program references an allocated common symbol in a shared library that is later removed from that shared library.

Note that Fortran common block representations use common storage class symbols. Another very frequent occurrence of a common storage class symbol is a C-language global variable that does not have an initializer in its declaration.

5.3.10.3    Mangling and Demangling

Another issue related to symbol resolution is the need to "mangle" user-level identifiers. For example, C++ allows function overloading, prototyping, and the use of templates, all of which can result in the same name being used for different entities. The solution employed by the symbol table is to use mangled names that derive from the symbol's type signature.

Object file consumers, such as debuggers and object dumpers, need to "demangle" the identifiers so they can be output in a form that is recognizable to the user. For linking and loading, the mangled names are used for symbol resolution.

The encoding of C++ names is described in the manual Using DEC C++ for Tru64 UNIX Systems.

Other compilers may write symbol names that are modified by prepending or appending special characters such as dollar sign ($) or underscore (_) or by prepending qualifier strings such as file names or namespace names. Uppercasing of names is also common for certain languages such as Fortran. All of these transformations fall into the general category of mangled names. Refer to the release notes for specific compilers for additional information.

5.3.10.4    Mixed Language Resolution

Compilation of a program involving multiple source languages introduces additional symbol resolution issues. One important task is resolving the main program entry point because conflicting "main" symbols may be present in the different files. For C and C++, the symbol "main" is the main program entry point, but for other languages, "main" will either be an alias for the main program or an interlude. DEC Fortran and DEC COBOL provide interludes that perform some language-specific initializations and then call the real main program entry point. For DEC Fortran, the main program is "MAIN__", and for DEC COBOL, the main program is "__cobol_main". DEC Pascal provides a "main" symbol that aliases the actual main program symbol.

The symbols "MAIN__" and "__cobol_main" can both be present in a mixed language program, and either, neither, or both can be used by the program. Debuggers can set a breakpoint in the user's main program by applying some precedence for selecting the most appropriate symbol. For a mixed language program, there is a slight chance that "MAIN__" or "__cobol_main" will be present but never called.
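
A debugger's choice of breakpoint symbol might look like the following sketch. The particular ordering shown (language-specific main programs first, then "main") is an assumption made for illustration, since this chapter does not prescribe a precedence. The function name pick_main_symbol is likewise hypothetical.

```python
# Assumed precedence for illustration only; a real debugger may order these differently.
DEFAULT_PRECEDENCE = ("MAIN__", "__cobol_main", "main")


def pick_main_symbol(defined_symbols, precedence=DEFAULT_PRECEDENCE):
    """Return the first name in `precedence` that the program defines, or None."""
    for name in precedence:
        if name in defined_symbols:
            return name
    return None


print(pick_main_symbol({"main", "MAIN__", "helper"}))
print(pick_main_symbol({"main", "helper"}))
```

Because "MAIN__" or "__cobol_main" can be present without ever being called, a breakpoint chosen this way is a best guess. Passing a different precedence tuple changes the preference without changing the function.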

5.3.10.5    TLS Symbols

TLS (Thread Local Storage) symbols, like non-TLS symbols, can be undefined or common. Unresolved TLS symbols are identified by the storage class scTlsUndefined, and TLS commons have the storage class scTlsCommon. The symbol resolution process for TLS names is similar to that for non-TLS names but is carried out separately; TLS symbols cannot be resolved to non-TLS symbols or vice versa.

TLS common symbols are resolved in the same manner as other common storage class symbols (see Section 5.3.10.2), except that, again, only TLS symbols are candidates for resolution.

Another rule special to TLS is that symbol definitions for TLS common and undefined symbols cannot be imported from shared libraries.
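
The TLS restrictions can be summarized as a filter applied to candidate definitions before the usual precedence rules run. The following is a minimal sketch under stated assumptions: candidates are modeled as (source, storage class, from shared library) tuples, and TLS classes are recognized by the "scTls" prefix purely as a convenience of the sketch. Neither the tuple layout nor the function names come from the symbol table format.

```python
def is_tls(storage_class):
    # Convenience test for this sketch only: the TLS storage classes named
    # in this chapter (scTlsUndefined, scTlsCommon, scTlsBss) share this prefix.
    return storage_class.startswith("scTls")


def eligible_definitions(ref_class, candidates):
    """Return the sources whose definitions may resolve a reference.

    ref_class  -- storage class of the undefined or common symbol being resolved
    candidates -- hypothetical (source, storage_class, from_shared_library) tuples
    """
    want_tls = is_tls(ref_class)
    result = []
    for source, storage_class, from_shared in candidates:
        # TLS and non-TLS symbols never resolve to one another.
        if is_tls(storage_class) != want_tls:
            continue
        # TLS definitions cannot be imported from shared libraries.
        if want_tls and from_shared:
            continue
        result.append(source)
    return result


cands = [
    ("a.o", "scTlsBss", False),
    ("b.o", "scBss", False),
    ("libt.so", "scTlsBss", True),
]
print(eligible_definitions("scTlsUndefined", cands))
print(eligible_definitions("scUndefined", cands))
```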

5.4    Language-Specific Symbol Table Features

Language-specific characteristics are pervasive in the symbol table, particularly in the local, external, and auxiliary symbol tables. See Section 5.2 and Section 5.3.7 for information on language-specific values.

The lang field of the file descriptor entry encodes the source language of the file. This field should be accessed prior to decoding symbolic information, especially type descriptions. This section highlights, by language, language-specific features represented in the symbol table. Additional information on certain features is available elsewhere in this chapter.

5.4.1    Fortran77 and Fortran90

In Fortran, it is possible to create multiple entry points in subroutines. A subroutine has one main entry point and zero or more alternate entry points, indicated by ENTRY statements. See Section 5.3.6.7 for their representation in the symbol table.

Fortran90 array descriptors include allocatable arrays, assumed-shape arrays, and pointers to arrays. Their representation in the symbol table is discussed in Section 5.3.8.9.

Modules provide another scoping level in Fortran90 programs. The symbol table representation for modules has not yet been implemented.

5.4.2    C++

C++ classes encapsulate functions and data inside a single structure. Classes are represented in the symbol table using a btClass basic type and the stBlock/stEnd scoping mechanism. See Section 5.3.8.6.

Templates provide for parameterized types. At present, no special symbol table values are related to templates. The template itself is not represented; rather, entries that correspond to each instantiation are generated. Template instantiations are distinguished by mangled names based on their type signatures.

C++ namespaces, like Fortran modules, offer an additional scope for program identifiers.

The C++ concepts of private, protected, and public data attributes are not currently represented in the symbol table. The C++ concepts of "friend" classes and functions are also not represented.

5.4.3    Pascal and Ada

Pascal conformant arrays are function parameters with array dimensions that are determined by the arguments passed to the function at run time. See Section 5.3.8.10.

Variant records are an extension of the record data structure. Variant records allow different sets of fields depending on the value of a particular record member. See Section 5.3.8.11.

Nested procedures are supported in these languages. They are represented using standard scoping mechanisms discussed in Section 5.3.6 and uplevel references described in Section 5.3.4.4.

Sets and subranges are user-defined subsets of ordinal types. Sets are unordered groups of elements, which can be manipulated with the classic set operations. Subranges are ordered and are used with the usual operators. See Section 5.3.8.12 and Section 5.3.8.13.

Ada subtypes of ordinal types are represented in the same manner as Pascal subranges.