13    Symbol Resolution

Among the linker's chief tasks is symbol resolution. Because most compilations involve multiple source files and virtually all programs rely on system libraries, a process is necessary to resolve conflicting uses of global symbol names. The linker must decide which symbol is referenced by a given name. This section highlights the major issues involved in that decision. Related information is contained in Section 14.3.4 and the Programmer's Guide.

Symbol table entries provide information relevant to performing symbol resolution. External symbols with a storage class of sc(S)Undefined, sc(S)Common, or scTlsCommon must be resolved before they are referenced. By default, the linker will not mark an object file with unresolved symbols as executable. However, linker options give programmers a fair measure of control over the linker's symbol resolution behavior. See ld(1) for more information.

13.1    New or Changed Symbol Resolution Features

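Tru64 UNIX V5.1B includes the following new or changed feature: the linkerdef field of the external symbol entry, which identifies linker-defined symbols (see Section 13.2.1).

Tru64 UNIX V5.1 includes the following new or changed feature: the alignment field of the external symbol entry, which records a symbol's alignment (see Section 13.2.1).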

13.2    Structures, Fields, and Values for Symbol Resolution

Unless otherwise specified, all structures described in this section are declared in the header file sym.h, and all constants are defined in the header file symconst.h.

13.2.1    External Symbol Entry (EXTR)

typedef struct {
        SYMR          asym;
        coff_uint     jmptbl : 1;
        coff_uint     cobol_main : 1;
        coff_uint     weakext : 1;
        coff_uint     alignment : 4;    /* V5.1 and later */
        coff_uint     reserved2 : 2;
        coff_uint     linkerdef : 1;    /* V5.1B and later */
        coff_uint     reserved : 22;
        coff_int      ifd;
} EXTR, *pEXTR;

SIZE - 24 bytes, ALIGNMENT - 8 bytes

External Symbol Table Entry Fields

asym

External symbol table entry. This structure has the same format as a local symbol entry. The field interpretations differ as described in the following entries.

asym.value

Contains the symbol address for most defined symbols. See Section 11.2.4 for details.

asym.iss

Byte offset in external string table to symbol name. Set to issNil (-1) if there is no name for this symbol.

asym.st

Symbol type. See Table 11-1 for possible values.

asym.sc

Storage class. See Table 11-2 for possible values.

asym.reserved

Must be zero.

asym.index

Contains either an index into the auxiliary symbol table for a type description or an index into the local symbol table pointing to a related symbol.

The index field may have a value of indexNil, which is defined as (long)0xfffff. This value is used to indicate that the index is not a valid reference.

jmptbl

Unused.

cobol_main

Flag set to indicate that the symbol is a COBOL main procedure.

weakext

Flag set to identify the symbol as a weak external. See Section 14.3.4.2 for more details on weak symbols.

alignment

Power-of-two byte alignment, biased by 2^3 (8): a field value of n denotes an alignment of 2^(n+3) bytes. Supported values range from 0 through 13, yielding a minimum alignment of 8 bytes and a maximum alignment of 64K bytes. For unallocated common symbols, this value specifies a requested alignment. For defined data and text symbols, a nonzero value records the symbol's actual alignment. A zero value indicates that the alignment for a data or text symbol is unspecified, but size and address values can be used to determine a sufficient alignment. For symbols with storage class scUndefined or scSUndefined, this field is not used.
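
The bias can be illustrated with a short conversion routine. This is only a sketch: the function and macro names are hypothetical and do not appear in sym.h or symconst.h. It assumes the field value has already been extracted from the EXTR entry.

```c
#include <stddef.h>

/* Hypothetical names; neither macro is defined in sym.h or symconst.h. */
#define EXTR_ALIGN_BIAS 3    /* field value n denotes 2^(n+3) bytes */
#define EXTR_ALIGN_MAX  13   /* largest supported field value (64K bytes) */

/*
 * Convert an EXTR alignment field value to a byte count.
 * Returns 0 for values outside the supported range 0..13.
 * The caller must still apply the symbol-specific rules: for a defined
 * data or text symbol, a field value of 0 means "unspecified", and for
 * undefined symbols the field is not used at all.
 */
unsigned long extr_alignment_bytes(unsigned field)
{
    if (field > EXTR_ALIGN_MAX)
        return 0;
    return 1UL << (field + EXTR_ALIGN_BIAS);
}
```

For an unallocated common symbol, a field value of 0 would therefore request 8-byte alignment and a value of 13 would request 64K-byte alignment.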


Version Note

The alignment field is supported on Tru64 UNIX V5.1 and greater.


reserved2

Must be zero.

linkerdef

Identifies linker-defined symbols.


Version Note

The linkerdef field is supported on Tru64 UNIX V5.1B and greater.


reserved

Must be zero.

ifd

Index of the file descriptor where the symbol is defined. Set to ifdNil (-1) for undefined symbols and for some compiler system symbols.

13.3    Symbol Resolution Usage

13.3.1    Library Search

Symbols that are referenced but not defined in the main executable of an application must be matched with definitions in linked-in libraries. The linker combines objects, archives, and shared libraries while attempting to resolve all references to undefined symbols. The Programmer's Guide covers related topics in detail, such as how to specify libraries during compilation and the search order of libraries.

In general, main executable objects and shared libraries are searched before archive libraries. If no undefined external symbols remain, archive libraries in the library list do not have to be searched. Archive members are loaded only to resolve references to undefined symbols. Archives are not used to find "better" common definitions (see Section 13.3.2) or higher-precedence symbol definitions. However, precedence rules do apply to any symbol definitions in archive members that are loaded to satisfy references to undefined symbols.
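
The rule that archive members are loaded only to satisfy undefined references can be sketched as follows. This is an illustrative model, not the linker's implementation: symbols are represented as bits in a mask, the member structure and function name are hypothetical, and rescanning the archive until no further member is needed is an assumption of the sketch rather than documented ld behavior.

```c
/* Hypothetical model of one archive member (at most 32 symbols overall). */
struct member {
    unsigned defines;      /* bit i set: member defines symbol i */
    unsigned references;   /* bit i set: member references symbol i */
};

/*
 * Return a mask of the members (at most 32) that would be loaded, given
 * the symbols already defined and referenced by objects and shared
 * libraries searched earlier.
 */
unsigned load_needed_members(const struct member *m, unsigned count,
                             unsigned defined, unsigned referenced)
{
    unsigned loaded = 0;
    int progress = 1;

    while (progress) {
        unsigned undefined = referenced & ~defined;

        progress = 0;
        if (undefined == 0)
            break;                      /* nothing left to resolve */

        for (unsigned i = 0; i < count; i++) {
            if ((loaded >> i) & 1u)
                continue;               /* member already loaded */
            if (m[i].defines & undefined) {
                /* Member satisfies an undefined reference: load it. */
                loaded |= 1u << i;
                defined |= m[i].defines;
                referenced |= m[i].references;
                undefined = referenced & ~defined;
                progress = 1;
            }
        }
    }
    return loaded;
}
```

A member that defines only symbols nobody references is never loaded in this model, even if its definitions would otherwise take precedence.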

13.3.2    Resolution of Symbols with Common Storage Class

Symbols with common storage class are a special category of global symbols that have a size but no allocated storage. Symbols with common storage class should not be confused with Fortran common symbols, which are not represented by a single symbol table entry. (See Section 11.3.1.8 for a description of Fortran common symbols.) Common storage classes are scCommon, scSCommon, and scTlsCommon.

The symbol definition model used by Tru64 UNIX allows an unlimited number of common storage class symbols with the same name. Ultimately, the "best" of these must be selected (by the linker or the loader) during symbol resolution. The criteria used to select the best symbol definition include the symbol's allocation status and size.

The symbol table does not provide an "allocated common" storage class. Common storage class symbols adopt a new storage class when they are allocated. Typically, their new storage class is scBss, scSBss, or scTlsBss. On the other hand, the dynamic symbol table does explicitly distinguish common storage class symbols that have been allocated. See Section 14.3.4 for more information on dynamic symbol resolution.

A symbol reference is resolved according to the following precedence rules:

  1. Find a symbol definition that does not have a common storage class and is not identified as an allocated common in the dynamic symbol table.

  2. Find the largest allocated common identified in the dynamic symbol table.

  3. Find the largest common storage class symbol and allocate it. This step will be skipped when the linker produces a relocatable object file.

Precedence is given to symbol definitions with storage allocation to minimize load-time common allocation and redundant storage allocations in shared objects. The loader is capable of allocating space for common storage class symbols, but this should only be necessary when a program references an allocated common symbol in a shared library that is later removed from that shared library.
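
The three precedence rules can be sketched as a selection over candidate definitions of one name. The types and names below are hypothetical and exist only for illustration. The sketch does not model allocation itself (including the relocatable-object exception in rule 3), and when several non-common definitions are present it simply returns the first one, which is an assumption and not a statement about how the linker treats multiple definitions.

```c
#include <stddef.h>

/* Hypothetical classification; enumerator order encodes precedence. */
enum def_kind {
    DEF_NONCOMMON,          /* rule 1: definition that is not a common */
    DEF_ALLOCATED_COMMON,   /* rule 2: allocated common (dynamic symbol table) */
    DEF_COMMON              /* rule 3: unallocated common storage class */
};

struct candidate {
    enum def_kind kind;
    unsigned long size;
};

/* Return the preferred candidate, or NULL if there are none. */
const struct candidate *pick_definition(const struct candidate *c, size_t n)
{
    const struct candidate *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || c[i].kind < best->kind) {
            best = &c[i];               /* higher-precedence class wins */
        } else if (c[i].kind == best->kind &&
                   c[i].kind != DEF_NONCOMMON &&
                   c[i].size > best->size) {
            best = &c[i];               /* among commons, largest wins */
        }
    }
    return best;
}
```

With one unallocated common of 100 bytes and allocated commons of 8 and 16 bytes, the sketch selects the 16-byte allocated common, because allocation status is considered before size.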

Note that Fortran common block representations use common storage class symbols. Another very frequent occurrence of a common storage class symbol is a C-language global variable that does not have an initializer in its declaration.

13.3.3    Mangling and Demangling

Another issue related to symbol resolution is the need to "mangle" user-level identifiers. For example, C++ allows function overloading, prototyping, and the use of templates, all of which can result in the same name being used for different entities. The solution employed by the symbol table is to use mangled names that derive from the symbol's type signature.

Object file consumers, such as debuggers and object dumpers, need to "demangle" the identifiers so they can be output in a form that is recognizable to the user. For linking and loading, the mangled names are used for symbol resolution.

One highly visible mangled name that appears in C++ programs is the name of the non-mergeable portion of a header file. The C++ compiler reduces the size of a linked image's symbol table by splitting header files into mergeable and non-mergeable entries in the file descriptor table. The mergeable entry retains its on-disk name, but the name of the non-mergeable entry is mangled by appending the string ~alt~deccxx_XXXXXXXX to the name. The eight X's represent the CRC encoding of the date and time when the program was compiled. See Section 6.3.2 for more information on file merging.
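
An object file consumer that wants to display the on-disk name of a non-mergeable header entry could strip the suffix along these lines. This is a sketch under stated assumptions: the function and macro names are hypothetical, and the eight trailing characters are accepted without validation because their exact character set is not described here.

```c
#include <stddef.h>
#include <string.h>

#define ALT_TAG     "~alt~deccxx_"   /* fixed part of the appended string */
#define ALT_CRC_LEN 8                /* the eight X's that follow the tag */

/*
 * Return the length of the on-disk portion of a file descriptor name.
 * If the name ends in ~alt~deccxx_ followed by eight characters, the
 * suffix is excluded; otherwise the full length is returned.
 */
size_t header_base_length(const char *name)
{
    size_t len = strlen(name);
    size_t tag = strlen(ALT_TAG);

    if (len >= tag + ALT_CRC_LEN &&
        strncmp(name + len - ALT_CRC_LEN - tag, ALT_TAG, tag) == 0)
        return len - ALT_CRC_LEN - tag;
    return len;
}
```

A caller could then print the name with a precision, for example printf("%.*s", (int)header_base_length(name), name).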

The encoding of C++ names is described in the manual Using DEC C++ for Tru64 UNIX Systems.

Other compilers may write symbol names that are modified by prepending or appending special characters such as dollar sign ($) or underscore (_) or by prepending qualifier strings such as file names or namespace names. Uppercasing of names is also common for certain languages such as Fortran. All of these transformations fall into the general category of mangled names. Refer to the release notes for specific compilers for additional information.

13.3.4    Mixed Language Resolution

Compilation of a program involving multiple source languages introduces additional symbol resolution issues. One important task is resolving the main program entry point because conflicting "main" symbols may be present in the different files. For C and C++, the symbol "main" is the main program entry point, but for other languages, "main" will either be an alias for the main program or an interlude. DEC Fortran and DEC COBOL provide interludes that perform some language-specific initialization and then call the real main program entry point. For DEC Fortran, the main program is "MAIN__", and for DEC COBOL, the main program is "__cobol_main". DEC Pascal provides a "main" symbol that aliases the actual main program symbol.

The symbols "MAIN__" and "__cobol_main" can both be present in a mixed language program, and either, neither, or both can be used by the program. Debuggers can set a breakpoint in the user's main program by applying some precedence for selecting the most appropriate symbol. For a mixed language program, there is a slight chance that "MAIN__" or "__cobol_main" will be present but never called.
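
A debugger's selection of the user's main program might look like the following sketch. The precedence order used here (language-specific entry points before "main") is purely an assumption for illustration, since no particular order is prescribed, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Given the names of the global procedure symbols in a program, return
 * the name to use for a breakpoint in the user's main program, or NULL
 * if none of the known entry points is present.
 * ASSUMPTION: the preference order below is illustrative only.
 */
const char *pick_user_main(const char *const *names, size_t n)
{
    static const char *const preference[] = {
        "MAIN__",          /* DEC Fortran main program */
        "__cobol_main",    /* DEC COBOL main program */
        "main"             /* C, C++, or an alias/interlude */
    };

    for (size_t p = 0; p < sizeof preference / sizeof preference[0]; p++)
        for (size_t i = 0; i < n; i++)
            if (strcmp(names[i], preference[p]) == 0)
                return preference[p];
    return NULL;
}
```

Because a language-specific symbol can be present without ever being called, a selection like this one can occasionally choose a procedure that the program never executes.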

13.3.5    TLS Symbols

TLS (Thread Local Storage) symbols, like non-TLS symbols, can be undefined or common. Unresolved TLS symbols are identified by the storage class scTlsUndefined, and TLS commons have the storage class scTlsCommon. The symbol resolution process for TLS names is similar to, but separate from, the process for non-TLS names; TLS symbols cannot be resolved to non-TLS symbols or vice versa.

TLS common symbols are resolved in the same manner as other common storage class symbols (see Section 13.3.2), except that, again, only TLS symbols are candidates for resolution.

Another rule special to TLS is that symbol definitions for TLS common and undefined symbols cannot be imported from shared libraries.
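
The TLS-specific restrictions can be expressed as a filter applied before the usual precedence rules. This is a minimal sketch with a hypothetical function name and parameters. It assumes the caller has already determined whether the reference and the candidate definition are TLS symbols and whether the definition comes from a shared library.

```c
#include <stdbool.h>

/*
 * Decide whether a definition may be considered when resolving a
 * reference (an undefined or common symbol).
 */
bool definition_is_candidate(bool ref_is_tls, bool def_is_tls,
                             bool def_in_shared_library)
{
    /* TLS and non-TLS names are resolved separately. */
    if (ref_is_tls != def_is_tls)
        return false;

    /* TLS definitions cannot be imported from shared libraries. */
    if (ref_is_tls && def_in_shared_library)
        return false;

    return true;
}
```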