Among the linker's chief tasks is symbol resolution. Because most compilations involve multiple source files and virtually all programs rely on system libraries, a process is necessary to resolve conflicting uses of global symbol names. The linker must decide which symbol is referenced by a given name. This section highlights the major issues involved in that decision. Related information is contained in Section 14.3.4 and the Programmer's Guide.
Symbol table entries provide the information needed to perform symbol resolution. External symbols with a storage class of scUndefined, scSUndefined, scCommon, scSCommon, or scTlsCommon must be resolved before they are referenced. By default, the linker will not mark an object file with unresolved symbols as executable. However, linker options give programmers a fair measure of control over the linker's symbol resolution behavior. See ld(1).
13.1 New or Changed Symbol Resolution Features
Tru64 UNIX V5.1B includes the following new or changed features:
The EXTR alignment field can be used to record the alignment for all symbol definitions, not just the required alignment for linker-allocated commons (see Section 13.2.1 and Section 2.3.5).
Name mangling is described for non-mergeable C++ header files (see Section 13.3.3).
Identification of linker-defined symbols in the external symbol table (see Section 13.2.1)
Tru64 UNIX V5.1 includes the following new or changed features:
Alignment for common storage class symbols (see Section 13.2.1 and Section 2.3.5)
13.2 Structures, Fields, and Values for Symbol Resolution
Unless otherwise specified, all structures described in this section are declared in the header file sym.h, and all constants are defined in the header file symconst.h.
13.2.1 External Symbol Entry (EXTR)
typedef struct {
    SYMR      asym;
    coff_uint jmptbl     : 1;
    coff_uint cobol_main : 1;
    coff_uint weakext    : 1;
    coff_uint alignment  : 4;   /* V5.1 and later */
    coff_uint reserved2  : 2;
    coff_uint linkerdef  : 1;   /* V5.1B and later */
    coff_uint reserved   : 22;
    coff_int  ifd;
} EXTR, *pEXTR;
SIZE - 24 bytes, ALIGNMENT - 8 bytes
External Symbol Table Entry Fields
asym: External symbol table entry. This structure has the same format as a local symbol entry. The field interpretations differ as described in the following entries.
asym.value: Contains the symbol address for most defined symbols. See Section 11.2.4 for details.
asym.iss: Byte offset in the external string table to the symbol name. Set to issNil (-1) if there is no name for this symbol.
asym.st: Symbol type. See Table 11-1 for possible values.
asym.sc: Storage class. See Table 11-2 for possible values.
asym.reserved: Must be zero.
asym.index: Contains either an index into the auxiliary symbol table for a type description or an index into the local symbol table pointing to a related symbol. The index field may have a value of indexNil, which is defined as (long)0xfffff. This value indicates that the index is not a valid reference.
jmptbl: Unused.
cobol_main: Flag set to indicate that the symbol is a COBOL main procedure.
weakext: Flag set to identify the symbol as a weak external. See Section 14.3.4.2 for more details on weak symbols.
alignment: Power-of-two byte alignment, biased by 3: a field value of n denotes an alignment of 2^(n+3) bytes. Supported values range from 0 through 13, yielding a minimum alignment of 8 bytes and a maximum alignment of 64K bytes. For unallocated common symbols, this value specifies a requested alignment. For defined data and text symbols, a nonzero value records the symbol's actual alignment. A zero value indicates that the alignment for a data or text symbol is unspecified, but size and address values can be used to determine a sufficient alignment. For symbols with storage class scUndefined or scSUndefined, this field is not used.
Version Note: The alignment field is supported on Tru64 UNIX V5.1 and greater.
reserved2: Must be zero.
linkerdef: Identifies linker-defined symbols.
Version Note: The linkerdef field is supported on Tru64 UNIX V5.1B and greater.
reserved: Must be zero.
ifd: Index of the file descriptor where the symbol is defined. Set to ifdNil (-1) for undefined symbols and for some compiler system symbols.
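As an illustration of the alignment encoding described above, the following C sketch converts an alignment field value to a byte count. It is not part of sym.h or symconst.h: the macro names, the function name extr_alignment_bytes, and its is_unallocated_common parameter are hypothetical and are used here only for illustration.

```c
/* Illustrative helper; these names are NOT declared in sym.h or symconst.h. */
#define EXTR_ALIGN_BIAS      3u   /* field value n denotes 2^(n+3) bytes */
#define EXTR_ALIGN_MAX_VALUE 13u  /* largest supported field value (64K bytes) */

/*
 * Returns the byte alignment encoded by an EXTR alignment field value.
 * Returns 0 when no alignment is recorded: either the value lies outside
 * the supported range, or it is zero for a symbol that is not an
 * unallocated common (the "unspecified" case for data and text symbols).
 */
unsigned long extr_alignment_bytes(unsigned value, int is_unallocated_common)
{
    if (value > EXTR_ALIGN_MAX_VALUE)
        return 0;
    if (value == 0 && !is_unallocated_common)
        return 0;
    return 1ul << (value + EXTR_ALIGN_BIAS);
}
```

Under these assumptions, extr_alignment_bytes(1, 0) yields 16 and extr_alignment_bytes(13, 1) yields 65536. A caller that receives 0 for a data or text symbol would fall back to the symbol's size and address to derive a sufficient alignment.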
13.3 Symbol Resolution Usage
13.3.1 Library Search
Symbols that are referenced but not defined in the main executable of an application must be matched with definitions in linked-in libraries. The linker combines objects, archives, and shared libraries while attempting to resolve all references to undefined symbols. The Programmer's Guide covers related topics in detail, such as how to specify libraries during compilation and the search order of libraries.
In general, main executable objects and shared libraries are searched before archive libraries. If no undefined external symbols remain, archive libraries in the library list do not have to be searched. Archive members are only loaded to resolve references to undefined symbols. Archives are not used to find "better" common definitions (see Section 13.3.2) or higher-precedence symbol definitions. However, precedence rules do apply for any symbol definitions that occur in archive members which satisfy references to undefined symbols.
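The archive rule above can be sketched in C as follows. This is a deliberately simplified model and not the linker's implementation: the member structure and the load_needed_members function are hypothetical, each member is assumed to define exactly one symbol, and new undefined references introduced by a loaded member are not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an archive member (one definition each). */
struct member {
    const char *defines;  /* name of the symbol this member defines */
    int loaded;           /* set to 1 once the member has been pulled in */
};

/* Returns 1 if name appears in the list of currently undefined symbols. */
static int is_undefined(const char *name,
                        const char *const *undef, size_t nundef)
{
    for (size_t i = 0; i < nundef; i++)
        if (strcmp(name, undef[i]) == 0)
            return 1;
    return 0;
}

/*
 * Loads only those members that satisfy a currently undefined reference
 * and returns how many were loaded. Members that merely offer another
 * definition of an already-defined symbol are never loaded.
 */
size_t load_needed_members(struct member *m, size_t n,
                           const char *const *undef, size_t nundef)
{
    size_t count = 0;

    if (nundef == 0)
        return 0;  /* nothing is undefined: the archive need not be searched */

    for (size_t i = 0; i < n; i++) {
        if (!m[i].loaded && is_undefined(m[i].defines, undef, nundef)) {
            m[i].loaded = 1;
            count++;
        }
    }
    return count;
}
```

In this model, an archive holding members that define foo, bar, and baz contributes only the bar member when bar is the sole undefined symbol.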
13.3.2 Resolution of Symbols with Common Storage Class
Symbols with common storage class are a special category of global symbols that have a size but no allocated storage. Symbols with common storage class should not be confused with Fortran common symbols, which are not represented by a single symbol table entry. (See Section 11.3.1.8 for a description of Fortran common symbols.) Common storage classes are scCommon, scSCommon, and scTlsCommon.
The symbol definition model used by Tru64 UNIX allows an unlimited number of common storage class symbols with the same name. Ultimately, the "best" of these must be selected (by the linker or the loader) during symbol resolution. The criteria used to select the best symbol definition include the symbol's allocation status and size.
The symbol table does not provide an "allocated common" storage class. Common storage class symbols adopt a new storage class when they are allocated. Typically, their new storage class is scBss, scSBss, or scTlsBss. On the other hand, the dynamic symbol table does explicitly distinguish common storage class symbols that have been allocated. See Section 14.3.4 for more information on dynamic symbol resolution.
A symbol reference is resolved according to the following precedence rules:
1. Find a symbol definition that does not have a common storage class and is not identified as an allocated common in the dynamic symbol table.
2. Find the largest allocated common identified in the dynamic symbol table.
3. Find the largest common storage class symbol and allocate it. This step will be skipped when the linker produces a relocatable object file.
Precedence is given to symbol definitions with storage allocation to minimize load time common allocation and redundant storage allocations in shared objects. The loader is capable of allocating space for common storage class symbols, but this should only be necessary when a program references an allocated common symbol in a shared library that is later removed from that shared library.
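The precedence rules above can be sketched in C as a selection over candidate definitions of one name. This is an illustrative model only: the def_kind enumeration, the candidate structure, and pick_definition are hypothetical names that do not appear in sym.h, allocation of the chosen common is not modeled, and keeping the first candidate when two are otherwise equal is an assumption.

```c
#include <stddef.h>

/* Hypothetical classification, ordered from highest to lowest precedence. */
enum def_kind {
    DEF_ORDINARY = 0,          /* not common, not an allocated common */
    DEF_ALLOCATED_COMMON = 1,  /* allocated common in the dynamic symbol table */
    DEF_COMMON = 2             /* unallocated common storage class symbol */
};

struct candidate {
    enum def_kind kind;
    unsigned long size;        /* symbol size in bytes */
};

/*
 * Returns the index of the preferred candidate, or -1 if there is none.
 * A lower def_kind always wins; among commons of the same kind the larger
 * size wins. Ties keep the earlier candidate (an assumption of this sketch).
 */
int pick_definition(const struct candidate *c, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0) {
            best = (int)i;
        } else if (c[i].kind < c[best].kind) {
            best = (int)i;
        } else if (c[i].kind == c[best].kind &&
                   c[i].kind != DEF_ORDINARY &&
                   c[i].size > c[best].size) {
            best = (int)i;
        }
    }
    return best;
}
```

For example, given a 64-byte common, allocated commons of 8 and 16 bytes, and a 128-byte common, the sketch selects the 16-byte allocated common: allocation status outranks size, and size decides only within a category.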
Note that Fortran common block representations use common storage class symbols. Another very frequent occurrence of a common storage class symbol is a C-language global variable that does not have an initializer in its declaration.
13.3.3 Mangling and Demangling
Another issue related to symbol resolution is the need to "mangle" user-level identifiers. For example, C++ allows function overloading, prototyping, and the use of templates, all of which can result in the occurrence of the same names for different entities. The solution employed by the symbol table is to use mangled names that derive from the symbol's type signature.
Object file consumers, such as debuggers and object dumpers, need to "demangle" the identifiers so they can be output in a form that is recognizable to the user. For linking and loading, the mangled names are used for symbol resolution.
One highly visible mangled name that appears in C++ programs is the name of the unmergeable portion of a file header. The C++ compiler reduces the size of a linked image's symbol table by splitting header files into mergeable and non-mergeable entries in the file descriptor table. The mergeable entry retains its on-disk name, but the name of the non-mergeable entry is mangled by appending the string ~alt~deccxx_XXXXXXXX to the name. The eight X's represent the CRC encoding of the date and time when the program was compiled. See Section 6.3.2 for more information on file merging.
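An object file consumer that wants to display the on-disk name of a non-mergeable header entry could remove the appended string. The following C sketch shows one way to do so. The function header_base_length and the macro names are hypothetical; the sketch checks only that exactly eight characters follow the ~alt~deccxx_ marker and does not validate their form.

```c
#include <string.h>

#define ALT_MARKER  "~alt~deccxx_"  /* marker string appended by the compiler */
#define ALT_CRC_LEN 8               /* the eight X's that follow the marker */

/*
 * Returns the length of the on-disk portion of a file descriptor name.
 * If the name ends with ALT_MARKER followed by ALT_CRC_LEN characters and
 * has a non-empty name in front of it, that suffix is excluded; otherwise
 * the full length of the name is returned.
 */
size_t header_base_length(const char *name)
{
    size_t len = strlen(name);
    size_t marker_len = strlen(ALT_MARKER);
    size_t suffix_len = marker_len + ALT_CRC_LEN;

    if (len > suffix_len &&
        strncmp(name + len - suffix_len, ALT_MARKER, marker_len) == 0)
        return len - suffix_len;
    return len;
}
```

A caller can then print the demangled form with a precision specifier, for example printf("%.*s", (int)header_base_length(name), name).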
The encoding of C++ names is described in the manual Using DEC C++ for Tru64 UNIX Systems.
Other compilers may write symbol names that are modified by prepending or appending special characters such as dollar sign ($) or underscore (_) or by prepending qualifier strings such as file names or namespace names. Uppercasing of names is also common for certain languages such as Fortran. All of these transformations fall into the general category of mangled names. Refer to the release notes for specific compilers for additional information.
13.3.4 Mixed Language Resolution
Compilation of a program involving multiple source languages introduces additional symbol resolution issues. One important task is resolving the main program entry point because conflicting "main" symbols may be present in the different files. For C and C++, the symbol "main" is the main program entry point, but for other languages, "main" will either be an alias for the main program or an interlude. DEC Fortran and DEC COBOL provide interludes that perform some language specific initializations and then call the real main program entry point. For DEC Fortran the main program is "MAIN__" and for DEC COBOL the main program is "__cobol_main". DEC Pascal provides a "main" symbol that aliases the actual main program symbol.
The symbols "MAIN__" and "__cobol_main" can both be present in a mixed language program, and either, neither, or both can be used by the program. Debuggers can set a breakpoint in the user's main program by applying some precedence for selecting the most appropriate symbol. For a mixed language program, there is a slight chance that "MAIN__" or "__cobol_main" will be present but never called.
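A tool that needs to pick a main program symbol could apply a preference list such as the one sketched below in C. This section does not specify the precedence that any particular debugger uses, so the sketch takes the ordering as a caller-supplied argument. The function choose_main_symbol is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns the first name in preference[] that also appears in present[],
 * or NULL if none of the preferred names is present. The ordering of
 * preference[] is chosen by the caller; it is not defined by the
 * symbol table format.
 */
const char *choose_main_symbol(const char *const *present, size_t npresent,
                               const char *const *preference,
                               size_t npreference)
{
    for (size_t p = 0; p < npreference; p++)
        for (size_t i = 0; i < npresent; i++)
            if (strcmp(preference[p], present[i]) == 0)
                return present[i];
    return NULL;
}
```

With the example preference order "MAIN__", "__cobol_main", "main", a program containing both "main" and "MAIN__" yields "MAIN__", while a pure C program yields "main". Because a selected symbol might be present but never called, a tool may also want to fall back to "main" if the breakpoint is never reached.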
13.3.5 TLS Symbols
TLS (Thread Local Storage) symbols, like non-TLS symbols, can be undefined or common. Unresolved TLS symbols are identified by the storage class scTlsUndefined, and TLS commons have the storage class scTlsCommon. The symbol resolution process for TLS names is similar, but separate; TLS symbols cannot be resolved to non-TLS symbols or vice versa.
TLS common symbols are resolved in the same manner as other common storage class symbols (see Section 13.3.2), except that, again, only TLS symbols are candidates for resolution.
Another rule special to TLS is that symbol definitions for TLS common and undefined symbols cannot be imported from shared libraries.
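The two TLS-specific restrictions can be expressed as a small compatibility check, sketched below in C. The sym_info structure and the tls_rules_allow function are hypothetical and model only the rules stated in this section. A candidate that passes this check is still subject to the ordinary precedence rules.

```c
/* Hypothetical summary of one side of a reference/definition pair. */
struct sym_info {
    int is_tls;           /* nonzero for a TLS symbol (scTls* storage class) */
    int from_shared_lib;  /* nonzero if the definition is in a shared library */
};

/*
 * Returns 1 if def is an acceptable candidate for ref under the TLS rules,
 * and 0 otherwise.
 */
int tls_rules_allow(const struct sym_info *ref, const struct sym_info *def)
{
    /* TLS and non-TLS symbols are never resolved to each other. */
    if ((ref->is_tls != 0) != (def->is_tls != 0))
        return 0;

    /* Definitions for TLS symbols cannot be imported from shared libraries. */
    if (ref->is_tls && def->from_shared_lib)
        return 0;

    return 1;
}
```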