5    Development Environment Notes

This chapter contains notes about issues and known problems with the development environment software and, whenever possible, provides solutions or workarounds to those problems. The following topics are discussed:


5.1    General Programming

The following note applies to general programming.


5.1.1    The malloc Function Is Now Tunable for Better Multithreaded Performance

The C runtime library malloc function (and associated functions) has been modified to allow significantly better concurrency when used by multithreaded applications. Additionally, three new memory allocator tuning variables have been added to allow more control of allocator behavior:

As always when developing applications making significant use of dynamically allocated memory and requiring maximum speed of execution, you should carefully read the Tuning Memory Allocation section of the malloc(3) reference page.


5.1.2    New DEC C Default Tuning Could Impact Applications That Directly Map I/O Space

Applications that directly map and access I/O space with bytes or shorts may be impacted by the new DEC C compiler.

The default tuning for the DEC C compiler has advanced its focus from EV4-EV5 architectures to EV56-EV6 architectures. With this change in tuning, the compiler now generates amask-guarded byte and word instruction sequences for some loops. The amask guards ensure that the byte and word instructions do not execute on processors that do not support them; less efficient instructions execute instead.

The net result of this change is that users who recompile their applications with the default tuning may see a slight increase in object code size, a very slight decrease in performance on EV4-EV5 processors, and a sizable increase in performance on EV56-EV6 machines.

This change may be disruptive for applications that use special device driver interfaces that directly map I/O space for devices that do not support 8-bit and 16-bit access granularity.

If those applications are compiled without -Wf, -static and are run on EV56-EV6 machines, they may corrupt I/O memory. To avoid this possibility, compile those applications with -tune ev5, which disables byte/word instruction generation.


5.2    Realtime Programming

The following notes apply to realtime programming.


5.2.1    SA_SIGINFO Not Visible Under Certain Namespace Conditions

The symbol SA_SIGINFO, defined in sys/signal.h, is not visible under certain namespace conditions when _POSIX_C_SOURCE is explicitly defined in the application or on the compile line.

The SA_SIGINFO symbol is visible if you do not explicitly define _POSIX_C_SOURCE. For most applications, unistd.h provides the standards definitions needed, including _POSIX_C_SOURCE. As a general rule, avoid explicitly defining standards macros in your application or on the compile line. If you do explicitly define _POSIX_C_SOURCE, then SA_SIGINFO is visible if you also explicitly define _OSF_SOURCE.
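The following minimal sketch illustrates the workaround described above for an application that must define _POSIX_C_SOURCE itself. The value 199506L and the names on_signal, install_siginfo_handler, and last_signal_seen are illustrative assumptions, not part of the system interfaces.

```c
/* Both macros must be defined before the first #include.
   The value 199506L is only an example of an explicit definition. */
#define _POSIX_C_SOURCE 199506L
#define _OSF_SOURCE   /* per the note above: keeps SA_SIGINFO visible */

#include <signal.h>
#include <string.h>

/* Hypothetical bookkeeping: the number of the last signal delivered. */
static volatile sig_atomic_t last_signo = 0;

/* Three-argument handler form selected by SA_SIGINFO. */
static void on_signal(int signo, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    last_signo = signo;
}

/* Installs on_signal for signo; returns 0 on success, -1 on error. */
int install_siginfo_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(signo, &sa, NULL);
}

/* Returns the number of the last signal handled, or 0 if none. */
int last_signal_seen(void)
{
    return last_signo;
}
```

If the application does not need to define _POSIX_C_SOURCE explicitly, removing both definitions is the simpler approach.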


5.2.2    POSIX 1003.1b Synchronized I/O and File Truncation

POSIX 1003.1b synchronized I/O using file status flags does not apply to file truncation. When file status flags are used to control I/O synchronization, no synchronization occurs for file truncation operations.

You can use the fsync() or fdatasync() function to explicitly synchronize truncation operations.


5.2.3    The fcntl() Function and F_GETFL with O_DSYNC File Status

A problem occurs when fcntl() is called with the F_GETFL request, and the file operated on has the O_DSYNC file status flag set. The return mask incorrectly indicates O_SYNC instead of O_DSYNC.


5.3    DECthreads (pthreads)

The following notes apply to DECthreads. See Section 8.10 and Section 8.11 for information about DECthreads interfaces that will be retired in a future release. See Section 1.11 for information about Visual Threads, a new product that lets you analyze your multithreaded applications for potential logic and performance problems.


5.3.1    Static Libraries

Users who desire optimal performance from DECthreads, and who are willing to relink on future versions of Tru64 UNIX, might want to use the DECthreads static libraries that are located in the CMPDEVENH440 subset. Once this subset is installed, you can find the libraries in the /usr/opt/alt/usr/lib/threads directory.

Before using these static libraries, you should read the README file in the same location.


5.3.2    Signal Handling

Signal handling in the POSIX 1003.1c (pthread) interface of DECthreads is substantially different from signal handling for the draft 4 POSIX and the CMA interfaces of DECthreads. When migrating your application from the draft 4 POSIX or CMA interfaces to the POSIX 1003.1c interface, please see the IEEE POSIX 1003.1c standard or the Guide to DECthreads for a discussion of signal handling in threaded applications.


5.3.3    Scheduling Behavior (Contention Scope)

In releases prior to Version 4.0, thread scheduling attributes were systemwide. In other words, threads had a system contention scope. Since Version 4.0, thread policies and priorities are, by default, local to the process. No artificial limit exists for thread priorities of these process contention scope threads; the full priority range is accessible by every thread.

Previously, there was no way to control the contention scope of a thread. Starting with Version 4.0D, applications coded to the POSIX 1003.1c pthreads interface can set the desired contention scope upon thread creation. For more information on setting and determining thread contention scope, see the descriptions of the following routines in the Guide to DECthreads:

pthread_attr_setscope()

pthread_attr_getscope()

The guide also describes a problem with inheritance of the contention scope scheduling attribute in Versions 4.0D and higher.

Process contention scope threads provide faster context switches between threads in the same process, and reduce the demand on system resources without reducing execution concurrency. The Tru64 UNIX two-level scheduling implementation (the code that supports process contention scope scheduling) automatically replaces kernel execution entities when a process contention scope thread blocks in the kernel for any reason, and it provides time-slicing of compute-bound threads. Therefore, there is no need to worry that using process contention scope will reduce parallelism or allow the execution of some threads to prevent other threads from executing.

The only code that should require system contention scope is code that must run on a specific processor via binding and code that must be directly scheduled by the operating system kernel against threads in other processes, particularly threads running inside the kernel. While the scheduling policy and priority of process contention scope threads are virtual and affect scheduling only against other threads within the process, the scheduling policy and priority of system contention scope threads (when the process runs with root access) can allow the thread to preempt threads within the kernel. While this can sometimes be valuable and even essential, extreme care must be used in such programs to avoid locking up the system. It might be impossible to interrupt such a thread.
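The following sketch shows one way to request a contention scope at thread creation with the POSIX 1003.1c interface. The helper names create_thread_with_scope and mark_done are hypothetical. The call to pthread_attr_setinheritsched() reflects the POSIX rule that the contention scope in the attributes object is used only when scheduling attributes are not inherited from the creating thread; whether it is required on a particular release is an assumption to verify against the Guide to DECthreads.

```c
#include <pthread.h>

/* Sample start routine: records that the thread ran. */
void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return arg;
}

/* Creates a thread with the requested contention scope, either
   PTHREAD_SCOPE_PROCESS or PTHREAD_SCOPE_SYSTEM.
   Returns 0 on success or an error number on failure. */
int create_thread_with_scope(pthread_t *thread, int scope,
                             void *(*start)(void *), void *arg)
{
    pthread_attr_t attr;
    int status;

    status = pthread_attr_init(&attr);
    if (status != 0)
        return status;

    status = pthread_attr_setscope(&attr, scope);
    if (status == 0) {
        /* Use the scope stored in attr rather than inheriting the
           scheduling attributes of the creating thread. */
        status = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    }
    if (status == 0)
        status = pthread_create(thread, &attr, start, arg);

    pthread_attr_destroy(&attr);
    return status;
}
```

An implementation that does not support the requested scope returns an error from pthread_attr_setscope(), so callers should check the return value rather than assume that both scopes are available.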


5.3.4    Problems Using the stackaddr Thread Creation Attribute

Compaq does not recommend using the stackaddr thread creation attribute, which allows you to allocate your own stack for a thread. The semantics of this attribute are poorly defined by POSIX and the Single UNIX Specification, Version 2. As a result, code using the attribute is unlikely to be portable between implementations. The attribute is difficult to use reliably, since the developer must, by intimate knowledge of the machine architecture and implementation, know the correct address to specify relative to the allocated stack. The implementation cannot diagnose an incorrect value because the interface does not provide sufficient information. Using an incorrect value might result in program failure, possibly in obscure ways.


5.3.5    DECthreads Read-Write Locks

DECthreads now supports read-write locks. A read-write lock is a synchronization object for protecting a data object that can be accessed concurrently by more than one thread in the same program. Unlike a mutex, a read-write lock distinguishes between shared read and exclusive write operations on the shared data object. A read-write lock is most useful in protecting a shared data object that is read frequently and modified less frequently. The following routines provide access to the read-write lock capability:

For more information about read-write locks, see the reference pages for these routines.


5.3.6    DECthreads Object Naming

DECthreads now allows you to assign names, as C language strings, to thread objects including threads, mutexes, condition variables, and read-write locks (see Section 5.3.5 ). During debugging, you can use these names to help identify individual objects by function rather than by the numeric identifiers the thread library assigns. The Ladebug debugger and the Visual Threads analysis tool (see Section 1.11 ) include these names when displaying information about thread objects. Other debuggers and analysis tools can also use the names you have assigned.

Use the following routines to assign and retrieve object names:

For more information about object naming, see the reference pages for these routines.


5.3.7    DECthreads Metering Capability May Not Be Reliable in Some Situations

In this release, the metering capabilities of DECthreads may not be reliable in a process that forks.


5.3.8    Memory Alignment Issue

Although older Alpha processors (prior to the 21264 chip) can only access memory in units of at least a quadword (8 bytes), multiple variables, each of which is less than eight bytes, can occupy the same quadword in memory. In such cases, multithreaded programs might experience a problem if two or more threads read the same quadword, update different parts of it, then independently write their respective copies back to memory. The last thread to write the quadword overwrites any data previously written to other parts of the quadword. This can happen even though each thread protects its part of the quadword with its own mutex.

The Tru64 UNIX C compiler protects scalar variables against this problem by aligning them in memory on quadword (8-byte) boundaries. However, in composite data objects such as structures or arrays, the compiler aligns members on their natural boundaries. For example, a 2-byte member is aligned on a 2-byte boundary. Because of this, any adjacent members of the composite object that total eight bytes or less could occupy the same quadword in memory.

Inspect your multithreaded application code to determine if you have a composite data object in which adjacent members could share the same quadword in memory. If you do and if your project allows, Compaq recommends that you force alignment of each such member variable to a quadword boundary by redefining the variable to be at least eight bytes, or by defining sufficient padding storage after the variable to total eight bytes.

Alternatively, you can create one mutex for each composite data object in which adjacent members can share the same quadword in memory. Then use this single mutex to protect all write accesses by all threads to the composite data object. This technique might be less desirable because of performance considerations.
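The padding technique described above can be sketched as follows. The structure names and the share_quadword helper are hypothetical, and the sketch assumes that each object starts on a quadword boundary; the 64-bit member in the padded structure gives the structure itself 8-byte alignment.

```c
#include <stddef.h>
#include <stdint.h>

#define QUADWORD 8   /* bytes in an Alpha quadword */

/* Problem layout: both 2-byte members fall in the same quadword, so
   threads updating a and b separately can overwrite each other. */
struct flags_packed {
    short a;
    short b;
};

/* One possible fix: widen one member to a full quadword and pad the
   other so that each member occupies its own quadword. */
struct flags_padded {
    int64_t a;                              /* redefined as 8 bytes */
    short   b;
    char    b_pad[QUADWORD - sizeof(short)];  /* fills out the quadword */
};

/* Returns 1 if two member offsets fall in the same quadword of an
   object that starts on a quadword boundary, 0 otherwise. */
int share_quadword(size_t offset1, size_t offset2)
{
    return offset1 / QUADWORD == offset2 / QUADWORD;
}
```

For example, share_quadword(offsetof(struct flags_packed, a), offsetof(struct flags_packed, b)) reports that the packed members collide, while the same check on struct flags_padded reports that they do not.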


5.3.9    DECthreads pthread_debug() and pthread_debug_cmd() Routines

To allow for a more comprehensive and robust threads debugging environment, the functionality of the pthread_debug() and pthread_debug_cmd() routines has been removed. To prevent existing binaries from failing, the routines are still recognized, but a call to either routine now returns immediately to the calling program. The pthread_debug_cmd() routine returns zero (0), indicating success. Debuggers such as Ladebug and TotalView provide the functionality formerly provided by these routines.


5.3.10    DECthreads SIGEV_THREAD Notification Mechanism

The SIGEV_THREAD notification mechanism works correctly, starting in Version 4.0D. Using this notification mechanism, a user-defined function is called to perform notification of an asynchronous event. The function is run as though it were the start routine of a thread and can make full use of the DECthreads synchronization objects.

The SIGEV_THREAD notification mechanism and the function to be called are specified in the sigevent structure. This mechanism is useful for programming with the POSIX 1003.1b realtime signal interfaces such as timers and asynchronous I/O. For information and cautions concerning the use of signals in a multithreaded environment, see the Guide to DECthreads. For more information about using SIGEV_THREAD, see the IEEE POSIX 1003.1c-1996 standard and The Open Group Single UNIX Specification, Version 2.


5.4    Profiling

The following notes apply to the profiler tools.


5.4.1    Change to hiprof's Profiling of Threaded Programs

The -cputime option of the hiprof(5) profiler now provides an instruction-count profile for threaded programs, the same as the -calltime option, because the CPU cycles reported for kernel threads by the RPCC instruction cannot be mapped to pthread(3) threads.

The only significant difference is that the profile is displayed as the number of instructions executed instead of CPU seconds used. The -cputime option still profiles CPU seconds for nonthreaded programs.


5.4.2    Change in Naming of Files by cc Profiling Option

The cc command's -prof_gen option (which causes the pixie profiler to be run after the executable is linked) names files differently from the way it did in releases prior to Version 4.0E.

The new naming scheme is necessary to support formal benchmarking, which is the primary purpose of the -prof_gen option.

Before Version 4.0E, the uninstrumented executable produced by the ld linker and provided as input to pixie was named a.out (or as indicated with the -o option). The instrumented executable produced by pixie was given the usual .pixie filename extension.

Starting with Version 4.0E, the instrumented executable is named a.out (or as indicated with the -o option). The uninstrumented executable is given a .non_pixie file name extension.


5.5    Debugging with dbx

The following note applies to debugging with dbx.


5.5.1    Examining the User Program Stack in a Kernel Crash Dump

When debugging a crash dump with dbx, you can examine the call stack of the user program whose execution precipitated the kernel crash. To examine a crash dump and also view the user program stack, you must invoke dbx using the following command syntax:

dbx -k vmunix.n vm[z]core.n path/user-program

The version number (n) is determined by the value contained in the bounds file, which is located in the same directory as the dump files. The user-program parameter specifies the user program executable.

The crash dump file must contain a full crash dump. For information on setting system defaults for full or partial crash dumps, see the System Administration guide. You can use the assign command in dbx, as shown in the following example, to temporarily specify a full crash dump. This setting stays in effect until the system is rebooted.

# dbx -k vmunix.3
dbx version 5.0

.
.
.
(dbx) assign partial_dump=0

To specify a full crash dump permanently so that this setting remains in effect after a reboot, use the patch command in dbx, as shown in the following example:

(dbx) patch partial_dump=0

With either command, a partial_dump value of 1 specifies a partial dump.

The following example shows how to examine the state of a user program named test1 that purposely precipitated a kernel crash with a syscall after several recursive calls:

# dbx -k vmunix.1 vmzcore.1 /usr/proj7/test1
dbx version 5.0
Type 'help' for help.

 
stopped at  [boot:1890 ,0xfffffc000041ebe8]  Source not available

warning: Files compiled -g3: parameter values probably wrong
(dbx) where    [1]
>  0 boot() ["../../../../src/kernel/arch/alpha/machdep.c":1890, 0xfffffc000041ebe8]
   1 panic(0xfffffc000051e1e0, 0x8, 0x0, 0x0, 0xffffffff888c3a38) ["../../../../src/kernel/bsd/subr_prf.c":824, 0xfffffc0000281974]
   2 syscall(0x2d, 0x1, 0xffffffff888c3ce0, 0x9aa1e00000000, 0x0) ["../../../../src/kernel/arch/alpha/syscall_trap.c":593, 0xfffffc0000423be4]
   3 _Xsyscall(0x8, 0x3ff8010f9f8, 0x140008130, 0xaa, 0x3ffc0097b70) ["../../../../src/kernel/arch/alpha/locore.s":1409, 0xfffffc000041b0f4]
   4 __syscall(0x0, 0x0, 0x0, 0x0, 0x0) [0x3ff8010f9f4]
   5 justtryme(scall = 170, cpu = 0, levels = 25) ["test1.c":14, 0x120001310]
   6 recurse(inbox = (...)) ["test1.c":28, 0x1200013c4]
   7 recurse(inbox = (...)) ["test1.c":30, 0x120001400]
   8 recurse(inbox = (...)) ["test1.c":30, 0x120001400]
   9 recurse(inbox = (...)) ["test1.c":30, 0x120001400]
.
.
.
  30 recurse(inbox = (...)) ["test1.c":30, 0x120001400]
  31 main(argc = 3, argv = 0x11ffffd08) ["test1.c":52, 0x120001518]
(dbx) up 8    [2]
recurse:  30  if (r.a[2] > 0) recurse(r);
(dbx) print r    [3]
struct {
    a = {
        [0] 170
        [1] 0
        [2] 2
        [3] 0
.
.
.
(dbx) print r.a[511]    [4]
25
(dbx)

  1. The where command displays the kernel stack followed by the user program stack at the time of the crash. In this case, the kernel stack has 4 activation levels; the user program stack starts with the fifth level and includes several recursive calls.

  2. The up 8 command moves the debugging context 8 activation levels up the stack to one of the recursive calls within the user program code.

  3. The print r command displays the current value of the variable r, which is a structure of array elements. Full symbolization is available for the user program, assuming it was compiled with the -g option.

  4. The print r.a[511] command displays the current value of array element 511 of structure r.


5.6    Java Programming

The following note applies to Java programming.


5.6.1    Name Space Conflict Between Java and SVE

A file system conflict exists between Java and the System V Environment (SVE) on Version 4.0 and later systems.

The problem arises because both Java and SVE use the file system path name string /usr/bin/alpha for different purposes. Java creates /usr/bin/alpha as a directory. SVE (specifically, the optional SVEBCP4** Base Compatibility Package subset) creates /usr/bin/alpha as a symbolic link to the /usr/opt/svr4/usr/bin/alpha directory. The order in which these applications are installed determines if the customer will experience a problem. Here are three ways to avoid the problem:

There will be no patch or other resolution mechanism for this problem other than the workaround provided here.