6 Programming Considerations

This chapter gives rules and examples to follow when creating an assembly language program.

The chapter addresses the following topics:

Why your assembly programs should use the calling conventions observed by the C compiler (Section 6.1)

An overview of the composition of executable programs (Section 6.2)

The use of registers, section and location counters, and stack frames (Section 6.3)

A technique for coding an interface between an assembly language procedure and a procedure written in a high-level language (Section 6.4)

The default memory-allocation scheme used by the Alpha system (Section 6.5)

This chapter does not address coding issues related to performance or optimization. See Appendix A of the Alpha Architecture Reference Manual for information on how to optimize assembly code.

6.1 Calling Conventions

When you write assembly language procedures, you should use the same calling conventions that the C compiler observes. The reasons for using the same calling conventions are as follows:

Often your code must interact with compiler-generated code, accepting and returning arguments or accessing shared global data.

The symbolic debugger gives better assistance in debugging programs that use standard calling conventions.

The conventions observed by the Tru64 UNIX compiler system are more complicated than those of some other compiler systems, mostly to enhance the speed of each procedure call. Specifically:

The C compiler uses the full, general calling sequence only when necessary; whenever possible, it omits unneeded portions of the sequence. For example, the C compiler does not use a register as a frame pointer if it is unnecessary to do so.

The C compiler and the debugger observe certain implicit rules instead of communicating by means of instructions or data at execution time. For example, the debugger looks at information placed in the symbol table by a .frame directive at compilation time. This technique enables the debugger to tolerate the lack of a register containing a frame pointer at execution time.

The linker performs code optimizations based on information that is not available at compile time. For example, the linker can, in some cases, replace the general calling sequence to a procedure with a single instruction.

6.2 Program Model

A program consists of an executable image and zero or more shared images. Each image has an independent text and data area.

Each data segment contains a global offset table (GOT), which contains address constants for procedures and data locations that the text segment references. The GOT provides the means to access arbitrary 64-bit addresses and allows the text segment to be position-independent.

The size of the GOT is limited only by the maximum image size. However, because a single memory-format instruction carries a signed 16-bit displacement and can therefore address only 64 KB relative to its base register, the GOT is segmented into one or more sections of 64 KB or less.

The gp register provides efficient access to the GOT and is also used to access global data within ±2 GB of the global pointer. This area of memory is known as the global data area.
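The 64 KB and ±2 GB limits can be understood in terms of instruction displacements: one memory-format instruction holds a signed 16-bit displacement, and an ldah/lda pair combines two of them (ldah scales its displacement by 65536). The following C sketch models that arithmetic only. The function names fits_disp16 and split_gp_offset are hypothetical and are not part of any Tru64 UNIX tool.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if off fits the signed 16-bit displacement of a single
 * memory-format instruction (the reason a GOT segment is at most 64 KB). */
int fits_disp16(int64_t off)
{
    return off >= -32768 && off <= 32767;
}

/* Split a gp-relative offset into the displacements an ldah/lda pair
 * would carry, so that off == high * 65536 + low.
 * Returns 1 on success, 0 if the offset is out of reach of one pair. */
int split_gp_offset(int64_t off, int16_t *high, int16_t *low)
{
    int64_t lo, hi;

    assert(high && low);
    lo = off & 0xffff;            /* low 16 bits, 0..65535 */
    if (lo >= 0x8000)
        lo -= 0x10000;            /* lda sign-extends its displacement */
    hi = (off - lo) / 65536;      /* exact: off - lo is a multiple of 65536 */
    if (hi < -32768 || hi > 32767)
        return 0;
    *high = (int16_t)hi;
    *low = (int16_t)lo;
    return 1;
}
```

Because the low half is sign-extended, the high half is sometimes one larger than a plain shift would give; an offset of 0x12348000, for example, splits into high 0x1235 and low -32768.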

A static executable image is not a special case in the program model; it is simply an executable image that uses no shared libraries. The linker can, however, perform additional code optimizations on such an image. In particular, if a static executable image's GOT is less than or equal to 64 KB (that is, has only one segment), the code to load, save, and restore the gp register is not necessary because all procedures will access the same GOT segment.

6.3 General Coding Concerns

This section describes three general areas of concern to the assembly language programmer:

Usable and restricted registers (Section 6.3.1)

Control of section and location counters with directives (Section 6.3.2)

Stack frame requirements on entering and exiting a procedure (Section 6.3.3)

Another general coding consideration is the use of data structures to communicate between high-level language procedures and assembly procedures. In most cases, this communication is handled by means of simple variables: pointers, integers, Booleans, and single- and double-precision real numbers. Describing the details of the various high-level data structures that can also be used -- arrays, records, sets, and so on -- is beyond the scope of this manual.

6.3.1 Register Use

The main processor has 32 64-bit integer registers. The uses and restrictions of these registers are described in Table 6-1.

The floating-point coprocessor has 32 floating-point registers. Each register can hold either a single-precision (32-bit) or double-precision (64-bit) value. See Table 6-2 for details.

Table 6-1: Integer Registers

Register Name	Software Name (from regdef.h)	Use
`$0`	`v0`	Used for expression evaluations and to hold the integer function results. Not preserved across procedure calls.
`$1-8`	`t0-t7`	Temporary registers used for expression evaluations. Not preserved across procedure calls.
`$9-14`	`s0-s5`	Saved registers. Preserved across procedure calls.
`$15` or `$fp`	`s6` or `fp`	Contains the frame pointer (if needed); otherwise, a saved register.
`$16-21`	`a0-a5`	Used to pass the first six integer type actual arguments. Not preserved across procedure calls.
`$22-25`	`t8-t11`	Temporary registers used for expression evaluations. Not preserved across procedure calls.
`$26`	`ra`	Contains the return address. Preserved across procedure calls.
`$27`	`pv` or `t12`	Contains the procedure value and used for expression evaluation. Not preserved across procedure calls.
`$28` or `$at`	`AT`	Reserved for the assembler. Not preserved across procedure calls.
`$29` or `$gp`	`gp`	Contains the global pointer. Not preserved across procedure calls.
`$30` or `$sp`	`sp`	Contains the stack pointer. Preserved across procedure calls.
`$31`	zero	Always has the value 0.
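
Table 6-1 can be encoded as a small lookup, which is convenient when checking by program whether hand-written code touches a register it must save. The following C sketch is one possible encoding; the identifiers int_reg_software_name and int_reg_preserved are hypothetical, and register $15 is treated here as preserved in both of its roles.

```c
#include <assert.h>

/* Software names from Table 6-1, indexed by integer register number. */
static const char *const int_reg_names[32] = {
    "v0",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5",
    "fp",
    "a0", "a1", "a2", "a3", "a4", "a5",
    "t8", "t9", "t10", "t11",
    "ra", "pv", "AT", "gp", "sp", "zero"
};

/* Return the software name of integer register $n. */
const char *int_reg_software_name(int n)
{
    assert(n >= 0 && n <= 31);
    return int_reg_names[n];
}

/* Nonzero if Table 6-1 lists $n as preserved across procedure calls:
 * the saved registers $9-$15, the return address $26, and the stack
 * pointer $30. $31 always reads as zero and is reported as 0 here. */
int int_reg_preserved(int n)
{
    assert(n >= 0 && n <= 31);
    return (n >= 9 && n <= 15) || n == 26 || n == 30;
}
```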

Table 6-2: Floating-Point Registers

Register Name	Use
`$f0-f1`	Used to hold floating-point type function results (`$f0`) and complex type function results (`$f0` has the real part and `$f1` has the imaginary part). Not preserved across procedure calls.
`$f2-f9`	Saved registers. Preserved across procedure calls.
`$f10-f15`	Temporary registers used for expression evaluation. Not preserved across procedure calls.
`$f16-f21`	Used to pass the first six single- or double-precision actual arguments. Not preserved across procedure calls.
`$f22-f30`	Temporary registers used for expression evaluations. Not preserved across procedure calls.
`$f31`	Always has the value 0.0.

6.3.2 Using Directives to Control Sections and Location Counters

Assembled code and data are stored in the object file sections shown in Figure 6-1. Each section has an implicit location counter that begins at zero and increments by one for each byte assembled in the section. Location control directives (.align, .data, .rconst, .rdata, .sdata, .space, and .text) can be used to control what is stored in the various sections and to adjust location counters.

The assembler always generates the text section before other sections. Additions to the text section are done in 4-byte units.

The bss (block started by symbol) section holds data items (usually variables) that are initialized to zero. If a .lcomm directive defines a variable, the assembler assigns that variable to either the .bss section or the .sbss (small bss) section, depending on the variable's size.

The default size for variables in the .sbss section is eight or fewer bytes. You can change the size using the -G compilation option for the C compiler or the assembler. Items smaller than or equal to the specified size go in the .sbss section. Items greater than the specified size go in the .bss section.
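
The size rule above reduces to a single comparison. The following C sketch states it explicitly; the function name lcomm_section and the macro DEFAULT_SMALL_DATA_LIMIT are hypothetical names chosen for illustration, and the limit argument stands for the value given with the -G option.

```c
#include <assert.h>
#include <stddef.h>

/* Default threshold described in the text: eight or fewer bytes. */
#define DEFAULT_SMALL_DATA_LIMIT 8

/* Name of the section that receives a zero-initialized item of the
 * given size when the small-data threshold is limit bytes. */
const char *lcomm_section(size_t size, size_t limit)
{
    return size <= limit ? ".sbss" : ".bss";
}
```

With the default threshold, an 8-byte variable is placed in .sbss and a 9-byte variable in .bss.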

At run time, the $gp register points into the area of memory occupied by the .lita section. The .lita section is used to hold address literals for 64-bit addressing.

Figure 6-1: Sections and Location Counters for Nonshared Object Files

See the Symbol Table/Object File Specification manual for more information on section data.

6.3.3 The Stack Frame

The C compiler classifies each procedure into one of the following categories:

Nonleaf procedures. These procedures call other procedures.

Leaf procedures. These procedures do not themselves call other procedures. Leaf procedures are of two types: those that require stack storage for local variables and those that do not.

You must decide the procedure category before determining the calling sequence.

To write a program with proper stack frame usage and debugging capabilities, you should observe the conventions presented in the following list of steps. Steps 1 through 6 describe the code you must provide at the beginning of a procedure, step 7 describes how to pass parameters, and steps 8 through 12 describe the code you must provide at the end of a procedure:

1. Regardless of the type of procedure, you should include a .ent directive and an entry label for the procedure:
```
        .ent    procedure_name
procedure_name:
```
The .ent directive generates information for the debugger, and the entry label is the procedure name.

2. If you are writing a procedure that references static storage, calls other procedures, uses constants greater than 31 bits in size, or uses floating constants, you must load the $gp register with the global pointer value for the procedure:
```
        ldgp    $gp,0($27)
 
```
Register $27 contains the procedure value (the address of this procedure as supplied by the caller).

3. If you are writing a leaf procedure that does not use the stack, skip to step 4. For a nonleaf procedure or a leaf procedure that uses the stack, you must adjust the stack size by allocating all of the stack space that the procedure requires:
```
        lda     $sp,-framesize($sp)
 
```
The framesize operand is the size of frame required, in bytes, and must be a multiple of 16. You must allocate space on the stack for the following items:
- Local variables.
- Saved general registers. Space should be allocated only for those registers saved. For nonleaf procedures, you must save register $26, which is used in the calls to other procedures from this procedure. If you use registers $9 to $15, you must also save them.
- Saved floating-point registers. Space should be allocated only for those registers saved. If you use registers $f2 to $f9, you must also save them.
- Procedure call argument area. You must allocate the maximum number of bytes for arguments of any procedure that you call from this procedure; this area does not include space for the first six arguments because they are always passed in registers.
Note

Once you have modified register $sp, you should not modify it again in the remainder of the procedure.
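
The framesize arithmetic described in this step can be sketched in C as follows. This is an illustration under stated assumptions: each saved register is assumed to occupy one 8-byte slot (as the stq and stt instructions store quadwords), and the function names round_up_16, outgoing_arg_area, and frame_size are hypothetical. A compiler may lay out a frame differently, for example by adding padding between areas.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of 16, as framesize requires. */
size_t round_up_16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Bytes of argument area needed to call a procedure that takes
 * max_args arguments; the first six travel in registers. */
size_t outgoing_arg_area(unsigned max_args)
{
    return max_args > 6 ? 8u * (max_args - 6) : 0;
}

/* Sum the four items listed above and round to a multiple of 16. */
size_t frame_size(size_t local_bytes, unsigned saved_int_regs,
                  unsigned saved_fp_regs, size_t arg_area_bytes)
{
    size_t raw = local_bytes + 8u * saved_int_regs
               + 8u * saved_fp_regs + arg_area_bytes;

    return round_up_16(raw);
}
```

For a nonleaf procedure that has no locals and saves only register $26, frame_size(0, 1, 0, 0) yields 16.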

4. To generate information used by the debugger and exception handler, you must include a .frame directive:
```
        .frame  framereg,framesize,returnreg
```
The virtual frame pointer does not have a register allocated for it. It consists of the framereg ($sp, in most cases) added to the framesize (see step 3). Figure 6-2 shows the stack components.
Figure 6-2: Stack Organization

The returnreg argument for the .frame directive specifies the register that contains the return address (usually register $26). The usual values may change if you use a varying stack pointer or are specifying a kernel trap procedure.

5. If the procedure is a leaf procedure that does not use the stack, skip to step 6. Otherwise, you must save the registers for which you allocated space in step 3.
Saving the general registers requires the following operations:
- Specify which registers are to be saved using the following .mask directive:
```
        .mask   bitmask,frameoffset
```
  The bit settings in bitmask indicate which registers are to be saved. For example, if register $9 is to be saved, bit 9 in bitmask must be set to 1. The value for frameoffset is the offset (negative) from the virtual frame pointer to the start of the register save area.
- Use the following stq instruction to save the registers specified in the mask directive:
```
        stq     reg,framesize+frameoffset+N($sp)
```
  The value of N is the size of the argument build area for the first register and is incremented by 8 for each successive register. If the procedure is a nonleaf procedure, the return address register ($26) is the first register to be saved; it must be saved at framesize+frameoffset+0($sp) for exception handling. For example, a nonleaf procedure that saves registers $9 and $10 would use the following stq instructions:
```
        stq     $26,framesize+frameoffset($sp)
        stq     $9,framesize+frameoffset+8($sp)
        stq     $10,framesize+frameoffset+16($sp)
 
```
  (Figure 6-2 shows the order in which the registers in the preceding example would be saved.)
  Then, save any floating-point registers for which you allocated space in step 3:
```
        .fmask  bitmask,frameoffset
        stt     reg,framesize+frameoffset+N($sp)
```
  Saving floating-point registers is identical to saving integer registers except you use the .fmask directive instead of .mask, and the storage operations involve single- or double-precision floating-point data. (The previous discussion about how to save integer registers applies here as well.)
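
The bitmask and save-slot arithmetic in this step can be checked with a short C sketch. It assumes, as in the stq example above, that the first saved register is stored at framesize+frameoffset and that each later register is 8 bytes higher. The function names save_mask and save_slot are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build a .mask or .fmask bitmask: bit r is set for each saved
 * register number r in regs[0..count-1]. */
uint32_t save_mask(const int *regs, int count)
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < count; i++) {
        assert(regs[i] >= 0 && regs[i] <= 31);
        mask |= (uint32_t)1 << regs[i];
    }
    return mask;
}

/* $sp-relative displacement of the saved register at position index
 * (0 for the first register saved). frameoffset is the negative
 * offset given to the .mask directive. */
long save_slot(long framesize, long frameoffset, int index)
{
    assert(index >= 0);
    return framesize + frameoffset + 8L * index;
}
```

Saving only register $26 gives a mask of 0x04000000; adding registers $9 and $10 gives 0x04000600.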

6. The final step in creating the procedure's prologue is to mark its end as follows:
```
       .prologue flag
```
The flag is set to 1 if the prologue contains an ldgp instruction (see step 2); otherwise, it is set to zero.

7. This step describes parameter passing: how to access arguments passed into your procedure and how to pass arguments correctly to other procedures. For information on high-level, language-specific constructs (call-by-name, call-by-value, string or structure passing), see the programmer's guides for the high-level languages used to write the procedures that interact with your program.

General registers $16 to $21 and floating-point registers $f16 to $f21 are used for passing the first six arguments. All nonfloating-point arguments in the first six arguments are passed in general registers. All floating-point arguments in the first six arguments are passed in floating-point registers.

Stack space is used for passing the seventh and subsequent arguments. The stack space allocated to each argument is a multiple of 8 bytes and is aligned on an 8-byte boundary.

Table 6-3 summarizes the location of procedure arguments in the register or stack.

Table 6-3: Argument Locations

Argument Number	Integer Register	Floating-Point Register	Stack
1	`$16` (`a0`)	`$f16`
2	`$17` (`a1`)	`$f17`
3	`$18` (`a2`)	`$f18`
4	`$19` (`a3`)	`$f19`
5	`$20` (`a4`)	`$f20`
6	`$21` (`a5`)	`$f21`
`7-n`			`0($sp)..(n-7)*8($sp)`
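
Table 6-3 can be expressed as a small function. The following C sketch assumes that every argument occupies a single 8-byte slot; the type and function names (arg_class, arg_loc, arg_location) are hypothetical. Stack offsets are relative to $sp at the time of the call.

```c
#include <assert.h>

enum arg_class { ARG_INT_REG, ARG_FP_REG, ARG_STACK };

struct arg_loc {
    enum arg_class where;
    int value;  /* register number, or byte offset from $sp for ARG_STACK */
};

/* Location of argument number argno (1-based). is_float selects the
 * floating-point register file for the first six arguments. */
struct arg_loc arg_location(int argno, int is_float)
{
    struct arg_loc loc;

    assert(argno >= 1);
    if (argno <= 6) {
        loc.where = is_float ? ARG_FP_REG : ARG_INT_REG;
        loc.value = 15 + argno;        /* $16..$21 or $f16..$f21 */
    } else {
        loc.where = ARG_STACK;
        loc.value = (argno - 7) * 8;   /* 0($sp), 8($sp), ... */
    }
    return loc;
}
```

Note that the argument number, not the count of arguments of that type, selects the register: a floating-point third argument goes in $f18 even if the first two arguments are integers.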

8. On procedure exit, you must restore registers that were saved in step 5. To restore general purpose registers:
```
        ldq     reg,framesize+frameoffset+N($sp)
```
To restore the floating-point registers:
```
        ldt     reg,framesize+frameoffset+N($sp)
```
(See step 5 for a discussion of the value of N.)

9. Get the return address:

        ldq     $26,framesize+frameoffset($sp)

10. Clean up the stack:
```
        lda     $sp,framesize($sp)
```

11. Return:
```
        ret     $31,($26),1
 
```

12. End the procedure:
```
        .end    procedure_name
```

6.3.4 Coding Examples

The examples in this section show procedures written in C and the equivalent procedures written in assembly language.

Example 6-1 shows a nonleaf procedure. Note that it creates a stack frame and saves its return address. It saves its return address because it must put a new return address into register $26 when it makes a procedure call.

Example 6-1: Nonleaf Procedure

int
nonleaf(i, j)
  int i, *j;
  {
  int abs();
  int temp;
 
  temp = i - *j;
  return abs(temp);
  }
 
        .globl  nonleaf
 #    1 int
 #    2 nonleaf(i, j)
 #    3   int i, *j;
 #    4   {
        .ent    nonleaf 2
nonleaf:
        ldgp    $gp, 0($27)
        lda     $sp, -16($sp)
        stq     $26, 0($sp)
        .mask   0x04000000, -16
        .frame  $sp, 16, $26, 0
        .prologue       1
        addl    $16, 0, $18
 #    5   int abs();
 #    6   int temp;
 #    7
 #    8   temp = i - *j;
        ldl     $1, 0($17)
        subl    $18, $1, $16
 #    9   return abs(temp);
        jsr     $26, abs
        ldgp    $gp, 0($26)
        ldq     $26, 0($sp)
        lda     $sp, 16($sp)
        ret     $31, ($26), 1
        .end    nonleaf

Example 6-2 shows a leaf procedure that does not require stack space for local variables. Note that it does not create a stack frame and does not save a return address.

Example 6-2: Leaf Procedure Without Stack Space for Local Variables

int
leaf(p1, p2)
  int p1, p2;
  {
  return (p1 > p2) ? p1 : p2;
  }
 
        .globl  leaf
 #    1 leaf(p1, p2)
 #    2   int p1, p2;
 #    3   {
        .ent    leaf 2
leaf:
        ldgp    $gp, 0($27)
        .frame  $sp, 0, $26, 0
        .prologue       1
        addl    $16, 0, $16
        addl    $17, 0, $17
 #    4   return (p1 > p2) ? p1 : p2;
        bis     $17, $17, $0
        cmplt   $0, $16, $1
        cmovne  $1, $16, $0
        ret     $31, ($26), 1
        .end    leaf

Example 6-3 shows a leaf procedure that requires stack space for local variables. Note that it creates a stack frame but does not save a return address.

Example 6-3: Leaf Procedure with Stack Space for Local Variables

int
leaf_storage(i)
  int i;
  {
  int a[16];
  int j;
  for (j = 0; j < 10; j++)
    a[j] = '0' + j;
  return a[i];
  }
 
        .globl  leaf_storage
 #    1 int
 #    2 leaf_storage(i)
 #    3   int i;
 #    4   {
        .ent    leaf_storage 2
leaf_storage:
        ldgp    $gp, 0($27)
        lda     $sp, -80($sp)
        .frame  $sp, 80, $26, 0
        .prologue       1
        addl    $16, 0, $1
 #    5   int a[16];
 #    6   int j;
 #    7   for (j = 0; j < 10; j++)
        ldil    $2, 48
        stl     $2, 16($sp)
        ldil    $3, 49
        stl     $3, 20($sp)
        ldil    $0, 2
        lda     $16, 24($sp)
$32:
 #    8     a[j] = '0' + j;
        addl    $0, 48, $4
        stl     $4, 0($16)
        addl    $0, 49, $5
        stl     $5, 4($16)
        addl    $0, 50, $6
        stl     $6, 8($16)
        addl    $0, 51, $7
        stl     $7, 12($16)
        addl    $0, 4, $0
        addq    $16, 16, $16
        subq    $0, 10, $8
        bne     $8, $32
 #    9   return a[i];
        mull    $1, 4, $22
        addq    $22, $sp, $0
        ldl     $0, 16($0)
        lda     $sp, 80($sp)
        ret     $31, ($26), 1
        .end    leaf_storage
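
When you replace one of these procedures with hand-written assembly code, it can help to keep a C reference version to compare results against. The following sketch restates the three example procedures with prototype-style declarations. It is a convenience for testing, not compiler output, and the assertion in leaf_storage is an added assumption: the loop initializes only a[0] through a[9], so other index values would read uninitialized storage.

```c
#include <assert.h>
#include <stdlib.h>

/* Example 6-1: calls abs, so it is a nonleaf procedure. */
int nonleaf(int i, int *j)
{
    int temp;

    temp = i - *j;
    return abs(temp);
}

/* Example 6-2: returns the larger of its two arguments. */
int leaf(int p1, int p2)
{
    return (p1 > p2) ? p1 : p2;
}

/* Example 6-3: needs stack space for the local array a. */
int leaf_storage(int i)
{
    int a[16];
    int j;

    assert(i >= 0 && i < 10);   /* only a[0..9] are initialized */
    for (j = 0; j < 10; j++)
        a[j] = '0' + j;
    return a[i];
}
```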

6.4 Developing Code for Procedure Calls

The rules and parameter requirements for passing control and exchanging data between procedures written in assembly language and procedures written in other languages are varied and complex. The simplest approach to coding an interface between an assembly procedure and a procedure written in a high-level language is to do the following:

Use the high-level language to write a skeletal version of the procedure that you plan to code in assembly language.

Compile the program using the -S option, which creates an assembly language (.s) version of the compiled source file.

Study the assembly language listing and then, using the code in the listing as a guideline, write your assembly language code.

Section 6.4.1 and Section 6.4.2 describe techniques you can use to create interfaces between procedures written in assembly language and procedures written in a high-level language. The examples show what to look for in creating your interface. Details such as register numbers will vary according to the number, order, and data types of the arguments. In writing your particular interface, you should write and compile realistic examples of the code you want to write in assembly language.

6.4.1 Calling a High-Level Language Procedure

The following steps show an approach to use in writing an assembly language procedure that calls atof(3), a procedure written in C that converts an ASCII string to a double-precision number:

Write a C program that calls atof. Pass global variables instead of local variables; this makes them easy to recognize in the assembly language version of the C program (and ensures that optimization does not remove any of the code on the grounds that it has no effect).
The following C program is an example of a program that calls atof:
```
char c[] = "3.1415";
double d, atof();
float f;
caller()
  {
  d = atof(c);
  f = (float)atof(c);
  }
```

Compile the program using the following compiler options:
```
cc -S -O caller.c
```
The -S option causes the compiler to produce the assembly language listing; the -O option, though not required, reduces the amount of code generated, making the listing easier to read.

After compilation, examine the file caller.s. The comments in the file show how the parameters are passed, the execution of the call, and how the returned values are retrieved:

        .globl  c
        .data
c:
        .ascii  "3.1415\X00"
        .comm   d 8
        .comm   f 4
        .text
        .globl  caller
 #    1 char c[] = "3.1415";
 #    2 double d, atof();
 #    3 float f;
 #    4 caller()
 #    5   {
        .ent    caller 2
caller:
        ldgp    $gp, 0($27)
        lda     $sp, -16($sp)
        stq     $26, 0($sp)
        .mask   0x04000000, -16
        .frame  $sp, 16, $26, 0
        .prologue       1
 #    6   d = atof(c);
        lda     $16, c
        jsr     $26, atof
        ldgp    $gp, 0($26)
        stt     $f0, d
 #    7   f = (float)atof(c);
        lda     $16, c
        jsr     $26, atof
        ldgp    $gp, 0($26)
        cvtts   $f0, $f10
        sts     $f10, f
 #    8   }
        ldq     $26, 0($sp)
        lda     $sp, 16($sp)
        ret     $31, ($26), 1
        .end    caller

6.4.2 Calling an Assembly Language Procedure

The following steps show an approach to use in writing an assembly language procedure that can be called by a procedure written in a high-level language:

Using a high-level language, write a facsimile of the assembly language procedure you want to call. In the body of the procedure, write statements that use the same arguments you intend to use in the final assembly language procedure. Copy the arguments to global variables instead of local variables to make it easy for you to read the resulting assembly language listing.
The following C program is a facsimile of the assembly language program:
```
typedef char str[10];
typedef int boolean;
 
float global_r;
int global_i;
str global_s;
boolean global_b;
 
boolean callee(float *r, int i, str s)
  {
  global_r = *r;
  global_i = i;
  global_s[0] = s[0];
  return i == 3;
  }
```

Compile the program using the following compiler options:
```
cc -S -O callee.c
```
The -S option causes the compiler to produce the assembly language listing; the -O option, though not required, reduces the amount of code generated, making the listing easier to read.

After compilation, examine the file callee.s. The comments in the file show how the incoming parameters are accessed and how the result is returned:

        .comm   global_r 4
        .comm   global_i 4
        .comm   global_s 10
        .comm   global_b 4
        .text
        .globl  callee
 #   10   {
        .ent    callee 2
callee:
        ldgp    $gp, 0($27)
        .frame  $sp, 0, $26, 0
        .prologue       1
        addl    $17, 0, $17
 #   11   global_r = *r;
        lds     $f10, 0($16)
        sts     $f10, global_r
 #   12   global_i = i;
        stl     $17, global_i
 #   13   global_s[0] = s[0];
        ldq_u   $1, 0($18)
        extbl   $1, $18, $1
        .set     noat
        lda     $28, global_s
        ldq_u   $2, 0($28)
        insbl   $1, $28, $3
        mskbl   $2, $28, $2
        bis     $2, $3, $2
        stq_u   $2, 0($28)
        .set     at
 #   14   return i == 3;
        cmpeq   $17, 3, $0
        ret     $31, ($26), 1
        .end    callee
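
The listing stores the single byte global_s[0] with a load, mask, insert, and store sequence (ldq_u, mskbl, insbl, bis, stq_u) rather than with one store instruction, presumably because the code was generated for processors without byte store instructions. The following C sketch models what the three byte-manipulation instructions compute for a little-endian quadword. The C function names mirror the instruction mnemonics, and store_byte is a hypothetical helper that combines them.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position of the byte selected by the low three address bits. */
static unsigned byte_shift(uint64_t addr)
{
    return (unsigned)(addr & 7) * 8;
}

/* extbl: extract the addressed byte from a quadword. */
uint64_t extbl(uint64_t quad, uint64_t addr)
{
    return (quad >> byte_shift(addr)) & 0xff;
}

/* insbl: move the low byte of value into the addressed byte lane. */
uint64_t insbl(uint64_t value, uint64_t addr)
{
    return (value & 0xff) << byte_shift(addr);
}

/* mskbl: clear the addressed byte lane of a quadword. */
uint64_t mskbl(uint64_t quad, uint64_t addr)
{
    return quad & ~((uint64_t)0xff << byte_shift(addr));
}

/* The quadword that stq_u writes back after storing one byte. */
uint64_t store_byte(uint64_t quad, uint64_t addr, uint64_t value)
{
    return mskbl(quad, addr) | insbl(value, addr);
}
```

In the listing, ldq_u supplies the quadword that contains the target address, and the bis instruction performs the OR that store_byte shows.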

6.5 Memory Allocation

The default memory allocation scheme used by the Alpha system gives every process two storage areas that can grow without bounds. A process exceeds virtual storage only when the sum of the two areas exceeds virtual storage space. By default, the linker and assembler use the scheme shown in Figure 6-3.

Figure 6-3: Default Layout of Memory (User Program View)

The following notes apply to the regions shown in Figure 6-3:

This area is not allocated until a user requests it. (The same behavior is observed in System V shared memory regions.)

The heap is reserved for sbrk and brk system calls, and it is not always present.

See the Symbol Table/Object File Specification manual for details on the sections contained within the bss, data, and text segments.

The stack is used for local data in C programs.