This chapter gives rules and examples to follow when creating an assembly-language program.
The chapter addresses the following topics:
This chapter does not address coding issues related to performance or optimization. See Appendix A of the Alpha Architecture Reference Manual for information on how to optimize assembly code.
When you write assembly-language procedures, you should use the same calling conventions that the C compiler observes. The reasons for using the same calling conventions are as follows:
The conventions observed by the Digital UNIX compiler system are more complicated than those of some other compiler systems, mostly to enhance the speed of each procedure call. Specifically:
A program consists of an executable image and zero or more shared images. Each image has an independent text and data area.
Each data segment contains a global offset table (GOT), which contains address constants for procedures and data locations that the text segment references. The GOT provides the means to access arbitrary 64-bit addresses and allows the text segment to be position independent.
The size of the GOT is limited only by the maximum image size. However, because only 64KB can be addressed by a single memory-format instruction, the GOT is segmented into one or more sections of 64KB or less.
In addition to providing efficient access to the GOT, the gp register is also used to access global data within ±2GB of the global pointer. This area of memory is known as the global data area.
A static executable image is not a special case in the program model. It is simply an executable image that uses no shared libraries. However, it is possible for the linker to perform code optimizations. In particular, if a static executable image's GOT is less than or equal to 64KB (that is, has only one segment), the code to load, save, and restore the gp register is not necessary because all procedures will access the same GOT segment.
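The segment arithmetic described above can be sketched in C. This is only an illustration: the names `got_segments`, `single_got_segment`, and `GOT_SEGMENT_BYTES` are hypothetical and are not part of the linker or any Digital UNIX tool.

```c
#include <assert.h>
#include <limits.h>

/* One base register plus a 16-bit displacement reaches at most 64KB. */
#define GOT_SEGMENT_BYTES 65536UL

/* Hypothetical helper: number of segments of 64KB or less needed
   to hold a GOT of got_bytes bytes (rounds up). */
unsigned long got_segments(unsigned long got_bytes)
{
    /* Guard against wraparound in the round-up addition. */
    assert(got_bytes <= ULONG_MAX - (GOT_SEGMENT_BYTES - 1));
    return (got_bytes + GOT_SEGMENT_BYTES - 1) / GOT_SEGMENT_BYTES;
}

/* Hypothetical helper: returns 1 when the whole GOT fits in one segment,
   which is the case in which a static executable image does not need
   the code that loads, saves, and restores gp. */
int single_got_segment(unsigned long got_bytes)
{
    return got_segments(got_bytes) <= 1;
}
```

Under these assumptions, a GOT of exactly 64KB occupies one segment, and a GOT one byte larger occupies two.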
This section describes three general areas of concern to the assembly language programmer:
Another general coding consideration is the use of data structures to communicate between high-level language procedures and assembly procedures. In most cases, this communication is handled by means of simple variables: pointers, integers, Booleans, and single- and double-precision real numbers. Describing the details of the various high-level data structures that can also be used (arrays, records, sets, and so on) is beyond the scope of this manual.
The main processor has 32 64-bit integer registers. The uses and restrictions of these registers are described in Table 6-1.
The floating-point co-processor has 32 floating-point registers. Each register can hold either a single-precision (32-bit) or double-precision (64-bit) value. Refer to Table 6-2 for details.
Register Name | Software Name (from regdef.h) | Use |
$0 | v0 | Used for expression evaluations and to hold the integer function results. Not preserved across procedure calls. |
$1-8 | t0-t7 | Temporary registers used for expression evaluations. Not preserved across procedure calls. |
$9-14 | s0-s5 | Saved registers. Preserved across procedure calls. |
$15 or $fp | s6 or fp | Contains the frame pointer (if needed); otherwise, a saved register. |
$16-21 | a0-a5 | Used to pass the first six integer type actual arguments. Not preserved across procedure calls. |
$22-25 | t8-t11 | Temporary registers used for expression evaluations. Not preserved across procedure calls. |
$26 | ra | Contains the return address. Preserved across procedure calls. |
$27 | pv or t12 | Contains the procedure value and used for expression evaluation. Not preserved across procedure calls. |
$28 or $at | AT | Reserved for the assembler. Not preserved across procedure calls. |
$29 or $gp | gp | Contains the global pointer. Not preserved across procedure calls. |
$30 or $sp | sp | Contains the stack pointer. Preserved across procedure calls. |
$31 | zero | Always has the value 0. |
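The preservation rules in Table 6-1 can be expressed as a small C lookup, which may be handy when checking which registers a procedure must save. The function name `int_reg_preserved` is hypothetical. Treating $15 as preserved is an assumption based on its description as a saved register when it is not used as the frame pointer.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if integer register $reg is preserved
   across procedure calls according to Table 6-1, and 0 otherwise. */
int int_reg_preserved(int reg)
{
    assert(reg >= 0 && reg <= 31);
    if (reg >= 9 && reg <= 14)
        return 1;               /* s0-s5: saved registers */
    if (reg == 15)
        return 1;               /* s6 or fp (assumed preserved) */
    if (reg == 26)
        return 1;               /* ra: return address */
    if (reg == 30)
        return 1;               /* sp: stack pointer */
    return 0;                   /* v0, t0-t11, a0-a5, pv, AT, gp, zero */
}
```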
Register Name | Use |
$f0-f1 | Used to hold floating-point type function results ($f0) and complex type function results ($f0 has the real part, $f1 has the imaginary part). Not preserved across procedure calls. |
$f2-f9 | Saved registers. Preserved across procedure calls. |
$f10-f15 | Temporary registers used for expression evaluation. Not preserved across procedure calls. |
$f16-f21 | Used to pass the first six single- or double-precision actual arguments. Not preserved across procedure calls. |
$f22-f30 | Temporary registers used for expression evaluations. Not preserved across procedure calls. |
$f31 | Always has the value 0.0. |
Assembled code and data are stored in the object file sections shown in Figure 6-1. Each section has an implicit location counter that begins at zero and increments by one for each byte assembled in the section. Location control directives (.align, .data, .rconst, .rdata, .sdata, .space, and .text) can be used to control what is stored in the various sections and to adjust location counters.
The assembler always generates the text section before other sections. Additions to the text section are done in 4-byte units.
The bss (block started by symbol) section holds data items (usually variables) that are initialized to zero. If a .lcomm directive defines a variable, the assembler assigns that variable to either the .bss section or the .sbss (small bss) section, depending on the variable's size.
The default size for variables in the .sbss section is eight or fewer bytes. You can change the size using the -G compilation option for the C compiler or the assembler. Items smaller than or equal to the specified size go in the .sbss section. Items greater than the specified size go in the .bss section.
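The size rule can be sketched as a C function. The names `lcomm_section` and `DEFAULT_G_THRESHOLD` are hypothetical; the sketch only mirrors the comparison described above and is not the assembler's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Default threshold described in the text: items of 8 bytes or fewer. */
#define DEFAULT_G_THRESHOLD 8

/* Hypothetical helper: name of the section that would receive a
   .lcomm variable of the given size under threshold g_threshold. */
const char *lcomm_section(size_t size, size_t g_threshold)
{
    assert(size > 0);   /* a zero-size variable is not meaningful here */
    return (size <= g_threshold) ? ".sbss" : ".bss";
}
```

With the default threshold, an 8-byte variable lands in .sbss and a 9-byte variable lands in .bss. A threshold of 0 sends every item to .bss.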
At run time, the $gp register points into the area of memory occupied by the .lita section. The .lita section is used to hold address literals for 64-bit addressing.
See Chapter 7 for more information on section data.
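The C compiler classifies each procedure into one of the following categories: nonleaf procedures (procedures that call other procedures), leaf procedures that require stack space for local variables, and leaf procedures that do not require stack space for local variables.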
You must decide the procedure category before determining the calling sequence.
To write a program with proper stack frame usage and debugging capabilities, you should observe the conventions presented in the following list of steps. Steps 1 through 6 describe the code you must provide at the beginning of a procedure, step 7 describes how to pass parameters, and steps 8 through 12 describe the code you must provide at the end of a procedure:
    .ent procedure_name
procedure_name:
The .ent directive generates information for the debugger, and the entry label is the procedure name.
ldgp $gp,0($27)
Register $27 contains the procedure value (the address of this procedure as supplied by the caller).
lda $sp,-framesize($sp)
The framesize operand is the size of frame required, in bytes, and must be a multiple of 16. You must allocate space on the stack for the following items:
Note
Once you have modified register $sp, you should not modify it again in the remainder of the procedure.
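Because framesize must be a multiple of 16, the raw number of bytes a procedure needs usually has to be rounded up. The following C sketch shows one way to do that arithmetic. The name `round_frame_size` is hypothetical, and how the raw byte count is obtained is left to the caller.

```c
#include <assert.h>
#include <limits.h>

#define FRAME_ALIGN 16UL

/* Hypothetical helper: round a raw byte count up to the next
   multiple of 16, as required for the framesize operand. */
unsigned long round_frame_size(unsigned long raw_bytes)
{
    /* Guard against wraparound in the round-up addition. */
    assert(raw_bytes <= ULONG_MAX - (FRAME_ALIGN - 1));
    return (raw_bytes + FRAME_ALIGN - 1) & ~(FRAME_ALIGN - 1);
}
```

For instance, a procedure that needs 72 bytes would allocate its frame with `lda $sp,-80($sp)`.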
.frame framereg,framesize,returnreg
The virtual frame pointer does not have a register allocated for it. It consists of the framereg ($sp, in most cases) added to the framesize (see step 3). Figure 6-2 illustrates the stack components.
The returnreg argument for the .frame directive specifies the register that contains the return address (usually register $26). These usual values may change if you use a varying stack pointer or are specifying a kernel trap procedure.
Saving the general registers requires the following operations:
.mask bitmask,frameoffset
The bit settings in bitmask indicate which registers are to be saved. For example, if register $9 is to be saved, bit 9 in bitmask must be set to 1. The value for frameoffset is the offset (negative) from the virtual frame pointer to the start of the register save area.
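The bitmask value can be worked out by setting one bit per saved register. The C sketch below illustrates the arithmetic; `reg_bit` is a hypothetical helper name, not an assembler facility.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the .mask (or .fmask) bit that corresponds
   to register number reg. */
uint32_t reg_bit(int reg)
{
    assert(reg >= 0 && reg <= 31);
    return (uint32_t)1 << reg;
}
```

OR the bits together for every register that is saved. For example, `reg_bit(26)` yields 0x04000000, the bitmask for a procedure that saves only the return address register, and `reg_bit(9) | reg_bit(10) | reg_bit(26)` yields 0x04000600.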
stq reg,framesize+frameoffset+N($sp)
The value of N is the size of the argument build area for the first register and is incremented by 8 for each successive register. If the procedure is a nonleaf procedure, the return address is the first register to be saved. For example, a nonleaf procedure that saves registers $9 and $10 would use the following stq instructions:
    stq $26,framesize+frameoffset($sp)
    stq $9,framesize+frameoffset+8($sp)
    stq $10,framesize+frameoffset+16($sp)
(Figure 6-2 illustrates the order in which the registers in the preceding example would be saved.)
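The displacement used in each stq instruction follows directly from the formula framesize+frameoffset+N. The C sketch below evaluates that formula; `save_slot_offset` and its parameter names are hypothetical and exist only to make the arithmetic concrete.

```c
#include <assert.h>

/* Hypothetical helper: displacement from $sp for the index-th saved
   register (index 0 is the first register saved).
   first_n is the value of N for the first register; each later
   register is 8 bytes farther along. */
long save_slot_offset(long framesize, long frameoffset,
                      long first_n, int index)
{
    assert(index >= 0);
    return framesize + frameoffset + first_n + 8L * index;
}
```

With framesize 16, frameoffset -16, and a first N of 0, the first saved register is stored at 0($sp), which matches the `stq $26, 0($sp)` instruction in a procedure that uses `.mask 0x04000000, -16` and `.frame $sp, 16, $26, 0`.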
Then, save any floating-point registers for which you allocated space in step 3:
    .fmask bitmask,frameoffset
    stt reg,framesize+frameoffset+N($sp)
Saving floating-point registers is identical to saving integer registers except you use the .fmask directive instead of .mask, and the storage operations involve single- or double-precision floating-point data. (The previous discussion about how to save integer registers applies here as well.)
.prologue flag
The flag is set to 1 if the prologue contains an ldgp instruction (see step 2); otherwise, it is set to 0.
General registers $16 to $21 and floating-point registers $f16 to $f21 are used for passing the first six arguments. All nonfloating-point arguments in the first six arguments are passed in general registers. All floating-point arguments in the first six arguments are passed in floating-point registers.
Stack space is used for passing the seventh and subsequent arguments. The stack space allocated to each argument is an 8-byte multiple and is aligned on an 8-byte boundary.
Table 6-3 summarizes the location of procedure arguments in the register or stack.
Argument Number | Integer Register | Floating-Point Register | Stack |
1 | $16 (a0) | $f16 | |
2 | $17 (a1) | $f17 | |
3 | $18 (a2) | $f18 | |
4 | $19 (a3) | $f19 | |
5 | $20 (a4) | $f20 | |
6 | $21 (a5) | $f21 | |
7-n | | | 0($sp)..(n-7)*8($sp) |
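The mapping in Table 6-3 can be written as a short C function. This is a sketch for illustration only. The type and function names (`arg_kind`, `arg_loc`, `arg_location`) are hypothetical, and the sketch assumes every argument occupies a single 8-byte slot.

```c
#include <assert.h>

typedef enum { ARG_IN_REGISTER, ARG_ON_STACK } arg_kind;

typedef struct {
    arg_kind kind;
    int reg;            /* 16..21 for $16-$21 or $f16-$f21; -1 if on stack */
    long stack_offset;  /* displacement from $sp; -1 if in a register */
} arg_loc;

/* Hypothetical helper: location of the n-th argument (n starts at 1).
   Integer argument n <= 6 uses $(15+n); floating-point argument
   n <= 6 uses $f(15+n). */
arg_loc arg_location(int n)
{
    arg_loc loc;
    assert(n >= 1);
    if (n <= 6) {
        loc.kind = ARG_IN_REGISTER;
        loc.reg = 15 + n;
        loc.stack_offset = -1;
    } else {
        loc.kind = ARG_ON_STACK;
        loc.reg = -1;
        loc.stack_offset = (long)(n - 7) * 8;
    }
    return loc;
}
```

Under these assumptions, the seventh argument is found at 0($sp) and the ninth at 16($sp).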
ldq reg,framesize+frameoffset+N($sp)
To restore the floating-point registers:
ldt reg,framesize+frameoffset+N($sp)
(Refer to step 5 for a discussion of the value of N.)
ldq $26,framesize+frameoffset($sp)
lda $sp,framesize($sp)
ret $31,($26),1
.end procedurename
The examples in this section show procedures written in C and equivalent procedures written in assembly language.
Example 6-1 shows a nonleaf procedure. Notice that it creates a stack frame and saves its return address. It saves its return address because it must put a new return address into register $26 when it makes a procedure call.
    int
    nonleaf(i, j)
      int i, *j;
    {
      int abs();
      int temp;

      temp = i - *j;
      return abs(temp);
    }
            .globl  nonleaf
     #   1  int
     #   2  nonleaf(i, j)
     #   3    int i, *j;
     #   4  {
            .ent    nonleaf 2
    nonleaf:
            ldgp    $gp, 0($27)
            lda     $sp, -16($sp)
            stq     $26, 0($sp)
            .mask   0x04000000, -16
            .frame  $sp, 16, $26, 0
            .prologue 1
            addl    $16, 0, $18
     #   5    int abs();
     #   6    int temp;
     #   7
     #   8    temp = i - *j;
            ldl     $1, 0($17)
            subl    $18, $1, $16
     #   9    return abs(temp);
            jsr     $26, abs
            ldgp    $gp, 0($26)
            ldq     $26, 0($sp)
            lda     $sp, 16($sp)
            ret     $31, ($26), 1
            .end    nonleaf
Example 6-2 shows a leaf procedure that does not require stack space for local variables. Notice that it does not create a stack frame and does not save a return address.
    int leaf(p1, p2)
      int p1, p2;
    {
      return (p1 > p2) ? p1 : p2;
    }
            .globl  leaf
     #   1  leaf(p1, p2)
     #   2    int p1, p2;
     #   3  {
            .ent    leaf 2
    leaf:
            ldgp    $gp, 0($27)
            .frame  $sp, 0, $26, 0
            .prologue 1
            addl    $16, 0, $16
            addl    $17, 0, $17
     #   4    return (p1 > p2) ? p1 : p2;
            bis     $17, $17, $0
            cmplt   $0, $16, $1
            cmovne  $1, $16, $0
            ret     $31, ($26), 1
            .end    leaf
Example 6-3 shows a leaf procedure that requires stack space for local variables. Notice that it creates a stack frame but does not save a return address.
    int
    leaf_storage(i)
      int i;
    {
      int a[16];
      int j;
      for (j = 0; j < 10; j++)
        a[j] = '0' + j;
      return a[i];
    }
            .globl  leaf_storage
     #   1  int
     #   2  leaf_storage(i)
     #   3    int i;
     #   4  {
            .ent    leaf_storage 2
    leaf_storage:
            ldgp    $gp, 0($27)
            lda     $sp, -80($sp)
            .frame  $sp, 80, $26, 0
            .prologue 1
            addl    $16, 0, $1
     #   5    int a[16];
     #   6    int j;
     #   7    for (j = 0; j < 10; j++)
            ldil    $2, 48
            stl     $2, 16($sp)
            ldil    $3, 49
            stl     $3, 20($sp)
            ldil    $0, 2
            lda     $16, 24($sp)
    $32:
     #   8      a[j] = '0' + j;
            addl    $0, 48, $4
            stl     $4, 0($16)
            addl    $0, 49, $5
            stl     $5, 4($16)
            addl    $0, 50, $6
            stl     $6, 8($16)
            addl    $0, 51, $7
            stl     $7, 12($16)
            addl    $0, 4, $0
            addq    $16, 16, $16
            subq    $0, 10, $8
            bne     $8, $32
     #   9    return a[i];
            mull    $1, 4, $22
            addq    $22, $sp, $0
            ldl     $0, 16($0)
            lda     $sp, 80($sp)
            ret     $31, ($26), 1
            .end    leaf_storage
The rules and parameter requirements for passing control and exchanging data between procedures written in assembly language and procedures written in other languages are varied and complex. The simplest approach to coding an interface between an assembly procedure and a procedure written in a high-level language is to do the following:
Section 6.4.1 and Section 6.4.2 describe techniques you can use to create interfaces between procedures written in assembly language and procedures written in a high-level language. The examples show what to look for in creating your interface. Details such as register numbers will vary according to the number, order, and data types of the arguments. In writing your particular interface, you should write and compile realistic examples of the code you want to write in assembly language.
The following steps show an approach to use in writing an assembly-language procedure that calls atof(3), a procedure written in C that converts ASCII characters to numbers:
The following C program is an example of a program that calls atof:
    char c[] = "3.1415";
    double d, atof();
    float f;
    caller()
    {
      d = atof(c);
      f = (float)atof(c);
    }
cc -S -O caller.c
The -S option causes the compiler to produce the assembly-language listing; the -O option, though not required, reduces the amount of code generated, making the listing easier to read.
            .globl  c
            .data
    c:
            .ascii  "3.1415\X00"
            .comm   d 8
            .comm   f 4
            .text
            .globl  caller
     #   1  char c[] = "3.1415";
     #   2  double d, atof();
     #   3  float f;
     #   4  caller()
     #   5  {
            .ent    caller 2
    caller:
            ldgp    $gp, 0($27)
            lda     $sp, -16($sp)
            stq     $26, 0($sp)
            .mask   0x04000000, -16
            .frame  $sp, 16, $26, 0
            .prologue 1
     #   6    d = atof(c);
            lda     $16, c
            jsr     $26, atof
            ldgp    $gp, 0($26)
            stt     $f0, d
     #   7    f = (float)atof(c);
            lda     $16, c
            jsr     $26, atof
            ldgp    $gp, 0($26)
            cvtts   $f0, $f10
            sts     $f10, f
     #   8  }
            ldq     $26, 0($sp)
            lda     $sp, 16($sp)
            ret     $31, ($26), 1
            .end    caller
The following steps show an approach to use in writing an assembly-language procedure that can be called by a procedure written in a high-level language:
The following C program is a facsimile of the assembly-language program:
    typedef char str[10];
    typedef int boolean;

    float global_r;
    int global_i;
    str global_s;
    boolean global_b;

    boolean callee(float *r, int i, str s)
    {
      global_r = *r;
      global_i = i;
      global_s[0] = s[0];
      return i == 3;
    }
cc -S -O callee.c
The -S option causes the compiler to produce the assembly-language listing; the -O option, though not required, reduces the amount of code generated, making the listing easier to read.
            .comm   global_r 4
            .comm   global_i 4
            .comm   global_s 10
            .comm   global_b 4
            .text
            .globl  callee
     #  10  {
            .ent    callee 2
    callee:
            ldgp    $gp, 0($27)
            .frame  $sp, 0, $26, 0
            .prologue 1
            addl    $17, 0, $17
     #  11    global_r = *r;
            lds     $f10, 0($16)
            sts     $f10, global_r
     #  12    global_i = i;
            stl     $17, global_i
     #  13    global_s[0] = s[0];
            ldq_u   $1, 0($18)
            extbl   $1, $18, $1
            .set    noat
            lda     $28, global_s
            ldq_u   $2, 0($28)
            insbl   $1, $28, $3
            mskbl   $2, $28, $2
            bis     $2, $3, $2
            stq_u   $2, 0($28)
            .set    at
     #  14    return i == 3;
            cmpeq   $17, 3, $0
            ret     $31, ($26), 1
            .end    callee
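The byte assignment `global_s[0] = s[0]` compiles to a load-extract-insert-mask-store sequence because the code operates on whole quadwords. The C sketch below models what the extbl, insbl, and mskbl steps compute on 64-bit values, assuming little-endian byte numbering within the quadword. The C function names simply reuse the instruction mnemonics, and `store_byte` is a hypothetical wrapper; this is an approximation for illustration, not a definition of the instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Model of extbl: extract the byte selected by the low 3 bits of addr. */
uint64_t extbl(uint64_t q, uint64_t addr)
{
    return (q >> (8 * (addr & 7))) & 0xff;
}

/* Model of insbl: move the low byte of b into the lane selected by addr. */
uint64_t insbl(uint64_t b, uint64_t addr)
{
    return (b & 0xff) << (8 * (addr & 7));
}

/* Model of mskbl: clear the byte lane selected by addr. */
uint64_t mskbl(uint64_t q, uint64_t addr)
{
    return q & ~((uint64_t)0xff << (8 * (addr & 7)));
}

/* Hypothetical wrapper: the quadword that stq_u would write back after
   copying one byte from src_q (at src_addr) into dest_q (at dest_addr). */
uint64_t store_byte(uint64_t dest_q, uint64_t dest_addr,
                    uint64_t src_q, uint64_t src_addr)
{
    uint64_t b = extbl(src_q, src_addr);
    uint64_t cleared = mskbl(dest_q, dest_addr);
    uint64_t placed = insbl(b, dest_addr);
    assert((cleared & placed) == 0);    /* the lanes never overlap */
    return cleared | placed;            /* corresponds to the bis step */
}
```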
The default memory allocation scheme used by the Alpha system gives every process two storage areas that can grow without bounds. A process exceeds virtual storage only when the sum of the two areas exceeds virtual storage space. By default, the linker and assembler use the scheme shown in Figure 6-3.