This chapter gives rules and examples to follow when creating an assembly language program.
The chapter addresses the following topics:
Why your assembly programs should use the calling conventions observed by the C compiler (Section 6.1)
An overview of the composition of executable programs (Section 6.2)
The use of registers, section and location counters, and stack frames (Section 6.3)
A technique for coding an interface between an assembly language procedure and a procedure written in a high-level language (Section 6.4)
The default memory-allocation scheme used by the Alpha system (Section 6.5)
This chapter does not address coding issues related to performance or optimization. See Appendix A of the Alpha Architecture Reference Manual for information on how to optimize assembly code.
6.1 Calling Conventions
When you write assembly language procedures, you should use the same calling conventions that the C compiler observes. The reasons for using the same calling conventions are as follows:
Often your code must interact with compiler-generated code, accepting and returning arguments or accessing shared global data.
The symbolic debugger gives better assistance in debugging programs that use standard calling conventions.
The conventions observed by the Tru64 UNIX compiler system are more complicated than those of some other compiler systems, mostly to enhance the speed of each procedure call. Specifically:
The C compiler uses the full, general calling sequence only when necessary; whenever possible, it omits unneeded portions of the sequence. For example, the C compiler does not use a register as a frame pointer if it is unnecessary to do so.
The C compiler and the debugger observe certain implicit rules instead of communicating by means of instructions or data at execution time. For example, the debugger looks at information placed in the symbol table by a .frame directive at compilation time. This technique enables the debugger to tolerate the lack of a register containing a frame pointer at execution time.
The linker performs code optimizations based on information that is not available at compile time. For example, the linker can, in some cases, replace the general calling sequence to a procedure with a single instruction.
6.2 Program Model
A program consists of an executable image and zero or more shared images. Each image has an independent text and data area.
Each data segment contains a global offset table (GOT), which contains address constants for procedures and data locations that the text segment references. The GOT provides the means to access arbitrary 64-bit addresses and allows the text segment to be position-independent.
The size of the GOT is limited only by the maximum image size. However, because only 64 KB can be addressed by a single memory-format instruction, the GOT is segmented into one or more sections of 64 KB or less.
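The 64 KB figure corresponds to the signed 16-bit displacement field of an Alpha memory-format instruction. The following C fragment is a minimal sketch, under that assumption, of two checks implied by the paragraph above: whether an offset from a base register is reachable in one instruction, and how many 64 KB segments a GOT of a given size would need. The names fits_disp16 and got_segment_count are hypothetical and are not part of any linker interface.

```c
#include <assert.h>

#define GOT_SEGMENT_BYTES 65536UL   /* 64 KB reachable from one base value */

/* Nonzero if offset fits a signed 16-bit displacement
 * (assumed range: -32768 through 32767). */
int fits_disp16(long offset)
{
    return offset >= -32768L && offset <= 32767L;
}

/* Hypothetical helper: number of segments of 64 KB or less
 * needed to hold a GOT of got_bytes bytes. */
unsigned long got_segment_count(unsigned long got_bytes)
{
    assert(got_bytes > 0);
    return (got_bytes + GOT_SEGMENT_BYTES - 1) / GOT_SEGMENT_BYTES;
}
```

In this sketch, a GOT of exactly 64 KB still fits in one segment, which is the single-segment case that the chapter later treats as an optimization opportunity for static executable images.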
In addition to providing efficient access to the GOT, the gp register is also used to access global data within ±2 GB of the global pointer. This area of memory is known as the global data area.
A static executable image is not a special case in the program model. It is simply an executable image that uses no shared libraries. However, it is possible for the linker to perform code optimizations. In particular, if a static executable image's GOT is less than or equal to 64 KB (that is, has only one segment), the code to load, save, and restore the gp register is not necessary because all procedures will access the same GOT segment.
6.3 General Coding Concerns
This section describes three general areas of concern to the assembly language programmer:
Usable and restricted registers (Section 6.3.1)
Control of section and location counters with directives (Section 6.3.2)
Stack frame requirements on entering and exiting a procedure (Section 6.3.3)
Another general coding consideration is the use of data structures to communicate between high-level language procedures and assembly procedures. In most cases, this communication is handled by means of simple variables: pointers, integers, Booleans, and single- and double-precision real numbers. Describing the details of the various high-level data structures that can also be used (arrays, records, sets, and so on) is beyond the scope of this manual.
6.3.1 Register Use
The main processor has 32 64-bit integer registers. The uses and restrictions of these registers are described in Table 6-1.
The floating-point coprocessor has 32 floating-point registers. Each register can hold either a single-precision (32-bit) or double-precision (64-bit) value. See Table 6-2 for details.
Table 6-1: Integer Registers
Register Name | Software Name (from regdef.h) | Use |
$0 | v0 | Used for expression evaluations and to hold the integer function results. Not preserved across procedure calls. |
$1-8 | t0-t7 | Temporary registers used for expression evaluations. Not preserved across procedure calls. |
$9-14 | s0-s5 | Saved registers. Preserved across procedure calls. |
$15 or $fp | s6 or fp | Contains the frame pointer (if needed); otherwise, a saved register. |
$16-21 | a0-a5 | Used to pass the first six integer type actual arguments. Not preserved across procedure calls. |
$22-25 | t8-t11 | Temporary registers used for expression evaluations. Not preserved across procedure calls. |
$26 | ra | Contains the return address. Preserved across procedure calls. |
$27 | pv or t12 | Contains the procedure value and used for expression evaluation. Not preserved across procedure calls. |
$28 or $at | AT | Reserved for the assembler. Not preserved across procedure calls. |
$29 or $gp | gp | Contains the global pointer. Not preserved across procedure calls. |
$30 or $sp | sp | Contains the stack pointer. Preserved across procedure calls. |
$31 | zero | Always has the value 0. |
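The preservation rules in Table 6-1 can be restated as a small lookup. The following C fragment is only an illustrative sketch of the table; the function name int_reg_preserved is hypothetical, and treating $31 as preserved (because it always reads as zero) is a choice made for this sketch rather than something the table states.

```c
#include <assert.h>

/* Returns 1 if integer register $n is preserved across procedure
 * calls according to Table 6-1, and 0 if it is not. */
int int_reg_preserved(int n)
{
    assert(n >= 0 && n <= 31);
    if (n >= 9 && n <= 15)
        return 1;               /* s0-s5 and s6/fp: saved registers */
    if (n == 26)
        return 1;               /* ra: return address */
    if (n == 30)
        return 1;               /* sp: stack pointer */
    if (n == 31)
        return 1;               /* zero: always 0 (sketch's choice) */
    return 0;                   /* v0, t0-t11, a0-a5, pv, AT, gp */
}
```

A procedure that modifies any register for which this function returns 1 (other than $31) is expected to save the register on entry and restore it before returning.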
Table 6-2: Floating-Point Registers
Register Name | Use |
$f0-f1 | Used to hold floating-point type function results ($f0) and complex type function results ($f0 has the real part and $f1 has the imaginary part). Not preserved across procedure calls. |
$f2-f9 | Saved registers. Preserved across procedure calls. |
$f10-f15 | Temporary registers used for expression evaluation. Not preserved across procedure calls. |
$f16-f21 | Used to pass the first six single- or double-precision actual arguments. Not preserved across procedure calls. |
$f22-f30 | Temporary registers used for expression evaluations. Not preserved across procedure calls. |
$f31 | Always has the value 0.0. |
6.3.2 Using Directives to Control Sections and Location Counters
Assembled code and data are stored in the object file sections shown in Figure 6-1. Each section has an implicit location counter that begins at zero and increments by one for each byte assembled in the section. Location control directives (.align, .data, .rconst, .rdata, .sdata, .space, and .text) can be used to control what is stored in the various sections and to adjust location counters.
The assembler always generates the text section before other sections. Additions to the text section are done in 4-byte units.
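The effect of the location control directives on a location counter can be modeled arithmetically. The following C fragment is a sketch that assumes .align n advances the counter to the next multiple of 2^n bytes and that .space n advances it by n bytes; these semantics are assumptions of the sketch and should be checked against the directive descriptions for your assembler. The function names are hypothetical.

```c
#include <assert.h>

/* Advance loc to the next multiple of 2^n bytes
 * (assumed behavior of ".align n"). */
unsigned long align_counter(unsigned long loc, unsigned n)
{
    unsigned long size;

    assert(n <= 16);
    size = 1UL << n;
    return (loc + size - 1) & ~(size - 1);
}

/* Advance loc by nbytes (assumed behavior of ".space nbytes"). */
unsigned long space_counter(unsigned long loc, unsigned long nbytes)
{
    return loc + nbytes;
}
```

For example, under these assumptions a counter at byte 5 moves to byte 8 after an alignment request with n equal to 3, and a counter that is already aligned does not move.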
The bss (block started by symbol) section holds data items (usually variables) that are initialized to zero. If a .lcomm directive defines a variable, the assembler assigns that variable to either the .bss section or the .sbss (small bss) section, depending on the variable's size. The default size for variables in the .sbss section is eight or fewer bytes. You can change the size using the -G compilation option for the C compiler or the assembler. Items smaller than or equal to the specified size go in the .sbss section. Items greater than the specified size go in the .bss section.
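The size rule for .lcomm variables amounts to a single comparison. The following C fragment is a minimal sketch of that rule; lcomm_section and DEFAULT_G_THRESHOLD are hypothetical names, and the threshold argument stands in for the value given with the -G option.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_G_THRESHOLD 8   /* default limit in bytes, per the text */

/* Returns the name of the section that would receive a .lcomm
 * variable of the given size for a given size threshold. */
const char *lcomm_section(size_t size, size_t g_threshold)
{
    assert(size > 0);
    return (size <= g_threshold) ? ".sbss" : ".bss";
}
```

With the default threshold, an 8-byte variable lands in .sbss and a 9-byte variable lands in .bss; raising the threshold moves larger items into .sbss.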
At run time, the $gp register points into the area of memory occupied by the .lita section. The .lita section is used to hold address literals for 64-bit addressing.
Figure 6-1: Sections and Location Counters for Nonshared Object Files
See the Symbol Table/Object File Specification manual for more information on section data.
6.3.3 The Stack Frame
The C compiler classifies each procedure into one of the following categories:
Nonleaf procedures. These procedures call other procedures.
Leaf procedures. These procedures do not themselves call other procedures. Leaf procedures are of two types: those that require stack storage for local variables and those that do not.
You must decide the procedure category before determining the calling sequence.
To write a program with proper stack frame usage and debugging capabilities, you should observe the conventions presented in the following list of steps. Steps 1 through 6 describe the code you must provide at the beginning of a procedure, step 7 describes how to pass parameters, and steps 8 through 12 describe the code you must provide at the end of a procedure:
1. Regardless of the type of procedure, you should include a .ent directive and an entry label for the procedure:
        .ent procedure_name
        procedure_name:
   The .ent directive generates information for the debugger, and the entry label is the procedure name.
2. If you are writing a procedure that references static storage, calls other procedures, uses constants greater than 31 bits in size, or uses floating constants, you must load the $gp register with the global pointer value for the procedure:
        ldgp $gp,0($27)
   Register $27 contains the procedure value (the address of this procedure as supplied by the caller).
3. If you are writing a leaf procedure that does not use the stack, skip to step 4. For a nonleaf procedure or a leaf procedure that uses the stack, you must adjust the stack size by allocating all of the stack space that the procedure requires:
        lda $sp,-framesize($sp)
   The framesize operand is the size of frame required, in bytes, and must be a multiple of 16. You must allocate space on the stack for the following items:
      Local variables.
      Saved general registers. Space should be allocated only for those registers saved. For nonleaf procedures, you must save register $26, which is used in the calls to other procedures from this procedure. If you use registers $9 to $15, you must also save them.
      Saved floating-point registers. Space should be allocated only for those registers saved. If you use registers $f2 to $f9, you must also save them.
      Procedure call argument area. You must allocate the maximum number of bytes for arguments of any procedure that you call from this procedure; this area does not include space for the first six arguments because they are always passed in registers.
   Note: Once you have modified register $sp, you should not modify it again in the remainder of the procedure.
4. To generate information used by the debugger and exception handler, you must include a .frame directive:
        .frame framereg,framesize,returnreg
   The virtual frame pointer does not have a register allocated for it. It consists of the framereg ($sp, in most cases) added to the framesize (see step 3). Figure 6-2 shows the stack components.
Figure 6-2: Stack Organization
   The returnreg argument for the .frame directive specifies the register that contains the return address (usually register $26). The usual values may change if you use a varying stack pointer or are specifying a kernel trap procedure.
5. If the procedure is a leaf procedure that does not use the stack, skip to step 6. Otherwise, you must save the registers for which you allocated space in step 3.
   Saving the general registers requires the following operations:
      Specify which registers are to be saved using the following .mask directive:
        .mask bitmask,frameoffset
      The bit settings in bitmask indicate which registers are to be saved. For example, if register $9 is to be saved, bit 9 in bitmask must be set to 1. The value for frameoffset is the offset (negative) from the virtual frame pointer to the start of the register save area.
      Use the following stq instruction to save the registers specified in the .mask directive:
        stq reg,framesize+frameoffset+N($sp)
      The value of N is the size of the argument build area for the first register and is incremented by 8 for each successive register. If the procedure is a nonleaf procedure, the return address register ($26) is the first register to be saved; it must be saved at framesize+frameoffset+0($sp) for exception handling.
      For example, a nonleaf procedure that saves registers $9 and $10 would use the following stq instructions:
        stq $26,framesize+frameoffset($sp)
        stq $9,framesize+frameoffset+8($sp)
        stq $10,framesize+frameoffset+16($sp)
      (Figure 6-2 shows the order in which the registers in the preceding example would be saved.)
   Then, save any floating-point registers for which you allocated space in step 3:
        .fmask bitmask,frameoffset
        stt reg,framesize+frameoffset+N($sp)
   Saving floating-point registers is identical to saving integer registers except you use the .fmask directive instead of .mask, and the storage operations involve single- or double-precision floating-point data. (The previous discussion about how to save integer registers applies here as well.)
6. The final step in creating the procedure's prologue is to mark its end as follows:
        .prologue flag
   The flag is set to 1 if the prologue contains an ldgp instruction (see step 2); otherwise, it is set to zero.
7. This step describes parameter passing: how to access arguments passed into your procedure and how to pass arguments correctly to other procedures. For information on high-level, language-specific constructs (call-by-name, call-by-value, string or structure passing), see the programmer's guides for the high-level languages used to write the procedures that interact with your program.
   General registers $16 to $21 and floating-point registers $f16 to $f21 are used for passing the first six arguments. All nonfloating-point arguments in the first six arguments are passed in general registers. All floating-point arguments in the first six arguments are passed in floating-point registers.
   Stack space is used for passing the seventh and subsequent arguments. The stack space allocated to each argument is an 8-byte multiple and is aligned on an 8-byte boundary.
   Table 6-3 summarizes the location of procedure arguments in the register or stack.
Table 6-3: Argument Locations
Argument Number | Integer Register | Floating-Point Register | Stack |
1 | $16 (a0) | $f16 | |
2 | $17 (a1) | $f17 | |
3 | $18 (a2) | $f18 | |
4 | $19 (a3) | $f19 | |
5 | $20 (a4) | $f20 | |
6 | $21 (a5) | $f21 | |
7-n | | | |
8. On procedure exit, you must restore registers that were saved in step 5. To restore general purpose registers:
        ldq reg,framesize+frameoffset+N($sp)
   To restore the floating-point registers:
        ldt reg,framesize+frameoffset+N($sp)
   (See step 5 for a discussion of the value of N.)
9. Get the return address:
        ldq $26,framesize+frameoffset($sp)
10. Clean up the stack:
        lda $sp,framesize($sp)
11. Return:
        ret $31,($26),1
12. End the procedure:
        .end procedurename
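The frame-size requirement in step 3 can be worked through as arithmetic. The following C fragment is a sketch only: it assumes that every saved integer or floating-point register occupies 8 bytes (consistent with the 8-byte increments in step 5) and ignores any extra padding a compiler might choose to add. The names round_up_16 and frame_size are hypothetical.

```c
#include <assert.h>

/* Round n up to the next multiple of 16, the granularity that
 * step 3 requires for framesize. */
unsigned long round_up_16(unsigned long n)
{
    return (n + 15UL) & ~15UL;
}

/* Hypothetical helper: minimum frame size in bytes.
 *   local_bytes - bytes of local variables
 *   saved_int   - saved integer registers ($26 plus any of $9-$15)
 *   saved_fp    - saved floating-point registers (any of $f2-$f9)
 *   arg_bytes   - bytes for outgoing arguments beyond the first six */
unsigned long frame_size(unsigned long local_bytes, unsigned saved_int,
                         unsigned saved_fp, unsigned long arg_bytes)
{
    assert(saved_int <= 8 && saved_fp <= 8);
    return round_up_16(local_bytes + 8UL * saved_int +
                       8UL * saved_fp + arg_bytes);
}
```

Under these assumptions, a nonleaf procedure that saves only $26 and has no locals needs 8 bytes, which rounds up to a 16-byte frame; saving $26, $9, and $10 needs 24 bytes, which rounds up to 32.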
The examples in this section show procedures written in C and the equivalent procedures written in assembly language.
Example 6-1 shows a nonleaf procedure. Note that it creates a stack frame and saves its return address. It saves its return address because it must put a new return address into register $26 when it makes a procedure call.
Example 6-1: Nonleaf Procedure
int
nonleaf(i, j)
  int i, *j;
  {
  int abs();
  int temp;

  temp = i - *j;
  return abs(temp);
  }

        .globl  nonleaf
 #   1  int
 #   2  nonleaf(i, j)
 #   3    int i, *j;
 #   4    {
        .ent    nonleaf 2
nonleaf:
        ldgp    $gp, 0($27)
        lda     $sp, -16($sp)
        stq     $26, 0($sp)
        .mask   0x04000000, -16
        .frame  $sp, 16, $26, 0
        .prologue       1
        addl    $16, 0, $18
 #   5    int abs();
 #   6    int temp;
 #   7
 #   8    temp = i - *j;
        ldl     $1, 0($17)
        subl    $18, $1, $16
 #   9    return abs(temp);
        jsr     $26, abs
        ldgp    $gp, 0($26)
        ldq     $26, 0($sp)
        lda     $sp, 16($sp)
        ret     $31, ($26), 1
        .end    nonleaf
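The operands of the .mask directive and the stq offset in Example 6-1 follow from the rules in step 5 of Section 6.3.3. The following C fragment is a sketch of those two calculations; save_mask and save_slot_offset are hypothetical names, and the arg_area parameter models the initial value of N described in step 5.

```c
#include <assert.h>

/* Build a .mask-style bitmask: bit n is set when register $n
 * is to be saved. */
unsigned long save_mask(const int *regs, int count)
{
    unsigned long mask = 0;
    int i;

    for (i = 0; i < count; i++) {
        assert(regs[i] >= 0 && regs[i] <= 31);
        mask |= 1UL << regs[i];
    }
    return mask;
}

/* Offset from $sp of the k-th saved register (k = 0 is the first),
 * following framesize+frameoffset+N, where N starts at arg_area
 * and grows by 8 for each successive register. */
long save_slot_offset(long framesize, long frameoffset,
                      long arg_area, int k)
{
    assert(k >= 0);
    return framesize + frameoffset + arg_area + 8L * k;
}
```

Saving only $26 sets bit 26, which gives the value 0x04000000 used in Example 6-1. With a framesize of 16, a frameoffset of -16, and no argument build area, the first save slot is at offset 0 from $sp, matching the stq instruction in the example.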
Example 6-2 shows a leaf procedure that does not require stack space for local variables. Note that it does not create a stack frame and does not save a return address.
Example 6-2: Leaf Procedure Without Stack Space for Local Variables
int
leaf(p1, p2)
  int p1, p2;
  {
  return (p1 > p2) ? p1 : p2;
  }

        .globl  leaf
 #   1  leaf(p1, p2)
 #   2    int p1, p2;
 #   3    {
        .ent    leaf 2
leaf:
        ldgp    $gp, 0($27)
        .frame  $sp, 0, $26, 0
        .prologue       1
        addl    $16, 0, $16
        addl    $17, 0, $17
 #   4    return (p1 > p2) ? p1 : p2;
        bis     $17, $17, $0
        cmplt   $0, $16, $1
        cmovne  $1, $16, $0
        ret     $31, ($26), 1
        .end    leaf
Example 6-3 shows a leaf procedure that requires stack space for local variables. Note that it creates a stack frame but does not save a return address.
Example 6-3: Leaf Procedure with Stack Space for Local Variables
int
leaf_storage(i)
  int i;
  {
  int a[16];
  int j;
  for (j = 0; j < 10; j++)
    a[j] = '0' + j;
  return a[i];
  }

        .globl  leaf_storage
 #   1  int
 #   2  leaf_storage(i)
 #   3    int i;
 #   4    {
        .ent    leaf_storage 2
leaf_storage:
        ldgp    $gp, 0($27)
        lda     $sp, -80($sp)
        .frame  $sp, 80, $26, 0
        .prologue       1
        addl    $16, 0, $1
 #   5    int a[16];
 #   6    int j;
 #   7    for (j = 0; j < 10; j++)
        ldil    $2, 48
        stl     $2, 16($sp)
        ldil    $3, 49
        stl     $3, 20($sp)
        ldil    $0, 2
        lda     $16, 24($sp)
$32:
 #   8      a[j] = '0' + j;
        addl    $0, 48, $4
        stl     $4, 0($16)
        addl    $0, 49, $5
        stl     $5, 4($16)
        addl    $0, 50, $6
        stl     $6, 8($16)
        addl    $0, 51, $7
        stl     $7, 12($16)
        addl    $0, 4, $0
        addq    $16, 16, $16
        subq    $0, 10, $8
        bne     $8, $32
 #   9    return a[i];
        mull    $1, 4, $22
        addq    $22, $sp, $0
        ldl     $0, 16($0)
        lda     $sp, 80($sp)
        ret     $31, ($26), 1
        .end    leaf_storage
6.4 Developing Code for Procedure Calls
The rules and parameter requirements for passing control and exchanging data between procedures written in assembly language and procedures written in other languages are varied and complex. The simplest approach to coding an interface between an assembly procedure and a procedure written in a high-level language is to do the following:
Use the high-level language to write a skeletal version of the procedure that you plan to code in assembly language.
Compile the program using the -S option, which creates an assembly language (.s) version of the compiled source file.
Study the assembly language listing and then, using the code in the listing as a guideline, write your assembly language code.
Section 6.4.1 and Section 6.4.2 describe techniques you can use to create interfaces between procedures written in assembly language and procedures written in a high-level language. The examples show what to look for in creating your interface. Details such as register numbers will vary according to the number, order, and data types of the arguments. In writing your particular interface, you should write and compile realistic examples of the code you want to write in assembly language.
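As a quick check on which register to expect for a given argument when reading a compiler listing, the mapping in Table 6-3 can be sketched in C. The function arg_location is hypothetical and covers only the register-or-stack decision; it does not compute stack offsets.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the location of argument number argno (1-based) into buf:
 * "$16" through "$21" for the first six nonfloating-point arguments,
 * "$f16" through "$f21" for the first six floating-point arguments,
 * and "stack" for the seventh and subsequent arguments. */
void arg_location(int argno, int is_float, char *buf, size_t buflen)
{
    assert(argno >= 1 && buf != NULL && buflen >= 6);
    if (argno > 6)
        snprintf(buf, buflen, "stack");
    else if (is_float)
        snprintf(buf, buflen, "$f%d", 15 + argno);
    else
        snprintf(buf, buflen, "$%d", 15 + argno);
}
```

Note that a pointer to a floating-point value is itself a nonfloating-point argument, so in this sketch it maps to a general register rather than to a floating-point register.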
6.4.1 Calling a High-Level Language Procedure
The following steps show an approach to use in writing an assembly language procedure that calls atof(3), a procedure written in C that converts ASCII characters to numbers:
Write a C program that calls atof. Pass global variables instead of local variables; this makes them easy to recognize in the assembly language version of the C program (and ensures that optimization does not remove any of the code on the grounds that it has no effect).
The following C program is an example of a program that calls atof:
        char c[] = "3.1415";
        double d, atof();
        float f;
        caller()
        {
            d = atof(c);
            f = (float)atof(c);
        }
Compile the program using the following compiler options:
cc -S -O caller.c
The -S option causes the compiler to produce the assembly language listing; the -O option, though not required, reduces the amount of code generated, making the listing easier to read.
After compilation, examine the file caller.s. The comments in the file show how the parameters are passed, the execution of the call, and how the returned values are retrieved:
        .globl  c
        .data
c:
        .ascii  "3.1415\X00"
        .comm   d 8
        .comm   f 4
        .text
        .globl  caller
 #   1  char c[] = "3.1415";
 #   2  double d, atof();
 #   3  float f;
 #   4  caller()
 #   5  {
        .ent    caller 2
caller:
        ldgp    $gp, 0($27)
        lda     $sp, -16($sp)
        stq     $26, 0($sp)
        .mask   0x04000000, -16
        .frame  $sp, 16, $26, 0
        .prologue       1
 #   6    d = atof(c);
        lda     $16, c
        jsr     $26, atof
        ldgp    $gp, 0($26)
        stt     $f0, d
 #   7    f = (float)atof(c);
        lda     $16, c
        jsr     $26, atof
        ldgp    $gp, 0($26)
        cvtts   $f0, $f10
        sts     $f10, f
 #   8  }
        ldq     $26, 0($sp)
        lda     $sp, 16($sp)
        ret     $31, ($26), 1
        .end    caller
6.4.2 Calling an Assembly Language Procedure
The following steps show an approach to use in writing an assembly language procedure that can be called by a procedure written in a high-level language:
Using a high-level language, write a facsimile of the assembly language procedure you want to call. In the body of the procedure, write statements that use the same arguments you intend to use in the final assembly language procedure. Copy the arguments to global variables instead of local variables to make it easy for you to read the resulting assembly language listing.
The following C program is a facsimile of the assembly language program:
        typedef char str[10];
        typedef int boolean;

        float global_r;
        int global_i;
        str global_s;
        boolean global_b;

        boolean callee(float *r, int i, str s)
        {
            global_r = *r;
            global_i = i;
            global_s[0] = s[0];
            return i == 3;
        }
Compile the program using the following compiler options:
cc -S -O callee.c
The -S option causes the compiler to produce the assembly language listing; the -O option, though not required, reduces the amount of code generated, making the listing easier to read.
After compilation, examine the file callee.s. The comments in the file show how the parameters are passed, the execution of the call, and how the returned values are retrieved:
        .comm   global_r 4
        .comm   global_i 4
        .comm   global_s 10
        .comm   global_b 4
        .text
        .globl  callee
 #  10  {
        .ent    callee 2
callee:
        ldgp    $gp, 0($27)
        .frame  $sp, 0, $26, 0
        .prologue       1
        addl    $17, 0, $17
 #  11    global_r = *r;
        lds     $f10, 0($16)
        sts     $f10, global_r
 #  12    global_i = i;
        stl     $17, global_i
 #  13    global_s[0] = s[0];
        ldq_u   $1, 0($18)
        extbl   $1, $18, $1
        .set    noat
        lda     $28, global_s
        ldq_u   $2, 0($28)
        insbl   $1, $28, $3
        mskbl   $2, $28, $2
        bis     $2, $3, $2
        stq_u   $2, 0($28)
        .set    at
 #  14    return i == 3;
        cmpeq   $17, 3, $0
        ret     $31, ($26), 1
        .end    callee
6.5 Memory Allocation
The default memory allocation scheme used by the Alpha system gives every process two storage areas that can grow without bounds. A process exceeds virtual storage only when the sum of the two areas exceeds virtual storage space. By default, the linker and assembler use the scheme shown in Figure 6-3.
Figure 6-3: Default Layout of Memory (User Program View)
This area is not allocated until a user requests it. (The same behavior is observed in System V shared memory regions.)
The heap is reserved for sbrk and brk system calls, and it is not always present.
See the Symbol Table/Object File Specification manual for details on the sections contained within the bss, data, and text segments.
The stack is used for local data in C programs.