This chapter describes lexical conventions associated with the following items:
Blank and tab characters (Section 2.1)
Comments (Section 2.2)
Identifiers (Section 2.3)
Constants (Section 2.4)
Physical lines (Section 2.5)
Statements (Section 2.6)
Expressions (Section 2.7)
Address formats (Section 2.8)
2.1 Blank and Tab Characters
You can use blank and tab characters anywhere between operators, identifiers, and constants. Adjacent identifiers or constants that are not otherwise separated must be separated by a blank or tab. These characters can also be used within character constants; however, they are not allowed within operators and identifiers.
2.2 Comments
The number sign character (#) introduces a comment. Comments that start with a number sign extend through the end of the line on which they appear. You can also use C language notation (/*...*/) to delimit comments.
Do not start a comment with a number sign in column one; the assembler uses cpp (the C language preprocessor) to preprocess assembler code, and cpp interprets number signs in the first column as preprocessor directives.
2.3 Identifiers
An identifier consists of a case-sensitive sequence of alphanumeric characters (A-Z, a-z, 0-9) and the following special characters:
. (period)
_ (underscore)
$ (dollar sign)
Identifiers can be up to 31 characters long, and the first character cannot be numeric (0-9).
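The identifier rules above can be checked mechanically. The following Python sketch is illustrative only: the function name is_valid_identifier and the regular expression are assumptions of this example and are not part of the assembler.

```python
import re

# First character: a letter, period, underscore, or dollar sign (never a digit).
# Remaining characters: at most 30 more, so the total length is at most 31.
_IDENTIFIER = re.compile(r"[A-Za-z._$][A-Za-z0-9._$]{0,30}")


def is_valid_identifier(name: str) -> bool:
    """Return True if name satisfies the identifier rules in this section."""
    return _IDENTIFIER.fullmatch(name) is not None
```

Because identifiers are case-sensitive, the check does not fold case; Loop and loop would both be accepted as two distinct names.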
If an undefined identifier is referenced, the assembler assumes that the identifier is an external symbol. The assembler treats the identifier like a name specified by a .globl directive (see Chapter 5).
If the identifier is defined to the assembler and the identifier has not been specified as global, the assembler assumes that the identifier is a local symbol.
2.4 Constants
The assembler supports the following constants:
Scalar constants
Floating-point constants
String constants
2.4.1 Scalar Constants
The assembler interprets all scalar constants as two's complement numbers. Scalar constants are written with the digits 0-9 and, for hexadecimal values, the letters a-f or A-F.
Scalar constants can be decimal, hexadecimal, or octal:
Decimal constants consist of a sequence of decimal digits (0-9) without a leading zero.
Hexadecimal constants consist of the characters 0x (or 0X) followed by a sequence of hexadecimal digits (0-9abcdefABCDEF).
Octal constants consist of a leading zero followed by a sequence of octal digits (0-7).
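The three notations can be told apart by prefix alone. The following Python sketch shows one way to classify and convert a scalar constant under the rules above. The function name parse_scalar is hypothetical, and treating a lone 0 as octal zero is an assumption of this example.

```python
import re


def parse_scalar(text: str) -> int:
    """Convert a decimal, hexadecimal, or octal scalar constant to an integer."""
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        return int(text[2:], 16)   # 0x or 0X prefix: hexadecimal
    if re.fullmatch(r"0[0-7]*", text):
        return int(text, 8)        # leading zero: octal (a lone "0" is zero)
    if re.fullmatch(r"[1-9][0-9]*", text):
        return int(text, 10)       # no leading zero: decimal
    raise ValueError(f"not a scalar constant: {text!r}")
```

With this sketch, 255, 0xff, 0XFF, and 0377 all convert to the same value, while 089 is rejected because the leading zero selects octal and 8 and 9 are not octal digits.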
2.4.2 Floating-Point Constants
Floating-point constants can appear only in floating-point directives (see Chapter 5) and in the floating-point load immediate instructions (see Section 4.2). Floating-point constants have the following format:
±d1[.d2][e|E±d3]
d1: A decimal integer that denotes the integral part of the floating-point value.
d2: A decimal integer that denotes the fractional part of the floating-point value.
d3: A decimal integer that denotes a power of 10.
The + symbol (plus sign) is optional.
For example, the number .02173 can be represented as follows:
21.73E-3
The floating-point directives, such as .float and .double, may optionally use hexadecimal floating-point constants instead of decimal constants. A hexadecimal floating-point constant consists of the following elements:
[+|-]0x[1|0].<hex-digits>h0x<hex-digits>
The assembler places the first set of hexadecimal digits (excluding the 0 or 1 preceding the point) in the mantissa field of the floating-point format without attempting to normalize it. It stores the second set of hexadecimal digits in the exponent field without biasing them. If the mantissa appears to be denormalized, the assembler checks whether the exponent is appropriate. Hexadecimal floating-point constants are useful for generating IEEE special symbols and for writing hardware diagnostics.
For example, either of the following directives generates the single-precision number 1.0:
.float 1.0e+0
.float 0x1.0h0x7f
The assembler uses normal (nearest) rounding mode to convert floating-point constants.
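To see why 0x1.0h0x7f yields 1.0, it helps to build the single-precision bit pattern by hand. The Python sketch below does so under stated assumptions: the mantissa digits are left-justified in the 23-bit mantissa field and excess low-order bits are dropped, only the lowercase 0x and h spellings are accepted, and the function names are hypothetical. The assembler's actual handling may differ in these details.

```python
import re
import struct

_HEX_FLOAT = re.compile(r"([+-]?)0x([01])\.([0-9a-fA-F]+)h0x([0-9a-fA-F]+)")


def single_precision_bits(text: str) -> int:
    """Build the 32-bit pattern for a hexadecimal floating-point constant."""
    match = _HEX_FLOAT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a hexadecimal floating-point constant: {text!r}")
    sign, _leading, mantissa_hex, exponent_hex = match.groups()
    exponent = int(exponent_hex, 16)          # stored as given, no bias applied
    if exponent > 0xFF:
        raise ValueError("exponent does not fit the 8-bit exponent field")
    # Assumption: digits after the point are left-justified in the 23-bit field.
    mantissa = (int(mantissa_hex, 16) << 23) >> (4 * len(mantissa_hex))
    return ((sign == "-") << 31) | (exponent << 23) | mantissa


def as_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For 0x1.0h0x7f the exponent field holds 0x7f and the mantissa field is zero, which is the pattern 0x3f800000, the IEEE single-precision encoding of 1.0. An exponent field of 0xff with a zero mantissa gives infinity, which is the kind of special value the text mentions.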
2.4.3 String Constants
All characters except the newline character are allowed in string constants. String constants begin and end with double quotation marks (").
The assembler observes most of the backslash conventions used by the C language. Table 2-1 shows the assembler's backslash conventions.
Table 2-1: Backslash Conventions
Convention | Meaning |
\a | Alert (0x07) |
\b | Backspace (0x08) |
\f | Form feed (0x0c) |
\n | Newline (0x0a) |
\r | Carriage return (0x0d) |
\t | Horizontal tab (0x09) |
\v | Vertical tab (0x0b) |
\\ | Backslash (0x5c) |
\" | Quotation mark (0x22) |
\' | Single quote (0x27) |
\nnn | Character whose octal value is nnn (where n is 0-7) |
\Xnn | Character whose hexadecimal value is nn (where n is 0-9, a-f, or A-F) |
Deviations from C conventions are as follows:
The assembler does not recognize "\?".
The assembler does not recognize the prefix "L" (wide character constant).
The assembler limits hexadecimal constants to two characters.
The assembler allows the leading "x" character in a hexadecimal constant to be either uppercase or lowercase; that is, both \xnn and \Xnn are allowed.
For octal notation, the backslash conventions require three characters when the next character could be confused with the octal number.
For hexadecimal notation, the backslash conventions require two characters when the next character could be confused with the hexadecimal number. Insert a 0 (zero) as the first character of the single-character hexadecimal number when this condition occurs.
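The padding rules for octal and hexadecimal escapes follow from how a decoder consumes digits. The Python sketch below decodes the body of a string constant according to Table 2-1, taking at most three octal digits or two hexadecimal digits per escape. The function name decode_string is hypothetical, and the sketch assumes ASCII input and rejects octal values above 0xff; neither behavior is stated by the source.

```python
_SIMPLE = {"a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D,
           "t": 0x09, "v": 0x0B, "\\": 0x5C, '"': 0x22, "'": 0x27}
_OCT = "01234567"
_HEX = "0123456789abcdefABCDEF"


def decode_string(body: str) -> bytes:
    """Decode the text between the quotation marks of a string constant."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(ord(body[i]))          # assumes ASCII input
            i += 1
            continue
        i += 1                                # skip the backslash
        if i >= len(body):
            raise ValueError("trailing backslash")
        esc = body[i]
        if esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            i += 1
        elif esc in _OCT:
            j = i
            while j < len(body) and j - i < 3 and body[j] in _OCT:
                j += 1                        # take at most three octal digits
            value = int(body[i:j], 8)
            if value > 0xFF:                  # assumption: reject, do not wrap
                raise ValueError("octal escape out of range")
            out.append(value)
            i = j
        elif esc in "xX":
            j = i + 1
            while j < len(body) and j - i - 1 < 2 and body[j] in _HEX:
                j += 1                        # take at most two hex digits
            if j == i + 1:
                raise ValueError("hexadecimal escape needs a digit")
            out.append(int(body[i + 1:j], 16))
            i = j
        else:
            raise ValueError(f"unrecognized escape: \\{esc}")
    return bytes(out)
```

With this decoder, \11 is a single tab character, so a string meant to contain the byte 0x01 followed by the digit 1 must be written \0011. Likewise \xab is the single byte 0xab, so a newline followed by the letter b must be written \x0ab.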
2.5 Multiple Lines Per Physical Line
You can include multiple statements on the same line by separating the statements with semicolons. Note, however, that the assembler does not recognize semicolons as separators when they follow comment symbols (# or /*).
2.6 Statements
The assembler supports the following types of statements:
Null statements
Keyword statements
Each keyword statement can include an optional label, an operation code (mnemonic or directive), and zero or more operands (with an optional comment following the last operand on the statement):
[ label: ] opcode operand [ ; opcode operand; ... ] [ # comment ]
Some keyword statements also support relocation operands (see Section 2.6.4).
2.6.1 Labels
Labels can consist of label definitions or numeric values:
A label definition consists of an identifier followed by a colon. (See Section 2.3 for the rules governing identifiers.) Label definitions assign the current value and type of the location counter to the name. An error results when the name is already defined.
Label definitions always end with a colon. You can put a label definition on a line by itself.
A numeric label is a single numeric value (1-255). Unlike label definitions, the value of a numeric label can be applied to any number of statements in a program. To reference a numeric label, put an f (forward) or a b (backward) immediately after the referencing digit in an instruction, for example, br 7f (which is a forward branch to numeric label 7). The reference directs the assembler to look for the nearest numeric label that corresponds to the specified number in the lexically forward or backward direction.
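The nearest-label lookup can be modeled in a few lines. In the Python sketch below, a program is reduced to a list of (statement index, label number) pairs; the function name and this representation are hypothetical. Treating a label on the referencing statement itself as a backward match is an assumption of the sketch, not something the source states.

```python
def resolve_numeric_label(labels, at, reference):
    """Return the statement index selected by a reference such as "7f" or "7b".

    labels    -- (statement_index, label_number) pairs in source order
    at        -- statement index of the referencing instruction
    reference -- a label number followed by "f" (forward) or "b" (backward)
    """
    number, direction = int(reference[:-1]), reference[-1]
    if not 1 <= number <= 255 or direction not in ("f", "b"):
        raise ValueError(f"bad numeric label reference: {reference!r}")
    if direction == "f":
        # Nearest matching label that appears after the reference.
        found = [i for i, n in labels if n == number and i > at]
        pick = min
    else:
        # Nearest matching label at or before the reference (assumption).
        found = [i for i, n in labels if n == number and i <= at]
        pick = max
    if not found:
        raise LookupError(f"no label {number} in that direction")
    return pick(found)
```

For example, if numeric label 7 appears at statements 0, 4, and 9, a reference 7f at statement 5 selects statement 9 and a reference 7b at the same place selects statement 4. This is what allows the same number to label many statements without conflict.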
2.6.2 Null Statements
A null statement is an empty statement that the assembler ignores. Null statements can have label definitions. For example, the following line has three null statements in it:
label: ; ;
2.6.3 Keyword Statements
A keyword statement contains a predefined keyword. The syntax for the rest of the statement depends on the keyword. Keywords are either assembler instructions (mnemonics) or directives.
Assembler instructions in the main instruction set and the floating-point instruction set are described in Chapter 3 and Chapter 4, respectively. Assembler directives are described in Chapter 5.
2.6.4 Relocation Operands
Relocation operands are generally useful in only two situations:
In application programs in which the programmer needs precise control over scheduling
In source code written for compiler development
Some macro instructions (for example, ldgp) require special coordination between the machine-code instructions and the relocation sequences given to the linker. By using the macro instructions, the assembler programmer relies on the assembler to generate the appropriate relocation sequences.
In some instances, the use of macro instructions may be undesirable. For example, a compiler that supports the generation of assembly language files may not want to defer instruction scheduling to the assembler. Such a compiler will want to schedule some or all of the machine-code instructions. To do this, the compiler must have a mechanism for emitting an object file's relocation sequences without using macro instructions. The mechanism for establishing these sequences is the relocation operand.
A relocation operand can be placed after the normal operand on an assembly language statement:
opcode operand relocation_operand
The relocation_operand has the following form:
!relocation_type!sequence_number
Any one of the following relocation types can be specified:
literal
lituse_base
lituse_bytoff
lituse_jsr
gpdisp
gprelhigh
gprellow
The relocation types must be enclosed within a pair of exclamation points (!) and are not case-sensitive. See the Symbol Table/Object File Specification manual for descriptions of the different types of relocation operations.
The sequence number is a numeric constant with a value range of 1 to 2147483647. The constant can be base 8, 10, or 16. Bases other than 10 require a prefix (see Section 2.4.1).
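The form of a relocation operand is simple enough to validate mechanically. The Python sketch below splits an operand of the form !relocation_type!sequence_number, checks the type against the list above without regard to case, and applies the sequence-number range and base rules. The function name and the choice to return the type in lowercase are assumptions of this example.

```python
RELOCATION_TYPES = {"literal", "lituse_base", "lituse_bytoff", "lituse_jsr",
                    "gpdisp", "gprelhigh", "gprellow"}


def parse_relocation_operand(text: str):
    """Split "!type!number" into a (type, sequence_number) pair."""
    parts = text.split("!")
    if len(parts) != 3 or parts[0] != "":
        raise ValueError(f"not a relocation operand: {text!r}")
    rtype = parts[1].lower()                   # types are not case-sensitive
    if rtype not in RELOCATION_TYPES:
        raise ValueError(f"unknown relocation type: {parts[1]!r}")
    digits = parts[2]
    if digits[:2] in ("0x", "0X"):
        number = int(digits[2:], 16)           # hexadecimal prefix
    elif digits.startswith("0") and len(digits) > 1:
        number = int(digits, 8)                # leading zero: octal
    else:
        number = int(digits, 10)               # decimal, no prefix
    if not 1 <= number <= 2147483647:
        raise ValueError("sequence number out of range")
    return rtype, number
```

Under these rules, !literal!1 and !LITERAL!0x1 name the same relocation type and sequence number, while !gpdisp!0 is rejected because sequence numbers start at 1.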
The following examples contain relocation operands in the source code:
Example 1 -- Referencing multiple lituse_base relocations:
# Equivalent C statement:
#       sym1 += sym2 (Both external)
# Assembly statements containing macro instructions:
        ldq     $1, sym1
        ldq     $2, sym2
        addq    $1, $2, $3
        stq     $3, sym1
# Assembly statements containing machine-code instructions
# requiring relocation operands:
        ldq     $1, sym1($gp)!literal!1
        ldq     $2, sym2($gp)!literal!2
        ldq     $3, sym1($1)!lituse_base!1
        ldq     $4, sym2($1)!lituse_base!2
        addq    $3, $4, $3
        stq     $3, sym1($1)!lituse_base!1
The assembler stores the sym1 and sym2 address constants in the .lita section.
In this example, the code with relocation operands provides better performance than the other code because it saves on register usage and on the length of machine-code instruction sequences.
Example 2 -- Referencing an ldgp sequence that is scheduled inside a lituse_base relocation:
# Assembly statements containing macro instructions:
        beq     $2, L
        stq     $31, sym
        ldgp    $gp, 0($27)
# Assembly statements containing machine-code instructions that
# require relocation operands:
        ldq     $at, sym($gp)!literal!1
        beq     $2, L                   # crosses basic block boundary
        ldah    $gp, 0($27)!gpdisp!2
        stq     $31, sym($at)!lituse_base!1
        lda     $gp, 0($gp)!gpdisp!2
In this example, the programmer has elected to schedule the load of the address of sym before the conditional branch.
Example 3 -- A routine call:
# Assembly statements containing macro instructions:
        jsr     sym1
        ldgp    $gp, 0($ra)
        .extern sym1
        .text
# Assembly statements containing machine-code instructions that
# require relocation operands:
        ldq     $27, sym1($gp)!literal!1
        jsr     $26, ($27), sym1!lituse_jsr!1   # as1 puts in an R_HINT for the jsr instruction
        ldah    $gp, 0($ra)!gpdisp!2
        lda     $gp, 0($gp)!gpdisp!2
In this example, the code with relocation operands does not provide any significant gains over the other code. This example is only provided to show the different coding methods.
2.7 Expressions
An expression is a sequence of symbols that represents a value. Each expression and its result have data types. The assembler does arithmetic in two's complement integers with 64 bits of precision. Expressions follow precedence rules and consist of the following elements:
Operators
Identifiers
Constants
You can also use a single character string in place of an integer within an expression. For example, the following two pairs of statements are equivalent:
.byte "a" ; .word "a"+0x19
.byte 0x61 ; .word 0x7a
2.7.1 Expression Operators
The assembler supports the operators shown in Table 2-2.
Table 2-2: Expression Operators
Operator | Meaning |
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
% | Remainder |
<< | Shift left |
>> | Shift right (sign is not extended) |
^ | Bitwise EXCLUSIVE OR |
& | Bitwise AND |
| | Bitwise OR |
- | Minus (unary) |
+ | Identity (unary) |
~ | Complement |
2.7.2 Expression Operator Precedence Rules
For the order of operator evaluation within expressions, you can rely on the precedence rules or you can group expressions with parentheses. Unless parentheses enforce precedence, the assembler evaluates all operators of the same precedence strictly from left to right. Because parentheses also designate index registers, ambiguity can arise from parentheses in expressions. To resolve this ambiguity, put a unary + in front of parentheses in expressions.
The assembler has three precedence levels. Table 2-3 lists the precedence rules from lowest to highest.
Table 2-3: Operator Precedence
Precedence | Operators |
Least binding, lowest precedence | Binary +, - |
Intermediate precedence | Binary *, /, %, <<, >>, ^, &, | |
Most binding, highest precedence | Unary -, +, ~ |
Note
The assembler's precedence scheme differs from that of the C language.
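A small evaluator makes the difference from C concrete. The Python sketch below implements the three levels of Table 2-3 with left-to-right evaluation inside each level and 64-bit two's complement wraparound. It handles numeric constants only, the function names are hypothetical, and the choice to truncate division toward zero is an assumption borrowed from C rather than a documented assembler behavior.

```python
import re

MASK = (1 << 64) - 1                                  # results wrap to 64 bits
_TOKEN = re.compile(r"\s*(0[xX][0-9a-fA-F]+|\d+|<<|>>|[-+*/%^&|~()])")
LOW = {"+", "-"}                                      # lowest precedence
MID = {"*", "/", "%", "<<", ">>", "^", "&", "|"}      # intermediate precedence


def _number(tok):
    if tok[:2] in ("0x", "0X"):
        return int(tok[2:], 16)
    return int(tok, 8 if tok.startswith("0") else 10)


def _signed(value):
    value &= MASK
    return value - (1 << 64) if value >> 63 else value


def _apply(op, a, b):
    if op == "*":
        return a * b
    if op in ("/", "%"):
        x, y = _signed(a), _signed(b)
        q = abs(x) // abs(y)                # assumption: truncate toward zero
        if (x < 0) != (y < 0):
            q = -q
        return q if op == "/" else x - q * y
    if op == "<<":
        return a << b if b < 64 else 0
    if op == ">>":
        return a >> b                       # operand is unsigned: no sign extension
    if op == "^":
        return a ^ b
    if op == "&":
        return a & b
    return a | b


def evaluate(text):
    """Evaluate a constant expression using the three precedence levels."""
    toks, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"bad character at offset {pos}")
        toks.append(m.group(1))
        pos = m.end()
    idx = 0

    def take():
        nonlocal idx
        if idx >= len(toks):
            raise ValueError("unexpected end of expression")
        idx += 1
        return toks[idx - 1]

    def unary():                            # highest precedence: unary -, +, ~
        tok = take()
        if tok == "-":
            return -unary() & MASK
        if tok == "+":
            return unary()
        if tok == "~":
            return ~unary() & MASK
        if tok == "(":
            value = binary(LOW)
            if take() != ")":
                raise ValueError("missing )")
            return value
        return _number(tok) & MASK

    def binary(ops):                        # one level, strictly left to right
        operand = unary if ops is MID else (lambda: binary(MID))
        value = operand()
        while idx < len(toks) and toks[idx] in ops:
            op, rhs = take(), operand()
            if ops is LOW:
                value = (value + rhs if op == "+" else value - rhs) & MASK
            else:
                value = _apply(op, value, rhs) & MASK
        return value

    result = binary(LOW)
    if idx != len(toks):
        raise ValueError("unexpected token")
    return _signed(result)
```

Under this scheme, 1 + 2 << 3 evaluates to 17, because the shift binds more tightly than the addition, whereas C would compute (1 + 2) << 3 = 24. Similarly, 6 & 3 * 2 evaluates to 4, because & and * share a level and are taken left to right, whereas C would compute 6 & (3 * 2) = 6.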
2.7.3 Data Types
Each symbol you reference or define in an assembly program belongs to one of the type categories shown in Table 2-4.
Table 2-4: Data Types
Type | Description |
undefined | Any symbol that is referenced but not defined becomes global undefined. (Declaring such a symbol in a .globl directive merely makes its status clearer.) |
absolute | A constant defined in an assignment (=) expression. |
text | Any symbol defined while the .text directive is in effect belongs to the text section. The text section contains the program's instructions, which are not modifiable during execution. |
data | Any symbol defined while the .data directive is in effect belongs to the data section. The data section contains memory that the linker can initialize to nonzero values before your program begins to execute. |
sdata | The type sdata is similar to the type data, except that defining a symbol while the .sdata ("small data") directive is in effect causes the linker to place it within the small data section. This increases the chance that the linker will be able to optimize memory references to the item by using gp-relative addressing. |
rdata and rconst | Any symbol defined while the .rdata or .rconst directive is in effect belongs to this category. The only difference between the types rdata and rconst is that the former is allowed to have dynamic relocations and the latter is not. (The types rdata and rconst are also similar to the type data but, unlike data, cannot be modified during execution.) |
bss and sbss | Any symbol defined in a [...] If a symbol's size is less than the number of bytes specified by the [...] Local symbols in the [...] |
Symbols in the undefined category are always global; that is, they are visible to the linker and can be shared with other modules of your program. Symbols in the absolute, text, data, sdata, rdata, rconst, bss, and sbss type categories are local unless declared in a .globl directive.
2.7.4 Type Propagation in Expressions
For any expression, the result's type depends on the types of the operands and the operator. The following type propagation rules are used in expressions:
If an operand is undefined, the result is undefined.
If both operands are absolute, the result is absolute.
If the operator is a plus sign (+) and the first operand refers to an undefined external symbol or a relocatable symbol in a .text section, .data section, or .bss section, the result has the first operand's type and the other operand must be absolute.
If the operator is a minus sign (-) and the first operand refers to a relocatable symbol in a .text section, .data section, or .bss section, the type propagation rules can vary:
The second operand can be absolute (if it was previously defined) and the result has the first operand's type.
The second operand can have the same type as the first operand and the result is absolute.
If the first operand is external undefined, the second operand must be absolute.
The operators *, /, %, <<, >>, ~, ^, &, and | apply only to absolute symbols.
2.8 Address Formats
The assembler accepts addresses expressed in the formats described in Table 2-5.
Table 2-5: Address Formats
Format | Address Description |
(base-register) | Specifies an indexed address, which assumes a zero offset. The base register's contents specify the address. |
expression | Specifies an absolute address. The assembler generates the most locally efficient code for referencing the value at the specified address. |
expression(base-register) | Specifies a based address. To get the address, the value of the expression is added to the contents of the base register. The assembler generates the most locally efficient code for referencing the value at the specified address. |
relocatable-symbol | Specifies a relocatable address. The assembler generates the necessary instructions to address the item and generates relocation information for the linker. |
relocatable-symbol±expression | Specifies a relocatable address. To get the address, the value of the expression, which has an absolute value, is added to or subtracted from the relocatable symbol. The assembler generates the necessary instructions to address the item and generates relocation information for the linker. If the symbol name does not appear as a label anywhere in the assembly, the assembler assumes that the symbol is external. |
relocatable-symbol(index-register) | Specifies an indexed relocatable address. To get the address, the index register is added to the relocatable symbol's address. The assembler generates the necessary instructions to address the item and generates relocation information for the linker. If the symbol name does not appear as a label anywhere in the assembly, the assembler assumes that the symbol is external. |
relocatable-symbol±expression(index-register) | Specifies an indexed relocatable address. To get the address, the assembler adds the expression to, or subtracts it from, the relocatable symbol's address and then adds the contents of the index register. The assembler generates the necessary instructions to address the item and generates relocation information for the linker. If the symbol name does not appear as a label anywhere in the assembly, the assembler assumes that the symbol is external. |