ASCII
American Standard Code for Information Interchange. ASCII
is the traditional UNIX codeset and defines 128 characters, including both
control characters and graphic characters, represented by 7-bit binary values
(see also ISO 646).
See also: character set, codeset, Portable Character Set
character
A sequence of one or more bytes that represents a single graphic
symbol or control code. Unlike the char datatype in C, a character
can be represented by a multibyte or single-byte value. The expression
"multibyte character" is synonymous with the term "character"; that is,
both refer to character values of any length, including
single-byte values.
See also: wide character
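The distinction between bytes and characters can be illustrated with a short C sketch. It counts characters rather than bytes by asking the standard mbrlen function how many bytes each character occupies in the current locale's codeset. The helper name count_characters is hypothetical and is not part of any library.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the characters (not bytes) in a multibyte string, interpreting
 * the bytes according to the LC_CTYPE category of the current locale.
 * Returns -1 if the string contains an invalid or incomplete sequence.
 * count_characters is a hypothetical helper, not a library function.
 */
long count_characters(const char *s)
{
    mbstate_t state;
    size_t remaining;
    long count = 0;

    assert(s != NULL);
    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    remaining = strlen(s);

    while (remaining > 0) {
        /* mbrlen reports how many bytes the next character occupies */
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                 /* invalid or truncated character */
        if (n == 0)
            break;                     /* reached the null byte */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

In a single-byte codeset the result equals strlen(s). In a multibyte codeset the result can be smaller than the byte count.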
character set
A set of elements used for the organization, control,
or representation of text.
See also: ASCII, codeset, Portable Character Set
character string
A contiguous sequence of bytes that is terminated by and includes
the null byte. A string is an array of type char in the C programming
language. The null byte has all bits set to zero (0).
An empty string is a character string whose first element is the null byte.
See also: character, wide-character string
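Because a character string includes its terminating null byte, the storage it occupies is one byte more than the length reported by strlen. The following C sketch shows this, along with the test for an empty string. Both function names are hypothetical helpers chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed to store s, counting the terminating null byte. */
size_t string_storage_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;   /* strlen excludes the null byte */
}

/* An empty string is one whose first element is the null byte. */
int is_empty_string(const char *s)
{
    assert(s != NULL);
    return s[0] == '\0';
}
```

For example, string_storage_bytes("abc") yields 4, and an empty string still occupies 1 byte.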
coded character set
Same as codeset
codeset
A set of unambiguous rules that establishes a character set
and the one-to-one relationship between each character of the set and its
bit representation.
collating sequence
The ordering rules applied to characters or groups of characters
when they are sorted.
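In C, a program can apply the collating sequence of the current locale by comparing strings with strcoll instead of strcmp. The following is a minimal sketch. The function names are hypothetical, and the resulting order depends on the LC_COLLATE setting in effect when the sort runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: orders two strings by the LC_COLLATE rules. */
static int compare_by_collation(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcoll(*left, *right);
}

/* Sort an array of strings in place using the locale's collating sequence. */
void sort_by_collation(const char **words, size_t count)
{
    assert(words != NULL);
    qsort(words, count, sizeof words[0], compare_by_collation);
}
```

In the default "C" locale, strcoll orders strings the same way strcmp does, so differences appear only after the program selects another locale with setlocale.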
control character
A character, other than a graphic character, that affects
the recording, processing, transmission, or interpretation of text.
cultural data
The conventions of a geographical area for such things as
date, time, numeric, and currency values.
decomposed character
In Unicode, a character sequence that uses a base character,
such as e, followed by a combining character, such as acute (´), to represent
a single character in a native language.
See also: precomposed character
file code
The encoding format that applies to data outside the program.
See also: process code
graphic character
A character, other than a control character, that has a visual
representation when hand-written, printed, or displayed.
I18N
Same as internationalization
internationalization
The process of developing programs without prior knowledge
of the language, cultural data, or character-encoding schemes that the programs
are expected to handle. An internationalized program uses a set of interfaces
that allows the program to modify its behavior at run time for operation in
a specific native language environment. The mnemonic I18N is frequently used
as an abbreviation for internationalization.
See also: locale, localization
ISO 10646
The ISO Universal character set. The first 65,536 code positions
in this character set are called the Basic Multilingual Plane (BMP), in which
each character is 16 bits in length. This form of ISO 10646 is also known
as UCS-2. ISO 10646 also has a form, called UCS-4, in which each character
is 32 bits in length.
See also: Unicode
ISO 646
ISO 7-bit codeset for information interchange. The reference
version of ISO 646 contains 95 graphic characters, which are identical to
the graphic characters defined in the ASCII codeset.
ISO 6937
ISO 7-bit or 8-bit codeset for text communication using public
communication networks, private communication networks, or interchange media
such as magnetic tapes and disks.
ISO8859-1
ISO 8-bit single-byte codeset Part 1, Latin Alphabet No. 1.
The ISO8859-1 character set comprises 191 graphic characters covering
the requirements of most Western European languages.
L10N
Same as localization
LANG
An environment variable that specifies the locale to use for
all locale categories. The following environment variables can be set to override
the LANG setting in specific locale categories:
LC_COLLATE
LC_CTYPE
LC_MESSAGES
LC_MONETARY
LC_NUMERIC
LC_TIME
The LC_ALL environment variable also specifies locale. If set, this variable
overrides all the preceding variables, including LANG.
See also: locale
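The precedence among these variables can be sketched in C as follows: LC_ALL takes priority, then the variable for the specific category (for example, LC_COLLATE), then LANG. The function names are hypothetical. Falling back to the name "C" when nothing is set is an assumption made for this sketch, because the actual default is chosen by the implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* A variable counts as set only if it exists and is not empty. */
static int is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Pick the locale name for one category from already-fetched values. */
const char *resolve_locale_name(const char *lc_all,
                                const char *lc_category,
                                const char *lang)
{
    if (is_set(lc_all))
        return lc_all;        /* LC_ALL overrides everything else */
    if (is_set(lc_category))
        return lc_category;   /* category variable overrides LANG */
    if (is_set(lang))
        return lang;
    return "C";               /* assumed default for this sketch */
}

/* Same decision, reading the process environment directly. */
const char *effective_locale_name(const char *category_variable)
{
    assert(category_variable != NULL);
    return resolve_locale_name(getenv("LC_ALL"),
                               getenv(category_variable),
                               getenv("LANG"));
}
```

A program normally does not perform this lookup itself. Calling setlocale(LC_ALL, "") asks the library to apply the same environment settings.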
langinfo database
Same as locale
local language
Same as native language
locale
A set of data, sometimes referred to as the "langinfo
database," that supports a particular combination of native (local)
language, cultural data, and codeset.
See also: codeset, cultural data, LANG, localization
localization
The process of implementing for an application the requirements
for local languages and customs. Some of these requirements are addressed
by locales. Other requirements are addressed by translations of program messages,
provision of appropriate fonts for printers and display devices, and, in some
cases, development of additional software. The mnemonic L10N is frequently
used as an abbreviation for localization.
See also: internationalization, locale
LOCPATH
An environment variable used to specify the search path for
locales.
See also: locale
message catalog
A file or storage area containing program messages, command
prompts, and responses to prompts for a particular native language, territory,
and codeset.
multibyte character
Same as character
native language
A computer user's spoken or written language, such as English,
French, Italian, or Spanish.
NLSPATH
An environment variable used to indicate the search path for
message catalogs.
Portable Character Set
A character set that is supported in both compile-time (source)
and run-time (executable) environments for all locales and that contains:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
The Portable Character Set as defined by X/Open is similar to the basic source
and basic execution character sets defined in ISO/IEC 9899:1990, except that
the X/Open set also includes the dollar sign ($), commercial at sign (@), and
grave accent (`) characters.
Some locales (for example, ISO 646 variants) may make substitutions for one or
more of the preceding characters. In such cases, the substituted character has
the same syntactic meaning as the character it replaces in the Portable
Character Set. An example of character substitution might be the British pound
sign (£) for the number sign (#).
See also: character set, codeset, ISO 646
precomposed character
In Unicode, a discrete code point that represents a sequence
of a base character, such as e, with a combining character, such as acute
(´).
See also: decomposed character, Unicode
process code
The wide-character encoding format used for manipulating data
inside programs.
See also: file code
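A common pattern is to convert data from file code to process code as it enters the program. The following C sketch does this with the standard mbstowcs function, which interprets the input bytes according to the current locale. The name to_process_code and its buffer-size convention are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multibyte string (file code) into wide characters
 * (process code). At most capacity - 1 wide characters are stored,
 * and the result is always terminated with the null wide character.
 * Returns the number of wide characters stored, not counting the
 * terminator, or (size_t)-1 if the input is not valid in the
 * current locale's codeset.
 */
size_t to_process_code(const char *file_code, wchar_t *out, size_t capacity)
{
    size_t n;

    assert(file_code != NULL && out != NULL && capacity > 0);
    n = mbstowcs(out, file_code, capacity - 1);
    if (n == (size_t)-1)
        return n;             /* invalid multibyte sequence */
    out[n] = L'\0';           /* mbstowcs does not terminate a full buffer */
    return n;
}
```

The reverse conversion, from process code back to file code on output, can be done with wcstombs.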
radix character
The character that separates the integer part of a number
from the fractional part.
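A C program can query the radix character of the current locale through the standard localeconv function, as in the following sketch. The wrapper name current_radix is hypothetical.

```c
#include <assert.h>
#include <locale.h>

/*
 * Return the radix character of the current locale as a string.
 * The returned pointer refers to storage owned by the C library and
 * may be overwritten by later calls to localeconv or setlocale.
 */
const char *current_radix(void)
{
    const struct lconv *conventions = localeconv();

    assert(conventions != NULL);
    return conventions->decimal_point;
}
```

In the "C" locale the result is a period (.). Other locales may return a different radix character, such as a comma.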
string
Same as character string
UCS
Same as ISO 10646
Unicode
A codeset (maintained by the Unicode Consortium) that uses
a generalized multibyte encoding format to accommodate characters in all
native languages. Unicode is code-for-code identical with the UCS-2 form of
ISO 10646.
See also: codeset, ISO 10646
wide character
An integral type that is large enough to hold any member of
the extended execution character set. In program terms, a wide character is
an object of type wchar_t, which is defined in the header files
/usr/include/stddef.h (for conformance to the X/Open Portability
Guide) and /usr/include/stdlib.h (for conformance to
the ANSI C standard). Although the file locations where the wchar_t
data type is defined are determined by standards organizations, its definition
is implementation specific. For example, implementations that support only
single-byte codesets (not the case for DEC OSF/1) might define wchar_t as a byte value.
The null wide character is a wchar_t value with all bits set to zero (0).
wide-character string
A contiguous sequence of wide characters that is terminated
by and includes the null wide character. A wide-character string is an array
of type wchar_t.
See also: character string, wide character
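Because the size of wchar_t is implementation specific, the storage needed for a wide-character string should be computed with sizeof rather than assumed. The following C sketch shows the calculation. The function name is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Bytes needed to store a wide-character string, counting the
 * terminating null wide character. wcslen counts wide characters,
 * not bytes, and excludes the terminator.
 */
size_t wide_string_storage_bytes(const wchar_t *ws)
{
    assert(ws != NULL);
    return (wcslen(ws) + 1) * sizeof(wchar_t);
}
```

For the literal L"abc" the result is 4 * sizeof(wchar_t), whatever size the implementation gives wchar_t.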