American Standard Code for Information Interchange. ASCII defines 128 characters, including control characters and graphic characters, represented by 7-bit binary values (see also ISO 646).
See also character set, coded character set, Portable Character Set
A sequence of one or more bytes that represents a single graphic symbol or control code. Unlike the char data type in C, a character can be represented by a value that is one byte or multiple bytes. The expression "multibyte character" is synonymous with the term "character"; that is, both refer to character values of any length, including single-byte values.
See also wide character
A member of a set of elements used for the organization, control, or representation of text.
See also ASCII, Portable Character Set, ISO 10646
A contiguous sequence of bytes that is terminated by and includes the null byte. A string is an array of type char in the C programming language. The null byte has all bits set to zero (0). An empty string is a character string whose first element is the null byte.
See also character, wide-character string
A set of unambiguous rules that establishes a character set and the one-to-one relationship between each character of the set and its bit representation. On UNIX systems, the more common term is codeset. On MS-DOS and Microsoft Windows systems, the more common term is code page.
The ordering rules applied to characters or groups of characters when they are sorted.
A character, other than a graphic character, that affects the recording, processing, transmission, or interpretation of text.
The conventions of a geographical area for such things as date, time, numeric, and currency values.
In Unicode, a character sequence that uses a base character, such as e, followed by a combining character, such as acute ('), to represent a single character in a native language.
See also precomposed character
The currency adopted by European countries belonging to the Economic and Monetary Union (EMU). By the end of the year 2002, this new currency is scheduled to replace local currencies for EMU member countries. The euro currency has a monetary sign that looks like an equal sign (=) superimposed on the capital letter C and is identified by the string EUR in international currency documents.
The encoding format that applies to data outside the program.
See also process code
A character, other than a control character, that has a visual representation when hand-written, printed, or displayed.
The process of developing programs without prior knowledge of the language, cultural data, or character-encoding schemes that the programs are expected to handle. An internationalized program uses a set of interfaces that allows the program to modify its behavior at run time for operation in a specific native language environment. I18N is frequently used as an abbreviation for internationalization.
See also locale, localization
The ISO Universal Character Set (UCS). The first 65,536 code positions in this character set are called the Basic Multilingual Plane (BMP), in which each character is 16 bits in length. This form of ISO 10646 is also known as UCS-2. ISO 10646 also has a form called UCS-4 in which each character is 32 bits in length.
See also Unicode
ISO 7-bit codeset for information interchange. The reference version of ISO 646 contains 95 graphic characters, which are identical to the graphic characters defined in the ASCII codeset.
ISO 7-bit or 8-bit codeset for text communication using public communication networks, private communication networks, or interchange media such as magnetic tapes and disks.
ISO 8-bit single-byte codesets. In place of the asterisk (*) is a number that represents the part of the associated ISO standard. For example, the ISO8859-1 codeset conforms to ISO 8859 Part 1, Latin Alphabet No. 1, which defines 191 graphic characters covering the requirements of most Western European languages.
See localization
An environment variable that specifies the locale to use for all locale categories not set individually. The following environment variables can be set to override the LANG setting in specific locale categories:
LC_COLLATE, for information on how to order characters and strings in sorting, or collation, operations
LC_CTYPE, for definitions of classes and attributes of characters used in operations such as case conversion
LC_MESSAGES, for definitions of strings that are valid for affirmative and negative responses
LC_MONETARY, for rules and symbols used to format monetary values
LC_NUMERIC, for rules and symbols used to format numeric values
LC_TIME, for information related to date and time
The LC_ALL environment variable also specifies locale. If set, this variable overrides all the preceding variables, including LANG.
See also locale
A collection of information associated with the numeric, monetary, date/time, and messaging parts of a locale.
A name for a particular locale category or, in the case of LC_ALL, a reference to all parts of the locale. Locale categories include LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME.
See also LANG
See native language
A set of data and rules that supports a particular combination of native (local) language, cultural data, and codeset.
See also coded character set, cultural data, LANG, langinfo database, localization
The process of providing language- or culture-specific information for computer systems. Some of these requirements are addressed by locales. Other requirements are addressed by translations of program messages, provision of appropriate fonts for printers and display devices, and, in some cases, development of additional software. L10N is sometimes used as an abbreviation for localization.
See also internationalization, locale
An environment variable used to specify the search path for locales.
See also locale
A file or storage area containing program messages, command prompts, and responses to prompts for a particular native language, territory, and codeset.
See character
A computer user's spoken or written language, such as English, French, Japanese, or Thai.
An environment variable used to indicate the search path for message catalogs.
A character set that is guaranteed to be supported in both compile-time (source) and run-time (executable) environments for all locales and that contains:
The 26 uppercase letters of the English alphabet:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
The 26 lowercase letters of the English alphabet:
a b c d e f g h i j k l m n o p q r s t u v w x y z
The 10 decimal digits:
0 1 2 3 4 5 6 7 8 9
The following 32 graphic characters:
! " # $ % & ' ( ) * + , - . / : ; < = > ?
@ [ \ ] ^ _ ` { | } ~
The space character, plus control characters that represent the horizontal tab, vertical tab, and form feed.
In addition to the preceding characters, the execution version of the Portable Character Set contains control characters that represent alert, backspace, carriage return, and new line.
The Portable Character Set as defined for X/Open specifications is similar to the basic source and basic execution character sets defined in ISO/IEC 9899:1990, except that the X/Open set also includes the dollar sign ($), commercial at sign (@), and grave accent ( ` ) characters.
See also character set, coded character set, ISO 646
In Unicode, a single code point that represents a character with a diacritic or other mark. For example, è.
See also decomposed character, Unicode
The encoding format used for manipulating data inside programs.
See also file code
The character that separates the integer part of a number from the fractional part.
See character string
See ISO 10646
A coded character set (maintained by the Unicode Consortium) that includes characters in all native languages. Unicode is code-for-code identical with the UCS-2 form of ISO 10646.
See also coded character set, ISO 10646
See ISO 10646
An integral type that is large enough to hold any member of the extended execution character set. In program terms, a wide character is an object of type wchar_t, which is defined in the /usr/include/stddef.h (for conformance to X/Open specifications) and /usr/include/stdlib.h (for conformance to the ANSI C standard) header files. Although the file locations where the wchar_t data type is defined are determined by standards organizations, its definition is implementation specific. For example, implementations that support only single-byte codesets might define wchar_t as a byte value. On Tru64 UNIX systems, wchar_t is a 4-byte (32-bit) value. The null wide character is a wchar_t value with all bits set to zero (0).
A contiguous sequence of wide characters that is terminated by and includes the null wide character. A wide-character string is an array of type wchar_t.
See also character string, wide character