i18n_intro(5)
NAME
i18n_intro, i18n, LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES,
LC_MONETARY, LC_NUMERIC, LC_TIME - Introduction to internationalization
(I18N)
DESCRIPTION
Internationalization refers to the process of developing programs without
prior knowledge of the language, cultural data, or character-encoding
schemes that the programs are expected to handle. In other words,
internationalization refers to the availability and use of interfaces that
let programs modify their behavior at run time for operation in a specific
language environment. The abbreviation I18N is often used to stand for
internationalization, as there are 18 characters between the beginning "I"
and the ending "N" of that word.
The I18N interfaces and utilities provided with the operating system
conform to Issue 4 of X/Open CAE specifications.
A concept related to internationalization is localization (L10N), which
refers to the process of establishing information within a computer system
for each combination of native language, cultural data, and coded character
set (codeset). A locale is a database that provides information for a
unique combination of these three components. However, locales do not solve
all of the problems that localization must address. Many native languages
require additional support in the form of language-specific print filters,
fonts, codeset converters, character input methods, and other kinds of
specialized software.
See the following reference pages for additional introductory information
on topics related to internationalization:
l10n_intro(5)
For more information on localization and locales
iconv_intro(5)
For an introduction to codeset conversion
i18n_printing(5)
For a summary of printer support for native languages
Characters, Character Sets, and Codesets
A character is a member of a set of elements used for the organization,
control, or representation of data.
A character set is a set of alphabetic or other characters used to
construct the words and other elementary units of a native language or
computer language. A character set specifies only the characters that are
included in the set. ASCII, CNS 11643 and DTSCS are examples of character
sets.
A coded character set (codeset) is a set of unambiguous rules that supports
one or more character sets and establishes the one-to-one relationship
between each character and its bit representation. In other words, a
codeset consists of the code points for characters in one or more character
sets. For example, DEC Hanyu (dechanyu) is a codeset for Chinese and
contains code points for characters in the ASCII, CNS 11643-1986 (plane 1
and plane 2), and DTSCS character sets.
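The distinction between a character and its coded representation can be
seen by dumping bytes. The following is a minimal sketch, not part of the
original page: hex_bytes is a hypothetical helper name, and the byte values
shown are those of the character e with acute accent in the ISO8859-1 and
UTF-8 codesets, chosen only as a familiar illustration.

```shell
# Hypothetical helper: print the bytes read from standard input as hex.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# The same character, e with acute accent, under two codesets.
printf '\351' | hex_bytes        # ISO8859-1: one byte, e9
printf '\303\251' | hex_bytes    # UTF-8: two bytes, c3a9
```

The character is the same in both cases; only the codeset, and therefore
the code point's byte representation, differs.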
Language Announcement (Setting Locale)
Language announcement is the mechanism by which language, cultural data,
and codeset requirements are set either for the system as a whole or by
individual users. An application can also set these requirements, although
it is more common for an internationalized application to use the setting
in effect for the user who runs the program. See the System Administration
manual for information about setting systemwide defaults for shells. See
setlocale(3) and Writing Software for the International Market for
information on how applications query or set locale requirements at run
time.
Language announcement is performed by setting one or more reserved
environment variables to the name of an installed locale. Each locale has
associated with it collating sequences, character conversion tables,
character classification tables, formats for different kinds of data, and
message catalogs. If the same locale meets user requirements in all these
categories, set only the LANG environment variable to the locale name. A
locale name usually has the following format:
language_territory.codeset[@modifier]
Where language represents the human language of the locale, territory
represents a geographic country or region, codeset is the coded character
set used in the locale, and the optional @modifier suffix represents
additional information for localization of data.
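The name format described above can be taken apart with ordinary shell
parameter expansion. The following is a minimal sketch, assuming a name that
follows the language_territory.codeset[@modifier] pattern; parse_locale_name
is a hypothetical helper, not a system utility, and it uses plain shell
variables rather than local ones.

```shell
# Hypothetical helper: split a locale name of the form
# language_territory.codeset[@modifier] into its parts.
parse_locale_name() {
    name=$1
    modifier=
    codeset=
    territory=
    # Peel off the optional @modifier suffix first.
    case $name in
        *@*) modifier=${name#*@}; name=${name%%@*} ;;
    esac
    # Then the codeset, which follows the first dot.
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    # Then the territory, which follows the first underscore.
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    # Whatever remains is the language.
    printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
        "$name" "$territory" "$codeset" "$modifier"
}

parse_locale_name zh_TW.eucTW@stroke
# prints: language=zh territory=TW codeset=eucTW modifier=stroke
```

Because locale name formats can vary from vendor to vendor, names that do
not follow this pattern leave some of the fields empty.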
The following Korn shell example sets LANG to a locale supporting the
English language, United States cultural data, and ISO8859-1 codeset:
$ LANG=en_US.ISO8859-1
The following C shell example sets LANG to a locale supporting the
Traditional Chinese language, Hong Kong cultural data, and the DEC Hanyu
codeset:
% setenv LANG zh_HK.dechanyu
Locale name formats can vary from vendor to vendor. Use the locale -a
command to display the names of locales installed on your system. See
l10n_intro(5) for a list of the locales provided with the Tru64 UNIX
product.
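Before setting LANG, it can be useful to confirm that a locale is actually
installed. The following is a minimal sketch built on locale -a;
locale_installed is a hypothetical helper name, and it assumes the name is
spelled exactly as locale -a prints it on the system in question.

```shell
# Hypothetical helper: succeed if the named locale appears in `locale -a`.
locale_installed() {
    locale -a 2>/dev/null | grep -Fqx -- "$1"
}

if locale_installed en_US.ISO8859-1; then
    echo "en_US.ISO8859-1 is installed"
else
    echo "en_US.ISO8859-1 is not installed; run locale -a for alternatives"
fi
```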
An alternative way to set locale requirements for all locale categories is
to set the LC_ALL environment variable. The difference between the LANG and
LC_ALL variables is that LC_ALL is a high-precedence variable that
overrides all other locale variables, including LANG. The LANG variable, on
the other hand, is a low-precedence variable. When used by itself, the
LANG variable implicitly sets all locale categories to the specified locale
just as LC_ALL does. However, the LANG variable can be used together with
variables for specific locale categories to create a multilocale
environment. The category-specific locale variables and what they control
follow:
LC_COLLATE
String collation
LC_CTYPE
Character classification
LC_MESSAGES
Translations for messages and valid strings for "yes" and "no"
responses
LC_MONETARY
The currency symbol and the format of monetary values
LC_NUMERIC
The format of numeric values
LC_TIME
The format of date and time values
A locale can support only one set of date and time formats;
however, there can be several sets of date and time formats in use
for a particular language and territory. See l10n_intro(5) for
information about creating a site-specific version of a locale to
support date and time formats different from those supported by an
installed locale.
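The precedence among LC_ALL, the category-specific variables, and LANG
described above can be sketched as a small shell function. This is an
illustration only: effective_locale is a hypothetical helper, and the
fallback to the C locale when no variable is set is an assumption, because
that default is implementation defined.

```shell
# Hypothetical helper: report which locale name applies to one category,
# following the precedence LC_ALL > LC_<category> > LANG.
effective_locale() {
    category=$1                       # for example LC_COLLATE
    eval "specific=\${$category:-}"   # value of that variable, if any
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C               # assumed fallback when nothing is set
    fi
}

# Try it in a subshell so the current environment is left untouched.
(
    unset LC_ALL LC_CTYPE LC_COLLATE
    LANG=zh_TW.eucTW
    LC_COLLATE=zh_TW.eucTW@stroke
    effective_locale LC_COLLATE    # prints zh_TW.eucTW@stroke
    effective_locale LC_CTYPE      # prints zh_TW.eucTW
)
```

The locale command reports the same information for all categories at
once; the function only makes the lookup order explicit.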
The operating system provides dense code locales and Unicode locales.
Unicode locales are installed in /usr/i18n/lib/nls/ucsloc/. Dense code
locales are installed in /usr/i18n/lib/nls/loc/. The Unicode locales
enable consistent wchar_t values across locales and platform
interoperability. The system administrator, as root, can define the
systemwide default as Unicode locales or dense code locales by changing the
symbolic link /usr/i18n/lib/nls/dloc/ from ./ucsloc to ./loc. See
l10n_intro(5) for more information on the Unicode locales and switching
between Unicode and dense code. See Unicode(5) for more information about
UCS-4 and UTF-8 formats.
Unicode locales with a UTF-8 suffix use UTF-32 as the internal process
code and UTF-8 as the file format.
The operating system also includes a complete set of non-UTF-8 Unicode
locales in /usr/i18n/lib/nls/ucsloc/ that provide UTF-32 internal process
code for applications that require file code in the format of the
traditional UNIX or a proprietary codeset.
An @modifier suffix indicates locale variants that support alternative rules
for collation in Asian languages. Use locales with these suffixes only
when setting LC_COLLATE. For example, three different sets of collation
rules (chuyin, radical, and stroke) can be used with the locale supporting
the Chinese language, Taiwanese cultural data, and the Taiwanese EUC
codeset. If Korn shell users want to use this locale, they might make the
following settings:
$ LANG=zh_TW.eucTW
$ LC_COLLATE=zh_TW.eucTW@stroke
The preceding example implicitly sets all locale category variables to
zh_TW.eucTW, except for the LC_COLLATE variable, which is set to
zh_TW.eucTW@stroke. The following locale command displays the variable
settings after these assignments:
$ locale
LANG=zh_TW.eucTW
LC_COLLATE=zh_TW.eucTW@stroke
LC_CTYPE="zh_TW.eucTW"
LC_MONETARY="zh_TW.eucTW"
LC_NUMERIC="zh_TW.eucTW"
LC_TIME="zh_TW.eucTW"
LC_MESSAGES="zh_TW.eucTW"
LC_ALL=
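Locale variables can also be set for a single command instead of the whole
session, which is a convenient way to observe the effect of collation. The
following is a minimal sketch, assuming only the C locale (which collates
by byte value) so that it does not depend on which other locales are
installed; sort_in_locale is a hypothetical helper name.

```shell
# Hypothetical helper: sort standard input under the named locale.
# LC_ALL is set for this one command only and overrides LANG and
# LC_COLLATE while the command runs.
sort_in_locale() {
    LC_ALL=$1 sort
}

# In the C locale, uppercase letters sort before lowercase letters.
printf 'b\nA\na\nB\n' | sort_in_locale C
# prints A, B, a, b on separate lines
```

Substituting an installed locale name, such as one of the @modifier
variants above, shows how the same input is ordered under that locale's
collation rules.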
SEE ALSO
Commands: locale(1)
Functions: setlocale(3)
Others: i18n_printing(5), iconv_intro(5), l10n_intro(5), Unicode(5)
Writing Software for the International Market
Using International Software
System Administration