i18n_intro(5)

NAME
  i18n_intro, i18n, LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES,
  LC_MONETARY, LC_NUMERIC, LC_TIME - Introduction to internationalization
  (I18N)

DESCRIPTION
  Internationalization refers to the process of developing programs without
  prior knowledge of the language, cultural data, or character-encoding
  schemes that the programs are expected to handle. In other words,
  internationalization refers to the availability and use of interfaces that
  let programs modify their behavior at run time for operation in a specific
  language environment.  The abbreviation I18N is often used to stand for
  internationalization, as there are 18 characters between the beginning "I"
  and the ending "N" of that word.

  The I18N interfaces and utilities provided with the operating system
  conform to Issue 4 of X/Open CAE specifications.

  A concept related to internationalization is localization (L10N), which
  refers to the process of establishing information within a computer system
  for each combination of native language, cultural data, and coded character
  set (codeset). A locale is a database that provides information for a
  unique combination of these three components. However, locales do not solve
  all of the problems that localization must address. Many native languages
  require additional support in the form of language-specific print filters,
  fonts, codeset converters, character input methods, and other kinds of
  specialized software.

  See the following reference pages for additional introductory information
  on topics related to internationalization:

  l10n_intro(5)
	  For more information on localization and locales

  iconv_intro(5)
	  For an introduction to codeset conversion

  i18n_printing(5)
	  For a summary of printer support for native languages

  Characters, Character Sets, and Codesets

  A character is a member of a set of elements used for the organization,
  control, or representation of data.

  A character set is a set of alphabetic or other characters used to
  construct the words and other elementary units of a native language or
  computer language.  A character set specifies only the characters that are
  included in the set.  ASCII, CNS 11643, and DTSCS are examples of character
  sets.

  A coded character set (codeset) is a set of unambiguous rules that supports
  one or more character sets and establishes a one-to-one relationship
  between each character and its bit representation. In other words, a
  codeset consists of the code points for characters in one or more character
  sets. For example, DEC Hanyu (dechanyu) is a codeset for Chinese and
  contains code points for characters in the ASCII, CNS 11643-1986 (plane 1
  and plane 2), and DTSCS character sets.
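
  As a rough illustration of the relationship between a character and its
  bit representation, the following Korn shell sketch prints the bytes that
  encode a string as hexadecimal digits. The function name show_bytes is
  hypothetical and is not provided with the operating system. The result
  depends on the codeset in which the string is stored; the example uses
  ASCII characters only.

```shell
# show_bytes is a hypothetical helper, not a system utility.
# It prints, in hexadecimal, the byte values that encode its argument.
show_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \t\n'
    printf '\n'
}

# In ASCII, the letter A is encoded as 0x41 and the letter B as 0x42.
show_bytes AB    # prints 4142
```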

  Language Announcement (Setting Locale)

  Language announcement is the mechanism by which language, cultural data,
  and codeset requirements are set either for the system as a whole or by
  individual users. An application can also set these requirements, although
  it is more common for an internationalized application to use the setting
  in effect for the user who runs the program. See the System Administration
  manual for information about setting systemwide defaults for shells. See
  setlocale(3) and Writing Software for the International Market for
  information on how applications query or set locale requirements at run
  time.

  Language announcement is performed by setting one or more reserved
  environment variables to the name of an installed locale. Each locale has
  associated with it collating sequences, character conversion tables,
  character classification tables, formats for different kinds of data, and
  message catalogs. If the same locale meets user requirements in all these
  categories, set only the LANG environment variable to the locale name. A
  locale name usually has the following format:

  language_territory.codeset[@modifier]

  Where language represents the human language of the locale, territory
  represents a geographic country or region, codeset is the coded character
  set used in the locale, and the optional @modifier suffix represents
  additional information for localization of data.
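
  The following Korn shell sketch splits a locale name of this form into its
  parts by using parameter expansion. The function name parse_locale_name is
  hypothetical, and the sketch assumes the usual format shown above; locale
  names that follow another vendor's format may not split as expected.

```shell
# parse_locale_name is a hypothetical helper, not a system utility.
# It splits language_territory.codeset[@modifier] into its components.
# The shell variables it sets are global to the calling shell.
parse_locale_name() {
    name=$1
    modifier=
    codeset=
    territory=
    # Strip the optional @modifier suffix first.
    case $name in
        *@*) modifier=${name#*@}; name=${name%%@*} ;;
    esac
    # Then strip the codeset that follows the period.
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    # What remains is language, optionally followed by _territory.
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    language=$name
    printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
        "$language" "$territory" "$codeset" "$modifier"
}

parse_locale_name zh_TW.eucTW@stroke
# prints: language=zh territory=TW codeset=eucTW modifier=stroke
```

  A component that is absent from the name, such as the modifier in
  en_US.ISO8859-1, is reported as an empty string.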

  The following Korn shell example sets LANG to a locale supporting the
  English language, United States cultural data, and ISO8859-1 codeset:

       $ LANG=en_US.ISO8859-1

  The following C shell example sets LANG to a locale supporting the
  Traditional Chinese language, Hong Kong cultural data, and the DEC Hanyu
  codeset:

       % setenv LANG zh_HK.dechanyu

  Locale name formats can vary from vendor to vendor. Use the locale -a
  command to display the names of locales installed on your system.  See
  l10n_intro(5) for a list of the locales provided with the Tru64 UNIX
  product.
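
  Because an assignment to LANG does not verify that the locale exists, a
  script might first check the output of locale -a. The following Korn shell
  sketch shows one way to do so. The function name locale_is_installed is
  hypothetical, and the sketch assumes that the name is spelled exactly as
  locale -a prints it, which can differ from system to system.

```shell
# locale_is_installed is a hypothetical helper, not a system utility.
# It succeeds only if its argument exactly matches a line of locale -a.
locale_is_installed() {
    locale -a 2>/dev/null | grep -Fqx -- "$1"
}

if locale_is_installed en_US.ISO8859-1; then
    LANG=en_US.ISO8859-1
    export LANG
else
    echo "en_US.ISO8859-1 is not installed; LANG left unchanged" >&2
fi
```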

  An alternative way to set locale requirements for all locale categories is
  to set the LC_ALL environment variable. The difference between the LANG and
  LC_ALL variables is that LC_ALL is a high-precedence variable that
  overrides all other locale variables, including LANG. The LANG variable, on
  the other hand, is a low-precedence variable.  When used by itself, the
  LANG variable implicitly sets all locale categories to the specified locale
  just as LC_ALL does. However, the LANG variable can be used together with
  variables for specific locale categories to create a multilocale
  environment.  The category-specific locale variables and what they control
  follow:

  LC_COLLATE
	  String collation

  LC_CTYPE
	  Character classification

  LC_MESSAGES
	  Translations for messages and valid strings for "yes" and "no"
	  responses

  LC_MONETARY
	  The currency symbol and the format of monetary values

  LC_NUMERIC
	  The format of numeric values

  LC_TIME
	  The format of date and time values

	  A locale can support only one set of date and time formats;
	  however, there can be several sets of date and time formats in use
	  for a particular language and territory. See l10n_intro(5) for
	  information about creating a site-specific version of a locale to
	  support date and time formats different from those supported by an
	  installed locale.
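
  The precedence among LC_ALL, the category-specific variables, and LANG can
  be sketched in the Korn shell as follows. The function name
  effective_locale is hypothetical. It only inspects environment variables
  and does not call setlocale(3). The fallback to the C locale when no
  variable is set is an assumption about the implementation default.

```shell
# effective_locale is a hypothetical helper, not a system utility.
# It reports which locale name applies to one category, using the
# precedence LC_ALL, then the category variable, then LANG.
effective_locale() {
    category=$1
    # Fetch the value of the variable whose name is in $category.
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "C"    # assumption: implementation default
    fi
}

unset LC_ALL LC_TIME
LANG=zh_TW.eucTW
LC_COLLATE=zh_TW.eucTW@stroke
effective_locale LC_COLLATE    # prints zh_TW.eucTW@stroke
effective_locale LC_TIME       # prints zh_TW.eucTW
```

  A variable that is set to an empty string is treated the same as a
  variable that is not set.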

  The operating system provides dense code locales and Unicode locales.
  Unicode locales are installed in /usr/i18n/lib/nls/ucsloc/.  Dense code
  locales are installed in /usr/i18n/lib/nls/loc/.  The Unicode locales
  enable consistent wchar_t values across locales and platform
  interoperability. The system administrator, as root, can define the
  systemwide default as Unicode locales or dense code locales by changing the
  symbolic link /usr/i18n/lib/nls/dloc/ from ./ucsloc to ./loc. See
  l10n_intro(5) for more information on the Unicode locales and switching
  between Unicode and dense code. See Unicode(5) for more information about
  UCS-4 and UTF-8 formats.

  Unicode locales with a UTF-8 suffix use UTF-32 as the internal process
  code and UTF-8 as the file format.
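
  The difference between a variable-length file code and a fixed-width
  32-bit value can be seen with the iconv and od commands. The following
  sketch is illustrative only. It assumes that the iconv command on the
  system accepts the codeset names UTF-8 and UTF-32BE; the names that iconv
  accepts vary by system (see iconv_intro(5)). The big-endian byte order is
  used only to make the 32-bit value easy to read and does not describe how
  a process stores wchar_t values in memory.

```shell
# Assumption: this system's iconv accepts the names UTF-8 and UTF-32BE.
# The octal escapes \303\251 are the UTF-8 encoding of U+00E9
# (LATIN SMALL LETTER E WITH ACUTE).

# File code: two bytes in UTF-8.
printf '\303\251' | od -An -tx1 | tr -d ' \t\n'; echo
# prints c3a9

# The same character as a single 32-bit value.
printf '\303\251' | iconv -f UTF-8 -t UTF-32BE | od -An -tx1 | tr -d ' \t\n'; echo
# prints 000000e9
```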

  The operating system also includes a complete set of non-UTF-8 Unicode
  locales in /usr/i18n/lib/nls/ucsloc/ that provide UTF-32 internal process
  code for applications that require file code in the format of a
  traditional UNIX codeset or a proprietary codeset.

  An @modifier suffix indicates locale variants that support alternative rules
  for collation in Asian languages.  Use locales with these suffixes only
  when setting LC_COLLATE.  For example, three different sets of collation
  rules (chuyin, radical, and stroke) can be used with the locale supporting
  the Chinese language, Taiwanese cultural data, and the Taiwanese EUC
  codeset. If Korn shell users want to use this locale, they might make the
  following settings:

       $ LANG=zh_TW.eucTW
       $ LC_COLLATE=zh_TW.eucTW@stroke

  The preceding example implicitly sets all locale category variables to
  zh_TW.eucTW, except for the LC_COLLATE variable, which is set to
  zh_TW.eucTW@stroke. The following locale command displays the variable
  settings after these assignments:

       $ locale
       LANG=zh_TW.eucTW
       LC_COLLATE=zh_TW.eucTW@stroke
       LC_CTYPE="zh_TW.eucTW"
       LC_MONETARY="zh_TW.eucTW"
       LC_NUMERIC="zh_TW.eucTW"
       LC_TIME="zh_TW.eucTW"
       LC_MESSAGES="zh_TW.eucTW"
       LC_ALL=

SEE ALSO
  Commands: locale(1)

  Functions: setlocale(3)

  Others: i18n_printing(5), iconv_intro(5), l10n_intro(5), Unicode(5)

  Writing Software for the International Market

  Using International Software

  System Administration