
l10n_intro(5)

NAME

l10n_intro, l10n, locales, LOCPATH - Introduction to localization (L10N)

DESCRIPTION

Localization refers to the process of establishing information within a computer system specific to each supported language, cultural data, and coded character set (codeset) combination. Each such combination gives rise to the definition of one locale. The abbreviation L10N is often used to stand for localization, as there are 10 characters between the beginning "L" and the ending "N" of that word.

See i18n_intro(5) for introductory information about internationalization and how to use system commands to set a locale. See localedef(1), charmap(4), and locale(4) for information about creating locales. See Writing Software for the International Market for information about creating locales and writing applications that use locales.

The current release of the operating system supports the following languages with locales. Each language is discussed separately in its own reference page.

· Catalan
· Chinese (Simplified and Traditional)
· Czech
· Danish
· Dutch
· English (discussed in this reference page)
· Finnish
· Flemish
· French
· German
· Greek
· Hebrew
· Hungarian
· Icelandic
· Italian
· Japanese
· Korean
· Lithuanian
· Norwegian
· Polish
· Portuguese
· Russian
· Slovak
· Slovene
· Spanish
· Swedish
· Thai
· Turkish

For some of the languages, more than one codeset and country or territory are supported. Hence, multiple locales are supported for certain languages. The following list describes all the supported locales. For information about the character encoding used by a particular locale, see the reference page for the codeset specified in the last part of the locale name or, for those that end in .UTF-8, see Unicode(5).

ca_ES.ISO8859-1
    Catalan locale for Spain (uses the Latin-1 codeset)
ca_ES.ISO8859-15
    Catalan locale for Spain (uses the Latin-9 codeset)
ca_ES.UTF-8
    Catalan locale for Spain (uses the UTF-8 codeset)
cs_CZ.ISO8859-2
    Czech locale for Czech Republic (uses the Latin-2 codeset)
cs_CZ.UTF-8
    Czech locale for Czech Republic (uses the UTF-8 codeset)
da_DK.ISO8859-1
    Danish locale for Denmark (uses the Latin-1 codeset)
da_DK.ISO8859-15
    Danish locale for Denmark (uses the Latin-9 codeset)
da_DK.UTF-8
    Danish locale for Denmark (uses the UTF-8 codeset)
de_CH.ISO8859-1
    German locale for Switzerland (uses the Latin-1 codeset)
de_CH.ISO8859-15
    German locale for Switzerland (uses the Latin-9 codeset)
de_CH.UTF-8
    German locale for Switzerland (uses the UTF-8 codeset)
de_DE.ISO8859-1
    German locale for Germany (uses the Latin-1 codeset)
de_DE.ISO8859-15
    German locale for Germany (uses the Latin-9 codeset)
de_DE.UTF-8
    German locale for Germany (uses the UTF-8 codeset)
el_GR.ISO8859-7
    Greek locale for Greece (uses the ISO Greek codeset)
el_GR.UTF-8
    Greek locale for Greece (uses the UTF-8 codeset)
en_EU.UTF-8@euro
    English locale that includes the euro character (uses the UTF-8 codeset)
    This locale both supports the euro character and defines the decimal point as a comma (,) and the thousands separator as a period (.). Therefore, this locale is useful in many European countries, not just those for which English is the native language, when assigned only to the LC_MONETARY locale category or environment variable.
en_GB.ISO8859-1
    English locale for Great Britain (uses the Latin-1 codeset)
en_GB.ISO8859-15
    English locale for Great Britain (uses the Latin-9 codeset)
en_GB.UTF-8
    English locale for Great Britain (uses the UTF-8 codeset)
en_US.ISO8859-1
    English locale for the United States (uses the Latin-1 codeset)
en_US.ISO8859-15
    English locale for the United States (uses the Latin-9 codeset)
en_US.cp850
    English locale for the United States (uses cp850 encoding)
    Use this locale with data that contains accented characters and that was generated on a PC using the cp850 code page for character encoding. This character encoding is usually the default for the DOS and Windows operating systems in Europe. The en_US.ISO8859-1 and en_US.cp850 locales encode English characters the same way but use different values for accented and other non-English characters in the Latin-1 character set.
en_US.UTF-8
    English locale for the United States (uses the UTF-8 codeset)
en_US.UTF-8@euro
    English locale for the United States (uses the UTF-8 codeset)
    The @euro variant defines the local currency sign to be the euro character and the international currency sign to be EUR. See also en_EU.UTF-8@euro.
es_ES.ISO8859-1
    Spanish locale for Spain (uses the Latin-1 codeset)
es_ES.ISO8859-15
    Spanish locale for Spain (uses the Latin-9 codeset)
es_ES.UTF-8
    Spanish locale for Spain (uses the UTF-8 codeset)
fi_FI.ISO8859-1
    Finnish locale for Finland (uses the Latin-1 codeset)
fi_FI.ISO8859-15
    Finnish locale for Finland (uses the Latin-9 codeset)
fi_FI.UTF-8
    Finnish locale for Finland (uses the UTF-8 codeset)
fr_BE.ISO8859-1
    French locale for Belgium (uses the Latin-1 codeset)
fr_BE.ISO8859-15
    French locale for Belgium (uses the Latin-9 codeset)
fr_BE.UTF-8
    French locale for Belgium (uses the UTF-8 codeset)
fr_CA.ISO8859-1
    French locale for Canada (uses the Latin-1 codeset)
fr_CA.ISO8859-15
    French locale for Canada (uses the Latin-9 codeset)
fr_CA.UTF-8
    French locale for Canada (uses the UTF-8 codeset)
fr_CH.ISO8859-1
    French locale for Switzerland (uses the Latin-1 codeset)
fr_CH.ISO8859-15
    French locale for Switzerland (uses the Latin-9 codeset)
fr_CH.UTF-8
    French locale for Switzerland (uses the UTF-8 codeset)
fr_FR.ISO8859-1
    French locale for France (uses the Latin-1 codeset)
fr_FR.ISO8859-15
    French locale for France (uses the Latin-9 codeset)
fr_FR.UTF-8
    French locale for France (uses the UTF-8 codeset)
he_IL.ISO8859-8
    Hebrew locale for Israel (uses the ISO Hebrew codeset)
hu_HU.ISO8859-2
    Hungarian locale for Hungary (uses the Latin-2 codeset)
hu_HU.UTF-8
    Hungarian locale for Hungary (uses the UTF-8 codeset)
is_IS.ISO8859-1
    Icelandic locale for Iceland (uses the Latin-1 codeset)
is_IS.ISO8859-15
    Icelandic locale for Iceland (uses the Latin-9 codeset)
is_IS.UTF-8
    Icelandic locale for Iceland (uses the UTF-8 codeset)
it_IT.ISO8859-1
    Italian locale for Italy (uses the Latin-1 codeset)
it_IT.ISO8859-15
    Italian locale for Italy (uses the Latin-9 codeset)
it_IT.UTF-8
    Italian locale for Italy (uses the UTF-8 codeset)
iw_IL.ISO8859-8
    Hebrew locale for Israel (uses the ISO Hebrew codeset)
    This locale name is supported for backward compatibility. The recommended name to use for the ISO Hebrew locale is he_IL.ISO8859-8.
ja_JP.deckanji
    Japanese locale for Japan (uses the DEC Kanji codeset)
ja_JP.eucJP
    Japanese locale for Japan (uses the Japanese EUC codeset)
ja_JP.sdeckanji
    Japanese locale for Japan (uses the Super DEC Kanji codeset)
ja_JP.SJIS
    Japanese locale for Japan (uses the Shift JIS codeset)
ja_JP.UTF-8
    Japanese locale for Japan (uses the UTF-8 codeset)
ko_KR.deckorean
    Korean locale for Korea (uses the DEC Korean codeset)
ko_KR.eucKR
    Korean locale for Korea (uses the Korean EUC codeset)
ko_KR.UTF-8
    Korean locale for Korea (uses the UTF-8 codeset)
lt_LT.ISO8859-4
    Lithuanian locale for Lithuania (uses the Latin-4 codeset)
lt_LT.UTF-8
    Lithuanian locale for Lithuania (uses the UTF-8 codeset)
nl_BE.ISO8859-1
    Flemish locale for Belgium (uses the Latin-1 codeset)
nl_BE.ISO8859-15
    Flemish locale for Belgium (uses the Latin-9 codeset)
nl_BE.UTF-8
    Flemish locale for Belgium (uses the UTF-8 codeset)
nl_NL.ISO8859-1
    Dutch locale for the Netherlands (uses the Latin-1 codeset)
nl_NL.ISO8859-15
    Dutch locale for the Netherlands (uses the Latin-9 codeset)
nl_NL.UTF-8
    Dutch locale for the Netherlands (uses the UTF-8 codeset)
no_NO.ISO8859-1
    Norwegian locale for Norway (uses the Latin-1 codeset)
no_NO.ISO8859-15
    Norwegian locale for Norway (uses the Latin-9 codeset)
no_NO.UTF-8
    Norwegian locale for Norway (uses the UTF-8 codeset)
pl_PL.ISO8859-2
    Polish locale for Poland (uses the Latin-2 codeset)
pl_PL.UTF-8
    Polish locale for Poland (uses the UTF-8 codeset)
pt_PT.ISO8859-1
    Portuguese locale for Portugal (uses the Latin-1 codeset)
pt_PT.ISO8859-15
    Portuguese locale for Portugal (uses the Latin-9 codeset)
pt_PT.UTF-8
    Portuguese locale for Portugal (uses the UTF-8 codeset)
ru_RU.ISO8859-5
    Russian locale for Russia (uses the ISO Cyrillic codeset)
ru_RU.UTF-8
    Russian locale for Russia (uses the UTF-8 codeset)
sk_SK.ISO8859-2
    Slovak locale for Slovakia (uses the Latin-2 codeset)
sk_SK.UTF-8
    Slovak locale for Slovakia (uses the UTF-8 codeset)
sl_SI.ISO8859-2
    Slovene locale for Slovenia (uses the Latin-2 codeset)
sl_SI.UTF-8
    Slovene locale for Slovenia (uses the UTF-8 codeset)
sv_SE.ISO8859-1
    Swedish locale for Sweden (uses the Latin-1 codeset)
sv_SE.ISO8859-15
    Swedish locale for Sweden (uses the Latin-9 codeset)
sv_SE.UTF-8
    Swedish locale for Sweden (uses the UTF-8 codeset)
th_TH.TACTIS
    Thai locale for Thailand (uses the TACTIS codeset)
tr_TR.ISO8859-9
    Turkish locale for Turkey (uses the Latin-5 codeset)
tr_TR.UTF-8
    Turkish locale for Turkey (uses the UTF-8 codeset)
zh_CN.dechanzi
    Simplified Chinese locale for the People's Republic of China (uses the DEC Hanzi codeset)
zh_CN.GBK
    Simplified Chinese locale for the People's Republic of China (uses the GBK codeset, an extension of the GB 2312-80 codeset)
zh_CN.GB18030
    Simplified Chinese locale for the People's Republic of China (uses the GB18030 codeset, which extends GBK by means of 4-byte encoding)
zh_CN.UTF-8
    Simplified Chinese locale for the People's Republic of China (uses the UTF-8 codeset)
zh_HK.big5
    Traditional Chinese locale for Hong Kong (uses the BIG-5 codeset)
zh_HK.dechanyu
    Traditional Chinese locale for Hong Kong (uses the DEC Hanyu codeset)
zh_HK.dechanzi
    Simplified Chinese locale for Hong Kong (uses the DEC Hanzi codeset)
zh_HK.eucTW
    Traditional Chinese locale for Hong Kong (uses the Taiwanese EUC codeset)
zh_HK.UTF-8
    Traditional Chinese locale for Hong Kong (uses the UTF-8 codeset)
zh_TW.big5
    Traditional Chinese locale for Taiwan (uses the BIG-5 codeset)
zh_TW.dechanyu
    Traditional Chinese locale for Taiwan (uses the DEC Hanyu codeset)
zh_TW.eucTW
    Traditional Chinese locale for Taiwan (uses the Taiwanese EUC codeset)
zh_TW.UTF-8
    Traditional Chinese locale for Taiwan (uses the UTF-8 codeset)
    This locale supports Simplified Chinese as well as Traditional Chinese.

For the zh_CN.dechanzi locale, the @pinyin, @radical, and @stroke variants are available for sorting by pinyin, radical, and stroke, respectively.
For the zh_TW.big5, zh_TW.dechanyu, and zh_TW.eucTW locales, the @chuyin, @radical, and @stroke variants are available for sorting by chuyin, radical, and stroke, respectively. These variant locale names (those including the @collation_modifier suffix) are available for assignment to the LC_COLLATE variable.

The .UTF-8 and .ISO8859-15 locales are the only locales that include the euro monetary symbol in the coded character set. The *.UTF-8@euro locales also define the local currency symbol to be the euro character and the international currency symbol to be EUR. See euro(5) for more information about the euro symbol and how it is supported.

You can use the -a option with the locale command to list all the locales available on the system. The POSIX (or C) locale is always available because it must exist on all systems that conform to The Open Group's UNIX specifications. The POSIX locale is the default locale when locale variables are not set.

Note

    The dxterm terminal emulator does not support locales based on the Unicode (UTF-8) or Latin-9 (ISO8859-15) codesets. Use dtterm, the default terminal emulator for the Common Desktop Environment (CDE), with locales based on the Latin-9 and UTF-8 codesets.

System Locales

When you install Worldwide Language Support, localization is supported by two types of locales: Unicode locales and dense code locales.

Unicode locales conform to Unicode and ISO/IEC 10646 standards and use UTF-32 as the wide character encoding. Under UTF-32 wide character encoding, wchar_t values represent the same characters regardless of the locale and, because Unicode standards prevail, implementation is consistent across platforms. Locales whose names end in .UTF-8 use file code and internal process code (wchar_t encoding) defined in the ISO 10646 and Unicode standards. Other, non-UTF-8 Unicode locales use traditional UNIX and proprietary codesets for the file code while using UTF-32 as the internal process code.
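
The difference between file code and process code described above can be sketched in a few lines. The sketch uses Python rather than the C interfaces this page documents, and it treats a Python code point as a stand-in for a UTF-32 wchar_t value; that analogy and the function name code_forms are assumptions made for illustration only.

```python
def code_forms(text):
    """Return (character, code point, UTF-8 bytes, UTF-32 bytes) per character."""
    forms = []
    for ch in text:
        forms.append((
            ch,
            ord(ch),                 # code point: the same value in every locale
            ch.encode("utf-8"),      # file code: 1 to 4 bytes per character
            ch.encode("utf-32-be"),  # fixed 4 bytes, comparable to a UTF-32 wchar_t
        ))
    return forms


if __name__ == "__main__":
    # "A" followed by the euro sign (U+20AC)
    for ch, cp, utf8, utf32 in code_forms("A\u20ac"):
        print(f"U+{cp:04X} utf-8={utf8.hex()} utf-32={utf32.hex()}")
```

The euro sign takes three bytes as UTF-8 file code but a single 4-byte value in UTF-32, which is why software that reads .UTF-8 data must handle multibyte characters.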

A subset of the Unicode locales has a @ucs4 modifier; however, these locales are the same as the locales without the @ucs4 modifier. The @ucs4 subset is provided for backward compatibility and may be removed in the future. You cannot select @ucs4 locales from the CDE login menu; you must specify the locale name in the LANG environment variable.

The universal.UTF-8 locale is also available (for use by applications rather than end users). It supports the complete set of characters in the universal character set (UCS). See Unicode(5) for more information about encoding formats.

For .UTF-8 locales, file code may include characters encoded in more than 1 byte; therefore, use these locales in applications that can process multibyte data. Design new applications based on multibyte .UTF-8 locales, which incorporate a large character repertoire, to enable the application to expand future character support without changing the character set.

Dense code locales use dense code for wide character encoding to minimize table size (that is, codepoints are assigned consecutively with no empty positions). Under dense code locales, a wchar_t value for one locale may not represent the same character in another locale and, thus, is locale specific. Dense code locales are appropriate for applications that have no dependencies on the internal process code or, because dense code locales are slightly more efficient than Unicode locales, require better performance.

All valid codepoints in multibyte character sets are mapped to valid codepoints in Unicode, including unmapped codepoints that are mapped to Unicode codepoints in the private use area. Thus, dense code locales are equivalent to Unicode locales. In general, the same charmaps and locale source can be used for Unicode and dense code locales. However, Unicode and dense code characters that are not defined in the LC_COLLATE section may be sorted differently. A Unicode locale exists for each dense code locale.
(However, not all Unicode locales have a dense code version.) For Latin-1 locales (ISO8859-1), the dense code and Unicode locales are identical because Latin-1 characters are the same as the first 256 characters in Unicode.

The operating system also supports three UCS transformation formats (UTFs): UTF-8, UTF-16, and UTF-32, all of which are defined in the Unicode standard. See Unicode(5) for a full description of Unicode, UCS-4, and the transformation formats.

The Unicode locales are installed in /usr/i18n/lib/nls/ucsloc/. Dense code locales are installed in /usr/i18n/lib/nls/loc/. A symbolic link, /usr/i18n/lib/nls/dloc, points to the system default locales. For example, the Japanese locale filename, /usr/lib/nls/loc/ja_JP.eucJP, is a symbolic link to /usr/i18n/lib/nls/dloc/ja_JP.eucJP, where /dloc is a symbolic link to either /ucsloc for the Unicode version, or /loc for the dense code version, of the Japanese locale.

Keep in mind that the same locale name can refer to a Unicode locale or to a dense code locale, depending on the setting of the symbolic link. Thus, if running an application in a locale is problematic, check the symbolic link. Because Unicode locales use consistent values for characters in wchar_t form, a default link to Unicode locales can increase consistency across locales and platforms. However, some users may prefer the older, dense code locales that use proprietary algorithms to convert characters to wchar_t form, or an application may have dependencies on dense code wchar_t encoding.

To switch between Unicode and dense code locales, the system administrator, as root, uses i18nconfig to change the systemwide default or manually changes the symbolic link /usr/i18n/lib/nls/dloc from ./ucsloc to ./loc.
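
Checking the symbolic link, as suggested above, could look like the following minimal sketch. It is written in Python for illustration; the path and the two target names come from the text above, while the function names classify_locale_link and default_locale_kind are hypothetical.

```python
import os

# Path of the default-locale symbolic link, as given in the text above.
DLOC = "/usr/i18n/lib/nls/dloc"


def classify_locale_link(target):
    """Map a dloc link target (for example ./ucsloc) to the locale type it selects."""
    name = os.path.basename(os.path.normpath(target))
    return {"ucsloc": "Unicode", "loc": "dense code"}.get(name, "unknown")


def default_locale_kind(link=DLOC):
    """Report which locale type the link selects, or "unknown" if it cannot be read."""
    try:
        return classify_locale_link(os.readlink(link))
    except OSError:
        # The link is missing or is not a symbolic link on this system.
        return "unknown"


if __name__ == "__main__":
    print(default_locale_kind())
```

The classification is kept separate from the file system access so that it can be checked on systems where the link does not exist.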

Environment Variables Related to Localization

The following system environment variables can be set (usually only by installed applications or by programmers who are testing applications or converters under development) to override the default search path for certain kinds of localized files:

LOCPATH
    Specifies the search path for locales and codeset converters. This environment variable is not defined by current industry standards. See iconv_intro(5), iconv_open(3), and setlocale(3) for more information.
    Because the LOCPATH variable is not defined by standards, it is recommended for use only when testing locales or converters under development and not as a systemwide method for finding installed converters or locales. When you set LOCPATH, make sure that the search path is valid for both locales and converters. Otherwise, application and system software can find only locales or only converters in environments where both kinds of files are required.
NLSPATH
    Specifies the search path for message catalogs, which contain translated text for programs. This variable is used primarily by the catopen() function. See catopen(3) for detailed information on NLSPATH.

Customizing Locales

Partial source files, along with an associated Makefile, are available for many locales in the /usr/lib/nls/loc/src directory. By editing one of these source files and using the Makefile to rebuild the locale (make locale_name), you can customize one or more of the following features:

· The format of affirmative and negative responses (LC_MESSAGES section)
· Rules and symbols for formatting monetary numeric information (LC_MONETARY section)
· Rules and symbols for formatting nonmonetary numeric information (LC_NUMERIC section)
· Rules and symbols for formatting date and time information (LC_TIME section)

As described in locale(4), the LC_CTYPE and LC_COLLATE sections of these locale sources are not customizable using this method. This means that you cannot use one of these sources to change how characters are classified or collated. By implication, this also means that you cannot add a new character to a locale that does not already support it. For example, you cannot add the European monetary character (euro) to a locale that does not already support that character. However, you can edit the LC_MONETARY section to define a string identifier for euro by using characters that the locale does support. For example, you could replace the existing monetary symbol with EUR.

See locale(4) for more information on a locale source file. See Writing Software for the International Market for information on user customization of LC_CTYPE and LC_COLLATE.

Caution

    Customized versions of locales that are provided with the operating system are not preserved when the operating system is reinstalled, even when an update installation procedure is used. Therefore, you must back up files for customized locales and their sources before reinstalling the operating system. After the reinstallation is complete, you must restore your customized locales to the system. If the newly installed sources have revisions when compared to the old sources, it might be preferable to apply your customizations to the newly installed sources and rebuild your customized locales.
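
After customizing the LC_MONETARY section of a locale, it can be useful to confirm which monetary symbols a program actually sees. The following is a minimal sketch, written in Python for illustration; the function name monetary_settings is hypothetical, and the fallback to the C locale is an assumption of this sketch rather than behavior described in this page.

```python
import locale


def monetary_settings(name="C"):
    """Select LC_MONETARY for the given locale and return its monetary symbols."""
    try:
        used = locale.setlocale(locale.LC_MONETARY, name)
    except locale.Error:
        # The requested locale is not installed here; fall back to the C locale.
        used = locale.setlocale(locale.LC_MONETARY, "C")
    conv = locale.localeconv()
    return {
        "locale": used,
        "currency_symbol": conv["currency_symbol"],      # local currency symbol
        "int_curr_symbol": conv["int_curr_symbol"],      # international symbol
        "mon_decimal_point": conv["mon_decimal_point"],
        "mon_thousands_sep": conv["mon_thousands_sep"],
    }


if __name__ == "__main__":
    print(monetary_settings("C"))
```

Only the LC_MONETARY category is changed, which mirrors the advice earlier in this page to assign a locale such as en_EU.UTF-8@euro to LC_MONETARY alone. What the call reports for any locale other than C depends on which locales are installed and how they were built.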

SEE ALSO

Commands: locale(1), localedef(1)

Functions: catopen(3)

Files: charmap(4), locale(4)

Others: Catalan(5), Chinese(5), Czech(5), dechanyu(5), dechanzi(5), deckanji(5), deckorean(5), Dutch(5), eucJP(5), eucKR(5), eucTW(5), euro(5), Finnish(5), French(5), GB18030(5), GBK(5), German(5), Greek(5), Hebrew(5), Hungarian(5), i18n_intro(5), i18n_printing(5), Icelandic(5), iconv_intro(5), iso2022(5), iso2022jp(5), iso8859-1(5), iso8859-2(5), iso8859-4(5), iso8859-5(5), iso8859-7(5), iso8859-8(5), iso8859-9(5), iso8859-15(5), Italian(5), Japanese(5), jiskanji(5), Korean(5), Lithuanian(5), Norwegian(5), Polish(5), Portuguese(5), Russian(5), sbig5(5), sdeckanji(5), shiftjis(5), Slovak(5), Slovene(5), Spanish(5), Swedish(5), TACTIS(5), telecode(5), Thai(5), Turkish(5), Unicode(5)

Writing Software for the International Market

Using International Software
