10    Internationalization

This chapter describes the internationalization features of Tru64 UNIX. Section 10.1 provides a brief overview, and Sections 10.2 through 10.16 discuss the individual features.

10.1    Overview

The term "internationalization" is formally defined by The Open Group as a

"provision within a computer program of the capability of making itself adaptable to the requirements of different native languages, local customs, and coded character sets"

This essentially means that internationalized programs can run in any supported locale without having to be modified. A locale is a software environment that correctly handles the cultural conventions of a particular geographic area, such as China or France, and a language as it is used in that area. For example, when a user selects a Chinese locale, all commands, system messages, and keystrokes can use Chinese characters and be displayed in a way appropriate for Chinese.
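As a rough illustration of running unmodified program logic under different locales, the following Python sketch selects a locale at run time and reads its conventions. Python's standard locale module stands in here for the C setlocale() interface; the helper name try_locale and the choice of locale names are assumptions made for this example, and which locales are actually available depends on what is installed.

```python
import locale

def try_locale(name):
    """Switch the process to the named locale; report whether it is available."""
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False

# The same, unmodified program logic runs under whichever locale is selected.
for name in ("C", "fr_FR.ISO8859-1", "zh_CN.GBK"):
    if try_locale(name):
        conventions = locale.localeconv()
        print(name, "decimal point:", repr(conventions["decimal_point"]))
    else:
        print(name, "is not installed on this system")

# Return to the default C (POSIX) locale.
try_locale("C")
```

The program never tests for a particular language; it asks the locale for its conventions and uses whatever it is given.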

Tru64 UNIX is an internationalized operating system that not only allows users to interact with existing applications in their native language, but also supports a full set of application interfaces, referred to as the Worldwide Portability Interfaces (WPI), to enable software developers to write internationalized applications. The original code for these interfaces came from the Open Software Foundation (OSF) and has been enhanced.

The internationalization support in the operating system conforms to The Open Group's CAE specifications for system interfaces and headers (XSH Issue 5), curses (XCURSES Issue 4.2), and commands and utilities (XCU Issue 5). These specifications align with current POSIX and ISO C standards. This conformance ensures that commands, utilities, and libraries have been internationalized, and their corresponding message catalogs have been included in the base system.

In addition, the operating system supports the X Input Method (XIM) and X Output Method (XOM) to facilitate input of local language characters, text drawing, measurement, and interclient communication. These functions are implemented according to the X11R6.3 specification and include some problem corrections specified by X11R6.4.

Note that the operating system also supports a 32-bit wchar_t data type, which in turn enables support for a wide array of codesets, including the one defined by the ISO 10646 standard.

For more comprehensive information, see Writing Software for the International Market.

10.2    Supported Languages

Table 10-1 lists the languages supported by the operating system and their corresponding locales. Most locales are included in Worldwide Language Support (WLS) subsets that are optionally installed. Some, as indicated in the table, are part of the mandatory base operating system.

Locale names that include @ucs4 or UTF-8 support character encoding as defined by the ISO 10646 standard. The wchar_t data type is 32 bits in length, with zero padding of leading bits for those character values that do not require all 32 bits. The first 256 values of the Universal Character Set (UCS) define the same characters as the ISO 8859-1 (Latin-1) character set. Therefore, the Tru64 UNIX implementation of the wchar_t data type is identical to UCS-4 process code for all ISO8859-1 locales. ISO8859-1 locales differ from UTF-8 locales with the same base name in two ways: the encoding of data files, and the fact that only the UTF-8 locales support the euro currency symbol.
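The relationship between ISO 8859-1 and UCS-4 values can be checked with a short Python sketch. It is only an illustration: the helper name ucs4_bytes is hypothetical, and big-endian byte order is chosen purely to make the zero padding easy to see, not because the source specifies a byte order.

```python
import struct

# Every byte value 0-255, decoded as ISO 8859-1, yields the code point
# with the same numeric value.
latin1_bytes = bytes(range(256))
code_points = [ord(ch) for ch in latin1_bytes.decode("iso8859-1")]
print(code_points == list(range(256)))

def ucs4_bytes(ch):
    """Return the 32-bit (4-byte, big-endian) UCS-4 value of one character."""
    return struct.pack(">I", ord(ch))

# Latin-1 characters need only the low 8 bits; the leading bits are zero.
print(ucs4_bytes("A").hex())
print(ucs4_bytes("\u00e9").hex())
```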

The English locale name that includes cp850 supports character encoding in PC code-page format.

For the most up-to-date list of supported languages and locales, refer to the l10n_intro(5) reference page. This book may not be updated for minor functional releases of the operating system, and locales are sometimes added for such releases.
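The locale names in Table 10-1 follow the pattern language_TERRITORY.codeset, optionally followed by one or more @modifier suffixes. The following Python sketch splits a name into those parts. The function name parse_locale_name and the layout of the returned dictionary are assumptions made for this example; they are not an operating system interface.

```python
def parse_locale_name(name):
    """Split a locale name such as 'zh_TW.dechanyu@chuyin@ucs4' into parts."""
    # Modifiers follow the first '@'; a name may carry more than one.
    base, *modifiers = name.split("@")
    # The codeset, if any, follows the first '.'.
    lang_terr, _, codeset = base.partition(".")
    # The territory, if any, follows the first '_'.
    language, _, territory = lang_terr.partition("_")
    return {
        "language": language,
        "territory": territory or None,
        "codeset": codeset or None,
        "modifiers": modifiers,
    }

print(parse_locale_name("zh_TW.dechanyu@chuyin@ucs4"))
print(parse_locale_name("en_US.UTF-8@euro"))
```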

Table 10-1:  Languages and Locales

Language Locale Name
Catalan

ca_ES.ISO8859-1
[Footnote 2]
 
ca_ES.ISO8859-15
ca_ES.UTF-8

Simplified Chinese (PRC)

zh_CN.dechanzi
zh_CN.dechanzi@ucs4
zh_CN.dechanzi@pinyin
zh_CN.dechanzi@pinyin@ucs4
zh_CN.dechanzi@radical
zh_CN.dechanzi@radical@ucs4
zh_CN.dechanzi@stroke
zh_CN.dechanzi@stroke@ucs4
zh_CN.GBK

Traditional Chinese (Hong Kong)

zh_HK.big5
zh_HK.dechanyu
zh_HK.dechanyu@ucs4
zh_HK.dechanzi
zh_HK.dechanzi@ucs4
zh_HK.eucTW
zh_HK.eucTW@ucs4

Traditional Chinese (Taiwan)

zh_TW.big5
zh_TW.big5@chuyin
zh_TW.big5@radical
zh_TW.big5@stroke
zh_TW.dechanyu
zh_TW.dechanyu@ucs4
zh_TW.dechanyu@chuyin
zh_TW.dechanyu@chuyin@ucs4
zh_TW.dechanyu@radical
zh_TW.dechanyu@radical@ucs4
zh_TW.dechanyu@stroke
zh_TW.dechanyu@stroke@ucs4
zh_TW.eucTW
zh_TW.eucTW@ucs4
zh_TW.eucTW@chuyin
zh_TW.eucTW@chuyin@ucs4
zh_TW.eucTW@radical
zh_TW.eucTW@radical@ucs4
zh_TW.eucTW@stroke
zh_TW.eucTW@stroke@ucs4

Czech

cs_CZ.ISO8859-2
cs_CZ.ISO8859-2@ucs4

Danish

da_DK.ISO8859-1
[Footnote 2]
da_DK.ISO8859-15
da_DK.UTF-8

Dutch

nl_NL.ISO8859-1
[Footnote 2]
nl_NL.ISO8859-15
nl_NL.UTF-8

Belgian Dutch

nl_BE.ISO8859-1
[Footnote 2]
nl_BE.ISO8859-15
nl_BE.UTF-8

US English/ASCII

C (POSIX)
[Footnote 2]

US English

en_US.ISO8859-1
[Footnote 2]
en_US.ISO8859-15
en_US.cp850
en_US.UTF-8
en_US.UTF-8@euro
[Footnote 3]

GB English

en_GB.ISO8859-1
[Footnote 2]
en_GB.ISO8859-15
en_GB.UTF-8

European

en_EU.UTF-8@euro
[Footnote 4]

Finnish

fi_FI.ISO8859-1
[Footnote 2]
fi_FI.ISO8859-15
fi_FI.UTF-8

French

fr_FR.ISO8859-1
[Footnote 2]
fr_FR.ISO8859-15
fr_FR.UTF-8

Belgian French

fr_BE.ISO8859-1
[Footnote 2]
fr_BE.ISO8859-15
fr_BE.UTF-8

Canadian French

fr_CA.ISO8859-1
[Footnote 2]
fr_CA.ISO8859-15
fr_CA.UTF-8

Swiss French

fr_CH.ISO8859-1
[Footnote 2]
fr_CH.ISO8859-15
fr_CH.UTF-8

German

de_DE.ISO8859-1
[Footnote 2]
de_DE.ISO8859-15
de_DE.UTF-8

Swiss German

de_CH.ISO8859-1
[Footnote 2]
de_CH.ISO8859-15
de_CH.UTF-8

Greek

el_GR.ISO8859-7
el_GR.ISO8859-7@ucs4
el_GR.UTF-8

Hebrew

he_IL.ISO8859-8
he_IL.ISO8859-8@ucs4
 

Hungarian

hu_HU.ISO8859-2
hu_HU.ISO8859-2@ucs4

Icelandic

is_IS.ISO8859-1
[Footnote 2]
is_IS.ISO8859-15

Italian

it_IT.ISO8859-1
[Footnote 2]
it_IT.ISO8859-15
it_IT.UTF-8

Japanese

ja_JP.eucJP
ja_JP.SJIS
ja_JP.SJIS@ucs4
ja_JP.deckanji
ja_JP.deckanji@ucs4
ja_JP.sdeckanji

Korean

ko_KR.deckorean
ko_KR.deckorean@ucs4
ko_KR.eucKR
ko_KR.KSC5601

Lithuanian

lt_LT.ISO8859-4
lt_LT.ISO8859-4@ucs4

Norwegian

no_NO.ISO8859-1
[Footnote 2]
no_NO.ISO8859-15
no_NO.UTF-8

Polish

pl_PL.ISO8859-2
pl_PL.ISO8859-2@ucs4

Portuguese

pt_PT.ISO8859-1
[Footnote 2]
pt_PT.ISO8859-15
pt_PT.UTF-8

Russian

ru_RU.ISO8859-5
ru_RU.ISO8859-5@ucs4

Slovak

sk_SK.ISO8859-2
sk_SK.ISO8859-2@ucs4

Slovene

sl_SI.ISO8859-2
sl_SI.ISO8859-2@ucs4

Spanish

es_ES.ISO8859-1
[Footnote 2]
es_ES.ISO8859-15
es_ES.UTF-8

Swedish

sv_SE.ISO8859-1
[Footnote 2]
sv_SE.ISO8859-15
sv_SE.UTF-8

Thai

th_TH.TACTIS

Turkish

tr_TR.ISO8859-9
tr_TR.ISO8859-9@ucs4

Note that you can switch languages or character sets as necessary and can even operate multiple processes in different languages or codesets in the same system at the same time.

For more information on a particular coded character set, such as ISO8859-9, see the reference page with the same name. For more information about UCS-4 and UTF-8 encoding, see Unicode(5). For more information about PC code pages, see code_page(5).

10.3    Locale Creation

The localedef utility allows programmers to create their own locales: they write locale source files, compile them with localedef, and give the new locale a unique name.

For more information on creating locales, see Writing Software for the International Market.

10.4    Enhanced Terminal Subsystem for Asian Languages

The base tty terminal driver subsystem is extended to include additional BSD line disciplines and STREAMS terminal driver modules for processing data in Chinese, Japanese, Korean, and Thai. For example, the enhanced terminal subsystem supports the following capabilities for these languages:

For more information about the Asian and Thai terminal subsystems, see the atty(7), ttty(7), and stty(1) reference pages.

10.5    Enhanced Sorting for Asian Languages

The operating system supports the asort utility, an extension of the sort command, which allows characters of ideographic languages, such as Chinese and Japanese, to be sorted according to multiple collation sequences. For more information on the asort utility, see asort(1).

10.6    Multilingual Emacs Editor

The operating system supports the Multilingual Emacs editor (MULE) for Asian languages. See mule(1) for more information.

10.7    Support for User-Defined Characters

The operating system provides support for creating user-defined characters (UDCs) in Chinese, Japanese, and Korean, so that users can create and define character fonts and their attributes, including bitmap fonts, with the cedit and cgen utilities.

Font rendering facilities are available so that X clients can use UDC databases through the X server or font server to obtain bitmap fonts for user-defined characters.

For more information on user-defined characters, see Writing Software for the International Market and the cedit(1) and cgen(1) reference pages.

10.8    Codeset Conversion

The operating system includes the iconv utility and the iconv_open(), iconv(), and iconv_close() functions, which convert text from one codeset to another, thereby assisting programmers in the writing of international applications. For use with these interfaces, the operating system includes a large set of codeset converters.

In addition to conversion between different codesets for the same language, these converters support conversion between different Unicode formats, such as UCS-2, UCS-4, and UTF-8. There are also codeset converters that handle the most commonly used PC code-page formats.
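The following Python sketch shows the kind of conversions described above. Python's built-in codecs are used here in place of the iconv converters, so the codec names (for example, utf-16-be for UCS-2 and utf-32-be for UCS-4 in big-endian byte order) are Python's names rather than Tru64 UNIX converter names, and the helper name convert is hypothetical. For characters in the Basic Multilingual Plane, UTF-16 and UCS-2 produce the same bytes.

```python
def convert(data, from_codeset, to_codeset):
    """Convert bytes from one codeset to another, in the spirit of iconv."""
    return data.decode(from_codeset).encode(to_codeset)

latin1 = "caf\u00e9".encode("iso8859-1")            # 4 bytes, one per character
utf8 = convert(latin1, "iso8859-1", "utf-8")        # e-acute becomes 2 bytes
ucs2 = convert(utf8, "utf-8", "utf-16-be")          # 2 bytes per character
ucs4 = convert(utf8, "utf-8", "utf-32-be")          # 4 bytes per character
pc = convert(latin1, "iso8859-1", "cp850")          # PC code page 850

for label, value in (("ISO 8859-1", latin1), ("UTF-8", utf8),
                     ("UCS-2", ucs2), ("UCS-4", ucs4), ("cp850", pc)):
    print(f"{label:10} {len(value):2} bytes  {value.hex()}")
```

The text is the same in every case; only the byte representation changes with the codeset.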

Codeset conversion is also used by the printing subsystem and utilities, such as man, to allow processing of files in different languages and encoding formats. Additionally, codeset conversion is implemented in mail utilities for mail interchange with systems using different codesets and in the X Windows Toolkit for text input, drawing, and interclient communication. For more information on codeset conversion, see the iconv_intro(5) reference page. See the Unicode(5) and code_page(5) reference pages for a discussion of converters for Unicode encoding formats and PC code-page formats, respectively.

10.9    Unicode Support

The operating system provides both codeset converters and locales that support the Unicode and ISO 10646 standards. The codeset converter modules convert between other supported codesets and UCS-2, UCS-4, and UTF-8 formats. In addition to the country-specific and language-specific locales listed in Section 10.2, programmers can use the universal.UTF-8 locale to process characters in all languages by using UCS-4 encoding format.

The operating system provides a function called fold_string_w(), which maps one Unicode string to another and performs the specified Unicode transformations. For more information on the fold_string_w() function, see fold_string_w(3). For more information on Unicode support, see Unicode(5).

10.10    Support for the Euro Character

The operating system supports the new euro currency now being used by member countries of the European Economic and Monetary Union (EMU).

Locales that use the UTF-8 or Latin-9 (ISO8859-15) codesets support the euro character. Those locales whose names include the @euro suffix also define the local currency sign to be the euro character and the international currency sign to be EUR. See Section 10.2 for more information about locales.
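To make the codeset difference concrete, the following Python sketch encodes the euro character (U+20AC) in Latin-9, UTF-8, and Latin-1. Python codecs stand in for the operating system's codesets, and the helper name encode_euro is an assumption made for this example.

```python
EURO = "\u20ac"

def encode_euro(codeset):
    """Return the euro character's bytes in a codeset, or None if absent."""
    try:
        return EURO.encode(codeset)
    except UnicodeEncodeError:
        return None

for codeset in ("iso8859-15", "utf-8", "iso8859-1"):
    encoded = encode_euro(codeset)
    if encoded is None:
        print(codeset, "cannot represent the euro character")
    else:
        print(codeset, encoded.hex())
```

Latin-9 encodes the euro as the single byte 0xA4 and UTF-8 uses three bytes, while Latin-1 (ISO8859-1) has no euro character at all.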

The ISO Latin-9 codeset, which also includes the euro character, forms the basis of euro font support. The operating system includes both screen and PostScript outline fonts for this codeset. See ISO8859-15(5) for a list of these fonts.

The operating system does not provide native Unicode fonts that support the euro character. However, the X font library has been extended to combine a number of fonts together to form logical Unicode fonts for applications to use. The names of these logical fonts include the string ISO10646-1.

Printer support for the euro character is enabled by a generic PostScript print filter, wwpsof, which supports printing of file data in UTF-8 format. See wwpsof(8) for information on setting up printers with this print filter.

Keyboard entry of the euro character is supported by language-specific and keyboard-specific key sequences that are defined in keymaps (XKB format). The euro character also can be entered by using a Compose key sequence on those keyboards that support a Compose key. The euro(5) reference page lists these key sequences.

Finally, the operating system provides codeset converters to convert file data between the various encoding formats that support the euro character. Specifically, codeset converters can convert file data between:

See euro(5) for a more detailed discussion of the information in this section.

10.11    Internationalized Curses Library

The operating system supplies an internationalized Curses library in conformance with X/Open Curses, Issue 4 Version 2. This library provides functions for processing characters that span one or multiple bytes. These characters may be in either wide-character (wchar_t) or complex-character (cchar_t) formats. The complex-character format provides for a single logical character made up of multiple wide characters. Some of the components of the complex character may be nonspacing characters.
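The idea behind the complex-character format can be sketched in Python by grouping each spacing character with the nonspacing characters that follow it. This models the concept only; it does not use the Curses cchar_t interfaces, and the function name group_complex_characters is hypothetical. Nonspacing characters are identified here by their Unicode general category (Mn or Me), which is an assumption of this example.

```python
import unicodedata

def group_complex_characters(text):
    """Group each spacing character with its trailing nonspacing characters."""
    clusters = []
    for ch in text:
        nonspacing = unicodedata.category(ch) in ("Mn", "Me")
        if nonspacing and clusters:
            # Attach the mark to the preceding logical character.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# 'e' followed by a combining acute accent is one logical character.
print(group_complex_characters("e\u0301a"))
# A Thai consonant followed by a nonspacing vowel sign is also one.
print(group_complex_characters("\u0e01\u0e31"))
```

Each resulting cluster occupies a single display position even though it is made up of more than one wide character.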

For information on the syntax and effect of Curses interfaces, see curses(3). For a description of the enhancements provided by the internationalized Curses routines, and their relationship to previous Curses routines, see Writing Software for the International Market.

10.12    Internationalized Printing

The operating system supports the printing of plain text and PostScript files for a variety of languages and provides outline fonts for high-quality printing on PostScript printers. In addition to print filters for a variety of local-language printers, generic internationalized print filters are available for use with both Compaq and third-party printers. One of these filters, wwpsof, supports printing of local-language files on PostScript printers that do not include the required fonts. For more information on internationalized printing features, see the i18n_printing(5), pcfof(8), and wwpsof(8) reference pages.

10.13    Graphical Internationalization Configuration Tool

The I18N Configuration tool, available through the Application Manager, is one of the CDE System Administration Configuration applications. The I18N Configuration tool provides a graphical interface for the system administrator to configure I18N-specific settings. It also provides a convenient way to see which countries, locales, fonts, and keymaps are supported on the host system. System administrators can also use this tool to remove unused fonts and country support from the system.

10.14    Mail and 8-Bit Character Support

By default, the operating system provides support for 8-bit character encoding in mailx, dtmail, MH, and comsat.

For more information on these mail utilities, see mailx(1), dtmail(1), mh(1), and comsat(8).

10.15    Enhanced file Command

The file command is enhanced to recognize UCS-2 and UCS-4 encoding in any locale setting. For other encoding formats, the command recognizes file data encoding if it is valid for the current locale setting. This command also has a jfile alias that, in any locale, can recognize DEC Kanji, Japanese EUC, Shift JIS, and 7-bit JIS encoding.
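The source does not describe how the file command recognizes UCS-2 and UCS-4 data. One common, locale-independent heuristic is to look for a leading byte order mark, and the following Python sketch shows that heuristic only; it is not a description of the file command's implementation. The function name detect_ucs_encoding is hypothetical.

```python
def detect_ucs_encoding(data):
    """Guess UCS-2 or UCS-4 encoding from a leading byte order mark, if any."""
    # Check the 4-byte marks first: the UCS-4 little-endian mark begins
    # with the same two bytes as the UCS-2 little-endian mark.
    if data.startswith(b"\x00\x00\xfe\xff"):
        return "UCS-4 (big-endian)"
    if data.startswith(b"\xff\xfe\x00\x00"):
        return "UCS-4 (little-endian)"
    if data.startswith(b"\xfe\xff"):
        return "UCS-2 (big-endian)"
    if data.startswith(b"\xff\xfe"):
        return "UCS-2 (little-endian)"
    return None

sample = "\ufeffhello".encode("utf-16-be")
print(detect_ucs_encoding(sample))
```

Because the check depends only on the first bytes of the data, it gives the same answer regardless of the current locale setting.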

10.16    Internationalization for Graphical Applications

Motif Version 1.2.3 takes advantage of many of the internationalization features of X11R6 and the C library to support locales. Motif Version 1.2.3 also supports the use of alternate input methods, which allows input of non-ISO Latin-1 keystrokes, and delivers an extensively rewritten XmText widget, which supports multibyte and wide-character format and on-the-spot input style.

Motif supports multibyte and wide-character encoding through the use of the internationalized X Library and C Library functions. In addition, the compound string routines include the X11R6 XFontSet component to allow for the creation of localized strings.

The User Interface Language (UIL) supports the creation of localized UID files through the -s compile-time option on the UIL compiler, which causes the compiler to construct localized strings.

Alternate input methods can be specified by a resource on the VendorShell widget. Widgets that are parented by a Shell class widget can take advantage of this resource and register themselves as using a specific method for input.

The following sections discuss internationalization features of Motif widgets and internationalized client applications.

10.16.1    Internationalized Motif Widgets

The following lists contain the widgets in the Motif Toolkit and in the Extensions to the Motif Toolkit that support local language characters, I/O capabilities, and local language message displays.

Note that the Motif UIL compiler is extended to support local language characters in UIL files.

10.16.2    Internationalized CDE Clients

CDE is the default desktop for Tru64 UNIX. The following CDE clients are internationalized:

By default, client applications run in the language set by the user at the start of a CDE session. However, users can also change locale in a terminal emulation window and invoke an application in a language different from the session default.

10.16.3    Additional Internationalized Motif Clients

The operating system includes the following internationalized clients in addition to those common to all CDE implementations: