10 Internationalization

This chapter describes the internationalization features of Tru64 UNIX. The first section provides a brief internationalization overview (Section 10.1), after which the following topics are discussed:

Supported languages (Section 10.2)

Using the localedef utility to create locales (Section 10.3)

The enhanced terminal subsystem for Asian languages (Section 10.4)

The enhanced wwconfig command (Section 10.5)

Enhanced sorting for Asian languages (Section 10.6)

The Multilingual Emacs editor (Section 10.7)

Support for user-defined characters in Chinese, Japanese, and Korean (Section 10.8)

Converting text from one codeset to another (Section 10.9)

Support for the Unicode and ISO 10646 standards (Section 10.10)

Support for the euro currency symbol (Section 10.11)

The internationalized Curses library (Section 10.12)

Internationalized printing (Section 10.13)

The graphical internationalization configuration tool (Section 10.14)

Support for 8-bit character encoding in mail programs (Section 10.15)

Enhancements to the file command (Section 10.16)

Internationalization for graphical applications (Section 10.17)

10.1 Overview

The term "internationalization" is formally defined by The Open Group as a "provision within a computer program of the capability of making itself adaptable to the requirements of different native languages, local customs, and coded character sets."

In practice, this means that internationalized programs can run in any supported locale without modification. A locale is a software environment that correctly handles the cultural conventions of a particular geographic area, such as China or France, and a language as it is used in that area. For example, when a user selects a Chinese locale, commands, system messages, and keystrokes can be in Chinese characters and displayed in a way appropriate for Chinese.
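The idea of a program adapting to a locale without being modified can be sketched in a few lines. The example below uses Python's standard locale module purely for illustration; it is not part of Tru64 UNIX, and the helper name describe_locale is hypothetical. It selects the C (POSIX) locale because that locale is available on any system; a name from Table 10-1 could be passed instead if the corresponding locale is installed.

```python
import locale


def describe_locale(name):
    """Switch the process to the named locale and report a few conventions.

    `name` is any locale name installed on the host; describe_locale is a
    hypothetical helper written for this illustration.
    """
    locale.setlocale(locale.LC_ALL, name)
    conventions = locale.localeconv()
    return {
        # Calling setlocale() with no new value queries the current setting.
        "locale": locale.setlocale(locale.LC_ALL),
        "decimal_point": conventions["decimal_point"],
        "currency_symbol": conventions["currency_symbol"],
    }


# The same program code runs unchanged whichever locale name is supplied.
info = describe_locale("C")
print(info)
```

The program itself contains no language-specific logic; everything that varies by region comes from the locale that is selected at run time.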

Tru64 UNIX is an internationalized operating system that not only allows users to interact with existing applications in their native language, but also supports a full set of application interfaces, referred to as the Worldwide Portability Interfaces (WPI), to enable software developers to write internationalized applications. The original code for these interfaces came from the Open Software Foundation (OSF) and has been enhanced.

The internationalization support in the operating system conforms to The Open Group's CAE specifications for system interfaces and headers (XSH Issue 5), curses (XCURSES Issue 4.2), and commands and utilities (XCU Issue 5). These specifications align with current POSIX and ISO C standards. This conformance ensures that commands, utilities, and libraries have been internationalized, and their corresponding message catalogs have been included in the base system.

In addition, the operating system supports the X Input Method (XIM) and X Output Method (XOM) to facilitate input of local language characters, text drawing, measurement, and interclient communication. These functions are implemented according to the X11R6.3 specification and include some problem corrections specified by X11R6.4.

Note that the operating system also supports a 32-bit wchar_t data type, which in turn enables support for a wide array of codesets, including the one defined by the ISO 10646 standard.

For more comprehensive information, see Writing Software for the International Market.

10.2 Supported Languages

Table 10-1 lists the languages supported by the operating system and their corresponding locales. Most locales are included in Worldwide Language Support (WLS) subsets that are optionally installed. Some, as indicated in the table, are part of the mandatory base operating system.

Locale names that include @ucs4 and UTF-8 support character encoding as defined by the ISO 10646 standard. The wchar_t data type is 32 bits in length, with zero padding of leading bits for those character values that do not require all 32 bits. The first 256 values of the Universal Character Set (UCS) define the same characters as defined for the ISO 8859-1 (Latin-1) character set. Therefore, the Tru64 UNIX implementation of the wchar_t data type is identical to UCS-4 process code for all ISO8859-1 locales. ISO8859-1 locales differ from UTF-8 locales with the same base name in terms of data file encoding and the fact that only the UTF-8 locales support the euro currency symbol.
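The relationship between ISO 8859-1, UCS-4, and UTF-8 described above can be checked with a short sketch. It uses Python's built-in codecs as a stand-in for the operating system's own encodings (an assumption made for illustration only); the codec name utf-32-be is used here to produce 32-bit, zero-padded values without a byte-order mark.

```python
# Decode every possible byte value as ISO 8859-1 (Latin-1).
latin1_bytes = bytes(range(256))
text = latin1_bytes.decode("iso8859-1")

# Each Latin-1 byte maps to the UCS code point with the same numeric value.
same_values = all(ord(ch) == b for ch, b in zip(text, latin1_bytes))


def ucs4_hex(ch):
    """Return the 32-bit big-endian (UCS-4 style) encoding of ch as hex."""
    return ch.encode("utf-32-be").hex()


e_acute = "\u00e9"                           # LATIN SMALL LETTER E WITH ACUTE
as_latin1 = e_acute.encode("iso8859-1").hex()
as_ucs4 = ucs4_hex(e_acute)                  # leading bits are zero padding
as_utf8 = e_acute.encode("utf-8").hex()      # data file encoding differs

print(same_values, as_latin1, as_ucs4, as_utf8)
```

The character value is the same in Latin-1 and UCS-4 (only the width differs), while the UTF-8 file encoding of the same character uses a different byte sequence.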

The English locale name that includes cp850 supports character encoding in PC code-page format.

For the most up-to-date list of supported languages and locales, refer to the l10n_intro(5) reference page. This book may not be updated for minor functional releases of the operating system, and locales are sometimes added in such releases.

Table 10-1: Languages and Locales

Language	Locale Name
Catalan	`ca_ES.ISO8859-1` ^{[Footnote 2]} `ca_ES.ISO8859-15` `ca_ES.UTF-8`
Simplified Chinese (PRC)	`zh_CN.dechanzi` `zh_CN.dechanzi@ucs4` `zh_CN.dechanzi@pinyin` `zh_CN.dechanzi@pinyin@ucs4` `zh_CN.dechanzi@radical` `zh_CN.dechanzi@radical@ucs4` `zh_CN.dechanzi@stroke` `zh_CN.dechanzi@stroke@ucs4` `zh_CN.GBK` `zh_CN.GB18030`
Traditional Chinese (Hong Kong)	`zh_HK.big5` `zh_HK.dechanyu` `zh_HK.dechanyu@ucs4` `zh_HK.dechanzi` `zh_HK.dechanzi@ucs4` `zh_HK.eucTW` `zh_HK.eucTW@ucs4`
Traditional Chinese (Taiwan)	`zh_TW.big5` `zh_TW.big5@chuyin` `zh_TW.big5@radical` `zh_TW.big5@stroke` `zh_TW.dechanyu` `zh_TW.dechanyu@ucs4` `zh_TW.dechanyu@chuyin` `zh_TW.dechanyu@chuyin@ucs4` `zh_TW.dechanyu@radical` `zh_TW.dechanyu@radical@ucs4` `zh_TW.dechanyu@stroke` `zh_TW.dechanyu@stroke@ucs4` `zh_TW.eucTW` `zh_TW.eucTW@ucs4` `zh_TW.eucTW@chuyin` `zh_TW.eucTW@chuyin@ucs4` `zh_TW.eucTW@radical` `zh_TW.eucTW@radical@ucs4` `zh_TW.eucTW@stroke` `zh_TW.eucTW@stroke@ucs4`
Czech	`cs_CZ.ISO8859-2` `cs_CZ.ISO8859-2@ucs4`
Danish	`da_DK.ISO8859-1` ^{[Footnote 2]} `da_DK.ISO8859-15` `da_DK.UTF-8`
Dutch	`nl_NL.ISO8859-1` ^{[Footnote 2]} `nl_NL.ISO8859-15` `nl_NL.UTF-8`
Belgian Dutch	`nl_BE.ISO8859-1` ^{[Footnote 2]} `nl_BE.ISO8859-15` `nl_BE.UTF-8`
US English/ASCII	`C` (POSIX) ^{[Footnote 2]}
US English	`en_US.ISO8859-1` ^{[Footnote 2]} `en_US.ISO8859-15` `en_US.cp850` `en_US.UTF-8` `en_US.UTF-8@euro` ^{[Footnote 3]}
GB English	`en_GB.ISO8859-1` ^{[Footnote 2]} `en_GB.ISO8859-15` `en_GB.UTF-8`
European	`en_EU.UTF-8@euro` ^{[Footnote 4]}
Finnish	`fi_FI.ISO8859-1` ^{[Footnote 2]} `fi_FI.ISO8859-15` `fi_FI.UTF-8`
French	`fr_FR.ISO8859-1` ^{[Footnote 2]} `fr_FR.ISO8859-15` `fr_FR.UTF-8`
Belgian French	`fr_BE.ISO8859-1` ^{[Footnote 2]} `fr_BE.ISO8859-15` `fr_BE.UTF-8`
Canadian French	`fr_CA.ISO8859-1` ^{[Footnote 2]} `fr_CA.ISO8859-15` `fr_CA.UTF-8`
Swiss French	`fr_CH.ISO8859-1` ^{[Footnote 2]} `fr_CH.ISO8859-15` `fr_CH.UTF-8`
German	`de_DE.ISO8859-1` ^{[Footnote 2]} `de_DE.ISO8859-15` `de_DE.UTF-8`
Swiss German	`de_CH.ISO8859-1` ^{[Footnote 2]} `de_CH.ISO8859-15` `de_CH.UTF-8`
Greek	`el_GR.ISO8859-7` `el_GR.ISO8859-7@ucs4` `el_GR.UTF-8`
Hebrew	`he_IL.ISO8859-8` `he_IL.ISO8859-8@ucs4`
Hungarian	`hu_HU.ISO8859-2` `hu_HU.ISO8859-2@ucs4`
Icelandic	`is_IS.ISO8859-1` ^{[Footnote 2]} `is_IS.ISO8859-15`
Italian	`it_IT.ISO8859-1` ^{[Footnote 2]} `it_IT.ISO8859-15` `it_IT.UTF-8`
Japanese	`ja_JP.eucJP` `ja_JP.SJIS` `ja_JP.SJIS@ucs4` `ja_JP.deckanji` `ja_JP.deckanji@ucs4` `ja_JP.sdeckanji`
Korean	`ko_KR.deckorean` `ko_KR.deckorean@ucs4` `ko_KR.eucKR` `ko_KR.KSC5601`
Lithuanian	`lt_LT.ISO8859-4` `lt_LT.ISO8859-4@ucs4`
Norwegian	`no_NO.ISO8859-1` ^{[Footnote 2]} `no_NO.ISO8859-15` `no_NO.UTF-8`
Polish	`pl_PL.ISO8859-2` `pl_PL.ISO8859-2@ucs4`
Portuguese	`pt_PT.ISO8859-1` ^{[Footnote 2]} `pt_PT.ISO8859-15` `pt_PT.UTF-8`
Russian	`ru_RU.ISO8859-5` `ru_RU.ISO8859-5@ucs4`
Slovak	`sk_SK.ISO8859-2` `sk_SK.ISO8859-2@ucs4`
Slovene	`sl_SI.ISO8859-2` `sl_SI.ISO8859-2@ucs4`
Spanish	`es_ES.ISO8859-1` ^{[Footnote 2]} `es_ES.ISO8859-15` `es_ES.UTF-8`
Swedish	`sv_SE.ISO8859-1` ^{[Footnote 2]} `sv_SE.ISO8859-15` `sv_SE.UTF-8`
Thai	`th_TH.TACTIS`
Turkish	`tr_TR.ISO8859-9` `tr_TR.ISO8859-9@ucs4`

Note that you can switch languages or character sets as necessary and can even operate multiple processes in different languages or codesets in the same system at the same time.

For more information on a particular coded character set, such as ISO8859-9, see the reference page with the same name. For more information about UCS-4 and UTF-8 encoding, see Unicode(5). For more information about PC code pages, see code_page(5).

10.3 Locale Creation

The localedef utility allows programmers to create their own locales, compile the locale source files, and assign a unique name to the new locale.

For more information on creating locales, see Writing Software for the International Market.

10.4 Enhanced Terminal Subsystem for Asian Languages

The base tty terminal driver subsystem is extended to include additional BSD line disciplines and STREAMS terminal driver modules for processing data in Chinese, Japanese, Korean, and Thai. For example, the enhanced terminal subsystem supports the following capabilities for these languages:

Japanese Kana-Kanji conversion input method

Character-based line processing in cooked mode

Input line history and editing (BSD line discipline only)

Software on-demand-loading for user-defined characters

Conversion between terminal code and application code

For more information about the Asian and Thai terminal subsystems, see the atty(7), ttty(7), and stty(1) reference pages.

10.5 Enhanced wwconfig Command

The wwconfig command configures terminal (tty) options for Asian countries. The command has been enhanced to include the following:

The -vmunix and -kernel options, which display I18N modules that are statically linked into /vmunix or currently used by the running kernel

The -pty option, which specifies the use of a BSD or streams pseudo-driver for remote logins and telnet sessions

The -config option, which specifies a kernel configuration file

The -utx option, which specifies the addition of Kana-Kanji, On Demand Loading, or Software Phrase Input as Asian tty driver options

The -code option, which specifies the addition of BIG5, Mitac, Chinese mapping support, and UTF-8 character set support as Asian tty driver options

The -[no]thai option, which includes or excludes the Thai tty driver

The -utxnum option, which specifies the number of utx pseudo-devices to be created

For information on the use and specification of these options, see the wwconfig(8) reference page.

10.6 Enhanced Sorting for Asian Languages

The operating system supports the asort utility, an extension of the sort command, which allows characters of ideographic languages, such as Chinese and Japanese, to be sorted according to multiple collation sequences. For more information on the asort utility, see asort(1).
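To see why one set of ideographic characters can have several valid sort orders, consider the minimal sketch below. It does not use asort; the two collation tables are tiny, hand-written, hypothetical stand-ins for the kind of pinyin and stroke-count orderings that the @pinyin and @stroke locale variants in Table 10-1 suggest.

```python
# Hypothetical collation tables covering only four characters.
PINYIN = {"一": "yi", "人": "ren", "山": "shan", "天": "tian"}
STROKES = {"一": 1, "人": 2, "山": 3, "天": 4}


def sort_by(chars, table):
    """Sort characters using the collation keys found in `table`."""
    return sorted(chars, key=table.__getitem__)


chars = ["天", "一", "山", "人"]

by_pinyin = sort_by(chars, PINYIN)    # ordered by romanized pronunciation
by_stroke = sort_by(chars, STROKES)   # ordered by number of strokes

print(by_pinyin)
print(by_stroke)
```

The same input produces two different results, which is the situation a utility that supports multiple collation sequences has to handle.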

10.7 Multilingual Emacs Editor

The operating system supports the Multilingual Emacs editor (MULE) for Asian languages. See mule(1) for more information.

10.8 Support for User-Defined Characters

The operating system provides support for creating user-defined characters (UDCs) in Chinese, Japanese, and Korean, so that users can create and define character fonts and their attributes, including bitmap fonts, with the cedit and cgen utilities.

Font rendering facilities are available so that X clients can use UDC databases through the X server or font server to obtain bitmap fonts for user-defined characters.

For more information on user-defined characters, see Writing Software for the International Market and the cedit(1) and cgen(1) reference pages.

10.9 Codeset Conversion

The operating system includes the iconv utility and the iconv_open(), iconv(), and iconv_close() functions, which convert text from one codeset to another, thereby assisting programmers in the writing of international applications. For use with these interfaces, the operating system includes a large set of codeset converters.

A new en_US.UTF-8 X locale database file contains font definitions that include all the various fonts used with the operating system. Thus, applications running under the en_US.UTF-8 locale can display all the font characters installed with Worldwide Language Support (WLS). Applications running under the Asian locales display all of the WLS installed fonts, except for ISO8859-2, -4, -5, -7, -8, -9, and TACTIS.

In addition to conversion between different codesets for the same language, these converters support conversion between different Unicode formats, such as UCS-2, UCS-4, and UTF-8. There are also codeset converters that handle the most commonly used PC code-page formats.
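The decode-then-encode pattern behind codeset conversion can be sketched as follows. This is not the iconv interface itself: it uses Python's built-in codecs as a stand-in, the function name convert is hypothetical, and the codec names (iso8859-1, utf-16-be, utf-32-be, cp850) are Python's names, assumed here to correspond to Latin-1, UCS-2, UCS-4, and a PC code page.

```python
def convert(data: bytes, from_codeset: str, to_codeset: str) -> bytes:
    """Convert bytes from one codeset to another via Unicode text."""
    return data.decode(from_codeset).encode(to_codeset)


latin1 = b"caf\xe9"                            # "café" in ISO 8859-1

utf8 = convert(latin1, "iso8859-1", "utf-8")
# For characters in the Basic Multilingual Plane, UTF-16 and UCS-2 coincide.
ucs2 = convert(utf8, "utf-8", "utf-16-be")
ucs4 = convert(utf8, "utf-8", "utf-32-be")
pc = convert(utf8, "utf-8", "cp850")           # PC code-page format

print(utf8, ucs2, ucs4, pc)
```

Each conversion preserves the text and changes only its byte representation; a converter fails when the target codeset has no representation for a character in the input.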

Codeset conversion is also used by the printing subsystem and utilities, such as man, to allow processing of files in different languages and encoding formats. Additionally, codeset conversion is implemented in mail utilities for mail interchange with systems using different codesets and in the X Window System Toolkit for text input, drawing, and interclient communication. For more information on codeset conversion, see the iconv_intro(5) reference page. See the Unicode(5) and code_page(5) reference pages for a discussion of converters for Unicode encoding formats and PC code-page formats, respectively.

10.10 Unicode Support

The operating system provides both codeset converters and locales that support the Unicode and ISO 10646 standards. The codeset converter modules convert between other supported codesets and UCS-2, UCS-4, and UTF-8 formats. In addition to the country-specific and language-specific locales listed in Section 10.2, programmers can use the universal.UTF-8 locale to process characters in all languages by using UCS-4 encoding format.

The operating system provides a function called fold_string_w(), which maps one Unicode string to another and performs the specified Unicode transformations. For more information on the fold_string_w() function, see fold_string_w(3). For more information on Unicode support, see Unicode(5).
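As a rough illustration of what mapping one Unicode string to another means, the sketch below applies a few standard Unicode normalization forms with Python's unicodedata module. This is an analogy only: it does not call fold_string_w(), and the transformations that function actually offers are documented in fold_string_w(3), not here.

```python
import unicodedata

# Base letter followed by a combining acute accent -> one precomposed character.
composed = unicodedata.normalize("NFC", "e\u0301")

# Precomposed character -> base letter plus combining accent.
decomposed = unicodedata.normalize("NFD", "\u00e9")

# Fullwidth digits -> their ASCII compatibility equivalents.
folded_digits = unicodedata.normalize("NFKC", "\uff11\uff12\uff13")

print(len(composed), len(decomposed), folded_digits)
```

In each case the input and output are different sequences of code points that represent the same, or a compatible, piece of text.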

10.11 Support for the Euro Character

The operating system supports the new euro currency now being used by member countries of the European Economic and Monetary Union (EMU).

Locales that use the UTF-8 or Latin-9 (ISO8859-15) codesets support the euro character. Those locales whose names include the @euro suffix also define the local currency sign to be the euro character and the international currency sign to be EUR. See Section 10.2 for more information about locales.

The ISO Latin-9 codeset forms the basis of euro font support. The operating system includes both screen and PostScript outline fonts for this codeset. See ISO8859-15(5) for a list of these fonts.

The operating system does not provide native Unicode fonts that support the euro character. However, the X font library has been extended to combine a number of fonts together to form logical Unicode fonts for applications to use. The names of these logical fonts include the string ISO10646-1.

Printer support for the euro character is enabled by a generic PostScript print filter, wwpsof, which supports printing of file data in UTF-8 or Latin-9 format. See wwpsof(8) for information on setting up printers with this print filter.

In the UTF-8 and Latin-9 locales, keyboard entry of the euro character is supported by language-specific and keyboard-specific key sequences that are defined in keymaps (XKB format). The euro character also can be entered by using a Compose key sequence on those keyboards that support a Compose key. The euro(5) reference page lists these key sequences.

Finally, the operating system provides codeset converters to convert file data between the various encoding formats that support the euro character. Specifically, codeset converters can convert file data between:

Unicode encoding formats and PC code-page formats that support the euro

Unicode encoding formats and ISO8859-15 encoding
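The encodings involved can be compared with a short sketch that uses Python's built-in codecs rather than the operating system's converters. The choice of cp1252 as an example of a PC code page that contains the euro is an assumption made for illustration; the code pages actually supported are listed in code_page(5).

```python
EURO = "\u20ac"                                # EURO SIGN

latin9 = EURO.encode("iso8859-15")             # Latin-9 encoding
utf8 = EURO.encode("utf-8")                    # UTF-8 encoding
pc = EURO.encode("cp1252")                     # assumed example code page

# The same byte in Latin-1 is the generic currency sign, not the euro.
latin1_meaning = latin9.decode("iso8859-1")

# Latin-1 has no euro character, so conversion to it fails.
try:
    EURO.encode("iso8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

print(latin9, utf8, pc, latin1_has_euro)
```

This is why data that contains the euro character must be stored in, or converted to, a codeset such as Latin-9 or UTF-8 rather than Latin-1.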

See euro(5) for a more detailed discussion of the information in this section.

10.12 Internationalized Curses Library

The operating system supplies an internationalized Curses library in conformance with X/Open Curses, Issue 4 Version 2. This library provides functions for processing characters that span one or multiple bytes. These characters may be in either wide-character (wchar_t) or complex-character (cchar_t) formats. The complex-character format provides for a single logical character made up of multiple wide characters. Some of the components of the complex character may be nonspacing characters.
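The notion of a single logical character built from several wide characters, some of them nonspacing, can be sketched without Curses. The example below groups a spacing character with the nonspacing (combining) characters that follow it, using Python's unicodedata module; the function name logical_characters is hypothetical, and the grouping rule is a simplification of what a cchar_t can hold.

```python
import unicodedata


def logical_characters(text):
    """Group each spacing character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch       # nonspacing: attach to the previous group
        else:
            groups.append(ch)      # spacing: start a new logical character
    return groups


# "e" + COMBINING ACUTE ACCENT, then "a": three code points, two characters.
cells = logical_characters("e\u0301a")
print(len("e\u0301a"), len(cells))
```

A screen library has to treat each group as one unit that occupies one character position, which is the role the complex-character format plays.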

For information on the syntax and effect of Curses interfaces, see curses(3). For a description of the enhancements provided by the internationalized Curses routines, and their relationship to previous Curses routines, see Writing Software for the International Market.

10.13 Internationalized Printing

The operating system supports the printing of plain text and PostScript files for a variety of languages and provides outline fonts for high quality printing on PostScript printers. In addition to print filters for a variety of local-language printers, generic internationalized print filters are available for use with both Compaq and third-party printers.

One of these filters, wwpsof, supports printing of local-language files on PostScript printers that do not include the required fonts. For more information on internationalized printing features, see the i18n_printing(5), pcfof(8), and wwpsof(8) reference pages.

10.14 Graphical Internationalization Configuration Tool

The I18N Configuration tool, available through the Application Manager, is one of the CDE System Administration Configuration applications.

The I18N Configuration tool provides a graphical interface for the system administrator to configure I18N-specific settings. It also provides a convenient way to see which countries, locales, fonts, and keymaps are supported on the host system. System administrators can also use this tool to remove unused fonts and country support from the system.

10.15 Mail and 8-Bit Character Support

By default, the operating system provides support for 8-bit character encoding in mailx, dtmail, MH, and comsat.

For more information on these mail utilities, see mailx(1), dtmail(1), mh(1), and comsat(8).

10.16 Enhanced file Command

The file command is enhanced to recognize UCS-2 and UCS-4 encoding in any locale setting. For other encoding formats, the command recognizes file data encoding if it is valid for the current locale setting. This command also has a jfile alias that, in any locale, can recognize DEC Kanji, Japanese EUC, Shift JIS, and 7-bit JIS encoding.
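One way that UCS-2 and UCS-4 data can be recognized independently of the locale is by a leading byte-order mark. The sketch below shows that heuristic in Python; it is an assumption for illustration and does not describe how the file command is actually implemented. The function name guess_unicode_encoding and the labels it returns are hypothetical.

```python
import codecs

# The 4-byte marks are tested first because the little-endian UCS-4 mark
# begins with the same two bytes as the little-endian UCS-2 mark.
BOMS = [
    (codecs.BOM_UTF32_BE, "UCS-4 (big-endian)"),
    (codecs.BOM_UTF32_LE, "UCS-4 (little-endian)"),
    (codecs.BOM_UTF16_BE, "UCS-2 (big-endian)"),
    (codecs.BOM_UTF16_LE, "UCS-2 (little-endian)"),
]


def guess_unicode_encoding(data: bytes):
    """Return a label if data starts with a known byte-order mark, else None."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None


print(guess_unicode_encoding(b"\xfe\xff\x00A"))
print(guess_unicode_encoding(b"plain text"))
```

Data without a byte-order mark cannot be classified this way, so a real tool would need further checks.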

10.17 Internationalization for Graphical Applications

Motif Version 1.2.3 takes advantage of many of the internationalization features of X11R6 and the C library to support locales. Motif Version 1.2.3 also supports the use of alternate input methods, which allows input of non-ISO Latin-1 keystrokes, and delivers an extensively rewritten XmText widget, which supports multibyte and wide-character format and on-the-spot input style.

Motif supports multibyte and wide-character encoding through the use of the internationalized X Library and C Library functions. In addition, the compound string routines include the X11R6 XFontSet component to allow for the creation of localized strings.

The User Interface Language (UIL) supports the creation of localized UID files through the -s compile-time option on the UIL compiler, which causes the compiler to construct localized strings.

Alternate input methods can be specified by a resource on the VendorShell widget. Widgets that are parented by a Shell class widget can take advantage of this resource and register themselves as using a specific method for input.

The following sections discuss internationalization features of Motif widgets and internationalized client applications.

10.17.1 Internationalized Motif Widgets

The following lists contain the widgets in the Motif Toolkit and in the Extensions to the Motif Toolkit that support local language characters, I/O capabilities, and local language message displays.

Note that the Motif UIL compiler is extended to support local language characters in UIL files.

Motif Toolkit
- Command
- FileSelectionBox
- Label
- MessageBox
- SelectionBox
- Text
- TextField

Extensions to Motif Toolkit
- ColorMix
- CSText
- Help
- Print
- Structured Visual Navigation (SVN)

10.17.2 Internationalized CDE Clients

CDE is the default desktop for Tru64 UNIX. The following CDE clients are internationalized:

Application Manager

Calculator

Calendar

Create Action

File Manager

Front Panel

Help Viewer

Icon Editor

Login Screen

Message

Mailer

Print Manager

Style Manager

Terminal Emulator

Trash Can

By default, client applications run in the language set by the user at the start of a CDE session. However, users can also change locale in a terminal emulation window and invoke an application in a language different from the session default.

10.17.3 Additional Internationalized Motif Clients

The operating system includes the following internationalized clients in addition to those common to all CDE implementations:

Differences

Keycap

DECterm

X Display Manager