This chapter describes the internationalization features of Tru64 UNIX. The first section provides a brief internationalization overview (Section 10.1), after which the following topics are discussed:
Supported languages (Section 10.2)
Using the localedef utility to create locales (Section 10.3)
The enhanced terminal subsystem for Asian languages (Section 10.4)
Enhanced sorting for Asian languages (Section 10.5)
The Multilingual Emacs editor (Section 10.6)
Support for user-defined characters in Chinese, Japanese, and Korean (Section 10.7)
Converting text from one codeset to another (Section 10.8)
Support for the Unicode and ISO 10646 standards (Section 10.9)
Support for the euro currency symbol (Section 10.10)
The internationalized Curses library (Section 10.11)
Internationalized printing (Section 10.12)
The graphical internationalization configuration tool (Section 10.13)
Support for 8-bit character encoding in mail programs (Section 10.14)
Enhancements to the file command (Section 10.15)
Internationalization for graphical applications (Section 10.16)
The Open Group formally defines the term "internationalization" as the "provision within a computer program of the capability of making itself adaptable to the requirements of different native languages, local customs, and coded character sets."
In practice, this means that an internationalized program can run in any supported locale without modification. A locale is a software environment that correctly handles the cultural conventions of a particular geographic area, such as China or France, and the language as it is used in that area. For example, when a Chinese locale is selected, commands, system messages, and keyboard input can use Chinese characters and are displayed according to Chinese conventions.
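At the programming level, a C program typically becomes locale-aware by calling setlocale() once at startup with an empty string. This selects the locale named by the user's LC_ALL, LC_* and LANG environment variables. The following minimal sketch uses only standard C and POSIX interfaces. The function name adopt_user_locale() is hypothetical, and falling back to the C locale is one reasonable policy, not a requirement.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the locale selected by the user's environment
 * (LC_ALL, LC_* and LANG variables) and return the name of its codeset.
 * If the requested locale is not installed, stay in the portable C locale. */
const char *adopt_user_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        /* Requested locale unavailable: use the POSIX/C locale instead. */
        setlocale(LC_ALL, "C");
    }
    /* nl_langinfo(CODESET) names the codeset of the locale now in effect. */
    return nl_langinfo(CODESET);
}
```

A program would call adopt_user_locale() before any locale-dependent processing. The string it returns depends on the locale in effect when the program runs.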
Tru64 UNIX is an internationalized operating system that not only allows users to interact with existing applications in their native language, but also supports a full set of application interfaces, referred to as the Worldwide Portability Interfaces (WPI), to enable software developers to write internationalized applications. The original code for these interfaces came from the Open Software Foundation (OSF) and has been enhanced.
The internationalization support in the operating system conforms to The Open Group's CAE specifications for system interfaces and headers (XSH Issue 5), curses (XCURSES Issue 4.2), and commands and utilities (XCU Issue 5). These specifications align with current POSIX and ISO C standards. This conformance means that the commands, utilities, and libraries are internationalized and that their message catalogs are included in the base system.
In addition, the operating system supports the X Input Method (XIM) and X Output Method (XOM) to facilitate input of local-language characters, text drawing and measurement, and interclient communication. These functions are implemented according to the X11R6.3 specification and include some problem corrections specified by X11R6.4.
Note that the operating system also supports a 32-bit wchar_t data type, which in turn enables support for a wide array of codesets, including the one defined by the ISO 10646 standard.
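The following sketch shows how a program can count characters rather than bytes, and convert multibyte text into wchar_t values, with the standard mbstowcs() function. This function interprets its input according to the LC_CTYPE category of the current locale. The helper names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: return the number of characters (not bytes) in the
 * multibyte string s, interpreted under the current LC_CTYPE category, or
 * (size_t)-1 if s contains a sequence that is invalid in that locale. */
size_t mb_char_count(const char *s)
{
    /* With a null destination, mbstowcs() only counts the wide characters
     * that the conversion would produce. */
    return mbstowcs(NULL, s, 0);
}

/* Hypothetical helper: convert s to a null-terminated wide-character string
 * in dst, which holds dstlen elements. Each wchar_t element holds one whole
 * character. Returns 0 on success, or -1 on invalid input or lack of room. */
int mb_to_wide(const char *s, wchar_t *dst, size_t dstlen)
{
    size_t n = mbstowcs(dst, s, dstlen);
    if (n == (size_t)-1 || n >= dstlen)
        return -1;   /* invalid sequence, or no room for the terminator */
    return 0;
}
```

In a multibyte locale, mb_char_count() can return a smaller value than strlen() for the same string, because one character may occupy several bytes.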
For more comprehensive information, see Writing Software for the International Market.
10.2 Supported Languages
Table 10-1 lists the languages supported by the operating system and their corresponding locales. Most locales are included in Worldwide Language Support (WLS) subsets that are optionally installed. Some, as indicated in the table, are part of the mandatory base operating system.
Locale names that include @ucs4 and UTF-8 support character encoding as defined by the ISO 10646 standard.
The wchar_t data type is 32 bits in length, with zero padding of leading bits for those character values that do not require all 32 bits. The first 256 values of the Universal Character Set (UCS) define the same characters as the ISO 8859-1 (Latin-1) character set. Therefore, the Tru64 UNIX implementation of the wchar_t data type is identical to UCS-4 process code for all ISO8859-1 locales.
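Because the first 256 UCS values coincide with ISO 8859-1, converting Latin-1 text to UCS-4 requires no lookup table. Each byte value is already the character's code point. The following sketch illustrates this with a hypothetical helper. It uses uint32_t rather than wchar_t so that it does not depend on how a particular platform encodes wchar_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen n ISO 8859-1 (Latin-1) bytes to UCS-4 code
 * points. Since UCS values 0-255 are the Latin-1 characters, conversion is
 * plain zero extension of each byte. */
void latin1_to_ucs4(const unsigned char *src, size_t n, uint32_t *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];   /* for example, byte 0xE9 (e-acute) becomes U+00E9 */
}
```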
ISO8859-1 locales differ from UTF-8 locales with the same base name in two ways: the encoding of file data, and the fact that only the UTF-8 locales support the euro currency symbol.
The English locale name that includes cp850 supports character encoding in PC code-page format.
For the most up-to-date list of supported languages and locales, refer to the l10n_intro(5) reference page. This book may not be updated for minor functional releases of the operating system, and locales are sometimes added in such releases.
Table 10-1: Languages and Locales
Language | Locale Name
Catalan |
Simplified Chinese (PRC) |
Traditional Chinese (Hong Kong) |
Traditional Chinese (Taiwan) |
Czech |
Danish |
Dutch |
Belgian Dutch |
US English/ASCII | C (POSIX) [Footnote 2]
US English |
GB English |
European | en_EU.UTF-8@euro [Footnote 4]
Finnish |
French |
Belgian French |
Canadian French |
Swiss French |
German |
Swiss German |
Greek |
Hebrew |
Hungarian |
Icelandic |
Italian |
Japanese |
Korean |
Lithuanian |
Norwegian |
Polish |
Portuguese |
Russian |
Slovak |
Slovene |
Spanish |
Swedish |
Thai |
Turkish |
You can switch languages or codesets as needed. Different processes on the same system can even run in different languages or codesets at the same time.
For more information on a particular coded character set, such as ISO8859-9, see the reference page with the same name.
For more information about UCS-4 and UTF-8 encoding, see Unicode(5).
For more information about PC code pages, see code_page(5).
10.3 Locale Creation
The localedef utility allows programmers to create their own locales, compile the locale source files, and generate a unique name for each new locale.
For more information on creating locales, see Writing Software for the International Market.
10.4 Enhanced Terminal Subsystem for Asian Languages
The base tty terminal driver subsystem is extended to include additional BSD line disciplines and STREAMS terminal driver modules for processing data in Chinese, Japanese, Korean, and Thai. For these languages, the enhanced terminal subsystem supports capabilities such as the following:
Japanese Kana-Kanji conversion input method
Character-based line processing in cooked mode
Input line history and editing (BSD line discipline only)
Software on-demand-loading for user-defined characters
Conversion between terminal code and application code
For more information about the Asian and Thai terminal subsystems, see the atty(7), ttty(7), and stty(1) reference pages.
10.5 Enhanced Sorting for Asian Languages
The operating system supports the asort utility, an extension of the sort command that allows characters of ideographic languages, such as Chinese and Japanese, to be sorted according to multiple collation sequences.
For more information on the asort utility, see asort(1).
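The asort command and its multiple collation sequences are specific to this operating system and are not shown here. Within a C program, however, the standard way to order strings by the rules of the current locale, rather than by byte value, is strcoll(). This function uses the LC_COLLATE category. The following minimal sketch uses a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order two strings by the collation sequence of the
 * LC_COLLATE category of the current locale rather than by byte values. */
static int collate_cmp(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of n strings in place using the
 * collation rules of the current locale. */
void sort_by_locale(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, collate_cmp);
}
```

In the C locale, strcoll() behaves like strcmp(). In other locales, the resulting order follows that locale's collation rules.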
10.6 Multilingual Emacs Editor
The operating system supports the Multilingual Emacs editor (MULE) for Asian languages. See mule(1) for more information.
10.7 Support for User-Defined Characters
The operating system supports user-defined characters (UDCs) in Chinese, Japanese, and Korean. With the cedit and cgen utilities, users can create and define character fonts and their attributes, including bitmap fonts.
Font rendering facilities are available so that X clients can use UDC databases through the X server or font server to obtain bitmap fonts for user-defined characters.
For more information on user-defined characters, see Writing Software for the International Market, cedit(1), and cgen(1).
10.8 Codeset Conversion
The operating system includes the iconv utility and the iconv_open(), iconv(), and iconv_close() functions, which convert text from one codeset to another and thereby help programmers write international applications. For use with these interfaces, the operating system includes a large set of codeset converters.
In addition to conversion between different codesets for the same language, these converters support conversion between different Unicode formats, such as UCS-2, UCS-4, and UTF-8. There are also codeset converters that handle the most commonly used PC code-page formats.
Codeset conversion is also used by the printing subsystem and by utilities such as man to allow processing of files in different languages and encoding formats. Additionally, codeset conversion is implemented in mail utilities, for mail interchange with systems that use different codesets, and in the X Window System toolkit, for text input, drawing, and interclient communication.
For more information on codeset conversion, see the iconv_intro(5) reference page.
See the Unicode(5) and code_page(5) reference pages for a discussion of converters for Unicode encoding formats and PC code-page formats, respectively.
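The following sketch shows the usual iconv_open(), iconv(), and iconv_close() sequence for converting a string between two codesets. Codeset names accepted by iconv_open() are implementation-specific, so check iconv_intro(5) for the names that this system recognizes. The helper name convert_codeset() and its error convention are hypothetical.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the null-terminated string src from codeset
 * `from` to codeset `to`, storing a null-terminated result in dst, which
 * holds dstsize bytes. Returns 0 on success, or -1 if the conversion is
 * unsupported, the input is invalid, or dst is too small. */
int convert_codeset(const char *from, const char *to,
                    const char *src, char *dst, size_t dstsize)
{
    if (dstsize == 0)
        return -1;

    /* iconv_open() takes the target codeset first, then the source. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;        /* iconv() reads, but does not modify, the input */
    size_t inleft = strlen(src);
    char *out = dst;
    size_t outleft = dstsize - 1;  /* reserve one byte for the terminator */

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    if (rc != (size_t)-1)
        /* Null input flushes any shift sequence that a stateful encoding needs. */
        rc = iconv(cd, NULL, NULL, &out, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    *out = '\0';
    return 0;
}
```

The helper closes the conversion descriptor on every path after iconv_open() succeeds. It reports output-buffer overflow (E2BIG) the same way as other failures, which keeps the sketch short at the cost of less specific error reporting.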
10.9 Unicode Support
The operating system provides both codeset converters and locales that support the Unicode and ISO 10646 standards. The codeset converter modules convert between other supported codesets and the UCS-2, UCS-4, and UTF-8 formats.
In addition to the country-specific and language-specific locales listed in Section 10.2, programmers can use the universal.UTF-8 locale to process characters in all languages by using the UCS-4 encoding format.
The operating system provides a function called fold_string_w(), which maps one Unicode string to another and performs the specified Unicode transformations. For more information on the fold_string_w() function, see fold_string_w(3).
For more information on Unicode support, see Unicode(5).
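The following sketch makes the relationship between the UCS-4 and UTF-8 formats concrete by encoding a single UCS-4 code point as UTF-8 by hand. It follows the UTF-8 bit layout as currently restricted by Unicode to code points up to U+10FFFF. The sketch is for illustration only. In practice, a program would use the supplied codeset converters through iconv(). The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS-4 code point as UTF-8. Writes 1-4
 * bytes to out and returns the byte count, or returns 0 if cp is a UTF-16
 * surrogate or lies beyond U+10FFFF. */
size_t ucs4_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                     /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                    /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)    /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                  /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 11110xxx plus three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                            /* beyond U+10FFFF */
}
```

For example, the euro sign U+20AC encodes as the three bytes E2 82 AC, and the Latin-1 character U+00E9 encodes as C3 A9.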
10.10 Support for the Euro Character
The operating system supports the euro, the currency now used by member countries of the European Economic and Monetary Union (EMU).
Locales that use the UTF-8 or Latin-9 (ISO8859-15) codesets support the euro character. Locales whose names include the @euro suffix also define the local currency sign to be the euro character and the international currency sign to be EUR.
See Section 10.2 for more information about locales.
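A program can check which currency a locale defines by reading the LC_MONETARY fields returned by the standard localeconv() function. The following sketch uses a hypothetical helper. It relies on the ISO C convention that int_curr_symbol holds the three-letter ISO 4217 currency code followed by a separator character.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: return 1 if the LC_MONETARY category of the current
 * locale uses EUR as its international currency symbol, 0 otherwise. */
int locale_uses_euro(void)
{
    const struct lconv *lc = localeconv();
    /* int_curr_symbol is the ISO 4217 code plus a separator, e.g. "EUR ". */
    return strncmp(lc->int_curr_symbol, "EUR", 3) == 0;
}
```

Given the @euro behavior described above, the helper would be expected to return 1 after setlocale(LC_ALL, "en_EU.UTF-8@euro") succeeds. In the default C locale, both currency symbols are empty strings, so the helper returns 0.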
The ISO Latin-9 codeset, which also includes the euro character, forms the basis of euro font support. The operating system includes both screen and PostScript outline fonts for this codeset. See ISO8859-15(5) for a list of these fonts.
The operating system does not provide native Unicode fonts that support the euro character. However, the X font library has been extended to combine a number of fonts into logical Unicode fonts for applications to use. The names of these logical fonts include the string ISO10646-1.
Printer support for the euro character is enabled by a generic PostScript print filter, wwpsof, which supports printing of file data in UTF-8 format. See wwpsof(8) for information on setting up printers with this print filter.
Keyboard entry of the euro character is supported by language-specific and keyboard-specific key sequences that are defined in keymaps (XKB format). The euro character can also be entered by using a Compose key sequence on keyboards that support a Compose key. The euro(5) reference page lists these key sequences.
Finally, the operating system provides codeset converters to convert file data between the various encoding formats that support the euro character. Specifically, codeset converters can convert file data between:
Unicode encoding formats and PC code-page formats that support the euro
Unicode encoding formats and ISO8859-15 encoding
See euro(5) for a more detailed discussion of the information in this section.
10.11 Internationalized Curses Library
The operating system supplies an internationalized Curses library that conforms to X/Open Curses, Issue 4 Version 2. This library provides functions for processing characters that span one or more bytes. These characters may be in either wide-character (wchar_t) or complex-character (cchar_t) format. The complex-character format represents a single logical character made up of multiple wide characters, some of which may be nonspacing characters.
For information on the syntax and effect of Curses interfaces, see curses(3). For a description of the enhancements provided by the internationalized Curses routines, and their relationship to previous Curses routines, see Writing Software for the International Market.
10.12 Internationalized Printing
The operating system supports the printing of plain text and PostScript files for a variety of languages and provides outline fonts for high-quality printing on PostScript printers. In addition to print filters for a variety of local-language printers, generic internationalized print filters are available for use with both Compaq and third-party printers. One of these filters, wwpsof, supports printing of local-language files on PostScript printers that do not include the required fonts.
For more information on internationalized printing features, see the i18n_printing(5), pcfof(8), and wwpsof(8) reference pages.
10.13 Graphical Internationalization Configuration Tool
The I18N Configuration tool, available through the Application Manager,
is one of the CDE System Administration Configuration applications.
The I18N
Configuration tool provides a graphical interface for the system administrator
to configure I18N-specific settings.
It also provides a convenient way to
see which countries, locales, fonts, and keymaps are supported on the host
system.
System administrators can also use this tool to remove unused fonts
and country support from the system.
10.14 Mail and 8-Bit Character Support
By default, the operating system provides support for 8-bit character encoding in mailx, dtmail, MH, and comsat.
For more information on these mail utilities, see mailx(1), dtmail(1), mh(1), and comsat(8).
10.15 Enhanced file Command
The file command is enhanced to recognize UCS-2 and UCS-4 encoding in any locale setting. For other encoding formats, the command recognizes the encoding of file data if that encoding is valid for the current locale setting. This command also has a jfile alias that, in any locale, can recognize DEC Kanji, Japanese EUC, Shift JIS, and 7-bit JIS encoding.
10.16 Internationalization for Graphical Applications
Motif Version 1.2.3 takes advantage of many of the internationalization features of X11R6 and the C library to support locales. Motif Version 1.2.3 also supports alternate input methods, which allow input of non-ISO Latin-1 keystrokes, and delivers an extensively rewritten XmText widget that supports multibyte and wide-character formats and the on-the-spot input style.
Motif supports multibyte and wide-character encoding through the internationalized X Library and C Library functions. In addition, the compound string routines include the X11R6 XFontSet component to allow the creation of localized strings.
The User Interface Language (UIL) supports the creation of localized UID files through the -s compile-time option on the UIL compiler, which causes the compiler to construct localized strings.
Alternate input methods can be specified by a resource on the VendorShell widget. Widgets that are parented by a Shell class widget can take advantage of this resource and register themselves as using a specific method for input.
The following sections discuss internationalization features of Motif
widgets and internationalized client applications.
10.16.1 Internationalized Motif Widgets
The following lists contain the widgets in the Motif Toolkit and in the Extensions to the Motif Toolkit that support local language characters, I/O capabilities, and local language message displays.
Note that the Motif UIL compiler is extended to support local language characters in UIL files.
Motif Toolkit
Command
FileSelectionBox
Label
MessageBox
SelectionBox
Text
TextField
Extensions to Motif Toolkit
ColorMix
CSText
Help
Structured Visual Navigation (SVN)
10.16.2 Internationalized CDE Clients
CDE is the default desktop for Tru64 UNIX. The following CDE clients are internationalized:
Application Manager
Calculator
Calendar
Create Action
File Manager
Front Panel
Help Viewer
Icon Editor
Login Screen
Message
Mailer
Print Manager
Style Manager
Terminal Emulator
Trash Can
By default, client applications run in the language set by the user
at the start of a CDE session.
However, users can also change the locale in a terminal emulation window and invoke an application in a language different from the session default.
10.16.3 Additional Internationalized Motif Clients
The operating system includes the following internationalized clients in addition to those common to all CDE implementations:
Differences
Keycap
DECterm
X Display Manager