This chapter describes the internationalization features of Tru64 UNIX. The first section provides a brief internationalization overview (Section 10.1), after which the following topics are discussed:
Supported languages (Section 10.2)
Using the localedef utility to create locales (Section 10.3)
The enhanced terminal subsystem for Asian languages (Section 10.4)
The enhanced wwconfig command (Section 10.5)
Enhanced sorting for Asian languages (Section 10.6)
The Multilingual Emacs editor (Section 10.7)
Support for user-defined characters in Chinese, Japanese, and Korean (Section 10.8)
Converting text from one codeset to another (Section 10.9)
Support for the Unicode and ISO 10646 standards (Section 10.10)
Support for the euro currency symbol (Section 10.11)
The internationalized Curses library (Section 10.12)
Internationalized printing (Section 10.13)
The graphical internationalization configuration tool (Section 10.14)
Support for 8-bit character encoding in mail programs (Section 10.15)
Enhancements to the file command (Section 10.16)
Internationalization for graphical applications (Section 10.17)
The term "internationalization" is formally defined by The Open Group as a "provision within a computer program of the capability of making itself adaptable to the requirements of different native languages, local customs, and coded character sets."
In practice, this means that internationalized programs can run in any supported locale without modification. A locale is a software environment that correctly handles the cultural conventions of a particular geographic area, such as China or France, and a language as it is used in that area. For example, when a user selects a Chinese locale, commands, system messages, and keystrokes can all use Chinese characters and be displayed in a way appropriate for Chinese.
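As a minimal sketch of what selecting a locale looks like from inside a program, the following C fragment uses the standard setlocale() and nl_langinfo() interfaces. The helper name enter_locale is hypothetical, and which locale names succeed depends on the locales installed on the system.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Switch every locale category of the process to the named locale.
 * An empty string ("") selects the locale from the environment
 * (LANG, LC_ALL, and so on).
 * Returns the codeset name of the new locale, or NULL if the locale
 * is not available (the previous locale then stays in effect).
 */
const char *enter_locale(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

A program would typically call enter_locale("") once at startup so that the user's environment decides the language, and fall back to the C locale if the call returns NULL.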
Tru64 UNIX is an internationalized operating system that not only allows users to interact with existing applications in their native language, but also supports a full set of application interfaces, referred to as the Worldwide Portability Interfaces (WPI), to enable software developers to write internationalized applications. The original code for these interfaces came from the Open Software Foundation (OSF) and has been enhanced.
The internationalization support in the operating system conforms to The Open Group's CAE specifications for system interfaces and headers (XSH Issue 5), curses (XCURSES Issue 4.2), and commands and utilities (XCU Issue 5). These specifications align with current POSIX and ISO C standards. This conformance ensures that commands, utilities, and libraries have been internationalized, and their corresponding message catalogs have been included in the base system.
In addition, the operating system supports the X Input Method (XIM) and X Output Method (XOM) to facilitate input of local language characters, text drawing, measurement, and interclient communication. These functions are implemented according to the X11R6.3 specification and include some problem corrections specified by X11R6.4.
Note that the operating system also supports a 32-bit wchar_t data type, which in turn enables support for a wide array of codesets, including the one defined by the ISO 10646 standard.
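The relationship between multibyte file data and wchar_t process code can be sketched with the ISO C mbstowcs() function. The helper name to_wide is hypothetical; the conversion it performs depends on the locale that is current when it is called.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multibyte string to wide characters using the current
 * locale. At most max - 1 wide characters are stored, followed by a
 * terminating L'\0'.
 * Returns the number of wide characters stored (terminator excluded),
 * or (size_t)-1 if max is 0 or the input is not valid in this locale.
 */
size_t to_wide(const char *mb, wchar_t *wide, size_t max)
{
    size_t n;

    if (max == 0)
        return (size_t)-1;
    n = mbstowcs(wide, mb, max - 1);   /* leave room for the terminator */
    if (n == (size_t)-1)
        return (size_t)-1;             /* invalid multibyte sequence */
    wide[n] = L'\0';
    return n;
}
```

Because each wide character occupies one wchar_t element regardless of how many bytes it used in the file, character-based operations such as indexing or truncation become straightforward after conversion.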
For more comprehensive information, see Writing Software for the International Market.
10.2 Supported Languages
Table 10-1 lists the languages supported by the operating system and their corresponding locales. Most locales are included in Worldwide Language Support (WLS) subsets that are optionally installed. Some, as indicated in the table, are part of the mandatory base operating system.
Locale names that include @ucs4 or UTF-8 support character encoding as defined by the ISO 10646 standard. The wchar_t data type is 32 bits in length, with zero padding of leading bits for those character values that do not require all 32 bits. The first 256 values of the Universal Character Set (UCS) define the same characters as defined for the ISO 8859-1 (Latin-1) character set. Therefore, the Tru64 UNIX implementation of the wchar_t data type is identical to UCS-4 process code for all ISO8859-1 locales. ISO8859-1 locales differ from UTF-8 locales with the same base name in terms of data file encoding and the fact that only the UTF-8 locales support the euro currency symbol.
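The identity between Latin-1 byte values and the first 256 UCS values can be illustrated with a short sketch. The function name latin1_to_ucs4 is hypothetical, and uint32_t stands in for a 32-bit process code; this is an illustration of the zero-padding rule described above, not the system's own conversion routine.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Widen ISO 8859-1 bytes to 32-bit UCS-4 values.
 * Each byte value is copied unchanged and the leading 24 bits are
 * zero, because UCS code points 0 through 255 match Latin-1.
 */
void latin1_to_ucs4(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i;

    for (i = 0; i < len; i++)
        out[i] = (uint32_t)in[i];
}
```

The same shortcut does not apply to other 8-bit codesets, whose upper 128 values map to different UCS code points and therefore need a conversion table.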
The English locale name that includes cp850 supports character encoding in PC code-page format.
For the most up-to-date list of supported languages and locales, refer to the l10n_intro(5) reference page. This book may not be updated for minor functional releases of the operating system, and locales are sometimes added for such releases.
Table 10-1: Languages and Locales
Language | Locale Name |
Catalan |
|
Simplified Chinese (PRC) |
|
Traditional Chinese (Hong Kong) |
|
Traditional Chinese (Taiwan) |
|
Czech |
|
Danish |
|
Dutch |
|
Belgian Dutch |
|
US English/ASCII | C (POSIX) [Footnote 2] |
US English |
|
GB English |
|
European | en_EU.UTF-8@euro [Footnote 4] |
Finnish |
|
French |
|
Belgian French |
|
Canadian French |
|
Swiss French |
|
German |
|
Swiss German |
|
Greek |
|
Hebrew |
|
Hungarian |
|
Icelandic |
|
Italian |
|
Japanese |
|
Korean |
|
Lithuanian |
|
Norwegian |
|
Polish |
|
Portuguese |
|
Russian |
|
Slovak |
|
Slovene |
|
Spanish |
|
Swedish |
|
Thai |
|
Turkish |
|
Note that you can switch languages or character sets as necessary and can even operate multiple processes in different languages or codesets in the same system at the same time.
For more information on a particular coded character set, such as ISO8859-9, see the reference page with the same name. For more information about UCS-4 and UTF-8 encoding, see Unicode(5). For more information about PC code pages, see code_page(5).
10.3 Locale Creation
The localedef utility allows programmers to create their own locales, compile the locale source code, and generate a unique name for the new locale.
For more information on creating locales, see Writing Software for the International Market.
10.4 Enhanced Terminal Subsystem for Asian Languages
The base tty terminal driver subsystem is extended to include additional BSD line disciplines and STREAMS terminal driver modules for processing data in Chinese, Japanese, Korean, and Thai. For example, the enhanced terminal subsystem supports the following capabilities for these languages:
Japanese Kana-Kanji conversion input method
Character-based line processing in cooked mode
Input line history and editing (BSD line discipline only)
Software on-demand-loading for user-defined characters
Conversion between terminal code and application code
For more information about the Asian and Thai terminal subsystems, see the atty(7), ttty(7), and stty(1) reference pages.
10.5 Enhanced wwconfig Command
The wwconfig command configures terminal (tty) options for Asian countries. The command has been enhanced to include the following:
The -vmunix and -kernel options, which display I18N modules that are statically linked into /vmunix or currently used by the running kernel
The -pty option, which specifies the use of a BSD or STREAMS pseudo-driver for remote logins and telnet sessions
The -config option, which specifies a kernel configuration file
The -utx option, which specifies the addition of Kana-Kanji, On Demand Loading, or Software Phrase Input as Asian tty driver options
The -code option, which specifies the addition of BIG5, Mitac, Chinese mapping support, and UTF-8 character set support as Asian tty driver options
The -[no]thai option, which includes or excludes the Thai tty driver
The -utxnum option, which specifies the number of utx pseudo-devices to be created
For information on the use and specification of these options, see the wwconfig(8) reference page.
10.6 Enhanced Sorting for Asian Languages
The operating system supports the asort utility, an extension of the sort command, which allows characters of ideographic languages, such as Chinese and Japanese, to be sorted according to multiple collation sequences. For more information on the asort utility, see asort(1).
10.7 Multilingual Emacs Editor
The operating system supports the Multilingual Emacs editor (MULE) for Asian languages. See mule(1) for more information.
10.8 Support for User-Defined Characters
The operating system provides support for creating user-defined characters (UDCs) in Chinese, Japanese, and Korean, so that users can create and define character fonts and their attributes, including bitmap fonts, with the cedit and cgen utilities.
Font rendering facilities are available so that X clients can use UDC databases through the X server or font server to obtain bitmap fonts for user-defined characters.
For more information on user-defined characters, see Writing Software for the International Market, cedit(1), and cgen(1).
10.9 Codeset Conversion
The operating system includes the iconv utility and the iconv_open(), iconv(), and iconv_close() functions, which convert text from one codeset to another, thereby assisting programmers in the writing of international applications. For use with these interfaces, the operating system includes a large set of codeset converters.
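The three functions named above are normally used together: open a conversion descriptor, convert, then close. The following is a minimal sketch under stated assumptions. The helper name convert_codeset is hypothetical, and the codeset names passed to it must match converters that are actually installed; the spelling of those names varies between systems, so see iconv_intro(5) for the names valid on a given installation.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert the NUL-terminated string `in` from codeset `from` to
 * codeset `to`, writing a NUL-terminated result into `out`.
 * Returns the number of bytes written (terminator excluded), or
 * (size_t)-1 if no converter exists, the input is invalid, or the
 * output buffer is too small.
 */
size_t convert_codeset(const char *from, const char *to,
                       const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;        /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    size_t rc;

    if (outsize == 0)
        return (size_t)-1;
    outleft = outsize - 1;         /* reserve room for the terminator */

    cd = iconv_open(to, from);     /* note the order: target first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)          /* flush any pending shift sequence */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    *outp = '\0';
    return (size_t)(outp - out);
}
```

A production version would loop on E2BIG to grow the output buffer and would decide how to handle EILSEQ; this sketch simply reports failure in both cases.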
A new en_US.UTF-8 X locale database file contains font definitions that include all the various fonts used with the operating system. Thus, applications running under the en_US.UTF-8 locale can display all the font characters installed with Worldwide Language Support (WLS). Applications running under the Asian locales display all of the WLS installed fonts, except for ISO8859-2, -4, -5, -7, -8, -9, and TACTIS.
In addition to conversion between different codesets for the same language, these converters support conversion between different Unicode formats, such as UCS-2, UCS-4, and UTF-8. There are also codeset converters that handle the most commonly used PC code-page formats.
Codeset conversion is also used by the printing subsystem and utilities, such as man, to allow processing of files in different languages and encoding formats. Additionally, codeset conversion is implemented in mail utilities for mail interchange with systems using different codesets and in the X Window System Toolkit for text input, drawing, and interclient communication.
For more information on codeset conversion, see the iconv_intro(5) reference page. See the Unicode(5) and code_page(5) reference pages for a discussion of converters for Unicode encoding formats and PC code-page formats, respectively.
10.10 Unicode Support
The operating system provides both codeset converters and locales that support the Unicode and ISO 10646 standards. The codeset converter modules convert between other supported codesets and UCS-2, UCS-4, and UTF-8 formats.
In addition to the country-specific and language-specific locales listed in Section 10.2, programmers can use the universal.UTF-8 locale to process characters in all languages by using UCS-4 encoding format.
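To make the relationship between UTF-8 file data and UCS-4 values concrete, the following sketch decodes a single UTF-8 sequence by hand. The function name utf8_to_ucs4 is hypothetical and the code is illustrative only; applications would normally rely on the locale-aware C library functions or the codeset converters instead. As a simplification, it does not reject overlong encodings or surrogate values.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence starting at s, reading at most len bytes.
 * Stores the UCS-4 value in *cp and returns the number of bytes
 * consumed, or 0 if the sequence is truncated or malformed.
 */
size_t utf8_to_ucs4(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t value;

    if (len == 0)
        return 0;

    /* The lead byte determines the sequence length and payload bits. */
    if (s[0] < 0x80) {
        value = s[0];
        need = 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        value = s[0] & 0x1F;
        need = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        value = s[0] & 0x0F;
        need = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        value = s[0] & 0x07;
        need = 4;
    } else {
        return 0;                  /* stray continuation or invalid byte */
    }
    if (need > len)
        return 0;                  /* truncated sequence */

    /* Each continuation byte contributes six more bits. */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = value;
    return need;
}
```

For instance, the three bytes 0xE2 0x82 0xAC decode to the single UCS-4 value 0x20AC, the euro sign.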
The operating system provides a function called fold_string_w(), which maps one Unicode string to another and performs the specified Unicode transformations. For more information on the fold_string_w() function, see fold_string_w(3). For more information on Unicode support, see Unicode(5).
10.11 Support for the Euro Character
The operating system supports the new euro currency now being used by member countries of the European Economic and Monetary Union (EMU).
Locales that use the UTF-8 or Latin-9 (ISO8859-15) codesets support the euro character. Those locales whose names include the @euro suffix also define the local currency sign to be the euro character and the international currency sign to be EUR. See Section 10.2 for more information about locales.
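A program can read the currency signs that a locale defines through the ISO C localeconv() function. The sketch below is an illustration under stated assumptions: the helper name currency_signs is hypothetical, and the locale passed to it (for example, en_EU.UTF-8@euro from Table 10-1) must be installed for the call to succeed.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Write the currency signs of the named locale into buf in the form
 * "local/international".
 * Returns 0 on success or -1 if the locale is not available.
 * Note: ISO C allows int_curr_symbol to carry a fourth character that
 * separates the symbol from the amount, so the international part may
 * be longer than three characters.
 */
int currency_signs(const char *locale, char *buf, size_t bufsize)
{
    const struct lconv *lc;

    if (setlocale(LC_MONETARY, locale) == NULL)
        return -1;
    lc = localeconv();
    snprintf(buf, bufsize, "%s/%s",
             lc->currency_symbol, lc->int_curr_symbol);
    return 0;
}
```

In the C locale both fields are empty strings, so the result is just the separator; in an @euro locale the local part is expected to hold the euro character in that locale's codeset.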
The ISO Latin-9 codeset forms the basis of euro font support. The operating system includes both screen and PostScript outline fonts for this codeset. See ISO8859-15(5) for a list of these fonts.
The operating system does not provide native Unicode fonts that support the euro character. However, the X font library has been extended to combine a number of fonts together to form logical Unicode fonts for applications to use. The names of these logical fonts include the string ISO10646-1.
Printer support for the euro character is enabled by a generic PostScript print filter, wwpsof, which supports printing of file data in UTF-8 or Latin-9 format. See wwpsof(8) for information on setting up printers with this print filter.
In the UTF-8 and Latin-9 locales, keyboard entry of the euro character is supported by language-specific and keyboard-specific key sequences that are defined in keymaps (XKB format). The euro character also can be entered by using a Compose key sequence on those keyboards that support a Compose key. The euro(5) reference page lists these key sequences.
Finally, the operating system provides codeset converters to convert file data between the various encoding formats that support the euro character. Specifically, codeset converters can convert file data between:
Unicode encoding formats and PC code-page formats that support the euro
Unicode encoding formats and ISO8859-15 encoding
See euro(5) for a more detailed discussion of the information in this section.
10.12 Internationalized Curses Library
The operating system supplies an internationalized Curses library in conformance with X/Open Curses, Issue 4 Version 2. This library provides functions for processing characters that span one or multiple bytes. These characters may be in either wide-character (wchar_t) or complex-character (cchar_t) formats. The complex-character format provides for a single logical character made up of multiple wide characters. Some of the components of the complex character may be nonspacing characters.
For information on the syntax and effect of Curses interfaces, see curses(3). For a description of the enhancements provided by the internationalized Curses routines, and their relationship to previous Curses routines, see Writing Software for the International Market.
10.13 Internationalized Printing
The operating system supports the printing of plain text and PostScript files for a variety of languages and provides outline fonts for high quality printing on PostScript printers. In addition to print filters for a variety of local-language printers, generic internationalized print filters are available for use with both Compaq and third-party printers.
One of these filters, wwpsof, supports printing of local-language files on PostScript printers that do not include the required fonts.
For more information on internationalized printing features, see the i18n_printing(5), pcfof(8), and wwpsof(8) reference pages.
10.14 Graphical Internationalization Configuration Tool
The I18N Configuration tool, available through the Application Manager, is one of the CDE System Administration Configuration applications.
The I18N Configuration tool provides a graphical interface for the system administrator to configure I18N-specific settings. It also provides a convenient way to see which countries, locales, fonts, and keymaps are supported on the host system. System administrators can also use this tool to remove unused fonts and country support from the system.
10.15 Mail and 8-Bit Character Support
By default, the operating system provides support for 8-bit character encoding in mailx, dtmail, MH, and comsat.
For more information on these mail utilities, see mailx(1), dtmail(1), mh(1), and comsat(8).
10.16 Enhanced file Command
The file command is enhanced to recognize UCS-2 and UCS-4 encoding in any locale setting. For other encoding formats, the command recognizes file data encoding if it is valid for the current locale setting. This command also has a jfile alias that, in any locale, can recognize DEC Kanji, Japanese EUC, Shift JIS, and 7-bit JIS encoding.
10.17 Internationalization for Graphical Applications
Motif Version 1.2.3 takes advantage of many of the internationalization features of X11R6 and the C library to support locales. Motif Version 1.2.3 also supports the use of alternate input methods, which allow input of non-ISO Latin-1 keystrokes, and delivers an extensively rewritten XmText widget, which supports multibyte and wide-character formats and the on-the-spot input style.
Motif supports multibyte and wide-character encoding through the use of the internationalized X Library functions and C Library functions. In addition, the compound string routines include the X11R6 XFontSet component to allow for the creation of localized strings.
The User Interface Language (UIL) supports the creation of localized UID files through the -s compile-time option on the UIL compiler, which causes the compiler to construct localized strings.
Alternate input methods can be specified by a resource on the VendorShell widget. Widgets that are parented by a Shell class widget can take advantage of this resource and register themselves as using a specific method for input.
The following sections discuss internationalization features of Motif widgets and internationalized client applications.
10.17.1 Internationalized Motif Widgets
The following lists contain the widgets in the Motif Toolkit and in the Extensions to the Motif Toolkit that support local language characters, I/O capabilities, and local language message displays.
Note that the Motif UIL compiler is extended to support local language characters in UIL files.
Motif Toolkit
Command
FileSelectionBox
Label
MessageBox
SelectionBox
Text
TextField
Extensions to Motif Toolkit
ColorMix
CSText
Help
Structured Visual Navigation (SVN)
10.17.2 Internationalized CDE Clients
CDE is the default desktop for Tru64 UNIX. The following CDE clients are internationalized:
Application Manager
Calculator
Calendar
Create Action
File Manager
Front Panel
Help Viewer
Icon Editor
Login Screen
Message
Mailer
Print Manager
Style Manager
Terminal Emulator
Trash Can
By default, client applications run in the language set by the user at the start of a CDE session. However, users can also change locale in a terminal emulation window and invoke an application in a language different from the session default.
10.17.3 Additional Internationalized Motif Clients
The operating system includes the following internationalized clients in addition to those common to all CDE implementations:
Differences
Keycap
DECterm
X Display Manager