Common Desktop Environment: Internationalization Programmer's Guide

1 Introduction to Internationalization

Contents of Chapter:

Overview of Internationalization

Current State of Internationalization
Internationalization Standards
Common Internationalization System

Locales

Fonts, Font Sets, and Font Lists

Font Specification
Font Set Specification
Font List Specification
Base Font Name List Specification

Text Drawing

Input Methods

Preedit Area
Status Area
Auxiliary Area
MainWindow Area
Focus Area

Interclient Communications Conventions (ICCC)

Internationalization is the design of computer systems and applications for users around the world. Such users speak different languages and may have different requirements for the functionality and user interface of the systems they operate. Despite these differences, users want to implement enterprise-wide applications that run at their sites worldwide. These applications must interoperate across country boundaries, run on a variety of hardware configurations from multiple vendors, and be localized to meet local users' needs. This open, distributed computing environment is the motivation behind common open software environments. The internationalization technology identified within this specification provides these benefits to a global market.

Overview of Internationalization

Multiple environments may exist within a common open system for support of different national languages. Each of these national environments is called a locale, which encompasses the language, its characters, fonts, and the customs used to input and format data. The Common Desktop Environment is fully internationalized such that any application can run using any locale installed in the system.

A locale defines the behavior of a program at run time according to the language and cultural conventions of a user's geographical area. Throughout the system, locales affect the following:

Encoding and processing of text data
Identifying the language and encoding of resource files and their text values
Rendering and layout of text strings
Interchanging text that is used for interclient text communication
Selecting the input method (which code set will be generated) and the processing of text data
Encoding and decoding for interclient text communication
Bitmap/icon files
Actions and file types
User Interface Definition (UID) files

An internationalized application contains no code that is dependent on the user's locale, the characters needed to represent that locale, or any formats (such as date and currency) that the user expects to see and interact with. The desktop accomplishes this by separating language- and culture-dependent information from the application and saving it outside the application.

Figure 1-1 shows the kinds of information that should be external to an application to simplify internationalization.

Figure 1-1 Information external to the application

By keeping the language- and culture-dependent information separate from the application source code, the application does not need to be rewritten or recompiled to be marketed in different countries. Instead, the only requirement is for the external information to be localized to accommodate local language and customs.

An internationalized application is also adaptable to the requirements of different native languages, local customs, and character-string encodings. The process of adapting the operation to a particular native language, local custom, or string encoding is called localization. A goal of internationalization is to permit localization without program source modifications or recompilation.

For a quick overview of internationalization, refer to X/Open CAE Specification System Interface Definition, Issue 4, X/Open Company Ltd., 1992, ISBN: 1-872630-46-4.

Current State of Internationalization

Previously, the industry supplied many variants of internationalization, ranging from proprietary functions to the new set of standard functions published by X/Open. There have also been different levels of enabling, such as simple ASCII support, Latin/European support, Asian multibyte support, and Arabic/Hebrew bidirectional support.

The interfaces defined within the X/Open specification are capable of supporting a large set of languages and territories, including:

Script: Languages or territories
Latin: Americas, Eastern/Western Europe
Greek: Greece
Turkish: Turkey
East Asia: Japanese, Korean, and Chinese
Indic: Thai
Bidirectional: Arabic and Hebrew

Furthermore, the goal of the Common Desktop Environment is that localization of these technologies (translation of messages and documentation and other adaptation for local needs) be done in a consistent way, so that a supported user anywhere in the world will find the same common localized environment from vendor to vendor. End users and administrators can expect a consistent set of localization features that provide a complete application environment for support of global software.

Internationalization Standards

Through the work of many companies, the functionality of the internationalization application program interface has been standardized over time to include additional requirements and languages, particularly those of East Asia. This work has been centered primarily in the Portable Operating System Interface for Computer Environments (POSIX) and X/Open specifications. The original X/Open specification was published in the second edition of the X/Open Portability Guide (XPG2) and was based on the Native Language Support product released by Hewlett-Packard. The latest published X/Open internationalization standard is referred to as XPG4.

It is important that each layer within the desktop use the proper set of standard interfaces defined for internationalization to ensure end users get a consistent, localized interface. The definition of a locale and the common open set of locale-dependent functions are based on the following specifications:

X Window System, The Complete Reference to Xlib, X Protocol, ICCCM, XLFD, X Version 11, Release 5, Digital Press, 1992, ISBN 1-55558-088-2.
ANSI/IEEE Standard Portable Operating System Interface for Computer Environments, IEEE.
OSF/Motif Programmer's Reference, Revision 1.2, Open Software Foundation, Prentice Hall, 1992, ISBN 0-13-643115-1.
X/Open CAE Specification Commands and Utilities, Issue 4, X/Open Company Ltd., 1992, ISBN 1-872630-48-0.

Within this environment, software developers can expect to develop worldwide applications that are portable, can interoperate across distributed systems (even from different vendors), and can meet the diverse language and cultural requirements of multinational users supported by the desktop standard locales.

Common Internationalization System

Figure 1-2 shows a view of how internationalization is pervasive across a specific single-host system. The goal is that the applications (clients) are built to be shipped worldwide for the set of locales supported in the underlying system. Using standard interfaces improves access to global markets and minimizes the amount of localization work needed by application developers. In addition, country representatives can be assured of consistent localization within systems adhering to the principles of the desktop.

Figure 1-2 Common internationalized system

Locales

Most single-display clients operate in a single locale that is determined at run time, usually from the $LANG environment variable or from the xnlLanguage resource. Locale environment variables, such as LC_ALL, LC_CTYPE, and LANG, can be used to control the environment. See "Xt Locale Management" for more information.

The LC_CTYPE category of the locale is used by the environment to identify the locale-specific features used at run time. The fonts and input method loaded by the toolkit are determined by the LC_CTYPE category.
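As a small illustration using only the standard C library (not Xt), a program can ask which locale is in effect for LC_CTYPE and how many bytes a single character may occupy in that locale's code set. The helper name `describe_ctype` is hypothetical; passing NULL to setlocale() queries a category without changing it.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: reports the LC_CTYPE locale currently in effect and
 * the maximum number of bytes in one multibyte character of its code set. */
const char *describe_ctype(size_t *max_bytes) {
    /* A NULL locale argument queries the category without modifying it. */
    const char *name = setlocale(LC_CTYPE, NULL);
    if (max_bytes != NULL)
        *max_bytes = MB_CUR_MAX; /* 1 for single-byte code sets such as "C" */
    return name;
}
```

A multibyte locale such as a Japanese EUC locale typically reports a value greater than 1, which is one reason the fonts and input method must follow LC_CTYPE.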

Programs that are enabled for internationalization are expected to call the XtSetLanguageProc() function (which calls setlocale() by default) to set the locale desired by the user. None of the libraries calls setlocale() to set the locale, so it is the responsibility of the application to call XtSetLanguageProc() with either a specific locale or some value loaded at run time. Internationalized applications that do not use XtSetLanguageProc() should obtain the locale name from the following sources, in priority order, and pass it to the setlocale() function:

A command-line option
A resource
The empty string ("")

The empty string makes the setlocale() function use the $LC_* and $LANG environment variables to determine locale settings. Specifically, setlocale(LC_ALL, "") takes the setting for each locale category from the environment variables in the order shown in Table 1-1.

Table 1-1 Locale Categories
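POSIX resolves each category in a fixed order: LC_ALL overrides everything, then the category's own variable (such as LC_CTYPE), then LANG, and finally an implementation-defined default, typically the C locale; empty values count as unset. The sketch below models that lookup with plain string arguments instead of reading the environment, so it can be exercised without changing process state. `resolve_category` is a hypothetical name; the real resolution is performed inside setlocale().

```c
#include <stddef.h>

/* An environment value counts as "set" only if it is present and not empty. */
static int is_set(const char *value) {
    return value != NULL && value[0] != '\0';
}

/* Hypothetical model of the lookup setlocale(category, "") performs for one
 * category. The arguments stand in for getenv("LC_ALL"), the category's own
 * variable (for example getenv("LC_CTYPE")), and getenv("LANG"). */
const char *resolve_category(const char *lc_all, const char *lc_category,
                             const char *lang) {
    if (is_set(lc_all))
        return lc_all;      /* LC_ALL overrides every category */
    if (is_set(lc_category))
        return lc_category; /* then the category-specific variable */
    if (is_set(lang))
        return lang;        /* then LANG */
    return "C";             /* assumed default; POSIX leaves it implementation-defined */
}
```

For example, resolve_category(getenv("LC_ALL"), getenv("LC_CTYPE"), getenv("LANG")) approximates the locale that LC_CTYPE will receive.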

The toolkit already defines a standard command-line option (-lang) and a resource (xnlLanguage). The resource value can also be set in the server's RESOURCE_MANAGER property, in which case it may affect all clients that connect to that server.
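For applications that call setlocale() directly instead of XtSetLanguageProc(), the priority order given earlier (command-line option, then resource, then the empty string) could be coded as follows. This is a minimal sketch: `choose_locale_name` and `apply_locale` are hypothetical names, and the caller is assumed to have already extracted the command-line value and the xnlLanguage resource value.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: picks the locale name in priority order
 * command-line option > resource > "" (let setlocale() read $LC_* and $LANG). */
const char *choose_locale_name(const char *lang_option, const char *lang_resource) {
    if (lang_option != NULL && lang_option[0] != '\0')
        return lang_option;
    if (lang_resource != NULL && lang_resource[0] != '\0')
        return lang_resource;
    return "";
}

/* Sets every locale category. Returns 0 on success, or -1 if setlocale()
 * rejects the chosen name (for example, a locale that is not installed). */
int apply_locale(const char *lang_option, const char *lang_resource) {
    const char *name = choose_locale_name(lang_option, lang_resource);
    if (setlocale(LC_ALL, name) == NULL) {
        fprintf(stderr, "locale \"%s\" is not supported\n", name);
        return -1;
    }
    return 0;
}
```

Checking the return value of setlocale() matters here: a misspelled or uninstalled locale name leaves the program in its previous locale.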

Fonts, Font Sets, and Font Lists

All X clients use fonts for drawing text. The basic object used in drawing text is XFontStruct, which identifies the font that contains the images to be drawn.

The desktop already supports fonts by way of the XFontStruct data structure defined by Xlib; yet, the encoding of the characters within the font must be known to an internationalized application. To communicate this information, the program expects that all fonts at the server are identified by an X Logical Font Description (XLFD) name. The XLFD name enables users to describe both the base characteristics and the charset (encoding of font glyphs). The term charset is used to denote the encoding of glyphs within the font, while the term code set means the encoding of characters within the locale. The charset for a given font is determined by the CharSetRegistry and CharSetEncoding fields of the XLFD name. Text and symbols are drawn as defined by the codes in the fonts.
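Because a full XLFD name has 14 fields, each introduced by a hyphen, the charset of a font can be read from its name by taking the last two fields. The helper below is a hypothetical sketch of that rule; it returns -1 for names that are valid patterns but not fully specified XLFD names, such as -*-r-*-14-*iso8859-1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies "CharSetRegistry-CharSetEncoding" (fields 13
 * and 14 of a full XLFD name) into out. Returns 0 on success, or -1 if the
 * name does not have exactly 14 fields or out is too small. */
int xlfd_charset(const char *name, char *out, size_t outlen) {
    const char *registry = NULL;
    int fields = 0;

    for (const char *p = name; *p != '\0'; p++) {
        if (*p == '-' && ++fields == 13)
            registry = p + 1; /* field 13 is CharSetRegistry */
    }
    if (fields != 14 || registry == NULL)
        return -1;
    if (strlen(registry) + 1 > outlen)
        return -1;
    strcpy(out, registry); /* registry, '-', and encoding */
    return 0;
}
```

For -dt-application-medium-r-normal-serif-*-*-*-*-p-*-iso8859-1, the helper yields iso8859-1.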

A font set (such as the XFontSet data structure defined by Xlib) is a collection of one or more fonts that enables all characters defined for a given locale to be drawn. Internationalized applications may be required to draw text encoded in the code sets of the locale where the value of an encoded character is not identical to the glyph index. Additionally, multiple fonts may be required to render all characters of the locale using one or more fonts whose encodings may be different from the code set of the locale. Since both code sets and charsets may vary from locale to locale, the concept of a font set is introduced through XFontSet.
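To see why one font is not enough, consider the EUC-JP code set commonly used by Japanese locales: bytes below 0x80 are single-byte (ASCII or JIS-Roman) characters, the byte 0x8E introduces a half-width katakana byte from JIS X 0201, 0x8F introduces a two-byte JIS X 0212 character, and other byte pairs in the range 0xA1 through 0xFE encode JIS X 0208. The sketch below splits a string into runs that would each be drawn with a different font of the font set. It only illustrates the idea: Xlib performs this conversion internally when XmbDrawString() is called with an XFontSet, and `charset_id`, `struct run`, and `split_euc_jp` are hypothetical names.

```c
#include <stddef.h>

enum charset_id { CS_ASCII, CS_JISX0201_KANA, CS_JISX0208, CS_JISX0212, CS_INVALID };

struct run {
    enum charset_id charset; /* font of the font set that would draw this run */
    size_t start;            /* byte offset of the run */
    size_t length;           /* run length in bytes */
};

/* Classifies the character at s[i] and stores its byte length in *len.
 * Trail bytes are not validated in this sketch. */
static enum charset_id classify(const unsigned char *s, size_t n, size_t i, size_t *len) {
    unsigned char b = s[i];
    if (b < 0x80) { *len = 1; return CS_ASCII; }
    if (b == 0x8E && i + 1 < n) { *len = 2; return CS_JISX0201_KANA; }
    if (b == 0x8F && i + 2 < n) { *len = 3; return CS_JISX0212; }
    if (b >= 0xA1 && b <= 0xFE && i + 1 < n) { *len = 2; return CS_JISX0208; }
    *len = 1;
    return CS_INVALID; /* truncated or malformed sequence */
}

/* Splits n bytes of EUC-JP text into runs of a single charset each.
 * Returns the number of runs written (at most max_runs). */
size_t split_euc_jp(const unsigned char *s, size_t n, struct run *runs, size_t max_runs) {
    size_t count = 0, i = 0;
    while (i < n) {
        size_t len;
        enum charset_id cs = classify(s, n, i, &len);
        if (count > 0 && runs[count - 1].charset == cs) {
            runs[count - 1].length += len; /* same charset: extend the run */
        } else {
            if (count == max_runs)
                break;                     /* caller's array is full */
            runs[count].charset = cs;
            runs[count].start = i;
            runs[count].length = len;
            count++;
        }
        i += len;
    }
    return count;
}
```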

While fonts are identified by their XLFD name, font sets are identified by a list of XLFD names. The list can consist of one or more XLFD names with the exception that only the base characteristics are significant; the encoding of the desired fonts is determined from the locale. Any charsets specified in the XLFD base name list are ignored and users need only concentrate on specifying the base characteristics, such as point size, style, and weight. A font set is said to be locale-sensitive and is used to draw text that is encoded in the code set of the locale. Internationalized applications should use font sets instead of font structs to render text data.

A font list is a libXm Toolkit object that is a collection of one or more font list entries. Font sets can be specified within a font list. Each font list entry designates either a font or a font set and is tagged with a name. If there is no tag in a font list entry, a default tag (XmFONTLIST_DEFAULT_TAG) is used. The font list can be used with the XmString functions found in the libXm Toolkit library. A font list enables drawing of compound strings that consist of one or more segments, each identified by a tag. This allows the drawing of strings with different base characteristics (for example, drawing a bold and italic string within one operation). Some non-XmString-based widgets, such as XmText of the libXm library, use only one font list entry in the font list. Motif font lists use the suffix : (colon) to identify a font set within a font list.

The user is generally asked to specify either a font list (which may contain either a font or font set) or a font set. In an internationalized environment, the user must be able to specify fonts that are independent of the code set because the specification can be used under various locales with different code sets than the character set (charset) of the font. Therefore, it is recommended that all font lists be specified with a font set.

Font Specification

The font specification can be either an X Logical Font Description (XLFD) name or an alias for the XLFD name. For example, the following are valid font specifications for a 14-point font:

-dt-application-medium-r-normal-serif-*-*-*-*-p-*-iso8859-1

-*-r-*-14-*iso8859-1

Font Set Specification

The font set specification is a list of names (XLFD names or their aliases) and is sometimes called a base name list. All names are separated by commas, with any blank spaces before or after the comma being ignored. Pattern-matching (wildcard) characters can be specified to help shorten XLFD names.
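The comma and blank-space rule can be illustrated with a small splitter. In practice the whole list is passed as one string to Xlib's XCreateFontSet(), so applications rarely split it themselves; `split_base_names` is a hypothetical helper that only shows how the rule treats blanks.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits a comma-separated base name list in place,
 * dropping blank space on either side of each comma. Stores pointers to the
 * names in names[] and returns how many were stored. */
size_t split_base_names(char *list, char **names, size_t max_names) {
    size_t count = 0;
    char *p = list;

    while (count < max_names) {
        char *comma = strchr(p, ',');
        char *end = comma != NULL ? comma : p + strlen(p);

        while (p < end && isspace((unsigned char)*p))
            p++;                         /* skip blanks after the previous comma */
        while (end > p && isspace((unsigned char)end[-1]))
            end--;                       /* trim blanks before the next comma */
        *end = '\0';
        names[count++] = p;

        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return count;
}
```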

Remember that the fonts selected by a font set specification depend on the locale that is running. For example, the ja_JP Japanese locale defines three fonts (character sets) necessary to display all of its characters; the following identifies the set of Mincho fonts needed.

Example of full XLFD name list:

```
-dt-mincho-medium-r-normal--14-*-*-*-m-*-jisx0201.1976-0,
-dt-mincho-medium-r-normal--28-*-*-*-m-*-jisx0208.1983-0:
```

Example of single XLFD pattern name:
```
-dt-*-medium-*-24-*-m-*:
```

The preceding two cases can be used with a Japanese locale as long as fonts exist that match the base name list.

Font List Specification

A font list specification can consist of one or more entries, each of which can be either a font specification or a font set specification.

Each entry can be tagged with a name that is used when drawing a compound string. The tags are application-defined and are usually names representing the expected style of font; for example, bold, italic, bigbold. A null tag is used to denote the default entry and is associated with the XmFONTLIST_DEFAULT_TAG identifier used in XmString functions.

A font tag is identified when it is prefixed with an = (equal sign); for example, =bigbold (this matches the first font defined at the server). If an = is specified but there is no name following it, the specification is considered the default font list entry.

A font set tag is identified when it is prefixed with a : (colon); for example, :bigbold (this matches the first server set of fonts that satisfy the locale). If a : is specified but no name is given, the specification is considered the default font list entry. Within a font list entry specification, a base name list is separated by ; (semicolons) rather than by , (commas).
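The tagging rules can be made concrete with a parser for a single font list entry. This is an illustrative sketch only: Motif parses font list resources through its own resource converter, and `struct fontlist_entry` and `parse_fontlist_entry` are hypothetical. It assumes the entry has already been separated from its neighbors and trimmed, and that base names contain no = or : characters.

```c
#include <stddef.h>
#include <string.h>

enum entry_kind { ENTRY_FONT, ENTRY_FONT_SET };

struct fontlist_entry {
    enum entry_kind kind;
    const char *names; /* one font name, or a ';'-separated base name list */
    const char *tag;   /* "" denotes the default entry (XmFONTLIST_DEFAULT_TAG) */
    size_t name_count; /* number of base names in the entry */
};

/* Hypothetical parser for one font list entry, modified in place.
 * "name=tag" is a font, "name;name:tag" is a font set, and an entry with
 * neither separator is treated as an untagged (default) font. */
void parse_fontlist_entry(char *entry, struct fontlist_entry *out) {
    char *sep = entry + strcspn(entry, "=:");

    out->kind = (*sep == ':') ? ENTRY_FONT_SET : ENTRY_FONT;
    out->tag = "";
    if (*sep != '\0') {
        *sep = '\0';
        out->tag = sep + 1; /* empty after '=' or ':' means the default entry */
    }
    out->names = entry;

    /* Base names inside an entry are separated by semicolons. */
    out->name_count = 1;
    for (const char *p = entry; *p != '\0'; p++)
        if (*p == ';')
            out->name_count++;
}
```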

Example Font List Specification

For the Latin 1 locales, enter:

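```
-*-r-*-14-*: ,\
-*-b-*-18-*:bigbold
```

The first entry, whose colon is not followed by a tag, is the default font list entry; the second entry, tagged bigbold, selects large bold fonts.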

Base Font Name List Specification

The base font name list is a list of base font names associated with a font set as defined by the locale. The base font names are in a comma-separated list and are assumed to be characters from the portable character set; otherwise, the result is undefined. Blank space immediately on either side of a separating comma is ignored.
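The portable-character-set requirement is easy to check before a base name list is handed to Xlib. The X Portable Character Set consists of the letters a-z and A-Z, the digits 0-9, the printable ASCII punctuation characters, space, tab, and newline. The helper below is a hypothetical sketch that tests a string against that set by byte value, so its result does not depend on the current locale.

```c
#include <stdbool.h>

/* Hypothetical helper: true if every byte of s belongs to the X Portable
 * Character Set (printable ASCII 0x20-0x7E, tab, and newline). */
bool is_portable_string(const char *s) {
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (!((c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\n'))
            return false;
    }
    return true;
}
```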

Use of XLFD font names permits international applications to obtain the fonts needed for a variety of locales from a single locale-independent base font name. The single base font name specifies a family of fonts whose members are encoded in the various charsets needed by the locales of interest.

An XLFD base font name can explicitly name the font's charset needed for the locale. This enables the user to specify an exact font for use with a charset required by a locale, fully controlling the font selection.

If a base font name is not an XLFD name, an attempt is made to obtain an XLFD name from the font properties for the font.

The following algorithm is used to select the fonts that are used to display text with font sets.

For each charset required by the locale, the base font name list is searched for the first of the following cases that names a set of fonts that exist at the server.

The first XLFD-conforming base font name that specifies the required charset or a superset of the required charset in its CharSetRegistry and CharSetEncoding fields.
The first set of one or more XLFD-conforming base font names that specify one or more charsets that can be remapped to support the required charset. The Xlib implementation can recognize various mappings from a required charset to one or more other charsets and use the fonts for those charsets. For example, JIS Roman is ASCII with the \ (backslash) and ~ (tilde) characters replaced by the yen sign and the overbar character, respectively; Xlib can load an ISO8859-1 font to support this character set if a JIS Roman font is not available.
The first XLFD-conforming font name, or the first non-XLFD font name for which an XLFD font name can be obtained, combined with the required charset (replacing the CharSetRegistry and CharSetEncoding fields in the XLFD font name). As in the first case, the implementation can use a charset that is a superset of the required charset.
The first font name that can be mapped in some locale-dependent manner to one or more fonts that support imaging text in the charset.
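The first and third cases of this search can be modeled in a few dozen lines. The sketch below is hypothetical and simplified: the fonts "at the server" are an ordinary array rather than the result of XListFonts(), matching implements only the XLFD wildcards * and ? (case-insensitively), superset charsets are not recognized, and the remapping (second) and locale-dependent (fourth) cases are omitted. For the third case, it assumes a 14-field base name has its last two fields replaced, and a 12-field name with no charset has the charset appended.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Case-insensitive match supporting the XLFD wildcards '*' and '?'. */
static int xlfd_match(const char *pat, const char *name) {
    if (*pat == '\0')
        return *name == '\0';
    if (*pat == '*') {
        for (;; name++) {               /* '*' absorbs zero or more characters */
            if (xlfd_match(pat + 1, name))
                return 1;
            if (*name == '\0')
                return 0;
        }
    }
    if (*name != '\0' &&
        (*pat == '?' || tolower((unsigned char)*pat) == tolower((unsigned char)*name)))
        return xlfd_match(pat + 1, name + 1);
    return 0;
}

/* Number of '-' separators; a full XLFD name has 14. */
static int count_fields(const char *s) {
    int n = 0;
    for (; *s != '\0'; s++)
        if (*s == '-')
            n++;
    return n;
}

/* Pointer just past the 13th '-', where CharSetRegistry begins. */
static const char *charset_field(const char *s) {
    int n = 0;
    for (; *s != '\0'; s++)
        if (*s == '-' && ++n == 13)
            return s + 1;
    return NULL;
}

static const char *first_match(const char *pattern,
                               const char *const *server, size_t nserver) {
    for (size_t i = 0; i < nserver; i++)
        if (xlfd_match(pattern, server[i]))
            return server[i];
    return NULL;
}

/* Hypothetical: returns the server font chosen for `charset`, or NULL. */
const char *select_font(const char *charset,
                        const char *const *base, size_t nbase,
                        const char *const *server, size_t nserver) {
    const char *hit;
    char name[256];

    /* Case 1: a full XLFD base name whose charset fields match the required
     * charset (the base name's fields may contain wildcards). */
    for (size_t i = 0; i < nbase; i++)
        if (count_fields(base[i]) == 14 &&
            xlfd_match(charset_field(base[i]), charset) &&
            (hit = first_match(base[i], server, nserver)) != NULL)
            return hit;

    /* Case 3: a base name combined with the required charset. */
    for (size_t i = 0; i < nbase; i++) {
        int fields = count_fields(base[i]), len;
        if (fields == 14)        /* keep fields 1-12 and the 13th '-' */
            len = snprintf(name, sizeof name, "%.*s%s",
                           (int)(charset_field(base[i]) - base[i]), base[i], charset);
        else if (fields == 12)   /* no charset given: append one */
            len = snprintf(name, sizeof name, "%s-%s", base[i], charset);
        else
            continue;
        if (len < 0 || (size_t)len >= sizeof name)
            continue;            /* combined name does not fit the buffer */
        if ((hit = first_match(name, server, nserver)) != NULL)
            return hit;
    }
    return NULL;
}
```

A NULL result for some required charset corresponds to a charset that would be reported as missing when the font set is created.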

For example, assume a locale requires the following charsets:

ISO8859-1
JISX0208.1983
JISX0201.1976
GB2312-1980.0

You can supply a base font name list that explicitly specifies the charsets, ensuring that specific fonts are used if they exist, as shown in the following example:

```
"-dt-mincho-Medium-R-Normal-*-*-*-*-*-M-*-JISX0208.1983-0,\
 -dt-mincho-Medium-R-Normal-*-*-*-*-*-M-*-JISX0201.1976-0,\
 -dt-song-Medium-R-Normal-*-*-*-*-*-M-*-GB2312-1980.0,\
 -*-default-Bold-R-Normal-*-*-*-*-*-M-*-ISO8859-1"
```

You can supply a base font name list that omits the charsets, which selects fonts for each required charset, as shown in the following example:

```
"-dt-Fixed-Medium-R-Normal-*-*-*-*-*-M-*,\
 -dt-Fixed-Medium-R-Normal-*-*-*-*-*-M-*,\
 -dt-Fixed-Medium-R-Normal-*-*-*-*-*-M-*,\
 -*-Courier-Bold-R-Normal-*-*-*-*-*-M-*"
```

Alternatively, the user can supply a single base font name that selects from all available fonts that meet certain minimum XLFD property requirements, as shown in the following example:

```
"-*-*-*-R-Normal--*-*-*-*-*-M-*"
```

Text Drawing

The desktop provides various functions for rendering localized text, including simple text, compound strings, and some widgets. These include functions within the Xlib and Motif libraries.

Input Methods

The Common Desktop Environment provides the ability to enter localized input for an internationalized application that is using the Xm Toolkit. Specifically, the XmText[Field] widgets are enabled to interface with input methods provided by each locale. In addition, the dtterm client is enabled to use input methods.

By default, each internationalized client that uses the libXm Toolkit uses the input method associated with a locale specified by the user. The XmNinputMethod resource is provided as a modifier on the locale name to allow a user to specify any alternative input method.
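At the Xlib level, an alternative input method is requested through a locale modifier of the form @im=name, the same form carried by the XMODIFIERS environment variable and accepted by XSetLocaleModifiers(). The hypothetical helper below only builds that modifier string; how the XmNinputMethod value reaches Xlib is left to the toolkit, and the input method name in the usage note is an example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the "@im=<name>" locale modifier for an input
 * method name. A NULL or empty name yields "", which leaves the choice to the
 * defaults (on POSIX systems, the XMODIFIERS environment variable).
 * Returns 0 on success, or -1 if the result does not fit in out. */
int build_im_modifier(const char *im_name, char *out, size_t outlen) {
    int n;
    if (im_name == NULL || im_name[0] == '\0')
        n = snprintf(out, outlen, "%s", "");
    else
        n = snprintf(out, outlen, "@im=%s", im_name);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For an input method named exampleim, the helper produces @im=exampleim, which a client would pass to XSetLocaleModifiers() after setting the locale and before opening the input method.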

The user interface of the input method consists of several elements. The need for these areas is dependent on the input method being used. They are usually needed by input methods that require complex input processing and dialogs. See Figure 1-3 for an illustration of these areas.

Figure 1-3 Example of VendorShell widget with auxiliary (Japanese)

Preedit Area

A preedit area is used to display the string being preedited. The input method supports four modes of preediting: OffTheSpot, OverTheSpot (default), Root, and None.

Note: A string that has been committed cannot be reconverted. On commit, the string is moved from the preedit area to the location where the user is entering characters.

OffTheSpot

In OffTheSpot mode, the location of the preedit area is fixed just below the MainWindow area and to the right of the status area, as shown in Figure 1-4. A Japanese input method is used for the example.

Figure 1-4 Example of OffTheSpot preediting with the VendorShell widget (Japanese)

The string being preedited may be highlighted in some form, depending on the input method.

To use OffTheSpot mode, set the XmNpreeditType resource of the VendorShell widget either with the XtSetValues() function or with a resource file. The XmNpreeditType resource can also be set as the resource of a TopLevelShell, ApplicationShell, or DialogShell widget, all of which are subclasses of the VendorShell widget class.
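Each preedit mode corresponds to an Xlib input style: OffTheSpot to XIMPreeditArea, OverTheSpot to XIMPreeditPosition, Root to XIMPreeditNothing, and None to XIMPreeditNone. Assuming XmNpreeditType holds a comma-separated list of modes in order of preference, a plausible way to honor it is to take the first listed mode that the input method supports. The sketch below models that choice with locally defined flags that mirror, but are not, the Xlib constants, so it compiles without X headers; `choose_preedit` and the enum are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins for XIMPreeditArea, XIMPreeditPosition, XIMPreeditNothing,
 * and XIMPreeditNone (the values here are arbitrary, not Xlib's). */
enum preedit_style {
    STYLE_NONE_FOUND = 0,
    STYLE_AREA = 1 << 0,     /* OffTheSpot */
    STYLE_POSITION = 1 << 1, /* OverTheSpot */
    STYLE_NOTHING = 1 << 2,  /* Root */
    STYLE_NONE = 1 << 3      /* None */
};

static const struct { const char *name; enum preedit_style style; } modes[] = {
    { "OffTheSpot", STYLE_AREA },
    { "OverTheSpot", STYLE_POSITION },
    { "Root", STYLE_NOTHING },
    { "None", STYLE_NONE },
};

/* Hypothetical: walks a comma-separated list (no blanks assumed) in order and
 * returns the first style whose bit is set in `supported`, a mask of the
 * styles the input method offers, or STYLE_NONE_FOUND. */
enum preedit_style choose_preedit(const char *list, unsigned supported) {
    while (*list != '\0') {
        size_t len = strcspn(list, ",");
        for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++) {
            if (strlen(modes[i].name) == len &&
                strncmp(modes[i].name, list, len) == 0 &&
                (supported & modes[i].style) != 0)
                return modes[i].style;
        }
        list += len;
        if (*list == ',')
            list++;
    }
    return STYLE_NONE_FOUND;
}
```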

OverTheSpot (Default)

In OverTheSpot mode, the location of the preedit area is set to where the user is trying to enter characters (for example, the insert cursor position of the Text widget that has the current focus). The characters in a preedit area are displayed at the cursor position as an overlay window, and they can be highlighted depending on the input method.

Although a preedit area may consist of multiple lines in OverTheSpot mode, it always remains within the MainWindow area and cannot cross its edges in any direction.

Keep in mind that although the preedit string under construction may be displayed as though it were part of the Text widget's text, it is not passed to the client and displayed in the underlying edit screen until preediting ends. See Figure 1-5 for an illustration.
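The separation between the preedit string and the widget's text can be modeled as two buffers: the input method may replace the preedit buffer any number of times, and only a commit appends it to the text the client actually receives. This is a hypothetical sketch (`struct edit_model`, `preedit_update`, and `preedit_commit` are illustrative names); real clients receive committed text from the input method through calls such as XmbLookupString().

```c
#include <stdio.h>
#include <string.h>

#define EDIT_MAX 256

/* Hypothetical model of a text widget plus an OverTheSpot preedit overlay. */
struct edit_model {
    char text[EDIT_MAX];    /* what the client has received */
    char preedit[EDIT_MAX]; /* shown over the cursor, not yet part of text */
};

/* The input method replaces the string under construction. */
void preedit_update(struct edit_model *m, const char *s) {
    snprintf(m->preedit, sizeof m->preedit, "%s", s);
}

/* Commit: the preedit string is appended to the text and cleared; after this
 * point it can no longer be reconverted. Returns -1 and keeps the preedit
 * string if the text buffer would overflow. */
int preedit_commit(struct edit_model *m) {
    size_t used = strlen(m->text), add = strlen(m->preedit);
    if (used + add >= sizeof m->text)
        return -1;
    memcpy(m->text + used, m->preedit, add + 1); /* includes the terminator */
    m->preedit[0] = '\0';
    return 0;
}
```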

To use OverTheSpot mode explicitly, set the XmNpreeditType resource of the VendorShell widget either with the XtSetValues() function or with a resource file. The XmNpreeditType resource can be set as the resource of a TopLevelShell, ApplicationShell, or DialogShell widget because these are subclasses of the VendorShell widget class.

Figure 1-5 Example of OverTheSpot preediting with the VendorShell widget (Japanese)

Root

In Root mode, the preedit and status areas are located separately from the client's window. The Root mode behavior is similar to OffTheSpot. See Figure 1-6 for an illustration.

Figure 1-6 Example of Root preediting with the VendorShell widget (Japanese)

Status Area

A status area reports the input or keyboard status of the input method to the users. For OverTheSpot and OffTheSpot styles, the status area is located at the lower left corner of the VendorShell window.

In Root style, the status area is placed outside the client window.
If the preedit style is OffTheSpot mode, the preedit area is displayed to the right of the status area.

The VendorShell widget provides geometry management so that a status area is rearranged at the bottom corner of the VendorShell window if the VendorShell window is resized.

Auxiliary Area

An auxiliary area helps the user with preediting. Depending on the particular input method, an auxiliary area can be created. The Japanese input method in Figure 1-3 creates the following types of auxiliary areas:

ZENKOUHO
JIS NUMBER
Switching conversion method
- SAKIYOMI-REN-BUNSETSU
- IKKATSU-REN-BUNSETSU
- TAN-BUNSETSU
- FUKUGOU-GO

MainWindow Area

A MainWindow area is the widget used as the working area of the input method. In the system environment, the sole child of the VendorShell widget is the MainWindow widget. It can be any container widget, such as a RowColumn widget. The user creates the container widget as the child of the VendorShell widget.

Focus Area

A focus area is any descendant widget under the MainWindow widget subtree that currently has focus. The Motif application programmer using existing widgets does not need to worry about the focus area. The important information to remember is that only one widget can have input method processing at a time. The input method processing moves to the window (widget) that currently has the focus.

Interclient Communications Conventions (ICCC)

The Interclient Communications Conventions (ICCC) define the mechanism used to pass text between clients. Because the system can support multiple code sets, two applications that are communicating with each other may be using different code sets. ICCC defines how these two clients agree on how the data is passed between them. If two clients have incompatible character sets (for example, Latin-1 and Japanese (JIS)), some data may be lost when characters are transported.

However, if two clients have different code sets but compatible character sets, ICCC enables these clients to pass information without data loss. If the code sets of the two clients are not identical, Compound Text encoding, identified by the COMPOUND_TEXT atom, is used as the interchange format. If the data being communicated involves only portable characters (such as 7-bit ASCII) or characters from the ISO8859-1 code set, the data is communicated as is, with no conversion, using the XA_STRING atom.
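The choice between STRING and COMPOUND_TEXT can be pictured with a sketch that inspects the characters to be sent. For illustration only, it assumes the text is available as Unicode code points: STRING carries ISO 8859-1 graphic characters plus tab and newline, so any other character forces Compound Text. In real clients, Xlib makes this decision when XmbTextListToTextProperty() is called with the XStdICCTextStyle encoding style; `iccc_target` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: returns the name of the property type a client would use to
 * send n characters (given as Unicode code points) to another client. */
const char *iccc_target(const uint32_t *chars, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t c = chars[i];
        int latin1_graphic = (c >= 0x20 && c <= 0x7E) || (c >= 0xA0 && c <= 0xFF);
        if (!(latin1_graphic || c == '\t' || c == '\n'))
            return "COMPOUND_TEXT"; /* not representable as STRING */
    }
    return "STRING"; /* portable or ISO 8859-1 text passes unconverted */
}
```

The same test applies to window titles and icon names: a title made only of portable characters can be sent as STRING, while a Japanese title needs COMPOUND_TEXT.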

Titles and icon names must be communicated to the Window Manager using the COMPOUND_TEXT atom if nonportable characters are used; otherwise, the XA_STRING atom can be used. Any other encoding works only if it can be converted to the code set of the Window Manager's locale. The Window Manager runs in a single locale and supports only titles and icon names that are convertible to the code set of the locale under which it is running.

The libXm library and all desktop clients should follow these conventions.
