3 Creating and Using Message Catalogs

A message catalog, like the langinfo database, is a file of localization data that programs can access. The difference between the two sets of localization data is that data elements in the langinfo database are used by all applications, including the library routines, commands, and utilities provided by the operating system. The langinfo database is generated from the source files that define locales. Message catalogs, on the other hand, meet the specific localization needs of one program or a set of related programs. Message catalogs are generated from message text source files that contain error and informational messages, prompts, background text for forms, and miscellaneous strings and constants that must vary for language and cultural reasons.

X and Motif applications, which include a graphical user interface, usually access X resource files, rather than message catalogs, for the small segments of text that belong to the title bars, menus, buttons, and simple messages for a particular window. Motif applications can also use a User Interface Language (UIL) file, along with a text library file, to access help, error message, and other kinds of text. However, both X and Motif applications can access text in message catalogs as well.

This chapter focuses on message catalogs and explains how to:

- Create, edit, extract, and translate message text source files
- Generate message catalogs
- Access message catalogs interactively and from scripts
- Access message catalogs from programs

Refer to the OSF/Motif Programmer's Guide for information on handling text with Motif routines in internationalized applications. Refer to X Window System for information about using text from message catalogs with X routines. For X and Motif programmers, Section 3.1.6 of this chapter includes some guidelines that apply to text that will be translated, regardless of the method used to retrieve and display it.

3.1 Creating Message Text Source Files

Before creating and using a message catalog, you must first understand the components, syntax, and semantics of a message text source file. A brief overview of a source file example can help provide context for later sections that focus on particular kinds of file entries and processing operations. Example 3-1 shows extracts from a message text source file for the online example xpg4demo.

Example 3-1: Message Text Source File

$ /*   [1]
$  * XPG4 demo program message catalogue.    [1]
$  *   [1]
$  */  [1]
[2]
$quote "   [3]
$set MSGError   [4]
E_COM_EXISTBADGE        "Employee entry for badge number %ld \   [5]
already exists"
E_COM_FINDBADGE         "Cannot find badge number %ld"   [5]
E_COM_INPUT             "Cannot input"   [5]
E_COM_MODIFY            "Data file contains no records to modify"  [5]
E_COM_NOENT             "Data file contains no records to display" [5]
E_COM_NOTDEL            "Data file contains no records to delete"  [5]

.
.
.
$set MSGInfo   [4]
I_COM_NEWEMP            "New employee"    [5]
I_COM_YN_DELETE         "Do you want to delete this record?"   [5]
I_COM_YN_MODIFY         "Do you want to modify this record?"   [5]
I_COM_YN_REPLACE        "Are these the changes you want to make?"  [5]

.
.
.
I_SCR_IN_DATE_FMT       "%2$d/%3$d/%1$d"   [6]
$set MSGString   [4]
$
$ One-character commands.
$
S_COM_CREATE            "c"    [7]
S_COM_DELETE            "d"    [7]
S_COM_EXIT              "e"    [7]
 

.
.
.
S_COM_LIST_TITLE        "Badge         Name                  Surname \
               CC      DOJ\n"    [8]
S_COM_LIST_LINE         "--------------------------------------------\
---------------------------------\n"    [8]

.
.
.
$
$ If surname comes before first name, "y" should be specified.
$
S_SCR_SNAME1ST          "n"    [9]

.
.
.

[1] Lines that begin with the dollar sign ($), followed by either a space or tab, are comment lines. Section 3.1.5 discusses comment lines.

[2] To improve readability, blank lines are allowed anywhere in the file.

[3] This line specifies the quote character used to delimit message text. Section 3.1.4 discusses quote directives.

[4] These lines define identifiers that mark the beginning of a message set. There are three sets of messages in this source file: error messages (in the MSGError set), informational messages (in the MSGInfo set), and miscellaneous strings and formats (in the MSGString set). Refer to Section 3.1.2 for more information about defining and removing message sets.

[5] Most lines in the source file are message entries, whose components are a unique identifier and a message text string. The first message entry is continued to the next line by using the backslash (\). Other entries contain special character sequences, such as \n (newline), that affect how the message is printed. Refer to Section 3.1.3 for more information about message entries. Section 3.1.1 also discusses some rules and options that apply to message entries.

[6] This entry allows translators to vary the order in which users are prompted to input date elements. Message entries are frequently used to provide this kind of format control.

[7] Message entries such as these define word abbreviations, which often need special attention to preserve uniqueness from one language to another.

[8] Message entries also define header lines for menu displays so that translators can adjust the field order and line length to match other adjustments that the program allows for cultural variation.

[9] In the xpg4demo program, you can change the order of first and last name (surname). This message entry defines a constant whose value controls how the program orders name fields.

You can use one or more message text source files to create message catalogs (.cat files) that programs can access at run time. To create a message catalog from the source file in Example 3-1:

1. Use the mkcatdefs command to convert symbolic identifiers for message sets and messages to numbers that indicate the ordinal positions of the message sets within the catalog and of messages within each set.

2. Use the gencat command to create the message catalog from mkcatdefs output.

Section 3.4 discusses the mkcatdefs and gencat commands.
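
At run time, a program retrieves entries from the generated catalog by set number and message number. The following is a minimal sketch and is not taken from xpg4demo: the numeric values given here for MSGError and E_COM_FINDBADGE are assumptions standing in for the header file that mkcatdefs generates, and badge_not_found() is a hypothetical helper.

```c
#include <nl_types.h>
#include <stdio.h>

/* Assumed values; in practice they come from the header that mkcatdefs creates. */
#define MSGError        1
#define E_COM_FINDBADGE 2

/*
 * Formats the "cannot find badge" error into buf and returns the
 * snprintf() result. If catd is not a valid catalog descriptor or the
 * entry is missing, catgets() returns the default string that is
 * passed as its last argument.
 */
int badge_not_found(nl_catd catd, long badge, char *buf, size_t size)
{
    const char *fmt = catgets(catd, MSGError, E_COM_FINDBADGE,
                              "Cannot find badge number %ld");

    return snprintf(buf, size, fmt, badge);
}
```

Keeping the English text as the default argument means the program still produces a usable message when no catalog can be opened.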

3.1.1 General Rules

This section contains general guidelines that apply to message text source files. A message text source file (.msg file) comprises sequences of messages. Optionally, you can order these messages within one or more message sets. For a given application, there are usually separate message source files for each localization; for example, there are source files for each locale (each combination of codeset, language, and territory) with which users can run the application.

If you do not quote values for identifiers, specify a single space or tab, as defined by the source codeset, to separate fields in lines of the source file. Otherwise, the extra spaces or tabs are treated as part of the value. Using the character specified in a quote directive to delimit all message strings prevents extra spaces or tabs between the identifier and the string from being treated as part of the string. Quoting message strings is also the only way to indicate that the message text includes a trailing space or tab.
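
The separator rule can be illustrated with a short C sketch. This is not the parser used by gencat; split_entry() is a hypothetical function that ignores escape sequences and line continuation, and it only shows why extra blanks become part of an unquoted value but are dropped in front of a quoted one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Splits one message entry into its identifier and text.
 * quote is the active quote character, or '\0' if quoting is disabled.
 * Returns 0 on success, or -1 if the line has no separator (an
 * identifier on a line by itself), an unterminated quoted string, or
 * a field that does not fit in its buffer.
 */
int split_entry(const char *line, char quote,
                char *id, size_t id_size, char *text, size_t text_size)
{
    assert(line != NULL && id != NULL && text != NULL);

    size_t id_len = strcspn(line, " \t");
    if (id_len == 0 || id_len >= id_size || line[id_len] == '\0')
        return -1;
    memcpy(id, line, id_len);
    id[id_len] = '\0';

    /* Exactly one space or tab is consumed as the field separator. */
    const char *value = line + id_len + 1;

    if (quote != '\0') {
        /* Blanks in front of the opening quote are not part of the text. */
        const char *open = value + strspn(value, " \t");
        if (*open == quote) {
            const char *close = strrchr(open + 1, quote);
            if (close == NULL)
                return -1;
            size_t len = (size_t)(close - (open + 1));
            if (len >= text_size)
                return -1;
            memcpy(text, open + 1, len);
            text[len] = '\0';
            return 0;
        }
    }

    /* Unquoted: everything after the single separator is the value. */
    if (strlen(value) >= text_size)
        return -1;
    strcpy(text, value);
    return 0;
}
```

With quoting disabled, an entry written as MSG_ID followed by three spaces and the word padded yields a value that begins with two spaces. With the double quotation mark as the quote character, the blanks in front of the opening quote are dropped and a trailing space inside the quotes is kept.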

Message text strings can contain ordinary characters, plus sequences for special characters as shown in Table 3-1.

Table 3-1: Coding of Special Characters in Message Text Source Files

Description	Symbol	Coding Sequence
Newline	NL (LF)	\n
Horizontal tab	HT	\t
Vertical tab	VT	\v
Backspace	BS	\b
Carriage return	CR	\r
Form feed	FF	\f
Backslash	\	\\
Octal value	ddd	\ddd [Footnote 1]
Hexadecimal value	dddd	\xdddd [Footnote 2]

A backslash in a message file is ignored when followed by coding sequences other than those described in Table 3-1. For example, the sequence \m prints in the message as m. When you use octal or hexadecimal values to represent characters, include leading zeros if the characters following the numeric encoding of the special character are also valid octal or hexadecimal digits. For example, to print $5.00 when 44 is the octal number for the dollar sign, you must specify \0445.00 to prevent the 5 from being parsed as part of the octal value.
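
The reason for the leading zero can be seen in a small decoder. The following C sketch is illustrative only and is not the code used by gencat; decode_octal() is a hypothetical function that assumes an octal escape consumes at most three digits, as the ddd notation in Table 3-1 suggests, and that the codeset is ASCII-compatible.

```c
#include <stddef.h>

/*
 * Copies src to dst, replacing each backslash followed by one to three
 * octal digits with the character that has that value. Other text,
 * including other backslash sequences, is copied unchanged.
 * Returns the number of characters stored, not counting the final NUL.
 */
size_t decode_octal(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return 0;

    while (*src != '\0' && out + 1 < dst_size) {
        if (src[0] == '\\' && src[1] >= '0' && src[1] <= '7') {
            unsigned value = 0;
            int digits = 0;

            src++;                      /* skip the backslash */
            while (digits < 3 && *src >= '0' && *src <= '7') {
                value = value * 8 + (unsigned)(*src - '0');
                src++;
                digits++;
            }
            dst[out++] = (char)value;
        } else {
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return out;
}
```

Given the source text \0445.00, the decoder stops after the three digits 044 and leaves the 5 as ordinary text, which produces $5.00. Given \445.00, it would consume 445 as the octal value and the 5 would be lost from the message.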

A newline character normally separates message entries; however, you can continue the same message string from one line to another by entering a backslash before the newline character. In this context, entering a newline character means pressing the Return or Enter key on English keyboards. For example, the following two entries are equivalent and do not affect how the string appears to the program user:

MSG_ID        This line continues \
to the next line.
MSG_ID        This line continues to the next line.

Any empty lines in a message source file are ignored; you are therefore free to use blank lines wherever you choose to improve the readability of the file.

3.1.2 Message Sets

Message sets are an optional component within message text source files. You can use message sets to group messages for any reason. In an application built from multiple program source files, you can create message sets to organize messages by program module or, as done for the online example xpg4demo, group messages that belong to the same semantic category (error, informational, defined strings). An advantage of grouping messages by program module is that, should the module later be removed from the application, you can easily find and delete its messages from the catalog. Grouping messages by semantic category supports message sharing among modules of the same application; when messages are grouped by semantic category, programmers writing new modules or maintaining existing modules for an application can easily determine if a message meeting their needs already exists in the file.

A set directive specifies the set identifier of subsequent messages until another set directive or end-of-file is encountered. Set directives have the following format:

$set set_id [comment]

The set_id variable can be one of the following:

A number in the range [1 - NL_SETMAX]
The NL_SETMAX constant is defined in the /usr/include/limits.h file. Numeric set identifiers must occur in ascending order within the source file; however, the numbers need not be contiguous values. Furthermore, set identifier numbers must occur in ascending order from one source file to the next when multiple message source files are processed by the gencat command to create a message catalog.

A user-defined symbolic identifier, such as MSGError
When you specify symbolic set identifiers, you must use the mkcatdefs command to convert the symbols to the numeric set identifiers required by the gencat command.

Any characters following the set identifier are treated as a comment.

If the message text source file contains no set directives, all messages are assigned to a default message set. The numeric value for this set is defined by the constant NL_SETD in the /usr/include/nl_types.h file. When a program calls the catgets() function to retrieve a message from a catalog generated from sources that contain no set directives, it passes the NL_SETD constant as the set identifier.
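
A call that reads from the default set might look like the following minimal sketch. The function default_set_message() is a hypothetical wrapper; only catgets() and NL_SETD come from nl_types.h.

```c
#include <nl_types.h>

/*
 * Looks up message msg_id in the default message set of an open
 * catalog. If the lookup fails, catgets() returns the fallback string.
 */
const char *default_set_message(nl_catd catd, int msg_id, const char *fallback)
{
    return catgets(catd, NL_SETD, msg_id, fallback);
}
```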

Note

Do not specify NL_SETD in a set directive of a message text source file or try to mix default and user-defined message sets in the same message catalog. Doing so can result in errors from the mkcatdefs or gencat utility. Furthermore, the value assigned to the NL_SETD constant is vendor defined; using NL_SETD as a symbolic identifier in the message text source file can result in mkcatdefs output that is not portable from one system to another.

The rest of this section discusses entries that delete message sets from an existing message catalog. Section 3.4.3 addresses the topic of catalog maintenance more generally.

Message text source files can contain delset directives, which are used to delete message sets from existing message catalogs. The delset directive has the following format:

$delset n [comment]

The n variable must be the number that identifies the set in the existing catalog to the gencat command. Unlike the case for the set directive, you cannot specify symbolic set identifiers in delset directives. When message files are preprocessed using the mkcatdefs command, you have the option of creating a separate header file that equates your symbolic identifiers with the set numbers and message numbers assigned by the mkcatdefs utility. If you later want to delete one of the message sets, you first refer to this header file to find the number that corresponds to the symbolic identifier for the set you want to delete. This is the number that you specify in the delset directive to delete that set.

Suppose that you are removing program module a_mod.c from an application whose associated message text source file is appl.msg. Messages used only by a_mod.c are contained in the message set whose symbolic identifier is A_MOD_MSGS. The file appl_msg.h contains the following definition statement:


.
.
.
#define A_MOD_MSGS 2

.
.
.

The associated delset directive could then be:

$delset 2   Removing A_MOD_MSGS set for a_mod.c in appl.cat.

You can specify delset directives either in a source file by themselves or as part of a more general message source file revision that includes both delset and set directives. In the latter case, make sure that the directives occur in ascending order of set number.

Assume that the preceding example is contained in a single-directive source file named kill_mod_a_msgs.msg and existing message catalogs reside in the /usr/lib/nls/msg directory. In this case, the following ksh loop would carry out the message set deletion in catalogs for all locales:

for i in /usr/lib/nls/msg/*/appl.cat
do
        gencat "$i" kill_mod_a_msgs.msg
done

3.1.3 Message Entries

A message entry has the following format:

msg_id message_text

The msg_id can be either of the following:

A number in the range [1 - NL_MSGMAX]
The constant NL_MSGMAX is defined in the /usr/include/limits.h file. Message numbers are associated with the message set defined by the preceding set directive or, if not preceded by a set directive, with the default message set NL_SETD, a constant defined in the /usr/include/nl_types.h file. Message numbers must occur in ascending order within a message set; however, the numbers need not be contiguous values. If message numbers are not in ascending order within a set, the gencat command returns an error on attempts to generate a message catalog from the source file.

A user-defined symbolic name, for example, ERR_INVALID_ID
When a message text source file contains symbolic names, you must use the mkcatdefs command to convert the symbolic names to numbers that the gencat command can process.

The message_text is a string that the program refers to by msg_id. You can quote this string if a quote directive enables a quotation character before the message entry is encountered. Section 3.1.1 discusses the advantages of quoting message text. Section 3.1.4 lists the rules for quote directives.

The total length of message_text cannot exceed NL_TEXTMAX bytes. The constant NL_TEXTMAX is defined in the /usr/include/limits.h file.

The rest of this section discusses entries that delete specific messages from an existing message catalog. Refer to Section 3.4.3 for a general discussion of message catalog maintenance.

To delete a particular message from an existing message catalog, enter the identifier for the message on a line by itself. This type of entry allows you to delete a message without affecting the ordinal position of subsequent messages. For the message deletion to be carried out correctly, use the following guidelines:

1. Specify a numeric message identifier.
If you usually use symbolic identifiers in your message text source files, you can obtain the associated numbers from the message header file that was produced when the source file was last processed by the mkcatdefs command. Unlike the case for deleting message sets with the delset directive, mkcatdefs does not generate an error if you use a symbolic message identifier to delete a message; however, you will delete the wrong message if the symbol is not preceded by the same number of message entries as in the catalog.

2. Make sure the identifier is not followed by any character other than a newline. If msg_id is followed by a space or tab separator, the message is not deleted; rather, the message text is revised to be an empty string.

3. If the catalog contains user-defined message sets, make sure the appropriate set directive precedes the entry to delete the message; otherwise, the message may be deleted from the wrong message set. For reasons similar to those noted for message identifiers in step 1, use a numeric rather than symbolic set identifier in the set directive.

4. Use only the gencat command to process the file if you are not replacing all messages in a set. The mkcatdefs utility generates a delset directive before each set directive you specify in the input file. This is helpful when you want to replace all messages in a message set, but it will not produce the results you intend if your input source refers only to one or two messages that you want to delete.

The following example shows message text source input that could be specified to the gencat command to delete message 5 from message set 2:

$set 2
5

If this source input were preprocessed by the mkcatdefs command, the addition of the delset directive would result in all messages in set 2 being deleted from the message catalog:

$delset 2
$set 2
5

3.1.4 Quote Directive

A quote directive specifies or disables a quote character that you use to surround message text strings. The quote directive has the following format:

$quote [c]

The c variable is the character to be recognized as the message string delimiter. In the following example, the quote directive specifies the double quotation mark as the message string delimiter:

$quote "

By default, or if c is omitted, quoting of message text strings is not recognized.

A message text source file can contain more than one quote directive, in which case each directive affects the message entries that follow it in the file. Usually, however, a message file contains only one quote directive, which occurs before the first message entry.

3.1.5 Comment Lines

A line beginning with the dollar sign ($), followed by a space or tab, is treated as a comment. Neither the mkcatdefs command nor the gencat command interprets the line further.

Remember that message files may be translated by individuals who are not programmers. Be sure to include comment lines with instructions to translators on how to handle message entries whose strings contain literals and substitution format specifiers. For example:

$ Note to translators: Translate only the text that is within
$ quotation marks ("text text text") on a given line.
$ If you need to continue your translation onto the next line,
$ type a backslash (\) before pressing the newline
$ (Return or Enter) key to finish the message.
$ For an example of line continuation, see the
$ line that starts with the message identifier E_COM_EXISTBADGE.

.
.
.
$ Note to translator: When users see the following message, a badge
$ number appears in place of the %ld directive.
$ You can move the %ld directive to another position
$ in the translated message, but do not delete %ld or replace %ld with
$ a word.
$
E_COM_EXISTBADGE        "Employee entry for badge number %ld \
already exists"

.
.
.
$
$ Note to translator:  The item %2$d/%1$d/%3$d indicates month/day/year
$ as expressed in decimal numbers; for example, 3/28/81.
$ To improve the appropriateness of this date input format, you can change
$ only the order of the date elements and the delimiter (/).
$ For example, you can change the string to %1$d/%2$d/%3$d or
$ %1$d.%2$d.%3$d to indicate day/month/year or day.month.year
$ (28/3/81 or 28.3.81).
$
I_SCR_IN_DATE_FMT        "%2$d/%1$d/%3$d"

.
.
.

Tru64 UNIX provides the trans utility, discussed in Section 3.3, to help translators quickly locate and edit the translatable text in a message source file. This utility does not eliminate the need for information from the programmer on message context and program syntax.
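
The date format entry shown in the translator notes depends on positional conversion specifiers such as %2$d, which let the format string choose which argument to print at each position. The following C sketch shows the mechanism. The function format_date() is hypothetical, and the argument order (day, month, year) is an assumption inferred from the translator note, which says that %2$d/%1$d/%3$d produces month/day/year. Positional specifiers are an X/Open and POSIX extension to printf-style formatting rather than part of ISO C.

```c
#include <stdio.h>

/*
 * Formats a date with a format string that would normally be retrieved
 * from the message catalog. The arguments are always passed in the
 * order day, month, year; the format string decides the display order.
 * Returns the snprintf() result.
 */
int format_date(char *buf, size_t size, const char *fmt,
                int day, int month, int year)
{
    return snprintf(buf, size, fmt, day, month, year);
}
```

With the arguments 28, 3, and 81, the format string "%2$d/%1$d/%3$d" yields 3/28/81 and the translated string "%1$d.%2$d.%3$d" yields 28.3.81. The program source does not change; only the catalog entry does.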

3.1.6 Style Guidelines for Messages

When creating messages and other text strings in English, you need to keep the following information in mind:

Text strings in English are usually shorter than equivalent text strings in other languages. When text strings are translated, their length increases an average of 30 to 40 percent. Expect even larger increases for strings containing fewer than 20 characters.
The following guidelines result from the likelihood that text strings will grow when translated from English to another language:
- If you must limit a text string to one line (for example, 80 characters), make sure the English text occupies no more than half of the available space. Whenever possible, allow text to wrap to a subsequent line rather than restricting it to an arbitrary length.
- Do not design a menu, form, screen, or window in which English text uses most of the available space.
- Design a dialog box so that its components can be moved around. The developers who localize your application may have to reorganize the contents of a dialog box because of text length changes and, for Asian languages, to accommodate a particular character input method.
- Do not embed text in a graphic. When text is embedded in a graphic, the entire graphic must be redone when the application is localized. Furthermore, the translated text may cause the graphic to grow in size or to lose visual appeal.

Nouns in languages other than English may have gender that affects the spelling of the noun itself and associated adjectives and verbs. The way a noun is spelled can also change, depending on whether the noun is the subject or object of a verb, or the object of a preposition. There can be additional grammatical rules, such as those for creating affirmative, negative and imperative verb forms, that are very different from English. For these reasons:
- Do not create a message at run time by concatenating different kinds of strings; for example, strings that represent different nouns, adjectives, verbs, or combinations of these.
  If adjectives and verbs can have multiple referents, each with a different gender, the translator may not be able to create a grammatically correct counterpart for all the possible sentences that the user may see. In this case, the developer who is localizing the application may have to redesign the error-handling logic so that the application returns several distinct messages rather than one.
- Be careful about inserting the same text variable into different strings; word spelling may have to change if each string represents a different grammatical context. Furthermore, you cannot assume that there is a one-to-one correspondence between English words and their counterparts in other languages. For example, you can create a negative statement in English by inserting a text variable that contains the word "not" into a verb phrase. Such a message could not be translated into French, however, because French usually requires two words, "ne" before the verb and "pas" after the verb, to negate meaning.
  Pathnames, file names, and strings that are complete sentences are usually safe to insert into other strings.
- Avoid using the word "None" as a button label or menu item; this word may be impossible to translate if its referents have different gender.
- Create messages that are complete sentences; in particular, do not start messages (other than imperatives where the subject "you" is understood) with a verb.
  The following messages cannot be translated into some languages because the translator cannot determine the subject of the sentence or the correct form of the verb in the local language:
```
Is a directory.
 
Could not open file.
```
  If your message is constructed of a facility identifier, followed by informational or error text, you can break the rule about starting messages with a verb. In this case, be sure to include comments to the translator in your message source file about how the message is constructed, the facility identifier that appears with the message, and the kind of component (server, compiler, utility, and so forth) the identifier represents. Refer to Section 3.1.5 for information about adding comments to message source files.

Unique identifiers that are based on the first letters of words may not be unique when the words are translated. For example, a common practice in applications that prompt users to choose among several items is to accept a single character as the item identifier. Make sure your application does not require this character to be the first character or first several characters in the item name. The translator should have the option of substituting any character or a number for the item identifier.

Languages can have syntax rules that require translators to change word order. Therefore, use substitution specifiers as described in Section 2.3.2 so that translators can change the order of message components to meet local language requirements.

Translations of messages with vague, ambiguous, or telegraphic wording are likely to be incorrect. Use the following guidelines to help ensure accurate translation:
- Include articles (the, a, an) and forms of the verb "to be" where appropriate. Programmers often omit these words to reduce the size of message strings; however, the omission sometimes makes it difficult to distinguish nouns from verbs, subject nouns from predicate nouns, and active voice from passive voice. The message "Maximum parameter count exceeded" illustrates this problem.
- You can include very common contractions, such as "can't" and "don't", but avoid less commonly used contractions, like "should've". If you are using contractions in English to conserve line space, be aware that your objective is likely to be lost in translation.
- Avoid using most abbreviations, particularly terms, such as pkt, msg, tbl, ack, and max, that programmers commonly use in variable names and code comments. These abbreviations do not appear in a dictionary, and translators may have to guess at what they mean. On the other hand, you can use formal abbreviations for product and utility names and abbreviations for names of standards, protocols, and so forth that appear in commercial literature.
- Use words only in grammatically correct form. English speakers have a tendency to create new verbs or adjectives out of existing nouns and new nouns out of existing verbs. This practice is confusing to translators, particularly when the intended usage is not one of those noted in an English dictionary. For example, consider the use of the word "parameter" as an adjective in the message "Invalid parameter delimiter."
- Avoid using slang or words whose intended meaning is not included in a dictionary. It is probable that these words either have no equivalent in another language or would be misinterpreted. For example, the message "Server hang" may be meaningful to English speakers who develop software or manage systems, but the meaning of the message may be transformed in another language to "The system lynched the waiter." The message "The %s server failed." is more likely to be translated correctly.

3.2 Extracting Message Text from Existing Programs

If you have an existing program that you want to internationalize, Tru64 UNIX provides the following tools to help you extract message strings into a message source file and to change calls to retrieve messages from a message catalog:

Tool	Description
`extract` command	Interactively extracts text strings from program source files and writes each string to a source message file. The command also replaces each extracted string with a call to the `catgets` function.
`strextract` command	Performs string extraction operations in batch mode.
`strmerge` command	Reads strings from the message file produced by `strextract` and, in the program source, replaces those strings with calls to the `catgets` function.

Consider the following call:

printf("Hello, world\n");

You can use the extract command, or the strextract command followed by the strmerge command, to:

Create the following entries in a message text source file (assuming that "Hello, world" was the first string extracted):
```
$set 1
$quote "
1 "Hello, world\n"
```

Change the printf() call to:

printf(catgets(cat, 1, 1, "Hello, world\n"));

Assuming that input to the commands is a program source file named prog.c, the commands create three new files: prog.msg (message text source file), nl_prog.c (internationalized version of the program source), and prog.str (an intermediate strings file that other utilities can reference). The commands use the following files along with the input source program:

A patterns file
This file specifies patterns that the extraction commands use to find strings in the program. You can specify your own patterns file; by default, the extraction commands use the /usr/lib/nls/patterns file.

An optional ignore file
This file specifies strings that the extraction commands should ignore.

The extract, strextract, and strmerge commands do not perform all the revisions necessary to internationalize a program. For example, you must manually edit the revised program source to add calls to setlocale(), catopen(), and catclose(). In addition, you may need to add routines for multibyte-character conversion (for Asian locales) and improve user-defined routines to vary behavior according to values defined in message catalogs or the langinfo database.
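
The manual additions might look like the following minimal sketch, which wraps the converted printf() call from the earlier example. The function name print_greeting() and the catalog name prog.cat are assumptions made for illustration; setlocale(), catopen(), catgets(), catclose(), and NL_CAT_LOCALE are the standard interfaces. The sketch prints the retrieved text through a "%s" format rather than passing it directly as the format string, which is a defensive choice and not what the extraction tools generate.

```c
#include <locale.h>
#include <nl_types.h>
#include <stdio.h>

/*
 * Prints the greeting, using the translated text when a catalog for
 * the current locale can be opened and the default English text
 * otherwise. Returns the printf() result.
 */
int print_greeting(void)
{
    int written;
    nl_catd cat;

    /* Adopt the locale named by the user's environment. */
    setlocale(LC_ALL, "");

    /* Catalog name is assumed; NL_CAT_LOCALE selects it by LC_MESSAGES. */
    cat = catopen("prog.cat", NL_CAT_LOCALE);

    /* If cat is invalid, catgets() returns the default string. */
    written = printf("%s", catgets(cat, 1, 1, "Hello, world\n"));

    if (cat != (nl_catd)-1)
        catclose(cat);

    return written;
}
```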

Figure 3-1 shows the files and tools that help you change an existing program to use a message catalog.

Figure 3-1: Converting an Existing Program to Use a Message Catalog

For detailed instructions on using the extract, strextract, and strmerge commands, see the extract(1), strextract(1), strmerge(1), and patterns(4) reference pages.

3.3 Editing and Translating Message Source Files

You can use any text editor to edit message text source files, provided that:

The input device is capable of generating the necessary characters

If 8-bit or multibyte characters are required, the editor can transparently handle this data

The first requirement is satisfied for languages other than Western European languages by the terminal drivers, locales, fonts, and other components that are available with localized software subsets.

The ed, ex, and vi editors satisfy the second of the preceding requirements. Localized software subsets may also include enhanced versions of additional editors, such as Emacs, that can handle 8-bit and multibyte characters.

The operating system includes the trans command to assist those who translate message text source files for different locales. The command provides a multiwindow environment so users can see both the original and translated versions of the file. In addition, the command automatically guides users in the file from one translatable string to the next. For more information on the trans command, refer to trans(1). Refer to Section 3.1.5 for examples of comments that should be included in message text source files to ensure that messages are correctly translated.

For examples of translated message text source files, search the /usr/examples/xpg4demo/src directory for *.msg files, as follows:

% cd /usr/examples/xpg4demo/src
% ls *.msg

.
.
.

A translated message catalog is associated with a particular locale and encoding format. Many languages are supported by multiple locales and encoding formats, so messages in the same language must often be available in multiple encoding formats. Although you can use codeset converters to convert message source files, building and installing multiple versions of the same catalog for a single language is expensive. Therefore, the catopen() and catgets() functions support dynamic codeset conversion of message catalogs. A set of .msg_conv-locale_name files in the /usr/share directory controls this conversion. See catopen(3) for detailed information.

3.4 Generating Message Catalogs

The gencat command generates message catalogs from one or more message-text source files. If the source files contain symbolic rather than numeric identifiers for message sets, message entries, or both, those source files must first be preprocessed by the mkcatdefs command. Example 3-2 illustrates interactive processing of message text source files with symbolic identifiers for a default and nondefault locale. This example provides context for later sections that discuss each command.

Example 3-2: Generating a Message Catalog Interactively

% mkcatdefs xpg4demo xpg4demo.msg | gencat xpg4demo.cat   [1]
mkcatdefs: xpg4demo_msg.h created    [2]
% setenv LANG fr_FR.ISO8859-1    [3]
% mkdir fr_FR    [4]
% mkcatdefs xpg4demo xpg4demo_fr_FR.msg -h | gencat \
fr_FR/xpg4demo.cat     [5]
mkcatdefs: no msg.h created    [6]

[1] The mkcatdefs command specifies:
- The root name to use for the header file that maps symbolic identifiers used in the program to their numeric values in the message catalog
- The name of the message text source file being processed
The preprocessed message source is piped to the gencat command, which specifies the name of the message catalog.

[2] The mkcatdefs command prints the name of the header file it created to standard output. The utility appends _msg.h to the root name to create a name for the header file.

[3] When generating a message file for a nondefault locale, you must set the LANG environment variable to the name of the locale that the message catalog will support, in this case, fr_FR.ISO8859-1.

[4] Because the name of the message catalog opened by the program does not vary by locale name, you must create a directory in which to store each message catalog variant.

[5] This line creates the local variant of the message catalog. The header file created by the mkcatdefs utility does not vary by locale. The header file has already been created for the default message catalog, so this mkcatdefs command includes the -h flag to disable creation of another header file. The catalog specified to the gencat command is directed to the temporary locale directory. On user systems, this version of the catalog could be moved to the /usr/lib/nls/msg/fr_FR.ISO8859-1 default directory or stored in a directory that is application specific.

[6] The mkcatdefs command announces that no header file has been created, in this case, as intended.

Refer to the /usr/examples/xpg4demo/src/Makefile file for an example of how you can integrate generation of a message catalog into the makefile that builds an application.

3.4.1 Using the mkcatdefs Command

The mkcatdefs command preprocesses one or more message source files to change symbolic identifiers to numeric constants. The utility has the following features:

Sends preprocessed message source to standard output, so you can either pipe the output to the gencat command as shown in Example 3-2 or use the > redirection specifier to print the output to a file

Creates a header file that maps numbers that will identify message sets and messages in the new message catalog with the symbolic identifiers referred to in source programs
You must include this header file in all the program modules that open this catalog and refer to message sets and messages that use symbolic identifiers.

The advantage of symbolic identifiers is that you can specify them in place of numbers when you code calls whose arguments include message set and message identifiers. Symbolic identifiers improve the readability of your program source code and make the code independent of the order in which set and message entries occur in the message catalog. Each time that the mkcatdefs utility processes a message text source file, it produces an associated header file to equate set and message symbols with numbers. Updating your program after a message file revision can be as simple as recompiling it with the new header file.
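As a rough illustration of this mapping, the following sketch shows the kind of definitions a generated header might supply and how a program module could use them. The symbol names (MS_ERRORS, E_OPEN_OUTPUT), their values, and the open_error_text() helper are hypothetical. In a real build, the definitions would come from the header that mkcatdefs generates (for example, xpg4demo_msg.h) rather than being typed into the program source.

```c
#include <nl_types.h>

/* Hypothetical stand-ins for definitions that a generated message
 * header would supply; the names and values are assumptions. */
#define MS_ERRORS      1   /* message set that holds error text */
#define E_OPEN_OUTPUT  9   /* "Cannot open %s for output" */

/* Retrieve a message by symbol rather than by literal number.
 * catgets() returns the default string if the lookup fails. */
const char *open_error_text(nl_catd cat)
{
    return catgets(cat, MS_ERRORS, E_OPEN_OUTPUT,
                   "Cannot open %s for output");
}
```

If a later revision of the message source moves this message, only the values in the regenerated header change; the call itself stays the same.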

The option of defining symbolic identifiers for message sets and messages is not specified by the X/Open UNIX standard, so you should not assume that the mkcatdefs command is available on all operating systems that conform to this standard. However, the source text message file and program header file produced by the mkcatdefs command should be portable among systems that conform to the X/Open UNIX standard.

The mkcatdefs command does not refer to the header file for an existing message catalog to map symbolic identifiers to the numbers assigned when that catalog was created. The command assigns numbers to symbols based on the ordinal position of those symbols in the message source input stream currently being processed. When you are processing changes to an existing catalog, it is your responsibility to ensure correct mapping between the symbols you specify in the source input to the mkcatdefs command and numeric counterparts for those symbols in the existing message catalog.
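The effect of numbering by ordinal position can be modeled with a few lines of C. The sketch below is not how mkcatdefs is implemented; it only assumes that each symbol receives the number of its position in the input stream, counting from 1. The ordinal_of() function and the symbol names in the usage note are hypothetical.

```c
#include <string.h>

/* Return the 1-based position of `symbol` in `stream`, or 0 if the
 * symbol is absent. This models numbering by ordinal position only;
 * it does not reproduce mkcatdefs. */
int ordinal_of(const char *const stream[], int count, const char *symbol)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(stream[i], symbol) == 0)
            return i + 1;
    }
    return 0;
}
```

Given the stream E_OPEN, E_READ, E_WRITE, the symbol E_WRITE maps to 3. If E_READ is removed from the source, E_WRITE maps to 2, which no longer matches programs compiled against the earlier header.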

In general, consider the mkcatdefs utility a tool for regenerating an entire message catalog, not just parts of it. Use the following guidelines:

For message and message set deletions, specify numeric identifiers in place of symbols at strategic points in the message source input to prevent deletions of message sets and individual messages from affecting the ordinal position of subsequent entries.

Define new sets at the end of the input source stream (at the end of the last source file if a catalog is generated from a sequence of source files).

Define new messages for an existing message set at the end of that set.

Specify source entries for the entire catalog; otherwise, mkcatdefs will not produce a complete message header file. You will need a complete header file for recompiling programs that use both current and new symbols to identify messages. In addition, mkcatdefs generates a delset directive before each set directive you specify in the input source; in other words, it expects your input to completely replace all messages in the referenced set.

If the catalog was generated from multiple source files, specify source files in the same order as they were specified to generate the existing catalog; otherwise, you will invalidate headers used to compile all program modules that open the catalog. You can avoid recompiling programs that do not refer to new messages as long as you do not invalidate the symbol-number mapping in the message header file with which those programs were compiled.

3.4.2 Using the gencat Command

The gencat command merges one or more message text source files into a message catalog. For example:


# gencat en_US/test_program.cat test_program_en_US.msg

The gencat command creates the message catalog if the specified catalog path does not identify an existing catalog; otherwise, the command uses the specified message text source file (or files) to modify the catalog. The gencat command accepts message source data from standard input, so you can omit the source file argument when piping input to gencat from another facility, such as the mkcatdefs command.

The X/Open UNIX standard does not specify file name extensions for message source files and catalogs; on Tru64 UNIX systems, the convention is to use the .msg extension for source files and the .cat extension for catalogs. Because the message catalogs produced by the gencat command are binary encoded, they may not be portable between different types of systems. Message text source files preprocessed by the mkcatdefs command should be portable between systems that conform to X/Open UNIX CAE specifications.

Refer to gencat(1) for more details on gencat command syntax and use.

3.4.3 Design and Maintenance Considerations for Message Catalogs

Message sets and message entries are identified at run time by numbers that represent ordinal positions within one version of a message catalog. Adding and deleting message sets and entries in an existing catalog can, if not done carefully, change the ordinal position specifiers that identify messages occurring after the point in the file where a modification is made. Consider a message whose English text "Enter street address: " is identified as 3 : 10 (tenth message of the third message set) in the original generation of a message catalog. That message will have a different identifier in the next version of the catalog if the revised source input to the gencat command performs any of the following operations:

Inserts message sets at the beginning of the input source

In the third message set, inserts any messages before the "Enter street address: " entry

In the third message set, deletes messages before the "Enter street address: " entry without specifying a message deletion directive (a message number followed by no other characters on the line)

When program source refers to messages by numeric identifiers, any changes in ordinal positions of message sets and message entries require changes to program calls that refer to messages. When a program source file refers to messages by symbolic identifiers, the maintenance cost of ordinal position changes is sharply reduced on a per-module basis; in other words, you can synchronize any particular program module with the new version of a message catalog by recompiling with the new header file generated by the mkcatdefs utility.

The ability to recompile program source to synchronize with new message catalog versions does not address issues of complex applications where multiple source files refer to the same message catalog. For such applications, a usual goal is to ensure module-specific maintenance updates. In other words, after an application is installed at end-user sites, you should be able to update a specific module and its associated message catalogs without recompiling and reinstalling all modules in the application. You can achieve this goal in a number of ways. The following design options can help you decide on a message system design strategy that works best for applications developed and maintained at your site:

One message source file and catalog per program module
- Advantages
  This is the easiest strategy to implement for the individual programmer as it eliminates problems that arise when programmers share one source. Software such as the Revision Control System (RCS) and the Source Code Control System (SCCS) helps to manage files that multiple programmers maintain. Sometimes, however, programmers work on different application versions in parallel. This additional layer of complexity is not easy to manage. A one-to-one correspondence between message source files and associated program sources makes it easier to determine whose changes are needed in the message file to build the application for a particular release cycle at a specific point in time.
  When the message catalog is module specific, you can replace the entire message catalog when a new binary module is installed at end-user sites, without risk to the run-time behavior of other modules in the same application.
- Disadvantages
  At run time, the application may need to open and close as many message catalogs as there are modules. Opening a message catalog entails some performance overhead and adds to the number of open file descriptors assigned both to the user's process and the system-wide open file table. Both system-wide and process-specific limits apply to the number of files that can be open simultaneously, and these limits vary from one system to another. On Tru64 UNIX systems, opened message catalogs are mapped into memory (and the file closed) to improve performance of message retrieval; this operation also means that opening multiple message catalogs has little impact on open file limits. This situation, however, may not exist on other platforms to which you might need to port your application.

One message source file per program source, single catalog for application
- Advantages
  The same advantages exist as discussed for the preceding option, plus the single catalog design eliminates any problems associated with numerous open operations if you port your application to systems other than Tru64 UNIX.
- Disadvantages
  When you generate a message catalog from multiple source files, maintainability problems can occur if you do not carefully control message set directives. The best rule to follow is to define a fixed number of sets per source file; for example, one set for errors, one set for informational displays, one set for miscellaneous strings. If you allow programmers to change the number of message sets for different versions of their message source files, the message set numbers for subsequent program modules are likely to change from one version of the catalog to another. This means that other modules whose source code was not changed may have to be included in an update release simply for synchronization with a new version of the message catalog.
  There are similar maintainability problems if no source files define message sets or only some of them do. The mkcatdefs and gencat commands concatenate input source files together so that the end-of-file marker exists only at the end of the last input source file. This means that, if no sets are defined in any file, all messages are considered part of the default message set. (In program calls, the NL_SETD constant refers to the default message set.) In this case, adding messages to any source file other than the last one changes the numeric identifiers of messages in all source files that follow on the input stream.
  Finally, if only some message source files define message sets, message sets can cross source file boundaries. Messages defined in source files that occur later on the input stream are considered part of a message set defined by a source file processed earlier. This arrangement can also result in message entry position changes when new messages are added to different source files.
  Another disadvantage of the multiple source file to single message catalog design arises when the resulting message catalog is extremely large and memory is limited. As mentioned earlier, message catalogs are mapped into memory when opened so that disk I/O for message retrieval does not impede performance. If the users who run your application typically use software and messages that are associated only with a subset of the available modules, module-specific message catalogs can conserve the total amount of memory used when message catalogs are opened for a particular execution cycle.

Combination strategy
Depending on your application, it might make sense to have one or more message catalogs that are generated from multiple, module-specific source files and some that are generated from a single source file that is maintained by all programmers. For example, if many modules in the application generate messages for the same error conditions, message text consistency is a desirable goal. In this case, you could generate one message catalog with a single message text source file where error messages are defined. This source file could define message sets for errors, warnings, and so forth. Programmers would be instructed to add new messages only to the end of each set and to delete messages no longer used by using message deletion directives (which remove messages from the catalog without changing the position numbers for subsequent messages in the same set).

3.5 Displaying Messages and Locale Data Interactively or from Scripts

After a message catalog is created, you may want to display its contents to make sure that the catalog contains the messages you intended and that both messages and message sets are in the proper order. Your application might also include scripts that, like programs, need to determine locale settings, retrieve locale-dependent data, and display messages in a locale-dependent manner at execution time. The following list describes two commands that display messages from a message catalog (dspcat and dspmsg), one command that displays information for the current locale (locale), and one command that formats and writes messages (printf):

dspcat
The dspcat command can display all messages, all messages in a particular set, or a specific message. The following example displays the fourth message in the second set of the xpg4demo.cat catalog:
```
% cd /usr/examples/xpg4demo/en_US
% dspcat xpg4demo.cat 2 4
Are these the changes you want to make?%
```
The dspcat command also includes a -g flag that reformats the output stream for an entire catalog or message set so that it can be piped to the gencat command. This option may be useful if you need to add or replace message sets in one catalog by using message sets in another catalog, perhaps as part of an application update procedure at end-user sites. You can also use the dspcat -g command to create a source file from an existing message catalog. You can then translate or customize the source file for end users before building the translated source into a new catalog with the gencat command.
The following example first displays the message source for the message catalog used by the du command for the en_US.ISO8859-1 locale and then redirects that source to a file that can be edited:
```
% dspcat -g \
/usr/lib/nls/msg/en_US.ISO8859-1/du.cat
 
$delset 1
$set 1
$quote "
 
1       "usage: du [-a|-s] [-klrx] [name ...]\n"
2       "du: Cannot find the current directory.\n"
3       "du: %s\n\
The specified path name exceeded 255 bytes.\n"
4       "du: %s\n\
The generated path name exceeded 255 bytes.\n"
5       "du: Cannot change directory to ../%s \n"
6       "Out of memory"
% dspcat -g \
/usr/lib/nls/msg/en_US.ISO8859-1/du.cat > \
du.msg
```

dspmsg
The dspmsg command displays a particular message from a catalog and optionally allows you to substitute text strings for all %s or %n$s specifiers in the message. For example:
```
% dspmsg xpg4demo.cat -s 1 9 'Cannot open %s for output' xpg4demo.dat
Cannot open xpg4demo.dat for output%
```

locale

The locale command displays information for the current locale setting or tells you what locales are installed on the system. In the following example, the locale command displays the current settings of all locale variables, then the keywords and values for a specific variable (LC_MESSAGES), and finally the value for a particular item of locale data:


% locale
LANG=en_US.ISO8859-1
LC_COLLATE="en_US.ISO8859-1"
LC_CTYPE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_ALL=
% locale -ck LC_MESSAGES
LC_MESSAGES
yesexpr="^([yY]|[yY][eE][sS])"
noexpr="^([nN]|[nN][oO])"
yesstr="yes:y:Y"
nostr="no:n:N"
% locale yesexpr
^([yY]|[yY][eE][sS])

printf
The printf command writes a formatted string to standard output. Like the printf() function, the command supports conversion specifiers that let you format messages in a way that is locale dependent. You can also use this command in scripts, along with the locale command, to interpret "yes/no" responses in the user's native language. For example:
```
if printf "%s\n" "$response" | grep -Eq "`locale yesexpr`"
then
        <processing for an affirmative response goes here>
else
        <processing for a response other than affirmative goes here>
fi
```

Refer to dspcat(1), dspmsg(1), locale(1), and printf(1) for more information on the preceding commands.
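A program can apply the same yes/no test that the preceding script performs. The following is a minimal sketch that assumes the affirmative-response pattern is available through nl_langinfo(YESEXPR) and is an extended regular expression; the is_affirmative() name is hypothetical. The caller is assumed to have initialized the locale with setlocale() beforehand; otherwise the pattern for the C locale applies.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `response` matches the current locale's YESEXPR pattern,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int is_affirmative(const char *response)
{
    regex_t re;
    int matched;

    if (regcomp(&re, nl_langinfo(YESEXPR), REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = (regexec(&re, response, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```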

3.6 Accessing Message Catalogs in Programs

Programs call the following functions to work with a message catalog:

catopen() to open the file

catclose() to close the file

catgets() to retrieve messages

Message catalogs are usually located through the setting of the NLSPATH environment variable. The following sections discuss this variable and the calls in the preceding list.

3.6.1 Opening Message Catalogs

Programs call the catopen() function to open a message catalog. For example:

#include <locale.h>
#include <nl_types.h>

.
.
.
nl_catd        MsgCat;

.
.
.
setlocale(LC_ALL, "");

.
.
.
MsgCat = catopen("new_application.cat", NL_CAT_LOCALE);

In this example, if successful, the catopen() function returns a message catalog descriptor to the MsgCat variable. The variable that contains the descriptor is declared as type nl_catd. The catopen() function and the nl_catd type are defined in the /usr/include/nl_types.h header file, which the program must include. A call to catopen() requires two arguments:

The name of the catalog
The catalog name is customarily specified as filename.cat (or a program variable whose value is filename.cat) without the preceding directory path. At run time, the catopen() function determines the full pathname of the catalog by integrating the name argument into pathname formats defined by the NLSPATH environment variable. If you specify any slash (/) characters in the catalog name argument, the catopen() function assumes that the specified catalog name represents a full pathname and does not refer to the value of the NLSPATH variable at run time.

An oflag argument
This argument is either the NL_CAT_LOCALE constant (defined in /usr/include/nl_types.h) or zero (0). If you specify NL_CAT_LOCALE, catopen() searches for a message catalog that supports the locale set for the LC_MESSAGES environment variable. If you specify 0, catopen() searches for a message catalog that supports the locale set for the LANG environment variable. A 0 argument is supported for compatibility with XPG3. The NL_CAT_LOCALE argument conforms to The Open Group's current UNIX CAE specifications and is recommended. Although the LC_MESSAGES setting is usually inherited from the LANG setting rather than set explicitly, there are circumstances when programs or users set LC_MESSAGES to a different locale than set for LANG.
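The selection rule for the oflag argument can be summarized in code. The following sketch models only the rule as described here, not the internals of catopen(); the catalog_locale() helper is hypothetical, and the fallback to the C locale when nothing is set is an assumption of the sketch.

```c
#include <locale.h>
#include <nl_types.h>
#include <stdlib.h>

/* Return the locale name that a catalog search would be based on:
 * the LC_MESSAGES category for NL_CAT_LOCALE, otherwise the LANG
 * environment variable. Falls back to "C" if neither yields a name. */
const char *catalog_locale(int oflag)
{
    const char *name;

    if (oflag == NL_CAT_LOCALE)
        name = setlocale(LC_MESSAGES, NULL);  /* query only, no change */
    else
        name = getenv("LANG");

    return (name != NULL && name[0] != '\0') ? name : "C";
}
```

Note that setlocale(LC_MESSAGES, NULL) reports the category as the program has set it, so a program that never calls setlocale() sees the C locale regardless of the environment.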

The names and locations of message catalogs are not standard from one system to another. The Open Group's UNIX standard therefore specifies the NLSPATH environment variable to define the search paths and pathname format for message catalogs on the system where the program runs. The catopen() function refers to the variable setting at run time to find the catalog being opened by the program. If you do not install your application's message catalogs in customary locations on the user's system, your application's startup procedure will need to prepend an appropriate pathname format to the current search path for NLSPATH.

The syntax for setting the NLSPATH environment variable is as follows:

NLSPATH= [ [ [:] ] [ /directory ] [ [ [/] ] [ substitution-field ] [ literal ] ] ... [ [:]alternate_pathname ] ...]

A leading colon (:) or two adjacent colons (::) indicate the current directory; subsequent colons act solely as separators between different pathnames. Each pathname in the search path is assembled from the following components:

/directory to indicate the full directory path to the catalog
You can also specify ./directory to indicate a relative path.

substitution-field, which can be one of the following directives:
- %N
  The value of the first argument to catopen(), for example, xpg4demo.cat in the following call:
```
catopen("xpg4demo.cat", NL_CAT_LOCALE);
```
- %L
  The locale set for:
  LC_MESSAGES, if the second argument to catopen() is the NL_CAT_LOCALE constant
  LANG, if the second argument to catopen() is zero (0)
  This substitution field represents an entire locale name, such as fr_FR.ISO8859-1.
- %l
  The language component of the locale set for either the LC_MESSAGES or LANG variable (as determined by the same conditions specified for %L)
  Given the locale name fr_FR.ISO8859-1, this substitution field represents the component fr.
- %t
  The territory component of the locale set for either the LC_MESSAGES or LANG variable (as determined by the same conditions specified for %L)
  Given the locale name fr_FR.ISO8859-1, this substitution field represents the component FR.
- %c
  The codeset component of the locale set for either the LC_MESSAGES or LANG variable (as determined by the same conditions specified for %L)
  Given the locale name fr_FR.ISO8859-1, this substitution field represents the component ISO8859-1.
- %%
  A single % character

literal to indicate:
- Directory or file names that cannot be specified using substitution fields
- Field separators, for example, an underscore (_) or period (.) between the language, territory, and codeset substitution fields or a slash (/) between the %L and %N substitution fields
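To make the substitution fields concrete, the following sketch expands one pathname template the way the preceding list describes. It is an illustration under stated assumptions, not the catopen() implementation: the expand_template() and append() names are hypothetical, the locale name is assumed to have the form language[_territory][.codeset], unknown fields are silently dropped, and the output buffer is assumed to hold at least one byte.

```c
#include <string.h>

/* Append at most `len` bytes of `src` to the string in `out`,
 * truncating if the buffer (capacity `cap`) would overflow. */
static void append(char *out, size_t cap, const char *src, size_t len)
{
    size_t used = strlen(out);

    if (used + 1 >= cap)
        return;
    if (len > cap - used - 1)
        len = cap - used - 1;
    memcpy(out + used, src, len);
    out[used + len] = '\0';
}

/* Expand one NLSPATH pathname template into `out`. `name` is the
 * first argument to catopen(); `locale` is the locale name in effect. */
void expand_template(const char *tmpl, const char *locale,
                     const char *name, char *out, size_t cap)
{
    const char *us = strchr(locale, '_');   /* precedes the territory */
    const char *dot = strchr(locale, '.');  /* precedes the codeset */
    const char *end = locale + strlen(locale);
    const char *lang_end, *terr_end;

    if (us != NULL && dot != NULL && us > dot)
        us = NULL;                          /* underscore is in codeset */
    lang_end = us ? us : (dot ? dot : end);
    terr_end = dot ? dot : end;

    out[0] = '\0';
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (*p != '%' || p[1] == '\0') {
            append(out, cap, p, 1);         /* literal character */
            continue;
        }
        switch (*++p) {
        case 'N': append(out, cap, name, strlen(name)); break;
        case 'L': append(out, cap, locale, strlen(locale)); break;
        case 'l': append(out, cap, locale, (size_t)(lang_end - locale)); break;
        case 't':
            if (us != NULL)
                append(out, cap, us + 1, (size_t)(terr_end - us - 1));
            break;
        case 'c':
            if (dot != NULL)
                append(out, cap, dot + 1, strlen(dot + 1));
            break;
        case '%': append(out, cap, "%", 1); break;
        default:  break;                    /* unknown field: dropped */
        }
    }
}
```

For example, expanding /usr/lib/nls/msg/%L/%N with the locale fr_FR.ISO8859-1 and the name xpg4demo.cat yields /usr/lib/nls/msg/fr_FR.ISO8859-1/xpg4demo.cat.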

To clarify how the LC_MESSAGES setting, NLSPATH setting, and the catopen() function interact, consider the following set of conditions:

The locale set for LC_MESSAGES is fr_FR.ISO8859-1. (Unless explicitly set by the user or program, the locale set for LC_MESSAGES is derived from the locale set for LANG.)

The NLSPATH variable is set to the following value:

:%l_%t/%N:/usr/kits/xpg4demo/msg/%l_%t/%N:\
/usr/lib/nls/msg/%L/%N

The program initializes the locale with the following call:
```
.
.
.
setlocale(LC_ALL, "");

.
.
.
```

The program opens a message catalog with the following call:
```
.
.
.
MsgCat = catopen("xpg4demo.cat", NL_CAT_LOCALE);

.
.
.
```

Given the preceding conditions, the catopen() function looks for catalogs at run time in the following pathname order:

xpg4demo.cat

./fr_FR/xpg4demo.cat

/usr/kits/xpg4demo/msg/fr_FR/xpg4demo.cat

/usr/lib/nls/msg/fr_FR.ISO8859-1/xpg4demo.cat
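The search order above follows from splitting the NLSPATH value at each colon. The following sketch shows one way to model that split; it assumes that an empty entry (from a leading, trailing, or doubled colon) stands for the current directory and is equivalent to the template %N. The nlspath_entry() function is hypothetical and is not part of any library.

```c
#include <string.h>

/* Copy the index-th (0-based) colon-separated entry of `nlspath` into
 * `out`. An empty entry stands for the current directory and is
 * returned as "%N". Returns 1 on success, 0 if there is no such entry
 * or the entry does not fit in `cap` bytes. */
int nlspath_entry(const char *nlspath, int index, char *out, size_t cap)
{
    const char *start = nlspath;

    for (int i = 0; ; i++) {
        const char *colon = strchr(start, ':');
        size_t len = colon ? (size_t)(colon - start) : strlen(start);

        if (i == index) {
            if (len == 0) {          /* empty entry: current directory */
                start = "%N";
                len = 2;
            }
            if (len >= cap)
                return 0;            /* would not fit */
            memcpy(out, start, len);
            out[len] = '\0';
            return 1;
        }
        if (colon == NULL)
            return 0;                /* ran out of entries */
        start = colon + 1;
    }
}
```

Applied to the NLSPATH value in this example, entry 0 is %N (the leading colon), entry 1 is %l_%t/%N, and entries 2 and 3 are the two absolute templates, which matches the four pathnames listed above once the fields are substituted.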

When troubleshooting run-time problems, it is worthwhile to consider how catopen() behaves when certain variables are not set.

If LC_MESSAGES is not set (directly or through the LANG variable), the %L and %l fields contain the value C (the default locale for LC_MESSAGES) and the %t and %c substitution fields are omitted from the search path. In this case, catopen() searches for:

xpg4demo.cat

./C_/xpg4demo.cat

/usr/kits/xpg4demo/msg/C_/xpg4demo.cat

/usr/lib/nls/msg/C/xpg4demo.cat

If LC_MESSAGES is set but the NLSPATH variable is not set, the catopen() function searches for the catalog by using a default search path that is vendor defined. On Tru64 UNIX systems, the default search path is /usr/lib/nls/msg/%L/%N:. For the sample set of conditions under discussion now, this default would result in catopen() searching for:

/usr/lib/nls/msg/fr_FR.ISO8859-1/xpg4demo.cat

xpg4demo.cat

Finally, if neither LC_MESSAGES nor NLSPATH is set, catopen() would search for:

/usr/lib/nls/msg/xpg4demo.cat

./xpg4demo.cat

If catopen() fails to find a message catalog that matches the locale, the function next checks for an appropriate /usr/share/.msg_conv-locale_name file. This file, if it exists, specifies another locale for which a message catalog is available and from which messages can be converted. If this file is found, the available message catalog is opened and the appropriate codeset converter is invoked to convert messages to the codeset of the LC_MESSAGES setting. For example, the .msg_conv-fr_FR.UTF-8 file specifies that, if catalog_name exists for French in ISO8859-1 format, that catalog can be opened and its messages converted to UTF-8 format.

The catopen() function does not return an error status when a message catalog cannot be opened. To improve program performance, the catalog is not actually opened until execution of the first catgets() call that refers to the catalog. If you need to detect the open file failure at the point in your program where the catopen() call executes, you must include a call to catgets() immediately following catopen(). You can then design your program to exit on an error returned by the catgets() call. Including an early call to catgets() may be important to do in programs that perform a good deal of work before they retrieve any messages from the message catalog. However, informing the user of this particular error is a problem, given that you cannot retrieve an error message in the user's native language unless the catalog is opened successfully.
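The early-probe technique might be sketched as follows. The open_catalog_checked() helper and its parameters are hypothetical. The sketch relies on catgets() returning the default-string pointer itself when retrieval fails, so the probe also reports failure if the catalog opens but does not contain the probed message; choose a set and message that are known to exist.

```c
#include <nl_types.h>

/* Open `name` and probe it with one catgets() call. Sets *usable to 1
 * if the probe returned catalog text, or to 0 if catgets() fell back
 * to the default string. The descriptor is returned either way. */
nl_catd open_catalog_checked(const char *name, int set_id, int msg_id,
                             int *usable)
{
    static const char sentinel[] = "";  /* default string for the probe */
    nl_catd cat = catopen(name, NL_CAT_LOCALE);
    const char *text = catgets(cat, set_id, msg_id, sentinel);

    /* On failure, catgets() hands back the pointer it was given. */
    *usable = (text != sentinel);
    return cat;
}
```

A caller that finds *usable set to 0 can report the problem with a message compiled into the program, because no translated text is available at that point.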

For additional information on the catopen() function, including its error-handling behavior and support for codeset conversion, refer to catopen(3).

Note

When running in a process whose effective user ID is root, the catopen() function ignores the NLSPATH setting and searches for message catalogs by using the /usr/lib/nls/msg/%L/%N path. If a program runs with an effective user ID of root, you must therefore do one of the following:

Install all message catalogs used by the program in locale directories identified as /usr/lib/nls/msg/%L.

Install message catalogs used by the program in another directory and create links in the /usr/lib/nls/msg/%L directories to those catalog files.

This restriction does not apply to a program when it is run by a user who is logged in as root. The restriction applies only to a program that executes the setuid() call to spawn a subprocess whose effective user ID is root.

3.6.2 Closing Message Catalogs

The catclose() function closes a message catalog. This function has one argument, which is the catalog descriptor returned by the catopen() function. For example:

(void) catclose(MsgCat);

The exit() function also closes open message catalogs when a process terminates.

3.6.3 Reading Program Messages

The catgets() function reads messages into the program. This function takes four arguments:

The message catalog descriptor returned by the catopen() call

The symbolic or numeric identifier of the message set
Use the NL_SETD constant when retrieving messages from message catalogs that do not contain user-defined message sets.

The symbolic or numeric identifier of the message

The default message string
The program uses this string when the program cannot retrieve the specified message from a catalog, usually because the catalog was not found or opened.

You ordinarily use the catgets() function in conjunction with another routine, either directly or as part of a program-defined macro. The following code from the xpg4demo program defines a macro to access a specific message set, then uses the macro as an argument to the printf routine:


.
.
.
#define GetMsg(id, defmsg)\
                        catgets(MsgCat, MSGInfo, id, defmsg)

.
.
.
printf(GetMsg(I_COM_DISP_LIST_FMT,
               "%6ld  %20S %-30S %3S %10s\n"),
               emp->badge_num,
               emp->first_name,
               emp->surname,
               emp->cost_center,
               buf);

.
.
.

Refer to catgets(3) for more information about the catgets() function.

Note

The gettxt() function also reads messages from message catalogs. This function is included in the System V Interface Definition (SVID) but is not recognized by the X/Open UNIX standard. For information about this function, refer to gettxt(3).