A    Summary Tables of Worldwide Portability Interfaces

This appendix lists and summarizes worldwide portability interfaces (WPI) that are defined by Version 5 of the X/Open CAE specification for system interfaces and headers (XSH). All these interfaces support the wide-character data type. Tables in this appendix also list older ISO C functions that use the char data type and therefore cannot perform character-by-character processing in all languages. The reference pages (manpages) provide detailed information for each interface. Refer to standards(5) for information about compiling a program in the appropriate definition environment for XSH Version 5.

A.1    Locale Announcement

Programs call the following function to use the appropriate locale (language, territory, and codeset) at run time:

WPI Function Description
setlocale( ) Establishes localization data at run time.
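
As an illustration of locale announcement, the following minimal sketch wraps setlocale( ) in two helper functions. The names announce_locale and current_locale are hypothetical and are not part of the WPI; the sketch assumes the program wants the locale named by the user's environment variables.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: adopt the locale named by the environment
 * (LANG and the LC_* variables). Returns the name of the locale now
 * in effect, or NULL if the request could not be honored, in which
 * case the previous locale remains in effect. */
const char *announce_locale(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL)
        fprintf(stderr, "warning: locale from environment is not available\n");
    return name;
}

/* Hypothetical helper: query the current locale without changing it. */
const char *current_locale(void)
{
    return setlocale(LC_ALL, NULL);
}
```

A program would typically call announce_locale( ) once, near the start of main( ), before it calls any other function whose behavior depends on the locale.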

A.2    Character Classification

The following character classification functions classify values according to the codeset defined in the locale category LC_CTYPE.

WPI Function Older ISO C Function Description
iswalnum( ) isalnum( ) Tests if a character is alphanumeric.
iswalpha( ) isalpha( ) Tests if a character is alphabetic.
iswcntrl( ) iscntrl( ) Tests if a character is a control character.
iswdigit( ) isdigit( ) Tests if a character is a decimal digit in the portable character set.
iswgraph( ) isgraph( ) Tests if a character is a graphic character.
iswlower( ) islower( ) Tests if a character is lowercase.
iswprint( ) isprint( ) Tests if a character is a printing character.
iswpunct( ) ispunct( ) Tests if a character is a punctuation mark.
iswspace( ) isspace( ) Tests if a character determines white space in displayed text.
iswupper( ) isupper( ) Tests if a character is uppercase.
iswxdigit( ) isxdigit( ) Tests if a character is a hexadecimal digit in the portable character set.

In addition to the functions for each character classification, the WPI includes two functions, wctype( ) and iswctype( ), that provide a common interface to all classification categories.

The 11 WPI functions listed in the preceding table can be replaced by calls to the wctype( ) and iswctype( ) functions as shown in the following table:

Call Using Classification Function Equivalent Call Using wctype( ) and iswctype( )
iswalnum(wc ) iswctype(wc , wctype("alnum"))
iswalpha(wc ) iswctype(wc , wctype("alpha"))
iswcntrl(wc ) iswctype(wc , wctype("cntrl"))
iswdigit(wc ) iswctype(wc , wctype("digit"))
iswgraph(wc ) iswctype(wc , wctype("graph"))
iswlower(wc ) iswctype(wc , wctype("lower"))
iswprint(wc ) iswctype(wc , wctype("print"))
iswpunct(wc ) iswctype(wc , wctype("punct"))
iswspace(wc ) iswctype(wc , wctype("space"))
iswupper(wc ) iswctype(wc , wctype("upper"))
iswxdigit(wc ) iswctype(wc , wctype("xdigit"))

In this table, the quoted literals in the call to wctype are the character classes defined in the X/Open UNIX standard for Western European and many Eastern European languages; however, a locale can define other character classes. The Unicode standard defines character classes that do not have class-specific functions, and a locale for an Asian language might define additional character classes to distinguish ideographic from phonetic characters. You must use the wctype() and iswctype() functions to test if a character belongs to a class when no class-specific function exists for the test. See locale(4) for details about character classes and testing equivalence between classes defined in the XSH and the Unicode standards.

Note

The calls in the second column of the preceding table illustrate only functional equivalence to the calls shown in the first column of the table. In most programming applications, iswctype() needs to execute multiple times for each execution of wctype(). In such cases, you would code calls in the second column of the table as follows to achieve performance equivalence to corresponding calls in the first column:

wctype_t    property_handle;
wint_t      wc;
int         yes_or_no;
 .
 .
 .
     property_handle=wctype("alnum");
 .
 .
 .
     while (...) {
       .
       .
       .
     yes_or_no=iswctype(wc, property_handle);
       .
       .
       .
     }
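
The fragment above can be turned into a complete function along the following lines. This is only a sketch: the function name count_in_class and its return convention are assumptions made for illustration. It also checks the value returned by wctype( ), which is zero when the current locale does not define the requested class.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the characters of ws that belong to the
 * named character class. Returns -1 if the current locale does not
 * define the class. */
long count_in_class(const wchar_t *ws, const char *class_name)
{
    wctype_t property_handle = wctype(class_name);  /* looked up once */
    long count = 0;

    if (property_handle == (wctype_t)0)
        return -1;

    for (; *ws != L'\0'; ws++) {
        /* iswctype() runs once per character, reusing the handle */
        if (iswctype((wint_t)*ws, property_handle))
            count++;
    }
    return count;
}
```

For example, count_in_class(ws, "alnum") counts the alphanumeric characters in ws under the current LC_CTYPE setting.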
 

A.3    Case and Generic Property Conversion

The following case conversion functions let you switch the case of a character according to the codeset defined in the locale category LC_CTYPE:

WPI Function Older ISO C Function Description
towlower( ) tolower( ) Converts a character to lowercase.
towupper( ) toupper( ) Converts a character to uppercase.

The WPI also includes the wctrans( ) and towctrans( ) functions to map and convert a character according to properties defined in the current locale.

Currently, the only properties defined in Tru64 UNIX locales are toupper and tolower. The following example of using wctrans( ) and towctrans( ) performs the same conversion as towupper( ):

wint_t     from_wc, to_wc;
wctrans_t  conv_handle;
.
.
.
       conv_handle=wctrans("toupper");
.
.
.
       while (...) {
         .
         .
         .
       to_wc=towctrans(from_wc,conv_handle);
         .
         .
         .
       }
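
A complete version of the preceding fragment might look like the following sketch. The function name map_wide_string is hypothetical; the sketch assumes the caller passes a modifiable, null-terminated wide-character string and one of the property names that the locale defines.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: convert ws in place by using the named mapping
 * (for example "toupper" or "tolower"). Returns 0 on success, or -1 if
 * the current locale does not define the mapping. */
int map_wide_string(wchar_t *ws, const char *mapping)
{
    wctrans_t conv_handle = wctrans(mapping);  /* looked up once */

    if (conv_handle == (wctrans_t)0)
        return -1;

    for (; *ws != L'\0'; ws++)
        *ws = (wchar_t)towctrans((wint_t)*ws, conv_handle);
    return 0;
}
```

Characters that have no mapping for the requested property, such as digits and punctuation, are returned unchanged by towctrans( ).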
 
 

A.4    Character Collation

The functions in the following table sort strings according to rules specified in the locale defined for the LC_COLLATE category:

WPI Function Older ISO C Function Description
wcscoll( ) strcoll( ) Collates character strings.

You can also use the wcsxfrm( ) and wcscmp( ) functions, summarized in Section A.11, to transform and then compare wide-character strings.
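
As a sketch of locale-sensitive sorting, the following example passes a wcscoll( )-based comparison function to qsort( ). The names compare_collated and sort_collated are hypothetical. When the same strings are compared many times, transforming each string once with wcsxfrm( ) and then comparing the results with wcscmp( ) may be the more efficient choice.

```c
#include <stdlib.h>
#include <wchar.h>

/* Comparison callback for qsort(): each element is a pointer to a
 * wide-character string; ordering follows the LC_COLLATE category. */
static int compare_collated(const void *a, const void *b)
{
    const wchar_t *const *sa = a;
    const wchar_t *const *sb = b;

    return wcscoll(*sa, *sb);
}

/* Hypothetical helper: sort an array of wide-character strings in
 * place according to the collation rules of the current locale. */
void sort_collated(const wchar_t **strings, size_t count)
{
    qsort(strings, count, sizeof strings[0], compare_collated);
}
```

In the "C" locale the resulting order matches what wcscmp( ) would produce; in other locales the order follows the collation rules of the locale.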

A.5    Access to Data That Varies According to Language and Custom

The functions in the following table allow programs to retrieve, according to locale setting, data that is language specific or country specific:

WPI Function Description
nl_langinfo( ) A general-purpose function that retrieves language and cultural data according to the locale setting.
strfmon( ) Formats a monetary value according to the locale setting.
localeconv( ) Returns information used to format numeric values according to the locale setting.
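
The following sketch shows one way to query two items of locale data: the codeset name, through nl_langinfo( ), and the radix character, through localeconv( ). The function name describe_locale and the output layout are assumptions made for illustration only.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: write "codeset=<name> radix=<string>" into buf.
 * Returns the value returned by snprintf(). */
int describe_locale(char *buf, size_t size)
{
    const struct lconv *lc = localeconv();  /* numeric formatting data */

    return snprintf(buf, size, "codeset=%s radix=%s",
                    nl_langinfo(CODESET),   /* codeset of the locale */
                    lc->decimal_point);     /* radix character */
}
```

The codeset name that nl_langinfo( ) reports varies from system to system, even for the same locale, so programs should not compare it against a fixed string without allowing for aliases.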

A.6    Conversion and Format of Date/Time Values

The ctime( ) and asctime( ) functions do not have the flexibility needed for language independence. The WPI therefore includes the following interfaces to format date and time strings according to information provided by the locale:

WPI Function Description
strftime( ) Formats a date and time string based on the specified format string and according to the locale setting.
wcsftime( ) Formats a date and time string based on a specified format string and according to the locale setting, then returns the result in a wide-character array.
strptime( ) Converts a character string to a time value according to a specified format string; reverses the operation performed by strftime( ).
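
To illustrate, the following sketch formats the same date with strftime( ) and with wcsftime( ). The function names and the format string are assumptions chosen for the example; the weekday and month names that appear in the output come from the LC_TIME category of the current locale.

```c
#include <time.h>
#include <wchar.h>

/* Hypothetical helper: format a date such as "Friday 31 December 1999"
 * into a char buffer. Returns the number of bytes written, or 0 if the
 * result does not fit. */
size_t format_long_date(char *buf, size_t size, const struct tm *when)
{
    return strftime(buf, size, "%A %d %B %Y", when);
}

/* Hypothetical helper: same operation, with the result stored in a
 * wide-character array; count is the array size in wide characters. */
size_t format_long_date_wide(wchar_t *buf, size_t count, const struct tm *when)
{
    return wcsftime(buf, count, L"%A %d %B %Y", when);
}
```

Both functions take the weekday from the tm_wday member and do not compute it from the date, so the caller must supply a fully populated struct tm, for example one returned by localtime( ).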

A.7    Printing and Scanning Text

The WPI extends definitions of the following ISO C functions to support internationalization requirements. The WPI extensions are described after the table that lists the functions.

WPI/ISO C Function Description
fprintf( ) Prints formatted output to a file by using a vararg parameter list.
fwprintf( ) Prints formatted wide characters to the specified output stream by using a vararg parameter list.
printf( ) Prints formatted output to the standard output stream by using a vararg parameter list.
sprintf( ) Formats one or more values and writes the output to a character string by using a vararg parameter list.
swprintf( ) Prints formatted wide characters to the specified address by using a vararg parameter list.
vfprintf( ) Prints formatted output to a file by using a stdarg parameter list.
vfwprintf( ) Prints formatted wide characters to the specified output stream by using a stdarg parameter list.
vprintf( ) Prints formatted output to the standard output stream by using a stdarg parameter list.
vsprintf( ) Formats a stdarg parameter list and writes the output to a character string.
vswprintf( ) Prints formatted wide characters to the specified address by using a stdarg parameter list.
vwprintf( ) Prints formatted wide characters to the standard output by using a stdarg parameter list.
wprintf( ) Prints formatted wide characters to the standard output by using a vararg parameter list.
fscanf( ) Converts formatted input from a file.
fwscanf( ) Converts formatted wide characters from the specified input stream.
scanf( ) Converts formatted input from the standard input stream.
sscanf( ) Converts formatted data from a character string.
swscanf( ) Converts formatted wide characters from the specified address.
wscanf( ) Converts formatted wide characters from the standard input.

The WPI extensions to the preceding functions include:

A.8    Number Conversion

Functions in the following table convert strings to various numeric formats:

WPI Function Older ISO C Function Description
wcstod( ) strtod( ) Converts the initial portion of a string to a double-precision floating-point number.
wcstol( ) strtol( ) Converts the initial portion of a string to a long integer number.
wcstoul( ) strtoul( ) Converts the initial portion of a string to an unsigned long integer number.
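
Because these functions convert only the initial portion of a string, callers usually need to check how much of the string was consumed. The following sketch shows one way to do that with wcstol( ); the function name parse_long and its return convention are hypothetical.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical helper: parse a base-10 long integer from ws.
 * Returns 0 and stores the result in *value on success. Returns -1 if
 * ws contains no digits, has trailing characters, or is out of range. */
int parse_long(const wchar_t *ws, long *value)
{
    wchar_t *end;
    long v;

    errno = 0;
    v = wcstol(ws, &end, 10);

    if (end == ws)          /* nothing was converted */
        return -1;
    if (*end != L'\0')      /* unconverted characters remain */
        return -1;
    if (errno == ERANGE)    /* value does not fit in a long */
        return -1;

    *value = v;
    return 0;
}
```

Leading white space is skipped by wcstol( ), so a string such as L"  -42" is accepted by this sketch.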

A.9    Conversion of Multibyte and Wide-Character Values

To allow an application to get data from or write data to external files (as multibyte data) and process it internally (as wide-character data), the WPI defines various functions to convert between multibyte data and wide-character data.

WPI Function Description
btowc( ) Converts a single byte from multibyte-character format to wide-character format.
mblen( )

Determines the number of bytes in a character according to the locale setting. You should modify every string manipulation statement that assumes a character always occupies 1 byte so that it calls this function. For example, the following statement advances the pointer cp to the next character only when each character occupies exactly 1 byte:

cp++;

The following example incorporates the mblen( ) function to ensure language-independent operation at run time; the MB_CUR_MAX variable is defined by the locale to be the maximum number of bytes that any character can occupy:

cp += mblen(cp, MB_CUR_MAX);
 

mbrlen() Performs the same operation as mblen() but can be restarted for use with locales that include shift-state encoding. [Footnote 4]
mbrtowc() Performs the same operation as mbtowc() but can be restarted for use with locales that include shift-state encoding. [Footnote 4]
mbsrtowcs() Performs the same operation as mbstowcs() but can be restarted for use with locales that include shift-state encoding. [Footnote 4]
mbstowcs( ) Converts a multibyte-character string to a wide-character string.
mbtowc( ) Converts a multibyte character to a wide character.
wcstombs( ) Converts a wide-character string to a multibyte-character string.
wcrtomb() Performs the same operation as wctomb() but can be restarted for use with locales that include shift-state encoding. [Footnote 4]
wcsrtombs() Performs the same operation as wcstombs() but can be restarted for use with locales that include shift-state encoding. [Footnote 4]
wctob( ) Converts a wide character to a single byte in multibyte-character format, if possible.
wctomb( ) Converts a wide character to a multibyte character.
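
The mblen( ) statement shown in the table does not handle the case in which mblen( ) returns -1 for an invalid byte sequence. The following sketch adds that check and also shows a bounded call to mbstowcs( ). The function names count_mb_chars and to_wide are hypothetical and are used here only for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: count the characters in the multibyte string s.
 * Returns -1 if s contains an invalid multibyte sequence. */
long count_mb_chars(const char *s)
{
    long count = 0;

    (void)mblen(NULL, 0);               /* reset any shift state */
    while (*s != '\0') {
        int n = mblen(s, MB_CUR_MAX);   /* bytes in this character */

        if (n <= 0)                     /* invalid sequence */
            return -1;
        s += n;
        count++;
    }
    return count;
}

/* Hypothetical helper: convert src to a null-terminated wide-character
 * string in dst, which holds dst_len wide characters. Returns the
 * number of wide characters stored, not counting the terminator, or
 * (size_t)-1 on error. */
size_t to_wide(wchar_t *dst, size_t dst_len, const char *src)
{
    size_t n;

    if (dst_len == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, dst_len - 1);
    if (n == (size_t)-1)
        return n;
    dst[n] = L'\0';                     /* terminate even if truncated */
    return n;
}
```

Both helpers interpret bytes according to the LC_CTYPE category, so the program must announce its locale before it calls them on data that is not plain ASCII.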

Note

You do not always need to explicitly handle the conversion to and from file code (multibyte data). Functions for printing and scanning text (discussed in Section A.7) include the %S and %C format specifiers that automatically handle multibyte to wide-character conversion. The WPI alternatives for older ISO C input/output functions (see Section A.10) also perform multibyte/wide-character conversions automatically.
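
As a sketch of this automatic conversion, the following example uses the ISO C %ls specifier, which XSH Version 5 treats as equivalent to %S. The function names are hypothetical, and the fixed field width of 15 is an assumption that matches the 16-element buffer the caller is expected to supply.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write the wide-character string ws to dst as
 * multibyte data. Returns the value returned by snprintf(). */
int wide_to_bytes(char *dst, size_t size, const wchar_t *ws)
{
    return snprintf(dst, size, "%ls", ws);
}

/* Hypothetical helper: read the first white-space-delimited word of
 * the multibyte string src into dst as wide characters. dst must have
 * room for at least 16 wide characters. Returns the number of items
 * converted (1 on success). */
int first_word_as_wide(const char *src, wchar_t *dst)
{
    return sscanf(src, "%15ls", dst);
}
```

In both directions the conversion follows the LC_CTYPE category of the current locale, so no explicit call to mbstowcs( ) or wcstombs( ) is needed.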

A.10    Input and Output

The WPI functions listed in the following table automatically convert between file code (usually multibyte encoding) and process code (wide-character encoding) for text input and output operations:

WPI Function Older ISO C Function Description
fgetwc( ) fgetc( ) Gets a character from an input stream and advances the file position pointer.
fgetws( ) fgets( ) Gets a string from an input stream.
fputwc( ) fputc( ) Writes a character to an output stream.
fputws( ) fputs( ) Writes a string to an output stream.
fwide() None Sets stream orientation to byte or wide character. This function is not useful within current locale environments. [Footnote 5]
getwc( ) getc( ) Gets a character from an input stream.
getwchar( ) getchar( ) Gets a character from the standard input stream.
None gets( ) Use fgetws( ).
mbsinit() None Determines, for locales that use shift-state encoding, whether a multibyte string is in the initial conversion state. [Footnote 5]
putwc( ) putc( ) Writes a character to an output stream.
putwchar( ) putchar( ) Writes a character to the standard output stream.
None puts( ) Use fputws( ).
ungetwc( ) ungetc( ) Pushes a character back onto an input stream.
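
The following sketch writes a wide-character string to a stream with fputws( ) and reads it back with fgetws( ); the library converts between file code and process code on each call. The function name round_trip_line is hypothetical, and the use of tmpfile( ) is an assumption made only to keep the example self-contained.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write ws to a temporary file, rewind, and read
 * one line back into out, which holds out_len wide characters.
 * Returns 0 on success, -1 on failure. */
int round_trip_line(const wchar_t *ws, wchar_t *out, int out_len)
{
    FILE *fp = tmpfile();
    int status = -1;

    if (fp == NULL)
        return -1;

    if (fputws(ws, fp) >= 0) {          /* process code -> file code */
        rewind(fp);
        if (fgetws(out, out_len, fp) != NULL)   /* file code -> process code */
            status = 0;
    }
    fclose(fp);
    return status;
}
```

Like fgets( ), the fgetws( ) function stops after a newline character and keeps that newline in the result.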

A.11    String Handling

The WPI defines alternatives and additions to ISO C byte-oriented functions to support manipulation of character strings. The WPI functions support both single-byte and multibyte characters.

String Concatenation:

WPI Function Older ISO C Function Description
wcscat( ) strcat( ) Appends a copy of a string to the end of another string.
wcsncat( ) strncat( ) Similar to the functions in the preceding row except that the number of characters to be appended is limited by the n parameter.

String Searching:

WPI Function Older ISO C Function Description
wcschr( ) strchr( ) Locates the first occurrence of a character in a string.
wcsrchr( ) strrchr( ) Locates the last occurrence of a character in a string.
wcspbrk( ) strpbrk( ) Locates the first occurrence of any characters from one string in another string.
wcsstr( ) strstr( ) Finds a substring. Note that the wcsstr() function also supersedes the wcswcs() function included in versions of the XSH specification earlier than Issue 5.
wcscspn( ) strcspn( ) Returns the number of initial elements of one string that are all characters not included in a second string.
wcsspn( ) strspn( ) Returns the number of initial elements of one string that are all characters included in a second string.

String Copying:

WPI Function Older ISO C Function Description
wcscpy( ) strcpy( ) Copies a string.
wcsncpy( ) strncpy( ) Similar to functions in the preceding row except that the number of characters to be copied is limited by the n parameter.

String Comparison:

WPI Function Older ISO C Function Description
wcscmp( ) strcmp( ) Compares two strings.
wcsncmp( ) strncmp( ) Similar to functions in the preceding row except that the number of characters to be compared is limited by the n parameter.

String Length Determination:

WPI Function Older ISO C Function Description
wcslen( ) strlen( ) Determines the number of characters in a string.

String Decomposition:

WPI Function Older ISO C Function Description
wcstok( ) strtok( ) Decomposes a string into a series of tokens, each delimited by a character from another string.
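
Unlike strtok( ), the wcstok( ) function defined by XSH Version 5 takes a third argument in which the caller supplies storage for the scan position. The following sketch shows a typical loop; the function name split_fields and its interface are hypothetical.

```c
#include <wchar.h>

/* Hypothetical helper: split line in place at any character found in
 * delims. Stores up to max token pointers in fields and returns the
 * number stored. The line buffer is modified. */
size_t split_fields(wchar_t *line, const wchar_t *delims,
                    wchar_t **fields, size_t max)
{
    wchar_t *state;     /* scan position maintained by wcstok() */
    wchar_t *token;
    size_t n = 0;

    for (token = wcstok(line, delims, &state);
         token != NULL && n < max;
         token = wcstok(NULL, delims, &state)) {
        fields[n++] = token;
    }
    return n;
}
```

Consecutive delimiters are treated as a single separator, so the sketch never produces empty fields.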

Printing Position Determination:

WPI Function Older ISO C Function Description
wcswidth( ) None Determines the number of printing positions required for a number of characters in a string.
wcwidth( ) None Determines the number of printing positions required for a character.

Performing Memory Operations on Strings:

WPI Function Older ISO C Function Description
wmemcpy( ) memcpy( ) Copies characters from one buffer to another.
wmemchr( ) memchr( ) Searches a buffer for the specified character.
wmemcmp( ) memcmp( ) Compares the specified number of characters in two buffers.
wmemmove( ) memmove( ) Copies characters from one buffer to another in a nondestructive manner.
wmemset( ) memset( ) Copies the specified character into the specified number of locations in a destination buffer.

A.12    Codeset Conversion

The WPI provides codeset conversion capabilities through a set of functions for program use or the iconv command for interactive use. For these interfaces, you specify the source and target codesets and the name of the language text file to be converted. The codesets define a conversion stream through which the language text is passed.

The following table summarizes the three functions you use for codeset conversion. These functions reside in the libiconv.a library.

WPI Function Older ISO C Function Description
iconv_open( ) None Initializes a conversion stream by identifying the source and the target codesets.
iconv_close( ) None Closes the conversion stream.
iconv( ) None Converts an input string encoded in the source codeset to an output string encoded in the target codeset.
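
The three functions are normally used together, as in the following sketch. The function name convert_codeset is hypothetical, and the codeset names that iconv_open( ) accepts differ between systems, so any names passed to this helper are assumptions that must be checked against the converters installed on the target system.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in_len bytes at in from codeset "from"
 * to codeset "to", storing the result in out (out_size bytes).
 * Returns the number of bytes stored, or (size_t)-1 on failure. */
size_t convert_codeset(const char *to, const char *from,
                       const char *in, size_t in_len,
                       char *out, size_t out_size)
{
    iconv_t cd = iconv_open(to, from);  /* open the conversion stream */
    char *src = (char *)in;             /* iconv() takes char **, not const */
    char *dst = out;
    size_t in_left = in_len;
    size_t out_left = out_size;
    size_t rc;

    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &src, &in_left, &dst, &out_left);
    if (rc != (size_t)-1)               /* flush any pending shift sequence */
        rc = iconv(cd, NULL, NULL, &dst, &out_left);

    iconv_close(cd);                    /* close the conversion stream */
    return (rc == (size_t)-1) ? (size_t)-1 : out_size - out_left;
}
```

A production version would also inspect errno after a failed iconv( ) call, because E2BIG (output buffer full) can be handled by retrying with more space, whereas EILSEQ indicates invalid input.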

Refer to Section 6.13 for a description of the iconv command and the types of conversions that are supported.