INF: Troubleshooting Code Page Conversion in COMTI (303993)

MORE INFORMATION

COMTI does code page conversion for data that is sent to or received from a host system.

Code page conversion is configured on the Locale tab of the COMTI Remote Environment properties. The code page that is specified on the Locale tab represents the code page of the host system with which COMTI is communicating.

IMPORTANT: COMTI deals only with Unicode characters during COMTI code page conversion. COMTI uses the specified host code page to convert any data that is received from the host system to the equivalent Unicode characters. Conversely, this same conversion process also converts any data that is sent from COMTI to the host from Unicode to the correct host code page.

The conversion is performed by the SNA National Language Support (NLS) API. Applications such as COMTI can use the SNANLS API to convert single-byte character set (SBCS) data from EBCDIC to Unicode to ANSI, and from ANSI to Unicode to EBCDIC, by leveraging the Win32 NLS API. The NLS API uses resource files that contain NLS conversion tables. When you specify a host code page for use with COMTI, you are specifying which NLS conversion table is used to perform the EBCDIC-to-Unicode and Unicode-to-EBCDIC conversions. (EBCDIC stands for Extended Binary Coded Decimal Interchange Code, which is a character encoding scheme that was developed by IBM.)
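The table-driven conversion that is described above can be illustrated outside of COMTI. The following is a minimal sketch in Python, assuming that Python's built-in cp037 codec (EBCDIC US/Canada) is an acceptable stand-in for an NLS conversion table. COMTI itself calls the SNANLS API, not this codec, and the function names host_to_unicode and unicode_to_host are hypothetical.

```python
# Assumption: Python's cp037 codec stands in for the NLS conversion table
# that is selected on the Locale tab. COMTI does not use this codec.
HOST_CODE_PAGE = "cp037"


def host_to_unicode(host_bytes: bytes) -> str:
    """Convert EBCDIC bytes received from the host to a Unicode string."""
    return host_bytes.decode(HOST_CODE_PAGE)


def unicode_to_host(text: str) -> bytes:
    """Convert a Unicode string to EBCDIC bytes before sending to the host."""
    return text.encode(HOST_CODE_PAGE)


# "HELLO" as it appears on the wire in EBCDIC code page 037.
ebcdic = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
text = host_to_unicode(ebcdic)
round_trip = unicode_to_host(text)
print(text, round_trip.hex())
```

Changing HOST_CODE_PAGE in the sketch corresponds to selecting a different conversion table: the same bytes can decode to different Unicode characters under a different host code page.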

COMTI is used by Windows-based applications, and many Windows-based applications output the data that is received from COMTI onto a screen or printer, or they save the data to a file. When Windows-based applications output this data, a second conversion generally occurs, in which the data is converted from Unicode format to Windows code page format. For example, when data is received from a host system by COMTI, and then is displayed by a Visual Basic application in a dialog box, the process occurs as follows:

1. COMTI converts the data from EBCDIC to Unicode by using the SNANLS API and the code page that is specified in the COMTI Remote Environment.
2. COMTI delivers the data to Visual Basic as a Unicode string.
3. Visual Basic converts the Unicode string to the Windows code page by using the Win32 NLS API.
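The two conversions in this process can be sketched as separate stages so that the intermediate Unicode string is available for inspection. This is an illustration only: the function name host_to_windows is hypothetical, and the cp037 and cp1252 codecs are assumed stand-ins for the host code page and the Windows code page.

```python
def host_to_windows(host_bytes, host_cp="cp037", windows_cp="cp1252"):
    """Run both conversion stages and return the result of each one."""
    # Stage 1 (done by COMTI in the real system): EBCDIC to Unicode.
    unicode_text = host_bytes.decode(host_cp)
    # Stage 2 (done by the client application): Unicode to Windows code page.
    # errors="replace" substitutes "?" for characters that have no mapping.
    windows_bytes = unicode_text.encode(windows_cp, errors="replace")
    return unicode_text, windows_bytes


# Simulate host data by encoding a known string in the host code page.
host_bytes = "Caf\u00e9".encode("cp037")
unicode_text, windows_bytes = host_to_windows(host_bytes)
print(repr(unicode_text), windows_bytes.hex())
```

Keeping the stages separate mirrors the real data flow: a wrong character in unicode_text points to the host code page setting, and a wrong byte in windows_bytes points to the second conversion.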

NOTE: This article uses Visual Basic in the examples, but COMTI also returns Unicode character strings to any COM-aware programming language that calls it, including Microsoft Visual C++, Microsoft VBScript, Sybase PowerBuilder, and other languages.

Because Windows represents each Unicode character as a 16-bit (two-byte) value, Unicode can represent virtually every character in every language. The Windows code pages and IBM EBCDIC code pages that are discussed in this article, however, use a single byte to represent each character. This limits each code page to 256 characters (byte values 0x00 through 0xFF). For many languages (such as Arabic), individual characters can have several different forms, depending on the context and placement of the character. In these cases, a 256-character code page cannot represent every possible form of the character. The NLS conversion process is essentially a one-to-one mapping between each character in the code page and its Unicode equivalent. This can cause conversion problems when you translate from Unicode to a host or Windows code page. Specifically, if a Unicode character string contains a character that is not represented in the Windows or host code page, the character is translated to a default character, such as a question mark (?).
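The default-character substitution can be reproduced with a short Python sketch. It assumes Python's built-in cp1252 (Western European) and cp1256 (Arabic) codecs, whose tables may differ in detail from the NLS tables that Windows and SNANLS use. The sample strings are chosen for illustration only.

```python
# An Arabic word that is written with base letters (U+0633 U+0644 U+0627 U+0645).
arabic = "\u0633\u0644\u0627\u0645"

# The Arabic Windows code page has a mapping for each base letter.
in_1256 = arabic.encode("cp1256")

# The Western European code page has none, so every letter becomes "?".
in_1252 = arabic.encode("cp1252", errors="replace")

# A contextual form (ARABIC LETTER ALEF ISOLATED FORM, U+FE8D) is a separate
# Unicode character that Python's cp1256 table does not contain.
isolated_alef = "\uFE8D"
form_in_1256 = isolated_alef.encode("cp1256", errors="replace")

print(in_1256.hex(), in_1252, form_in_1256)
```

The last case shows why a string can lose characters even when the target code page is for the correct language: the code page may hold only one form of a letter.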

Troubleshooting

When you troubleshoot code page conversion problems in COMTI, Microsoft recommends that you examine the two conversion steps individually. In most cases, the problem is in the conversion from Unicode to either the host code page or the Windows code page.

For example, a Visual Basic program writes XML files by using COMTI to retrieve data from an Arabic host system. Certain characters in the XML files are incorrectly saved as question marks. To isolate this problem during troubleshooting, follow these steps:

1. Use an SNA application Advanced Program-to-Program Communications (APPC) API trace or Microsoft Network Monitor trace to examine the data that is sent from the host system to COMTI. This can verify that the correct EBCDIC characters are being sent from the host system.
2. Use the Visual Basic design environment to step through the application and examine the Unicode strings as they come into Visual Basic from COMTI. By doing this, you can verify that the EBCDIC characters that are sent from the host have been converted to the correct Unicode characters.
3. Examine the resulting XML file to determine which Unicode characters are being incorrectly converted to question marks.
4. Examine the NLS conversion table to determine whether there is a mapping for the Unicode characters that are received from COMTI. The absence of a mapping explains why the characters are converted to question marks.
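The checks in the steps above can be partly automated. The following Python sketch is one possible approach, not a Microsoft tool: the helper names hex_dump and find_unmappable are hypothetical, and Python's codec tables are assumed to approximate the NLS conversion tables. It formats trace bytes for comparison against a code chart and lists the Unicode characters in a string that have no mapping in a target code page.

```python
import unicodedata


def hex_dump(data: bytes) -> str:
    """Format raw bytes (for example, from a trace) as spaced hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def find_unmappable(text: str, code_page: str):
    """Return (code point, name) for each character with no mapping."""
    missing = []
    seen = set()
    for ch in text:
        if ch in seen:
            continue
        seen.add(ch)
        try:
            ch.encode(code_page)  # strict mode raises if there is no mapping
        except UnicodeEncodeError:
            name = unicodedata.name(ch, "<unnamed>")
            missing.append((f"U+{ord(ch):04X}", name))
    return missing


# Sample string as it might be received from COMTI (illustrative data only).
received = "A\u0633\u0633"
print(hex_dump("A".encode("cp037")))
print(find_unmappable(received, "cp1252"))
```

An empty list for the target code page suggests that the question marks originate elsewhere, for example in the first conversion or in the code that writes the file.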

REFERENCES

NLS APIs

For more information about the SNANLS API, see the SNA Server or Host Integration Server 2000 SDK documentation, or visit the following MSDN Web site:

SNA National Language Support
http://msdn.microsoft.com/library/?url=/library/en-us/his/snanls_2lmb.asp?frame=true

For more information about the SNANLS API functions, visit the following MSDN Web site:

SNANLS API Functions
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/his/snanls_50xf.asp

For more information about the Win32 NLS API, see the Microsoft Platform SDK, or visit the following MSDN Web site:

National Language Support
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_19f8.asp?frame=true

Unicode

For more information about the Unicode specification, and to download Unicode code charts, visit the following Unicode Web site:

http://www.unicode.org

For more information about NLS code page mappings, visit the following Unicode Web site:

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/