INF: SQL Server 7.0 BCP and Code Page Conversion (199819)



The information in this article applies to:

  • Microsoft SQL Server 7.0

This article was previously published under Q199819

SUMMARY

The bulk copy program (BCP) utility included with SQL Server 7.0 works differently from the BCP utility included with previous versions of SQL Server. These changes can have an effect on code page conversion during bulk copy operations. This article describes those changes and provides examples of code page conversion to clarify the issues involved.

MORE INFORMATION

Previous versions of the BCP utility were based on the DB-Library API. Character code translation between ANSI and OEM code pages was determined by the AutoAnsiToOem client configuration setting. This setting was disabled by default in SQL Server 4.21a and enabled by default in SQL Server 6.0 and 6.5. However, the character code translation was only enabled when data was copied from OEM clients to ANSI SQL Servers or from ANSI clients to OEM SQL Servers. No character code conversion would occur if AutoAnsiToOem was disabled or if data was being transferred between an OEM server and an OEM client.

If your SQL Server has a code page of 125n (where "n" is a single digit; for example, 1252) you have an ANSI SQL Server. If your SQL Server has any other code page, it is an OEM code page SQL Server. Code page 850 (Multilingual) was the default character code page for SQL Server 4.21 and code page 1252 (also known as Latin 1 or ANSI) was the default code page for SQL Server 6.0 and 6.5. You can determine which code page is used by your SQL Server by running the sp_helpsort stored procedure from Query Analyzer.

The client code page is the OEM code page for the operating system on which BCP is run. Microsoft Windows NT, Microsoft Windows 95, and Windows 98 operating systems have both an ANSI code page and an OEM code page; these are determined by the country settings. The computer's ANSI code page is stored in the following registry value:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP

For U.S. English versions of Windows NT, Windows 95, and Windows 98, the default OEM code page is 437 (U.S. English). The computer's OEM code page is stored in the following registry value:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP

To verify the client OEM code page on Windows NT, Windows 95 and Windows 98 computers, run chcp from the command line. To verify that data in a file is stored in the proper code page, you must use the appropriate editor. ANSI data should look correct in any GUI editor (such as Notepad, WordPad, or Microsoft Word). OEM data should look correct in a command-line tool (such as type or edit). To verify that data in SQL Server is stored in the proper code page, you can use the ASCII(char_expr) string function.

AutoAnsiToOem code page conversion only applies to DB-Library applications. Previous versions of the BCP utility, being DB-Library applications, were affected by this setting. However, the SQL Server 7.0 BCP utility is not a DB-Library application and is not affected by the AutoAnsiToOem setting. Unless you specify otherwise, BCP in SQL Server 7.0 will always convert data exported from and imported to SQL Server 7.0 using the client computer's OEM code page. (The SQL Server 7.0 Books Online is incorrect when it states that the ANSI/Windows code page is used for conversion during import.)

The version of BCP included with SQL Server 7.0 can be instructed to convert data according to a code page different from the client's OEM code page with the use of the new /C (or -C) command-line switch. The code page specified with the /C parameter determines the code page of the input (or output) file when data is bulk copied into or out of char, varchar, or text columns. The following are examples of how the /C parameter can be used:

/CACPConvert to/from using the client ANSI code page
/COEMCPConvert to/from using the client OEM code page
/C850Convert to/from using code page 850 (can be any code page)
/CRAWNo conversion

If you used a previous version of BCP with an ANSI SQL Server and AutoAnsiToOem enabled, you do not need to use the /C switch. If you used previous versions of BCP with AutoAnsiToOem disabled or against an OEM SQL Server, you can disable automatic character code conversion using the /CRAW parameter with BCP for SQL Server 7.0.

In order for BCP for SQL Server 7.0 to convert to or from a given code page, that code page must be installed on the client. For information about how to install additional code pages, please see the following article in the Microsoft Knowledge Base:

177561 HOWTO: Add and Enable Additional Languages in Windows NT


The /C parameter only applies when copying character data types in character (/c) or native modes (/n). It has no effect when using Unicode data types by including the /w or /N arguments.

The remainder of this article provides some examples to help illustrate these concepts.

Example 1

SQL Server code page 1252 and client OEM code page 437:
At the command line, type:
   BCP pubs.dbo.authors in C:\authors.txt
				

Because the client's OEM code page is 437, BCP will interpret the character code for the characters in the Authors.txt input file using the 437 code page and convert to the code for the equivalent character in the 1252 code page. If the Authors.txt file was saved as code page 437 character data, the import will succeed.

However, if the file contains data from some other code page (for example, the file was exported from a SQL Server using code page 850), extended characters may be incorrectly translated. As an example, if Authors.txt contains character data from an 850 code page (where the yen symbol is character code 190), it would be incorrectly interpreted as the character associated with code 190 in the 437 code page and translated incorrectly into SQL Server. To perform the correct translation, you would use the -C850 switch to ensure that the code page 850 characters were correctly interpreted so that they could be translated to the ANSI code page equivalent (where the yen symbol is character code 165).

Keep in mind that even if the file does contain code page 437 character data, some characters may be lost if there is not an equivalent character in the 1252 code page.


Example 2

SQL Server code page 932 and client with OEM code page 850:
At the command line, type:

   BCP pubs.dbo.authors out C:\authors.txt
				

BCP will attempt to convert the code page 932 character data to the client's code page 850 before saving it to the data file. Because some characters in the 932 code page do not exist in code page 850, data may be lost in this transfer.

You would need to add the /CRAW option to prevent any character translation. This is different behavior from previous versions of SQL Server where BCP would not convert characters between OEM SQL Servers and OEM clients.

Example 3

SQL Server with code page 850 and client with code page 437:
At the command line, type:

   BCP pubs.dbo.authors out C:\authors.txt -Usa -CRAW
				

BCP will not attempt to convert the data, and it will be saved in its original code page. Alternatively, -C850 could be specified, although this would require that code page 850 be installed on the client. However, extended characters in the data will not appear correctly when viewed on this client because the data is stored as code page 850 character data and the client's code page is 437.

If you want to import this file into another SQL Server using BCP, you would need to add the /C850 option so that the characters would be correctly translated into the destination SQL Server. Use the /CRAW option only if the destination SQL Server is using the 850 code page. If you do not use the /C switch, the characters in the file will be interpreted according to the 437 client code page and extended characters may be improperly imported into SQL Server.

Example 4

SQL Server 7.0 with code page 850 and client with code page 437; file contains characters exported from a SQL Server 6.5 with code page 1252 where AutoANSItoOEM was disabled:
At the command line, type:

   BCP pubs.dbo.authors in C:\authors.txt -Usa -C1252
				

BCP will use code page 1252 to interpret the characters in the Authors.txt file. All extended characters that exist in the 850 code page will be converted properly. If you did not use the /C switch, the characters in the file would be incorrectly interpreted using the 437 client code page.

References

For additional information, see the following article or articles in the Microsoft Knowledge Base:

153449 INF: SQL Server Code Pages and AutoAnsiToOem Behavior

165051 INF: Installing a DBCS Code Page for SQL Server

177561 HOWTO: Add and Enable Additional Languages in Windows NT


Modification Type:MajorLast Reviewed:8/28/2002
Keywords:kbinfo KB199819