Globalization issues in ASP and ASP.NET (893663)



The information in this article applies to:

  • Microsoft Active Server Pages
  • Microsoft ASP.NET (included with the .NET Framework 1.0)
  • Microsoft ASP.NET (included with the .NET Framework 1.1)
  • Microsoft ASP.NET 2.0

ASP.NET Support Voice Column

To customize this column to your needs, we invite you to submit your ideas about topics that interest you and issues that you want to see addressed in future Knowledge Base articles and Support Voice columns. You can submit your ideas and feedback by using the Ask For It form. There's also a link to the form at the bottom of this column.

Introduction

Welcome! This is Sukesh Khare with the Microsoft ASP.NET Developer Support team. This is the first time I've authored a Support Voice column. I look forward to authoring more such columns in the months ahead.

For this month's column, I am going to discuss globalization issues in Active Server Pages (ASP) and ASP.NET: the issues that we face in ASP, how things have changed in ASP.NET 1.x, and what's up with ASP.NET 2.0 on the globalization front.

Note If you come across a term you don't understand, see the Glossary section at the bottom of this column.

Globalization issues in ASP

Before ASP.NET, there was no structured support for developing applications for global users. During the early days of ASP, developers such as myself found only scattered support for globalization in operating systems, browsers, ASP, and back-end systems, and we seldom saw these components work together automatically. Fortunately, we did understand concepts such as character sets, codepages, browser languages, and fonts, which we could use to develop applications for global users.

It would be too difficult to separate into categories all of the globalization issues that those of us in ASP.NET have seen. Instead, I'll list a series of concepts that relate to a variety of those issues.

Character sets and codepages

We all know that the characters on our computer screen are just a series of bytes. The byte series can be created and interpreted in any number of ways. If the interpretation uses an encoding that is different from the encoding that the byte array was created with, the result displays as garbage. Character sets (charsets) are encoding formats that are usually used by browsers. A codepage, the term that is more common for server-side conversions, is just a conversion table that specifies how characters are encoded.

Browsers encode the form post data according to the current character set. If the current character set is "windows-1256," then the byte transmission to the server is also encoded as "windows-1256."
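To make the mismatch concrete, here is a minimal C# sketch. It is only an illustration and not part of ASP or ASP.NET: the class name CharsetMismatchDemo and the Reinterpret helper are made up for this example, and UTF-8 and iso-8859-1 were chosen simply because both encodings are available on every .NET runtime.

```csharp
using System;
using System.Text;

public static class CharsetMismatchDemo
{
    // Hypothetical helper: encode text with one encoding (the "browser"),
    // then decode the same bytes with another encoding (the "server").
    public static string Reinterpret(string text, Encoding writer, Encoding reader)
    {
        byte[] bytes = writer.GetBytes(text);
        return reader.GetString(bytes);
    }

    public static void Main()
    {
        Encoding utf8 = Encoding.UTF8;
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string original = "caf\u00E9"; // "cafe" with an acute accent on the e

        // Same encoding on both sides: the text survives.
        Console.WriteLine(Reinterpret(original, utf8, utf8) == original); // True

        // Mismatch: the two UTF-8 bytes of U+00E9 (C3 A9) are read as two
        // separate iso-8859-1 characters, U+00C3 and U+00A9.
        Console.WriteLine(Reinterpret(original, utf8, latin1) == "caf\u00C3\u00A9"); // True
    }
}
```

The second result is the kind of garbage that appears when the client charset and the server codepage disagree.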

When an ASP page is being interpreted, the Form and QueryString collections are not built until they are referenced in code. When they are built, the string data is transformed to Unicode according to the current codepage. (By default, both ASP and ASP.NET process content in Unicode format.) It is very important that you set the correct codepage before you reference the collections; otherwise, the Unicode representation in memory will not be correct.

To set a codepage, use Session.CodePage or Response.CodePage. Response.CodePage is only available in Microsoft Internet Information Services (IIS) 5.1 or later versions. Both properties take the integer identifier of the codepage that corresponds to the character set. For example, to set the codepage for the Arabic language, use the following code:
Session.CodePage = 1256
Response.CodePage affects only the current response. However, Session.CodePage affects all the responses made to the current user. When the codepage is set by using one of these properties and the Form and QueryString collections are built, the Response.Write method also transforms the Unicode in memory to the current codepage.

The bottom line for charset and codepage issues is that the client charset and the server codepage should match.

Accept languages

If an ASP developer wants to know which languages a user has set in the browser, the developer can read the Request.ServerVariables("HTTP_ACCEPT_LANGUAGE") variable. It contains the list of languages that the user would like to read the response in (such as English, German, or Hindi), in the user's order of preference. In ASP.NET, the same information is available as an array in the Request.UserLanguages property. For more information about how to use this information in ASP code, click the following article number to view the article in the Microsoft Knowledge Base:

229690 How to set the ASP Locale ID per the browser's language settings
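As a rough illustration of what the header contains, the following C# sketch orders the languages in an Accept-Language value by preference. The AcceptLanguageParser class and its Parse method are hypothetical helpers written for this column, not ASP.NET APIs, and the sketch assumes the usual "tag;q=weight" header syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class AcceptLanguageParser
{
    // Returns the language tags of an Accept-Language header value,
    // most preferred first. Tags with equal weight keep their header order.
    public static List<string> Parse(string header)
    {
        List<KeyValuePair<string, double>> entries = new List<KeyValuePair<string, double>>();
        if (header == null) return new List<string>();

        foreach (string part in header.Split(','))
        {
            string[] pieces = part.Trim().Split(';');
            string tag = pieces[0].Trim();
            if (tag.Length == 0) continue;

            double q = 1.0; // a tag without an explicit q-value has weight 1
            for (int i = 1; i < pieces.Length; i++)
            {
                string p = pieces[i].Trim();
                double parsed;
                if (p.StartsWith("q=", StringComparison.OrdinalIgnoreCase) &&
                    double.TryParse(p.Substring(2), NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out parsed))
                {
                    q = parsed;
                }
            }

            // Insert before the first entry with a strictly lower weight,
            // which keeps the list sorted in descending order.
            int pos = entries.Count;
            for (int i = 0; i < entries.Count; i++)
            {
                if (entries[i].Value < q) { pos = i; break; }
            }
            entries.Insert(pos, new KeyValuePair<string, double>(tag, q));
        }

        List<string> result = new List<string>();
        foreach (KeyValuePair<string, double> entry in entries) result.Add(entry.Key);
        return result;
    }

    public static void Main()
    {
        // Prints de-AT, en-US, de, en (one per line).
        foreach (string tag in Parse("de-AT,de;q=0.8,en-US;q=0.9,en;q=0.5"))
            Console.WriteLine(tag);
    }
}
```

In an ASP.NET page, the entries of Request.UserLanguages can carry the same ";q=" suffix, so similar handling may be needed before a value is passed to CultureInfo.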

Displaying multi-byte character sets in Internet Explorer

The only encoding format that can show characters from several multi-byte character sets on the same page is Unicode (UTF-8). With UTF-8, we can display Cyrillic, Hindi, and Japanese text all on the same page. If we do not use UTF-8, we can show only one of these languages at a time. To set the charset that the browser uses, set the Response.CharSet property.

Static multi-byte characters on a page

To display multi-byte characters stored directly in the page, we must first save the page with a specific encoding. UTF-8 is best, but a specific codepage (matched to the codepage of the characters) works as well.

Saving an ASP file by using Microsoft Visual InterDev doesn't help here, because Visual InterDev can save only in ANSI English or Unicode, and ASP does not support pages that are saved as Unicode.

In Microsoft Visual Studio .NET, you can save a file in any encoding. By default, the file is saved by using the current codepage for the user. To save a file with a different encoding, follow these steps:
  1. On the File menu, click Save File As.
  2. In the Save File As dialog box, click the drop-down arrow on the Save button. The options are Save and Save with Encoding.
  3. Click Save with Encoding. The Advanced Save Options dialog box appears.
  4. Select the encoding that you want to apply from the list of the codepages that are installed on the computer.

Note This changes the encoding for the save operation, but is for one time only. The next save will be set back to the default.

To change the default codepage, click Advanced Save Options on the File menu. In the Advanced Save Options dialog box, you can set the default encoding for save operations to the codepage of your choice.

These methods are related to how the file is saved on disk. However, to control the output for ASP, as already discussed, we need to set the Session.CodePage and the Response.CharSet properties. With IIS 5.1 and later versions, we can also use the Response.CodePage property.

Default CODEPAGE on server

The default locale and the default codepage for the page depend on the registry settings for the .DEFAULT user. We can find the International key at the registry path HKEY_USERS\.DEFAULT\Control Panel\International. We can also change the behavior of the locale that is chosen by IIS. For more information, see the "IIS 5.0" section in the following Knowledge Base article:

306044 Behavior of Date/Time format differs when accessed from Active Server Pages

If the logged-on user has the same locale set as the above key or the system default, the user setting takes precedence.

Example: The default locale has the date format 11.1.2004, while the logged-on user (with the same locale set) has the date format 11/1/2004. The 11/1/2004 setting takes effect for ASP.

For ASP.NET, this can vary. In some installations, the ASPNET user has its own profile that shows up under HKEY_USERS when it is loaded. In others, it uses the .DEFAULT profile. We can also use the codepage attribute in the <%@ %> declaration. Use this attribute when the file is saved with an encoding that is different from the default, such as codepage 932 (Japanese).

Codepage issues versus font conversion issues: which is which?

At times, you may see a question mark (?) character or a box where a character is supposed to appear.

Codepage conversion issues

When a character is replaced by a question mark (?), a codepage conversion issue has occurred. The question mark is the default replacement character for codepage conversion. It means that the operating system does not know how to convert the character value, so it substitutes a question mark. The character may have an invalid value for the codepage, or the codepage that is needed for the conversion may not be installed.

Font conversion issues

When a character is replaced by a box, a font issue has occurred. This occurs on the client side when the client does not have a font installed that can display the character. For example, when a character is from the Japanese charset and the client does not have Japanese fonts installed, the Japanese character is displayed as a box.
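The question mark substitution is easy to reproduce with the .NET encoding classes. The following C# sketch is only an illustration: the QuestionMarkDemo class and its RoundTrip helper are invented for this example, and ASCII stands in for any codepage that cannot represent the characters.

```csharp
using System;
using System.Text;

public static class QuestionMarkDemo
{
    // Hypothetical helper: convert text to the target encoding and back,
    // the way a codepage conversion on the server would.
    public static string RoundTrip(string text, Encoding target)
    {
        return target.GetString(target.GetBytes(text));
    }

    public static void Main()
    {
        // U+65E5 U+672C is the Japanese word for Japan.
        string japan = "\u65E5\u672C";

        // ASCII cannot represent these characters, so each one is
        // replaced by the default character '?'.
        Console.WriteLine(RoundTrip(japan, Encoding.ASCII)); // ??

        // UTF-8 can represent them, so the text survives the conversion.
        Console.WriteLine(RoundTrip(japan, Encoding.UTF8) == japan); // True
    }
}
```

Once the replacement has happened, the original character cannot be recovered from the output, so the conversion itself has to be fixed.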

Next, I'll talk about how things changed in ASP.NET 1.x, and how those changes affect globalization issues in the context of ASP.NET.

Globalization issues in ASP.NET 1.x:

With ASP.NET, three great things were introduced:
  • The <globalization> tag in the web.config file
    The <globalization> tag takes us away from the incoherent concepts of codepages and charsets and lets us control most of the variants within ASP.NET.
  • The System.Globalization namespace
    The System.Globalization namespace gives us programmatic control over globalization.
  • Improved resource files
    We don't deal with resource files in the way that we used to in ASP. Now, the resource files are XML files when we design and develop them, and they exist as assemblies at runtime.

The Globalization configuration tag:

Two important settings in the tag are as follows:
<globalization 
            requestEncoding="utf-8" 
            responseEncoding="utf-8"  />
Other possible settings are as follows:
  • fileEncoding: Specifies the default encoding for .aspx, .asmx, and .asax file parsing. Unicode and UTF-8 files saved with the byte order mark prefix (with signature) are automatically recognized, regardless of the value of fileEncoding.
  • culture: Specifies the default culture for processing incoming Web requests (applies to methods of classes from the System.Globalization namespace).
  • uiCulture: Specifies the default culture for processing locale-dependent resource searches (satellite assemblies).
The values of culture and uiCulture are culture strings such as "de-AT".

These settings are applied by ASP.NET after the request is received, and before the request is handed off to your application. For responseEncoding, the buffer that is created to store the output is set to this encoding. Everything that goes into this buffer is encoded according to the setting as it is inserted into the buffer.
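Put together, a web.config file that uses all five settings might look like the following fragment. The attribute values are only placeholders chosen for illustration; use the encodings and cultures that your own application needs.

```xml
<configuration>
  <system.web>
    <!-- Illustrative values only: UTF-8 everywhere, Austrian German
         for both formatting (culture) and resource lookup (uiCulture). -->
    <globalization
        requestEncoding="utf-8"
        responseEncoding="utf-8"
        fileEncoding="utf-8"
        culture="de-AT"
        uiCulture="de-AT" />
  </system.web>
</configuration>
```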

For requestEncoding, the runtime will read the request and interpret it according to the setting in this section. This is a setting that can cause problems, however. The table below shows the bit layout of a valid UTF-8 byte sequence. For more information about UTF-8's history, features and disadvantages, see the UTF-8 section at the following Web site:

Unicode Transformation Formats
http://www.czyborra.com/utf/

Microsoft provides third-party contact information to help you find technical support. This contact information may change without notice. Microsoft does not guarantee the accuracy of this third-party contact information.

If the character value falls in the 7-bit ASCII range, the byte value is not modified. If the value is above 127, it must follow the rules below. The leading bits of the first byte show how many bytes are in the sequence. Each byte after the first must start with the bits 10.

UTF-8 byte layout:
Bytes  Bits  Representation
1      7     0vvvvvvv
2      11    110vvvvv 10vvvvvv
3      16    1110vvvv 10vvvvvv 10vvvvvv
4      21    11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
This is where the problem occurs. If the browser encodes the request according to a single-byte encoding (such as iso-8859-1), the values above 127 will not be valid according to the above layout. When they are read into the UTF-8 buffer, the invalid characters are simply dropped from the output.
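The byte layout can be followed by hand. The C# sketch below is a teaching aid rather than anything that ASP.NET itself uses (the runtime relies on its own encoding classes); the Utf8Layout class and both method names are made up for this illustration.

```csharp
using System;

public static class Utf8Layout
{
    // Encodes one Unicode code point by following the byte layout table.
    public static byte[] Encode(int cp)
    {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw new ArgumentOutOfRangeException("cp");

        if (cp < 0x80)                       // 1 byte:  0vvvvvvv
            return new byte[] { (byte)cp };
        if (cp < 0x800)                      // 2 bytes: 110vvvvv 10vvvvvv
            return new byte[] {
                (byte)(0xC0 | (cp >> 6)),
                (byte)(0x80 | (cp & 0x3F)) };
        if (cp < 0x10000)                    // 3 bytes: 1110vvvv 10vvvvvv 10vvvvvv
            return new byte[] {
                (byte)(0xE0 | (cp >> 12)),
                (byte)(0x80 | ((cp >> 6) & 0x3F)),
                (byte)(0x80 | (cp & 0x3F)) };
        return new byte[] {                  // 4 bytes: 11110vvv followed by 3 x 10vvvvvv
            (byte)(0xF0 | (cp >> 18)),
            (byte)(0x80 | ((cp >> 12) & 0x3F)),
            (byte)(0x80 | ((cp >> 6) & 0x3F)),
            (byte)(0x80 | (cp & 0x3F)) };
    }

    // Number of bytes that a lead byte announces. Returns 0 for a
    // continuation byte (10vvvvvv) or a byte that cannot start a sequence.
    public static int SequenceLength(byte lead)
    {
        if ((lead & 0x80) == 0x00) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        if ((lead & 0xF8) == 0xF0) return 4;
        return 0;
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode(0xE9)));   // C3-A9
        Console.WriteLine(BitConverter.ToString(Encode(0x20AC))); // E2-82-AC

        // In iso-8859-1, U+00E9 is the single byte E9. Read as UTF-8, E9
        // announces a 3-byte sequence, so a lone E9 is not valid UTF-8.
        Console.WriteLine(SequenceLength(0xE9)); // 3
    }
}
```

The last line shows why a request posted in iso-8859-1 does not survive being read with requestEncoding set to utf-8: the byte E9 promises two continuation bytes that never arrive.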

Runtime encoding changes

In the Application_BeginRequest event, we can modify the value of requestEncoding and have it take effect before the request is processed. For the response, the Page_PreRender event is the last chance to modify the encoding of the output. Also note that Response.Write will put characters into this buffer as soon as we call it, so be sure to have the right encoding set before using Response.Write.

Original data is non Unicode: How to still make Internet Explorer interpret multi-byte charsets?

We can also make ASP.NET behave like ASP if we need to. To do this, we set responseEncoding and requestEncoding to windows-1252 (a more complete encoding than iso-8859-1), and use the Response.Charset property to display the text correctly. This works because windows-1252 is a single-byte encoding scheme and does not modify any bytes that are added to the buffer. Thus, double-byte characters are sent as a series of single bytes. We can then tell Internet Explorer how to interpret the bytes by using the Response.Charset property. This scenario may be necessary if the original data is not stored as Unicode or UTF-8, such as a return value from a COM object, or if the data is stored in Microsoft SQL Server in a non-Unicode column (a type without the N prefix, such as varchar).
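The pass-through behavior of a single-byte encoding can be sketched in C# as follows. This is an illustration under stated assumptions: the PassThroughDemo class is hypothetical, the four bytes are the Shift-JIS (codepage 932) form of the Japanese word for Japan, and iso-8859-1 stands in for windows-1252 because it is available on every .NET runtime without extra setup.

```csharp
using System;
using System.Text;

public static class PassThroughDemo
{
    // Hypothetical helper: decode raw bytes with a single-byte encoding
    // and encode them again, as the request and response buffers would.
    public static byte[] PassThrough(byte[] raw, Encoding singleByte)
    {
        return singleByte.GetBytes(singleByte.GetString(raw));
    }

    public static void Main()
    {
        // U+65E5 U+672C encoded in Shift-JIS: two double-byte characters.
        byte[] shiftJis = new byte[] { 0x93, 0xFA, 0x96, 0x7B };
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Every byte maps to exactly one character and back again,
        // so the bytes reach the browser unchanged.
        byte[] sent = PassThrough(shiftJis, latin1);
        Console.WriteLine(BitConverter.ToString(sent)); // 93-FA-96-7B
    }
}
```

Because the bytes arrive unchanged, a charset of shift_jis in the response is enough for the browser to display them as Japanese text.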

SQL Server and ASP.NET globalization issues

Unicode data input to SQL Server

The best way to store data in SQL Server is to use Unicode. Whenever we use INSERT, UPDATE, and similar statements, if there is even the slightest chance that the data contains Unicode characters, we need to add an N before the string value. This tells the database that the value is Unicode. The ADO objects are a good example: they do this automatically if we use the Recordset object to add new records.

The following is an example:
INSERT INTO MusicAlbum (Album_ID, [Year], Name, Artist_ID, Company_ID) VALUES (12345, 2005, N'Abida', 4653, 403)
Or:
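' Double any single quotes so that user input cannot end the string literal early.
Dim t As String = "INSERT INTO MusicAlbum (Album_ID, [Year], Name, Artist_ID, Company_ID) VALUES (12345, 2005, N'" & TextBox1.Text.Replace("'", "''") & "', 4653, 403)"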

Date/Time input to SQL Server

Usually we know the culture and locale that apply to date/time values within our ASP.NET application. However, when we push and pull date/time data to and from external sources, we run the risk of misinterpreting the date/time formats, because we cannot always guarantee that the culture and locale of the external source are the same as in our application. With SQL Server, we can solve this by using the 'current language' attribute in the connection string of the connection to the SQL database. We set the language in the connection string to match the culture of our application. This protects us from misinterpretation, because SQL Server then accepts and sends date/time data in accordance with that setting.
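The risk is easy to see without a database. In the following C# sketch, the same date string is read under two cultures; the DateAmbiguityDemo class and its ParseAs helper are invented for this illustration and only use the System.Globalization classes.

```csharp
using System;
using System.Globalization;

public static class DateAmbiguityDemo
{
    // Hypothetical helper: parse a date string by using the rules of the
    // named culture (user overrides from Control Panel are ignored).
    public static DateTime ParseAs(string text, string cultureName)
    {
        return DateTime.Parse(text, new CultureInfo(cultureName, false));
    }

    public static void Main()
    {
        DateTime us = ParseAs("11/1/2004", "en-US"); // read as month/day/year
        DateTime gb = ParseAs("11/1/2004", "en-GB"); // read as day/month/year

        Console.WriteLine(us.Month); // 11
        Console.WriteLine(gb.Month); // 1

        // A culture-independent format removes the ambiguity.
        Console.WriteLine(us.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2004-11-01
    }
}
```

The same string ends up ten months apart, which is why both sides of the connection need to agree on one language setting.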

System.Globalization namespace

This namespace is the core of globalization and localization in the .NET Framework. The main class used in this namespace is the CultureInfo class. It holds culture-specific information, such as the date/time format, number formats, comparison information, and text information. For more information, see the CultureInfo class documentation on MSDN.

Neutral cultures vs. specific cultures

A neutral culture is a culture that is associated with a language, but not a specific country or region. A specific culture is associated both with a language and a specific country.

An example: "de" (neutral culture) is for the German language, but "de-AT" (specific culture) is for the German language as it is spoken in Austria. Neutral cultures cannot be used for formatting.
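The distinction can be checked directly with the CultureInfo class. The following C# sketch is a small illustration; the CultureKindsDemo class and its ToSpecific helper are names made up for this example.

```csharp
using System;
using System.Globalization;

public static class CultureKindsDemo
{
    // Hypothetical helper: returns the name of the default specific
    // culture that the framework associates with a culture name.
    public static string ToSpecific(string cultureName)
    {
        return CultureInfo.CreateSpecificCulture(cultureName).Name;
    }

    public static void Main()
    {
        CultureInfo neutral = new CultureInfo("de");
        CultureInfo specific = new CultureInfo("de-AT");

        Console.WriteLine(neutral.IsNeutralCulture);  // True
        Console.WriteLine(specific.IsNeutralCulture); // False

        // The parent of a specific culture is its neutral culture.
        Console.WriteLine(specific.Parent.Name);      // de

        // A neutral culture can be mapped to a default specific culture
        // when one is needed for formatting.
        Console.WriteLine(ToSpecific("de"));          // de-DE
    }
}
```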

Current thread and culture awareness of .NET Framework classes

All classes and methods in the .NET Framework library where we would expect the output to be culture-dependent have two built-in behaviors:
  • They let us specify the culture code while supplying the arguments so that the output is based on the culture specified. This is optional.
  • If no culture is supplied (which is the usual case), the classes check the Thread.CurrentThread.CurrentCulture property and work according to that.
We can modify this property's value with code that is similar to the following:
Imports System.Globalization
Imports System.Threading

Dim ci As CultureInfo
ci = New CultureInfo("de-AT")
Thread.CurrentThread.CurrentCulture = ci
In this code example, "de" represents the German language, and "AT" represents Austria. So, in this instance, the DateTime.Now.ToString() method returns the date and time in the format that is used for the German language in Austria.
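Both behaviors can be compared side by side. The C# sketch below is only an illustration of the idea; the ThreadCultureDemo class and its two helper methods are hypothetical names, and the long date format ("D") was picked arbitrarily.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class ThreadCultureDemo
{
    // Behavior 1: the culture is supplied as an argument.
    public static string FormatExplicit(DateTime value, CultureInfo culture)
    {
        return value.ToString("D", culture);
    }

    // Behavior 2: no culture argument, so ToString falls back to
    // Thread.CurrentThread.CurrentCulture. The previous culture is
    // restored afterward so that other code is not affected.
    public static string FormatWithThreadCulture(DateTime value, CultureInfo culture)
    {
        CultureInfo saved = Thread.CurrentThread.CurrentCulture;
        try
        {
            Thread.CurrentThread.CurrentCulture = culture;
            return value.ToString("D");
        }
        finally
        {
            Thread.CurrentThread.CurrentCulture = saved;
        }
    }

    public static void Main()
    {
        DateTime date = new DateTime(2004, 11, 1);
        CultureInfo deAT = new CultureInfo("de-AT", false);

        // Both routes produce the same Austrian German long date.
        Console.WriteLine(FormatExplicit(date, deAT) == FormatWithThreadCulture(date, deAT)); // True
        Console.WriteLine(FormatExplicit(date, deAT));
    }
}
```

Setting the thread culture once per request is usually more convenient than passing a culture to every formatting call.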

The framework ensures that the CurrentCulture property is always initialized. Its value is determined in the following order:
  1. The value that is set programmatically.
  2. If the property is not explicitly set by the programmer, the value from the configuration files (<globalization> tag).
  3. If the value is missing there, the culture under which the Web server is running. This is usually the culture that corresponds to the language of the operating system.

Resource files

All .resx files, .resources files, and files that have the Build Action attribute set to Embedded Resource that are added to an ASP.NET project in Visual Studio .NET are automatically compiled and embedded within the application assembly as part of its manifest. This can also be done manually by using the Resource File Generator (Resgen.exe) utility from a Visual Studio .NET command prompt.

This is a general concept that applies whenever we need to manage application resources, even resources that are unrelated to globalization. However, when we implement globalization, we should use satellite assemblies.

Satellite assemblies

Satellite assemblies can be used in an ASP.NET project when the following steps are performed:
  1. Give all the user-interface elements in all .aspx files id and runat=server attributes.
  2. Create a separate .resx file for each culture that the application supports.
  3. Decide on a common first name for all these files, for example 'Strings'.
  4. Name the separate .resx files by using the naming convention commonfirstname.languagecode-regioncode.resx (for example: Strings.de-AT.resx, Strings.en-GB.resx).
  5. Create the resource file commonfirstname.resx (Strings.resx) that contains all the strings as we want them displayed in the default case.
  6. Write code to detect the user's culture and set the Thread.CurrentThread.CurrentUICulture property to match it.
  7. Write code to load the resources by using the ResourceManager class.
  8. Write code to extract strings from the loaded object, and assign them to user interface elements.
When you have performed these steps, Visual Studio .NET compiles Strings.resx and embeds it into the application assembly (MyGlobalizationTestProjectName.dll). For each of the other .resx files, it generates a separate DLL file that contains no executable code, only resource data. These files are the satellite assemblies. Visual Studio .NET places them in a folder structure similar to the following:
MyGlobalizationTestProjectName
        |------- bin
                |------en-US
                        MyGlobalizationTestProjectName.resources.dll
                |------ja-JP
                        MyGlobalizationTestProjectName.resources.dll
                |------de-AT
                        MyGlobalizationTestProjectName.resources.dll

Difference between CurrentCulture and CurrentUICulture

While the methods of classes in the System.Globalization namespace depend on the Thread.CurrentThread.CurrentCulture property to give their output, the ResourceManager class that loads the resource assembly depends on the Thread.CurrentThread.CurrentUICulture property to load the appropriate satellite assembly. The following is an example of C# code:
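using System;
using System.Globalization;
using System.Resources;
using System.Threading;

// The following members belong to the code-behind class of the Web Form
// (MyTestWebFormName). The controls head1, p1, sp1, sp2, butOK, and
// butCancel are declared in the same class.

// Load resources. The base name is the root namespace of the project
// followed by the common first name of the .resx files (Strings).
protected ResourceManager gStrings = new ResourceManager(
    "MyGlobalizationTestProjectName.Strings",
    typeof(MyTestWebFormName).Assembly);

private void Page_Load(object sender, System.EventArgs e)
{
    // Get the user's preferred language. UserLanguages is null when the
    // browser sends no Accept-Language header.
    string[] langs = Request.UserLanguages;
    if (langs != null && langs.Length > 0)
    {
        // Drop any quality value, such as ";q=0.8".
        string sLang = langs[0].Split(';')[0];
        // Set the thread's culture for formatting, comparisons, etc.
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture(sLang);
        // Set the thread's UICulture to load resources
        // from the satellite assembly.
        Thread.CurrentThread.CurrentUICulture = new CultureInfo(sLang);
    }

    if (!IsPostBack)
    {
        // Get strings from the resource file and assign them to UI elements.
        head1.InnerHtml = gStrings.GetString("satellite.head1");
        p1.InnerHtml = gStrings.GetString("satellite.p1");
        sp1.InnerHtml = gStrings.GetString("satellite.sp1");
        sp2.InnerHtml = gStrings.GetString("satellite.sp2");
        butOK.Text = gStrings.GetString("satellite.butOK");
        butCancel.Value = gStrings.GetString("satellite.butCancel");
    }
}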

Order in which ASP.NET selects satellite assemblies:

When you have set the thread's CurrentUICulture, ASP.NET automatically selects the resources that match, in the following order:
  • If a satellite assembly is found with a matching culture, the resources from that assembly are used.
  • If a satellite assembly is found with a neutral culture that matches the CurrentUICulture, resources from that assembly are used.
  • If a match is not found for the CurrentUICulture, the fallback resources stored in the executable assembly are used.
Note This is based on the more general resource fallback process of the .NET Framework.
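The order above follows the parent chain of the culture. As a rough illustration (this is not how ResourceManager is implemented, only a way to list the candidates), the following C# sketch walks that chain; the ResourceFallbackDemo class and the FallbackChain method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ResourceFallbackDemo
{
    // Lists the culture names in the order in which resources are looked
    // for. The empty name is the invariant culture, which corresponds to
    // the fallback resources in the main assembly.
    public static List<string> FallbackChain(string cultureName)
    {
        List<string> chain = new List<string>();
        CultureInfo culture = new CultureInfo(cultureName);
        while (true)
        {
            chain.Add(culture.Name);
            if (culture.Name.Length == 0) break; // reached the invariant culture
            culture = culture.Parent;
        }
        return chain;
    }

    public static void Main()
    {
        // Prints de-AT, then de, then the fallback entry.
        foreach (string name in FallbackChain("de-AT"))
        {
            Console.WriteLine(name.Length == 0
                ? "(fallback resources in the main assembly)"
                : name);
        }
    }
}
```

For a CurrentUICulture of de-AT, this means that a de-AT satellite assembly is tried first, then a de satellite assembly, and finally the resources compiled from Strings.resx.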

Manually creating satellite assemblies:

In the approach described above, Visual Studio .NET creates the satellite assemblies itself. However, Visual Studio .NET doesn't strong name satellite assemblies by default. If you want to change such options, you need to create the satellite assemblies manually.

What's up with ASP.NET 2.0 on the globalization front?

The widespread usage of ASP.NET and the kinds of issues that we would see with respect to globalization features in ASP.NET 2.0 are still some distance ahead. However, it would be good to take a brief look at what direction the globalization methodology is headed for web applications.

Globalization support in ASP.NET 2.0 has undergone a radical change and Web developers have been given the ability to make the localization of Web applications as easy as it is for Windows-based applications. The following is a list of features that are the foundation of globalization methodology in ASP.NET 2.0:

  • Strongly-typed resources
    At the core of the .NET Framework 2.0 release is support for strongly-typed resources, which provide developers with IntelliSense and simplify the code that is required to access resources at runtime.
  • Managed Resource Editor
    Visual Studio 2005 includes a new resource editor with better support for creating and managing resource entries, including strings, images, external files, and other complex types.
  • Resource generation for Web Forms
    Windows Forms developers have already enjoyed the benefits of automatic internationalization. Visual Studio 2005 supports rapid internationalization by automatically generating resources for Web Forms, user controls, and master pages.
  • Improved runtime support
    ResourceManager instances are managed by the runtime and are readily accessible to server code through more accessible programming interfaces.
  • Localization expressions
    Modern declarative expressions for Web pages support mapping resource entries to control properties, HTML properties, or static content regions. These expressions are also extensible, providing additional ways to control the process of attaching localized content to HTML output.
  • Automatic culture selection
    Managing culture selection for each Web request can be automatically linked to browser preferences.
  • Resource provider model
    A new resource provider model allows developers to host resources in alternate data sources such as flat files and database tables, while the programming model for accessing those resources remains consistent.
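As one example of these features, automatic culture selection can be switched on in the web.config file of an ASP.NET 2.0 application. The fragment below is a sketch; the en-US default after the colon is only an illustrative choice.

```xml
<configuration>
  <system.web>
    <!-- "auto" takes the culture from the browser's language preferences.
         The culture after the colon is used when the browser sends no
         usable language. -->
    <globalization culture="auto:en-US" uiCulture="auto:en-US" />
  </system.web>
</configuration>
```

With this setting, the code that reads Request.UserLanguages and sets the thread cultures by hand in ASP.NET 1.x is no longer needed for the common case.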

For more information about globalization methodology in ASP.NET 2.0, visit the following MSDN Web site:

ASP.NET 2.0 Localization Features: A Fresh Approach to Localizing Web Applications
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnvs05/html/ASP2local.asp

Conclusion

That's all for now about globalization issues in ASP and ASP.NET. I hope this article will help a few customers troubleshoot their globalization issues in ASP and ASP.NET before they opt to contact Microsoft Support. I will end with the following thought:

"Wherever and whenever you are developing, think about the millions of people you can empower around the world. Make your solutions World-Ready! Microsoft tools and technologies make internationalization easier."

We will catch up again next month with another interesting topic.

Thank you for your time.
For more information about globalization issues in ASP and ASP.NET, see the following Microsoft Web sites:

Go Global: Localizing Dynamic Web Apps with IIS 5.0 and SQL Server
http://msdn.microsoft.com/msdnmag/issues/01/05/global/default.aspx

Go Global: Designing Your ASP-based Web Site to Support Globalization
http://msdn.microsoft.com/msdnmag/issues/0700/localize/default.aspx

315616 How To Detect a Client Language in an Active Server Pages Page in IIS
http://support.microsoft.com/?id=315616

Design and Implementation Guidelines for Web Clients - Globalization and Localization
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/diforwc-ch07.asp

Official Microsoft site - Global Development and Computing Portal
http://www.microsoft.com/globaldev/getwr/dotneti18n.mspx

Enterprise Localization Toolkit - For Developing Localized Microsoft ASP.NET Applications
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnaspp/html/entloctoolkit.asp

839861 A System.Resources.MissingManifestResourceException exception occurs when you try to access a localized resource

Glossary

ANSI: Stands for American National Standards Institute. In this context, it refers to the codepage for a specific language or character set, most often the English codepage (windows-1252).

ASCII: A 7-bit encoding scheme that is stored in 1 byte. Only the characters in the range 0-127 are standardized. The values in the range 128-255 are extensions to ASCII and are not part of the standard. An example of this is the difference between the upper range of the OEM ASCII chart and the VB ASCII chart.

CharSet: A setting, used mostly for Internet Explorer and other browsers, that tells the browser how to interpret the character data. Example: Response.CharSet = "iso-8859-1"

Codepage: A conversion table that specifies how characters are encoded (usually used for servers).

Globalization: The process of designing and creating an application so that the unique requirements of a culture, region, or national and linguistic needs can be met. In other words, designing an application in such a way that it can be localized later is globalization.

Locale/Culture: Language-specific and region-specific formats and preferences, including date and calendar formats, time formats, currency formats, casing, sorting and string comparison, address formats, phone number formats, paper sizes, units of measure, and writing direction.

LocaleID (LCID): A DWORD value that specifies the language identifier and sorting ID. It can be used to specify the regional formats that values such as dates and times should be formatted according to.

Localizability: The ability of an application to present content for the requested language or locale.

Localization: The process of translating a user interface into specific languages and/or locales.

Multibyte character set: A character set in which characters can be composed of two or more bytes, such as Japanese. UTF-8 also falls under this category. (Unicode technically is in this category, but in Windows, it has its own category.)

Unicode: In Windows, a 2-byte encoding scheme. Windows uses Unicode internally. Any APIs specifically for Unicode are signified by a "W" on the end of the function name. Also known as wide char; cannot be directly used from Web applications.

UTF-8: A character encoding where a character is represented by 1-4 bytes, as shown in the byte layout table earlier in this column (the original design allowed up to 6 bytes). Characters that Windows stores in a single 2-byte Unicode value need 1-3 bytes. Not supported under NT4 for Web applications. For more information, click the following article number to view the article in the Microsoft Knowledge Base:

175392 UTF8 support

Wide character set: An alias for Unicode. Also known as UCS-2 or UTF-16. Not the same as DBCS (double-byte character set), which refers to legacy multibyte codepages such as those for Japanese.

As always, feel free to submit ideas on topics you want addressed in future columns or in the Knowledge Base by using the Ask For It form.

Modification Type: Major
Last Reviewed: 4/26/2006
Keywords:kbASPNET kbhowto kbASP KB893663 kbAudITPRO kbAudDeveloper