Unicode and Microsoft Windows NT (99884)
The information in this article applies to:
- Microsoft Windows NT Server 3.1
- Microsoft Windows NT Workstation 3.1
This article was previously published under Q99884
SUMMARY
Windows NT version 3.1 employs a relatively new standard of character
representation called Unicode. This new standard allows for greater
flexibility in adding support for localized versions of Microsoft
Windows NT.
MORE INFORMATION
The first and most prominent character standard in use by computers
today is ASCII. This format is adequate for English, but as computers
became more popular in European countries, the limitations of ASCII
became clear.
In an effort to overcome some of these limitations, the International
Organization for Standardization (ISO) established a new standard
called Latin-1 that defines European characters omitted from ASCII.
Microsoft Windows modified the Latin-1 standard even further and
called the resulting character set Windows ANSI. However, because it
still uses an 8-bit coding scheme, Windows ANSI can represent only 256
unique symbols--considerably fewer than the 10,000 symbols that are
common in such languages as Chinese, Korean, and Japanese.
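
The 8-bit ceiling described above can be checked with a short Python
sketch. It is an illustration only: the function name
fits_in_code_page is hypothetical, and Python's "cp1252" codec is
assumed here as a stand-in for the Windows ANSI character set.

```python
# An 8-bit scheme can address at most 2**8 distinct symbols.
EIGHT_BIT_LIMIT = 2 ** 8


def fits_in_code_page(text, code_page="cp1252"):
    """Return True if every character in text has a byte in code_page.

    "cp1252" is an assumption: it stands in for Windows ANSI.
    """
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        # At least one character has no slot in the 8-bit table.
        return False


if __name__ == "__main__":
    print(EIGHT_BIT_LIMIT)
    print(fits_in_code_page("caf\u00e9"))      # accented Latin letter
    print(fits_in_code_page("\u6f22\u5b57"))   # two CJK ideographs
```

The accented Latin text fits in the 8-bit table, while the ideographs
do not, which is the limitation that motivated a wider code set.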
In addition to the language barriers, as the capabilities of computers
have broadened beyond uppercase, mono-spaced fonts, the requirements
for a large set of unique characters (for example, letters,
punctuation, mathematical and technical symbols, and publishing
characters) have also grown far beyond the capabilities of 8-bit text.
The lowest level of localization (adaptation to a particular language)
is the actual binary representation of characters: the code set. To
overcome the limitations of the other coding methods, several major
computer companies, including Apple Computer, Inc., Sun Microsystems,
Inc., Xerox Corp., and IBM (International Business Machines Corp.),
formed Unicode Inc., a non-profit consortium, to define a
new standard for international character sets. At the same time, the
ISO began developing a standard. Eventually, these standards merged
and became Unicode. Unicode is published as The Unicode Standard,
Worldwide Character Encoding.
Unicode employs a 16-bit coding scheme that allows for 65,536 distinct
characters--more than enough to include all languages in use today. In
addition, it supports several archaic or arcane languages such as
Sanskrit and Egyptian hieroglyphs. Unicode also includes
representations for punctuation marks, mathematical symbols, and
dingbats, with room left for future expansion. Because it establishes
a unique code for each character in each script, Windows NT can ensure
that the character translation from one language to another is
accurate.
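
As a rough illustration of the 16-bit scheme, the Python sketch below
lists the 16-bit value assigned to each character of a string. The
function name to_16bit_units is hypothetical, and Python's
"utf-16-le" codec is an assumption: for the characters used here it
yields one 16-bit unit per character, matching the fixed-width scheme
this article describes.

```python
import struct

# A 16-bit scheme can address 2**16 distinct code values.
SIXTEEN_BIT_LIMIT = 2 ** 16


def to_16bit_units(text):
    """Return the list of 16-bit code units that represent text."""
    raw = text.encode("utf-16-le")          # two bytes per unit
    count = len(raw) // 2
    return list(struct.unpack("<%dH" % count, raw))


if __name__ == "__main__":
    print(SIXTEEN_BIT_LIMIT)
    # Latin "A", accented "e", and a CJK ideograph share one code space.
    for unit in to_16bit_units("A\u00e9\u6f22"):
        print(hex(unit))
```

Each character, whatever its script, maps to exactly one value in the
same code space, so no code page has to be consulted to interpret it.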
Unicode in Windows NT
Unicode is the native code set of Windows NT, but the Win32 subsystem
provides both ANSI and Unicode support. Character strings in the
system, including object names, path names, and file and directory
names, are represented with 16-bit Unicode characters. The Win32
subsystem converts any ANSI characters it receives into Unicode
strings before manipulating them. It then converts them back to ANSI,
if necessary, upon exit from the system.
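
The conversion on entry and exit described above can be pictured with
the following Python sketch. It is an analogy, not the Win32
implementation: the function names ansi_to_unicode and unicode_to_ansi
are hypothetical, and the "cp1252" and "utf-16-le" codecs are assumed
stand-ins for Windows ANSI and for 16-bit Unicode strings.

```python
def ansi_to_unicode(ansi_bytes, code_page="cp1252"):
    """Widen an 8-bit ANSI string to 16-bit Unicode (on entry)."""
    return ansi_bytes.decode(code_page).encode("utf-16-le")


def unicode_to_ansi(wide_bytes, code_page="cp1252"):
    """Narrow a 16-bit Unicode string back to ANSI (on exit)."""
    return wide_bytes.decode("utf-16-le").encode(code_page)


if __name__ == "__main__":
    ansi_name = b"r\xe9sum\xe9.txt"         # file name in ANSI bytes
    wide_name = ansi_to_unicode(ansi_name)  # form used inside the system
    print(len(ansi_name), len(wide_name))   # wide form is twice as long
    print(unicode_to_ansi(wide_name) == ansi_name)
```

The round trip is lossless only when every character exists in the
ANSI code page; a name containing other characters could not be
narrowed back without substitution.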
REFERENCES
Unicode Inc.
1965 Charleston Road
Mountain View, CA 94043
Phone (415) 961-4189
"Inside Windows NT," by Helen Custer, Microsoft Press, 1992
"Program Migration to Unicode," by Asmus Freytag, Proceedings of the
First Unicode Implementers Workshop, The Unicode Consortium, Mountain
View, California, August, 1991
"Adapt Your Program for Worldwide Use with Windows
Internationalization Support," by William S. Hall, Microsoft Systems
Journal, Vol. 6, No. 6, Nov./Dec. 1991
"Operating Systems Design and Implementation," by Andrew S. Tanenbaum,
Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1987
Modification Type: Major
Last Reviewed: 11/4/2003
Keywords: kbother KB99884