FIX: UNICODE Byte Order Marks Ignored by Internet Explorer 4.0x (190837)
The information in this article applies to:
- Microsoft Internet Explorer (Programming) 4.0
- Microsoft Internet Explorer (Programming) 4.01
- Microsoft Internet Explorer (Programming) 4.01 SP1
This article was previously published under Q190837
SYMPTOMS
A UNICODE HTML page that includes a Byte Order Mark displays garbage
characters when viewed in Internet Explorer 4.0 and 4.01. If the UNICODE
HTML page is in Big-Endian order, the viewed source of the page also
contains garbage characters, primarily a black box character.
CAUSE
Internet Explorer 4.0x does not support the use of Byte Order Marks in
UNICODE HTML files.
RESOLUTION
The only resolution at this time is to normalize and strip Byte Order Marks
from UNICODE HTML files before display in Internet Explorer 4.0 or 4.01.
Normalization requires that all UNICODE characters in an HTML file be in
Little-Endian format, least significant byte first. For instance, in Big-
Endian format, the left angle bracket UNICODE character would appear as 00
3C in a binary dump. This character would need to be byte swapped to 3C 00,
Little-Endian format, before being processed and displayed by Internet
Explorer.
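The normalization described above can be sketched as follows. This is a
minimal illustration in Python, not code supplied by Microsoft. The function
name normalize_for_ie4 is hypothetical, and the sketch assumes that the file
holds only 16-bit UNICODE code units and that a file without a mark is
already Little-Endian. The byte values tested are the ones that appear in a
binary dump of the file.

```python
import codecs


def normalize_for_ie4(data: bytes) -> bytes:
    """Return Little-Endian 16-bit text with any leading Byte Order Mark removed.

    Hypothetical helper for illustration only.
    """
    # A dump that starts FF FE is already Little-Endian: just drop the mark.
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:]

    # A dump that starts FE FF is Big-Endian: drop the mark, then swap bytes.
    if data.startswith(codecs.BOM_UTF16_BE):
        body = bytearray(data[2:])
        if len(body) % 2:
            raise ValueError("odd number of bytes after the Byte Order Mark")
        # Exchange the two bytes of every 16-bit unit (00 3C becomes 3C 00).
        body[0::2], body[1::2] = body[1::2], body[0::2]
        return bytes(body)

    # No mark found. Assumption: the data is already Little-Endian.
    return data
```

With this sketch, a Big-Endian file whose dump begins FE FF 00 3C comes back
as 3C 00, and a Little-Endian file whose dump begins FF FE 3C 00 comes back
as 3C 00 as well.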
STATUS
Microsoft has confirmed that this is a bug in the Microsoft products that
are listed at the beginning of this article. This bug was corrected in
Microsoft Internet Explorer 5.
MORE INFORMATION
The UNICODE Specification Version 2.0 describes a "Byte Order Mark" in
section 2.4, but does not require its use. According to the
specification, the 16-bit value FE FF at the beginning of a file indicates
that the following characters are probably UNICODE, normalized for the
memory architecture of the current machine. If the value FF FE is found at
the beginning of a file, it indicates that the remaining bytes are not
normalized and should be byte swapped before use.
Windows, and therefore Internet Explorer, assumes that the memory
architecture of the machine it is running on is "Little Endian." The
first byte of a two-byte sequence is actually the least significant byte.
In UNICODE, 00 3C is the left angle bracket character. In Little Endian,
this character would be stored in memory as 3C 00; the least significant
portion, 3C, comes first.
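The two byte orders for the 16-bit value 00 3C can be checked with a few
lines of Python. This is only an illustration of the byte layout; the
variable names are arbitrary.

```python
code_value = 0x003C  # the 16-bit value used as the example in this article

little = code_value.to_bytes(2, "little")  # least significant byte first
big = code_value.to_bytes(2, "big")        # most significant byte first

print(little.hex(" "))             # 3c 00
print(big.hex(" "))                # 00 3c
print(little.decode("utf-16-le"))  # <
```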
A Little Endian format UNICODE HTML file is permitted by the UNICODE
standard to include the UNICODE Byte Order Mark FE FF at the beginning of
the file. In Little Endian, the Byte Order Mark is swapped like all
characters so a binary dump of the Byte Order Mark would actually display
as FF FE. In other words, the Byte Order Mark is UNICODE FE FF, but since
Little Endian machines automatically swap their bytes, a binary dump of the
mark would be FF FE.
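The same relationship between the UNICODE value of the mark and its binary
dump can be reproduced in Python. This is a sketch for illustration; it uses
Python's "utf-16-le" and "utf-16-be" codecs as stand-ins for the Little
Endian and Big Endian storage orders discussed here.

```python
import codecs

mark = "\ufeff"  # the Byte Order Mark as a UNICODE value (FE FF)

le_dump = mark.encode("utf-16-le").hex(" ")  # as stored on a Little Endian machine
be_dump = mark.encode("utf-16-be").hex(" ")  # as stored on a Big Endian machine

print(le_dump)  # ff fe
print(be_dump)  # fe ff

# The standard library exposes the same two byte sequences as constants.
print(codecs.BOM_UTF16_LE == mark.encode("utf-16-le"))  # True
print(codecs.BOM_UTF16_BE == mark.encode("utf-16-be"))  # True
```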
When Internet Explorer 4.0x processes a UNICODE HTML file containing the
Little Endian, FF FE, normalized UNICODE mark, it ignores the purpose of
the mark and displays the Byte Order Mark as two UNICODE characters. In
English, these characters look like garbage characters, somewhat like "py".
If the non-normalized UNICODE Byte Order Mark, FF FE, is encountered in a
file, it indicates that the characters should be byte swapped (in a Little
Endian architecture FF FE would appear as FE FF if the file were dumped).
Internet Explorer does not recognize this form of the Byte Order Mark
either, and since UNICODE FF FE is not a valid UNICODE character, Internet
Explorer will not display garbage characters. Internet Explorer 4.0x will
also not swap the bytes, so nothing at all will be displayed. If the HTML
source for the page is viewed from within Internet Explorer, a mix of valid
and invalid characters will be seen. The invalid characters appear as
small, dark box characters.
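The effect of reading Big Endian data without swapping the bytes can be
illustrated in Python. This sketch does not reproduce Internet Explorer's
internal behavior; it only shows, under the assumption that the reader
treats every byte pair as Little Endian, that none of the original markup
characters survive.

```python
text = "<html>"

# Bytes as they would sit in a Big Endian file (mark omitted for clarity).
big_endian_bytes = text.encode("utf-16-be")

# A reader that assumes Little Endian and does not swap sees other values.
misread = big_endian_bytes.decode("utf-16-le")
print([f"U+{ord(ch):04X}" for ch in misread])
# ['U+3C00', 'U+6800', 'U+7400', 'U+6D00', 'U+6C00', 'U+3E00']

# Swapping each byte pair first restores the original text.
swapped = bytearray(big_endian_bytes)
swapped[0::2], swapped[1::2] = swapped[1::2], swapped[0::2]
print(bytes(swapped).decode("utf-16-le"))  # <html>
```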
REFERENCES
For additional information, please see the following
article(s) in the Microsoft Knowledge Base:
102025 Explanation of Big Endian and Little Endian Architecture
The UNICODE Standard, Version 2.0, The Unicode Consortium
Modification Type: Major
Last Reviewed: 10/16/2002
Keywords: kbBug kbhtml kbie500fix kbIntl kbIntlDev KB190837