INFO: BIFF8 BOUNDSHEET Record Data for Uncompressed Unicode (187919)
The information in this article applies to:
- Microsoft Excel 2002
- MSPRESS Microsoft Excel 97 Developer's Kit ISBN 1-57231-498-2
- Microsoft Visual C++, 32-bit Editions 5.0
This article was previously published under Q187919
SUMMARY
The Binary Interchange File Format version 8.0 (BIFF8) record data
information in the Microsoft Developer Network (MSDN) and in the "Microsoft
Excel 97 Developer's Kit" book does not mention a new flag that specifies
whether the name of the worksheet is represented in uncompressed Unicode.
Without this information, a developer might interpret the name field of the
BOUNDSHEET record incorrectly if the name is stored in uncompressed
Unicode.
The "BIFF8 Record Data" table at the top of page 291 of the "Microsoft
Excel 97 Developer's Kit" book states that the cch (count of characters)
field beginning at offset 10 is two bytes in size. This is incorrect. The
cch field is one byte at offset 10, and it is followed at offset 11 by a
one-byte flag field that indicates whether the name field is stored in
compressed Unicode (one byte per character) or uncompressed Unicode (two
bytes per character).
NOTE: The Microsoft Biffview utility program labels the BOUNDSHEET record
"BUNDLESHEET".
MORE INFORMATION
The default representation for a sheet name is compressed Unicode.
Compressed Unicode uses one byte to represent the two-byte Unicode value of
a character. The high-order byte of each character is assumed to be zero,
so only the low-order byte of the letter or number at that character
location is stored.
If the sheet name contains characters that genuinely need two bytes (a
nonzero high-order byte), it is stored as uncompressed Unicode. Each
character then requires two bytes, so the name takes more space than the
same number of characters in compressed Unicode.
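The choice between the two forms can be sketched in Python as follows. This
is a minimal illustration, not code from the Developer's Kit. The function
name encode_sheet_name is hypothetical, and the sketch assumes that a name
is stored compressed only when every character value is 00FF hex or lower.

```python
def encode_sheet_name(name):
    """Return (cch, grbitChr, rgch) for a sheet name.

    Hypothetical helper for illustration only. Assumes a name can be
    compressed only when every character has a zero high-order byte.
    """
    if all(ord(ch) <= 0xFF for ch in name):
        # Compressed: keep only the low-order byte of each character.
        rgch = name.encode("latin-1")
        return len(rgch), 0, rgch
    # Uncompressed: two bytes per character, low-order byte first.
    rgch = name.encode("utf-16-le")
    return len(rgch) // 2, 1, rgch
```

For the name "Sheet2" this sketch yields a cch of 6, a flag of 0, and six
bytes of name data. A name that contains a character such as U+5DE5 yields
a flag of 1 and two bytes per character.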
The BIFF8 record uses the single byte at offset 11 to hold a flag
indicating uncompressed Unicode. If that flag is 1, the cch value at
offset 10 is the count of two-byte characters beginning at offset 12. If
the flag is 0, cch is the count of one-byte characters beginning at
offset 12.
The BIFF8 Record Data table at the top of Page 291 should read as follows:
OFFSET  NAME      SIZE  CONTENTS
----------------------------------------------------------------------
4       lbPlyPos  4     Stream position of the start of the BOF
                        record for the sheet.
8       grbit     2     Option flags.
10      cch       1     Length of sheet name in characters, not bytes.
11      grbitChr  1     Compressed/uncompressed Unicode.
12      rgch      var   Sheet name.
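The corrected layout can be read with a short Python sketch. It is an
illustration under stated assumptions, not an official parser: the function
name parse_boundsheet is hypothetical, the input is assumed to be the record
data beginning at offset 4 (as in the examples in this article), and only
the low-order bit of grbitChr is assumed to select the storage form.

```python
import struct

def parse_boundsheet(data):
    """Parse BOUNDSHEET record data that begins at offset 4 of the record.

    Hypothetical helper; `data` must not include the bytes before offset 4.
    """
    # lbPlyPos (4 bytes), grbit (2), cch (1), grbitChr (1), little-endian.
    lb_ply_pos, grbit, cch, grbit_chr = struct.unpack_from("<IHBB", data, 0)
    if grbit_chr & 0x01:
        # Uncompressed Unicode: cch characters of two bytes each.
        name = data[8:8 + 2 * cch].decode("utf-16-le")
    else:
        # Compressed Unicode: cch characters of one byte each,
        # with the high-order byte taken to be zero.
        name = data[8:8 + cch].decode("latin-1")
    return {"lbPlyPos": lb_ply_pos, "grbit": grbit, "cch": cch,
            "grbitChr": grbit_chr, "name": name}
```

Reading cch as two bytes, as the printed table suggests, would fold the flag
byte into the length, which is why the sketch unpacks the two fields
separately.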
The following examples compare values with and without Unicode compression:
Uncompressed: Beginning at Offset 4 (16 bytes)
20 0b 00 00 00 00 04 01 e5 5d 5c 4f 68 88 31 00
The BOF for this sheet starts at 00 00 0b 20. The four bytes are stored
low-order byte first; this byte-swapping is explained on page 268 of the
printed edition of the Excel SDK.
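As a small illustration of the byte-swapping, the four stored bytes can be
converted to a stream position in Python. The function name stream_position
is hypothetical.

```python
def stream_position(raw):
    """Convert the 4 stored lbPlyPos bytes (low-order byte first)."""
    return int.from_bytes(raw, byteorder="little")

raw = bytes.fromhex("20 0b 00 00")  # bytes as they appear in the file
print(hex(stream_position(raw)))    # 0xb20
```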
The option flags 00 00 tell you that this BOUNDSHEET record applies to a
visible worksheet.
The cch value of 04 says the sheet name is 4 characters long.
The grbitChr value of 01 means the sheet name is uncompressed Unicode, and
each character is stored in 2 bytes in the rgch field.
The next 8 bytes are the rgch field. Read with the high-order byte first,
the four characters are 5d-e5 4f-5c 88-68 00-31.
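The same reading of the rgch bytes can be reproduced with a brief Python
sketch. The function name decode_uncompressed is hypothetical, and the
sketch assumes that cch counts 16-bit units.

```python
import struct

def decode_uncompressed(rgch, cch):
    """Return the cch 16-bit character values stored in rgch."""
    # Each character is two bytes, low-order byte first.
    return struct.unpack("<%dH" % cch, rgch[:2 * cch])

rgch = bytes.fromhex("e5 5d 5c 4f 68 88 31 00")
units = decode_uncompressed(rgch, 4)
print([format(u, "04x") for u in units])  # ['5de5', '4f5c', '8868', '0031']
```

The values are the characters U+5DE5, U+4F5C, and U+8868 followed by the
digit 1 (U+0031).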
Compressed: Beginning at Offset 4 (14 bytes)
17 0d 00 00 00 00 06 00 53 68 65 65 74 32
The BOF for this sheet starts at 00 00 0d 17.
The option flags, 00 00, tell you that the BOUNDSHEET record applies to a
worksheet that is visible.
The cch value of 06 says the sheet name is 6 characters long.
The grbitChr value of 00 means the sheet name is compressed Unicode. Each
character in the name is stored in one byte, with an assumed value of 00
hex for the missing high-order byte of the character.
Hence, the sheet name is 6 characters stored in 6 bytes as 53 68 65 65
74 32, which spells "Sheet2".
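To make the assumed high-order byte explicit, the following Python sketch
rebuilds the two-byte form of the compressed name. The function name
expand_compressed is hypothetical.

```python
def expand_compressed(rgch):
    """Rebuild two-byte characters from compressed Unicode bytes."""
    expanded = bytearray()
    for low in rgch:
        expanded.append(low)   # stored low-order byte
        expanded.append(0x00)  # assumed high-order byte
    return bytes(expanded)

wide = expand_compressed(bytes.fromhex("53 68 65 65 74 32"))
print(wide.decode("utf-16-le"))  # Sheet2
```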
REFERENCES
"Microsoft Excel 97 Developer's Kit", Microsoft Press, ISBN 1-57231-498-2
Modification Type: Major
Last Reviewed: 12/15/2003
Keywords: kbinfo KB187919