MORE INFORMATION
Unicode is a standard designed to replace the ANSI standard for encoding
characters in numeric form. Because the ANSI standard uses only a single
byte to represent each character, it is limited to a maximum of 256
different characters. While this is sufficient for the needs of an
English-speaking audience, it falls short when the worldwide software
market is considered. With the Unicode standard, each character is
represented by two bytes, so the entire Unicode character set includes
65,536 possible locations.
Microsoft Windows NT, Microsoft Windows 2000, and Microsoft OLE 2.0 are
entirely Unicode-based, and Visual Basic (4.0 and higher) represents all
strings internally in Unicode format. The AscW and ChrW functions allow
access to the full range of Unicode characters. These functions work in the
same way as the original Asc and Chr functions except that they support
arguments from 0 to 65,535 instead of just from 0 to 255. Many Visual Basic
objects (such as the debug window, the label, and the text box) display a
"?" when they do not know how to display a Unicode character.
Because all strings are now represented internally in Unicode format, it
is not as simple as it used to be to represent binary data in a string.
Using the Chr function to assign data to a string does not result in the
same behavior as before. For example:
stringvar = Chr(65)
results in a two-byte long string, where byte 1 has a value of 65 and byte
2 has a value of 0 (this is the Unicode representation of the letter "A").
Keep in mind that converting from ANSI to Unicode does not always entail
just adding a second byte with a value of zero, as it does in this case.
For example, most of the ANSI character codes in the range 130-159 have
completely different Unicode values. Executing
'Debug.Print AscW(Chr(130))' displays a value of 8218, not 130.
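The mapping described above can also be checked outside Visual Basic. The following is a minimal Python sketch, not Visual Basic code. It assumes the ANSI code page is Windows-1252 (the Western European code page), and the helper name ansi_to_unicode_value is made up for this illustration.

```python
# Assumption: "ANSI" means the Windows-1252 code page (Western European).
ANSI_CODE_PAGE = "cp1252"


def ansi_to_unicode_value(ansi_code):
    """Return the Unicode code point for a single ANSI byte value."""
    # Decode the one-byte ANSI value, then read its Unicode number.
    return ord(bytes([ansi_code]).decode(ANSI_CODE_PAGE))


# ANSI 65 ("A") keeps its value; the Unicode form only gains a zero high byte.
print(ansi_to_unicode_value(65))

# ANSI 130 maps to a different Unicode value, comparable to AscW(Chr(130)).
print(ansi_to_unicode_value(130))
```

Under that assumption the two calls print 65 and 8218. A system that uses a different ANSI code page may map code 130 to a different character.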
Currently, Microsoft Windows requires a little-endian processor, which
means that in a multiple-byte entity the first byte is the least
significant, and significance increases in successive bytes. This explains
why the Unicode character "A" is represented internally as the following:
-------------------
|   65   |    0   |
-------------------
  byte 0   byte 1
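The byte order shown in the diagram can be reproduced with a short Python sketch (an illustration only, not Visual Basic code). It assumes that the two-byte representation corresponds to the UTF-16 little-endian encoding, which holds for a character such as "A". The helper name two_byte_layout is hypothetical.

```python
def two_byte_layout(character, byte_order):
    """Return the two byte values of a character in the given byte order."""
    codec = "utf-16-le" if byte_order == "little" else "utf-16-be"
    return list(character.encode(codec))


# Little-endian: least significant byte first, as in the diagram above.
print(two_byte_layout("A", "little"))

# Big-endian, shown only for contrast: most significant byte first.
print(two_byte_layout("A", "big"))
```

The little-endian call gives the byte values 65 and 0, and the big-endian call gives them in the opposite order.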
The AscB and ChrB functions can be used to replicate what used to be
accomplished by the Asc and Chr functions, because these functions allow
the manipulation of single byte quantities. If you would like a four-byte
string that has the binary values of 65, 66, 67, and 68 consecutively then
using the Chr function will not work. You must instead use the ChrB
function. For example:
stringvar = ChrB(65) & ChrB(66) & ChrB(67) & ChrB(68)
Alternatively, you can create arrays of the Byte data type and manipulate
your binary data that way.
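The size difference between the two approaches can be sketched in Python (an illustration only, not Visual Basic code). Here a bytes object stands in for a string built with ChrB, and UTF-16 little-endian encoding stands in for a string built with Chr. Both stand-ins are assumptions made for this example.

```python
# Stand-in for ChrB(65) & ChrB(66) & ChrB(67) & ChrB(68): four raw bytes.
raw_bytes = bytes([65, 66, 67, 68])

# Stand-in for Chr(65) & Chr(66) & Chr(67) & Chr(68): four two-byte characters.
text_bytes = "ABCD".encode("utf-16-le")

print(len(raw_bytes), list(raw_bytes))
print(len(text_bytes), list(text_bytes))

# A mutable byte array plays the role of an array of the Byte data type.
buffer = bytearray(raw_bytes)
buffer[0] = 97  # change the first byte in place
print(list(buffer))
```

The raw form holds exactly four bytes. The character form holds eight, because each character carries a zero high byte.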
Listed below is an explanation of the results of some simple uses of these
functions to further clarify this information.
Print Asc(Chr(255)) --> "255"
Nothing new here, except that the Chr function is returning a Unicode
character that occupies two bytes instead of a one-byte ANSI character.
Print Asc(ChrB(255)) --> 5 - Invalid procedure call.
This usage returns an error because the Asc function always expects at
least a two-byte parameter and the ChrB function is only returning a single
byte.
Print Asc(Chr(256)) --> 5 - Invalid procedure call.
Although the Chr function returns a two-byte Unicode character, it still
only takes numbers between 0 and 255 for its argument (note that on a DBCS
enabled system, Asc/Chr handle two-byte DBCS characters, converting them to
and from Unicode). Using the ChrW function allows access to the full 65,536
Unicode character locations.
Print AscW(ChrW(256)) --> "256"
This is the new version of the first statement in this section. The ChrW
function takes a value from 0 to 65,535 and returns that character (on
32-bit systems). The AscW function interprets this two-byte character as a
Unicode character and returns the correct Unicode value for that character.
Print Asc(ChrW(256)) --> "65"
Print Asc(ChrW(5000)) --> "63"
What is happening here is that the ChrW function is evaluated first, and
the Asc function then converts the resulting Unicode character to ANSI.
ChrW(256) is the character "Ā" (capital A with a macron), which has no
exact ANSI equivalent and is mapped to the closest match, "A". The function
therefore reduces to Asc("A"), and the Unicode (and ANSI) number for "A" is
65. The character represented by ChrW(5000) has no ANSI equivalent at all,
so it is converted to "?", and as expected, the Unicode and ANSI value for
"?" is 63.
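The conversion behind these two results can be approximated in Python (an illustration only, not Visual Basic code). The sketch assumes the ANSI code page is Windows-1252. It also uses Unicode decomposition as a rough stand-in for the closest-match substitution that Windows performs, which is an assumption and not the actual Windows mapping table. The function name asc_like is hypothetical.

```python
import unicodedata

# Assumption: "ANSI" means the Windows-1252 code page.
ANSI_CODE_PAGE = "cp1252"


def asc_like(character):
    """Approximate Asc: convert one Unicode character to ANSI, return its code."""
    try:
        # Characters that exist in the code page convert directly.
        return character.encode(ANSI_CODE_PAGE)[0]
    except UnicodeEncodeError:
        pass
    # Rough stand-in for closest-match mapping: keep only the base letter.
    base = unicodedata.normalize("NFD", character)[0]
    # Anything still unrepresentable becomes "?" (code 63).
    return base.encode(ANSI_CODE_PAGE, errors="replace")[0]


print(asc_like(chr(256)))   # capital A with macron falls back to "A"
print(asc_like(chr(5000)))  # no ANSI equivalent, so "?" is substituted
```

Under these assumptions the calls give 65 and 63, matching the results listed above.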
Print AscB(Chr(65)) --> "65"
Print AscB(ChrW(256)) --> "0"
Print AscB(ChrW(257)) --> "1"
Print AscB(ChrW(555)) --> "43"
Print AscB(ChrW(65535)) --> "255"
All of these return values can be explained by understanding how each
character is represented internally (see the little-endian reference above)
and by the fact that the AscB function looks only at the first byte of the
character it receives. Visually it looks like the following diagram:
            -------------------
Chr(65)     |   65   |    0   |
            -------------------
ChrW(256)   |    0   |    1   |
            -------------------
ChrW(257)   |    1   |    1   |
            -------------------
ChrW(555)   |   43   |    2   |
            -------------------
ChrW(65535) |  255   |  255   |
            -------------------
              byte 0   byte 1
The AscB function just returns whatever the first byte of the character is.
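The values listed above follow from plain arithmetic: the first byte is the character number modulo 256, and the second byte is the character number divided by 256. A minimal Python sketch of that split follows (an illustration only, not Visual Basic code). The function names are hypothetical.

```python
def little_endian_pair(code):
    """Split a 16-bit character number into (byte 0, byte 1)."""
    low, high = code.to_bytes(2, "little")
    return low, high


def first_byte(code):
    """Return byte 0 only, which is what AscB reports for ChrW(code)."""
    return little_endian_pair(code)[0]


for code in (65, 256, 257, 555, 65535):
    print(code, little_endian_pair(code), first_byte(code))
```

For 555 the pair is 43 and 2, because 2 * 256 + 43 = 555.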
Print ChrB(65) --> ""
Visual Basic prints nothing for this call to the ChrB function because the
ChrB function returns only a one-byte string. One-byte strings like this
mean nothing to Visual Basic because they do not constitute a valid
Unicode character (or series of characters).
Print ChrB(65) & ChrB(0) --> "A"
In this case, we are concatenating two one-byte strings into a single
two-byte string. Because the resulting bit pattern is the same as the bit
pattern for the Unicode "A", that is what Visual Basic prints.
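The difference between the one-byte and two-byte cases can be sketched in Python (an illustration only, not Visual Basic code). The sketch assumes the two-byte form corresponds to UTF-16 little-endian. Python reports an error for an incomplete character where Visual Basic simply prints nothing, so the hypothetical helper below returns None in that case.

```python
def decode_two_byte_text(raw):
    """Decode raw bytes as two-byte characters; None if the data is incomplete."""
    try:
        return raw.decode("utf-16-le")
    except UnicodeDecodeError:
        # A lone byte is only half of a character, so there is nothing to show.
        return None


one_byte = bytes([65])                # stand-in for ChrB(65)
two_bytes = bytes([65]) + bytes([0])  # stand-in for ChrB(65) & ChrB(0)

print(decode_two_byte_text(one_byte))
print(decode_two_byte_text(two_bytes))
```

The single byte cannot be decoded, while the two-byte sequence decodes to "A".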