The sort order used by lstrcmp and lstrcmpi is:
- Nonalphanumeric characters (in ASCII/ANSI order)
- Numeric characters
- Alphabetic characters
For performance reasons, the internal sort routine treats accented
characters as nonalphanumeric. Therefore, when the internal routine is
used, accented characters appear towards the beginning of the sort between
punctuation and numbers. In contrast, when a language driver is used,
accented characters appear near their unaccented equivalents because the
language drivers sort accented characters as alphabetic characters.
The following examples illustrate the differences in character order
("..." signifies omitted characters):
ANSI Order:
!"#...0...9:;<...ABC...XYZ[\]{|}...accented characters
Internal (English/American) Sort Routine Order:
!"#...:;<...[\]...{|}... accented characters 0...9AaBbCc...XxYyZz
Language Driver Order:
...!"#...:;<...[\]...{|}...0...9A accented characters aBbCc...XxYyZz...
Note that the accented characters are intermixed with their alphabetic
counterparts here.
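The class ordering described above can be modeled with a short sketch.
This is not the Windows implementation: internal_key and driver_key are
hypothetical names, the test for "accented" (any non-ASCII letter) is an
assumption, and the order of characters within one letter under the
language driver is left unspecified.

```python
import unicodedata

def internal_key(ch):
    """Model of the internal routine: punctuation, accented letters,
    digits, then unaccented letters."""
    if ch.isascii() and ch.isalpha():
        # Unaccented letters interleave by case: A a B b ...
        return (3, ch.lower(), ch.islower())
    if ch.isascii() and ch.isdigit():
        return (2, ord(ch))
    if ch.isalpha():
        # Assumption: any non-ASCII letter counts as "accented" and is
        # placed between punctuation and digits.
        return (1, ord(ch))
    return (0, ord(ch))

def driver_key(ch):
    """Model of a language driver: an accented letter sorts with its
    unaccented base letter."""
    base = unicodedata.normalize("NFD", ch)[0]  # strip the diacritic
    if base.isalpha():
        return (2, base.lower())
    if base.isdigit():
        return (1, base)
    return (0, base)

chars = list("b9A!\u00e0a0Z;")  # \u00e0 is <a grave>
internal = sorted(chars, key=internal_key)
driver = sorted(chars, key=driver_key)
print("".join(internal))
print("".join(driver))
```

In this model, <a grave> lands before the digits under internal_key and
among the "a" entries under driver_key.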
Primary and secondary weights of characters also affect the sort order when
a language module is installed. In this case, sorting is done by primary
weight for the entire length of the string, then by length, and lastly by
secondary weight if the primary weights of all the characters and the
lengths of the strings are equal. The secondary (diacritic) weights are
important only when there is a tie in the entire string. The internal
(English/American) sort routine does not sort extended characters as
letters; as mentioned above, it sorts them as punctuation rather than as
alphabetic characters. Therefore, in some cases the internal routine
produces results completely different from those of the language drivers.
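The three-step comparison described above (primary weights, then length,
then secondary weights) might be sketched as follows. The WEIGHTS table
and its values are invented for illustration; a real language module
supplies its own weights, and compare is a hypothetical name.

```python
# Hypothetical weight table: character -> (primary, secondary).
# Characters that differ only by a diacritic share a primary weight.
WEIGHTS = {
    "a": (1, 0), "\u00e1": (1, 1), "\u00e3": (1, 2),
    "b": (2, 0),
    "c": (3, 0),
}

def compare(s, t):
    """Return -1, 0, or 1, like a string comparison function."""
    ws = [WEIGHTS[c] for c in s]
    wt = [WEIGHTS[c] for c in t]
    # Step 1: primary weights across the whole common length.
    for (p, _), (q, _) in zip(ws, wt):
        if p != q:
            return -1 if p < q else 1
    # Step 2: the shorter string sorts first.
    if len(ws) != len(wt):
        return -1 if len(ws) < len(wt) else 1
    # Step 3: secondary weights break a complete tie.
    for (_, a), (_, b) in zip(ws, wt):
        if a != b:
            return -1 if a < b else 1
    return 0

print(compare("\u00e1a", "ab"))   # primary weights decide
print(compare("\u00e1b", "abc"))  # length decides before the accent does
print(compare("ab", "\u00e1b"))   # only the secondary weights differ
```

Note that the accent in the first string of each call has no effect
unless both the primary weights and the lengths are tied.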
Character weights are also important in case-sensitive sorting. For
example, lstrcmp produces A < a < B < b; it also produces
Aaa < aaa < Aab. Both results follow proper dictionary sort order, but the
second is not obvious: if A < a, it seems that Aab < aaa should also be
true. Here, A and a are said to "collide" (that is, their primary weights
are the same), so the comparison of their secondary weights is delayed and
is performed only if the remainders of the strings are equal.
The strings continue to be compared character by character. Because a < b,
then aaa < Aab and the comparison is complete. If the strings were equal
all the way through (such as Aaa and aaa) then A and a would collide once
again. The rest of the strings would be equal, and then the secondary
weights of A and a would be checked to determine that Aaa < aaa.
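The delayed comparison can be illustrated with a sort key that keeps case
out of the first pass. This is a sketch for ASCII letters only, not the
lstrcmp implementation; dictionary_key and per_character_key are
hypothetical names.

```python
def dictionary_key(s):
    # Primary: the case-folded text (string comparison also places a
    # shorter prefix first). Secondary: per-character case, uppercase
    # first, consulted only when the primary parts are equal.
    return (s.lower(), [c.islower() for c in s])

def per_character_key(s):
    # Naive alternative: case is decided at the first character where it
    # differs, which is the intuition that Aab < aaa.
    return [(c.lower(), c.islower()) for c in s]

words = ["Aab", "aaa", "b", "Aaa", "B", "a", "A"]
print(sorted(words, key=dictionary_key))
print(sorted(["Aab", "aaa", "Aaa"], key=per_character_key))
```

The first call yields A, a, Aaa, aaa, Aab, B, b; the naive key instead
places Aab before aaa.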
Note: Special notation is used below to represent accented characters
due to limitations in the distribution media for this article. For
example <u umlaut> is used to represent the letter "u", which has an
umlaut over it. Likewise, <a tilde> represents the letter "a" accented
with a tilde.
Nonaccented Characters
The sort works on a character-by-character comparison of the primary
weights in a string. As soon as the primary (alphabetic) weights show one
string greater than the other, the comparison stops. Sorted lists of
nonaccented strings are therefore ordered by their first differing
character.
Accented Characters
The secondary (diacritic) weights are important only when there is a tie in
the string. For example, in the following sorted list, the characters in
the first two strings have identical primary sort weights ("s", "a", "m",
"e") and the strings are the same length. Because of the tie in primary
weights and string lengths, the secondary weights are then compared. The
first difference in secondary weight ("a" versus <a tilde>) breaks the tie.
Secondary weights are also a factor when comparing strings 4 and 5 below:
1. same
2. s<a tilde>me
3. sandy
4. schon
5. sch<o umlaut>n
6. school
It is important to apply the primary weights to the whole string first, and
to use the secondary weights only in a tie. The primary weights must also
carry more importance than the secondary weights. Otherwise, an
incorrect sort would result (such as schon less than school less than sch<o
umlaut>n). This type of weighting creates a sort that makes a distinction
between "unique," hard-coded letters of a language and mere variants of a
letter (which are only distinguished by diacritics).
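The example list can be reproduced with a small model in which Unicode
decomposition stands in for the driver's weight tables. That substitution
is an assumption made for illustration, as are the names split_weights,
weighted_key, and mixed_key. The model handles lowercase, precomposed
input only.

```python
import unicodedata

def split_weights(s):
    """Split each character into a base letter (primary weight) and its
    combining marks (secondary weight). Input must be precomposed."""
    primary, secondary = [], []
    for ch in s:
        d = unicodedata.normalize("NFD", ch)
        primary.append(d[0])
        secondary.append(d[1:])  # empty string for an unaccented letter
    return primary, secondary

def weighted_key(s):
    # Primary weights for the whole string first, diacritics only in a tie.
    primary, secondary = split_weights(s)
    return (primary, secondary)

def mixed_key(s):
    # Incorrect variant: diacritics compared character by character.
    primary, secondary = split_weights(s)
    return list(zip(primary, secondary))

# \u00e3 is <a tilde>, \u00f6 is <o umlaut>
words = ["school", "sch\u00f6n", "s\u00e3me", "schon", "sandy", "same"]
good = sorted(words, key=weighted_key)
bad = sorted(words, key=mixed_key)
print(good)
print(bad)
```

With mixed_key, school falls between schon and sch<o umlaut>n, which is
the incorrect order described above.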
Here is a sorted list using the English (International) driver:
<a grave>
<a grave>pple
<a grave>pples
l<u umlaut>
l<u umlaut>b
Here is the same list, using the internal [English (American)] sort
routine:
<a grave>
<a grave>pple
<a grave>pples
l<u umlaut>
l<u umlaut>b