Comparing SQL collations to Windows collations (322112)
The information in this article applies to:
- Microsoft SQL Server 2000 (all editions)
- Microsoft SQL Server 2005 Developer Edition
- Microsoft SQL Server 2005 Enterprise Edition
- Microsoft SQL Server 2005 Express Edition
- Microsoft SQL Server 2005 Standard Edition
- Microsoft SQL Server 2005 Workgroup
This article was previously published under Q322112 SUMMARYIn Microsoft SQL Server 2000 and in Microsoft SQL Server 2005, a "collation" specifies how strings are
compared and sorted, and what character set is used for non-Unicode data. SQL
Server 2000 supports two types of collations: - SQL collations
- Windows collations
For a description of each type of collation and a good
overview of how to decide which collation to use, see the "Selecting Collations" topic in SQL Server 2000 Books Online, or see the "Collation Types" topic in SQL Server 2005 Books Online. This article discusses additional considerations that may affect your
decision about whether to choose a Windows collation or a SQL collation when
you install SQL Server 2000 or SQL Server 2005. MORE INFORMATIONCollation semantics For a Windows collation, a comparison of non-Unicode data is
implemented by using the same algorithm as Unicode data. Both Unicode and
non-Unicode sorting are compatible with string comparison rules in a particular
version of Windows. This provides consistency across data types in SQL Server.
It also enables developers who use the CompareString Win32 API function to sort strings in their application by using the
same rules that SQL Server uses. In a SQL collation, SQL Server
defines different comparison semantics for non-Unicode data. SQL Server bases
these comparison semantics on a SQL "sort order." For a mapping of sort orders
to SQL collations, see the "SQL Collation Name" topic in SQL Server Books
Online. A SQL collation's rules for sorting non-Unicode data are
incompatible with any sort routine that is provided by the Microsoft Windows
operating system; however, the sorting of Unicode data is compatible with a
particular version of the Windows sorting rules. Because the comparison rules
for non-Unicode and Unicode data are different, when you use a SQL collation
you might see different results for comparisons of the same characters,
depending on the underlying data type. For example, if you are using the SQL
collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less
than the string 'ab' because the hyphen ("-") is sorted as a separate character
that comes before "b". However, if you convert these strings to Unicode and you
perform the same comparison, the Unicode string N'a-c' is considered to be
greater than N'ab' because the Unicode sorting rules use a "word sort" that
ignores the hyphen. String compare performance Unicode sorting rules are much more complex than the rules for a
non-Unicode SQL sort order. When SQL Server compares Unicode data, the
characters are assigned a weight that is dynamically modified based on the
collation's locale. The data is also modified by comparison style settings such
as width, accent, or Kana-sensitivity. The Unicode sort routines support more
intelligent sort behaviors like word sorting. Additionally, because the routines must handle Unicode data,
they are flexible enough to handle the sorting and the comparison of several
thousand distinct characters, instead of the maximum 255 characters that most
SQL Server sort orders can handle. For these reasons, raw string comparison
work that uses the Unicode sorting rules is typically more expensive in terms
of both time and CPU cycles than a similar string comparison that uses a
non-Unicode SQL sort order. What this means for the possible
combinations of data types and collation types in SQL Server: - If you are storing and handling your data by using
non-Unicode data types (char, varchar, text), and you are using a SQL collation, string comparisons will be
performed with a non-Unicode SQL sort order.
- If you are storing and handling your data by using
non-Unicode data types (char, varchar, text), and you are using a Windows collation, string comparisons will
be performed with the Unicode sorting rules. This may cause certain operations
that are unusually dependent on string sorting performance to take longer and
to use more CPU than a similar operation that is performed with a SQL
collation.
- If you are using Unicode data types (nchar, nvarchar, ntext), there is no difference in the sorting behavior for the SQL and
the Windows collations. Both will use Unicode sorting rules.
Generally, the degree of performance difference between the
Windows and the SQL collations will not be significant. The difference only
appears if a workload is CPU-bound, rather than being constrained by I/O or
by network speed, and most of this CPU burden is caused by the
overhead of string manipulation or comparisons performed in SQL Server. An
example of an application where the performance difference might be pronounced
is a system where an application passes a long string value to a SQL Server
stored procedure. The stored procedure then parses the string through extensive
use of Transact-SQL string manipulation functions like CHARINDEX or PATINDEX.
If the workload is fairly one-dimensional and it is dominated by executions of
this string parsing stored procedure, the difference in performance between a
SQL collation and a Windows collation might be noticeable. However, the design
of most applications does not lead to a situation where the performance
difference is significant. Recommendations- SQL collations are provided for backward compatibility with
earlier versions of SQL Server. Windows collations provide consistent string
comparisons for both Unicode and for non-Unicode text in SQL Server that are
also consistent with string comparisons in the Windows operating system. For
all these reasons, Windows collations are preferred unless there are backward
compatibility issues or specific performance issues that require a SQL
collation.
- If you are considering a SQL collation based only on the
performance characteristics of a SQL collation, realize that the
performance of most applications does not benefit significantly from a change
in collation. Make sure that you have isolated queries that show a benefit from
a SQL collation. As soon as you identify the affected queries, consider the
following alternatives to a change in collation. Both of these alternatives may provide a performance benefit that is greater than what you will see if you change the instance collation to a SQL collation:
- If the overhead for the Windows collations is traced to
Transact-SQL routines that perform explicit string manipulation or parsing, and if you are using
non-Unicode data types, you may want to specify a SQL collation or a binary
Windows collation for the operation that is frequently executed and that is
most expensive. Suppose you use the PATINDEX function to determine whether a
text column in a table contains the "x" character. If you force a SQL collation
for that particular comparison operation, and you continue to use a Windows
collation for the rest of the database and the application, you do not have to
change the collation for the whole system:
SELECT PATINDEX ('%x%', MemoFld COLLATE SQL_Latin1_General_Cp1_CI_AS) FROM ... - If the overhead for the Windows collations is traced to
more mundane queries that do not use complex string manipulation functions,
improved index or query designs might provide improvements that dwarf those you
would see by changing to a SQL collation. A query that can be satisfied by
highly selective seeks on appropriate indexes will not be sensitive to minor
changes in string comparison cost. In contrast, a small amount of overhead per
string comparison can add up quickly in a query that must perform a table scan
and compare a particular value to each of millions of rows. If you prevent
large table or index scans from the query plan by changing the indexing or the
query itself, your query will perform much faster than it would if you change
to a SQL collation.
Note There is a third type of collation that is a variation of a SQL
collation. This third collation is known as a "compatibility collation" or an
"obsolescent collation." A compatibility collation is a set of sorting and
comparison rules that do not have a predefined collation name in SQL Server
2000. For example, if you set up SQL Server 7.0 with an inconsistent
case-sensitivity setting for Unicode and for non-Unicode data, you will have a
compatibility collation when you upgrade this instance of SQL Server 7.0 to SQL
Server 2000. In the discussion earlier in this article, the information about
SQL collations also applies to compatibility collations.
For more information about compatibility
collations, click the following article number to view the article in the Microsoft Knowledge Base:
270042
INF: Description of SQL Server compatibility collations
Modification Type: | Major | Last Reviewed: | 12/13/2005 |
---|
Keywords: | kbhowto kbinfo KB322112 kbAudDeveloper kbAudITPRO |
---|
|