INF: Optimizing Comparisons with Numeric, Integer, and Others (198625)

SUMMARY

Data type optimization with numeric, decimal, and integer data types use some very strict rules. This article explains and clarifies the conditions and constraints that the optimizer uses to perform exact numeric data type comparisons. Review the following section to help clarify the terms and concepts used in this article.

"Precision" is the number of digits in a number. "Scale" is the number of digits to the right of the decimal point in a number. For example, the number 123.45 has a precision of 5 and a scale of 2.

Due to limitations with the binary numbering system used by computers, some decimal fractions simply cannot be represented exactly. For example, the decimal fraction 0.1 does not have an exact binary representation. It can only be approximated. It is for this reason that floating point and real data types are considered approximate values; whereas integer, numeric, and decimal data types are treated as exact data types.

The terms "strict" or "exact" are referred to in their computational definition. For example, a numeric(10,1) will not exactly or strictly compare with a numeric(10,2) even if the numbers involved are mathematically equivalent. Mathematically, a value of 10.1 and 10.10 are exactly the same. However, due to the difference in scale, they are not treated as a computationally exact match.

MORE INFORMATION

To understand the optimizer's comparison choices, you must first understand the way incoming data is parsed and handled.

When the SQL Server receives a value of 10 or fewer digit positions without a decimal point, the value is treated as an integer data type.
When the SQL Server receives a value of 11 or more digit positions, it is treated as an exact data type (decimal or numeric).
When the SQL Server receives a value with a decimal point, it is treated as an exact data type.
Numeric and decimal (8,0) or smaller values that do not require decimal positions can be represented by an integer value. Whenever a strict comparison is performed on a numeric or decimal (that is, (8,0), (7,0), (6,0), and so on), the integer value is always considered a more precise data type.

Exact data types require "strict" comparison. Unlike a float or real data type, rounding is not acceptable to maintain the integrity of the data type and associated comparisons.

The optimizer makes a choice when it decides to complete the plan: is the incoming argument more or less precise than the table definition?

If the argument value is more precise than the column data, the column data must be promoted to the argument precision and scale. This requires the conversion of the data in the column, and can result in plans than contain table scans.

If the argument value is less precise than the column data, the argument can be promoted to the precision and scale of the column. This generally results in a plan that can use an index or more direct approach to the data retrieval effort.

To help clarify this concept, some examples are given below.

Example 1

create table tblTest( a numeric(8,0) PRIMARY KEY, b int)
select * from tblTest where a = 123

Referring back to the parsing rules, the 123 is fewer than 10 digits and it does not contain a decimal point. Therefore:

The incoming argument is considered an integer value.
The column definition is (8,0) or less, so the integer is considered a more precise data type. Thus, the conversion must be applied to the data column and a less than optimal plan may be selected.

Example 2

create table tblTest( a numeric(10,0) PRIMARY KEY, b int)
select * from tblTest where a = 123.

The incoming argument contains a decimal point and no decimal positions are represented in the WHERE clause so the argument is seen as an exact data type of decimal or numeric (n,0).
The column definition is (10,0) so the argument can be promoted and compared exactly. This generally results in a query plan that will use a direct index search.

Example 3

create table tblTest( a numeric(10,0) PRIMARY KEY, b int)
select * from tblTest where a = 123.0

The incoming argument contains a decimal point and a single decimal position. This results in an exact data type (n,1).

NOTE: The decimal position in this query is very important. The query submitted is stating that it mathematically wants to compare at a precision and scale of (n,1) NOT (n,0).
The column definition is (10,0) so the argument is considered a more precise data type. Thus, the conversion must be applied to the data column and a less than optimal plan may be selected.

Remember, the table definition was of an exact data type. If the server attempted to do rounding, it would not maintain the rules of an exact data type and would result in invalid result processing.

Consider the following as an example:

create table tblTest( a numeric(10,1) PRIMARY KEY, b int)
select * from tblTest where a = 123.15

If the SQL Server did not chose a plan to convert the column data of the incoming argument, (n,2) precision and scale, but instead elected to round the argument to the column precision and scale, it would elect 123.2 when the user most likely wanted to update 123.1. In this case, there should be no match.

All numeric values with scale less than 10 and precision 0 can be stored as a tinyint, smallint, or integer type, taking 1, 2, and 4 bytes of storage, respectively. By comparison, even the smallest numeric value will take 5 bytes, with more storage required as the precision increases. The following table shows the mapping between the types.

Precision	New type
<= 2	tinyint
<= 4	smallint
< 10	int

In addition to the benefits available by using indexes more effectively, you may also save considerable space by using one of these data types where appropriate.

SQL Server 7.0 adds the ALTER TABLE .. ALTER COLUMN statements, which can be used to dynamically change the data type of a column. In earlier versions of SQL Server, this conversion can be done only by creating a new table with the desired definitions and doing an INSERT .. SELECT to populate the table. Note that this change may require you to update any triggers, stored procedures, or other code that may use variables expecting the column to be a numeric type.