FIX: Incorrect Cardinality Estimate And Bad Plan for Query with OR and AND on Same Column (302615)
The information in this article applies to:
This article was previously published under Q302615
BUG #: 101624 (SQLBUG_70)
SYMPTOMS
When dealing with an expression of the form ((col = a OR col < b) AND col BETWEEN c AND d), the SQL Server optimizer may incorrectly estimate the number of rows that meet the condition. The estimate will be too low.
For more complex queries, this estimate being too low may lead to a bad query plan, such as choosing the wrong join order or the wrong index.
RESOLUTIONTo resolve this problem, obtain the latest service pack for Microsoft SQL Server 7.0. For additional information, click the following article number to view the article in the
Microsoft Knowledge Base:
301511 INF: How to Obtain the Latest SQL Server 7.0 Service Pack
NOTE: The following hotfix was created prior to Microsoft SQL Server 7.0 Service Pack 4.
The English version of this fix should have the following file attributes or later:
Version File name Platform
--------------------------------------
7.00.998 s70998i.exe x86
7.00.998 s70998a.exe Alpha
NOTE: Because of file dependencies, the most recent hotfix or feature that contains the preceding files may also contain additional files. STATUSMicrosoft has confirmed that this is a problem in the Microsoft products that are listed at the beginning of this article. This problem was first corrected in Microsoft SQL Server 7.0 Service Pack 4.
MORE INFORMATION
You can view the estimated rows for the expression by looking at the EstimateRows column of the SHOWPLAN_ALL or STATISTICS PROFILE output.
In most cases, the estimates for a query using an OR expression should be the sum of the estimates for each individual expression. For example, if you have an expression ( col = a OR col < b), the estimates should be calculated as the sum of the estimates for ( col = a) and ( col < b). If the value b is larger than the value a, estimating the expression individually like this results in too large of an estimate (the rows that match a would are counted twice), so special consideration is given for this case. However, in no situation should the estimate for the OR as a whole be lower than that of each of the individual expressions in the OR.
Following is an example that shows the bug in the pubs database. Also included is sample output from SHOWPLAN_ALL, which has been edited to remove irrelevant columns:
select count(*) from sales where
(qty = 20 or qty < 25) and qty between 1 and 50
StmtText EstimateRows
--------------------------------------------------------------------------------------------------------------------------------------- ------------
select count(*) from sales where ((qty = 20) or (qty < 25)) and qty between 1 and 50 1.0
|--Stream Aggregate(DEFINE:([Expr1002]=Count(*))) 1.0
|--Filter(WHERE:([sales].[qty]=20 OR [sales].[qty]<25)) 1.0
|--Clustered Index Scan(OBJECT:([pubs].[dbo].[sales].[UPKCL_sales]), WHERE:([sales].[qty]>=1 AND [sales].[qty]<=50)) 20.0
Note that in the preceding output, the Filter condition for qty=20 or qty < 25 has an EstimateRows value of 1.0. However, there are 4 rows that qualify for the qty=20 expression, and 12 rows that qualify for qty < 25. Clearly the estimate of 1 row output is too low.
Note that this only occurs if there is an OR on the same column, and an additional, outer expression on the same column.
Modification Type: | Major | Last Reviewed: | 10/15/2002 |
---|
Keywords: | kbBug kbDSupport KB302615 |
---|
|