FIX: Incorrect Cardinality Estimate And Bad Plan for Query with OR and AND on Same Column (302615)



The information in this article applies to:

  • Microsoft SQL Server 7.0

This article was previously published under Q302615
BUG #: 101624 (SQLBUG_70)

SYMPTOMS

When dealing with an expression of the form ((col = a OR col < b) AND col BETWEEN c AND d), the SQL Server optimizer may incorrectly estimate the number of rows that meet the condition. The estimate will be too low.

For more complex queries, this estimate being too low may lead to a bad query plan, such as choosing the wrong join order or the wrong index.

RESOLUTION

To resolve this problem, obtain the latest service pack for Microsoft SQL Server 7.0. For additional information, click the following article number to view the article in the Microsoft Knowledge Base:

301511 INF: How to Obtain the Latest SQL Server 7.0 Service Pack

NOTE: The following hotfix was created prior to Microsoft SQL Server 7.0 Service Pack 4.

The English version of this fix should have the following file attributes or later:
   Version      File name       Platform
   --------------------------------------

   7.00.998     s70998i.exe     x86
   7.00.998     s70998a.exe     Alpha
				
NOTE: Because of file dependencies, the most recent hotfix or feature that contains the preceding files may also contain additional files.

STATUS

Microsoft has confirmed that this is a problem in the Microsoft products that are listed at the beginning of this article. This problem was first corrected in Microsoft SQL Server 7.0 Service Pack 4.

MORE INFORMATION

You can view the estimated rows for the expression by looking at the EstimateRows column of the SHOWPLAN_ALL or STATISTICS PROFILE output.

In most cases, the estimates for a query using an OR expression should be the sum of the estimates for each individual expression. For example, if you have an expression (col = a OR col < b), the estimates should be calculated as the sum of the estimates for (col = a) and (col < b). If the value b is larger than the value a, estimating the expression individually like this results in too large of an estimate (the rows that match a would are counted twice), so special consideration is given for this case. However, in no situation should the estimate for the OR as a whole be lower than that of each of the individual expressions in the OR.

Following is an example that shows the bug in the pubs database. Also included is sample output from SHOWPLAN_ALL, which has been edited to remove irrelevant columns:
select count(*) from sales where 
   (qty = 20 or qty < 25) and qty between 1 and 50
				
StmtText                                                                                                                                EstimateRows
--------------------------------------------------------------------------------------------------------------------------------------- ------------    
select count(*) from sales where ((qty = 20) or (qty < 25)) and qty between 1 and 50                                                    1.0
  |--Stream Aggregate(DEFINE:([Expr1002]=Count(*)))                                                                                     1.0
       |--Filter(WHERE:([sales].[qty]=20 OR [sales].[qty]<25))	                                                                        1.0
            |--Clustered Index Scan(OBJECT:([pubs].[dbo].[sales].[UPKCL_sales]), WHERE:([sales].[qty]>=1 AND [sales].[qty]<=50))        20.0
				
Note that in the preceding output, the Filter condition for qty=20 or qty < 25 has an EstimateRows value of 1.0. However, there are 4 rows that qualify for the qty=20 expression, and 12 rows that qualify for qty < 25. Clearly the estimate of 1 row output is too low.

Note that this only occurs if there is an OR on the same column, and an additional, outer expression on the same column.

Modification Type:MajorLast Reviewed:10/15/2002
Keywords:kbBug kbDSupport KB302615