BUG: DOC: IGNORE_DUP_ROW and Tables with Text Data (187966)



The information in this article applies to:

  • Microsoft SQL Server 6.5

This article was previously published under Q187966
BUG #: 17982 (SQLBUG_65)

SYMPTOMS

The documentation for the IGNORE_DUP_ROW option for creating a nonunique clustered index is incomplete in SQL Server Books Online for SQL Server versions 6.0 and 6.5. Due to the nature of the way text and image data is stored, the option may not eliminate duplicate rows if the table contains columns of either text or image data types. If the table has text or image data, the only case in which a duplicate row will be eliminated is if the text or image field has not been initialized. This information was inadvertently omitted from the Books Online for SQL Server 6.0 and 6.5.

CAUSE

When text or image columns exist, the table simply stores the address pointer to the beginning of a text or image page chain in the associated column. Each of these pointers is unique. The IGNORE_DUP_ROW option checks these pointer values, rather than the actual text or image data. Therefore, even if the text or image data is exactly the same, the pointers in the column will have different values, so the row will not qualify as unique. If the value of a text or image column is uninitialized, the pointers will evaluate the same, so the duplicate row is eliminated.

WORKAROUND

The documentation should state that the IGNORE_DUP_ROW option will not eliminate duplicate rows on a table with text or image data.

If you need to find and eliminate duplicates rows in a table with text or image data, you can use the following query to identify rows that may be duplicates:
   SELECT col1, col2, col3, ..... count(*)
   FROM t1
   GROUP BY col1, col2, col3, ...
   HAVING count(*)>1
				

Include all of the table columns in the select and group by lists, except for the text or image columns. A maximum of 16 columns may be specified in a GROUP BY clause. If a table has more than 16 columns, pick the 16 columns that have the best chance of being unique for the table.

This will give you a list of rows that may be duplicates, excluding the text or image data. You must manually compare the text or image data for the rows to determine whether they are duplicates and need to be eliminated.

For more information on removing duplicate rows, see the following article in the Microsoft Knowledge Base:

139444 : INF: How to Remove Duplicate Rows From a Table

STATUS

Microsoft has confirmed this to be a problem in the Books Online for SQL Server versions 6.0 and 6.5. We are researching this problem and will post new information here in the Microsoft Knowledge Base as it becomes available.

Modification Type:MajorLast Reviewed:10/3/2003
Keywords:kbBug kbpending KB187966