BUG: DOC: IGNORE_DUP_ROW and Tables with Text Data (187966)
The information in this article applies to:
This article was previously published under Q187966
BUG #: 17982 (SQLBUG_65)
SYMPTOMS
The documentation for the IGNORE_DUP_ROW option for creating a nonunique
clustered index is incomplete in SQL Server Books Online for SQL Server
versions 6.0 and 6.5. Due to the nature of the way text and image data is
stored, the option may not eliminate duplicate rows if the table contains
columns of either text or image data types. If the table has text or image
data, the only case in which a duplicate row will be eliminated is if the
text or image field has not been initialized. This information was
inadvertently omitted from the Books Online for SQL Server 6.0 and 6.5.
CAUSE
When text or image columns exist, the table simply stores the address
pointer to the beginning of a text or image page chain in the associated
column. Each of these pointers is unique. The IGNORE_DUP_ROW option checks
these pointer values, rather than the actual text or image data. Therefore,
even if the text or image data is exactly the same, the pointers in the
column will have different values, so the row will not qualify as unique.
If the value of a text or image column is uninitialized, the pointers will
evaluate the same, so the duplicate row is eliminated.
WORKAROUND
The documentation should state that the IGNORE_DUP_ROW option will not
eliminate duplicate rows on a table with text or image data.
If you need to find and eliminate duplicates rows in a table with text or
image data, you can use the following query to identify rows that may be
duplicates:
SELECT col1, col2, col3, ..... count(*)
FROM t1
GROUP BY col1, col2, col3, ...
HAVING count(*)>1
Include all of the table columns in the select and group by lists, except
for the text or image columns. A maximum of 16 columns may be specified in
a GROUP BY clause. If a table has more than 16 columns, pick the 16 columns
that have the best chance of being unique for the table.
This will give you a list of rows that may be duplicates, excluding the
text or image data. You must manually compare the text or image data for
the rows to determine whether they are duplicates and need to be
eliminated.
For more information on removing duplicate rows, see the following article
in the Microsoft Knowledge Base:
139444
: INF: How to Remove Duplicate Rows From a Table
STATUS
Microsoft has confirmed this to be a problem in the Books Online for SQL
Server versions 6.0 and 6.5. We are researching this problem and will post
new information here in the Microsoft Knowledge Base as it becomes
available.
Modification Type: | Major | Last Reviewed: | 10/3/2003 |
---|
Keywords: | kbBug kbpending KB187966 |
---|
|