DataSet contains duplicate records based on the SQL outer join query (317058)



The information in this article applies to:

  • Microsoft ADO.NET (included with the .NET Framework) 1.0
  • Microsoft ADO.NET (included with the .NET Framework 1.1)
  • Microsoft ADO.Net 2.0
  • Microsoft Visual Basic .NET (2002)
  • Microsoft Visual Basic .NET (2003)
  • Microsoft Visual Basic 2005

This article was previously published under Q317058
This article refers to the following Microsoft .NET Framework Class Library namespaces:
  • System.Data
  • System.Data.SqlClient

SYMPTOMS

If you call the Fill method of the DataAdapter object twice on a DataTable in a DataSet that is based on a SQL outer join statement, if you do not first clear the current contents of the DataTable, the DataTable may contain two copies of each record.

CAUSE

The records are duplicated because a DataTable that is based on a SQL outer join query does not contain primary key information to identify each record uniquely. Instead of merging with the existing records based on key information, a second set of exactly the same records is appended.

RESOLUTION

To resolve this problem, clear the DataTable before you reload the data. To do this, call the Clear method of the DataTable or the DataSet before you call the Fill method of the DataAdapter the second time.

This is the recommended method. This method is more efficient than if you require the DataAdapter to merge the new data with the existing data.

STATUS

This behavior is by design.

MORE INFORMATION

A SQL outer join query requests records from two or more tables, even when some of the records in one table may have no matching records in the other table. For example, if you want a list of all of the customers in the Northwind sample database, including those who had never placed an order, you use the following query:
USE Northwind
SELECT Customers.CustomerID, Customers.ContactName, Orders.OrderID, Orders.OrderDate
FROM Customers LEFT OUTER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerID
				
In the case of a customer who had not placed any orders, the results appear as follows:

CustomerIDContactNameOrderIDOrderDate
FISSADiego RoelNULLNULL

Because the Primary Key column of the Orders table (OrderID) may contain these NULL values, Microsoft SQL Server marks the column as nullable. However, because the DataSet does not accept a nullable Primary Key column, the DataSet does not import any primary key information with the table meta data. The lack of primary key information causes the records to be duplicated instead of merged.

Steps to reproduce behavior

  1. Start Visual Studio .NET or Visual Studio 2005, and then create a new Visual Basic Windows application.
  2. Add the following statements at the top of the code window for the default form:
    Imports System.Data
    Imports System.Data.SqlClient
    					
  3. Drag a Button control from the toolbox onto the form.
  4. Add the following code in the Button1_Click event:
    Try
       Dim cn As SqlConnection
       Dim cmd As SqlCommand
       Dim da As SqlDataAdapter
       Dim ds As DataSet
       Dim dt As DataTable
       Dim strSelect As String
       Dim duplicateRows As DataRow()
       cn = New SqlConnection("Data Source=YourSQLServer;" & _
         "Initial Catalog=Northwind;User ID=YourUser;Password=YourPassword;")
       cmd = New SqlCommand()
       cmd.CommandText = "SELECT Customers.CustomerID, " & _
         "Customers.ContactName, Orders.OrderID, Orders.OrderDate " & _
         "FROM Customers LEFT OUTER JOIN Orders " & _
         "ON Customers.CustomerID = Orders.CustomerID " & _
         "ORDER BY Customers.CustomerID"
       Console.WriteLine(cmd.CommandText)
       cmd.Connection = cn
       da = New SqlDataAdapter(cmd)
       ds = New DataSet()
       da.Fill(ds, "CustomerOrders")
       dt = ds.Tables("CustomerOrders")
       Console.WriteLine("Row count after first Fill: " & _
         dt.Rows.Count.ToString)
       strSelect = "CustomerID='ALFKI' AND OrderID=10643"
       duplicateRows = dt.Select(strSelect)
       Console.WriteLine("Rows for ALFKI order 10643: " & _
          duplicateRows.Length)
       da.Fill(ds, "CustomerOrders")
       Console.WriteLine("Row count after second Fill: " & _
          dt.Rows.Count.ToString)
       duplicateRows = dt.Select(strSelect)
       Console.WriteLine("Rows for ALFKI order 10643: " & _
          duplicateRows.Length)
       Catch exc As Exception
         MessageBox.Show(exc.Message)
       End Try
    					
  5. Modify the connection string in the declaration for cn as necessary for your environment.
  6. Run the project. Review the results in the Output window. Notice that, after the second execution of the Fill method of the DataAdapter, the "CustomersOrders" DataTable contains duplicate records.

Modification Type:MinorLast Reviewed:10/3/2006
Keywords:kbtshoot kbDataAdapter kbprb kbSqlClient kbSystemData KB317058 kbAudDeveloper