FIX: Losing DTC Service May Leave Orphaned Transactions (195542)



The information in this article applies to:

  • Microsoft SQL Server 6.5

This article was previously published under Q195542
BUG #: 17882 (SQLBUG_65)

SYMPTOMS

When a connection is involved in a distributed transaction, SQL Server relies on the transaction coordinator (Microsoft Distributed Transaction Coordinator, or DTC) to notify it regarding transaction outcome so that it knows whether to commit or abort the transaction. DTC uses a two phase commit protocol among all enlisted servers and only commits if all servers have indicated that they are prepared to commit.

Any time before the beginning of the two phase commit process, you can use the Transact-SQL KILL statement to close a SQL Server connection that is causing blocking or other undesirable behavior. However, after the two phase commit protocol begins and SQL Server has indicated that it has the transaction in a prepared state, it will no longer respond to the KILL statement.

CAUSE

According to the two-phase commit protocol, any resource manager (that is, SQL Server) that has successfully responded to the transaction coordinator that it is in a prepared state must guarantee that it can successfully commit or roll back the transaction after the transaction coordinator has determined the transaction outcome. To do this, SQL Server will hold all locks acquired during the transaction and will not respond to the Transact-SQL KILL statement.

If DTC unexpectedly fails, or fails to notify SQL Server regarding the transaction outcome, the resources used by the transaction will be tied up until manual intervention occurs.

STATUS

Microsoft has confirmed this to be a problem in SQL Server 6.5. This problem has been corrected in U.S. Service Pack 5 for Microsoft SQL Server 6.5. For information about how to download and install the latest SQL Server Service Pack, see the following Microsoft Web site: For more information, contact your primary support provider.

MORE INFORMATION

There are no automated methods to restart DTC if it fails, so you must manually restart the service. When DTC is restarted and is recovering its transaction log, it will detect that a transaction was pending and some resource managers may not have been notified. It will then notify the enlisting servers of the transaction outcome.

After SQL Server has responded that it is prepared, it will wait on notification from DTC regarding the transaction outcome. If it has not received any notification within 60 seconds, it will print a message in the error log similar to the following:
98/10/20 09:15:43.76 spid11 Awaiting DTC message.
UOW: e2a10172-bda9-11d1-bf4c-00600893905d State: PREPARED
The UOW value (a Unit Of Work ID) is a unique identifier for the transaction. This message will repeat every 60 seconds until notification has been received from DTC. If you see this message, open the DTC Administrative Console, or the DTC service in SQL Enterprise Manager. On the Transaction tab, you will see a list of DTC transactions and their current state, and you should see a transaction with a Unit of Work ID value that matches the one in the error log message. From this console, you can select the transaction, right-click, and force the transaction to commit or abort. If using the DTC Administrative Console does not alleviate the condition, and stopping and restarting the DTC service doesn't either, the only way to remedy the situation is to stop and restart SQL Server. During recovery, SQL Server will interrogate DTC to determine the transaction outcome.

There may be rare circumstances where, even after restarting SQL Server, the transaction outcome cannot be determined. This may occur if there is a permanent network failure between the SQL Server and the commit coordinator, or it may occur in situations where a hardware failure resulted in the loss of the DTC log file. In such circumstances, SQL Server will log an error 3437 and stop recovery of the database:
Msg 3437: Error recovering database 'pubs' - could not connect to
the DTC to check completion status of xact: Rid pageid is
0x132f and row num is 0x11
Beginning in SQL Server 6.5 Service Pack 5, SQL Server has two enhancements that may be useful in resolving situations such as this. First, a new sysprocesses waittype value, 0x0411, has been added. This value can be used to confirm that SQL Server is waiting on DTC. Second, if the above methods have failed to resolve the problem, the Transact-SQL KILL statement has been enhanced to allow an optional WITH COMMIT or WITH ABORT clause, as shown in the following syntax:
KILL spid [WITH {COMMIT | ABORT}]
				
A warning will be written to the error log, indicating the UOW ID and the forced transaction outcome. Take caution when using this new KILL statement syntax, because it may result in inconsistencies among the various servers enlisted in the transaction. This syntax should only be used when the other methods have proved unable to resolve the issue. If the KILL ... WITH COMMIT/ABORT syntax is used and the target spid is not waiting on DTC, error 6108 will be raised.

If you find that this issue recurs, open a support incident to pursue the underlying issues with SQL Server or DTC that are causing this situation.

Modification Type:MajorLast Reviewed:7/19/2006
Keywords:kbBug kbfix KB195542