Goal
To identify, to troubleshoot,
and to recommend a solution for resolving a deadlock.
Introduction
This article examines a deadlock situation and provides steps for resolving the deadlock. Each deadlock may be different and can be caused by several different environment variables. The information provided in this article can help you identify and resolve a deadlock.Case study
In a case study, we examine a 911 system that has six operators. During peak activity, the Microsoft Visual Basic front-end application they are using experiences broken connections. Because of the broken connections, the operators must re-input data. For a 911 system that operates 24 hours a day, seven days a week, this behavior is unacceptable.What is a deadlock?
A deadlock occurs when two system server process IDs
(SPIDs) are waiting for a resource and neither process can advance because the
other process is preventing it from getting the resource.
The lock
manager's thread checks for deadlocks. When a lock manager's deadlock detection
algorithm detects a deadlock, the lock manager chooses one of the SPIDs as a
victim. The lock manager initiates a 1205 error message that is sent to the
client, and the lock manager kills the SPID. Killing the SPID frees the
resources and allows the other SPID to continue. Killing the SPID that is the
deadlock victim is what causes the broken connection that the Visual Basic
front-end application experiences.
In a well designed application, the front-end application should trap for the 1205 error,
reconnect to SQL Server, and then re-submit the transaction.
Although
deadlocks can be minimized, they cannot be completely avoided. That is why the
front-end application should be designed to handle deadlocks.How to identify a deadlock
Step
1To identify a deadlock, you must first obtain log
information. If you suspect a deadlock, you must gather information about the
(SPIDs) and the resources that are involved in the deadlock. To do this, add
the -T1204 and the -T3605 startup parameters to SQL Server. To add these two
startup parameters, follow these steps:
- Start SQL Server Enterprise Manager.
- Select, and then right-click the server.
- Click Properties.
- Click Startup Parameters.
- In the Startup Parameters dialog box,
type -T1204 in the Parameters text box,
and then click Add.
- In the Parameters text box, type
-T3605, and then click Add.
- Click OK.
The startup parameters will take effect when SQL Server
is stopped and then re-started.
The -T1204 startup parameter collects
information about the process and the resources when the deadlock detection
algorithm encounters a deadlock. The -T3605 startup parameter writes this
information to the SQL Server error logs.
The -T1205 startup
parameter collects information every time that
the deadlock algorithm checks for a deadlock, not when
a deadlock is encountered. You do not have to use the -T1205 startup parameter.
If you do use the -T1205 startup parameter, the following is a sample
of the output that will be in the SQL Server error log:
2003-05-14 11:46:26.76 spid4 Starting deadlock search 1
2003-05-14 11:46:26.76 spid4 Target Resource Owner:
2003-05-14 11:46:26.76 spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec:(0x43CAB580) Value:0x42bdf340
2003-05-14 11:46:26.76 spid4 Node:1 ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec:(0x43CAB580) Value:0x42bdf340
2003-05-14 11:46:26.76 spid4
2003-05-14 11:46:26.76 spid4 End deadlock search 1 ... a deadlock was not found.
2003-05-14 11:46:26.76 spid4 ----------------------------------
2003-05-14 11:46:31.76 spid4 ----------------------------------
2003-05-14 11:46:31.76 spid4 Starting deadlock search 2
Sometimes, you might not be able to stop and re-start SQL Server. In that case, you
can use Query Analyzer to run the following command to enable the deadlock
trace flags.
Note This way you can gather information about the deadlocks
immediately. The "-1" indicates all SPIDs.
dbcc traceon (1204, 3605, -1)
go
dbcc tracestatus(-1)
go
Step
2Next,
you must collect a SQL Profiler trace. If
you turn on the deadlock trace flag, you will get most of the required
information, but not always. For example, in a
case study the trace flag output identified
that a
sp_cursoropen system stored procedure and an "UPDATE tblQueuedEvents set
notifyid = 3, ResynchDate" statement were involved in a
deadlock. Unfortunately, you do not know the
definition of the
sp_cursoropen system stored procedure.
You also
do not have the complete UPDATE statement because it was
truncated.
SQL Profiler can obtain the full statements in addition to
the execution plans of the statements. A SQL Profiler trace also has a lock
event for "deadlock" and for "deadlock chain." "Deadlock" corresponds to the
-T1204 flag, and "deadlock chain" corresponds to the -T1205 flag. Turning on
the deadlock trace flags and running a SQL Profiler trace during the occurrence
of a deadlock should provide you the data that you must have to troubleshoot a
deadlock. In this case, and in others, running SQL Profiler changes the timing
of execution enough to prevent the deadlock. Therefore, you will typically
capture the deadlock information with the trace flags, and then you run SQL
Profiler.
Troubleshooting a deadlock
After a deadlock occurs, you
can gather information about the deadlock by using the
sqldiag utility and by using SQL Profiler. In the output of the
SQLDiag.txt file, look for a "Wait-for-graph" entry. A "Wait-for graph" entry
indicates that a deadlock was encountered.
The following is a sample
of the output that you might see in the SQL Server error log when you use the
-T1205 startup parameter.
2003-05-05 15:11:50.80 spid4 Wait-for graph
2003-05-05 15:11:50.80 spid4 Node:1
2003-05-05 15:11:50.80 spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec:(0x33AE1538) Value:0x193
2003-05-05 15:11:50.80 spid4 Victim Resource Owner:
2003-05-05 15:11:50.80 spid4 ResType:LockOwner Stype:'OR' Mode: X SPID:60 ECID:0 Ec:(0x1F1BB5B0) Value:0x193
2003-05-05 15:11:50.80 spid4 Requested By:
2003-05-05 15:11:50.80 spid4 Input Buf: RPC Event: sp_cursoropen;1
2003-05-05 15:11:50.80 spid4 SPID: 55 ECID: 0 Statement Type: EXECUTE Line #: 1
2003-05-05 15:11:50.80 spid4 Owner:0x1937f2a0 Mode: S Flg:0x0 Ref:1 Life:00000000 SPID:55 ECID:0
2003-05-05 15:11:50.80 spid4 Grant List 0::
2003-05-05 15:11:50.80 spid4 KEY: 8:1653632984:2 (da00ce043a9e) CleanCnt:1 Mode: U Fl ags: 0x0
2003-05-05 15:11:50.80 spid4 Node:2
2003-05-05 15:11:50.80 spid4 ResType:LockOwner Stype:'OR' Mode: S SPID:55 ECID:0 Ec:(0x33AE1538) Value:0x193
2003-05-05 15:11:50.80 spid4 Requested By:
2003-05-05 15:11:50.80 spid4 Input Buf: Language Event: Update tblQueuedEvents Set NotifyID = 2, ResynchDate
2003-05-05 15:11:50.80 spid4 SPID: 60 ECID: 0 Statement Type: UPDATE Line #: 1
2003-05-05 15:11:50.80 spid4 Owner:0x1936e420 Mode: X Flg:0x0 Ref:0 Life:02000000 SPID:60 ECID:0
2003-05-05 15:11:50.80 spid4 Grant List 0::
2003-05-05 15:11:50.80 spid4 KEY: 8:1653632984:1 (2d018af70d80) CleanCnt:1 Mode: X Flags: 0x0
In the "Wait-for-graph" entry, you have Node 1 and Node
2. In each node, you have a grant section and a request section. The grant
section is the "Grant List", and the request section is the "Request
By."
In each node, you can identify the following:
- The SPID.
- The command the SPID was executing.
- The resource.
- The lock mode on the resource.
For example, in Node 1, the Grant List, SPID 55 had
been granted an update lock, Mode: U, on resource KEY: 8:1653632984:2. 8=DBID,
1653632984=ObjectID, and 2=Indid. To obtain the database identification number,
run the
sp_helpdb stored procedure. To obtain the table, run the following code:
select * from sysobjects where id = 1653632984
To obtain the index, run the following code:
select * from sysindexes where indid = 2 and id = 1653632984
If IndexId is equal to 2, you know the index is a nonclustered
index. The command that SPID 55 was executing was the
sp_cursoropen stored procedure.
In Node 2, the Grant List, SPID 60 has
been granted an exclusive lock, Mode: X, on resource KEY: 8:1653632984:1.
8=DBID, 1653632984=ObjectID, 1=Indid. This is on the same table but index 1 is
the clustered index. The command that SPID 60 was executing was:
Update tblQueuedEvents Set NotifyID = 2, ResynchDate
An IndexId that is equal to 1 is a clustered index.
An
IndexId that is equal to 2 is a nonclustered index.
Note Deadlocks are very time sensitive.
Next, in Node 1,
Request By, SPID 55 requested a shared lock, Mode: S, on IndexId=1. In Node 2,
Request By, SPID 60 requested an exclusive lock, Mode: X, on IndexId=2. Because
these lock requests occur at the same time, the deadlock occurs. Each SPID's
granted locks are preventing the requested locks from continuing.
The
following table shows the lock compatibility chart. For more information about
lock compatibility, see the "Lock Compatibility" topic in SQL Server 2000 Books
Online.
Lock compatibility chartRequested
mode | IS | | S | | U | | IX | | SIX | | X |
Intent shared
(IS) | Yes | | Yes | | Yes | | Yes | | Yes | | No |
Shared
(S) | Yes | | Yes | | Yes | | No | | No | | No |
Update (U)
| Yes | | Yes | | No | | No | | No | | No |
Intent exclusive
(IX) | Yes | | No | | No | | Yes | | No | | No |
Shared with intent exclusive (SIX)
| Yes | | No | | No | | No | | No | | No |
Exclusive
(X) | No | | No | | No | | No | | No | | No |
Next, by looking at the output, you identify
ObjectId 1653632984 as the
tblQueuedEvents table, and you obtain a
sp_help stored procedure output for the table. There were two indexes on
the table. The two indexes were
ix_tblQueuedEvents and
PK_tblQueuedEvent.
ix_tblQueuedEvents is a clustered index on ResynchDate, and
PK_tblQueuedEvent is a primary key, unique nonclustered index on EventSID.
The SQL Profiler trace was not able to capture the deadlock
occurrence. Remember, deadlocks are very time dependent. The overhead of SQL
Profiler probably added some time to the execution of one of the processes and
that prevented SQL Profiler from getting in a deadlock situation. However, it
did provide information that
you can use to troubleshoot the issue. You found the
full update
tblQueuedEvents statement to be similar to the following:
Update tblQueuedEvents Set NotifyID = 2, ResynchDate = '5/7/2003 10:44:16' where eventSID = 73023
You also found the execution plan. You still do not have the full
sp_cursoropen stored procedure statement, but you do have enough information to
recommend a solution that will resolve the deadlock.
Here is the
execution plan.
Note This particular execution plan is read right to left
and bottom to top.
StmtText
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update tblQueuedEvents Set NotifyID = 2, ResynchDate = '5/7/2003 10:44:16'
where eventSID = 73023
|--Clustered Index
Update(OBJECT:([SOTS].[dbo].[tblQueuedEvents].[ix_tblQueuedEvents ]),
SET:([tblQueuedEvents].[NotifyID]=[@1],
[tblQueuedEvents].[ResynchDate]=[Expr1004]))
|--Top(1)
|--Compute Scalar(DEFINE:([Expr1004]=Convert([@2])))
|--Index
Seek(OBJECT:([SOTS].[dbo].[tblQueuedEvents].[PK_tblQueuedEvents]),
SEEK:([tblQueuedEvents].[EventSID]=[@3])