INF: Understanding and Resolving SQL Server 7.0 or 2000 Blocking Problems (224453)
The information in this article applies to:
- Microsoft SQL Server 7.0
- Microsoft SQL Server 2000 (all editions)
This article was previously published under Q224453 SUMMARY This article is an update for SQL Server 7.0
of the following article, which applies to SQL Server 6.x, in the Microsoft
Knowledge Base:
162361 INF: Understanding and Resolving SQL Server Blocking Problems
Much of the information contained in the above
article has already been updated and included in SQL Server 7.0 Books Online in
the "Understanding and Avoiding Blocking" topic. Carefully review this
information before proceeding with this article; it will not be repeated here.
This article focuses on how to monitor SQL Server to capture pertinent system
information and how to analyze that information to successfully resolve
blocking issues. This article uses the same standard terminology as
defined in the above information. In this discussion, the term "connection"
refers to a single logged-on session of the database. Each connection appears
as a system process ID (SPID). Each of these SPIDs is often referred to as a
process, although it is not a separate process context in the usual sense.
Rather, each SPID consists of the server resources and data structures
necessary to service the requests of a single connection from a given client. A
single client application may have one or more connections. From the
perspective of SQL Server, there is no difference between multiple connections
from a single client application on a single client computer and multiple
connections from multiple client applications or multiple client computers. One
connection can block another connection, regardless of whether they emanate
from the same application or separate applications on two different client
computers. MORE INFORMATION Blocking is an unavoidable characteristic of any relational
database management system (RDBMS) with lock-based concurrency. On SQL Server,
blocking occurs when one SPID holds a lock on a specific resource and a second
SPID attempts to acquire a conflicting lock type on the same resource.
Typically, the time frame for which the first SPID locks the resource is very
small. When it releases the lock, the second connection is free to acquire its
own lock on the resource and continue processing. This is normal behavior and
may happen many times throughout the course of a day with no noticeable effect
on system performance. The duration and transaction context of a
query determine how long its locks are held and, thereby, their impact on other
queries. If the query is not executed within a transaction (and no lock hints
are used), the locks for SELECT statements will only be held on a resource at
the time it is actually being read, not for the duration of the query. For
INSERT, UPDATE, and DELETE statements, the locks are held for the duration of
the query, both for data consistency and to allow the query to be rolled back
if necessary. For queries executed within a transaction, the duration
for which the locks are held are determined by the type of query, the
transaction isolation level, and whether or not lock hints are used in the
query. For a description of locking, lock hints, and transaction isolation
levels, see the following topics in SQL Server 7.0 Books Online:
- "Understanding Locking in SQL Server"
- "Locking Architecture"
- "Lock Compatibility"
- "Locking Hints"
- "Changing Default Locking Behavior in Oracle and SQL
Server"
When locking and blocking increase to the point where there is
a detrimental effect on system performance, it is usually due to one of the
following reasons: - A SPID holds locks on a set of resources for an extended
period of time before releasing them. This type of blocking resolves itself
over time, but can cause performance degradation.
- A SPID holds locks on a set of resources and never releases
them. This type of blocking does not resolve itself and prevents access to the
affected resources indefinitely.
In the first scenario above, the blocking problem resolves
itself over time as the SPID releases the locks. However, the situation can be
very fluid as different SPIDs cause blocking on different resources over time,
creating a moving target. For this reason, these situations can be difficult to
troubleshoot using SQL Server Enterprise Manager or individual SQL queries. The
second situation results in a consistent state that can be easier to diagnose. Gathering Blocking Information To counteract the difficulty of troubleshooting blocking
problems, a database administrator can use SQL scripts that constantly monitor
the state of locking and blocking on SQL Server. These scripts can provide
snapshots of specific instances over time, leading to an overall picture of the
problem. For a description of how to monitor blocking with SQL scripts, see the
following articles in the Microsoft Knowledge Base:
251004 INF: How to Monitor SQL Server 7.0 Blocking
271509 INF: How to Monitor SQL Server 2000 Blocking
The scripts in this article will perform the tasks
below. Where possible, the method for obtaining this information from
Enterprise Manager or a specific SQL query is given.
- Identify the SPID at the head of the blocking chain.
In addition to using the scripts in the above article, you
can also identify the head of the blocking chain by using SQL Enterprise
Manager as follows:
- Expand the server group; then expand the
server.
- Expand Management; then expand Current Activity.
- Expand Locks / Process ID. The SPIDs, along with their blocking information, are displayed
in the details pane. SPIDs that are blocking others will appear as
"(Blocking)."
Note however, that it is sometimes necessary to use queries
instead of Enterprise Manager, because some types of tempdb blocking problems
may prevent you from running queries that use temporary table operations. Using
direct queries gives you the control necessary to avoid this problem.
- Find the query that the blocking SPID is running.
The script method uses the following query to determine the
command issued by a particular SPID:
DBCC INPUTBUFFER (<spid>)
Alternately, you can use SQL Enterprise Manager as follows:
- Expand the server group; then expand the
server.
- Expand Management; then expand Current Activity.
- Click Process Info. The SPIDs are displayed in the details pane.
- Double-click the blocking SPID to see the last
Transact-SQL command batch the SPID executed.
- Find the type of locks the blocking SPID is holding.
You can determine this information by executing the sp_lock system stored procedure. Alternatively, you can use Enterprise
Manager as follows:
- Expand the server group; then expand the
server.
- Expand Management; then expand Current Activity.
- Expand Locks / Process ID. The SPIDs, along with the information on the locks they are
holding, are displayed in the details pane.
- Find the transaction nesting level and process status of the blocking SPID.
The transaction nesting level of a SPID is available in the
@@TRANCOUNT global variable. However, it can be determined from outside the
SPID by querying the sysprocesses table as follows:
SELECT open_tran FROM SYSPROCESSES WHERE SPID=<blocking SPID number>
go
The value returned is the @@TRANCOUNT value for the SPID. This shows
the transaction nesting level for the blocking SPID, which in turn can explain
why it is holding locks. For example, if the value is greater than zero, the
SPID is in the midst of a transaction (in which case it is expected that it
retains certain locks it has acquired, depending on the transaction isolation
level).
You can also check to see if any long-term open transaction
exists in the database by using DBCC OPENTRAN
database_name.
Gathering SQL Server Profiler Trace Information In addition to the above information, it is often necessary to
capture a Profiler trace of the activities on the server to thoroughly
investigate a blocking problem on SQL Server. If a SPID executes multiple
statements within a transaction, only the last statement will appear in the
DBCC INPUTBUFFER output. However, one of the earlier commands may be the reason
locks are still being held. A Profiler trace will enable you to see all of the
commands executed by a SPID within the current transaction. The following steps
help you to set up SQL Server Profiler to capture a trace.
- Open SQL Server Profiler.
- On the Tools menu, click Options.
- Ensure that the All Event Classes and All Data Columns options are selected.
- Click OK.
- On the File menu, point to New then click Trace.
- On the General tab, specify a trace name and a file to capture the data
to.
- On the Events tab, add the following event types to your trace:
|
Error and Warning | Exception | This event
indicates that an exception has occurred. Exceptions with severity less than 25
indicate that an error was returned to the client from SQL Server. Exceptions
of severity 25 are internal SQL Server exceptions and should be filtered out as
described below. | Misc. | Attention | This event indicates that
an attention signal has been raised. Normal causes of an attention signal are a
client cancel or query timeout. | Sessions | Connect | This event indicates
that a new connection has been made. | Sessions | Disconnect | This event indicates
that a client has disconnected. | Sessions | Existing Connection | This event
indicates that a connection existed at the time when the SQL Profiler trace was
started. | TSQL | RPC:Starting | This event indicates
that execution of a remote procedure call (RPC) has started. | TSQL | SQL:BatchStarting | This event
indicates that execution of a Transact-SQL batch has started. | Stored Procedures | SP: StmtStarting | This
event indicates when a statement within a stored procedure is starting
execution. The stored procedure name is indicated at the start of the Text for the event. |
Additionally, you may include the following events
for further information. If you are running in a high-volume production
environment, you may decide to use only the above events, as they are
sufficient to troubleshoot blocking problems. Including the additional events
below may make it easier to quickly determine the source of a problem, but will
also add to the load on the system and increase the trace output size.
|
Misc. | Execution Plan | This event shows the
plan tree of the Transact-SQL statement that was executed. | Transactions | DTCTransaction | This event
tracks Microsoft Distributed Transaction Coordinator (MS DTC) transactions
between two or more databases or servers. | Transactions | SQLTransaction | This event
tracks SQL BEGIN, SAVE, COMMIT, and ROLLBACK TRANSACTION statements. | TSQL | RPC:Completed | This event indicates
that execution of a remote procedure call (RPC) has completed. | TSQL | SQL:BatchCompleted | This event
indicates that execution of a Transact-SQL batch has completed. | Stored Procedures | SP: StmtCompleted | This
event indicates that a statement within a stored procedure has completed
execution. |
- On the Data Columns tab, ensure that the following columns are included: Start Time,
End Time, Connection ID, SPID, Event Class, Text, Integer Data, Binary Data,
Application Name, NT User Name, and SQL User Name. If you included the
additional events from the second table above, also include the following data
columns as well: Duration, CPU, Reads, and Writes.
- On the Filters tab, exclude SQL Server internal exceptions. In the Trace Event Criteria box, select Severity and type 24 in the Maximum box. Then click OK.
For more information on monitoring errors sent to
clients from SQL Server, see the following article in the Microsoft Knowledge
Base:
199037 INF: Trapping Error Messages Sent to Clients from a SQL Server
For information about using the Profiler, please see SQL Server
Books Online.
Identifying and Resolving Common Blocking Scenarios By examining the above information, you can determine the cause
of most blocking problems. The rest of this article is a discussion of how to
use this information to identify and resolve some common blocking scenarios.
This discussion assumes you have used the blocking scripts in article Q251004
(referenced earlier) to capture information on the blocking SPIDs and have made
a Profiler trace with the events described above. Viewing the Blocking Script Output- Examine the sysprocesses output to determine the heads of the blocking chains.
If you did not specify fast mode for the blocking scripts,
there will be a section titled "SPIDs at the head of blocking chains" that
lists the SPIDs blocking others in the script output:
SPIDs at the head of blocking chains
spid
------
9
10
If you specified the fast option, you can still determine the
blocking heads by looking at the sysprocesses output. The following is an abbreviated sysprocesses output:
spid status blocked
9 sleeping 0
10 sleeping 0
11 sleeping 13
12 sleeping 10
13 sleeping 9
14 sleeping 12
In this case, you can see that SPIDs 9 and 10 both have 0 in the blocked column, meaning that they are not being blocked, yet they both
appear in the blocked column for other SPIDs. This indicates that SPIDs 9 and 10 are
each at the head of separate blocking chains. - Examine the sysprocesses output for information on the SPIDs at the head of the blocking chain.
It is important to evaluate the following sysprocesses fields:
- Status
This column gives a quick look at the
status of a particular SPID. Typically, a sleeping status indicates that the SPID has completed execution and is
waiting for the application to submit another query or batch. A runnable status indicates the SPID is currently processing a query. The
following table gives brief explanations for the various status
values.
|
Background | SPID is performing a background
task. | Sleeping | SPID is not currently executing.
This usually indicates that the SPID is awaiting a command from the
application. | Runnable | SPID is currently executing. | Dormant | Same as Sleeping, except Dormant also
indicates that the SPID has been reset after completing an RPC event. The reset
cleans up resources used during the RPC event. This is a normal state and the
SPID is available and waiting to execute further commands. | Rollback | The SPID is in rollback of a
transaction. | Defwakeup | Indicates that a SPID is waiting on
a resource that is in the process of being freed. The waitresource field should indicate the resource in question. | Spinloop | Process is waiting while attempting
to acquire a spinlock used for concurrency control on SMP systems. |
- Open_tran
This field tells you the transaction
nesting level of the SPID. If this value is greater than 0, the SPID is within
an open transaction and may be holding locks acquired by any statement within
the transaction. - Lastwaittype, waittype, and waittime
The lastwaittype field tells you the last or current waittype of the SPID. This
field is new in SQL Server 7.0 and is a string representation of the waittype field (which is a reserved internal binary column). If the waittype is 0x0000, then the SPID is not currently waiting on anything and
the lastwaittype value indicates the last waittype the SPID had. If waittype is non-zero, the lastwaittype value indicates the current waittype of the SPID.
For
a brief description of the different lastwaittype and waittype values, please see the following article in the Microsoft
Knowledge base:
244455 INF: Definition of Sysprocesses Waittype and Lastwaittype Fields
The waittime value can be useful to determine if the SPID is making progress.
When a query against the sysprocesses table returns a value in the waittime column that is less than the waittime value from a previous query
of sysprocesses, this indicates that the prior lock was acquired and released and
is now waiting on a new lock (assuming non-zero waittime). This can be verified
by comparing the waitresource between sysprocesses output. - Waitresource
This field indicates the resource
that a SPID is waiting on. The following table lists common waitresource formats and their meaning:
|
Table | DatabaseID:ObjectID | TAB:
5:261575970 In this case, database ID 5 is the pubs sample database and object ID 261575970 is the titles table. | Page | DatabaseID:FileID:PageID | PAG:
5:1:104 In this case, database ID 5 is pubs, file ID 1 is the primary data file, and page 104 is a page
belonging to the titles table. | Key | DatabaseID:ObjectID:IndexID (Hash value
for index key) | KEY: 5:261575970:1 (5d0164fb1eac) In this case,
database ID 5 is pubs, object ID 261575970 is the titles table, index ID 1 is the clustered index, and the hash value
indicates the index key value for the particular row. |
- Other columns
The remaining sysprocesses columns can provide insight into the root of a problem as well.
Their usefulness varies depending on the circumstances of the problem. For
example, you can determine if the problem happens only from certain clients
(hostname), on certain network libraries (net_library), when the last batch
submitted by a SPID was (last_batch), and so on. For a brief description of all
of the sysprocesses columns, please see the "sysprocesses (T-SQL)" topic in SQL
Server 7.0 Books Online.
NOTE: The SUID column is not included in the blocking script output because it
is a derived column that was only included for backwards compatibility. It is
not used internally by SQL Server, and you can cause a performance degradation
by querying it (because it is derived), so it was not included.
- Examine the DBCC INPUTBUFFER output.
For any SPID at the head of a blocking chain or with a
non-zero waittype, the blocking script will execute DBCC INPUTBUFFER to
determine the current query for that SPID:
DBCC INPUTBUFFER FOR SPID 9
EventType Parameters EventInfo
-------------- ---------- --------------------------------------------
Language Event 0 update titles set title = title
In many cases, this is the query that is causing the locks that
are blocking other users to be held. However, if the SPID is within a
transaction, the locks may have been acquired by a previously executed query,
not the current one. Therefore, you should also view the Profiler output for
the SPID, not just the inputbuffer.
NOTE: Because the blocking script consists of multiple steps, it is
possible that a SPID may appear in the first section as the head of a blocking
chain, but by the time the DBCC INPUTBUFFER query is executed, it is no longer
blocking and the INPUTBUFFER is not captured. This indicates that the blocking
is resolving itself for that SPID and it may or may not be a problem. At this
point, you can either use the fast version of the blocking script to try to
ensure you capture the inputbuffer before it clears (although there is still no
guarantee), or view the Profiler data from that time frame to determine what
queries the SPID was executing.
Row | DatabaseID:FileID:PageID:Slot(row) | RID:
5:1:104:3 In this case, database ID 5 is pubs, file ID 1 is the primary data file, page 104 is a page belonging
to the titles table, and slot 3 indicates the row's position on the page. | Compile | DatabaseID:ObjectID | TAB: 5:834102012
[[COMPILE]] |
In this case, database ID 5 is pubs, but the object ID 834102012 is a stored procedure. This
indicates that the SPID is waiting to compile a plan for the stored procedure.
Viewing the Profiler Data Viewing Profiler data efficiently is extremely valuable in
resolving blocking issues. The most important thing to realize is that you do
not have to look at everything you captured; be selective. Profiler provides
capabilities to help you effectively view the captured data. In the Properties dialog box (on the File menu, click Properties), Profiler allows you to limit the data displayed by removing
data columns or events, grouping (sorting) by data columns and applying
filters. You can search the whole trace or only a specific column for specific
values (on the Edit menu, click Find). You can also save the Profiler data to a SQL Server table (on
the File menu, point to Save As and then click Table) and run SQL queries against it. Be careful that you perform filtering only on a previously saved trace file. If you perform these steps on an active trace, you
risk losing data that has been captured since the trace was started. Save an
active trace to a file or table first (on the File menu, click Save As) and then reopen it (on the File menu, click Open) before proceeding. When working on a saved trace file, the
filtering does not permanently remove the data being filtered out, it just does
not display all the data. You can add and remove events and data columns as
needed to help focus your searches. What to look for:- What commands has the SPID at the head of a blocking chain
executed within the current transaction?
Filter the trace data for a
particular SPID that is at the head of a blocking chain (on the File menu, click Properties; then on the Filters tab specify the SPID value). You can then examine the commands it
has executed prior to the time it was blocking other SPIDs. If you include the
Transaction events, they can easily identify when a transaction was started.
Otherwise, you can search the Text column for BEGIN, SAVE, COMMIT, or ROLLBACK TRANSACTION
operations. Use the open_tran value from the sysprocesses table to ensure that you catch all of the transaction events.
Knowing the commands executed and the transaction context will allow you to
determine why a SPID is holding locks.
Remember, you can remove
events and data columns. Instead of looking at both starting and completed
events, choose one. If the blocking SPIDs are not stored procedures, remove the
SP:Starting or SP:Completed events; the SQLBatch and RPC events will show the procedure call. Only view the SP events when
you need to see that level of detail. - What is the duration of the queries for SPIDs at the head
of blocking chains?
If you include the completed events above, the Duration column will show the query execution time. This can help you
identify long-running queries that are causing blocking. To determine why the
query is performing slowly, view the CPU, Read, and Writes columns, as well as the Execution Plan event.
Categorizing Common Blocking Scenarios The table below maps common symptoms to their probable causes.
The number indicated in the Scenario column corresponds to the number in the "Common Blocking
Scenarios and Resolutions" section of this article below. The Waittype, Open_Tran, and Status columns refer to sysprocesses information. The Resolves? column indicates whether or not the blocking will resolve on its
own. |
1 | Non-zero | >=
0 | runnable | Yes, when query finishes. | Physical_IO, CPU
and/or Memusage columns will increase over time. Duration for the query will be
high when completed. | 2 | 0x0000 | >0 | sleeping | No,
but SPID can be killed. | An attention signal may be seen in the Profiler
trace for this SPID, indicating a query timeout or cancel has occurred. | 3 | 0x0000 | >= 0 | runnable | No.
Will not resolve until client fetches all rows or closes connection. SPID can
be killed, but it may take up to 30 seconds. | If open_tran = 0, and the SPID holds locks while the transaction isolation
level is default (READ COMMMITTED), this is a likely cause. | 4 | Varies | >= 0 | runnable | No.
Will not resolve until client cancels queries or closes connections. SPIDs can
be killed, but may take up to 30 seconds. | The hostname column in sysprocesses for the SPID at the head of a blocking chain will be the same as
one of the SPID it is blocking. | 5 | 0x0000 | >0 | rollback | Yes. | An
attention signal may be seen in the Profiler trace for this SPID, indicating a
query timeout or cancel has occurred, or simply a rollback statement has been
issued. | 6 | 0x0000 | >0 | sleeping | Eventually.
When Windows NT determines the session is no longer active, the SQL Server
connection will be broken. | The last_batch value in sysprocesses is much earlier than the current time. |
Common Blocking Scenarios and Resolutions The scenarios listed below will have the characteristics listed
in the table above. This section provides additional details when applicable,
as well as paths to resolution.
- Blocking Caused by a Normally Running Query with a Long Execution Time
Resolution: The solution to this type of blocking problem is to look for
ways to optimize the query. Actually, this class of blocking problem may just
be a performance problem, and require you to pursue it as such. For information
on troubleshooting a specific slow-running query, see the following article in
the Microsoft Knowledge Base:
243589 INF: Troubleshooting Slow-Running Queries on SQL Server 7.0
For overall application performance
troubleshooting, see the following article in the Microsoft Knowledge Base:
224587 HOW TO: Troubleshoot Application Performance with SQL Server
If you have a long-running query that is blocking
other users and cannot be optimized, consider moving it from an OLTP
environment to a decision support system. - Blocking Caused by a Sleeping SPID That Has Lost Track of the Transaction Nesting Level
This type of blocking can often be identified by a SPID
that is sleeping or awaiting a command, yet whose transaction nesting level
(@@TRANCOUNT, open_tran from sysprocesses) is greater than zero. This can occur if the application
experiences a query timeout, or issues a cancel without also issuing the
required number of ROLLBACK and/or COMMIT statements. When a SPID receives a
query timeout or cancel, it will the terminate the current query and batch, but
does not automatically roll back or commit the transaction. The application is
responsible for this, as SQL Server cannot assume that an entire transaction
must be rolled back simply due to a single query being canceled. The query
timeout or cancel will appear as an ATTENTION signal event for the SPID in the
Profiler trace.
To demonstrate this, issue the following simple query
from Query Analyzer:
BEGIN TRAN
SELECT * FROM SYSOBJECTS S1, SYSOBJECTS S2
-- Issue this after canceling query
SELECT @@TRANCOUNT
ROLLBACK TRAN
While the query is executing, click the red Cancel button. After the query is canceled, SELECT @@TRANCOUNT indicates
that the transaction nesting level is one. Had this been a DELETE or an UPDATE
query, or had HOLDLOCK been used on the SELECT, all the locks acquired would
still be held. Even with the query above, if another query had acquired and
held locks earlier in the transaction, they would still be held when the above
SELECT was canceled.
Resolutions:
- Applications must properly manage transaction nesting
levels, or they may cause a blocking problem following the cancellation of the
query in this manner. This can be accomplished in one of several ways:
- In the error handler of the client application,
submit an IF @@TRANCOUNT > 0 ROLLBACK TRAN following any error, even if the
client application does not believe a transaction is open. This is required,
because a stored procedure called during the batch could have started a
transaction without the client application's knowledge. Note that certain
conditions, such as canceling the query, prevent the procedure from executing
past the current statement, so even if the procedure has logic to check IF
@@ERROR <> 0 and abort the transaction, this rollback code will not be
executed in such cases.
- Use SET XACT_ABORT ON for the connection, or in any
stored procedures which begin transactions and are not cleaning up following an
error. In the event of a run-time error, this setting will abort any open
transactions and return control to the client. Note that T-SQL statements
following the statement which caused the error will not be executed.
- If connection pooling is being used in an
application that opens the connection and runs a small number of queries before
releasing the connection back to the pool, such as a Web-based application,
temporarily disabling connection pooling may help alleviate the problem until
the client application is modified to handle the errors appropriately. By
disabling connection pooling, releasing the connection will cause a physical
logout of the SQL Server connection, resulting in the server rolling back any
open transactions.
- If connection pooling is enabled and the
destination server is SQL Server 2000, upgrading the client computer to MDAC
2.6 or later may be beneficial. This version of the MDAC components adds code
to the ODBC driver and OLE DB provider so that the connection would be "reset"
before it is reused. This call to sp_reset_connection aborts any
server-initiated transactions (DTC transactions initiated by the client app are
not affected), resets the default database, SET options, and so forth. Note
that the connection is not reset until it is reused from the connection pool,
so it is possible that a user could open a transaction and then release the
connection to the connection pool, but it might not be reused for several
seconds, during which time the transaction would remain open. If the connection
is not reused, the transaction will be aborted when the connection times out
and is removed from the connection pool. Thus, it is optimal for the client
application to abort transactions in their error handler or use SET XACT_ABORT
ON to avoid this potential delay.
- Actually, this class of blocking problem may also be a
performance problem, and require you to pursue it as such. If the query
execution time can be diminished, the query timeout or cancel would not occur.
It is important that the application be able to handle the timeout or cancel
scenarios should they arise, but you may also benefit from examining the
performance of the query.
For information on troubleshooting a
specific slow-running query, see the following article in the Microsoft
Knowledge Base:
243589 INF: Troubleshooting Slow-Running Queries on SQL Server 7.0
For overall application performance
troubleshooting, see the following article in the Microsoft Knowledge Base:
224587 HOW TO: Troubleshoot Application Performance with SQL Server
If you have a long-running query that is blocking
other users and cannot be optimized, consider moving it from an OLTP
environment to a decision support system.
- Blocking Caused by a SPID Whose Corresponding Client Application Did Not Fetch All Result Rows to Completion
After sending a query to the server, all applications
must immediately fetch all result rows to completion. If an application does
not fetch all result rows, locks can be left on the tables, blocking other
users. If you are using an application that transparently submits SQL
statements to the server, the application must fetch all result rows. If it
does not (and if it cannot be configured to do so), you may be unable to
resolve the blocking problem. To avoid the problem, you can restrict
poorly-behaved applications to a reporting or a decision-support
database.
Resolution:
The application must be re-written to fetch all rows of
the result to completion. - Blocking Caused by a Distributed Client/Server Deadlock
Unlike a conventional deadlock, a distributed deadlock
is not detectable using the RDBMS lock manager. This is due to the fact that
only one of the resources involved in the deadlock is a SQL Server lock. The
other side of the deadlock is at the client application level, over which SQL
Server has no control. The following are two examples of how this can happen,
and possible ways the application can avoid it.
- Client/Server Distributed Deadlock with a Single Client
Thread
If the client has multiple open connections, and a single thread of
execution, the following distributed deadlock may occur. For brevity, the term
"dbproc" used here refers to the client connection structure.
SPID1------blocked on lock------->SPID2
/\ (waiting to write results
| back to client)
| |
| | Server side
| ================================|==================================
| <-- single thread --> | Client side
| \/
dbproc1 <------------------- dbproc2
(waiting to fetch (effectively blocked on dbproc1, awaiting
next row) single thread of execution to run)
In the case shown above, a single client application thread has
two open connections. It asynchronously submits a SQL operation on dbproc1.
This means it does not wait on the call to return before proceeding. The
application then submits another SQL operation on dbproc2, and awaits the
results to start processing the returned data. When data starts coming back
(whichever dbproc first responds -- assume this is dbproc1), it processes to
completion all the data returned on that dbproc. It fetches results from
dbproc1 until SPID1 gets blocked on a lock held by SPID2 (because the two
queries are running asynchronously on the server). At this point, dbproc1 will
wait indefinitely for more data. SPID2 is not blocked on a lock, but tries to
send data to its client, dbproc2. However, dbproc2 is effectively blocked on
dbproc1 at the application layer as the single thread of execution for the
application is in use by dbproc1. This results in a deadlock that SQL Server
cannot detect or resolve because only one of the resources involved is a SQL
Server resource. - Client/Server Distributed Deadlock with a Thread per
Connection
Even if a separate thread exists for each connection on
the client, a variation of this distributed deadlock may still occur as shown
by the following.
SPID1------blocked on lock-------->SPID2
/\ (waiting on net write) Server side
| |
| |
| INSERT |SELECT
| ================================|==================================
| <-- thread per dbproc --> | Client side
| \/
dbproc1 <-----data row------- dbproc2
(waiting on (blocked on dbproc1, waiting for it
insert) to read the row from its buffer)
This case is similar to Example A, except dbproc2 and SPID2 are
running a SELECT statement with the intention of performing row-at-a-time
processing and handing each row through a buffer to dbproc1 for an INSERT,
UPDATE, or DELETE statement on the same table. Eventually, SPID1 (performing
the INSERT, UPDATE, or DELETE) becomes blocked on a lock held by SPID2
(performing the SELECT). SPID2 writes a result row to the client dbproc2.
Dbproc2 then tries to pass the row in a buffer to dbproc1, but finds dbproc1 is
busy (it is blocked waiting on SPID1 to finish the current INSERT, which is
blocked on SPID2). At this point, dbproc2 is blocked at the application layer
by dbproc1 whose SPID (SPID1) is blocked at the database level by SPID2. Again,
this results in a deadlock that SQL Server cannot detect or resolve because
only one of the resources involved is a SQL Server resource. Both examples A and B are fundamental issues that
application developers must be aware of. They must code applications to handle
these cases appropriately.
Resolutions:
Two reliable solutions are to use either a query
timeout or bound connections.
- Query Timeout
When a query timeout has been
provided, if the distributed deadlock occurs, it will be broken when then
timeout happens. See the DB-Library or ODBC documentation for more information
on using a query timeout. - Bound Connections
This feature allows a client
having multiple connections to bind them into a single transaction space, so
the connections do not block each other. For more information, see the "Using
Bound Connections" topic in SQL Server 7.0 Books Online.
- Blocking Caused by a SPID That Is in a "Golden," or Rollback, State
A data modification query that is KILLed, or canceled
outside of a user-defined transaction, will be rolled back. This can also occur
as a side effect of the client computer restarting and its network session
disconnecting. Likewise, a query selected as the deadlock victim will be rolled
back. A data modification query often cannot be rolled back any faster than the
changes were initially applied. For example, if a DELETE, INSERT, or UPDATE
statement had been running for an hour, it could take at least an hour to roll
back. This is expected behavior, because the changes made must be completely
rolled back, or transactional and physical integrity in the database would be
compromised. Because this must happen, SQL Server marks the SPID in a "golden"
or rollback state (which means it cannot be KILLed or selected as a deadlock
victim). This can often be identified by observing the output of sp_who, which may indicate the ROLLBACK command. The Status column of sysprocesses will indicate a ROLLBACK status, which will also appear in sp_who output or the SQL Enterprise Manager Current Activity screens. Resolution:
You must wait for the SPID to finish rolling back the
changes that were made.
If the server is shut down in the midst of
this operation, the database will be in recovery mode upon restarting, and it
will be inaccessible until all open transactions are processed. Startup
recovery takes essentially the same amount of time per transaction as run-time
recovery, and the database is inaccessible during this period. Thus, forcing
the server down to fix a SPID in a rollback state will often be
counterproductive.
To avoid this situation, do not perform large
batch INSERT, UPDATE, or DELETE operations during busy hours on OLTP systems.
If possible, perform such operations during periods of low activity.
- Blocking Caused by an Orphaned Connection
If the client application traps or the client
workstation is restarted, the network session to the server may not be
immediately canceled under some conditions. From the server's perspective, the
client still appears to be present, and any locks acquired may still be
retained. For more information, see the "Orphaned Connections" topic in SQL
Server 7.0 Books Online.
Resolution:
If the client application has disconnected without
appropriately cleaning up its resources, you can terminate the SPID by using
the KILL command. The KILL command takes the SPID value as input. For example,
to kill SPID 9, simply issue the following command:
KILL 9
NOTE: The KILL command may take up to 30 seconds to complete, due to
the interval between checks for the KILL command.
Application Involvement in Blocking Problems There may be a tendency to focus on server-side tuning and
platform issues when facing a blocking problem. However, this does not usually
lead to a resolution, and can absorb time and energy better directed at
examining the client application and the queries it submits. No matter what
level of visibility the application exposes regarding the database calls being
made, a blocking problem nonetheless frequently requires both the inspection of
the exact SQL statements submitted by the application and the application's
exact behavior regarding query cancellation, connection management, fetching
all result rows, and so on. If the development tool does not allow explicit
control over connection management, query cancellation, query timeout, result
fetching, and so on, blocking problems may not be resolvable. This potential
should be closely examined before selecting an application development tool for
SQL Server, especially for business-critical OLTP environments. It is
vital that great care be exercised during the design and construction phase of
the database and application. In particular, the resource consumption,
isolation level, and transaction path length should be evaluated for each
query. Each query and transaction should be as lightweight as possible. Good
connection management discipline must be exercised. If this is not done, it is
possible that the application may appear to have acceptable performance at low
numbers of users, but the performance may degrade significantly as the number
of users scales upward. With proper application and query design,
Microsoft SQL Server is capable of supporting many thousands of simultaneous
users on a single server, with little blocking. Please see the "Application
Design" and "Understanding and Avoiding Blocking" topics in SQL Server 7.0
Books Online for more information. The successful sites that reach these
numbers of users typically use the techniques described in these topics.
REFERENCES For more information, refer to the following book: For more information, refer to the following Microsoft Training
& Certification course:
Modification Type: | Major | Last Reviewed: | 3/22/2004 |
---|
Keywords: | kbinfo KB224453 kbAudDeveloper |
---|
|