New concurrency and scheduling diagnostics have been added to SQL Server (319892)
The information in this article applies to:
- Microsoft SQL Server 2000 (all editions) SP3
- Microsoft SQL Server 7.0 Service Pack 4
This article was previously published under Q319892 BUG #: 102179 (SQLBUG_70) BUG #: 356317 (SQLBUG_80)
SUMMARY To remain highly dynamic, SQL Server contains several
internal processes to ensure stability. One example is the lock monitor that
you can use to identify and resolve deadlock situations. SQL Server
7.0 Service Pack 4 (SP4) and SQL Server 2000 Service Pack 3 (SP3) have added
enhanced process health monitoring. Health monitoring enhancements have taken
place in the following areas: - Blocking
- Network problems
- Input/Output (IO)
- Memory
- CPU
When SQL Server detects health problems, a series of new
error messages, such as the following, are logged in the SQL Server error log.
For details about these error message conditions, see the "More Information"
section of this article. SQL Server 7.0Error 1223: Process ID %d:%d cannot acquire
lock "%s" on resource %s because a potential deadlock exists on Scheduler %d
for the resource. Process ID %d:%d holds a lock "%h" on this resource.
SQL Server 2000Error 1229: Process ID %d:%d owns resources
that are blocking processes on scheduler %d. The new error messages
include the following series. Extended Lock Detection: 1223, 1229 Scheduler Hung Detection: 17881, 17883 All Schedulers Hung
Detection: 17882, 17884 Warning Note that a health related problem is often the result of a
condition that was experienced previously. You must study the SQL Server error
log and the system event logs carefully to determine the actual root cause.
For example, a 17883 error message may indicate a scheduler problem.
However, the error log may show a previous exception that incorrectly left the
SQL Server process in a poor state, or the application might have caused a
severe blocking condition. Note Microsoft tries to keep all content up-to-date with the latest
17883 conditions. However, the 17883 error message is a health detection
message that can be triggered for many reasons. Microsoft has not only
corrected known issues with the SQL Server software product but has also
encountered the 17883 error in a variety of situations that are unrelated to
the SQL Server software. For example, the error has occurred with external
application CPU consumption and hardware failures. You must determine the root
cause of the 17883 error message if you want to avoid an unwanted reoccurrence
of the error. MORE INFORMATIONThis section outlines the health enhancements and associated
error messages that can be logged to the SQL Server error log.UMSTo better understand some of the additional health
diagnostics, you must first understand how SQL Server uses a User Mode
Scheduling (UMS) Ums.dll helper file.
Both SQL Server 7.0 and
Microsoft SQL Server 2000 use logical schedulers. These schedulers help to make
sure that SQL Server maximizes operating system resource usage in relation to
key database action paths. The UMS layer makes sure that SQL Server correctly
uses Win32 events to strictly control thread and fiber (or both) scheduling
visibility to the operating system. By strictly controlling the threads or
fibers that can run, SQL Server can maximize CPU usage as it relates to
database primitives such as locking.
For example, logical scheduling
permits lock waiters to sleep (WaitForSingleObject on a Win32 event) until the
lock owner releases the lock and signals (SetEvent) them to wake up.Trace flagsSQL Server contains a trace flag to disable this health
reporting. To disable the reporting behavior, use one of the following methods: - Establish a startup parameter (-T###)
- Use DBCC traceon(###)
SQL Server 7.0: -T1216 SQL Server 2000: -T1261 Note This is not a typographical error. -T1216 on SQL Server 2000 is
already used as the trace flag associated with deadlock output. Therefore, 1261
is used instead. Sample ScenarioClient 1 connects to SQL Server. Client 1 runs
a Transact-SQL command that starts a transaction and performs data
modification. For example: begin tran
update authors set au_lname = 'test' Client 1 becomes IDLE, shows up as sleeping, and awaiting a command with
an open transaction in the sysprocesses system table. Clients 2 through 255: Approximately 254
more clients log on to SQL Server and issue a SELECT from the authors table.
These clients will all become blocked on the original update. Client 1
tries to commit the transaction but it becomes queued because all the worker
threads are tied up by clients 2 through 255. BlockingThis error message commonly indicates an extended blocking
situation. Each time the lock monitor runs (approximately every 5 seconds),
messages can be added to the SQL Server error log. Note A message is logged for each SPID/ECID that is experiencing the
resource problem. Therefore, several messages can be logged during the same
lock monitor iteration. SQL Server does not automatically resolve
this situation. However, it will indicate the problem as an error message (1223
or 1229) accordingly. When this problem occurs, you can resolve it in a number
of ways. Lock or query timeoutIf the queries use a lock or a query timeout, the situation
will commonly resolve itself as the timeouts occur. However, the situation
warrants investigation because it indicates an application induced reduction in
concurrency. Transact-SQL KILLIf the administrator is able to query the sysprocesses system table, they can use the Transact-SQL KILL command to
terminate the BLOCKING SPID and to terminate the appropriate BLOCKED SPIDS to
free worker threads and return the system to a normal state.
251004 How to monitor SQL Server 7.0 blocking
271509 How to monitor SQL Server 2000 blocking
263889 SQL blocking due to [[COMPILE]] locks
Support assistanceIf you cannot obtain the sysprocesses system table information, then get a process dump of the
(Sqlservr.exe) process and contact Microsoft SQL Server support for additional
investigation. Parallel queriesIn rare circumstances, this error message can occur because
of a poor parallel query plan selection. If the parallel query elects to use a
significant number of available SQL Server workers to complete the query, it
can exhaust the SQL Server worker pool. The sysprocesses system table contains an ECID column to indicate the number of
workers that are being used on behalf of the individual SPID. If the ECID value
is high in relationship to the physical CPUs on the computer, it is generally
an indication of a poorly tuned query. Review the query plan and the max
degree of parallelism (MAXDOP) query option setting to correctly tune the query in
question. Schedulers matterThe number of logical schedulers matters. When SQL Server
starts, the max worker thread setting is divided equally among the logical schedulers. As the
number of CPUs available to the SQL Server increases the worker queue is
divided more. An application that exhibits undesirable transaction scope
activity can manifest resource shortage scenarios faster when more CPUs are
involved. For this type of scenario, the applications transaction scope is
immediately corrected. The following table shows the worker pool
assignments based on the number of CPUs if the sp_configure stored procedure setting for max
worker threads is equal to 255. CPUs blocking | Chain length | 1 | 255 | 2 | 128 | 4 | 64 | 8 | 32 | 16 | 16 |
We recommend that you keep the max worker thread setting at the default of 255.
For more information, click the following article number to view the article in the Microsoft Knowledge Base:
319942
How to determine proper SQL Server configuration settings
UMS healthA new internal routine has been added to check the health of
the logical scheduler every 60 seconds. If the scheduler is deemed as stalled,
or it has stopped responding, the appropriate error message is logged to the
SQL Server error log. An error is logged every 60 seconds until the issue is
resolved. As mentioned earlier, these messages are often an
indication of a previous event. Carefully consult the SQL Server error log and
application event logs to help you determine the root cause of the
problem. Note A snapshot is taken every 60 seconds. Therefore, it can be 120
seconds before the condition is first detected.
Modification Type: | Major | Last Reviewed: | 5/18/2005 |
---|
Keywords: | kbSample kberrmsg kbBug kbfix kbinfo KB319892 kbAudDeveloper |
---|
|