MORE INFORMATION
To better understand the conditions for the message, the following brief overview of the UMS processing will be helpful.
When SQL Server is started, it creates a UMS scheduler object for each processor that SQL Server is designated to use. The worker threads, set according to the
max worker threads configuration value, are divided equally across the UMS schedulers. As new connection requests are accepted, the system process ID (SPID) is assigned to a scheduler. The SPID will remain in the scheduler for its lifetime.
When all worker threads are busy handling requests, subsequent requests to the scheduler are queued. The worker threads are a pooled resource, and as they move back to the Idle state, they look for queued work requests on the scheduler to process.
Note Current connections can queue requests to run queries and other operations such as Distributed Transaction Coordinator (DTC) activities. The queuing of requests is not limited to new connections.
The condition for reporting the message is a simple process as outlined in the following list.
- When a new scheduler request is received, look for idle worker threads and assign the request to one.
- If an idle worker was not located, see whether a new worker can be created. If one can be created without exceeding the max worker threads configuration option assigned to the scheduler. If this is possible, assign the request to the new worker.
- If all workers have been created and none is idle, then queue the request.
- A final check is then made. If the packet is queued and all the worker threads assigned to the scheduler are in an Infinite wait state, the message described in the "Symptoms" section of this article is reported.
Although this check is made at an individual scheduler level, a good rule would be that most, if not all, worker threads are in a wait state.
Although the thread starvation message could indicate a SQL Server problem, the message usually indicates a query performance problem. Most reported instances have been resolved by correcting queries that have any one of the following problems: out of date statistics, large sort or hashes, missing indexes, heavy use of SELECT INTO or other database-level operations, large blocking chains, and similar situations.
It is not uncommon for operations to perform Infinite wait operations unless they are canceled by the client by using an ATTENTION or the use of a Transact-SQL KILL statement.
When doing a sort, hash, SELECT INTO, cursors or other operations in
tempdb, SQL Server may require heavy use of the allocation pages: Global Allocation Map (GAM), Shared Global Allocation Map (SGAM), Page Free Space (PFS), or Index Allocation Map (IAM). When taking out the appropriate latch, SQL Server will use an Infinite wait. Operations that cause heavy I/O, especially in
tempdb, can quickly advance the number of worker threads to extended wait states.
When poor plans are used because statistics are out of date or optimal indexes do not exist, it is likely to require a work table or hash to complete the query. When the optimizer identifies these types of queries, the probability of using a parallel query plan is increased. This could lead to situations where the parallel query occupies several schedulers. Therefore, it is plausible that a server may only have a few SPIDs logged in but the more parallel plan executions the more worker threads are actually being used.
The simplest form of this message is a long blocking chain. If you use one SPID to do a BEGIN TRAN and an update to the
authors table and then have other users try to select from the
authors table, the message can be generated. Under this condition, it is generally an application that is holding open a transaction too long, leading to long blocking chains or exceedingly high deadlocks.
Unless the message that is placed in the SQL Server error log continues at a high rate and will not subside, approach this as a query performance issue. Use SQL Server Profiler and other available techniques to track down the slow performing queries and analyze them. Another good location to start is the
sysprocesses table. Specifically, examine the
Blocked,
CPU, and
Physical I/O columns. They will help narrow your focus.
If the messages do not subside, there are several things to examine more closely. Does the message contain a SPID number following the date and time in the error log or only the word "server"? If the message has a SPID value then the server is not in a "hung" state and is accepting requests from active clients. The problem is still likely to be a set of poorly performing queries. If only the word "server" appears, this indicates that new login requests that have not yet been authenticated exist. The login request will be queued but it is likely that the login time-out will expire and users will report that the SQL Server appears to be unavailable or does not exist.
If you can establish an active connection to the SQL Server or open a connection and leave it active until the problem is experienced, you can collect more data. Specifically, examine the
sysprocesses,
sp_lock information and if it is possible obtain the information that results from running DBCC STACKDUMP. This statement will dump the current run stacks of active threads, including the input buffers. From the input buffers you can track specific queries. If it is required, the error log and .dmp file generated by the DBCC STACKDUMP can be analyzed even more by Microsoft Product Support Services (PSS) to help in tracking down the issue.
Note The previous DBCC command is unsupported, and may cause unexpected behavior. Microsoft cannot guarantee that you can solve problems that result from the incorrect use of this DBCC command. Use this DBCC command at your own risk. This DBCC command may not be available in future versions of SQL Server. For a list of the supported DBCC commands, see the "DBCC" topic in the Transact-SQL Reference section of SQL Server Books Online.
For more information about extended capture details pertaining to sysprocesses and syslocks, click the following article number to view the article in the Microsoft Knowledge Base:
251004
INF: How to monitor SQL Server 7.0 blocking