FIX: Possible Cluster Failover Issues with Pre-Service Pack 4 Version of SQL Server 7.0 (815125)
The information in this article applies to:
- Microsoft SQL Server 7.0
- Microsoft SQL Server 7.0 Service Pack 1
- Microsoft SQL Server 7.0 Service Pack 2
- Microsoft SQL Server 7.0 Service Pack 3
BUG #: 49573 (sqlbug_70)
BUG #: 101175
(sqlbug_70) SYMPTOMSClustered computers that are running a SQL Server 7.0
pre-Service Pack 4 (SP4) version might experience unexpected failovers. When
the failover occurs, SQL Server might write these error messages to the SQL
Server error log: 2003-01-21 16:13:57.99 logon Login failed for user 'webuser'.
2003-01-21 16:13:59.09 spid1 Closing file F:\MSSQL7\data\MyDb_Data.MDF.
...
2003-01-21 16:13:59.60 spid1 Closing file F:\MSSQL7\DATA\modellog.ldf.
2003-01-21 16:13:59.60 kernel SQL Server is terminating due to 'stop' request from Service Control Manager.
2003-01-21 16:14:00.35 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.35 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.35 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.37 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.37 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.37 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.37 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.37 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.37 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.37 logon Login failed for user 'webuser'.
2003-01-21 16:14:00.37 logon Login failed for user 'webuser'.
A DBCC stackdump or userdump taken for SQL Server before the
failover occurs might show the following stack: ntdll!ZwWaitForSingleObject+0xb
KERNEL32!WaitForSingleObjectEx+0x71
ums!UmsThreadScheduler::Switch+0x59
ums!UmsScheduler::Suspend+0xb2
ums!UmsEvent::Wait+0x89
opends60!release_srvproc+0x7c
opends60!exit_srv_thread+0x73
opends60!process_commands+0x2aa
ums!ProcessWorkRequests+0xed
ums!ThreadStartRoutine+0x13a
MSVCRT!_threadstart+0x57
KERNEL32!BaseThreadStart+0x52 CAUSEBefore SQL Server 7.0 SP4, login requests that are not valid
might delay the release of internal structures ( srv_proc) with SQL Server for a period of 60 seconds. During this delay,
SQL Server does not free the worker thread that is associated with the invalid
login request. This reduces the number of worker threads available to the
worker pool. If a significant number of requests are not coming into SQL
Server, all the worker threads may end up in the 60 second delay. Therefore,
this may lead to no worker threads being available to process valid database
requests. A shortage of worker threads also affects SQL Server cluster
resources. When the cluster resource does not receive a response in 60 seconds,
it may initiate a failover of the resources. The following can
contribute to the cluster failover problem:
- A long running run away query (without proper indexes) not
yielding that leads to possible login timeouts (may be 30 seconds) from Web
servers. Because of the 60 second delay, SQL Server waits 60 seconds before it
gives up the worker thread that was servicing the request that timed-out. When
the Web servers experience timeout issues, they keep trying to make new
requests to keep their pool of connections constant. This exhausts all the
worker threads. Because all the worker threads end up in the release_srvproc function, the cluster resource times out and initiates the
failover of the SQL Server resources.
- When a shutdown of the SQL Server is underway, any non-sa
logins that are trying to run a query are rightfully denied access to the
server, and they are set up to fail with a "Login failed" error message. They
are set up to fail with an error message because the server does not honor a
non-sa login while a shutdown is in progress. Therefore, if the cluster
initiates a shutdown request and the Web servers keep submitting non-sa login
requests at the same time, they all end up with "Login failed" messages in the
error log, as shown earlier in this article. With clusters, as soon as a
shutdown is issued by the cluster service (because it was not able to get a
worker thread to run a simple query to validate if the server is responding),
all new non-sa connection requests coming from Web servers end up holding the
worker threads for 60 seconds each. This leads to a longer period for SQL
Server to shutdown.
- If you are using Microsoft Data Access Components (MDAC)
version 2.6 from a client application to connect to a SQL Server 7.0 database,
read the following articles in the Microsoft Knowledge Base for more
information:
300519 FIX: Problems Occur When Connecting to SQL Server from ADO Application
301202 HOW TO: Check for MDAC Version
231943 INFO: Microsoft Data Access Components (MDAC) Release History
- Actual login requests that are not valid that are being
submitted by the Web servers (a malicious user). The invalid requests delay the
worker threads and lead to the whole chain of events. Going through the Web
server logs and SQL Profiler (monitoring only for "Login failed" error
messages), or use of the Audit Login failures will help you to see if any
invalid login attempts come into SQL Server. Only pursue this after you rule
out the previous options as a possible cause for the behavior.
RESOLUTION To resolve this problem, obtain the latest
service pack for Microsoft SQL Server 7.0. For additional information, click
the following article number to view the article in the Microsoft Knowledge
Base: 301511 INF: How to Obtain the Latest SQL
Server 7.0 Service Pack
After you apply SQL Server 7.0 SP4, you must
also apply hotfix 745 to fix a known memory leak problem on clusters:
253738 FIX: SQL Server Components that Access the Registry in a Cluster Environment May Cause a Memory Leak
Note: Hotfix 7.00.0745 is not included with SQL Server 7.0 SP4. That
is why you must apply it after you install SQL Server 7.0 SP4. STATUS Microsoft
has confirmed that this is a problem in the Microsoft products that are listed
at the beginning of this article.
This problem was first corrected in Microsoft SQL
Server 7.0 Service Pack 4.
Modification Type: | Minor | Last Reviewed: | 1/4/2006 |
---|
Keywords: | kbfix KB815125 |
---|
|