MORE INFORMATION
IMPORTANT: This article contains information about modifying the registry.
Before you modify the registry, make sure to back it up and make sure that you
understand how to restore the registry if a problem occurs. For information
about how to back up, restore, and edit the registry, click the following
article number to view the article in the Microsoft Knowledge Base:
256986 Description of the Microsoft Windows Registry
WARNING: If you use Registry Editor incorrectly, you may cause serious
problems that may require you to reinstall your operating system. Microsoft
cannot guarantee that you can solve problems that result from using Registry
Editor incorrectly. Use Registry Editor at your own
risk.
SQL Virtual Server client behavior
Microsoft Cluster Server (MSCS) provides a reliable and robust
platform on which you can build mission-critical SQL Server applications. You
do not need to modify most server applications to use them with MSCS. However,
transaction-based applications (for example, database servers, such as
Microsoft SQL Server) usually require additional modification so that if the
server fails, failover support properly prevents the loss of transactional
integrity. Developing a client application to operate with MSCS is relatively
straightforward. You must design applications with database recovery and error
checking in mind.
Even without the use of clusters, a SQL Server
server automatically recovers all databases when the server is restarted. To
ensure that a database is recovered in a consistent application state, use
database transactions so that failover occurs in the database correctly and in
a consistent state. Any transactions that are incomplete when failover occurs
should be rolled back, while the effects of all committed transactions should
be preserved.
During failover, client applications lose their
connection to the SQL Server server and must reconnect to continue processing.
If the client connection to the server is stateless, (for example, applications
that are developed by using Microsoft Internet Information Server [IIS] are
stateless) the client reconnects to the server and continues processing. Unless
the client and server have a common state (for example, open cursors, session
variables, Transact-SQL global variables, or data in tempdb), failover is not
transparent to the client. In these cases, you should design the client
application to inform the user that the connection was either lost, or reset or
have the application automatically reestablish its connection to the server.
Any transaction that has not been committed when a failover occurs is rolled
back.
The discussion of how clients deal with server failures is
standard for any SQL Server client application, even without the use of
clusters and virtual servers. The error checking process is quite similar for a
client database application for a cluster. When the cluster begins failover,
the client program receives an error message on the database connection. The
error messages encountered depend upon what the client program is attempting to
do at that time.
If a SQL Server server is failed over by the cluster
admin, TCP reset packets are not sent. If the SQL Server process is
terminated by the operating system (by Kill.exe), the reset packets are
sent.
This may affect the client application if the application does
not specify a query timeout parameter or a query timeout of zero (0).
If the application does not have a query timeout value then open connections
will be left in the ESTABLISHED state after a failover occurs.
The fact that the open connections are not closed and that
no further TCP packets are sent from those connections indicates that those
connections are completely idle. Because the failover did not send any TCP
reset packets to the client application, those open connections wait for the
query results indefinitely (assuming an infinite query timeout), and
potentially cause the connection to stop responding (hang).
To
address this issue from a client application perspective, change the query
timeout to a finite number.
Virtual database failure behavior
When a virtual database server fails, a connection link failed
error message is returned to the waiting client. The database on the failed
node of the cluster is shut down and restarted on the same node per the
parameters you set up in:
Start\Programs\Administrative Tools (Common)\Cluster Administrator\Group\Failover\Properties
The Group Failover default threshold is 10 restarts in a 6-hour
period before a failover occurs to the remaining node. However, the SQL Server
restart threshold, which can be verified through the SQL Server properties on
the SQL Server cluster resource, has a default threshold of three restarts on
that SQL Server in 900 seconds and by default does affect the group. If a
client attempts to connect to the server while a database is being recovered,
the client receives a waiting for database recovery error message and should
retry after a short pause.
SQL Server 6.5 and SQL Server 7.0 considerations
SQL Server 6.5 and SQL Server 7.0 act exactly as described in the
previous "Virtual Database Failure Behavior" section.
When SQL Server
7.0 runs as a virtual server SQL Server 7.0 supports only one IP address but
might listen on additional ports as configured by the customer. This is
described in the "Multiple Listen-On TCP/IP Ports" topic in the following
Microsoft Knowledge Base article:
254321 INF: Clustered SQL Server Do's, Don'ts and Basic Warnings
Microsoft SQL Server 2000 considerations
SQL Server 2000 has some differences in behavior from the
SQL Server 6.5 and SQL Server 7.0 versions.
SQL Server 2000 port usage By default, a named instance listens on a dynamic port.
The first time the server starts with a port set to zero (0), the server
requests a free port number from the operating system and then the server
listens on that port. The server records this to the registry and then uses the
same port every time.
If a server is configured to listen on a
dynamic port and the server fails to listen on the dynamic port on startup,
then the server chooses another port.
If you configured a static
port either during setup or after setup by using the server network utility it
fails to listen on TCP/IP if this port is in use.
Clients detect the
port number to connect to in the case of a named instance or one with a
non-default port number.
The connection information is written to the
"LastConnect" cache in this registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client\supersocketnetlib\lastConnect
You will find entries for each server and the method
that was used to connect to them in the registry.
The client attempts
to re-use the connection information on each connection unless it fails and
then re-negotiates the new information. This might happen if the port number
has changed because someone changed it or if it was a dynamic port that was
re-assigned due to a port being in use.
Broken connections
There are three ways a connection can be broken:
- The server fails; the process terminates by being killed
(system server process ID [spid] kill) or an access violation (AV) or something else causes the
operating system or required service to fail.
- Machine hardware failure or loss of power.
- Server shutdown.
Each of these broken connections exhibit different behaviors
seen on the client computer.
- In the case where a server fails, the client receives a
connection broken error message immediately. You can simulate this behavior by
connecting with OSQL, running a long query and then use KILL to terminate your SQL Server process. The client exits with an
ODBC error message.
- A machine failure is more complicated. The behavior can
change slightly based on how the connection loss is detected.
If the
client is in the middle of reading information, the connection loss can be
detected immediately because the data stops.
If the client is just
waiting for results, the behavior is slightly different. The behavior depends
upon the Keep Alive configuration of the client computer.
On
Microsoft Windows 2000 Keep Alive is set by the client code on a per connection
basis. By default, Keep Alive is set to 30 seconds. This means that if the
socket dies it is detected within 30 seconds and the client receives an error
message. On Microsoft Windows NT 4.0, Keep Alive cannot be set on a per
connection basis. Keep Alive must be set for the whole computer, thus affecting
all applications on the server.
The registry keys that are being
referred to are:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters KeepAliveTime\REG_DWORD 30000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters KeepAliveInterval\REG_DWORD 1000
- When you initiate a server shutdown, the server waits a
while for the clients to finish. However, if the client is still running the
server kills the threads inside the server. Killing the threads may also cause
different error messages on the client. Error messages can include a connection
broken error; however, most of the time you see this error message:
"An unknown error occurred, connection may have been
terminated by the server".
The ODBC native error code is set to zero
(0) in this case but it is returned as an error message to the
client.