INF: Troubleshooting SQL Cluster Wizard Failures (254593)
The information in this article applies to:
- Microsoft SQL Server, Enterprise Edition 6.5
- Microsoft SQL Server, Enterprise Edition 7.0
This article was previously published under Q254593
SUMMARY
This article describes the actions that the SQL Server Cluster Wizard
performs and the order in which it performs them. For each step, it
describes the problems that might cause the wizard to fail and possible
resolutions for them. Detailed, specific problem scenarios and resolutions
are also provided.
Note All SQL Server 6.5 and 7.0 Cluster customers should upgrade to
SQL Server 2000 as soon as it is available. The following tools, features, and
components are supported with failover clustering in SQL Server 2000 Enterprise
Edition:
- Microsoft Search service (Full Text)
- Multiple instances
- SQL Server Enterprise Manager
- Service Control Manager
- Replication
- SQL Profiler
- SQL Query Analyzer
MORE INFORMATION
The following steps describe using the SQL Server Cluster
Wizard:
- First, the SQL Cluster Wizard connects to the server and
verifies that all the databases and binaries are on shared disks.
Possible Problem
The most likely failure at this point is that the service
cannot start, normally because the shared disk is owned by the wrong node or
because the program files and data files were not both installed to the
cluster disk.
Resolution
To correct this problem, confirm that the shared disk
is owned by the correct node before you run the cluster wizard. Also, check and
make sure that both the program files and data files are installed to the
cluster disk.
- After you enter the IP address and network name, the wizard
creates a test resource with those properties and brings it online to see if
any conflicts on the network occur.
NOTE: An error message only occurs if you enter an IP address that is
in use; invalid IP addresses or a bad subnet mask are not detected.
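Because the wizard does not detect an invalid address or subnet mask, it can help to check the values before you enter them. The following is a minimal sketch, not part of the wizard: the function name `check_virtual_ip` is hypothetical, and the rule that the virtual IP address should fall inside the node's own subnet is an assumption made for illustration.

```python
import ipaddress


def check_virtual_ip(virtual_ip, subnet_mask, node_ip):
    """Return a list of problems found; an empty list means none were detected."""
    try:
        address = ipaddress.IPv4Address(virtual_ip)
    except ValueError:
        return [f"invalid IP address: {virtual_ip!r}"]
    try:
        # IPv4Interface rejects masks that are not contiguous, such as 255.0.255.0.
        node = ipaddress.IPv4Interface(f"{node_ip}/{subnet_mask}")
    except ValueError:
        return [f"invalid node address or subnet mask: {node_ip!r} / {subnet_mask!r}"]

    problems = []
    network = node.network
    if address not in network:
        # Assumption: the virtual IP must be on the same subnet as the node.
        problems.append(f"{address} is not in the node's subnet {network}")
    elif address in (network.network_address, network.broadcast_address):
        problems.append(f"{address} is a network or broadcast address")
    return problems
```

For example, `check_virtual_ip("10.0.0.50", "255.255.255.0", "10.0.0.10")` returns an empty list, while a mask such as `255.0.255.0` is reported as invalid. The check does not detect an address that is already in use; the wizard's own test resource covers that case.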
Possible Problem
If you have just unclustered and are re-clustering the
server, you may get an error message indicating that your network name is in
use. This may occur because Windows NT occasionally fails to remove the network
name from the NetBIOS registration properly.
First Resolution
Open a command prompt window and enter the following
command:
nbtstat -RR
Press Return. Upon completion, try using the IP address again. If the IP
address still fails, move to the second resolution.
Second Resolution
Reboot the system.
- After you enter all the information, the wizard copies all
the COM files that are registered in BINN to the SQL Server subdirectory of the
location pointed to by the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\SharedFilesDir
By default, this key points to the following location:
C:\Program Files\Common Files\Microsoft Shared\
Possible Problem
The SQL Server Cluster Wizard is unable to find these
files or the location to which they should be copied. This problem usually
occurs when something is wrong with the following registry key used by the
wizard:
HKey_Local_Machine\Software\Microsoft\SharedTools\SharedFilesD
Note that the setup uses a different registry key (listed below), but the two
should normally point to the same path:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\CommonFilesDir
NOTE: By design, when this problem occurs, no error is
displayed. This is done so that you can install without replication and still
run the wizard without having it fail. The only way to tell whether this problem
is occurring is to look at the debug output window, which you enable by setting
_PRINT_CONSOLE_=1 in the system environment before running the SQL Cluster
Wizard. If this step is executing correctly, you see references to the
replication files, such as Replres.dll and Distrib.exe, as they are copied. If
you do not see references to these files, you are encountering this
problem.
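If you capture the text of the debug output window, the check described in the note can be automated. The following is a minimal sketch under that assumption: the function name `replication_files_seen` is hypothetical, the exact format of the debug output is not documented here, and the sketch only looks for the two file names that the note mentions.

```python
def replication_files_seen(debug_output, expected=("Replres.dll", "Distrib.exe")):
    """Map each expected file name to True if it appears in the captured debug text."""
    lowered = debug_output.lower()
    # File names on Windows are not case-sensitive, so compare in lower case.
    return {name: name.lower() in lowered for name in expected}


def copy_step_looks_healthy(debug_output):
    """Return True only if every expected replication file is mentioned."""
    return all(replication_files_seen(debug_output).values())
```

If `copy_step_looks_healthy` returns False for the captured text, the registry key problem described above is the likely cause.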
First Resolution
Refer to Scenario 8 in the "Specific Scenarios"
section.
Second Resolution
Refer to Scenario 3 in the "Specific Scenarios"
section.
- Next, the SQL Cluster Wizard copies the same files to the
same place on the other node, finding the correct path by reading the remote
registry. The SQL Cluster Wizard creates a share on the remote node named
cluster_tools_share, copies the files to that share, and then deletes it.
Possible Problem
This step can fail only if registry key problems exist on the
other node or if the wizard is unable to
create the share.
First Resolution
Refer to Scenario 8 in the "Specific Scenarios"
section.
Second Resolution
Refer to Scenario 3 in the "Specific Scenarios"
section.
- The wizard then copies the cluster-specific files to the
\System32 directory of both nodes.
Possible Problem
Normally, this step completes successfully. The SQL
Cluster Wizard copies the files from the CD or network share, so it is possible
that it may lose the connection to the share or be unable to create the
cluster_tools_share because it already exists.
Resolution
Refer to Scenario 3 in the "Specific Scenarios"
section.
- The wizard runs the "secnode" setup, which installs the
necessary system files to the remote node and registers all the COM files that
have been copied to the C:\Program files\Common files\Microsoft shared
directory.
Possible Problems
One of the most common problems at this point occurs if
setup is run from a share point on the first node (connected through net use with an explicit user name and password). In that case, the second node does not,
by default, have access to the share, so when secnode runs it
fails to connect back to the install location to copy the files. When this
occurs, you receive a message indicating that setup could not be run on the
remote computer.
Another problem may occur if you install from a
network share when the path has a space in the name. This causes the secnode
setup to fail because it is unable to handle paths with spaces unless they are
quoted. There is no way around this problem apart from renaming the share.
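The space restriction is easy to check before you start setup. The following is a minimal sketch: the function name `install_path_problems` is hypothetical, and it tests only for the space character described above, not for share permissions.

```python
def install_path_problems(install_path):
    """Return a list of problems with the path that setup will be run from."""
    problems = []
    if " " in install_path:
        problems.append(
            "path contains a space; rename the share or folder before running setup"
        )
    return problems
```

For example, `install_path_problems(r"\\node1\SQL Setup\x86")` reports the space, while `install_path_problems(r"\\node1\SQLSetup\x86")` returns an empty list.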
First Resolution
If you experience either of these problems, check the
Sqlclstr.log file in the <%SYSROOT%> directory, or the Remsetup.log file in
the second node's TEMP directory, for clues or descriptions of
the problem. Correct all problems and then run the wizard again.
Second Resolution
Permissions problems can also prevent the SQL cluster
Wizard from working correctly when performing operations on the second node.
The account under which setup runs MUST have the appropriate
permissions:
- Be a local administrator for both nodes.
- Have the user right to "log on as a
service".
- Have the user right to "act as part of the operating
system".
These permissions MUST exist on BOTH nodes; otherwise, setup fails.
You set these permissions from the primary domain
controller (PDC). After the correct permissions are set, you need to log off and
then log on again for the changes to be reflected. For further details, refer to
Scenario 5 in the "Specific Scenarios" section.
Possible Problem
Secnode may also fail if it runs but has errors
internally, such as not successfully registering all the COM files.
Resolution
Correct all problems reported in the Sqlstp.log on the
second node.
- Next, the SQL Server Cluster Wizard rebinds all the files
located in the following places:
- The SQL BINN directory.
- C:\Program Files\Common Files\Microsoft Shared\SQL Server
- C:\Program Files\Common Files\Microsoft Shared\Database Replication
This occurs on both nodes.
The SQL Server Cluster Wizard then rebinds the following system files on both nodes:
- Dbnmpntw.dll
- Sqlstr.dll
- Sqlwoa.dll
- Sqlsrv32.dll
- Cliconfg.dll
- Cliconfg.exe
The SQL Server Cluster Wizard rebinds
%Sysroot%\System32\Sqlctr70.dll on the local node only.
Possible Problem
The rebinding process can fail only when something
is using one of the files it is trying to bind. If any SQL applications,
including the SQL Service Manager, are open, this message displays:
..could not update binaries...
For additional details, refer to:
248380 PRB: SQL 7.0 Failover Wizard Error when Updating Binaries
The most common problem is that some of the system
files are in use. You can usually tell on which node the problem is
occurring by the amount of time it takes for the message to display again after
a retry. If the message displays instantaneously, this usually indicates that a
file on the local computer is in use, but if it takes a few seconds, then the
problem is probably occurring on the other node.
Resolution
You can usually work around this problem by stopping
all offending services and making sure that you do not have any applications
open. To verify which services you should have running, refer to the following
articles in the Microsoft Knowledge Base:
192708 INF: Installation Order: Cluster Server Support for SQL or MSMQ
for SQL 6.5 Enterprise Edition.
219264 INF: Order of Installation for SQL Server 7.0 Clustering Setup
for SQL 7.0 Enterprise Edition.
Possible Problem
If you are unclustering and one of the resource DLLs is
in use, the resource DLL may stop responding in one of its connections to the
server. This causes the resource monitor process (Resrcmon.exe) to have the
dbnmpntw.dll file open even when the resource is offline.
First Resolution
Reboot and re-run the wizard to uninstall.
Second Resolution
Rename the offending DLL to Dbnmpntw.dll.copy, and then
copy it back to the original name. Now the .copy file is in use but the
dbnmpntw.dll file is not, so the wizard may complete without any problems.
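The rename-and-copy workaround in the second resolution can be sketched as follows. This is an illustration only: the function name `free_locked_name` is hypothetical, and the sketch assumes, as the resolution describes, that the operating system allows a file to be renamed while another process has it open.

```python
import os
import shutil


def free_locked_name(dll_path):
    """Rename dll_path to dll_path + '.copy', then copy it back to the original name.

    The process that had the file open keeps using the renamed file, and a
    fresh, unopened file takes over the original name.
    """
    renamed = dll_path + ".copy"
    os.rename(dll_path, renamed)     # fails if the .copy file already exists on Windows
    shutil.copy2(renamed, dll_path)  # copy contents and timestamps back
    return renamed
```

After the wizard completes and the node has been rebooted, the leftover .copy file is no longer in use and can be deleted.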
- The SQL Cluster Wizard now creates the net name, IP,
sqlserver, agent and vsrvsvc resources in the cluster, brings the SQL Server
resource online, and changes the local server in the sysservers system table to the virtual server name.
Possible Problem
Creation of the resources is rarely a problem.
You should see the resources being created in the group in which the disk
resides. All this step does is create the resources and make dependencies
between them so that they can start in the correct order.
Bringing the resources online is the last phase of the setup. The first part
of that phase is to
start the MSSQLSERVER$VIRTNAME service, connect to it, and set the values in sysservers correctly. If this step fails, the whole setup fails and
rolls back all the work it has done so far. This failure can occur when the
rebinding of the Sqlsrv32.dll file (an ODBC file) does not work correctly. When this occurs, you
will see error 123 or 126 in the cluster setup log (Sqlclstr.log) just after
the fixsysservers call.
If this happens:
- The cluster is completely broken.
- It is caused by the wizard only changing one of the two
references to the Kernel32.dll file to reference the Vernel32.dll file
instead.
- If you previously installed a different version of
Microsoft Data Access Components (MDAC) on the computer before installing SQL,
the version of the Sqlsrv32.dll file on the system is different.
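To look for this signature in Sqlclstr.log, you can scan the lines that follow the fixsysservers call for error 123 or 126. The following is a minimal sketch: the function name is hypothetical, and the line format `context: code (0xhex): message` is an assumption based on the log excerpts in the "Specific Scenarios" section. Error 123 and error 126 are the Win32 codes ERROR_INVALID_NAME and ERROR_MOD_NOT_FOUND.

```python
import re

# Win32 error codes that the text associates with a failed rebind of Sqlsrv32.dll.
REBIND_ERRORS = {123: "ERROR_INVALID_NAME", 126: "ERROR_MOD_NOT_FOUND"}

# Assumed shape of an error entry: "...: 126 (0x7e): message"
ERROR_CODE = re.compile(r":\s*(\d+)\s*\(0x[0-9a-fA-F]+\):")


def rebind_errors_after_fixsysservers(log_lines):
    """Return (line_number, code) pairs for error 123 or 126 at or after fixsysservers."""
    hits = []
    seen_call = False
    for number, line in enumerate(log_lines, start=1):
        if "fixsysservers" in line.lower():
            seen_call = True
        if not seen_call:
            continue
        match = ERROR_CODE.search(line)
        if match and int(match.group(1)) in REBIND_ERRORS:
            hits.append((number, int(match.group(1))))
    return hits
```

An empty result means that the signature was not found in the assumed format; it does not prove that the rebind succeeded.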
First Resolution
Reboot both servers and, before retrying, make sure
that only the minimum services are running as outlined in the following
Microsoft Knowledge Base articles:
192708 INF: Installation Order: Cluster Server Support for SQL or MSMQ
for SQL 6.5 Enterprise Edition.
219264 INF: Order of Installation for SQL Server 7.0 Clustering Setup
for SQL 7.0 Enterprise Edition.
Second Resolution
Rename the Sqlsrv32.dll file, and then reboot the
computer. Before retrying, make sure that only the minimum services are running
as outlined in the following Microsoft Knowledge Base
articles:
192708 INF: Installation Order: Cluster Server Support for SQL or MSMQ
for SQL 6.5 Enterprise Edition.
219264 INF: Order of Installation for SQL Server 7.0 Clustering Setup
for SQL 7.0 Enterprise Edition.
Third Resolution
Contact SQL Product Support Services.
- The SQL Cluster Wizard finishes.
Specific Scenarios
Scenario 1
Problem
SQL Cluster Wizard fails with this log entry:
@ CopyFileIfNeeded: [D:\EnterpriseEdition\x86\CLUSTER\SQAGTRES.DLL] => [C:\WINNT\System32\SQAGTRES.DLL]
@@@ CopyFileIfNeeded: [D:\EnterpriseEdition\x86\CLUSTER\SQAGTRES.DLL] => [\\LNXDAYCC02\admin$\system32\SQAGTRES.DLL]
~~~ XXX InstallRemote failed
[reghelp.cpp:34] : 2 (0x2): The system cannot find the file specified.
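The bracketed entries in this log follow a regular shape: source file and line, an optional context, a decimal error code, the same code in hexadecimal, and the system message. The following is a minimal sketch of a parser for that shape. The names `LogError` and `parse_error_line` are hypothetical, and the pattern is inferred only from the excerpts in this article, so other entries may not match it.

```python
import re
from typing import NamedTuple, Optional


class LogError(NamedTuple):
    source_file: str
    source_line: int
    context: str
    code: int
    message: str


# Inferred shape: [file.cpp:34] context: 2 (0x2): message
ENTRY = re.compile(
    r"\[(?P<file>[^\]:]+):(?P<line>\d+)\]\s*"
    r"(?P<context>.*?):\s*"
    r"(?P<code>\d+)\s*\(0x(?P<hex>[0-9a-fA-F]+)\):\s*"
    r"(?P<message>.*)"
)


def parse_error_line(text: str) -> Optional[LogError]:
    """Parse one bracketed error entry, or return None if the line does not match."""
    match = ENTRY.search(text)
    if match is None:
        return None
    code = int(match.group("code"))
    if code != int(match.group("hex"), 16):
        return None  # decimal and hexadecimal codes disagree; not the assumed format
    return LogError(
        match.group("file"),
        int(match.group("line")),
        match.group("context").strip(),
        code,
        match.group("message").strip(),
    )
```

Applied to the last line of the excerpt above, the parser yields file reghelp.cpp, line 34, an empty context, and code 2. Progress lines such as the InstallRemote line are not error entries and return None.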
Resolution
Verify that you can make a \\server_name\admin$
connection from both nodes in the cluster. Make sure you check this
if any network interface card (NIC) settings have been changed or if network
cards have been replaced.
WARNING: If you disable File and Print Sharing for Microsoft Networks
under the Network Connection properties on Windows 2000 computers, you will not
be able to make a connection to the Administrative shares. Attempts to access
the Administrative shares cause an Error 53 message.
Scenario 2
Problem
The SQL Cluster Wizard fails with the following generic
message, with no reference to a specific file:
File already exists.
Resolution
Verify that the SQL group name is in all capital
letters. If it is not, the wizard tries to create a new group but is unable to
do so. If the name is not all uppercase, rename the group to a temporary name (such as x) and
then rename it to the correct name in all uppercase.
NOTE: This applies to renamed groups only. The default names like
"Disk Group 1" have their resources moved to the new group if required by SQL.
Scenario 3
Problem
The Sqlclstr.log file shows the following:
~~~ ClusterResourceStart... tick=2, state=2
[validate.cpp:147] DeleteTestGroup:OpenClusterResource: 5007 (0x138f): The cluster resource could not be found.
~~~ XXX Copy Files failed
[reghelp.cpp:34] : 2 (0x2): The system cannot find the file specified.
Resolution
Check the net shares on each node and look for the
following:
- \\cluster_tools_share
- \\cluster_setup_share
If either is found, delete it.
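If you already have the list of share names for a node, the check for leftover wizard shares can be sketched as follows. The function name `leftover_wizard_shares` is hypothetical, and how you obtain the share names is left to you; the sketch only compares names.

```python
WIZARD_SHARES = ("cluster_tools_share", "cluster_setup_share")


def leftover_wizard_shares(share_names):
    """Return the names in share_names that match the wizard's temporary shares."""
    wanted = {name.lower() for name in WIZARD_SHARES}
    # Share names are not case-sensitive; leading backslashes are ignored.
    return [name for name in share_names if name.strip("\\").lower() in wanted]
```

Run the check against the share list of each node, because either node can hold a leftover share.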
Scenario 4
Problem
When you try to re-cluster SQL after installing SQL
service pack 1, the install fails with the following error in the
Sqlcluster.log file:
Looking at disk P:
Disk P is fixed in group SQL_Disk
Looking at disk Q:
Disk Q is used by SQL but is moveable
Looking at disk R:
Error: Resource groups SQL_Disk and Disk_R both contain SQL disks
[chkconf.cpp:1416] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[chkconf.cpp:1482] ClusterFindVirtualSQLSrvGroup: 160 (0xa0): The argument string passed to DosExecPgm is not correct.
The "P" drive is the drive to which SQL was installed, and the
installer assumed it was the only drive in use. Actually, the P, Q, and R drives
are being used.
Resolution
Check the SQL error logs and the sysdevices system table, and make sure that all drives being used by SQL are
in the group that SQL is using.
NOTE: If additional cluster disk resources are added to the cluster
for use by SQL, or if other disks currently used in the cluster are designated
for use by the clustered SQL server, they should be added as dependencies of
the SQL Server.
Scenario 5
Problem
Setup is unable to update the remote node or errors
occur when connecting to all the default databases during the initial
setup.
For example:
#### SQL Server Remote Setup - Start Time 10/28/99 13:14:22 ####
Script file copied to '\\server8\ADMIN$\secnode.iss' successfully.
Installing remote service...
Running '\\node1\F$\ENGLISH\X86\setup\setupsql.exe SecNode=1 -s -f1 \\node2\ADMIN$\secnode.iss'...
Remote process exit code was '-1'.
\\node2\Admin$\sqlsp.log
Disconnecting from remote machine...
Service removed successfully.
Remote files removed successfully.
#### SQL Server Remote Setup - Stop Time 10/28/99 13:15:08 ####
Resolution
Be sure the service account is set up with all the
correct permissions. By copying the existing Administrator account, you can
make sure that the group memberships and many other properties are copied to
the new account. When a user account is copied, the description, group
memberships, logon hours, logon workstations, and account information are
copied exactly. The user name, full name, and password boxes of the new account
are blank and must be entered. The User Cannot Change Password and Password Never Expires check boxes are copied.
NOTE: When copying an account that is a member of the Administrators
local group, the User Cannot Change Password setting is not copied.
Usually, the User Must Change Password At Next Logon check box is selected, regardless of its setting in the original
account; however, this check box should be clear. Also, the Password Never Expires check box should be selected. After all the entries are complete,
click Add.
Now, from the User Manager menu, select Policies\User Rights, select to show Advanced User Rights, and then grant the following rights to the new user:
- Act as part of the operating system.
- Log on as a service.
- Log on locally.
Next, log on to both nodes with the newly created account and
perform basic connectivity and rights testing:
- To verify remote procedure call (RPC) connectivity, try to
log on remotely from each node to the other with either Perfmon, Regedt32 or
Srvmgr.
- To verify NetBIOS, try issuing a net view \\machine_name command and a
net use \\machine_name\admin$ command.
- To verify RDR and SRV without NBT, and to verify IP connectivity, try issuing
net view \\IP_address
- Try using a telnet or FTP session to test for transport
functionality.
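The last test in the list checks only that the transport works, which can also be done with a plain TCP connection attempt. The following is a minimal sketch: the function name `can_connect` is hypothetical, and ports 23 (telnet) and 21 (FTP) are the standard ports for those services, which must be running on the other node for the test to succeed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("node2", 23)` corresponds to opening a telnet session to the other node, where node2 is a placeholder for the node's name. A False result does not distinguish between a transport problem and a service that is simply not running.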
Scenario 6
Problem
The SQL 6.5 Cluster Wizard fails and the last line of
the cluster wizard log states:
Start SQL Server cConnectString="ODBC;DSN='';DRIVER={SQL Server};SERVER=CLIO;DATABASE=master;UID=sa;PWD="
Resolution
First, verify that selecting
@@servername does not return NULL. If it
does, then the sysservers system table does not have an entry for the local server name.
Correct this and continue. If @@servername returns the server name, reload the ODBC drivers and then run the SQL Cluster
Wizard again. To reload the ODBC drivers, run the setup program from the SQL
Server 6.5 Extended Edition compact disc in either the \I386\Odbc directory for
Intel-based computers or the \Alpha\Odbc directory for Alpha-based computers.
Scenario 7
Problem
Every time the Clustwiz.exe file runs, a Dr. Watson
message appears pointing to the Cpqmgmt.dbg file.
Resolution
All the following Microsoft Knowledge Base references
indicate that this problem is related to the Compaq Insight Manager. Apply the
latest Compaq SoftPak (in most cases SSD 2.12a) and stop all possible
conflicting services as outlined in the following Microsoft Knowledge Base
articles:
192708 INF: Installation Order: Cluster Server Support for SQL or MSMQ
219264 INF: Order of Installation for SQL Server 7.0 Clustering Setup
Scenario 8
Problem
The following registry entry is
incorrect:
Hkey_Local_Machine\Software\Microsoft\Windows\CurrentVer\CommonFilesDir
Resolution
Correct the path if it is wrong.
Scenario 9
Problem
You are unable to uncluster SQL using the SQL Cluster
Failover Wizard.
Resolution
When the SQL Cluster Failover Wizard is run, the SQL
cluster resources are created. By default, these resources have the following
naming structure:
<Virtual_SQL_Server_Name> IP Address
<Virtual_SQL_Server_Name> Network Name
<Virtual_SQL_Server_Name> SQL Server 7.0
<Virtual_SQL_Server_Name> VServer
<Virtual_SQL_Server_Name> SQL Server Agent 7.0
For example, if the Virtual_SQL_Server_Name is xyz, the SQL
resources are, by default, named as:
xyz IP Address
xyz Network Name
xyz SQL Server 7.0
xyz VServer
xyz SQL Server Agent 7.0
If all or some of these resources are then modified to:
IP Address
Network Name
SQL Server
Virtual Server
SQL Agent
this can cause the SQL Cluster Failover Wizard to fail or hang
when used. To resolve this, rename the resources back to the default names.
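The default naming structure makes it straightforward to list which resources have been renamed. The following is a minimal sketch: the function names are hypothetical, and the suffixes are the SQL Server 7.0 defaults listed above.

```python
DEFAULT_SUFFIXES = (
    "IP Address",
    "Network Name",
    "SQL Server 7.0",
    "VServer",
    "SQL Server Agent 7.0",
)


def default_resource_names(virtual_server_name):
    """Return the default resource names for the given virtual SQL Server name."""
    return [f"{virtual_server_name} {suffix}" for suffix in DEFAULT_SUFFIXES]


def missing_default_names(virtual_server_name, actual_names):
    """Return the default names that do not appear among the actual resource names."""
    actual = set(actual_names)
    return [name for name in default_resource_names(virtual_server_name) if name not in actual]
```

For the virtual server xyz, any name returned by `missing_default_names` identifies a resource that needs to be renamed back before you run the wizard.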
Scenario 10
Problem
The SQLCLUST.LOG shows the following:
~~~ OnEnableCluster: UpdateSku
~~~ OnEnableCluster: TransferSQLServices
+++ TransferSQLServices: enter
+++ TransferSQLServices: calling AddVSNameLanManServer
[reghelp.h:132] type not REG_MULTI_SZ: 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[reghelp.h:133] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[reghelp.h:290] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[clenable.cpp:1803] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[clenable.cpp:1836] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[clenable.cpp:2379] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
~~~ XXX TransferSQLServices failed
Resolution
Verify that the type of the following registry
key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\NullSessionPipes
is REG_MULTI_SZ. The actual failure is in
RegQueryValue_MULTI_SZ(). It fails because the type of the key is not
REG_MULTI_SZ. If the type of the key is not REG_MULTI_SZ, you
need to copy the contents from the key, delete and re-create the key with the
same name and the correct type, and then replace the contents.
Modification Type: Major
Last Reviewed: 8/31/2006
Keywords: kbinfo KB254593 kbAudDeveloper