Troubleshooting Multiple Cluster Symptoms on the Same SAN (311081)
The information in this article applies to:
- Microsoft Windows Server 2003, Datacenter Edition
- Microsoft Windows Server 2003, 64-Bit Datacenter Edition
- Microsoft Windows Server 2003, Enterprise Edition
- Microsoft Windows 2000 Advanced Server
- Microsoft Windows 2000 Datacenter Server
- Microsoft Windows NT Server 4.0
- Microsoft Windows NT Server 4.0 SP6a
This article was previously published under Q311081

SUMMARY

This article describes symptoms that can occur in multiple-cluster scenarios
when certification requirements are not met and the disks are visible to the
nodes of more than one cluster on the same SAN. A multiple-cluster
configuration is more than one set of MSCS clusters that are assigned to one
or more fiber-attached host bus adapters (HBAs). These same SAN devices can
also be attached to mainframe computers or to computers that run UNIX
operating systems. This can present some challenges because of differences in
SCSI command sets. These anomalies can be caused by firmware revisions or by
the inability to properly zone or mask the bus resets that MSCS uses to
control the disks. Without this protection in place (proper masking, zoning,
or a combination of both), the following problems could occur:
IMPORTANT: You should contact the SAN vendor for the specific technology to
use.

MORE INFORMATION

The issues that are described in the "Summary" section of
this article may appear to resolve themselves after Chkdsk.exe runs, but they
may then return several weeks later and repeat the same pattern on one or more
clustered nodes. These issues may be seen in any pattern, but Event IDs 26, 50,
and 51 are the most prevalent. You may also see warnings and error
messages that are similar to the following:
Event ID: 51
Source: Disk
Description: An error was detected on device \Device\Harddisk9\DR9 during a paging operation.

Event ID: 50
Source: Disk
Description: {Lost Delayed-Write Data} The system was attempting to transfer file data from buffers to \Device\Harddisk\Volumex. The write operation failed, and only some of the data may have been written to the file.

Event ID: 26
Source: Application Popup
Description: Application popup: Windows - Delayed Write Failed : Windows was unable to save all the data for the file \Device\HarddiskVolumex\SQLDatabases\System\machine\LOG. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

Event ID: 9
Source: HBA Driver
Description: The device, \Device\Scsi\HBA driver, did not respond within the timeout period.

Event ID: 15
Source: Disk
Description: The device, \Device\Harddiskx\DRx, is not ready for access yet.

Event ID: 1066
Source: ClusSvc
Description: Cluster disk resource Disk x: is corrupt. Running ChkDsk /F to repair problems.

NOTE: These issues can also affect network adapters if bus contention exists throughout the system.

Event ID: 1123
Source: ClusSvc
Description: The node lost communication with cluster node 'machine' on network 'heartbeat'.

Event ID: 1122
Source: ClusSvc
Description: The node (re)established communication with cluster node 'machine' on network 'heartbeat'.
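
Because these events can recur weeks apart and on different nodes, it may help to tally them from exported event logs. The following Python code is a minimal sketch only. It assumes that the event log has been exported to CSV text with columns named EventID and Source. Those column names, the function name tally_watched_events, and the hba_source parameter are assumptions made for illustration and do not come from this article or from any Microsoft tool. The source name for Event ID 9 depends on the HBA vendor's driver, so the caller must supply it.

```python
import csv
import io
from collections import Counter

# Event ID / source pairs listed in this article. The short labels are
# illustrative summaries, not official event names.
WATCHED_EVENTS = {
    (51, "Disk"): "error during a paging operation",
    (50, "Disk"): "lost delayed-write data",
    (26, "Application Popup"): "delayed write failed",
    (15, "Disk"): "device not ready for access",
    (1066, "ClusSvc"): "cluster disk resource is corrupt",
    (1123, "ClusSvc"): "lost communication with cluster node",
    (1122, "ClusSvc"): "(re)established communication with cluster node",
}


def tally_watched_events(csv_text, hba_source=None):
    """Count occurrences of the watched events in exported CSV text.

    csv_text   -- CSV data with "EventID" and "Source" columns (assumed format)
    hba_source -- source name used by the HBA driver for Event ID 9
                  (hypothetical parameter; the real name is vendor specific)
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            event_id = int(row["EventID"])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows without a usable numeric event ID
        source = (row.get("Source") or "").strip()
        key = (event_id, source)
        is_hba_timeout = (
            event_id == 9 and hba_source is not None and source == hba_source
        )
        if key in WATCHED_EVENTS or is_hba_timeout:
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # "examplehba" is a made-up driver name used only for this demonstration.
    sample = (
        "EventID,Source\n"
        "51,Disk\n"
        "50,Disk\n"
        "51,Disk\n"
        "9,examplehba\n"
    )
    for (event_id, source), count in sorted(
        tally_watched_events(sample, hba_source="examplehba").items()
    ):
        print(event_id, source, count)
```

Running the same tally against exports from each clustered node, and comparing the counts over time, is one way to see whether the pattern repeats across nodes that share the SAN.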

For additional information about how to determine whether a cluster configuration that shares a storage subsystem with other systems meets certification requirements, click the following article number to view the article in the Microsoft Knowledge Base:

304415 Support for Multiple Clusters Attached to the Same SAN Device

Modification Type: Minor
Last Reviewed: 1/5/2006
Keywords: kberrmsg kbinfo kbnetwork KB311081