PROBLEM: (94915, 94201) (PATCH ID: TCR540-023)
********
Fixes a regression associated with non-SCSI storage. Typical symptoms
are error messages of the form:

    'drd_get_disk_attributes(182): ksm_get_attributes failed 2.'

PROBLEM: (93677, 92409, 94911, 92799) (PATCH ID: TCR540-012)
********

PROBLEM: (93677) (PATCH ID: )
This patch improves the responsiveness of EINPROGRESS handling during
the issuing of I/O barriers. The fix removes a possible infinite loop
that could occur when a storage device is deleted.

The responsiveness problem arose because the code kept looping while
waiting for a disk structure to become available. No attempt was made
to force the disk structure to become available, no retry limit was
enforced, and no check was made for deleted devices. This combination
made infinite retry attempts possible.

PROBLEM: (92409) (PATCH ID: )
This patch fixes a CNX manager panic encountered when multiple cluster
nodes are booted simultaneously. The panic string seen is:

    CNX MGR: Invalid configuration for cluster seq disk

PROBLEM: (94911) (PATCH ID: )
Fixes a possible race condition between a SCSI reservation conflict
and an I/O drain, which could result in a hang.

The race condition occurs when a SCSI event, such as a path failover,
causes a reservation conflict while a cluster member is issuing an I/O
barrier because of an event such as a member transition. The result is
a hang on the cluster member attempting the barrier.

Examining the system in this state, either live or through a forced
crash, reveals one or more drd_event_threads sleeping in
ccmn_send_ccb_wait3(). The hang is ultimately caused by in-flight I/Os
that remain pending because of that sleeping thread.
Here is a typical stack trace:

    THREAD: fffffc0003816e00
     0 thread_block
     1 sleep_prim
     2 mpsleep
     3 ccmn_send_ccb_wait3
     4 ccmn_path_ping3
     5 ccmn_resolve_paths3
     6 cdisk_ioctl
     7 drd_issue_local_ioctl
     8 drd_check_path
     9 drd_handle_event_io_drained
    10 drd_handle_one_event
    11 drd_handle_events
    12 drd_event_thread

PROBLEM: (92799) (PATCH ID: )
This patch alleviates a condition in which a cluster member takes an
extremely long time to boot when using LSM. The problem occurs when a
Fibre Channel disk that belongs to an LSM set goes bad.

The condition is seen while booting a system into a cluster in which
the other members are far enough up to recognize their LSM sets.
Immediately after the "starting LSM" boot message, the booting system
appears to hang and periodically outputs the following message to the
user console:

    "DRD failed register against returned 5"

PROBLEM: (93369) (PATCH ID: TCR540-019)
********
This patch fixes a problem in the cluster kernel where a cluster member
panics while doing remote I/O over the interconnect.

PROBLEM: (93996) (PATCH ID: TCR540-010)
********
The Device Request Dispatcher (DRD) should retry the request for disk
attributes when the disk driver returns EINPROGRESS. This problem can
be seen by deleting a device in a cluster and then adding it back. The
console message is:

    drd_get_disk_attributes (1234) - ksm_get_attributes failed 36

PROBLEM: (95368) (PATCH ID: TCR540-039)
********
This patch fixes a problem where a munsa_reject was done to the quorum
disk without a matching munsa_unreject. This problem can happen only if
the quorum disk is on a parallel SCSI bus and quorum is lost.

PROBLEM: (95401, 95588) (PATCH ID: TCR540-054)
********
This patch fixes problems in the Device Request Dispatcher. Symptoms
are hangs during boot and shutdown. Because the issue is timing
dependent, the messages output on the console are not deterministic.

PROBLEM: (94963) (PATCH ID: TCR540-028)
********
This patch fixes a race condition in the Device Request Dispatcher
subsystem (drd.mod) whereby two nodes can simultaneously be the local
server of a shared tape. As a result, multiple nodes could have the
device open for I/O, which is illegal and could result in data
inconsistencies.
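
ILLUSTRATION: bounded EINPROGRESS retry (relates to 93677 and 93996)
The entries for 93677 and 93996 describe retrying while a disk driver
returns EINPROGRESS, with a retry limit and a check for deleted
devices. The sketch below is not taken from the patch; it only
illustrates that control flow in user-space C. Every name in it
(get_attributes_bounded, attr_probe_fn, device_deleted_fn, fake_dev),
the limit of 5 attempts, and the ENODEV/ETIMEDOUT return values are
assumptions made for the example, not DRD or KSM interfaces. It does
not model forcing the disk structure to become available.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types; these are NOT the real DRD/KSM interfaces. */
typedef int (*attr_probe_fn)(void *ctx);     /* 0, EINPROGRESS, or another errno */
typedef int (*device_deleted_fn)(void *ctx); /* nonzero if the device is gone */

enum { MAX_ATTR_RETRIES = 5 };               /* assumed limit, for illustration */

/*
 * Retry a probe while it reports EINPROGRESS, but stop when the device
 * has been deleted or the retry limit is reached.
 * Returns 0 on success, ENODEV if the device was deleted, ETIMEDOUT if
 * the limit was exhausted, or the probe's own error code otherwise.
 */
static int get_attributes_bounded(attr_probe_fn probe,
                                  device_deleted_fn deleted,
                                  void *ctx)
{
    assert(probe != NULL && deleted != NULL);

    for (int attempt = 0; attempt < MAX_ATTR_RETRIES; attempt++) {
        if (deleted(ctx))
            return ENODEV;      /* nothing left to wait for */

        int rc = probe(ctx);
        if (rc != EINPROGRESS)
            return rc;          /* success or a hard error */
        /* A real implementation would sleep or yield before retrying. */
    }
    return ETIMEDOUT;           /* still busy after the last attempt */
}

/* Fake device used only to exercise the loop above. */
struct fake_dev {
    int busy_calls;             /* probes that will still return EINPROGRESS */
    int deleted;                /* nonzero simulates a deleted device */
    int probes;                 /* number of probe calls made so far */
};

static int fake_probe(void *ctx)
{
    struct fake_dev *d = ctx;
    d->probes++;
    if (d->busy_calls > 0) {
        d->busy_calls--;
        return EINPROGRESS;
    }
    return 0;
}

static int fake_deleted(void *ctx)
{
    struct fake_dev *d = ctx;
    return d->deleted;
}
```

With these fakes, a device that is busy for two probes succeeds on the
third call, a device that stays busy is abandoned after
MAX_ATTR_RETRIES probes, and a deleted device is never probed at all.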