PROBLEM: (TKTB52828, QAR 42761) (Patch ID: OSF375-370034)
********
This patch fixes a situation in which a system panics, displaying the string:

    panic `Unable to restart Qlogic(LUN queue after abort)`

A typical stack trace for this type of panic is:

     0 boot
     1 panic `Unable to restart Qlogic(LUN queue after abort)`
     2 isp_termio_abort_bdr
     3 ss_perform_abort
     4 ss_sched
     5 scsiisr
     6 ss_start_sm
     7 ss_go
     8 ss_abort
     9 ss_perform_timeout
    10 ss_process_timeouts
    11 softclock_scan
    12 hardclock
    13 _XentInt
    14 idle_thread

PROBLEM: (QAR 46071, 45122) (Patch ID: OSF375-370034)
********
This patch fixes a situation in which a system panics, displaying the text:

    simple_unlock: minimum spl violation
        pc of caller:       0xfffffc0000420b04
        lock address:       0xfffffc0077f236d0
        lock info addr:     0xfffffc00005b0b50
        lock class name:    cam_pd_device (could also be cam_isp)
        current spl level:  0
        required spl level: 4

    panic (cpu xx): simple_unlock: minimum spl violation

where 'xx' is the CPU number. A sample stack trace is not available. This panic is seen when lockmode is set to 4.

PROBLEM: (QAR 46756) (Patch ID: OSF375-370034)
********
This patch fixes the situation where the following sequence of messages is displayed during startup:

    cam_logger: CAM_ERROR packet
    cam_logger: bus 0
    isp_enable_lun
    Failed to enable target for selections
    cam_logger: CAM_ERROR entry too large to log!
    cam_logger: CAM_ERROR packet
    cam_logger: bus 0
    isp_enable_lun
    Failed to enable target for selections
    cam_logger: CAM_ERROR entry too large to log!
    cam_logger: CAM_ERROR packet

PROBLEM: (ZPOB71272,46139,46757,42908,43719,44474,40361,40443,41465) (Patch ID: OSF375-034)
********
This patch fixes a situation in which a system panics, displaying the string:

    panic `Simple lock : time limit exceeded`

One typical stack trace for this type of panic is:

     4 boot
     5 panic
     6 simple_lock_fault
     7 simple_lock_time_violation
     8 ss_finish
     9 sm_bus_free
    10 isp_process_response_queue
    11 isp_intr
    12 intr_dispatch_post
    13 _XentInt
    14 isp_mailboxcomplete
    15 isp_mailboxissue
    16 isp_abort
    17 isp_termio_abort_bdr
    18 ss_perform_abort
    19 ss_sched
    20 scsiisr

Another stack trace encountered is:

    12 boot
    13 panic `Simple lock : time limit exceeded`
    14 simple_lock_fault
    15 simple_lock_time_violation
    16 ss_finish
    17 ss_abort_done
    18 sm_bus_free
    19 isp_abort_done
    20 softclock_scan
    21 hardclock
    22 _XentInt
    23 alpha_delay
    24 microdelay
    25 isp_mailboxcomplete
    26 isp_mailboxissue
    27 isp_stop_queue
    28 isp_termio_abort_bdr
    29 ss_perform_abort
    30 ss_sched
    31 scsiisr
    32 ss_start_sm
    33 ss_go
    34 ss_abort

PROBLEM: (HGOBB0043, HGOBB0597) (Patch ID: OSF375-370043)
********
Intermittently, the probe of an isp fails during boot, and the system can hang. This happens roughly once in 20 system boots, so a workaround is to boot again. A typical console listing for a failed probe looks like this:

    .
    .
    .
    Created FRU table configuration errorlog packet
    tiop0 at tlsb0 node 8
    tiop0: cpu interrupt mask being set as 1.
    pci0 at tiop0 slot 0
    isp0 at pci0 slot 0
    isp0: QLOGIC ISP1020 - Differential Mode
    cam_logger: CAM_ERROR packet
    cam_logger: bus 0
    isp_mailbox complete
    Timeout on mailbox command completion scheduling chip reinit
    cam_logger: CAM_ERROR packet
    cam_logger: bus 0
    isp_init
    Mailbox operations not functional for common init
    cam_logger: CAM_ERROR packet
    cam_logger: bus 0
    isp_probe
    Common init failure - Failing probe
    isp0: Not probed.
    isp in slot 0 not configured.
    .
    .
    .

IMPORTANT: This patch should be installed on ANY system that contains a Qlogic chip.
If the system boot sequence displays the word "Qlogic", then install this patch. Qlogic chips are built into many systems, and can also be found in many add-on options, such as: KFTIA, KZPBA, KZPDA, PZPSM, P1SE, P2SE, and others.

PROBLEM: (QAR 49540) (Patch ID: OSF375-068)
********
Infrequently, under heavy disk I/O loads, user data can be written to the wrong disk, resulting in data corruption.

PROBLEM: (HPXL85F5D) (Patch ID: OSF375-068)
********
In order to disconnect a tape drive from the bus for maintenance, Field Service first quiesced the bus on the HSZ40, then powered off the tape drive. When the tape drive was powered off, the HSZ40 rebooted, and the system then panicked with "simple_lock: time limit exceeded". The stack traces looked like this:

One CPU timed out waiting for the spo resource queue lock:

     9 panic("simple_lock: time limit exceeded") [src/kernel/bsd/subr_prf.c:757]
    10 simple_lock_fault() [src/kernel/kern/lock.c:1794]
    11 simple_lock_time_violation(()) [src/kernel/kern/lock.c:1863]
    12 spo_add_to_resource_q() [src/kernel/io/cam/spo/simport.c:1063]
    13 spo_action() [src/kernel/io/cam/spo/simport.c:624]

while another CPU continued to hold the lock:

     1 panic("cpu_ip_intr: panic request") [src/kernel/bsd/subr_prf.c:727]
     2 cpu_ip_intr() [src/kernel/arch/alpha/cpu.c:487]
     3 _XentInt() [src/kernel/arch/alpha/locore.s:961]
     4 spo_immediate() [src/kernel/io/cam/spo/simport.c:2156]
     5 spo_action_immediate() [src/kernel/io/cam/spo/simport.c:1026]
     6 spo_action() [src/kernel/io/cam/spo/simport.c:579]

PROBLEM: (HPXLB76FB) (Patch ID: OSF375-068)
********
When the system was under heavy load, the following group of 3 errors was logged into the error logger every few minutes:

    spo_verify_adap_sanity
    spo_misc_errors
    spo_bus_reset

The entire system would pause for up to 30 seconds, and then resume normal operation right before each group of 3 errors was logged.
PROBLEM: (QAR 45938, CLD HPAQ11CAJ) (Patch ID: OSF375-068)
********
The system panicked with a kernel memory fault while trying to remove an spo resource queue entry:

    10 panic("kernel memory fault")
    11 trap()
    12 _XentMM
    13 spo_remove_queue_entry
    14 spo_process_rsp

PROBLEM: (QAR 47247) (Patch ID: OSF375-068)
********
The system panicked with: "xpt_callback: callback on freed CCB"

     6 panic("xpt_callback: callback on freed CCB")
     7 xpt_callback() [src/kernel/io/cam/xpt.c:2420]
     8 spo_process_ccb() [src/kernel/io/cam/spo/simport.c:3548]
     9 spo_process_rsp() [src/kernel/io/cam/spo/simport.c:3467]

PROBLEM: (ZPOB91873, HGOBB0092) (Patch ID: OSF375-365052)
********
This patch provides additional event logging of Unit Attention messages by the SCSI/CAM disk driver to the binary.errlog file. It also provides additional details for hard errors logged after unsuccessful I/O recovery attempts, and provides informational messages on the progress of recovery events.

With this patch, Unit Attention messages are logged if they are received during:

    a) Attempts to bring a drive online
    b) Sense data requests for reads and writes
    c) Recovery from a failed Test Unit Ready command

PROBLEM: (MCPMA8909) (Patch ID: OSF375-350309)
********
When HSZ50 hardware is installed without this patch, the system can exhibit very slow performance. This happens because the HSZ50 is not defined in cam_data.c and devio.h, so the system does not take advantage of its more advanced capabilities.
PROBLEM: (Patch ID: OSF375-073)
********
This patch provides support for the TZ89, and latent support for:

    TZS20
    TZS2
    TLZ10
    TLZ1

PROBLEM: (mgo102246) (Patch ID: OSF375-365036)
********
After the following binary.errlog entries appear:

    sim_err_sm
    Target went to command phase
    sim94_intr
    Illegal command

the system can hang with the following typical stack trace:

     0 ss_process_timeouts() [src/kernel/io/cam/sim_sched.c:2572]
     1 softclock_scan() [src/kernel/bsd/kern_clock.c:1015]
     2 hardclock("") [src/kernel/bsd/kern_clock.c:839]
     3 _XentInt() [src/kernel/arch/alpha/locore.s:917]
     4 idle_thread() [src/kernel/kern/sched_prim.c:3009]

PROBLEM: (QAR 29649) (Patch ID: OSF375-365036)
********
This patch fixes a system panic: "xpt_callback: callback on freed CCB". The panic described in this QAR was caused by a bug in the SIM94 interrupt handler, where the target mode flag for a controller was set before a previous non-target mode request had completed.

PROBLEM: (QAR 56217, 57168) (Patch ID: OSF375-089, OSF375-098)
********
This patch provides the following support to Digital UNIX V3.2G:

- Support for the HSZ70 Raid controller on the Fast10 Wide Differential KZPSA adapter in cluster environments under V3.2G. Support of the HSZ70 Raid controller also requires the KZPSA firmware to be upgraded to at least the version distributed on the Version 5.0 AlphaServer Console Firmware CD-ROM.
- A performance regression fix for Qlogic isp1020/isp1040 chips.
- SCSI target mode fixes for ASE/TCR support on Qlogic, primarily for HSZ70 support.
- All modifications included in this patch are compatible with existing versions of KZPSA and Qlogic firmware.

PROBLEM: (QAR 51268) (Patch ID: OSF375-365068)
********
After a disk error occurs, mirror set switching may not happen soon enough to ensure high availability, or in some cases may not happen at all. The problem is that the timeout and retry mechanisms that go into action after a disk failure prevent LSM from being notified promptly that an error occurred.
PROBLEM: (Patch ID: OSF375-095)
********
This patch supplants Patch ID: OSF375-073 using the cam_data script. The cam_data script updates /usr/sys/data/cam_data.c without deleting any customer or user added devices. It adds support for the TZ89, TLZ10, TZS20 and HSZ50 devices.

WARNING: When this script is executed, it will modify the system's existing cam_data.c file.

PROBLEM: (QAR 52608) (Patch ID: OSF375-112)
********
This patch fixes a problem that occurs on AlphaServer 4100 systems. If no devices are attached to the KZPSA disk controller, the system may panic when attempting to perform I/O. This patch works around the problem by suppressing the "verify-adapter-sanity" command until a device has been attached to the disk controller.

PROBLEM: (Patch ID: OSF375-141)
********
This patch provides a set of workarounds for Qlogic firmware bugs. These bugs were encountered when using the HSZ70 Raid Array Controller on the KZPBA-CB wide differential UltraSCSI adapter in a dual-node cluster environment.

    o Better handle sensitive error recovery sequences during HSZ70 controller failover.
    o Handle Command Error (mailbox error code 0x4005) without resorting to chip reinit/bus reset.
    o Additional workarounds for version 5.53 target-mode firmware bugs.

Complete support for cluster environments also requires that the Qlogic adapter firmware version and the HSZ70 Raid Array controller firmware version are at least at the level documented in the HSZ70 Raid Controller Platform Kit.

All modifications included in this patch are compatible with existing versions of Qlogic firmware.

PROBLEM: (QAR 59059) (Patch ID: OSF375-161)
********
This patch fixes a problem that occurs when the KZPSA and KZTSA hardware resources needed to do I/O are unavailable, causing a large number of events to be logged. The system can become sluggish and sometimes crash. This problem is seen on 8400 and 4100 systems with limited hardware scatter-gather memory resources.
PROBLEM: (QAR 47111, QAR 58034, QAR 60575) (Patch ID: OSF375-204)
********
This patch fixes two problems.

The first problem occurs when a failed KZPSA adapter (hardware or driver failure) panics the kernel. You will observe this problem when the failed adapter continuously generates a miscellaneous error, eventually causing a kernel panic with messages similar to the following:

    KZPSA adapter misc error, asr=0x4, afar=0x4a03f158, afpr=0x20311
    pzaintr: KZPSA adapter misc error, asr=0x4, afar=0x4a03f158, afpr=0x20311

This patch causes the system to check for a failed device before allocating DMA resources, preventing the panic.

The second problem is that the SIMPORT code returns I/O with a CAM status of "NO HBA" when a miscellaneous adapter error occurs. This CAM status is incorrect because the adapter is re-initialized. The correct CAM status for such a condition is "CAM BUSY". The "NO HBA" status is returned only when the adapter cannot recover from the error (that is, when the re-initialization fails).
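
The status selection described for the second problem can be sketched in C as follows. This is only an illustration of the rule stated above, not the SIMPORT source: the enum, its values, and the function name status_after_misc_error are hypothetical names invented for this sketch, and they are not the constants or routines used by the CAM headers or the driver.

```c
#include <stdbool.h>

/*
 * Hypothetical status values, for illustration only. They stand in for
 * the "CAM BUSY" and "NO HBA" statuses named in the text and are NOT
 * the real CAM header constants.
 */
enum sketch_cam_status {
    SKETCH_CAM_BUSY,   /* adapter was re-initialized; the I/O can be retried */
    SKETCH_CAM_NO_HBA  /* adapter could not recover from the error */
};

/*
 * Choose the status for I/O that was outstanding when a miscellaneous
 * adapter error occurred. The only input is whether the adapter
 * re-initialization succeeded (assumed to be known by the caller).
 */
enum sketch_cam_status status_after_misc_error(bool reinit_succeeded)
{
    if (reinit_succeeded) {
        /* The adapter is back, so reporting "NO HBA" would be incorrect. */
        return SKETCH_CAM_BUSY;
    }
    /* Re-initialization failed: the adapter is really gone. */
    return SKETCH_CAM_NO_HBA;
}
```

In this sketch a caller would pass the outcome of the re-initialization attempt and place the returned value in the status field of each I/O request it hands back, so that upper layers retry on "CAM BUSY" and give up only on "NO HBA".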