PROBLEM: (82061) (PATCH ID: TCR510-002)
********
This patch fixes an occasional cluster hang that can occur after a Memory Channel error.

PROBLEM: (81500) (PATCH ID: TCR510-003)
********
This patch fixes a kernel memory fault in the ics_mct_ring_recv() routine. The fault is seen when a node is booting into the cluster, and can occur on the booting node or on another node.

PROBLEM: (GB_G01045, N/A) (PATCH ID: TCR510-023)
********
This patch fixes a problem in ICS where ring_recv() does not properly handle a change in channel numbers. The fix, in turn, improves validation of the connection structure on node joins.

PROBLEM: (TKTB10063) (PATCH ID: TCR510-042)
********
This patch fixes the way communication errors occur on clusters so that a down node does not declare all other nodes dead. It also ensures that communication errors occur in a timely fashion.

PROBLEM: (87806) (PATCH ID: TCR510-039)
********
This patch fixes a panic with the error message "CNX QDISK: Yielding to foreign owner with quorum". The panic is caused by a long-running thread, the ICS/MCT receive thread, which keeps other kernel threads from accessing the CPU.

PROBLEM: (BCGM8141P, BCGMC0CNQ, GOZ12089B, MGO30400A, ZUO44230A, UVO8654046, GB_G00916) (PATCH ID: TCR510-018)
********
This patch eliminates unnecessary rail failovers in vhub configurations and removes rmerror_int diagnostic messages.

PROBLEM: (85969) (PATCH ID: TCR510-028)
********
This patch fixes an issue that causes all cluster nodes to hang or panic if a wildfire is halted via the halt button.
A typical panic string would be:

    panic: mcs_lock: time limit exceeded

PROBLEM: (87049, 85012, 86195) (PATCH ID: TCR510-052)
********
This patch fixes a panic in a clustered environment that displays the following error message:

    rm_request_on_bad_prail

PROBLEM: (HPAQ517QB, BCGM526NS) (PATCH ID: TCR510-043)
********
This patch prevents an "ics_mct: Error from establish_RM_notification_channel" panic on clusters. A typical stack trace will look like:

(dbx) t
>  0 stop_secondary_cpu(do_lwc = (unallocated - symbol optimized away)) ["../../../../src/kernel/arch/alpha/cpu.c":1205, 0xfffffc00]
   1 panic(s = (unallocated - symbol optimized away)) ["../../../../src/kernel/bsd/subr_prf.c":1252, 0xfffffc0000294674]
   2 event_timeout(func = (unallocated - symbol optimized away), arg = (unallocated - symbol optimized away), timeout = (unallocate]
   3 printf(fmt = (unallocated - symbol optimized away)) ["../../../../src/kernel/bsd/subr_prf.c":940, 0xfffffc0000293a28]
   4 panic(s = (unallocated - symbol optimized away)) ["../../../../src/kernel/bsd/subr_prf.c":1309, 0xfffffc00002947a8]
   5 ics_mct_neg_vers(ni_ptr = 0xfffffc005e806480, thd_info = 0xfffffc003cfd5b00, ver = 0x0) ["../../../../src/kernel/tnc_common/tn]
   6 ics_mct_constr(nodeinfo_ptr = 0xfffffc005e806480) ["../../../../src/kernel/tnc_common/tnc_ics/ics_mct/ics_mct_llmgmt.c":607, 0]

PROBLEM: (QARs, 89757) (PATCH ID: TCR510-095)
********
This patch is a four-fold fix. It corrects the handling of a physical MC rail going offline, the handling of the master failover node going offline during a failover, the way ICS handles resends when MC errors take place, and failover when parity errors increase beyond the limit.

PROBLEM: (86307, 87354, 88058) (PATCH ID: TCR510-107)
********
This patch fixes hangs and improves the performance of Memory Channel ICS operation. The hang occurs when Memory Channel space runs out or when MC space is allocated for large messages. The performance problem sometimes led to timeouts.