PROBLEM: (89313, GB_G01781) (PATCH ID: TCR520-013)
********
This patch fixes a situation in which one or several cluster members
would panic if a Memory Channel cable was removed or faulty.


PROBLEM: (90195) (PATCH ID: TCR520-055)
********
This patch addresses the following Memory Channel issues in a
cluster environment:
         - fixes a problem in which powering off Memory Channel in a
           LAN interconnect cluster causes a cluster-wide panic,
         - allows a user to kill a LAN interconnect cluster via
           Memory Channel,
         - supports Memory Channel usage in a LAN cluster.


PROBLEM: (84876, 87656) (PATCH ID: TCR520-106)
********
This patch fixes problems that occur when the master failover node goes
offline during a failover, and when a failover is triggered by parity errors
increasing beyond the limit.
Some symptoms of the master failover node going offline:

One node in the cluster panics, and the other nodes hang.  The reason
for the panic may be anything, but it is important to note that the
cluster was in a failover during the panic.  This can be seen in the
crash dump:

rmerror_int: failover: mchan0 error_type = 0xe0000000 error_count = 0xba 
           time = 0x17e573c50 mcerr = 0x12020248 lcsr = 0xc07b 
            mcport = 0x16400000

The nodes that hang may display:

m_state_change: mchan0 slot 0 offline
rm slave: mchan0, hubslot = 1, phys_rail 1 removed
rm slave: mchan0, hubslot = 1, phys_rail 1 (size 512 MB)

depending on the timing and where in the code path the master was when
it failed.  If the other nodes are not reset before the panicked node is
rebooted, they may panic.  Those panics can be misleading and range from:

"panic (cpu 1): ics_unable_to_make_progress: input thread stalled" 

to a machine check.    
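
When checking a crash dump for the rmerror_int record shown earlier in this
entry, it can help to decode the hexadecimal fields.  The following is a
minimal sketch in Python, not part of the patch kit.  The function name
parse_rmerror is hypothetical, and the sketch only converts the values to
integers; it makes no assumption about what each field means.

```python
import re

# Crash-dump excerpt copied from this entry (wrapped across three lines).
DUMP = """rmerror_int: failover: mchan0 error_type = 0xe0000000 error_count = 0xba
           time = 0x17e573c50 mcerr = 0x12020248 lcsr = 0xc07b
            mcport = 0x16400000"""


def parse_rmerror(text):
    """Return (adapter, fields) for an rmerror_int failover record, or None.

    parse_rmerror is a hypothetical helper, not a system utility.
    """
    flat = " ".join(text.split())  # undo the line wrapping
    head = re.search(r"rmerror_int: failover: (\w+)", flat)
    if head is None:
        return None
    # Every field in the record has the form "name = 0x<hex digits>".
    fields = {
        name: int(value, 16)
        for name, value in re.findall(r"(\w+) = (0x[0-9a-fA-F]+)", flat)
    }
    return head.group(1), fields


adapter, fields = parse_rmerror(DUMP)
print(adapter, fields["error_count"])  # mchan0 186
```

The presence of this record only indicates that a failover was in progress at
the time of the panic; it does not by itself identify the cause of the panic.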

Some symptoms of the parity error limit being exceeded:

This is more difficult to diagnose, and if any of the following panics
are seen this may be the cause:

1. On one node: "panic (cpu 0): simple_lock: time limit exceeded", and on 
   one or more of the other nodes: PANIC: "ics_mct: Node arrival with node 
   in bad state"
2. PANIC: "cmn_err: CE_PANIC: ICS MCT Assertion failed: 
   total_fragments == ectx"
3. PANIC: "cmn_err: CE_PANIC: ICS MCT Assertion failed: lf != 0,file: 
   ics_mct_oolencoder.c"
4. panic (cpu 1): kernel memory fault
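
The panic strings listed above can be searched for in saved console output.
The following is a minimal sketch in Python, not part of the patch kit.  The
function name find_signatures and the sample text are hypothetical.  Because
console lines may wrap, as they do in the list above, whitespace is normalized
before matching.  A match is only a hint that the parity error limit may have
been exceeded; these panics can have other causes.

```python
# Panic strings quoted in the list above, with wrapped strings rejoined.
SIGNATURES = [
    "simple_lock: time limit exceeded",
    "ics_mct: Node arrival with node in bad state",
    "ICS MCT Assertion failed: total_fragments == ectx",
    "ICS MCT Assertion failed: lf != 0",
    "kernel memory fault",
]


def find_signatures(log_text):
    """Return the listed panic strings found in log_text.

    Matching ignores line wrapping and letter case.  find_signatures is a
    hypothetical helper, not a system utility.
    """
    flat = " ".join(log_text.split()).lower()
    return [sig for sig in SIGNATURES if sig.lower() in flat]


# Hypothetical console excerpt, assembled from the strings listed above.
sample = """panic (cpu 0): simple_lock: time limit exceeded
PANIC: "ics_mct: Node arrival with node
   in bad state"
"""
for sig in find_signatures(sample):
    print(sig)
```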


PROBLEM: (IT_G03453) (PATCH ID: TCR520-132)
********
This fix addresses a problem in which a bad Memory Channel cable causes a 
cluster member to panic with a panic string of "rm_eh_init" or 
"rm_eh_init_prail". The problem occurs in dual-rail Memory Channel cluster
configurations after a cable problem causes the cluster to mark the first rail
as bad and fail over to the second rail.  If the Memory Channel code later 
decides to mark the first rail as okay and additional cable problems occur, a 
cluster member may panic as described above.


PROBLEM: (92909) (PATCH ID: TCR520-152)
********
This patch contains changes intended to make Memory Channel failovers more
reliable.  It also handles bad optical cables.  The symptoms of bad optical
cables are an impossible number of state change interrupts or out-of-bounds
hubslot identification numbers being passed to the state change interrupt
handler.


PROBLEM: (92318) (PATCH ID: TCR520-134)
********
This patch fixes a problem in which a node booting into a cluster hangs
during Memory Channel initialization.  This problem may occur in a
heavily loaded cluster when the threads associated with the
Memory Channel logical rails are blocked while a member is booting.  There
is no deterministic console message; the cluster simply gets stuck
during the boot.  In past occurrences, the cluster got stuck after
displaying:
rm slave: mchan0, hubslot = 7, phys_rail 0 (size 512 MB)
rm slave: mchan1, hubslot = 7, phys_rail 1 (size 512 MB)
rm slave: log_rail 0 (size 512 MB), phys_rail 1 (mchan1)


PROBLEM: (95004) (PATCH ID: TCR520-218)
********
In a dual-rail Memory Channel cluster, when one node initiates failover,
another node (typically a TLASER) may crash with a KERNEL MEMORY FAULT panic:
   0 boot
   1 panic
   2 trap
   3 _XentMM
   4 rm_get_lock_master
   5 rm_error_cluster_sync
   6 rm_slave_failover
   7 rm_failover_request_int
   8 rm_prail_int
   9 rm_int
  10 Mchan_isr
  11 intr_dispatch_post
  12 _XentInt


PROBLEM: (93962) (PATCH ID: TCR520-180)
********
When parity errors increase beyond the error rate threshold in a single
physical rail configuration, the Memory Channel driver flags the
rail as 'noisy' and attempts to fail over.  In the single-rail configuration
there is no failover rail, so this action causes the entire cluster
to panic.  With this fix, only the node whose error rate exceeded the
threshold panics.
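
The behavior described above can be illustrated with a small decision
function.  This is a sketch in Python for illustration only.  It is not the
driver's code, and the names on_threshold_exceeded, online_rails, and
noisy_rail are hypothetical.

```python
def on_threshold_exceeded(online_rails, noisy_rail):
    """Choose an action once noisy_rail exceeds the error rate threshold.

    Returns ("failover", target_rail) when another rail is available, or
    ("panic_local_node", None) when the noisy rail is the only one.
    All names here are hypothetical and used for illustration only.
    """
    candidates = [rail for rail in online_rails if rail != noisy_rail]
    if candidates:
        # Dual-rail case: a failover rail exists, so fail over to it.
        return ("failover", candidates[0])
    # Single-rail case: there is nothing to fail over to.  Only the node
    # that detected the errors panics, not the entire cluster.
    return ("panic_local_node", None)


print(on_threshold_exceeded(["mchan0"], "mchan0"))
print(on_threshold_exceeded(["mchan0", "mchan1"], "mchan0"))
```

The first call models the single-rail configuration and the second models a
dual-rail configuration in which mchan1 is available as the failover rail.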


PROBLEM: (94360) (PATCH ID: TCR520-181)
********
The Memory Channel driver leaves stale data on an offline physical rail.  If
a node is rebooted while this physical rail is offline, and the physical
rail then comes back online, the rebooted node will panic with:
panic (cpu 1): memory channel - cluster still thinks node is member
A cluster reboot is then required before this node can join the cluster.


PROBLEM: (95052) (PATCH ID: TCR520-233)
********
A debug kernel [built non-optimized] can panic with a kernel memory fault:
   0 panic
   1 trap
   2 _XentMM
   3 rm_notif_request
   4 rm_lrail_int_ctx
   5 rm_int
   6 Mchan_isr
   7 intr_dispatch_no_post
   8 _XentInt


PROBLEM: (95794) (PATCH ID: TCR520-253)
********
An error in event log indexing may result in the appearance of superfluous
"rm_event, index too big" messages on the system console.


PROBLEM: (92102, 94878, 94910, 94988,
          95476, 95669) (PATCH ID: TCR520-208)
********
In a Memory Channel cluster, rebooting a node without performing a hardware
reset can crash other members with an RM_AUDIT_ACK_BLOCK panic:
   0 boot
   1 panic
   2 rm_crash_node_mask
   3 rm_panic
   4 rm_audit_ack_block
   5 rm_write_sync
   6 rm_get_errcnt_lock
   7 rm_lock_global_error
   8 rm_eh_init_shared_data_req
   9 rm_prail_int
  10 rm_int
  11 Mchan_isr
  12 _XentInt


PROBLEM: (94391, 89889) (PATCH ID: TCR520-225)
********
This fixes issues associated with the initialization of the Memory Channel
driver.  The fix addresses an issue whereby the driver's internal data
structures become inconsistent during boot, and this inconsistency is 
subsequently propagated to other nodes.   It also adds resiliency during boot.  
Some of the symptoms the fix addresses are listed below.
PANIC: "rm_prail_boot_signal_any_node:Fix configuration and reboot"
PANIC: "ics_mct: Error from establish_RM_notification_channel"
PANIC: "ics_mct: Error from register_RM_notification_callback"
PANIC: "ics_mct: Error from send_RM_notification"