
PROBLEM:  (CFS.68084) (PATCH ID: TCR160-029)
********
If two nodes in a cluster are communicating through the mc-api (for example,
running an MPI application) and a third node that is not involved in the
calculation is rebooted, the first two nodes can hang. A reboot is required
to clear the hang.

When a node enters or leaves the cluster (boots or shuts down), the Memory
Channel failover code is invoked. The failover code attempted to take a
high-level lock that, under certain circumstances, was held by another
process. The mc-api failover code would then wait forever in kernel mode,
resulting in a system hang. The problem was fixed by changing the mc-api
failover code to take the lower-level lock.
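
As an illustration only, the hang can be modeled in user space with Python's
threading module. The lock names, failover() and run_model() are hypothetical
stand-ins, not kernel identifiers, and the timeout exists only so that the
model terminates; the kernel code waited indefinitely.

```python
import threading

# Hypothetical stand-ins for the two kernel locks; the real lock names
# are not given in the patch description.
high_level_lock = threading.Lock()
low_level_lock = threading.Lock()


def failover(lock, timeout=0.2):
    """Model the failover path: try to take `lock` and report success.

    The timeout is an artifact of the model. The kernel code had no
    timeout, so a failed acquire here corresponds to a permanent hang.
    """
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


def run_model():
    # Another process holds the high-level lock while a node reboots.
    high_level_lock.acquire()
    try:
        before_fix = failover(high_level_lock)  # blocks: lock is held
        after_fix = failover(low_level_lock)    # succeeds: lock is free
    finally:
        high_level_lock.release()
    return before_fix, after_fix
```

In this model, run_model() reports that the attempt on the high-level lock
fails while the attempt on the lower-level lock succeeds, which mirrors the
behavior before and after the patch.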

PROBLEM:  (76608, 76806) (PATCH ID: TCR160-050)
********
This patch fixes a problem that can cause a panic in mcs_wait_cluster_event()
when using the Memory Channel API.  The following is an example stack trace
from a lockmode=4 panic:

  0 boot()
  1 panic()
  2 simple_lock_fault()
  3 simple_unlock_count_violation()
  4 mcs_wait_cluster_event()
  5 mcs_configure()
  6 kmodcall()
  7 syscall()
  8 _Xsyscall()

PROBLEM: (CFS.76473) (PATCH ID: TCR160-064)
********
This patch fixes a problem with the Memory Channel API in which, under
certain circumstances, an mc-api lock held by a node that crashes is not
released after the crash. For the problem to occur, there must be three or
more nodes in the cluster, and the node that handles the cleanup after a
node crashes (known as the primary mapper) must not have the lock allocated.
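
The failure condition can be sketched with a small Python model. This is an
assumption-laden illustration, not the kernel implementation: the patch
description does not say how cleanup is coded or how it was corrected, so
cleanup_mapper_only(), cleanup_all(), the node names and the lock name are
all hypothetical.

```python
def cleanup_mapper_only(owners, mapper_allocated, crashed):
    """Assumed pre-patch behavior: the primary mapper frees only those
    locks of the crashed node that the mapper itself has allocated."""
    return {
        name: None if owner == crashed and name in mapper_allocated else owner
        for name, owner in owners.items()
    }


def cleanup_all(owners, crashed):
    """Assumed post-patch behavior: every lock owned by the crashed
    node is freed, whether or not the mapper has it allocated."""
    return {
        name: None if owner == crashed else owner
        for name, owner in owners.items()
    }


# Three-node scenario: node A crashes holding "job_lock", node B also uses
# the lock, and node C is the primary mapper without the lock allocated.
owners = {"job_lock": "A"}
stuck = cleanup_mapper_only(owners, mapper_allocated=set(), crashed="A")
freed = cleanup_all(owners, crashed="A")
```

In the model, stuck still records node A as the owner of job_lock, so node B
could never acquire it, while freed shows the lock with no owner.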



