PROBLEM: (QAR 49188) (Patch ID: TCR141-006)
********
This fixes a problem with the Cluster Monitor (cmon) in which the display of ASE service status and node UP/DOWN status may have been incomplete or incorrect. The cause is asymmetric connections between tract daemons: not every daemon is bi-directionally connected to every other daemon. ASE status reports then flow asymmetrically between cluster nodes, so that instances of the Cluster Monitor running on two different nodes in the cluster show different information.

Careful examination of the user.log file in the syslog hierarchy on each cluster node will often reveal that not all tract daemons are interconnected with all other tract daemons. The connect messages are printed at system boot time and so appear near the top of the user.log file.

This condition can always be cleared by killing and restarting the tract daemons and Cluster Monitor (cmon) programs, then restarting the submon daemons, then after 60 seconds restarting the cmon programs.

A fairly consistent way to reproduce this problem is to halt a cluster node from multi-user operation using the halt button, then reboot it. This will normally leave one or more cluster nodes with asymmetric connections to the rebooted node's tract daemon. Again, this condition is cleared by the restart procedure described in the installation instructions.

PROBLEM: (QAR 44746, QAR 39919) (Patch ID: TCR141-006)
********
These QARs both report the same problem: a complete depletion of system socket resources, the result of tractd daemons doing repeated connect retries. This problem is seen most commonly when all nodes in a three- or four-node cluster are booted simultaneously. It is also fixed by the patch described above as fixing problem QAR 49188.

The socket depletion problem is noticed when socket-based operations such as rsh and rlogin fail on the afflicted host.
In these circumstances the user.log file contains many (tens or more) tract daemon interconnection retry messages. This condition can always be cleared by killing and restarting the tract daemons and Cluster Monitor (cmon) programs, then restarting the submon daemons, then after 60 seconds restarting the cmon programs.
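
The asymmetric-connection condition described under QAR 49188 can be illustrated with a small sketch. It is not part of the patch or of any TruCluster tool. It assumes the connect messages from each node's user.log have already been reduced by hand to (source, destination) pairs; the node names and the function name `missing_links` are hypothetical, and the actual log message format is not reproduced here.

```python
from itertools import permutations


def missing_links(nodes, connections):
    """Return every directed (src, dst) pair that is NOT in `connections`.

    A fully interconnected cluster has a connection in both directions
    between every pair of nodes, so an empty result means the daemons
    are symmetrically connected.
    """
    observed = set(connections)
    return sorted(pair for pair in permutations(nodes, 2) if pair not in observed)


# Hypothetical three-node cluster; pairs transcribed by hand from user.log.
nodes = ["node1", "node2", "node3"]
connections = [
    ("node1", "node2"), ("node2", "node1"),
    ("node1", "node3"), ("node3", "node1"),
    ("node2", "node3"),  # the reverse link node3 -> node2 was never logged
]

print(missing_links(nodes, connections))
```

Any pair reported by the sketch corresponds to a one-way connection of the kind that makes Cluster Monitor displays differ between nodes.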