PROBLEM: (TKTR12614) (Patch ID: TCR150-003)
********
This patch fixes a segmentation fault that can cause ASE daemons to exit or
hang. The daemon may write an error message to the daemon.log similar to the
following from the ASE agent:

    Agent Warning: aseagent exiting on segmentation fault...

The daemon then exits or hangs. In the latter case, the 'ps' command shows
the daemon in the run state.

PROBLEM: (EVT102532) (Patch ID: TCR150-009)
********
This patch fixes a problem in version 1.5 of the TruCluster Production Server
and Available Server products where, during the start of a service, missing
special device files were not being created for HSZ disks. Since the special
device files did not get created, the service start would fail.

PROBLEM: (MCGM21LWR) (Patch ID: TCR150-011)
********
This patch fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and Production Server software. When
the message queue fills, the following message is entered in the daemon.log
file, but the queue is not emptied:

    msgSvc: message queue overflow, LOST MESSAGE!

From this point on, no further messages will be received.

PROBLEM: (BRO101102 & STLQ45901) (Patch ID: TCR150-017)
********
This patch fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.

PROBLEM: (DEKB31190) (Patch ID: TCR150-018)
********
This patch fixes a problem that caused the asedirector to core dump if asemgr
processes were modifying services from more than one node in the cluster at
the same time.

PROBLEM: (MCGM910WB & GOZ100924 & DEKQC0187) (Patch ID: TCR150-020)
********
This patch fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server and TruCluster Production Server products. The
problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
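
The interface-counter problem described under TCR150-017 can be illustrated
with a small sketch. This is not the asehsm implementation; it only assumes
that a monitor polls cumulative packet counters and infers link activity from
the change between two polls. The names Counters and interface_status, and
the "RESET" result, are hypothetical. The point is that counters moving
backwards indicate a reset and should not be read as a dead interface.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Cumulative packet counters sampled from one network interface."""
    packets_in: int
    packets_out: int


def interface_status(prev: Counters, curr: Counters) -> str:
    """Classify one polling interval as "UP", "DOWN", or "RESET".

    A monitor that only checks "did the counters grow?" would report
    DOWN when the counters are zeroed. Here a backwards step is treated
    as a reset: the caller should take curr as the new baseline and
    wait for the next poll before reporting anything.
    """
    if curr.packets_in < prev.packets_in or curr.packets_out < prev.packets_out:
        return "RESET"
    delta = (curr.packets_in - prev.packets_in) + (curr.packets_out - prev.packets_out)
    return "UP" if delta > 0 else "DOWN"


if __name__ == "__main__":
    before = Counters(packets_in=1500, packets_out=900)
    print(interface_status(before, Counters(1600, 950)))  # traffic seen
    print(interface_status(before, Counters(1500, 900)))  # no traffic
    print(interface_status(before, Counters(0, 0)))       # counters zeroed
```

Returning a distinct value for the reset case keeps the decision about what
to report in the caller, which can simply skip one polling interval.
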
PROBLEM: (HPAQB1Q35) (Patch ID: TCR150-023)
********
An ASE service that has an ASP with the preferred member set to one node, and
with the option to relocate to the favored member when it becomes available
turned on, does not return to the favored member after the network interface
returns to normal. The relocation failed when a director was located on the
favored member.

PROBLEM: (KAOQ34551) (Patch ID: TCR150-023)
********
This patch fixes a problem in which the failure of a monitored network
interface, in an ASE where a non-monitored interface (such as Memory Channel)
is still intact, renders the asemgr unable to contact a director located on
another node. This patch also fixes the status of the agent on the
disconnected node being reported as "UNKNOWN". The agent status is now
reported correctly as "KNOWN".

PROBLEM: (DEKB50651) (Patch ID: TCR150-024)
********
This patch fixes a problem that could cause the ASE daemons or asemgr utility
to core dump with a segmentation violation. The core file that gets created
has a corrupted stack, so debugging tools, such as dbx, cannot properly
initialize, and the core files cannot be analyzed.

PROBLEM: (QAR 62571) (Patch ID: TCR150-027)
********
In an ASE environment, modifying a service while another process is, at the
same time, attempting to obtain service status using asemgr could corrupt the
ASE configuration database for that service. The result of the database
corruption is a loss of LSM and physical disk information in the database for
that service. The symptoms are problems relocating the service and
"grep: can't open" messages when checking the status of the service.

PROBLEM: (UVO106363, UVO106491) (Patch ID: TCR150-062)
********
This patch corrects the following problems:

o Initializing agents fail to respond to RPC calls from other ASE daemons.
o Daemons hang in select while messages waiting for service go undelivered.
o ASE menu options were added to set DRD permissions, owner and group.
o Fixed one case in which Member Add failed.
o Changed HSM_INQ_TIMEOUT in the director control library to 30 seconds.
o Set the HSM_HOST_LIST timeout in the agent consistently.
o Two problems are fixed in the asedirector. The first is an ASE command
  timeout problem encountered by large ASE services. The second is an
  incorrect decision made by the asedirector as a result of a failed inquire
  services command.

  - The asedirector has static timeout values for some commands to aseagent
    processes. Certain commands take much longer to complete because they
    involve inquiring about large services. As a result, large ASE services
    can encounter a premature timeout on the following commands, as shown in
    the daemon.log:

        ASE_STOP_ALL
        ASE_DELETE_ALL
        ASE_ADD_ALL
        ASE_DELETE_MEMBER
        ASE_REVERT_DB

    This patch scales these command timeouts relative to the ASE service
    size, eliminating the premature timeout.

  - When the asedirector inquires about the status of services on a member,
    the command or the service inquiry can fail. In this rare situation the
    director can incorrectly start a duplicate instance of a service on
    another node. With this fix, if the asedirector receives a failure, it
    assumes the service is running on that member. It then either stops that
    service before restarting it, or does not start it until it gets a
    correct response.

    The most common reason for this problem is a timeout from the aseagent
    of a check action script for a service. A less likely reason is an
    asedirector timeout on the ASE_INQ_SERVICES command. Both timeouts can
    be found in the daemon.log file.
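
The two asedirector fixes above can be sketched in a few lines. This is an
illustration under stated assumptions, not the asedirector code: the function
names, the Inquiry enumeration, the linear per-disk scaling, and the numeric
values are all hypothetical. The first function shows a timeout that grows
with service size instead of staying static; the others show the conservative
rule that a failed inquiry is treated as "service may be running there".

```python
from enum import Enum
from typing import Dict, List


class Inquiry(Enum):
    """Outcome of asking one member whether a service is running."""
    RUNNING = "running"
    NOT_RUNNING = "not_running"
    FAILED = "failed"  # command error or timeout: the real state is unknown


def scaled_timeout(base_seconds: int, per_disk_seconds: int, disk_count: int) -> int:
    """Timeout that grows with service size (assumed linear in disk count)."""
    return base_seconds + per_disk_seconds * max(disk_count, 0)


def assumed_running_on(results: Dict[str, Inquiry]) -> List[str]:
    """Members where the service must be assumed to be running.

    A FAILED inquiry is counted the same as RUNNING, so the caller either
    stops the service there first or waits for a correct response.
    """
    return sorted(m for m, r in results.items() if r is not Inquiry.NOT_RUNNING)


def safe_to_start_elsewhere(results: Dict[str, Inquiry]) -> bool:
    """True only if every member positively answered NOT_RUNNING."""
    return not assumed_running_on(results)


if __name__ == "__main__":
    results = {"node1": Inquiry.FAILED, "node2": Inquiry.NOT_RUNNING}
    print(scaled_timeout(base_seconds=60, per_disk_seconds=2, disk_count=40))
    print(assumed_running_on(results))
    print(safe_to_start_elsewhere(results))
```

Treating an unknown answer as "running" trades some availability for safety:
a start may be delayed, but two instances of the same service are never
started on the strength of a missing reply.
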