PROBLEM: (TKTR12614) (Patch ID: TCR150-003)
********
This patch fixes a segmentation fault that can cause ASE daemons to exit or hang. The daemon may write an error message to the daemon.log similar to the following from the ASE agent:

    Agent Warning: aseagent exiting on segmentation fault...

The daemon then exits or hangs. In the latter case, the 'ps' command shows the daemon in the run state.

PROBLEM: (EVT102532) (Patch ID: TCR150-009)
********
This patch fixes a problem in version 1.5 of the TruCluster Production Server and Available Server products where, during the start of a service, missing special device files were not being created for HSZ disks. Since the special device files did not get created, the service start would fail.

PROBLEM: (MCGM21LWR) (Patch ID: TCR150-011)
********
This patch fixes a problem in the message service routines used by the daemons in TruCluster Available Server and Production Server software. When the message queue fills, the following message is entered in the daemon.log file, but the queue is not emptied:

    msgSvc: message queue overflow, LOST MESSAGE!

From this point on, no further messages will be received.

PROBLEM: (BRO101102 & STLQ45901) (Patch ID: TCR150-017)
********
This patch fixes a problem where the Host Status Monitor (asehsm) incorrectly reports a network down (HSM_NI_STATUS DOWN) if the counters for the network interface get zeroed.

PROBLEM: (DEKB31190) (Patch ID: TCR150-018)
********
This patch fixes a problem that caused the asedirector to core dump if asemgr processes were modifying services from more than one node in the cluster at the same time.

PROBLEM: (MCGM910WB & GOZ100924 & DEKQC0187) (Patch ID: TCR150-020)
********
This patch fixes scalability problems in the DECsafe Available Server, TruCluster Available Server and TruCluster Production Server products. The problems caused the asemgr to core dump when adding or modifying services with a large number of disks.

PROBLEM: (HPAQB1Q35) (Patch ID: TCR150-023)
********
An ASE service whose ASP has the preferred member set to one node, with the option to relocate to the favored member when it becomes available turned on, does not return to the favored member after the network interface returns to normal. The failure occurred when a director was located on the favored member.

PROBLEM: (KAOQ34551) (Patch ID: TCR150-023)
********
This patch fixes a problem in which a failure of a monitored network interface in an ASE in which a non-monitored interface is still intact (i.e., Memory Channel) renders the asemgr unable to contact a director located on another node. This patch also fixes the agent on the disconnected node being reported with a status of "UNKNOWN". The agent status is now reported correctly as "KNOWN".

PROBLEM: (DEKB50651) (Patch ID: TCR150-024)
********
This patch fixes a problem that could cause the ASE daemons or asemgr utility to core dump with a segmentation violation. The core file that gets created has a corrupted stack, so debugging tools, like dbx, cannot properly initialize, and the core files cannot be analyzed.

PROBLEM: (DEKB50651) (Patch ID: TCR150-033)
********
This patch fixes a problem where the ASE management utility, asemgr, consumes increasing amounts of memory when invoked to add several services to the database at one time. Under certain circumstances it could consume all the available memory, causing allocation failures.

PROBLEM: (CA8KA0054) (Patch ID: TCR150-037)
********
The ASE agent daemon may core dump when it is opening a new channel. A message similar to the following will appear in the daemon.log:

    Oct 14 22:38:17 csgmc1 ASE: local Agent Warning: aseagent exiting on segmentation fault...
An example stack trace is as follows:

    DBX> t
       0 __kill()
       1 (unknown)()
       2 __tis_raise()
       3 raise()
       4 abort()
       5 segFault()
       6 strcpy()
    >  7 cb_channelClosed()
       8 msgSvcCloseChannel()
       9 serviceSelect()
      10 msgSvc_function()
      11 send_to_instance()
      12 process_external_events()
      13 msgSvcLoop()
      14 main()

PROBLEM: (TKTB13177) (Patch ID: TCR150-051)
********
This patch fixes a problem in which HOST_DISC was not recognized as an up and running state.

PROBLEM: (BCSM51M5N) (Patch ID: TCR150-032)
********
This patch fixes a problem where temporary files that ASE creates in /tmp during some service modifications are not properly cleaned up. It also fixes the unnecessary use of a temporary file to read the value of a sysconfigtab variable.

PROBLEM: (HGOQA0523) (Patch ID: TCR150-043)
********
This patch fixes a problem that can cause the asemgr utility to core dump when modifying services that contain a large number of disks.

PROBLEM: (IPMT CFS.57354) (Patch ID: TCR150-005)
********
This patch fixes a problem in the ASE API shared library that can cause Networker to core dump if there are no services defined in an ASE.

PROBLEM: (DMO100401) (Patch ID: TCR150-038)
********
This patch fixes a problem that can cause applications, like Networker, that use the shared library libaseapi.so to core dump when trying to get the cluster name.

PROBLEM: (UVO106244) (Patch ID: TCR150-048)
********
This patch fixes a problem in the ASE API shared library (libaseapi.so) that could cause Networker to core dump.

PROBLEM: (BCGM41G5B) (Patch ID: TCR150-056)
********
This patch fixes a situation in which an aseagent could start to loop and the ASE service would not be successfully relocated. The problem was only seen in a rare situation with many devices and extremely long start/stop scripts.

PROBLEM: (QAR 72592) (Patch ID: TCR150-061)
********
This patch corrects a problem in asemgr. Asemgr dumped core when multiple services were added in a single asemgr session.

PROBLEM: (TKTB13177, QAR 68973) (Patch ID: TCR150-060)
********
The primary problem fixed by this patch occurs when the last monitored network fails on a node running both the asedirector and a service: the asedirector and service fail to relocate to another node. Other problems fixed were the following:

- "HSM Bug: Invalid mesg type 6" error when sending an ASE_PRINT_STATUS message.
- Extraneous "xid_send: write failed" errors appearing in daemon.log.
- Messages from the agent and director were improved for situations in which a director will not start.

PROBLEM: (UVO106363, UVO106491) (Patch ID: TCR150-062)
********
This patch corrects the following problems:

o Initializing Agents fail to respond to RPC calls from other ASE daemons.
o Daemons hanging in select, while messages waiting for service go undelivered.
o ASE menu options were added to set DRD permissions, owner and group.
o Fixed one case in which Member Add failed.
o Changed HSM_INQ_TIMEOUT in the director control library to 30 seconds.
o Set the HSM_HOST_LIST timeout in agent consistently.
o Two problems are fixed in the asedirector. The first is an ASE command timeout problem encountered by large ASE services. The second is an incorrect decision made by the asedirector as a result of a failed inquire services command.

  - The asedirector has static timeout values for some commands to aseagent processes. Certain commands take much longer to complete because they involve inquiring about large services. As a result, large ASE services can encounter a premature timeout on the following commands, as shown in the daemon.log:

        ASE_STOP_ALL
        ASE_DELETE_ALL
        ASE_ADD_ALL
        ASE_DELETE_MEMBER
        ASE_REVERT_DB

    This patch adjusts these command timeouts relative to the ASE service size, eliminating the premature timeout.

  - When the asedirector inquires about the status of services on a member, the command or the service inquiry can fail.
    In this rare situation the director can incorrectly start a duplicate instance of a service on another node. With this fix, if the asedirector receives a failure, it assumes the service is running on that member. It then either stops that service before restarting it, or does not start it until it gets a correct response. The most common reason for this problem is a timeout from the aseagent of a check action script for a service. A less likely reason is an asedirector timeout on the ASE_INQ_SERVICES command. Both timeouts can be found in the daemon.log file.

PROBLEM: (BRO101372) (Patch ID: TCR150-063)
********
This patch fixes a problem that can cause the asemgr utility to core dump during the modification of a service when the IP addresses of the member systems have been changed.

PROBLEM: (HPAQ615S1) (Patch ID: TCR150-063)
********
This patch fixes a problem that caused the ASE daemons to core dump when an internet scanning tool (like SATAN) was used to scan TCP ports on a member system. If the scan was done by a system that was not known to the member system, the daemons would core dump while trying to report a security breach.

PROBLEM: (NONE) (Patch ID: TCR150-064)
********
This patch improves startup performance of start scripts. The improvement is most likely not seen in small TCR environments. In large environments with many services it will reduce the number of system calls needed to start the services.

PROBLEM: (UVO106551) (Patch ID: TCR150-068)
********
This patch corrects a problem in which a member add will fail in a large ASE environment.

PROBLEM: (HPAQ820WB,MGO47512VNO03879A,BRO91702A) (Patch ID: TCR150-071)
********
This patch corrects a problem with Networker displaying garbage characters following the service name.
If a service name is at least 8 characters long, doing a "save" in Networker will cause an error similar to the following:

    save: SYSTEM error, 'dataservM-@M-^?^C' is not a registered client

The service name in this case is "dataserv".

PROBLEM: (BCSM81P5G) (Patch ID: TCR150-073)
********
This patch corrects a problem which causes asemgr to core dump with a "Segmentation fault" when modifying a single DRD service to add more than 200 devices. The following is a representative stack trace:

    DBX> t
    >  0 db_free_DB(0x1200386fc, 0x1419aaa00, 0x120020e50, 0x14002e668, 0x1400078c0) [0x120038380]
       1 db_modify_group(0x11fffd7106d, 0x14000caf8, 0x11fffe190, 0x11fffe090, 0x0) ["../../../../../../src/usr/sbin/ase/asemgr/db_edit.c":2992, 0x120020e4c]

PROBLEM: (BCGM814X8) (Patch ID: TCR150-075)
********
This patch corrects a problem with a TruCluster Available Server or Production Server cluster in which services have been started with elevated priority and scheduling algorithm. Under significant load this could lead to intermittent network and cluster problems. In order to see this behavior, the customer would have to have changed the PRIOPT variable in /sbin/init.d/asemember from "-p hsm" to "-p all" so that the aseagent would be started with a round-robin scheduling policy.

PROBLEM: (EVT102865, QAR74640) (Patch ID: TCR150-076)
********
This patch fixes a problem which caused a service not to start when there was a short network failure. asemgr would report the service as running on one node, but the service was just stopped and never restarted or relocated. This was seen only with long running stop scripts (for example, an Oracle shutdown) and with multiple network interfaces configured. The ni_status_awk had to be modified so that a failure of a single network adapter caused a service relocation. This patch makes sure that the services get relocated or restarted based on the configured ASP.

PROBLEM: (BCPM91MP9) (Patch ID: TCR150-077)
********
This patch fixes a bug where ASE picks up an extra socket after failover.

PROBLEM: (DEKQC0014) (Patch ID:)
********
This patch corrects a problem which causes an aseagent to hang when restarting the ASE member. The other member will see the asemgr hang for an extended period of time while the member is starting up.

PROBLEM: (VNO41809A, MGO104006, VNO29021A) (Patch ID: TCR150-081)
********
This patch fixes the following TCR problems:

- After error events are processed, a timing hole exists whereby important events can be lost.
- After an HSZ controller failure, SCSI device reservations could get lost because the error events are not being ordered properly.

PROBLEM: (TBD) (Patch ID: TCR150-084)
********
This patch corrects a problem in which modifying a service fails under the following conditions:

- The service is a DRD service.
- You are adding more DRDs.
- The system-wide per-process data size is too small.

A message similar to the following will be seen in the daemon.log:

    Mar 6 14:58:22 mulder ASE: muldermc AseMgr ***ALERT: Could not malloc
    Mar 6 14:58:22 mulder ASE: muldermc AseMgr Error: Out of memory loading Database.
    Mar 6 14:58:22 mulder ASE: muldermc AseMgr ***ALERT: BUG NOTICE: Exit before finishing unmarshal_tree

The asemgr may also core dump.

PROBLEM: (73570) (Patch ID: TCR150-087)
********
This patch fixes a problem that caused the asemgr utility to not run when called from a program that is owned by root and has the setuid bit turned on.

PROBLEM: (KAOQ34551) (Patch ID: TCR150-091)
********
If a network cable failure on a monitored network is corrected in less than 7 seconds, the services could be left in a state where ASE says they are running, but they really aren't.

PROBLEM: (HPAQ21V5C) (Patch ID: TCR150-092)
********
This patch fixes a problem that caused the asemgr to get a memory fault when adding multiple services in a row.

PROBLEM: (70238) (Patch ID: TCR150-100)
********
This patch fixes a problem where timeout values of greater than 30 seconds in /etc/hsm.conf would cause the ASE agent to fail at startup.

PROBLEM: (EVT189463, N/A) (Patch ID: TCR150-099)
********
This patch fixes two similar bugs. The first is that when one member of a cluster is brought up with ASE off, other members report it as UP and RUNNING instead of UP and UNKNOWN. The second is that when a restricted service is running on a member, and 'asemember stop' and 'aseam stop' are executed, the service status is still reported as the member name, instead of Unassigned.

PROBLEM: (74383) (Patch ID: TCR150-096)
********
This patch fixes a problem that caused the asemgr to report that a disk, or mount point, was in multiple services when modifying a service name.

PROBLEM: (HPAQ40728, N/A) (Patch ID: TCR150-095)
********
This patch fixes a bug where the aseagent will occasionally core dump on a SCSI bus hang. While there is no expected behavior during a hang, the aseagent should not core dump.

PROBLEM: (NL_G00910) (Patch ID: TCR150-101)
********
This patch fixes a problem in which the ASE application reported an incorrect status while booting, after installation, or while re-initializing the database. The MEMBER_STATE reported by the user-defined action scripts was always BOOTING after first installing ASE or re-initializing the database, until a reboot or an asemember stop and start was completed.
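
Several of the problems above are recognizable by a message written to the daemon.log. As a rough aid for checking whether a system has hit any of them, the following Python sketch counts the lines of a log that contain one of the message fragments quoted in these notes. It is not part of the patch kit. The function names and the short labels are hypothetical, chosen only for this illustration, and the location of daemon.log is not stated in these notes, so the path is left as a parameter.

```python
from collections import Counter

# Message fragments copied from the problem descriptions above.
# The labels on the left are hypothetical names used only in this sketch.
SIGNATURES = {
    "segfault": "exiting on segmentation fault",
    "queue_overflow": "msgSvc: message queue overflow, LOST MESSAGE!",
    "hsm_bad_msg": "HSM Bug: Invalid mesg type",
    "xid_send": "xid_send: write failed",
    "malloc": "Could not malloc",
    "db_out_of_memory": "Out of memory loading Database",
}


def scan_lines(lines):
    """Return a Counter of how many lines contain each known fragment."""
    counts = Counter()
    for line in lines:
        for label, fragment in SIGNATURES.items():
            if fragment in line:
                counts[label] += 1
    return counts


def scan_file(path):
    """Scan a daemon.log file; the caller supplies its path."""
    # errors="replace" keeps the scan going if the log holds stray bytes.
    with open(path, errors="replace") as fh:
        return scan_lines(fh)
```

Calling scan_file() with the path of a daemon.log returns the per-message counts. A non-zero count only indicates that a matching message was logged; it does not by itself confirm which patch applies.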