PROBLEM: (UVO106363) (Patch ID: TCR160-002)
********
This patch fixes two problems in the asedirector. The first is an ASE command timeout problem encountered by large ASE services. The second is an incorrect decision made by the asedirector as a result of a failed inquire services command.

1) The asedirector has static timeout values for some commands to aseagent processes. Certain commands take much longer to complete because they involve inquiring about large services. As a result, large ASE services could encounter a premature timeout on the following commands, as shown in the daemon.log:

ASE_STOP_ALL, ASE_DELETE_ALL, ASE_ADD_ALL, ASE_DELETE_MEMBER, ASE_REVERT_DB.

This patch sets these command timeouts relative to the ASE service size, eliminating the premature timeout.

2) When the asedirector inquires about the status of services on a member, the command or the service inquiry may fail. In this rare situation the director may incorrectly start a duplicate instance of a service on another node. With this patch, if the director receives a failure, it assumes the service is running on that member. It then either stops that service before restarting it, or does not start it at all until it gets a correct response. The most common cause of this problem is an aseagent timeout on a check action script for a service. A less likely cause is an asedirector timeout on the ASE_INQ_SERVICES command. Both timeouts can be found in the daemon.log file.

PROBLEM: (NONE) (Patch ID: TCR160-009)
********
This patch improves startup performance of start scripts. The improvement will most likely not be noticeable in small TCR environments. In large environments with many services, it reduces the number of system calls needed to start the services.
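
The two fixes in patch TCR160-002 above can be illustrated with a minimal sketch. This is not the asedirector code: the function names, the base timeout, and the per-device allowance are all hypothetical values chosen for illustration. It only shows the general idea of a timeout that grows with service size and of treating a failed inquiry as "service running".

```python
# Hypothetical sketch; names and constants are illustrative assumptions,
# not values taken from the ASE sources.

BASE_TIMEOUT_SECS = 60   # assumed static timeout used before the patch
PER_DEVICE_SECS = 2      # assumed extra allowance per device in the service


def command_timeout(device_count):
    """Return a command timeout that grows with the size of the service."""
    return BASE_TIMEOUT_SECS + PER_DEVICE_SECS * max(device_count, 0)


def service_may_be_running(inquiry_result):
    """Interpret an inquire-services reply conservatively.

    inquiry_result is True (running), False (not running), or None when
    the inquiry failed or timed out. A failed inquiry is treated as
    "running" so that no duplicate instance is started on another member.
    """
    if inquiry_result is None:
        return True
    return inquiry_result
```

Under these assumptions, a service with 200 devices gets a timeout of command_timeout(200) seconds rather than the fixed base value, and service_may_be_running(None) returns True, so a start on another member is deferred until a correct response arrives.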

PROBLEM: (BRO101102, STLQ45901) (Patch ID: TCR160-016)
********
This patch fixes a problem where the Host Status Monitor (asehsm) incorrectly reports a network down (HSM_NI_STATUS DOWN) if the counters for the network interface get zeroed.

PROBLEM: (BCGM41G5B) (Patch ID: TCR160-007)
********
This patch fixes a situation in which an aseagent could start to loop and the ASE service would not be successfully relocated. The problem was only seen in a rare situation with many devices and extremely long start/stop scripts.

PROBLEM: (UVO106551) (Patch ID: TCR160-021)
********
This patch corrects a problem in which a member add will fail in a large ASE environment.

PROBLEM: (BCGM814X8) (Patch ID: TCR160-024)
********
This patch corrects a problem in TruCluster Available Server or Production Server clusters in which services were started with an elevated priority and scheduling algorithm. Under significant load this could lead to intermittent network and cluster problems. To see this behavior, the customer would have to have changed the PRIOPT variable in /sbin/init.d/asemember from "-p hsm" to "-p all" so that the aseagent would be started with the round-robin scheduling policy.

PROBLEM: (EVT102865, QAR74640) (Patch ID: TCR160-025)
********
This patch fixes a problem which caused a service not to start when there was a short network failure. asemgr would report the service as running on one node, but the service was just stopped and never restarted or relocated. This was seen only with long-running stop scripts (e.g. an Oracle shutdown) and with multiple network interfaces configured. The ni_status_awk had to be modified so that a failure of a single network adapter caused a service relocation. This patch makes sure that the services get relocated or restarted based on the configured ASP.

PROBLEM: (BCSM81P5G) (Patch ID: TCR160-022)
********
This patch corrects a problem which causes asemgr to core dump with a "Segmentation fault" when modifying a single drd service to add more than 200 devices. The following is a representative stack trace:

DBX> t
>  0 db_free_DB(0x1200386fc, 0x1419aaa00, 0x120020e50, 0x14002e668, 0x1400078c0) [0x120038380]
   1 db_modify_group(0x11fffd7106d, 0x14000caf8, 0x11fffe190, 0x11fffe090, 0x0) ["../../../../../../src/usr/sbin/ase/asemgr/db_edit.c":2992, 0x120020e4c]

PROBLEM: (76524) (PATCH ID: TCR160-031)
********
This patch fixes an aseagent and asehsm segmentation fault in multi-volume NFS or Disk service configurations. The problem has been seen during ASE startup.

PROBLEM: (69889) (PATCH ID: TCR160-033)
********
This patch fixes a problem where the asemgr will hang as it continuously creates and kills multiple directors. The problem occurs because the message queue management assumes that messages are enqueued in the same order as the events they report. The fix is to discard a message if it is dequeued after the host from which it came has gone down and the associated channel structure has already been freed.

PROBLEM: (70239) (PATCH ID: TCR160-035)
********
This patch corrects a problem that causes the ASE director to core dump during initialization.

PROBLEM: (TBD) (PATCH ID: TCR160-042)
********
This patch corrects a problem where modifying a service will fail under the following conditions:

- The service is a DRD service.
- You are adding more DRDs.
- The system-wide per-process data size is too small.

A message similar to the following will be seen in the daemon.log:

Mar 6 14:58:22 mulder ASE: muldermc AseMgr ***ALERT: Could not malloc
Mar 6 14:58:22 mulder ASE: muldermc AseMgr Error: Out of memory loading Database.
Mar 6 14:58:22 mulder ASE: muldermc AseMgr ***ALERT: BUG NOTICE: Exit before finishing unmarshal_tree

The asemgr may also core dump.
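
The message-discard fix described for patch TCR160-033 above can be sketched roughly as follows. This is an illustrative model only, not the ASE implementation: the queue layout, the drain function, and the live_channels set are hypothetical stand-ins for the real message queue and channel structures.

```python
from collections import deque


def drain(queue, live_channels):
    """Process queued (host, payload) messages in order.

    live_channels is a hypothetical set of hosts whose channel structure
    still exists. A message from a host that has gone down and whose
    channel has been freed is stale, so it is discarded instead of being
    handled.
    """
    handled = []
    while queue:
        host, payload = queue.popleft()
        if host not in live_channels:
            continue  # stale message: its channel is already gone
        handled.append((host, payload))
    return handled
```

For example, if messages from hosts "a" and "b" are queued and only "a" still has a live channel, drain returns only the messages from "a" and leaves the queue empty, so a late message from a downed host no longer triggers further action.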

PROBLEM: (BCPMA2116, N/A) (PATCH ID: TCR160-043)
********
This patch fixes a problem where the MEMBER_STATE variable is always shown as BOOTING instead of RUNNING. After first installing TCR, there is no way for scripts to determine the MEMBER_STATE. This problem is cleared on a reboot.

PROBLEM: (KAOQ34551) (PATCH ID: TCR160-051)
********
If a network cable failure on a monitored network is corrected in less than 7 seconds, the services could be left in a state where ASE says they are running, but they really are not.

PROBLEM: (HPAQ21V5C) (PATCH ID: TCR160-053)
********
This patch fixes a problem that caused the asemgr to get a memory fault when adding multiple services in a row.

PROBLEM: (71424) (PATCH ID: TCR160-036)
********
This patch fixes a problem with extraneous compiler warnings about strdup() function calls from ASE.

PROBLEM: (73570) (PATCH ID: TCR160-047)
********
This patch fixes a problem that caused the asemgr utility to not run when called from a program that is owned by root and has the setuid bit turned on.

PROBLEM: (BCSM81HH7) (PATCH ID: TCR160-028)
********
This patch fixes a problem that can cause the Cluster MIB daemon (cnxmibd) to core dump in Available Server environments.

PROBLEM: (BCGM31BBV1) (PATCH ID: TCR160-052)
********
This patch fixes a problem which caused an error message to be logged for the cnxmibd even though no error had occurred. The error message was:

**ERROR cnxmib_mthd.c line 855: initDirectorCntl failed!

PROBLEM: (EVT256426, N/A) (PATCH ID: TCR160-065)
********
This patch fixes two similar bugs. The first is that when one member of a cluster is brought up with ASE off, other members report it as UP and RUNNING instead of UP and UNKNOWN. The second is that when a restricted service is running on a member, and 'asemember stop' and 'aseam stop' are executed, the service status is still reported as the member name, instead of Unassigned.

PROBLEM: (70238) (PATCH ID: TCR160-066)
********
This patch fixes a problem where timeout values of greater than 30 seconds in /etc/hsm.conf would cause the ASE agent to fail at startup.

PROBLEM: (HPAQ40728, N/A) (PATCH ID: TCR160-058)
********
This patch fixes a bug where the aseagent will occasionally core dump on a SCSI bus hang. While there is no expected behavior during a hang, the aseagent should not core dump.

PROBLEM: (74548) (PATCH ID: TCR160-060)
********
This patch fixes a problem that caused the asemgr to report the wrong status for a service. When a device path failure occurs, the ASE director attempts to restart the service on another member system. If the director could not find another member that could run the service, it reported an error, but did not change the status. This meant that the asemgr would report that the service was still running on the original member, when in fact the status should have been "unassigned."

PROBLEM: (61554) (PATCH ID: TCR160-054)
********
This patch fixes three problems with the clu_ivp script. The script now checks to be sure that the cluster members are listed in the /etc/hosts file, and it no longer copies /var/adm/messages to /tmp. Copying the messages file to /tmp could result in the filesystem becoming full, and clu_ivp exiting with an error. The clu_ivp script now also checks the /var/adm/messages file for shared busses if none are listed in the configuration file.

PROBLEM: (69874) (PATCH ID: TCR160-057)
********
This patch fixes a problem that could cause the asedirector to core dump.

PROBLEM: (74383) (PATCH ID: TCR160-059)
********
This patch fixes a problem that caused the asemgr to report that a disk, or mount point, was in multiple services when modifying a service name.