3    Summary of TruCluster Software Patches

This chapter summarizes the TruCluster software patches included in Patch Kit-0002.

Table 3-1 lists patches that have been updated.

Table 3-2 provides a summary of patches.

Table 3-1:  Updated TruCluster Software Patches

Patch IDs Change Summary
Patches 67.00, 72.00, 76.00 New
Patches 2.00, 9.00, 10.00, 5.00, 13.00, 16.00, 17.00, 14.00, 29.00, 31.00, 38.00, 39.00, 47.00, 21.00, 49.00, 32.00, 43.00, 27.00, 48.00, 52.00, 53.00, 54.00, 55.00, 56.00, 57.00 Superseded by Patch 59.00
Patches 22.00, 23.00, 24.00, 25.00, 50.00, 51.00 Superseded by Patch 61.00
Patches 20.00, 46.00 Superseded by Patch 63.00
Patch 3.00 Superseded by Patch 65.00
Patches 11.00, 19.00, 26.00, 68.00 Superseded by Patch 70.00
Patches 41.00, 44.00, 45.00 Superseded by Patch 74.00
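
The supersession relationships in Table 3-1 can also be expressed as a simple lookup, which may be convenient when checking whether an already-installed patch has been replaced. The following is a minimal Python sketch with the data transcribed from the table above. The names SUPERSEDING, SUPERSEDED_BY, and current_patch are illustrative assumptions and are not part of the patch kit tooling.

```python
# Illustrative only: data transcribed from Table 3-1.
# All names below are hypothetical, not part of any TruCluster utility.
SUPERSEDING = {
    "59.00": ["2.00", "9.00", "10.00", "5.00", "13.00", "16.00", "17.00",
              "14.00", "29.00", "31.00", "38.00", "39.00", "47.00", "21.00",
              "49.00", "32.00", "43.00", "27.00", "48.00", "52.00", "53.00",
              "54.00", "55.00", "56.00", "57.00"],
    "61.00": ["22.00", "23.00", "24.00", "25.00", "50.00", "51.00"],
    "63.00": ["20.00", "46.00"],
    "65.00": ["3.00"],
    "70.00": ["11.00", "19.00", "26.00", "68.00"],
    "74.00": ["41.00", "44.00", "45.00"],
}

# Invert the table: superseded patch ID -> superseding patch ID.
SUPERSEDED_BY = {old: new for new, olds in SUPERSEDING.items() for old in olds}


def current_patch(patch_id: str) -> str:
    """Return the patch that replaces patch_id, or patch_id itself if it is not superseded."""
    return SUPERSEDED_BY.get(patch_id, patch_id)
```

For example, current_patch("3.00") yields "65.00", while a patch that does not appear in the table, such as "4.00", is returned unchanged.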

Table 3-2:  Summary of TruCluster Patches

Patch IDs Abstract

Patch 4.00

TCR160-004

Patch: Fix for Kernel Memory Fault On DRD Client Nodes

State: Existing

This patch fixes a kernel memory fault on the DRD client nodes just as or after the DRD server node has initiated MC2 hub failover.

Patch 7.00

TCR160-010

Patch: Fix for Reliable Datagram API

State: Supersedes patch TCR160-001 (1.00)

This patch corrects the following:

  • Reliable Datagram (RDG) messaging support.

  • RDG: bug fix to the completion queue synchronization protocol.

Patch 8.00

TCR160-011

Patch: doconfig may hang when running in TruCluster environment

State: Existing

This patch fixes two problems that could cause doconfig to appear to hang when running in a TruCluster environment.

Patch 12.00

TCR160-018

Patch: Fixes problem with Networker displaying characters

State: Existing

This patch corrects a problem with Networker displaying garbage characters following service names. The problem occurs when the service name is 8 characters or longer.

Patch 30.00

TCR160-034

Patch: Fix for boot failure on a cluster

State: Existing

This patch fixes a problem which caused a boot failure on a cluster with a large number of shared SCSI buses.

Patch 33.00

TCR160-037

Patch: Fix for drdadmin problems

State: Existing

This patch fixes various problems with drdadmin to make it more user friendly.

Patch 34.00

TCR160-038

Patch: Fixes a limitation in ase_reconfig_bus

State: Existing

This patch fixes a limitation in ase_reconfig_bus. Now up to 99 buses can be reconfigured with this command.

Patch 35.00

TCR160-039

Patch: LSM disk information not updated in ASE database

State: Supersedes patch TCR160-030 (28.00)

This patch corrects the following:

  • Fixes a problem that would cause an error from awk(1) when modifying an ASE service that contained a large number of LSM volumes. The error would prevent the service from being properly modified.

  • Fixes a problem where LSM disk information was not properly updated in the ASE database when volumes were removed from a disk service.

Patch 36.00

TCR160-040

Patch: Fix for asedirector hang

State: Existing

This patch fixes a problem that could cause an NFS or Disk Service that has a hyphen (-) in the service name to end up unassigned after a disk failure. A side effect of the problem was that the asedirector would hang after the disk failure was corrected.

Patch 37.00

TCR160-041

Patch: clu_ivp does not recognize Emulex adapter

State: Existing

This patch fixes a problem where the Emulex Fibre Channel adapter was not recognized by clu_ivp.

Patch 42.00

TCR160-046

Patch: Processes may get referenced several times

State: Supersedes patches TCR160-008 (6.00), TCR160-023 (15.00), TCR160-044 (40.00)

This patch corrects the following:

  • Fixes a problem in which a cluster node can panic with the panic string "convert_lock: bad lock state".

  • Corrects a problem in which a failure in the session layer can cause DLM messages to become corrupt, resulting in random DLM panics on the receiving member.

  • Fixes a problem that can cause a TruCluster member to panic during shutdown.

  • Fixes a bug where sometimes a certain shared sequence number will not be freed after use. It also fixes a problem where certain processes could get referenced several times.

Patch 59.00

TCR160-059

Patch: Fixes a problem that causes asedirector to core dump

State: Supersedes patches TCR160-002 (2.00), TCR160-009A (9.00), TCR160-016 (10.00), TCR160-007 (5.00), TCR160-021A (13.00), TCR160-024 (16.00), TCR160-025 (17.00), TCR160-022A (14.00), TCR160-033 (29.00), TCR160-035 (31.00), TCR160-042 (38.00), TCR160-043 (39.00), TCR160-051 (47.00), TCR160-031A (21.00), TCR160-053 (49.00), TCR160-036A (32.00), TCR160-047A (43.00), TCR160-028 (27.00), TCR160-052 (48.00), TCR160-065 (52.00), TCR160-066 (53.00), TCR160-058 (54.00), TCR160-060 (55.00), TCR160-054A (56.00), TCR160-057 (57.00)

This patch corrects the following:

  • Fixes two problems in the asedirector:

    • An ASE command timeout problem encountered by large ASE services.

    • An incorrect decision made by the asedirector as a result of a failed inquire services command.

  • Improves the startup performance of start scripts by reducing the number of system calls needed to start them.

  • Fixes a problem where the Host Status Monitor (asehsm) incorrectly reports a network down (HSM_NI_STATUS DOWN) if the counters for the network interface get zeroed.

  • Fixes an ASE problem where, under certain circumstances, the service scripts could cause the ASE agent to loop during a start or stop service.

  • Corrects a problem with member add in a large environment.

  • Corrects a problem in a TruCluster Available Server or Production Server cluster in which services were started with an elevated priority and scheduling algorithm. Under significant load, this could lead to intermittent network and cluster problems.

  • Fixes a problem which caused a service not to start when there was a short network failure. This was seen only with long running stop scripts and special network configurations.

  • Corrects a problem which causes asemgr to core dump when modifying a single drd service to add more than 200 devices.

  • Fixes a problem that caused aseagent or asehsm to core dump when starting NFS and Disk Services that contain several LSM volumes.

  • Fixes a problem where the asemgr hangs as it continuously creates and kills multiple directors.

  • Corrects a problem that causes the ASE director to core dump during initialization.

  • Corrects a problem where modifying a service with a large number of DRDs will fail and a "could not malloc" message is seen in the daemon.log file.

  • Fixes a problem where the MEMBER_STATE variable is always shown as BOOTING instead of RUNNING. After TCR is first installed, scripts have no way to determine the MEMBER_STATE. This problem is cleared on a reboot.

  • Corrects a problem in which a network cable failure that corrects within 7 seconds of the failure can leave the services in a bad state.

  • Fixes a problem that caused the asemgr to get a memory fault when adding multiple services in a row.

  • Fixes a problem with extraneous compiler warnings about strdup() function calls from ASE.

  • Fixes a problem that caused the asemgr utility to not run when called from a program that is owned by root and has the setuid bit turned on.

  • Fixes a problem that can cause the Cluster MIB daemon (cnxmibd) to core dump in Available Server environments.

  • Fixes a problem which caused an error message to be logged for the cnxmibd even though no error had occurred.

  • Fixes two issues with clusters:

    • When a cluster is brought up with ASE off, other members report it as UP and RUNNING instead of UP and UNKNOWN.

    • When a restricted service is running on a member, and asemember stop or aseam stop is executed, the service status is still reported as the member name, instead of Unassigned.

  • Fixes a problem where timeout values of greater than 30 seconds in /etc/hsm.conf would cause ASE agent to fail at start up.

  • Fixes a bug where the aseagent will occasionally core dump on a SCSI bus hang.

  • Fixes a problem that caused the asemgr to report the wrong status for a service.

  • Fixes the following problems with the clu_ivp script:

    The script now checks to be sure that the cluster members are listed in the /etc/hosts file, and it no longer copies /var/adm/messages to /tmp. Copying the messages file to /tmp could result in the filesystem becoming full, and clu_ivp exiting with an error. The clu_ivp script now also checks the /var/adm/messages file for shared buses if none are listed in the configuration file.

  • Fixes a problem that could cause the asedirector to core dump.

  • Fixes a problem that caused the asemgr to report that a disk, or mount point, was in multiple services when modifying a service name.

Patch 61.00

TCR160-054B

Patch: Fixes problems with the clu_ivp script

State: Supersedes patches TCR160-009B (22.00), TCR160-021B (23.00), TCR160-022B (24.00), TCR160-031B (25.00), TCR160-036B (50.00), TCR160-047B (51.00)

This patch corrects the following:

  • Improves the startup performance of start scripts by reducing the number of system calls needed to start them.

  • Corrects a problem with member add in a large environment.

  • Corrects a problem which causes asemgr to core dump when modifying a single drd service to add more than 200 devices.

  • Fixes a problem that caused aseagent or asehsm to core dump when starting NFS and Disk Services that contain several LSM volumes.

  • Fixes a problem with extraneous compiler warnings about strdup() function calls from ASE.

  • Fixes a problem that caused the asemgr utility to not run when called from a program that is owned by root and has the setuid bit turned on.

  • Fixes the following problems with the clu_ivp script:

    The script now checks to be sure that the cluster members are listed in the /etc/hosts file, and it no longer copies /var/adm/messages to /tmp. Copying the messages file to /tmp could result in the filesystem becoming full, and clu_ivp exiting with an error. The clu_ivp script now also checks the /var/adm/messages file for shared buses if none are listed in the configuration file.

Patch 63.00

TCR160-064

Patch: Node crashes when holding an mc-api lock

State: Supersedes patches TCR160-029 (20.00), TCR160-050 (46.00)

This patch corrects the following:

  • Fixes a hang problem in a cluster when two nodes communicate using the mc-api and a third node, not involved in the calculation, is rebooted.

  • Fixes a problem that can cause a panic in mcs_wait_cluster_event() when using the Memory Channel API.

  • Fixes a problem with the Memory Channel API whereby a node crashes holding an mc-api lock. Under certain circumstances the lock will not be released after the node crashes.

Patch 65.00

TCR160-063

Patch: Unable to remove LSM volumes from DRD service

State: Supersedes patch TCR160-003 (3.00)

This patch corrects the following:

  • Fixes a problem where DRD permissions could be lost if a service is modified more than once.

  • Fixes a problem that prevented the removal of LSM volumes from a DRD service. The problem occurs when there are multiple LSM diskgroups in the service, and all of the volumes from one diskgroup were removed.

Patch 67.00

TCR160-054C

Patch: clu_ivp script enhancements

State: New

This patch fixes the following problems with the clu_ivp script:

The script now checks to be sure that the cluster members are listed in the /etc/hosts file, and it no longer copies /var/adm/messages to /tmp. Copying the messages file to /tmp could result in the filesystem becoming full, and clu_ivp exiting with an error. The clu_ivp script now also checks the /var/adm/messages file for shared buses if none are listed in the configuration file.
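
The /etc/hosts check described above can be illustrated with a short Python sketch. This is not the clu_ivp implementation, which is not shown in this document. It only demonstrates, under the assumption of the conventional hosts-file layout (an address followed by one or more names, with # starting a comment), how member names missing from the file could be detected. The function name missing_members and the sample host names are hypothetical.

```python
def missing_members(hosts_text, members):
    """Return the members that have no entry in the given hosts-file text."""
    known = set()
    for line in hosts_text.splitlines():
        # Strip any trailing comment, then split the entry into fields.
        fields = line.split("#", 1)[0].split()
        # The first field is the address; the rest are host names and aliases.
        known.update(fields[1:])
    return [m for m in members if m not in known]


# Hypothetical sample data for illustration only.
sample = (
    "127.0.0.1 localhost\n"
    "10.0.0.1 node1.example.com node1  # member 1\n"
)
print(missing_members(sample, ["node1", "node2"]))  # prints ['node2']
```

In practice the text would be read from /etc/hosts, and the member list would come from the cluster configuration.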

Patch 70.00

TCR160-056

Patch: TruCluster Production server hangs during boot

State: Supersedes patches TCR160-017 (11.00), TCR160-027 (19.00), TCR160-032 (26.00), TCR160-062 (68.00)

This patch corrects the following:

  • Fixes a problem where both nodes in a cluster will panic at the same time with a simple_lock timeout panic.

    panic (cpu 0): rm_update_single_lock_miss: time limit exceeded

  • Fixes a problem that could cause an error to be returned when the Cluster software should wait until a global lock is freed.

  • Fixes a problem that could cause a TruCluster Production server member to hang during boot, and could cause a "simple lock time limit exceeded" panic.

Patch 72.00

TCR160-067

Patch: Error msg if system contained unsupported controllers

State: New

This patch fixes a problem that caused an error message to be printed if the system contained unsupported controllers. The error message will now only be printed when running the command in verbose mode.

Patch 74.00

TCR160-061

Patch: Access mode for a directory not set to default

State: Supersedes patches TCR160-045 (41.00), TCR160-048 (44.00), TCR160-049 (45.00)

This patch corrects the following:

  • Fixes a problem that caused the setting of the "force unmount" option to be incorrectly displayed by the asemgr utility.

  • Fixes a problem that caused shell errors if an invalid mount option was specified via the asemgr menu.

  • Fixes a problem that caused the device name for a Unix File System (UFS) to not be displayed when modifying the "force unmount" option via the asemgr utility.

  • Fixes a problem that caused the access mode for a directory to not get set to the default after modifying it via asemgr.

Patch 76.00

TCR160-055

Patch: Problem causes mountd to exit without error

State: New

This patch fixes a problem that could cause mountd to exit without error during boot.