3    Summary of TruCluster Software Patches

This chapter summarizes the TruCluster software patches included in Patch Kit-0003.

Table 3-1 lists patches that have been updated.

Table 3-2 provides a summary of patches in Patch Kit-0003.

Table 3-1:  Updated TruCluster Software Patches

Patch IDs                                                             Change Summary
Patches 4.00, 9.00, 17.00, 41.00                                      New
Patches 2.00, 13.00, 18.00, 19.00, 20.00, 21.00, 22.00, 23.00, 24.00  Superseded by Patch 26.00
Patch 11.00                                                           Superseded by Patch 30.00
Patches 5.00, 7.00                                                    Superseded by Patch 32.00
Patches 15.00, 33.00, 34.00, 35.00, 36.00, 37.00                      Superseded by Patch 39.00
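
For scripts that need to resolve an older patch ID to the patch that now carries its fixes, the supersession data in Table 3-1 can be held in a simple lookup. The following is a minimal sketch in Python. The names SUPERSEDED_BY and current_patch are illustrative only and are not part of the patch kit or of any TruCluster tool.

```python
# Supersession data transcribed from Table 3-1.
# SUPERSEDED_BY and current_patch are hypothetical names used for illustration.
SUPERSEDED_BY = {}
for new_id, old_ids in {
    "26.00": ["2.00", "13.00", "18.00", "19.00", "20.00",
              "21.00", "22.00", "23.00", "24.00"],
    "30.00": ["11.00"],
    "32.00": ["5.00", "7.00"],
    "39.00": ["15.00", "33.00", "34.00", "35.00", "36.00", "37.00"],
}.items():
    for old_id in old_ids:
        SUPERSEDED_BY[old_id] = new_id


def current_patch(patch_id):
    """Return the patch that supersedes patch_id, or patch_id itself if none does."""
    return SUPERSEDED_BY.get(patch_id, patch_id)
```

Patches listed as New in Table 3-1 are not in the mapping, so current_patch returns their own ID unchanged.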

Table 3-2:  Summary of TruCluster Patches

Patch IDs Abstract

Patch 4.00

TCR510DX-001

Patch: Fix for Cluster Alias Manager system management tool

State: New

This patch prevents the Cluster Alias Manager system management tool from crashing and displaying errors.

Patch 9.00

TCR510-001

Patch: Initializing the MC-API results in system crash

State: New

This patch fixes a problem on AlphaServer GS160 systems where initializing the MC-API causes the system to crash with a "kernel memory fault" message.

Patch 17.00

TCR510-018

Patch: Removes rmerror_int diagnostic messages

State: New

This patch eliminates unnecessary rail failovers in vhub configurations and removes rmerror_int diagnostic messages.

Patch 26.00

TCR510-008

Patch: Security (SSRT0691U)

State: Supersedes patches TCR510-004 (2.00), TCR510-006 (13.00), TCR510-026 (18.00), TCR510-020 (19.00), TCR510-013 (20.00), TCR510-015 (21.00), TCR510-017 (22.00), TCR510-014 (23.00), TCR510-025 (24.00)

This patch corrects the following:

  • A potential security vulnerability has been discovered, where under certain circumstances, system integrity may be compromised. This may be in the form of improper file or privilege management. Compaq has corrected this potential vulnerability.

  • Provides a small TPC-C performance optimization to cfsspec_read for reporting TPC-C single node cluster numbers.

  • When attempting to roll a patch kit on a single member cluster without this patch, the following error messages will be seen when running the postinstall stage:

    *** Error***
          Members '2' is NOT at the new base software version.
     
    *** Error***
          Members '2' is NOT at the new TruCluster software version.

  • During the backup stage of clu_upgrade setup 1, clu_upgrade is unable to determine the name of the kernel configuration file.

  • clu_upgrade does not check the availability of space in /, /usr, and /usr/i18n.

  • During the preinstall phase, clu_upgrade ignores a "no" answer when, after an error condition, it asks the user whether to continue.

  • clu_upgrade incorrectly assumes that if the directory /usr/i18n exists, then it is in its own file system.

  • After the clu_upgrade clean phase, the final step of clu_upgrade, no message is displayed to tell the user that the upgrade is complete. Only the prompt is returned, and the clu_upgrade -completed clean command reports that the clean phase has not completed.

  • clu_upgrade can display "Could not get property..." and "...does not exist" type of error messages during the undo install phase.

  • The clu_upgrade undo switch command, after completing a clu_upgrade switch command, should display an error message instead of claiming it has succeeded.

  • Fixes a problem with disaster recovery whereby the node being restored will hang on boot.

  • Corrects a problem in which a cluster may panic with a "cfsdb_assert" message when restoring files from backup while simultaneously relocating the CFS server for that file system.

  • Corrects a problem in which a cluster member can panic with the panic string "cfsdb_assert" when an NFS v3 TCP client attempts to create a socket using mknod(2).

  • Corrects a problem in which a cluster member will panic with the panic string "lock_terminate: lock held" from cinactive().

  • Fixes a hang seen while running collect and the vdump utility. This patch prevents the hang in tok_wait from occurring. This also prevents a cfsdb_assert panic that contains the following message:

    Assert Failed: (tcbp->tcb_flags & TOK_GIVEBACK) == 0

  • Prevents a cfsdb_assert panic from occurring in the CFS block reserve code. Systems running process accounting are the most likely to encounter this type of panic.

  • Provides performance enhancements for copying large files (files smaller than the total size of the client's physical memory) between a CFS client and server within the cluster.
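
Two of the clu_upgrade items above concern file system checks: verifying the available space in /, /usr, and /usr/i18n, and not assuming that /usr/i18n is a separate file system merely because the directory exists. The sketch below illustrates the general kind of check involved. It is written in Python for illustration and does not show how clu_upgrade is implemented. The function name check_space and the required_kb argument are assumptions.

```python
import os


def check_space(paths, required_kb):
    """Report free space once per distinct file system.

    A directory that exists but is not a separate file system shares the
    device of a path already examined, so it is skipped rather than
    counted twice. Returns {path: (free_kb, ok)}.
    """
    seen_devices = set()
    report = {}
    for path in paths:
        if not os.path.isdir(path):
            continue  # directory is absent on this system
        device = os.stat(path).st_dev
        if device in seen_devices:
            continue  # same file system as an earlier path
        seen_devices.add(device)
        vfs = os.statvfs(path)
        free_kb = vfs.f_bavail * vfs.f_frsize // 1024
        report[path] = (free_kb, free_kb >= required_kb)
    return report
```

A call such as check_space(["/", "/usr", "/usr/i18n"], 100000) would return one entry per distinct file system. The 100000 KB threshold is arbitrary and chosen only for the example.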

Patch 30.00

TCR510-024

Patch: Corrects incorrect warning message

State: Supersedes patch TCR510-007 (11.00)

This patch corrects the following:

  • Corrects a problem in which the RDG subsystem stops sending messages even though deliverable messages remain.

  • Fixes an incorrect display of the following warning message at boot time:

    rdg: failed to start context rcvq scan thread

Patch 32.00

TCR510-023

Patch: ring_recv does not handle change in channel numbers

State: Supersedes patches TCR510-002 (5.00), TCR510-003 (7.00)

This patch corrects the following:

  • Fixes an occasional cluster hang which can occur after a Memory Channel error.

  • Fixes a kernel memory fault which occurs in the ics_mct_ring_recv() routine. The kernel memory fault is seen when a node is booting into the cluster, and can occur on the booting node or on another node.

  • Fixes a problem in ICS where ring_recv() does not properly handle a change in channel numbers. The fix will, in turn, improve validation of the connection structure on node joins.

Patch 39.00

TCR510-012

Patch: System panics while doing tape failovers

State: Supersedes patches TCR510-005 (15.00), TCR510-021 (33.00), TCR510-009 (34.00), TCR510-016 (35.00), TCR510-011 (36.00), TCR510-022 (37.00)

This patch corrects the following:

  • Fixes two TruCluster problems:

    • If a quorum disk is manually added with the command clu_quorum -d add, the disk becomes inaccessible because the PR flag is not cleaned up. The same command works after the next reboot.

    • A cluster member cannot boot under a specific hardware setup. The CFS mount fails because the PR flag is not cleaned up.

  • Addresses the need for IOCTL for remote DRD, adds cleanup for failed remote closes for non-disks, fixes error returns on failed tape/changer closes, and fixes a tape deadlock experienced in netbackups.

  • Fixes an issue with a tape/changer failing to correctly report a close failure of a device in a cluster environment.

  • Fixes a problem which results in a system panic while doing tape failovers.

  • Fixes a node panic during fiber port disables.

  • Fixes an issue with a tape/changer giving back "busy on open" if a close from a remote node failed.

  • Provides the TCR portion of the functionality to support EMC storage boxes that support Persistent Reserves (SCSI command set) as defined by the final SCSI specification.

  • Fixes an issue with requests being stuck on a failed disk in a cluster.

Patch 41.00

TCR510-019

Patch: Cluster members not able to route an alias

State: New

This patch corrects the following:

  • Fixes the cluamgr command where it will display the alias status even if no cluster member has joined the alias.

  • Fixes a problem in which RPC requests to the cluster alias may fail with an "RPC timeout" message.