3    Summary of TruCluster Software Patches

This chapter summarizes the TruCluster software patches included in Patch Kit-0003.

Table 3-1 lists patches that have been updated.

Table 3-2 provides a summary of patches in Patch Kit-0003.

Table 3-1:  Updated TruCluster Software Patches

Patch IDs                                          Change Summary

Patches 28.00, 50.00, 52.00                        New

Patches 2.00, 4.00, 12.00, 21.00, 8.00, 29.00,     Superseded by Patch 33.00
30.00, 31.00

Patches 3.00, 13.00, 6.00, 7.00, 9.00, 10.00,      Superseded by Patch 54.00
11.00, 24.00, 5.00, 1.00, 14.00, 15.00, 16.00,
17.00, 19.00, 26.00, 34.00, 35.00, 36.00, 37.00,
38.00, 39.00, 40.00, 41.00, 42.00, 43.00, 44.00,
45.00, 46.00, 48.00
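
The supersession rows in Table 3-1 can be read as a simple lookup from an older patch ID to the patch that now carries its fixes. The following is only an illustrative sketch: the names SUPERSEDED_BY, NEW_PATCHES, and current_patch are hypothetical and are not part of the patch kit or of any TruCluster tool. The data is copied from Table 3-1.

```python
# Hypothetical lookup built from the rows of Table 3-1 (not a kit utility).
SUPERSEDED_BY = {
    "33.00": ["2.00", "4.00", "12.00", "21.00", "8.00", "29.00", "30.00", "31.00"],
    "54.00": ["3.00", "13.00", "6.00", "7.00", "9.00", "10.00", "11.00", "24.00",
              "5.00", "1.00", "14.00", "15.00", "16.00", "17.00", "19.00", "26.00",
              "34.00", "35.00", "36.00", "37.00", "38.00", "39.00", "40.00", "41.00",
              "42.00", "43.00", "44.00", "45.00", "46.00", "48.00"],
}

# Patches that Table 3-1 marks as New.
NEW_PATCHES = {"28.00", "50.00", "52.00"}


def current_patch(patch_id):
    """Return the patch that now delivers patch_id's fixes, per Table 3-1."""
    for successor, old_ids in SUPERSEDED_BY.items():
        if patch_id in old_ids:
            return successor
    # Not listed as superseded in Table 3-1, so the ID stands on its own.
    return patch_id


if __name__ == "__main__":
    for pid in ("12.00", "48.00", "28.00"):
        print(pid, "->", current_patch(pid))
```

A patch ID that does not appear in either superseded row is returned unchanged, which matches how the New entries are treated in the table.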

Table 3-2:  Summary of TruCluster Patches

Patch IDs Abstract

Patch 23.00

TCR505-019

Patch: Fixes a problem with the Memory Channel API

State: New

This patch fixes a problem with the Memory Channel API in which, under certain circumstances, an mc-api lock held by a node is not released after that node crashes.

Patch 28.00

TCR505-027

Patch: Fixes a problem in CFS/NFS

State: New

This patch fixes a problem in CFS/NFS in which NFS permissions are not handled properly in CFS.

Patch 33.00

TCR505-031

Patch: Fix for MC2 vhub cluster panic

State: Supersedes patches TCR505-006 (2.00), TCR505-002 (4.00), TCR505-014 (12.00), TCR505-021 (21.00), TCR505-008 (8.00), TCR505-039 (29.00), TCR505-043 (30.00), TCR505-032 (31.00)

This patch corrects the following problems:

  • Fixes a system panic that can be caused by Memory Channel errors occurring when the system is under heavy load.

  • Improves cluster communication performance including file system mount times.

  • Corrects problems seen when members are leaving and joining the cluster at the same time.

  • Corrects problems with loss of quorum in a cluster. Once quorum is lost, the member may panic with the panic string:

    CNX QDISK: Yielding to foreign owner with quorum.

  • Fixes a problem in which, if lockmode has been set to 4, booting an MC2 vhub cluster generates the following panic on the second node to boot:

    simple_lock: uninitialized lock
    ....
    panic (cpu 0): simple_lock: uninitialized lock

  • Eliminates double failure panics in vhub configurations and removes rmerror_int diagnostic messages.

  • Fixes a problem in ICS where ring_recv() does not properly handle a change in channel numbers. The fix will, in turn, improve validation of the connection structure on node joins.

  • Corrects a problem in which a loss of the cluster heartbeat could cause a member to panic with "CNX QDISK: Yielding to foreign owner with quorum".

  • Prevents a kmf (kernel memory fault) panic that can occur when a node is joining the cluster.
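
The State line for Patch 33.00 pairs each superseded kit name (TCR505-nnn) with its patch number in parentheses. As a rough sketch of how such a line could be split into pairs when cross-checking it against Table 3-1, the following uses a regular expression. The names STATE_LINE, PAIR, and parse_superseded are hypothetical, and the pattern assumes the "TCR505-nnn (n.nn)" layout shown in this chapter.

```python
import re

# The State line for Patch 33.00, copied from Table 3-2.
STATE_LINE = (
    "State: Supersedes patches TCR505-006 (2.00), TCR505-002 (4.00), "
    "TCR505-014 (12.00), TCR505-021 (21.00), TCR505-008 (8.00), "
    "TCR505-039 (29.00), TCR505-043 (30.00), TCR505-032 (31.00)"
)

# Assumed layout: a kit name such as TCR505-006 followed by "(2.00)".
PAIR = re.compile(r"(TCR\d+-\d+) \((\d+\.\d+)\)")


def parse_superseded(state_line):
    """Return a list of (kit_name, patch_id) tuples found in a State line."""
    return PAIR.findall(state_line)


if __name__ == "__main__":
    for kit_name, patch_id in parse_superseded(STATE_LINE):
        print(kit_name, patch_id)
```

A State line that reads only "New" contains no such pairs, so the function returns an empty list for it.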

Patch 50.00

TCR505-045

Patch: Fix for system panic

State: New

This patch fixes a panic which can occur on a V5.0A TruCluster system.

Patch 52.00

TCR505-042

Patch: Fix for cluster node crash

State: New

This patch fixes a problem where a cluster node will crash on boot because CNX could not register seqdisk callback.

Patch 54.00

TCR505-053

Patch: Security (SSRT0691U)

State: Supersedes patches TCR505-009 (3.00), TCR505-018 (13.00), TCR505-003 (6.00), TCR505-007 (7.00), TCR505-010 (9.00), TCR505-012 (10.00), TCR505-013 (11.00), TCR505-023 (24.00), TCR505-004 (5.00), TCR505-005 (1.00), TCR505-015 (14.00), TCR505-024 (15.00), TCR505-020 (16.00), TCR505-016 (17.00), TCR505-017 (19.00), TCR505-011 (26.00), TCR505-046 (34.00), TCR505-028 (35.00), TCR505-041 (36.00), TCR505-040 (37.00), TCR505-044 (38.00), TCR505-035 (39.00), TCR505-033 (40.00), TCR505-029 (41.00), TCR505-030 (42.00), TCR505-036 (43.00), TCR505-038 (44.00), TCR505-026 (45.00), TCR505-025 (46.00), TCR505-034 (48.00)

This patch fixes the following:

  • Delivers a new stripped clu_genvmunix and several fixes to the cluster rolling upgrade procedure.

  • Fixes a problem seen when running the clu_upgrade preinstall command on certain multi-CPU systems, in which numerous error messages similar to the following are displayed:

    *** Error ***
    Could not create: ocolsocols/.Old..ocols

    If you see this problem, press Ctrl/C and rerun the clu_upgrade preinstall command.

  • Fixes a situation which has caused a node panic with the following message:

    SIMPLE_LOCK: TIME LIMIT EXCEEDED PANIC ON SHARED TAPE

  • Solves a problem with booting and shutting down cluster nodes while using a tape (or changer) device in a 5.0A cluster.

  • Fixes a problem where a mount command will hang after DRM has restored the path to an HSG80 storage volume.

  • Fixes a problem where a path will fail after DRM has restored the path to an HSG80 storage volume.

  • Fixes a problem on a cluster node where, if a new device is detected by a hardware scan while the cluster is running, one of the following situations can occur:

    • If the device is a Fibre Channel device, only one node will be able to use it.

    • If the device is a parallel SCSI device on a shared bus, there is a small risk of data inconsistencies if the node subsequently loses quorum.

  • Provides the DRD portion of a fix to prevent an AdvFS Domain Panic from occurring during the boot process following a clu_add_member.

  • Fixes a problem on a cluster node in which the DRD is blocked on tape devices if a SCSI bus reset occurs during a loss of quorum.

  • Fixes a kernel memory fault panic in the routine cfstok_find_held_tok. The panic occurs when the very first action of a newly allocated thread is a lookup of "." in an NFS file system.

  • Fixes a problem where mounts that return "ESTALE" may loop forever. Prevents a KMF panic from occurring when an AdvFS mount is attempted without a fileset being specified.

  • Provides the CFS/CMS portion of a fix to prevent an AdvFS Domain Panic from occurring during the boot process following a clu_add_member.

  • Corrects a problem in which cluster members panic with a "kernel memory fault" when running either sys_check or multiple cfsmgr commands.

  • Provides performance enhancements for CFS.

  • Prevents a "request_internal: client already had token" panic from occurring when nodes are leaving and joining the cluster.

  • Prevents a cfsdb_assert panic from occurring in the CFS block reserve code. Systems that receive this type of panic are most likely running process accounting.

  • A potential security vulnerability has been discovered, where under certain circumstances, system integrity may be compromised. This may be in the form of improper file or privilege management. Compaq has corrected this potential vulnerability.

  • Fixes several problems, including addressing the need for IOCTL for remote DRD, adding cleanup for failed remote closes for non-disks, fixing error returns on failed tape/changer closes, and fixing a tape deadlock experienced in netbackups.

  • Fixes an issue with a tape/changer giving back busy on open if a close from a remote node failed.

  • Fixes a problem in which I/O to an HSG80 can hang.

  • Fixes a problem with a cluster acting as an NFS client, in which a potential race may cause a CFS client node to fail to time out its cached data for a given file correctly. As a result, processes accessing that file on that cluster member may not see changes made to the file through the NFS server or by other NFS clients.

  • Fixes the following two TruCluster problems:

    • If a quorum disk is manually added with the command clu_quorum -d add, the disk becomes inaccessible because the PR flag is not cleaned up. The same command works after the next reboot.

    • A cluster member cannot boot under specific hardware setups. The CFS mount fails because the PR flag is not cleaned up.

  • Fixes a problem in which data can become corrupted on hardware configurations that use multiported parallel Fibre Channel storage arrays. It also fixes a problem in which shared tapes will incorrectly indicate that they are busy.

  • Provides performance enhancements for copying large files (files smaller than the total size of the client's physical memory) between a CFS client and server within the cluster.

  • Corrects a problem in which a cluster member can panic with the panic string "cfsdb_assert" when an NFS V3 TCP client attempts to create a socket using mknod(2).

  • Corrects a problem in which a cluster member will panic with the panic string "lock_terminate: lock held" from cinactive().

  • Fixes a problem in which CFS stops serving lock requests, resulting in a process hang.

  • Prevents possible file inconsistencies that can occur during a CFS/NFS race condition.

  • Fixes a hang seen while running collect and the vdump utility. This patch prevents the hang in tok_wait from occurring. This also prevents a cfsdb_assert panic that contains the following message:

    Assert Failed: (tcbp->tcb_flags & TOK_GIVEBACK) == 0

  • Fixes a problem where booting several nodes in a cluster simultaneously could cause a KMF panic.