| Patch IDs |
Abstract |
Patch 4.00
TCR510DX-001
|
Patch:
Fix for Cluster Alias Manager system management tool
State:
Existing
This patch prevents the
Cluster Alias Manager system management tool from crashing and displaying
errors. |
Patch 9.00
TCR510-001 |
Patch:
Initializing the MC-API results in system crash
State:
Existing
This patch fixes a problem where, on
AlphaServer GS160 systems, initializing the MC-API results in the system
crashing with a "kernel memory fault" message. |
Patch 85.00
TCR510-107
|
Patch:
Fixes memory hang
State:
Supersedes
patches TCR510-002 (5.00), TCR510-003 (7.00), TCR510-023 (32.00), TCR510-042
(70.00), TCR510-039 (72.00), TCR510-018 (17.00), TCR510-028 (42.00), TCR510-052
(43.00), TCR510-043 (45.00), TCR510-095 (83.00)
This patch corrects
the following:
Fixes an occasional cluster hang which can occur after a Memory
Channel error.
Fixes a kernel memory fault which occurs in the ics_mct_ring_recv()
routine.
The kernel memory fault is seen when a node is booting into the
cluster, and can occur on the booting node or on another node.
Fixes a problem in ICS where ring_recv() does not properly
handle a change in channel numbers.
The fix will, in turn, improve validation
of the connection structure on node joins.
Fixes the handling of communication errors on clusters so
that a down node will not declare all other nodes dead.
Fixes a problem that causes a panic with the error message "CNX
QDISK: Yielding to foreign owner with quorum". The panic is caused by a
long-running thread, the ICS/MCT receive thread, which prevents other kernel
threads from accessing the CPU.
Eliminates unnecessary rail failovers in vhub configurations
and removes rmerror_int diagnostic messages.
Fixes an issue which causes all cluster nodes to hang or panic
if a Wildfire is halted via the halt button.
Fixes a panic that is caused in a clustered environment that
has the following error message:
rm_request_on_bad_prail
Prevents an "ics_mct: Error from establish_RM_notification_channel"
panic on clusters.
Fixes four problem situations:
When a physical MC rail goes offline.
When the master failover node goes offline during a failover.
How ICS handles the resend situation when MC errors take place.
Failing over due to parity errors increasing beyond the limit.
Fixes hangs and increases performance of memory channel ICS
operation.
|
Patch 90.00
TCR510-087
|
Patch:
Fixes a panic in clua_cnx_unregister
State:
Supersedes patches TCR510-019 (28.00), TCR510-029 (41.00),
TCR510-041 (46.00), TCR510-048 (47.00), TCR510-037 (49.00), TCR510-091 (86.00),
TCR510-082 (87.00), TCR510-066 (88.00)
This patch corrects the
following:
Fixes the cluamgr command where it will display the alias
status even if no cluster member has joined the alias.
Fixes a problem in which RPC requests to the cluster alias
may fail with "RPC timeout" message.
Fixes a cluster node hang from in_pcbnotify.
Fixes a problem in which a rebooted node is not able to send messages
to the cluster alias.
Fixes multiple networking issues within a cluster environment:
Cluster member loses connectivity with clients on remote subnets.
aliasd not handling multiple virtual aliases in a subnet and/or
IP aliases.
Allows cluster members to route for an alias without joining
it.
aliasd writing illegal configurations into gated.conf.memberX.
Default route not being restored after network connectivity
issues.
Fixes a race condition between aliasd and gated.
Fixes a problem with a hang caused by an incorrect /etc/hosts
entry.
Fixes a problem when the cluster alias subsystem does not
send a reply to a client that pings a cluster alias address with a packet
size of less than 28 bytes.
Fixes a memory corruption panic which could occur after a
member joins the cluster or after adding a new cluster alias to one or more
of the members.
Fixes a problem with cluster alias selection priority when
adding a member to an alias.
Fixes a panic in clua_cnx_unregister where a TP structure
could not be allocated for a new TCP connection.
|
Patch 119.00
TCR510DX-002
|
Patch:
Security (SSRT1-40U, SSRT1-41U, SSRT1-42U, SSRT1-45U)
State:
New
A potential security vulnerability
has been discovered where, under certain circumstances, system integrity may
be compromised.
This may be in the form of improper file access.
Compaq has
corrected this potential vulnerability.
|
Patch 122.00
TCR510-063
|
Patch:
cfsmgr works correctly with upper case member names
State:
New.
Supersedes patch TCR510-070 (120.00)
This patch corrects the following:
|
Patch 127.00
TCR510-092
|
Patch:
Using a cluster as a RIS server causes a panic
State:
New
This patch addresses two problems:
A panic caused by a known problem, using a cluster as a RIS
server.
A fix to RIS/DMS serving in a TruCluster.
|
Patch 129.00
TCR510-071
|
Patch:
EVM cluster-wide event may cause a panic
State:
New
This patch fixes a problem that, under very
heavy loads in a cluster, could cause the system to panic when duplicating
a cluster EVM event.
|
Patch 131.00
TCR510-104
|
Patch:
Fix for Oracle 9i hang
State:
Supersedes patches TCR510-007 (11.00), TCR510-024 (30.00), TCR510-036 (67.00),
TCR510-049 (69.00)
This patch corrects the following:
Corrects a problem in which the RDG subsystem will stop sending
messages even though there are messages which are deliverable.
Fixes an incorrect display of the following warning message
at boot time:
rdg: failed to start context rcvq scan thread
Fixes a kernel memory fault with the RDG autowiring mechanism,
also seen as a "pte not valid" crash.
Adds a multichannel wait flag to pid_unblock.
Contains performance enhancements.
Fixes a problem with RDG whereby broadcast packets can interact
with the context receive queue.
Closes a timing window that can cause Oracle 9i to hang when
a remote node in the cluster goes down.
|
Patch 143.00
TCR510-085
|
Patch:
Panic in distributed lock mgr deadlock detection code
State:
Supersedes patches TCR510-033 (78.00), TCR510-047
(80.00), TCR510-061 (141.00)
This patch corrects the following:
Fixes an Oracle process hang if a node fails after receiving
a "rsbinfo" message.
Fixes a DLM problem where two processes could take out the
same lock.
Fixes a panic in dlm when another node in the cluster is halted.
Fixes a panic in the distributed lock manager deadlock detection
code.
|
Patch 148.00
TCR510-121
|
Patch:
CAA applications not failing over
State:
Supersedes patches TCR510-027 (66.00), TCR510-067 (123.00), TCR510-110
(125.00)
This patch corrects the following:
For systems running TruCluster Server V5.1 with the following
configurations:
Tapes and/or media changer devices used as CAA resources.
A combination of tapes, media changers, and network interfaces
used as CAA resources.
Fixes a problem that prevents CAA from updating the state
of any of the above resources when connectivity to the corresponding device
(tape, media changer, or network) is lost or restored.
Fixes a situation in which the CAA daemon on a clustered system crashes
and dumps core.
Fixes the major problems of CAA applications not failing over
during a node shutdown and caad hang condition at startup.
Corrects the inability to start and stop CAA resources.
When
started they will go to the unknown state and never start.
The problem is
nondeterministic.
Several CAA resources may be started before the problem
is seen.
|
Patch 150.00
TCR510-115
|
Patch:
Failover does not occur properly
State:
Supersedes patches TCR510-005 (15.00), TCR510-021 (33.00), TCR510-009
(34.00), TCR510-016 (35.00), TCR510-011 (36.00), TCR510-022 (37.00), TCR510-012
(39.00), TCR510-035 (73.00), TCR510-038 (74.00), TCR510-030 (75.00), TCR510-034
(77.00), TCR510-109 (132.00), TCR510-108 (133.00), TCR510-094 (134.00), TCR510-065
(135.00), TCR510-084 (136.00), TCR510-105 (137.00), TCR510-106 (138.00), TCR510-090
(140.00)
This patch corrects the following:
Fixes two TruCluster problems:
If a Quorum disk is manually added by the command
clu_quorum -d add, the disk becomes inaccessible because
the PR flag is not being cleaned up.
The same command will work in
the next reboot.
A cluster member cannot boot under a specific hardware setup.
The CFS mount fails because the PR flag is not cleaned up.
Addresses the need for IOCTL for remote DRD, adds clean up
for failed remote closes for non-disks, fixes error returns on failed tape/changer
closes, and fixes tape deadlock experienced in netbackups.
Fixes an issue with a tape/changer failing to correctly report
a close failure of a device in a cluster environment.
Fixes a problem which results in a system panic while doing
tape failovers.
Fixes a node panic during fiber port disables.
Fixes an issue with a tape/changer giving back "busy
on open" if a close from a remote node failed.
Provides the TCR portion of the functionality to support EMC
storage boxes that support Persistent Reserves (SCSI command set) as defined
by the final SCSI specification.
Fixes an issue with requests being stuck on a failed disk
in a cluster.
Allows high density tape drives to use the high density compression
setting in a cluster environment.
Fixes a kernel memory fault panic that can occur within a
cluster member during failover while using shared served devices.
Fixes an issue with the hwmgr -delete command that causes
a panic in a cluster.
Fixes a KZPCC controller problem in which deleting a Virtual
Drive using SWCC and then adding the same drive back can leave the disk
inaccessible.
Fixes several problems with the device request dispatcher
(drd) kernel subsystem, including cluster hangs, kernel memory faults, reboot
problems, node recovery problems, and device failover problems.
Fixes cluster hangs and panics due to I/O problems.
Fixes a problem where the tape changer is only accessible
from the member that is the drd server for the changer.
Fixes a race condition problem when multiple unbarrierable
disks fail at the same time.
Fixes a problem where CAA applications using tape/changers
as required resources will not come ONLINE (as seen by caa_stat).
|
Patch 152.01
TCR510-123
|
Patch:
Security (SSRT0691U)
State:
Supersedes patches TCR510-004 (2.00), TCR510-006 (13.00), TCR510-026 (18.00),
TCR510-020 (19.00), TCR510-013 (20.00), TCR510-015 (21.00), TCR510-017 (22.00),
TCR510-014 (23.00), TCR510-025 (24.00), TCR510-008 (26.00), TCR510-056 (50.00),
TCR510-050 (51.00), TCR510-054 (52.00), TCR510-057 (53.00), TCR510-046 (54.00),
TCR510-040 (55.00), TCR510-031 (56.00), TCR510-032 (57.00), TCR510-051 (58.00),
TCR510-060 (59.00), TCR510-044 (60.00), TCR510-053 (61.00), TCR510-045 (62.00),
TCR510-058 (64.00), TCR510-064 (82.00), TCR510-077 (91.00), TCR510-100 (92.00),
TCR510-098 (93.00), TCR510-081 (94.00), TCR510-072 (95.00), TCR510-073 (96.00),
TCR510-075 (97.00), TCR510-083 (98.00), TCR510-093 (99.00), TCR510-096 (100.00),
TCR510-069 (101.00), TCR510-088 (102.00), TCR510-076 (103.00), TCR510-079
(104.00), TCR510-086 (105.00), TCR510-089 (106.00), TCR510-078 (107.00), TCR510-099
(108.00), TCR510-097 (109.00), TCR510-102 (110.00), TCR510-101 (111.00), TCR510-103
(112.00), TCR510-074 (113.00), TCR510-080 (114.00), TCR510-062 (115.00), TCR510-068
(117.00), TCR510-127 (144.00), TCR510-123 (146.00)
This patch
corrects the following:
A potential security vulnerability has been discovered, where
under certain circumstances, system integrity may be compromised.
This may
be in the form of improper file or privilege management.
Compaq has corrected
this potential vulnerability.
Provides a small TPC-C performance optimization to cfsspec_read
for reporting TPC-C single node cluster numbers.
When attempting to roll a patch kit on a single member cluster
without this patch, the following error messages will be seen when
running the postinstall stage:
*** Error***
Members '2' is NOT at the new base software version.
*** Error***
Members '2' is NOT at the new TruCluster software version.
During backup stage of clu_upgrade setup 1, clu_upgrade
is unable to determine the name of the kernel configuration file.
clu_upgrade does not check the availability of space in /,
/usr, and /usr/i18n.
During the preinstall phase, clu_upgrade will ignore a no
answer when the user is prompted, during an error condition, whether
they wish to continue.
clu_upgrade incorrectly assumes that if the directory /usr/i18n
exists, then it is in its own file system.
After the clu_upgrade clean phase, the final step of clu_upgrade,
no message is displayed that leads the user to believe they have
completed the upgrade.
Only the prompt is returned and the clu_upgrade
-completed clean command reports that the clean had not completed.
clu_upgrade can display "Could not get property..." and
"...does not exist" type of error messages during the undo install
phase.
The clu_upgrade undo switch command, after completing a clu_upgrade
switch command, should display an error message instead of claiming it
has succeeded.
Fixes a problem with disaster recovery whereby the node being
restored will hang on boot.
Corrects a problem in which a cluster may panic with a "cfsdb_assert"
message when restoring files from backup while simultaneously relocating the
CFS server for that file system.
Corrects a problem in which a cluster member can panic with
the panic string "cfsdb_assert" when a NFS V3 TCP client attempts to create
a socket using mknod(2).
Corrects a problem in which a cluster member will panic with
the panic string "lock_terminate: lock held" from cinactive().
Fixes a hang seen while running collect and the vdump utility.
This patch prevents the hang in tok_wait from occurring.
This also prevents
a cfsdb_assert panic that contains the following message:
Assert Failed: (tcbp->tcb_flags & TOK_GIVEBACK) == 0
Prevents a cfsdb_assert panic from occurring in the cfs block
reserve code.
Systems running process accounting are the most likely to
encounter this type of panic.
Provides performance enhancements for copying large files
(files smaller than the total size of the client's physical memory) between
a CFS client and server within the cluster.
Corrects a token hang situation by comparing against the
correct revision mode.
Fixes a bug in the cluster file system that can cause a kernel
memory fault.
Eliminates superfluous AutoFS auto-mount attempts during rolling
upgrade.
These attempted auto-mounts slow down certain operations and leave
the AutoFS namespace polluted with directories prefixed with ".Old..".
Fixes memory leak in cfscall_ioctl().
Fixes a panic with the following error message:
panic: cfsdb_assert
Contains corrections required for proper operation of Oracle
9i with Tru64 UNIX/TruCluster 5.1.
The problems corrected include:
Processes hanging when using Cluster File System/Direct I/O
feature.
Improper handling of direct I/O to an AdvFS fileset if a clone
fileset was already in use, potentially resulting in an inconsistent
backup.
Using ls -l, the Cluster File System file attribute could
be seen inconsistently from the server and client members.
For example,
a file's mode could be seen differently from the server and the client.
A file opened for Direct I/O on the Cluster File System server
may inappropriately be opened in non-direct I/O mode by a client.
Oracle processes hanging due to shutting down one cluster
member.
A problem with the Cluster File System which could cause a
cluster system to panic with the panic string "kernel memory fault" in
the routine mc_bcopy().
A problem with the Cluster File System which could cause a cluster
member to panic with the panic string "uiomove: mode." This problem
could cause Oracle multi-instance databases to crash with a message
similar to the following:
ORA-27063: skgfospo: number of bytes read/written is incorrect
Fixes data inconsistency problems that can be seen on clusters
that are NFS clients.
Prevents a cfsdb_assert panic from occurring in cfs_reclaim.
This panic has been seen while running ensight7.
Prevents a potential hang due to external NFS servers.
Provides a warning to users installing a patch kit that includes
a patch which requires a version switch.
The warning informs the user that
the installed patches include a version switch which cannot be removed using
the normal patch removal procedure.
The warning allows the user to continue
with the switch stage or exit clu_upgrade.
Prevents a potential hang that can occur on a CFS failover.
Allows POSIX semaphores/msg queues to operate properly on
a CFS client.
Allows the command cfsstat -i to execute properly.
Corrects a problem which can cause cluster members to hang,
waiting for the update daemon to flush /var/adm/pacct.
Fixes a potential CFS hang on defragment.
Fixes a possible "Kernel Memory Fault" panic on
racing mount update/unmount/remount operations for the same mount point.
Fixes a possible "Kernel Memory Fault" in function
ckidtokgs.
Fixes possible "cfs_add_mount() - database entry present"
panic and possible multinode reboot hang which shows the following message:
WARNING: RETRYING TO LOCK THE BOOT PARTITION DEVICE
Fixes two race conditions in Cluster Mount support:
Fixes two AutoFS problems:
Fixes a panic that would occur during the mount of a cluster
file system on top of a non-cluster file system.
Prevents a "Kernel Memory Fault" panic during
unmount in a Cluster or during a planned relocation.
Corrects a "cfsdb_assert" panic which can occur following
the failure of a cluster node.
Addresses three CFS problems:
A kernel memory fault in the CFS read-ahead code.
A deadlock in the CFS read-ahead code.
A potential data inconsistency problem which could occur when
a filesystem becomes 100% full.
Enforces the rule that mounting on a server-only file system
makes the new mount server-only.
Fixes two race conditions:
Between cluster root failover and mount which results in a
kernel memory fault.
Between failover-related cleanup and bootup-time mount processing,
which results in deadlock and hangs the new node.
Eliminates a Kernel Memory Fault panic during node shutdown.
Addresses a problem in CFS where, under certain conditions,
CFS would temporarily change the value of p_pid of the current running process.
The result of this could break certain pid-based hashing algorithms in the
kernel, as well as adversely affect certain kernel debugging tools.
Fixes a race condition during cluster mount which results
in a transient ENODEV seen by a name space lookup.
Addresses a problem where a file's attributes (owner, group,
mode, etc) could become inconsistent cluster-wide.
Fixes a PANIC: CFS_ADD_MOUNT() - DATABASE ENTRY PRESENT panic
when a node re-joins the cluster.
Addresses a problem where CFS may not properly invalidate
cached access rights when a change is made to a file's property list.
Fixes a race condition between node shutdown and unmount,
and ensures that all file sets from an AdvFS domain mounted as server_only
get unmounted when the server node is shut down.
This patch addresses two cluster problems:
Fixes the assertion failure ERROR != ECFS_TRYAGAIN.
Corrects a CFS problem that could cause a panic with the panic
string of "CFS_INFS full".
Fixes several potential CFS panics.
Fixes functional problems dealing with CFS direct I/O and
CFS block reservation.
Fixes a possible panic on boot if mount request is received
from another node too early in the boot process.
Prevents a panic:
Assert failed: vp->v_numoutput > 0
or a system hang when a filesystem becomes full and direct async I/O
via CFS is used.
A vnode will exist that has a v_numoutput value greater than
0, and the thread is hung in vflushbuf_aged().
This patch prevents the following panic:
cms_kgs_callback_thr: in use already set on non-initiator
Fixes a potential CFS deadlock.
Addresses a problem seen during the setup stage of a rolling
upgrade during tag file creation.
The fix is to change a variable to only
look at 500 files at a time while making tag files, instead of the current
700.
Fixes a hang during cluster unmount which results in the blocking
of all further mounts and unmounts.
Addresses a cluster problem that can arise in the case where
a cluster is serving as an NFS server.
The problem can result in stale data
being cached at the nodes which are servicing NFS requests.
|
| |
|
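
The tag file change listed under Patch 152.01 (looking at 500 files at a time instead of 700 during the setup stage of a rolling upgrade) can be pictured with a minimal sketch. This is an illustration only, not the clu_upgrade implementation: the names BATCH_SIZE, batches, and make_tag_files are hypothetical, and the per-file work is left as a placeholder.

```python
from typing import Iterator, List

# 500 is the post-patch value given in the patch description (previously 700).
BATCH_SIZE = 500


def batches(paths: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` paths."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def make_tag_files(paths: List[str]) -> int:
    """Hypothetical stand-in for tag file creation; returns the batch count."""
    count = 0
    for batch in batches(paths):
        # Placeholder: a real tool would create a tag file for each path
        # in `batch` here before moving on to the next batch.
        count += 1
    return count
```

With 1200 paths, this sketch processes three batches of 500, 500, and 200 files. A smaller batch size means more passes but less work held in flight per pass.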