Patch IDs |
Abstract |
Patch 11.00
TCR150-012
|
Patch:
Cluster Map Not Being Loaded At Boot Time Correction
State:
Existing
This patch fixes a problem
in TruCluster Available Server V1.5.
The cluster map (/etc/CCM) was not being
loaded at boot time, which prevented the Cluster Monitor utility (cmon) and
its associated daemons (tractd and submon) from running. |
Patch 13.00
TCR150DX-003
|
Patch:
Cluster Monitor Hang Correction
State:
Existing
This patch fixes a problem in which changing the name
of an ASE service with asemgr causes any Cluster Monitor (cmon) that
is running on the cluster to hang. |
Patch 28.00
TCR150-031
|
Patch:
ASE Check Service Script Could Be Corrupt
State:
Existing
This patch corrects a problem in which
an ASE check service script could become corrupted in the ASE configuration
database.
|
Patch 36.00
TCR150-025-1
|
Patch:
dlm_panic Fix
State:
Supersedes patches TCR150-016 (14.00), TCR150-022 (20.00), TCR150-025 (23.00)
This patch fixes the following problems:
Problem that can cause a cluster member to panic in rcv_deqlk_msg()
with the panic string set to:
dlm_panic
Provides performance enhancements that are required by Oracle
V8.0.5.
Fixes a system panic with the following message:
snd_grantlk_msg: no memory for message
|
Patch 47.00
TCR150-044
|
Patch:
Kernel Memory Fault Panic
State:
Existing
This patch fixes two panics:
|
Patch 48.00
TCR150-045
|
Patch:
Fix for AdvFS Panic
State:
Supersedes patch TCR150-008 (7.00)
This patch corrects the following:
Fixes a problem in which running the vquotacheck command on
a filesystem participating in an ASE service will cause a system to panic
if the service fails over or relocates while the command is in progress.
Fixes a problem that could cause an AdvFS panic when a service
that has quotas enabled is relocated.
The problem occurs if a command with a large number of
arguments (more than 99) is running.
|
Patch 49.00
TCR150-046
|
Patch:
drdadmin Incorrectly Builds drdtab File
State:
Supersedes patch TCR150-007 (6.00)
This patch fixes the following problems:
If a cluster member issued a drdadmin command to create a new
DRD map entry while another member was rebooting or had explicitly issued
a SCSI bus reset, the command could fail with the following message:
drdadmin: Error: Can not add map entry for drdadmin:
Error: Can not add map entry for <drd device name>
During system startup, as each DRD map entry is being added,
the following informational message may be seen on the console:
No cluster has been setup, there are 0 nodes.
Fixes a problem where drdadmin does not properly build the
drdtab file during bootup.
|
Patch 52.00
TCR150-050
|
Patch:
Adding second cnxmond Causes Cluster Partition
State:
Existing
This patch fixes a problem where starting
a second cnxmond could cause a cluster partition.
Attempting to start a second cnxmond will now log an error message,
and the new process will exit.
|
Patch 60.00
TCR150-040A
|
Patch:
Fix for Memory Channel API
State:
Supersedes patches TCR150-010 (9.00), TCR150-019 (17.00), TCR150-019-1 (41.00),
TCR150-039A (58.00)
This patch fixes the following problems:
Problem with the Memory Channel API whereby the function imc_asalloc
did not allow a negative key (most significant bit of key being set).
Problem that caused mcm_init to core dump when the resolver fails
on system boot.
Problem in which a resolver failure produces an unhelpful
error message from mcm_init on boot.
Problem with the Memory Channel API whereby the function
imc_ckerrcnt was signifying an error had occurred when in fact no error
had occurred.
The following is the error code seen when running an MPI code:
[5]MPI Die-ump2chck.c 91 "ump_wait failure" (-16)
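The imc_asalloc item above turns on how a key with its most significant bit set is interpreted. As a minimal sketch, assuming a 32-bit key (the key width and the helper names key_msb_set and key_as_signed are hypothetical, and nothing here calls imc_asalloc or any other Memory Channel API function), the following C helpers show that such a key reads as negative when treated as a two's-complement signed integer:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. They do not call
   imc_asalloc or any other Memory Channel API function, and the
   32-bit key width is an assumption. */

/* Returns true when the most significant bit of a 32-bit key is set. */
bool key_msb_set(uint32_t key)
{
    return (key & UINT32_C(0x80000000)) != 0;
}

/* Returns the value the key takes when read as a 32-bit
   two's-complement signed integer. The subtraction is done in 64-bit
   arithmetic, so the result is well defined for every input. */
int64_t key_as_signed(uint32_t key)
{
    return key_msb_set(key) ? (int64_t)key - INT64_C(4294967296)
                            : (int64_t)key;
}
```

For example, key_as_signed(0x80000001) yields -2147483647, whereas a key with the top bit clear, such as 0x7FFFFFFF, stays positive.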
|
Patch 65.00
TCR150-006B
|
Patch:
System Panic dlm getch: illegal csid Correction
State:
Existing
Fixes a problem in the TruCluster Production
Server Software in which a system can panic with the following message:
dlm getch: illegal csid
|
Patch 79.00
TCR150-062C
|
Patch:
Message Service Routine Fixes
State:
Supersedes patches TCR150-003 (2.00), TCR150-009 (8.00), TCR150-011 (10.00),
TCR150-017 (15.00), TCR150-018 (16.00), TCR150-020 (18.00), TCR150-023 (21.00),
TCR150-024-1 (22.01), TCR150-014 (12.00), TCR150-027 (25.00), TCR150-024B-1
(39.00), TCR150-027B-1 (35.01)
This patch fixes the following problems:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and TruCluster Production Server
software.
When the message queue fills, the following message is entered
in the daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes a problem in Version 1.5 of the TruCluster Production
Server and TruCluster Available Server products where, during the start of
a service, missing special device files were not being created for HSZ
disks.
Since the special device files did not get created, the service
start would fail.
Fixes a segmentation fault that can cause ASE daemons to exit
or hang.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server, and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
Fixes several problems related to ASE service relocation and
reporting in the event of network failures.
Fixes a problem that could cause the ASE daemons or asemgr
utility to core dump with a segmentation violation.
Fixes a problem where, under certain circumstances, an ASE
service modification could result in a corrupted configuration database.
Fixes several TCR problems involving large sites with services
containing large numbers of DRDs.
|
Patch 95.00
TCR150-080B
|
Patch:
aseagent and asemgr Fixes
State:
Supersedes patches TCR150-003 (2.00), TCR150-009 (8.00), TCR150-011 (10.00),
TCR150-017 (15.00), TCR150-018 (16.00), TCR150-020 (18.00), TCR150-023 (21.00),
TCR150-024-1 (22.01), TCR150-024B (33.00), TCR150-024C (40.00), TCR150-032B
(57.00), TCR150-043B (63.00), TCR150-049B (68.00), TCR150-060B (77.00), TCR150-062B
(78.00), TCR150-063B (80.00), TCR150-064B (92.00), TCR150-068B (93.00), TCR150-073B
(94.00)
This patch fixes the following problems:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and TruCluster Production Server
software.
When the message queue fills, the following message is entered
in the daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes a problem in Version 1.5 of the TruCluster Production
Server and Available Server products where, during the start of a service,
missing special device files were not being created for HSZ disks.
Since the special device files did not get created, the service start would fail.
Fixes a segmentation fault that can cause ASE daemons to exit
or hang.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server, and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
Fixes several problems related to ASE service relocation and
reporting in the event of network failures.
Fixes a problem that could cause the ASE daemons or asemgr
utility to core dump with a segmentation violation.
Corrects problems with temporary files not being removed and
eliminates the need for one temporary file.
Fixes a problem that can cause the asemgr utility to core
dump when modifying services that contain a large number of disks.
Fixes a number of ASE behavior problems resulting from network
cable failure.
Fixes several TCR problems involving large sites with services
containing large numbers of DRDs.
Fixes a problem that caused the ASE daemons and asemgr to
core dump when the lookup for an IP address failed.
Improves performance when launching start scripts by reducing
the number of system calls needed to start the scripts.
Corrects a problem in which a member add will fail in a large
ASE environment.
Corrects a problem which causes asemgr to core dump when modifying
a DRD service to add more than 200 devices in a single service.
Corrects a problem which causes an aseagent to hang when restarting
the ASE member.
|
Patch 97.00
TCR150-081A
|
Patch:
Fix SCSI device reservations lost
State:
Supersedes patches TCR150-004 (3.00), TCR150-030 (27.00), TCR150-036
(32.00), TCR150-057 (70.00)
This patch fixes the following problems
in the ASE Availability Manager (AM):
A "simple_lock: time limit exceeded" panic on multiprocessor
systems and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on a
shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a problem in which tape services may not fail over as
expected.
Fixes two problems:
A problem in which the following messages may appear in the
binary error log:
SCSI STATUS RESERVATION CONFLICT Target xx Lun xx
or:
Max SEND SCSI BUSY retries exhausted
A problem in which a system may panic if the system has an
IDE interface and ASE is then installed.
Fixes a problem in clustered systems by reducing the occurrences
of tmv2_notify_cbf error messages in the errlog.
Fixes the following TCR problems:
After error events are processed, a timing hole exists
whereby important events can be lost.
After an HSZ controller failure, SCSI device reservations
could get lost because the error events were not being ordered properly.
|
Patch 102.00
TCR150-086
|
Patch:
Various dlm Corrections
State:
Supersedes patches TCR150-016 (14.00), TCR150-022 (20.00), TCR150-025 (23.00),
TCR150-025B (37.00), TCR150-047 (50.00), TCR150-006A (5.00), TCR150-041 (66.00),
TCR150-059 (71.00), TCR150-074 (86.00), TCR150-085 (101.00)
This patch fixes the following problems:
Problem that can cause a cluster member to panic in rcv_deqlk_msg()
with the panic string set to:
dlm_panic
Provides performance enhancements that are required by Oracle
V8.0.5.
Fixes a system panic with the following message:
snd_grantlk_msg: no memory for message
Fixes a problem in TruCluster in which a node panics with
the string dlm_panic.
Fixes a problem in the TruCluster Production Server Software
in which a system can panic with the following message:
dlm getch: illegal csid
Fixes a deadlock condition between the DLM rebuild thread
and the Connection Manager ping daemon (cnxpingd).
The deadlock can cause
users of DLM (e.g., Oracle) to hang.
Fixes a problem in which a cluster node can panic with the
panic string "convert_lock: bad lock state".
Corrects a problem in which a failure in the session layer
can cause DLM messages to become corrupted, resulting in random DLM panics on
the receiving member.
Fixes a problem that can cause a TruCluster member to panic
during shutdown.
Fixes a bug in which a certain shared sequence number was sometimes
not freed after use.
It also fixes a problem in which certain processes
could be referenced several times.
|
Patch 105.00
TCR150-089
|
Patch:
Shell errors occur if invalid mount option specified
State:
Supersedes patches TCR150-014 (12.00), TCR150-027
(25.00), TCR150-027A-1 (34.01), TCR150-035 (43.00), TCR150-042 (46.00),
TCR150-079 (96.00), TCR150-083 (99.00)
This patch fixes the following problems:
Provides support in asemgr for the new AdvFS mount option
-o noatimes.
Fixes a problem in which, under certain circumstances, an
ASE service modification could result in a corrupted configuration database.
Fixes a problem in which a service fails to start when the
ASE service name and the AdvFS domain name are identical.
Fixes a problem where LSM disk information was not properly
updated in the ASE database when volumes were removed from a disk service.
Fixes a deadlock condition between the DLM rebuild thread
and the Connection Manager ping daemon (cnxpingd).
The deadlock can cause
users of DLM (e.g., Oracle) to hang.
Fixes a problem that would cause an error from awk(1) when
modifying an ASE service that contained a large number of LSM volumes.
The error would prevent the service from being properly modified.
Fixes a problem that caused shell errors if an invalid mount
option was specified via the asemgr menu.
|
Patch 118.00
TCR150-093
|
Patch:
mountd exits without error during boot
State:
New
This patch fixes a problem that could cause
mountd to exit without error during boot.
|
Patch 120.00
TCR150-098
|
Patch:
Fix for Memory Channel API node crash
State:
New.
Supersedes patches TCR150-010 (9.00), TCR150-019 (17.00),
TCR150-019B (42.00), TCR150-039B (59.00), TCR150-040B (61.00), TCR150-090
(106.00)
This patch fixes the following problems:
Problem with the Memory Channel API whereby the function imc_asalloc
did not allow a negative key (most significant bit of key being set).
Problem that caused mcm_init to core dump when resolver fails
on system boot.
Problem in which a resolver failure produces an unhelpful
error message from mcm_init on boot.
Problem with the Memory Channel API whereby the function
imc_ckerrcnt was signifying an error had occurred when in fact no error
had occurred.
The following is the error code seen when running an MPI code:
[5]MPI Die-ump2chck.c 91 "ump_wait failure" (-16)
Fixes a problem that can cause a panic in mcs_wait_cluster_event()
when using the Memory Channel API.
Fixes a problem with the Memory Channel API in which, under certain
circumstances, an MC-API lock held by a node that crashes is not
released after the crash.
|
Patch 121.00
TCR150-097
|
Patch:
clumember produces error msg during system startup
State:
New.
Supersedes patches TCR150-002 (1.00), TCR150-015 (31.00),
TCR150-021 (19.00), TCR150-026 (24.00), TCR150-029 (26.00), TCR150-052 (64.00),
TCR150-065 (76.00), TCR150-078 (90.00), TCR150-069 (83.00), TCR150-082 (98.00),
TCR150-088 (104.00), TCR150-094 (111.00)
This patch fixes the following problems:
Problem booting a second member into a cluster.
In a virtual hub cluster, shutting down one node can cause
the other to crash.
Typical panic strings on the node that crashes are as
follows:
rm_failover_self
and
rm_failover_all: target rail offline
Various repairs to Memory Channel error handling.
Fixes for virtual hub booting with a cable unplugged.
Various problems with MC error handling discovered in cable
pull tests under load.
Hubless MC2 systems hang during boot and/or experience error
interrupts.
Reliable datagram (RDG) messaging support.
RDG: bug fix to the completion queue synchronization protocol.
Fixes a kernel memory fault in rm_lock_update_retry().
Fixes a problem where both nodes in a cluster will panic at
the same time with a simple_lock timeout panic.
Fixes a problem which can cause the following panic:
panic (cpu 0): rm_update_single_lock_miss: time limit exceeded
Fixes a problem where /sbin/init.d/clumember produces an error
message during system startup if DRD_AUTO_FAILOVER is not defined in /etc/rc.config.
Fixes a problem that could cause a TruCluster Production Server
member to hang during boot, and that could cause a "simple lock time limit exceeded"
panic.
Fixes a problem that could cause an error to be returned when
the Cluster software should wait until a global lock is freed.
|
Patch 123.00
TCR150-101
|
Patch:
TCR Available Server and Production Server Fixes
State:
Supersedes patches TCR150-003 (2.00), TCR150-009 (8.00),
TCR150-011 (10.00), TCR150-017 (15.00), TCR150-018 (16.00), TCR150-020 (18.00),
TCR150-023 (21.00), TCR150-024-1 (22.01), TCR150-024-2 (38.00), TCR150-033
(30.00), TCR150-037 (44.00), TCR150-051 (53.00), TCR150-032A (56.00), TCR150-005
(4.00), TCR150-038 (45.00), TCR150-043A (62.00), TCR150-048 (51.00), TCR150-056
(69.00), TCR150-049A (67.00), TCR150-061 (73.00), TCR150-060A (72.00), TCR150-062A
(74.00), TCR150-063A (75.00), TCR150-064A (81.00), TCR150-068A (82.00), TCR150-071
(84.00), TCR150-073A (85.00), TCR150-075 (87.00), TCR150-076 (88.00), TCR150-077
(89.00), TCR150-080A (91.00), TCR150-081B (109.00), TCR150-084 (100.00), TCR150-087
(103.00), TCR150-091 (107.00), TCR150-092 (108.00), TCR150-100 (112.00), TCR150-099
(113.00), TCR150-096 (114.00), TCR150-095 (116.00)
This patch fixes the following problems:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and TruCluster Production Server
software.
When the message queue fills, the following message is entered
in the daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes a problem in Version 1.5 of the TruCluster Production
Server and TruCluster Available Server products where, during the start of
a service, missing special device files were not being created for HSZ
disks.
Since the special device files did not get created, the service
start would fail.
Fixes a segmentation fault that can cause ASE daemons to exit
or hang.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server, and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
Fixes several problems related to ASE service relocation and
reporting in the event of network failures.
Fixes a problem that could cause the ASE daemons or asemgr
utility to core dump with a segmentation violation.
Fixes a problem where the ASE management utility, asemgr,
consumes increasing amounts of memory when invoked to add several services
to the database at one time.
Under certain circumstances it could consume
all the available memory, causing allocation failures.
Fixes two related problems:
Initializes the hostname field properly, because lower-layer code
may dereference it.
Handles an error from IPToHost() properly.
Failure to handle this error could result in the aseagent core dumping.
Corrects a problem in which TCR 1.5 failed to recognize
HOST_DISC as an up and running state.
Corrects problems with temporary files not being removed and
eliminates the need for one temporary file.
Fixes a problem that can cause the asemgr utility to core
dump when modifying services that contain a large number of disks.
Fixes a problem in the ASE API shared library that can cause
Networker (DECNSR) to core dump if there are no services defined in an ASE.
Fixes a problem that can cause applications, such as Networker,
that use the shared library libaseapi.so to core dump when trying to
get the cluster name.
Fixes a problem in the ASE API shared library (libaseapi.so)
that could cause Networker to core dump.
Fixes an ASE problem in which, under certain circumstances, the
service scripts could cause the aseagent to loop during a service start or
stop.
Fixes a problem in which asemgr core dumps when adding multiple
services in a single session.
Fixes a number of ASE behavior problems resulting from network
cable failure.
Fixes several TCR problems involving large sites with services
containing large numbers of DRDs.
Fixes a problem that caused the ASE daemons and asemgr to
core dump when the lookup for an IP address failed.
Improves performance when launching start scripts by reducing
the number of system calls needed to start the scripts.
Corrects a problem in which a member add will fail in a large
ASE environment.
Corrects a problem with Networker displaying garbage characters
following service names.
It occurs when the service name is 8 or more characters long.
Corrects a problem that causes asemgr to core dump when modifying
a DRD service to add more than 200 devices in a single service.
Fixes a problem which caused a service not to start when there
was a short network failure.
This was seen only with long-running stop scripts
and special network configurations.
Fixes a bug where ASE picks up an extra socket after failing
over.
Corrects a problem which causes an aseagent to hang when restarting
the ASE member.
Fixes the following TCR problems:
After error events are processed, a timing hole exists
whereby important events can be lost.
After an HSZ controller failure, SCSI device reservations
could get lost because the error events were not being ordered properly.
Corrects a problem in which modifying a service with a large
number of DRDs fails and a "could not malloc" message is logged in the daemon.log file.
Fixes a problem that caused the asemgr utility to not run
when called from a program that is owned by root and has the setuid bit turned
on.
Corrects a problem in which a network cable failure that is corrected
within 7 seconds of the failure can leave the services in a bad state.
Fixes a problem that caused the asemgr to get a memory fault
when adding multiple services in a row.
Fixes a problem where timeout values greater than 30 seconds
in /etc/hsm.conf would cause the ASE agent to fail at startup.
Fixes two issues with clusters:
When a cluster member is brought up with ASE off, other members report
it as UP and RUNNING instead of UP and UNKNOWN.
When a restricted service is running on a member, and asemember
stop or aseam stop is executed, the service status still shows
the member name instead of Unassigned.
Fixes a problem that caused the asemgr to report that a disk,
or mount point, was in multiple services when modifying a service name.
Fixes a bug where the aseagent will occasionally core dump
on a SCSI bus hang.
Fixes a problem that caused the ASE application to report an
incorrect status while booting, after installation, or while re-initializing
the database.
|