Patch IDs |
Abstract |
Patch 3.00
TCR141-003 |
Patch:
Correction For DRD I/O Hangs When No CPU In Slot 0
State:
Existing
This fixes a problem that occurs on
all AlphaServer 8200 systems and on AlphaServer 8400 systems having certain
nonstandard configurations.
When there is no CPU in slot 0, remote DRD I/O
operations hang. |
Patch 4.00
TCR141-004 |
Patch:
Correction For Distributed Lock Manager Hang
State:
Existing
This patch fixes a problem that occurs
when MEMORY CHANNEL errors are encountered at the same time that a particular
code path is executed.
When these events occur simultaneously, the distributed
lock manager (DLM) hangs.
The likelihood of this problem occurring is
low. |
Patch 6.00
TCR141-006 |
Patch:
tractd Corrections
State:
Existing
This patch corrects the following:
Fixes a problem where the Cluster Monitor (cmon) in some cases
may display incomplete or incorrect ASE service status and node UP/DOWN
status.
Fixes a problem with complete depletion of system socket resources,
the result of tractd daemons doing repeated connect retries.
This problem
is most commonly seen when all nodes in a three- or four-node cluster are
booted simultaneously.
Dramatically reduces tractd daemon interconnect delays seen
when multiple cluster nodes are booted simultaneously.
On four-node clusters, these delays are
reduced from more than 5 minutes to
just a few seconds.
In addition, the interconnects in these circumstances
complete more reliably.
|
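The socket depletion described for Patch 6.00 is a general hazard of connect-retry loops: each failed attempt can leave a descriptor open unless it is closed explicitly. The following Python sketch is illustrative only and does not show the tractd implementation. The function name connect_with_retry, the socket_factory and sleep parameters, the attempt limit, and the backoff values are all assumptions made for the example.

```python
import socket
import time


def connect_with_retry(address, attempts=5, base_delay=0.1, max_delay=2.0,
                       socket_factory=socket.socket, sleep=time.sleep):
    """Try to connect to address, closing the socket after every failure.

    Hypothetical helper: attempts, base_delay, and max_delay are
    illustrative values, not settings taken from tractd.
    Returns the connected socket, or None if every attempt fails.
    """
    delay = base_delay
    for attempt in range(attempts):
        sock = socket_factory(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect(address)
            return sock
        except OSError:
            # Release the descriptor before retrying; a retry loop that
            # skips this close can exhaust socket resources over time.
            sock.close()
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # bounded exponential backoff
    return None
```

Bounding both the number of attempts and the delay keeps the number of simultaneously open descriptors at one per caller, even when many nodes retry at the same time.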
Patch 7.00
TCR141-007 |
Patch:
Memory Channel Memory Allocation Corrections
State:
Existing
This patch fixes a problem that caused
the "map_RM_receive" panic to occur in some cases.
This problem may also
be seen as distributed raw disk (DRD) print warnings on the console if the
drd-mc-drd-print-warn parameter is set in the /etc/sysconfigtab file. |
Patch 21.00
TCR141-021
|
Patch:
lsm_dg_action Correction
State:
Existing
This patch fixes two problems that were causing certain
LSM actions to not be retried upon failure, even though the conditions that
caused the failures were only temporary. |
Patch 24.00
TCR141-009
|
Patch:
Network interface and Routing Corrections
State:
Existing
This patch fixes the following problems:
During the failover of an ASE service, the removal of the
-alias parameter from the /var/ase/sbin/nfs_ifconfig file caused the routing
file to become corrupted.
When removing and adding services in an available server environment
(ASE) using multiple network interfaces, the gated daemon was started
even when the value of the ASEROUTING variable in the /etc/rc.config file was
"no".
|
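The second Patch 24.00 item concerns honoring the ASEROUTING setting before starting gated. The Python sketch below shows one way such a check could look. It is not the code used by the ASE scripts: the helper names rc_config_value and gated_allowed are hypothetical, the parser assumes simple shell-style NAME="value" assignments, and treating every value other than "no" as permission to start gated is an assumption.

```python
import re


def rc_config_value(text, name, default=""):
    """Return the last value assigned to name in rc.config-style text.

    Assumes simple shell assignments such as NAME="value"; this parser is
    a hypothetical illustration, not the code used by the ASE scripts.
    """
    pattern = re.compile(r"^\s*" + re.escape(name) + r"""=(["']?)(.*?)\1\s*$""")
    value = default
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            value = match.group(2)  # later assignments override earlier ones
    return value


def gated_allowed(rc_config_text):
    """Return False when ASEROUTING is "no"; any other value allows gated.

    How an unset variable should be handled is an assumption here.
    """
    return rc_config_value(rc_config_text, "ASEROUTING").lower() != "no"
```

A caller would read the contents of /etc/rc.config, pass the text to gated_allowed, and skip starting the daemon when the result is False.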
Patch 25.00
TCR141-025
|
Patch:
Distributed Lock Manager Corrections
State:
Existing
This patch fixes a problem in TruCluster
Production Server Software that can cause a cluster member to panic during
a shutdown. |
Patch 27.00
TCR141-027
|
Patch:
Correction for KZPBA controllers
State:
Existing
Without this patch, the ase_fix_config utility
does not recognize KZPBA controllers. |
Patch 28.00
TCR141-028
|
Patch:
Correction for KZPBA SCSI controllers
State:
Existing
This patch replaces the /usr/sbin/clu_ivp
script with a new script that will recognize the "isp" KZPBA SCSI controllers.
Without this patch, the clu_ivp program ignores these controllers. |
Patch 30.00
TCR141DX-002
|
Patch:
Cluster Monitor Hang Correction
State:
Existing
If an ASE service is renamed, any running
Cluster Monitor (cmon) locks up and hangs.
This occurs whether the rename
was done from within cmon or independently of cmon. |
Patch 32.00
TCR141-033
|
Patch:
Booting Node Hang Correction
State:
Existing
Fixes a problem where a booting node hangs in the imc_init
command.
A subsequent reboot also hangs in imc_init, requiring a reboot of all
members. |
Patch 33.01
TCR141-034-1
|
Patch:
Kern Mem Fault And simple_lock Panic Correction
State:
Supersedes patches TCR141-011 (11.00), TCR141-019 (23.00)
This patch corrects the following:
Fixes the following problems in the ASE Availability Manager
(AM):
A "simple_lock: time limit exceeded" panic on multiprocessor systems,
and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on
a shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a kernel memory fault in am_select() in the Availability
Manager.
Fixes a problem where the aseagent process goes into a U state
when another ASE member leaves the cluster, due to the aseagent process
waiting on a SCSI ping request that never completes.
|
Patch 35.00
TCR141-036
|
Patch:
rm_spur Driver Correction
State:
Supersedes patch TCR141-002 (2.00)
This patch corrects the following
problems:
Eliminates the loss of a cluster node when "sysconfig -q rm"
is run after the cluster has formed.
Allows more time to remove a node from an 8-node cluster before
causing the system to panic.
Corrects cases on busy clusters in which the software
fails to detect that a node has gone down.
Corrects the sense of the long/short heartbeat timeout delay
in virtual hub systems, and enables code that allows the system to see a
hub power up after it has been powered down.
|
Patch 47.00
TCR141-013B
|
Patch:
Memory Channel API Shared Library Correction
State:
Supersedes patch TCR141-013 (13.00)
This patch
fixes various problems in the MEMORY CHANNEL API.
In particular, changes
were made to ensure that the API is thread safe, that locks are properly acquired
and released, and to increase performance and reliability. |
Patch 48.00
TCR141-013-1
|
Patch:
Memory Channel API Static Library Correction
State:
Supersedes patch TCR141-013 (13.00)
This patch
fixes various problems in the MEMORY CHANNEL API.
In particular, changes
were made to ensure that the API is thread safe, that locks are properly acquired
and released, and to increase performance and reliability. |
Patch 50.00
TCR141-045B
|
Patch:
LSM and AdvFS Corrections
State:
Supersedes patches TCR141-041 (39.00), TCR141-048 (45.00)
This
patch fixes the following problems:
Increases the timeout values for the LSM action scripts that
are part of the TruCluster Production Server, Available Server and DECsafe
Available Server products.
The timeouts were too small for large LSM configurations
and, under certain conditions, would cause the start of the services to
fail, leaving them unassigned.
Fixes a problem in which, under certain circumstances, an ASE
service modification could result in a corrupted configuration database.
|
Patch 52.00
TCR141-044C
|
Patch:
Message Service Routine Fixes
State:
Supersedes patches TCR141-005 (5.00), TCR141-029 (29.00), TCR141-035 (34.00),
TCR141-038 (36.00), TCR141-042 (40.00), TCR141-043 (41.00), TCR141-044-1 (42.01)
This patch corrects the following:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and Production Server software.
When the message queue fills, the following message is entered in the
daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes the following problems in the ASE Availability Manager
(AM):
A "simple_lock: time limit exceeded" panic on multiprocessor systems,
and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on
a shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a problem where, during an orderly shutdown (init 0),
the ASE agent shuts down the director before shutting down the services.
Causes the host status monitor (asehsm) to actively go out
and learn current member states before responding to the director with
member state information.
Fixes a problem where pulling all monitored network interface cables on the machine
running the asedirector and a service could result in another machine starting
a new director and starting the same service before it had been fully
stopped on the first machine.
This is especially noticeable when a service
takes a long time to stop.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
|
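The message queue overflow item in the Patch 52.00 list describes a queue that stops receiving once it fills. The Python sketch below models the corrected behavior in general terms: a message that arrives while the queue is full is counted as lost, but the queue accepts new messages again as soon as it has been drained. It is a hypothetical model, not the msgSvc code. The class name BoundedMessageQueue, the capacity, and the policy of dropping the newest message are assumptions.

```python
from collections import deque


class BoundedMessageQueue:
    """Fixed-capacity queue that keeps working after an overflow.

    Hypothetical model: the capacity and the drop-newest policy are
    assumptions, not the actual msgSvc design.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.messages = deque()
        self.lost = 0

    def put(self, message):
        """Queue a message; return False and count it as lost if full."""
        if len(self.messages) >= self.capacity:
            self.lost += 1
            return False
        self.messages.append(message)
        return True

    def drain(self):
        """Remove and return all queued messages, freeing capacity."""
        drained = list(self.messages)
        self.messages.clear()
        return drained
```

The point of the sketch is that an overflow is a recoverable event: only the messages that arrive while the queue is full are lost, and draining restores normal operation.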
Patch 53.00
TCR141-049
|
Patch:
ASE Check Service Script May Be Corrupt
State:
New
This patch corrects a problem in which an
ASE check service script could become corrupted in the ASE configuration
database.
|
Patch 56.00
TCR141-052
|
Patch:
LSM Disk Info Not Properly Updated In ASE DB
State:
Supersedes patches TCR141-016 (16.00), TCR141-041 (39.00),
TCR141-039 (37.00), TCR141-048 (45.00), TCR141-045 (43.00), TCR141-045-1 (49.00)
This patch fixes the following problems:
Provides support in asemgr for the new AdvFS mount option
"-o noatimes".
Fixes a problem where changes in the LSM configuration were
not being properly handled during the delete of an LSM volume from a service.
Increases the timeout values for the LSM action scripts that
are part of the TruCluster Production Server, Available Server and DECsafe
Available Server products.
The timeouts were too small for large LSM configurations
and, under certain conditions, would cause the start of the services to
fail, leaving them unassigned.
Fixes a problem in which, under certain circumstances, an ASE
service modification could result in a corrupted configuration database.
Fixes a problem where LSM disk information was not properly
updated in the ASE database when volumes were removed from a disk service.
|
Patch 60.00
TCR141-056
|
Patch:
Fix For AdvFS Panic
State:
Supersedes patch TCR141-032 (31.00)
This patch corrects the following:
Fixes a problem in which running the vquotacheck command on
a filesystem participating in an ASE service will cause a system to panic
if the service fails over or relocates while the command is in progress.
Fixes a problem that could cause an AdvFS panic when a service
that has quotas enabled is relocated.
The problem occurs if a command is
running that has a large number of arguments (more than 99).
|
Patch 61.00
TCR141-058A
|
Patch:
asemgr May Core Dump
State:
Supersedes patches TCR141-005 (5.00), TCR141-029 (29.00), TCR141-035 (34.00),
TCR141-038 (36.00), TCR141-042 (40.00), TCR141-043 (41.00), TCR141-044-1 (42.01),
TCR141-044-2 (51.00), TCR141-050 (54.00), TCR141-051 (55.00), TCR141-053A
(57.00)
This patch corrects the following:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and Production Server software.
When the message queue fills, the following message is entered in the
daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes the following problems in the ASE Availability Manager
(AM):
A "simple_lock: time limit exceeded" panic on multiprocessor systems,
and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on
a shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a problem where, during an orderly shutdown (init 0),
the ASE agent shuts down the director before shutting down the services.
Fixes a problem where pulling all monitored network interface cables on the machine
running the asedirector and a service could result in another machine starting
a new director and starting the same service before it had been fully
stopped on the first machine.
This is especially noticeable when a service
takes a long time to stop.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the
network interface get zeroed.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server, and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
Fixes a problem where the ASE management utility, asemgr,
consumes increasing amounts of memory when invoked to add several services
to the database at one time.
Under certain circumstances it could consume
all the available memory, causing allocation failures.
Fixes two related problems:
Initializes hostname field properly because lower-layer
code may de-reference it.
Handles an error from IPToHost() properly.
Failure to handle
this error properly could result in the aseagent core dumping.
|
Patch 62.00
TCR141-059
|
Patch:
Node Panics With String dlm_panic
State:
Supersedes patches TCR141-014 (14.00), TCR141-022 (22.00), TCR141-026
(26.00), TCR141-040 (38.00), TCR141-046 (44.00), TCR141-054 (58.00), TCR141-055
(59.00)
This patch corrects the following:
Fixes a problem in the TruCluster Production Server Software
in which a system can panic with:
rcv_invvalb_req: value block out of sequence
Fixes two problems in the TruCluster Distributed Lock Manager (DLM):
one resulting from a process's effective group ID not being checked when
a process attempts to join a namespace, another in which repeated calls
to the dlm_quecvt function would erroneously return DLM_LKBUSY status.
Fixes an assertion panic that occurs after a large number of transactions
are made using the same lock.
The assertion panic is triggered by integer
wrapping of the lock transaction ID field.
The system may panic with "dlm_panic".
The actual assertion message is "<lkbp->lk_txid == 0>".
Fixes an erroneous assertion involving deadlock search.
The system
may panic with "dlm_panic".
The actual assertion message is "<otxid != (dlm_trans_id_t)-1>".
Fixes a problem that can cause a cluster member to panic in
rcv_deqlk_msg() with the panic string set to:
dlm_panic
Fixes a system panic with the following message:
snd_grantlk_msg: no memory for message
Fixes a dlm_panic if a process is exiting and a rebuild for
the Distributed Lock Manager (DLM) takes place.
Fixes a problem that caused the "sysconfig -q dlm" command
to hang if DLM is currently suspended.
Fixes a problem in TruCluster in which a node panics with
the string "dlm_panic".
|
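One item in the Patch 62.00 list attributes an assertion panic to integer wrapping of the lock transaction ID field. The Python sketch below illustrates the general mechanism only: a fixed-width counter returns to 0 after enough increments, which matters if 0 carries a special meaning. The 32-bit width, the function names, and the idea that 0 is a reserved value are assumptions. The source does not state the field width or how the patch avoids the panic.

```python
TXID_BITS = 32  # assumed field width; the source does not state it
TXID_MASK = (1 << TXID_BITS) - 1


def next_txid(current):
    """Advance a fixed-width transaction ID, wrapping modulo 2**TXID_BITS."""
    return (current + 1) & TXID_MASK


def next_txid_skipping_zero(current):
    """Advance the ID but skip 0, on the assumption that 0 is reserved."""
    candidate = (current + 1) & TXID_MASK
    return candidate if candidate != 0 else 1
```

With the plain increment, the ID that follows the maximum value is 0. The second function shows one common way to keep a reserved value out of circulation after a wrap.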
Patch 64.00
TCR141-058B
|
Patch:
Kernel Memory Fault Panic
State:
Supersedes patches TCR141-005 (5.00), TCR141-029 (29.00), TCR141-035 (34.00),
TCR141-038 (36.00), TCR141-042 (40.00), TCR141-043 (41.00), TCR141-044 (42.00),
TCR141-044B (46.00), TCR141-053B (63.00)
This patch corrects the
following:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and Production Server software.
When the message queue fills, the following message is entered in the
daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes the following problems in the ASE Availability Manager
(AM):
A "simple_lock: time limit exceeded" panic on multiprocessor systems,
and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on
a shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a problem where, during an orderly shutdown (init 0),
the ASE agent shuts down the director before shutting down the services.
Causes the host status monitor (asehsm) to actively go out
and learn current member states before responding to the director with
member state information.
Fixes a problem where pulling all monitored network interface cables on the machine
running the asedirector and a service could result in another machine starting
a new director and starting the same service before it had been fully
stopped on the first machine.
This is especially noticeable when a service
takes a long time to stop.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
Fixes the following problems:
The 'asemgr -dv' command core dumps if no services are defined.
When deleting a service that has LSM and/or AdvFS volumes,
the asemgr utility prompts for a member on which to leave the
LSM/AdvFS information so that it can be re-used.
If ASE cannot resolve
the IP address for the member, asemgr or aseagent will core dump.
Fixes a problem that can cause the asemgr utility to core
dump when modifying services that contain a large number of disks.
|