Patch IDs |
Abstract |
Patch 3.00
TCR141-003 |
Patch:
Correction For DRD I/O Hangs When No CPU In Slot 0
State:
Existing
This fixes a problem that occurs on
all AlphaServer 8200 systems and on AlphaServer 8400 systems having certain
nonstandard configurations.
When there is no CPU in slot 0, remote DRD I/O
operations hang. |
Patch 4.00
TCR141-004 |
Patch:
Correction For Distributed Lock Manager Hang
State:
Existing
This patch fixes a problem that occurs
when MEMORY CHANNEL errors are encountered at the same time that a particular
code path is executed.
When these events occur simultaneously, the distributed
lock manager (DLM) hangs.
The likelihood of this problem occurring is
low. |
Patch 6.00
TCR141-006 |
Patch:
tractd Corrections
State:
Existing
This patch corrects the following:
Fixes a problem where the Cluster Monitor (cmon) in some cases
may display incomplete or incorrect ASE service status and node UP/DOWN
status.
Fixes a problem with complete depletion of system socket resources,
the result of tractd daemons doing repeated connect retries.
This problem
is most commonly seen when all nodes in a three- or four-node cluster are
booted simultaneously.
Dramatically reduces tractd daemon interconnect delays seen
when multiple cluster nodes are booted simultaneously.
These delays are
reduced from 5 minutes or more in four-node clusters to
just a few seconds.
In addition, the interconnects in these circumstances
complete more reliably.
|
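The socket depletion described for Patch 6.00 is the kind of failure a retry loop can produce when each failed attempt leaves its socket open. The following Python sketch is illustrative only and is not tractd code; the helper name `connect_with_retry`, the attempt count, and the delay values are assumptions made for the example.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=0.1, timeout=1.0):
    """Try to connect, closing the socket after every failed attempt.

    Hypothetical helper (not part of TruCluster): returns a connected
    socket, or None if every attempt fails. Closing on failure keeps the
    retries from consuming one descriptor per attempt.
    """
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            return sock
        except OSError:
            sock.close()  # release the descriptor before retrying
            time.sleep(delay * (attempt + 1))  # wait a little longer each time
    return None
```

Bounding the number of attempts and lengthening the wait between them also limits how hard several nodes that boot together contend for connections.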
Patch 7.00
TCR141-007 |
Patch:
Memory Channel Memory Allocation Corrections
State:
Existing
This patch fixes a problem which caused
the "map_RM_receive" panic to occur in some cases.
This problem may also
be seen as distributed raw disk (DRD) print warnings on the console if the
drd-mc-drd-print-warn parameter is set in the /etc/sysconfigtab file. |
Patch 21.00
TCR141-021
|
Patch:
lsm_dg_action Correction
State:
Existing
This patch fixes two problems that were causing certain
LSM actions to not be retried upon failure, even though the conditions that
caused the failures were only temporary. |
Patch 24.00
TCR141-009
|
Patch:
Network interface and Routing Corrections
State:
Existing
This patch fixes the following problems:
During the failover of an ASE service, the removal of the
-alias parameter from the /var/ase/sbin/nfs_ifconfig file caused the routing
file to become corrupted.
When removing and adding services in an available server environment
(ASE) using multiple network interfaces, the gated daemon would be started
even when the value of the ASEROUTING variable in the /etc/rc.config file is
"no".
|
Patch 25.00
TCR141-025
|
Patch:
Distributed Lock Manager Corrections
State:
Existing
This patch fixes a problem in TruCluster
Production Server Software that can cause a cluster member to panic during
a shutdown. |
Patch 27.00
TCR141-027
|
Patch:
Correction for KZPBA controllers
State:
Existing
Without this patch, the ase_fix_config utility
will not recognize KZPBA controllers.
|
Patch 28.00
TCR141-028
|
Patch:
Correction for KZPBA SCSI controllers
State:
Existing
This patch replaces the /usr/sbin/clu_ivp
script with a new script that will recognize the "isp" KZPBA SCSI controllers.
Without this patch, the clu_ivp program will ignore these controllers. |
Patch 30.00
TCR141DX-002
|
Patch:
Cluster Monitor Hang Correction
State:
Existing
If an ASE service is renamed, any running
Cluster Monitor (cmon) locks up and hangs.
This occurs whether the rename
was done from within cmon or independently of cmon. |
Patch 31.00
TCR141-032
|
Patch:
ase_mount_action Correction
State:
Existing
Fixes a problem in which running the vquotacheck command
on a filesystem participating in an ASE service will cause a system to panic
if the service fails over or relocates while the command is in progress. |
Patch 32.00
TCR141-033
|
Patch:
Booting Node Hang Correction
State:
Existing
Fixes a problem where a booting node hangs in the imc_init
command.
A subsequent reboot would also hang in imc_init, requiring a reboot of all
members. |
Patch 33.01
TCR141-034-1
|
Patch:
Kern Mem Fault And simple_lock Panic Correction
State:
Supersedes patches TCR141-011 (11.00), TCR141-019 (23.00)
This patch corrects the following:
Fixes the following problems in the ASE Availability Manager
(AM):
A "simple_lock: time limit exceeded" panic on multiprocessor systems,
and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on
a shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a kernel memory fault in am_select() in the Availability
Manager.
Fixes a problem where the aseagent process goes into a U state
when another ASE member leaves the cluster, due to the aseagent process
waiting on a SCSI ping request that never completes.
|
Patch 35.00
TCR141-036
|
Patch:
rm_spur Driver Correction
State:
Supersedes patch TCR141-002 (2.00)
This patch corrects the following
problems:
Eliminates the loss of a cluster node when "sysconfig -q rm"
is run after the cluster has formed.
Allows more time to remove a node from an 8-node cluster before
causing the system to panic.
Corrects some instances on busy clusters when the software
doesn't realize a node has gone down.
Corrects the sense of the long/short heartbeat timeout delay
in virtual hub systems, and enables code that allows the system to see a
hub power up after it has been powered down.
|
Patch 44.00
TCR141-046
|
Patch:
Lock Manager Corrections
State:
Supersedes patches TCR141-014 (14.00), TCR141-022 (22.00), TCR141-026 (26.00),
TCR141-040 (38.00)
This patch corrects the following:
Fixes a problem in the TruCluster Production Server Software
in which a system can panic with:
rcv_invvalb_req: value block out of sequence
Two problems in the TruCluster Distributed Lock Manager (DLM):
one resulting from a process's effective group ID not being checked when
a process attempts to join a namespace, another in which repeated calls
to the dlm_quecvt function would erroneously return DLM_LKBUSY status.
An assertion panic that occurs after a large number of transactions
are made using the same lock.
The assertion panic is triggered by integer
wrapping of the lock transaction ID field.
The system may panic with "dlm_panic".
The actual assertion message is "lk_txid == 0>".
An erroneous assertion involving deadlock search.
The system
may panic with "dlm_panic".
The actual assertion message is "<otxid
!= (dlm_trans_id_t)-1>".
Fixes a problem that can cause a cluster member to panic in
rcv_deqlk_msg() with the panic string set to:
dlm_panic
Fixes a system panic with the following message:
"snd_grantlk_msg: no memory for message"
|
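The transaction ID wrap mentioned for Patch 44.00 can be illustrated with a fixed-width counter. This is a minimal Python sketch, not DLM code: the 32-bit field width, the function names, and the idea that 0 is a reserved value are all assumptions, and the source does not say how the actual fix handles the wrap.

```python
TXID_BITS = 32  # assumed field width; the source does not state it
TXID_MASK = (1 << TXID_BITS) - 1


def next_txid_naive(txid):
    """Increment a fixed-width ID; wraps to 0 after the maximum value."""
    return (txid + 1) & TXID_MASK


def next_txid_skip_zero(txid):
    """Increment a fixed-width ID but never hand out 0.

    Assumes 0 is reserved (for example, to mean "no transaction"), so the
    counter wraps from the maximum value to 1 instead.
    """
    nxt = (txid + 1) & TXID_MASK
    return nxt if nxt != 0 else 1
```

With the naive version, a lock that has been through enough transactions eventually receives an ID of 0, which is the sort of value a consistency check elsewhere might reject.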
Patch 46.00
TCR141-044B
|
Patch:
Kernel Memory Fault Panic
State:
Supersedes patches TCR141-005 (5.00), TCR141-029 (29.00), TCR141-035 (34.00),
TCR141-038 (36.00), TCR141-042 (40.00), TCR141-043 (41.00), TCR141-044 (42.00)
This patch corrects the following:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and Production Server software.
When the message queue fills, the following message is entered in the
daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes the following problems in the ASE Availability Manager
(AM):
A "simple_lock: time limit exceeded" panic on multiprocessor systems,
and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on
a shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a problem where, during an orderly shutdown (init 0),
the ASE agent shuts down the director before shutting down the services.
Causes the host status monitor (asehsm) to actively go out
and learn current member states before responding to the director with
member state information.
Pulling all monitored network interface cables on the machine
running the asedirector and a service can result in another machine starting
a new director and starting the same service before it has been fully
stopped on the first machine.
This is especially noticeable when a service
takes a long time to stop.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
|
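The message queue problem listed first for Patch 46.00 (a full queue that logs the overflow but never drains, so no later message is received) can be contrasted with a queue that treats overflow as a per-message event. The Python sketch below is illustrative only; the class `BoundedMessageQueue` and its methods are hypothetical and do not reflect the actual message service routines.

```python
from collections import deque


class BoundedMessageQueue:
    """Fixed-capacity FIFO that keeps accepting messages after an overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.lost = 0  # messages dropped because the queue was full

    def put(self, message):
        """Queue a message; return False (and count the loss) if full."""
        if len(self.items) >= self.capacity:
            self.lost += 1
            return False
        self.items.append(message)
        return True

    def get(self):
        """Remove and return the oldest message, or None if empty."""
        return self.items.popleft() if self.items else None
```

Because `get` always frees a slot, an overflow costs only the messages that arrive while the queue is full; delivery resumes as soon as the consumer catches up.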
Patch 47.00
TCR141-013B
|
Patch:
Memory Channel API Shared Library Correction
State:
Supersedes patch TCR141-013 (13.00)
This patch
fixes various problems in the MEMORY CHANNEL API.
In particular, changes
were made to ensure that the API is thread safe, that locks are properly acquired
and released, and to increase performance and reliability. |
Patch 48.00
TCR141-013-1
|
Patch:
Memory Channel API Static Library Correction
State:
Supersedes patch TCR141-013 (13.00)
This patch
fixes various problems in the MEMORY CHANNEL API.
In particular, changes
were made to ensure that the API is thread safe, that locks are properly acquired
and released, and to increase performance and reliability. |
Patch 49.00
TCR141-045-1
|
Patch:
Support For New AdvFS Mount Option "-o noatimes"
State:
Supersedes patches TCR141-016 (16.00), TCR141-041 (39.00),
TCR141-039 (37.00), TCR141-048 (45.00), TCR141-045 (43.00)
This
patch fixes the following problems:
Provides support in asemgr for the new AdvFS mount option
"-o noatimes".
Fixes a problem where changes in the LSM configuration were
not being properly handled during the deletion of an LSM volume from a service.
Increases the timeout values for the LSM action scripts that
are part of the TruCluster Production Server, Available Server and DECsafe
Available Server products.
The timeouts were too small for large LSM configurations
and, under certain conditions, would cause the start of the services to
fail, leaving them unassigned.
Fixes a problem in which, under certain circumstances, an ASE
service modification could result in a corrupted configuration database.
Fixes a problem where LSM disk information was not properly
updated in the ASE database when volumes were removed from a disk service.
|
Patch 50.00
TCR141-045B
|
Patch:
LSM and AdvFS Corrections
State:
Supersedes patches TCR141-016 (16.00), TCR141-041 (39.00), TCR141-039 (37.00),
TCR141-048 (45.00), TCR141-045 (43.00)
This patch fixes the following
problems:
Provides support in asemgr for the new AdvFS mount option
"-o noatimes".
Fixes a problem where changes in the LSM configuration were
not being properly handled during the deletion of an LSM volume from a service.
Increases the timeout values for the LSM action scripts that
are part of the TruCluster Production Server, Available Server and DECsafe
Available Server products.
The timeouts were too small for large LSM configurations
and, under certain conditions, would cause the start of the services to
fail, leaving them unassigned.
Fixes a problem in which, under certain circumstances, an ASE
service modification could result in a corrupted configuration database.
Fixes a problem where LSM disk information was not properly
updated in the ASE database when volumes were removed from a disk service.
|
Patch 51.00
TCR141-044-2
|
Patch:
Not Properly Handling Error Condition Correction
State:
Supersedes patches TCR141-005 (5.00), TCR141-029 (29.00),
TCR141-035 (34.00), TCR141-038 (36.00), TCR141-042 (40.00), TCR141-043 (41.00),
TCR141-044-1 (42.01)
This patch corrects the following:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and Production Server software.
When the message queue fills, the following message is entered in the
daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes the following problems in the ASE Availability Manager
(AM):
A "simple_lock: time limit exceeded" panic on multiprocessor systems,
and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on
a shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a problem where, during an orderly shutdown (init 0),
the ASE agent shuts down the director before shutting down the services.
Causes the host status monitor (asehsm) to actively go out
and learn current member states before responding to the director with
member state information.
Pulling all monitored network interface cables on the machine
running the asedirector and a service can result in another machine starting
a new director and starting the same service before it has been fully
stopped on the first machine.
This is especially noticeable when a service
takes a long time to stop.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
|
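The Host Status Monitor item listed for Patch 51.00 concerns interface counters that get zeroed. One plausible way a monitor could avoid a false "down" report is to treat a counter that moves backwards as a reset. The Python sketch below rests on that assumption; the function name, the status strings, and the rule that an unchanged counter means "down" are hypothetical and are not taken from asehsm.

```python
def interface_status(previous, current):
    """Classify a network interface from two packet-counter samples.

    Hypothetical logic: a counter that went backwards is treated as a
    reset (status unknown until the next sample) instead of as a failure.
    """
    if current < previous:
        return "RESET"  # counters were zeroed; resample before judging
    if current == previous:
        return "DOWN"   # no traffic seen during the interval
    return "UP"
```

A caller would take another sample after a "RESET" result before reporting any change in interface state.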
Patch 52.00
TCR141-044C
|
Patch:
Message Service Routine Fixes
State:
Supersedes patches TCR141-005 (5.00), TCR141-029 (29.00), TCR141-035 (34.00),
TCR141-038 (36.00), TCR141-042 (40.00), TCR141-043 (41.00), TCR141-044-1 (42.01)
This patch corrects the following:
Fixes a problem in the message service routines used by the
daemons in TruCluster Available Server and Production Server software.
When the message queue fills, the following message is entered in the
daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!
From this point on, no further messages will be received.
Fixes the following problems in the ASE Availability Manager
(AM):
A "simple_lock: time limit exceeded" panic on multiprocessor systems,
and system hangs on single-processor systems.
This can occur when multiple
host target mode requests are issued due to SCSI aborts and resets on
a shared bus.
A kernel memory fault panic caused by a race condition when
the AM de-initializes.
Fixes a problem where, during an orderly shutdown (init 0),
the ASE agent shuts down the director before shutting down the services.
Causes the host status monitor (asehsm) to actively go out
and learn current member states before responding to the director with
member state information.
Pulling all monitored network interface cables on the machine
running the asedirector and a service can result in another machine starting
a new director and starting the same service before it has been fully
stopped on the first machine.
This is especially noticeable when a service
takes a long time to stop.
Fixes a problem that caused the asedirector to core dump if
asemgr processes were modifying services from more than one node in the
cluster at the same time.
Fixes a problem where the Host Status Monitor (asehsm) incorrectly
reports a network down (HSM_NI_STATUS DOWN) if the counters for the network
interface get zeroed.
Fixes scalability problems in the DECsafe Available Server,
TruCluster Available Server and TruCluster Production Server products.
The problems caused the asemgr to core dump when adding or modifying services
with a large number of disks.
|