2 Summary of Available Server Environment Patches

This chapter summarizes the Available Server Environment patches included in Patch Kit-0005.

Table 2-1 lists patches that have been updated.

Table 2-1: Updated Available Server Environment Patches

Patch IDs	Change Summary
Patch 20.00	New
Patches 11.00, 5.00, 21.00	Superseded by Patch 24.00
Patches 4.00, 6.00, 7.00, 8.00, 13.00, 16.00, 17.00, 18.00, 22.00, 23.00	Superseded by Patch 25.00
Patches 4.00, 6.00, 7.00, 8.00, 13.00, 16.00, 17.00, 18.00, 22.00, 23.00	Superseded by Patch 26.00
Patches 2.00, 9.00	Superseded by Patch 19.01
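
The supersession data in Table 2-1 can also be read as a lookup table. The following Python sketch is illustrative only: the names SUPERSEDES and superseding_patches are hypothetical and are not part of the patch kit, and the data is transcribed from Table 2-1 as printed.

```python
# Supersession data transcribed from Table 2-1.
# Keys are the superseding patches; values are the patches they replace.
# The names below are hypothetical and exist only for this illustration.
_SHARED = ["4.00", "6.00", "7.00", "8.00", "13.00",
           "16.00", "17.00", "18.00", "22.00", "23.00"]

SUPERSEDES = {
    "24.00": ["11.00", "5.00", "21.00"],
    "25.00": list(_SHARED),
    "26.00": list(_SHARED),
    "19.01": ["2.00", "9.00"],
}


def superseding_patches(patch_id):
    """Return the sorted list of patches that supersede patch_id."""
    return sorted(new for new, old in SUPERSEDES.items() if patch_id in old)


print(superseding_patches("9.00"))   # ['19.01']
print(superseding_patches("4.00"))   # ['25.00', '26.00']
```

A patch that Table 2-1 does not list as superseded, such as Patch 20.00, yields an empty list.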

Table 2-2 provides a summary of patches in Patch Kit-0005.

Table 2-2: Summary of Available Server Environment Patches

Patch IDs

Abstract

Patch 3.00

ASE130-003

Patch: Cluster Map Creation Correction

State: Supersedes patch ASE130-001 (01.00)

This patch fixes the following problems:

When an ASE site defines ifconfig parameters beyond the netmask in the /etc/rc.config file, the cluster map cannot be created and the makeclmap program dumps core. This problem affects only ASE sites running Version 1.3.

This patch fixes the problem of the cluster monitor not properly identifying all the HSZ40 devices in the cluster map.
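
To judge whether a site is exposed to the first problem above, it can help to look for ifconfig parameters that follow the netmask. The sketch below is a rough illustration under stated assumptions: it assumes the parameters form a whitespace-separated string of the shape "address netmask mask ...", the function name and the sample string are hypothetical, and the code does not read or parse /etc/rc.config itself.

```python
def parameters_beyond_netmask(ifconfig_params):
    """Return the tokens that follow the netmask value, if any.

    Assumes a whitespace-separated parameter string; this layout is an
    assumption made for illustration, not taken from the patch notes.
    """
    tokens = ifconfig_params.split()
    if "netmask" not in tokens:
        return []
    # Skip the "netmask" keyword and the mask value that follows it.
    return tokens[tokens.index("netmask") + 2:]


# Hypothetical parameter string using documentation-only addresses.
example = "192.0.2.10 netmask 255.255.255.0 broadcast 192.0.2.255"
print(parameters_beyond_netmask(example))  # ['broadcast', '192.0.2.255']
```

A non-empty result indicates the kind of configuration that the patch notes associate with the makeclmap core dump.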

Patch 10.00

ASE130-015

Patch: Disk Label, Retry Command Correction

State: Existing

This patch fixes the following problems with Logical Storage Manager (LSM) volumes in a DECsafe Available Server Environment (ASE):

After a patch to the LSM voldisk command is installed, the disk labels of LSM disks are inadvertently reinitialized during service modification. This causes attempts to start the service to fail and leaves the service unassigned.

Certain LSM operations that should have been retried were failing on the first attempt.

Retry messages were not being printed to the log file.

Patch 12.00

ASE130-017

Patch: Correction For Service Aliases

State: Existing

This patch fixes a problem in /var/opt/ASE130/ase/sbin/nfs_ifconfig that corrupts the memory resident routing table and subsequent netstat output (netstat -r) during ASE service failover.

Patch 15.00

ASE130-020

Patch: Recognize KZPBA Correction

State: Existing

This patch adds KZPBA controller support for the ase_fix_config utility.

Patch 19.01

ASE130-024-1

Patch: System Panic, SCSI Error Condition Correction

State: Supersedes patches ASE130-002 (02.00), ASE130-014 (09.00)

This patch fixes the following problems:

Fixes the following problems in the ASE Availability Manager (AM):
- A "simple_lock: time limit exceeded" panic on multiprocessor systems, and system hangs on single-processor systems. These can occur when multiple host target mode requests are issued due to SCSI aborts and resets on a shared bus.
- A kernel memory fault panic caused by a race condition when the AM de-initializes.

Fixes three problems in the Availability Manager:
- Kernel memory fault in am_select()
- Unit Attentions exhausting retries
- Kernel memory fault in am_ping_complete()

Fixes a SCSI host ping dropout problem. The problem occurred from 0 to 20 or more times a day, with dropout periods of 30 seconds. It appears to happen only in ASE environments with three or more members.

Patch 20.00

ASE130-018

Patch: ASE Service Correction

State: New

This patch fixes a problem in which running the vquotacheck command on a file system participating in an ASE service causes a system panic if the service fails over or relocates while the command is in progress.

Patch 24.00

ASE130-028

Patch: ASE Data Base For LSM Correction

State: Supersedes patches ASE130-016 (11.00), ASE130-005 (05.00), ASE130-025 (21.00)

This patch fixes the following problems:

Fixes a problem where changes in the LSM configuration were not being properly handled during the delete of an LSM volume from a service.

Increases the timeout values for the LSM action scripts that are part of the TruCluster Production Server, Available Server and DECsafe Available Server products. The timeouts were too small for large LSM configurations and, under certain conditions, would cause the start of the services to fail, leaving them unassigned.

Fixes a problem in which DECsafe does not correctly support the removal of volumes from AdvFS domains that are assigned to ASE services.

Fixes a problem in which, under certain circumstances, an ASE service modification could result in a corrupted configuration database.

Patch 26.00

ASE130-027B

Patch: Not Properly Handling Error Condition Correction

State: Supersedes patches ASE130-004 (04.00), ASE130-011 (06.00), ASE130-012 (07.00), ASE130-013 (08.00), ASE130-019 (13.00), ASE130-021 (16.00), ASE130-022 (17.00), ASE130-023 (18.00), ASE130-026 (22.00), ASE130-027 (23.00)

This patch fixes the following problems:

Fixes a problem in the message service routines used by the daemons in TruCluster Available Server and Production Server software.
When the message queue fills, the following message is entered in the daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!

From this point on, no further messages will be received.
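
Because the daemons stop receiving messages once this overflow occurs, it can be useful to check whether the message has ever been logged. The sketch below is one hedged way to do so: the function name is hypothetical, the sample log lines are invented for illustration, and the location of the daemon.log file on a given system is not assumed here.

```python
# Message text quoted from the patch description above.
OVERFLOW_MARKER = "msgSvc: message queue overflow, LOST MESSAGE!"


def count_overflows(lines):
    """Count the log lines that contain the queue-overflow message."""
    return sum(1 for line in lines if OVERFLOW_MARKER in line)


# Invented sample lines; real daemon.log entries may be formatted differently.
sample = [
    "host1 ASE: msgSvc: message queue overflow, LOST MESSAGE!",
    "host1 ASE: service started",
    "host2 ASE: msgSvc: message queue overflow, LOST MESSAGE!",
]
print(count_overflows(sample))  # 2
```

The same function accepts an open file object, since iterating over a text file yields its lines.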

Fixes a problem that may occur in an ASE (either DECsafe ASE Version 1.3, TruCluster Available Server, or TruCluster Production Server) when the ASE encounters connection attempts from hosts whose IP addresses cannot be resolved to hostnames. Instead of printing a warning about a possible security breach, the ASE daemons will core dump with a segmentation violation. One cause of this problem may be unknown hosts on the network using public domain internet security software which scans all TCP ports on remote hosts.

This patch is part of the set of DIGITAL UNIX patches required to support the HSZ70 UltraSCSI RAID Array controller on the KZPSA adapter under ASE 1.3.

This patch corrects a problem whereby the ASE agent daemon (aseagent), ASE director daemon (asedirector), the trigger-action server daemon (tractd), or the submon process fails and exits without a core file if a SIGPIPE or other stray signal occurs.

Fixes a problem in which pulling a network cable on all ASE members resulted in the asedirector exiting, and replacing the cable in any ASE member did not start a director.
The director restart logic in the agent was not starting a director in some cases where it should have. All cases are now explicitly handled in this code, which fixes a number of director restart problems related to network cable pulls.

Pulling all monitored network interface cables on the machine running the asedirector and a service can result in another machine starting a new director and starting the same service before it has been fully stopped on the first machine. This is especially noticeable when a service takes a long time to stop.
This patch causes the host status monitor (asehsm) to actively go out and learn current member states before responding to the director with member state information.

Fixes a problem where reports about the ASE environment observed via the Cluster Monitor program (cmon) may be missing or incomplete.

Fixes a problem where the cluster monitor either will not come up or has incomplete, or obviously incorrect, cluster status information.
This problem occurs on systems running ASE where, because of a network disconnect, hundreds of error messages were being logged in the daemon.log file. This file contained very large numbers of ASE_INQ_SERVICE failed messages or other similar messages.
The /usr/sbin/submon daemon fills the log with hundreds or thousands of "ASE_INQ_SERVICES failed or hung up" messages following a disconnect by the ASE director.

Fixes a problem where the Host Status Monitor (asehsm) incorrectly reports a network down (HSM_NI_STATUS DOWN) if the counters for the network interface get zeroed.
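
The failure mode above can be illustrated with a counter comparison that treats a reset separately from a lack of traffic. This sketch assumes that the monitor infers interface health from successive packet-counter samples, which the patch notes imply but do not state; the function name and status strings are hypothetical and do not reflect how asehsm is implemented.

```python
def interface_status(previous_packets, current_packets):
    """Classify an interface from two successive packet-counter samples."""
    if current_packets < previous_packets:
        # The counter went backwards, so it was reset (zeroed). A reset says
        # nothing about link state, so do not report the interface as down.
        return "COUNTER_RESET"
    if current_packets > previous_packets:
        # Traffic was seen since the last sample.
        return "UP"
    # No change between samples; a real monitor would probe further.
    return "NO_TRAFFIC"


print(interface_status(5000, 0))     # COUNTER_RESET
print(interface_status(5000, 5200))  # UP
```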

Fixes a problem that caused the asedirector to core dump if asemgr processes were modifying services from more than one node in the cluster at the same time.

Patch 27.00

ASE130-031

Patch: asemgr Core Dumps

This patch fixes the following problems:

This patch corrects a problem in which the asemgr can core dump when adding a member back into an ASE.

Fixes a problem in the message service routines used by the daemons in TruCluster Available Server and Production Server software.
When the message queue fills, the following message is entered in the daemon.log file, but the queue is not emptied:
msgSvc: message queue overflow, LOST MESSAGE!

From this point on, no further messages will be received.

Fixes a problem that may occur in an ASE (either DECsafe ASE Version 1.3, TruCluster Available Server, or TruCluster Production Server) when the ASE encounters connection attempts from hosts whose IP addresses cannot be resolved to hostnames. Instead of printing a warning about a possible security breach, the ASE daemons will core dump with a segmentation violation. One cause of this problem may be unknown hosts on the network using public domain internet security software which scans all TCP ports on remote hosts.

This patch is part of the set of DIGITAL UNIX patches required to support the HSZ70 UltraSCSI RAID Array controller on the KZPSA adapter under ASE 1.3.

This patch corrects a problem whereby the ASE agent daemon (aseagent), ASE director daemon (asedirector), the trigger-action server daemon (tractd), or the submon process fails and exits without a core file if a SIGPIPE or other stray signal occurs.

Fixes a problem in which pulling a network cable on all ASE members resulted in the asedirector exiting, and replacing the cable in any ASE member did not start a director.
The director restart logic in the agent was not starting a director in some cases where it should have. All cases are now explicitly handled in this code, which fixes a number of director restart problems related to network cable pulls.

Pulling all monitored network interface cables on the machine running the asedirector and a service can result in another machine starting a new director and starting the same service before it has been fully stopped on the first machine. This is especially noticeable when a service takes a long time to stop.
This patch causes the host status monitor (asehsm) to actively go out and learn current member states before responding to the director with member state information.

Fixes a problem where reports about the ASE environment observed via the Cluster Monitor program (cmon) may be missing or incomplete.

Fixes a problem where the cluster monitor either will not come up or has incomplete, or obviously incorrect, cluster status information.
This problem occurs on systems running ASE where, because of a network disconnect, hundreds of error messages were being logged in the daemon.log file. This file contained very large numbers of ASE_INQ_SERVICE failed messages or other similar messages.
The /usr/sbin/submon daemon fills the log with hundreds or thousands of "ASE_INQ_SERVICES failed or hung up" messages following a disconnect by the ASE director.

Fixes a problem where the Host Status Monitor (asehsm) incorrectly reports a network down (HSM_NI_STATUS DOWN) if the counters for the network interface get zeroed.

Fixes a problem that caused the asedirector to core dump if asemgr processes were modifying services from more than one node in the cluster at the same time.