OpenVMS__SHADOW VAXSHAD04_060 VAX V6.0 VOLUME SHADOWING ECO Summary
NOTE: An OpenVMS saveset or PCSI installation file is stored
on the Internet in a self-expanding compressed file.
The name of the compressed file will be kit_name-dcx_vaxexe
for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha.
Once the file is copied to your system, it can be expanded
by typing RUN compressed_file. The resultant file will
be the OpenVMS saveset or PCSI installation file which
can be used to install the ECO.
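As a rough illustration of the naming convention described in the note above, the following Python sketch builds the compressed file name and the matching RUN command. The function names are hypothetical. Treating kit_name as including the saveset suffix (for example vaxshad04_060.a) is an assumption based on the file listing for this kit.

```python
# Hypothetical helpers; the names are illustrative and not part of any DEC tool.
# Suffixes are taken from the note: -dcx_vaxexe for VAX, -dcx_axpexe for Alpha.
SUFFIXES = {"vax": "-dcx_vaxexe", "alpha": "-dcx_axpexe"}


def compressed_file_name(kit_name: str, arch: str) -> str:
    """Return the self-expanding file name for a kit on the given architecture."""
    try:
        return kit_name + SUFFIXES[arch.lower()]
    except KeyError:
        raise ValueError(f"unknown architecture: {arch!r}") from None


def expand_command(kit_name: str, arch: str) -> str:
    """Return the DCL command line that expands the compressed file."""
    return "$ RUN " + compressed_file_name(kit_name, arch).upper()
```

For this kit, compressed_file_name("vaxshad04_060.a", "vax") yields the name of the file that is copied to the system and then expanded with RUN.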
Copyright (c) Digital Equipment Corporation 1995, 1996. All rights reserved.
************************* CAUTION!! *************************
* *
* Please *READ* the installation instructions in the release *
* notes/cover letter for this ECO kit *BEFORE* you install *
* it on your system. System crashes may occur if this ECO *
* kit is not installed on *every* OpenVMS VAX node and the *
* OpenVMS Alpha (ALPSHAD) version of this kit is not *
* installed on *every* OpenVMS Alpha in a mixed architecture *
* cluster before the cluster is rebooted. *
* *
* Installation of this kit may also change the configuration *
* performance for existing SCSI shadow sets. *
* *
***************************************************************
PRODUCT: Volume Shadowing for OpenVMS (Phase II)
NOTE: The problems fixed in this ECO Kit also affect the
following products:
VAXcluster Software for OpenVMS VAX
DEC TCP/IP Services for VMS (UCX)
OP/SYS: OpenVMS VAX
COMPONENTS: System, Bugcheck, Backup,
Mount, Dismount, MSCP, TMSCP, MTAAACP,
I/O Routines, Audit Server,
Security, System Primitives,
Adaptive Pool Management (APM),
Operator Communication Manager (OPCOM),
User Environmental Test Package (UETP)
SOURCE: Digital Equipment Corporation
ECO INFORMATION:
ECO Kit Name: VAXSHAD04_060
ECO Kits Superseded by and Included in this ECO Kit:
VAXSHAD03_060
VAXSHAD07_061 (For OpenVMS VAX V6.0 ONLY)
VAXSHAD06_061
VAXSHAD05_061
VAXSHAD04_061
VAXSHAD03_061
VAXSHAD01_061 (CSCPAT_1160)
VAXSHAD02_060 (CSCPAT_1116)
VAXSHAD01_060 (CSCPAT_1116)
VAXDRIV02_060 (CSCPAT_1136)
VAXSYS14_061 (For OpenVMS VAX V6.0 ONLY)
VAXSYS12_061
VAXSYS07_061
VAXSYS01_061 (CSCPAT_1113)
VAXSYS04_060 (CSCPAT_1113)
VAXSYS03_060 (CSCPAT_1113, CSCPAT_1124)
VAXSYS01_060 (CSCPAT_1113)
ECO Kit Approximate Size: 2790 Blocks
Kit Applies To: OpenVMS VAX V6.0
System Reboot Necessary: Yes
CAUTION:
Before Installing this Kit, Read the Following Cautions:
After installation of this kit, the following issues may occur:
1) ISSUE: When a node reboots into the cluster, the OPCOM message
that reports the node is joining the cluster may not
appear. The message is missing on a random
basis.
WORKAROUND: In order to verify the node has entered the
cluster, after the node has fully rebooted, the
user should enter the command:
$ SHOW CLUSTER
to verify the node is a valid member of the
VAXcluster.
2) ISSUE FROM THE CSC: An INVEXCEPTN in SNDRIVER may be seen if
DECnet/SNA V2.1 is used in conjunction
with the IO_ROUTINES from the VAXSHAD
ECO kit. SNAVMS_E04021 (CSCPAT_5041) will
fix this problem by replacing the
incompatible SNDRIVER in DECnet/SNA V2.1.
NOTE: SNAVMS_E04021 applies to
DECnet/SNA V2.1 only.
These issues are being addressed and will be corrected in a
future version of OpenVMS VAX.
ECO KIT SUMMARY:
An ECO kit exists for Volume Shadowing on OpenVMS VAX V6.0. This kit
addresses the following problems:
Problems Addressed in the VAXSHAD04_060 Kit for OpenVMS VAX V6.0:
o The VAXSHAD03_060 remedial kit for OpenVMS VAX V6.0 should
have superseded the VAXSYS14_061 remedial kit for OpenVMS
VAX V6.0. This kit supersedes and includes fixes from
VAXSYS14_061.
Problems Addressed in the VAXSHAD03_060 Kit for OpenVMS VAX V6.0:
o After applying the VAXSHAD07_061 kit to the system disk, systems
booting from that disk would no longer boot and would crash in
SYSINIT with a DELCONPFN bugcheck.
Problems Addressed in the VAXSHAD07_061 Kit for OpenVMS VAX V6.0:
o In the VAXSHAD05 and VAXSHAD06 kits two new fields were added
to the IRP data structure for shadow write logging information.
This new IRP definition size conflicts with the IRP sizes of
other images on the system that are not part of the SHADOW kits.
This conflict may cause a variety of errors, including fatal
bugchecks. This fix changes the IRP definitions back to the SBB
versions and adds some special definitions to the SHDRIVER for
the new IRP fields.
o Fatal bugchecks from data structure corruption may occur due
to the addition of the value 10 HEX to the corrupted field.
Crashes are of various types and include node and cluster
crashes, crashes due to invalid UCB addresses, invalid VCB
addresses, invalid member IDs, and invalid number of devices.
Problems Addressed in the VAXSHAD06_061 Kit for OpenVMS VAX V6.0:
o When trying to access a DFS disk, the following error may be
seen:
-SYSTEM-F-FILALRACC, file already accessed on channel
The disk can be accessed immediately after reboot; however,
after a period of time of not accessing the disk, a simple
directory command will return this error.
Problems Addressed in the VAXSHAD05_061 Kit for OpenVMS VAX V6.0:
o After a node crashes, on reboot it cannot mount a Host Based
Volume Shadowing virtual unit. The error message usually
returned is "volume not software enabled"; however, "Medium
Offline" may also be seen. A SHOW DEVICE will show that the
shadow set is in 0% merge but SNA will show that a minimerge
is pending.
o A double deallocation crash may occur as the result of MOUNT not
properly initializing the Mounted Volume List (MTL) pointer. This
pointer had a stale value as a result of two calls to SYS$VMOUNT
from a single program. The stale pointer will only cause a problem
if the system is unable to allocate space for defining the logical
name.
NOTE: Since cells are initialized at image activation, this
problem should not occur as a result of DCL commands.
o Tape devices with stacker/loaders, such as the TF857, may take
up to 6 minutes to rewind/unload/load the next tape. In
VAXSHAD01_061, a change was made to the behavior of MOUNT to take
this delay into account. However, a side effect of that change
was that non-stacker drives may also wait 6 minutes before failing.
This problem has been addressed by this VAXSHAD kit.
o System crashes with an INVEXCEPTN during a SHDRIVER COPY_DATA_REPAIR
copy operation.
o If the value of the ALLOCLASS SYSGEN parameter is not set and the
user tries to use shadowing, a shadow volume can be created but
members cannot be added to the shadow set. No error messages are
received up until a second member is added. On the MOUNT command,
the customer will receive the error messages:
$ mount /system dsa500 /shadow=dkb400 alphavms015
%MOUNT-I-SHDWMEMFAIL, DKB400 failed as a member of the shadow set
-SYSTEM-F-INCSHAMEM, incompatible shadow set member
"Incompatible" is an inappropriate statement of the problem. A
more accurate message would be "missing allocation class," or
"incorrect allocation class."
o If a shadow set member is dismounted at the same time from multiple
nodes within a cluster, I/O to a shadow set may become stalled.
o Mount will not add shadow set members unless they are either
MSCP or SCSI.
o Shadow set member expulsion is currently based on the time it takes
a fork & wait and a PACKACK to complete rather than on the actual
elapsed time. On some devices, particularly SCSI, where a PACKACK
can take approximately one minute, the timeout was much too long.
With the default value of 20 (seconds) for SHADOW_MBR_TMO, it could
actually take 20 minutes to expel a member experiencing errors from
a SCSI shadow set.
o SHDRIVER loss of synchronization may result in a crash where
SHADDETINCON is triggered by the check at the end of
MATCH_MASTER_SCB. In this consistency check, the
SHAD$W_DEVSTS_PASSIVE_MV_CNTR is verified to be zero and is not.
Another symptom is that the virtual unit UCB$W_RWAITCNT is
zero. Also shadow set member counts of zero may be seen.
o Crashes may occur in EXPEL_PACKACK_ANY with connections broken to
all members and IRP$L_SHD_LOCK_FR5 = 1 (packack retries exhausted).
o All members of a shadow set become inaccessible at the same time and
remain inaccessible for a period of time greater than "shadow
member timeout" (SHADOW_MBR_TMO or SHADOW_SYS_TMO) seconds but
less than MVTIMEOUT seconds. All members subsequently become
accessible within seconds of each other but not at exactly the same
time. This results in all but one member being expelled from the
shadow set.
This often occurs when changing HSJ microcode and all members are
connected to the same HSJ. When brought back online, polling will
cause the devices to be found seconds apart which will result in
all but one member being expelled.
o All members of the set must be checked to see if they meet the
criteria of being MSCP. The original design did not allow
for having no index zero member.
o When the mounting of full copy targets exceeds the SHADOW_MAX_COPY
threads for a given node, other nodes with the shadow set mounted
do not pick up the copy work.
o In a cluster, using $PROCESS_SCAN explicitly or implicitly with the
DCL SHOW USER command sometimes causes a system crash due to an
ACCVIO in kernel mode or an IVSSRVRQST bugcheck.
o When a node with a SCSI bus boots, it resets the SCSI bus. In a
multi-host SCSI cluster, this can cause the other node to
experience I/O failures. Normally, this results in a brief mount
verification. The I/O is retried, succeeds, and there is
no serious consequence. However, if the other node is in the
process of booting and the system disk is a shadow set, the
system will crash.
o PGFIPLHI bugcheck in the SHADOW_SERVER process at the REMQUE
in K_GET_COPYSHAD_IRP. On OpenVMS VAX, the PC is A0E and the
VA is 274.
o A page setup module which draws a frame and company logo on each
page of output is used on a queue pointing to an LN03. This page
setup module works on OpenVMS VAX Version 5.5-2 and prior versions.
However, with VAXQMAN8_U2055 (CSCPAT_1165) or OpenVMS VAX Version
6.1 installed, this page setup module causes the printer to
continually spew out paper with only the output from the page setup
module. This continues until the entry is deleted from the queue.
o Due to an inadequate synchronization mechanism, the MONITOR DISK
command can go into an infinite loop on multi-processing machines.
o If a multi-programming application uses a non-homogenous access
pattern to a file which is resident in Virtual I/O cache, there is
a possibility that the size returned in the I/O status block from a
READ operation will be truncated.
o If a clustered application uses a large number of concurrent
processes to perform file operations consisting of an OPEN, WRITE,
and CLOSE sequence repetitively on the same data file, data
corruption may occur.
o In a multi-programming environment where a significant amount of
NEW data from a file is being loaded into the cache concurrently by
multiple processes, the system may HANG.
o If a user attempts to mount a disk that is 100% full on OpenVMS VAX
V6.* and the disk was originally initialized with a version of
OpenVMS VAX prior to V6.0, paged pool can be corrupted leading to
system crashes. If the disk is filled AFTER it has been mounted
under V6.*, there will not be any problem.
o The class driver will sometimes attempt to send an MSCP command
packet on the wrong connection. This fix detects this mismatch and
corrects it.
o Due to invalid allocation counts, processes hang in RWNPG state
waiting for a request for non-paged pool (NPP) so large that it
cannot be satisfied.
o The system crashes with the current process executing a $CHKPRO
system service call.
o A $AUDIT_EVENT system crash may occur in SECURITY.EXE due to corrupt
scan structure storage.
o When a rights list is passed into $CHKPRO (CHP$_RIGHTS), it is
copied into the ARB within the NSA$A_SCRATCH area. This area
will hold a maximum of eight rights. The code that handles this
copy operation will split any larger rights list into the first
eight, which are copied into the local rights area, and the
remainder, for which a descriptor is created and its address is
added as extended process rights.
The code involved in copying the first eight rights was looping
incorrectly and copying rights to random locations within the
NSA$A_SCRATCH area, usually resulting in an SSRVEXCPT crash.
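The intended $CHKPRO split described above (the first eight rights go to the local area and the remainder becomes extended rights) can be sketched in Python as follows. This models only the intended behavior, not the actual system service code. The function name and the constant name are hypothetical.

```python
# Illustrative model only; this is not the $CHKPRO implementation.
# The limit of eight comes from the description of the NSA$A_SCRATCH area.
MAX_LOCAL_RIGHTS = 8


def split_rights(rights):
    """Split a rights list into (local, extended) parts.

    At most MAX_LOCAL_RIGHTS entries go into the local part. Anything
    beyond that is returned separately, standing in for the descriptor
    that is added as extended process rights.
    """
    local = list(rights[:MAX_LOCAL_RIGHTS])
    extended = list(rights[MAX_LOCAL_RIGHTS:])
    return local, extended
```

Bounding the copy by slicing keeps every write inside the eight-entry area, which is the property the faulty loop failed to preserve.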
Problems Addressed in the VAXSHAD04_061 Kit for OpenVMS VAX V6.0:
o When booting two or more systems simultaneously from shadowed
system disks, the systems may appear to hang. Crashing the
systems and examining the crash dumps indicates that shadowing
driver blocking AST routines have not run.
o In a two node VAXcluster configuration, containing a DSSI system
shadow set and a quorum disk, if one node exits the cluster and
reboots, the node will hang on boot while attempting to form the
system disk shadow set virtual unit.
o When multiple virtual unit mount commands are issued that will
result in copy operations, only the node from which the commands
are issued will attempt to perform the copy operations. Only
the SHADOW_MAX_COPY number of copies will run simultaneously.
This means that copy operations might take longer than expected
and copies will not be started for copy members that are
added to shadow sets.
o On OpenVMS VAX V6.0 systems, disks could not be mounted after
installation of VAXSHAD03_061.
This problem is fixed in OpenVMS VAX V6.1.
Problems Addressed in the VAXSHAD03_061 Kit for OpenVMS VAX V6.0:
o A double-deallocation crash may occur as the result of MOUNT not
properly initializing the MTL pointer. This pointer had a stale
value as a result of 2 calls to SYS$VMOUNT from a single program.
The problem will not happen as a result of DCL commands, as the
cells are initialized at image activation. The stale pointer
will only cause a problem if the system is unable to allocate
space for defining the logical name.
o An OPCOM message was output even though /NOASSIST was
specified in the MOUNT command. This caused problems for UETP.
o System crash in SECURITY.EXE.
o A process is in RWPAG while auditing an event.
o When the current process executes a $CHKPRO system service call,
the system will crash.
o Processes hang in RWNPG state (Call to $CRMPSC) waiting for a
request for NPP so large that it cannot be satisfied.
o DISMOUNT/OVERRIDE=CHECKS against the SYSTEM disk is allowed.
Once this command is issued nothing else can be done.
Installation of this kit will only allow this command to
be issued on non-system disks.
Problems Addressed in the VAXSHAD01_061 Kit for OpenVMS VAX V6.0:
o In Volume Shadowing for OpenVMS Alpha V6.1, minimerge
functionality across mixed architecture VMSclusters was
disabled. In order to reestablish the minimerge functionality,
install this kit across any VMScluster that contains an OpenVMS
Alpha V6.1 node.
o Mounting an RZ28B disk device with an RZ28 in the same
shadow set is not allowed and will display the following error:
%MOUNT-I-SHDWMEMFAIL, $1$DUA0 failed as a member of the shadow set
-SYSTEM-F-INCSHAMEM, incompatible shadow set member
This behavior is seen when RZ28/RZ28B shadow set members are
connected with a local SCSI (Small Computer System Interface)
controller.
With this kit, RZ28 and RZ28B devices can be combined in a
shadow set if they are connected to like controllers.
NOTE: If this kit is installed across a VMScluster, SCSI
shadow sets configured across different controller
types are not supported and will no longer work.
Problems Addressed in the VAXSHAD02_060 Kit for OpenVMS VAX V6.0:
o After installation of CSCPAT_1116 V1.0 (VAXSHAD01_060), the
system may crash with a SHADDETINCON bugcheck at SHDRIVER+F0B4.
The bugcheck occurs when a disk is removed from a mounted shadow
set.
Problems Addressed in the VAXSHAD01_060 Kit for OpenVMS VAX V6.0:
o In a situation in which more than one member of a three-member
shadow set goes into error recovery at the same time and
cannot be brought back into the shadow set (due to loss of
connectivity, media offline, write-locked device, etc.),
SHDRIVER expels one of the members and crashes with a
SHADDETINCON bugcheck because it cannot update the Storage
Control Block (SCB) on the remaining members. This can cause
many cluster nodes to crash at the same time.
o When all three members of a three-member shadow set are
write-locked, a bugcheck will occur due to the destruction
of Register 4 upon execution of a jump to sub-routine command
that overwrites the value in the register.
o The SHADOW_MAX_COPY SYSGEN parameter is used to set how many
merge/copy threads may be started at the same time on a node.
Systems are allowing more than SHADOW_MAX_COPY number of
threads to run concurrently.
o Various SHDRIVER system disk member timer issues and
Register 2/Register 5 Corruption:
- The SHSB$MATCH_MASTER_SCB routine uses SHSB$PAUSE
incorrectly. This improper usage causes the value
in Register 2 to be destroyed when the time delay
is invoked, so the resulting value in Register 2
is indeterminate.
- The SHSB$MATCH_MASTER_SCB routine uses SH$TIME_DELAY
incorrectly. This improper usage causes an incorrect
value to be placed in Register 5, which requires a
UCB value.
- The SH$ABORT_VP routine uses SH$TIME_DELAY incorrectly.
This improper usage causes the value in Register 2 to be
destroyed when the time delay is invoked, so the resulting
value in Register 2 is indeterminate.
- In some customer configurations, the benefit of
re-assembling a multiple-member system disk shadow
set is lost. This occurs because the fixed amount
of time expires and not all of the former members
are available.
- Member time out for system disks and other disks is
not differentiated.
- The hardcoded wait of FF seconds to connect to all
members of an existing system disk is not a controllable
variable.
o SHDRIVER MVTIMEOUT and R5 Corruption errors:
- When one member of a multiple-member shadow set is
spontaneously removed from the shadow set due to a
fatal error condition, some VAXcluster nodes will
hang the virtual unit until the MVTIMEOUT time expires.
- After a call to SHSB$PAUSE, the wait loop at 103$ in
SHSB$VALIDATE_SHADOW_SET does not correctly restore
the contents of R5 to be the virtual unit.
o Post-processing is not performed correctly on all clones which
causes allocation of new, unnecessary Write Log Entries. The
Write Log INUSE bit is never cleared and the write log table
has to be expanded. Once the table expands to MAX, Write
Logging is disabled. When Write Logging is turned back on,
the cycle begins again. Eventually, all the entries in the
controller are exhausted, which forces Write Log Exhaustion
handling and, in some cases, the controller is reset.
o If the READ of Logical Block #1 fails during INVALIDATE_ALL_ENTRIES
or if WLG has been turned off, the shadow set will hang with a
SEQCMD lock and an incorrectly incremented RWAITCNT.
Problems Addressed in the VAXDRIV02_060 Kit for OpenVMS VAX V6.0:
o A tape drive will sometimes fail over to another HSx
controller after the tape is dismounted.
o Numbers greater than 9999 which are randomly generated
by HSx devices may cause the system to crash.
o RE-INITIALIZATION errors are reported to users of SCSI
tape drives attached to an HSx controller. This occurs
if multiple SCSI tapes are attached to the HSx and all the
tapes are at or near PEOT and the connection to the HSx is
broken.
Problems Addressed in the VAXSYS14_061 Kit for OpenVMS VAX V6.0:
o There is a race condition that may occur when a CFCB (Cache File
Control Block) is being deleted due to XQP action and cache
space is being reclaimed from a LIMBO file.
o Disk corruption can occur when heavy open/read/write/close/delete
operations are occurring.
o At some point after a node CLUEXITs, 2 or more cluster nodes
crash with LOCKMGRERR Bugchecks.
o When two or more VAX or Alpha nodes boot at the same time, one
or more of them may crash.
Problems Addressed in the VAXSYS07_061 Kit for OpenVMS VAX V6.0:
o If a multi-programming application uses a non-homogenous
access pattern to a file which is resident in Virtual I/O
cache, there is a possibility that the size returned in the
I/O status block from a READ operation will be truncated.
o If a clustered application uses a large number of
concurrent processes to perform file operations consisting
of an OPEN, WRITE, and CLOSE sequence repetitively on the
same data file, data corruption may occur.
o In a multi-programming environment where a significant
amount of NEW data from a file is being loaded into the
cache concurrently by multiple processes, the system may
HANG.
o Documentation states that -1 as well as 0 is accepted as a
wildcard in SYS$GETLKI. However, that is no longer the
case beginning with V5.5.
Problems Addressed in the VAXSYS01_061 Kit for OpenVMS VAX V6.0:
o SYS$CHKPRO had several problems that did not manifest themselves
in a readily visible effect to the end user. The problems
include:
- accepting up to 11 rights lists even though no more than two
would actually be processed.
- CHKPRO would accept a CHP$_UIC and write it over a location
which was to contain a rightslist pointer.
- In most cases the wrong UIC was used in access checking.
The only time the customer would notice a problem is if they
specifically tested access to an object known to be protected
from current rights and UIC settings.
o Nonpaged dynamic memory (NPAGEDYN) expansion occurs even when
there is a large amount of free space available. This can lead
to performance problems as pool expansion causes free memory to
be diverted away from that available to processes and dedicated
to nonpaged pool usage. For example, with a SHOW MEMORY/POOL
command you can observe that the "Total" amount of "Nonpaged
Dynamic Memory" increases when the amount of "Free" bytes is
quite large:
Dynamic Mem Usage (bytes):     Total      Free    In Use   Largest
Nonpaged Dynamic Mem        38555136  17372224  21182912     38720
Paged Dynamic Mem           17282048   8295888   8986160   8265232
Starting with the introduction of the Adaptive Pool Management
(APM) feature, in OpenVMS VAX V6.0, these figures include the
contributions of both the lookaside lists and the variable pool.
So, a large "Free" figure is indicative of large (and possibly,
growing) lookaside lists. If the "Total" figure is increasing,
it indicates that pool expansion is occurring, and that the
lookaside list space is not being used effectively.
The above symptom can result from either of the two following
separate problems:
- A routine in the software which supports security features
such as "rightslists" was obtaining a nonpaged pool block
and then freeing it in two smaller pieces.
- An internal loop counter governing the number of times a
lookaside list allocation was attempted, was set too low.
This problem will most likely be seen on the VAX 6000-500
and 6000-600.
A third software change associated with APM will also be
available in a future OpenVMS VAX version, but is not available
as a remedial change. The third change provides a potential
performance benefit under very specialized conditions, such as
during VMScluster state transitions.
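The NPAGEDYN symptom described above (the "Total" figure grows while "Free" is still large) can be checked with a small Python sketch such as the one below. The function names and the 25% threshold are assumptions chosen for illustration. The sample figures are the ones from the SHOW MEMORY/POOL display quoted above.

```python
# Illustrative sketch; names and the threshold are assumptions, not DEC guidance.

def free_fraction(total: int, free: int, in_use: int) -> float:
    """Return the free share of a pool, checking that the figures are consistent."""
    if free + in_use != total:
        raise ValueError("Free + In Use does not equal Total")
    return free / total


def expansion_with_large_free(samples, threshold: float = 0.25) -> bool:
    """Return True if Total grew between two consecutive samples while the
    free share after the growth was still at or above the threshold.

    samples is a time-ordered list of (total, free) byte counts.
    """
    for (total_before, _), (total_after, free_after) in zip(samples, samples[1:]):
        if total_after > total_before and free_after / total_after >= threshold:
            return True
    return False


# Nonpaged pool figures from the display quoted in the text.
nonpaged_share = free_fraction(38555136, 17372224, 21182912)
```

With the quoted figures, roughly 45% of nonpaged pool is free, which is the kind of large "Free" value the text associates with large lookaside lists.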
Problems Addressed in the VAXSYS03_060 & VAXSYS04_060 Kits for OpenVMS
VAX V6.0:
o When tapes are served in a cluster, tape profiles cannot be changed.
The problem has occurred in the following two ways:
1) If discretionary access does not allow the audit server
process access to the device, the profile cannot be changed.
2) If the object server is not available (though it had been
started at least once), the ORB$V_TRANSITION flag is set and
not cleared. In this case, only BYPASS privilege allows
access to the device. This prevents a profile change as in
(1). The profile change, once it is allowed, must clear the
TRANSITION flag.
o Cluster object profile resolution can fail for tape devices when
ASSIGN fails with SS$_NOPRIV. This has shown up in matrix
testing with failures of STABACKIT and UETP.
Problems Addressed in the VAXSYS01_060 Kit for OpenVMS VAX V6.0:
o Attempting to access the VPROT item with GETDVI on UCX TYMNET
terminals may result in an access violation. The VPROT item is
implicitly accessed using $GETDEV and $GETCHN services, which are
used by a number of utilities.
INSTALLATION NOTES:
This kit *MUST* be installed on every VAX in a mixed-architecture
VMScluster, and the Alpha (AXPSHAD) version of this kit *MUST* be
installed on every Alpha system in the cluster BEFORE any systems
are re-booted into the VMScluster. If the correct kit is not
installed on each system, shadow sets cannot be created. System
crashes may also occur if the kits are not installed on all
appropriate cluster nodes.
The following restrictions will apply upon completion of the
installation:
o VMSclusters with shadowed SCSI disks and mixed-architecture
VMSclusters running OpenVMS Alpha V6.1 must apply the kit and
reboot the entire cluster simultaneously. In these cases,
rolling upgrades are not supported.
o Working configurations that contain SCSI shadow sets on
dissimilar controllers may no longer work.
This patch can be found at any of these sites:
Colorado Site
Georgia Site
Files on this server are as follows:
vaxshad04_060.README
vaxshad04_060.CHKSUM
vaxshad04_060.CVRLET_TXT
vaxshad04_060.a-dcx_vaxexe