OpenVMS__STORAGE ALPSHAD09_061 Alpha V6.1 Volume Shadowing ECO Summary
NOTE: An OpenVMS saveset or PCSI installation file is stored
on the Internet in a self-expanding compressed file.
The name of the compressed file will be kit_name-dcx_vaxexe
for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha.
Once the file is copied to your system, it can be expanded
by typing RUN compressed_file. The resultant file will
be the OpenVMS saveset or PCSI installation file which
can be used to install the ECO.
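The naming rule in the note above can be sketched as a small helper. This is an illustrative Python sketch, not part of the kit: the function name compressed_kit_name and the architecture labels "vax" and "alpha" are assumptions made for the example. Only the -dcx_vaxexe and -dcx_axpexe suffixes come from the note.

```python
# Suffixes are taken from the note; the dictionary keys are illustrative labels.
DCX_SUFFIX = {"vax": "-dcx_vaxexe", "alpha": "-dcx_axpexe"}


def compressed_kit_name(kit_name: str, architecture: str) -> str:
    """Return the name of the self-expanding compressed file for a kit."""
    try:
        return kit_name + DCX_SUFFIX[architecture.lower()]
    except KeyError:
        raise ValueError(f"unknown architecture: {architecture!r}") from None


if __name__ == "__main__":
    # The kit file name used here appears in the file list at the end of this summary.
    print(compressed_kit_name("alpshad09_061.a", "alpha"))
```

Once copied to the system, the file named this way is expanded with RUN, as the note describes.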
Copyright (c) Digital Equipment Corporation 1994, 1996. All rights reserved.
PRODUCT: Volume Shadowing for OpenVMS Alpha
OP/SYS: OpenVMS Alpha
SOURCE: Digital Equipment Corporation
ECO INFORMATION:
ECO Kit Name: ALPSHAD09_061
ECO Kits Superseded by This ECO Kit: ALPSHAD07_061
AXPSHAD06_061 (AXPSHAD)
AXPSHAD04_061
AXPSHAD02_061 (CSCPAT_2045)
AXPSHAD01_061
AXPSHAD01_015
ECO Kit Approximate Size: 5904 Blocks
Kit Applies To: OpenVMS Alpha V6.1, V6.1-1H1, V6.1-1H2
System Reboot Necessary: Yes
NOTES: When you install the ALPSHAD09_061 remedial kit you must also
install the ALPSHAD10_061 or later remedial kit before rebooting
your system. If you install the ALPSHAD09_061 kit without also
installing the ALPSHAD10_061 or later SHADOW kit, your system may
experience the MERGE problem or the SHADZEROMBR bugcheck problem,
both of which were resolved in the ALPSHAD10_061 remedial kit.
NOTE: The ALPSHAD10_061 kit was placed on engineering hold
July 10, 1996. Engineering is researching a
problem reported with the ALPSHAD10_061 kit. A replacement
ECO is scheduled for the near future.
Future OpenVMS Alpha V6.1 kits that are issued for facilities
included in the ALPSHAD09_061 kit will not install unless the
ALPSHAD09_061 kit is installed on your system first. It is
highly recommended that the complete ALPSHAD09_061 remedial kit
be installed as soon as possible. Installation of individual
images from the ALPSHAD09_061 remedial kit is not supported and
could result in unpredictable system behavior.
If you have a mixed-architecture cluster, and have not
previously installed a shadowing kit, you must install the VAX
version of this kit on the VAX nodes as well as the Alpha version
of this kit on the Alpha nodes of the cluster BEFORE you bring up
both types of systems in a cluster again. If both kits are not
installed, you may not be able to create shadow sets.
If you have previously installed a shadowing kit then you do
not need to install the VAX version of this kit at this time as
long as the shadowing kit installed on the VAX nodes of the
cluster is VAXSHAD04_061 or later.
Working configurations that contain SCSI shadow sets on
dissimilar controllers may no longer work.
ECO KIT SUMMARY:
An ECO kit exists for Volume Shadowing on OpenVMS Alpha V6.1 through
V6.1-1H2. This kit addresses the following problems:
Problems Addressed in the ALPSHAD09_061 Kit for OpenVMS Alpha V6.1,
V6.1-1H1, V6.1-1H2:
o Shadowing crashes in SHSB$READ_SCB immediately upon booting a system
with a shadowed system disk.
o A two member shadowset with member index 0 a copy target and index 1
the only source member experiences a node failure on a node serving
the disks. The source member goes "available". The source index is
never PACKACKed (Packet Acknowledgment) and the system remains with
the set hung in mount verification forever.
o If Shadowing tries to mark a block bad on all disks due to it being
bad on the source(s) and encounters an error it may return an
incorrect status to the user. The status will be SS$_NORMAL for
MSCP devices and may be SS$_UNSUPPORTED for non-MSCP devices (as
determined by routine SHSB$CHECK_MSCP). An SS$_NORMAL status is
misleading because it indicates that all blocks were correctly
marked bad, and SS$_UNSUPPORTED does not appear to be a valid
return status for shadowing I/Os.
o Removing a Disk Copy Data (DCD) copy target and adding it back again
causes the source of the DCD copy to change. This can cause the
copy to be non-assisted if the alternate source isn't on the same
controller.
o If a DCD copy is interrupted by a mini-merge the copy will restart
at 0% copied (LBN 0) rather than continuing from where it left off.
DCD copies should restart at the last copied LBN after being
interrupted by a mini-merge.
o Failures to start copies or restart copies, usually after a
node halt, shutdown or reboot. Additional symptoms observed include
inconsistent values for HBS_CIP when compared to SHADOW_MAX_COPY,
negative values for HBS_CIP, and copies that should continue but
instead start over from the beginning.
o Demote CMPL to CMPW for #SS$_* to prevent incorrect status handling.
o TPU would output SPR text if a user pressed CTRL/C during the
compile of TPU code that contained errors. Users often do this when
they accidentally try to compile non-TPU code or their procedure has
many coding errors in it.
This problem is corrected in OpenVMS Alpha V6.2.
o If a three member Shadowset has its index zero member as a copy
target and all three members also require a MERGE, then when the
COPY completes the MERGE does not take place. The LBN for the just
completed COPY (the last LBN on the disk) is passed as the MERGE
starting LBN, so the MERGE completes without doing any I/O.
o When MONITOR is run on a terminal with more than 24 lines, MONITOR
still uses only 24 lines. For several classes (PROCESS, DISK, and
CLUSTER), it would be nice if MONITOR could use the additional
lines. This ECO provides support for the PROCESS class - the one
that could use it most.
This feature was provided in OpenVMS Alpha V6.2.
o Specifying MONITOR RMS with the /PERCENT qualifier causes
MONITOR to terminate unexpectedly with an ACCVIO.
This problem is corrected in OpenVMS Alpha V6.2.
o Specifying the DISK Class to Monitor can result in unexpected side
effects to the display. When MONITOR DISK command is issued on a
system with DFS (DECdfs for OpenVMS Systems) devices mounted, only
the first three characters of the DFS name are displayed correctly.
Instead of the fourth character, the low byte of the unit number is
output. It is often displayed as a non-printable character or as
an escape sequence (which may cause terminal lock-ups, reset
terminal characteristics, etc.).
o Due to an inadequate synchronization mechanism, the MONITOR DISK
or MONITOR CLUSTER command can go into an infinite loop on
multi-processor machines.
This problem is corrected in OpenVMS Alpha V6.2.
o A DCD copy is not always performed when it would be valid to do
so. This results in a non-assisted FULL copy operation, which takes
much longer.
o Event Flag not set when completion AST also specified on $ENQ.
o A problem would occur if a satellite were to crash and then attempt
to boot back into the cluster (in a SCSI CLUSTER). The physical
device would be unavailable to the satellite so that it would never
be allowed to boot back into the cluster.
This problem is corrected in OpenVMS Alpha V6.2.
o On multi-interconnect clusters, there is a window which will allow a
lock remaster operation to complete without all interested nodes
pointing to the new master. This usually results in a number of
nodes crashing with LOCKMGRERR bugchecks. The situation is only
possible after a node CLUEXITs. Other required conditions are that
the node which CLUEXITs must have a LOCKDIRWT of zero, such that a
partial lock rebuild occurs after the CLUEXIT. If a SS$_NODELEAVE
error is returned for a node which is to participate in the
remaster, we must stop the remaster from completing, and allow the
lock rebuild to clean things up.
o A SET SECURITY or SET ACL on volumes on the cluster places a high
I/O load on the server process. This exhausts paged pool and
AUDIT_SERVER goes into a RWPAG state.
This problem is corrected in OpenVMS Alpha V6.2.
o A field in the IRP that is used during Volume Processing was not
initialized in clones of USER IOs. If an error occurs, the code
that determines the severity of the error can be misled by data in
these fields. It can fail to locate the error and return the IO as
successful. Since we also return a zero Byte count the User would
see an Incomplete Segmented Transfer error. The fix is to
initialize the field when the clone is allocated.
o Listings are sometimes difficult to follow because there are varied
format conventions used and some comments are misleading or missing.
This problem is corrected in OpenVMS Alpha V6.2.
o Certain applications calling $AUDIT_EVENT with AST's turned off will
be interrupted when $AUDIT_EVENT returns to caller.
This problem is corrected in OpenVMS Alpha V6.2.
o Code relies on page being present when trying to release spinlock
and if the system is paging heavily, this might not be the case.
This problem is corrected in OpenVMS Alpha V6.2.
o Repeating wakeups from $SCHDWK show an accumulating drift over time.
This problem is corrected in OpenVMS Alpha V6.2.
o COPY and/or BACKUP of a DISK to a TMSCP-Served TAPE will fail when
the tape device is placed in a MV state. The failure does not occur
if the same task is performed locally.
COPY will fail with: "SYSTEM-F-TAPEPOSLOST, magnetic tape position lost".
BACKUP will fail with: "-SYSTEM-F-DATALOST, data lost".
This problem is corrected in OpenVMS Alpha V6.2.
o To transition an OpenVMS process from the virtual balance set to the
real balance set, the SPTE's (system page table entries) which
describe its process PTE pages (process page table pages) need to be
copied from saved memory back into the real balance slot from whence
they originally came. This makes the process' P0 and P1 space
accessible again. SPTE's for the process page table pages
describing the undefined area between P0 and P1 must be represented
by pre-initialized null values (actually, ERKW DZERO-type values).
When this undefined void area is exactly zero pages (i.e., P0 and P1
are tangent), the VBSS$READ_OPT2_VBSM routine takes the wrong
branch, causing a VBSSERR bugcheck. This fix adds a test for this
case, and takes the correct branch.
This problem is corrected in OpenVMS Alpha V6.2.
o When a process is switched from a real balance slot to a virtual
balance slot, the allocation fails, causing a VBSSERR bugcheck.
This problem is corrected in OpenVMS Alpha V6.2.
o When process quota (BYTLM) is returned to a process for a created
system global section, the returned quota value was not computed
correctly.
This problem is corrected in OpenVMS Alpha V6.2.
o System crashes due to corrupted PTE entries. The corruption appears
to be Global Section Table Entries pointing to Global Section
Descriptors.
The problem occurs only if 4095 GBLSECTIONS is exceeded. To check
the number of Global Sections currently in use add the following
values:
o SDA> VALIDATE QUEUE EXE$GL_GSDSYSFL !global sections
o SDA> VALIDATE QUEUE EXE$GL_GSDDELFL !delete pending global sections
o SDA> VALIDATE QUEUE EXE$GL_GSDGRPFL !group global sections
o Devices can remain allocated to processes that no longer exist. The
device remains unusable until the system is rebooted.
o If a previously shadowed disk is mounted with a MOUNT/OVER=SHADOW
command and a new shadow set is created using this disk, OpenVMS
Alpha will attempt to create the old shadow set using the old
physical device names.
o The system crashes with a NOBVPVCB bugcheck. The crash occurs on
the kernel stack with MTAAACP.EXE as the current image.
o The system crashes with an XQPERR while dismounting a MAD drive.
o SUBTRACED errors not correctly determined for images installed
/HEADER_RESIDENT.
This problem is corrected in OpenVMS Alpha V6.2.
o Users of RDB V6.1 may get ILLIOFUNC errors when doing IO to a Host
Based Shadowset whose members are served.
o The user will see a large number of the shadow copies being done by
OpenVMS rather than the controller, even when both disks are on the
same controller and the controller has DCD capabilities.
o System hang when I/Os pending to a shadow set do not complete.
o In previous shadow kits two new fields were added to the IRP data
structure for shadow write logging information. This new IRP
definition size conflicted with the IRP sizes of other images on the
system that were not part of the SHADOW kits. This conflict could
cause a variety of errors including fatal bugchecks. This fix
changes the IRP definitions back to the SSB versions and also adds
some special definitions to the SHDRIVER for the new IRP fields.
o Fatal bugcheck from data structure corruption due to the value 10
HEX being added to the corrupted field. Crashes are of various
types including node and cluster crashes, crashes due to invalid UCB
addresses, invalid VCB addresses, invalid member IDs, invalid number
of devices etc.
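The global section check described earlier in this list (crashes that occur only if 4095 GBLSECTIONS is exceeded) amounts to adding three queue counts and comparing the total with the limit. The following Python sketch illustrates only that arithmetic. It assumes the three counts have been read off manually from the output of the three SDA VALIDATE QUEUE commands; the function names and the sample counts are hypothetical and are not taken from any system.

```python
GBLSECTIONS_LIMIT = 4095  # threshold quoted in the problem description


def global_sections_in_use(system_q: int, delete_pending_q: int, group_q: int) -> int:
    """Add the element counts of the three global section descriptor queues."""
    return system_q + delete_pending_q + group_q


def exceeds_limit(total: int) -> bool:
    """True when the total is above the threshold at which the problem can occur."""
    return total > GBLSECTIONS_LIMIT


if __name__ == "__main__":
    # Hypothetical counts for EXE$GL_GSDSYSFL, EXE$GL_GSDDELFL and EXE$GL_GSDGRPFL.
    total = global_sections_in_use(3900, 12, 200)
    print(total, exceeds_limit(total))
```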
Problems Addressed in the ALPSHAD07_061 Kit for OpenVMS Alpha V6.1,
V6.1-1H1, V6.1-1H2:
NOTE: Although this kit contains previous fixes that may be applied
to OpenVMS Alpha V1.5, beginning with the AXPSHAD06_061 ECO kit,
there will be no new fixes included for OpenVMS Alpha V1.5. If
your system is running OpenVMS Alpha V1.5 and you are experiencing
the problems listed in the PROBLEMS ADDRESSED IN AXPSHAD06_061 KIT
FOR OPENVMS AXP V6.1 below, it is strongly recommended that you
upgrade to OpenVMS Alpha V6.1 as soon as possible.
o Fatal bugchecks from data structure corruption may occur due to the
addition of the value 10 HEX to the corrupted field. Crashes are of
various types and include node and cluster crashes, crashes due to
invalid UCB addresses, invalid VCB addresses, invalid member IDs,
and invalid number of devices.
o There is a race condition possible when a CFCB (Cache File Control
Block) is being deleted due to XQP action and cache space is being
reclaimed from a LIMBO file.
o Under certain conditions, a fork lock used by the virtual I/O cache
may be created with an incorrect length. This results in
unsynchronized data access which can cause corruption.
o When a satellite node in a SCSI cluster crashes, the MSCP server
marks the physical device as offline which prevents the satellite
node from being able to boot back into the cluster.
Problems Addressed in the AXPSHAD06_061 Kit for OpenVMS Alpha V6.1:
o Incorrect information in Register 6 and Register 7 causes the system
to crash with a REGCORDET register corruption bugcheck.
o If the system manager fails to set the value of the ALLOCLASS SYSGEN
parameter and then attempts to use shadowing, a shadow volume can be
created, but new members cannot be added to the shadow set. No
error messages are received until an attempt is made to add a second
member to the shadow set. When the following DCL MOUNT command is
issued, these error messages appear:
$ MOUNT/SYSTEM DSA500 /SHADOW=DKB400 ALPHAVMS015
%MOUNT-I-SHDWMEMFAIL, DKB400 failed as a member of the shadow set
-SYSTEM-F-INCSHAMEM, incompatible shadow set member
"Incompatible" is not a true statement of the problem. It is
actually due to "missing allocation class," or "incorrect allocation
class."
o I/O to a shadow set may become stalled if a shadow set member is
dismounted at the same time from multiple nodes within a cluster.
o MOUNT will not add shadow set members unless they are either MSCP or
SCSI.
o Shadow set member expulsion is currently based on the time it takes
for a fork and wait and a PACKACK (Packet Acknowledgment) to
complete rather than the actual elapsed time. On some devices,
particularly SCSI devices, where a PACKACK can take approximately
one minute, the timeout was much too long. Using the default value
of 20 (seconds) for SHADOW_MBR_TMO would actually mean that it would
take 20 minutes to expel a member that is experiencing errors from a
SCSI shadowset.
o SHDRIVER loss of synchronization may result in a crash where SHADDETINCON
is triggered by the check at the end of MATCH_MASTER_SCB. In this
consistency check, the SHAD$W_DEVSTS_PASSIVE_MV_CNTR is verified to
be zero and is not. Another symptom is that the virtual unit
UCB$W_RWAITCNT is zero. Also shadow set member counts of zero may
be seen.
o Crashes may occur in EXPEL_PACKACK_ANY with connections broken to
all members and IRP$L_SHD_LOCK_FR5 = 1 (packack retries exhausted).
o All members of a shadow set become inaccessible at the same time and
remain inaccessible for a period of time greater than "shadow member
timeout" (SHADOW_MBR_TMO or SHADOW_SYS_TMO) seconds but less than
MVTIMEOUT seconds. All members subsequently become accessible
within seconds of each other but not at exactly the same time. This
results in all but one member being expelled from the shadow set.
This often occurs when changing HSJ microcode and all members are
connected to the same HSJ. When brought back online, polling will
cause the devices to be found seconds apart which will result in all
but one member being expelled.
o All members of the set must be checked to see if they meet the
criteria of being MSCP. The original design did not allow for
having no index zero member.
o In a cluster, using $PROCESS_SCAN explicitly or implicitly with the
DCL command, SHOW USER, sometimes causes a system crash due to an
ACCVIO in kernel mode or an IVSSRVRQST bugcheck.
o When a node with a SCSI bus boots, it resets the SCSI bus. In a
multi-host SCSI cluster, this can cause the other node to experience
I/O failures. Normally, this results in a brief mount verification.
The I/O is retried, succeeds, and there is no serious consequence.
However, if the other node is in the process of booting and the
system disk is a shadow set, the system will crash.
o PGFIPLHI bugcheck in the SHADOW_SERVER process at the REMQUE in
K_GET_COPYSHAD_IRP. On OpenVMS Alpha, the PC is A0E and the VA is 274.
o A double-deallocation crash may occur as the result of MOUNT not
properly initializing the MTL pointer. This error causes the
pointer to have a stale value as a result of 2 calls to SYS$VMOUNT
from a single program. The problem will not happen as a result of
DCL commands, since the cells are initialized at image activation.
The stale pointer will only cause a problem if the system is unable
to allocate space for defining the logical name.
o If a user attempts to mount a disk that is 100% full and the disk
was originally initialized with a version of OpenVMS Alpha prior to
the one currently in use, paged pool can be corrupted. This leads
to system crashes. If the disk is filled AFTER it has been mounted,
there will not be any problem.
o Tape devices with stacker/loaders, such as the TF857, may take up to
6 minutes to Rewind/unload/load the next tape. A change was made to
the behavior of MOUNT to take this delay into account. However, a
side effect of this change is that non-stacker drives may also wait
6 minutes before failing.
o Processes may hang in RWNPG state while waiting for a request for
NPP (non-paged pool) so large that it cannot be satisfied.
o A system crash may occur with the current process executing a
$CHKPRO system service call. This happens when one routine running
in user mode is interrupted by a KERNEL mode AST which activates a
routine that uses the same memory.
o If a multi-programming application uses a non-homogenous access
pattern to a file which is resident in Virtual I/O cache, there is a
possibility that the size returned in the I/O status block from a
READ operation will be truncated.
If a clustered application uses a large number of concurrent
processes to perform file operations consisting of an OPEN, WRITE,
and CLOSE sequence repetitively on the same data file, data
corruption may occur.
In a multi-programming environment where a significant amount of NEW
data from a file is being loaded into the cache concurrently by
multiple processes, the system may HANG.
o When a value block or value status block can not be returned,
SYS$GETLKI returns the error SS$_ILLRSDM. A correction has been
made to SYS$GETLKI to now return all other requested information
and update the wildcard search index.
o The Audit Server EXCLUDE process list becomes corrupt after a
SET AUDIT/EXCLUDE=pid command is issued.
o Data corruption may occur in the file container during the use of
PATHWORKS. The corruption can be shown by running CHKDSK on the PC
container disk. Using PCDISK to IMPORT and EXPORT files to and from
the container will show corrupted files when EXPORTed back to OpenVMS.
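The member expulsion item in this list (SHADOW_MBR_TMO counted against PACKACK completions rather than elapsed time) can be illustrated with a short calculation. This Python sketch is a simplified model written for this summary, not SHDRIVER logic: it assumes the old behavior consumed one timeout unit per PACKACK attempt and ignores the fork-and-wait interval. Both function names are hypothetical.

```python
def count_based_expulsion_seconds(shadow_mbr_tmo: int, packack_seconds: float) -> float:
    """Old behavior as modelled here: one timeout unit is used up per PACKACK attempt."""
    return shadow_mbr_tmo * packack_seconds


def elapsed_based_expulsion_seconds(shadow_mbr_tmo: int, packack_seconds: float) -> float:
    """Elapsed-time model: stop retrying once the time spent reaches the timeout."""
    if packack_seconds <= 0:
        raise ValueError("packack_seconds must be positive")
    elapsed = 0.0
    while elapsed < shadow_mbr_tmo:
        elapsed += packack_seconds  # each attempt costs one PACKACK duration
    return elapsed


if __name__ == "__main__":
    # Values from the item: SHADOW_MBR_TMO of 20 and a PACKACK of about one minute.
    old = count_based_expulsion_seconds(20, 60)
    print(old / 60, "minutes under the count-based model")
    print(elapsed_based_expulsion_seconds(20, 60), "seconds under the elapsed-time model")
```

With a one-minute PACKACK the count-based model reproduces the 20-minute expulsion delay described in the item, while the elapsed-time model gives up after the first attempt that runs past the timeout.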
Problems Addressed in AXPSHAD04_061 Kit for OpenVMS Alpha V6.1, V6.1-1H1,
and V6.1-1H2 only:
o When booting two or more systems simultaneously from shadowed system
disks, the systems may appear to hang. Crashing the systems and
examining the crash dumps indicates that shadowing driver blocking
AST routines have not run.
o When a node runs out of SHADOW_MAX_COPY threads while mounting new
copy target units, other nodes in the cluster that have available
SHADOW_MAX_COPY threads will not pick up the copy work. This
results in the copy not being started for copy members that are
added to shadow sets.
Problems Addressed in AXPSHAD02_061 Kit for OpenVMS Alpha V6.1, V6.1-1H1,
and V6.1-1H2 only:
o While running a UETP tape test, fatal controller errors occur. This
problem is caused by the incorrect interpretation of a TUDRIVER
status subcode by TMSCP (the tape server). After the installation
of this ECO kit, a fatal controller error status is returned to the
user when this occurs.
o Shadow sets have separate mount verification done by SHDRIVER,
instead of the usual system mount verification. The SHDRIVER mount
verification has an error updating the volume label on shadow sets
that have the volume label changed except on the node that issues
the label change. Once the devices are in this state, they can not
be recovered until MVTIMEOUT is reached or a reboot of all affected
nodes is performed.
This correction enables the behavior of virtual units to be
consistent with the behavior of physical units.
o Unnecessary calls to MOUNT verification or host-based volume
shadowing processing may occur. On Alpha nodes, these mount
verification or Host-Based Volume Shadowing processing calls will
fail, resulting in I/O hangs and, eventually, volume invalid errors.
o AVAILABLE or OFFLINE status returned from a transfer command does
not implement the MSCP specification correctly.
o OpenVMS VAX MSCP Parity with OpenVMS Alpha. A served disk may
appear to be ONLINE when it is really OFFLINE. This occurs because
the MSCP server's CHECK_SERVICE routine searches the device database
and incorrectly returns an ONLINE status.
o There is no synchronization between SHADOW_PROCESSING and
INVALIDATE_ALL_ENTRIES, which allows these two code threads to
run simultaneously. This can cause a system crash due to the
fact that the SHADOW_PROCESSING thread may remove a member from
a multimember shadow set and the INVALIDATE_ALL_ENTRIES thread
is not aware that the member has been removed. The system
crash occurs in RESTORE_WLE because no Write Log table
exists.
o A problem exists with the SHADOW_SERVER. Several symptoms
of this problem are:
+ Undiagnosable hangs in individual copy operations or on
the entire server
+ Unexpected copy aborts
+ Poor copy performance
+ Shadow set inconsistency
An optional new system logical name, SHAD$COPY_BUFFER_SIZE, has
also been added. This system logical name can be used to control
the buffer size of shadow copies. SHAD$COPY_BUFFER_SIZE has a
maximum size of 127 blocks (default) and a minimum size of 31
blocks. The size can be changed by using the DEFINE/SYSTEM
command.
o High interrupt stack activity occurs on a node performing a merged
copy operation. This could adversely affect configurations using
HSJ40 controllers with many shadow sets.
o Data inconsistency may exist between members of a Phase II shadow
set. This occurs under very heavy I/O operations to a shadow
set while the members of that shadow set are undergoing failover
from one controller to another.
o Invalid Command status processing of Write History Management
commands unconditionally puts an entry into the error log.
This occurs even when there is no actual error.
o A second shadow server may accidentally be created using the
startup command procedure. This results in desynchronization
of shadow sets. The startup procedure has been modified so
that it does not allow multiple servers.
o When a serving node becomes so busy that it occasionally
exhausts resource limits, the RWAITCNT for heavily used disks
gets incremented. If a client node requests an ONLINE and
RWAITCNT is bumped, it is rejected by MSCP. This makes
MOUNTing devices very difficult.
o After a system failure, the number of blocks to be rewritten
is not computed correctly. This may cause inconsistent data
between shadow set members. This occurs during an assisted
merge when the information regarding which LBNs to include
is only requested from one shadow set member.
o A process issuing I/O to a TMSCP tape device may appear to
hang after a controller failover attempt. This is caused by
an incorrect check of the cached data's lost error status,
which results in an endless loop trying to recover a
nonexistent error.
o OpenVMS Alpha systems are unable to reboot an MSCP controller,
such as an HSC. This might result in stalled pending I/O
to MSCP or TMSCP devices.
o A device may be mounted by an MSCP server, even though a local
controller could be used. This situation may still occur after
the installation of this ECO kit under extreme timing circumstances.
o When new MSCP server I/O is sent to a device that is RWAITCNT
stalled and the connection from the driver to the device fails,
server I/O is posted to the restart queue if it is active. If
not, it is incorrectly left on the UCB (Unit Control Block)
pending queue. This causes shadow sets to appear to be stalled.
If the connection from the client to the server then fails,
I/O from the client that has been passed to the driver is
then allowed to complete. If this I/O is stalled on the
pending queue, it completes much later, possibly after
the client has reissued the stalled I/O.
o I/O hangs to a shadow set might occur because the shadowing
driver has no way to disable write logging if the write log
entries are mismanaged or depleted to a point that the
shadow set is unusable.
o An Invalid Exception bugcheck might occur in DUDRIVER during
I/O request complete processing.
o In the past, MSCP could only serve 256 disks. It can now
serve 512.
o During disk and tape error recovery, MSCP is unable to perform
a TMSCP controller reset, which results in a system crash.
o During the processing of a write-log entry in SHDRIVER, a
register value may be improperly maintained if the system
is low on nonpaged pool. This will cause a system crash
with an INVEXCEPTN Bugcheck within SHSB$GET_WLE_TABLE in
module SHDSUBS when the entry is resumed.
o In the past, Volume Shadowing checked device IDs and the
maximum logical block numbers (LBNs). Volume Shadowing
now checks for geometries and maximum LBNs. This
enables devices like the RZ28 and RZ28B to operate in
the same shadow set. Even though their device IDs differ,
their geometries and maximum LBNs will match when configured
on like controllers.
NOTE: If this remedial kit is installed across a VMScluster
system, SCSI shadow sets that are configured across
different controller types are not supported and will
no longer work.
o After approximately 18 hours of operation, some OPCOM
messages that should be logged are skipped.
o If two members of a three-member shadow set are
simultaneously removed, either intentionally or in
a failover situation, the system may hang or fail.
o System crashes might occur during virtual I/O cache (VIOC)
expansion under the following circumstances:
+ Multiple processes (or processors) are accessing the same
file concurrently;
+ The cache space for that file was being expanded;
+ That expansion caused the need for a new hash table
structure.
o When subjected to a high I/O load and multiple failures,
the write logging (minimerge) and shadowing synchronization
subsystems become unreliable.
o Unreliable shadow subsystem behavior and shadow-set hangs
result from VMScluster nodes failing to relinquish shadow-set
resources.
o The TMSCP server bugchecks in TMSCP$FIND_UQB when a command
that refers to a specific unit is processed and that unit
does not have the Server Local Unit Number (SLUN) bit set.
The fix contained in this ECO kit will cause the bugcheck
to occur in TUDRIVER instead of the TMSCP server.
o I/O may stall to a served shadow-set member. Load balancing
makes this condition more likely.
o System crashes may occur during processing of stale I/O in
Host-Based Volume Shadow Sets. This I/O does not properly
reflect changes in shadow set configuration like removal of
members and changes in the write-logging state.
o Shadow set members may be inconsistent after the failure
of a node accessing a shadow set served by an Alpha node.
The amount of corrupted data depends on previous I/O
operations to the shadow set.
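The change in member compatibility checking described in this list (device ID and maximum LBN before, geometry and maximum LBN now) can be sketched as two comparison functions. This Python sketch only illustrates the difference between the two rules. The Disk record, its field layout, and the geometry and LBN numbers are made up for the example and are not real RZ28 or RZ28B figures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Disk:
    device_id: str
    geometry: Tuple[int, int, int]  # hypothetical (cylinders, tracks, sectors)
    max_lbn: int


def compatible_by_device_id(a: Disk, b: Disk) -> bool:
    """Earlier rule: device IDs and maximum LBNs must match."""
    return a.device_id == b.device_id and a.max_lbn == b.max_lbn


def compatible_by_geometry(a: Disk, b: Disk) -> bool:
    """Rule after this kit: geometries and maximum LBNs must match."""
    return a.geometry == b.geometry and a.max_lbn == b.max_lbn


if __name__ == "__main__":
    # Made-up numbers: same geometry and maximum LBN, different device IDs.
    rz28 = Disk("RZ28", (1000, 16, 64), 1_023_999)
    rz28b = Disk("RZ28B", (1000, 16, 64), 1_023_999)
    print(compatible_by_device_id(rz28, rz28b), compatible_by_geometry(rz28, rz28b))
```

Under the geometry rule, the same two devices would stop matching if different controller types reported different geometries, which is consistent with the NOTE about SCSI shadow sets configured across different controller types.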
Problems Addressed in AXPSHAD01_061 Kit for OpenVMS Alpha V6.1 only:
o In Volume Shadowing for OpenVMS Alpha V6.1, minimerge
functionality across mixed architecture VMSclusters was disabled.
In order to reestablish the minimerge functionality, install this
kit across any VMScluster that contains an OpenVMS Alpha V6.1 node.
After installation of this kit, the entire cluster must be
rebooted simultaneously. Rolling upgrades are *NOT* supported.
o Mounting an RZ28B disk device with an RZ28 in the same
shadow set is not allowed and will display the following error:
%MOUNT-I-SHDWMEMFAIL, $1$DUA0 failed as a member of the shadow set
-SYSTEM-F-INCSHAMEM, incompatible shadow set member
This behavior is seen when RZ28/RZ28B shadow set members are
connected with a local SCSI (Small Computer System Interface)
controller.
With this kit, RZ28 and RZ28B devices can be combined in a
shadow set if they are connected to like controllers.
NOTE: If this kit is installed across a VMScluster, SCSI
shadow sets configured across different controller
types are not supported and will no longer work.
VMSclusters with shadowed SCSI disks and mixed-architecture
VMSclusters running OpenVMS Alpha V6.1 must apply the kit and reboot
the entire cluster simultaneously, so that the entire VMScluster is
running the same version of Volume Shadowing software.
o In a VMScluster (mixed Alpha/VAX environment), shadow sets served to
the DEC 3000 Model 300 are reported as MEDOFL. A DCL command, 'SHOW
DEVICE/SERVED', from a VAX 6000 Model 400 shows the shadow sets as
AVAILABLE.
INSTALLATION NOTES:
If you are using the Shadowing option, it is highly recommended that
this kit be installed.
o When you install the ALPSHAD09_061 remedial kit you must also
install the ALPSHAD10_061 or later remedial kit before rebooting
your system. Installing the ALPSHAD09_061 kit without installing
the ALPSHAD10_061 or later kit could lead to system instability.
o Future OpenVMS Alpha V6.1 kits that are issued for facilities
included in the ALPSHAD09_061 kit will not install unless the
ALPSHAD09_061 kit is installed on your system first. It is highly
recommended that the complete ALPSHAD09_061 remedial kit be
installed as soon as possible. Installation of individual images
from the ALPSHAD09_061 remedial kit is not supported and could
result in unpredictable system behavior.
o This kit *MUST* be installed on every Alpha in a mixed-architecture
VMScluster, and the VAX version of this kit *MUST* be installed on
every VAX system in the cluster BEFORE any systems are re-booted
into the VMScluster. If both kits are not installed, shadow sets
cannot be created.
o Working configurations that contain SCSI shadow sets on dissimilar
controllers may no longer work.
o VMSclusters with shadowed SCSI disks and mixed-architecture
VMSclusters running OpenVMS Alpha V6.1 must apply the kit and reboot
the entire cluster simultaneously. In these cases, rolling upgrades
are not supported.
For more information, please see the Problem Description section of the
Cover Letter/Release Notes supplied with this kit.
This patch can be found at any of these sites:
Colorado Site
Georgia Site
Files on this server are as follows:
alpshad09_061.README
alpshad09_061.CHKSUM
alpshad09_061.CVRLET_TXT
alpshad09_061.a-dcx_axpexe