OpenVMS__STORAGE ALPSHAD09_061 Alpha V6.1 Volume Shadowing ECO Summary

NOTE: An OpenVMS saveset or PCSI installation file is stored on the Internet in a self-expanding compressed file. The name of the compressed file will be kit_name-dcx_vaxexe for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha. Once the file is copied to your system, it can be expanded by typing RUN compressed_file. The resultant file will be the OpenVMS saveset or PCSI installation file which can be used to install the ECO.

Copyright (c) Digital Equipment Corporation 1994, 1996. All rights reserved.

PRODUCT:    Volume Shadowing for OpenVMS Alpha

OP/SYS:     OpenVMS Alpha

SOURCE:     Digital Equipment Corporation

ECO INFORMATION:

     ECO Kit Name:  ALPSHAD09_061
     ECO Kits Superseded by This ECO Kit:  ALPSHAD07_061
                                           AXPSHAD06_061 (AXPSHAD)
                                           AXPSHAD04_061
                                           AXPSHAD02_061 (CSCPAT_2045)
                                           AXPSHAD01_061
                                           AXPSHAD01_015
     ECO Kit Approximate Size:  5904 Blocks
     Kit Applies To:  OpenVMS Alpha V6.1, V6.1-1H1, V6.1-1H2
     System Reboot Necessary:  Yes

NOTES:

When you install the ALPSHAD09_061 remedial kit you must also install the ALPSHAD10_061 or later remedial kit before rebooting your system. If you install the ALPSHAD09_061 kit without installing the ALPSHAD10_061 or later SHADOW kit, you may experience the MERGE problem or the SHADZEROMBR bugcheck problem, both of which were resolved in the ALPSHAD10_061 remedial kit.

NOTE: The ALPSHAD10_061 kit was placed on engineering hold July 10, 1996. Engineering is researching a problem reported with the ALPSHAD10_061 kit. A replacement ECO is scheduled for the near future.

Future OpenVMS Alpha V6.1 kits that are issued for facilities included in the ALPSHAD09_061 kit will not install unless the ALPSHAD09_061 kit is installed on your system first. It is highly recommended that the complete ALPSHAD09_061 remedial kit be installed as soon as possible. Installation of individual images from the ALPSHAD09_061 remedial kit is not supported and could result in unpredictable system behavior.

If you have a mixed-architecture cluster and have not previously installed a shadowing kit, you must install the VAX version of this kit on the VAX nodes and the Alpha version of this kit on the Alpha nodes of the cluster BEFORE you bring up both types of systems in a cluster again. If both kits are not installed, you may not be able to create shadow sets. If you have previously installed a shadowing kit, you do not need to install the VAX version of this kit at this time, as long as the shadowing kit installed on the VAX nodes of the cluster is VAXSHAD04_061 or later.

Working configurations that contain SCSI shadow sets on dissimilar controllers may no longer work.

ECO KIT SUMMARY:

An ECO kit exists for Volume Shadowing on OpenVMS Alpha V6.1 through V6.1-1H2. This kit addresses the following problems:

Problems Addressed in the ALPSHAD09_061 Kit for OpenVMS Alpha V6.1, V6.1-1H1, V6.1-1H2:

  o  Shadowing crash immediately upon booting system with shadowed system disk, in SHSB$READ_SCB.

  o  A two member shadowset with member index 0 a copy target and index 1 the only source member experiences a node failure on a node serving the disks. The source member goes "available". The source index is never PACKACKed (Packet Acknowledgment) and the system remains with the set hung in mount verification forever.

  o  If Shadowing tries to mark a block bad on all disks due to it being bad on the source(s) and encounters an error, it may return an incorrect status to the user. The status will be SS$_NORMAL for MSCP devices and may be SS$_UNSUPPORTED for non-MSCP devices (as determined by routine SHSB$CHECK_MSCP). An SS$_NORMAL status is misleading, as it indicates all blocks were correctly marked bad; SS$_UNSUPPORTED doesn't seem to be a valid return status for shadowing I/Os.

  o  Removing a Disk Copy Data (DCD) copy target and adding it back again causes the source of the DCD copy to change. This can cause the copy to be non-assisted if the alternate source isn't on the same controller.

  o  If a DCD copy is interrupted by a mini-merge, the copy will restart at 0% copied (LBN 0) rather than continuing from where it left off. DCD copies should restart at the last copied LBN after being interrupted by a mini-merge.

  o  Failures to start copies or restart copies, usually after a node halt, shutdown or reboot. Additional symptoms observed include inconsistent values for HBS_CIP when compared to SHADOW_MAX_COPY, negative values for HBS_CIP, and copies that should continue being started over from the beginning.

  o  Demote CMPL to CMPW for #SS$_* to prevent incorrect status handling.

  o  TPU would output SPR text if a user pressed CTRL/C during the compile of TPU code that contained errors. Users often do this when they accidentally try to compile non-TPU code or their procedure has many coding errors in it. This problem is corrected in OpenVMS Alpha V6.2.

  o  If a three member Shadowset has its index zero member as a copy target and all three members also require a MERGE, then when the COPY completes the MERGE does not take place. The LBN for the just completed COPY (the last LBN on the disk) is passed as the MERGE starting LBN. So it completes without doing any IO.

  o  When MONITOR is run on a terminal with more than 24 lines, MONITOR still uses only 24 lines. For several classes (PROCESS, DISK, and CLUSTER), it would be nice if MONITOR could use the additional lines. This ECO provides support for the PROCESS class - the one that could use it most. This feature was provided in OpenVMS Alpha V6.2.

  o  Specifying the MONITOR RMS command with the /PERCENT qualifier will cause MONITOR to unexpectedly terminate with an ACCVIO. This problem is corrected in OpenVMS Alpha V6.2.

  o  Specifying the DISK Class to Monitor can result in unexpected side effects to the display. When the MONITOR DISK command is issued on a system with DFS (DECdfs for OpenVMS Systems) devices mounted, only the first three characters of the DFS name are displayed correctly.
     Instead of the fourth character, the low byte of the unit number is output. It is often displayed as a non-printable character or as an escape sequence (in which case it may cause terminal lock-ups, resetting of characteristics, etc).

  o  Due to an inadequate synchronization mechanism, the MONITOR DISK or MONITOR CLUSTER command can go into an infinite loop on multi-processor machines. This problem is corrected in OpenVMS Alpha V6.2.

  o  When a DCD should be valid to do, it is not always done. This results in a non-assisted FULL copy operation, which takes much longer.

  o  Event Flag not set when completion AST also specified on $ENQ.

  o  A problem would occur if a satellite were to crash and then attempt to boot back into the cluster (in a SCSI CLUSTER). The physical device would be unavailable to the satellite so that it would never be allowed to boot back into the cluster. This problem is corrected in OpenVMS Alpha V6.2.

  o  On multi-interconnect clusters, there is a window which will allow a lock remaster operation to complete without all interested nodes pointing to the new master. This usually results in a number of nodes crashing with LOCKMGRERR bugchecks. The situation is only possible after a node CLUEXITs. Other required conditions are that the node which CLUEXITs must have a LOCKDIRWT of zero, such that a partial lock rebuild occurs after the CLUEXIT. If an SS$_NODELEAVE error is returned for a node which is to participate in the remaster, we must stop the remaster from completing, and allow the lock rebuild to clean things up.

  o  A SET SECURITY or SET ACL on volumes on the cluster places high I/O on the server process. This exhausts paged pool and AUDIT_SERVER goes into a RWPAG state. This problem is corrected in OpenVMS Alpha V6.2.

  o  A field in the IRP that is used during Volume Processing was not initialized in clones of USER IOs. If an error occurs, the code that determines the severity of the error can be misled by data in these fields.
     It can fail to locate the error and return the IO as successful. Since we also return a zero Byte count, the User would see an Incomplete Segmented Transfer error. The fix is to initialize the field when the clone is allocated.

  o  Listings are sometimes difficult to follow because there are varied format conventions used and some comments are misleading or missing. This problem is corrected in OpenVMS Alpha V6.2.

  o  Certain applications calling $AUDIT_EVENT with ASTs turned off will be interrupted when $AUDIT_EVENT returns to caller. This problem is corrected in OpenVMS Alpha V6.2.

  o  Code relies on page being present when trying to release spinlock and if the system is paging heavily, this might not be the case. This problem is corrected in OpenVMS Alpha V6.2.

  o  Repeating wakeups from $SCHDWK show an accumulating drift over time. This problem is corrected in OpenVMS Alpha V6.2.

  o  COPY and/or BACKUP of a DISK to a TMSCP-Served TAPE will fail when the tape device is placed in a MV state. The failure does not occur if the same task is performed locally. COPY will fail with: "SYSTEM-F-TAPEPOSLOST, magnetic tape position lost". BACKUP will fail with: "-SYSTEM-F-DATALOST, data lost". This problem is corrected in OpenVMS Alpha V6.2.

  o  To transition an OpenVMS process from the virtual balance set to the real balance set, the SPTEs (system page table entries) which describe its process PTE pages (process page table pages) need to be copied from saved memory back into the real balance slot from whence they originally came. This makes the process' P0 and P1 space accessible again. SPTEs for the process page table pages describing the undefined area between P0 and P1 must be represented by pre-initialized null values (actually, ERKW DZERO-type values). When this undefined void area is exactly zero pages (i.e., P0 and P1 are tangent), the VBSS$READ_OPT2_VBSM routine takes the wrong branch, causing a VBSSERR bugcheck.
     This fix adds a test for this case, and takes the correct branch. This problem is corrected in OpenVMS Alpha V6.2.

  o  When a process is switched from a real balance slot to a virtual balance slot, the allocation fails, causing a VBSSERR bugcheck. This problem is corrected in OpenVMS Alpha V6.2.

  o  When returning process quota (BYTLM) to a process for a created system global section, compute the returned quota value correctly. This problem is corrected in OpenVMS Alpha V6.2.

  o  System crashes due to corrupted PTE entries. The corruption appears to be Global Section Table Entries pointing to Global Section Descriptors. The problem occurs only if 4095 GBLSECTIONS is exceeded. To check the number of Global Sections currently in use, add the following values:

          SDA> VALIDATE QUEUE EXE$GL_GSDSYSFL    !global sections
          SDA> VALIDATE QUEUE EXE$GL_GSDDELFL    !delete pending global sections
          SDA> VALIDATE QUEUE EXE$GL_GSDGRPFL    !group global sections

  o  Devices can remain allocated to processes that no longer exist. The device remains unusable until the system is rebooted.

  o  If a previously shadowed disk is mounted with a MOUNT/OVER=SHADOW command and a new shadow set is created using this disk, OpenVMS Alpha will attempt to create the old shadow set using the old physical device names.

  o  The system crashes with a NOBVPVCB bugcheck. The crash occurs on the kernel stack with MTAAACP.EXE as the current image.

  o  The system crashes with an XQPERR while dismounting a MAD drive.

  o  SUBTRACED errors not correctly determined for images installed /HEADER_RESIDENT. This problem is corrected in OpenVMS Alpha V6.2.

  o  When returning process quota (BYTLM) to a process for a created system global section, compute the returned quota value correctly.

  o  Users of RDB V6.1 may get ILLIOFUNC errors when doing IO to a Host Based Shadowset whose members are served.

  o  The user will see a large number of the shadow copies being done by OpenVMS rather than the controller, even when both disks are on the same controller and the controller has DCD capabilities.

  o  If a three member Shadowset has its index zero member as a copy target and all three members also require a MERGE, then when the COPY completes the MERGE does not take place. The LBN for the just completed COPY (the last LBN on the disk) is passed as the MERGE starting LBN. So it completes without doing any IO.

  o  System hang when I/Os pending to a shadow set do not complete.

  o  In previous shadow kits two new fields were added to the IRP data structure for shadow write logging information. This new IRP definition size conflicted with the IRP sizes of other images on the system that were not part of the SHADOW kits. This conflict could cause a variety of errors including fatal bugchecks. This fix changes the IRP definitions back to the SSB versions and also adds some special definitions to the SHDRIVER for the new IRP fields.

  o  Fatal bugcheck from data structure corruption due to the value 10 HEX being added to the corrupted field. Crashes are of various types including node and cluster crashes, crashes due to invalid UCB addresses, invalid VCB addresses, invalid member IDs, invalid number of devices etc.

Problems Addressed in the ALPSHAD07_061 Kit for OpenVMS Alpha V6.1, V6.1-1H1, V6.1-1H2:

NOTE: Although this kit contains previous fixes that may be applied to OpenVMS Alpha V1.5, beginning with the AXPSHAD06_061 ECO kit, there will be no new fixes included for OpenVMS Alpha V1.5. If your system is running OpenVMS Alpha V1.5 and you are experiencing the problems listed in the PROBLEMS ADDRESSED IN AXPSHAD06_061 KIT FOR OPENVMS AXP V6.1 below, it is strongly recommended that you upgrade to OpenVMS Alpha V6.1 as soon as possible.

  o  Fatal bugchecks from data structure corruption may occur due to the addition of the value 10 HEX to the corrupted field.
     Crashes are of various types and include node and cluster crashes, crashes due to invalid UCB addresses, invalid VCB addresses, invalid member IDs, and invalid number of devices.

  o  There is a race condition possible when a CFCB (Cache File Control Block) is being deleted due to XQP action and cache space is being reclaimed from a LIMBO file.

  o  Under certain conditions, a fork lock used by the virtual I/O cache may be created with an incorrect length. This results in unsynchronized data access which can cause corruption.

  o  When a satellite node in a SCSI cluster crashes, the MSCP server marks the physical device as offline, which prevents the satellite node from being able to boot back into the cluster.

Problems Addressed in the AXPSHAD06_061 Kit for OpenVMS Alpha V6.1:

  o  Incorrect information in Register 6 and Register 7 causes the system to crash with a REGCORDET register corruption bugcheck.

  o  If the system manager fails to set the value of the ALLOCLASS SYSGEN parameter and then attempts to use shadowing, a shadow volume can be created, but new members cannot be added to the shadow set. No error messages are received until an attempt is made to add a second member to the shadow set. Using the following DCL 'MOUNT' command, the following error messages appear:

          $ MOUNT/SYSTEM DSA500 /SHADOW=DKB400 ALPHAVMS015
          %MOUNT-I-SHDWMEMFAIL, DKB400 failed as a member of the shadow set
          -SYSTEM-F-INCSHAMEM, incompatible shadow set member

     "Incompatible" is not a true statement of the problem. It is actually due to "missing allocation class," or "incorrect allocation class."

  o  I/O to a shadow set may become stalled if a shadow set member is dismounted at the same time from multiple nodes within a cluster.

  o  MOUNT will not add shadow set members unless they are either MSCP or SCSI.

  o  Shadow set member expulsion is currently based on the time it takes for a fork and wait and a PACKACK (Packet Acknowledgment) to complete rather than the actual time transpired.
     On some devices, particularly SCSI devices, where a PACKACK can take approximately one minute, the timeout was much too long. Using the default value of 20 (seconds) for SHADOW_MBR_TMO would actually mean that it would take 20 minutes to expel a member that is experiencing errors from a SCSI shadowset.

  o  SHDRIVER loss of synchronization may result in a crash where SHADDETINCON is triggered by the check at the end of MATCH_MASTER_SCB. In this consistency check, the SHAD$W_DEVSTS_PASSIVE_MV_CNTR is verified to be zero and is not. Another symptom is that the virtual unit UCB$W_RWAITCNT is zero. Also shadow set member counts of zero may be seen.

  o  Crashes may occur in EXPEL_PACKACK_ANY with connections broken to all members and IRP$L_SHD_LOCK_FR5 = 1 (packack retries exhausted).

  o  All members of a shadow set become inaccessible at the same time and remain inaccessible for a period of time greater than "shadow member timeout" (SHADOW_MBR_TMO or SHADOW_SYS_TMO) seconds but less than MVTIMEOUT seconds. All members subsequently become accessible within seconds of each other but not at exactly the same time. This results in all but one member being expelled from the shadow set. This often occurs when changing HSJ microcode and all members are connected to the same HSJ. When brought back online, polling will cause the devices to be found seconds apart which will result in all but one member being expelled.

  o  All members of the set must be checked to see if they meet the criteria of being MSCP. The original design did not allow for having no index zero member.

  o  In a cluster, using $PROCESS_SCAN explicitly or implicitly with the DCL command, SHOW USER, sometimes causes a system crash due to an ACCVIO in kernel mode or an IVSSRVRQST bugcheck.

  o  When a node with a SCSI bus boots, it resets the SCSI bus. In a multi-host SCSI cluster, this can cause the other node to experience I/O failures. Normally, this results in a brief mount verification.
     The I/O is retried, succeeds, and there is no serious consequence. However, if the other node is in the process of booting and the system disk is a shadow set, the system will crash.

  o  PGFIPLHI bugcheck in the SHADOW_SERVER process at the REMQUE in K_GET_COPYSHAD_IRP. On OpenVMS Alpha, the PC is A0E and the VA is 274.

  o  A double-deallocation crash may occur as the result of MOUNT not properly initializing the MTL pointer. This error causes the pointer to have a stale value as a result of 2 calls to SYS$VMOUNT from a single program. The problem will not happen as a result of DCL commands, since the cells are initialized at image activation. The stale pointer will only cause a problem if the system is unable to allocate space for defining the logical name.

  o  If a user attempts to mount a disk that is 100% full and the disk was originally initialized with a version of OpenVMS Alpha prior to the one currently in use, paged pool can be corrupted. This leads to system crashes. If the disk is filled AFTER it has been mounted, there will not be any problem.

  o  Tape devices with stacker/loaders, such as the TF857, may take up to 6 minutes to Rewind/unload/load the next tape. A change was made to the behavior of MOUNT to take this delay into account. However, a side effect of this change is that non-stacker drives may also wait 6 minutes before failing.

  o  Processes may hang in RWNPG state while waiting for a request for NPP (non-paged pool) so large that it cannot be satisfied.

  o  A system crash may occur with the current process executing a $CHKPRO system service call. This happens when one routine running in user mode is interrupted by a KERNEL mode AST which activates a routine that uses the same memory.

  o  If a multi-programming application uses a non-homogenous access pattern to a file which is resident in Virtual I/O cache, there is a possibility that the size returned in the I/O status block from a READ operation will be truncated.
     If a clustered application uses a large number of concurrent processes to perform file operations consisting of an OPEN, WRITE, and CLOSE sequence repetitively on the same data file, data corruption may occur. In a multi-programming environment where a significant amount of NEW data from a file is being loaded into the cache concurrently by multiple processes, the system may HANG.

  o  When a value block or value status block can not be returned, SYS$GETLKI returns the error SS$_ILLRSDM. A correction has been made to SYS$GETLKI to now return all other requested information and update the wildcard search index.

  o  The Audit Server EXCLUDE process list becomes corrupt after a SET AUDIT/EXCLUDE=pid command is issued.

  o  Data corruption may occur in the file container during the use of PATHWORKS. The corruption can be shown by running CHKDSK on the PC container disk. Using PCDISK to IMPORT and EXPORT files to and from the container will show corrupted files when EXPORTed back to OpenVMS.

Problems Addressed in AXPSHAD04_061 Kit for OpenVMS Alpha V6.1, V6.1-1H1, and V6.1-1H2 only:

  o  When booting two or more systems simultaneously from shadowed system disks, the systems may appear to hang. Crashing the systems and examining the crash dumps indicates that shadowing driver blocking AST routines have not run.

  o  When a node runs out of SHADOW_MAX_COPY threads while mounting new copy target units, other nodes in the cluster that have available SHADOW_MAX_COPY threads will not pick up the copy work. This results in the copy not being started for copy members that are added to shadow sets.

Problems Addressed in AXPSHAD02_061 Kit for OpenVMS Alpha V6.1, V6.1-1H1, and V6.1-1H2 only:

  o  While running a UETP tape test, fatal controller errors occur. This problem is caused by the incorrect interpretation of a TUDRIVER status subcode by TMSCP (the tape server). After the installation of this ECO kit, a fatal controller error status is returned to the user when this occurs.

  o  Shadow sets have separate mount verification done by SHDRIVER, instead of the usual system mount verification. The SHDRIVER mount verification has an error updating the volume label on shadow sets that have the volume label changed except on the node that issues the label change. Once the devices are in this state, they can not be recovered until MVTIMEOUT is reached or a reboot of all affected nodes is performed. This correction enables the behavior of virtual units to be consistent with the behavior of physical units.

  o  Unnecessary calls to MOUNT verification or host-based volume shadowing processing may occur. On Alpha nodes, these mount verification or Host-Based Volume Shadowing processing calls will fail, resulting in I/O hangs and, eventually, volume invalid errors.

  o  AVAILABLE or OFFLINE status returned from a transfer command does not implement the MSCP specification correctly.

  o  OpenVMS VAX MSCP Parity with OpenVMS Alpha. A served disk may appear to be ONLINE when it is really OFFLINE. This occurs because the MSCP server's CHECK_SERVICE routine searches the device database and incorrectly returns an ONLINE status.

  o  There is no synchronization between SHADOW_PROCESSING and INVALIDATE_ALL_ENTRIES, which allows these two code threads to run simultaneously. This can cause a system crash due to the fact that the SHADOW_PROCESSING thread may remove a member from a multimember shadow set and the INVALIDATE_ALL_ENTRIES thread is not aware that the member has been removed. The system crash occurs in RESTORE_WLE because no Write Log table exists.

  o  A problem exists with the SHADOW_SERVER. Several symptoms of this problem are:

          +  Undiagnosable hangs in individual copy operations or on the entire server
          +  Unexpected copy aborts
          +  Poor copy performance
          +  Shadow set inconsistency

     An optional new system logical name, SHAD$COPY_BUFFER_SIZE, has also been added. This system logical name can be used to control the buffer size of shadow copies.
     SHAD$COPY_BUFFER_SIZE has a maximum size of 127 blocks (default) and a minimum size of 31 blocks. The size can be changed by using the DEFINE/SYSTEM command.

  o  High interrupt stack activity occurs on a node performing a merged copy operation. This could adversely affect configurations using HSJ40 controllers with many shadow sets.

  o  Data inconsistency may exist between members of a Phase II shadow set. This occurs under very heavy I/O operations to a shadow set while the members of that shadow set are undergoing failover from one controller to another.

  o  Invalid Command status processing of Write History Management commands unconditionally puts an entry into the error log. This occurs even when there is no actual error.

  o  A second shadow server may accidentally be created using the startup command procedure. This results in desynchronization of shadow sets. The startup procedure has been modified so that it does not allow multiple servers.

  o  When a serving node becomes so busy that it occasionally exhausts resource limits, the RWAITCNT for heavily used disks gets incremented. If a client node requests an ONLINE and RWAITCNT is bumped, it is rejected by MSCP. This makes MOUNTing devices very difficult.

  o  After a system failure, the number of blocks to be rewritten is not computed correctly. This may cause inconsistent data between shadow set members. This occurs during an assisted merge when the information regarding which LBNs to include is only requested from one shadow set member.

  o  A process issuing I/O to a TMSCP tape device may appear to hang after a controller failover attempt. This is caused by an incorrect check of the cached data's lost error status, which results in an endless loop trying to recover a nonexistent error.

  o  OpenVMS Alpha systems are unable to reboot an MSCP controller, such as an HSC. This might result in stalled pending I/O to MSCP or TMSCP devices.

  o  A device may be mounted by an MSCP server, even though a local controller could be used.
     This situation may still occur after the installation of this ECO kit under extreme timing circumstances.

  o  When new MSCP server I/O is sent to a device that is RWAITCNT stalled and the connection from the driver to the device fails, server I/O is posted to the restart queue if it is active. If not, it is incorrectly left on the UCB (Unit Control Block) pending queue. This causes shadow sets to appear to be stalled. If the connection from the client to the server then fails, I/O from the client that has been passed to the driver is then allowed to complete. If this I/O is stalled on the pending queue, it completes much later, possibly after the client has reissued the stalled I/O.

  o  I/O hangs to a shadow set might occur because the shadowing driver has no way to disable write logging if the write log entries are mismanaged or depleted to a point that the shadow set is unusable.

  o  An Invalid Exception bugcheck might occur in DUDRIVER during I/O request complete processing.

  o  In the past, MSCP could only serve 256 disks. It can now serve 512.

  o  During disk and tape error recovery, MSCP is unable to perform a TMSCP controller reset, which results in a system crash.

  o  During the processing of a write-log entry in SHDRIVER, a register value may be improperly maintained if the system is low on nonpaged pool. This will cause a system crash with an INVEXCEPTN Bugcheck within SHSB$GET_WLE_TABLE in module SHDSUBS when the entry is resumed.

  o  In the past, Volume Shadowing checked device IDs and the maximum logical block numbers (LBNs). Volume Shadowing now checks for geometries and maximum LBNs. This enables devices like the RZ28 and RZ28B to operate in the same shadow set. Even though their device IDs differ, their geometries and maximum LBNs will match when configured on like controllers.

     NOTE: If this remedial kit is installed across a VMScluster system, SCSI shadow sets that are configured across different controller types are not supported and will no longer work.

  o  After approximately 18 hours of operation, some OPCOM messages that should be logged are skipped.

  o  If two members of a three-member shadow set are simultaneously removed, either intentionally or in a failover situation, the system may hang or fail.

  o  System crashes might occur during virtual I/O cache (VIOC) expansion under the following circumstances:

          +  Multiple processes (or processors) are accessing the same file concurrently;
          +  The cache space for that file was being expanded;
          +  That expansion caused the need for a new hash table structure.

  o  When subjected to a high I/O load and multiple failures, the write logging (minimerge) and shadowing synchronization subsystems become unreliable.

  o  Unreliable shadow subsystem behavior and shadow-set hangs result from VMScluster nodes failing to relinquish shadow-set resources.

  o  The TMSCP server bugchecks in TMSCP$FIND_UQB when a command that refers to a specific unit is processed and that unit does not have the Server Local Unit Number (SLUN) bit set. The fix contained in this ECO kit will cause the bugcheck to occur in TUDRIVER instead of the TMSCP server.

  o  I/O may stall to a served shadow-set member. Load balancing makes this condition more likely.

  o  System crashes may occur during processing of stale I/O in Host-Based Volume Shadow Sets. This I/O does not properly reflect changes in shadow set configuration like removal of members and changes in the write-logging state.

  o  Shadow set members may be inconsistent after the failure of a node accessing a shadow set served by an Alpha node. The amount of corrupted data depends on previous I/O operations to the shadow set.

Problems Addressed in AXPSHAD01_061 Kit for OpenVMS Alpha V6.1 only:

  o  In Volume Shadowing for OpenVMS Alpha V6.1, minimerge functionality across mixed architecture VMSclusters was disabled. In order to reestablish the minimerge functionality, install this kit across any VMScluster that contains an OpenVMS Alpha V6.1 node.
     After installation of this kit, the entire cluster must be rebooted simultaneously. Rolling upgrades are *NOT* supported.

  o  Mounting an RZ28B disk device with an RZ28 in the same shadow set is not allowed and will display the following error:

          %MOUNT-I-SHDWMEMFAIL, $1$DUA0 failed as a member of the shadow set
          -SYSTEM-F-INCSHAMEM, incompatible shadow set member

     This behavior is seen when RZ28/RZ28B shadow set members are connected with a local SCSI (Small Computer System Interface) controller. With this kit, RZ28 and RZ28B devices can be combined in a shadow set if they are connected to like controllers.

     NOTE: If this kit is installed across a VMScluster, SCSI shadow sets configured across different controller types are not supported and will no longer work. VMSclusters with shadowed SCSI disks and mixed-architecture VMSclusters running OpenVMS Alpha V6.1 must apply the kit and reboot the entire cluster simultaneously, so that the entire VMScluster is running the same version of Volume Shadowing software.

  o  In a VMScluster (mixed Alpha/VAX environment), shadow sets served to the DEC 3000 Model 300 are reported as MEDOFL. A DCL command, 'SHOW DEVICE/SERVED', from a VAX 6000 Model 400 shows the shadow sets as AVAILABLE.

INSTALLATION NOTES:

If you are using the Shadowing option, it is highly recommended that this kit be installed.

  o  When you install the ALPSHAD09_061 remedial kit you must also install the ALPSHAD10_061 or later remedial kit before rebooting your system. Installing the ALPSHAD09_061 kit without installing the ALPSHAD10_061 or later kit could lead to system instability.

  o  Future OpenVMS Alpha V6.1 kits that are issued for facilities included in the ALPSHAD09_061 kit will not install unless the ALPSHAD09_061 kit is installed on your system first. It is highly recommended that the complete ALPSHAD09_061 remedial kit be installed as soon as possible.
     Installation of individual images from the ALPSHAD09_061 remedial kit is not supported and could result in unpredictable system behavior.

  o  This kit *MUST* be installed on every Alpha in a mixed-architecture VMScluster, and the VAX version of this kit *MUST* be installed on every VAX system in the cluster BEFORE any systems are re-booted into the VMScluster. If both kits are not installed, shadow sets cannot be created.

  o  Working configurations that contain SCSI shadow sets on dissimilar controllers may no longer work.

  o  VMSclusters with shadowed SCSI disks and mixed-architecture VMSclusters running OpenVMS Alpha V6.1 must apply the kit and reboot the entire cluster simultaneously. In these cases, rolling upgrades are not supported.

For more information, please see the Problem Description section of the Cover Letter/Release Notes supplied with this kit.
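
The AXPSHAD02_061 problem list above mentions that the optional system logical name SHAD$COPY_BUFFER_SIZE can be set with the DEFINE/SYSTEM command, within a range of 31 to 127 blocks. A minimal sketch of what that might look like follows. The value 64 is an arbitrary illustration, and the assumption that the equivalence string is a plain decimal block count is not stated in this summary. Whether additional qualifiers are required, or when the shadow server picks up the new value, is likewise not covered here, so check the cover letter supplied with the kit before relying on it.

```dcl
$ ! Hypothetical example: request a 64-block shadow copy buffer.
$ ! The documented range is 31 to 127 blocks; 127 is the default.
$ ! The value format (plain decimal number) is an assumption.
$ DEFINE/SYSTEM SHAD$COPY_BUFFER_SIZE 64
$ ! Display the definition to confirm it was entered
$ SHOW LOGICAL/SYSTEM SHAD$COPY_BUFFER_SIZE
```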

Files on this server are as follows:

alpshad09_061.README
alpshad09_061.CHKSUM
alpshad09_061.CVRLET_TXT
alpshad09_061.a-dcx_axpexe
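
As the NOTE at the top of this summary explains, the compressed kit is expanded by running it. Assuming the last file listed above has been copied to the current default directory on an OpenVMS Alpha system under the same name, the command would look roughly like this (the resulting saveset name is not stated in this summary and is not shown):

```dcl
$ ! Expand the self-expanding compressed kit file (Alpha version).
$ ! The file name is taken from the file list above.
$ RUN ALPSHAD09_061.A-DCX_AXPEXE
```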