OpenVMS__SHADOW VAXSHAD09_061 VAX V6.1 ECO Summary

NOTE: An OpenVMS saveset or PCSI installation file is stored on the Internet in a self-expanding compressed file. The name of the compressed file will be kit_name-dcx_vaxexe for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha. Once the file is copied to your system, it can be expanded by typing RUN compressed_file. The resultant file will be the OpenVMS saveset or PCSI installation file which can be used to install the ECO.

Copyright (c) Digital Equipment Corporation 1994, 1995. All rights reserved.

***** WARNING!!! *****

Future OpenVMS VAX V6.1 kits that are issued for facilities included in the VAXSHAD09_061 kit will not install unless the VAXSHAD09_061 kit is installed on your system first. It is highly recommended that the complete VAXSHAD09_061 remedial kit be installed as soon as possible. Installation of individual images from the VAXSHAD09_061 remedial kit is not supported and could result in unpredictable system behavior.

Descriptions for problems that were corrected in previous VAX Shadow kits are included in the VAXSHAD09_061 Release Notes. The release notes can be found in the save set VAXSHAD09_061.A. If you have not installed a previous shadow kit, it is recommended that you read these release notes before installing the VAXSHAD09_061 Shadow kit.
To access the release notes, restore them from the saveset by issuing a command with the following format:

  $ BACKUP/SEL=VAXSHAD09_061.RELEASE_NOTES DEVICE:[DIR]VAXSHAD09.A/SA-
  DEVICE:[DIR]VAXSHAD09_061.RELEASE_NOTES

PRODUCT: Volume Shadowing for OpenVMS (Phase II)

NOTE: The problems fixed in this ECO Kit also affect the following products:

  VAXcluster Software for OpenVMS VAX
  VAXcluster Console System (VCS)

OP/SYS: OpenVMS VAX

COMPONENTS: System, Bugcheck, Backup, Mount, Dismount, MSCP, TMSCP, MTAAACP, I/O Routines, Audit Server, Security, System Primitives, Adaptive Pool Management (APM), Operator Communication Manager (OPCOM), User Environmental Test Package (UETP), Media Management Extensions (MME)

SOURCE: Digital Equipment Corporation

ECO INFORMATION:

  ECO Kit Name: VAXSHAD09_061

  ECO Kits Superseded by and Included in this ECO Kit:

    VAXSHADFT09_061 (Never Officially Released)
    VAXSHAD08_061 (Never Officially Released)
    VAXSHAD07_061 (For OpenVMS VAX V6.1 systems only)
    VAXSHAD06_061
    VAXSHAD05_061
    VAXSHAD04_061
    VAXSHAD03_061
    VAXSHAD02_061 (CSCPAT_1160)
    VAXSHAD01_061 (CSCPAT_1160)
    VAXMTAA01_062 (For OpenVMS VAX V6.1 systems only)
    VAXMTAA02_061
    VAXMTAA01_061 (CSCPAT_1154)
    VAXMONT01_061 (For OpenVMS VAX V6.1 systems only)
    VAXSYS14_061 (For OpenVMS VAX V6.1 systems only)
    VAXSYS12_061
    VAXSYS07_061
    VAXSYS01_061 (CSCPAT_1113)
    VAXMME01_061 (CSCPAT_1174)
    VAXOPCO01_061 (CSCPAT_1144)
    VAXAUDI02_061

  ECO Kit Size: 3960
  Kit Applies To: OpenVMS VAX V6.1
  System/Cluster Reboot Necessary: Yes

CAUTION: Before Installing this Kit, Read the Following Cautions:

After installation of this kit, the following issues may occur:

1) ISSUE: When a node reboots into the cluster there may not be an OPCOM message that reports the node is joining the cluster. Absent messages occur on a random basis.

   WORKAROUND: To verify that the node has entered the cluster, after the node has fully rebooted, the user should enter the command:

     $ SHOW CLUSTER

   to verify the node is a valid member of the VAXcluster.
2) ISSUE FROM THE CSC: An INVEXCEPTN in SNDRIVER may be seen if DECnet/SNA V2.1 is used in conjunction with the IO_ROUTINES from the VAXSHAD ECO kit. SNAVMS_E04021 (CSCPAT_5041) will fix this problem by replacing the incompatible SNDRIVER in DECnet/SNA V2.1.

   NOTE: SNAVMS_E04021 applies to DECnet/SNA V2.1 only.

These issues are being addressed and will be corrected in a future version of OpenVMS VAX.

ECO KIT SUMMARY:

An ECO kit exists for Volume Shadowing on OpenVMS VAX V6.1. This kit contains the fixes described below.

Problems Addressed in the VAXSHAD09_061 Kit:

o A 'SET SECURITY' or 'SET ACL' on a volume in an OpenVMS cluster places high I/O on the server process. This exhausts paged pool and the AUDIT_SERVER goes into an RWPAG state. This problem is corrected in OpenVMS VAX V6.2.

o A field in the IRP that is used during Volume Processing is not initialized in clones of USER IOs. If an error occurs, the code that determines the severity of the error can be misled by data in these fields. It can fail to locate the error and return the IO as successful. Since a zero-byte count is returned, an Incomplete Segmented Transfer error will occur. The fix is to initialize the field when the clone is allocated.

o While creating a page, a user process might be swapped out and then return using a different balance set slot. This problem is corrected in OpenVMS VAX V6.2.

o Certain applications calling $AUDIT_EVENT with ASTs disabled will be interrupted when $AUDIT_EVENT returns to the caller. This problem is corrected in OpenVMS VAX V6.2.

o The code relies on a page being present when it attempts to release a spinlock. If the system is paging heavily, the page may not be available. This may result in pagefaults in EXE$BRKTHRU at IPL greater than 2. This problem is corrected in OpenVMS VAX V6.2.

o Repeating wakeups from $SCHDWK show an accumulating drift over time. This problem is corrected in OpenVMS VAX V6.2.
o Magnetic tape position may be lost in differing circumstances:

  - COPY and/or BACKUP of a DISK to a TMSCP-Served TAPE will fail when the tape device is placed in an MV state. The failure does not occur if the same task is performed locally.
  - COPY will fail with: "SYSTEM-F-TAPEPOSLOST, magnetic tape position lost"
  - BACKUP will fail with: "-SYSTEM-F-DATALOST, data lost"

  This problem is corrected in OpenVMS VAX V6.2.

o To transition an OpenVMS process from the virtual balance set to the real balance set, the SPTEs (system page table entries) which describe its process PTE pages (process page table pages) need to be copied from saved memory back into the real balance slot from where they originally came. This makes the process' P0 and P1 space accessible again. SPTEs for the process page table pages describing the undefined area between P0 and P1 must be represented by pre-initialized null values (actually, ERKW DZERO-type values). When this undefined void area is exactly zero pages (i.e., P0 and P1 are tangent), the VBSS$READ_OPT2_VBSM routine takes the wrong branch, causing a VBSSERR bugcheck. This fix adds a test for this case, and takes the image's correct branch. This problem is corrected in OpenVMS V6.2.

o When a process is switched from a real balance slot to a virtual balance slot, the allocation may fail, causing a VBSSERR bugcheck. This problem is corrected in OpenVMS VAX V6.2.

o Incorrect quota value is returned when process quota (BYTLM) is returned to a process for a created system global section. This problem is corrected in OpenVMS VAX V6.2.

o System crashes may occur due to corrupted PTE entries. The corruption appears to be Global Section Table Entries pointing to Global Section Descriptors. The problem occurs only if 4095 GBLSECTIONS are exceeded.
  To check the number of Global Sections currently in use, add the following values:

  - SDA> VALIDATE QUEUE EXE$GL_GSDSYSFL !global sections
  - SDA> VALIDATE QUEUE EXE$GL_GSDDELFL !delete pending global sections
  - SDA> VALIDATE QUEUE EXE$GL_GSDGRPFL !group global sections

o Devices can remain allocated to processes that no longer exist. The device remains unusable until the system is rebooted.

o If a previously shadowed disk is mounted with a MOUNT/OVER=SHADOW command and a new shadow set is created using this disk, OpenVMS VAX will attempt to create the old shadow set using the old physical device names.

o The system crashes with a NOBVPVCB bugcheck. The crash occurs on the kernel stack with MTAAACP.EXE as the current image.

o The system crashes with an XQPERR while dismounting a MAD drive.

o SUBTRACED errors are not correctly determined for images installed with /HEADER_RESIDENT. This problem is corrected in OpenVMS VAX V6.2.

o Users of ORACLE[R] Rdb V6.1 may get ILLIOFUNC errors when doing IO to a Host Based Shadowset whose members are served.

o The user will see a large number of shadow copies being done by OpenVMS rather than the controller, even when both disks are on the same controller and the controller has DCD (Disk Copy Data) capabilities.

o If a three-member Shadowset has its index zero member as a copy target and all three members require a MERGE, when the COPY completes the MERGE does not take place. The LBN for the just completed COPY (the last LBN on the disk) is passed as the MERGE starting LBN, so it completes without doing any IO.

o Failures occur during attempts to start copies or restart copies, usually after a node halt, shutdown or reboot. Additional symptoms observed include inconsistent values for HBS_CIP when compared to SHADOW_MAX_COPY, negative values for HBS_CIP, and copies that should continue but instead start over from the beginning.

o System hangs may occur when I/Os pending to a shadow set do not complete.
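
The global-section check described in the list above is simple arithmetic: add the three queue counts and compare the total with the 4095 threshold. A minimal sketch in Python follows. The function names and the sample counts are hypothetical, and the counts themselves must be read by hand from the SDA output; nothing here talks to SDA.

```python
# Hypothetical helper for the global-section check; not part of any kit.
GBLSECTIONS_LIMIT = 4095  # threshold quoted in the problem description


def global_sections_in_use(system: int, delete_pending: int, group: int) -> int:
    """Add the three queue counts, as the problem description instructs."""
    for count in (system, delete_pending, group):
        if count < 0:
            raise ValueError("queue counts cannot be negative")
    return system + delete_pending + group


def exceeds_limit(total: int, limit: int = GBLSECTIONS_LIMIT) -> bool:
    """True when the total is above the limit, the condition under which
    the corruption is said to occur."""
    return total > limit


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    total = global_sections_in_use(3000, 12, 1100)
    print(total, exceeds_limit(total))
```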
Problems Addressed in the VAXSHAD07_061 Kit:

o In the VAXSHAD05 and VAXSHAD06 kits two new fields were added to the IRP data structure for shadow write logging information. This new IRP definition size conflicts with the IRP sizes of other images on the system that are not part of the SHADOW kits. This conflict may cause a variety of errors, including fatal bugchecks. This fix changes the IRP definitions back to the SBB versions and adds some special definitions to the SHDRIVER for the new IRP fields.

o Fatal bugchecks from data structure corruption may occur due to the addition of the value 10 HEX to the corrupted field. Crashes are of various types and include node and cluster crashes, crashes due to invalid UCB addresses, invalid VCB addresses, invalid member IDs, and invalid number of devices.

o When trying to access a DFS disk, the following error may be seen:

    -SYSTEM-F-FILALRACC, file already accessed on channel

  The disk can be accessed immediately after reboot; however, after a period of time of not accessing the disk, a simple directory command will return this error.

o If a tape is initialized with a non-blank accessibility field and then mounted using /OVERRIDE=(ACCESSIBILITY), the tape mounts but cannot be read or written to. The command format to initialize the tape would be similar to:

    INIT/LABEL=VOLUME_ACCESSIBILITY="+" tape: LABEL

  In addition, the following OPCOM messages are generated and the tape volume is automatically unloaded after an attempt to WRITE or READ the tape volume:

    %%%%%%%%%%% OPCOM 12-DEC-1994 12:57:23.53 %%%%%%%%%%%
    Message from user USERXX on NODEXX
    non-blank accessibility field in volume labels on SYS$DEVICE:
    %%%%%%%%%%% OPCOM 12-DEC-1994 12:57:23.54 %%%%%%%%%%%

o MTAAACP posts attention ASTs to its mailbox. If the AST QUOTA reaches zero and an attempt is made to kill the MTAAACP process or the process that emitted the QIO, MTAAACP will go into the RWAST state and hang.
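
The IRP size conflict in the first item of the VAXSHAD07_061 list can be illustrated with a small sketch. It uses Python's ctypes module with two made-up structure layouts: the field names and sizes are hypothetical stand-ins, not the real OpenVMS IRP definition. It only shows why code built against a larger definition touches memory beyond a block sized by the smaller one.

```python
import ctypes


class IrpBase(ctypes.Structure):
    # Hypothetical stand-in layout; NOT the real OpenVMS IRP.
    _fields_ = [
        ("flink", ctypes.c_uint32),
        ("blink", ctypes.c_uint32),
        ("size", ctypes.c_uint16),
        ("type_", ctypes.c_uint8),
        ("rmod", ctypes.c_uint8),
    ]


class IrpExtended(ctypes.Structure):
    # Same layout plus two hypothetical write-logging fields at the end.
    _fields_ = IrpBase._fields_ + [
        ("shd_log_a", ctypes.c_uint32),
        ("shd_log_b", ctypes.c_uint32),
    ]


def fields_past_end(base, extended):
    """Names of fields in `extended` that lie beyond sizeof(base)."""
    limit = ctypes.sizeof(base)
    names = []
    for name, _ctype in extended._fields_:
        field = getattr(extended, name)
        if field.offset + field.size > limit:
            names.append(name)
    return names


if __name__ == "__main__":
    print(ctypes.sizeof(IrpBase), ctypes.sizeof(IrpExtended))
    print(fields_past_end(IrpBase, IrpExtended))
```

An image that allocates a packet using the smaller definition and hands it to code compiled against the larger one would have the extra fields read or written outside the allocation, which is one plausible way a size mismatch of this kind could lead to the errors the item describes.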
Problems Addressed in the VAXSHAD06_061 Kit:

o When using PATHWORKS, data corruption may occur on the file container. The corruption can be seen by running CHKDSK on the PC container disk. Also, using PCDISK to IMPORT and EXPORT files to and from the container will show a corrupted file when EXPORTed back to VMS.

o System crashes with INVEXCEPTN bugcheck at SCH$POSTEF+21. To correct this problem, a change was made in the IOC$SIMREQCOM routine to cause the destination of the IFNOWET test to initialize R4 before calling the IOC$SCHEDEF routine. IOC$SCHEDEF expects R4 to have the address of the user's PCB.

Problems Addressed in the VAXSHAD05_061 Kit for OpenVMS VAX V6.1:

o After a node crashes, on reboot it cannot mount a Host Based Volume Shadowing virtual unit. The error message usually returned is "volume not software enabled"; however, "Medium Offline" may also be seen. A SHOW DEVICE will show that the Shadowset is in 0% merge but SNA will show that a minimerge is pending.

o A double deallocation crash may occur as the result of MOUNT not properly initializing the Mounted Volume List (MTL) pointer. This pointer had a stale value as a result of two calls to SYS$VMOUNT from a single program. The stale pointer will only cause a problem if the system is unable to allocate space for defining the logical name.

  NOTE: Since cells are initialized at image activation, this problem should not occur as a result of DCL commands.

o Tape devices with stacker/loaders, such as the TF857, may take up to 6 minutes to rewind/unload/load the next tape. In VAXSHAD01_061, a change was made to the behavior of MOUNT to take this delay into account. However, a side effect of that change was that non-stacker drives may also wait 6 minutes before failing.

o System crashes with an INVEXCEPTN during a SHDRIVER COPY_DATA_REPAIR copy operation.
o If the value of the ALLOCLASS SYSGEN parameter is not set and the user tries to use shadowing, a shadow volume can be created but members cannot be added to the shadow set. No error messages are received up until a second member is added. On the MOUNT command, the customer will receive the error messages:

    $ mount /system dsa500 /shadow=dkb400 alphavms015
    %MOUNT-I-SHDWMEMFAIL, DKB400 failed as a member of the shadow set
    -SYSTEM-F-INCSHAMEM, incompatible shadow set member

  "Incompatible" is an inappropriate statement of the problem. A more accurate message would be "missing allocation class," or "incorrect allocation class."

o If a shadow set member is dismounted at the same time from multiple nodes within a cluster, I/O to the shadow set may become stalled.

o Mount will not add shadow set members unless they are either MSCP or SCSI.

o Shadow set member expulsion is currently based on the time it takes a fork & wait and a PACKACK to complete rather than the actual time transpired. On some devices, particularly SCSI, where a PACKACK can take approximately one minute, the timeout was much too long. Using the default value of 20 (seconds) for SHADOW_MBR_TMO would actually mean that it would take 20 minutes to expel from a SCSI shadow set a member experiencing errors.

o SHDRIVER loss of synchronization may result in a crash where SHADDETINCON is triggered by the check at the end of MATCH_MASTER_SCB. In this consistency check, the SHAD$W_DEVSTS_PASSIVE_MV_CNTR is verified to be zero and is not. Another symptom is that the virtual unit UCB$W_RWAITCNT is zero. Shadow set member counts of zero may also be seen.

o Crashes may occur in EXPEL_PACKACK_ANY with connections broken to all members and IRP$L_SHD_LOCK_FR5 = 1 (packack retries exhausted).

o All members of a shadow set become inaccessible at the same time and remain inaccessible for a period of time greater than "shadow member timeout" (SHADOW_MBR_TMO or SHADOW_SYS_TMO) seconds but less than MVTIMEOUT seconds.
  All members subsequently become accessible within seconds of each other but not at exactly the same time. This results in all but one member being expelled from the shadow set. This often occurs when changing HSJ microcode and all members are connected to the same HSJ. When brought back online, polling will cause the devices to be found seconds apart which will result in all but one member being expelled.

o All members of the set must be checked to see if they meet the criteria of being MSCP. The original design did not allow for having no index zero member.

o When the mounting of full copy targets exceeds the SHADOW_MAX_COPY threads for a given node, other nodes with the shadow set mounted do not pick up the copy work.

o In a cluster, using $PROCESS_SCAN explicitly or implicitly with the DCL 'SHOW USER' command sometimes causes a system crash due to an ACCVIO in kernel mode or an IVSSRVRQST bugcheck.

o When a node with a SCSI bus boots, it resets the SCSI bus. In a multi-host SCSI cluster, this can cause the other node to experience I/O failures. Normally, this results in a brief mount verification. The I/O is retried, succeeds, and there is no serious consequence. However, if the other node is in the process of booting and the system disk is a shadow set, the system will crash.

o A PGFIPLHI bugcheck may occur in the SHADOW_SERVER process at the REMQUE in K_GET_COPYSHAD_IRP. On OpenVMS VAX, the PC is A0E and the VA is 274.

o A page setup module which draws a frame and company logo on each page of output is used on a queue pointing to an LN03. This page setup module works on OpenVMS VAX Version 5.5-2 and prior versions. However, with VAXQMAN8_U2055 (CSCPAT_1165) or OpenVMS VAX Version 6.1 installed, this page setup module causes the printer to continually spew out paper with only the output from the page setup module. This continues until the entry is deleted from the queue.
o If a multi-programming application uses a non-homogeneous access pattern to a file which is resident in Virtual I/O cache, there is a possibility that the size returned in the I/O status block from a READ operation will be truncated.

o If a clustered application uses a large number of concurrent processes to perform file operations consisting of an OPEN, WRITE, and CLOSE sequence repetitively on the same data file, data corruption may occur.

o In a multi-programming environment where a significant amount of NEW data from a file is being loaded into the cache concurrently by multiple processes, the system may HANG.

o If a user attempts to mount a disk that is 100% full on OpenVMS VAX V6.* and the disk was originally initialized with a version of OpenVMS VAX prior to V6.0, paged pool can be corrupted leading to system crashes. If the disk is filled AFTER it has been mounted under V6.*, there will not be any problem.

o The class driver will sometimes attempt to send an MSCP command packet on the wrong connection. This fix detects this mismatch and corrects it.

o Due to invalid allocation counts, processes hang in RWNPG state waiting for a request for non-paged pool (NPP) so large that it cannot be satisfied.

o The system crashes with the current process executing a $CHKPRO system service call.

o A $AUDIT_EVENT system crash may occur in SECURITY.EXE due to corrupt scan structure storage.

o When a rights list is passed into $CHKPRO (CHP$_RIGHTS), it is copied into the ARB within the NSA$A_SCRATCH area. This area will hold a maximum of eight rights. The code that handles this copy operation will split any larger rights list into the first eight, which are copied into the local rights area, and the remainder, for which a descriptor is created and its address is added as extended process rights.
  The code involved in copying the first eight rights is looping incorrectly and copying rights to random locations within the NSA$A_SCRATCH area, usually resulting in a SSRVEXCPT crash.

o When a value block or value status block cannot be returned, SYS$GETLKI returns the error SS$_ILLRSDM. A correction has been made to SYS$GETLKI to now return all other requested information and update the wildcard search index.

Problems Addressed in the VAXSHAD04_061 Kit:

o When booting two or more systems simultaneously from shadowed system disks, the systems may appear to hang. Crashing the systems and examining the crash dumps indicates that shadowing driver blocking AST routines have not run.

o When a node runs out of SHADOW_MAX_COPY threads while mounting new copy target units, other nodes in the cluster that have available SHADOW_MAX_COPY threads will not pick up the copy work. This results in the copy not being started for copy members that are added to shadow sets.

Problems Addressed in the VAXSHAD03_061 Kit for OpenVMS VAX V6.1:

o A double-deallocation crash may occur as the result of MOUNT not properly initializing the MTL pointer. This pointer had a stale value as a result of 2 calls to SYS$VMOUNT from a single program. The problem will not happen as a result of DCL commands, as the cells are initialized at image activation. The stale pointer will only cause a problem if the system is unable to allocate space for defining the logical name.

o An OPCOM message was being output even though /NOASSIST was specified in the MOUNT command. This caused problems for UETP.

o A system crash may occur in SECURITY.EXE.

o A process is in RWPAG while auditing an event.

o When the current process executes a $CHKPRO system service call, the system will crash.

o Processes hang in RWNPG state (Call to $CRMPSC) waiting for a request for NPP so large that it cannot be satisfied.

o DISMOUNT/OVERRIDE=CHECKS against the SYSTEM disk is allowed. Once this command is issued nothing else can be done.
  Installation of this kit will allow this command to be issued only on non-system disks.

o When booting from a Controller-Based Shadowed System disk for the first time as a Host-Based Shadowed System disk, boot fails with a SHADBOOTFAIL bugcheck. A SHADBOOTFAIL may also occur if SHADOW_SYS_UNIT is changed at boot time.

o During a copy operation the system may crash with an ACCVIO.

o When a user program allocates a read buffer for a TMSCP-served tape that is greater than the record on tape, the buffer will get server node system data returned along with the data on tape. Printing the buffer will show that the data from tape is in the correct location of the buffer but it will also show that the area of the buffer that was not supposed to be changed contains server node system data.

Problems Addressed in the VAXSHAD02_061 Kit:

o The local MSCP server issues a fatal MSCPSERV bugcheck when it should not. The server should instruct the remote DISKCLASS driver to BUGCHECK.

o When a serving node becomes so busy that it occasionally exhausts resource limits, the RWAITCNT for heavily used disks gets incremented. If a client node requests an ONLINE and RWAITCNT is bumped, it is rejected by MSCP. This makes MOUNTing devices very difficult.

o On OPCOM restart, the old privilege mask's upper 32 bits may not be restored to their original value. This mask is declared as a longword, but used as a quadword.

o When OPCOM receives a message that it does not recognize, the message is included in the log file with the following text:

    %%%%%% OPCOM 19-APR-1994 11:20:40.06 %%%%%%
    DUMP_LOG_FILE
    OPCOM has noticed a condition which might be due to an internal error. It might also be explained by normal events, especially if nodes have just crashed or rebooted in a VAXcluster. Please bring this message to Digital's attention only if you are having problems with operator communications.
    Buffer is 8 (%X0008) bytes -- "- Unknown message received"
    00000000 00000000 00000000 00000000 00000000 00000000 - 41534403 0015007B

o When an assisted merge is performed, an inaccurate number of LBNs (Logical Block Numbers) and bytes transferred may be computed. Therefore, all LBNs may not be merged in assisted merge operations.

o Access path attention (ACPTH) messages are used by MSCP to determine secondary paths for disks that are attached to dual controllers. DUDRIVER might incorrectly assign this information to the wrong device if two units with the same unit number and allocation class exist. These messages may also trigger unnecessary failover attempts.

o Servers in VAXclusters with more than 127 nodes may crash when the 128th node attempts to access a given disk. This usually occurs after a serving node crashes for other reasons, but this causes the rest of the servers to crash.

o In a small working set, it is possible for the EXE$PSCAN_NEXT_PID routine (called by $GETJPI) to take a page fault at IPL 8. This causes a PGFIPLHI bugcheck. The page referenced is in the PROCESS_SCAN context block (PSCANCTX$ data structure) in process virtual address space.

o While running a UETP tape test, fatal controller errors may occur. This problem is caused by TMSCP (the tape server) incorrectly interpreting a TUDRIVER status subcode. This misinterpretation is converted to a fatal controller error status and returned to the user.

o Shadow sets have separate mount verification done by SHDRIVER, instead of the usual system mount verification. The SHDRIVER mount verification has an error updating the volume label on shadow sets that have the volume label changed except on the node that issues the label change. Once the devices are in this state, they cannot be recovered until MVTIMEOUT is reached or a reboot of all affected nodes is performed. This correction enables the behavior of virtual units to be consistent with the behavior of physical units.
o Unnecessary calls to MOUNT verification or host-based volume shadowing processing may occur. On Alpha nodes, these mount verification or Host-Based Volume Shadowing processing calls will fail, resulting in I/O hangs and, eventually, volume invalid errors.

o AVAILABLE or OFFLINE status returned from a transfer command does not implement the MSCP specification correctly.

o OpenVMS VAX MSCP Parity with OpenVMS Alpha. A served disk may appear to be ONLINE when it is really OFFLINE. This occurs because the MSCP server's CHECK_SERVICE routine searches the device database and incorrectly returns an ONLINE status.

o There is no synchronization between SHADOW_PROCESSING and INVALIDATE_ALL_ENTRIES, which allows these two code threads to run simultaneously. This can cause a system crash because the SHADOW_PROCESSING thread may remove a member from a multimember shadow set while the INVALIDATE_ALL_ENTRIES thread is not aware that the member has been removed. The system crash occurs in RESTORE_WLE because no Write Log table exists.

o A problem exists with the SHADOW_SERVER. The symptoms of this problem are:

  + Undiagnosable hangs in individual copy operations or on the entire server
  + Unexpected copy aborts
  + Poor copy performance
  + Shadow set inconsistency

o High interrupt stack activity occurs on a node performing a merged copy operation. This could adversely affect configurations using HSJ40 controllers with many shadow sets.

o Data inconsistency may exist between members of a Phase II shadow set. This occurs under very heavy I/O operations to a shadow set while the members of that shadow set are undergoing failover from one controller to another.

o Invalid Command status processing of Write History Management commands unconditionally puts an entry into the error log. This occurs even when there is no actual error.

o A second shadow server may accidentally be created using the startup command procedure. This results in desynchronization of shadow sets.
  The startup procedure has been modified so that it does not allow multiple servers.

o After a system failure, the number of blocks to be rewritten is not computed correctly. This may cause inconsistent data between shadow set members. This occurs during an assisted merge when the information regarding which LBNs to include is only requested from one shadow set member.

o A process issuing I/O to a TMSCP tape device may appear to hang after a controller failover attempt. This is caused by an incorrect check of the cached data's lost error status, which results in an endless loop trying to recover a nonexistent error.

o In the past, Volume Shadowing checked device IDs and the maximum logical block numbers (LBNs). Volume Shadowing now checks for geometries and maximum LBNs. This enables devices like the RZ28 and RZ28B to operate in the same shadow set. Even though their device IDs differ, their geometries and maximum LBNs will match when configured on like controllers.

  NOTE: If this remedial kit is installed across a VMScluster system, SCSI shadow sets that are configured across different controller types are not supported and will no longer work.

o A device may be mounted by an MSCP server, even though a local controller could be used. This situation may still occur after the installation of this ECO kit under extreme timing circumstances.

o When new MSCP server I/O is sent to a device that is RWAITCNT stalled and the connection from the driver to the device fails, server I/O is posted to the restart queue if it is active. If not, it is incorrectly left on the UCB (Unit Control Block) pending queue. This causes shadow sets to appear to be stalled.
  If the connection from the client to the server then fails, I/O from the client that has been passed to the driver is then allowed to complete. If this I/O is stalled on the pending queue, it completes much later, possibly after the client has reissued the stalled I/O.

o Incorrect MSCP-served disk synchronization might cause I/O to an MSCP-served disk to become stalled on an internal queue which would be restarted later.

o I/O hangs to a shadow set might occur because the shadowing driver has no way to disable write logging if the write log entries are mismanaged or depleted to a point that the shadow set is unusable.

o An Invalid Exception bugcheck might occur in DUDRIVER during I/O request complete processing.

o In the past, MSCP could only serve 256 disks. It can now serve 512.

o During the processing of a write-log entry in SHDRIVER, a register value may be improperly maintained if the system is low on nonpaged pool. This will cause a system crash with an INVEXCEPTN Bugcheck within SHSB$GET_WLE_TABLE in module SHDSUBS when the entry is resumed.

o After approximately 18 hours of operation, some OPCOM messages that should be logged are skipped.

o If two members of a three-member shadow set are simultaneously removed, either intentionally or in a failover situation, the system may hang or fail.

o System crashes might occur during virtual I/O cache (VIOC) expansion under the following circumstances:

  + Multiple processes (or processors) are accessing the same file concurrently;
  + The cache space for that file was being expanded;
  + That expansion caused the need for a new hash table structure.

o When subjected to a high I/O load and multiple failures, the write logging (minimerge) and shadowing synchronization subsystems become unreliable.

o Unreliable shadow subsystem behavior and shadow-set hangs occur when VMScluster nodes fail to relinquish shadow-set resources.
o The TMSCP server bugchecks in TMSCP$FIND_UQB when a command that refers to a specific unit is processed and that unit does not have the Server Local Unit Number (SLUN) bit set. The fix contained in this ECO kit will cause the bugcheck to occur in TUDRIVER instead of the TMSCP server.

o I/O may stall to a served shadow-set member. Load balancing makes this condition more likely.

o System crashes may occur during processing of stale I/O in Host-Based Volume Shadow Sets. This I/O does not properly reflect changes in shadow set configuration, notably removal of members and changes in the write-logging state.

o Shadow set members may be inconsistent after the failure of a node accessing a shadow set served by an Alpha node. The amount of corrupted data depends on previous I/O operations to the shadow set.

Problems Addressed in the VAXSHAD01_061 Kit:

o In Volume Shadowing for OpenVMS Alpha Version 6.1, several changes were made to the assisted merge (minimerge) functionality. These changes disabled minimerge functionality across mixed-architecture VMSclusters. With minimerge disabled, shadowing continued to function normally, except that a full merge was always done when a merge operation occurred. Full merges take considerably longer than minimerges.

  If you want minimerge functionality, Digital recommends that you install this kit across any VMSclusters that contain an Alpha node running OpenVMS Alpha Version 6.1. Mixed-architecture VMSclusters that are running OpenVMS Alpha Version 6.1 must apply this kit and reboot the entire cluster simultaneously. In these cases, rolling upgrades are not supported.

o Prior to this remedial kit, if attempts were made to mount an RZ28B disk device with an RZ28 in the same shadow set, Volume Shadowing detected different device IDs and may not have allowed the devices to be mounted. This behavior applied only to an RZ28/RZ28B shadow-set combination when connected with a local SCSI controller.

  Since RZ28 and RZ28B are different device types but can be shadowed, the checking for shadow-set membership in the host-based shadowing software needed to be modified. This remedial kit enables the combination of RZ28 and RZ28B devices in a shadow set, as long as they are connected to like controllers.

  With the use of SCSI devices, like controllers are required because geometry can vary from controller to controller. Digital recommends that SCSI shadow sets be configured across like controller types. Existing SDI and DSSI configurations are unaffected; if they are not using SCSI drives and are shadowing SDI devices across different controllers, these configurations will continue to work without this remedial kit.

  VMSclusters with shadowed SCSI disks and mixed-architecture VMSclusters running OpenVMS Alpha Version 6.1 must apply the kit and reboot the entire cluster simultaneously, so that the entire VMScluster is running the same version of Volume Shadowing software. The kit is required for both VAX and Alpha nodes. Do not mount shadow sets containing RZ28 and RZ28B devices without first applying this kit.

o The MME$$MNTREQ function, which requests that a volume should be selected for mount, allowed the use of logical names for the device name. However, since these are process logical names that belong to the caller's process, they are not available to the media manager.

o A "device not ready" error for magtapes is not reported until a delay of up to 6 minutes has elapsed.

o If a user creates a shadow set, dismounts the set, then mounts just one of the members, the other members of the set will be marked "ONLINE" when viewed from the HSC. As a result, no HSC operations are allowed until the disk is MOUNTed then DISMOUNTed from the shadow set.

o If MOUNT fails to create a logical name, no error information is displayed. In this case, the logical name may point to an incorrect device.

o If a device is mounted with /SYSTEM and is then mounted with /CLUSTER using conflicting /OWNER_UIC or /PROTECTION qualifiers, incorrect error messages may be displayed. The following two types of errors may occur:

  + The error message may generate garbage which would force terminal characteristics to be reset to ASCII.
  + The following error messages may be displayed:

      inconsistent /PROTECTION option. Cluster mounted (garbage)
      inconsistent /OWNER_UIC option. Cluster mounted (garbage)

o When a disk with a large EXTENT value is mounted under V6.* for the first time, or if the SECURITY.SYS file is missing from the system, the SECURITY.SYS file will be created at the EXTENT size, rounded up to the disk cluster size. This may waste disk space.

o The message for %MOUNT-F-BADUNDFAT has a typographical error.

o If the VOLUME_ACCESSIBILITY option is used in conjunction with the INITIALIZE/LABEL= command upon tape initialization, a user with all privileges enabled is unable to access the tape unless that user is the owner.

o In an OPCOM message, there is no separator between the device name and the comment text.

o After a BACKUP operation, the header of the INDEXF.SYS file of the backup save set is corrupted. This can be seen by issuing the following DCL command:

      $ ANALYZE/DISK DJA0:

o Previously, MOUNT only waited 10 seconds to allow magtape devices to become ready before determining that the device is off line. Tx8x7 tape devices may take up to 6 minutes to become ready during a volume switch. This fix causes the wait to be done in user mode so that the wait can be aborted by the user via a CTRL/C.

Problems Addressed in the VAXMTAA01_062 Kit:

o The system crashes with a NOBVPVCB bugcheck. The crash occurs on the kernel stack with MTAAACP.EXE as the current image.

o The system crashes with an XQPERR while dismounting a MAD drive.

Problems Addressed in the VAXMTAA02_061 Kit:

o If a tape is initialized with a non-blank accessibility field and then mounted using /OVERRIDE=(ACCESSIBILITY), the tape mounts but cannot be read or written to. The command format to initialize the tape would be similar to:

      INIT/LABEL=VOLUME_ACCESSIBILITY="+" tape: LABEL

  In addition, the following OPCOM messages are generated and the tape volume is automatically unloaded after an attempt to WRITE or READ the tape volume:

      %%%%%%%%%%% OPCOM 12-DEC-1994 12:57:23.53 %%%%%%%%%%%
      Message from user USERXX on NODEXX
      non-blank accessibility field in volume labels on SYS$DEVICE:
      %%%%%%%%%%% OPCOM 12-DEC-1994 12:57:23.54 %%%%%%%%%%%

o If a user attempts to stop the MTAAACP process or a process that issued a QIO, MTAAACP will go into RWAST state and hang.

Problems Addressed in the VAXMTAA01_061 Kit:

o If the wrong magnetic tape volume is inserted as the next volume, MTAAACP cancels the request and then hangs.

Problems Addressed in the VAXMONT01_061 Kit:

o Specifying the DISK class to MONITOR can result in unexpected side effects to the display. When the MONITOR DISK command is issued on a system with DFS devices mounted, only the first three characters of the DFS name are displayed correctly. Instead of the fourth character, the low byte of the unit number is output. It is often displayed as a non-printable character or as an escape sequence (in which case it may cause terminal lock-ups, reset terminal characteristics, and so on).

  The following command illustrates this problem when executed on a system with DFS disks mounted:

      $ MONITOR DISK

      DISK I/O STATISTICS on node NODENAME 7-APR-1994 16:25:17
      I/O Operation Rate

      DSA2241:            FOLKLORE        6.27   6.27   6.27   6.27
      DSA2249:            AUDIT           0.00   0.00   0.00   0.00
      DSA2263:            VMS19NOVC3L     0.00   0.00   0.00   0.00
      DSA2264:            LAV19NOVC3L     0.00   0.00   0.00   0.00
      DSA2265:            MDF19NOVC3L    15.84  15.84  15.84  15.84
      DSA2266:            VMS28APRB3E     0.00   0.00   0.00   0.00
      DSA2267:            LAV28APRB3E     0.00   0.00   0.00   0.00
      DSA2268:            MDF28APRB3E     0.00   0.00   0.00   0.00
      DSA2269:            VMS18JANC3L     0.00   0.00   0.00   0.00
      DSA2270:            MDF18JANC3L     0.00   0.00   0.00   0.00
      DSA2271:            LAV18JANC3L     0.00   0.00   0.00   0.00
      DSA2280:            VMS12OCTM3C     0.00   0.00   0.00   0.00
      $254$DFS�1001()     DEC:..._STAR    0.00   0.00   0.00   0.00
      $254$DFSH8008()     V501_RESD       0.00   0.00   0.00   0.00
      $254$DFSI8009()     V51_RESD        0.00   0.00   0.00   0.00

o The 'MONITOR DISK' command hangs when monitoring a system with more than 800 disks. MONITOR contains an arbitrary upper limit of 800 on the number of disks it can monitor. When a system contains more than 800, MONITOR generates an error status, but the status is not properly signaled, and the display appears to hang. This can also be seen with a 'MONITOR CLUSTER' command (which collects DISK data implicitly).

o Due to an inadequate synchronization mechanism, the MONITOR DISK command can go into an infinite loop on multi-processor machines.

o MONITOR PROCESS in a local environment will fail if the SYSGEN parameter MAXPROCESSCNT is set to allow more than 1040 processes. When Virtual Balance Slots were added in OpenVMS V6.0, this number dropped to 978.

Problems Addressed in the VAXSYS14_061 kit:

o There is a race condition possible when a CFCB (Cache File Control Block) is being deleted due to XQP action and cache space is being reclaimed from a LIMBO file.

o Disk corruption can occur when heavy open/read/write/close/delete operations are occurring.

o At some point after a node CLUEXITs, 2 or more cluster nodes crash with LOCKMGRERR Bugchecks.

o When two or more VAX or Alpha nodes are booting at the same time, one or both of them will crash.

Problems Addressed in the VAXSYS12_061 Kit:

o When a value block or value status block cannot be returned, SYS$GETLKI returns the error SS$_ILLRSDM. A correction has been made to SYS$GETLKI so that it now returns all other requested information and updates the wildcard search index.

Problems Addressed in the VAXSYS07_061 Kit:

o If a multi-programming application uses a non-homogeneous access pattern to a file which is resident in Virtual I/O cache, there is a possibility that the size returned in the I/O status block from a READ operation will be truncated.

  If a clustered application consists of a large number of concurrent processes that repeatedly perform an OPEN, WRITE, CLOSE sequence on the same data file, a possibility of data corruption exists.

  In a multi-programming environment, where a significant amount of NEW data from a file is being loaded into the cache concurrently by multiple processes, the possibility of a HANG exists.

Problems Addressed in the VAXSYS01_061 Kit:

o SYS$CHKPRO had several problems that did not manifest themselves in a readily visible effect to the end user. The problems include:

  - SYS$CHKPRO accepted up to 11 rights lists even though no more than two would actually be processed.
  - SYS$CHKPRO would accept a CHP$_UIC and write it over a location which was to contain a rightslist pointer.
  - In most cases the wrong UIC was used in access checking.

  The only time the customer would notice a problem is if they specifically tested access to an object known to be protected from current rights and UIC settings.

o Nonpaged dynamic memory (NPAGEDYN) expansion occurs even when there is a large amount of free space available. This can lead to performance problems as pool expansion causes free memory to be diverted away from that available to processes and dedicated to nonpaged pool usage.

  For example, with a SHOW MEMORY/POOL command you can observe that the "Total" amount of "Nonpaged Dynamic Memory" increases when the amount of "Free" bytes is quite large:

      Dynamic Mem Usage (bytes):      Total       Free     In Use    Largest
      Nonpaged Dynamic Mem         38555136   17372224   21182912      38720
      Paged Dynamic Mem            17282048    8295888    8986160    8265232

  Starting with the introduction of the Adaptive Pool Management (APM) feature in OpenVMS VAX V6.0, these figures include the contributions of both the lookaside lists and the variable pool. So, a large "Free" figure is indicative of large (and possibly growing) lookaside lists. If the "Total" figure is increasing, it indicates that pool expansion is occurring, and that the lookaside list space is not being used effectively.

  The above symptom can result from either of the two following separate problems:

  - A routine in the software which supports security features such as "rightslists" was obtaining a nonpaged pool block and then freeing it in two smaller pieces.
  - An internal loop counter governing the number of times a lookaside list allocation was attempted was set too low. This problem will most likely be seen on the VAX 6000 - 500 and 600.

  A third software change associated with APM will also be available in a future OpenVMS VAX version, but is not available as a remedial change. The third change provides a potential performance benefit under very specialized conditions, such as during VMScluster state transitions.

Problems Addressed in the VAXMME01_061 Kit:

o The MME$$MNTREQ function, which requests that a volume should be selected for MOUNT, allows the use of logical names for the device name. However, since these are process logical names that belong to the caller's process, they are not available to the media manager.

o MME applications are no longer able to set mount and device context.
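
The SHOW MEMORY/POOL sample quoted in the VAXSYS01_061 notes above can be sanity-checked with a little arithmetic. The following Python sketch is only an illustration, not part of the kit: the names POOLS, free_fraction and is_consistent are hypothetical, and the numbers are copied from the sample display. It confirms that "Free" plus "In Use" equals "Total" in both rows and shows how large the "Free" share is for each pool.

```python
# Figures (in bytes) copied from the SHOW MEMORY/POOL sample in these notes.
# The dictionary layout and helper names are illustrative assumptions only.
POOLS = {
    "Nonpaged Dynamic Mem": {"total": 38555136, "free": 17372224, "in_use": 21182912},
    "Paged Dynamic Mem": {"total": 17282048, "free": 8295888, "in_use": 8986160},
}


def free_fraction(pool):
    """Return the share of the pool that is reported as free (0.0 to 1.0)."""
    return pool["free"] / pool["total"]


def is_consistent(pool):
    """Check that the Free and In Use columns add up to the Total column."""
    return pool["free"] + pool["in_use"] == pool["total"]


if __name__ == "__main__":
    for name, pool in POOLS.items():
        print(f"{name}: consistent={is_consistent(pool)}, "
              f"free={free_fraction(pool):.1%}")
```

In the sample, a little under half of nonpaged pool is reported as free, which is the situation the notes describe: a "Total" that keeps growing while "Free" is already this large suggests that the lookaside list space is not being used effectively.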

Problems Addressed in the VAXOPCO01_061 Kit for OpenVMS VAX V6.1:

o When a node leaves a VAXcluster, OPCOM goes into a tight loop on one of the remaining nodes in the cluster. OPCOM can be seen using 90-95% of the CPU.

Problems Addressed in the VAXAUDI02_061 Kit for OpenVMS VAX V6.1:

o The Audit Server EXCLUDE process list may become corrupt after the DCL 'SET AUDIT/EXCLUDE=pid' command is issued.

INSTALLATION NOTES:

This kit *MUST* be installed on every VAX in a mixed-architecture VMScluster, and the Alpha (ALPSHAD) version of this kit *MUST* be installed on every Alpha system in the cluster BEFORE any systems are re-booted into the VMScluster. If the correct kit is not installed on each system, shadow sets cannot be created. System crashes may also occur if the kits are not installed on all appropriate cluster nodes.

The following restrictions will apply upon completion of the installation:

o VMSclusters with shadowed SCSI disks and mixed-architecture VMSclusters running OpenVMS Alpha V6.1 must apply the kit and reboot the entire cluster simultaneously. In these cases, rolling upgrades are not supported.

o Working configurations that contain SCSI shadow sets on dissimilar controllers may no longer work.

References:

ORACLE is a registered trademark of Oracle Corporation.
WordPerfect is a trademark of WordPerfect Corporation.

Files on this server are as follows:

vaxshad09_061.README
vaxshad09_061.CHKSUM
vaxshad09_061.CVRLET_TXT
vaxshad09_061.a-dcx_vaxexe