OpenVMS__SHADOW VAXSHAD09_U2055 VAX V5.5-2__V5.5-2H4 ECO Summary

Modification Date: 12-JUL-1999
Modification Type: DOCUMENTATION: Technical Modification
Added note regarding V5.5-2HW.

NOTE: An OpenVMS saveset or PCSI installation file is stored on the Internet in a self-expanding compressed file. The name of the compressed file will be kit_name-dcx_vaxexe for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha. Once the file is copied to your system, it can be expanded by typing RUN compressed_file. The resultant file will be the OpenVMS saveset or PCSI installation file, which can be used to install the ECO.

Copyright (c) Compaq Computer Corporation 1995, 1999. All rights reserved.

NOTE: This ECO kit *CANNOT* be installed on OpenVMS VAX V5.5-2HW. Please upgrade your system to OpenVMS VAX V5.5-2 before attempting to install this kit.

******************** WARNING *********************
*                                                *
* **DO NOT** install this ECO kit on OpenVMS     *
* VAX V5.5-2HF. The system will become           *
* unbootable. Sustaining Engineering is          *
* currently researching this problem.            *
*                                                *
**************************************************

PRODUCT: Volume Shadowing for OpenVMS (Phase II)

NOTE: The problems fixed in this ECO kit also affect the following products:

   VAXcluster Software for OpenVMS VAX
   VAXcluster Console System (VCS)

OP/SYS: OpenVMS VAX

COMPONENTS: System, Bugcheck, Backup, Mount, Dismount, MSCP, TMSCP, MTAAACP, I/O Routines, Audit Server, Security, System Primitives, Adaptive Pool Management (APM), Operator Communication Manager (OPCOM), User Environmental Test Package (UETP)

SOURCE: Compaq Computer Corporation

ECO INFORMATION:

   ECO Kit Name: VAXSHAD09_U2055

   ECO Kits Superseded by and Included in this ECO Kit:

      VAXSHADFT9_U2055 (Never Released)
      VAXSHADFT8_U2055 (Never Released)
      VAXSHAD07_061 (For OpenVMS VAX V5.5-2, V5.5-2H4, V5.5-2HF only)
      VAXSHAD06_061
      VAXSHAD05_061
      VAXSHAD04_061
      VAXSHAD03_061
      VAXSHAD01_061 (CSCPAT_1160)
      VAXSHAD02_060 (CSCPAT_1116)
      VAXSHAD01_060 (CSCPAT_1116)
      VAXSHAD08_U2055 (CSCPAT_0269, CSCPAT_1160)
      VAXSHAD07_U2055 (CSCPAT_0269, CSCPAT_1160)
      VAXSHAD05_U2055 (CSCPAT_0269, CSCPAT_1160)
      VAXSHAD04_U2055 (CSCPAT_0269, CSCPAT_1160)
      VAXSYS14_061 (For OpenVMS VAX V5.5-2, V5.5-2H4, V5.5-2HF only)
      VAXSYS16_U2055
      VAXSYS15_U2055
      VAXSYS14_U2055
      VAXSYS13_U2055
      VAXSYS12_U2055
      VAXSYS11_U2055
      VAXSYS10_U2055
      VAXSYS09_U2055
      PRCMGT$01_U2055
      VAXSYS01_2H4055 (CSCPAT_1094)
      VAXSYSL04_U2055
      VAXMONT01_061 (For OpenVMS VAX V5.5-2, V5.5-2H4, V5.5-2HF only)
      VAXMONT03_U2055
      VAXMONT02_U2055
      VAXMOUN05_U2055 (CSCPAT_1152)
      VAXMOUN04_U2055 (CSCPAT_0240)
      VAXMOUN03_U2055 (CSCPAT_0240)
      VAXMSCP08_U2055 (CSCPAT_1120)
      VAXMSCP07_U2055
      VAXMSCP05_U2055 (CSCPAT_1068)

   ECO Kit Approximate Size: 2952 Blocks

   Kit Applies To: OpenVMS VAX V5.5-2, V5.5-2H4

   NOTE: OpenVMS VAX V5.5-2H4 is a limited hardware release, shipped only with the new systems (or system upgrades) listed below. It is not separately orderable and will not be distributed via Consolidated Distribution.

      o VAX 4000 Model 100A
      o VAX 4000 Model 500A
      o VAX 4000 Model 600A
      o VAX 4000 Model 700A

   System/Cluster Reboot Necessary: Yes

*** WARNING!!! ***

Future OpenVMS VAX V5.5-2 kits that are issued for facilities included in the VAXSHAD09_U2055 kit will not install unless the VAXSHAD09_U2055 kit is installed on your system first. It is highly recommended that the complete VAXSHAD09_U2055 remedial kit be installed as soon as possible. Installation of individual images from the VAXSHAD09_U2055 remedial kit is not supported and could result in unpredictable system behavior.

Descriptions of problems that were corrected in previous VAX Shadow kits are included in the VAXSHAD09_U2055 release notes, which can be found in the VAXSHAD09_U2055.A save set. If you have not installed a previous shadow kit, it is recommended that you read these release notes before installing the VAXSHAD09_U2055 Shadow kit. To access the release notes, restore them from the save set with a command of the following format:

   $ BACKUP/SEL=VAXSHAD09_U2055.RELEASE_NOTES -
   _$ DEVICE:[DIR]VAXSHAD09_U2055.A/SA -
   _$ DEVICE:[DIR]VAXSHAD09_U2055.RELEASE_NOTES

If you have a mixed-architecture cluster and have not previously installed a shadowing kit, you must install this kit on the VAX nodes, and the applicable Alpha version of this kit on the Alpha nodes of the cluster, BEFORE you bring up both types of systems in a cluster again. If both kits are not installed, you may not be able to create shadow sets.

If you have previously installed a shadowing kit, you do not need to install the Alpha version of this kit at this time, as long as the shadowing kit installed on the Alpha nodes of the cluster is ALPSHAD04_061 or later.

Working configurations that contain SCSI shadow sets on dissimilar controllers may no longer work.
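For a mixed-architecture cluster, the installation rules above reduce to a small decision about whether the Alpha version of this kit is needed before VAX and Alpha nodes rejoin one cluster. The Python sketch below encodes only that decision as stated in these notes. The function name is hypothetical, and reading "ALPSHAD04_061 or later" as "a sequence number of 04 or higher within the ALPSHADnn_061 series" (with any other kit name treated conservatively as needing the Alpha kit) is an assumption, not part of the ECO documentation.

```python
import re
from typing import Optional

# Minimum Alpha shadowing kit named in the installation notes: ALPSHAD04_061.
MIN_ALPSHAD_SEQUENCE = 4


def alpha_kit_needed_now(previous_shadow_kit: bool, alpha_kit: Optional[str]) -> bool:
    """Hypothetical helper for a mixed-architecture cluster.

    Returns True if the Alpha version of this kit should be installed on the
    Alpha nodes before VAX and Alpha systems are brought up in one cluster.
    """
    if not previous_shadow_kit:
        # No shadowing kit installed before: both kits must be installed first.
        return True
    if alpha_kit is None:
        # Alpha kit level unknown: treat the Alpha kit as needed.
        return True
    match = re.fullmatch(r"ALPSHAD(\d{2})_061", alpha_kit.strip().upper())
    if match is None:
        # Assumption: kit names outside the ALPSHADnn_061 series are not
        # ranked here, so they are treated conservatively as "needed".
        return True
    # ALPSHAD04_061 or later satisfies the requirement.
    return int(match.group(1)) < MIN_ALPSHAD_SEQUENCE


if __name__ == "__main__":
    print(alpha_kit_needed_now(True, "ALPSHAD05_061"))   # False
    print(alpha_kit_needed_now(True, "ALPSHAD03_061"))   # True
    print(alpha_kit_needed_now(False, None))             # True
```

The conservative default matters more than the parsing: when the Alpha kit level cannot be established, the notes' own fallback (install both kits before bringing the cluster up) is the safe choice.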

CAUTION: Before Installing this Kit, Read the Following Cautions:

After installation of this kit, the following issues may occur:

1) ISSUE: When a node reboots into the cluster, there may be no OPCOM message reporting that the node is joining the cluster. The message is missing on a random basis.

   WORKAROUND: To verify that the node has entered the cluster, wait until the node has fully rebooted, then enter the following command and confirm that the node is a valid member of the VAXcluster:

      $ SHOW CLUSTER

2) ISSUE FROM THE CSC: An INVEXCEPTN in SNDRIVER may be seen if DECnet/SNA V2.1 is used in conjunction with the IO_ROUTINES from the VAXSHAD ECO kit. SNAVMS_E04021 (CSCPAT_5041) fixes this problem by replacing the incompatible SNDRIVER in DECnet/SNA V2.1.

   NOTE: SNAVMS_E04021 applies to DECnet/SNA V2.1 only.

3) After installation of this ECO kit on an OpenVMS VAX V5.5-2x system, MAIL, REPLY, or any process that uses the $BRKTHRU system service may hang in MUTEX, waiting for more BYTLM than it needs or has available. Examining the process from SDA shows that it is waiting for significantly more BYTLM (R1) than the process has left (BUFIO byte count/limit), and significantly more than the size of the message it is trying to output. The workaround is to make BYTLM larger.

4) This ECO kit should *NOT* be installed on an FT810 system running OpenVMS VAX V5.5-2HF. If it is installed, the system will not reboot.

These issues are being addressed and will be corrected in a future version of OpenVMS VAX.

ECO KIT SUMMARY:

An ECO kit exists for Volume Shadowing on OpenVMS VAX V5.5-2 and V5.5-2H4.

Problems Addressed in the VAXSHAD09_U2055 Kit:

o A 'SET SECURITY' or 'SET ACL' command issued on volumes in a cluster places a heavy I/O load on the server process. This exhausts paged pool, and the AUDIT_SERVER goes into an RWPAG state. This problem is corrected in OpenVMS VAX V6.2.

o A field in the IRP that is used during volume processing is not initialized in clones of user I/Os. If an error occurs, the code that determines the severity of the error can be misled by stale data in this field. The code can fail to locate the error and may return the I/O as successful. Since a zero byte count is also returned, the user sees an Incomplete Segmented Transfer error. The fix is to initialize the field when the clone is allocated.

o While creating a page, a user process may be swapped out and returned with a different balance set slot. This problem is corrected in OpenVMS VAX V6.2.

o Listings may be difficult to read due to varied formats and misleading or missing comments. This problem is corrected in OpenVMS VAX V6.2.

o Certain applications that call $AUDIT_EVENT with ASTs turned off will be interrupted when $AUDIT_EVENT returns to the caller. This problem is corrected in OpenVMS VAX V6.2.

o The code relies on a page being present when it attempts to release a spinlock. If the system is paging heavily, the page may not be available. This problem is corrected in OpenVMS VAX V6.2.

o Repeated wakeups from $SCHDWK show a drift that accumulates over time. This problem is corrected in OpenVMS VAX V6.2.

o COPY and/or BACKUP of a disk to a TMSCP-served tape will fail when the tape device is placed in an MV state. The failure does not occur if the same task is performed locally. COPY fails with:

      SYSTEM-F-TAPEPOSLOST, magnetic tape position lost

  BACKUP fails with:

      -SYSTEM-F-DATALOST, data lost

  This problem is corrected in OpenVMS VAX V6.2.

o To transition an OpenVMS process from the virtual balance set to the real balance set, the SPTEs (system page table entries) that describe its process PTE pages (process page table pages) must be copied from saved memory back into the real balance slot they originally came from. This makes the process' P0 and P1 space accessible again. SPTEs for the process page table pages describing the undefined area between P0 and P1 must be represented by pre-initialized null values (actually, ERKW DZERO-type values). When this undefined void area is exactly zero pages (i.e., P0 and P1 are tangent), the VBSS$READ_OPT2_VBSM routine takes the wrong branch, causing a VBSSERR bugcheck. This fix adds a test for this case and takes the correct branch. This problem is corrected in OpenVMS V6.2.

o When a process is switched from a real balance slot to a virtual balance slot, the allocation may fail. This causes a VBSSERR bugcheck. This problem is corrected in OpenVMS VAX V6.2.

o The quota value may be incorrect when process quota (BYTLM) is returned to a process for a system global section. This problem is corrected in OpenVMS VAX V6.2.

o System crashes may occur due to corrupted PTE entries. The corruption appears to be Global Section Table Entries pointing to Global Section Descriptors. The problem occurs only if more than 4095 global sections (GBLSECTIONS) are in use. To check the number of global sections currently in use, add the values reported by the following commands:

      SDA> VALIDATE QUEUE EXE$GL_GSDSYSFL   !global sections
      SDA> VALIDATE QUEUE EXE$GL_GSDDELFL   !delete pending global sections
      SDA> VALIDATE QUEUE EXE$GL_GSDGRPFL   !group global sections

o Devices can remain allocated to processes that no longer exist. The device remains unusable until the system is rebooted.

o If a previously shadowed disk is mounted with a MOUNT/OVER=SHADOW command and a new shadow set is created using this disk, OpenVMS VAX will attempt to create the old shadow set using the old physical device names.

o The system may crash with a NOBVPVCB bugcheck. The crash occurs on the kernel stack with MTAAACP.EXE as the current image.

o The system may crash with an XQPERR bugcheck while dismounting a MAD drive.

o SUBTRACED errors are not correctly determined for images installed /HEADER_RESIDENT. This problem is corrected in OpenVMS VAX V6.2.

o Users of ORACLE [R] Rdb V6.1 may get ILLIOFUNC errors when performing I/O to a host-based shadow set whose members are served.

o A large number of shadow copies may be performed by OpenVMS rather than by the controller, even when both disks are on the same controller and the controller has DCD (Disk Copy Data) capabilities.

o If a three-member shadow set has its index zero member as a copy target and all three members also require a merge, the merge does not take place when the copy completes. The LBN of the just-completed copy (the last LBN on the disk) is passed as the merge starting LBN, so the merge completes without doing any I/O.

o Copies may fail to start or restart, usually after a node halt, shutdown, or reboot. Additional symptoms include values of HBS_CIP that are inconsistent with SHADOW_MAX_COPY, negative values of HBS_CIP, and copies that should have continued being started over from the beginning.

o System hangs occur when I/Os that are pending to a shadow set do not complete.

o UCB$L_MAXBCNT appears to be invalid for a shadowed disk.

Problems Addressed in the VAXSHAD07_061 Kit:

o In the VAXSHAD05 and VAXSHAD06 kits, two new fields were added to the IRP data structure for shadow write logging information. The new IRP definition size conflicts with the IRP sizes of other images on the system that are not part of the SHADOW kits. This conflict may cause a variety of errors, including fatal bugchecks. This fix changes the IRP definitions back to the SBB versions and adds special definitions to SHDRIVER for the new IRP fields.

o Fatal bugchecks from data structure corruption may occur due to the addition of the value 10 HEX to the corrupted field. Crashes are of various types and include node and cluster crashes, and crashes due to invalid UCB addresses, invalid VCB addresses, invalid member IDs, and an invalid number of devices.
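The global section check in the VAXSHAD09_U2055 list is simple arithmetic: total the element counts reported by the three SDA VALIDATE QUEUE commands and compare the sum against 4095. A minimal Python sketch, assuming the three counts have already been read by hand from the SDA output (it does not parse SDA output, and the function names are hypothetical):

```python
# Threshold from the VAXSHAD09_U2055 summary: the corrupted-PTE crash
# occurs only when more than 4095 global sections are in use.
GBLSECTION_LIMIT = 4095


def global_sections_in_use(system: int, delete_pending: int, group: int) -> int:
    """Add the element counts from the three VALIDATE QUEUE commands."""
    counts = (system, delete_pending, group)
    if any(c < 0 for c in counts):
        raise ValueError("queue element counts cannot be negative")
    return sum(counts)


def exceeds_gblsection_limit(system: int, delete_pending: int, group: int) -> bool:
    """True when the total is above 4095, the condition named in the summary."""
    return global_sections_in_use(system, delete_pending, group) > GBLSECTION_LIMIT


if __name__ == "__main__":
    # Hypothetical counts: 3900 system, 50 delete-pending, 200 group sections.
    total = global_sections_in_use(3900, 50, 200)
    print(total, exceeds_gblsection_limit(3900, 50, 200))  # 4150 True
```

Delete-pending sections are counted deliberately: the summary lists EXE$GL_GSDDELFL alongside the other two queues, so a system can cross the 4095 threshold even while some of its sections are being deleted.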

Problems Addressed in the VAXSHAD06_061 Kit:

o When using PATHWORKS, data corruption may occur on the file container. The corruption can be seen by running CHKDSK on the PC container disk. Also, using PCDISK to IMPORT and EXPORT files to and from the container shows a corrupted file when the file is EXPORTed back to VMS.

o System crashes occur with an INVEXCEPTN bugcheck at SCH$POSTEF+21. To correct this problem, the IOC$SIMREQCOM routine was changed so that the destination of the IFNOWET test initializes R4 before calling the IOC$SCHEDEF routine. IOC$SCHEDEF expects R4 to contain the address of the user's PCB.

Problems Addressed in the VAXSHAD05_061 Kit:

o After a node crashes, it cannot mount a Host-Based Volume Shadowing virtual unit on reboot. The error message usually returned is "volume not software enabled"; however, "Medium Offline" may also be seen. SHOW DEVICE shows the shadow set in a 0% merge, but SDA shows that a minimerge is pending.

o A double deallocation crash may occur as the result of MOUNT not properly initializing the Mounted Volume List pointer. This pointer may have a stale value as a result of two calls to SYS$VMOUNT from a single program. The stale pointer causes a problem only if the system is unable to allocate space for defining the logical name.

  NOTE: Since cells are initialized at image activation, this problem should not occur as a result of DCL commands.

o Tape devices with stacker/loaders, such as the TF857, may take up to 6 minutes to rewind/unload/load the next tape. In VAXSHAD01_061, the behavior of MOUNT was changed to take this delay into account. However, a side effect of that change was that non-stacker drives may also wait 6 minutes before failing.

o A system may crash with an INVEXCEPTN during an SHDRIVER COPY_DATA_REPAIR copy operation.

o If the ALLOCLASS SYSGEN parameter is not set and the user tries to use shadowing, a shadow volume can be created, but members cannot be added to the shadow set. No error messages are received until a second member is added. On the MOUNT command, the customer receives the following error messages:

      $ mount /system dsa500 /shadow=dkb400 alphavms015
      %MOUNT-I-SHDWMEMFAIL, DKB400 failed as a member of the shadow set
      -SYSTEM-F-INCSHAMEM, incompatible shadow set member

  "Incompatible" is an inappropriate statement of the problem. A more accurate message would be "missing allocation class" or "incorrect allocation class."

o If a shadow set member is dismounted at the same time from multiple nodes within a cluster, I/O to that shadow set may become stalled.

o MOUNT will not add shadow set members unless they are either MSCP or SCSI devices.

o Shadow set member expulsion was based on the time it took a fork-and-wait and a PACKACK to complete, rather than on the actual elapsed time. On some devices, particularly SCSI, where a PACKACK can take approximately one minute, the timeout was much too long. With the default SHADOW_MBR_TMO value of 20 (seconds), it would actually take 20 minutes to expel a member experiencing errors from a SCSI shadow set.

o SHDRIVER loss of synchronization may result in a crash where SHADDETINCON is triggered by the check at the end of MATCH_MASTER_SCB. This consistency check verifies that SHAD$W_DEVSTS_PASSIVE_MV_CNTR is zero, and it is not. Another symptom is that the virtual unit UCB$W_RWAITCNT is zero. Shadow set member counts of zero may also be seen.

o Crashes may occur in EXPEL_PACKACK_ANY with connections broken to all members and IRP$L_SHD_LOCK_FR5 = 1 (PACKACK retries exhausted).

o All members of a shadow set become inaccessible at the same time and remain inaccessible for longer than the shadow member timeout (SHADOW_MBR_TMO or SHADOW_SYS_TMO seconds) but less than MVTIMEOUT seconds. All members then become accessible within seconds of each other, but not at exactly the same time. This results in all but one member being expelled from the shadow set. This often occurs when HSJ microcode is changed and all members are connected to the same HSJ: when the devices come back online, polling finds them seconds apart, and all but one member is expelled.

o All members of a shadow set must be checked to see whether they meet the criteria of being MSCP. The original design did not allow for having no index zero member.

o When the mounting of full copy targets exceeds the SHADOW_MAX_COPY threads for a given node, other nodes with the shadow set mounted do not pick up the copy work.

o In a cluster, using $PROCESS_SCAN explicitly, or implicitly with the DCL 'SHOW USER' command, sometimes causes a system crash due to an ACCVIO in kernel mode or an IVSSRVRQST bugcheck.

o When a node with a SCSI bus boots, it resets the SCSI bus. In a multi-host SCSI cluster, this can cause the other node to experience I/O failures. Normally this results in a brief mount verification: the I/O is retried, succeeds, and there is no serious consequence. However, if the other node is in the process of booting and the system disk is a shadow set, the system will crash.

o A PGFIPLHI bugcheck may occur in the SHADOW_SERVER process at the REMQUE in K_GET_COPYSHAD_IRP. On OpenVMS VAX, the PC is A0E and the VA is 274.

o A page setup module that draws a frame and company logo on each page of output is used on a queue pointing to an LN03. This page setup module works on OpenVMS VAX Version 5.5-2 and prior versions. However, with VAXQMAN8_U2055 (CSCPAT_1165) or OpenVMS VAX Version 6.1 installed, the page setup module causes the printer to continually spew out paper containing only the output from the page setup module. This continues until the entry is deleted from the queue.

o Due to an inadequate synchronization mechanism, the MONITOR DISK command can go into an infinite loop on multiprocessing machines.

o A race condition may occur in a VMScluster. This happens most frequently on clusters where the 'SET AUDIT/SERVER=NEW' command is issued repeatedly. The race condition presents itself as one or more of the audit servers within the cluster continuing to use the old audit journal rather than the newly created journal.

o A system may crash with a PGFIPLHI bugcheck with a "PAGE FAULT at IPL too high" error message.

Problems Addressed in the VAXSHAD04_061 Kit:

o When two or more systems are booted simultaneously from shadowed system disks, the systems may appear to hang. Crashing the systems and examining the crash dumps indicates that the shadowing driver's blocking AST routines have not run.

o When a node runs out of SHADOW_MAX_COPY threads while mounting new copy target units, other nodes in the cluster that have available SHADOW_MAX_COPY threads will not pick up the copy work. As a result, copies are not started for copy members that are added to shadow sets.

Problems Addressed in the VAXSHAD03_061 Kit:

o A double-deallocation crash may occur as the result of MOUNT not properly initializing the MTL pointer. This pointer had a stale value as a result of two calls to SYS$VMOUNT from a single program. The problem will not happen as a result of DCL commands, as the cells are initialized at image activation. The stale pointer causes a problem only if the system is unable to allocate space for defining the logical name.

o An OPCOM message was output even though /NOASSIST was specified in the MOUNT command. This caused problems for UETP.

o When a Controller-Based system disk is booted for the first time as a Host-Based system disk, the boot fails and a SHADBOOTFAIL bugcheck occurs. A SHADBOOTFAIL also occurs if SHADOW_SYS_UNIT is changed at boot time.

o During a copy operation, the system may crash with an ACCVIO.

o The volume of messages printed during SHDRIVER volume processing has been reduced, so that the messages that are printed are more meaningful to the user. This involves minor modifications to SHDRIVER to suppress messages that do not indicate actual problems. No messages have been modified or deleted; only the frequency with which they are printed has changed.

o The path selection logic for DUDRIVER had a timing problem that caused devices to be mounted through an MSCP server even though a local controller could be used. Although this symptom could still appear under extreme circumstances, most devices should now find the local controller.

o In a large LAVC (Local Area VAXcluster), after one or more nodes leave the cluster, state transition times can be excessive, and the following messages may be sent repeatedly to the consoles of the various nodes:

      %CNXMAN, proposing reconfiguration of the VAXcluster
      %CNXMAN, aborting VAXcluster state transition

  The state transition, which normally completes within 1-3 seconds, may instead take 15-55 seconds or more.

o Incorrect MSCP-served disk synchronization would cause I/O to an MSCP-served disk to be stalled on an internal queue and later restarted.

o An internal routine, MOVE_SERVER, had a sequencing problem that could cause stalled I/O to a served shadow set member.

o MSCP server crashes may occur in large clusters.

Problems Addressed in the VAXSHAD01_061 Kit:

o A delay of up to six minutes can occur before a device-not-ready condition is reported during cartridge volume switching on non-SCSI (Small Computer System Interface) TX867-type devices.

o Some of the OpenVMS VAX console executive messages have changed to mixed upper and lower case letters for OpenVMS VAX V6.0 message text. As a result, current VCS scan files will not match the console text, and VCS alarms will fail to trigger. (Please see the ECO kit release notes for more information and instructions regarding this fix.)
o There is no synchronization between SHADOW_PROCESSING and INVALIDATE_ALL_ENTRIES, which allows these two code threads to run simultaneously. This can cause a system crash due to the fact that the SHADOW_PROCESSING thread may remove a member from a multimember shadow set and the INVALIDATE_ALL_ENTRIES thread is not aware that the member has been removed. The system crash occurs in RESTORE_WLE because no Write Log table exists. o When shadow set members are not available for SHADOW_MBR_TMO seconds, they should be expelled from the shadow set. Sometimes when two members of a three-member set enter this condition, only one member will be successfully removed. The other member will not be removed, and this will cause the virtual unit to hang until the errant member returns or it is manually removed from the set via push button. o In Volume Shadowing for OpenVMS Alpha Version 6.1, several changes were made to the assisted merge (minimerge) functionality. These changes disabled mimimerge functionality across mixed architecture VMSclusters. With minimerge disabled, shadowing continued to function normally, except that a full merge was always done when a merge operation occurred. Full merges take considerably longer than minimerges. If minimerge functionality is desired, Digital recommends that this kit be installed across any VMSclusters that contain an Alpha node running OpenVMS Alpha Version 6.1. Mixed-architecture VMSclusters that are running OpenVMS Alpha Version 6.1 must apply this kit and reboot the entire cluster simultaneously. In these cases, rolling upgrades are not supported. o Prior to this remedial kit, if attempts were made to mount an RZ28B disk device with an RZ28 in the same shadow set, Volume Shadowing detected different device IDs and may not have allowed the devices to be mounted. This behavior applied only an RZ28/RZ28B shadow-set combination when connected with a local SCSI controller. 
Since RZ28 and RZ28B are different device types but can be shadowed, the checking for shadow-set membership in the host-based shadowing software needed to be modified. This remedial kit enables the combination of RZ28 and RZ28B devices in a shadow set, as long as they are connected to like controllers. With the use of SCSI devices, like controllers are required because geometry can vary from controller to controller. Digital recommends that SCSI shadow sets be configured across like controller types. Existing SDI and DSSI configurations are unaffected; if they are not using SCSI drives and are shadowing SDI devices across different controllers, these configurations will continue to work without this remedial kit. VMSclusters with shadowed SCSI disks and mixed-architecture VMSclusters running OpenVMS Alpha Version 6.1 must apply the kit and reboot the entire cluster simultaneously, so that the entire VMScluster is running the same version of Volume Shadowing software. The kit is required for both VAX and Alpha nodes. Do not mount shadow sets containing RZ28 and RZ28B devices without first applying this kit. o If a Shadowset Virtual Unit is dismounted during a full copy, the full copy target's SCB is incorrectly written. This allows a subsequent mount of that shadow set member to succeed as if the copy had completed. o System crashes may occur in RESTORE_WLE because there is no Write Log table. In fact, a member has been removed from the set. This problem is similar but different from the DU/SH Synch problem that causes the same symptom. o When members are not available for SHADOW_MBR_TMO seconds, and other members are available, the unavailable members should be ejected from the shadow set. In certain configurations, with the current version of the driver, should two members of a three member set enter this condition, only one member will be successfully removed. 
The other member will not be removed and the virtual unit will hang until the errant member returns, or it is manually removed from the set via push button. This behavior has been fixed in this kit. Any members that remain unavailable for greater than SHADOW_MBR_TMO seconds will be fully expelled from the set. o Device not ready for magtapes was not reported until a delay of up to 6 minutes expired. Problems Addressed in the VAXSHAD02_060 Kit: o A SHADDETINCON was caused by the X-64A1 check in, because the wrong GPR was used when an unlikely system address was stored into the IRP. Problems Addressed in the VAXSHAD01_060 Kit: o If SHDRIVER encounters a situation where more than one member of a three member shadow set go into error recovery at the same time, and they cannot be brought back into the shadow set (i.e., loss of connectivity, media offline, write locked device, etc.) SHDRIVER will expel one of the members and crash with a SHADDETINCON when it cannot update the SCB on the remaining members. o When all shadow set members are write locked, a bugcheck will occur due to R4 being destroyed across a JSB to SHSB$GET_CLEAN_IRP. This fix preserves that register. o The SHADOW_MAX_COPY SYSGEN parameter is used to set how many merge/copy threads may be started at the same time on a node. This was not working. Systems would start more than SHADOW_MAX_COPY number of threads. o SHdriver system disk member timer issues and R2/R5 corruption problems: 1. The SHSB$MATCH_MASTER_SCB routine makes improper use of SHSB$PAUSE. The use of SHSB$PAUSE causes the SHAD (in R2) not to be preserved when the time delay is invoked (since it forks), so the resulting value in R2 is indeterminate. 2. The SHSB$MATCH_MASTER_SCB routine makes improper use of SH$TIME_DELAY. An input requirement of SH$TIME_DELAY is to have a UCB in R5. 3. The SH$ABORT_VP routine makes improper use of SH$TIME_DELAY. 
The use of SH$TIME_DELAY causes the SHAD (in R2) not to be preserved when the time delay is invoked (since it forks), therefore the resulting value in R2 is indeterminate. 4. The benefit of reassembling a multiple member system disk shadow set is lost to some configurations if the current fixed amount of time expires and all of the former members of the shadow set are not available. This has caused escalations to be raised to address this specific behavior. Second, enable the differentiation of the member time out time for system disk versus other disks. Last, make the currently hardcoded wait of FF seconds to connect to all members of an existing system disk a user-controlled variable. o SHdriver MVTIMEOUT after member error and R5 corruption problems: 1. The spontaneous removal of one shadow set member of a multiple member set due to a fatal error causes some cluster nodes to hang the virtual unit until the MVTIMEOUT time expires. 2. In SHSB$VALIDATE_SHADOW_SET, the wait loop at 130$ does not correctly restore the contents of R5 to be the VU VCB after a call to SHSB$PAUSE. o WLE_POST_PROC is not done on all the clones. This causes allocation of new unnecessary Write Log Entries. The Write Log INUSE bit is never cleared so the table has to be expanded. Once the table expands to MAX, Write Logging is disabled. When Write Logging gets turned back on it starts all over. All the entries in the controller will be exhausted forcing Write Log Exhaustion handling and in some cases the controller will be reset. o While doing INVALIDATE_ALL_ENTRIES if the READ of LBN #1 fails or WLG has been turned off, a branch goes to the wrong location. This results in issuing an IO with no READY clones and the system will wait forever with SEQCMD lock held and RWAITCNT bumped. Problems Addressed in the VAXSHAD08_U2055 kit: o After installation of CSCPAT_0269 V2.7 (VAXSHAD07_U2055), the system may crash with a SHADDETINCON bugcheck. 
The bugcheck occurs when a disk is removed from a mounted shadow set. Problems Addressed in the VAXSHAD07_U2055 kit: o Write Log Usage fixes: 1. The first problem symptom is that user I/O to a virtual unit may intermittently hang on any node (usually only one) that has a multiple member virtual unit mounted. The hang can occur with no other overt error symptoms evident in either the error log or as seen by analyzing the live system. 2. The second problem symptom is less apparent, in that the resources used for the write history management function are managed in a more efficient manner. o When one member of a multiple-member shadow set encounters a fatal device error, the node that discovers the initial problem will successfully expel that device from the set. However, other nodes that are under heavy I/O loads when the device is expelled may occasionally fail to recover the full membership. This will cause the virtual unit to hang until the MVTIMEOUT time limit is reached. Problems Addressed in the VAXSHAD05_U2055 Kit for OpenVMS VAX V5.5-2 o A documentation change was made to the VAXSHAD04_U2055 kit to remove an incorrect reference. Problems Addressed in the VAXSHAD04_U2055 Kit for OpenVMS VAX V5.5-2: o When a host receives a controller error, Volume Shadowing Phase II processing removes whatever device is at SHAD index 0 even if this member was not the one that experienced the controller error. Once the index 0 member is gone, all other controller errors are ignored. o The ability to switch to the current master member of a system disk shadow set has a limited configuration of controller/adapter types. Crash dumps that were correctly written (according to console output) cannot be found for analysis when using an HBVS multiple-member system disk shadow set. o Applications can hang or experience I/O transfer errors when using multiple-member shadow sets that are connected in such a way that segmented I/O transfers are needed. 
  This has been reported on systems running WordPerfect[TM].

o An INVEXCEPTN crash can occur if a clone chain cannot be allocated. If a FANOUT_ALLOCATION_XXX request fails, the MIRP is still linked to the active queue, which causes the next REMQUE to fail.

o If a system that currently holds the WATCHER lock crashes while it is validating the status of a Host-Based Volume Shadow Set that is mounted cluster-wide, and another node assumes the WATCHER lock, an IPL 8 system hang can occur.

o A SYSDUMP.DMP file that appears to be written correctly can be invalid when the boot device and the master member of the system disk shadow set diverge. The device that the system dump is written to has always been the boot device. The SHDRIVER.EXE in this kit allows the system dump to be written to a member of the system disk shadow set other than the boot device. Upon successful write completion, the unit number will be displayed on the console.

o The VAX 7000 had been restricted to using the boot device as the only valid dump device in a prior remedial image. Additionally, proper operation was not allowed at shutdown or when a crash dump needed to be written, because an incorrect message was sent concerning the path to the system disk.

o Occasionally, not all of the former members of a system disk shadow set will return upon a system reboot. This problem occurs only if the virtual unit is not otherwise mounted in the cluster at boot time.

o Under certain conditions, once a virtual unit exceeds the mount verify time-out time, the virtual unit does not behave correctly. Indeterminate behavior occurs due to use of a corrupted SHAD pointer, because fork context requirements are not observed.

o If a node is booting into a cluster and the boot device being used is already mounted in the cluster as a member of a virtual unit with a different virtual unit number, the node is incorrectly allowed to continue to boot into this cluster.

o Under certain circumstances, the SHADOW_MAX_COPY SYSGEN parameter does not regulate the number of copies a particular node will control. This effectively nullifies any value set in SHADOW_MAX_COPY.

o Configurations that consume a large number of event flags and create a large number of multiple-member shadow sets (i.e., more than 50) may experience a system crash.

o Inadvertent placement of the SCB (System Control Block) can adversely affect the best time calculation needed for a full merge operation. This, in turn, adversely affects the total time it takes to perform the full merge operation.

o If enough write I/O operations are issued to a multiple-member shadow set that has write logging enabled to cause the write log table to grow beyond its expansion limit of 4K, the set may hang. This condition can occur with no evident error symptoms.

o If a member of a three-member shadow set loses its connection for SHADOW_MBR_TMO time, a decision to remove that member is initiated. Should either of the remaining members be unable to complete an SCB (System Control Block) update, the removal operation may occasionally result in a SHADDETINCON crash.

o When a multiple-member system disk shadow set is in use and a number of nodes are rebooted at the same time, the path to one of the non-boot device members is sometimes used before it has been properly initialized. This causes a race condition which may result in an MSCPCLASS bugcheck.

Problems Addressed in the VAXSYS14_061 Kit:

o There is a race condition that may occur when a CFCB (Cache File Control Block) is being deleted due to XQP action and cache space is being reclaimed from a LIMBO file.

o Disk corruption can occur when heavy open/read/write/close/delete operations are occurring.

o At some point after a node CLUEXITs, two or more cluster nodes crash with LOCKMGRERR bugchecks.

o When two or more VAX or Alpha nodes boot at the same time, one or more of them may crash.
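
The write logging items earlier in this summary (WLE_POST_PROC not being done on all clones, so the Write Log INUSE bit is never cleared, and the hang when the write log table goes beyond its 4K expansion limit) describe a resource leak. The Python model below is a minimal sketch of that failure mode only, not SHDRIVER code. The class and function names, the initial table size of 256, the doubling growth policy, and the reading of "4K" as 4096 entries are all assumptions made for illustration.

```python
# Hypothetical model of a write log table; NOT SHDRIVER code.
# Assumptions: initial size 256, growth by doubling, "4K" = 4096 entries.


class WriteLogTable:
    def __init__(self, initial_size=256, max_size=4096):
        self.size = initial_size
        self.max_size = max_size
        # Indices whose INUSE bit is clear (free for a new write).
        self.free = list(range(initial_size))
        self.enabled = True  # write logging on/off

    def allocate(self):
        """Claim an entry (set INUSE). Expand the table when no entry is
        free; once it is already at max_size, disable write logging."""
        if not self.enabled:
            return None
        if not self.free:
            if self.size >= self.max_size:
                self.enabled = False  # table exhausted
                return None
            new_size = min(self.size * 2, self.max_size)
            self.free.extend(range(self.size, new_size))
            self.size = new_size
        return self.free.pop()

    def post_process(self, index):
        """Completion processing: clear the INUSE bit of the entry."""
        self.free.append(index)


def run_writes(n_writes, post_process_done):
    """Issue n_writes writes; optionally skip completion processing,
    as happens for clones that miss WLE_POST_PROC."""
    table = WriteLogTable()
    for _ in range(n_writes):
        index = table.allocate()
        if index is None:
            break
        if post_process_done:
            table.post_process(index)
    return table


leaky = run_writes(5000, post_process_done=False)
healthy = run_writes(5000, post_process_done=True)
print(leaky.size, leaky.enabled)      # 4096 False
print(healthy.size, healthy.enabled)  # 256 True
```

With completion processing skipped, the model's table grows to its ceiling and write logging is switched off; with completion processing done for every write, the table never needs to grow.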

Problems Addressed in the VAXSYS16_U2055, VAXSYS15_U2055, and VAXSYS14_U2055 Kits:

o Two new fields were added to the IRP data structure for shadow write logging information. This new IRP definition size conflicts with the IRP sizes of other images on the system. This conflict may cause a variety of errors, including fatal bugchecks. This fix changes the IRP definitions back to the SBB versions. This problem is corrected in OpenVMS VAX V6.2.

Problems Addressed in the VAXSYS13_U2055 Kit:

NOTE: According to OpenVMS Engineering, the fixes contained in VAXSYS13_U2055 have been included in OpenVMS VAX V7.0.

o System crashes may occur due to corrupted PTEs (page table entries). The corruption appears to be related to Global Section Table Entries pointing to Global Section Descriptors. The problem occurs only if the number of global sections (GBLSECTIONS) exceeds 4095. To check the number of global sections currently in use, add together the number of queue elements reported by each of the following SDA commands:

     SDA> VALIDATE QUEUE EXE$GL_GSDSYSFL   !global sections
     SDA> VALIDATE QUEUE EXE$GL_GSDDELFL   !delete pending global sections
     SDA> VALIDATE QUEUE EXE$GL_GSDGRPFL   !group global sections

Problem Addressed in the VAXSYS12_U2055 Kit:

o Due to an inadequate synchronization mechanism, the MONITOR DISK command can go into an infinite loop on multiprocessing machines.

Problem Addressed in the VAXSYS11_U2055 Kit:

o The system crashes with a PGFIPLHI bugcheck and the message "Pagefault at IPL too high". The VA is pointing to a CCB (Channel Control Block) and the PC is located within the MBDRIVER module.

Problem Addressed in the VAXSYS10_U2055 Kit:

o Performance may be degraded due to excessive kernel mode time being spent in MMG$FREWSLE attempting to find a working set page to replace.

Problems Addressed in the VAXSYS09_U2055 Kit:

o In a small working set, it is possible for the EXE$PSCAN_NEXT_PID routine (which is called by $GETJPI) to take a page fault at IPL 8. This causes a PGFIPLHI bugcheck.
  The page referenced is in the PROCESS_SCAN context block (PSCANCTX$ data structure) in process virtual address space.

o The $SETIMR and $SCHDWK system services, which request timer interrupts, may cause a system to hang. This occurs when a time that has already passed is specified for a wake to occur.

Problems Addressed in the PRCMGT$01_U2055 Kit:

o A system crash may occur at POSIX$KERNEL+3B371 with POSIX$DCL as the current image. The crash is provoked when a user logs in with /CLI=POSIX$CLI. The DCL command may cause the system to crash or the process to disappear. Occasionally, the crash will occur following a few carriage returns. This problem is corrected in OpenVMS VAX V6.0.

o Fixes for various problems in $GETJPI (ECO 15):
  The following problems have been reported in the $GETJPI and $GETJPIW system services (executive routine EXE$GETJPI):

  · Process hangs while waiting for $GETJPI to complete
    A process might wait forever in LEF state while attempting to retrieve an item which required access to another process's P1 space. While this kit includes changes which fix some instances of this problem, the hang may still occur. Should the problem persist after installing this kit, the hang can be worked around by revising the application to add a timer request AST and recovery routine. For more information, refer to an article in the OPENVMS database using a search string of:
      Application and $GETJPIW and $GETJPI and Hang

  · SSRVEXCEPT bugchecks
    There were several instances where EXE$GETJPI would try to access data structures formerly assigned to a now-deleted process. Most frequently, the problem showed up as an access violation at EXE$GETJPI+712 while trying to retrieve the external PID.

  · PGFIPLHI bugchecks
    This involved another instance of access to the former data structures of a deleted process. In this case, though, EXE$GETJPI attempted recovery.
    The recovery was incorrect and would lead to unreleased spinlocks, high-IPL access to paged code, and other problems.

  · KRPEMPTY bugchecks
    This was yet another instance of access to a deleted process. If the process was selected by a "wildcard" PID, EXE$GETJPI would attempt to allocate an entry from the KRP lookaside list without having released a previous entry.

  · Stack corruption
    The kernel stack could be corrupted if the target process of a $GETJPI request was out of AST quota.

  · Incorrect AST quota
    AST quota could be gained or lost on an SMP system because of access via non-interlocked instructions.

  · Final status of 0 in R0
    A user could get a final status of 0 in R0 if the PHD of a target process was swapped out.

  These problems are corrected in OpenVMS VAX V6.0.

Problem Addressed in the VAXSYS01_2H4055 Kit:

o VAX 4000 Model 100A, 500A, 600A, and 700A systems are no longer able to boot via the Q-bus after installation of DECnet/OSI V5.5 or V5.6. These versions of DECnet/OSI eliminate code for support of new hardware in OpenVMS VAX V5.5-2H4.

Problems Addressed in the VAXSYSL04_U2055 Kit:

o The PE1 parameter, which was previously used to control the size of trees to be remastered, has been changed. If a negative value is placed in the parameter, the RRSCAN routine will exit without doing any scans or remastering.

o When the system is scanning for trees to remaster due to a change in cluster membership, RM_QUOTA may be exhausted. When this occurs, RSB queue corruption may result.

o The system crashes with a LKBREFNEG bugcheck when a parent sub-lock count exceeds 32K on a $DEQ.

o The system crashes with an RSBREFNEG bugcheck when a parent sub-resource count exceeds 32K on a $DEQ.

o During dynamic remastering, performance is degraded when large lock trees are moved.

o The LKID_MSK routine, which is used to mask off the LKID (Lock ID) from the SEQN, is incorrectly generated in DSTRLOCK.
  This can cause the LKID validation routines to incorrectly indicate that a LKID is invalid.

o Locks are sometimes granted out of order during remastering of a resource.

o The "Recover" privilege is not being correctly checked. This prevents recovery processing from recovering databases after node failures.

o When the resource for a two-phase conversion in progress is canceled, a fatal bugcheck will occur if the resource's BLOCKAST count is invalid.

o The activity scan rate of the Lock Manager has been changed from 1 second to 8 seconds to reduce Lock Manager overhead and make the tree moving algorithms more conservative.

Problems Addressed in the VAXMONT01_061 Kit:

o When the 'MONITOR DISK' command is issued on a system with DFS devices mounted, only the first three characters of the DFS disk name are displayed correctly. The last character is often displayed as a non-printable character or as an escape sequence. This may cause terminal lock-ups, resetting of terminal characteristics, or other unexpected terminal side effects.

o The 'MONITOR DISK' command may appear to hang when monitoring a system with more than 800 disks. An error occurs, but the error status is not displayed. The hang may also occur when a MONITOR CLUSTER command is issued.

o Due to an inadequate synchronization mechanism, the 'MONITOR DISK' command can go into an infinite loop on multiprocessor machines.

o Use of the 'MONITOR PROCESS' command in a local environment will fail if the SYSGEN parameter MAXPROCESSCNT is set to allow more than 1040 processes. When Virtual Balance Slots were added in OpenVMS V6.0, this number dropped to 978.

o In a mixed-version OpenVMScluster, the following MONITOR command will crash the target V6.0 node if it is issued from a V5.5-2 node:

     $ MONITOR STATES,POOL,DECNET,LOCK /NODE=V6.0_node

Problem Addressed in the VAXMONT03_U2055 Kit:

o The image to correct the MAXPROCESSCNT problem should have been included in the VAXMONT02_U2055 kit. It was not.
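
The VAXSYS13_U2055 item above gives a manual check: add together the element counts from the three SDA VALIDATE QUEUE commands and compare the total against 4095. The Python helper below is a minimal sketch of that arithmetic only. It assumes the three counts have already been read from the SDA output by hand; the function names are hypothetical.

```python
# Sketch of the VAXSYS13_U2055 global section check. The three counts
# are read by hand from the SDA output for:
#   EXE$GL_GSDSYSFL  (global sections)
#   EXE$GL_GSDDELFL  (delete pending global sections)
#   EXE$GL_GSDGRPFL  (group global sections)
# Function names are hypothetical; 4095 is the limit quoted in the kit text.

GLOBAL_SECTION_LIMIT = 4095


def global_sections_in_use(system_count, delete_pending_count, group_count):
    """Return the total number of global sections currently in use."""
    counts = (system_count, delete_pending_count, group_count)
    if any(c < 0 for c in counts):
        raise ValueError("queue element counts cannot be negative")
    return sum(counts)


def exceeds_limit(system_count, delete_pending_count, group_count,
                  limit=GLOBAL_SECTION_LIMIT):
    """True when the total is above the limit at which the PTE
    corruption described for VAXSYS13_U2055 can occur."""
    total = global_sections_in_use(system_count, delete_pending_count,
                                   group_count)
    return total > limit


print(global_sections_in_use(3900, 40, 200))  # 4140
print(exceeds_limit(3900, 40, 200))           # True
print(exceeds_limit(3000, 10, 100))           # False
```

A total of exactly 4095 does not exceed the limit; only a total of 4096 or more is flagged.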

Problems Addressed in the VAXMONT02_U2055 Kit:

o An error occurs after the following MONITOR command is used:

     $ MONITOR [CLASS] /NODE={nodelist}

  The error indicates that the connection to a remote node has been lost, and the collection activity terminates for that node.

o The MONITOR process class will not function if the SYSGEN parameter MAXPROCESSCNT is larger than 1040. The following errors will be returned:

     %MONITOR-E-COLLERR, error during data collection
     -SYSTEM-F-BADPARAM, bad parameter value

Problem Addressed in the VAXMOUN05_U2055 Kit:

o A delay of up to six minutes can occur before a device-not-ready condition is reported during cartridge volume switching on non-SCSI (Small Computer System Interface) TX867-type devices.

Problems Addressed in the VAXMOUN04_U2055 Kit:

o RE-INITIALIZATION errors are reported to users of SCSI tape drives attached to an HSx controller. This occurs if multiple SCSI tapes are attached to the HSx, all the tapes are at or near PEOT, and the connection to the HSx is broken.

o A tape drive will sometimes fail over to another HSx controller after the tape is dismounted.

o Numbers greater than 9999 which are randomly generated by HSx devices may cause the system to crash.

o Packet Acknowledgements (PACKACK) issued on client nodes that are using a specified preferred path will fail if the specified path is not the current primary path and the path cannot be changed because the disk is online through another path.

o In Controller Based Shadowing, mounting a disk named DUx or a tape named MUx causes the following error message to appear:

     %MOUNT-W-CBSNOTSUPTD, Attention - Phase I Shadowing is not supported as of OpenVMS VAX V6.1
     %MOUNT-I-MOUNTED, SCRTCH mounted on _$5$MUA0: (MOOSHEAD)

  This error message should appear only when an attempt is made to mount a DUS device.

o A user is unable to read the second volume of backup tapes written under OpenVMS V5.3. However, the tapes can be read successfully on OpenVMS VAX V5.5-1.

o If a logical name is specified on a MOUNT shadow set command line and this logical name is the same as the name of one of the shadow set members, then the following command sequence will fail with an INCONSDEV mount error, which will cause a system crash:

     $ MOUNT/SYSTEM DSA0/SHADOW=$1$DIA0: TWI_TEST $1$DIA0
     %MOUNT-I-MOUNTED, TWI_TEST mounted on _DSA0:
     %MOUNT-I-SHDWMEMSUCC, _$1$DIA0: (SPRING) is now a valid member of the shadow set
     $ MOUNT/SYSTEM DSA0/SHADOW=$1$DIA1: TWI_TEST
     %MOUNT-F-INCONSDEV, inconsistent device types

o If no operator is present to respond, MOUNT within a subprocess will fail with the following message:

     %MOUNT-F-BATCHNOOPR, No operator available to service batch request

o When MOUNT implicitly allocates a device to a child process (i.e., opens a channel to the device), the ownership of the device is changed to the parent process on a dismount. A subsequent mount of the device by the child process will fail because the device is now allocated to the parent.

o The new message "Another Volume Set of the Same Label is Already Mounted" has been added.

o If a tape device does not support compaction, the MOUNT/FOREIGN/NOCACHE command mounts the device with CACHE ENABLED.

o MOUNT waits only 10 seconds for SCSI magtape devices to become ready before determining that the device is off line. TX8x7 tape devices may take up to 6 minutes to become ready during a volume switch.

o MOUNT is unable to skip more than 8000 (hexadecimal) records when it tries to reposition tapes after a label verification in mount verify.

o A tape initialized with the following command will not be mounted if the user is not the owner, even if all privileges are enabled (i.e., the user is SYSTEM):

     $ INITIALIZE/LABEL=(VOLUME_ACCESSIBILITY:"%")/OWNER=[100,100] -
     /PROTECTION=(S:RWED,O:RWED,G,W)
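
Two items in this summary fail around the same boundary: the LKBREFNEG and RSBREFNEG bugchecks when a parent sub-lock or sub-resource count exceeds 32K on a $DEQ (VAXSYSL04_U2055), and MOUNT being unable to skip more than 8000 hexadecimal records (VAXMOUN04_U2055). 8000 hexadecimal is 32768, one more than 32767, the largest value a signed 16-bit word can hold. The Python sketch below assumes, as the bugcheck names suggest but the kit text does not state, that these counts are held in signed 16-bit words, and shows how such a count turns negative once it passes 32767. The helper names are hypothetical.

```python
# Hypothetical illustration: a count kept in a signed 16-bit word
# (two's complement, as on the VAX) wraps negative past 32767.

WORD_MASK = 0xFFFF
SIGN_BIT = 0x8000


def as_signed_word(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= WORD_MASK
    return value - 0x10000 if value & SIGN_BIT else value


def add_to_word(counter, delta):
    """Add delta to a signed 16-bit counter, wrapping like the hardware."""
    return as_signed_word(counter + delta)


print(0x8000)                    # 32768: "8000 hexadecimal"
print(as_signed_word(0x7FFF))    # 32767: largest positive word value
print(add_to_word(32767, 1))     # -32768: one more and the count is negative
print(as_signed_word(0x8000))    # -32768: 8000 hex read as a signed word
```

Under this assumption, any check that treats a negative reference count as fatal would fire as soon as the count passes 32767.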



This patch can be found at any of these sites:

Colorado Site
Georgia Site



Files on this server are as follows:

vaxshad09_u2055.README
vaxshad09_u2055.CHKSUM
vaxshad09_u2055.CVRLET_TXT
vaxshad09_u2055.a-dcx_vaxexe
