ECO NUMBER:      ALPSHAD14_062
PRODUCT:         OpenVMS Alpha OPERATING SYSTEM V6.2
UPDATE PRODUCT:  OpenVMS Alpha OPERATING SYSTEM V6.2

COVER LETTER

1  KIT NAME:

   ALPSHAD14_062

2  KITS SUPERSEDED BY THIS KIT:

   ALPSHAD13_062

3  KIT DEPENDENCIES:

   3.1  The following remedial kit(s), or later, must be installed
        BEFORE installation of this, or any required kit:

        o  ALPCLUSIO01_062
        o  ALPY2K01_062

   3.2  In order to receive all the corrections listed in this kit,
        the following remedial kits, or later, should also be
        installed:

        None.

4  KIT DESCRIPTION:

   4.1  Version(s) of OpenVMS to which this kit may be applied:

        OpenVMS Alpha V6.2, V6.2-1H1, V6.2-1H2, V6.2-1H3

   4.2  Files patched or replaced:

        o  [SYSEXE]SHADOW_SERVER.EXE (new image)
        o  [SYS$LDR]SYS$SHDRIVER.EXE (new image)

5  PROBLEMS ADDRESSED IN ALPSHAD14_062 KIT

   o  In a multi-node cluster, some cluster members may hang when
      accessing a shadowset if:

-- COVER LETTER -- Page 2 18 January 2002

      o  The shadowset being accessed has multiple members.
      o  All the shadowset members are local to one of the cluster
         nodes.
      o  All the shadowset members are being MSCP-served by the
         local node to the other cluster members.
      o  The local node goes down and remains down for at least
         MVTIMEOUT seconds.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  On multiple-member shadow sets larger than 18 gigabytes, if
      the port driver returns an SS$_DATACHECK for an IO$_WRITEPBLK,
      the shadowing driver writes the wrong LBNs. This could lead to
      data corruption.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

6  PROBLEMS ADDRESSED IN ALPSHAD13_062 KIT

   o  An SS$_IVADDR error can occur if a shadow set member is added
      with a /pol=mini command while another member is performing a
      mini copy operation.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  During mini copy command processing using either
      SHDR$START_ADDSHADMBR or SHDR$START_REMSHADMBR, the path to an
      existing shadow set member can be switched before the system
      control block (SCB) can be read.
      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  The ABORT and WBM events lack tracing support.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A host-based RAIDset hangs when one member of the shadowset
      encounters an SS$_OPINCOMPL 'operation incomplete' error.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A SHADDETINCON crash occurs during a mini copy operation.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A SHADDETINCON crash can occur in routine SHLK$MERGE_SIGNAL.
      See the partial crash dump summary below:

      Crashdump Summary Information:
      ------------------------------
      Bugcheck Type:    SHADDETINCON, SHADOWING detects
                        inconsistent state
      Current Process:  CTM$_00060006
      Current Image:    $1$DGA5014:[CTM$TMROOT.][CTM_HAMMER]
                        CTM_HAMMER_ALPHA_32.EXE;1
      Failing PC:       FFFFFFFF.804A1CD4  SYS$SHDRIVER+93CD4
      Failing PS:       14000000.00000804
      Module:           SYS$SHDRIVER
                        (Link Date/Time: 15-DEC-2000 15:08:57.95)
      Offset:           00093CD4

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A mini copy operation will occasionally abort with a
      %SYSTEM-F-IVADDR error. This can be caused by either of the
      following:

      -  Multiple SHADOW_SERVER threads are vying for the same
         virtual unit on the same system.
      -  The CIP_MBR bit is set at the incorrect time, when there
         are two copy targets available.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A system crashes with a SHADDETINCON error in
      SYS$SHDRIVER + 000762A0 following a Virtual Unit timeout.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A system crashes with a SHADDETINCON error in
      SYS$SHDRIVER + 000762A0. This occurs in Watcher code when the
      master member identified in the IN_SET lock value block is
      not a member of the set on the Watcher node.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  The /POLICY=MINICOPY command fails if the device is not able
      to perform a read SCB operation.
      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  Multiple systems hang on cluster shutdown because a deadlock
      occurs when the WATCHER node exits.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  System disk timeout is not managed properly.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  Correct system disk MVTIMEOUT error.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A SHADDETINCON crash occurs exiting EXPEL_DEVICE.

      Crashdump Summary Information:
      ------------------------------
      Bugcheck Type:    SHADDETINCON, SHADOWING detects
                        inconsistent state.
      Current Process:  NULL
      Current Image:
      Failing PC:       FFFFFFFF.804F70B0  SYS$SHDRIVER+D70B0
      Failing PS:       24000000.00000804
      Module:           SYS$SHDRIVER
                        (Link Date/Time: 23-MAR-2001 13:47:38.91)
      Offset:           000D70B0

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  If SYSGEN system_check is enabled, the first mount of a
      system disk may crash the system.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  Path switching occurs after a transient error invokes
      MOUNTVERIFICATION and no error is reported.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

7  PROBLEMS ADDRESSED IN ALPSHAD12_062 KIT

   o  The kit installation file in the ALPSHAD11_062 kit was
      incorrect and did not allow installation on a VAX system.

      Images Affected:

      Not applicable

8  PROBLEMS ADDRESSED IN ALPSHAD11_062 KIT

   o  An INVEXCEPTN crash occurs in SHIN$RESTORE_WLE_ENTRY when a
      write completes to a multi-member shadow set.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A SHADDETINCON error occurs after removing or adding a shadow
      set member.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  Removal of the master shadow set member may cause data
      corruption.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  When a path to a device is lost during a write operation, the
      SCB (system control block) can contain a stale master member
      index value. This will cause the system to crash.
      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  A system crash occurs on a cluster node when a
      SS$_VALNOTVALID error occurs in MERGE_SIGNAL on another
      cluster.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  Increase the merge factor for shadowing from 1,000 to 10,000.
      This change also displays the merge factor only during an
      actual merge operation.

      Images Affected:

      -  [SYSEXE]SHADOW_SERVER.EXE

9  PROBLEMS ADDRESSED IN ALPSHAD10_062 KIT

   o  Multipath secondary UCBs cannot be shadow set members. The
      multipath disk would be immediately removed from the shadow
      set and an OPCOM message would be issued. Since MOUNT retries
      this operation a number of times, even with /NOASSIST, the
      failure would be repeated a number of times.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  Disabling a FibreChannel cascade connection corrupts a
      shadowset member. When the cascade connection is broken, each
      of the two nodes can see only its local FC device, and both
      enter mount verification. One node throws out its remote
      member and continues using the last member. The other node
      then throws out its remote member, that is, the disk that has
      just been used by the first node. This leaves the first node
      with zero members, while the second node continues with a
      member that did not get the last set of writes.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  Disabling a FibreChannel cascade connection results in an
      INVEXCEPTN crash.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  The SHADOWSET goes MOUNTVERIFYTIMEOUT and cannot be
      remounted. The process attempting the remount hangs.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  The Master Member SCB hangs until MVTIMEOUT expires.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

   o  Ensure that several WAIT_FOR_RESOURCE waits are accomplished
      in SYSTEM_DISK_BEGIN.
      If one of the existing members of the system disk shadow set
      cannot be found and the code thread is looping for
      SHADOW_SYS_WAIT seconds for it to "appear", this will avoid a
      potential lock status race condition.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE

10 PROBLEMS ADDRESSED IN ALPSHAD09_062 KIT

   o  Not all status data for all members of the shadowset is
      displayed.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  The system can crash with a SHADDETINCON bugcheck.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  The system crashes when an I/O operation incurs a
      SS$_DATACHECK error during a shadowset copy operation.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  When a copy operation that interrupts a merge operation is
      terminating, it finds that there are no members marked for
      the merge and the thread crashes the system with a
      SHADDETINCON bugcheck.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  SHOW DEVICES shows zero percent merged status although the
      shadow set status does not indicate that a merge is required.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  Bit 16 in SHADOW_SYS_DISK can be set by the user to eliminate
      using remote members of the shadowset for reads.
      Occasionally, use of bit 16 fails to eliminate remote members
      from being used.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  A CPUSPINWAIT bugcheck can occur if the read of the SCB of a
      shadow set member cannot pass the checksum test.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  DCD (Disk Copy Data) will not always be initiated properly.
      During an assisted operation, if the source member was
      dismounted or otherwise removed from the shadow set, the
      connection to the controller would not clean up correctly.
      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  A full copy operation that is interrupted for a mini-merge
      will not complete the full copy operation correctly.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  Typing incorrect commands results in a system crash.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

   o  When two disks are added to a shadowset in the same mount
      command, the copies are done sequentially instead of in
      parallel. This causes the copies to take twice as long as
      they should.

      Images Affected:

      -  [SYS$LDR]SYS$SHDRIVER.EXE
      -  [SYSEXE]SHADOW_SERVER.EXE

11 PROBLEMS ADDRESSED IN ALPSHAD08_062 KIT

   o  Functionality was added to enable customers to shadow devices
      that report an identical number of "Total Blocks". In the
      past, Sectors per track, Tracks per cylinder, and Total
      cylinders also had to be identical; that requirement no
      longer applies. For example:

      $ SHOW DEVICES/FULL $84$DKC200:

      Disk $84$DKC200: (CSG84), device type RZ74, is online,
      mounted, file-oriented device, shareable, served to a cluster
      via MSCP Server, error logging enabled.

       Error count               1   Operations completed   28293
       Owner process            ""   Owner UIC           [SYSTEM]
       Owner process ID   00000000   Dev Prot S:RWPL,O:RWPL,G:R,W
       Reference count         137   Default buffer size      512
       Total blocks        6976375   Sectors per track         91
       Total cylinders        3067   Tracks per cylinder       25

      $ SHOW DEVICES/FULL $84$MDA1200:

      Disk $84$MDA1200: (CSG84), device type RAM Disk, is online,
      allocated, deallocate on dismount, mounted, file-oriented
      device, shareable, served to cluster via MSCP Server.

       Error count               0   Operations completed     420
       Owner process        "USER"   Owner UIC           [SYSTEM]
       Owner process ID   4260041B   Dev Prot S:RWPL,O:RWPL,G:R,W
       Reference count           2   Default buffer size      512
       Total blocks        6976375   Sectors per track         64
       Total cylinders        3407   Tracks per cylinder       32
       Allocation class         84

      These two devices can be members of the same shadow set.
      Device                Device           Error  Volume      Free  Trans Mnt
       Name                 Status           Count   Label     Blocks Count Cnt
      DSA8400:              Mounted              0  CSG84_V71   56308   319   1
      $84$DKC200: (CSG84)   ShadowSetMember      0  (member of DSA8400:)
      $84$MDA1200:(CSG84)   ShadowCopying        0  (copy trgt DSA8400: 2% copied)

   o  Faster I/O subsystems, for example the HSZ50 and the HSZ70,
      were taking longer to perform full merges than some older and
      slower subsystems. Changes were made to allow the System
      Manager to adjust thresholds.

      Two new logicals were added to vary the merge multiplication
      factor used for a virtual unit, on a per-node basis. The
      logicals must be defined in the system table and therefore
      should be defined on each node in the cluster.

      The valid range for a threshold is 100 to 1000. Any value
      outside of this range causes the factor to default to 200.
      This value of 200 is displayed at the start of a shadow set
      merge, in the '%SHADOW_SERVER-I-SSRVINIMRG' message,
      following the word 'Factor'.

      CAUTION: Increasing the values excessively may cause
      application performance problems when merges are occurring.
      When setting values, System Managers must balance the
      site-specific application needs with their merge
      requirements.

      Because the two logical names are evaluated every one
      thousand I/Os, the factor can be adjusted while a merge is in
      progress.

      The first logical name is:

           SHAD$MERGE_DELAY_FACTOR_DSAnnnn
                                      ^^^^
                                      ||||
                                      vvvv

      This logical name is virtual unit specific, with 'nnnn'
      representing the virtual unit number. This delay factor will
      be applied to the virtual unit only.

      If any important disks need to be merged with minimal
      disruption, values as high as 1,000% (threshold = 10 times
      best time) may be defined.
      By the same token, if a particular disk's merge operation is
      interfering with application I/O, you can cause the merge to
      delay more frequently by reducing the value to as low as
      100 (threshold = 1 times the best time).

      If the above logical is not defined, then the following
      logical is evaluated:

           SHAD$MERGE_DELAY_FACTOR

      Like the virtual unit specific logical, this value adjusts
      the threshold, but only for shadow sets that do not have a
      virtual unit specific logical defined.

   o  Additional tracing code was added to help diagnose why mini
      merge operations were converted to full merge.

   o  If full merge operations are interrupted with a copy
      operation, then write logging is enabled, which wastes
      cluster write logging resources.

   o  If a VMScluster that has more than 96 nodes crashes, then
      write logging is never used to recover the virtual unit. The
      result is unnecessary full merge operations.

   o  If a shadow set exists on multiple nodes in a cluster and one
      cluster member adds a device that cannot be accessed by other
      nodes in the cluster, then those nodes will crash with an
      INVEXCEPTN in SHDRIVER within SHSB$MATCH_MASTER_SCB.

      When calling SHSB$AVAILABLE_SHADOW_SET, the call to log an
      error packet resulted in an overwritten register (R0) and
      then a system crash occurred. An example of a crash footprint
      is:

      Crash Time:       28-OCT-1998 12:47:46.03
      Bugcheck Type:    INVEXCEPTN, Exception while above ASTDEL
      Node:             NODE1 (Clustered)
      CPU Type:         AlphaServer 8400 Model EV56/440
      VMS Version:      V6.2-1H3
      Current Process:  USER_1
      Current Image:    DSA1111:[RUN]GEM.EXE
      Failing PC:       FFFFFFF8026E454
      Failing PS:       34000000 00000804
      Module:           SYS$SHDRIVER
      Offset:           0003E454
      Boot Time:        25-OCT-1998 18:51:50.00

   o  A Virtual Unit can hang and then no further use of the
      virtual unit is possible. If the System Dump Analyzer (SDA)
      is used to examine the virtual unit, then a negative value
      will be found in UCB$W_RWAITCNT.
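
      For the SHAD$MERGE_DELAY_FACTOR logical names described
      earlier in this section, the lookup order and range check can
      be sketched as follows. This is only an illustration of the
      rules as stated in this letter, not the SHADOW_SERVER code:
      the function names, the dictionary standing in for the system
      logical name table, the unpadded unit number in the logical
      name, and the use of 200 when neither logical is defined or a
      value is not numeric are all assumptions.

```python
# Illustrative sketch only; names and undefined-case behavior are assumptions.
DEFAULT_FACTOR = 200   # default stated in this letter
MIN_FACTOR = 100       # threshold = 1 times the best time
MAX_FACTOR = 1000      # threshold = 10 times the best time


def merge_delay_factor(logicals, unit):
    """Pick the merge delay factor for virtual unit DSA<unit>.

    `logicals` is a plain dict standing in for the system logical
    name table (hypothetical; no OpenVMS service is called here).
    """
    names = (
        f"SHAD$MERGE_DELAY_FACTOR_DSA{unit}",  # unit-specific, checked first
        "SHAD$MERGE_DELAY_FACTOR",             # node-wide fallback
    )
    for name in names:
        raw = logicals.get(name)
        if raw is None:
            continue  # not defined, try the next logical name
        try:
            value = int(raw)
        except ValueError:
            return DEFAULT_FACTOR  # assumption: non-numeric counts as invalid
        if MIN_FACTOR <= value <= MAX_FACTOR:
            return value
        return DEFAULT_FACTOR  # outside 100..1000 defaults to 200
    return DEFAULT_FACTOR  # assumption: nothing defined means the default


def threshold_multiplier(factor):
    """Express a factor as a multiple of the best time (1000 means 10x)."""
    return factor / 100
```

      With only SHAD$MERGE_DELAY_FACTOR defined as 300, a call such
      as merge_delay_factor(table, 8400) yields 300; defining
      SHAD$MERGE_DELAY_FACTOR_DSA8400 as well makes that
      unit-specific value take precedence for DSA8400 only.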
   o  Repeating mini merges or full merges can occur immediately
      after the successful completion of a previous mini merge or
      full merge on a virtual unit.

   o  During a system shutdown, two possible scenarios could occur:

      1.  Other nodes that have the system disk virtual unit
          MOUNTed may suspend use of that virtual unit, until the
          node running shutdown is stopped.

      2.  When a system disk that is disabled for write logging is
          mounted on several nodes in a cluster, a non-system disk
          volume access to that virtual unit in the cluster may
          suspend, until the node running shutdown is stopped.

   o  During a system reboot, the rebooting node may intermittently
      hang if write logging is concurrently enabled on the system
      disk and on other nodes in the cluster.

   o  Since a virtual unit can be aborted for several reasons,
      additional tracing is needed to differentiate why the virtual
      units abort.

12 PROBLEMS ADDRESSED IN ALPSHAD07_062 KIT

   o  When shutting down a node in a VMScluster, the system that is
      being used to perform the shutdown will crash.

   o  Shadowsets intermittently hang.

   o  A new informational message has been added that will result
      in a Mount verify message if the IO$_DIAGNOSE function is
      executed by the SHDRIVER.

   o  Additional code changes improve the error log reporting for
      Volume Shadowing.

   o  The Volume Shadowing code in OpenVMS V7.1 (and V6.2 with the
      CLUSIO kit installed) included a new algorithm that did not
      always guarantee that read requests would be serviced by a
      locally connected disk in preference to a disk that was MSCP
      served by another OpenVMS system.

      Prior to V7.1 (and V6.2 with the CLUSIO kit installed), if
      there were local and MSCP served disks to choose from, all
      read requests were always queued to a local disk, unless the
      queue depth on the local member exceeded twenty.
      Some customers, especially those who shadow over FDDI,
      reported that this new algorithm was not preferable, and
      therefore requested the ability to choose the previous
      behavior.

13 PROBLEMS ADDRESSED IN ALPSHAD06_062 KIT

   o  A potential system crash with SHADDETINCON bugcheck at
      SHDRIVER+12124 during boot, from a multiple member shadow
      set. This occurs if the booting member is not the first in
      the member array, and the other member is not yet visible.

   o  SHADDETINCON bugchecks on multiple nodes in cluster during a
      merge operation.

      System crash information
      ------------------------
      Time of system crash: 13-APR-1997 13:21:05.59
      Version of system: OpenVMS (TM) VAX Version V6.2
      System Version Major ID/Minor ID: 1/0
      VAXcluster node: CYV7KE, a VAX 7000-760
      Crash CPU ID/Primary CPU ID: 00/00
      Bitmask of CPUs active/available: 0000003F/0000003F
      CPU 00 reason for Bugcheck: SHADDETINCON, SHADOWING detects
                                  inconsistent state
      Process currently executing on this CPU: None
      Current IPL: 8 (decimal)
      CPU database address: C9212000
      MPB address: B29B09C0

      CPU 00 Processor stack

      General registers:

      R0  = 00000000   R1  = B67D258C   R2  = B67D2180
      R3  = B6544600   R4  = B35992C0   R5  = B624A340
      R6  = B65447C8   R7  = 00000000   R8  = B67D2180
      R9  = B6544730   R10 = 00000000   R11 = B6544600
      AP  = B65446B8   FP  = 7FE2534C   SP  = C9213DAC
      PC  = B82E42B3   PSL = 04080000

      Processor registers:

      P0BR   = C9946800   SBR      = 1EF80400   ASTLVL   = 00000004
      P0LR   = 0000018B   SLR      = 003FFF00   SISR     = 00000010
      P1BR   = C9216400   PCBB     = 7F7B0020   ICCS     = 00000000
      P1LR   = 001FF116   SCBB     = 1EF5F000   SID      = 17000201

      LDEV   = 00018002   LBER     = 00000000   LCNR     = 00000001
      LCON0  = DF0007ED   LCON1    = 00000000   TODR     = 44D09B64
      LBECR0 = 0040003A   LBECR1   = 00008060   LMODE    = 000332A4
      LMERR  = 00000000   BIU_STAT = F00E1070   BIU_ADDR = 00000298
      MMESTS = 10004005   TBSTS    = 800001D0   PCSTS    = FFFFF800

      ISP    = C9213DAC
      KSP    = 7FFE7800
      ESP    = 7FFE9800
      SSP    = 7FFED800
      USP    = 7FE2534C

   o  System crashes in SHADDETINCON SYS$SHDRIVER+3D3C0.
      Bugcheck Type: SHADDETINCON, SHA RBADC2 (Clustered)
      CPU Type:            AlphaServer 2100 4/233
      VMS Version:         V6.2-1H2
      Current Process:     NULL
      Current Image:
      Failing PC:          FFFFFFFF 8025B3C0
      Failing PS:          08000000 00000804
      Module:              SYS$SHDRIVER
      Offset:              0003D3C0

      Boot Time:           15-APR-1997 08:39:31.00
      System Uptime:       5 22:23
      Crash/Primary CPU:   00/00
      Saved Processes:     22
      Pagesize:            8 KByte (8192 bytes)
      Physical Memory:     256 MByte (32768 PFNs)
      Dumpfile Pagelets:   184518 blocks
      Dump Flags:          olddump,writecomp,errlogcomp,dump_style
      EXE$GL_FLAGS:        poolpging,init,bugdump

      Stack Pointers:

      KSP = FFFFFFFF 8A731D88   ESP = FFFFFFFF 8A733000
      SSP = FFFFFFFF 8A72D000   USP = FFFFFFFF 8A72D000

      General Registers

      R0  = 00000000 00000001  R1  = FFFFFFFF 8162F7E0  R2  = FFFFFFFF 8162F7C0
      R3  = FFFFFFFF 8186EBC0  R4  = 00000000 00000003  R5  = FFFFFFFF 8162F890
      R6  = FFFFFFFF 8186EE80  R7  = 00000000 00000000  R8  = FFFFFFFF 8162F7C0
      R9  = FFFFFFFF 8186EDE8  R10 = 00000000 00000000  R11 = FFFFFFFF 8186EBC0
      R12 = FFFFFFFF 8186ED38  R13 = FFFFFFFF 8710A270  R14 = FFFFFFFF 87084200
      R15 = 00000000 003C60E0  R16 = 00000000 000008B4  R17 = 00000000 00000501
      R18 = 00000000 00000000  R19 = FFFFFFFF 87084200  R20 = 00000000 00000000
      R21 = FFFFFFFF 8162F808  R22 = FFFFFFFF 8710FB20  R23 = 00000000 00000000
      R24 = 00000000 00000001  AI  = 00000000 00000001  RA  = FFFFFFFF 80288928
      PV  = FFFFFFFF 8710A698  R28 = 00000000 00000000  FP  = FFFFFFFF 8A731DE0
      PC  = FFFFFFFF 8025B3C4  PS  = 08000000 00000804

      System Registers:

      Page Table Base Register (PTBR)              00000000 00007FF8
      Processor Base Register (PRBR)               FFFFFFFF 8110A000
      Privileged Context Block Base (PCBB)         00000000 0110A080
      System Control Block Base (SCBB)             00000000 000001B3
      Software Interrupt Summary Register (SISR)   00000000 00000000
      Address Space Number (ASN)                   00000000 00000000
      AST Summary / AST Enable (ASTSR_ASTEN)       00000000 00000000
      Floating-Point Enable (FEN)                  00000000 00000000
      Interrupt Priority Level (IPL)               00000000 00000008
      Machine Check Error Summary (MCES)           00000000 00000000
      Virtual Page Table Base Register (VPTB)      00000002 00000000

      Failing Instruction:

      SYS$SHDRIVER_NPRO+393C0:  BUGCHK

      Instruction Stream (last 20 instructions):

      SYS$SHDRIVER_NPRO+39370:  RET     R31,(R28)
      SYS$SHDRIVER_NPRO+39374:  LDQ_U   R31,(SP)
      SYS$SHDRIVER_NPRO+39378:  SUBQ    SP,#X10,SP
      SYS$SHDRIVER_NPRO+3937C:  STQ     R16,#X0008(SP)
      SYS$SHDRIVER_NPRO+39380:  STQ     R17,(SP)
      SYS$SHDRIVER_NPRO+39384:  LDQ     R17,#XF8E0(R13)
      SYS$SHDRIVER_NPRO+39388:  BIS     R17,#X04,R17
      SYS$SHDRIVER_NPRO+3938C:  BIS     R31,R17,R16
      SYS$SHDRIVER_NPRO+39390:  LDQ     R17,(SP)
      SYS$SHDRIVER_NPRO+39394:  ADDQ    SP,#X08,SP
      SYS$SHDRIVER_NPRO+39398:  BUGCHK
      SYS$SHDRIVER_NPRO+3939C:  HALT
      SYS$SHDRIVER_NPRO+393A0:  SUBQ    SP,#X10,SP
      SYS$SHDRIVER_NPRO+393A4:  STQ     R16,#X0008(SP)
      SYS$SHDRIVER_NPRO+393A8:  STQ     R17,(SP)
      SYS$SHDRIVER_NPRO+393AC:  LDQ     R17,#XF8E0(R13)
      SYS$SHDRIVER_NPRO+393B0:  BIS     R17,#X04,R17
      SYS$SHDRIVER_NPRO+393B4:  BIS     R31,R17,R16
      SYS$SHDRIVER_NPRO+393B8:  LDQ     R17,(SP)
      SYS$SHDRIVER_NPRO+393BC:  ADDQ    SP,#X08,SP
      SYS$SHDRIVER_NPRO+393C0:  BUGCHK
      SYS$SHDRIVER_NPRO+393C4:  HALT
      SYS$SHDRIVER_NPRO+393C8:  BIS     R31,R31,R31
      SYS$SHDRIVER_NPRO+393CC:  BIS     R31,R31,R31
      SYS$SHDRIVER_NPRO+393D0:  SUBQ    SP,#X50,SP

   o  The Volume Shadowing software that shipped in OpenVMS Alpha
      and VAX V7.1 and in the CLUSIO remedial kits requires
      additional non-paged pool to improve synchronization.
      Customers should take this into account when they are tuning
      their systems, and be aware that Volume Shadowing is now more
      sensitive to resource problems, with the possibility that
      systems may crash if non-paged pool is exhausted. Shadowing
      uses approximately 800 bytes of additional non-paged pool per
      concurrent I/O to the virtual unit.

      This remedial kit includes code that avoids system crashes if
      a system exhausts non-paged pool. Please be aware that there
      are still cases in which non-paged pool exhaustion will
      result in a SHADDETINCON bugcheck. This modification reduces
      the probability of such crashes but does not completely
      eliminate them.
   o  During internal testing, a system crash indicated that I/Os
      were left outstanding in DUDRIVER after a virtual unit had
      been removed.

   o  There was a missing index on a check for member valid in the
      BBR_READ_RECOVERY routine.

   o  There was an "infinite" loop condition at SHCP$START_QUED,
      and the code has been modified so that the persistent thread
      will be "killed" if the VU it was spawned for fails.

   o  This remedial kit includes additional error logging
      capabilities to collect additional information when a virtual
      unit is made available.

      The new LOG_IT macro code has the following input parameters:

      o  R0 - value of P4
      o  R1 - value of P5
      o  R2 - address of LW in SHAD containing P6
      o  R3 - VU UCB
      o  R5 - SHAD IRP address with:

         -  CDRP$L_BCNT  = P1
         -  CDRP$L_MEDIA = P2
         -  CDRP$L_PID   = P3

      The implementation makes use of the following cells in the
      errorlog record.

      o  EMB$W_SP_BOFF   - set to %xBADE as TAG
      o  EMB$W_SP_FUNC   - reason code
      o  EMB$L_SP_BCNT   - LW for information
      o  EMB$L_SP_MEDIA  - LW for information
      o  EMB$L_SP_RQPID  - LW for information
      o  EMB$Q_SP_IOSB   - 2 LW for information
      o  EMB$L_SP_CMDREF - LW for information

   o  Process intermittently hangs during dismount of a shadow set
      while waiting for completion of the QIOW in DO_IO routine.
   o  KRNLSTAKNV halt during MOUNT/CLUSTER DSAx:

      Bugcheck Type:       CPUSANITY, CPU sanity timer expired
      Node:                AI84 (Clustered)
      CPU Type:            AlphaServer 8400 Model EV56/440
      VMS Version:         V6.2-1H3
      Current Process:     PM2SKZ
      Current Image:       DSA40:[ZENT410.][EXE]BUS.EXE
      Failing PC:          FFFFFFFF 8001F8D0
      Failing PS:          18000000 00001604
      Module:              SYSTEM_PRIMITIVES_MIN
      Offset:              0000B8D0

      Boot Time:           26-JUN-1997 08:34:37.00
      System Uptime:       1 00:46:34.07
      Crash/Primary CPU:   01/00
      Saved Processes:     26
      Pagesize:            8 KByte (8192 bytes)
      Physical Memory:     2048 MByte (262144 PFNs)
      Dumpfile Pagelets:   999974 blocks
      Dump Flags:          writecomp,errlogcomp,dump_style
      EXE$GL_FLAGS:        poolpging,init,bugdump,pgflfrag

      Stack Pointers:

      KSP = 00000000 7FF91C98   ESP = 00000000 7FF96000
      SSP = 00000000 7FF9C100   USP = 00000000 7EDE4030

      General Registers:

      R0  = 00000000 00000000  R1  = FFFFFFFF 814EA180  R2  = FFFFFFFF 81410000
      R3  = FFFFFFFF 9DE268F8  R4  = 00000000 0000012C  R5  = 00000000 7FF91D40
      R6  = 00000000 7FF445A0  R7  = 08000000 00000200  R8  = FFFFFFFF F7710250
      R9  = 00000000 00000030  R10 = 00000000 00000031  R11 = 00000000 00000001
      R12 = 00000000 00008001  R13 = FFFFFFFF 9DE268F8  R14 = FFFFFFFF 9DE25640
      R15 = FFFFFFFF 9DE04200  R16 = 00000000 00000774  R17 = 00000000 7FF91C38
      R18 = FFFFFFFF 9DE32CE0  R19 = FFFFFFFF 9DE04200  R20 = 00000000 00000000
      R21 = 00000000 272007F0  R22 = FFFFFFFF 9DE04200  R23 = 00000000 00000000
      R24 = FFFFFFFF 9DE04AC0  AI  = 00000000 00000000  RA  = FFFFFFFF 00000000
      PV  = FFFFFFFF FFFFFFFF  R28 = FFFFFFFF 8001F83C  FP  = 00000000 7FF91E10
      PC  = FFFFFFFF 8001F8D4  PS  = 18000000 00001604

      Failing Instruction:

      EXE$HWCLKINT_C+00510:  BUGCHK

   o  The system crashes when a second node attempts to boot a
      system disk shadow set with two members. The following
      SHADDETINCON bugcheck at SHDRIVER+12124 or
      SYS$SHDRIVER_NPRO+449B4 occurs:

      SHADDETINCON, SHADOWING detects inconsistent state

   o  The mount of a shadow set fails.
      The failure report says that the set is already mounted or
      that there is a duplicate unit number.

14 KIT INSTALLATION RATING:

   The following kit installation rating, based upon current CLD
   information, is provided to serve as a guide to which customers
   should apply this remedial kit. (Reference attached Disclaimer
   of Warranty and Limitation of Liability Statement)

   INSTALLATION RATING:

   INSTALL_1 : To be installed by all customers.

15 INSTALLATION INSTRUCTIONS:

   Install this kit with the VMSINSTAL utility by logging into the
   SYSTEM account, and typing the following at the DCL prompt:

   @SYS$UPDATE:VMSINSTAL ALPSHAD14_062 [location of the saveset]

   The saveset location may be a tape drive, CD, or a disk
   directory that contains the kit saveset.

   This kit requires a system reboot. Compaq strongly recommends
   that a reboot be performed immediately after kit installation
   to avoid system instability.

   If you have other nodes in your OpenVMS cluster, they must also
   be rebooted in order to make use of the new image(s). If it is
   not possible or convenient to reboot the entire cluster at this
   time, a rolling reboot may be performed.

Copyright (c) Compaq Computer Corporation, 2002 All Rights
Reserved. Unpublished rights reserved under the copyright laws of
the United States.

COMPAQ, the Compaq logo, VAX, Alpha, VMS, and OpenVMS are
registered in the U.S. Patent and Trademark Office.

All other product names mentioned herein may be trademarks of
their respective companies.

Confidential computer software. Valid license from Compaq required
for possession, use, or copying. Consistent with FAR 12.211 and
12.212, Commercial Computer Software, Computer Software
Documentation, and Technical Data for Commercial Items are
licensed to the U.S. Government under vendor's standard commercial
license.

Compaq shall not be liable for technical or editorial errors or
omissions contained herein.
The information in this document is provided as is without
warranty of any kind and is subject to change without notice. The
warranties for Compaq products are set forth in the express
limited warranty statements accompanying such products. Nothing
herein should be construed as constituting an additional warranty.

DISCLAIMER OF WARRANTY AND LIMITATION OF LIABILITY

THIS PATCH IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND. ALL
EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR
PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY EXCLUDED TO
THE EXTENT PERMITTED BY APPLICABLE LAW. IN NO EVENT WILL COMPAQ BE
LIABLE FOR ANY LOST REVENUE OR PROFIT, OR FOR SPECIAL, INDIRECT,
CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND
REGARDLESS OF THE THEORY OF LIABILITY, WITH RESPECT TO ANY PATCH
MADE AVAILABLE HERE OR TO THE USE OF SUCH PATCH.