**************************** ECO SUMMARY INFORMATION ****************************

Release Date:                19-NOV-2003
Kit Name:                    DEC-AXPVMS-VMS722_SYS-V0200--4.PCSI
Kit Applies To:              OpenVMS ALPHA V7.2-2
Approximate Kit Size:        11728 blocks
Installation Rating:         INSTALL_1
Reboot Required:             Yes - rolling reboot
Superseded Kits:             VMS722_SYS-V0100
Mandatory Kit Dependencies:  VMS722_UPDATE-V0100
                             VMS722_PCSI-V0100
Optional Kit Dependencies:   None

VMS722_SYS-V0200.PCSI-DCX_AXPEXE Checksum: 4230116653

=======================================================================
Hewlett-Packard OpenVMS ECO Cover Letter
=======================================================================

ECO NUMBER:      VMS722_SYS-V0200
PRODUCT:         OpenVMS Alpha OPERATING SYSTEM V7.2-2
UPDATE PRODUCT:  OpenVMS Alpha OPERATING SYSTEM V7.2-2

1 KIT NAME:

VMS722_SYS-V0200

2 KIT DESCRIPTION:

2.1 Installation Rating:

INSTALL_1 : To be installed by all customers.

This installation rating, based upon current CLD information, is
provided to serve as a guide to which customers should apply this
remedial kit. (Reference attached Disclaimer of Warranty and
Limitation of Liability Statement)

2.2 Reboot Requirement:

Reboot Required.

HP strongly recommends that a reboot is performed immediately after
kit installation to avoid system instability.

If you have other nodes in your OpenVMS cluster, they must also be
rebooted in order to make use of the new image(s). If it is not
possible or convenient to reboot the entire cluster at this time, a
rolling re-boot may be performed.

2.3 Version(s) of OpenVMS to which this kit may be applied:

OpenVMS Alpha V7.2-2

2.4 New functionality or new hardware support provided:

No.

3 KITS SUPERSEDED BY THIS KIT:

  - VMS722_SYS-V0100

4 KIT DEPENDENCIES:

4.1 The following remedial kit(s), or later, must be installed BEFORE
    installation of this, or any required kit:

  - VMS722_PCSI-V0100
  - VMS722_UPDATE-V0100

4.2 In order to receive all the corrections listed in this kit, the
    following remedial kits, or later, should also be installed:

  - None

5 FILES PATCHED OR REPLACED:

o [SYS$LDR]EXCEPTION.EXE (new image)

  Image Identification Information

  image name: "EXCEPTION"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:37:31.87
  linker identification: "A11-39"
  Overall Image Checksum: 1659640710

o [SYS$LDR]EXCEPTION_MON.EXE (new image)

  Image Identification Information

  image name: "EXCEPTION_MON"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:41:16.82
  linker identification: "A11-39"
  Overall Image Checksum: 2212135261

o [SYS$LDR]IO_ROUTINES.EXE (new image)

  Image Identification Information

  image name: "IO_ROUTINES"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:45:38.61
  linker identification: "A11-39"
  Overall Image Checksum: 3274151755

o [SYS$LDR]IO_ROUTINES_MON.EXE (new image)

  Image Identification Information

  image name: "IO_ROUTINES_MON"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:46:56.46
  linker identification: "A11-39"
  Overall Image Checksum: 3186607416

o [SYS$LDR]LOCKING.EXE (new image)

  Image Identification Information

  image name: "LOCKING"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:47:58.34
  linker identification: "A11-39"
  Overall Image Checksum: 2964601932

o [SYS$LDR]LOGICAL_NAMES.EXE (new image)

  Image Identification Information

  image name: "LOGICAL_NAMES"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:48:29.20
  linker identification: "A11-39"
  Overall Image Checksum: 2768611774

o [SYS$LDR]MESSAGE_ROUTINES.EXE (new image)

  Image Identification Information

  image name: "MESSAGE_ROUTINES"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:34:52.12
  linker identification: "A11-39"
  Overall Image Checksum: 3825457226

o [SYS$LDR]PROCESS_MANAGEMENT.EXE (new image)

  Image Identification Information

  image name: "PROCESS_MANAGEMENT"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:40:34.72
  linker identification: "A11-39"
  Overall Image Checksum: 2065535560

o [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE (new image)

  Image Identification Information

  image name: "PROCESS_MANAGEMENT_MON"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:44:22.30
  linker identification: "A11-39"
  Overall Image Checksum: 3057281551

o [SYS$LDR]SECURITY.EXE (new image)

  Image Identification Information

  image name: "SECURITY"
  image file identification: "X-5"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:46:14.46
  linker identification: "A11-39"
  Overall Image Checksum: 803836901

o [SYS$LDR]SECURITY_MON.EXE (new image)

  Image Identification Information

  image name: "SECURITY_MON"
  image file identification: "X-5"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:46:35.90
  linker identification: "A11-39"
  Overall Image Checksum: 3364535278

o [SYS$LDR]SYS$VCC.EXE (new image)

  Image Identification Information

  image name: "SYS$VCC"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:47:33.06
  linker identification: "A11-39"
  Overall Image Checksum: 3493320729

o [SYS$LDR]SYS$VCC_MON.EXE (new image)

  Image Identification Information

  image name: "SYS$VCC_MON"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:47:48.60
  linker identification: "A11-39"
  Overall Image Checksum: 3641700056

o [SYS$LDR]SYS$VM.EXE (new image)

  Image Identification Information

  image name: "SYS$VM"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:48:37.43
  linker identification: "A11-39"
  Overall Image Checksum: 2010202021

o [SYS$LDR]SYSGETSYI.EXE (new image)

  Image Identification Information

  image name: "SYSGETSYI"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:34:30.25
  linker identification: "A11-39"
  Overall Image Checksum: 2480485375

o [SYS$LDR]SYSLDR_DYN.EXE (new image)

  Image Identification Information

  image name: "SYSLDR_DYN"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:37:35.28
  linker identification: "A11-39"
  Overall Image Checksum: 2902181902

o [SYS$LDR]SYSTEM_PRIMITIVES.EXE (new image)

  Image Identification Information

  image name: "SYSTEM_PRIMITIVES"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:44:26.38
  linker identification: "A11-39"
  Overall Image Checksum: 2936417312

o [SYS$LDR]SYSTEM_PRIMITIVES_MIN.EXE (new image)

  Image Identification Information

  image name: "SYSTEM_PRIMITIVES_MIN"
  image file identification: "X-3"
  image file build identification: "X71Z-0050170034"
  link date/time: 16-OCT-2003 21:42:06.38
  linker identification: "A11-39"
  Overall Image Checksum: 939553816

o [SYS$LDR]EXCEPTION.STB (new file)
o [SYS$LDR]EXCEPTION_MON.STB (new file)
o [SYS$LDR]IO_ROUTINES.STB (new file)
o [SYS$LDR]IO_ROUTINES_MON.STB (new file)
o [SYS$LDR]LOCKING.STB (new file)
o [SYS$LDR]LOGICAL_NAMES.STB (new file)
o [SYS$LDR]PROCESS_MANAGEMENT.STB (new file)
o [SYS$LDR]PROCESS_MANAGEMENT_MON.STB (new file)
o [SYS$LDR]SECURITY.STB (new file)
o [SYS$LDR]SECURITY_MON.STB (new file)
o [SYS$LDR]SYS$VCC.STB (new file)
o [SYS$LDR]SYS$VCC_MON.STB (new file)
o [SYS$LDR]SYS$VM.STB (new file)
o [SYS$LDR]SYSTEM_PRIMITIVES.STB (new file)
o [SYS$LDR]SYSTEM_PRIMITIVES_MIN.STB (new file)

6 PROBLEMS ADDRESSED IN THIS KIT

6.1 New problems addressed in the VMS722_SYS-V0200 kit

6.1.1 This is a mandatory patch that corrects a serious security
      problem in OpenVMS.

6.1.1.1 Problem Description:

This is a mandatory patch that corrects a serious security problem in
OpenVMS.

Images Affected:

  - [SYS$LDR]IO_ROUTINES.EXE
  - [SYS$LDR]IO_ROUTINES.STB
  - [SYS$LDR]IO_ROUTINES_MON.EXE
  - [SYS$LDR]IO_ROUTINES_MON.STB

6.1.1.2 CLDs, and QARs reporting this problem:

6.1.1.3 CLD(s)

None.

6.1.1.4 QAR(s)

None.

6.1.1.5 Problem Analysis:

This is a mandatory patch that corrects a serious security problem in
OpenVMS.

6.1.1.6 Work-arounds:

None.

6.1.2 F$GETSYI('RAD_MAX_RADS',node) fails to get the information from
      the specified node.

6.1.2.1 Problem Description:

F$GETSYI('RAD_MAX_RADS',node) fails to get the information from the
specified node. Instead, the command returns information from the
current node the command is executing on.

Note that with OpenVMS versions before V7.3-1, RAD_MAX_RADS will
return 1 if there is no RAD support, and will return 8 no matter how
many QBBs are physically present. This limitation was documented in
the V7.2-1H1R New Features and Release Notes, June 2000, section
3.5.6.1.

Images Affected:

  - [SYS$LDR]SYSGETSYI.EXE

6.1.2.2 CLDs, and QARs reporting this problem:

6.1.2.3 CLD(s)

CFS.89598

6.1.2.4 QAR(s)

None.

6.1.2.5 Problem Analysis:

To correct this problem, the code will now check to see if the
F$GETSYI command specifies a remote node. If so, it will get the
information from that node rather than returning information from the
current node.

6.1.2.6 Work-arounds:

None.

6.1.3 System can crash with a FILCNTNONZ bugcheck

6.1.3.1 Problem Description:

The system can crash with a FILCNTNONZ bugcheck.

  CrashDump Summary
  -----------------
  Bugcheck Type:  FILCNTNONZ, Open file count nonzero after
                  process rundown
  Current Image:
  Failing PC:     FFFFFFFF.D269A360  PROCESS_MANAGEMENT+3C360
  Failing PS:     18000000.00000000
  Module:         PROCESS_MANAGEMENT
                  (Link Date/Time: 18-OCT-2000 07:01:30.44)
  Offset:         0003C360

Images Affected:

  - [SYS$LDR]IO_ROUTINES.EXE
  - [SYS$LDR]IO_ROUTINES.STB
  - [SYS$LDR]IO_ROUTINES_MON.EXE
  - [SYS$LDR]IO_ROUTINES_MON.STB

6.1.3.2 CLDs, and QARs reporting this problem:

6.1.3.3 CLD(s)

CFS.84385

6.1.3.4 QAR(s)

None.

6.1.3.5 Problem Analysis:

This fix will simply detect the problem, recompute the channel number
and repair the CCB, thus allowing the channel to be closed.

6.1.3.6 Work-arounds:

None.

6.1.4 "INCONSTATE, Inconsistent I/O data base" bugcheck

6.1.4.1 Problem Description:

The system can crash with an "INCONSTATE, Inconsistent I/O data base"
bugcheck.

  Crashdump Summary Information
  -----------------------------
  Bugcheck Type:  INCONSTATE, Inconsistent I/O data base
  Current Image:  $1$DUA42:[SYS3.SYSCOMMON.][SYSEXE]SMPUTIL.EXE
  Failing PC:     FFFFFFFF.800BC83C  EXE$FP_ASSIGN_PORT_C+0021C
  Failing PS:     20000000.00000803
  Module:         IO_ROUTINES
                  (Link Date/Time: 22-MAR-2001 00:58:18.18)
  Offset:         0000883C

Images Affected:

  - [SYS$LDR]SYSTEM_PRIMITIVES.EXE
  - [SYS$LDR]SYSTEM_PRIMITIVES.STB
  - [SYS$LDR]SYSTEM_PRIMITIVES_MIN.EXE
  - [SYS$LDR]SYSTEM_PRIMITIVES_MIN.STB

6.1.4.2 CLDs, and QARs reporting this problem:

6.1.4.3 CLD(s)

CFS.87977

6.1.4.4 QAR(s)

None.

6.1.4.5 Problem Analysis:

Several CBB support routines use quadword updates to modify internal
bitmasks. When bit 31 is specified in exe$cbb_insert_bitmask and
exe$cbb_clear_bit, an unexpected sign extension occurs. This leaves
extra bits set beyond the calculated end of the CBB structure. Those
extra bits are incorrectly counted and incorporated into the
structure's count cell.
External code that relies on the accuracy of that count cell may not
work properly.

6.1.4.6 Work-arounds:

None.

6.1.5 System can crash with a CWLNMERR bugcheck

6.1.5.1 Problem Description:

The system can crash with a CWLNMERR bugcheck. The failing PC is in
the LOGICAL_NAMES executive image.

Images Affected:

  - [SYS$LDR]LOGICAL_NAMES.EXE
  - [SYS$LDR]LOGICAL_NAMES.STB

6.1.5.2 CLDs, and QARs reporting this problem:

6.1.5.3 CLD(s)

None.

6.1.5.4 QAR(s)

75-13-871

6.1.5.5 Problem Analysis:

To do a clusterwide logical name operation, $ENQW is requested to lock
the cwlogical name resource. Any error is totally unexpected and
inexplicable, so the service crashes in response. A process in a job
with no ENQCNT quota can cause an $ENQW failure, causing the crash.

6.1.5.6 Work-arounds:

None.

6.1.6 Program using C signals can be aborted with the condition code
      set to the C signal value.

6.1.6.1 Problem Description:

A program using C signals can be aborted with the condition code set
to the C signal value.

Images Affected:

  - [SYS$LDR]EXCEPTION.EXE
  - [SYS$LDR]EXCEPTION.STB
  - [SYS$LDR]EXCEPTION_MON.EXE
  - [SYS$LDR]EXCEPTION_MON.STB

6.1.6.2 CLDs, and QARs reporting this problem:

6.1.6.3 CLD(s)

70-3-5990,70-17-64,CFS.92523

6.1.6.4 QAR(s)

None.

6.1.6.5 Problem Analysis:

The code in OpenVMS condition handling that searches for a handler
(srchandler) checks for exceptions within the dispatcher
(sys$call_handl). It filters out signals, such as SS$_DEBUG,
SS$_IMGDMP, and SS$_BREAK. It was not filtering C signals. When it
encounters a C signal, srchandler branches to the badhandler code,
which is fatal to the program.

6.1.6.6 Work-arounds:

None.

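The sign-extension effect described in the 6.1.4.5 analysis can be
illustrated with a small Python model. It is only a sketch: the helper
names are hypothetical, the real routines (exe$cbb_insert_bitmask and
exe$cbb_clear_bit) are not reproduced, and the model assumes the mask
for a bit is built as a signed longword and then widened to a quadword.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sign_extend_32_to_64(value):
    """Widen a 32-bit value to 64 bits, treating bit 31 as the sign bit."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def set_bit_sign_extended(quadword, bit):
    """Hypothetical faulty update: the longword mask is sign-extended."""
    return (quadword | sign_extend_32_to_64(1 << bit)) & MASK64

def set_bit_zero_extended(quadword, bit):
    """Hypothetical corrected update: the longword mask is zero-extended."""
    return (quadword | ((1 << bit) & 0xFFFFFFFF)) & MASK64

def count_bits(quadword):
    """Count the bits set in the quadword, as a count cell would."""
    return bin(quadword).count("1")
```

In this model both updates agree for bit 30. For bit 31 the
sign-extended update also sets bits 32 through 63, so a count of set
bits is inflated by 32.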
6.1.7 System can crash with an SSRVEXCEPT, Unexpected system service
      exception bugcheck at EXE$IO_PERFORM_C+00650

6.1.7.1 Problem Description:

The system can crash with an SSRVEXCEPT, Unexpected system service
exception bugcheck at EXE$IO_PERFORM_C+00650.

  Crashdump Summary Information:
  ------------------------------
  Bugcheck Type:    SSRVEXCEPT, Unexpected system service exception
  Current Process:  ORA_P10111B1238
  Current Image:    $1$DGA92:[A_ORACLE.ORACLEV7336.RDBMS]SRV.EXE
  Failing PC:       FFFFFFFF.800CBEB0  EXE$IO_PERFORM_C+00650
  Failing PS:       30000000.00000203
  Module:           IO_ROUTINES
                    (Link Date/Time: 5-NOV-2001 11:49:25.01)
  Offset:           00017EB0

  Failing Instruction:
  EXE$IO_PERFORM_C+00650:  LDQ  R18,#X0108(R18)

Images Affected:

  - [SYS$LDR]IO_ROUTINES.EXE
  - [SYS$LDR]IO_ROUTINES.STB
  - [SYS$LDR]IO_ROUTINES_MON.EXE
  - [SYS$LDR]IO_ROUTINES_MON.STB

6.1.7.2 CLDs, and QARs reporting this problem:

6.1.7.3 CLD(s)

70-3-5965,CFS.92424

6.1.7.4 QAR(s)

None.

6.1.7.5 Problem Analysis:

A local variable was not initialized, resulting in the crash. The
correction for this problem is to fix the code path to store the
correct information in this local variable.

6.1.7.6 Work-arounds:

None.

6.1.8 User-written device driver can result in an SSRVEXCEPT bugcheck
      in MMG_STD$IOLOCK_BUF

6.1.8.1 Problem Description:

An SSRVEXCEPT bugcheck can occur in MMG_STD$IOLOCK_BUF if it must
fault in the affected page.

Images Affected:

  - [SYS$LDR]IO_ROUTINES.EXE
  - [SYS$LDR]IO_ROUTINES_MON.EXE
  - [SYS$LDR]IO_ROUTINES.STB
  - [SYS$LDR]IO_ROUTINES_MON.STB

6.1.8.2 CLDs, and QARs reporting this problem:

6.1.8.3 CLD(s)

CFS.92197

6.1.8.4 QAR(s)

None.

6.1.8.5 Problem Analysis:

Section 2.2.1 of the OpenVMS Alpha Guide to Upgrading Privileged Code
Applications describes how device drivers can lock down multiple I/O
buffers using IRPEs with the EXE_STD$READLOCK, WRITELOCK, or
MODIFYLOCK routines. Unfortunately, when an IRPE was passed in and the
buffer required a page fault, this code used the IRP$PS_FDT_CONTEXT
offset on the IRPE, which is only valid on an IRP.

The correction for this problem is to delay the use of the
IRP$PS_FDT_CONTEXT cell until after the original IRP pointer has been
restored by a call to the error call back routine.

6.1.8.6 Work-arounds:

None.

6.1.9 Process may hang waiting for the completion of an RMS $FLUSH
      operation.

6.1.9.1 Problem Description:

If there is a lot of cluster-wide $GETJPI activity, a process may hang
waiting for the completion of an RMS $FLUSH operation. The I/O request
resulting from this may wait forever because the IOSB is not filled
in.

Images Affected:

  - [SYS$LDR]PROCESS_MANAGEMENT.EXE
  - [SYS$LDR]PROCESS_MANAGEMENT.STB
  - [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
  - [SYS$LDR]PROCESS_MANAGEMENT_MON.STB

6.1.9.2 CLDs, and QARs reporting this problem:

6.1.9.3 CLD(s)

CFS.83753,CFS.92041,CFS.92830

6.1.9.4 QAR(s)

None.

6.1.9.5 Problem Analysis:

The problem is highly dependent on timing. In the final stages of
clusterwide $GETJPI processing an AST may become active, delivering
the result of remote $GETJPI operations. At that time, it may also be
necessary to do some more local $GETJPI processing. For this to
function properly, the previous mode in the PSL must be set to the
mode of the thread invoking the original system service. This is to
allow buffer probing to work correctly.

What may happen, however, is that before we exit, another kernel AST
is queued to our process. This AST is the result of RMS modifying a
file, asking the XQP to do the work, and the XQP completing the I/O.
Before the AST delivery code dismisses the AST it will notice that
another kernel mode AST is pending, and allow that AST to run. That
AST then executes code in I/O post-processing that fills in the final
IOSB. Now, because RMS has an IOSB in its private space, protected as
EW, and because the previous mode is user due to the $GETJPI
processing, the probe checking the accessibility of the IOSB fails,
leaving the IOSB untouched and RMS waiting forever.

6.1.9.6 Work-arounds:

None.

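The probe failure described in 6.1.9.5 can be modelled in a few lines
of Python. This is a simplified sketch and not OpenVMS code: the
function names and the dictionary standing in for the IOSB are
assumptions made for illustration. It assumes that a page protected as
EW is writable from executive and kernel mode only, and that the probe
checks the previous mode recorded in the PSL.

```python
# Access modes, most privileged first (encoding used by this sketch).
KERNEL, EXEC, SUPER, USER = 0, 1, 2, 3

def probe_write(previous_mode, least_privileged_writer):
    """Return True if a write from previous_mode would be permitted."""
    return previous_mode <= least_privileged_writer

def post_iosb(previous_mode, iosb, status):
    """Fill in the IOSB only if the probe succeeds.

    Returns False when the probe fails; the IOSB is then left untouched,
    so a waiter polling it never sees the completion status.
    """
    if not probe_write(previous_mode, iosb["writable_from"]):
        return False
    iosb["status"] = status
    return True

# An IOSB in RMS private space, writable from executive mode inward.
rms_iosb = {"writable_from": EXEC, "status": None}
```

With the previous mode left at USER, post_iosb(USER, rms_iosb, 1)
returns False and the status stays unset, which corresponds to the
hang. With the previous mode at EXEC the same call fills in the
status.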
6.1.10 System that supports NUMA and has RAD_SUPPORT enabled can crash
       with an INVEXCEPTN bugcheck

6.1.10.1 Problem Description:

A system that supports NUMA and has RAD_SUPPORT enabled can crash with
an INVEXCEPTN bugcheck when the swapper is the current process. The
failing PC is at or around MMG$WRTMFYPAG_C+006E4.

Images Affected:

  - [SYS$LDR]SYS$VM.EXE
  - [SYS$LDR]SYS$VM.STB

6.1.10.2 CLDs, and QARs reporting this problem:

6.1.10.3 CLD(s)

None.

6.1.10.4 QAR(s)

75-66-1366

6.1.10.5 Problem Analysis:

The size of the GBLSEC_RADS array failed to take into account that
more global sections can be created than indicated by the GBLSECTIONS
SYSGEN parameter.

The size of the GBLSEC_RADS array is now calculated based on the
difference between the end of the global section table and the last
WSL entry for the system PHD. This calculation is larger than the
value found in the GBLSECTIONS SYSGEN parameter and allows enough
array entries for all possible global sections.

6.1.10.6 Work-arounds:

None.

6.1.11 Lookup of a device may fail

6.1.11.1 Problem Description:

Under some circumstances lookup of a device may fail. If the system
doing the lookup has a zero allocation class specified, and attempts
to access an MSCP-served disk by using the NODE$DUA form, the access
may fail because the device will not be found. This can be seen in the
following SET VOLUME/REBUILD DCL example (SHOW DEVICE fields have been
edited for space reasons).

  $ SHOW DEVICE $8$DUA10
  Device                 Device          Error   Volume
   Name                  Status          Count    Label
  $8$DUA10:   (UTRAMP)   Mounted alloc       0   TEST1
  $ WRITE SYS$OUTPUT F$GETDVI("$8$DUA10:","EXISTS")
  TRUE
  $
  $ WRITE SYS$OUTPUT F$GETDVI("$8$DUA10:","ROOTDEVNAM")
  _UTRAMP$DUA10:
  $
  $ WRITE SYS$OUTPUT F$GETDVI("$8$DUA10:","DEVNAM")
  _UTRAMP$DUA10:
  $
  $ WRITE SYS$OUTPUT F$GETDVI("_utramp$dua10:","EXISTS")
  FALSE
  $
  $ SET VOLUME/REBUILD $8$DUA10:
  %SET-E-NOTSET, error modifying _UTRAMP$DUA10:
  -RMS-F-DEV, error in device name or inappropriate device type for
  operation
  %SET-E-NOTSET, error modifying _UTRAMP$DUA10:
  -SYSTEM-W-NOSUCHDEV, no such device available

A lookup via the runtime library function LIB$FIND_IMAGE_SYMBOL will
also show the problem.

Images Affected:

  - [SYS$LDR]IO_ROUTINES.EXE
  - [SYS$LDR]IO_ROUTINES_MON.EXE
  - [SYS$LDR]IO_ROUTINES.STB
  - [SYS$LDR]IO_ROUTINES_MON.STB

6.1.11.2 CLDs, and QARs reporting this problem:

6.1.11.3 CLD(s)

CFS.90852

6.1.11.4 QAR(s)

None.

6.1.11.5 Problem Analysis:

IOC_STD$SEARCHINT does not look any further if it encounters a DDB
without any units attached to it. Also, SYS$GETDVI's item codes
DVI$_ROOTDEVNAM and DVI$_NEXTDEVNAM return DVI$_DEVNAM, which should
be DVI$_FULLDEVNAM.

6.1.11.6 Work-arounds:

None.

6.1.12 Starting up Pathworks Advanced Server may fail with an
       SS$_BADPARAM error.

6.1.12.1 Problem Description:

PWRK$STREAMSOS_V7.EXE is a big file, about 1760 blocks in size. If
this file has become very fragmented after installation, then Advanced
Server startup may fail with an SS$_BADPARAM error.

This can occur if PWRK$STREAMSOS_V7.EXE is heavily fragmented and has
multiple file headers (the following example has been modified for
space reasons):

  $ @sys$startup:pwrk$startup
  The file server will use DECnet, TCP/IP.
  Advanced Server mail notification will use DECnet.
  %SYSTEM-F-BADPARAM, bad parameter value
  %TRACE-F-TRACEBACK, symbolic stack dump follows
  image                module  routine  line
  PWRK$LOADSTREAMS_V7  LOAD    main     7822
    rel PC            abs PC
    000000000000213C  000000000003213C
  image                module  routine  line
  PWRK$LOADSTREAMS_V7  LOAD    __main   0
    rel PC            abs PC
    0000000000000070  0000000000030070
  image                module  routine  line
                                        0
    rel PC            abs PC
    FFFFFFFF802653B4  FFFFFFFF802653B4

Images Affected:

  - [SYS$LDR]SYSLDR_DYN.EXE
  - [SYS$LDR]SYSLDR_DYN.STB

6.1.12.2 CLDs, and QARs reporting this problem:

6.1.12.3 CLD(s)

None.

6.1.12.4 QAR(s)

75-13-893

6.1.12.5 Problem Analysis:

The system module is loaded via a call to LDR$LOAD_IMAGE, which opens
the file in question with cathedral windows so that all mapping
pointers are available. If the file is loaded after the system startup
has completed, then the XQP is used so that a multi-header file should
not be an issue.

There is, however, one problem in that the loader calculates the size
of a file in routine GET_FILE_LENGTH in [SYS]SYSLDR_COMMON.MAR, which
is only looking at the first WCB created for the file. For a
multi-header file, multiple WCBs may be created which are not taken
into account.

The result is that routine PROCESS_GST needs the file length, which is
used to calculate the number of PTEs to allocate for the GST. If the
file length is short because not all the WCBs have been looked at, the
allocation turns out to be negative, which causes
LDR_STD$ALLOC_S0S1_VA to return SS$_BADPARAM.

6.1.12.6 Work-arounds:

None.

6.1.13 When using $GETJPI to request rightslist information about a
       process, incorrect information is returned.

6.1.13.1 Problem Description:

If a process uses $GETJPI to request rightslist information about
itself then the information appears to be correct. However, if another
process uses $GETJPI to get rightslist information about that same
process then the information returned is incorrect.

Images Affected:

  - [SYS$LDR]PROCESS_MANAGEMENT.EXE
  - [SYS$LDR]PROCESS_MANAGEMENT.STB
  - [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
  - [SYS$LDR]PROCESS_MANAGEMENT_MON.STB

6.1.13.2 CLDs, and QARs reporting this problem:

6.1.13.3 CLD(s)

CFS.92887

6.1.13.4 QAR(s)

None.

6.1.13.5 Problem Analysis:

$GETJPI does not properly handle PSB segmented data for remote
processes. The result is incomplete data being returned to the caller.
This change employs a loop which continues to copy segmented data
until all of it has been transferred to the user's buffer.

6.1.13.6 Work-arounds:

None.

6.1.14 Lock manager and nonpaged pool usage may increase over time.

6.1.14.1 Problem Description:

Lock manager and nonpaged pool usage may increase over time due to a
memory leak of lock blocks (LKB) and AST control blocks (ACB64). This
lock manager leak only occurs under very specific circumstances which
are not that common. The specific circumstances are the $DEQ of a
process based lock with a pending completion AST that has not yet been
delivered.

Images Affected:

  - [SYS$LDR]LOCKING.EXE
  - [SYS$LDR]LOCKING.STB

6.1.14.2 CLDs, and QARs reporting this problem:

6.1.14.3 CLD(s)

70-3-5967,CFS.92447

6.1.14.4 QAR(s)

None.

6.1.14.5 Problem Analysis:

Under certain conditions, LKBs were not placed on the cache by the
LOCK_KAST routine when the Pending Cache bit was set. Within
LOCK_KAST, the NODELETE bit of the ACB is tested instead of the LKB.
When dequeueing a lock without an outstanding completion AST, the
PCACHE bit is set in the LKB and the NODELETE bit is cleared. The
NODELETE bit in the ACB is not cleared at this point.

To correct this problem, change code in LOCK_KAST to test the LKB bit
as opposed to the ACB bit. Also, if NODELETE is cleared, copy the RMOD
byte from the LKB to the ACB so ASTDEL will delete the ACB.

6.1.14.6 Work-arounds:

None.

6.1.15 A call to SYS$GETSYI to return system rights will only return
       the first right in the rightslist.

6.1.15.1 Problem Description:

A call to SYS$GETSYI to return system rights will only return the
first right in the rightslist, regardless of how many rights are in
the list.

Images Affected:

  - [SYS$LDR]SYSGETSYI.EXE

6.1.15.2 CLDs, and QARs reporting this problem:

6.1.15.3 CLD(s)

CFS.92887

6.1.15.4 QAR(s)

None.

6.1.15.5 Problem Analysis:

The routine, SPC_SYSTEM_RIGHTS, only returned a single right because
the register to hold the data length, R3, was preserved across the
call. The value of R3 coming into the routine was always 8. R3 would
be updated with the right length in the routine, but would then be
restored with its original value upon returning to the caller.

The fix is to adjust the register masks for input/output/reserve such
that R3 is no longer preserved across the call.

6.1.15.6 Work-arounds:

None.

6.1.16 System can crash with an INVEXCEPTN bugcheck at
       IOC$INITIATE_PORT_CPU_C+00234:

6.1.16.1 Problem Description:

The system crashes with an INVEXCEPTN bugcheck at
IOC$INITIATE_PORT_CPU_C+00234:

  Crashdump Summary Information:
  ------------------------------
  Bugcheck Type:  INVEXCEPTN, Exception while above ASTDEL
  Failing PC:     FFFFFFFF.800AF584  IOC$INITIATE_PORT_CPU_C+00234
  Failing PS:     1C000000.00000804
  Module:         IO_ROUTINES
                  (Link Date/Time: 23-JAN-2001 08:38:43.61)
  Offset:         0000F584

  Stack Pointers:
  KSP = 00000000.7FFA1CA8   ESP = 00000000.7FFA6000
  SSP = 00000000.7FFAE000   USP = 00000000.7B03B900

  Failing Instruction:
  IOC$INITIATE_PORT_CPU_C+00234:  STL  R25,#X0004(R0)

Images Affected:

  - [SYS$LDR]IO_ROUTINES.EXE
  - [SYS$LDR]IO_ROUTINES.STB
  - [SYS$LDR]IO_ROUTINES_MON.EXE
  - [SYS$LDR]IO_ROUTINES_MON.STB

6.1.16.2 CLDs, and QARs reporting this problem:

6.1.16.3 CLD(s)

CFS.93321

6.1.16.4 QAR(s)

None.

6.1.16.5 Problem Analysis:

See Problem Description.

6.1.16.6 Work-arounds:

None.

6.1.17 System can crash with a WSLXVANMAT bugcheck during image
       rundown.

6.1.17.1 Problem Description:

If a global section follows a process section in process P0 space with
the same section index value, the system can crash with a WSLXVANMAT
bugcheck during image rundown.

  Crash Information:
  ------------------
  Bugcheck Type:  WSLXVANMAT, Working set list entry does not match VA
  Failing PC:     FFFFFFFF.8016FEF0  SYS$VM+25EF0
  R1 = 00000000.00000000

Images Affected:

  - [SYS$LDR]SYS$VM.EXE
  - [SYS$LDR]SYS$VM.STB

6.1.17.2 CLDs, and QARs reporting this problem:

6.1.17.3 CLD(s)

CFS.92670,70-3-5998,CFS.91933,70-3-5893

6.1.17.4 QAR(s)

None.

6.1.17.5 Problem Analysis:

The "same section" check in DELETE_CLUSTER did not take into account
that a global section could be following a process section with both
having the same section index value. This could cause DELPAG_NOWRTREQ
to be called with a global page. As PFN$L_WSLX_QW for a global page is
not a Working Set List Index, but a counter for the number of times
the page has been locked into memory (see IDSM Chapter 16.2.7), this
will cause a WSLXVANMAT bugcheck. The reason is that the VA part of
the WSLE indexed by R1 = PFN$L_WSLX_QW = 0 is not equal to the VA of
the page to be deleted.

The fix is to explicitly check for the page type in DELETE_CLUSTER and
exit the rdonly_loop if the current page is not a process page.

6.1.17.6 Work-arounds:

None.

6.1.18 Multiprocessors making heavy use of RMS global buffers could
       encounter a variety of system crashes

6.1.18.1 Problem Description:

Multiprocessors making heavy use of RMS global buffers could encounter
a variety of system crashes related to a corrupted system buffer
object list, PCB$Q_BUFOBJ_LIST off of the system PCB (process control
block).
These could include things such as:

  o SSRVEXCEPT at MMG_STD$INSERT_BOD_C+2C
  o INVEXCEPTN at EXE$DELETE_BUFOBJ_C+1C0
  o Nonpaged pool corruption involving BOD sized packets and lists
  o CPUSPINWAIT due to pool corruption with TQE list
  o Global Buffered File access getting SYS-F-IVLOCKID errors

Images Affected:

  - [SYS$LDR]SYS$VM.EXE
  - [SYS$LDR]SYS$VM.STB

6.1.18.2 CLDs, and QARs reporting this problem:

6.1.18.3 CLD(s)

CFS.93787,CFS.93119,CFS.93436

6.1.18.4 QAR(s)

None.

6.1.18.5 Problem Analysis:

Code that adds or removes Buffer Objects to or from the system PCB
needs to be synchronized, via the MMG spinlock, to avoid conflicts
between different CPUs that are manipulating the buffer object queue
simultaneously.

6.1.18.6 Work-arounds:

None.

6.1.19 System crash with an INVEXCEPTN, Exception while above ASTDEL
       bugcheck

6.1.19.1 Problem Description:

When the lock manager is unable to obtain physical memory to allocate
an RSB, the error path to return an insufficient memory error may
result in a system crash.

  Crashdump Summary Information:
  ------------------------------
  Bugcheck Type:  INVEXCEPTN, Exception while above ASTDEL
  Failing PC:     FFFFFFFF.80184888  RSDM_OSR_PREPROCESS_C+00658
  Failing PS:     18000000.00000804
  Module:         LOCKING
                  (Link Date/Time: 28-MAR-2002 09:11:25.76)
  Offset:         00004888

  Failing Instruction:
  RSDM_OSR_PREPROCESS_C+00658:  LDL  R23,#X00C0(R5)

Images Affected:

  - [SYS$LDR]LOCKING.EXE
  - [SYS$LDR]LOCKING.STB

6.1.19.2 CLDs, and QARs reporting this problem:

6.1.19.3 CLD(s)

70-3-6180,CFS.93936

6.1.19.4 QAR(s)

None.

6.1.19.5 Problem Analysis:

After allocating an LKB, the address of the LCKCTX block is placed
into the LKB. If the LKB was not initially allocated but succeeded via
the fill_cache routine, the routine continues without putting the
address of the LCKCTX block in the LKB until later. If, however, an
RSB was not allocated, the error path assumed the LKB already had the
address of the LCKCTX block and thus crashed.

6.1.19.6 Work-arounds:

None.

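The ordering problem in 6.1.19.5 can be sketched in Python as follows.
The class and function names are hypothetical stand-ins for the LKB
and LCKCTX structures, and the guarded error path is only one possible
shape of a fix, not necessarily the change made in this kit.

```python
class LockContext:
    """Stand-in for the LCKCTX block."""
    def __init__(self):
        self.status = None

class LockBlock:
    """Stand-in for an LKB; lckctx stays None until it is filled in."""
    def __init__(self):
        self.lckctx = None

def fail_unguarded(lkb):
    """Error path that assumes lkb.lckctx has already been stored."""
    lkb.lckctx.status = "INSFMEM"  # AttributeError if lckctx is None

def fail_guarded(lkb, ctx):
    """Error path that stores the context first if it is still missing."""
    if lkb.lckctx is None:
        lkb.lckctx = ctx
    lkb.lckctx.status = "INSFMEM"
    return lkb.lckctx.status
```

Calling fail_unguarded on a freshly created LockBlock raises an
exception, which is the analogue of the crash. fail_guarded reports
the insufficient-memory status instead.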
6.1.20 Process can exit with the status SS$_UNWIND (920)

6.1.20.1 Problem Description:

A process can exit with the status SS$_UNWIND (920). If this is a
detached process, this error status can be seen in the accounting
report by executing the command:

  $ ACCOUNT/SINCE=TIME/BEFORE=TIME

where TIME is the time the detached process exited.

This problem can affect Oracle 9i RAC LMS processes such that the
Oracle instance crashes with the error ORA-00484.

Images Affected:

  - [SYS$LDR]EXCEPTION.EXE
  - [SYS$LDR]EXCEPTION.STB
  - [SYS$LDR]EXCEPTION_MON.EXE
  - [SYS$LDR]EXCEPTION_MON.STB

6.1.20.2 CLDs, and QARs reporting this problem:

6.1.20.3 CLD(s)

None.

6.1.20.4 QAR(s)

None.

6.1.20.5 Problem Analysis:

The exception handler goto_unwind_handler in module [SYS]SYSUNWIND.MAR
did not expect exceptions that are actually signals coming from
another process. It treated such a signal as an error condition and
called SYS$EXIT.

To fix the problem, in the exception handler, goto_unwind_handler,
test for signals from other processes. These are SS$_DEBUG,
SS$_IMGDMP, and C signals. If the condition is one of these, resignal
the condition instead of calling SYS$EXIT.

6.1.20.6 Work-arounds:

None.

6.1.21 Application performance issues.

6.1.21.1 Problem Description:

Under certain conditions, a process waiting on a mutex may not be
woken up when the mutex is unlocked. This can result in application
performance issues.

Images Affected:

  - [SYS$LDR]SYSTEM_PRIMITIVES.EXE
  - [SYS$LDR]SYSTEM_PRIMITIVES.STB
  - [SYS$LDR]SYSTEM_PRIMITIVES_MIN.EXE
  - [SYS$LDR]SYSTEM_PRIMITIVES_MIN.STB

6.1.21.2 CLDs, and QARs reporting this problem:

6.1.21.3 CLD(s)

70-3-6225,CFS.94309

6.1.21.4 QAR(s)

None.

6.1.21.5 Problem Analysis:

The mutex code was modified to avoid usage of the SCHED spinlock.
Without SCHED, these changes opened up a hole such that another
process could unlock the mutex and not yet see a waiter.
This change restores prior behavior that guarantees a mutex waiter
will be immediately woken when the mutex is unlocked.

6.1.21.6 Work-arounds:

None.

6.1.22 System can crash with an SSRVEXCEPT bugcheck

6.1.22.1 Problem Description:

The system can crash with an SSRVEXCEPT bugcheck when an image exits
before the $BRKTHRU has been delivered to some of its targets.

  Crash Dump Summary:
  -------------------
  Bugcheck Type:    SSRVEXCEPT, Unexpected system service exception
  Current Process:  SDNCC_MBX_MAIN
  Current Image:
  Failing PC:       FFFFFFFF.98B56AAC  IO_ROUTINES+46AAC
  Failing PS:       00000000.00000000
  Module:           IO_ROUTINES
                    (Link Date/Time: 17-MAR-2001 03:30:01.24)
  Offset:           00046AAC

  Signal Array:              64-bit Signal Array:
  Arg Count   = 00000005     Arg Count   = 00000005
  Condition   = 0000000C     Condition   = 00000000.0000000C
  Argument #2 = 00000000     Argument #2 = 00000000.00000000
  Argument #3 = 006C41D0     Argument #3 = 00000000.006C41D0
  Argument #4 = 98B56AAC     Argument #4 = FFFFFFFF.98B56AAC
  Argument #5 = 00000000     Argument #5 = 00000000.00000000

  Failing Instruction:
  IO_ROUTINES+46AAC:  LDL  R6,(R7)

Images Affected:

  - [SYS$LDR]IO_ROUTINES.EXE
  - [SYS$LDR]IO_ROUTINES.STB
  - [SYS$LDR]IO_ROUTINES_MON.EXE
  - [SYS$LDR]IO_ROUTINES_MON.STB

6.1.22.2 CLDs, and QARs reporting this problem:

6.1.22.3 CLD(s)

CFS.70569, CFS.86451

6.1.22.4 QAR(s)

None.

6.1.22.5 Problem Analysis:

$BRKTHRU is asynchronous and, since terminal/display devices are much
slower than contemporary CPUs, may take a long time to finish its
work. To avoid errors caused by insufficient P1 pool space
(insufficient CTLPAGES for the demands on a given system), $BRKTHRU
uses EXE$ALOP1IMAG to allocate from P1 pool if available. If it is
not, it then allocates from P0 space. Image rundown does nothing to
the P1 pool but will deallocate all of P0 space, including any
allocations made by EXE$ALOP1IMAG.

With some attention to loading down the terminal or display devices, one can manage to get through image rundown (at least past P0 teardown) before one of the $BRKTHRU requests has finished. With the BRK$ packet stored in a suddenly-deallocated page, the $BRKTHRU code gets an access violation, which translates to a SSRVEXCEPT crash.

6.1.22.6 Work-arounds:

None.

6.1.23 SS$_RSDMNOTFOU - Resource domain not found errors

6.1.23.1 Problem Description:

When creating many resource domain IDs via $SET_RESOURCE_DOMAIN with RSDM$_JOIN_DOMAIN, subsequent calls to end that association with RSDM$_LEAVE could result in the error SS$_RSDMNOTFOU (resource domain not found) on some IDs.

Images Affected:

- [SYS$LDR]LOCKING.EXE
- [SYS$LDR]LOCKING.STB

6.1.23.2 CLDs, and QARs reporting this problem:

6.1.23.3 CLD(s)

CFS.96833

6.1.23.4 QAR(s)

None.

6.1.23.5 Problem Analysis:

Routine RSDM_FIND_RSDM_ID was not checking the boundary conditions correctly when moving from one RDPB structure to the next. If the index being looked up was the first one in the next RDPB, the routine incorrectly assumed that it belonged to the current RDPB structure and picked up the wrong RDAB address. When the contents were validated, they did not match what was expected and a SS$_RSDMNOTFOU error code was returned.

6.1.23.6 Work-arounds:

None.
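As an illustration of the boundary condition described in 6.1.23.5, the following C sketch models a chain of fixed-size blocks and maps a lookup index to a block and a slot within it. It is not the RSDM_FIND_RSDM_ID source. The names toy_location, toy_walk, toy_find_buggy and toy_find_fixed, the block size of 4, and the assumption that the error was an inclusive comparison against the end of the current block are all hypothetical, chosen only to make the off-by-one visible.

```c
#include <assert.h>

/* Hypothetical block capacity; the real RDPB capacity is not stated here. */
#define TOY_SLOTS_PER_BLOCK 4u

typedef struct {
    unsigned block;   /* position of the block in the chain (0 = first) */
    unsigned slot;    /* slot inside that block */
} toy_location;

/*
 * Walk the chain block by block and stop at the first block whose index
 * range is judged to contain `index`. With inclusive != 0 the end of the
 * range is compared with <=, which claims the first index of the NEXT
 * block for the CURRENT block (the assumed off-by-one). A result with
 * block == nblocks means "not found".
 */
static toy_location toy_walk(unsigned index, unsigned nblocks, int inclusive)
{
    toy_location loc = { nblocks, 0 };
    unsigned base = 0;

    assert(nblocks > 0);
    for (unsigned b = 0; b < nblocks; b++, base += TOY_SLOTS_PER_BLOCK) {
        unsigned limit = base + TOY_SLOTS_PER_BLOCK;
        if (inclusive ? index <= limit : index < limit) {
            loc.block = b;
            loc.slot = index - base;
            return loc;
        }
    }
    return loc;
}

/* Assumed faulty form of the boundary test. */
toy_location toy_find_buggy(unsigned index, unsigned nblocks)
{
    return toy_walk(index, nblocks, 1);
}

/* Boundary test that hands the first index of the next block onward. */
toy_location toy_find_fixed(unsigned index, unsigned nblocks)
{
    return toy_walk(index, nblocks, 0);
}
```

With two blocks of four slots, index 4 is the first slot of the second block. In this model toy_find_fixed(4, 2) yields block 1, slot 0, while toy_find_buggy(4, 2) yields block 0, slot 4, a slot that does not exist in that block. A lookup that lands on such a location is the kind of wrong address that would then fail validation.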
6.1.24 System crash caused by NPP corruption

6.1.24.1 Problem Description:

The system can crash with a "BADALORQSZ, Bad memory allocation request size" bugcheck.

Crashdump Summary Information:
------------------------------
Bugcheck Type:    BADALORQSZ, Bad memory allocation request size
Failing PC:       FFFFFFFF.80044C38    EXE$DEALLOCATE_POOL_C+000B8
Failing PS:       10000000.00000804
Module:           SYSTEM_PRIMITIVES    (Link Date/Time: 29-OCT-2002 23:52:53.42)
Offset:           00018C38

Stack Pointers:
KSP = FFFFFFFF.A9901C50    ESP = FFFFFFFF.A98A3000
SSP = FFFFFFFF.A988D000    USP = FFFFFFFF.A988D000

Failing Instruction:
EXE$DEALLOCATE_POOL_C+000B8:    BUGCHK

Images Affected:

- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT.STB
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.STB

6.1.24.2 CLDs, and QARs reporting this problem:

6.1.24.3 CLD(s)

CFS.97350

6.1.24.4 QAR(s)

None.

6.1.24.5 Problem Analysis:

Recent changes to handle segmented data capturing from remote processes failed to detect possible data overrun conditions. If a buffer of insufficient size is passed to the service, the resulting overrun could result in NPP corruption. Customers experiencing a crash due to the data overrun would see corruption in NPP as the cause. Programs that pass data buffers too small to contain the data items requested, JPI$_RIGHTLIST items in particular, can trigger this crash.

6.1.24.6 Work-arounds:

None.

6.1.25 INVEXCEPTN, Exception while above ASTDEL system crash

6.1.25.1 Problem Description:

The system can crash with an INVEXCEPTN, Exception while above ASTDEL bugcheck at FIND_CVCB_C+0001C.
Crashdump Summary Information:
------------------------------
Bugcheck Type:    INVEXCEPTN, Exception while above ASTDEL
Failing PC:       FFFFFFFF.8022939C    FIND_CVCB_C+0001C
Failing PS:       20000000.00000804
Module:           SYS$VCC    (Link Date/Time: 5-AUG-2001 01:16:26.26)
Offset:           0000539C

Stack Pointers:
KSP = FFFFFFFF.C5FE9BE8    ESP = FFFFFFFF.C5FEB000
SSP = FFFFFFFF.C5FD5000    USP = FFFFFFFF.C5FD5000

Images Affected:

- [SYS$LDR]SYS$VCC.EXE
- [SYS$LDR]SYS$VCC.STB
- [SYS$LDR]SYS$VCC_MON.EXE
- [SYS$LDR]SYS$VCC_MON.STB

6.1.25.2 CLDs, and QARs reporting this problem:

6.1.25.3 CLD(s)

70-3-6762,CFS.97750,CFS.96647,CFS.84900

6.1.25.4 QAR(s)

None.

6.1.25.5 Problem Analysis:

The insertion and walking of the CVCB queue is covered by the CACHE (MMG) spinlock. However, there were cases where the removal of a CVCB entry in the queue was covered only by the SCS spinlock. In faster and larger SMP configurations, a DISMOUNT operation might cause another CPU to misread the CVCB queue and access an incorrect element.

6.1.25.6 Work-arounds:

None.

6.1.26 INVSECURESTATE system crash

6.1.26.1 Problem Description:

Several multi-threaded servers have experienced a condition where the reference counts of various persona structures have fallen out of sync. The security subsystem triggers an INVSECURESTATE system crash when this condition is detected by the sanity checks.

Crashdump Summary Information:
------------------------------
Bugcheck Type:    INVSECURESTATE, Invalid state detected by SECURITY subsystem
Current Process:  TNT_SERVER
Current Image:    $1$DUA0:[SYS3.SYSCOMMON.][SYSEXE]TNT$SERVER.EXE
Failing PC:       FFFFFFFF.801A8E94    NSA$ASSUME_PERSONA_C+00064
Failing PS:       10000000.00000000
Module:           SECURITY    (Link Date/Time: 13-SEP-2000 06:39:51.16)
Offset:           00006E94

Images Affected:

- [SYS$LDR]SECURITY.EXE
- [SYS$LDR]SECURITY.STB
- [SYS$LDR]SECURITY_MON.EXE
- [SYS$LDR]SECURITY_MON.STB

6.1.26.2 CLDs, and QARs reporting this problem:

6.1.26.3 CLD(s)

CFS.84630, CFS.92481

6.1.26.4 QAR(s)

None.
6.1.26.5 Problem Analysis:

The change includes raising IPL at an earlier point in the persona switching logic. Prior to this change, a thread could be interrupted between the point where data had been captured from the KTB and the point where the KTB was updated. If such an interrupt occurred and the content of the KTB was changed before control returned, the interrupted thread could end up acting on stale data it had collected before the interrupt.

6.1.26.6 Work-arounds:

None.

6.1.27 "%JBC-E-NOPRIV, insufficient privilege or queue protection violation" error

6.1.27.1 Problem Description:

An image that lacks UIC-based protection access to a QUEUE object, but which is installed with the OPER privilege, cannot manipulate the QUEUE object even though holding the OPER privilege should allow it. This can result in a "%JBC-E-NOPRIV, insufficient privilege or queue protection violation" error message.

Images Affected:

- [SYS$LDR]MESSAGE_ROUTINES.EXE

6.1.27.2 CLDs, and QARs reporting this problem:

6.1.27.3 CLD(s)

70-3-6406,CFS.95632

6.1.27.4 QAR(s)

75-13-818

6.1.27.5 Problem Analysis:

Prior to VMS V7.2, WORKING and IMAGE privileges were combined in a single privilege mask. Per-Thread security maintains separate masks. Modifications to the QUEUE_OBJECT module to support Per-Thread security neglected to bring these two masks together before calling the SYS$CHECK_PRIVILEGE service.

6.1.27.6 Work-arounds:

None.

6.1.28 Values cannot be represented exactly in the available mantissa bits

6.1.28.1 Problem Description:

Converting an integer to IEEE S floating with software completion produces a denormal (i.e. very small) result value for very large integer values. These denormal values cannot be represented exactly in the available mantissa bits.
The following C program demonstrates the problem:

$ type test.c
#include <stdio.h>
main()
{
    const float f1 = (float) 0x7fffffff;
    const float f2 = (float) 2147483647;
    const float f3 = (float) 2147483647.0;
    printf("f1 = %f\n", f1);
    printf("f2 = %f\n", f2);
    printf("f3 = %f\n", f3);
    return 0;
}
$ cc /float=ieee /ieee=denorm test
$ link test
$ run test
f1 = 0.000000
f2 = 0.000000
f3 = 2147483648.000000
$

Note that any code that uses floating-point constants to hold such float values and is affected by this bug must be recompiled and relinked after applying the new image.

Images Affected:

- [SYS$LDR]EXCEPTION.EXE
- [SYS$LDR]EXCEPTION.STB
- [SYS$LDR]EXCEPTION_MON.EXE
- [SYS$LDR]EXCEPTION_MON.STB

6.1.28.2 CLDs, and QARs reporting this problem:

6.1.28.3 CLD(s)

None.

6.1.28.4 QAR(s)

75-13-1032

6.1.28.5 Problem Analysis:

See Problem Description

6.1.28.6 Work-arounds:

None.

6.1.29 "SSRVEXCEPT, Unexpected system service exception" bugcheck

6.1.29.1 Problem Description:

The system can crash with a "SSRVEXCEPT, Unexpected system service exception" bugcheck.

Crashdump Summary Information:
------------------------------
Crash Time:       14-MAR-2003 15:29:12.56
Bugcheck Type:    SSRVEXCEPT, Unexpected system service exception
Current Process:  RCM_COLLECT
Current Image:    DSA0:[SYS0.SYSCOMMON.][SYSTEST]RADCHECK.EXE;1
Failing PC:       FFFFFFFF.801553E4
Failing PS:       10000000.00000003
Module:           SYS$VM    (Link Date/Time: 4-NOV-2002 16:41:24.02)
Offset:           000053E4

Stack Pointers:
KSP = 00000000.7FFA1B98    ESP = 00000000.7FFA6000
SSP = 00000000.7FFAC100    USP = 00000000.7AE61A30

Images Affected:

- [SYS$LDR]SYS$VM.EXE
- [SYS$LDR]SYS$VM.STB

6.1.29.2 CLDs, and QARs reporting this problem:

6.1.29.3 CLD(s)

CFS.99244

6.1.29.4 QAR(s)

None.

6.1.29.5 Problem Analysis:

The last PTE for the global page table ends at MMG$GQ_MAX_GPTE-8, but the routine mmg$rad_check_system will continue checking PTEs up to MMG$GQ_MAX_GPTE.
If MMG$GQ_MAX_GPTE is within the same page as MMG$GQ_MAX_GPTE-8, the problem does not occur. However, if MMG$GQ_MAX_GPTE falls on the next, nonexistent, page, an access violation is generated when the routine tries to reference the nonexistent PTE.

6.1.29.6 Work-arounds:

None.

6.1.30 Fibre Channel disks hang and do not failover

6.1.30.1 Problem Description:

If a Fibre Channel switch is disabled and the votes from a Fibre Channel quorum disk are required to maintain quorum, Fibre Channel disks might not all fail over and can hang.

Images Affected:

- [SYS$LDR]IO_ROUTINES.EXE
- [SYS$LDR]IO_ROUTINES_MON.EXE
- [SYS$LDR]IO_ROUTINES.STB
- [SYS$LDR]IO_ROUTINES_MON.STB

6.1.30.2 CLDs, and QARs reporting this problem:

6.1.30.3 CLD(s)

70-3-6902

6.1.30.4 QAR(s)

None.

6.1.30.5 Problem Analysis:

If the active I/O on a Fibre Channel device fails and the I/O is not otherwise eligible for mount verification, mount verification will not be triggered. Since mount verification is what triggers multipath failover, the affected devices can hang on a failed path.

6.1.30.6 Work-arounds:

None.

6.1.31 Various system crashes

6.1.31.1 Problem Description:

If the SYSGEN parameter MPW_WRTCLUSTER is set above 430 on multiprocessor systems or above 94 on uniprocessor systems, the system may experience crashes such as:

o INCONSTATE bugchecks in SMP$ACQNOIPL_C trying to take out the SCHED spinlock in PAGEFAULT's PROCPAG routine at IPL 2.

o INCONSTATE bugchecks in SMP$ACQNOIPL_C trying to take out the MMG spinlock in SYSCREDEL's DELPAG_WRTBAK routine at IPL 2.

o INCON_SCHED bugchecks in SCH$FIND_NEXT_PROC_INT_C, or SCH$STATE_TO_COM_C trying to schedule a process in PFW (Page Fault Wait) state.

o CPUSPINWAIT bugchecks caused by acquiring the SCHED or MMG spinlocks at IPL 2 and being interrupted by code requesting another spinlock owned by another CPU, which is in turn waiting for the SCHED or MMG spinlock.
o INVEXCEPTNs trying to execute the same single threaded process on two different CPUs at the same time, resulting in the corruption of the process' kernel stack.

o The PFW queue merged in with a COM queue, usually resulting in an INCON_SCHED or a CPUSPINWAIT bugcheck.

o An INVEXCEPTN at SCH$QEND_C+38 trying to access the cell CTL$GL_REPORT_USER_FAULTS.

Users may also see the following:

o Processes that are stuck in CUR state but on the PFW queue and not executing.

o Processes stuck in PFW on the PFW queue and not executing.

Images Affected:

- [SYS$LDR]IO_ROUTINES.EXE
- [SYS$LDR]IO_ROUTINES_MON.EXE
- [SYS$LDR]IO_ROUTINES.STB
- [SYS$LDR]IO_ROUTINES_MON.STB

6.1.31.2 CLDs, and QARs reporting this problem:

6.1.31.3 CLD(s)

CFS.99169,CFS.98796,CFS.98578,CFS.97413

6.1.31.4 QAR(s)

None.

6.1.31.5 Problem Analysis:

On multiprocessor systems (or uniprocessor systems with full spinlock checking enabled) ioc$gl_diobm_ptecnt_max will be set to 430. On uniprocessor systems it will be set to 94. Thus setting MPW_WRTCLUSTER above these values causes DIOBM to take "method 3", the code path that ends up calling mmg_std$lockpgtb_64 and dropping IPL to 2 (IPL$_ASTDEL).

6.1.31.6 Work-arounds:

None.

6.1.32 Nonpaged pool usage by IRPs increases over time and can lead to hangs or crashes

6.1.32.1 Problem Description:

Nonpaged pool usage by IRPs increases over time and can lead to hangs or crashes (CPUSPINWAIT, with another CPU executing in EXE$DEALLOCATE).
Crashdump Summary Information:
------------------------------
Bugcheck Type:    CPUSPINWAIT, CPU spinwait timer expired
Current Process:  NULL
Current Image:
Failing PC:       FFFFFFFF.800883A4    SMP$TIMEOUT_C+00064
Failing PS:       08000000.00000804
Module:           SYSTEM_SYNCHRONIZATION_MIN    (Link Date/Time: 19-FEB-2003 11:27:32.44)
Offset:           000003A4

Failing Instruction:
SMP$TIMEOUT_C+00064:    BUGCHK

Images Affected:

- [SYS$LDR]IO_ROUTINES.EXE
- [SYS$LDR]IO_ROUTINES_MON.EXE
- [SYS$LDR]IO_ROUTINES.STB
- [SYS$LDR]IO_ROUTINES_MON.STB

6.1.32.2 CLDs, and QARs reporting this problem:

6.1.32.3 CLD(s)

CFS.100461,70-3-7096

6.1.32.4 QAR(s)

None.

6.1.32.5 Problem Analysis:

The command

SDA> SHOW POOL/NONPAGED/HEADER/TYPE=IRP

shows groups of IRPs in pool with the same IRP$L_PID. The number of IRPs in each group matches the number of local SCSI devices in the system.

Each process executing a SYSMAN IO SCSI_PATH_VERIFY command will cause as many IRPs to be leaked as there are local SCSI devices on the system. If this command is used clusterwide (after a SYSMAN> SET ENV/CLUSTER command), the leaked IRPs on the remote node have IRP$L_PID set to the internal PID of the SMISERVER process on the remote node.

6.1.32.6 Work-arounds:

None.
6.1.33 System crash with a MFYNULPGFL bugcheck at MMG$FREWSLX_64_C+004BC

6.1.33.1 Problem Description:

The system can crash with a MFYNULPGFL bugcheck at MMG$FREWSLX_64_C+004BC:

Crashdump Summary Information:
------------------------------
Bugcheck Type:    MFYNULPGFL, FREWSLE - no backing store, page not modified
Failing PC:       FFFFFFFF.8016ED9C    MMG$FREWSLX_64_C+004BC
Failing PS:       14000000.00000800
Module:           SYS$VM    (Link Date/Time: 28-MAR-2002 14:28:51.00)
Offset:           00018D9C

Stack Pointers:
KSP = 00000000.40039E54    ESP = 00000000.4003E000
SSP = 00000000.40042000    USP = 00000000.7AD92400

Failing Instruction:
MMG$FREWSLX_64_C+004BC:    BUGCHK

Images Affected:

- [SYS$LDR]SYS$VM.EXE
- [SYS$LDR]SYS$VM.STB

6.1.33.2 CLDs, and QARs reporting this problem:

6.1.33.3 CLD(s)

CFS.94266,CFS.95672,CFS.100378,CFS.100405

6.1.33.4 QAR(s)

None.

6.1.33.5 Problem Analysis:

It appears to be a race condition between the pagefault code and $DELPAG.

6.1.33.6 Work-arounds:

None.

6.1.34 RWAST hangs

6.1.34.1 Problem Description:

A system can hang in RWAST.

Images Affected:

- [SYS$LDR]SYS$VM.EXE
- [SYS$LDR]SYS$VM.STB

6.1.34.2 CLDs, and QARs reporting this problem:

6.1.34.3 CLD(s)

CFS.98540

6.1.34.4 QAR(s)

None.

6.1.34.5 Problem Analysis:

$DELTVA can enter an RWAST state indefinitely. $DELTVA typically enters RWAST at IPL 2 when it encounters a page to be deleted which has I/O active. When the I/O completes, the RWAST will be terminated and page deletion will continue. However, if the I/O is dependent on the delivery of an AST in order to complete, $DELTVA will never come out of the RWAST state, as ASTs cannot be delivered while at IPL 2.

This was corrected previously, in certain cases, by waiting at IPL 0 instead of 2. IPL 0 allows AST delivery and, therefore, I/O completion.

This current fix extends the "waiting at IPL 0" concept to some other cases, namely:

o Shared pages.

o Pages in user-defined regions.

6.1.34.6 Work-arounds:

None.
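The dependency described in 6.1.34.5 (a wait at IPL 2 that can only end once an AST has been delivered) can be pictured with a small toy model in C. This is a sketch under stated assumptions, not OpenVMS code: toy_process, toy_deliver_ast, toy_wait_for_io and the TOY_IPL_* constants are hypothetical names, and the model reduces the situation to one rule taken from the analysis, namely that an AST is deliverable only while the waiter is below IPL 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two wait levels named in the analysis. */
enum { TOY_IPL_0 = 0, TOY_IPL_ASTDEL = 2 };

typedef struct {
    bool ast_pending;   /* an AST is queued to the process */
    bool io_active;     /* the page being deleted still has I/O in flight */
} toy_process;

/*
 * Model rule: a queued AST is delivered only below IPL 2. In this toy
 * model the AST routine is what completes the outstanding I/O.
 */
static bool toy_deliver_ast(toy_process *p, int ipl)
{
    assert(p != NULL);
    if (p->ast_pending && ipl < TOY_IPL_ASTDEL) {
        p->ast_pending = false;
        p->io_active = false;
        return true;
    }
    return false;
}

/*
 * Poll for I/O completion while waiting at wait_ipl. Returns the number of
 * polls made before the I/O was seen complete, or -1 if it is still active
 * after max_polls polls (the modelled hang).
 */
int toy_wait_for_io(toy_process *p, int wait_ipl, int max_polls)
{
    assert(p != NULL);
    for (int i = 0; i < max_polls; i++) {
        if (!p->io_active)
            return i;
        (void)toy_deliver_ast(p, wait_ipl);
    }
    return p->io_active ? -1 : max_polls;
}
```

In this model, a process whose I/O needs the pending AST never leaves the wait when toy_wait_for_io is called with TOY_IPL_ASTDEL, while the same call with TOY_IPL_0 lets the AST run on the first poll and the wait ends on the next one.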
6.1.35 Process Hang

6.1.35.1 Problem Description:

Calling $GETJPI to get rights data from a remote process that is logging into the system can result in a hang of both the requesting process and the login process. If the login process is holding an RMS record lock in the SYSUAF file at the time of the hang, all other processes trying to log in against that record will also hang.

Images Affected:

- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT.STB
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.STB

6.1.35.2 CLDs, and QARs reporting this problem:

6.1.35.3 CLD(s)

CFS.102404,70-3-7338

6.1.35.4 QAR(s)

None.

6.1.35.5 Problem Analysis:

A call to $GETJPI with an itemlist requesting RIGHTS data, providing a return buffer smaller than necessary to hold all the rights requested, and followed by at least one more item in the list, could set the $GETJPI Special KAST MOVEFU: to loop indefinitely in the process space of the target process. Since the target process "hangs", a return Special KAST to return the data to the requestor is never scheduled, thus hanging the requestor process as well.

6.1.35.6 Work-arounds:

None.

7 INSTALLATION INSTRUCTIONS:

7.1 Installation Command

Install this kit with the POLYCENTER Software installation utility by logging into the SYSTEM account, and typing the following at the DCL prompt:

PRODUCT INSTALL VMS722_SYS-V0200 /SOURCE=[location of Kit]

The kit location may be a tape drive, CD, or a disk directory that contains the kit.

Additional help on installing PCSI kits can be found by typing HELP PRODUCT INSTALL at the system prompt.

7.2 Scripting of Answers to Installation Questions

During installation, this kit will ask and require user response to several questions.
If you wish to automate the installation of this kit and avoid having to provide responses to these questions, you must create a DCL command procedure that includes the following definitions and commands:

- $ DEFINE/SYS NO_ASK$BACKUP TRUE

- $ DEFINE/SYS NO_ASK$REBOOT TRUE

- Add the following qualifiers to the PRODUCT INSTALL command and add that command to the DCL procedure.

  /PROD=DEC/BASE=AXPVMS/VER=V2.0

- De-assign the logicals assigned.

For example, a sample command file to install the VMS722_SYS-V0200 kit would be:

$
$ DEFINE/SYS NO_ASK$BACKUP TRUE
$ DEFINE/SYS NO_ASK$REBOOT TRUE
$!
$ PROD INSTALL VMS722_SYS-V0200/PROD=DEC/BASE=AXPVMS/VER=V2.0
$!
$ DEASSIGN/SYS NO_ASK$BACKUP
$ DEASSIGN/SYS NO_ASK$REBOOT
$!
$ exit

8 COPYRIGHT AND DISCLAIMER:

(C) Copyright 2003 Hewlett-Packard Development Company, L.P.

Confidential computer software. Valid license from HP and/or its subsidiaries required for possession, use, or copying.

Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Neither HP nor any of its subsidiaries shall be liable for technical or editorial errors or omissions contained herein. The information in this document is provided "as is" without warranty of any kind and is subject to change without notice. The warranties for HP products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.

DISCLAIMER OF WARRANTY AND LIMITATION OF LIABILITY

THIS PATCH IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND. ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY EXCLUDED TO THE EXTENT PERMITTED BY APPLICABLE LAW.
IN NO EVENT WILL COMPAQ BE LIABLE FOR ANY LOST REVENUE OR PROFIT, OR FOR SPECIAL, INDIRECT, CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, WITH RESPECT TO ANY PATCH MADE AVAILABLE HERE OR TO THE USE OF SUCH PATCH.