OpenVMS ALPSYS20_071 Alpha V7.1 - V7.1-1H2 System ECO Summary

TITLE: OpenVMS ALPSYS20_071 Alpha V7.1 - V7.1-1H2 System ECO Summary

Modification Date:  27-JUL-2000
Modification Type:  Documentation Updated: ALPF11X04_071 is superseded
                    by ALPF11X06_071.

NOTE:  An OpenVMS saveset or PCSI installation file is stored on the
       Internet in a self-expanding compressed file.

       For OpenVMS savesets, the name of the compressed saveset file
       will be kit_name.a-dcx_vaxexe for OpenVMS VAX or
       kit_name.a-dcx_axpexe for OpenVMS Alpha.  Once the OpenVMS
       saveset is copied to your system, expand the compressed saveset
       by typing RUN kit_name.a-dcx_vaxexe or
       RUN kit_name.a-dcx_axpexe.

       For PCSI files, once the PCSI file is copied to your system,
       rename the PCSI file to kitname-dcx_axpexe.pcsi, then expand it
       by typing RUN kitname-dcx_axpexe.pcsi.  The resultant file will
       be the PCSI installation file which can be used to install the
       ECO.

Copyright (c) Compaq Computer Corporation 1999.  All rights reserved.

PRODUCT:     OpenVMS Alpha

COMPONENTS:  System
             CLASS_SCHEDULER
             ERRORLOG
             EXCEPTION
             EXCEPTION_MON
             EXEC_INIT
             IMAGE_MANAGEMENT
             IO_ROUTINES
             IO_ROUTINES_MON
             LOCKING
             MESSAGE_ROUTINES
             PROCESS_MANAGEMENT
             PROCESS_MANAGEMENT_MON
             SDA$SHARE
             SECURITY
             SYS$BASE_IMAGE
             SYS$CLUSTER
             SYS$PUBLIC_VECTORS
             SYS$SSISHR
             SYS$VCC
             SYS$VCC_MON
             SYS$VM
             SYSDEVICE
             SYSGETSYI
             SYSLDR_DYN
             SYSTEM_PRIMITIVES
             SYSTEM_PRIMITIVES_MIN
             VMS$IEEE_HANDLER

SOURCE:      Compaq Computer Corporation

ECO INFORMATION:

     ECO Kit Name:  ALPSYS20_071

     NOTE:  The ALPSYSA03_071 remedial kit combined the previous SYSA
            and SYSB kits into one SYSA kit.  Since SYS images are now
            being distributed in one kit, the kit naming convention has
            been changed from ALPSYSA to ALPSYS to more accurately
            reflect the kit contents.  ALPSYS20_071 may also be known
            as ALPSYSA04_071.
     ECO Kits Superseded by This ECO Kit:
                    ALPSYSA03_071
                    ALPSYSA02_071
                    ALPSYSA01_071
                    ALPSYS16_071
                    ALPSYS15_071
                    ALPSYS14_071
                    ALPSYS13_071
                    ALPSYS12_071
                    ALPSYS11_071
                    ALPSYS10_071
                    ALPSYS08_071
                    ALPSYS06_071
                    ALPSYS05_071
                    ALPSYS03_071
                    ALPSYSB02_071
                    ALPSYSB01_071
                    ALPSYS17_071
                    ALPSYS09_071
                    ALPSYS07_071
                    ALPSYS04_071

     ECO Kit Approximate Size:         12980 Blocks
     Kit Applies To:                   OpenVMS Alpha V7.1, V7.1-1H1,
                                       V7.1-1H2
     System/Cluster Reboot Necessary:  Yes
     Rolling Re-boot Supported:        Yes
     Installation Rating:              INSTALL_1
                                       To be installed on all systems
                                       running the listed version(s) of
                                       OpenVMS.

     Kit Dependencies:

       The following remedial kit(s) must be installed BEFORE
       installation of this kit:

                    ALPBASE02_071

       In order to receive all the corrections listed in this kit, the
       following remedial kits should also be installed:

                    ALPCPU1E03_071 (if kit is installed on an Alpha
                                    "DIGITAL Personal Workstation")
                    ALPF11X06_071  (Supersedes ALPF11X04_071)
                    ALPBACK05_071
                    ALPDISM01_071
                    ALPINIT01_071
                    ALPMOUN07_071
                    ALPMTAA01_071
                    ALPSYSI01_071
                    ALPPTD01_071

ECO KIT SUMMARY:

An ECO kit exists for various system components on OpenVMS Alpha V7.1
through V7.1-1H2.  This kit addresses the following problems:

Problems Addressed in ALPSYS20_071:

  o  In previous releases of OpenVMS, protection of a new FT device was
     unconditionally set to S:RWPL,O:RWPL,G,W (no access for GROUP and
     WORLD).  This change and a corresponding change in the
     ALPPTD01_071 kit modify this behavior.  Protection for a new FT
     device is now taken from the protection of the "template device"
     FTA0:.  The protection for FTA0: may be set during the boot
     process (in SYSTARTUP_VMS.COM), or manually, for example:

       $ SET SECURITY /CLASS=DEVICE-
             /PROTECTION=(S:RWLP,O:RWLP,G:RW,W:R) FTA0:

     At boot time, the protection on FTA0 is unconditionally set to
     S:RWPL,O:RWPL,G,W.

     An ACL may also be set on FTA0, either explicitly, or inherited
     from the SECURITY class TERMINAL device template.

     In order to get this full fix, you must also install the
     ALPPTD01_071, or later, remedial kit.
     Images Affected:

      -  [SYS$LDR]SYS$FTDRIVER.EXE

  o  It is possible for a scheduling class with greater than 100%
     quantum to still run out of quantum.  This should not occur.

     Images Affected:

      -  [SYS$LDR]CLASS_SCHEDULER.EXE

  o  A BASIC application terminates abnormally with the
     BAS$_PROLOSSOR, DEVFOREIGN or ACCVIO status.

     Images Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE

  o  MFPR_xxx and MTPR_xxx PALcode instructions can leave registers
     R1, R16 and R17 with unpredictable results.  These registers were
     not always saved and restored in ASTDEL_STACK.M64.  Although
     corruptions of these registers have not been known to happen, the
     potential is there, particularly on newer platforms.  This fix
     eliminates the possibility of this register corruption.

     Images Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYSEXE]PROCESS_MANAGEMENT_MON.EXE

  o  During a system boot with SYSTEM_CHECK set to 0 and with XFC
     (Extended File Cache) loaded (VCC_FLAGS = 2), systems are
     crashing.

     Images Affected:

      -  [SYS$LDR]MESSAGE_ROUTINES.EXE

  o  Attempting to run a program linked /DEBUG results in an ACCVIO.

     Images Affected:

      -  [SYS$LDR]IMAGE_MANAGEMENT.EXE

  o  Multiple regions exist with overlapping VA (Virtual Address)
     space.  This can lead to just about any crash scenario.

     Images Affected:

      -  [SYS$LDR]SYS$VM.EXE

  o  A DECthread may hang waiting for an event flag upcall.

     Images Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT.STB
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.STB

  o  A kernel thread may get stuck waiting for the inner mode semaphore
     when it already owns it.

     Images Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT.STB
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.STB

  o  A kernel thread may get stuck in an AST (Asynchronous System Trap)
     delivery loop trying to deliver an AST when there are none queued.
     Images Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT.STB
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.STB

  o  The system may crash with a "Pagefault with IPL too high" bugcheck
     trying to deliver an AST.

     Images Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT.STB
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.STB

Problems Addressed in ALPSYSA03_071:

  o  An INVEXCEPTN bugcheck in the SWAPPER can occur.

     Image(s) Affected:

      -  [SYS$LDR]SYS$VM.EXE

  o  The system may not write out a crash dump.

     Image(s) Affected:

      -  EXCEPTION.EXE
      -  EXCEPTION_MON.EXE
      -  EXCEPTION.STB
      -  EXCEPTION_MON.STB

  o  A process quota leak can occur for remote (detached) process
     creation.

     Image(s) Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE

  o  A system may bugcheck with an NSABLOST error.  This problem can
     occur when a MAIL application uses the MAIL$USER_GET_INFO utility
     (routine) call to acquire user mail information.  It can also be
     replicated with the command:

       MAIL> SHOW FORWARD/USER=*

     Image(s) Affected:

      -  [SYS$LDR]SECURITY.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE

  o  The system crashed while executing Fast I/O code.  The failing
     instruction could be something similar to:

       EXE$IO_SETUP_C+004D8:     LDL     R20,(R20)

     Image(s) Affected:

      -  IO_ROUTINES.EXE
      -  IO_ROUTINES_MON.EXE

  o  The $GETJPI system service returned the value of the AST enable
     (ASTEN) register only for the currently executing process, rather
     than for all processes (except the null and swapper processes).

     Image(s) Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE

  o  For the FASTIO code, the system hangs with the IPL 8 Fork Queue
     not being serviced.

     Image(s) Affected:

      -  [SYS$LDR]IO_ROUTINES.EXE
      -  [SYS$LDR]IO_ROUTINES_MON.EXE

  o  An INCONSTATE bugcheck may occur at offset SYS$VCC+08798 with the
     condition code 213C = %SYSTEM-F-CVTUNGRANT, Cannot convert an
     ungranted lock, in R0.
     Image(s) Affected:

      -  [SYS$LDR]SYS$VCC.EXE
      -  [SYS$LDR]SYS$VCC_MON.EXE

  o  Heavy $GETQUI use could intermittently induce either a nonfatal
     SSRVEXCEPT bugcheck at EXE$GETQUI_CONTEXT_FIND_C+00018 or a fatal
     DOUBLDEALO bugcheck at EXE$DEALLOCATE_C+00114.

     Image(s) Affected:

      -  [SYS$LDR]MESSAGE_ROUTINES.EXE

  o  If a process tries posting more QIOs than is allowed by the
     process buffered I/O quota and that process does this posting
     while resource wait is disabled, the BUFIOCNT will end up larger
     than it should.  When the process is later deleted, it will hang
     waiting for the BUFIOCNT to match the BUFIOLIM, resulting in an
     unkillable looping process.

     Image(s) Affected:

      -  [SYS$LDR]IO_ROUTINES.EXE
      -  [SYS$LDR]IO_ROUTINES_MON.EXE

  o  Heavy use of $GETQUI could result in a fatal SSRVEXCEPT bugcheck
     if the context queue is modified while it is being scanned.

     Image(s) Affected:

      -  [SYS$LDR]MESSAGE_ROUTINES.EXE

  o  An I/O error on a device can occur, resulting in a system crash
     with either bugcheck code WSLVANVAL or bugcheck code SECREFNEG.

     Image(s) Affected:

      -  [SYS$LDR]IO_ROUTINES.EXE
      -  [SYS$LDR]IO_ROUTINES_MON.EXE

  o  An INCONMMGST crash at SYS$VM+000660E8 or a WSLXVANVAL bugcheck
     can occur due to problems in the modified page writer WRTMFYPAG.
     The problem typically occurs on a memory-starved system or if
     processes run images that greatly exceed their working set quotas.

     Image(s) Affected:

      -  [SYS$LDR]SYS$VM.EXE

  o  Threaded applications using DECthreads, with kernel support
     enabled, perform poorly on systems with more than two CPUs.
     Monitoring the system shows very high MP synchronization and
     interrupt times.  Also, a large amount of time is spent in the
     $RESCHED system service.

     Image(s) Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE

  o  Another timing window in $GETQUI could cause an SSRVEXCEPT error.
     Heavy use of $GETQUI could result in this fatal SSRVEXCEPT
     bugcheck.
     Image(s) Affected:

      -  [SYS$LDR]MESSAGE_ROUTINES.EXE

  o  The system can crash with a PGFIPLHI error within the SECURITY
     execlet.

     Image(s) Affected:

      -  [SYS$LDR]SECURITY.EXE

  o  F$GETSYI returns a 16-byte string consisting of 8 bytes of version
     number followed by 8 bytes of null (hex 00).  Hence, F$EDIT
     functions such as TRIM and COMPRESS do not remove the trailing
     nulls.

     Image(s) Affected:

      -  [SYS$LDR]SYSGETSYI.EXE

  o  For the $GETJPI system service, IPL synchronization issues can
     cause a crash of either the sending or the target node in an
     OpenVMS Cluster via CWSERR or INVEXCEPTN bugchecks in SYS$CLUSTER
     code.  In most cases the target node crashes, but data can be lost
     on the sending node.

     Image(s) Affected:

      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]SYS$CLUSTER.EXE

  o  The SYS$FAO system service could incorrectly truncate output
     strings when the !XW and !XL format codes are used.  SYS$FAO could
     also sometimes put a null character in the last character position
     of a formatted output string.

     Image(s) Affected:

      -  [SYS$LDR]MESSAGE_ROUTINES.EXE

  o  For a call from SYS$DELPRC, an ACCVIO occurred, resulting in a
     system crash.

     Image(s) Affected:

      -  [SYS$LDR]IMAGE_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT.EXE
      -  [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE

  o  Threaded applications may hang when using the FAST IO system
     services.

     Image(s) Affected:

      -  [SYS$LDR]IO_ROUTINES.EXE
      -  [SYS$LDR]IO_ROUTINES_MON.EXE

  o  An INVEXCEPTN can occur at EXE_STD$WRTMAILBOX_C+584 when a mailbox
     UCB UCB$L_MB_WRITERWAITQFL ends up with a self-relative pointer to
     the I/O post processing queue (IOC$GQ_POSTIQ).

     Image(s) Affected:

      -  [SYS$LDR]SYSDEVICE.EXE

  o  An attempt to close a file whose disk has timed out in mount
     verify can fail to close, causing FILCNTNONZ bugchecks during
     process rundown.
     Image(s) Affected:

      -  [SYS$LDR]IO_ROUTINES.EXE

  o  If a process working set is reduced such that the working set
     only contains locked pages and the process then pagefaults, the
     system could bugcheck in MMG$PAGEFAULT with "FREWSLX, Free working
     set list index, resource wait".  The reason for this problem is
     that no free working set list entries could be found.  (OpenVMS
     Alpha V6.2 could go into an infinite loop.)

     Image(s) Affected:

      -  [SYS$LDR]SYS$VM.EXE

Problems Addressed in ALPSYSA02_071:

  o  The previous remedial kit, ALPSYSA01_071, shipped
     SYS$BASE_IMAGE.EXE without shipping SDA$SHARE.EXE.  Users who
     installed ALPSYSA01_071 saw the following warning message:

       %SDA-W-SDALINKSIMS, link time of SYS$BASE_IMAGE built into
       SDA$SHARE does not match the link time of image in system.

  o  Previous remedial kits that replaced SYS$BASE_IMAGE.EXE did not
     disable MOVEFILE on the image.  This could cause a problem if
     third-party defragmentation software that moves SYS$BASE_IMAGE.EXE
     is used.  This kit disables MOVEFILE on the SYS$BASE_IMAGE.EXE
     image that is installed with this kit.  Note that no changes were
     made to SYS$BASE_IMAGE.EXE.

  o  A process using MME could potentially "miss" the VOL1 label on a
     tape.  Also, a process could "hang" trying to send a message to
     the MME process.

     This problem can occur in several different areas of the operating
     system.  In order to get the full implementation of this MME fix,
     the following remedial kits (or their supersedants) should also be
     installed:

      +  ALPBACK03_071
      +  ALPDISM01_071
      +  ALPINIT01_071
      +  ALPMOUN05_071
      +  ALPMTAA01_071

     It is not necessary to install these kits at the same time, but
     until they are installed you may still experience this problem.

  o  A possible system crash occurs during Host Based RAID Unbinds
     with MME code enabled.  A mailbox read synchronization problem
     causes the crash.  This problem only occurs when a host-based RAID
     UNBIND command is done while an MME-based application is running.
     This problem can occur in several different code areas of the
     operating system.  In order to eliminate all known instances of
     this problem, the following remedial kits (or their supersedants)
     will also need to be installed:

      +  ALPBACK03_071
      +  ALPDISM01_071
      +  ALPINIT01_071
      +  ALPMOUN05_071
      +  ALPMTAA01_071

  o  A satellite will hang while booting under the following
     conditions:

       1.  The satellite has device naming on;
       2.  It boots from a served disk which does not have a PAC
           (DKcn); and
       3.  It has a PAC assigned to its local PKC.

     In order to get the full fix, the ALPSYSI01_071 kit also needs to
     be installed.

  o  A connection to TMSCP served tapes results in a system crash.

  o  Deadlocks or up to 72-second delays occur during booting of a node
     that uses SCSI port allocation classes.  The deadlocks occur if
     the votes from a quorum disk are needed to form a cluster or if
     the page or swap disk is a disk other than the system disk.

  o  A system crash (ACCVIO) occurs when attempts are made to access
     page tables that do not exist.  The problem occurs when an
     application attempts to expand the size of virtual address space
     for a process, but the process has insufficient pagefile quota.

     The symptom is a SSRVEXCEPT crash due to an access violation in
     kernel mode at the label:

       MMG_STD$CREPAG_64_C+0023C

     The other symptom of the problem is that the current process has
     no more pagefile quota (PGFLQUOTA).  It can be seen when a SHOW
     CRASH is done, followed by a FORMAT Job Information Block (JIB)
     command in SDA.  The PGFLQUOTA field (JIB$L_PGFLCNT) will then
     be 0.

  o  Processor Correctable interrupts (single-bit ecc errors) are not
     seen on all systems.  The MCES register is not properly
     initialized in the SYS$CPU_ROUTINES module and the DPC bit is left
     set so that Processor Correctable error logging is disabled.
     Consequently, single-bit ecc errors, though corrected, are never
     reported.  A user would thus have no indication that memory was
     exhibiting problems until it was too late.
     The system would crash with an uncorrectable (multi-bit) ecc
     error.

  o  If a system has been up for 497.1 days without rebooting, the
     system cell EXE$GL_ABSTIM_TICS (number of 10 millisecond tics
     since boot) will overflow.  This problem can cause some processes
     to remain indefinitely in the RWMPB or COMO scheduling state.

  o  A crash occurs, with a PGFIPLHI "Pagefault with IPL too high"
     bugcheck, in SYS$VM_PRO+15B0 in the SYS$ADJWSL system service.
     The crash occurs because the code page for SYS$ADJWSL was removed
     from the system working set.

  o  The performance counter PMS$GL_NPAGDYNEXPS (cell) was never
     incremented above its initial value of zero.  It can be displayed
     by SDA>CLUE MEM/STAT.

  o  If the system has insufficient Lock IDs and tries to expand the
     Lock ID table at a time when no free PFNs (Page Frame Numbers)
     exist, then the base system PTEs (Page Table Entries), pointed to
     by MMG$GL_SPTBASE, can be overwritten.  The result is an inability
     to get a system dump and a repeat of "kernel stack invalid halt"
     at the console.

  o  If the system is temporarily out of Lock IDs and there are
     currently no free pages to expand the Lock ID table, then SYS$VCC
     could crash with an INCONSTATE bugcheck at SYS$VCC_NPRO+09700.

  o  Absolute TQE (Timer Queue Element) firing times are calculated by
     $SCHDWK and by $SETIMR from user-supplied relative time quadwords.
     Those calculations are not protected against changes in the system
     time quadword EXE$GQ_SYSTIME.  If a time server modifies the time
     in the midst of a calculation, the eventual firing time could be
     later than expected.

  o  After the application of ALPSYSA01_071 (or ALPSYS12_071 -
     ALPSYS16_071), current process priorities were not being floated
     back toward the base priority after a priority boost.  The result
     would be low priority processes staying at their boosted priority
     and taking more CPU than expected, effectively disabling the
     process priority scheme.
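
The 497.1-day figure quoted for EXE$GL_ABSTIM_TICS in the ALPSYSA02_071
list can be reproduced with simple arithmetic.  The sketch below is in
Python and assumes the cell is an unsigned 32-bit longword (the usual
reading of the GL prefix; this summary does not state the width).  The
constant and function names are hypothetical, chosen only for
illustration.

```python
# Assumption: the tic counter is an unsigned 32-bit longword.
COUNTER_BITS = 32
TIC_SECONDS = 0.010        # one tic = 10 milliseconds, per the summary
SECONDS_PER_DAY = 86_400


def days_until_overflow(bits: int = COUNTER_BITS,
                        tic: float = TIC_SECONDS) -> float:
    """Days of uptime before a counter of the given width wraps."""
    return (2 ** bits) * tic / SECONDS_PER_DAY


def wrapped(tics: int, bits: int = COUNTER_BITS) -> int:
    """Value actually held by a counter of the given width."""
    return tics % (2 ** bits)


if __name__ == "__main__":
    print(f"{days_until_overflow():.1f} days until the counter wraps")
```

Under that assumption the counter returns to zero after 2**32 tics, so
any code comparing a stored tic value against the current one would see
time appear to run backwards, which is consistent with processes
waiting indefinitely.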
  o  If the IOC_STD$SIMREQCOM routine is called with a 0 IOSB argument,
     a DOUBLDEALO bugcheck can occur.  The bugcheck happens because an
     attempt is being made to deallocate a PCB twice.

  o  An incorrect process header vector index was used and the
     resulting address was inaccessible.  The index could have had the
     sign bit set for a swap write.

  o  A register value was prematurely destroyed, allowing COM
     processes, which were being outswapped, to remain on the COM
     queue.  They could subsequently become CUR, even though their
     bodies were outswapped, resulting in a variety of bugchecks.

  o  After the application of ALPSYSA01_071, some systems were
     intermittently experiencing SSRVEXCEPT bugchecks at
     IO_ROUTINES_PRO+02CD4, handling SYS$BRKTHRU requests.

  o  The problem has been seen mostly at large ALL-IN-1 sites.  If a
     page being deleted with the $DELTVA or $DELTVA_64 system service
     is a global page with I/O still active, the process can possibly
     enter the RWAST scheduling state.  Due to a deadlock situation, it
     could remain in RWAST state indefinitely.  When this problem
     occurs, all disk I/O for the entire VMScluster can be hung.

     The problem can be detected through the use of the ANALYZE/SYSTEM
     utility, by issuing a SHOW PROCESS/REGISTER command on a process
     in the RWAST scheduling state.  If the PC register indicates an
     address in the SYS$VM image or the MMG$DELPAG or MMG$DELPAG_64
     routine and the PS register indicates IPL 2, then the problem is
     present.

     This update requires a FULL BUILD, which is due to the change to
     [LIB]MMGDEF.SDL.  This change also defines some new flags, used
     only in this update, which must be obtained from a library not
     contained within the SYS facility.

  o  $DEVICE can return the name of a dual-pathed SCSI disk twice.
     These disks have two UCBs, each of which has some differences that
     allow one to distinguish primary from alternate UCBs.  The latter
     can be filtered out by refraining from returning the name of a UCB
     with the following characteristics:

       1.  Bits 2P (dual path), CDP (non-preferred path) and SCSI set
           in UCB$L_DEVCHAR2; and
       2.  UCB$L_2P_ALTUCB non-zero (pointer to the other UCB).

  o  P0 is extended in an EXEC or KERNEL AST while the image activator
     is running port code from Alpha, which retries on VA_IN_USE
     errors.

  o  A pool leak occurred when a remotely created process was deleted.
     The Job Information Block (JIB) of a remotely created process was
     not deallocated when the process was deleted because of incorrect
     register initialization.

Problems Addressed in ALPSYSA01_071:

  o  An Access Violation may occur at EXE$AST_RETURN, in an outer mode
     - most typically user mode - but with the Frame pointer pointing
     to the kernel stack.  The access violation occurs trying to access
     data on the kernel stack from user mode.  This does not crash the
     system, but causes the user image to exit.

  o  An SSRVEXCEPT crash may occur in SYS$NETWORK_SERVICES.EXE with
     NET$ACP as the current process (and image).

  o  With the kernel threads upcall feature enabled, applications which
     perform high numbers of pagefaults may see threads stuck in a
     pagefault wait state.

  o  An INVEXCEPTN, Exception while above ASTDEL, may occur in the
     EXE_STD$REMOVACB routine in the ASTDEL.MAR module.

  o  If a cluster transition occurs at the same time a local system
     function which requires the PRIMARY capability is being processed,
     the system will crash with an INCONSCHED bugcheck.

  o  When using the SYS$BRKTHRU system service (i.e., when doing a
     REPLY/ALL to a large number of terminals), an OpenVMS Alpha 7.1
     system can hang, or, on SMP processors, crash with a CPUSPINWAIT
     or a CPUSANITY bugcheck.  The crash dump will usually show that
     one CPU did not respond to the bugcheck request, and the current
     process on that CPU is doing a REPLY/ALL or SYS$BRKTHRU that has
     filled its Kernel Stack with ASTs handling the request.
  o  A Lock Manager deadlock search should either find and break a
     deadlock or find out that there is no deadlock and remove the lock
     at the head of the lock timeout queue.  Although some valid
     reasons exist for why a deadlock search could be aborted and
     retried later on, in the above described case, an aborted search
     was not the appropriate action.  Each second, a deadlock search
     was started and aborted shortly thereafter, with the original lock
     being left at the head of the timeout queue.  This problem caused
     continual retry attempts to perform a deadlock search on the same
     lock, resulting in an application hang.

  o  Updates to application ACE get lost.  Customer code locks the ACL,
     reads their ACE, updates a count field, re-writes the ACE, and
     unlocks the ACL.  The change to the count gets lost.

     In order to get this full fix you must also install the
     ALPF11X03_071 remedial kit.

Problems Addressed in ALPSYS16_071:

  o  OpenVMS could create processes in the same group UIC with the same
     process name.

  o  INCONSCHED Scheduler crash.

Problems Addressed in ALPSYS15_071:

  o  The system experiences a DELGBLSEC bugcheck with an R0 value of
     2C72.  This problem has been seen most frequently on systems
     running Oracle, but can occur with any application utilizing
     memory-resident global sections.  This section must use the
     /NOALLOC option in the Reserved Memory Registry and must be larger
     than the amount reserved.

  o  A higher than normal value for MP (MultiProcessor) Synchronization
     time (seen with the Monitor Modes display) and possibly a
     CPUSPINWAIT bugcheck may occur.  The problem is seen particularly
     on systems running DECnet Phase IV and using the FDDI
     interconnect.

     Currently the packet size is too large for the lookaside list and
     must be taken from variable pool, which requires the POOL
     spinlock.  This can cause a performance "hit" and can cause a
     CPUSPINWAIT bugcheck under heavy load.

     The solution is to increase the maximum size lookaside list.
     The size needed for this particular problem was 5376.  However, it
     has been shown that there are other requests for larger packets.
     These could present similar problems, so a maximum size of 8192
     has been established.

     Creating the extra lists has no memory or performance penalty in
     and of itself.  The only issues raised with having more lookaside
     lists are those associated with where NonPaged Pool resides (on a
     lookaside or on the variable list).  This change errs on the side
     of having more on the lookasides.  The reclamation algorithms
     ensure that not too much will reside on the lookasides.

     NOTE:  This kit does not include changes for utilities that
            display information about nonpaged pool, including the
            lookaside lists.  Therefore, some statistics displayed by
            SDA, CLUE, and SHOW MEMORY may be inaccurate in the
            following way: free memory that resides on the new
            lookaside lists will appear to be allocated.  So, the
            system may actually have more free memory than that
            indicated by the utilities.

Problems Addressed in ALPSYS14_071:

  o  User-created protected subsystems with subsystem identifiers
     granted to executable images fail to work properly in manipulating
     queues via $SNDJBC[W].  Although the image has the subsystem
     identifier granted, a NOPRIV error is returned.

Problems Addressed in ALPSYS13_071:

  o  ******* WARNING *******

     Setting the SYSGEN parameter VMSD4 to 1 would be done at the
     extreme risk of permanent system failure.  OpenVMS engineering
     highly recommends that VMSD4 be left set to the default value
     of 0, i.e. enable automatic power-off.

     ***********************

     On all previous systems, a system fan failure automatically
     triggered the power supply to remove power from the system.  On
     the new Alpha "DIGITAL Personal Workstation" (DPWS) platform,
     operating system software is notified of any fan failures.  It
     must respond by removing power from the system.  This new
     operating system feature is required for system FRS.
     However, for some special applications, one may desire to disable
     automatic system power-off, which can be achieved by setting the
     SYSGEN parameter VMSD4 to 1.  The result will be continued
     operation after fan failure, but at the risk mentioned in the
     warning above.

  o  System crashes with one of the following footprints:

       INCONSTATE SYS$VCC_NPRO+00009F04
       INCONSTATE SYS$VCC_NPRO+00009F00
       INCONSTATE VAXCLUSTER_CACHE+03EB1

Problems Addressed in ALPSYS12_071:

  o  This enhancement restructures the code in the scheduler to provide
     better performance and cache behavior.  There are no functional
     differences.

  o  An access violation may occur at EXE$AST_RETURN, in an outer mode
     - most typically user mode - but with the FP pointing to the
     kernel stack.  The access violation occurs trying to access data
     on the kernel stack from user mode.  This does not crash the
     system, but causes the user image to exit.

Problems Addressed in ALPSYS11_071:

  o  When turning on the new multithread features with either
     LINK/THREADS or via THREACP, register corruption can occur.  The
     registers R2 through R7 can be sign extended if a pagefault occurs
     and the EXEC performs a pagefault upcall to DECthreads.

  o  A multithreaded process with upcalls enabled may hang in HIB due
     to a missing event flag upcall.  If the program makes heavy use of
     event flags for synchronization, notification of an event flag
     being set can sometimes be lost.  This can result in a thread
     waiting forever for an event which already took place.

Problems Addressed in ALPSYS10_071:

  o  If the Job Controller's mailbox is full at the time a batch job
     process termination message is sent to the JOB_CONTROL process,
     the message could be dropped and lost.  This could result in SHOW
     QUEUE showing "executing" jobs with no associated process on the
     system.

  o  A system crash (ACCVIO) may occur in SYSCREDEL.MAR.  When pagefile
     quota is exhausted just as page tables are created, the region is
     not contracted appropriately.
     This causes the next attempt to create address space one page at a
     time to crash because it thinks the page tables are already there.

  o  A system crash may occur due to non-paged pool leak.  When an
     Oracle database is backed up using FAST-I/O, non-paged pool fills
     up with DIOBMs.

  o  A system may crash with an SSRVEXCEPT, Unexpected system service
     exception.  The crash footprint is:

       SDA> CLUE CRASH

       Crash Time:
       Bugcheck Type:       SSRVEXCEPT, Unexpected system service
                            exception
       Node:                XXXXXX (Clustered)
       CPU Type:            AlphaServer 1000A 5/400
       VMS Version:         V7.1
       Current Process:     OLS1
       Current Image:       $1$DKA4:[MSY30_5.]OLPR.EXE;1
       Failing PC:          FFFFFFFF.800CA390
                            EXE$CLONE_ADDRESS_SPACE_64_C+00A40
       Failing PS:          0C000000.00000003
       Module:              SYS$VM
       Offset:              00006390

       Failing Instruction:
       EXE$CLONE_ADDRESS_SPACE_64_C+00A40:   LDL   R2,#X0010(R2)

Problems Addressed in ALPSYS08_071:

  o  A PGFIPLHI bugcheck (Pagefault at IPL too high) may occur while
     running Oracle7 R7.3.2.3.2.

Problems Addressed in ALPSYS06_071:

  o  A PT page is being processed for a process that is also an outswap
     target.  The SWAPPER already marked all valid and modified pages
     "delete contents (DELCON)" and decremented the valid page count in
     the PT page's PFN record.  This allowed PROCESS_PT_REQUEST to see
     a -1 in PT_VAL_CNT and to encounter a valid PTE while the page
     table page was scanned.  This situation was believed to be an
     inconsistent MMG state.

  o  Audits are generated for mapping to memory resident global
     sections even though auditing is not enabled for the object.

  o  Code was not dealing with the input from the 64-bit system
     services properly in descending regions.  The VA is set to the
     last byte within the page.  If the page is invalid, the code
     touches the page to fault it into memory.  If the following page
     is set to no access, the system crashes.

  o  This is a "day 1" bug in the fast-i/o code.  We burden the
     ACB_QUOTA flag with double duty.
     The rest of the system uses this flag as an indicator whether AST
     quota have been charged and must be returned upon AST delivery;
     fast-i/o never charges AST quota but sets the flag anyway as an
     indication that an AST was requested.  The flag must be cleared
     before the AST is queued.  This must be done in three cases:

      -  fast_finish
      -  IPL4 completion
      -  SIMREQCOM

     The original code cleared the flag only for IPL4 completion.

  o  A change to clustered page deletion tried to close a hole that had
     the potential to lead to dramatic failures under very rare
     conditions.  The fix was not quite right and led to a loss of
     pagefile quota under more easily achievable conditions.

  o  System-wide counters for direct and buffered I/Os may become
     inflated when Fast-I/O is used.

  o  Fast-I/O ($io_perform) transfers a max of 127 blocks to SCSI disks
     regardless of request size - but reports the full size as
     requested in the IOSA.

  o  A serial console on a Turbolaser (as opposed to a remote or LAT
     console) may see device timeouts after issuing commands such as
     $ directory or $ show device.

  o  Support Global Buffer Objects (achieve VAX parity; enable existing
     code and complete the port).

  o  Support buffer objects without a system space window on memory
     resident sections only.  Also included several fixes for buffered
     Fast-I/O and Fast-I/O through VIOC.  Problem symptoms were
     possible process hangs and bad data returned in IOSA.

  o  Fix possible system hang when a memory resident section is
     created.

  o  Fix possible bugcheck when a memory resident section is deleted.

  o  Fix possible process hang when an image exits with outstanding
     Fast-I/O.

  o  Fix VIOC/Fast-I/O interaction.  Bad data could be returned in IOSA
     and probed buffers unnecessarily for Fast-I/O.

  o  Fix problems with AVOID_PREEMPT.  There are two separate bugs
     which can allow a non-priv user to crash the system.
  o  There is an omission in the original submission for the support of
     32-bit signals as generated by nonprivileged usermode images
     linked /NOSYSSHR under V6.2 or earlier.  A check was not being
     made in one particular path.

  ***************************** Notice *****************************
  *                                                                *
  *  The following problems will be corrected if both this        *
  *  remedial kit and the ALPSHAD04_071 (or supersedant) remedial  *
  *  kit are installed on the customer's system.  Therefore, in    *
  *  order to get the complete list of fixes, customers should     *
  *  install both kits.  However, either of these kits will run    *
  *  safely without the other kit installed.                       *
  *                                                                *
  ******************************************************************

  o  SDA> SHOW POOL can take an excessive period of time.

  o  SHOW POOL gives NOSUCHPOOL errors unnecessarily.

  o  SHOW POOL/SUMMARY counts and space totals do not match.

  o  SHOW POOL cannot always find the range.

  o  When minimum SYSTEM_PRIMITIVES is in use, SDA will not work
     instead of signaling the correct message.

  o  The symbol file is opened by SDA even when /OVERRIDE is specified
     (and it is not used).

  o  SDA can get into a loop printing blank lines.

  o  Some of BUGCHECK's messages are confusing.

  o  The Base SVA of buffer objects is only displayed as 32 bits.

  o  An incomplete dump is inaccessible by SDA.  The changes in this
     remedial kit will now treat DUMPINCOMPL as a warning if this is a
     selective dump and the dump has progressed far enough to dump the
     first process.

  o  SDA SHOW EXEC does not always display all execlets.  READ/EXEC
     does not read all the symbols.

  o  MODIFY DUMP does not work on the dump header and /CONFIRM fails
     when the field being updated is a byte or a word and the original
     value is negative.

  o  BUGCHECK's two public routines (EXE$BUGCHK_REMOVE_VA and
     EXE$BUGCHK_CANCEL_REMOVE_VA) do not synchronize their
     manipulations with spinlocks.

  o  BUGCHECK fails if the only process is the swapper.
  o  Handling of Halt/Restart crashes when the Halt HWPCB is used is
     faulty.

  o  SHOW DEV MC only allows /HOME but it is documented as /HOMEPAGE.

Problems Addressed in ALPSYS05_071:

  o  While booting, the system crashes during driver loading. This is
     caused by a corrupt Driver Prologue Table (DPT) for a previously
     loaded driver.

Problems Addressed in ALPSYS03_071:

  o  System crashes with one of the following footprints:

       INCONSTATE SYS$VCC_NPRO+00009F04
       INCONSTATE SYS$VCC_NPRO+00009F00
       INCONSTATE VAXCLUSTER_CACHE+03EB1

Problems Addressed in ALPSYSB02_071:

  o  Installation of the ALPSYSB01_071 remedial kit on systems with a
     SYSTEM_SYNCHRONIZATION_PRF.EXE image might fail due to regression
     issues with the image on the system versus the image in the kit.
     The SYSTEM_SYNCHRONIZATION_PRF.EXE image has not shipped as part
     of OpenVMS and should not appear as part of the consolidated SYS
     kits. This image, as well as other images that have not
     previously shipped in remedial kits, has been removed from this
     ALPSYSB remedial kit.

     There are no new problem corrections in this kit. If you have
     successfully installed the ALPSYSB01_071 remedial kit, you do not
     need to install the ALPSYSB02_071 kit.

Problems Addressed in ALPSYSB01_071:

  o  The AUDIT server runs into an access violation in kernel mode when
     it encounters an unknown security class.

  o  Some Java code failed due to incorrect floating point results. The
     problems that occurred are:

       1. Adding two floating point values which should result in 0.0
          in some cases (if one of the two operands is negative) gives
          a -0.0 result. The IEEE standard states that the result
          should be 0.0.

       2. Adding or subtracting 0 and a denormal value yields zero
          instead of the denormal value. (A denormal value is a
          nonzero number whose magnitude is smaller than the smallest
          normalized number.)

       3. A negative double float converted to a float yields
          +infinity instead of -infinity.
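
The three floating point cases listed above can be checked against the results that IEEE arithmetic is expected to produce. The following is a minimal sketch in Python, not the Java code or the Alpha handler that the kit corrects. It assumes a host with IEEE 754 doubles and the default round-to-nearest mode; the helper names sign_of and to_single and the sample values are illustrative only.

```python
import ctypes
import math


def sign_of(x: float) -> str:
    """Return '+' or '-' from the sign bit, so 0.0 and -0.0 differ."""
    return "-" if math.copysign(1.0, x) < 0 else "+"


def to_single(x: float) -> float:
    """Narrow an IEEE double to single precision (assumes IEEE host)."""
    return ctypes.c_float(x).value


# Case 1: a sum that cancels exactly must be +0.0, not -0.0.
zero_sum = 1.5 + (-1.5)

# Case 2: adding or subtracting zero must preserve a denormal value.
denormal = 5e-324  # smallest positive denormal double
denormal_sum = 0.0 + denormal
denormal_diff = denormal - 0.0

# Case 3: a negative double too large for single precision must
# narrow to -infinity, not +infinity.
narrowed = to_single(-1e300)

print("case 1 sign:", sign_of(zero_sum))
print("case 2 preserved:", denormal_sum == denormal)
print("case 3 result:", narrowed)
```

A system showing any of the three faults would report a '-' sign in case 1, a zero result in case 2, or a positive infinity in case 3.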
Problems Addressed in ALPSYS17_071:

  o  An SSRVEXCEPT crash occurred early in the boot process when a new
     'fixed' image was used. The Signal Array had an ACCVIO with the
     message code in the offending PC = EXE$WRTMAILBOX+00C1C and in
     the operand = EXE$DELMBX+006E4.

Problems Addressed in ALPSYS09_071:

  o  Several of the Java validation suite tests fail due to incorrect
     floating point results. When a very small double value is cast to
     a float, it turns into an apparently random float value rather
     than zero. Similarly, a very large double value, which should
     turn into infinity when cast to a float, turns into another
     random value.

Problems Addressed in ALPSYS07_071:

  o  Fix incorrect ILLEGAL_SHADOW error when printing pages from
     Netscape Navigator.

Problems Addressed in ALPSYS04_071:

  o  A DOUBLDEALO BUGCHECK occurs during process rundown, caused by
     double deallocation of a memory block. There is a small window of
     opportunity between the time the ORB is deallocated and the time
     its address is cleared in the UNC, during which the image could
     be interrupted and a rundown thread could pick up the ORB address
     and try to deallocate it again.

INSTALLATION NOTES:

The images in this kit will not take effect until the system is
rebooted. If there are other nodes in the VMScluster, they must also
be rebooted in order to make use of the new image(s).

If it is not possible or convenient to reboot the entire cluster at
this time, a rolling re-boot may be performed.

All trademarks are the property of their respective owners.

Files on this server are as follows:

alpsys20_071.README
alpsys20_071.CHKSUM
alpsys20_071.CVRLET_TXT
alpsys20_071.a-dcx_axpexe