OpenVMS ALPSYS20_071 Alpha V7.1 - V7.1-1H2 System ECO Summary
Modification Date: 27-JUL-2000
Modification Type: Documentation Updated:
ALPF11X04_071 is superseded by ALPF11X06_071.
NOTE: An OpenVMS saveset or PCSI installation file is stored
on the Internet in a self-expanding compressed file.
For OpenVMS savesets, the name of the compressed saveset
file will be kit_name.a-dcx_vaxexe for OpenVMS VAX or
kit_name.a-dcx_axpexe for OpenVMS Alpha. Once the OpenVMS
saveset is copied to your system, expand the compressed
saveset by typing RUN kit_name.a-dcx_vaxexe or RUN kit_name.a-dcx_axpexe.
For PCSI files, once the PCSI file is copied to your system,
rename the PCSI file to kitname-dcx_axpexe.pcsi, then it can
be expanded by typing RUN kitname-dcx_axpexe.pcsi. The resultant
file will be the PCSI installation file which can be used to install
the ECO.
Copyright (c) Compaq Computer Corporation 1999. All rights reserved.
PRODUCT: OpenVMS Alpha
COMPONENTS: System
CLASS_SCHEDULER
ERRORLOG
EXCEPTION
EXCEPTION_MON
EXEC_INIT
IMAGE_MANAGEMENT
IO_ROUTINES
IO_ROUTINES_MON
LOCKING
MESSAGE_ROUTINES
PROCESS_MANAGEMENT
PROCESS_MANAGEMENT_MON
SDA$SHARE
SECURITY
SYS$BASE_IMAGE
SYS$CLUSTER
SYS$PUBLIC_VECTORS
SYS$SSISHR
SYS$VCC
SYS$VCC_MON
SYS$VM
SYSDEVICE
SYSGETSYI
SYSLDR_DYN
SYSTEM_PRIMITIVES
SYSTEM_PRIMITIVES_MIN
VMS$IEEE_HANDLER
SOURCE: Compaq Computer Corporation
ECO INFORMATION:
ECO Kit Name: ALPSYS20_071
NOTE: The ALPSYSA03_071 remedial kit combined the
previous SYSA and SYSB kits into one SYSA kit.
Since SYS images are now being distributed in
one kit, the kit naming convention has been
changed from ALPSYSA to ALPSYS to more
accurately reflect the kit contents.
ALPSYS20_071 may also be known as ALPSYSA04_071.
ECO Kits Superseded by This ECO Kit: ALPSYSA03_071
ALPSYSA02_071
ALPSYSA01_071
ALPSYS16_071
ALPSYS15_071
ALPSYS14_071
ALPSYS13_071
ALPSYS12_071
ALPSYS11_071
ALPSYS10_071
ALPSYS08_071
ALPSYS06_071
ALPSYS05_071
ALPSYS03_071
ALPSYSB02_071
ALPSYSB01_071
ALPSYS17_071
ALPSYS09_071
ALPSYS07_071
ALPSYS04_071
ECO Kit Approximate Size: 12980 Blocks
Kit Applies To: OpenVMS Alpha V7.1, V7.1-1H1, V7.1-1H2
System/Cluster Reboot Necessary: Yes
Rolling Re-boot Supported: Yes
Installation Rating: INSTALL_1 To be installed on all systems running
the listed version(s) of OpenVMS.
Kit Dependencies:
The following remedial kit(s) must be installed BEFORE
installation of this kit:
ALPBASE02_071
In order to receive all the corrections listed in this kit, the
following remedial kits should also be installed:
ALPCPU1E03_071 (if kit is installed on an Alpha "DIGITAL
Personal Workstation")
ALPF11X06_071 (Supersedes ALPF11X04_071)
ALPBACK05_071
ALPDISM01_071
ALPINIT01_071
ALPMOUN07_071
ALPMTAA01_071
ALPSYSI01_071
ALPPTD01_071
ECO KIT SUMMARY:
An ECO kit exists for various system components on OpenVMS Alpha V7.1
through V7.1-1H2. This kit addresses the following problems:
Problems Addressed in ALPSYS20_071:
o In previous releases of OpenVMS, protection of a new FT device was
unconditionally set to S:RWPL,O:RWPL,G,W (no access for GROUP and
WORLD).
This change and a corresponding change in the ALPPTD01_071 kit
modify this behavior. Protection for a new FT device is now taken
from the protection of the "template device" FTA0:. The protection
for FTA0: may be set during the boot process (in SYSTARTUP_VMS.COM),
or manually, for example:
$ SET SECURITY /CLASS=DEVICE-
/PROTECTION=(S:RWLP,O:RWLP,G:RW,W:R) FTA0:
At boot time, the protection on FTA0 is unconditionally set to
S:RWPL,O:RWPL,G,W. An ACL may also be set on FTA0, either
explicitly, or inherited from the SECURITY class TERMINAL device
template.
In order to get this full fix, you must also install the
ALPPTD01_071, or later, remedial kit.
Images Affected:
- [SYS$LDR]SYS$FTDRIVER.EXE
o It is possible for a scheduling class with greater than 100%
quantum to still run out of quantum. This should not occur.
Images Affected:
- [SYS$LDR]CLASS_SCHEDULER.EXE
o A BASIC application terminates abnormally with the BAS$_PROLOSSOR,
DEVFOREIGN or ACCVIO status.
Images Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
o MFPR_xxx and MTPR_xxx PALcode instructions can leave registers R1,
R16 and R17 with unpredictable results. These registers were not
always saved and restored in ASTDEL_STACK.M64. Although corruptions
of these registers have not been known to happen, the potential is
there, particularly on newer platforms.
This fix eliminates the possibility of this register corruption.
Images Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYSEXE]PROCESS_MANAGEMENT_MON.EXE
o During a system boot with SYSTEM_CHECK set to 0 and with XFC
(Extended File Cache) loaded (VCC_FLAGS = 2), systems are crashing.
Images Affected:
- [SYS$LDR]MESSAGE_ROUTINES.EXE
o Attempting to run a program linked /DEBUG results in an ACCVIO.
Images Affected:
- [SYS$LDR]IMAGE_MANAGEMENT.EXE
o Multiple regions exist with overlapping VA (Virtual Address) space.
This can lead to just about any crash scenario.
Images Affected:
- [SYS$LDR]SYS$VM.EXE
o A DECthread may hang waiting for an event flag upcall.
Images Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
- [SYS$LDR]PROCESS_MANAGEMENT.STB
- [SYS$LDR]PROCESS_MANAGEMENT_MON.STB
o A kernel thread may get stuck waiting for the inner mode semaphore
when it already owns it.
Images Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
- [SYS$LDR]PROCESS_MANAGEMENT.STB
- [SYS$LDR]PROCESS_MANAGEMENT_MON.STB
o A kernel thread may get stuck in an AST (Asynchronous System Trap)
delivery loop trying to deliver an AST when there are none queued.
Images Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
- [SYS$LDR]PROCESS_MANAGEMENT.STB
- [SYS$LDR]PROCESS_MANAGEMENT_MON.STB
o The system may crash with a "Pagefault with IPL too high" bugcheck
trying to deliver an AST.
Images Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
- [SYS$LDR]PROCESS_MANAGEMENT.STB
- [SYS$LDR]PROCESS_MANAGEMENT_MON.STB
Problems Addressed in ALPSYSA03_071:
o An INVEXCEPT bugcheck in the SWAPPER can occur.
Image(s) Affected: [SYS$LDR]SYS$VM.EXE
o The system may not write out a crash dump.
Image(s) Affected:
- EXCEPTION.EXE
- EXCEPTION_MON.EXE
- EXCEPTION.STB
- EXCEPTION_MON.STB
o A process quota leak can occur for remote (detached) process
creation.
Image(s) affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
o A system may bugcheck with an NSABLOST error. This problem
can occur when a MAIL application uses the MAIL$USER_GET_INFO
utility (routine) call to acquire user mail information. It
can also be replicated with the command:
MAIL> SHOW FORWARD/USER=*
Image(s) Affected:
- [SYS$LDR]SECURITY.EXE
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
o The system crashed while executing Fast I/O code. The failing
instruction could be something similar to:
EXE$IO_SETUP_C+004D8: LDL R20,(R20)
Image(s) Affected:
- IO_ROUTINES.EXE
- IO_ROUTINES_MON.EXE
o The $GETJPI system service returned the value of the AST
enable (ASTen) register only for the currently executing
process, rather than for all processes (except the null and
swapper processes).
Image(s) Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
o For the FASTIO code, the system hangs with IPL 8 Fork Queue
not being serviced.
Image(s) Affected:
- [SYS$LDR]IO_ROUTINES.EXE
- [SYS$LDR]IO_ROUTINES_MON.EXE
o An INCONSTATE bugcheck may occur at offset SYS$VCC+08798 with
the condition code 213C = %SYSTEM-F-CVTUNGRANT, Cannot convert
an ungranted lock, in R0.
Image(s) Affected:
- [SYS$LDR]SYS$VCC.EXE
- [SYS$LDR]SYS$VCC_MON.EXE
o Heavy $GETQUI use could intermittently induce either a
nonfatal SSRVEXCEPT bugcheck at EXE$GETQUI_CONTEXT_FIND_C+00018
or a fatal DOUBLDEALO bugcheck at EXE$DEALLOCATE_C+00114.
Image(s) Affected:
- [SYS$LDR]MESSAGE_ROUTINES.EXE
o If a process tries posting more QIOs than is allowed by the
process buffered I/O quota and that process does this posting
while resource wait is disabled, the BUFIOCNT will end up
larger than it should. When the process is later deleted, it
will hang waiting for the BUFIOCNT to match the BUFIOLIM,
resulting in an unkillable looping process.
Image(s) Affected:
- [SYS$LDR]IO_ROUTINES.EXE
- [SYS$LDR]IO_ROUTINES_MON.EXE
o Heavy use of $GETQUI could result in a fatal SSRVEXCEPT
bugcheck, if the context queue is modified while it is being
scanned.
Image(s) Affected:
- [SYS$LDR]MESSAGE_ROUTINES.EXE
o An I/O error on a device can occur, resulting in a system
crash with either bugcheck code WSLVANVAL or bugcheck code
SECREFNEG.
Image(s) Affected:
- [SYS$LDR]IO_ROUTINES.EXE
- [SYS$LDR]IO_ROUTINES_MON.EXE
o An INCONMMGST crash at SYS$VM+000660E8 or a WSLXVANVAL
bugcheck can occur due to problems in the modified page writer
WRTMFYPAG.
The problem typically can occur on a memory starved system or
if processes run images that greatly exceed their working set
quotas.
Image(s) Affected: [SYS$LDR]SYS$VM.EXE
o Threaded applications using DECthreads, with kernel support
enabled, perform poorly on systems with more than two CPUs.
Monitoring the system shows very high MP synchronization and
interrupt times. Also, a large amount of time is spent in the
$RESCHED system service.
Image(s) Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
o Closing another window in $GETQUI could cause an SSRVEXCEPT
error. Heavy use of $GETQUI could result in this fatal
SSRVEXCEPT bugcheck.
Image(s) Affected: [SYS$LDR]MESSAGE_ROUTINES.EXE
o The system can crash with a PFIPLHI error within the SECURITY
execlet.
Image(s) Affected: [SYS$LDR]SECURITY.EXE
o F$GETSYI returns a 16-byte string consisting of 8 bytes of
version number followed by 8 bytes of null (hex 00). Hence,
F$EDIT functions such as TRIM and COMPRESS do not remove the
trailing nulls.
Image(s) Affected: [SYS$LDR]SYSGETSYI.EXE
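The trailing-null behavior described above can be illustrated outside DCL. The Python fragment below is only a sketch: it assumes the 16-byte layout given in the description (8 bytes of version text followed by 8 NUL bytes), and the sample version text and helper names are hypothetical. It shows why trimming blanks alone leaves the NUL bytes in place.

```python
# Illustration only: models the 16-byte buffer described above,
# 8 bytes of version text followed by 8 NUL (hex 00) bytes.
# The sample value "V7.1    " is hypothetical.
raw = "V7.1    " + "\x00" * 8

def trim_like_whitespace(s: str) -> str:
    """Strip only blanks and tabs, similar in spirit to a TRIM edit."""
    return s.strip(" \t")

def strip_nulls_then_trim(s: str) -> str:
    """Drop trailing NULs first, then trim blanks."""
    return s.rstrip("\x00").strip(" \t")

whitespace_only = trim_like_whitespace(raw)
cleaned = strip_nulls_then_trim(raw)
print(len(raw), len(whitespace_only), len(cleaned))
```

In this model the blank-only trim returns the string unchanged because the last characters are NULs, not blanks; only the explicit NUL strip yields the 4-character version text.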
o For the $GETJPI system service, IPL synchronization issues can
cause a crash of either the sending or the target node in an
OpenVMS Cluster via CWSERR or INVEXCEPTN bugchecks in SYS$CLUSTER
code. In most cases the target node crashes, but data can be
lost on the sending node.
Image(s) Affected:
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]SYS$CLUSTER.EXE
o The SYS$FAO system service could incorrectly truncate output
strings when the !XW and !XL format codes are used. SYS$FAO
could also sometimes put a null character in the last
character position of a formatted output string.
Image(s) Affected: [SYS$LDR]MESSAGE_ROUTINES.EXE
o For a call from SYS$DELPRC, an ACCVIO occurred resulting in a
system crash.
Image(s) Affected:
- [SYS$LDR]IMAGE_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT.EXE
- [SYS$LDR]PROCESS_MANAGEMENT_MON.EXE
o Threaded applications may hang when using the FAST IO system
services.
Image(s) Affected:
- [SYS$LDR]IO_ROUTINES.EXE
- [SYS$LDR]IO_ROUTINES_MON.EXE
o An INVEXCEPTN can occur at EXE_STD$WRTMAILBOX_C+584 when a
mailbox UCB UCB$L_MB_WRITERWAITQFL ends up with a
self-relative pointer to the I/O post processing queue
(IOC$GQ_POSTIQ).
Image(s) Affected:
- [SYS$LDR]SYSDEVICE.EXE
o An attempt to close a file whose disk has timed out in mount
verify can fail to close, causing FILCNTNONZ bugchecks during
process rundown.
Image(s) Affected: [SYS$LDR]IO_ROUTINES.EXE
o If a process working set is reduced, such that the working set
only contains locked pages and the process then pagefaults,
the system could bugcheck in MMG$PAGEFAULT with "FREWSLX, Free
working set list index, resource wait". The reason for this
problem is that no free working set list entries could be
found. (OpenVMS Alpha V6.2 could go into an infinite loop).
Image(s) affected: [SYS$LDR]SYS$VM.EXE
Problems Addressed in ALPSYSA02_071:
o The previous remedial kit, ALPSYSA01_071, shipped SYS$BASE_IMAGE.EXE
without shipping SDA$SHARE.EXE. Users who installed ALPSYSA01_071
saw the following warning message:
%SDA-W-SDALINKSIMS, link time of SYS$BASE_IMAGE built into
SDA$SHARE does not match the link time of image in system.
o Previous remedial kits that replaced SYS$BASE_IMAGE.EXE did not
disable MOVEFILE on the image. This could cause a
problem if third-party defragmentation software that
moves SYS$BASE_IMAGE.EXE is used.
This kit disables MOVEFILE on the SYS$BASE_IMAGE.EXE image that
is installed with this kit. Note that no changes were made to
SYS$BASE_IMAGE.EXE.
o A process using MME could potentially "miss" the VOL1 label
on a tape. Also, a process could "hang" trying to send a
message to the MME process.
This problem can occur in several different areas of the
operating system. In order to get the full implementation of
this MME fix, the following remedial kits (or their supersedants)
should also be installed:
+ ALPBACK03_071
+ ALPDISM01_071
+ ALPINIT01_071
+ ALPMOUN05_071
+ ALPMTAA01_071
It is not necessary to install these kits at the same time,
but until they are installed you may still experience this
problem.
o A system crash can occur during Host Based RAID Unbinds
with MME code enabled. A mailbox read synchronization problem
causes the crash.
This problem only occurs when a host-based RAID UNBIND command
is done while an MME-based application is running. This problem
can occur in several different code areas of the operating system.
In order to eliminate all known instances of this problem, the
following remedial kits (or their supersedants) will also need
to be installed:
+ ALPBACK03_071
+ ALPDISM01_071
+ ALPINIT01_071
+ ALPMOUN05_071
+ ALPMTAA01_071
o A satellite will hang while booting under the following
conditions:
1. The satellite has device naming on;
2. It boots from a served disk which does not have a
PAC (DKcn); and
3. It has a PAC assigned to its local PKC.
In order to get the full fix, the ALPSYSI01_071 kit also needs
to be installed.
o A connection to TMSCP served tapes results in a system crash.
o Deadlocks or up to 72-second delays occur during booting of
a node that uses SCSI port allocation classes. The deadlocks
occur if the votes from a quorum disk are needed to form a
cluster or if the page or swap disk is a disk other than the
system disk.
o A system crash (ACCVIO) occurs when attempts are made to access
page tables that do not exist.
The problem occurs when an application attempts to expand the
size of virtual address space for a process, but the process
has insufficient pagefile quota. The symptom is an SSRVEXCEPT
crash due to an access violation in kernel mode at the label:
MMG_STD$CREPAG_64_C+0023C
The other symptom of the problem is that the current process
has no more pagefile quota (PGFLQUOTA). It can be seen when a
SHOW CRASH is done, followed by a FORMAT Job Information Block
(JIB) command in SDA. The PGFLQUOTA field (JIB$L_PGFLCNT)
will then be 0.
o Processor Correctable interrupts (single-bit ECC errors)
are not seen on all systems.
The MCES register is not properly initialized in the
SYS$CPU_ROUTINES module and the DPC bit is left set so
that Processor Correctable error logging is disabled.
Consequently, single-bit ECC errors, though corrected,
are never reported. A user would thus have no indication
that memory was exhibiting problems until it was too late.
The system would crash with an uncorrectable (multi-bit)
ECC error.
o If a system has been up for 497.1 days without rebooting,
the system cell EXE$GL_ABSTIM_TICS (number of 10-millisecond
ticks since boot) will overflow. This problem can cause some
processes to remain indefinitely in the RWMPB or COMO
scheduling state.
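The 497.1-day figure can be checked with simple arithmetic. The sketch below assumes the tick counter is an unsigned 32-bit longword; that width is an assumption suggested by the cell name and is not stated in this summary. The 10-millisecond tick length comes from the description above.

```python
# Assumption: the tick counter is an unsigned 32-bit longword
# (suggested by the cell name, not stated in the text).
TICK_SECONDS = 0.010          # one tick = 10 milliseconds (from the text)
COUNTER_BITS = 32             # assumed counter width

ticks_until_wrap = 2 ** COUNTER_BITS
seconds_until_wrap = ticks_until_wrap * TICK_SECONDS
days_until_wrap = seconds_until_wrap / 86_400

# After the last representable value, one more tick wraps the counter to 0.
wrapped = (ticks_until_wrap - 1 + 1) & 0xFFFFFFFF

print(f"{days_until_wrap:.1f} days")
```

Under that assumption the counter wraps after 2**32 ticks, which works out to roughly 497.1 days of uptime.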
o A crash occurs, with a PGFIPLHI "Pagefault with IPL too high"
bugcheck, in SYS$VM_PRO+15B0 in the SYS$ADJWSL system service.
The crash occurs because the code page for
SYS$ADJWSL was removed from the system working set.
o The performance counter PMS$GL_NPAGDYNEXPS (cell) was never
incremented above its initial value of zero. It can be
displayed by SDA>CLUE MEM/STAT.
o If the system has insufficient Lock IDs and tries to expand
the Lock ID table at a time when no free PFNs (Page Frame
Numbers) exist, then the base system PTEs (Page Table Entries),
pointed to by MMG$GL_SPTBASE, can be overwritten. The result
is an inability to get a system dump and a repeat of "kernel
stack invalid halt" at the console.
o If the system is temporarily out of Lock IDs and there are
currently no free pages to expand the Lock ID table, then
SYS$VCC could crash with an INCONSTATE bugcheck at
SYS$VCC_NPRO+09700.
o Absolute TQE (Timer Queue Element) firing times are calculated
by $SCHDWK and by $SETIMR from user-supplied relative time
quadwords. Those calculations are not protected against
changes in the system time quadword EXE$GQ_SYSTIME. If a time
server modifies the time in the midst of a calculation, the
eventual firing time could be later than expected.
o After the application of ALPSYSA01_071 (or ALPSYS12_071 -
ALPSYS16_071), current process priorities were not being
floated back toward the base priority after a priority boost.
The result would be low priority processes staying at their
boosted priority and taking more CPU than expected,
effectively disabling the process priority scheme.
o If the IOC_STD$SIMREQCOM routine is called with a 0 IOSB
argument, a DOUBLDEALO bugcheck can occur. The bugcheck
happens because an attempt is being made to deallocate a
PCB twice.
o An incorrect process header vector index was used and the
resulting address was inaccessible. The index could have
had the sign bit set for a swap write.
o A register value was prematurely destroyed allowing COM
processes, which were being outswapped, to remain on the COM
queue. They could subsequently become CUR, even though their
bodies were outswapped, resulting in a variety of bugchecks.
o After the application of ALPSYSA01_071, some systems were
intermittently experiencing SSRVEXCEPT bugchecks at
IO_ROUTINES_PRO+02CD4, handling SYS$BRKTHRU requests.
o If a page being deleted with the $DELTVA or $DELTVA_64 system
service is a global page with I/O still active, the process
can enter the RWAST scheduling state. Due to a
deadlock situation, it could remain in RWAST state indefinitely.
When this problem occurs, all disk I/O for the entire VMScluster
can be hung. The problem has been seen mostly at large ALL-IN-1
sites.
The problem can be detected through the use of the ANALYZE/SYSTEM
utility, by issuing a SHOW PROCESS/REGISTER command on a process
in the RWAST scheduling state. If the PC register indicates an
address in the SYS$VM image or the MMG$DELPAG or MMG$DELPAG_64
routine and the PS register indicates IPL 2, then the problem is
present.
This update requires a FULL BUILD, which is due to the change
to [LIB]MMGDEF.SDL. This change also defines some new flags,
used only in this update, which must be obtained from a
library not contained within the SYS facility.
o $DEVICE can return the name of a dual-pathed SCSI disk twice.
These disks have two UCBs, which differ in ways that
allow one to distinguish primary from alternate UCBs. The
latter can be filtered out by refraining from returning the
name of a UCB with the following characteristics:
1. Bits 2P (dual path), CDP (non-preferred path) and SCSI set
in UCB$L_DEVCHAR2; and
2. UCB$L_2P_ALTUCB non-zero (pointer to the other UCB).
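The two filter criteria above can be sketched as a simple predicate. This is an illustration only: the bit values, the `Ucb` class, the device names, and the function names are all hypothetical stand-ins, since the real mask values and UCB layout are defined by OpenVMS and are not given in this summary.

```python
from dataclasses import dataclass

# Hypothetical bit values; the real masks are defined by OpenVMS
# and are not given in this summary.
DEV_M_2P = 0x1     # dual path
DEV_M_CDP = 0x2    # non-preferred path
DEV_M_SCSI = 0x4   # SCSI device
ALT_PATH_MASK = DEV_M_2P | DEV_M_CDP | DEV_M_SCSI

@dataclass
class Ucb:
    name: str
    devchar2: int      # stands in for UCB$L_DEVCHAR2
    alt_ucb: int       # stands in for UCB$L_2P_ALTUCB (0 = no pointer)

def is_alternate_path(ucb: Ucb) -> bool:
    """True when all three bits are set and the alternate-UCB pointer is non-zero."""
    return (ucb.devchar2 & ALT_PATH_MASK) == ALT_PATH_MASK and ucb.alt_ucb != 0

def scan_names(ucbs):
    """Return device names, skipping UCBs that match both criteria."""
    return [u.name for u in ucbs if not is_alternate_path(u)]

ucbs = [
    Ucb("$1$DKA100", DEV_M_2P | DEV_M_SCSI, 0x1000),              # primary path
    Ucb("$1$DKA100", DEV_M_2P | DEV_M_CDP | DEV_M_SCSI, 0x2000),  # alternate path
    Ucb("$1$DKB200", DEV_M_SCSI, 0),                              # single-path disk
]
names = scan_names(ucbs)
print(names)
```

With the alternate-path UCB filtered out, the dual-pathed disk appears once in the result instead of twice.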
o P0 is extended in an EXEC or KERNEL AST while the image
activator is running port code from Alpha, which retries on
VA_IN_USE errors.
o A pool leak occurred when the deletion of a remotely created
process was done. The Job Information Block (JIB) of a
remotely created process was not deallocated when the process
was deleted because of incorrect register initialization.
Problems Addressed in ALPSYSA01_071:
o An Access Violation may occur at EXE$AST_RETURN, in an outer mode
- most typically user mode - but with the Frame pointer pointing
to the kernel stack. The access violation occurs trying to access
data on the kernel stack from user mode. This does not crash the
system, but causes the user image to exit.
o An SSRVEXCEPT crash may occur in SYS$NETWORK_SERVICES.EXE with
NET$ACP as the current process (and image).
o With the kernel threads upcall feature enabled, applications
which perform high numbers of pagefaults may see threads stuck
in a pagefault wait state.
o An INVEXCEPTN, Exception while above ASTDEL, may occur in the
EXE_STD$REMOVACB routine in the ASTDEL.MAR module.
o If a cluster transition occurs at the same time a local system
function which requires the PRIMARY capability is being processed,
the system will crash with an INCONSCHED bugcheck.
o When using the SYS$BRKTHRU system service (i.e., when doing a
REPLY/ALL to a large number of terminals), an OpenVMS Alpha 7.1
system can hang, or, on SMP processors, crash with a CPUSPINWAIT
or a CPUSANITY bugcheck.
The crash dump will usually show that one CPU did not respond to
the bugcheck request, and the current process on that CPU is
doing a REPLY/ALL or SYS$BRKTHRU that has filled its Kernel
Stack with ASTs handling the request.
o A Lock Manager deadlock search should either find and break a
deadlock or determine that there is no deadlock and remove the
lock at the head of the lock timeout queue. Although there are
valid reasons for aborting a deadlock search and retrying it
later, in the case described below an aborted search was not
the appropriate action.
Each second, a deadlock search was started and aborted shortly
thereafter, with the original lock being left at the head of
the timeout queue. This problem caused continual retry
attempts to perform a deadlock search on the same lock,
resulting in an application hang.
o Updates to application ACE get lost. Customer code locks the
ACL, reads their ACE, updates a count field, re-writes the
ACE, and unlocks the ACL. The change to the count gets lost.
In order to get this full fix you must also install the
ALPF11X03_071 remedial kit.
Problems Addressed in ALPSYS16_071:
o OpenVMS could create processes in the same group UIC with the
same process name.
o The system may crash with an INCONSCHED bugcheck in the scheduler.
Problems Addressed in ALPSYS15_071:
o The system experiences a DELGBLSEC bugcheck with an R0 value
of 2C72. This problem has been seen most frequently on
systems running Oracle, but can occur with any application
utilizing memory-resident global sections. This section must
use the /NOALLOC option in the Reserved Memory Registry and
must be larger than the amount reserved.
o A higher than normal value for MP (MultiProcessor) Synchronization
time (seen with the Monitor Modes display) and possibly a CPUSPINWAIT
bugcheck may occur. The problem is seen particularly on systems
running DECnet Phase IV, and using the FDDI interconnect.
Without this change, the packet size is too large for the
lookaside lists, so the packet must be taken from variable pool.
Variable pool allocation requires the POOL spinlock, which can
cause a performance "hit" and, under heavy load, a CPUSPINWAIT
bugcheck. The solution is to increase the maximum lookaside list
size. The size needed for this particular problem was 5376.
However, there are other requests for larger packets that could
present similar problems, so a maximum size of 8192 has been
established. Creating the extra lists has no memory or
performance penalty in and of itself. The only issues raised
by having more lookaside lists are those associated with
where nonpaged pool resides (on a lookaside list or on the
variable list). This change errs on the side of having more on
the lookaside lists. The reclamation algorithms ensure that not
too much will reside on the lookaside lists.
NOTE
This kit does not include changes for
utilities that display information about
nonpaged pool, including the lookaside lists.
Therefore, some statistics displayed by SDA,
CLUE, and SHOW MEMORY may be inaccurate in the
following way: free memory that resides on
the new lookaside lists will appear to be
allocated. So, the system may actually have
more free memory than that indicated by the
utilities.
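The effect of the lookaside-list change described above can be sketched with a simplified allocation model. The 5376 and 8192 values come from the text; the old maximum used here (5120) is a hypothetical placeholder, since the previous limit is not given in this summary, and the function name is invented for illustration.

```python
NEW_MAX_LOOKASIDE = 8192      # maximum stated in the text
OLD_MAX_LOOKASIDE = 5120      # hypothetical; the old limit is not given
FDDI_PACKET_SIZE = 5376       # request size cited in the text

def allocation_path(size: int, max_lookaside: int) -> str:
    """Pick the allocation path for a nonpaged pool request (simplified)."""
    if size <= max_lookaside:
        return "lookaside"        # no POOL spinlock needed in this model
    return "variable pool"        # requires the POOL spinlock

before = allocation_path(FDDI_PACKET_SIZE, OLD_MAX_LOOKASIDE)
after = allocation_path(FDDI_PACKET_SIZE, NEW_MAX_LOOKASIDE)
print(before, "->", after)
```

In this model the 5376-byte request moves from the spinlock-protected variable pool path to a lookaside list once the maximum is raised.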
Problems Addressed in ALPSYS14_071:
o User-created protected subsystems with subsystem identifiers
granted to executable images fail to work properly in
manipulating queues via $SNDJBC[W]. Although the image has
the subsystem identifier granted, a NOPRIV error is returned.
Problems Addressed in ALPSYS13_071:
o ******* WARNING *******
Setting the SYSGEN parameter VMSD4 to 1 would be done at the
extreme risk of permanent system failure. OpenVMS engineering
highly recommends that VMSD4 be left set to the default value
of 0, i.e. enable automatic power-off.
***********************
On all previous systems, a system fan failure automatically
triggered the power supply to remove power from the system. On
the new Alpha "DIGITAL Personal Workstation" (DPWS) platform,
operating system software is notified of any fan failures. It
must respond by removing power from the system. This new
operating system feature is required for system FRS.
However, for some special applications, one may desire to
disable automatic system power-off, which can be achieved by
setting the SYSGEN parameter VMSD4 to 1. The result will be
continued operation after fan failure, but at the risk
mentioned in the warning above.
o System crashes with one of the following footprints:
INCONSTATE SYS$VCC_NPRO+00009F04
INCONSTATE SYS$VCC_NPRO+00009F00
INCONSTATE VAXCLUSTER_CACHE+03EB1
Problems Addressed in ALPSYS12_071:
o This enhancement restructures the code in the scheduler to
provide better performance and cache behavior. There are no
functional differences.
o An access violation may occur at EXE$AST_RETURN, in an outer mode
- most typically user mode - but with the FP pointing to the
kernel stack. The access violation occurs trying to access
data on the kernel stack from user mode. This does not crash
the system, but causes the user image to exit.
Problems Addressed in ALPSYS11_071:
o When turning on the new multithread features with either
LINK/THREADS or via THREADCP, register corruption can occur.
The registers R2 through R7 can be sign extended if a pagefault
occurs and the EXEC performs a pagefault upcall to DECthreads.
o A multithreaded process with upcalls enabled may hang in HIB
due to a missing event flag upcall. If the program makes heavy
use of event flags for synchronization, notification of an
event flag being set can sometimes be lost. This can result in
a thread waiting forever for an event which already took place.
Problems Addressed in ALPSYS10_071:
o If the Job Controller's mailbox is full at the time a batch job
process termination message is sent to the JOB_CONTROL process,
the message could be dropped and lost. This could result in
SHOW QUEUE showing "executing" jobs with no associated process
on the system.
o A system crash (ACCVIO) may occur in SYSCREDEL.MAR. When
pagefile quota is exhausted just as page tables are created,
the region is not contracted appropriately. This causes the
next attempt to create address space one page at a time to
crash because it thinks the page tables are already there.
o A system crash may occur due to non-paged pool leak. When
an Oracle database is backed up using FAST-I/O, non-paged
pool fills up with DIOBMs.
o A system may crash with an SSRVEXCEPT, Unexpected system
service exception. The crash footprint is:
SDA> CLUE CRASH
Crash Time:
Bugcheck Type: SSRVEXCEPT, Unexpected system service exception
Node: XXXXXX (Clustered)
CPU Type: AlphaServer 1000A 5/400
VMS Version: V7.1
Current Process: OLS1
Current Image: $1$DKA4:[MSY30_5.]OLPR.EXE;1
Failing PC: FFFFFFFF.800CA390
EXE$CLONE_ADDRESS_SPACE_64_C+00A40
Failing PS: 0C000000.00000003
Module: SYS$VM
Offset: 00006390
Failing Instruction:
EXE$CLONE_ADDRESS_SPACE_64_C+00A40: LDL R2,#X0010(R2)
Problems Addressed in ALPSYS08_071:
o A PGFIPLHI bugcheck (Pagefault at IPL too high) may occur while
running Oracle7 R7.3.2.3.2.
Problems Addressed in ALPSYS06_071:
o A PT page is being processed for a process that is also an
outswap target. The SWAPPER already marked all valid and
modified pages "delete contents (DELCON)" and decremented the
valid page count in the PT page's PFN record. This allowed
PROCESS_PT_REQUEST to see a -1 in PT_VAL_CNT and to encounter a
valid PTE while the page table page was scanned. This
situation was believed to be an inconsistent MMG state.
o Audits are generated for mapping to memory resident global
sections even though auditing is not enabled for the object.
o Code was not dealing with the input from the 64-bit system
services properly in descending regions. The VA is set to the
last byte within the page. If the page is invalid, the code
touches the page to fault it into memory. If the following
page is set to no access, the system crashes.
o This is a "day 1" bug in the Fast-I/O code. The ACB_QUOTA flag
is burdened with double duty. The rest of the system uses
this flag as an indicator of whether AST quota has been charged
and must be returned upon AST delivery; Fast-I/O never charges
AST quota but sets the flag anyway as an indication that an AST
was requested. The flag must be cleared before the AST is
queued. This must be done in three cases:
- fast_finish
- IPL4 completion
- SIMREQCOM
The original code cleared the flag only for IPL4 completion.
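The double use of the flag can be modeled in a few lines. The sketch below is a simplified illustration, not the actual executive code: the `Acb` and `Process` classes, the counter names, and the inflated-quota symptom are assumptions inferred from the description above.

```python
from dataclasses import dataclass

@dataclass
class Acb:
    quota_flag: bool = False   # stands in for the ACB_QUOTA flag

class Process:
    def __init__(self, ast_quota: int):
        self.ast_limit = ast_quota
        self.ast_count = ast_quota   # remaining AST quota

    def deliver(self, acb: Acb) -> None:
        # Rest of the system: a set flag means quota was charged, so return it.
        if acb.quota_flag:
            self.ast_count += 1

def fast_io_complete(proc: Process, clear_flag: bool) -> None:
    acb = Acb(quota_flag=True)   # Fast-I/O: "an AST was requested", no charge made
    if clear_flag:
        acb.quota_flag = False   # the fix: clear before the AST is queued
    proc.deliver(acb)

buggy = Process(ast_quota=10)
fast_io_complete(buggy, clear_flag=False)
fixed = Process(ast_quota=10)
fast_io_complete(fixed, clear_flag=True)
print(buggy.ast_count, fixed.ast_count)
```

In this model, a completion path that leaves the flag set returns quota that was never charged, so the remaining count drifts above the limit; clearing the flag on every path keeps the count unchanged.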
o A change to clustered page deletion tried to close a hole that
had the potential to lead to dramatic failures under very rare
conditions. The fix was not quite right and led to a loss of
pagefile quota under more easily achievable conditions.
o System-wide counters for direct and buffered I/Os may become
inflated when Fast-I/O is used.
o Fast-I/O ($io_perform) transfers a max of 127 blocks to SCSI
disks regardless of request size - but reports the full size as
requested in the IOSA.
o A serial console on a Turbolaser (as opposed to a remote or LAT
console) may see device timeouts after issuing commands such as
$ directory or $ show device.
o Support Global Buffer Objects (achieve VAX parity; enable
existing code and complete the port).
o Support buffer objects without a system space window on memory
resident sections only. Also included are several fixes for
buffered Fast-I/O and Fast-I/O through VIOC. Problem symptoms
were possible process hangs and bad data returned in IOSA.
o Fix possible system hang when a memory resident section is
created.
o Fix possible bugcheck when a memory resident section is
deleted.
o Fix possible process hang when an image exits with outstanding
Fast-I/O.
o Fix VIOC/Fast-I/O interaction. Bad data could be returned in
IOSA, and buffers were probed unnecessarily for Fast-I/O.
o Fix problems with AVOID_PREEMPT. There are two separate bugs
which can allow a non-priv user to crash the system.
o There is an omission in the original submission for the support
of 32-bit signals as generated by nonprivileged usermode images
linked /NOSYSSHR under V6.2 or earlier. A check was not being
made in one particular path.
***************************** Notice **************************************
* *
* The following problems will be corrected if both this *
* remedial kit and the ALPSHAD04_071 (or supersedant) remedial *
* kit are installed on the customer's system. Therefore, in *
* order to get the complete list of fixes, customers should *
* install both kits. However, either of these kits will run *
* safely without the other kit installed. *
* *
***************************************************************************
o SDA> SHOW POOL can take an excessive period of time.
o SHOW POOL gives NOSUCHPOOL errors unnecessarily.
o SHOW POOL/SUMMARY counts and space totals do not match.
o SHOW POOL cannot always find the range.
o When minimum SYSTEM_PRIMITIVES is in use, SDA fails
instead of signaling the correct message.
o The symbol file is opened by SDA even when /OVERRIDE is specified
(and it is not used).
o SDA can get into a loop printing blank lines.
o Some of BUGCHECK's messages are confusing.
o The Base SVA of buffer objects is only displayed as 32 bits.
o SDA cannot access an incomplete dump. With this remedial kit,
SDA treats DUMPINCOMPL as a warning if the dump is a selective
dump and has progressed far enough to include the first
process.
o SDA SHOW EXEC does not always display all execlets. READ/EXEC
does not read all the symbols.
o MODIFY DUMP does not work on the dump header and /CONFIRM fails
when the field being updated is a byte or a word and the
original value is negative.
o BUGCHECK's two public routines (EXE$BUGCHK_REMOVE_VA and
EXE$BUGCHK_CANCEL_REMOVE_VA) do not synchronize their
manipulations with spinlocks.
o BUGCHECK fails if the only process is the swapper.
o Handling of Halt/Restart crashes when the Halt HWPCB is used is
faulty.
o SHOW DEV MC accepts only /HOME, but the qualifier is documented
as /HOMEPAGE.
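The MODIFY DUMP /CONFIRM failure on negative byte or word fields is the kind of symptom a sign-extension mismatch can produce. This summary does not state the actual cause, so the following Java sketch only illustrates that general pitfall. The class and method names are hypothetical and are not taken from SDA.

```java
public class SignExtensionSketch {
    // Naive comparison: the byte is sign-extended to int before comparing,
    // so a stored 0xFF becomes -1 and never equals the expected 255.
    static boolean naiveMatches(byte stored, int expected) {
        return stored == expected;
    }

    // Masked comparison: both sides are reduced to their low 8 bits first,
    // so the sign of the stored byte no longer matters.
    static boolean maskedMatches(byte stored, int expected) {
        return (stored & 0xFF) == (expected & 0xFF);
    }

    public static void main(String[] args) {
        byte stored = (byte) 0xFF; // negative when viewed as a signed byte
        System.out.println("naive:  " + naiveMatches(stored, 0xFF));
        System.out.println("masked: " + maskedMatches(stored, 0xFF));
    }
}
```

Comparing at the width of the field, rather than at the width of the register that holds it, avoids the mismatch for both bytes and words.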
Problems Addressed in ALPSYS05_071:
o While booting, the system crashes during driver loading. This
is caused by a corrupt Driver Prologue Table (DPT) for a previously
loaded driver.
Problems Addressed in ALPSYS03_071:
o System crashes with one of the following footprints:
INCONSTATE SYS$VCC_NPRO+00009F04
INCONSTATE SYS$VCC_NPRO+00009F00
INCONSTATE VAXCLUSTER_CACHE+03EB1
Problems Addressed in ALPSYSB02_071:
o Installation of the ALPSYSB01_071 remedial kit on systems with
a SYSTEM_SYNCHRONIZATION_PRF.EXE image might fail due to
regression issues with the image on the system versus the image
in the kit. The SYSTEM_SYNCHRONIZATION_PRF.EXE image has not
shipped as part of OpenVMS and should not appear as part of the
consolidated SYS kits. This image as well as other images that
have not previously shipped in remedial kits have been removed
from this ALPSYSB remedial kit.
There are no new problem corrections in this kit. If you have
successfully installed the ALPSYSB01_071 remedial kit, you do
not need to install the ALPSYSB02_071 kit.
Problems Addressed in ALPSYSB01_071:
o The AUDIT server runs into an access violation in kernel mode
when it encounters an unknown security class.
o Some Java code failed because of incorrect floating-point
results. The following problems occurred:
1. Adding two floating-point values whose sum should be 0.0
gives -0.0 in some cases (when one of the two operands is
negative). The IEEE standard states that the result should
be 0.0.
2. Adding or subtracting 0 and a denormal value yields zero
instead of the denormal value. (A denormal value is a
nonzero number smaller in magnitude than the smallest
normalized number.)
3. A negative double float converted to a float yields
+infinity instead of -infinity.
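The three behaviors listed above can be checked from Java itself. The following is a minimal sketch, not part of the kit or of any Compaq test suite. The class name, method names, and sample operands are illustrative assumptions. Because 0.0 == -0.0 is true in Java, the sign of zero is tested through the raw bit pattern.

```java
public class IeeeAddChecks {
    // Plain double addition, kept in a method so the operands are runtime values.
    static double add(double a, double b) {
        return a + b;
    }

    // Narrowing conversion from double to float.
    static float narrow(double d) {
        return (float) d;
    }

    // True only for +0.0: all 64 bits clear, including the sign bit.
    static boolean isPositiveZero(double x) {
        return Double.doubleToRawLongBits(x) == 0L;
    }

    public static void main(String[] args) {
        // Item 1: equal magnitudes with opposite signs must sum to +0.0.
        System.out.println("sum is +0.0: " + isPositiveZero(add(1.5, -1.5)));

        // Item 2: Double.MIN_VALUE is the smallest positive denormal double;
        // adding zero must return it unchanged.
        System.out.println("denormal kept: "
                + (add(Double.MIN_VALUE, 0.0) == Double.MIN_VALUE));

        // Item 3: a negative double beyond float range must narrow to -infinity.
        System.out.println("-infinity: "
                + (narrow(-1.0e300) == Float.NEGATIVE_INFINITY));
    }
}
```

On a system with correct IEEE handling, all three lines report true.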
Problems Addressed in ALPSYS17_071:
o An SSRVEXCEPT crash occurred early in the boot process when a
new 'fixed' image was used. The signal array contained an
ACCVIO, with the offending PC = EXE$WRTMAILBOX+00C1C and the
operand = EXE$DELMBX+006E4.
Problems Addressed in ALPSYS09_071:
o Several Java validation suite tests fail because of incorrect
floating-point results. When a very small double value is cast
to a float, the result is an apparently random float value
rather than zero. Similarly, a very large double value, which
should become infinity when cast to a float, becomes another
random value.
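The expected results of these casts are fixed by the Java language rules for narrowing a double to a float: a value too small in magnitude becomes zero, and a value too large becomes infinity. A minimal sketch of such a check follows. The class name and the sample values 1.0e-300 and 1.0e300 are illustrative assumptions, not values taken from the validation suite.

```java
public class NarrowingCastChecks {
    // Narrowing conversion under test.
    static float narrow(double d) {
        return (float) d;
    }

    // True only for +0.0f: all 32 bits clear, including the sign bit.
    static boolean isPositiveZero(float f) {
        return Float.floatToRawIntBits(f) == 0;
    }

    public static void main(String[] args) {
        // Far below the smallest float denormal, so the cast must give zero.
        System.out.println("tiny -> 0.0f: " + isPositiveZero(narrow(1.0e-300)));

        // Far above Float.MAX_VALUE, so the cast must give +infinity.
        System.out.println("huge -> +infinity: "
                + (narrow(1.0e300) == Float.POSITIVE_INFINITY));
    }
}
```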
Problems Addressed in ALPSYS07_071:
o Fix incorrect ILLEGAL_SHADOW error when printing pages from
Netscape Navigator.
Problems Addressed in ALPSYS04_071:
o A DOUBLDEALO BUGCHECK occurs during process rundown, caused by
double deallocation of a memory block.
There is a small window between the time the ORB is deallocated
and the time its address is cleared in the UNC. During that
window, the image could be interrupted and a rundown thread
could pick up the ORB address and try to deallocate it again.
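The window described above is a general pattern: a block is freed while a pointer to it is still visible to another thread. This summary does not describe how the kernel closes the window, and kernel code would rely on its own synchronization primitives rather than anything shown here. The Java sketch below only illustrates one common remedy, which is to detach the pointer atomically before freeing. Block, Owner, and release are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class SingleReleaseSketch {
    // Hypothetical stand-in for a pool block; counts how often it is freed.
    static final class Block {
        final AtomicInteger frees = new AtomicInteger();
        void deallocate() { frees.incrementAndGet(); }
    }

    // Hypothetical structure holding the only pointer to the block.
    static final class Owner {
        final AtomicReference<Block> ref;
        Owner(Block b) { ref = new AtomicReference<>(b); }

        // Clear the pointer first, then free. getAndSet is atomic, so at most
        // one caller ever receives the non-null block.
        boolean release() {
            Block b = ref.getAndSet(null);
            if (b == null) {
                return false; // another thread already took ownership
            }
            b.deallocate();
            return true;
        }
    }

    // Two threads race to release the same block; returns the free count.
    static int runRace() {
        Block block = new Block();
        Owner owner = new Owner(block);
        Thread t1 = new Thread(owner::release);
        Thread t2 = new Thread(owner::release);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return block.frees.get();
    }

    public static void main(String[] args) {
        System.out.println("times freed: " + runRace());
    }
}
```

Whichever thread wins the exchange frees the block, and the other sees a null pointer and does nothing, so the block is freed exactly once.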
INSTALLATION NOTES:
The images in this kit will not take effect until the system is rebooted.
If there are other nodes in the VMScluster, they must also be rebooted in
order to make use of the new image(s).
If it is not possible or convenient to reboot the entire cluster at this
time, a rolling reboot may be performed.
All trademarks are the property of their respective owners.
Files on this server are as follows:
alpsys20_071.README
alpsys20_071.CHKSUM
alpsys20_071.CVRLET_TXT
alpsys20_071.a-dcx_axpexe