TITLE: OpenVMS ALPPORTS01_062 Alpha V6.2-1H3 PCA/PNDRIVERS and SCS ECO Summary
NOTE: An OpenVMS saveset or PCSI installation file is stored
on the Internet in a self-expanding compressed file.
The name of the compressed file will be kit_name-dcx_vaxexe
for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha.
Once the file is copied to your system, it can be expanded
by typing RUN compressed_file. The resultant file will
be the OpenVMS saveset or PCSI installation file which
can be used to install the ECO.
OpenVMS ALPPORTS01_062 Alpha V6.2 - V6.2-1H3 PCA/PNDRIVERS and SCS ECO Summary
Copyright (c) Compaq Computer Corporation 1998. All rights reserved.
Modification Date: 20-NOV-98
Modification Type: Documentation - Applies to V6.2-1H3 ONLY
NOTE: Documentation included in this kit indicates that it applies to
V6.2, V6.2-1H1, V6.2-1H2, and V6.2-1H3. This is incorrect. As with
the previous ALPDRIV17_H3062 TIMA kit, this new PORTS kit should be
installed only on OpenVMS Alpha V6.2-1H3, and not on any of the other
V6.2 releases.
OP/SYS: DIGITAL OpenVMS Alpha
COMPONENT: SYS$PCADRIVER
SYS$PNDRIVER
SYS$SCS
SOURCE: Compaq Computer Corporation
ECO INFORMATION:
ECO Kit Name: ALPPORTS01_062
ECO Kits Superseded by This ECO Kit: ALPDRIV17_H3062
ALPDRIV12_H3062
ECO Kit Approximate Size: 1080 Blocks
Kit Applies To: OpenVMS Alpha V6.2, V6.2-1H1, V6.2-1H2, V6.2-1H3
System/Cluster Reboot Necessary: Yes
Rolling Reboot Supported: Yes
Installation Rating: INSTALL_2
2 - To be installed on all systems running
the listed version(s) of OpenVMS and
using the following feature(s):
OpenVMS Clusters
INSTALLATION NOTE: Please see detailed installation instructions
in the INSTALLATION NOTES section below. *DO NOT* install this
ECO kit without first reviewing these instructions.
Kit Dependencies:
The following remedial kit(s) must be installed BEFORE
installation of this kit:
None
In order to receive all the corrections listed in this
kit, the following remedial kits should also be installed:
None
ECO KIT SUMMARY:
An ECO kit exists for PCADRIVER, PNDRIVER and SCS on OpenVMS Alpha
V6.2 through V6.2-1H3. This kit addresses the following problems:
Problems Addressed in ALPPORTS01_062:
o An AlphaServer node with a CIPCA or CIXCD adapter that is booting
into a CI cluster may fail to join the cluster and hang. On node
boot-up, or after recovery from a virtual-circuit failure, an SCS
"CONNECTION-REQUEST" (CON_REQ) control message may be lost on CIPCA
or CIXCD adapters. This suspends all SCS SYSAP connection-formation
activity on the affected CI virtual circuit (the SCS path-block
SCSMSG is lost).
Under V6.2/V6.2-1Hx, this problem was responsible for the
"Virtual-Circuit-Timeout" errors frequently seen when booting Alpha
nodes with CIPCA or CIXCD adapters. In OpenVMS V7.1, the changes made
to VC-timeout detection to reduce these "nuisance errors" caused the
SCS lost-message hang described here.
Using SDA> SHOW CONNECTION on a crash dump from a system with the
"lost SCS control message" will show one SYSAP in the "CON_SENT"
state and zero or more SYSAPs in "xxx_pend" states:
SDA> SHOW CONNECTION
--- CDT Summary Page ---
CDT Address Local Process Connection ID State Remote
----------- ------------- ------------- ----- --------
8105D720 SCS$DIRECTORY DB1F0000 listen
.
8105E530 SCS$DIR_LOOKUP DB1F0009 con_sent ADEBUG
.
8xxxxxxx SCS$DIRECTORY DB1F000A accp_pend ADEBUG
.
8xxxxxxx VMS$DISK_CL_DRVR 6DB70006 con_pend PTMANB
.
.
------------------------------------------------
When you use SDA> SHOW CLUSTER/SCS to find the path block for the
problem virtual circuit, note that the "SCS MSGBUF" address is empty
(00000000), confirming loss of the single SCS control message
allocated for each virtual circuit (path block):
VMScluster data structures
--------------------------
--- Path Block (PB) 80DAFF40 ---
Status: 0020 credit
Remote sta. addr. 000000000005 Remote port type CIPCA
Remote state ENAB Number of data paths 2
Remote hardware rev. 00000015 Cables state A-OK B-OK
Remote func. mask ABFF0D00 Local state OPEN
Reseting port 00 Port dev. name PNA0
Handshake retry cnt. 1 SCS MSGBUF address 00000000
========
Msg. buf. wait queue 80DAFF78 PDT address 80DA6B00
--------------------------------------------
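When many dumps need to be checked, the MSGBUF test above can be scripted against saved SDA output. The following is a minimal Python sketch, assuming the SHOW CLUSTER/SCS listing has been saved to text in the layout shown above; the function name `msgbuf_lost` is hypothetical and is not part of SDA.

```python
import re

# Assumed layout: "SCS MSGBUF address" followed by an 8-digit hex value,
# as in the sample path-block listing in this note.
_MSGBUF_RE = re.compile(r"SCS MSGBUF address\s+([0-9A-Fa-f]{8})")


def msgbuf_lost(path_block_text: str) -> bool:
    """Return True if the path block's SCS MSGBUF address is zero (empty).

    Per this note, an empty MSGBUF confirms that the single SCS control
    message allocated for the virtual circuit has been lost.
    """
    match = _MSGBUF_RE.search(path_block_text)
    if match is None:
        raise ValueError("no 'SCS MSGBUF address' field found")
    return int(match.group(1), 16) == 0


if __name__ == "__main__":
    sample = "Handshake retry cnt. 1 SCS MSGBUF address 00000000"
    print(msgbuf_lost(sample))  # True
```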
Confirming the symptoms on remote "victim" nodes is less reliable.
SDA> SHOW CONNECTIONS typically shows SCS SYSAP connections hung in
the CON_ACK state, because the remote (culprit) node has lost the
SCS control message needed to return an "ACCP_REQ" (accept request):
--- CDT Summary Page ---
CDT Address Local Process Connection ID State Remote
----------- ------------- ------------- ----- ------
8105D720 SCS$DIRECTORY DB1F0000 listen
.
8105E530 SCS$DIR_LOOKUP DB1F0009 con_ack ANDA1A
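The culprit and victim patterns above can be turned into a rough screening heuristic for saved SHOW CONNECTION output. This Python sketch is hypothetical: it is derived only from the sample listings in this note, the function names are illustrative, and a match is a hint to examine the path block (MSGBUF) further, not a diagnosis.

```python
from collections import Counter

# Connection states that this note associates with the lost-message hang.
CULPRIT_STATE = "con_sent"
VICTIM_STATE = "con_ack"


def count_states(summary_text: str) -> Counter:
    """Count CON_SENT, CON_ACK and *_PEND tokens in a CDT summary listing."""
    counts = Counter()
    for line in summary_text.splitlines():
        for token in line.lower().split():
            if token in (CULPRIT_STATE, VICTIM_STATE) or token.endswith("_pend"):
                counts[token] += 1
    return counts


def classify_cdt_summary(summary_text: str) -> str:
    """Return 'culprit', 'victim', or 'no match' based on the patterns above."""
    counts = count_states(summary_text)
    # Culprit node: exactly one SYSAP in CON_SENT (plus zero or more *_pend).
    if counts[CULPRIT_STATE] == 1:
        return "culprit"
    # Victim node: connections hung in CON_ACK.
    if counts[VICTIM_STATE] > 0:
        return "victim"
    return "no match"
```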
o CIXCD (CIMNA) and CIPCA adapters will generate MFQE (Message-Free-
Queue-Empty) interrupts, causing a CI-adapter "RESET" and temporary
loss of all virtual-circuits (mount-verification, etc.) when using
the StorageWorks Control Console (SWCC V2.0) or HSJ-console monitoring
scripts using FYDRIVER/MSCP$DUP. The following console messages
and DECevent-formatted error-log messages will be seen:
CONSOLE:
%PNA0 - Software Shutting Down Port
ERROR-LOG:
**************************** ENTRY 1 ***********************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V7.1
Event sequence number 1.
Timestamp of occurrence 01-JAN-1996 00:00:11
Time since reboot 0 Day(s) 0:00:11
Host name CSG84
System Model AlphaServer 8400 Model 5/300
Entry type 100. Logged Message
---- Device Profile ----
Unit CSG84$PNB0
Product Name CIXCD (XMI to CI Adapter);
** OR ** CIPCA (PCI to CI Adapter)
---- MSCP Logged Msg ----
Logged Message Type Code 3. Port Message
Error Type/SubType xC002 Signaled via Packet, Software
SHUTTING DOWN Port.
Port will be RE-STARTED.
Count - Remaining Retries 50.
.
.
.
***************************************************************
o System fails to boot OpenVMS V7.1 when boot path is KFMSB
(XMI-to-DSSI)/HSD10. The following console and error-log
events are generated:
CONSOLE:
%PNB0 - Software Shutting Down Port
DECEVENT-FORMATTED ERROR-LOG:
************************** ENTRY 1 ***********************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V7.1
Event sequence number 1.
Timestamp of occurrence 01-JAN-1996 00:00:11
Time since reboot 0 Day(s) 0:00:11
Host name CSG84
System Model AlphaServer 8400 Model 5/300
Entry type 100. Logged Message
---- Device Profile ----
Unit CSG84$PNB0
Product Name KFMSB (XMI to DSSI Adapter)
---- MSCP Logged Msg ----
Logged Message Type Code 3. Port Message
Error Type/SubType xC002 Signaled via Packet, Software
SHUTTING DOWN Port.
Port will be RE-STARTED.
Count - Remaining Retries 50.
Error Count 1.
Local Station Address x0000000000000007
.
.
.
*************************************************************
o On booting VMS, both the CIXCD and CIPCA generate "Path
LOOPBACK" error messages on the console and in the error log.
This error has occurred since the initial release of OpenVMS Alpha
V1.0 and, for the CIPCA, since its introduction with V6.2-1H2. The
following console and error-log entries appear:
CONSOLE:
%PNA0, Path #0. Loopback has gone from GOOD to BAD
%PNB0, Path #0. Loopback has gone from GOOD to BAD
DECEVENT-FORMATTED ERROR-LOG:
************************ ENTRY 6 *************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V6.2-1H3
Event sequence number 1.
Timestamp of occurrence 03-NOV-1997 13:53:48
Time since reboot 0 Day(s) 0:00:20
Host name CSG84
System Model AlphaServer 8400 Model 5/300
Entry type 100. Logged Message
---- Device Profile ----
Unit CSG84$PNB0
Product Name CIXCD (XMI to CI Adapter)
---- MSCP Logged Msg ----
Logged Message Type Code 3. Port Message
Error Type/SubType x4106 Cable Status Change, Path #0.
Loopback went from GOOD to BAD.
Count - Remaining Retries 50.
Error Count 1.
Local Station Address x000000000000000D
Local Station ID x0000000000004DE8
Remote Station Address x0000FFFFFFFFFFFF <- ***
Unavailable
Remote Station ID x0000000000000000 <- ***
Unavailable
*** NOTE that no remote CI-station address is available
*************************************************************
o CIPCA device registers are not properly read and collected
into the port descriptor table (PDT$) by the READ_REG: routine in
CIPCA.MAR. This prevents accurate diagnosis of CIPCA adapter or
port-driver errors by the CSCs or VMS Engineering.
o Two problems are corrected:
- CIPCA CORRUPTED CRCTX & BADDALRQSIZ BUGCHECK
A BADDALRQSIZ bugcheck can result following a port crash/reset
on Alpha systems with more than 1 GB of memory. In addition, the
improper reset of the CRCTX "free queue" causes an NPAGEDYN pool
leak of 64 CRCTX buffers (96 bytes x 64 = 6144 bytes) for each
CIPCA device reset.
- DEVICE INIT-FAILURE BAP NPORT-CARRIER LEAKAGE
BAP pool leakage is seen after a CIMNA, CIPCA, or KFMSB
device-initialization failure. For the CIPCA/CIMNA, all 14 NPORT
stopper-CRRRs are lost on each port reinitialization attempt,
accumulating to 700 after the allowed 50 retries
(CRRR size = 192 bytes x 700 = 134,400 bytes).
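The leak sizes quoted above are simple products, and the CRCTX figure scales with the number of CIPCA resets. This Python sketch reproduces that arithmetic using only the sizes and counts stated in this note; the constant and function names are illustrative.

```python
CRCTX_SIZE = 96          # bytes per CRCTX buffer (from this note)
CRCTX_PER_RESET = 64     # CRCTX buffers leaked per CIPCA device reset
CRRR_SIZE = 192          # bytes per CRRR (from this note)
CRRRS_PER_REINIT = 14    # NPORT stopper-CRRRs lost per reinit attempt
MAX_RETRIES = 50         # reinitialization attempts allowed


def crctx_leak_bytes(resets: int) -> int:
    """NPAGEDYN leaked by the CRCTX free-queue problem after `resets` resets."""
    return resets * CRCTX_PER_RESET * CRCTX_SIZE


def crrr_leak_bytes(attempts: int = MAX_RETRIES) -> int:
    """BAP leaked by failed reinit attempts (capped at the retry limit)."""
    attempts = min(attempts, MAX_RETRIES)
    return attempts * CRRRS_PER_REINIT * CRRR_SIZE


if __name__ == "__main__":
    print(crctx_leak_bytes(1))  # 6144
    print(crrr_leak_bytes())    # 134400
```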
o Following a CIPCA, CIXCD, or KFMSB port reset (UCB$L_ERTCNT
retry-count decrements), the TQE deallocated by VC_CHK_TIME (the
virtual-circuit timeout routine) may be unintentionally returned to,
and requeued by, the EXE$SWTIMER_FORK::/SYSUB: system routine from
SYS$PNDRIVER/SYS$PCADRIVER. The 64-byte (0x40) nonpaged pool
lookaside list is then corrupted and incorrectly linked into
EXE$GL_TQFL.
The TQE requeue occurs *ONLY* if the TQE$V_REPEAT bit (byte offset
0x0B, bit<2>) is set when VC_CHK_TIME: deallocates the TQE. The bit
can be set either by POOLCHECK, when the deallocation poison pattern
has bit<2>=1, or because the TQE is immediately reallocated before
VC_CHK_TIME returns to EXE$SWTIMER_FORK::/SYSUB:.
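Whether a deallocated TQE meets the requeue condition can be checked from a raw byte dump of the TQE. This Python sketch is hypothetical: it uses only the byte offset and bit position stated above, and it models a POOLCHECK poison pattern as a single fill byte, which is a simplifying assumption rather than a description of POOLCHECK itself.

```python
# Location of TQE$V_REPEAT as stated in this note: byte offset 0x0B, bit <2>.
REPEAT_BYTE_OFFSET = 0x0B
REPEAT_BIT = 2


def tqe_repeat_set(tqe_bytes: bytes) -> bool:
    """Return True if TQE$V_REPEAT is set in a raw dump of a TQE."""
    if len(tqe_bytes) <= REPEAT_BYTE_OFFSET:
        raise ValueError("TQE dump is shorter than 12 bytes")
    return bool(tqe_bytes[REPEAT_BYTE_OFFSET] & (1 << REPEAT_BIT))


def poison_sets_repeat(pattern_byte: int) -> bool:
    """True if filling the TQE with `pattern_byte` would set TQE$V_REPEAT.

    Assumes (hypothetically) that the poison pattern is one repeated byte.
    """
    return bool(pattern_byte & (1 << REPEAT_BIT))
```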
Problems Addressed in ALPDRIV17_H3062:
o Messages such as:
%PNA0, Inappropriate SCA Control Message - FLAGS/OPC/STATUS/PORT
00/00/00/00
may appear on the console, with associated errorlog messages,
on systems with HSX disk controllers.
o Following a CI-port MFQE (message-free-queue-empty) interrupt
with no SCS-credit deficit (that is, not in "optimistic SCS-credit
management mode", where the MFQ entry count equals the SCS receive
credits), a subsequent legitimate MFQE interrupt (with an SCS-credit
deficit) results in a series of secondary errors that cause port
resets without ever expanding the MFQ, and posts a series of
error-log entries such as the following (key ID: error type/subtype
= 0x8102):
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V7.1
Event sequence number 3653.
Timestamp of occurrence 20-OCT-1997 00:01:27
Time since reboot 2 Day(s) 12:14:28
Host name GDC140
System Model AlphaServer 8400 Model 5/300
Entry type 98. Asynchronous Device Attention
---- Device Profile ----
Unit GDC140$PNA0
Product Name CIXCD (XMI to CI Adapter)
------ Adapter Data -----
Error Type/SubType x8102 Hardware Error, Unspecified Port
Hardware Error.
Port will be RE-STARTED.
Count - Remaining Retries 36.
CASR x00000001 Bit 0: Message Free Que EXHAUSTED (AMFQE)
AMCSR x00000004 Bit 2: Interrupt ENABLE (IE)
PESR xFFFFFFFF
XDEV x05110C2F Device Type is: 0x0C2F = CIMNA
Device Revision is: 0x11 = A1
Firmware Revision is: 0x05 = V-5
ASNR x00000001
XBER x00000040 XMI Node ID is: 1.
Commander ID is: 2 = Microcode CMDR
XFADR xFFFFFE00
XFAER x73FF0FFF
PDCSR x00000001
PFAR x0000055C
Extra Longword 1 x00000000
Extra Longword 2 x00000000
Extra Longword 3 x00000000
----- Software Info -----
UCB$x_ERTCNT 128. Retries Remaining
UCB$x_ERTMAX 10. Retries Allowable
UCB$x_STS x00000000
UCB$x_ERRCNT 30. Errors This Unit
UCB$L_DEVCHAR1 x0C450000 Sharable
Available
Error Logging
Capable of Input
Capable of Output
o When the ALPDRIV15_062 Cluster Ports TIMA kit is used with
non-NPORT SCS port drivers (that is, drivers for adapters other than
the CIPCA, CIMNA, or KFMSB), NPAGEDYN pool corruption occurs in the
pool immediately following the end of each non-NPORT PDT (port
descriptor table; one per SCS port).
Symptoms vary according to how the affected NPAGEDYN is currently
used, but can include INVEXCPTN and SSRVEXCPTN bugchecks and other
access violations (ACCVIOs).
Problems Addressed in ALPDRIV12_H3062:
o On Alpha systems that experience many virtual-circuit failures,
the system may fail with a CLUEXIT bugcheck or may simply hang.
The subsequent dump shows many CDTs (Connection Descriptor
Tables) in the DISC_MATCH state and no free CDTs.
o These problems affect only Turbolaser (AlphaServer 8200/8400)
systems capable of more than 4 GB of memory.
CIMNA (NPORT CIXCD XMI-to-CI adapter for Laser/Turbolaser) and
KFMSB (XMI-to-DSSI) adapters fail to initialize or start under
OpenVMS if nonpaged dynamic (NPAGEDYN) pool contains PFNs
(physical pages) above 4 GB (PA > 32 bits) and BAP
(bus-addressable pool) is merged with NPAGEDYN because the system
has no PCI bus. If any of the NPORT structures (ABLK, AMPB,
QBUFs, CRRRs, BDL, BDLT) contain physical addresses (PA) wider
than 32 bits, these devices fail to start and produce the errors
shown below. CDTs appear in various states when examined with SDA.
1. CIMNA ERRORS
===============
The CIMNA will exhibit "port-timeouts" or XMI transaction-
timeout (TTO) memory-system errors on boot, such as:
TURBOLASER CONSOLE LOG:
-----------------------
%PNA0, Port Error Bit(s) Set -
CNF/PMC/PSR 08110C2F/00000004/00000208
%PNA0, Port is Reinitializing (48 Retries Left). Check the
Error Log.
----------------------------------------------
%PNA0, CI port timeout.
%PNA0, Port is Reinitializing (49 Retries Left). Check the
Error Log.
----------------------------------------------
CIMNA ERROR LOG ENTRY:
----------------------
********************** ENTRY 2 *****************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V7.1
Event sequence number 1.
Timestamp of occurrence 01-JAN-1996 00:00:04
Time since reboot 0 Day(s) 0:00:04
Host name ANDA1A
System Model AlphaServer 8400 5/300
Entry type 98. Asynchronous Device Attention
---- Device Profile ----
Unit ANDA1A$PNA0
Product Name CIXCD (XMI to CI Adapter)
------ Adapter Data -----
Error Type/SubType x8102 Hardware Error,Unspecified Port
Hardware Error. Port will be RE-STARTED.
Count - Remaining Retries 50.
CASR 00000208 Bit 3: Memory System ERROR (MSE)
Bit 9: Uninitialize State (UNIN)
AMCSR x00000004 Bit 2: Interrupt ENABLE (IE)
PESR xFFFFFFFF
XDEV x08110C2F Device Type is: 0x0C2F = CIMNA
Device Revision is: 0x11 = A1
Firmware Revision is: 0x08 = V-8
ASNR x00000208
XBER x8000A060 Bit 13: Transaction Timeout (TTO)
Bit 15: Command NoAck (CNAK)
Bit 31: Error Summary (ES)
XMI Node ID is: 1.
Commander ID is: 3 = INTR
XFADR xFFFFFFFF XMI Failing Addr[00:28]: x1FFFFFFF
XMI Failing Addr[39]: x00000001
Failing Length: x00000003
XFAER x13FF0000 Mask[00:15]: x00000000
XMI Failing Addr[29:38]: x000003FF
XMI Failing Command: 1, READ
PDCSR x00000208
PFAR x0000055C
Extra Longword 1 x00000000
Extra Longword 2 x00000000
Extra Longword 3 x00000000
----- Software Info -----
UCB$x_ERTCNT 0. Retries Remaining
UCB$x_ERTMAX 0. Retries Allowable
UCB$x_STS x10000000
UCB$x_ERRCNT 1. Errors This Unit
UCB$L_DEVCHAR1 x0C450000 Sharable
Available
Error Logging
Capable of Input
Capable of Output
************************************************************
2. KFMSB ERRORS
===============
TURBOLASER CONSOLE LOG
----------------------
"Port Error Bit(s) Set - CNF/PMC/PSR
xxxxxxxx/xxxxxxxx/05008010"
NOTE: The PSR (taken from the ASR: Adapter Status Register)
translates to:
- <04> Adapter Abnormal Condition
- <15> Channel 1 flag
- <30:24> (=5) Illegal Carrier Address
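Both the KFMSB PSR breakdown above and the "PFN above 4 GB" condition are plain bit arithmetic. This Python sketch decodes only the three PSR fields listed in the note (other field values are not documented here) and assumes the 8 KB Alpha page size that this note uses elsewhere; the function names are illustrative.

```python
PAGE_SIZE = 8192  # Alpha page size used by OpenVMS (8 KB)


def decode_kfmsb_psr(psr: int) -> dict:
    """Split a KFMSB PSR value into the fields listed in the note above."""
    return {
        "adapter_abnormal_condition": bool(psr & (1 << 4)),  # bit <04>
        "channel_1_flag": bool(psr & (1 << 15)),             # bit <15>
        "code_30_24": (psr >> 24) & 0x7F,                    # bits <30:24>, 5 = Illegal Carrier Address
    }


def pfn_above_4gb(pfn: int, page_size: int = PAGE_SIZE) -> bool:
    """True if the physical page starts at or above 4 GB (PA > 32 bits)."""
    return (pfn * page_size) >= (1 << 32)


if __name__ == "__main__":
    print(decode_kfmsb_psr(0x05008010))
    print(pfn_above_4gb(0x80000))  # True: 0x80000 * 8192 == 2**32
```

Because 4 GB is page-aligned, a page either lies wholly below 4 GB or starts at or above it, so testing the page's base address is sufficient.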
o If the requested page offset extends beyond the first page of
the transfer (that is, exceeds 8K-1, the Alpha page size; the page
is defined by the SVAPTE in the SCS$MAP request), SCS SYSAP
data-transfer mapping requests generate incorrect (miscalculated)
physical address pointers, causing disk/tape data-transfer
corruption. SYS$SCS takes the page offset from CDRP$L_BOFF and the
SVAPTE from CDRP$L_SVAPTE; both are supplied by the SCS client
SYSAP (DUDRIVER, CNXMGR, etc.). OpenVMS SCS SYSAPs are not believed
to use CDRP$L_BOFF values greater than 8K-1 (one Alpha page), but
user-written SCS SYSAP applications might use a value wider than
13 bits, because CDRP$L_BOFF is 32 bits (formerly CDRP$W_BOFF,
16 bits).
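The reason offsets above 8K-1 matter is that they must select a later page-table entry, not just add to the first page's address. This Python sketch is not the SYS$SCS code: `pfns` is a hypothetical stand-in for the PFNs in the PTEs starting at SVAPTE, and the first-page-only variant is an assumed illustration of how such a miscalculation can arise.

```python
from typing import Sequence

PAGE_SHIFT = 13                     # 8 KB Alpha page: 2**13 bytes
BYTE_MASK = (1 << PAGE_SHIFT) - 1   # 0x1FFF, i.e. 8K-1


def map_offset(pfns: Sequence[int], boff: int) -> int:
    """Physical address of byte `boff` of a buffer described by `pfns`.

    `pfns` stands in for the PFNs found in the page-table entries that
    start at SVAPTE (one entry per virtual page of the buffer).
    """
    page_index = boff >> PAGE_SHIFT    # which page of the buffer
    byte_in_page = boff & BYTE_MASK    # offset within that page
    if page_index >= len(pfns):
        raise ValueError("offset lies beyond the mapped pages")
    return (pfns[page_index] << PAGE_SHIFT) | byte_in_page


def map_offset_first_page_only(pfns: Sequence[int], boff: int) -> int:
    """Illustrative wrong mapping: adds boff to the first page's address."""
    return (pfns[0] << PAGE_SHIFT) + boff
```

With physically discontiguous pages, for example `pfns = [0x100, 0x2A0]`, an offset of 0x2010 maps to 0x540010, while the first-page-only calculation yields 0x202010, an address inside a different physical page.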
o Performance is degraded on CIXCD- and CIPCA-based systems when
communicating with other NADP (non-alternating-dual-path) nodes
during single-CI-path operation, which results from CI cable
failure/removal or CI single-path failures (NO_RESPONSE, NAK
errors). NADP-supporting nodes currently are the HSJ40, HSJ50,
CIPCA, and CIMNA/CIXCD.
o The CIPCA does not properly reinitialize after a PCI-DMA-Engine
"bus error" (PCI bus master abort or target abort). The port
driver repeatedly retries the reinitialization, failing each time,
until the 50-retry count is exhausted. The console (OPA0) output is
typically as follows:
%PNB0, Port Error Bit(s) Set - NODESTS/CASR(H)/(L)
%PNB0, Port is Reinitializing (48 Retries Left).
Check the Error Log.
o When booting Alpha machines, the console may display messages
such as:
%PNA0, Inappropriate SCA Control Message - FLAGS/OPC/STATUS/PORT
00/00/00/00
along with associated error-log messages.
o The CIPCA Direct-DMA (DDMA) pool does not initialize correctly
on AS4100 systems with more than 1 GB of memory running OpenVMS
V6.2-1H3 or the V6.2 remedial stream. Because no DDMA pool exists
when more than 1 GB of memory is present, one of the following
three symptoms occurs after an NPAGEDYN expansion event:
o SPINWAIT system-crash
o System-hang which will respond to a ^P HALT request, and
will generate a forced crash if the system-disk/dump-file
is NOT on a CIPCA;
o System-hang which will not respond to a ^P HALT request.
A system-reset (front-panel reset switch) is required to
clear. NO DUMP is created.
NOTE: All crashes/hangs also resulted in CIPCA LED
error code PCI-DMA-ENGINE-RING-ERROR-1/0
(code 0x01C or 0x01B).
The system data cells SCS$GQ_DDMA_BASE and SCS$GQ_DDMA_LEN both
contain "00000000" on AS4100 systems with a CIPCA and more than
1 GB of memory (use SDA to examine them).
INSTALLATION NOTES:
The images in this kit will not take effect until the system is
rebooted. If there are other nodes in the VMScluster, they must
also be rebooted in order to make use of the new image(s).
If it is not possible or convenient to reboot the entire cluster at
this time, a rolling reboot may be performed.
o Multiprocessor Systems with CIPCAs: SMP_SPINWAIT Restriction
If your system uses a CIPCA adapter and you operate with
MULTIPROCESSING set to a non-zero value, you must reset the
value of the SMP_SPINWAIT parameter to 300000 (3 seconds)
instead of the default 100000 (1 second).
If you do not change the value of SMP_SPINWAIT, a CIPCA adapter
error could generate a CPUSPINWAIT system bugcheck similar to
the following:
**** OpenVMS (TM) Alpha Operating System V7.1 - BUGCHECK ****
** Code=0000078C: CPUSPINWAIT, CPU spinwait timer expired
This restriction will be removed in a future OpenVMS release.
Note:
This release note supersedes a similar release note, note
4.15.2.4.5, in the OpenVMS Version 7.1 Release Notes manual, as
well as section 1.13.1 of the V6.2-1H3 documentation, which also
included a SYSTEM_CHECK parameter restriction. The SYSTEM_CHECK
parameter restriction is incorrect. Furthermore, the earlier
release note stated that the change to the SMP_SPINWAIT parameter
was required for a MULTIPROCESSING parameter setting of 1 or 2.
This requirement applies to all non-zero MULTIPROCESSING parameter
settings.
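SMP_SPINWAIT is expressed in 10-microsecond units, which is why 100000 corresponds to 1 second and 300000 to 3 seconds. This Python sketch converts values and applies the restriction as stated above (CIPCA present and MULTIPROCESSING non-zero); the function names and the has_cipca flag are hypothetical inputs for illustration.

```python
SPINWAIT_UNIT_US = 10        # SMP_SPINWAIT is counted in 10-microsecond units
CIPCA_MIN_SPINWAIT = 300000  # value this note requires (3 seconds)


def spinwait_seconds(value: int) -> float:
    """Convert an SMP_SPINWAIT value to seconds."""
    return value * SPINWAIT_UNIT_US / 1_000_000


def needs_spinwait_change(multiprocessing: int, has_cipca: bool,
                          smp_spinwait: int) -> bool:
    """Apply the restriction above: CIPCA plus non-zero MULTIPROCESSING."""
    return has_cipca and multiprocessing != 0 and smp_spinwait < CIPCA_MIN_SPINWAIT


if __name__ == "__main__":
    print(spinwait_seconds(100000))  # 1.0 (default)
    print(spinwait_seconds(300000))  # 3.0 (required with CIPCA)
```

To make such a change persist across AUTOGEN runs, the usual OpenVMS practice is to record it in SYS$SYSTEM:MODPARAMS.DAT (for example, SMP_SPINWAIT = 300000) and then run AUTOGEN.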
o This ALPPORTS01_062 remedial kit removes the KFMSB/HSD10
booting restriction that was listed in the ALPDRIV17_H3062
remedial kit. The ALPPORTS01_062 kit can be used in
KFMSB/HSD10 boot configurations.
This patch can be found at any of these sites:
Colorado Site
Georgia Site
Files on this server are as follows:
alpports01_062.README
alpports01_062.CHKSUM
alpports01_062.CVRLET_TXT
alpports01_062.a-dcx_axpexe