OpenVMS ALPPORTS01_071 Alpha V7.1 PCA/PNDRIVER _ SCS ECO Summary

TITLE: OpenVMS ALPPORTS01_071 Alpha V7.1 PCA/PNDRIVER _ SCS ECO Summary

Modification Date:  03-NOV-98
Modification Type:  Updated Kit Supersedes ALPDRIV08_071

NOTE:  An OpenVMS saveset or PCSI installation file is stored on the
       Internet in a self-expanding compressed file.  The name of the
       compressed file will be kit_name-dcx_vaxexe for OpenVMS VAX or
       kit_name-dcx_axpexe for OpenVMS Alpha.

       Once the file is copied to your system, it can be expanded by
       typing RUN compressed_file.  The resultant file will be the
       OpenVMS saveset or PCSI installation file which can be used to
       install the ECO.

Copyright (c) Compaq Computer Corporation 1998.  All rights reserved.

PRODUCT:    DIGITAL OpenVMS Alpha

COMPONENT:  SYS$PNDRIVER.EXE
            SYS$PCADRIVER.EXE
            SYS$SCS.EXE

SOURCE:     Compaq Computer Corporation

ECO INFORMATION:

     ECO Kit Name:  ALPPORTS01_071
     ECO Kits Superseded by This ECO Kit:  ALPDRIV08_071
                                           ALPDRIV04_071
     ECO Kit Approximate Size:  1134 Blocks
     Kit Applies To:  OpenVMS Alpha V7.1, V7.1-1H1, V7.1-1H2
     System/Cluster Reboot Necessary:  Yes
     Rolling Re-boot Supported:  Yes
     Installation Rating:  INSTALL_2
                           2 - To be installed on all systems running
                               the listed version(s) of OpenVMS and
                               using the following feature(s):
                               OpenVMS Clusters

     Kit Dependencies:

       The following remedial kit(s) must be installed BEFORE
       installation of this kit:

           None

       In order to receive all the corrections listed in this
       kit, the following remedial kits should also be installed:

           None

ECO KIT SUMMARY:

An ECO kit exists for PCA/PNDRIVER.EXE and SCS.EXE on OpenVMS Alpha
V7.1 through V7.1-1H2.  This kit addresses the following problems:

Problems addressed in ALPPORTS01_071:

  o  Use of CIXCD/CIPCA (NPORT) FAST_PATH I/O (performance enhancement
     feature) mixed with non-FAST_PATH I/O, under heavy I/O loads, can
     cause CIPCA-related "invalid scatter-gather map register" machine
     check system-crashes and NPAGEDYN pool-corruption.

     NOTE:  FAST_PATH is only available under Alpha OpenVMS V7.0 and later.
     FAST_PATH is enabled by SYSGEN parameter "FAST_PATH"=1, which
     defaults to "0" under OpenVMS V7.0/V7.1.

  o  An AlphaServer node booting into a CI cluster, with CIPCA or
     CIXCD, may fail to join the cluster and hang.

     On node boot-up, or after virtual-circuit failure recovery, with
     CIPCA or CIXCD adapters, an SCS "CONNECTION-REQUEST" (CON_REQ)
     SCS-control message may be lost.  This will suspend all SCS-sysap
     connection formation activity on a given CI-virtual-circuit (SCS
     path-block SCSMSG lost).

     Under V6.2/V6.2-1Hx, this problem was responsible for the
     "Virtual-Circuit-Timeout" errors frequently seen on booting
     Alpha/CIPCA/CIXCD nodes.  OpenVMS V7.1 changes to VC-timeout
     detection to reduce "nuisance errors" caused the SCS-lost-message
     hang described here.

     Using SDA> SHOW CONNECTION on a crash-dump from a system with the
     "lost SCS control message" will show 1 sysap in "CON_SENT", and
     0 or more sysaps in xxx_pend state:

     SDA> SHOW CONNECTION

                      --- CDT Summary Page ---

     CDT Address  Local Process     Connection ID  State      Remote
     -----------  -------------     -------------  -----      --------
      8105D720    SCS$DIRECTORY     DB1F0000       listen
      .
      8105E530    SCS$DIR_LOOKUP    DB1F0009       con_sent   ADEBUG
      .
      8xxxxxxx    SCS$DIRECTORY     DB1F000A       accp_pend  ADEBUG
      .
      8xxxxxxx    VMS$DISK_CL_DRVR  6DB70006       con_pend   PTMANB
      .
      .
     ------------------------------------------------

     Using SDA> SHOW CLUSTER/SCS to find the path-block for the
     problem virtual-circuit, note that "SCS MSGBUF" is empty,
     confirming loss of the single SCS-control-message allocated for
     each virtual-circuit (path-block):

     VMScluster data structures
     --------------------------
                  --- Path Block (PB) 80DAFF40 ---

     Status: 0020 credit

     Remote sta. addr.  000000000005    Remote port type        CIPCA
     Remote state               ENAB    Number of data paths        2
     Remote hardware rev.   00000015    Cables state        A-OK B-OK
     Remote func. mask      ABFF0D00    Local state              OPEN
     Reseting port                00    Port dev. name           PNA0
     Handshake retry cnt.          1    SCS MSGBUF address   00000000
                                                             ========
     Msg. buf. wait queue   80DAFF78    PDT address          80DA6B00
     --------------------------------------------

     Confirming symptoms on remote "victim" nodes is not as reliable
     or foolproof.  The typical SCS-connection state, from
     SDA> SHOW CONNECTIONS would show SCS-sysap-process connections
     hung in CON_ACK state, since the remote/culprit node has lost the
     SCS-control-message for returning an "ACCP_REQ" (accept request):

                      --- CDT Summary Page ---

     CDT Address  Local Process     Connection ID  State      Remote
     -----------  -------------     -------------  -----      ------
      8105D720    SCS$DIRECTORY     DB1F0000       listen
      .
      8105E530    SCS$DIR_LOOKUP    DB1F0009       con_ack    ANDA1A

  o  CIXCD (CIMNA) and CIPCA adapters will generate MFQE
     (Message-Free-Queue-Empty) interrupts, causing a CI-adapter
     "RESET" and temporary loss of all virtual-circuits
     (mount-verification, etc.) when using the StorageWorks Control
     Console (SWCC V2.0) or HSJ-console monitoring scripts using
     FYDRIVER/MSCP$DUP.

     The following console messages and DECevent-formatted error-log
     messages will be seen:

     CONSOLE:

     %PNA0 - Software Shutting Down Port

     ERROR-LOG:

     **************************** ENTRY 1 ***********************
     Logging OS                        1. OpenVMS
     System Architecture               2. Alpha
     OS version                           V7.1
     Event sequence number             1.
     Timestamp of occurrence              01-JAN-1996 00:00:11
     Time since reboot                    0 Day(s) 0:00:11
     Host name                            CSG84

     System Model                         AlphaServer 8400 Model 5/300

     Entry type                      100. Logged Message

     ---- Device Profile ----
     Unit                                 CSG84$PNB0
     Product Name                         CIXCD (XMI to CI Adapter);
                                          ** OR **
                                          CIPCA (PCI to CI Adapter)

     ---- MSCP Logged Msg ----
     Logged Message Type Code          3. Port Message
     Error Type/SubType            xC002  Signaled via Packet, Software
                                          SHUTTING DOWN Port.  Port will
                                          be RE-STARTED.
     Count - Remaining Retries        50.
     .
     .
     .
     ***************************************************************

  o  System fails to boot OpenVMS V7.1 when boot path is KFMSB
     (XMI-to-DSSI)/HSD10.
     The following console and error-log events are generated:

     CONSOLE:

     %PNB0 - Software Shutting Down Port

     DECEVENT-FORMATTED ERROR-LOG:

     ************************** ENTRY 1 ***********************
     Logging OS                        1. OpenVMS
     System Architecture               2. Alpha
     OS version                           V7.1
     Event sequence number             1.
     Timestamp of occurrence              01-JAN-1996 00:00:11
     Time since reboot                    0 Day(s) 0:00:11
     Host name                            CSG84

     System Model                         AlphaServer 8400 Model 5/300

     Entry type                      100. Logged Message

     ---- Device Profile ----
     Unit                                 CSG84$PNB0
     Product Name                         KFMSB (XMI to DSSI Adapter)

     ---- MSCP Logged Msg ----
     Logged Message Type Code          3. Port Message
     Error Type/SubType            xC002  Signaled via Packet, Software
                                          SHUTTING DOWN Port.  Port will
                                          be RE-STARTED.
     Count - Remaining Retries        50.
     Error Count                       1.
     Local Station Address                x0000000000000007
     .
     .
     .
     *************************************************************

  o  On booting VMS, both the CIXCD and CIPCA will generate "Path
     LOOPBACK" error-messages on the console and in the error-log.
     This error has occurred since initial release of Alpha OpenVMS
     V1.0 and since the CIPCA was introduced with V6.2-1H2.

     The following console and error-log entries will appear:

     CONSOLE:

     %PNA0, Path #0. Loopback has gone from GOOD to BAD
     %PNB0, Path #0. Loopback has gone from GOOD to BAD

     DECEVENT-FORMATTED ERROR-LOG:

     ************************ ENTRY 6 *************************
     Logging OS                        1. OpenVMS
     System Architecture               2. Alpha
     OS version                           V6.2-1H3
     Event sequence number             1.
     Timestamp of occurrence              03-NOV-1997 13:53:48
     Time since reboot                    0 Day(s) 0:00:20
     Host name                            CSG84

     System Model                         AlphaServer 8400 Model 5/300

     Entry type                      100. Logged Message

     ---- Device Profile ----
     Unit                                 CSG84$PNB0
     Product Name                         CIXCD (XMI to CI Adapter)

     ---- MSCP Logged Msg ----
     Logged Message Type Code          3. Port Message
     Error Type/SubType            x4106  Cable Status Change, Path #0.
                                          Loopback went from GOOD to BAD.
     Count - Remaining Retries        50.
     Error Count                       1.
     Local Station Address                x000000000000000D
     Local Station ID                     x0000000000004DE8
     Remote Station Address               x0000FFFFFFFFFFFF <- *** Unavailable
     Remote Station ID                    x0000000000000000 <- *** Unavailable

     *** NOTE that no remote CI-station address is available
     *************************************************************

  o  CIPCA device-registers are not properly read and collected into
     the port-descriptor-table (PDT$) by CIPCA.MAR/READ_REG: routine.
     This prevents accurate diagnosis of CIPCA adapter or port-driver
     errors by the CSCs or VMS Engineering.

  o  Two problems are corrected:

     - CIPCA CORRUPTED CRCTX & BADDALRQSIZ BUGCHECK

       A BADDALRQSIZ bugcheck will result, following port-crash/reset
       on Alpha systems with more than 1 Gb of memory.  This improper
       CRCTX "free-queue" reset causes an NPAGEDYN pool-leak of 64
       CRCTX buffers (96 bytes x 64 = 6144 bytes) for each CIPCA
       device reset.

     - DEVICE INIT-FAILURE BAP NPORT-CARRIER LEAKAGE

       BAP pool leakage will be seen after CIMNA, CIPCA, or KFMSB
       device-initialization failure.  For CIPCA/CIMNA, all 14 NPORT
       stopper-CRRRs will be lost on each port reinit attempt,
       accumulating to 700 after the allowed 50 retries.
       (CRRR size = 192 bytes x 700 = 134,400 bytes).

  o  NPORT message and carrier BAP allocation failures are
     mis-reported as "Insufficient pool" on console and in errorlog,
     when a BAP (BUS ADDRESSABLE POOL) shortage should be identified:

     CONSOLE:

     "%PNA0, Insufficient Non-paged Pool for Initialization "

     Note that BAP is controlled by SYSGEN parameters, "NPAG_BAP_MIN,
     NPAG_BAP_MAX, and NPAG_BAP_MAX_PA".  These parameters are
     properly set by running OpenVMS AUTOGEN with "FEEDBACK" after a
     VMS installation or upgrade.

  o  Following a CIPCA, CIXCD, or KFMSB port-reset (UCB$L_ERTCNT
     retry-count decrements), the VC_CHK_TIME's
     (virtual-circuit-timeout) deallocated TQE may be unintentionally
     returned to and re-queued by EXE$SWTIMER_FORK::/SYSUB:
     system-routine from SYS$PN/PCAdriver.
     The 64-byte (0x40 byte) non-paged-pool lookaside list will be
     corrupted, and incorrectly linked into EXE$GL_TQFL.

     The TQE-requeue will *ONLY* occur if TQE$V_REPEAT bit
     (byte-offset 0x0B, bit<2>) is set when VC_CHK_TIME: deallocates
     the TQE.  This happens either through POOLCHECK with "deallocate
     poison pattern bit<2>=1", or if the TQE is immediately
     reallocated before VC_CHK_TIME returns to
     EXE$SWTIMER_FORK::/SYSUB:.

Problems addressed in ALPDRIV08_071:

  o  Messages such as:

     %PNA0, Inappropriate SCA Control Message - FLAGS/OPC/STATUS/PORT 00/00/00/00

     may appear on the console, with associated errorlog messages, on
     systems with HSX disk controllers.

  o  Following a CI-port MFQE (message-free-queue-empty) interrupt,
     with no SCS-credit deficit (not in "optimistic SCS-credit mgmt.
     mode": MFQ entry-count = SCS Rcv-credits), a subsequent
     legitimate MFQE interrupt (with SCS-credit deficit) will result
     in a series of secondary errors causing port-resets, never
     expanding the MFQ queue, and posting of a series of these
     error-log entries (key ID: error-type/sub-type = 0x8102):

     Logging OS                        1. OpenVMS
     System Architecture               2. Alpha
     OS version                           V7.1
     Event sequence number          3653.
     Timestamp of occurrence              20-OCT-1997 00:01:27
     Time since reboot                    2 Day(s) 12:14:28
     Host name                            GDC140

     System Model                         AlphaServer 8400 Model 5/300

     Entry type                       98. Asynchronous Device Attention

     ---- Device Profile ----
     Unit                                 GDC140$PNA0
     Product Name                         CIXCD (XMI to CI Adapter)

     ------ Adapter Data -----
     Error Type/SubType            x8102  Hardware Error, Unspecified
                                          Port Hardware Error.  Port
                                          will be RE-STARTED.
     Count - Remaining Retries        36.
     CASR                      x00000001  Bit 0: Message Free Que
                                          EXHAUSTED (AMFQE)
     AMCSR                     x00000004  Bit 2: Interrupt ENABLE (IE)
     PESR                      xFFFFFFFF
     XDEV                      x05110C2F  Device Type is: 0x0C2F = CIMNA
                                          Device Revision is: 0x11 = A1
                                          Firmware Revision is:0x05 = V-5
     ASNR                      x00000001
     XBER                      x00000040  XMI Node ID is: 1.
                                          Commander ID is: 2 = Microcode CMDR
     XFADR                     xFFFFFE00
     XFAER                     x73FF0FFF
     PDCSR                     x00000001
     PFAR                      x0000055C
     Extra Longword 1          x00000000
     Extra Longword 2          x00000000
     Extra Longword 3          x00000000

     ----- Software Info -----
     UCB$x_ERTCNT                    128. Retries Remaining
     UCB$x_ERTMAX                     10. Retries Allowable
     UCB$x_STS                 x00000000
     UCB$x_ERRCNT                     30. Errors This Unit
     UCB$L_DEVCHAR1            x0C450000  Sharable
                                          Available
                                          Error Logging
                                          Capable of Input
                                          Capable of Output

Problems addressed in ALPDRIV04_071:

  o  On Alpha systems, with many Virtual Circuit failures, the system
     will finally BUGCHECK with a CLUEXIT - or may simply hang.
     Within the subsequent dump, many CDTs (Connection Descriptor
     Table) in DISC_MATCH will be seen - and there will be no free
     CDTs.

  o  These problems only affect Turbolaser AS8200/8400 capable of
     exceeding 4 gigabyte memory sizes.

     CIMNA (NPORT CIXCD XMI-to-CI adapter for Laser/Turbolaser) and
     KFMSB (XMI-to-DSSI) adapters will fail to initialize or start
     under OpenVMS if non-paged-dynamic (NPAGEDYN) pool contains PFNs
     (physical pages) over 4 gigabytes (PA > 32-bits), and, BAP
     (bus-addressable-pool) is merged with NPAGEDYN due to absence of
     PCI bus on the system.

     If any of the NPORT structures (ABLK, AMPB, QBUFs, CRRRs, BDL,
     BDLT) contain physical addresses (PA) > 32-bits, these devices
     fail to start, producing the following errors.  CDTs appear in
     various states when examined with the "SDA>

     1. CIMNA ERRORS
     ===============

     The CIMNA will exhibit "port-timeouts" or XMI transaction-timeout
     (TTO) memory-system errors on boot, such as:

     TURBOLASER CONSOLE LOG:
     -----------------------
     %PNA0, Port Error Bit(s) Set - CNF/PMC/PSR 08110C2F/00000004/00000208
     %PNA0, Port is Reinitializing (48 Retries Left). Check the Error Log.
     ----------------------------------------------
     %PNA0, CI port timeout.
     %PNA0, Port is Reinitializing (49 Retries Left). Check the Error Log.
     ----------------------------------------------

     CIMNA ERROR LOG ENTRY:
     ----------------------
     ********************** ENTRY 2 *****************************
     Logging OS                        1. OpenVMS
     System Architecture               2. Alpha
     OS version                           V7.1
     Event sequence number             1.
     Timestamp of occurrence              01-JAN-1996 00:00:04
     Time since reboot                    0 Day(s) 0:00:04
     Host name                            ANDA1A

     System Model                         AlphaServer 8400 5/300

     Entry type                       98. Asynchronous Device Attention

     ---- Device Profile ----
     Unit                                 ANDA1A$PNA0
     Product Name                         CIXCD (XMI to CI Adapter)

     ------ Adapter Data -----
     Error Type/SubType            x8102  Hardware Error,Unspecified
                                          Port Hardware Error.  Port
                                          will be RE-STARTED.
     Count - Remaining Retries        50.
     CASR                       00000208  Bit 3: Memory System ERROR (MSE)
                                          Bit 9: Unintialize State (UNIN)
     AMCSR                     x00000004  Bit 2: Interrupt ENABLE (IE)
     PESR                      xFFFFFFFF
     XDEV                      x08110C2F  Device Type is: 0x0C2F = CIMNA
                                          Device Revision is: 0x11 = A1
                                          Firmware Revision is: 0x08 = V-8
     ASNR                      x00000208
     XBER                      x8000A060  Bit 13: Transaction Timeout (TTO)
                                          Bit 15: Command NoAck (CNAK)
                                          Bit 31: Error Summary (ES)
                                          XMI Node ID is: 1.
                                          Commander ID is: 3 = INTR
     XFADR                     xFFFFFFFF  XMI Failing Addr[00:28]: x1FFFFFFF
                                          XMI Failing Addr[39]: x00000001
                                          Failing Length: x00000003
     XFAER                     x13FF0000  Mask[00:15]: x00000000
                                          XMI Failing Addr[29:38]: x000003FF
                                          XMI Failing Command: 1, READ
     PDCSR                     x00000208
     PFAR                      x0000055C
     Extra Longword 1          x00000000
     Extra Longword 2          x00000000
     Extra Longword 3          x00000000

     ----- Software Info -----
     UCB$x_ERTCNT                      0. Retries Remaining
     UCB$x_ERTMAX                      0. Retries Allowable
     UCB$x_STS                 x10000000
     UCB$x_ERRCNT                      1. Errors This Unit
     UCB$L_DEVCHAR1            x0C450000  Sharable
                                          Available
                                          Error Logging
                                          Capable of Input
                                          Capable of Output
     ************************************************************

     2. KFMSB ERRORS
     ===============

     TURBOLASER CONSOLE LOG
     ----------------------
     "Port Error Bit(s) Set - CNF/PMC/PSR xxxxxxxx/xxxxxxxx/05008010"

     NOTE:  The PSR (taken from the ASR: Adapter Status Register)
            translates to:

            - <04>          Adapter Abnormal Condition
            - <15>          Channel 1 flag
            - <30:24> (=5)  Illegal Carrier Address

  o  SCS sysap data-transfer mapping requests will generate incorrect
     (mis-calculated) physical-address pointers, causing disk/tape
     data-transfer corruption, if the page_offset requested extends
     beyond the first page (>8Kb-1: Alpha page-size) of the requested
     transfer (page defined by SVAPTE in SCS$MAP request).

     SYS$SCS sources the page_offset from CDRP$L_BOFF, and sources
     SVAPTE from CDRP$L_SVAPTE, both of which are supplied by the SCS
     client sysap (DUDRIVER, CNXMGR, etc.).

     OpenVMS SCS sysaps are not believed to use CDRP$L_BOFF values
     > 8k-1 (Alpha page), but user-written SCS sysap applications
     might use a value > 13-bits since CDRP$L_BOFF is 32-bits
     (formerly CDRP$W_BOFF/16-bits).

  o  Performance is degraded on CIXCD and CIPCA based systems, when
     communicating with other NADP (non-alternating-dual-path) nodes
     during single-CI-path operation.  This results in CI-cable
     failure/removal or CI single-path failures (NO_RESPONSE, NAK
     errors).  NADP-supporting nodes currently are HSJ40, HSJ50,
     CIPCA, and CIMNA/CIXCD.

  o  The CIPCA will not properly re-init after a PCI-DMA-Engine "bus
     error" (PCI bus master abort or target abort).  The port-driver
     will continually fail to retry the re-init until the 50 retry
     count is expired.  The console OPA0 output is typically as
     follows:

     %PNB0, Port Error Bit(s) Set - NODESTS/CASR(H)/(L) 02800001/001C0000/000001D0
     %PNB0, Port is Reinitializing ( 48 Retries Left). Check the Error Log.

  o  When booting Alpha machines the console may display messages
     such as:

     %PNA0, Inappropriate SCA Control Message - FLAGS/OPC/STATUS/PORT 00/00/00/00

     on the console - with associated errorlog messages.
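
The PSR value quoted in the KFMSB console log above (05008010) can be
checked against the three fields listed in the NOTE.  The following is
a minimal Python sketch of that bit arithmetic only; the function name
and the dictionary keys are hypothetical, and the bit positions are
the ones given in the NOTE, not a full register definition.

```python
def decode_kfmsb_psr(psr: int) -> dict:
    """Extract the three PSR fields named in the KFMSB NOTE.

    Bit positions come from the ECO summary text:
      <04>    Adapter Abnormal Condition
      <15>    Channel 1 flag
      <30:24> status code (the summary says 5 = Illegal Carrier Address)
    The key names below are illustrative, not official register names.
    """
    return {
        "adapter_abnormal_condition": bool(psr & (1 << 4)),
        "channel_1_flag": bool(psr & (1 << 15)),
        "status_code": (psr >> 24) & 0x7F,  # 7-bit field, bits 30:24
    }


if __name__ == "__main__":
    # PSR value taken from the console message in the summary.
    for name, value in decode_kfmsb_psr(0x05008010).items():
        print(f"{name}: {value}")
```

Any other bits that may be defined in the register are ignored by this
sketch, since the summary does not describe them.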
INSTALLATION NOTES:

The images in this kit will not take effect until the system is
rebooted.  If there are other nodes in the VMScluster, they must also
be rebooted in order to make use of the new image(s).

If it is not possible or convenient to reboot the entire cluster at
this time, a rolling re-boot may be performed.

  o  Multiprocessor Systems with CIPCAs: SMP_SPINWAIT Restriction

     If your system uses a CIPCA adapter and you operate with
     MULTIPROCESSING set to a non-zero value, you must reset the value
     of the SMP_SPINWAIT parameter to 300000 (3 seconds) instead of
     the default 100000 (1 second).

     If you do not change the value of SMP_SPINWAIT, a CIPCA adapter
     error could generate a CPUSPINWAIT system bugcheck similar to the
     following:

     **** OpenVMS (TM) Alpha Operating System V7.1 - BUGCHECK ****
     ** Code=0000078C: CPUSPINWAIT, CPU spinwait timer expired

     This restriction will be removed in a future OpenVMS release.

     NOTE:  This release note supersedes a similar release note, note
            4.15.2.4.5, in the OpenVMS Version 7.1 Release Notes
            manual as well as 6.2-1H3 sec:1.13.1, which also included
            a SYSTEM_CHECK parameter restriction.  The SYSTEM_CHECK
            parameter restriction is incorrect.  Furthermore, the
            earlier release note stated that the change to the
            SMP_SPINWAIT parameter was required for a MULTIPROCESSING
            parameter setting of 1 or 2.  This requirement applies to
            all non-zero MULTIPROCESSING parameter settings.

  o  This ALPPORTS01_071 remedial kit removes the KFMSB/HSD10 booting
     restriction that was listed in the ALPDRIV08_071 remedial kit.
     The ALPPORTS01_071 kit can be used in KFMSB/HSD10 boot
     configurations.
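
The SMP_SPINWAIT restriction above reduces to a simple rule that can
be checked before a reboot.  The following is a minimal Python sketch
of that rule, assuming the parameter values have already been read
from the system by some other means; the function and constant names
are hypothetical.  Treating values larger than 300000 as acceptable is
also an assumption, since the note only names 300000.

```python
REQUIRED_SMP_SPINWAIT = 300000   # value required by the note (3 seconds)
DEFAULT_SMP_SPINWAIT = 100000    # default named in the note (1 second)
UNITS_PER_SECOND = 100000        # implied by the note: 100000 -> 1 second


def spinwait_seconds(smp_spinwait: int) -> float:
    """Convert an SMP_SPINWAIT value to seconds using the note's ratio."""
    return smp_spinwait / UNITS_PER_SECOND


def spinwait_setting_ok(has_cipca: bool, multiprocessing: int,
                        smp_spinwait: int) -> bool:
    """Return True if the setting satisfies the CIPCA restriction.

    The restriction applies only when a CIPCA adapter is present and
    MULTIPROCESSING is non-zero (any non-zero value, per the note).
    Assumption: values above 300000 are also treated as acceptable.
    """
    if not has_cipca or multiprocessing == 0:
        return True
    return smp_spinwait >= REQUIRED_SMP_SPINWAIT


if __name__ == "__main__":
    # A CIPCA system left at the default would be flagged.
    print(spinwait_setting_ok(True, 3, DEFAULT_SMP_SPINWAIT))
    print(spinwait_seconds(REQUIRED_SMP_SPINWAIT))
```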



This patch can be found at any of these sites:

Colorado Site
Georgia Site



Files on this server are as follows:

alpports01_071.README
alpports01_071.CHKSUM
alpports01_071.CVRLET_TXT
alpports01_071.a-dcx_axpexe
