RTR V2.2D RTRAVME0922D Reliable Transaction Router Alpha ECO Summary

TITLE: RTR V2.2D RTRAVME0922D Reliable Transaction Router Alpha ECO Summary

NOTE:  An OpenVMS saveset or PCSI installation file is stored on the
       Internet in a self-expanding compressed file. The name of the
       compressed file will be kit_name-dcx_vaxexe for OpenVMS VAX or
       kit_name-dcx_axpexe for OpenVMS Alpha.

       Once the file is copied to your system, it can be expanded by
       typing RUN compressed_file. The resultant file will be the
       OpenVMS saveset or PCSI installation file which can be used to
       install the ECO.

Copyright (c) Compaq Computer Corporation 1997, 1999. All rights reserved.

Modification Date:  07-SEP-1999
Modification Type:  Updated Kit  Supersedes RTRAVME0822D

PRODUCT:  Reliable Transaction Router for OpenVMS Alpha (RTR)

OP/SYS:   OpenVMS Alpha

SOURCE:   Compaq Computer Corporation

ECO INFORMATION:

  ECO Kit Name:  RTRAVME0922D
  ECO Kits Superseded by This ECO Kit:  RTRAVME0822D
  ECO Kit Approximate Size:  13293 Blocks
      Saveset A -  189 Blocks
      Saveset B - 8316 Blocks
      Saveset C - 4347 Blocks
      Saveset D -  441 Blocks
  Kit Applies To:  RTR V2.2D
                   OpenVMS Alpha V6.1, V6.1-1H1, V6.1-1H2, V6.2,
                   V6.2-1H1, V6.2-1H2, V6.2-1H3, V7.0 and V7.1
  System/Cluster Reboot Necessary:  No

ECO KIT SUMMARY:

An ECO kit exists for Reliable Transaction Router V2.2D on OpenVMS
Alpha V6.1 through V7.1. This kit addresses the following problems:

Problems addressed in the RTRAVME0922D kit:

  o When the last block of a contiguous set within the journal
    contains a particular pattern, with a valid record followed by an
    invalid one, journal re-reads the last record over and over again.
    This caused ACP to loop while trying to read a journal.

Problems addressed in the RTRAVME0822D kit:

  o 14-8-258: An error in RTR V2.2D-ECO7 could cause the RTRACP
    process to intermittently BUGCHECK while it was processing network
    i/o. This has now been corrected.
  o 14-8-263: An error in RTR V2.2D-ECO7 could cause the RTRACP
    process to crash while it was reading the RTR journal file if the
    journal data from a previous run of RTR happened to end in a
    particular location. This could result in an inability to restart
    RTR after a failure, or cause the RTRACP to crash when taking over
    activity on behalf of another cluster node which had been stopped.
    This has now been corrected.

The following is recommended if Alpha-VMS V7.1 is used:

  o An OpenVMS bug could cause RTRACP process to crash sporadically
    with an ACCVIO with a system space PC trying to access an address
    on the kernel stack. If you are running OpenVMS V7.1, V7.1-1H1, or
    V7.1-1H2, apply the OpenVMS ECO ALPSYSA02_071 before running RTR
    V2.2D, ECO8.

Problems addressed in the RTRAVME0722D kit:

  o Network problems could lead to nodes becoming and remaining
    semi-connected. This state could lead to difficulties establishing
    quorum, as well as frontend to router connectivity problems.
  o Long facility names (31 characters) could sometimes cause the
    RTRACP process to crash with SS$_BADPARAM.
  o During periods of network disruption when the DNS was not
    available, some network connections were being rejected because
    the address-to-name translations were failing. Code has been added
    to improve this situation.
  o Some records in the journal could not be deleted, sometimes
    leading to transactions being seen when not expected.

Problems addressed in the RTRAVNE0622D kit:

  o During a server failover from main to standby nodes, it was
    sometimes possible for the standby server to get permanently stuck
    in a WAIT_JNL state. This was mostly noticed on systems using
    multiple data partitions and error-prone communication links
    between the backends and routers.
  o During server failover to a standby node, some standby servers
    would get stuck in a STANDBY state permanently.
    This effect was most pronounced when there was a short network
    link loss between one of the routers and the standby node
    immediately after the main node failed.
  o Several errors during creation of a journal file, especially when
    duplicate or spurious journal files existed.
  o Some crashes caused by RTR's inability to cope with DECdtm
    recovery.
  o Transaction hangs caused by network glitches involving shadowed
    multi-participant transactions.

Problems addressed in the RTRAVME0422D kit:

  o The Reliable Transaction Router ACP could crash if a set of
    concurrent servers performed a failover to a standby and the
    failing node's journal was not accessible.
  o A failover to a standby node that was not in the same cluster
    could sometimes result in the RTR ACP process looping and
    application processes hanging.
  o Monitoring a remote node that had many applications running could
    result in the error message "too much data" instead of a display.
    The amount of data that can be handled by remote monitoring has
    been increased to avoid this problem.

Problems addressed in the RTRAVME0322D kit:

  o In the previous ECO version of RTR V2.2, the ASTPRM parameter was
    not being returned properly by the event AST.
  o Aborted transactions were causing loss of BYTLM quota when a
    server was using DDTM.
  o Some rare crashes could occur (with LIB$_BADTAGVAL or
    LIB$_BADBLOADR) caused by double deallocation of dynamic data
    structures during some race conditions on network link cleanup.
  o DECnet/OSI related crashes could occur due to corrupted data
    packets (e.g., after node shutdown).
  o A rare refusal of a node to re-establish a connection (due to a
    DECnet/OSI problem with corrupted optional data in connect request
    packets) has been corrected.
  o On rare occasions, a DELETE FACILITY or TRIM FACILITY command
    could hang due to a race condition in the internal lock manager.
  o A crash could occur in the Remote Client Handler when TCP/IP
    Services for OpenVMS (UCX) was improperly started on a node.
    Now RTR just recognizes the condition and does not use TCP/IP in
    the Remote Client Handler. The Remote Client Handler needs to be
    restarted after the UCX problem is cleared, otherwise remote
    clients using TCP/IP will fail to connect.
  o The RTR$_ABORT reason status RTR$_REPLYDIFF was not documented.
    RTR may abort a transaction with this status when there has been a
    failover from one instance of a server to another (for example, to
    a shadow server) and the replies from the second server do not
    exactly match those already received from the first server. RTR
    aborts the transaction in case the client application context
    depended upon a single server instance. The client application
    should restart the transaction.

Problems addressed in the RTRAVME02D22 kit:

  o A rare ACP crash caused by an uninitialized network buffer pointer
    during a network "glitch" such as a network shutdown on a remote
    node.
  o More graceful handling of network "glitches" such as corrupted
    network packets.
  o Incorrect setting up of the ASTPRM on event delivery.

Problems addressed in the RTRAVME01D22 kit:

  o On a primary/standby configuration involving multiple router nodes
    where the backend and frontends were on different nodes, a network
    link "glitch" could sometimes cause a problem. The transactions
    would be replayed to the standby backend node, and these
    transactions would then cause the systems to hang.
  o In a shadow server configuration, if one of the sites became
    unavailable during certain multiple failure scenarios, or if the
    surviving site was in the minority, the servers would remain
    waiting for the other site to come back.

    NOTE: This occurred because the surviving site should recover the
    transactions stored in the other node's journal. This was a
    problem if the other site was really down, since there was no
    manual override of this wait.
    In order to fix this problem, the TRIM FACILITY command has been
    modified so that if the other site is removed from the surviving
    site's configuration, then the servers will start processing
    online transactions. Note that this TRIM FACILITY command should
    be executed on all the nodes in the surviving site. Also, a
    corresponding EXTEND FACILITY should be executed on these nodes
    immediately prior to bringing the failed site back on line.

NOTE: Please see the Release Notes supplied with this ECO for more
      details regarding RTR V2.2D.

INSTALLATION NOTES:

This ECO kit, called RTRVVMECO0922D or RTRAVMECO0922D, is a
replacement for any previously installed RTR kit. Standard VMS
installation procedures are applicable, as described in the RTR
installation guide.

If you are using RTR in a cluster, you need to execute the
SYS$STARTUP:RTR$STARTUP.COM procedure on all nodes in the cluster,
apart from the one where the VMSINSTAL is actually performed.

A system reboot is not necessary.

Note: The OpenVMS VAX kit for RTR V2.2D ECO09 (and previous OpenVMS
      VAX kits) has 4 savesets, i.e. A, B, C, E. The "D" saveset is
      intentionally missing.

Files on this server are as follows:

rtravme0922d.README
.CHKSUM
rtravme0922d.a-dcx_axpexe
rtravme0922d.b-dcx_axpexe
rtravme0922d.c-dcx_axpexe
rtravme0922d.d-dcx_axpexe
rtravme0922d.CVRLET_TXT
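
The saveset sizes quoted under ECO INFORMATION can be cross-checked against the stated kit total before downloading. The following is only a minimal sketch in Python, not part of the kit: the helper name compressed_name is hypothetical, the block counts are copied from this summary, and the 512-byte block size is the standard OpenVMS disk block size. The sizes describe the expanded savesets, so the compressed files listed above will differ.

```python
# Saveset sizes in OpenVMS blocks, copied from the ECO INFORMATION section.
SAVESET_BLOCKS = {"A": 189, "B": 8316, "C": 4347, "D": 441}
STATED_TOTAL_BLOCKS = 13293   # "ECO Kit Approximate Size" from this summary
BYTES_PER_BLOCK = 512         # standard OpenVMS disk block size

KIT_NAME = "rtravme0922d"


def compressed_name(kit: str, saveset: str) -> str:
    """Hypothetical helper: build the Alpha download name for one saveset,
    following the kit_name-dcx_axpexe pattern used in the file list."""
    return f"{kit}.{saveset.lower()}-dcx_axpexe"


# Sum the per-saveset sizes and convert the total to bytes.
total_blocks = sum(SAVESET_BLOCKS.values())
total_bytes = total_blocks * BYTES_PER_BLOCK

if __name__ == "__main__":
    for saveset, blocks in SAVESET_BLOCKS.items():
        print(f"{compressed_name(KIT_NAME, saveset):28s} expands to about {blocks} blocks")
    print(f"sum of savesets: {total_blocks} blocks ({total_bytes} bytes)")
    print(f"matches stated total: {total_blocks == STATED_TOTAL_BLOCKS}")
```

The check is useful mainly as a guide to free disk space: the expanded savesets need roughly the stated number of blocks on the target device.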