TITLE: RTR V2.2D RTRVVME0722D Reliable Transaction Router VAX ECO Summary
NOTE: An OpenVMS saveset or PCSI installation file is stored
on the Internet in a self-expanding compressed file.
The name of the compressed file will be kit_name-dcx_vaxexe
for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha.
Once the file is copied to your system, it can be expanded
by typing RUN compressed_file. The resultant file will
be the OpenVMS saveset or PCSI installation file which
can be used to install the ECO.
Copyright (c) Compaq Computer Corporation 1997, 1999. All rights reserved.
Modification Date: 17-MAR-1999
Modification Type: Updated Kit Supersedes RTRVVME0622D
PRODUCT: Reliable Transaction Router for OpenVMS VAX (RTR)
OP/SYS: OpenVMS VAX
SOURCE: Compaq Computer Corporation
ECO INFORMATION:
ECO Kit Name: RTRVVME0722D
ECO Kits Superseded by This ECO Kit: RTRVVME0622D
RTRVVME0422D
RTRVVME0322D
RTRVVME02D22
RTRVVME01D22
ECO Kit Approximate Size: 11025 Blocks
Saveset A - 252 Blocks
Saveset B - 5796 Blocks
Saveset C - 3843 Blocks
Saveset E - 1134 Blocks
Kit Applies To: RTR V2.2D
OpenVMS VAX V6.1, V6.2, V6.2-0HF, V7.0, V7.1
System/Cluster Reboot Necessary: No
NOTE: RTRVVME0722D is a complete V2.2D kit.
A previous version of RTR V2.2 does
not need to be installed before
installing this kit. However, a valid
license must be installed.
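The saveset sizes listed above can be cross-checked against the stated
kit size. The following is a minimal sketch, not part of the kit; the
variable names are illustrative, and the byte figure assumes the
standard 512-byte OpenVMS disk block.

```python
# Saveset sizes in blocks, copied from the ECO INFORMATION section.
SAVESET_BLOCKS = {"A": 252, "B": 5796, "C": 3843, "E": 1134}
STATED_TOTAL_BLOCKS = 11025

# Assumption: sizes are in standard 512-byte OpenVMS disk blocks.
BLOCK_BYTES = 512

total_blocks = sum(SAVESET_BLOCKS.values())
total_bytes = total_blocks * BLOCK_BYTES

print(f"savesets total {total_blocks} blocks ({total_bytes} bytes)")
print("matches stated size:", total_blocks == STATED_TOTAL_BLOCKS)
```

The four savesets add up to the approximate kit size given above, so
about 5.6 million bytes of free space are needed for the expanded
savesets alone.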
ECO KIT SUMMARY:
An ECO kit exists for Reliable Transaction Router on OpenVMS VAX V6.1
through V7.1. This kit addresses the following problems:
Problems addressed in the RTRVVME0722D kit:
o Network problems could lead to nodes becoming and remaining
semi-connected. This state could lead to difficulties
establishing quorum, as well as frontend to router connectivity
problems.
o Long facility names (31 characters) could sometimes cause the
RTRACP process to crash with SS$_BADPARAM.
o During periods of network disruption when the DNS was not
available, some network connections were being rejected because
the address-to-name translations were failing. Code has been
added to improve this situation.
o Some records in the journal could not be deleted, sometimes
leading to transactions being seen when not expected.
Problems addressed in the RTRVVME0622D kit:
o During a server failover from main to standby nodes, it
was sometimes possible for the standby server to get
permanently stuck in a WAIT_JNL state.
o During server failover to a standby node, some standby
servers would get stuck in a STANDBY state permanently.
o Several errors could occur during creation of a journal file,
especially when duplicate or spurious journal files
existed.
o RTR would sometimes crash recovering DECdtm coordinated
transactions.
o Transaction hangs caused by network glitches involving
shadowed multi-participant transactions.
o If DECdtm was involved while doing a standby failover, the
RTR ACP could fail with the status SS$_REJECT.
o A shadowed and multi-participant transaction in
particular failure modes could get stuck in the state
"RST" after the application was restarted.
o The link watch and isolation timer was sometimes too
strict for conditions where the underlying network was
overloaded.
Problems addressed in the RTRVVME0422D kit:
o The Reliable Transaction Router ACP could crash if a set of
concurrent servers performed a failover to a standby and the
failing node's journal was not accessible.
o A failover to a standby node that was not in the same cluster
could sometimes result in the RTR ACP process looping and
application processes hanging.
o Monitoring a remote node that had many applications running
could result in the error message "too much data" instead of a
display. The amount of data that can be handled by remote
monitoring has been increased to avoid this problem.
Problems addressed in the RTRVVME0322D kit:
o In the previous ECO version of RTR V2.2, the ASTPRM parameter was
not being returned properly by the event AST.
o Aborted transactions were causing loss of BYTLM quota when a server
was using DDTM.
o Some rare crashes could occur (with LIB$_BADTAGVAL or
LIB$_BADBLOADR) caused by double deallocation of dynamic data
structures during some race conditions on network link cleanup.
o DECnet/OSI related crashes could occur due to corrupted data
packets (e.g., after node shutdown).
o A rare refusal of a node to re-establish a connection (due to a
DECnet/OSI problem with corrupted optional data in connect request
packets) has been corrected.
o On rare occasions, a DELETE FACILITY or TRIM FACILITY command could
hang due to a race condition in the internal lock manager.
o A crash could occur in the Remote Client Handler when TCP/IP
Services for OpenVMS (UCX) was improperly started on a node.
Now RTR just recognizes the condition and does not use TCP/IP in
the Remote Client Handler. The Remote Client Handler needs to be
restarted after the UCX problem is cleared, otherwise remote
clients using TCP/IP will fail to connect.
o The RTR$_ABORT reason status RTR$_REPLYDIFF was not documented.
RTR may abort a transaction with this status when there has been a
failover from one instance of a server to another (for example, to
a shadow server) and the replies from the second server do not
exactly match those already received from the first server. RTR
aborts the transaction in case the client application context
depended upon a single server instance. The client application
should restart the transaction.
Problems addressed in the RTRVVME02D22 kit:
o A rare ACP crash caused by an uninitialized network buffer pointer
during a network "glitch" such as a network shutdown on a remote
node.
o More graceful handling of network "glitches" such as corrupted
network packets.
o Incorrect setting up of the ASTPRM on event delivery.
Problems addressed in the RTRVVME01D22 kit:
o On a primary/standby configuration involving multiple router nodes
where the backend and frontends were on different nodes, a network
link "glitch" could sometimes cause a problem. The transactions
would be replayed to the standby backend node, and these
transactions would then cause the systems to hang.
o In a shadow server configuration, if one of the sites became
unavailable during certain multiple failure scenarios, or if the
surviving site was in the minority, the servers would remain
waiting for the other site to come back.
NOTE: This occurred because the surviving site should recover the
transactions stored in the other node's journal.
This was a problem if the other site was really down, since there
was no manual override of this wait.
In order to fix this problem, the TRIM FACILITY command has been
modified so that if the other site is removed from the surviving
site's configuration, then the servers will start processing online
transactions.
Note that this TRIM FACILITY command should be executed on all the
nodes in the surviving site. Also, a corresponding EXTEND FACILITY
should be executed on these nodes immediately prior to bringing the
failed site back on line.
NOTE: Please see the Release Notes supplied with this ECO for more
details regarding RTR V2.2D.
INSTALLATION NOTES:
Before the installation of this ECO, RTR must be stopped.
RTRVVME0722D is a replacement for any previously installed RTR kit.
Standard VMS installation procedures are applicable, as described in
the RTR installation guide.
If you are using RTR in a cluster, you need to execute the
SYS$STARTUP:RTR$STARTUP.COM procedure on all nodes in the cluster, apart
from the one where the VMSINSTAL is actually performed.
A system reboot is not necessary.
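The cluster procedure described above can be summarized as an ordered
checklist. The sketch below is illustrative only: the node names are
hypothetical, the function is not part of RTR, and stopping RTR on
every node (rather than only on the installing node) is an assumption
made here. The steps are plain descriptions, not DCL commands.

```python
def install_plan(cluster_nodes, install_node):
    """Return (node, action) steps in the order described in the notes."""
    if install_node not in cluster_nodes:
        raise ValueError(f"{install_node} is not in the cluster node list")

    # Assumption: RTR is stopped on every node before the installation.
    plan = [(node, "stop RTR") for node in cluster_nodes]

    # The kit is installed once, with VMSINSTAL, on a single node.
    plan.append((install_node, "install RTRVVME0722D with VMSINSTAL"))

    # Every other node runs the startup procedure named in the notes.
    plan.extend(
        (node, "run SYS$STARTUP:RTR$STARTUP.COM")
        for node in cluster_nodes
        if node != install_node
    )
    return plan


# Hypothetical three-node cluster; the kit is installed on NODEA.
steps = install_plan(["NODEA", "NODEB", "NODEC"], "NODEA")
for node, action in steps:
    print(f"{node}: {action}")
```

The node that performs the installation is left out of the final
group, matching the instruction that the startup procedure be run on
all nodes apart from that one.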
NOTE: The OpenVMS VAX kit for RTR V2.2D ECO07 (like previous OpenVMS VAX
kits) has 4 savesets: A, B, C, and E. The "D" saveset is intentionally
missing.
Files on this server are as follows:
rtrvvme0722d.README
.CHKSUM
rtrvvme0722d.a-dcx_vaxexe
rtrvvme0722d.b-dcx_vaxexe
rtrvvme0722d.c-dcx_vaxexe
rtrvvme0722d.e-dcx_vaxexe
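
The compressed file names listed above follow the kit_name-dcx_vaxexe
pattern described in the opening note, with one file per saveset. The
sketch below reproduces that pattern; the function name is
hypothetical, and the lowercase form and the saveset letter used as
the file extension are inferred from this listing rather than stated
as a rule.

```python
# Suffixes taken from the opening note: VAX and Alpha self-expanding files.
PLATFORM_SUFFIX = {"vax": "dcx_vaxexe", "alpha": "dcx_axpexe"}


def compressed_file_names(kit_name, savesets, platform):
    """Build one compressed file name per saveset letter."""
    suffix = PLATFORM_SUFFIX[platform]
    return [f"{kit_name.lower()}.{letter.lower()}-{suffix}" for letter in savesets]


# This kit has savesets A, B, C and E (there is no D saveset).
names = compressed_file_names("RTRVVME0722D", "ABCE", "vax")
for name in names:
    print(name)
```

Each file is expanded on the OpenVMS system with RUN, as described in
the opening note, to produce the corresponding saveset.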