TITLE: RTR V2.2D RTRAVME0922D Reliable Transaction Router Alpha ECO Summary
NOTE: An OpenVMS saveset or PCSI installation file is stored
on the Internet in a self-expanding compressed file.
The name of the compressed file will be kit_name-dcx_vaxexe
for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha.
Once the file is copied to your system, it can be expanded
by typing RUN compressed_file. The resulting file is
the OpenVMS saveset or PCSI installation file that
can be used to install the ECO.
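To illustrate the naming convention described above, the following Python sketch builds the compressed file name for each saveset of a multi-saveset kit. It assumes, based on the file listing for this kit, that the saveset letter is appended to the lowercase kit name as ".<letter>" before the "-dcx_" suffix. The helper name compressed_file_names is hypothetical and is not part of any Compaq tool.

```python
# Hypothetical helper illustrating the compressed-file naming convention.
# Assumption: names follow <kit_name>.<saveset letter><suffix> in lowercase,
# as seen in this kit's file listing.
SUFFIX = {"alpha": "-dcx_axpexe", "vax": "-dcx_vaxexe"}


def compressed_file_names(kit_name, savesets, platform):
    """Return the self-expanding compressed file name for each saveset letter."""
    try:
        suffix = SUFFIX[platform.lower()]
    except KeyError:
        raise ValueError(f"unknown platform: {platform!r}") from None
    return [f"{kit_name.lower()}.{letter.lower()}{suffix}" for letter in savesets]


if __name__ == "__main__":
    # Alpha kit RTRAVME0922D has savesets A through D.
    for name in compressed_file_names("RTRAVME0922D", "ABCD", "alpha"):
        print(name)
```

Each resulting file is then copied to the OpenVMS system and expanded separately with RUN, as described above.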
Copyright (c) Compaq Computer Corporation 1997, 1999. All rights reserved.
Modification Date: 07-SEP-1999
Modification Type: Updated kit; supersedes RTRAVME0822D
PRODUCT: Reliable Transaction Router for OpenVMS Alpha (RTR)
OP/SYS: OpenVMS Alpha
SOURCE: Compaq Computer Corporation
ECO INFORMATION:
ECO Kit Name: RTRAVME0922D
ECO Kits Superseded by This ECO Kit: RTRAVME0822D
ECO Kit Approximate Size: 13293 Blocks
Saveset A - 189 Blocks
Saveset B - 8316 Blocks
Saveset C - 4347 Blocks
Saveset D - 441 Blocks
Kit Applies To: RTR V2.2D
OpenVMS Alpha V6.1, V6.1-1H1, V6.1-1H2, V6.2,
V6.2-1H1, V6.2-1H2, V6.2-1H3,
V7.0 and V7.1
System/Cluster Reboot Necessary: No
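The approximate kit size is the sum of the four saveset sizes. Because an OpenVMS disk block is 512 bytes, the following Python sketch checks that total and converts it to bytes, which may help when estimating the disk space needed for the expanded savesets. It is only an illustration of the arithmetic; it does not account for the compressed files themselves or for the space used by the installed product.

```python
BLOCK_SIZE_BYTES = 512  # size of one OpenVMS disk block

# Saveset sizes taken from the ECO INFORMATION section, in blocks.
SAVESET_BLOCKS = {"A": 189, "B": 8316, "C": 4347, "D": 441}


def total_blocks(savesets):
    """Sum the sizes of all savesets, in blocks."""
    return sum(savesets.values())


def blocks_to_bytes(blocks):
    """Convert a size in OpenVMS blocks to bytes."""
    return blocks * BLOCK_SIZE_BYTES


if __name__ == "__main__":
    blocks = total_blocks(SAVESET_BLOCKS)
    print(blocks, "blocks =", blocks_to_bytes(blocks), "bytes")
    # prints: 13293 blocks = 6806016 bytes
```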
ECO KIT SUMMARY:
An ECO kit exists for Reliable Transaction Router V2.2D on OpenVMS
Alpha V6.1 through V7.1. This kit addresses the following problems:
Problems addressed in the RTRAVME0922D kit:
o When the last block of a contiguous set of blocks in the
journal contained a particular pattern (a valid record
followed by an invalid one), the journal code re-read
the last record repeatedly. This caused the ACP
to loop while trying to read the journal.
Problems addressed in the RTRAVME0822D kit:
o 14-8-258: An error in RTR V2.2D-ECO7 could cause the RTRACP
process to intermittently BUGCHECK while it was processing network
I/O. This has now been corrected.
o 14-8-263: An error in RTR V2.2D-ECO7 could cause the RTRACP
process to crash while it was reading the RTR journal file if the
journal data from a previous run of RTR happened to end in
a particular location. This could result in an inability to
restart RTR after a failure, or could cause the RTRACP process to crash
when taking over activity on behalf of another cluster node
that had been stopped. This has now been corrected.
The following is recommended if OpenVMS Alpha V7.1 is used:
o An OpenVMS bug could cause the RTRACP process to crash sporadically
with an ACCVIO, with a system-space PC attempting to access an
address on the kernel stack. If you are running OpenVMS
V7.1, V7.1-1H1, or V7.1-1H2, apply the OpenVMS ECO ALPSYSA02_071
before running RTR V2.2D ECO8.
Problems addressed in the RTRAVME0722D kit:
o Network problems could lead to nodes becoming and
remaining semi-connected. This state could lead to
difficulties establishing quorum, as well as to
frontend-to-router connectivity problems.
o Long facility names (31 characters) could sometimes
cause the RTRACP process to crash with SS$_BADPARAM.
o During periods of network disruption when the DNS was
not available, some network connections were rejected
because address-to-name translations failed. Code has
been added to improve handling of this situation.
o Some records in the journal could not be deleted, sometimes
causing transactions to be seen unexpectedly.
Problems addressed in the RTRAVME0622D kit:
o During a server failover from main to standby nodes, it was
sometimes possible for the standby server to get permanently
stuck in a WAIT_JNL state. This was mostly noticed on systems
using multiple data partitions and error-prone communication
links between the backends and routers.
o During server failover to a standby node, some standby servers
would remain permanently stuck in the STANDBY state. This effect was
most pronounced when there was a brief network link loss between
one of the routers and the standby node immediately after the main
node failed.
o Several errors during creation of a journal file, especially when
duplicate or spurious journal files existed.
o Some crashes caused by RTR's inability to cope with DECdtm recovery.
o Transaction hangs caused by network glitches involving shadowed
multi-participant transactions.
Problems addressed in the RTRAVME0422D kit:
o The Reliable Transaction Router ACP could crash if a set of
concurrent servers performed a failover to a standby and the
failing node's journal was not accessible.
o A failover to a standby node that was not in the same cluster
could sometimes result in the RTR ACP process looping and
application processes hanging.
o Monitoring a remote node that had many applications running
could result in the error message "too much data" instead of a
display. The amount of data that can be handled by remote
monitoring has been increased to avoid this problem.
Problems addressed in the RTRAVME0322D kit:
o In the previous ECO version of RTR V2.2, the ASTPRM parameter was
not being returned properly by the event AST.
o Aborted transactions were causing loss of BYTLM quota when a server
was using DDTM.
o Rare crashes (with LIB$_BADTAGVAL or LIB$_BADBLOADR) could
occur because of double deallocation of dynamic data structures
during certain race conditions in network link cleanup.
o DECnet/OSI related crashes could occur due to corrupted data
packets (e.g., after node shutdown).
o A rare refusal of a node to re-establish a connection (due to a
DECnet/OSI problem with corrupted optional data in connect request
packets) has been corrected.
o On rare occasions, a DELETE FACILITY or TRIM FACILITY command could
hang due to a race condition in the internal lock manager.
o A crash could occur in the Remote Client Handler when TCP/IP
Services for OpenVMS (UCX) was improperly started on a node.
RTR now recognizes this condition and does not use TCP/IP in
the Remote Client Handler. The Remote Client Handler must be
restarted after the UCX problem is cleared; otherwise, remote
clients using TCP/IP will fail to connect.
o The RTR$_ABORT reason status RTR$_REPLYDIFF was not documented.
RTR may abort a transaction with this status when there has been a
failover from one instance of a server to another (for example, to
a shadow server) and the replies from the second server do not
exactly match those already received from the first server. RTR
aborts the transaction because the client application context
may have depended on a single server instance. The client
application should restart the transaction.
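The restart advice for RTR$_REPLYDIFF can be sketched as a bounded retry loop. This is a minimal Python illustration of the control flow only, not the RTR API: run_transaction is a hypothetical callable standing in for the client's own code that performs one complete transaction attempt and returns a (committed, abort_reason) pair. Only the reason name RTR$_REPLYDIFF comes from this document.

```python
REPLYDIFF = "RTR$_REPLYDIFF"  # abort reason described in this ECO summary


def run_with_restart(run_transaction, max_attempts=3):
    """Restart a transaction that was aborted with reason RTR$_REPLYDIFF.

    run_transaction is a hypothetical, client-supplied callable that runs one
    complete transaction attempt from the beginning and returns a
    (committed, abort_reason) tuple. Any other abort reason is returned to the
    caller immediately, since it may need different handling.
    Returns (committed, abort_reason, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        committed, reason = run_transaction()
        if committed or reason != REPLYDIFF:
            return committed, reason, attempt
    # Every attempt was aborted with RTR$_REPLYDIFF.
    return committed, reason, max_attempts


if __name__ == "__main__":
    # Simulated outcomes: the first attempt is aborted after a failover,
    # the restarted attempt commits.
    outcomes = iter([(False, REPLYDIFF), (True, None)])
    print(run_with_restart(lambda: next(outcomes)))  # prints (True, None, 2)
```

Bounding the number of attempts keeps a client from looping indefinitely if failovers keep recurring; each retry restarts the whole transaction rather than resending only part of it.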
Problems addressed in the RTRAVME02D22 kit:
o A rare ACP crash caused by an uninitialized network buffer pointer
during a network "glitch" such as a network shutdown on a remote
node.
o More graceful handling of network "glitches" such as corrupted
network packets.
o Incorrect setup of the ASTPRM on event delivery.
Problems addressed in the RTRAVME01D22 kit:
o In a primary/standby configuration involving multiple router nodes
where the backend and frontends were on different nodes, a network
link "glitch" could sometimes cause transactions to be replayed to
the standby backend node, where they then caused the systems to hang.
o In a shadow server configuration, if one of the sites became
unavailable during certain multiple failure scenarios, or if the
surviving site was in the minority, the servers would remain
waiting for the other site to come back.
NOTE: This occurred because the surviving site is expected to recover
the transactions stored in the other node's journal.
This was a problem if the other site was actually down, since there
was no manual override of this wait.
To fix this problem, the TRIM FACILITY command has been
modified so that if the other site is removed from the surviving
site's configuration, the servers start processing online
transactions.
Note that this TRIM FACILITY command should be executed on all the
nodes in the surviving site. Also, a corresponding EXTEND FACILITY
command should be executed on these nodes immediately before bringing
the failed site back online.
NOTE: Please see the Release Notes supplied with this ECO for more
details regarding RTR V2.2D.
INSTALLATION NOTES:
This ECO kit, called RTRVVMECO0922D or RTRAVMECO0922D, replaces
any previously installed RTR kit. Standard VMS installation
procedures apply, as described in the RTR installation guide.
If you are using RTR in a cluster, you need to execute the
SYS$STARTUP:RTR$STARTUP.COM procedure on all nodes in the
cluster except the one on which VMSINSTAL was actually
run.
A system reboot is not necessary.
Note: The OpenVMS VAX kit for RTR V2.2D ECO09 (like previous
OpenVMS VAX kits) has four savesets: A, B, C, and E. The "D"
saveset is intentionally missing.
This patch can be found at any of these sites:
Colorado Site
Georgia Site
Files on this server are as follows:
rtravme0922d.README
.CHKSUM
rtravme0922d.a-dcx_axpexe
rtravme0922d.b-dcx_axpexe
rtravme0922d.c-dcx_axpexe
rtravme0922d.d-dcx_axpexe
rtravme0922d.CVRLET_TXT