RTR V3.2 RTR320_249 Reliable Transaction Router Tru64 UNIX ECO Summary
Copyright (c) Compaq Computer Corporation 1999. All rights reserved.
Modification Date: 25-AUG-1999
Modification Type: New Kit
PRODUCT: Reliable Transaction Router (RTR) for Tru64 UNIX
OP/SYS: Compaq Tru64 UNIX
SOURCE: Compaq Computer Corporation
ECO INFORMATION:
ECO Kit Name: RTR320_249
ECO Kits Superseded by This ECO Kit: None
ECO Kit Approximate Size: 10440 Blocks (5345280 Bytes)
TAR file - 10440 Blocks (5345280 Bytes)
Kit Applies To: RTR V3.2
Compaq Tru64 UNIX V4.0D, V4.0E, V4.0F
System/Cluster Reboot Necessary: No
ECO KIT SUMMARY:
An ECO kit exists for Reliable Transaction Router V3.2 on Compaq Tru64
UNIX V4.0D through V4.0F. This kit addresses the following problems:
Problems Addressed in RTR320_249:
o Shadow transactions not recovered on link break reconnect
If a secondary shadow backend lost its link to the RTR
router after the router had sent a vote request, and the
server on the primary shadow accepted the transaction,
then in unusual circumstances it was possible that
the transaction would not be immediately recovered on
the secondary shadow after the link to the router was
re-established. In such cases it required a cycle of
the servers on the secondary site for the remembered
transaction to be recovered from the primary shadow
journal.
This has now been fixed.
o Problems with DUMP JOURNAL
In previous versions of RTR, qualifiers which required a
value did not generate an error if the value was not
supplied or was supplied incorrectly. Incorrect or
missing values now generate an error message.
If a string of less than five characters was passed for
partition record class, the partition record counter
was not updated and the record was not available. These
problems have been fixed by comparing each character
instead of five characters at a time.
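The character-by-character comparison described above can be
illustrated with a short C sketch. It is not taken from the RTR
sources: the helper name matches_abbreviation and the keyword
PARTITION in the usage note are hypothetical, and the sketch assumes
that a record class name may be supplied in abbreviated form.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not an RTR function): returns 1 if `input` is a
   non-empty, case-insensitive prefix of `keyword`, otherwise 0.
   Every character is compared individually, so inputs shorter than
   five characters are handled the same way as longer ones. */
int matches_abbreviation(const char *input, const char *keyword)
{
    size_t i;

    assert(input != NULL && keyword != NULL);

    if (input[0] == '\0')
        return 0;                       /* an empty string matches nothing */

    for (i = 0; input[i] != '\0'; i++) {
        if (keyword[i] == '\0')
            return 0;                   /* input is longer than the keyword */
        if (toupper((unsigned char)input[i]) !=
            toupper((unsigned char)keyword[i]))
            return 0;                   /* first differing character */
    }
    return 1;
}
```

For example, matches_abbreviation("PART", "PARTITION") returns 1,
whereas a comparison fixed at five characters fails at the fifth
position of the four-character input.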
o Transaction state is not getting EXCEPTION after issuing
rtr_close/imme
SET PARTITION /RECOVERY_RETRY_COUNT is new functionality
implemented in RTR V3.2. The scope of this command was
not fully documented, and is clarified here.
If an application server dies while processing a
transaction recovered from RTR journal, then RTR will
present the transaction to another (concurrent or
standby) server. The RECOVERY_RETRY_LIMIT indicates
the maximum number of times the transaction should be
presented to a server for recovery before being written
to the journal as an exception.
There are two types of recovery operations where
transactions are recovered from journal: local recovery
and shadow recovery. Shadow recovery is the process
of recovering the remembered transactions written to a
primary shadow journal while the secondary shadow site
is down.
The SET PARTITION /RECOVERY_RETRY_COUNT parameter
does not have an effect on remembered transactions
recovered during shadow recovery. That is, if there
is a killer transaction remembered in the journal on
a primary shadow node, on this node RTR does not count
the number of times the transaction is recovered by a
recovering secondary shadow node. The way to ensure that
a remembered transaction will be exceptioned by RTR is
by starting a sufficient number of concurrent servers on
the recovering secondary shadow node.
For this reason, RTR recommends that the number of
concurrent secondary shadow servers started is greater
than the value set for the RECOVERY_RETRY_LIMIT on a
partition. This will ensure that a remembered (killer)
transaction being recovered from a primary shadow
journal will be exceptioned if the retry limit is
exceeded.
Only those transactions that have reached voting stage
on a server can be exceptioned. If a server always dies
before voting on a transaction, then the transaction
will be aborted by RTR after the third try. This is
a hard-coded limit (the so-called "three strikes and
you're out" feature).
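The recommendation on concurrent secondary shadow servers can be
expressed as a simple check in a startup or configuration tool. The
following C sketch is only an illustration: the function name and its
parameters are hypothetical, and it does not query RTR for the
configured values.

```c
#include <assert.h>

/* Hypothetical sanity check (not an RTR call): returns 1 when the number
   of concurrent secondary shadow servers is greater than the recovery
   retry limit set on the partition, as recommended, otherwise 0. */
int enough_concurrent_servers(int concurrent_servers, int retry_limit)
{
    assert(concurrent_servers >= 0 && retry_limit >= 0);
    return concurrent_servers > retry_limit;
}
```

With a retry limit of 3, the check passes for four or more concurrent
servers and fails for three or fewer.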
o Backends erroneously remain inquorate after routers trimmed
In versions V3.1D-eco14 and V3.2 of RTR it was sometimes
possible for nodes to erroneously remain inquorate
following a TRIM FACILITY operation.
This has now been fixed.
o Revised rtrreq.c and rtrsrv.c sample RTR applications
The sample client and server used in the IVP have been
extensively revised. Please pay special attention to the
comments which explain how to write a wakeup handler,
and comments drawing attention to several common
programming mistakes we have seen in RTR applications.
o Looping RTR process for empty node string, e.g., /NODE=dna.
Specifying an incomplete node specification, such as
one with only the protocol prefix, e.g., "RTR SHOW
RTR /NODE=dna." could cause the RTR process to loop,
consuming CPU.
This problem has been fixed.
o ACP access violation
If a number of concurrent servers died in sequence
while processing the same transaction, then under rare
circumstances it was possible the ACP could also abort.
This was due to a counter being incremented incorrectly
and has now been fixed.
o ACP crashed when modifying journal size
After a journal had been modified, the Flow Control
subsystem of RTR was not properly updated with the
new size. This could result in a hang or crash
situation even though the journal size was increased
to accommodate increased traffic.
This problem has been fixed.
o rtr_close_channel fails for distributed transaction
Calling rtr_close_channel while a distributed
transaction was pending caused an incorrect status to
be returned.
The correct status is now returned.
o CALL CLOSE_CHANNEL defaults to IMMEDIATE
The flag RTR_F_CLO_IMMEDIATE is a new flag added in RTR
V3.2 that allows the caller to close a server channel
without acknowledging the transaction on the channel.
By default, the flag is not set when calling the
rtr_close_channel API. However, the /IMMEDIATE qualifier
is implicitly present in the RTR CLI version of the API
(rtr call rtr_close_channel).
Because this is incompatible with the behavior of
previous versions of RTR, functionality has been
restored to the same as before V3.2. When using the
CLI version of the API (rtr call rtr_close_channel),
/NOIMMEDIATE is now the default.
o TOOMANCHA and distributed transaction left open after
rtr_open_channel() failure
If rtr_open_channel failed after the RTR acp had been
stopped, then that channel remained unavailable for a
subsequent open. The application could eventually run
out of channels and return RTR_STS_TOOMANCHA.
Now if rtr_open_channel fails after a distributed
transaction has been opened, the distributed transaction
is always closed.
o SHOW SERVER truncates shd_rec_icpl to shd_rec_ic
Some of the values previously truncated by the brief
SHOW SERVER command are now displayed more fully.
o Application may crash if invoked before RTR after a reboot
Normally the RTR executable must have been invoked at
least once since reboot before an RTR application can
be started. If an RTR application is invoked first, the
first RTR api call now always returns RTRNOTSTA, RTR not
started.
o IOS tid on IP only nodes is not unique
Using previous versions of RTR, if you ran client
applications that used the RTR V2 API on systems that
had DECnet disabled, then there was a remote possibility
that the same transaction identifier could be generated
on two such systems if RTR was started on both systems
within milliseconds of each other.
This has now been fixed.
o Faster loading of large journals on first CREATE FACILITY
RTR now takes much less time to load journals containing
a large number of journaled transactions.
o The broadcast message was not delivered from BE to client
If a frontend loses the connection to its original
router, and is the first frontend to connect to the
router it fails over to, then the frontend may stop
receiving broadcasts. Further, backends could also fail
to receive broadcasts delivered by routers added to a
facility after the server applications have started.
These problems have been fixed.
o RTR has both backends as primary for some transactions
(STR#1885690)
In a partitioned network situation (when each of two
routers has access to only half of the backend nodes),
RTR will choose the router with the lower network
address as the one that remains or becomes active. In
previous versions of RTR, this would sometimes result in
both sets of backends becoming active, due to a problem
with the network ID comparison algorithm.
This has been corrected.
o Signals blocked in unthreaded UNIX applications during RTR api
calls
RTR now enables the usual termination signals during RTR
api calls. For example, an idle server RTR application
waiting in rtr_receive_message with no timeout will now
respond to Control-C.
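For readers unfamiliar with signal masks, the following C sketch shows
what enabling termination signals means at the POSIX level in an
unthreaded process. It is an illustration under stated assumptions,
not RTR's internal code: both function names are hypothetical, and the
choice of HUP, INT, and TERM as the "usual termination signals" is an
assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `signo` is currently blocked in this
   process, 0 if it is not, and -1 on error. */
int signal_is_blocked(int signo)
{
    sigset_t current;

    assert(signo > 0);

    /* With a NULL new set, sigprocmask only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, signo);
}

/* Hypothetical helper: removes HUP, INT, and TERM from the signal mask so
   that a blocking call can be interrupted, for example by Control-C.
   Returns 0 on success and -1 on error. */
int unblock_termination_signals(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGHUP);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}
```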
o Terminated RTR application process that used fork is still shown
by RTR
RTR applications now have FD_CLOEXEC set for the IPC
sockets used to communicate with the RTR acp, so that
these do not remain open in a child process after fork
and exec even after the parent process has terminated.
This means that the RTR acp now notices when the parent
exits, and will not accumulate a wait queue of broadcast
messages or delay failover. The terminated process no
longer appears in RTR SHOW PROCESS.
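Applications that open their own descriptors and then fork and exec
may want the same protection. The C sketch below shows the standard
POSIX way to set FD_CLOEXEC with fcntl; the function names are
hypothetical and the socket pair merely stands in for an IPC socket,
since RTR's own sockets are managed internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: marks `fd` close-on-exec so that it is closed
   automatically in a child process when the child calls exec.
   Returns 0 on success and -1 on error. */
int set_cloexec(int fd)
{
    int flags;

    assert(fd >= 0);

    flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) ? -1 : 0;
}

/* Hypothetical helper: creates a connected pair of local sockets and
   marks both ends close-on-exec. Returns 0 on success and -1 on error. */
int open_ipc_pair(int sv[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (set_cloexec(sv[0]) != 0 || set_cloexec(sv[1]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}
```

The flag has no effect on fork alone; the descriptors are closed in
the child only when it calls one of the exec functions.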
o BADROWCOL and escape sequences visible on dumb or unknown terminal
The default VT100-style terminal escape sequences can
now be completely suppressed with a suitable TERMCAP
environment variable setting. It is still necessary
to set a non-zero window size to avoid BADROWCOL, for
example:
stty rows 48 cols 120
TERM=dumb
TERMCAP="dumb:cm=:do=:le=:nd=:up=:ks=:ke=:cl=:ce=:ho=:mb=:md=:mr=:us=:ue=:me=:cr=:bl=:"
export TERM TERMCAP
This is particularly useful when running RTR in an Emacs
shell window, and gives reasonably clean output for all
RTR commands except MONITOR.
Known Problems with Workarounds
o Install procedure requires all rtr processes to be terminated,
including rtrd
All rtr processes and rtr applications must be terminated
before installing a new version of rtr. After using "rtr stop
rtr" and "rtr disc server", check for any surviving
processes, such as rtrd and applications programmed to handle
RTR_STS_NOACP, and terminate them until none
are left. Note that all the rtr acp and comserver
processes must be terminated before rtrd, otherwise they will
simply create a new rtrd.
On UNIX, the rtrd process blocks the HUP, INT, and TERM signals but
can be terminated with the KILL signal (kill -9).
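The difference between a blocked signal and KILL can be demonstrated
with a small, self-contained C sketch. The child process here is only
a stand-in that blocks HUP, INT, and TERM in the way described for
rtrd; the function name is hypothetical and nothing in the sketch
touches a real rtrd process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demonstration: forks a child that blocks HUP, INT, and
   TERM, sends it TERM and then KILL, and returns the number of the
   signal that actually terminated it (or -1 on error). */
int kill_signal_blocking_child(void)
{
    sigset_t block, saved;
    pid_t pid;
    int status;

    sigemptyset(&block);
    sigaddset(&block, SIGHUP);
    sigaddset(&block, SIGINT);
    sigaddset(&block, SIGTERM);

    /* Block before fork so the child inherits the mask without a race. */
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &saved, NULL);
        return -1;
    }
    if (pid == 0) {
        for (;;)
            pause();                    /* child: wait until killed */
    }

    assert(pid > 0);
    sigprocmask(SIG_SETMASK, &saved, NULL);   /* parent: restore its mask */

    kill(pid, SIGTERM);                 /* stays pending: blocked in child */
    kill(pid, SIGKILL);                 /* cannot be blocked or ignored */

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The function returns SIGKILL, showing that the earlier TERM signal was
never delivered to the child.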
INSTALLATION NOTES:
The Reliable Transaction Router Version 3.2 ECO1 installation
procedure is the same as the installation procedure for RTR Version
3.2. Refer to the Installation Guide for further information.
This patch can be found at any of these sites:
Colorado Site
Georgia Site
Files on this server are as follows:
rtr320_249.README
rtr320_249.CHKSUM
rtr320_249.CVRLET_TXT
rtr320_249.tar