RTR V3.2 RTR320_249 Reliable Transaction Router Tru64 UNIX ECO Summary

TITLE: RTR V3.2 RTR320_249 Reliable Transaction Router Tru64 UNIX ECO Summary

Copyright (c) Compaq Computer Corporation 1999. All rights reserved.

Modification Date:  25-AUG-1999
Modification Type:  New Kit

PRODUCT:  Reliable Transaction Router (RTR) for Tru64 UNIX

OP/SYS:   Compaq Tru64 UNIX

SOURCE:   Compaq Computer Corporation

ECO INFORMATION:

     ECO Kit Name:  RTR320_249
     ECO Kits Superseded by This ECO Kit:  None
     ECO Kit Approximate Size:  10440 Blocks (5345280 Bytes)
          TAR file - 10440 Blocks (5345280 Bytes)
     Kit Applies To:  RTR V3.2
                      Compaq Tru64 UNIX V4.0D, V4.0E, V4.0F
     System/Cluster Reboot Necessary:  No

ECO KIT SUMMARY:

An ECO kit exists for Reliable Transaction Router V3.2 on Compaq Tru64
UNIX V4.0D through V4.0F. This kit addresses the following problems:

Problems Addressed in RTR320_249:

  o  Show transactions not recovered on link break reconnect

     If a secondary shadow backend lost its link to the RTR router after
     the router had sent a vote request, and the server on the primary
     shadow accepted the transaction, then in unusual circumstances it
     was possible that the transaction would not be immediately
     recovered on the secondary shadow after the link to the router was
     re-established. In such cases it required a cycle of the servers on
     the secondary site for the remembered transaction to be recovered
     from the primary shadow journal. This has now been fixed.

  o  Problems with DUMP JOURNAL

     In previous versions of RTR, qualifiers which required a value did
     not generate an error if the value was not supplied or was supplied
     incorrectly. Incorrect or missing values now generate an error
     message.

     If a string of fewer than five characters was passed for partition
     record class, the partition record counter was not updated and the
     record was not available. These problems have been fixed by
     comparing each character instead of five characters at a time.

  o  Transaction state is not getting EXCEPTION after issuing
     rtr_close/imme

     SET PARTITION /RECOVERY_RETRY_COUNT is new functionality
     implemented in RTR V3.2. The scope of this command was not fully
     documented, and is clarified here.

     If an application server dies while processing a transaction
     recovered from the RTR journal, then RTR will present the
     transaction to another (concurrent or standby) server. The
     RECOVERY_RETRY_LIMIT indicates the maximum number of times the
     transaction should be presented to a server for recovery before
     being written to the journal as an exception.

     There are two types of recovery operations where transactions are
     recovered from the journal: local recovery and shadow recovery.
     Shadow recovery is the process of recovering the remembered
     transactions written to a primary shadow journal while the
     secondary shadow site is down.

     The SET PARTITION /RECOVERY_RETRY_COUNT parameter does not have an
     effect on remembered transactions recovered during shadow recovery.
     That is, if there is a killer transaction remembered in the journal
     on a primary shadow node, RTR on this node does not count the
     number of times the transaction is recovered by a recovering
     secondary shadow node. The way to ensure that a remembered
     transaction will be exceptioned by RTR is to start a sufficient
     number of concurrent servers on the recovering secondary shadow
     node.

     For this reason, RTR recommends that the number of concurrent
     secondary shadow servers started is greater than the value set for
     the RECOVERY_RETRY_LIMIT on a partition. This will ensure that a
     remembered (killer) transaction being recovered from a primary
     shadow journal will be exceptioned if the retry limit is exceeded.

     Only those transactions that have reached the voting stage on a
     server can be exceptioned. If a server always dies before voting on
     a transaction, then the transaction will be aborted by RTR after
     the third try. This is a hard-coded limit (the so-called "three
     strikes and you're out" feature).

  o  Backends erroneously remain inquorate after routers trimmed

     In versions V3.1D-eco14 and V3.2 of RTR it was sometimes possible
     for nodes to erroneously remain inquorate following a TRIM FACILITY
     operation. This has now been fixed.

  o  Revised rtrreq.c and rtrsrv.c sample RTR applications

     The sample client and server used in the IVP have been extensively
     revised. Please pay special attention to the comments which explain
     how to write a wakeup handler, and comments drawing attention to
     several common programming mistakes we have seen in RTR
     applications.

  o  Looping RTR process for empty node string, e.g., /NODE=dna.

     Specifying an incomplete node specification, such as one with only
     the protocol prefix, e.g., "RTR SHOW RTR /NODE=dna." could cause
     the RTR process to loop, consuming CPU. This problem has been
     fixed.

  o  ACP access violation

     If a number of concurrent servers died in sequence while processing
     the same transaction, then under rare circumstances it was possible
     the ACP could also abort. This was due to a counter being
     incremented incorrectly and has now been fixed.

  o  ACP crashed when modifying journal size

     After a journal had been modified, the Flow Control subsystem of
     RTR was not properly updated with the new size. This could result
     in a hang or crash situation even though the journal size was
     increased to accommodate increased traffic. This problem has been
     fixed.

  o  rtr_close_channel fails for distributed transaction

     Calling rtr_close_channel while a distributed transaction was
     pending caused an incorrect status to be returned. The correct
     status is now returned.

  o  CALL CLOSE_CHANNEL defaults to IMMEDIATE

     The flag RTR_F_CLO_IMMEDIATE is a new flag added in RTR V3.2 that
     allows the caller to close a server channel without acknowledging
     the transaction on the channel. By default, the flag is not set
     when calling the rtr_close_channel API. However, the /IMMEDIATE
     qualifier was implicitly present in the RTR CLI version of the API
     (rtr call rtr_close_channel).

     Because this is incompatible with the behavior of previous versions
     of RTR, functionality has been restored to the same as before V3.2.
     When using the CLI version of the API (rtr call rtr_close_channel),
     /NOIMMEDIATE is now the default.

  o  TOOMANCHA and distributed transaction left open after
     rtr_open_channel() failure

     If rtr_open_channel failed after the RTR ACP had been stopped, then
     that channel remained available for a subsequent open. The
     application could eventually run out of channels and return
     RTR_STS_TOOMANCHA.

     Now if rtr_open_channel fails after a distributed transaction has
     been opened, the distributed transaction is always closed.

  o  SHOW SERVER truncates shd_rec_icpl to shd_rec_ic

     Some of the values previously truncated by the brief SHOW SERVER
     command are now displayed more fully.

  o  Application may crash if invoked before RTR after a reboot

     Normally the RTR executable must have been invoked at least once
     since reboot before an RTR application can be started. If an RTR
     application is invoked first, the first RTR API call now always
     returns RTRNOTSTA, RTR not started.

  o  IOS tid on IP only nodes is not unique

     Using previous versions of RTR, if you ran client applications that
     used the RTR V2 API on systems that had DECnet disabled, then there
     was a remote possibility that the same transaction identifier could
     be generated on two such systems if RTR was started on both systems
     within milliseconds of each other. This has now been fixed.

  o  Faster loading of large journals on first CREATE FACILITY

     RTR now takes much less time to load journals containing a large
     number of journaled transactions.

  o  The broadcast message was not delivered from BE to client

     If a frontend lost the connection to its original router, and was
     the first frontend to connect to the router it failed over to, then
     the frontend could stop receiving broadcasts. Further, backends
     could also fail to receive broadcasts delivered by routers added to
     a facility after the server applications had started. These
     problems have been fixed.

  o  RTR has both backends as primary for some transactions
     (STR#1885690)

     In a partitioned network situation (when each of two routers has
     access to only half of the backend nodes), RTR will choose the
     router with the lower network address as the one that remains or
     becomes active. In previous versions of RTR, this would sometimes
     result in both sets of backends becoming active, due to a problem
     with the network ID comparison algorithm. This has been corrected.

  o  Signals blocked in unthreaded UNIX applications during RTR API
     calls

     RTR now enables the usual termination signals during RTR API calls.
     For example, an idle server RTR application waiting in
     rtr_receive_message with no timeout will now respond to Control-C.

  o  Terminated RTR application process that used fork is still shown by
     RTR

     RTR applications now have FD_CLOEXEC set for the IPC sockets used
     to communicate with the RTR ACP, so that these do not remain open
     in a child process after fork and exec even after the parent
     process has terminated. This means that the RTR ACP now notices
     when the parent exits, and will not accumulate a wait queue of
     broadcast messages or delay failover. The terminated process no
     longer appears in RTR SHOW PROCESS.

  o  BADROWCOL and escape sequences visible on dumb or unknown terminal

     The default VT100-style terminal escape sequences can now be
     completely suppressed with a suitable TERMCAP environment variable
     setting. It is still necessary to set a non-zero window size to
     avoid BADROWCOL, for example:

          stty rows 48 cols 120
          TERM=dumb
          TERMCAP="dumb:cm=:do=:le=:nd=:up=:ks=:ke=:cl=:ce=:ho=:mb=:md=:mr=:us=:ue=:me=:cr=:bl=:"

     This is particularly useful when running RTR in an Emacs shell
     window, and gives reasonably clean output for all RTR commands
     except MONITOR.

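The FD_CLOEXEC change described above for applications that use fork
relies on the standard POSIX descriptor flag of that name. As a minimal
sketch in C, and not RTR's own code, the following shows one way a
descriptor can be marked close-on-exec with fcntl(). The helper names
set_cloexec, has_cloexec and open_ipc_pair are hypothetical, and the
socket pair is only a stand-in for an IPC socket; how RTR creates its
sockets is not stated in this summary.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Mark a descriptor close-on-exec, preserving its other descriptor
 * flags. Returns 0 on success, -1 on failure (errno set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Report whether close-on-exec is set: 1 = set, 0 = clear, -1 = error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Hypothetical stand-in for an application's IPC socket: create a local
 * socket pair and mark only the first end close-on-exec, so the two
 * ends can be compared. Returns 0 on success, -1 on failure. */
int open_ipc_pair(int sv[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (set_cloexec(sv[0]) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}
```

A descriptor marked this way is closed automatically when the process
calls one of the exec functions, so a program started by fork and exec
does not inherit it. A child that forks without calling exec still
shares the descriptor until it closes it explicitly.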
Known Problems with Workarounds

  o  Install procedure needs all rtr processes terminating, including
     rtrd

     All rtr processes and rtr applications must be terminated before
     installing a new version of rtr. After using rtr stop rtr and
     rtr disc server, please check for any surviving processes such as
     rtrd and applications programmed to handle RTR_STS_NOACP, and
     terminate any such processes until there are none left. Note that
     all the rtr acp and comserver processes must be terminated before
     rtrd, otherwise they will simply create a new rtrd.

     On UNIX, the rtrd process blocks HUP, INT and TERM but can be
     terminated with the KILL signal, kill -9.

INSTALLATION NOTES:

The Reliable Transaction Router Version 3.2 ECO1 installation procedure
is the same as the installation procedure for RTR Version 3.2. Refer to
the Installation Guide for further information.
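
The Known Problems note that rtrd blocks HUP, INT and TERM yet can still
be terminated with kill -9 follows from general UNIX signal rules: a
process cannot block SIGKILL. The C sketch below illustrates that rule
under the assumption that blocking is done with sigprocmask(); this
summary does not say how rtrd itself is implemented, and the function
names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Ask the system to block HUP, INT, TERM and (deliberately) KILL for a
 * single-threaded process. POSIX requires the request to block KILL to
 * be ignored without an error. Returns 0 on success, -1 on failure. */
int block_termination_signals(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGHUP);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    sigaddset(&set, SIGKILL);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* 1 if signo is in the current signal mask, 0 if not, -1 on error. */
int is_blocked(int signo)
{
    sigset_t current;
    if (sigprocmask(SIG_BLOCK, NULL, &current) == -1)
        return -1;
    return sigismember(&current, signo);
}

/* 1 if signo has been sent but is held pending, 0 if not, -1 on error. */
int is_pending(int signo)
{
    sigset_t pending;
    if (sigpending(&pending) == -1)
        return -1;
    return sigismember(&pending, signo);
}
```

After block_termination_signals() returns, a TERM sent to the process
stays pending instead of terminating it, while is_blocked(SIGKILL) still
reports that KILL is not blocked. That is why the KILL signal remains
effective as a last resort once the acp and comserver processes have
been stopped.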


Files on this server are as follows:

rtr320_249.README
rtr320_249.CHKSUM
rtr320_249.CVRLET_TXT
rtr320_249.tar