[RTR V3.1D] RTRVVME014D Reliable Trans. Router V3.1D (VAX) ECO Summary

TITLE: [RTR V3.1D] RTRVVME014D Reliable Trans. Router V3.1D (VAX) ECO Summary

Copyright (c) Compaq Computer Corporation 1998, 1999. All rights reserved.

Modification Date:  18-JUN-1999
Modification Type:  Updated Kit: Supersedes RTRVVME010D

PRODUCT:     Reliable Transaction Router V3.1D for OpenVMS VAX (RTR)

OP/SYS:      OpenVMS VAX

COMPONENTS:  RTR.EXE
             LIBRTR.EXE

SOURCE:      Compaq Computer Corporation

ECO INFORMATION:

     ECO Kit Name:  RTRVVME014D
     ECO Kits Superseded by This ECO Kit:  RTRVVME010D
                                           RTRVVME08D
     ECO Kit Approximate Size:  6521 Blocks
                                3338752 Bytes
     Kit Applies To:  RTR V3.1D for OpenVMS VAX
                      OpenVMS VAX V6.1, V6.2, V7.0, V7.1, V7.2
     System/Cluster Reboot Necessary:  No
     Installation Rating:  INSTALL_UNKNOWN

     Kit Dependencies:

       The following remedial kit(s) must be installed BEFORE installation of this kit:

         None

       In order to receive all the corrections listed in this kit, the following remedial kits should also be installed:

         None

ECO KIT SUMMARY:

An ECO kit exists for Reliable Transaction Router on OpenVMS VAX V6.1 through V7.2. This kit addresses the following problems:

Problems Addressed in the RTRVVME014D Kit:

  o  14-1-239 WSAEventSelect 10055 knl_env line 1339

     RTR was not handling some errors that could be returned when an asynchronous DECnet line shut down.

  o  14-3-118 TX on pri_act not being played on sec_act

     If RTR is configured with servers on backends that are running DECnet Phase V, then under certain conditions local recovery from the remote node's journal would not be performed. For example, local and shadow recovery would appear to work correctly in a shadow server configuration after the primary shadow went down, but in fact any transactions in the remote node's journal would not be recovered.

     This can only occur if the backends are all using DECnet Phase V as the primary RTR transport, and if the DECnet addresses of the nodes concerned match a particular pattern. Note that this is a static DECnet configuration issue.
     If recovery works in your particular configuration, then it will always work as long as the DECnet network configuration is not changed.

  o  14-3-124 Bug in V2 load balance

     In earlier versions of RTR, it was occasionally possible for a router to become permanently incapable of accepting new incoming frontend connections if router load balancing had been enabled by specifying the /BALANCE qualifier and a frontend happened to connect to the router during a quorum state transition.

  o  14-3-131 $DCL_TX_PRC crash when no privs

     Running a V2 application from an account that does not have RTR info privilege no longer causes the application to crash.

  o  14-3-132 Extend journal failure

     In previous versions of RTR, the RTR ACP process would crash if the RTR journal had been created with /MAXIMUM_BLOCKS greater than /BLOCKS and RTR attempted to extend the journal beyond its initial size.

  o  14-3-134 More Broadcast message counters required

     Various counters connected with the delivery of broadcast events have been added:

       - facility counters fdb_cn_bm_transit_brd_lost and fdb_cn_bm_transit_brd_delivered
       - link counters ndb_cn_bm_transit_lost and ndb_cn_bm_transit_delivered
       - process counters bm_brd_lost and bm_brd_delivered

  o  14-3-140 RTR V3 $DCL_TX_PRC does not complete correctly when no TXSB is supplied

     Using the V2 API verbs on V3 would not always generate the same result as V2 when using undeclared or invalid channels.

  o  14-3-150 RTR applications hang on trying to continue after ACP restarted

     If the application tries to open a channel again after seeing the status RTR_STS_ACPNOTVIA, it hangs on the subsequent rtr_receive_message call. This affects applications on threaded RTR platforms, especially Win32 and AIX.

     If the application tried to open a channel again after seeing the status RTR_STS_ACPNOTVIA, it could hang on the subsequent rtr_receive_message call. This problem has been corrected for threaded UNIX platforms.
     It is no longer necessary to restart any RTR application for UNIX after restarting RTR.

  o  14-3-151 'signed' identifier not in VAX C

     For previous versions of RTR, compiling RTR applications with the VAX C compiler generated compiler errors, since the VAX C compiler does not recognize the 'signed' keyword used in RTR.H. The 'signed' keyword is no longer defined in RTR.H when compiling with the VAX C compiler.

  o  14-3-161 MONITOR CALLS/ID=n where 'n' is not a valid id - monitors all ids

     Use of the monitor command with any of the qualifiers /link, /process, /facility or /partition would generate an empty display if the requested entity did not exist. This was unlike V2 behavior, and was considered by some to be misleading. V2 behavior has been restored.

  o  14-3-165 Shadow servers experience deadlock using 2 partitions

     If two shadowed server partitions were set up with primary and secondary roles transposed on the two backends involved, and the servers for these partitions accessed the same database rows, it was occasionally possible for a distributed deadlock to occur in which servers on each site waited forever for locks held by the primary server for the other partition to be released.

  o  14-3-167 Corruption in large monitor pictures

     Spurious and missing characters were seen in larger monitor pictures displayed to a terminal window or terminal. The corruption was particularly dramatic if a terminal escape sequence was affected.

     The corruption appeared to occur when more than 8k of data was buffered without a newline or explicit flush, e.g. when monitoring with a recent kit in which RTR terminal output was changed to line-buffered for efficiency. The output always seemed to be correct when using MONITOR /OUTPUT, or when monitoring remotely from an RTR platform other than VAX, or when buffering more than BUFSIZ (64k) of output on OpenVMS Alpha.
     Investigation so far indicates that RTR was buffering the output correctly, and that there would seem to be an error in the OpenVMS VAX runtime libraries. RTR now flushes output more frequently so as not to provoke this problem.

  o  14-3-168 How to calculate required quotas?

     There is no easy way to calculate the required UAF and SYSGEN parameters needed by RTR. The following information may be used to estimate the virtual memory size requirements of the RTR ACP process.

     The base virtual memory requirement of an unconfigured RTR ACP process is approximately 5.8 Mbytes. To this should be added allowances for the following:

       - for each link, add 202 kBytes
       - for each facility, add 13 kBytes, plus 80 bytes for each link in the facility
       - for each client or server application process, add 190 kBytes for the first channel
       - for each additional application channel, add 1350 bytes

     It is also necessary to make allowance for the number of active transactions in the system. Unless your client applications are programmed to initiate multiple concurrent transactions, this number will not exceed the total number of client channels in the system, but you should verify this with your application providers. It is also necessary to determine the size of the transaction messages in use.

     For each frontend:

       - add 1 kByte per active transaction
       - add 250 bytes per message per transaction
       - plus the size of all the messages

     For transaction routers, allow about 1 kByte for each active transaction.

     For backends, allow:

       - 1 kByte per active transaction
       - 50 bytes for each message of a transaction
       - plus the size of all replies

     The total of all contributions detailed above will yield an estimate of the likely virtual memory requirements of the ACP. Apply a large safety factor: it is better to grant RTR resource limits exceeding its real requirements than to risk a loss of service in production as a result of insufficient resource allocation.
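     The contributions listed for 14-3-168 can be added up mechanically. The following Python sketch is only an illustration of that arithmetic, not an RTR tool. The function and parameter names are hypothetical, the byte values assumed for "kByte" and "Mbyte" (1024 and 1024*1024) are assumptions, and so is the default safety factor of 2, since the text only asks for a "large" factor.

```python
import math

KB = 1024               # assumption: 1 kByte = 1024 bytes
MB = 1024 * KB          # assumption: 1 Mbyte = 1024 kBytes
BASE_BYTES = int(5.8 * MB)   # approximate size of an unconfigured ACP


def estimate_acp_bytes(links, facility_link_counts, processes,
                       extra_channels=0,
                       fe_tx=0, fe_msgs_per_tx=0, fe_payload_per_tx=0,
                       tr_tx=0,
                       be_tx=0, be_msgs_per_tx=0, be_reply_bytes_per_tx=0,
                       safety_factor=2.0):
    """Estimate ACP virtual memory in bytes (hypothetical helper).

    facility_link_counts  one entry per facility: its number of links
    extra_channels        application channels beyond the first per process
    fe_payload_per_tx     total message bytes per frontend transaction
    be_reply_bytes_per_tx total reply bytes per backend transaction
    safety_factor         assumed default; the text only says "large"
    """
    total = BASE_BYTES
    total += links * 202 * KB                     # each link
    for n_links in facility_link_counts:          # each facility
        total += 13 * KB + 80 * n_links
    total += processes * 190 * KB                 # first channel per process
    total += extra_channels * 1350                # additional channels

    # Frontend: 1 kByte per transaction, 250 bytes per message, plus payload.
    total += fe_tx * (KB + 250 * fe_msgs_per_tx + fe_payload_per_tx)
    # Router: about 1 kByte per active transaction.
    total += tr_tx * KB
    # Backend: 1 kByte per transaction, 50 bytes per message, plus replies.
    total += be_tx * (KB + 50 * be_msgs_per_tx + be_reply_bytes_per_tx)

    return math.ceil(total * safety_factor)


if __name__ == "__main__":
    # Example: a backend with 4 links, one facility spanning all 4 links,
    # 8 server processes and 100 concurrent transactions of 2 messages each.
    need = estimate_acp_bytes(4, [4], 8, be_tx=100, be_msgs_per_tx=2,
                              be_reply_bytes_per_tx=2000)
    print(f"estimated ACP virtual memory: {need / MB:.1f} Mbytes")
```

     The result is in bytes, so it still has to be converted to the units used by the process quotas before it is applied.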
     Divide the result by the virtual memory page size to obtain the virtual memory requirement in pages. You should set process memory and page file quotas to accommodate at least this much memory.

     Process quotas for the ACP process are controlled by qualifiers to the 'start rtr' command. See the RTR System Manager's Manual for further information.

     For more control, you may individually set all process quotas for the ACP by using the appropriate qualifier with the 'start rtr' command. For a more holistic approach, 'start rtr' accepts '/links' and '/processes' as qualifiers, which can be used to specify the expected number of links and application processes in the configuration. The values supplied are used to calculate reasonably safe minimum values for the following ACP process quotas:

       - astlm
       - biolm
       - fillm
       - diolm
       - pgflquota

     The default value for '/links' is 512. This is high, but is chosen to protect RTR routers against a failover scenario where the number of frontends is large and the number of surviving routers becomes small.

     The default value for '/processes' is 64. This is large for frontend and router nodes, but you may need to specify a larger value on a backend hosting a complex application.

     You may use an explicit process quota qualifier to specify a value larger than that calculated through use of '/links' and '/processes', but you may not specify a smaller value.

     The '/links' and '/processes' calculation does not take the memory requirements of transactions into account. If your application passes a large amount of data from client to server or vice versa, you should include this in your sizing calculations.

  o  14-3-174 $ENQ ACCVIOs with bad channel message

     Calls to the V2 API specifying an invalid channel identifier would cause an access violation in LIBRTR.

  o  14-3-195 $START_TX when ACP died

     The emulation of the RTR V2 API on V3 has been improved to correctly reflect V2 behaviour for channels which were idle at the time of ACP failure.
     Subsequent calls on such channels now fail immediately with the status RTR$_NOACP.

  o  14-3-196 $START_TX from AST w/o ACP

     An application calling $START_TX at AST level when the ACP had died would crash inside LIBRTR. This has been corrected, and SYS$START_TX now simply returns to the caller a message indicating that the ACP is not available.

  o  14-3-197 ACPNOTVIA error returned if RTR command $DCL_TX_PRC issued

     On RTR V3.1D (194-SWX01) ECO7-FT1, if the RTR command $DCL_TX_PRC is issued for a non-existent facility, an ACPNOTVIA error is returned. This does not happen the first time, only on subsequent attempts if RTR is stopped in between.

     API verbs called from the RTR command line interpreter would fail with the status ACPNOTVIA if RTR was stopped and restarted without restarting the command server. The problem can be avoided on earlier versions of RTR by issuing the command 'disconnect server' after stopping RTR.

  o  14-3-203 ACP router crash when other nodes shut

     Configurations where more than 100 frontends were connected to any particular router could experience an ACP failure whilst managing quorum loss.

     Automatic router failback has been restored for RTR V2 frontends connecting to RTR V3 routers.

  o  14-3-205 Inconsistent TR TX timeout if no link to FE

     Using previous versions of RTR, if a router lost a connection to a frontend that had a transaction active in enqueuing state, then the router would abort the transaction after a period of about one minute if the frontend link was not re-established. This happened even if the client had specified a transaction timeout much shorter than this when starting the transaction.

     This is now fixed: if the router loses its connection to the frontend, a transaction in enqueuing state on the router is aborted after the interval specified by the client (if it is less than one minute).

  o  14-3-210 START RTR qualifiers from V2

     Attempts to use obsolete V2 qualifiers to the 'start rtr' command cause a warning to be issued.
     Qualifiers affected are 'partitions', 'cache_pages', and 'relations'. Warnings are also generated if an OpenVMS qualifier is used on a non-OpenVMS platform.

  o  14-3-211 Long facility names were truncated in SHOW FACILITY output

     Facility names near or at the maximum length now push the next column to the right instead of being truncated. Slightly shorter names are still preferred, to prevent the SHOW FACILITY columns becoming ragged.

  o  14-3-213 Fac name can be 31 chars?

     Although the %RTR-E-FACNAMLON message states the facility name can only have 30 characters, it can take up to 31.

     The documented maximum length of a facility name string is 30 characters. Prior versions of RTR permitted facility names as long as 31 characters.

  o  14-3-217 Unthreaded UNIX applications using rtr_set_wakeup can fail, e.g., in malloc

     When an unthreaded UNIX RTR application calls rtr_set_wakeup, the non-reentrant RTR shared library -lrtr with which it is linked installs a signal handler. This signal handler called functions internal to RTR which could occasionally call runtime library functions such as malloc() that are not async-safe, according to the relevant standards. See man (4) signal.

     In practice this may appear to work most of the time, but break for no apparent reason when the signal happens to occur while background code is also in a runtime library call such as malloc.

     The problem in RTR has been corrected. The small penalty for this is that RTR no longer makes any attempt to ensure that the messages available are not just housekeeping. Applications must always be prepared for a timeout return status on calling rtr_receive_message with a zero timeout, even after a wakeup suggests that a message ought to be available.

     Application writers are reminded that their RTR wakeup handlers are subject to the same restrictions: routines like printf, malloc, and the entire RTR API may not be used directly or indirectly from within a signal handler.
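     One common way to respect such restrictions is the "self-pipe" pattern: the handler only writes a byte to a pipe, and the mainline event loop does all the real work. The Python sketch below only illustrates the shape of that pattern on a POSIX system. It does not call RTR (no Python binding for RTR is assumed to exist), the names wakeup_handler and wait_for_wakeup are hypothetical, and Python's own signal handling is more forgiving than C's, so treat it as a model of the C structure rather than as RTR code.

```python
import os
import select

# Pipe used purely as a wakeup flag between the handler and the mainline.
read_fd, write_fd = os.pipe()
os.set_blocking(write_fd, False)   # the handler must never block


def wakeup_handler(signum=None, frame=None):
    """Do the minimum: write one byte so the event loop wakes up.

    In a C application this would be the routine passed to rtr_set_wakeup;
    it calls no RTR functions, no printf and no malloc.
    """
    try:
        os.write(write_fd, b"\0")
    except BlockingIOError:
        pass   # pipe already full, so a wakeup is pending anyway


def wait_for_wakeup(timeout):
    """Return True if a wakeup was signalled within `timeout` seconds."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return False
    os.read(read_fd, 4096)   # drain so repeated wakeups coalesce
    return True
```

     After wait_for_wakeup returns True, the mainline would poll for messages with a zero timeout and, as noted above, must still be prepared for a timeout status because the wakeup may have been for housekeeping only.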
     A workaround for applications with unsafe wakeup handlers can be to link with the reentrant version of the library, -lrtr_r, because different rules apply for wakeups in a thread: applications should not call anything that is not thread-safe, or anything that might block indefinitely, such as rtr_send_to_server, rtr_reply_to_client, rtr_broadcast_event, or rtr_receive_message with a non-zero timeout.

  o  14-3-218 Microsoft Visual C compiler options /Gz (stdcall) and /Gr (fastcall) supported

     The RTR API functions are now declared with the __cdecl attribute so they can be used in applications compiled with calling conventions other than the /Gd (cdecl) default.

  o  14-3-250 Flow control has -ve credit

     Applications with multiple channels engaged on more than one facility could experience flow control difficulties causing indefinite delays in transaction completion. This has been corrected.

  o  14-3-253 Restrictions on the RTR wakeup handler

     The use of rtr_reply_to_client, rtr_send_to_server, or rtr_broadcast_event in an RTR wakeup handler is not recommended. They may block when they need transaction ids or flow control. This will cause undesired behavior.

     Functions permitted in an rtr_set_wakeup() handler:

     In an RTR wakeup handler in an AST in an unthreaded OpenVMS application, the use of rtr_reply_to_client(), rtr_send_to_server(), rtr_broadcast_event(), or rtr_receive_message() with a non-zero timeout is not recommended. They may block when they need transaction ids or flow control, which will cause the whole application to hang until the wakeup completes.

     In an RTR wakeup handler in a threaded application the same rules apply. Note that wakeups are unnecessary in a threaded paradigm, but they may be used in common code in applications that also need to run on OpenVMS. Please note that your mainline code continues to run while your wakeup is executing, so extra synchronization may be required.
     Also note that if the wakeup does block, it does not generally hang the whole application.

     In an RTR wakeup handler in a signal in an unthreaded UNIX application, no RTR API functions and only the very few async-safe system and library functions may be called, because the wakeup is performed in a signal handler context. An application can write to a pipe or access a volatile sig_atomic_t variable, but using malloc() and printf(), for example, will cause unexpected failures. Alternatively, on most UNIX platforms, you can compile and link the application as a threaded application with the reentrant RTR shared library -lrtr_r.

     For maximum portability the wakeup handler should do the minimum necessary to wake up the mainline event loop. You should assume that mainline code and other threads might continue to run in parallel with the wakeup, especially on machines with more than one CPU.

  o  14-3-255 Multiple broadcast or data received on wrong channel

     When running W95/NT with Pathworks installed, RTR would not detect that the client had closed its channel when the client application was aborted by closing the window. RTR now detects when the client has aborted the channel and closes the channel.

  o  14-3-258 Stop inquorate standby from going active

     When there is a network segmentation in an active/standby configuration, the segment in the minority would become active. This behavior resulted in two active servers for the same partition. RTR now puts the inquorate or minority server in wt_quorum state and the majority server in active state.

  o  14-8-185, 14-3-259 Slow or hanging applications using large messages and discarded broadcasts

     The threshold for flow control has been increased from 100000 to 1000000, and can also now be changed by defining the environment variable RTR_MAX_CHANNEL_WAITQ_BYTES to e.g. 10000000 when starting the ACP.

     When too many data bytes are queued to be sent to a destination process, the flow control feature is activated.
     The sender application may then be forced to wait for a while in the next API call that sends data, or broadcasts may be discarded, until the queue reduces and flow control credit is freely granted again.

     You should increase this parameter if your application sends large broadcasts or sends or replies with large amounts of data per transaction.

     Because broadcasts are subject to discarding, you should not use them to send large amounts of data reliably. You may wish to consider using a sequence number and providing a read-only transaction in your application to detect and request re-transmission of any discarded broadcast data.

     There is also a hard limit parameter with the default 100000000 (10^8). The channel will be closed immediately if the send wait queue exceeds this.

     These tunable flow control parameters are provisional and subject to change.

  o  14-3-265 Successive ACP crashes

     Reception of a corrupt network message could cause a failed assertion and demise of the RTR ACP process. The behavior has been changed to yield a log file entry (BADNETMSG), followed by a reset of the link concerned. If such log file entries persist for a particular pair of nodes, it may mean that a network problem exists, and you should consider checking the network hardware for correct operation. The RTR KNL subsystem log entry has also been improved to better identify the link on which it reports errors.

  o  14-3-272 RTR no longer disables the TCP/IP Nagle algorithm with TCP_NODELAY

     The TCP_NODELAY option, which disables the Nagle algorithm, was previously enabled on all RTR platforms except Solaris. This change improves network throughput under load. Response time may be slightly longer under some conditions. The option can be activated by defining the environment variable RTR_TCP_NODELAY. This restores the old behavior on most platforms.
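     For readers unfamiliar with the option, the following Python sketch shows what enabling TCP_NODELAY on a socket looks like when gated on an environment variable. It is not RTR's implementation. The helper name make_tcp_socket is hypothetical, and treating the mere presence of RTR_TCP_NODELAY as "enabled" is an assumption, since the text says only that the variable must be defined.

```python
import os
import socket


def make_tcp_socket(environ=os.environ):
    """Create a TCP socket, disabling Nagle only if the variable is defined.

    Hypothetical helper for illustration; `environ` is any mapping, which
    makes the behavior easy to exercise without touching the real environment.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if "RTR_TCP_NODELAY" in environ:
        # TCP_NODELAY = 1 turns the Nagle algorithm off: small writes are sent
        # immediately, which favors response time over throughput under load.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock):
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

     With the variable undefined the socket keeps the operating system default, in which the Nagle algorithm coalesces small writes. That is the throughput-oriented behavior this item describes as the new default.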
  o  14-3-274 Opening channel for non-existent facility causes crash

     An uninitialized variable caused this crash, which was seen in recent Field Test kits (214) and (218) for AIX and Solaris. This has been corrected.

  o  14-3-275 aio not available makes RTR fail with unresolved errors for kaio_rdrw etc.

     RTR for AIX exploits Asynchronous I/O for increased journal performance. By default, aio is only `defined', i.e., disabled, instead of `available'. Aio can be configured with the system management tool:

       # smit aio

     The RTR installation procedure post_i script now makes aio available, and ensures that aio will also be available after a restart.

  o  14-3-276 SHOW TRANSACTION on FE after current TR trimmed and before FE reconnected causes ACP dump

     Executing the command RTR SHOW TRANS on a frontend immediately after trimming the current router from the facility could infrequently cause the ACP to crash.

  o  14-8-173, 14-3-282 Dual ported TCP router not establishing facility links

     Problems can arise if nodes in your configuration have multiple network adapters and the IP name server is not configured to return all the configured IP addresses for such nodes. Such nodes then reply to connection requests with an ID that is different from the one determined by the initiator of the connection. This can result in refused connections, or in only the first connecting facility gaining a current router. This version of RTR has been changed to operate correctly in such a partially configured environment.

  o  14-3-283 Image identification shows previous rtr version

     Some versions of the OpenVMS/VAX kit for RTR V3.1D ECO10 were shipped with a shared library showing an incorrect version ID.

  o  14-3-285 OpenVMS process quotas artificially constrained

     Prior versions of RTR would limit the maximum values that could be specified for the ACP process quotas to 64K. This restriction has been removed.
     Warning messages are generated if the requested quotas conflict with the system-wide WSMAX parameter, or the remaining free page file space.

  o  14-3-82 Requester hangs in SYS$COMMIT_TXW with RTR V3.1D - null txn

     If rtr_start_tx() was called by a client followed immediately by rtr_accept_tx(), then the application would hang (unless rtr_start_tx() was called with a timeout). The status returned in the rtr_mt_accepted data in such cases is RTR_STS_SYNCHCOMM (transaction committed synchronously).

     This also corrects the equivalent problem with the RTR V2 API. The status returned in the TXSB for such transactions using the V2 API is RTR$_SYNCHCOMM.

  o  14-8-128 The RTR ACP now uses an asynchronous method when closing its links

     This version of RTR defers the deassigning of its network channels during the closesocket routine. This change allows RTR to handle other requests while the channels are being run down. RTR no longer appears to pause while a network link is being deassigned.

  o  14-8-131 Failure to come up in remember mode

     The non-availability of a remote journal at shadow recovery time will cause a partition that was previously processing in remember mode to resume processing in that mode. Prior versions of RTR would set the partition state to 'shadow-recovery-fail', and transactions could not be processed until the configuration was manually corrected.

  o  14-8-144 When the ASYNC cable was disconnected from the client, an RTR dump was generated on the client

     Disconnecting a cable that was being used by an asynchronous DECnet link to a remote machine could cause an ACP failure when the transport marked the sockets as invalid. RTR has been changed to handle this error by temporarily suspending all network activity on the affected node. Network activity will resume as soon as the network is found to be usable again.

  o  14-8-154 RTR Router Crash

     Router ACPs configured to accept anonymous clients could fail when handling a network link loss event.
  o  14-8-162 Hanging servers

     Transaction recovery as a result of server failover could result in server applications hanging in 'local recovery' state if more than 10 client channels had simultaneously caused new transactions to be presented to the backend node. This has been fixed both by increasing the limit to 50 and by adding a check to make sure that recovery is complete before enforcing the limit, which is designed to keep a backend node from being overwhelmed when transactions are coming in at a rate faster than it can handle.

  o  14-8-175 RTRACP core dump while idle

     When a Frontend node is trimmed from a Frontend/Router Facility Definition, the Facility Descriptor Block is not fully deleted. This causes a core dump when RTR attempts to verify the facility after a network link loss. RTR now properly deletes the Facility Descriptor Block associated with the trimmed frontend.

  o  14-8-181 RTRACP Crashes

     It was possible that RTR on a frontend could select a router as its current router immediately after that router had been trimmed from the facility. This could potentially leave the frontend in a 'connecting' state.

The following restrictions apply to this kit:

  o  14-1-285: A temporary inconsistency in shadow server state can occur during initial facility startup of a shadowed configuration. A shadow server can erroneously remain in state "sec_act" until the rest of the facility has been started.

  o  14-3-67: An application's wakeup routine may be called more often than necessary.

  o  14-1-544: This version of RTR does not support a mixture of VAX and Alpha nodes in the same cluster if both are configured as Backends. This compatibility issue will be addressed in RTR V3.2.

Problems Addressed in the RTRVVME010D Kit:

This kit (ECO-10) contains the following corrections to RTR V3.1D (210) ECO8 (ECO9 was not released on OpenVMS):

  o  14-1-436, 14-3-225 Additional problem with aborted transactions

     This bug was previously addressed in ECO7, but a further side effect of the original change has been discovered and fixed. The problem was a potential ACP crash if a journal flush operation was attempted after an aborted transaction.

  o  14-1-496 Calling RTR_SET_INFO would get hung if ACP is not running

     Calling rtr_set_info() or running the RTR SET TRAN command would hang if the ACP was not running. This problem has been corrected.

  o  14-1-497 Monitor performance degrades as cube of row count

     MONITOR commands can now handle hundreds of rows efficiently.

  o  14-3-215 Rows dropped in monitor display with large # of rows

     MONITOR can now display more than 100 rows, subject to /ROWS in the monitor file. RTR will now display more than 1000 rows, provided the values for the /ROWS qualifiers in the relevant *.mon file are edited. This is most easily verified by redirecting the output to a file or pipe. If the output goes to a terminal, you can use the SCROLL commands, which are bound to various numeric keypad keys, to scroll all except the last line of monitor output.

  o  14-3-221 RTR crash during server reject plus network problems

     Under unusual conditions (for example, one server rejecting the TX and another accepting the same TX, or a TX being aborted due to resource problems, all at the same time that network fluctuations are occurring) it was possible that RTR would find an inconsistent TX state while recovering a TX from JNL. This resulted in RTR crashing and has now been fixed.

  o  14-3-224 Crash in 1-node configuration due to length mismatch

     RTR was aborting if it detected a length mismatch in the message passed to it. This has now been fixed.
  o  14-3-226, 14-3-162 ACP crash after facility deletion

     After a facility is deleted, it is possible for the RTRACP to receive a message from an application that references the deleted facility. The verification that the facility had been deleted failed on rare occasions, causing the RTRACP to abort. This has now been fixed.

  o  14-3-229 Logging of journal record deletion errors

     If there is an error deleting records from the RTR journal, an error is now logged. Previously, RTR would silently continue.

  o  14-3-239 Virtual address space full error

     RTR tries to extend the virtual address space of the ACP if there is not enough space to allocate data structures when a client or server application is started. If the ACP failed to do this, it would crash. This has now been fixed. Any such failure will simply prevent the new application from starting, rather than crashing the ACP.

  o  14-3-241 Application crash trying to send large messages to looping ACP

     Several changes were made to combat this combination of application crash and rapidly expanding ACP heap:

       - Flow control is now granted only to the channel and facility that requested it. A problem was discovered and corrected whereby a grant of flow control credit could allow unrelated channels to send too. This is believed to be the prime cause of the symptoms reported.

       - An application that is unable to send to the ACP due to resource shortage, for example if the ACP is alive but no longer receiving for whatever reason, now keeps trying indefinitely, and will now appear to hang rather than crash.

       - The TCP_NODELAY option, which disables the Nagle algorithm, is no longer enabled on any RTR platform. This will improve throughput under load, although there may be a slight impact on response time under certain conditions.

  o  14-3-260 Superfluous network traffic for nonexistent channels

     Whenever a channel opens or closes, RTR sends an update message to the router so that it can modify its broadcast routing information, if necessary.
     In previous versions of RTR such messages were sent even if no channel existed for the facility. In cases where machines with the Frontend role had a large number of facilities defined, this could result in significant network traffic that would be quite noticeable over slow links, such as asynchronous connections over telephone wire. RTR no longer sends these messages unless a channel exists on the facility.

  o  14-3-262, 14-3-214 Hash table algorithm bug

     The algorithm for accessing certain RTR data stored using a hash table was found to be inefficient and could sometimes fail to find data elements correctly. This bug has primarily affected access to Transaction IDs and may have caused excessive CPU usage during data retrieval or failure to find certain data elements. This bug has been corrected.

  o  14-3-266 Broadcast message corruption

     Reception of an illegal or unrecognizable broadcast now results in a log file entry (BMHDRVSN) rather than the demise of the ACP process. If such entries persist you may wish to consider checking the network for correct operation.

  o  14-8-147, 14-8-162, 14-8-164 Servers hanging during failover recovery

     Transaction recovery as a result of server failover could result in server applications hanging in 'local recovery' state if more than 10 client channels had simultaneously caused new transactions to be presented to the backend node. This has been fixed both by increasing the limit to 50 and by adding a check to make sure that recovery is complete before enforcing the limit, which is designed to keep a backend node from being overwhelmed when transactions are coming in at a rate faster than it can handle.

  o  14-8-152 Multiple broadcast or data received on wrong channel

     When running W95/NT with PATHWORKS installed, RTR would not detect that the client had closed its channel when the client application was aborted by closing the window.
     RTR now detects when the client has aborted the channel and closes the channel.

  o  14-8-163 Corrupt network message caused RTR crash

     Reception of a corrupt network message would hitherto result in a failed assertion and the demise of the RTR ACP process. The behavior has been changed to yield a log file entry (BADNETMSG), followed by a reset of the link concerned. If such log file entries persist for a particular pair of nodes, it may mean that a network problem exists, and you should consider checking the network hardware for correct operation. The RTR log entry has also been improved to better identify the link on which it reports errors.

  o  14-8-167 Null bytes display in SHOW PARTITION output

     The display of null bytes in the upper and lower key bounds has been suppressed if the bytes appear at the end of a key of type string.

The following restrictions apply to this kit:

  o  14-1-285: A temporary inconsistency in shadow server state can occur during initial facility startup of a shadowed configuration. A shadow server can erroneously remain in state "sec_act" until the rest of the facility has been started.

  o  14-3-67: An application's wakeup routine may be called more often than necessary.

  o  14-1-544: This version of RTR does not support a mixture of VAX and Alpha nodes in the same cluster if both are configured as Backends. This compatibility issue will be addressed in RTR V3.2.

Problems Addressed in the RTRVVME08D Kit:

  o  14-8-144 RTR crash when ASYNC cable disconnected

     Disconnecting a cable that was being used by an asynchronous DECnet link to a remote machine could result in an ACP failure when the transport marked the sockets as invalid. RTR has been changed to handle this error by temporarily suspending all network activity on the affected node. Network activity will resume as soon as the network is found to be usable again.
o 14-8-154 Router crash when link to Frontend disconnected

  Router ACPs configured to accept anonymous clients could, under
  certain circumstances, fail when handling a network link loss event.
  This has been corrected.

o 14-8-155 New environment variables for adjusting connection timeout
  parameter

  Two new environment variables have been created to give operators
  greater discretion in determining how long to wait before retrying a
  network connection attempt.

  The RTR_TIMEOUT_CONNECT variable controls how long a connecting node
  will wait for a response from the connectee to its link initiation
  request. This value defaults to 60 seconds.

  If the RTR_TIMEOUT_CONNECT period expires without a response from the
  connectee, RTR will wait an additional period determined by the
  RTR_TIMEOUT_CONNECT_RELAX variable. This variable defaults to a value
  of 90 seconds. The purpose of the "relax" period is to allow the
  connector to accept a connection request from the connectee node, if
  any are forthcoming.

  It is important not to set RTR_TIMEOUT_CONNECT_RELAX too low on
  Backends and Routers, as such machines are likely to be receiving
  connection requests from many other machines. On machines configured
  to use only the Frontend role, however, you can safely set
  RTR_TIMEOUT_CONNECT_RELAX to just a few seconds so that the node is
  free to attempt to connect to another router as quickly as possible.

  The minimum value for RTR_TIMEOUT_CONNECT is 5 seconds and the minimum
  for RTR_TIMEOUT_CONNECT_RELAX is 1 second.

The following restrictions apply to this kit:

o 14-1-285: A temporary inconsistency in shadow server state can occur
  during initial facility startup of a shadowed configuration. A shadow
  server can erroneously remain in state "sec_act" until the rest of
  the facility has been started.

o 14-3-67: An application's wakeup routine may be called more often
  than necessary.

INSTALLATION NOTES:

  The Reliable Transaction Router installation procedure uses the
  POLYCENTER Software Installation Utility (PCSI).
  For details on using PCSI, refer to the OpenVMS System Manager's
  Manual, Section "Installing with the POLYCENTER Software Installation
  Utility".

  The logical name PCSI$SOURCE defines the location of the software kits
  you want to install. For example, if the Reliable Transaction Router
  software is located in DISK1:[KITS], enter the following at the DCL
  prompt (or include the line in the system manager's login command
  file):

      $ DEFINE PCSI$SOURCE DISK1:[KITS]

  When running the installation procedure for Reliable Transaction
  Router, you can choose whether to install the ODBC Over RTR Oracle7
  Server. This is an RTR server used for supporting ODBC-enabled
  applications on Windows. You should not install the ODBC Over RTR
  Oracle7 Server unless you already have Oracle7 installed.

  To start the installation, type the command:

      $ PRODUCT INSTALL RTR

  You will see a display similar to the following:

      The following product has been selected:
          DEC VAXVMS RTR V3.1-D231 [Available]

      Do you want to continue? [YES]

  Press Return. You may safely accept the installation default options.
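
  As an aside on the RTR_TIMEOUT_CONNECT and RTR_TIMEOUT_CONNECT_RELAX
  variables described under 14-8-155, the following Python sketch shows
  how the stated defaults (60 and 90 seconds) and minimums (5 and 1)
  combine. It is an illustration only, not RTR code. The function names
  are hypothetical, and two behaviors are assumptions that the kit notes
  do not specify: that an unset or non-numeric value falls back to the
  default, and that a value below the minimum is raised to the minimum.

```python
import os

# Defaults and minimums, in seconds, as stated in the kit notes.
DEFAULTS = {"RTR_TIMEOUT_CONNECT": 60, "RTR_TIMEOUT_CONNECT_RELAX": 90}
MINIMUMS = {"RTR_TIMEOUT_CONNECT": 5, "RTR_TIMEOUT_CONNECT_RELAX": 1}


def effective_timeout(name, env=None):
    """Return the value in seconds that would apply for one variable.

    Assumptions (not confirmed by the kit notes): an unset or
    non-numeric value falls back to the default, and a value below
    the documented minimum is raised to that minimum.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULTS[name]
    return max(value, MINIMUMS[name])


def unanswered_attempt_seconds(env=None):
    """Total wait for an attempt that gets no response:
    the connect period followed by the relax period."""
    return (effective_timeout("RTR_TIMEOUT_CONNECT", env)
            + effective_timeout("RTR_TIMEOUT_CONNECT_RELAX", env))
```

  With both variables unset the sketch gives 60 + 90 seconds. A
  Frontend-only node that sets RTR_TIMEOUT_CONNECT_RELAX to 3 would wait
  60 + 3 seconds before it is free to try another router.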

This patch can be found at any of these sites:

Files on this server are as follows:

dec-vaxvms-rtr-v0301-d231-1.README
dec-vaxvms-rtr-v0301-d231-1.CHKSUM
dec-vaxvms-rtr-v0301-d231-1.CVRLET_TXT
dec-vaxvms-rtr-v0301-d231-1.exe