TITLE: [RTR V3.1D] RTRVVME014D Reliable Trans. Router V3.1D (VAX) ECO Summary
Copyright (c) Compaq Computer Corporation 1998, 1999. All rights reserved.
Modification Date: 18-JUN-1999
Modification Type: Updated Kit: Supersedes RTRVVME010D
PRODUCT: Reliable Transaction Router V3.1D for OpenVMS VAX (RTR)
OP/SYS: OpenVMS VAX
COMPONENTS: RTR.EXE
LIBRTR.EXE
SOURCE: Compaq Computer Corporation
ECO INFORMATION:
ECO Kit Name: RTRVVME014D
ECO Kits Superseded by This ECO Kit: RTRVVME010D
RTRVVME08D
ECO Kit Approximate Size: 6521 Blocks
3338752 Bytes
Kit Applies To: RTR V3.1D for OpenVMS VAX
OpenVMS VAX V6.1, V6.2, V7.0, V7.1, V7.2
System/Cluster Reboot Necessary: No
Installation Rating: INSTALL_UNKNOWN
Kit Dependencies:
The following remedial kit(s) must be installed BEFORE
installation of this kit:
None
In order to receive all the corrections listed in this
kit, the following remedial kits should also be installed:
None
ECO KIT SUMMARY:
An ECO kit exists for Reliable Transaction Router on OpenVMS VAX V6.1
through V7.2. This kit addresses the following problems:
Problems Addressed in the RTRVVME014D Kit:
o 14-1-239 WSAEventSelect 10055 knl_env line 1339
RTR did not handle some of the errors that could be returned when
an asynchronous DECnet line shut down.
o 14-3-118 TX on pri_act not being played on sec_act
If RTR is configured with servers on backends that are running
DECnet Phase V, then under certain conditions, local recovery
from the remote node's journal would not be performed. For
example, local and shadow recovery would appear to work correctly
in a shadow server configuration after the primary shadow would
go down, but in actual fact any transactions in the remote node's
journal would not be recovered. This can only occur if the
backends are all using DECnet Phase V as the primary RTR transport,
and if the DECnet addresses of the nodes concerned match a particular
pattern. Note that this is a static DECnet configuration issue. If
recovery works in your particular configuration, then it will always
work as long as the DECnet network configuration is not changed.
o 14-3-124 Bug in V2 load balance
In earlier versions of RTR, it was occasionally possible for
a router to become permanently incapable of accepting new
incoming frontend connections if router load-balancing had
been enabled by specifying the /BALANCE qualifier, and a frontend
happened to connect to the router during a quorum state transition.
o 14-3-131 $DCL_TX_PRC crash when no privs
Running a V2 application from an account that does not have RTR
info privilege no longer causes the application to crash.
o 14-3-132 Extend journal failure
In previous versions of RTR, the RTR ACP process would crash if
the RTR journal had been created with /MAXIMUM_BLOCKS greater than
/BLOCKS and RTR attempted to extend the journal beyond the initial
size.
o 14-3-134 More Broadcast message counters required
Various counters connected with the delivery of broadcast events
have been added: Facility counters fdb_cn_bm_transit_brd_lost and
fdb_cn_bm_transit_brd_delivered, link counters ndb_cn_bm_transit_lost
and ndb_cn_bm_transit_delivered, and process counters bm_brd_lost and
bm_brd_delivered.
o 14-3-140 RTR V3 $DCL_TX_PRC does not complete correctly when no
TXSB is supplied
Using the V2 API verbs on V3 would not always generate the same
result as V2 when using undeclared or invalid channels.
o 14-3-150 RTR applications hang on trying to continue after ACP
restarted
If the application tries to open a channel again after seeing
the status RTR_STS_ACPNOTVIA, it hangs on the subsequent
rtr_receive_message call. This affects applications on threaded
RTR platforms, especially Win32 and AIX.
If the application tried to open a channel again after seeing
the status RTR_STS_ACPNOTVIA it could hang on the subsequent
rtr_receive_message call. This problem has been corrected for
threaded UNIX platforms. It is no longer necessary to restart
any RTR application for UNIX after restarting RTR.
o 14-3-151 'signed' identifier not in VAX C
For previous versions of RTR, compiling RTR applications with
the VAX C compiler generated compiler errors, since the VAX C
compiler does not recognize the 'signed' keyword used in RTR.H.
The signed keyword is no longer defined in RTR.H if compiling
with the VAX C compiler.
o 14-3-161 MONITOR CALLS/ID=n where 'n' is not a valid id -
monitors all ids
Use of the monitor command with any of the qualifiers /link,
/process, /facility or /partition would generate an empty display
if the requested entity did not exist. This was unlike V2
behavior, and was considered by some to be misleading. V2
behavior has been restored.
o 14-3-165 Shadow servers experience deadlock using 2 partitions
If two shadowed server partitions were set up with primary
and secondary roles transposed on the two backends involved
and the servers for these partitions accessed the same database
rows it was occasionally possible for a distributed deadlock to
occur in which servers on each site waited forever for locks
held by the primary server for the other partition to be released.
o 14-3-167 Corruption in large monitor pictures
Spurious and missing characters were seen in larger monitor
pictures displayed to a terminal window or terminal. The
corruption was particularly dramatic if a terminal escape
sequence was affected.
The corruption appeared to occur when more than 8k of data was
buffered without a newline or explicit flush, e.g. when monitoring
with a recent kit in which RTR terminal output was changed to
line-buffered for efficiency. The output always seemed to be
correct when using MONITOR /OUTPUT, or when monitoring remotely
from an RTR platform other than VAX, or when buffering more than
BUFSIZ 64k of output on OpenVMS Alpha.
Investigation so far indicates that RTR was buffering the output
correctly, and that there would seem to be an error in the OpenVMS
VAX runtime libraries. RTR now flushes output more frequently so
as not to provoke this problem.
o 14-3-168 How to calculate required quotas?
There is no easy way to calculate the required UAF and SYSGEN
parameters needed by RTR.
The following information may be used to estimate virtual
memory size requirements of the RTR ACP process.
The base virtual memory requirement of an unconfigured RTR
ACP process is approximately 5.8 Mbytes. To this should be added
allowances for the following:
- for each link, add 202 kBytes
- for each facility, add 13 kBytes, plus 80 bytes for each
link in the facility
- for each client or server application process, add 190 kBytes
for the first channel
- for each additional application channel, add 1350 bytes
It is also necessary to make allowance for the number of active
transactions in the system. Unless your client applications are
programmed to initiate multiple concurrent transactions, this
number will not exceed the total number of client channels in the
system, but you should verify this with your application providers.
It is also necessary to determine the size of the transaction
messages in use.
For each frontend:
- add 1 kByte per active transaction
- add 250 bytes per message per transaction
- plus the size of all the messages
For transaction routers, allow about 1 kByte for each active
transaction.
For backends, allow:
- 1 kByte per active transaction
- 50 bytes for each message of a transaction
- plus the size of all replies
The total of all contributions detailed above will yield an
estimate of the likely virtual memory requirements of the ACP.
Apply a large factor for safety - it is better to grant RTR
resource limits exceeding its real requirements than to risk
a loss of service in production as a result of insufficient
resource allocation. Divide the result by the VM page size to
obtain the virtual memory requirement in pages.
You should set process memory and page file quotas to accommodate
at least this much memory.
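The arithmetic above can be sketched as follows. This is only an
illustration of the sums described in this note, not an RTR tool.
The function names are hypothetical, and the sketch assumes that
"kByte" means 1024 bytes, that "Mbyte" means 1024 kBytes, and that a
VAX page is 512 bytes. The safety factor is whatever you choose.

```python
import math

KB = 1024                    # assumption: "kByte" means 1024 bytes
MB = 1024 * KB               # assumption: "Mbyte" means 1024 kBytes
BASE_BYTES = int(5.8 * MB)   # unconfigured ACP, approximately 5.8 Mbytes
PAGE_BYTES = 512             # assumption: 512-byte VAX page


def acp_vm_bytes(links, facility_link_counts, processes, extra_channels,
                 role, active_tx, msgs_per_tx, payload_bytes_per_tx):
    """Sum the contributions listed in this note (before any safety factor).

    facility_link_counts holds one entry per facility: its number of links.
    role is "frontend", "router" or "backend".
    payload_bytes_per_tx is the size of all messages (frontend) or of all
    replies (backend) for one transaction; it is ignored for routers.
    """
    total = BASE_BYTES
    total += links * 202 * KB
    total += sum(13 * KB + 80 * n for n in facility_link_counts)
    total += processes * 190 * KB        # includes each process's first channel
    total += extra_channels * 1350       # each additional channel
    per_message = {"frontend": 250, "router": 0, "backend": 50}[role]
    total += active_tx * (1 * KB + per_message * msgs_per_tx)
    if role != "router":
        total += active_tx * payload_bytes_per_tx
    return total


def pages_required(total_bytes, safety_factor):
    """Apply the safety factor and convert bytes to pages, rounding up."""
    return math.ceil(total_bytes * safety_factor / PAGE_BYTES)


# Example: a backend with 20 links, one facility, 8 server processes.
estimate = acp_vm_bytes(links=20, facility_link_counts=[20], processes=8,
                        extra_channels=16, role="backend",
                        active_tx=100, msgs_per_tx=2,
                        payload_bytes_per_tx=4096)
print(pages_required(estimate, safety_factor=3))
```

The figures passed in the example are invented for illustration;
substitute the values for your own configuration.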
Process quotas for the ACP process are controlled by qualifiers
to the 'start rtr' command. See the RTR System Manager's Manual
for further information.
For more control, you may individually set all process quotas
for the ACP by using the appropriate qualifier with the 'start
rtr' command. For a more holistic approach, 'start rtr' accepts
'/links' and '/processes' as qualifiers which can be used to
specify the expected number of links and application processes
in the configuration. The values supplied are used to calculate
reasonably safe minimum values for the following ACP process
quotas:
- astlm
- biolm
- fillm
- diolm
- pgflquota
The default value for '/links' is 512. This is high, but is
chosen to protect RTR routers against a failover scenario where
the number of frontends is large and the number of surviving
routers becomes small.
The default value for '/processes' is 64. This is large for
frontend and router nodes, but you may need to specify a larger
value on a backend hosting a complex application.
You may use an explicit process quota qualifier to specify a
value larger than that calculated through use of '/links' and
'/processes', but you may not specify a smaller value.
The '/links' and '/processes' qualifiers do not take the memory
requirements of transactions into account. If your application
passes a large amount of
data from client to server or vice-versa, you should include
this in your sizing calculations.
o 14-3-174 $ENQ ACCVIOs with bad channel message
Calls to the V2 API specifying an invalid channel identifier
would cause an access violation in LIBRTR.
o 14-3-195 $START_TX when ACP died
The emulation of the RTR V2 API on V3 has been improved to
correctly reflect V2 behaviour for channels which were
idle at the time of ACP failure. Subsequent calls on such
channels now fail immediately with the status RTR$_NOACP.
o 14-3-196 $START_TX from AST w/o ACP
An application calling $START_TX at AST level when the ACP had
died would crash inside LIBRTR.
This has been corrected, and SYS$START_TX now simply returns to
the caller a message indicating that the ACP is not available.
o 14-3-197 ACPNOTVIA error returned if RTR command $DCL_TX_PRC
issued
On RTR V3.1D (194-SWX01) ECO7-FT1 if the RTR command
$DCL_TX_PRC is issued for a non-existent facility, an
ACPNOTVIA error is returned. This does not happen the first
time - only subsequent times if RTR is stopped in between.
API verbs called from the RTR command line interpreter would
fail with the status ACPNOTVIA if RTR was stopped and restarted
without restarting the command server. The problem can be avoided
on earlier versions of RTR by issuing the command 'disconnect
server' after stopping RTR.
o 14-3-203 ACP router crash when other nodes shut
Configurations where more than 100 frontends were connected to
any particular router could experience an ACP failure whilst managing
quorum loss.
Automatic router failback has been restored for RTR V2 frontends
connecting to RTR V3 routers.
o 14-3-205 Inconsistent TR TX timeout if no link to FE
Using previous versions of RTR, if a router lost a connection to
a frontend that had a transaction active in enqueuing state, then
the router would abort the transaction after a period of about one
minute if the frontend link was not re-established. This happened
even if the client had specified a transaction timeout much less
than this when starting the transaction.
This is now fixed: if the router loses its connection to the
frontend, a transaction in enqueuing state on the router is
aborted after the interval specified by the client (if it is less
than one minute).
o 14-3-210 START RTR qualifiers from V2
Attempts to use obsolete V2 qualifiers to the 'start rtr' command
cause a warning to be issued. Qualifiers affected are
'partitions', 'cache_pages', and 'relations'. Warnings are
also generated if an OpenVMS qualifier is used on a non-OpenVMS
platform.
o 14-3-211 Long facility names were truncated in SHOW FACILITY
output
Facility names near or at the maximum now push the next column
to the right instead of being truncated.
Slightly shorter names are still preferred to prevent the SHOW
FACILITY columns becoming ragged.
o 14-3-213 Fac name can be 31 chars?
Although the %RTR-E-FACNAMLON message states the facility name
can only have 30 characters, it can take up to 31.
The documented maximum length of a facility name string is 30
characters. Prior versions of RTR permitted facility names as long
as 31 characters.
o 14-3-217 Unthreaded UNIX applications using rtr_set_wakeup can
fail, e.g., in malloc
When an unthreaded UNIX RTR application calls rtr_set_wakeup,
the non-reentrant RTR shared library -lrtr with which it is linked
installs a signal handler. This signal handler called functions
internal to RTR which could occasionally call runtime library
functions such as malloc() that are not async-safe, according to the
relevant standards. See man (4) signal.
In practice this may appear to work most of the time, but break
for no apparent reason when the signal happens to occur while
background code is also in a runtime library call such as malloc.
The problem in RTR has been corrected. The small penalty for this
is that RTR no longer makes any attempt to try to ensure that
messages available are not just housekeeping. Applications must
always be prepared for a timeout return status on calling
rtr_receive_message with a zero timeout, even after a wakeup
suggests that a message ought to be available.
Application writers are reminded that their RTR wakeup handlers
are subject to the same restrictions: routines like printf, malloc,
and the entire RTR API may not be used directly or indirectly from
within a signal handler. A workaround for applications with unsafe
wakeup handlers can be to link with the reentrant version of the
library -lrtr_r because different rules apply for wakeups in a
thread: applications should not call anything that is not
thread-safe, or anything that might block indefinitely, such as
rtr_send_to_server, rtr_reply_to_client, rtr_broadcast_event, or
rtr_receive_message with a non-zero timeout.
o 14-3-218 Microsoft Visual C compiler options /Gz (stdcall) and
/Gr (fastcall) supported
The RTR API functions are now declared with the __cdecl
attribute so they can be used in applications compiled with
calling conventions other than the /Gd (cdecl) default.
o 14-3-250 Flow control has -ve credit
Applications with multiple channels engaged on more than one
facility could experience flow control difficulties causing
indefinite delays in transaction completion. This has
been corrected.
o 14-3-253 Restrictions on the RTR wakeup handler
The use of rtr_reply_to_client, rtr_send_to_server, or
rtr_broadcast_event in an RTR wakeup handler is not recommended.
They may block when they need transaction ids or flow control.
This will cause undesired behavior.
Functions permitted in an rtr_set_wakeup() handler:
In an RTR wakeup handler in an AST in an unthreaded OpenVMS
application, the use of rtr_reply_to_client(),
rtr_send_to_server(), rtr_broadcast_event(),
or rtr_receive_message() with a non-zero timeout is not
recommended. They may block when they need transaction ids or
flow control, which will cause the whole application to hang
until the wakeup completes.
In an RTR wakeup handler in a threaded application the same
rules apply. Note that wakeups are unnecessary in a threaded
paradigm, but they may be used in common code in applications
that also need to run on OpenVMS. Please note that your mainline
code continues to run while your wakeup is executing, so extra
synchronization may be required. Also note that if the wakeup
does block then it does not generally hang the whole application.
In an RTR wakeup handler in a signal in an unthreaded UNIX
application, no RTR API functions and only the very few async-safe
system and library functions may be called, because the wakeup
is performed in a signal handler context. An application can write
to a pipe or access a volatile sig_atomic_t variable, but using
malloc() and printf(), for example, will cause unexpected failures.
Alternatively, on most UNIX platforms, you can compile and link the
application as a threaded application with the reentrant RTR
shared library -lrtr_r.
For maximum portability the wakeup handler should do the minimum
necessary to wake up the mainline event loop. You should assume that
mainline code and other threads might continue to run in parallel
with the wakeup, especially on machines with more than one CPU.
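The pattern recommended above, in which the wakeup does nothing
except wake the mainline loop, can be sketched as follows. This is a
minimal illustration for a POSIX system, not RTR code: the queue
stands in for RTR message delivery, and wakeup_handler,
receive_zero_timeout and mainline_once are hypothetical names. In a
real application the handler would be registered with
rtr_set_wakeup and the receive would be rtr_receive_message with a
zero timeout.

```python
import os
import queue
import select

# Stand-in for messages waiting in RTR. Hypothetical, not the RTR API.
pending = queue.Queue()

# Pipe used only to wake the mainline loop.
wake_r, wake_w = os.pipe()
os.set_blocking(wake_w, False)   # the handler must never block


def wakeup_handler():
    """Do the minimum: poke the mainline loop. No printing, no RTR calls."""
    try:
        os.write(wake_w, b"\0")
    except BlockingIOError:
        pass  # pipe is full, so a wakeup is already pending


def receive_zero_timeout():
    """Stand-in for a zero-timeout receive: a message, or None on timeout."""
    try:
        return pending.get_nowait()
    except queue.Empty:
        return None


def mainline_once(wait_seconds):
    """Wait for a wakeup, then drain whatever messages are available."""
    ready, _, _ = select.select([wake_r], [], [], wait_seconds)
    if not ready:
        return []
    os.read(wake_r, 4096)        # discard the accumulated wakeup bytes
    received = []
    while True:
        msg = receive_zero_timeout()
        if msg is None:          # a timeout is normal, even after a wakeup
            break
        received.append(msg)
    return received
```

Note that the mainline treats a wakeup with no message behind it as
an ordinary event, which matches the advice that applications must
always be prepared for a timeout status after a wakeup.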
o 14-3-255 Multiple broadcast or data received on wrong channel
When running W95/NT with Pathworks installed, RTR would not
detect that the client had closed its channel when the client
application was aborted by closing the window. RTR now
detects when the client has aborted the channel and closes the
channel.
o 14-3-258 Stop inquorate standby from going active
When there is a network segmentation in an active/standby
configuration, the segment in the minority would become active.
This behavior resulted in two active servers for the same
partition. RTR now puts the inquorate or minority server in
wt_quorum state and the majority server in active state.
o 14-8-185
14-3-259 Slow or hanging applications using large messages
and discarded broadcasts
The threshold for flow control has been increased from 100000
to 1000000, and can also now be changed by defining the environment
variable RTR_MAX_CHANNEL_WAITQ_BYTES to e.g. 10000000 when starting
the ACP.
When too many data bytes are queued to be sent to a destination
process then the flow control feature is activated. The sender
application may then be forced to wait for a while in the next
api call that sends data, or broadcasts may be discarded, until
the queue reduces and flow control credit is freely granted again.
You should increase this parameter if your application sends
large broadcasts or sends or replies with large amounts of data
per transaction.
Because broadcasts are subject to discarding you should not use
them to send large amounts of data reliably. You may wish to
consider using a sequence number and providing a read-only
transaction in your application to detect and request
re-transmission of any discarded broadcast data.
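The sequence-number suggestion can be sketched as follows. This
assumes that the sending application numbers its broadcasts
consecutively; the class name is hypothetical, and the read-only
transaction that would request re-transmission is not shown.

```python
class BroadcastGapDetector:
    """Track broadcast sequence numbers and report any that were skipped."""

    def __init__(self):
        self.next_expected = None   # unknown until the first broadcast arrives

    def observe(self, seq):
        """Return the sequence numbers missing before this broadcast."""
        if self.next_expected is None:
            missing = []            # first broadcast seen: nothing to compare
        elif seq < self.next_expected:
            return []               # late or duplicate broadcast: ignore
        else:
            missing = list(range(self.next_expected, seq))
        self.next_expected = seq + 1
        return missing


detector = BroadcastGapDetector()
for seq in (1, 2, 5):
    lost = detector.observe(seq)
    if lost:
        # Here the application would start a read-only transaction
        # asking the sender for the broadcasts listed in 'lost'.
        print("request re-transmission of", lost)
```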
There is also a hard limit parameter with the default 100000000
(10^8). The channel will be closed immediately if the send wait
queue exceeds this.
These tunable flow control parameters are provisional and subject
to change.
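The two limits described above can be pictured with the following
toy model. It is not RTR's implementation. The function names and
the exact comparison (strictly greater than) are assumptions, and
only the environment variable named in this note is read.

```python
import os

DEFAULT_THRESHOLD = 1_000_000      # default flow control threshold (this note)
DEFAULT_HARD_LIMIT = 100_000_000   # default hard limit, 10^8 (this note)


def flow_control_threshold(environ=os.environ):
    """Use RTR_MAX_CHANNEL_WAITQ_BYTES if it is defined, else the default."""
    return int(environ.get("RTR_MAX_CHANNEL_WAITQ_BYTES", DEFAULT_THRESHOLD))


def channel_action(queued_bytes, environ=os.environ,
                   hard_limit=DEFAULT_HARD_LIMIT):
    """Classify a send wait queue of the given size (toy model only)."""
    if queued_bytes > hard_limit:
        return "close_channel"     # hard limit exceeded: channel is closed
    if queued_bytes > flow_control_threshold(environ):
        return "flow_control"      # sender waits, or broadcasts are discarded
    return "send"


# With the threshold raised to 10000000, a 2 MB queue no longer
# triggers flow control in this model.
print(channel_action(2_000_000, {"RTR_MAX_CHANNEL_WAITQ_BYTES": "10000000"}))
```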
o 14-3-265 Successive ACP crashes
Reception of a corrupt network message could cause a failed
assertion and demise of the RTR ACP process. The behavior has
been changed to yield a log file entry (BADNETMSG), followed
by a reset of the link concerned. If such log file entries
persist for a particular pair of nodes, it may mean that a
network problem exists, and you should consider checking the
network hardware for correct operation.
The RTR KNL subsystem log entry has also been improved to better
identify the link on which it reports errors.
o 14-3-272 RTR no longer disables the TCP/IP Nagle algorithm with
TCP_NODELAY
The TCP_NODELAY option which disables the Nagle algorithm was
previously enabled on all RTR platforms except Solaris. This
change improves network throughput under load. Response time
may be slightly longer under some conditions.
The option can be activated by defining the environment variable
RTR_TCP_NODELAY. This restores the old behavior on most platforms.
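For readers unfamiliar with the option, the following generic
sockets sketch shows what setting TCP_NODELAY means. It is not RTR
code. The function name is hypothetical, and the sketch simply
assumes that the presence of the variable is what enables the
option.

```python
import os
import socket


def make_tcp_socket(environ=os.environ):
    """Create a TCP socket; disable Nagle only if RTR_TCP_NODELAY is defined."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if "RTR_TCP_NODELAY" in environ:
        # Send small segments immediately instead of coalescing them.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_tcp_socket({"RTR_TCP_NODELAY": "1"})
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```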
o 14-3-274 Opening channel for non-existent facility causes crash
An uninitialized variable caused this crash which was seen in
recent Field Test kits (214) and (218) for AIX and Solaris. This
has been corrected.
o 14-3-275 aio not available makes RTR fail with unresolved errors
for kaio_rdrw etc.
RTR for AIX exploits Asynchronous I/O for increased journal
performance. By default, aio is only `defined', i.e., disabled,
instead of `available'. Aio can be configured with the system
management tool: # smit aio.
The RTR installation procedure post_i script now makes aio
available, and ensures that aio will also be available after
a restart.
o 14-3-276 SHOW TRANSACTION on FE after current TR trimmed and
before FE reconnected causes ACP dump
Executing the command RTR SHOW TRANS on a frontend immediately
after trimming the current router from the facility could
infrequently cause the ACP to crash.
o 14-8-173
14-3-282 Dual ported TCP router not establishing facility links
Problems can arise if nodes in your configuration have multiple
network adapters and the IP name server is not configured to
return all the configured IP addresses for such nodes. This
results in such nodes replying to connection requests with an
ID that is different to that determined by the initiator of the
connection. This can result in refused connections, or in only the
first connecting facility gaining a current router.
This version of RTR has been changed to operate correctly in
such a partially configured environment.
o 14-3-283 Image identification shows previous rtr version
Some versions of the OpenVMS/VAX kit for RTR V3.1D ECO10 were
shipped with a shared library showing an incorrect version ID.
o 14-3-285 OpenVMS process quotas artificially constrained
Prior versions of RTR would limit the maximum values that
could be specified for the ACP process quotas to 64K. This
restriction has been removed. Warning messages are generated
if the requested quotas conflict with the system wide WSMAX
parameter, or the remaining free page file space.
o 14-3-82 Requester hangs in SYS$COMMIT_TXW with RTR
V3.1D - null txn
If rtr_start_tx() was called by a client followed immediately
by rtr_accept_tx(), then the application would hang
(unless rtr_start_tx() was called with a timeout). The status
returned in the rtr_mt_accepted data in such cases is
RTR_STS_SYNCHCOMM (transaction committed synchronously).
This also corrects the equivalent problem with the RTR V2 API. Also,
the status returned in the TXSB for such transactions using the
V2 API is RTR$_SYNCHCOMM.
o 14-8-128 The RTR ACP now uses an asynchronous method when closing
its links
This version of RTR will defer the deassigning of its network
channels during the closesocket routine. This change allows RTR
to handle other requests while the channels are being run-down.
RTR no longer appears to pause while a network link is being
deassigned.
o 14-8-131 Failure to come up in remember mode
The non-availability of a remote journal at shadow recovery
time will cause a partition that was previously processing in
remember mode to resume processing in that mode. Prior versions
of RTR would set the partition state to 'shadow-recovery-fail',
and transactions could not be processed until the configuration
was manually corrected.
o 14-8-144 When the ASYNC cable was disconnected from the client,
an RTR dump was generated on the client.
Disconnecting a cable that was being used by an asynchronous
DECnet link to a remote machine could cause an ACP failure
when the transport marked the sockets as invalid. RTR has been
changed to handle this error by temporarily suspending all
network activity on the affected node. Network activity will
resume as soon as the network is found to be usable again.
o 14-8-154 RTR Router Crash
Router ACPs configured to accept anonymous clients could fail
when handling a network link loss event.
o 14-8-162 Hanging servers
Transaction recovery as a result of server failover could
result in server applications getting hung in 'local recovery'
state if it also happened that more than 10 client channels
had simultaneously caused new transactions to be presented to
the backend node. This has been fixed both by increasing the
limit to 50 and by adding a check to make sure that recovery is
complete before enforcing the limit, which is designed to keep
a backend node from getting overwhelmed when transactions are
coming in at a rate faster than it can handle.
o 14-8-175 RTRACP core dump while idle
When a Frontend node is trimmed from a Frontend/Router
Facility Definition, the Facility Descriptor block is not fully
deleted. This causes a core dump when RTR attempts to verify the
facility after a network link loss. RTR now properly deletes
the Facility Descriptor Block associated with the trimmed frontend.
o 14-8-181 RTRACP Crashes
It was possible that RTR on a frontend could select a router as
its current router immediately after that router had been trimmed
from the facility. This could potentially leave the frontend in a
'connecting' state.
The following restrictions apply to this kit:
o 14-1-285: A temporary inconsistency in shadow server state
can occur during initial facility startup of a
shadowed configuration. A shadow server can erroneously
remain in state "sec_act" until the rest of the facility
has been started.
o 14-3-67: An application's wakeup routine may be called more
often than necessary.
o 14-1-544: This version of RTR does not support a mixture of VAX
and Alpha nodes in the same cluster if both are configured
as Backends. This compatibility issue will be addressed
in RTR V3.2.
Problems Addressed in the RTRVVME010D Kit:
This kit (ECO-10) contains the following corrections to RTR V3.1D (210)
ECO8 (ECO9 was not released on OpenVMS):
o 14-1-436,14-3-225 Additional problem with aborted transactions
This bug was previously addressed in ECO7, but a further side effect
of the original change has been discovered and fixed. The problem
had to do with a potential ACP crash if a journal flush operation
was attempted after an aborted transaction.
o 14-1-496 Calling RTR_SET_INFO would get hung if ACP is not running
Calling rtr_set_info() or running the RTR SET TRAN command would
hang if the ACP was not running. This problem has been corrected.
o 14-1-497 Monitor performance degrades as cube of row count
MONITOR commands can now handle hundreds of rows efficiently.
o 14-3-215 Rows dropped in monitor display with large # of rows
MONITOR can now display more than 100 rows subject to /ROWS in the
monitor file.
RTR can now display more than 1000 rows, provided the values for
the /ROWS qualifiers in the relevant *.mon file are edited.
This is most easily verified by redirecting the output to a file or
pipe. If the output goes to a terminal, then you can use the SCROLL
commands which are bound to various numeric keypad keys to scroll
all except the last line of monitor output.
o 14-3-221 RTR crash during server reject plus network problems
Under unusual conditions (for example, one server rejecting the TX
and another accepting the same TX, or a TX being aborted due to
resource problems -- all at the same time that network fluctuations
are occurring) it was possible that RTR would find an inconsistent
TX state while recovering a TX from JNL. This resulted in RTR
crashing and has now been fixed.
o 14-3-224 Crash in 1-node configuration due to length mismatch
RTR was aborting if it detected a length mismatch in the message
passed to it. This has now been fixed.
o 14-3-226,14-3-162 ACP crash after facility deletion
After a facility is deleted, it is possible for the RTRACP to
receive a message from an application that references the deleted
facility. The verification that the facility had been deleted
failed on rare occasions causing the RTRACP to abort. This has now
been fixed.
o 14-3-229 Logging of journal record deletion errors
If there is an error deleting records from the RTR journal, then an
error is logged. Previously, RTR would silently continue.
o 14-3-239 Virtual address space full error
RTR tries to extend the virtual address space of the ACP if there is
not enough space to allocate data structures when a client or server
application is started. If the ACP failed to do this, it would
crash. This has now been fixed. Any such failure will simply
prevent the new application from starting, rather than crashing the
ACP.
o 14-3-241 Application crash trying to send large messages to looping ACP
Several changes were made to combat this combination of application
crash and rapidly expanding ACP heap:
Flow control is now granted only to the channel and facility
that requested it. A problem was discovered and corrected
whereby a grant of flow control credit could allow unrelated
channels to send too. This is believed to be the prime cause
of the symptoms reported.
An application that is unable to send to the ACP due to
resource shortage, for example if the ACP is alive but no
longer receiving for whatever reason, now keeps trying
indefinitely, and will now appear to hang rather than crash.
The TCP_NODELAY option which disables the Nagle algorithm is no
longer enabled on any RTR platform. This will improve
throughput under load, although there may be a slight impact on
response time under certain conditions.
o 14-3-260 Superfluous network traffic for nonexistent channels
Whenever a channel opens or closes, RTR sends an update message to
the router so that it can modify its broadcast routing information,
if necessary. In previous versions of RTR such messages were sent
even if no channel existed for the facility. In cases where
machines with the Frontend role had a large number of facilities
defined, this could result in significant network traffic that would
be quite noticeable over slow links, such as asynchronous
connections over telephone wire. RTR no longer sends these messages
unless a channel exists on the facility.
o 14-3-262,14-3-214 Hash table algorithm bug
The algorithm for accessing certain RTR data stored using a hash
table was found to be inefficient and could sometimes fail to find
data elements correctly. This bug has primarily affected access to
Transaction IDs and may have caused excessive CPU usage during data
retrieval or failure to find certain data elements. This bug has
been corrected.
o 14-3-266 Broadcast message corruption
Reception of an illegal or unrecognizable broadcast now results in a
log file entry (BMHDRVSN) rather than the demise of the ACP process.
If such entries persist you may wish to consider checking the
network for correct operation.
o 14-8-147,14-8-162,14-8-164 Servers hanging during failover recovery
Transaction recovery as a result of server failover could result in
server applications getting hung in 'local recovery' state if it
also happened that more than 10 client channels had simultaneously
caused new transactions to be presented to the backend node. This
has been fixed both by increasing the limit to 50 and by adding a
check to make sure that recovery is complete before enforcing the
limit, which is designed to keep a backend node from getting
overwhelmed when transactions are coming in at a rate faster than it
can handle.
o 14-8-152 Multiple broadcast or data received on wrong channel
When running W95/NT and having PATHWORKS installed, RTR would not
detect that the client had closed its channel when the client
application was aborted by closing down the window. RTR now detects
when the client has aborted the channel and closes the channel.
o 14-8-163 Corrupt network message caused RTR crash
Reception of a corrupt network message would hitherto result in a
failed assertion and the demise of the RTR ACP process. The
behavior has been changed to yield a log file entry (BADNETMSG),
followed by a reset of the link concerned. If such log file entries
persist for a particular pair of nodes, it may mean that a network
problem exists, and you should consider checking the network
hardware for correct operation.
The RTR log entry has also been improved to be better able to
identify the link on which it reports errors.
o 14-8-167 Null bytes display in SHOW PARTITION output
The display of null bytes in the upper and lower key bounds has been
suppressed if the bytes appear at the end of a key of type string.
The following restrictions apply to this kit:
o 14-1-285: A temporary inconsistency in shadow server state can
occur during initial facility startup of a shadowed configuration.
A shadow server can erroneously remain in state "sec_act" until the
rest of the facility has been started.
o 14-3-67: An application's wakeup routine may be called more often
than necessary.
o 14-1-544: This version of RTR does not support a mixture of VAX and
Alpha nodes in the same cluster if both are configured as Backends.
This compatibility issue will be addressed in RTR V3.2.
Problems Addressed in the RTRVVME08D Kit:
o 14-8-144 RTR crash when ASYNC cable disconnected
Disconnecting a cable that was being used by an asynchronous
DECnet link to a remote machine could result in an ACP failure
when the transport marked the sockets as invalid. RTR has been
changed to handle this error by temporarily suspending all
network activity on the affected node. Network activity will
resume as soon as the network is found to be usable again.
o 14-8-154 Router crash when link to Frontend disconnected
Router ACPs configured to accept anonymous clients could under
certain circumstances fail when handling a network link loss event.
This has been corrected.
o 14-8-155 New environment variables for adjusting connection timeout
parameter
Two new environment variables have been created to give operators
greater discretion in determining how long to wait before retrying
a network connection attempt.
The RTR_TIMEOUT_CONNECT variable controls how long a connecting
node will wait for a response from the connectee to its link
initiation request. This value defaults to 60 seconds.
If the RTR_TIMEOUT_CONNECT period expires without a response from
the connectee, RTR will wait an additional period determined by
the RTR_TIMEOUT_CONNECT_RELAX variable. This variable defaults
to a value of 90 seconds. The purpose of the "relax" period is
to allow the connector to accept a connection request from the
connectee node, if one is forthcoming. It is important not to
set this value too low on Backends and Routers, as such machines
are likely to be receiving connection requests from many other
machines. On machines configured to use only the Frontend role,
however, you can safely set RTR_TIMEOUT_CONNECT_RELAX to just a
few seconds so that the node can be free to attempt to connect to
another router as quickly as possible.
The minimum value for RTR_TIMEOUT_CONNECT is 5 seconds and the
minimum for RTR_TIMEOUT_CONNECT_RELAX is 1 second.
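The interaction of the two variables described under 14-8-155 can be
sketched in Python as follows. This is an illustration only, not RTR
code. The defaults and minimums are taken from the text above; the
function names are hypothetical, and the handling of out-of-range or
non-numeric values (raising to the minimum, falling back to the
default) is an assumption, since this summary does not say what RTR
does in those cases.

```python
# Defaults and minimums, in seconds, as stated in the 14-8-155 entry.
DEFAULTS = {"RTR_TIMEOUT_CONNECT": 60, "RTR_TIMEOUT_CONNECT_RELAX": 90}
MINIMUMS = {"RTR_TIMEOUT_CONNECT": 5, "RTR_TIMEOUT_CONNECT_RELAX": 1}


def effective_timeout(name, env):
    """Return the timeout in seconds for one variable.

    Assumption: a missing or non-numeric value falls back to the
    default, and a value below the minimum is raised to the minimum.
    """
    raw = env.get(name)
    try:
        value = int(raw) if raw is not None else DEFAULTS[name]
    except ValueError:
        value = DEFAULTS[name]
    return max(value, MINIMUMS[name])


def retry_delay(env):
    """Return (connect, relax, total) in seconds.

    The total is the longest a connecting node waits on one attempt:
    the response timeout followed by the "relax" period.
    """
    connect = effective_timeout("RTR_TIMEOUT_CONNECT", env)
    relax = effective_timeout("RTR_TIMEOUT_CONNECT_RELAX", env)
    return connect, relax, connect + relax


if __name__ == "__main__":
    # A Frontend-only node with a short relax period.
    print(retry_delay({"RTR_TIMEOUT_CONNECT_RELAX": "3"}))
```

With the defaults, a connecting node waits up to 150 seconds in total
on one attempt; a Frontend-only node that sets the relax period to a
few seconds shortens this considerably.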
The following restrictions apply to this kit:
o 14-1-285: A temporary inconsistency in shadow server state can occur
during initial facility startup of a shadowed configuration. A
shadow server can erroneously remain in state "sec_act" until the
rest of the facility has been started.
o 14-3-67: An application's wakeup routine may be called more often
than necessary.
INSTALLATION NOTES:
The Reliable Transaction Router installation procedure uses the
POLYCENTER Software Installation Utility (PCSI). For details on
using PCSI, refer to the OpenVMS System Manager's Manual, Section
"Installing with the POLYCENTER Software Installation Utility".
The logical name PCSI$SOURCE is used to define the location of the
software kits you want to install. For example, if the Reliable
Transaction Router software is located in DISK1:[KITS], enter the
following at the DCL prompt (or include the line in the system
manager's login command file):
$ DEFINE PCSI$SOURCE DISK1:[KITS]
When running the installation procedure for Reliable Transaction
Router, you can choose whether to install the ODBC Over RTR Oracle7
Server. This is an RTR server used for supporting ODBC-enabled
applications on Windows. You should not install the ODBC Over RTR
Oracle7 Server unless you already have Oracle7 installed.
To start the installation, type the command:-
$ PRODUCT INSTALL RTR
You will see a display similar to the following:-
The following product has been selected:
DEC VAXVMS RTR V3.1-D231 [Available]
Do you want to continue? [YES]
Press Return. You may safely accept the installation default
options.
This patch can be found at any of these sites:
Colorado Site
Georgia Site
Files on this server are as follows:
dec-vaxvms-rtr-v0301-d231-1.README
dec-vaxvms-rtr-v0301-d231-1.CHKSUM
dec-vaxvms-rtr-v0301-d231-1.CVRLET_TXT
dec-vaxvms-rtr-v0301-d231-1.exe