TITLE: RTR V3.1D RTRI231 RTR for Windows NT / 95 3.1D Intel ECO Summary
Modification Date: 07-MAY-99
Modification Type: New Kit
Copyright (c) Compaq Computer Corporation 1999. All rights reserved.
PRODUCT: Reliable Transaction Router (RTR) V3.1D
OP/SYS: Windows NT and Windows 95
SOURCE: Compaq Computer Corporation
ECO INFORMATION:
ECO Kit Name: RTRI231
ECO Kits Superseded by This ECO Kit: None
ECO Kit Approximate Size: 3959 Blocks
Kit Applies To: RTR V3.1D
System/Cluster Reboot Necessary: Unknown
Rolling Re-boot Supported: Information Not Available
Installation Rating: INSTALL_UNKNOWN
Kit Dependencies:
The following remedial kit(s) must be installed BEFORE
installation of this kit:
None
In order to receive all the corrections listed in this
kit, the following remedial kits should also be installed:
None
ECO KIT SUMMARY:
An ECO kit exists for Reliable Transaction Router (RTR) V3.1D on
Windows NT V4.0. This kit addresses the following problems:
Problems Addressed in the RTRI231 Kit (ECO-14):
o 14-1-239 WSAEventSelect 10055 knl_env line 1339
RTR was not handling some errors that could be returned when an
asynchronous DECnet line shut down. This has been corrected.
o 14-3-118 TX on pri_act not being played on sec_act
If RTR is configured with servers on backends that are running
DECnet Phase V, then under certain conditions, local recovery from
the remote node's journal would not be performed. For example, local
and shadow recovery would appear to work correctly in a shadow server
configuration after the primary shadow went down, but in fact any
transactions in the remote node's journal would not be recovered.
This can only occur if the backends are all using DECnet Phase V as the
primary RTR transport, and if the DECnet addresses of the nodes
concerned match a particular pattern. Note that this is a static DECnet
configuration issue. If recovery works in your particular configuration,
then it will always work so long as the DECnet network configuration is
not changed. This has now been fixed.
o 14-3-124 Bug in V2 load balance
In earlier versions of RTR it was occasionally possible for
a router to become permanently incapable of accepting new
incoming frontend connections if router load-balancing had
been enabled by specifying the /BALANCE qualifier, and a frontend
happened to connect to the router during a quorum state transition.
This has been corrected.
o 14-3-131 $DCL_TX_PRC crash when no privs
Running a V2 application from an account that does not have RTR info
privilege no longer causes the application to crash.
o 14-3-132 Extend journal failure
Previous versions of RTR could suffer a crash of the RTR ACP process
if the RTR journal had been created with /MAXIMUM_BLOCKS greater than
/BLOCKS and RTR attempted to extend the journal beyond its initial size.
This has now been corrected.
o 14-3-134 More Broadcast message counters required
Various counters connected with the delivery of broadcast events have
been added: Facility counters fdb_cn_bm_transit_brd_lost and
fdb_cn_bm_transit_brd_delivered, link counters ndb_cn_bm_transit_lost
and ndb_cn_bm_transit_delivered, and process counters bm_brd_lost and
bm_brd_delivered.
o 14-3-140 RTR V3 $DCL_TX_PRC does not complete correctly when no TXSB is
supplied
Using the V2 API verbs on V3 would not always generate the same
result as V2 when using undeclared or invalid channels. This
has been corrected.
o 14-3-150 RTR applications hang on trying to continue after ACP restarted
If the application tried to open a channel again after seeing
the status RTR_STS_ACPNOTVIA, it could hang on the subsequent
rtr_receive_message call. This affected applications on threaded
RTR platforms, especially Win32 and AIX.
The problem has been corrected for threaded UNIX platforms. It is
no longer necessary to restart an RTR application on UNIX after
restarting RTR.
o 14-3-151 'signed' identifier not in VAX C
For previous versions of RTR, compiling RTR applications with
the VAX C compiler generated compiler errors, since the VAX C
compiler does not recognize the 'signed' keyword used in RTR.H.
This has now been fixed. The signed keyword is no longer defined
in RTR.H if compiling with the VAX C compiler.
o 14-3-161 MONITOR CALLS/ID=n where 'n' is not a valid id - monitors all ids
Use of the monitor command with any of the qualifiers /link,
/process, /facility or /partition would generate an empty display
if the requested entity did not exist. This was unlike V2
behavior, and was considered by some to be misleading. V2
behavior has been restored.
o 14-3-165 Shadow servers experience deadlock using 2 partitions
If two shadowed server partitions were set up with primary
and secondary roles transposed on the two backends involved,
and the servers for these partitions accessed the same database
rows, it was occasionally possible for a distributed deadlock to
occur in which servers on each site waited forever for locks
held by the primary server for the other partition to be released.
This has now been corrected.
o 14-3-167 Corruption in large monitor pictures
Spurious and missing characters were seen in larger monitor pictures
displayed to a terminal window or terminal. The corruption was particularly
dramatic if a terminal escape sequence was affected.
The corruption appeared to occur when more than 8k of data was buffered
without a newline or explicit flush, e.g. when monitoring with a recent
kit in which RTR terminal output was changed to line-buffered for efficiency.
The output always appeared to be correct when using MONITOR /OUTPUT, when
monitoring remotely from an RTR platform other than VAX, or when more than
BUFSIZ (64k on OpenVMS Alpha) of output was buffered.
Investigation so far indicates that RTR was buffering the output correctly,
and that there appears to be an error in the OpenVMS VAX runtime
libraries. RTR now flushes output more frequently so as not to provoke
this problem.
o 14-3-168 How to calculate required quotas?
There is no easy way to calculate the required UAF and SYSGEN parameters
needed by RTR.
The following information may be used to estimate virtual
memory size requirements of the RTR ACP process.
The base virtual memory requirement of an unconfigured RTR
ACP process is approximately 5.8 Mbytes. To this should be added
allowances for the following:
- for each link, add 202 kBytes
- for each facility, add 13 kBytes, plus 80 bytes for each
link in the facility
- for each client or server application process, add 190 kBytes
for the first channel
- for each additional application channel, add 1350 bytes
It is also necessary to make allowance for the number of active
transactions in the system. Unless your client applications are
programmed to initiate multiple concurrent transactions, this
number will not exceed the total number of client channels in the
system, but you should verify this with your application providers.
It is also necessary to determine the size of the transaction
messages in use.
For each frontend:
- add 1 kByte per active transaction
- add 250 bytes per message per transaction
- plus the size of all the messages
For transaction routers, allow about 1 kByte for each active
transaction.
For backends, allow:
- 1 kByte per active transaction
- 50 bytes for each message of a transaction.
- plus the size of all replies
The total of all contributions detailed above will yield an estimate
of the likely virtual memory requirements of the ACP. Apply a large
factor for safety - it is better to grant RTR resource limits
exceeding its real requirements than to risk a loss of service in
production as a result of insufficient resource allocation. Divide
the result by the virtual memory page size to obtain the
requirement in pages.
You should set process memory and page file quotas to accommodate
at least this much memory.
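As a purely illustrative worked example (every figure in the configuration
below is hypothetical), consider a backend with 10 links, 2 facilities of
10 links each, 5 application processes each using 20 additional channels,
and 100 active transactions each carrying 4 messages of about 2 kBytes:

    base ACP requirement                                 5800 kBytes
    10 links x 202 kBytes                                2020 kBytes
    2 facilities x (13 kBytes + 10 x 80 bytes)             28 kBytes
    5 processes x 190 kBytes (first channel)              950 kBytes
    5 x 20 additional channels x 1350 bytes               132 kBytes
    100 transactions x (1 kByte + 4 x 50 bytes
                        + 4 x 2 kBytes of replies)        920 kBytes
    ------------------------------------------------------------
    estimated total                                     ~9850 kBytes

i.e. a little under 10 Mbytes before the safety factor is applied.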
Process quotas for the ACP process are controlled by qualifiers to the
'start rtr' command. See the RTR System Manager's Manual for further
information.
For more control, you may individually set all process quotas for the
ACP by using the appropriate qualifier with the 'start rtr' command.
For a more holistic approach, 'start rtr' accepts '/links' and
'/processes' as qualifiers which can be used to specify the expected
number of links and application processes in the configuration (see
the example at the end of this item). The values supplied are used
to calculate reasonably safe minimum values for the following ACP
process quotas:
- astlm
- biolm
- fillm
- diolm
- pgflquota
The default value for '/links' is 512. This is high, but is chosen to
protect RTR routers against a failover scenario where the number of
frontends is large and the number of surviving routers becomes small.
The default value for '/processes' is 64. This is large for
frontend and router nodes, but you may need to specify a larger
value on a backend hosting a complex application.
You may use an explicit process quota qualifier to specify a value
larger than that calculated through use of '/links' and '/processes',
but you may not specify a smaller value.
Use of '/links' and '/processes' does not take the memory requirements
of transactions into account. If your application passes a large amount
of data from client to server or vice versa, you should include this in
your sizing calculations.
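For example (the values shown are purely illustrative), a backend expected
to carry a complex application might be started with:

    start rtr /links=600 /processes=100

issued at the RTR command prompt, adding explicit process quota qualifiers
only where values larger than the calculated minima are required.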
o 14-3-174 $ENQ ACCVIOs with bad channel message
Calls to the V2 API specifying an invalid channel identifier
would cause an access violation in LIBRTR. This has been corrected.
o 14-3-195 $START_TX when ACP died
The emulation of the RTR V2 API on V3 has been improved to correctly
reflect V2 behavior for channels which were idle at the time
of ACP failure. Subsequent calls on such channels now fail immediately
with the status RTR$_NOACP.
o 14-3-196 $START_TX from AST w/o ACP
An application calling $START_TX at AST level after the ACP had died
would crash inside LIBRTR.
This has been corrected; SYS$START_TX now simply returns to the
caller a message indicating that the ACP is not available.
o 14-3-197 ACPNOTVIA error returned if RTR command $DCL_TX_PRC issued
On RTR V3.1D (194-SWX01) ECO7-FT1 if the RTR command
$DCL_TX_PRC is issued for a non-existent facility, an
ACPNOTVIA error is returned. This does not happen the first
time - only subsequent times if RTR is stopped in between.
API verbs called from the RTR command line interpreter would
fail with the status ACPNOTVIA if RTR was stopped and restarted
without restarting the command server. This has been corrected. The
problem can be avoided on earlier versions of RTR by issuing the
command 'disconnect server' after stopping RTR.
o 14-3-203 ACP router crash when other nodes shut
Configurations where more than 100 frontends were connected to
any particular router could experience an ACP failure whilst managing
quorum loss. This has been corrected.
Automatic router failback has been restored for RTR V2 frontends
connecting to RTR V3 routers.
o 14-3-205 Inconsistent TR TX timeout if no link to FE
Using previous versions of RTR, if a router lost a connection to a
frontend that had a transaction active in enqueuing state, then the
router would abort the transaction after a period of about one minute
if the frontend link was not re-established. This occurred even if the
client had specified a transaction timeout much shorter than one minute
when starting the transaction.
This is now fixed: a transaction in enqueuing state on the router is
aborted after the interval specified by the client (if it is less than
one minute) when the router loses its connection to the frontend.
o 14-3-210 START RTR qualifiers from V2
Attempts to use obsolete V2 qualifiers to the 'start rtr' command
now cause a warning to be issued. Qualifiers affected are
'partitions', 'cache_pages', and 'relations'. Warnings are
also generated if an OpenVMS qualifier is used on a non-OpenVMS
platform.
o 14-3-211 Long facility names were truncated in SHOW FACILITY output
Facility names near or at the maximum now push the next column to
the right instead of being truncated.
Slightly shorter names are still preferred to prevent the SHOW
FACILITY columns becoming ragged.
o 14-3-213 Fac name can be 31 chars?
Although the %RTR-E-FACNAMLON message states the facility name
can only have 30 characters, it can take up to 31.
The documented maximum length of a facility name string is 30
characters. Prior versions of RTR permitted facility names as long
as 31 characters. This has been corrected.
o 14-3-217 Unthreaded UNIX applications using rtr_set_wakeup can fail,
e.g., in malloc
When an unthreaded UNIX RTR application calls rtr_set_wakeup,
the non-reentrant RTR shared library -lrtr with which it is linked
installs a signal handler. This signal handler called functions
internal to RTR which could occasionally call runtime library
functions such as malloc() that are not async-safe, according to the
relevant standards. See man (4) signal.
In practice this may appear to work most of the time, but break
for no apparent reason when the signal happens to occur while
background code is also in a runtime library call such as malloc.
The problem in RTR has been corrected. The small penalty for this is
that RTR no longer attempts to ensure that the messages available are
not just housekeeping. Applications must always be prepared for a
timeout return status on calling rtr_receive_message with a zero
timeout, even after a wakeup suggests that a message ought to be
available.
Application writers are reminded that their RTR wakeup handlers are
subject to the same restrictions: routines like printf, malloc, and
the entire RTR API may not be used directly or indirectly from
within a signal handler. A workaround for applications with unsafe
wakeup handlers can be to link with the reentrant version of the
library -lrtr_r because different rules apply for wakeups in a
thread: applications should not call anything that is not
thread-safe, or anything that might block indefinitely, such as
rtr_send_to_server, rtr_reply_to_client, rtr_broadcast_event, or
rtr_receive_message with a non-zero timeout.
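The following fragment is a purely illustrative sketch (not part of the
RTR kit; the names are invented) of the kind of wakeup routine that stays
async-safe in an unthreaded UNIX application linked with -lrtr. The routine
only records that a wakeup occurred; every RTR API call, including
rtr_receive_message with a zero timeout, is made from the mainline loop,
which must still be prepared for a timeout status as described above.

    #include <signal.h>

    /* Set by the wakeup routine, examined by the mainline event loop. */
    static volatile sig_atomic_t rtr_wakeup_pending = 0;

    /* Register this routine with rtr_set_wakeup(); see rtr.h for the
       exact prototype.  It must not call malloc(), printf(), or any
       RTR API function, because it runs in signal handler context. */
    static void example_wakeup_routine(void)
    {
        rtr_wakeup_pending = 1;
    }

    /* In the mainline loop (normal, non-signal context):
     *     if (rtr_wakeup_pending) {
     *         rtr_wakeup_pending = 0;
     *         ...poll RTR with rtr_receive_message() and a zero timeout,
     *            accepting that a timeout status may still be returned...
     *     }
     */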
o 14-3-218 Microsoft Visual C compiler options /Gz (stdcall) and
/Gr (fastcall) supported
The RTR API functions in RTR.H are now declared with the __cdecl
attribute so that they can be used in applications compiled with
calling conventions other than the /Gd (cdecl) default.
o 14-3-250 Flow control has -ve credit
Applications with multiple channels engaged on more than one
facility could experience flow control difficulties causing
indefinite delays in transaction completion. This has
been corrected.
o 14-3-253 Restrictions on the RTR wakeup handler
The use of rtr_reply_to_client, rtr_send_to_server, or
rtr_broadcast_event in an RTR wakeup handler is not recommended.
They may block when they need transaction ids or flow control.
This will cause undesired behavior.
Functions permitted in an rtr_set_wakeup() handler:
In an RTR wakeup handler in an AST in an unthreaded OpenVMS
application, the use of rtr_reply_to_client(),
rtr_send_to_server(), rtr_broadcast_event(),
or rtr_receive_message() with a non-zero timeout is not
recommended. They may block when they need transaction ids or
flow control, which will cause the whole application to hang
until the wakeup completes.
In an RTR wakeup handler in a threaded application the same
rules apply. Note that wakeups are unnecessary in a threaded
paradigm, but they may be used in common code in applications
that also need to run on OpenVMS. Please note that your mainline
code continues to run while your wakeup is executing, so extra
synchronization may be required. Also note that if the wakeup
does block then it does not generally hang the whole application.
In an RTR wakeup handler in a signal in an unthreaded UNIX
application, no RTR API functions and only the very few asynch-safe
system and library functions may be called, because the wakeup
is performed in a signal handler context. An application can write
to a pipe or access a volatile sig_atomic_t variable, but using
malloc() and printf(), for example, will cause unexpected failures.
Alternatively, on most UNIX platforms, you can compile and link the
application as a threaded application with the reentrant RTR
shared library -lrtr_r.
For maximum portability the wakeup handler should do the minimum
necessary to wake up the mainline event loop. You should assume that
mainline code and other threads might continue to run in parallel
with the wakeup, especially on machines with more than one CPU.
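As an illustration of the 'write to a pipe' technique mentioned above
(again a sketch only; the pipe handling shown is an assumption, not RTR
code), the wakeup routine below wakes a select()- or poll()-based mainline
loop without calling anything that is not async-signal-safe:

    #include <unistd.h>

    /* Created with pipe(wake_pipe) during application startup; the read
       end is included in the mainline select()/poll() descriptor set. */
    static int wake_pipe[2];

    static void example_wakeup_routine(void)
    {
        char token = 1;
        /* write() is async-signal-safe; a full pipe can be ignored
           because one pending token is enough to wake the mainline. */
        (void)write(wake_pipe[1], &token, 1);
    }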
o 14-3-255 Multiple broadcast or data received on wrong channel
When running on Windows 95/NT with PATHWORKS installed, RTR would not
detect that the client had closed its channel when the client
application was aborted by closing the window. RTR now
detects when the client has aborted the channel and closes the
channel.
o 14-3-258 Stop inquorate standby from going active
When there is a network segmentation in an active/standby
configuration, the segment in the minority would become active.
This behavior resulted in two active servers for the same
partition. RTR now puts the inquorate or minority server in
wt_quorum state and the majority server in active state.
o 14-8-185
o 14-3-259 Slow or hanging applications using large messages
and discarded broadcasts
The threshold for flow control has been increased from 100000 to 1000000,
and can also now be changed by defining the environment variable
RTR_MAX_CHANNEL_WAITQ_BYTES to e.g. 10000000 when starting the ACP.
When too many data bytes are queued to be sent to a destination process,
the flow control feature is activated. The sender application may then be
forced to wait for a while in the next API call that sends data, or
broadcasts may be discarded, until the queue reduces and flow control
credit is freely granted again.
You should increase this parameter if your application sends large
broadcasts or sends or replies with large amounts of data per transaction.
Because broadcasts are subject to discarding you should not use them to
send large amounts of data reliably. You may wish to consider using a
sequence number and providing a read-only transaction in your application
to detect and request re-transmission of any discarded broadcast data.
There is also a hard limit parameter with the default 100000000 (10^8).
The channel will be closed immediately if the send wait queue exceeds this.
These tunable flow control parameters are provisional and subject to change.
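For example (the value shown is illustrative only), on Windows NT the
environment variable described above could be defined in the command shell
from which RTR is subsequently started:

    set RTR_MAX_CHANNEL_WAITQ_BYTES=10000000
    rtr
    RTR> start rtr

This assumes, as is usual, that the ACP inherits the environment of the
shell from which it is started.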
o 14-3-265 Successive ACP crashes
Reception of a corrupt network message could cause a failed assertion
and demise of the RTR ACP process. The behavior has been changed to
yield a log file entry (BADNETMSG), followed by a reset of the link
concerned. If such log file entries persist for a particular pair of
nodes, it may mean that a network problem exists, and you should
consider checking the network hardware for correct operation.
The RTR KNL subsystem log entry has also been improved to better
identify the link on which it reports errors.
o 14-3-272 RTR no longer disables the TCP/IP Nagle algorithm with TCP_NODELAY
The TCP_NODELAY option, which disables the Nagle algorithm, was
previously enabled on all RTR platforms except Solaris. This
change improves network throughput under load. Response time
may be slightly longer under some conditions.
The option can be activated by defining the environment variable
RTR_TCP_NODELAY. This restores the old behavior on most platforms.
o 14-3-274 Opening channel for non-existent facility causes crash
An uninitialized variable caused this crash which was seen in recent
Field Test kits (214) and (218) for AIX and Solaris. This has been
corrected.
o 14-3-275 aio not available makes RTR fail with unresolved errors
for kaio_rdrw etc.
RTR for AIX exploits Asynchronous I/O for increased journal
performance. By default, aio is only `defined', i.e., disabled,
instead of `available'. Aio can be configured with the system
management tool: # smit aio.
The RTR installation procedure post_i script now makes aio
available, and ensures that aio will also be available after
a restart.
o 14-3-276 SHOW TRANSACTION on FE after current TR trimmed and before
FE reconnected causes ACP dump
Executing the command RTR SHOW TRANS on a frontend immediately after
trimming the current router from the facility could infrequently
cause the ACP to crash. This has been corrected.
o 14-8-173
o 14-3-282 Dual ported TCP router not establishing facility links
Problems can arise if nodes in your configuration have multiple network
adapters and the IP name server is not configured to return all the
configured IP addresses for such nodes. This results in such nodes
replying to connection requests with an ID that is different from
that determined by the initiator of the connection. This can result
in refused connections, or in only the first connecting facility
gaining a current router.
This version of RTR has been changed to operate correctly in such a
partially configured environment.
o 14-3-283 Image identification shows previous rtr version
Some versions of the OpenVMS/VAX kit for RTR V3.1D ECO10 were
shipped with a shared library showing an incorrect version ID.
This has now been corrected.
o 14-3-285 OpenVMS process quotas artificially constrained
Prior versions of RTR would limit the maximum values that
could be specified for the ACP process quotas to 64K. This
restriction has been removed. Warning messages are generated
if the requested quotas conflict with the system-wide WSMAX
parameter or the remaining free page file space.
o 14-3-82 Requester hangs in SYS$COMMIT_TXW with RTR V3.1D - null txn
If rtr_start_tx() was called by a client followed immediately
by rtr_accept_tx(), then the application would hang
(unless rtr_start_tx() was called with a timeout). This has
been corrected. The status returned in the rtr_mt_accepted data in
such cases is RTR_STS_SYNCHCOMM (transaction committed synchronously).
This also corrects the equivalent problem with the RTR V2 API. Also,
the status returned in the TXSB for such transactions using the
V2 API is RTR$_SYNCHCOMM.
o 14-8-128 The RTR ACP now uses an asynchronous method when closing its links
This version of RTR will defer the deassigning of its network channels
during the closesocket routine. This change allows RTR to handle
other requests while the channels are being run-down. RTR no longer appears
to pause while a network link is being deassigned.
o 14-8-131 Failure to come up in remember mode
If a remote journal is unavailable at shadow recovery time, a partition
that was previously processing in remember mode now resumes processing
in that mode. Prior versions of RTR would set the partition state to
'shadow-recovery-fail', and transactions could not be processed until
the configuration was manually corrected.
o 14-8-144 When the ASYNC cable was disconnected from the client, an RTR
dump was generated on the client
Disconnecting a cable that was being used by an asynchronous DECnet
link to a remote machine could cause an ACP failure when the
transport marked the sockets as invalid. RTR has been changed to
handle this error by temporarily suspending all network activity
on the affected node. Network activity will resume as soon as
the network is found to be usable again.
o 14-8-154 RTR Router Crash
Router ACPs configured to accept anonymous clients could fail
when handling a network link loss event. This has been corrected.
o 14-8-162 Hanging servers
Transaction recovery as a result of server failover could result
in server applications hanging in 'local recovery' state if
more than 10 client channels had simultaneously caused new
transactions to be presented to the backend node. This has
been fixed both by increasing the limit to 50 and by adding a check to
ensure that recovery is complete before enforcing the limit, which is
designed to keep a backend node from being overwhelmed when transactions
arrive faster than it can handle them.
o 14-8-175 RTRACP core dump while idle
When a frontend node was trimmed from a frontend/router
facility definition, the Facility Descriptor Block was not fully
deleted. This caused a core dump when RTR attempted to verify the
facility after a network link loss. RTR now properly deletes
the Facility Descriptor Block associated with the trimmed frontend.
o 14-8-181 RTRACP Crashes
It was possible for RTR on a frontend to select a router as its
current router immediately after that router had been trimmed from
the facility. This could potentially leave the frontend in a 'connecting'
state. This has now been corrected.
The following restrictions apply to this kit:
o 14-1-285: A temporary inconsistency in shadow server state can occur
during initial facility startup of a shadowed configuration. A
shadow server can erroneously remain in state "sec_act" until the
rest of the facility has been started.
o 14-3-67: An application's wakeup routine may be called more often
than necessary.
INSTALLATION NOTES:
Please refer to the Installation Guide supplied with previous versions.
Note that this kit runs only on Intel processors. The installation is
the same for both Windows NT and Windows 95.
From an empty directory, run the self-extracting RTRI231.EXE. Then run
SETUP.EXE to install RTRI231.
All trademarks are the property of their respective owners.
This patch can be found at any of these sites:
Colorado Site
Georgia Site
Files on this server are as follows:
rtri231.README
rtri231.CHKSUM
rtri231.CVRLET_TXT
rtri231.exe