Reliable Transaction Router
System Manager's Manual



2.9.3 Interoperation with RTR Version 2 Using DECnet

Reliable Transaction Router is interoperable with RTR Version 2.2D ECO3 or later when running on a platform that supports DECnet; that is, OpenVMS, Compaq Tru64 UNIX, SUN, Windows 95, or Windows NT.

Note that it is not possible to mix Version 2 and Version 3 routers and backends; all router and backend nodes in a facility must be either Version 2 or Version 3. Frontend nodes may be either Version 2 or Version 3.

Defining the facility:
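To force DECnet as the transport for particular nodes, node names in the facility definition are given a "dna." prefix. A minimal sketch follows; the facility and node names are hypothetical:

RTR CREATE FACILITY test/FRONTEND=dna.fe1/ROUTER=dna.rtr1/BACKEND=dna.be1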

The use of the "dna." prefix assumes that the default network transport is TCP/IP. The default network transport can be changed to DECnet by setting the environment variable RTR_PREF_PROT. On Windows 95 and Windows NT, you can use one of the following statements in your autoexec.bat.


set RTR_PREF_PROT=RTR_TCP_FIRST 
set RTR_PREF_PROT=RTR_TCP_ONLY 
set RTR_PREF_PROT=RTR_DNA_FIRST 
set RTR_PREF_PROT=RTR_DNA_ONLY 

These set the network transport to, respectively, TCP/IP with fallback to DECnet, TCP/IP only, DECnet with fallback to TCP/IP, or DECnet only.

For Reliable Transaction Router Version 3.2 for OpenVMS, refer to Section 2.10 for further information.

Troubleshooting network connections:

If the RTR V3 frontend fails to connect to the RTR V2 router node, you can make a basic check by executing a dlogin from the RTR V3 node to the OpenVMS router node. If this fails, consult your network manager. (For Compaq Tru64 UNIX machines, ensure that the DECnet library is installed as /usr/shlib/libdna.so.)
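For example, on a Compaq Tru64 UNIX frontend the following quick checks might be used (the router node name vmsrtr is hypothetical):

# ls -l /usr/shlib/libdna.so
# dlogin vmsrtr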

Note

For Reliable Transaction Router Version 3.2 for OpenVMS, the default network transport is DECnet.

2.10 Network Protocol Selection on OpenVMS

2.11 Running RTR as a Service on Windows NT

Once RTR has been installed as a Service (see the Installation Guide), RTR may be started or stopped from the Control Panel / Services panel using the START and STOP buttons provided.
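Assuming the Service is registered under the name RTR (the actual Service name may vary with your installation), the same actions can also be issued from a Command Prompt:

C:\> net start rtr
C:\> net stop rtr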

Note

Pressing START and STOP (or STOP and START) in quick succession (within five or so seconds, depending on the speed of your computer) may cause undesirable results. This is because the Service itself executes quickly, making the other action button available, while the requested RTR action may not yet have completed when the second button is pressed. It is therefore possible, for example, for the STOP action to be blocked by an incomplete START action: although the Service will claim to be stopped, RTR may in fact remain started. Pressing whichever action button is available should repair the problem.

By default, RTR does not restart automatically at system reboot. You can change this in the Control Panel / Services entry for RTR.

Occasionally, an RTR process may continue to run after STOP has been pressed, and subsequent START and STOP actions may have no effect or may produce an error. Under these circumstances, it is necessary to intervene directly, as a privileged (SYSTEM) user, to stop RTR. This can be done with RTR commands, with the Task Manager, or by rebooting.
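For example, RTR can be stopped with the RTR command-line utility from a Command Prompt window running under the SYSTEM account (a sketch, assuming the utility is on the path):

C:\> rtr
RTR> STOP RTR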

2.11.1 Customizing the RTR Windows NT Service

While starting RTR, the Service looks for the file usrstart.rtr in the RTR home directory. On finding the file, the Service executes any RTR commands it may contain. RTR commands from usrstart.rtr execute after RTR has been started.
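For example, a usrstart.rtr might create and start a facility once RTR is running. The facility and node names below are hypothetical, and the "!" comments assume DCL-style command syntax:

! usrstart.rtr -- executed by the Service after RTR starts
CREATE FACILITY test/FRONTEND=fe1/ROUTER=rtr1/BACKEND=be1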

From the point of view of the Service, the RTR home directory is given by the system-level environment variable rtr_directory or, if that is not defined, by the directory from which the Service was executed.

For the RTR Service to use it, rtr_directory must be defined in the system-level environment variables list, not the user-level list. The system must also be rebooted after the definition of rtr_directory is created or changed before the new value takes effect.

If a user-level copy of rtr_directory exists, it must identify the same RTR home directory as the system-level copy or, if there is no system-level copy, the directory containing the currently registered Service program. If it does not, the behavior of RTR is undefined. Changing the value of rtr_directory, or reregistering the Service from another directory while RTR is running, is dangerous and should be avoided. Starting RTR from the Service and then stopping it from DOS (or the reverse) should also be avoided.
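A quick way to see which value is actually in effect is to display the variable from a newly opened Command Prompt window (the path shown is illustrative):

C:\> echo %rtr_directory%
C:\rtr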

If you put STOP RTR in the usrstart.rtr file, it will stop RTR, but the Service will not detect that RTR has been stopped and will offer only the STOP action button. Pressing the STOP button will fix the problem.

Similarly, when the Service stops RTR, it searches the RTR home directory for the file usrstop.rtr and, if the file exists, executes any RTR commands in it. User commands from usrstop.rtr are executed before RTR has stopped.
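For example, a usrstop.rtr might delete a facility cleanly before RTR itself stops (the facility name is hypothetical; the "!" comment again assumes DCL-style syntax):

! usrstop.rtr -- executed by the Service before RTR stops
DELETE FACILITY test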

WARNING

If you put QUIT or EXIT in either usrstart.rtr or usrstop.rtr, RTR will exit improperly. As a result, an RTR command server process incorrectly remains active, preventing the Service from starting or stopping RTR and preventing the RTR command server from exiting. Because the RTR command server executes under the SYSTEM account, it cannot be stopped from the Task Manager except by the SYSTEM account.

2.11.2 Files Created by the RTR Windows NT Service

If RTR is started from the Service rather than from a Command Prompt window, several files are created in the RTR root directory: srvcin.txt acts as a command-line input source; srvcout.txt acts as a container for console output; rtrstart.rtr contains the startup commands. When the Service stops RTR, it recreates srvcin.txt and creates rtrstop.rtr to hold the stop commands. Creation of these files is unconditional; that is, they are created every time RTR is started or stopped, whether or not they already exist. RTR thus ignores (and overwrites) any changes made to these files.

2.12 How RTR Selects Processing-states (Roles) for Nodes

This section discusses how RTR assigns roles to backend node partitions, and how routers are selected.

2.12.1 Role Assignment for Backend Node Partitions

RTR assigns a primary or secondary processing state to a partition (or key-range definition), which consists of one or more server application channels that may or may not share a common process. All server channels belonging to a given partition have the same processing state on a given node. However, the processing state for the same partition will normally differ from node to node. The exception is the standby processing state: because a given partition can have multiple standby nodes, several of these nodes may be in that state.

RTR determines the processing state of a given partition through the use of a globally managed sequence number for that partition. By default, the RTR master router will automatically assign sequence numbers to partitions during startup. When a server is started up on a backend node and declares a new partition for that node, the partition initially has a sequence number of zero. When the partition on that backend makes an initial connection to the master router, the router increases its sequence number count for that partition by one and assigns the new sequence number to the new backend partition. The active node with the lowest backend partition sequence number gets the primary processing state in both shadow and standby configurations. That node is also referred to as the primary node, though the same node could have a standby processing state for a different partition.

Under certain failover conditions, backend partitions may either retain their original sequence number or be assigned a new one by the router. If a failure is caused by a network disruption, for example, a backend partition retains its sequence number when it reconnects with the router. However, if the backend node is rebooted or RTR is restarted on it, the router assigns a new sequence number to any partition that starts up on that node. Routers assign new sequence numbers only to backend partitions whose current sequence number is zero, or whose sequence number conflicts with that of a backend partition on another node when joining an existing facility.

Sequence number information can be obtained with the SHOW PARTITION command; in its output, the sequence number is shown as the relative priority. The following sample output of the SHOW PARTITION command is from a router partition. It shows that the backend partition on node Bronze has a sequence number of 1 and the backend partition on node Gold has a sequence number of 2.


Router partitions on node SILVER in group test at Mon Mar 22 14:51:16 1999 
 
State:                        ACTIVE 
Low bound:                         0     High bound:               4294967295 
Failover policy:                                              fail_to_standby 
Backends:                                                         bronze,gold 
 States:                                                      pri_act,sec_act 
 Relative priorities:                                                     1,2 
Primary main:                 bronze     Shadow main:                    gold 
 

Sample output of the SHOW PARTITION command on each backend node is as follows:


Backend partitions on node BRONZE in group "test" at Mon Mar 22 14:52:32 1999 
 
 
Partition name:                                                            p1 
Facility:       RTR$DEFAULT_FACILITY     State:                       pri_act 
Low bound:                         0     High bound:               4294967295 
Active servers:                    0     Free servers:                      1 
Transaction presentation:     active     Last Rcvy BE:                   gold 
Active transaction count:          0     Transactions recovered:            0 
Failover policy:     fail_to_standby     Key range ID:               16777217 
Master router:                silver     Relative priority:                 1 
Features:                                         Shadow,NoStandby,Concurrent 
 
Backend partitions on node GOLD in group "test" at Mon Mar 22 14:54:12 1999 
 
Partition name:                                                            p1 
Facility:       RTR$DEFAULT_FACILITY     State:                       sec_act 
Low bound:                         0     High bound:               4294967295 
Active servers:                    0     Free servers:                      1 
Transaction presentation:     active     Last Rcvy BE:                 bronze 
Active transaction count:          0     Transactions recovered:            0 
Failover policy:     fail_to_standby     Key range ID:               16777216 
Master router:                silver     Relative priority:                 2 
Features:                                         Shadow,NoStandby,Concurrent 

The following steps show how sequence numbers are initially assigned in a simple configuration with two backends named Bronze and Gold and a router named Silver.


  1. A partition (with shadowing enabled) is started on node Bronze.
  2. The partition on Bronze obtains sequence number 1 from the router and becomes the primary.
  3. Another server on the same partition (with the same attributes) is started on Gold.
  4. The partition on Gold obtains sequence number 2 from the router and becomes the secondary.
  5. Node Bronze crashes and reboots (the partition sequence number on Bronze is reset to 0). The partition on Gold goes into the remember state.
  6. When the server starts, the partition on Bronze obtains sequence number 3 from the router and becomes the secondary; Gold now becomes the primary.
  7. The network connection from node Silver to node Gold fails. The partition on Bronze becomes the primary. The partition on node Gold loses quorum and is in a wait-for-quorum state.
  8. The network connection to node Gold is reestablished. Because the partition on Gold retained its original sequence number of 2, it resumes the primary role while the partition on Bronze reassumes the secondary role (the resulting roles can be verified with SHOW PARTITION, as shown below).
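At any of these steps, the current roles and sequence numbers can be verified from the router with the SHOW PARTITION command shown earlier:

RTR> SHOW PARTITION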

Alternatively, the roles of backend nodes can be explicitly assigned with the /PRIORITY_LIST qualifier to the SET PARTITION command. In the previous example, the /PRIORITY_LIST qualifier can be used to ensure that when Bronze fails and later returns to participate in the facility, it again becomes the active primary member. To ensure this, the following command would be issued on both backend systems immediately after the creation of the partition:


SET PARTITION test/PRIORITY_LIST=(bronze,gold) 

It is recommended that the same priority-list order be used on all partition members. If different lists are used, the router determines the sequence numbers for conflicting members from the order in which those members joined the facility. For example, if the above command were issued only on Bronze, and Gold had the opposite priority list, the router would assign the lower sequence number to whichever backend joined the facility first.

The /PRIORITY_LIST feature is very useful in cluster configurations. For example, suppose Site A and Site B each contain a two-node cluster. The facility is configured so that at Site A, Node-A1 has the primary active partition and Node-A2 has the standby partition; at Site B, Node-B1 has the secondary active partition and Node-B2 has the standby of the secondary. The partition could be defined such that the standby node, Node-A2, becomes active if the primary node fails. For example, issuing the following command on all four nodes for this partition guarantees that the specified list is followed when there is a failure:


SET PARTITION test/PRIORITY_LIST=(Node-A1,Node-A2,Node-B1,Node-B2) 

Using the SHOW PARTITION command from the router, this partition would be as follows:


Router partitions on node SILVER in group "test" at Mon Mar 22 17:22:06 1999 
 
 
State:                        ACTIVE 
Low bound:                         0     High bound:               4294967295 
Failover policy:                                              fail_to_standby 
Backends:                                     node-a1,node-a2,node-b1,node-b2 
 States:                                      pri_act,standby,sec_act,standby 
 Relative priorities:                                                 1,2,3,4 
Primary main:                node-a1     Shadow main:                 node-b1 

However, the partition could also be configured so that the secondary active node, Node-B1, would become the primary node if the original primary system were to fail. This is controlled with the /FAILOVER_POLICY qualifier to the SET PARTITION command. The default is /FAILOVER_POLICY=STAND_BY.


Even if the relative priority (sequence number) of Node-A2 is changed to four, as in the following command, Node-A2 still becomes the primary active server if Node-A1 fails, because the failover policy specifies fail_to_standby for this facility:


SET PARTITION test/PRIORITY_LIST=(Node-A1,Node-B1,Node-B2,Node-A2) 

After issuing this command the router partition appears as follows. Note the change in relative priorities for the backends.


Router partitions on node SILVER in group test at Tue Mar 23 13:29:41 1999 
 
 
State:                        ACTIVE 
Low bound:                         0     High bound:               4294967295 
Failover policy:                                              fail_to_standby 
Backends:                                     node-a1,node-a2,node-b1,node-b2 
 States:                                      pri_act,standby,sec_act,standby 
 Relative priorities:                                                 1,4,2,3 
Primary main:                node-a1     Shadow main:                 node-b1 

The following SET PARTITION command can be issued to change the facility so that Node-B1 will become the primary active server if Node-A1 fails.


SET PARTITION test/FAILOVER_POLICY=shadow 

The /FAILOVER_POLICY qualifier is intended for selecting a new active primary in configurations where shadowing is enabled, and it takes precedence over the /PRIORITY_LIST qualifier. The /PRIORITY_LIST qualifier is intended for determining the failover order among specific nodes. It is most useful in cluster configurations, where it can specify the exact failover order for the nodes within the cluster. For example, in a standby facility on a cluster of four nodes, the /PRIORITY_LIST qualifier can specify the desired order of failover for those cluster members. Some machines within a cluster may be more powerful than others; this feature allows the most efficient use of those machines.
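Where both behaviors are wanted, the two qualifiers can be combined on one SET PARTITION command; the following sketch assumes qualifiers may be combined as in other DCL-style commands:

SET PARTITION test/FAILOVER_POLICY=shadow/PRIORITY_LIST=(Node-A1,Node-B1,Node-B2,Node-A2)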

2.12.2 Router Selection

Within the scope of a given facility, routers and backends connect to one another. However, nodes with a given role do not connect to nodes with the same role; for example, routers do not connect to other routers. A frontend connects to only one router at a given time; this router is called the Current Router for that frontend within the scope of a facility.

A backend connects to all routers defined within a facility. The connected router with the lowest network address is designated the master router. Internally, a node is identified by a structure called the Kernel Net ID, which is a concatenation of all the network addresses by which the node is known, across all the protocols and interfaces it supports. The master router designation is relevant only to a backend: it is where the backend goes to obtain and verify partition configuration and facility information.

Routers are made known to frontend systems through the list specified in the /ROUTER=(list) qualifier to the CREATE FACILITY command. This list determines the preferred router: if the first router specified is not available, the next one on the list is chosen, and so on. When the facility is created on the frontend, the list of routers specified can be a subset of the routers contained within the entire facility; this can be used to prevent a frontend from selecting a router that is reserved for other frontend systems. Failback of routers is supported: if the preferred router is unavailable but later becomes available again, the frontend automatically fails back and connects to it.
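For example, to prevent a frontend from selecting router C, which is reserved for other frontends, the facility can be created with a subset list (the names match the example that follows):

RTR CREATE FACILITY test/FRONTEND=Z/ROUTER=(A,B)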

Router connectivity can also be controlled with the /BALANCE qualifier, on either the CREATE FACILITY command or the SET FACILITY command. When the /BALANCE qualifier is used, the specified router list is randomized, making the preferred router a random selection within the list. Assume the following command is issued from a frontend:


RTR CREATE FACILITY test/FRONTEND=Z/ROUTER=(A,B,C) 

The frontend attempts to select a router based on the priority list A, B, C, with A the preferred router. If the /BALANCE qualifier is added to the end of this command, the preferred router is instead selected at random from the three nodes. The resulting random list exists for the duration of the facility; after the facility is stopped, a new random list is made when the facility is created again. One exception applies: a router that does not have quorum (sufficient access to backend systems) no longer accepts connections from frontend systems until it has again achieved quorum. The /BALANCE qualifier is valid only for frontend systems.
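For example, adding the /BALANCE qualifier to the command above gives:

RTR CREATE FACILITY test/FRONTEND=Z/ROUTER=(A,B,C)/BALANCE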

