Reliable Transaction Router
System Manager's Manual



3.6.6 Controlling Transaction Replay

RTR can control transaction replay in cases where a "killer message" occurs during a transaction replay and prevents recovery from continuing normally. A killer message is a message that causes repeated server application failure during recovery, so that server availability is lost. This is typically the result of an improperly handled condition or an application programming error within the server itself. Under such circumstances it may be desirable to sidestep a particular transaction, maintain server operation, and manually process the transaction at some later time.

The RTR solution is to establish, for a given partition, the maximum number of retries for any given transaction presented during recovery. Once this limit has been exceeded, the offending transaction is removed from the recovery process and is written to the journal as an exception record. Subsequent processing of this transaction requires manual intervention by someone qualified to evaluate and correct the situation in both the application and in RTR. Once the application status is understood, the SET TRANSACTION command can be used to update the journal, thus ensuring that the final state of any manually processed exception is accurately reflected in future recovery operations.

The recovery retry count indicates the maximum number of times that a transaction should be presented for recovery before being written to the journal as an exception. Once a transaction has been recorded as an exception, it is no longer considered eligible for recovery and requires manual processing by a qualified individual.

The recovery retry count is partition-specific, and applies to both local and shadow recovery operations. The default is no limit on the number of retries, which permits a killer message to bring down all available servers servicing a given partition.

The recovery retry count should be set before starting (or restarting) the application servers so that the limit is established prior to the start of recovery operations.
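The effect of a retry limit can be illustrated with a small simulation. This is only a sketch of the behavior described above, not RTR code: the names replay_with_limit, present_fn, killer_message, and healthy_message are hypothetical, and the sketch assumes a finite limit of at least 1 (it does not model the default of no limit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of recovery for one transaction (not an RTR type). */
typedef enum { REPLAY_RECOVERED, REPLAY_EXCEPTION } replay_outcome_t;

/* Hypothetical callback: presents the transaction to a server once.
 * Returns true if the server processed it, false if the server failed. */
typedef bool (*present_fn)(unsigned tx_id);

/* A "killer message": every presentation brings the server down. */
bool killer_message(unsigned tx_id) { (void)tx_id; return false; }

/* A well-behaved transaction: the first presentation succeeds. */
bool healthy_message(unsigned tx_id) { (void)tx_id; return true; }

/*
 * Present tx_id at most retry_limit times (retry_limit >= 1).
 * If every attempt fails, the transaction is treated as an exception
 * and is no longer eligible for recovery.
 * *attempts receives the number of presentations actually made.
 */
replay_outcome_t replay_with_limit(unsigned tx_id, unsigned retry_limit,
                                   present_fn present, unsigned *attempts)
{
    unsigned made = 0;

    assert(present != NULL && attempts != NULL && retry_limit >= 1);

    while (made < retry_limit) {
        made++;
        if (present(tx_id)) {
            *attempts = made;
            return REPLAY_RECOVERED;
        }
    }
    *attempts = made;
    return REPLAY_EXCEPTION;
}
```

With a limit of 3, a killer message is presented three times and then reported as an exception, while a healthy transaction is recovered on its first presentation.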

3.6.6.1 Command Line Example


RTR> SET PARTITION/RECOVERY_RETRY_COUNT=3 Facility1:Partition1 

For more information on the SET PARTITION command see Chapter 6.

3.6.6.2 Programming Information

To set the partition transaction recovery limit, program the set_qualifier argument of rtr_set_info() as follows:


    rtr_qualifier_value_t   set_qualifiers[ 2 ]; 
    rtr_uns_32_t   newLimit = . . .; 
 
    set_qualifiers[ 0 ].qv_qualifier = rtr_partition_rcvy_retry_count; 
    set_qualifiers[ 0 ].qv_value = &newLimit; 
    set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end; 
    set_qualifiers[ 1 ].qv_value = NULL; 

3.7 Displaying Partition Information

Information on the definition and state of a partition is displayed with the SHOW PARTITION command. The information of interest in the context of partition management relates to the backend instance of the partition. For more information see the SHOW PARTITION command in Chapter 6.

3.7.0.1 Command Line Example


RTR> show partition/backend 
 
Backend partitions on node BE1 in group "Facility1" at Wed Feb 24 15:07:50 1999 
 
Partition name                     Facility                       State 
RTR$DEFAULT_PARTITION_16777217     RTR$DEFAULT_FACILITY           active 
RTR$DEFAULT_PARTITION_16777218     RTR$DEFAULT_FACILITY           active 
 


Chapter 4
Transaction Management

4.1 Overview

This section describes the concepts of RTR's transaction management capability.

The RTR transaction is the heart of an RTR application, and transaction state is the property that characterizes a transaction's current condition. Whenever a transaction progresses from one stage to another, the transaction state is updated to reflect a transaction transition. Transaction states are maintained in memory and some types of transaction states are also stored in the RTR Journal for recovery purposes.

Three different types of states are used internally by RTR to keep track of transaction status: the Transaction Runtime State, the Transaction Journal State, and the Transaction Server State.

These three state types are very closely related. The Transaction Runtime State, also known as Transaction State, describes how a transaction progresses from the point of view of an RTR role (FE, TR, BE). For example, a transaction can enter a stage in which its transaction state from an RTR frontend viewpoint is different from its transaction state on an RTR router.

The Transaction Journal State describes how a transaction running on an RTR backend progresses from the RTR journal perspective. When a transaction transitions, its Transaction Journal State gets updated and the new state along with other information pertaining to this transaction is stored in the RTR journal. The Transaction Journal State is primarily used by RTR to perform the recovery replay of a transaction after a failure, if necessary. An RTR frontend and router will not see this state.

The Transaction Server State describes the transaction state transitions as seen by the server. RTR uses this state to determine if a server is available to process a new transaction or if a server has voted on a particular transaction. As with the Transaction Journal State, the Transaction Server State is only managed at the backend.

RTR provides a set of comprehensive management utilities to help users closely monitor the flow of a transaction and all three types of states associated with that transaction. These utilities help users understand how a transaction migrates from one stage to another and help diagnose problems.

The RTR SHOW TRANSACTION command can be used to examine a transaction's up-to-date status on frontend, router or backend roles. With this command, users can see all three types of transaction states of a particular transaction and also understand how the RTR journal and application servers perceive this transaction. When a transaction commits or aborts, all status associated with this transaction is removed from memory and can no longer be monitored by the command.

The RTR DUMP JOURNAL command can be used to trace and review the flow of a transaction. The RTR journal saves all of the information about a transaction, its transaction journal state, the transaction messages (records) received from the RTR client, and the content of a message sent to the server. The information will be kept until a transaction is committed and forgotten.

The RTR SET TRANSACTION command changes the current state of a live transaction to a new state. This command can be used to circumvent a difficult situation. For example, where two shadowed servers are configured, the system administrator might decide not to replay (recover) all transactions in a shadowed RTR journal after a failure. The SET TRANSACTION command could set specified transactions in a PRI_DONE (remember) state to a DONE state, avoiding the delay of replaying remembered transactions from the journal and so allowing a fast recovery. The SET TRANSACTION command should only be used by experienced RTR system administrators, because it introduces the risk of corrupting or losing transactions if used incorrectly. It can be used on the backend only, and the RTR log file must be turned on for this command.

Log file entries are made for all transaction state changes for debugging and auditing purposes.

4.1.0.1 Command Line Examples

An example of the use of the SET TRANSACTION command:


RTR> start rtr 
RTR> set log/file=settran 
RTR> set transaction/state=PRI_DONE/new_state=DONE/facility=Facility1/partition=Partition1 * 

This example sets all transactions (selected by the wildcard *) whose current state is PRI_DONE (remember) to DONE in the facility Facility1 and the partition Partition1. The log file, settran, records the transaction state changes. The changes can be viewed with the SHOW TRANSACTION command or the DUMP JOURNAL command. In a shadow recovery situation this clears the journal of remember transactions and provides fast recovery of access to the database if needed.
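The selection performed by such a command can be pictured as a filter over transaction records. The following is a minimal sketch under stated assumptions, not RTR's implementation: tx_record_t, tx_state_t, and set_matching_state are hypothetical names, and the wildcard is modeled simply as "every record in the array".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of transaction states, for illustration only. */
typedef enum { TX_PRI_DONE, TX_DONE, TX_VOTED } tx_state_t;

/* Hypothetical in-memory view of a journal record. */
typedef struct {
    const char *facility;
    const char *partition;
    tx_state_t  state;
} tx_record_t;

/*
 * Change every record in the given facility and partition whose state is
 * `current` to `new_state`. Records in other states, facilities, or
 * partitions are left untouched. Returns the number of records changed.
 */
size_t set_matching_state(tx_record_t *txs, size_t count,
                          const char *facility, const char *partition,
                          tx_state_t current, tx_state_t new_state)
{
    size_t changed = 0;

    assert(txs != NULL || count == 0);
    assert(facility != NULL && partition != NULL);

    for (size_t i = 0; i < count; i++) {
        if (txs[i].state == current &&
            strcmp(txs[i].facility, facility) == 0 &&
            strcmp(txs[i].partition, partition) == 0) {
            txs[i].state = new_state;
            changed++;
        }
    }
    return changed;
}
```

Only records that match the facility, the partition, and the current state are changed, which mirrors the /facility, /partition, and /state qualifiers of the command above.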

For detailed information on these commands see Chapter 6.

4.1.1 Exception Transactions

Transactions can cause servers to fail after the VOTE phase and so reduce the availability of a server during recovery. RTR can flag these "EXCEPTION" transactions as "fail transactions" once the user has limited the number of recovery attempts with the SET PARTITION/RECOVERY_RETRY_COUNT=nn command. They can then be identified and, with the SET TRANSACTION command, removed from the RTR journal and from the system so that recovery can continue. For a flagged "EXCEPTION" transaction, the system administrator can change the state of the transaction to "DONE" with SET TRANSACTION/STATE=EXCEPTION/NEW_STATE=DONE to allow the recovery to continue.

4.1.2 Transaction State Changes

There are eight valid state changes allowed for the SET TRANSACTION command. Attempting a state change that is not allowed produces the error message %RTR-E-INVSTATCHANGE, Invalid to change from current state to the specified state. The following table identifies the valid state changes.

Table 4-19 Valid Transaction State Transitions
                 NEW STATE
Current State    COMMIT    ABORT    EXCEPTION    DONE
SENDING          -         YES      -            -
VOTED            YES       YES      -            -
COMMIT           -         -        YES          YES
EXCEPTION        YES       -        -            YES
PRI_DONE         -         -        -            YES
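The table can be expressed as a lookup matrix. This is a sketch for illustration, not RTR source: the enum and function names are hypothetical, and the placement of the YES entries is an assumption based on reading the table so that it yields the eight valid changes stated above and agrees with the typical situations described in this section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumeration of the states named in the table. */
typedef enum {
    TX_SENDING, TX_VOTED, TX_COMMIT, TX_ABORT,
    TX_EXCEPTION, TX_PRI_DONE, TX_DONE,
    TX_STATE_COUNT
} tx_state_t;

/* valid_change[from][to] is true where the table shows YES.
 * Entries not listed default to false. */
static const bool valid_change[TX_STATE_COUNT][TX_STATE_COUNT] = {
    [TX_SENDING]   = { [TX_ABORT] = true },
    [TX_VOTED]     = { [TX_COMMIT] = true, [TX_ABORT] = true },
    [TX_COMMIT]    = { [TX_EXCEPTION] = true, [TX_DONE] = true },
    [TX_EXCEPTION] = { [TX_COMMIT] = true, [TX_DONE] = true },
    [TX_PRI_DONE]  = { [TX_DONE] = true },
};

/* Returns true if changing from `from` to `to` appears in the table. */
bool is_valid_state_change(tx_state_t from, tx_state_t to)
{
    assert((int)from >= 0 && (int)from < TX_STATE_COUNT);
    assert((int)to >= 0 && (int)to < TX_STATE_COUNT);
    return valid_change[from][to];
}

/* Counts the YES entries in the matrix. */
int count_valid_state_changes(void)
{
    int count = 0;

    for (int from = 0; from < TX_STATE_COUNT; from++) {
        for (int to = 0; to < TX_STATE_COUNT; to++) {
            if (valid_change[from][to]) {
                count++;
            }
        }
    }
    return count;
}
```

A check such as is_valid_state_change(TX_SENDING, TX_COMMIT) returns false, which corresponds to the case where RTR would report %RTR-E-INVSTATCHANGE.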

Four typical situations are listed below where transaction state changes by the system administrator are allowed.

  1. State SENDING changed to state ABORT.
    The application server, after receiving an rtr_mt_msg_1 message and before calling rtr_accept_tx() for a particular transaction, experiences a "hung" situation and cannot proceed. Aborting this transaction with the SET TRANSACTION command is the only way to correct it. Internally, RTRACP will send the ABORT message to the router as well as to all participating servers to abort this transaction in a consistent manner.
  2. State VOTED changed to state COMMIT.
    This is the case where an application server running on the backend may have been separated from the rest of the participating servers after casting the VOTE for the transaction. The other servers may have already committed the transaction but not "forgotten" it. As far as the application is concerned, this global transaction is committed and all changes have been committed to the underlying database on the different sites. However, the local transaction record is still in VOTED state in the RTR journal. You can use the command to manually commit the local transaction branch.
    Note that this command is only applicable if there is no coordinating router running, i.e., servers are separated from the rest of the RTR network. If this is not the case, RTR rejects the command.
  3. State VOTED changed to state ABORT.
    In a similar manner to the VOTED-to-COMMIT situation described above, the server has been separated from the other participating servers and all other participants aborted this transaction; use this command to manually abort the local transaction branch.
    Note that this command is only applicable if there is no coordinating router running and servers are separated from the rest of the RTR network. If this is not the case, RTR rejects the command.
  4. State COMMIT changed to state DONE.
    This is the case where, for example, a server crashed while performing an SQL commit immediately after receiving an rtr_mt_accepted message. The transaction is in COMMIT state as recorded in the RTR journal and the transaction is also committed in the underlying database.

After the SET TRANSACTION command is executed the DUMP JOURNAL command can be used to verify the result.


Chapter 5
RTR Monitoring

This chapter contains a description of the RTR monitor. The RTR monitor gives you a means of viewing the activities of RTR and your applications. Many different aspects of RTR's behaviour can be viewed, allowing the activities and performance of RTR to be analyzed.

5.1 Introduction

The RTR monitor provides a means to continuously display the status of RTR and the applications using it.

It can be used to check the correct operation of an RTR network, showing information useful for tuning, capacity planning, and locating configuration and application errors.

The information displayed is composed of named data items which are continuously updated by RTR. These data items can be displayed in various formats, and combined using simple arithmetic operators and constants.

The monitor is invoked with the RTR MONITOR command. RTR monitor displays a monitor picture that is periodically updated. See Section 6.2 for the full syntax of the MONITOR command.

A monitor picture contains elements that are either text (such as labels and titles) or variables derived from data items. Monitor pictures can be defined either interactively at the RTR> prompt or defined in a file called a monitor file.

You can use monitor files that are provided with RTR, and you can create your own. See Appendix A for information about creating monitor files.

5.2 Standard Monitor Pictures

A number of standard monitor pictures are supplied with RTR. These cover most of the usual monitoring requirements. You may define your own monitor pictures or alter the standard ones to suit your particular needs. Table 5-1 contains a list of the standard monitor pictures. To display one of these pictures use the following command at the RTR prompt:


 
 RTR> MONITOR picture-name
 

The files for standard monitor pictures are installed on your system when RTR is installed. The location of these files is platform-specific. Each filename is the picture name with .mon appended. (You type the filename without .mon when starting the display.)

Note

Obsolete monitor pictures have been removed from the documentation.

Table 5-1 Standard Monitor Pictures
Picture name Description
accfail Shows link transport name for links on which a connection attempt was declined, with a reason for failure. The most recent entry is highlighted.
acp2app Displays counts of messages and number of bytes from RTRACP to the application, as viewed from a specific node.
active Displays a list of RTR processes, and for each process the number of transactions they have started, the number of transactions they have completed and the number of transactions that are still active.
app2acp Displays counts of messages and number of bytes from the application to RTRACP, as viewed from a specific node.
broadcast Displays information about RTR user events by process, including number of user events enqueued, received, and discarded.
calls Displays the total number of RTR API calls and their success or failure for the processes on all the nodes being monitored. All RTR messages are also shown by message type. (Pending messages are ones that an application has not received yet.) Use the /IDENTIFICATION=process-id qualifier to display the values for one specific process; otherwise the total values for all processes are displayed.
channel Displays the roles of the channels declared by an application. This can be useful as a debugging tool in the early stages of application development.
congest Displays a sorted list of nodes responsible for causing the most congestion since RTR was last started, and the instantaneous state.
connects Displays connection status summary, including the number of links up and down, and a list of links with state (up or down), architecture, network transport, and fail-reason, if any.
ddtm Displays counts of RTR calls to DECdtm, as viewed from a specific node, for all PIDs, processes, and images.
dtx Displays counts of RTR DTX calls including open, start, prepare, rollback, commit, and close, as viewed from a specific node for all PIDs, processes, and images.
dtxrec Displays a summary of DECdtm transaction recovery (DTX), as viewed from a specific node for all PIDs, processes, and images.
event Displays event routing data by facility. Information includes events in transit and destination information showing number of events enqueued, processed, and discarded.
facility Displays a number of per facility data items. The /FACILITY qualifier can be used to say which facility should be monitored. If this is not specified then the totals of the data items for all facilities are displayed.
flow Displays the flow control counters.
frontend Displays frontend status and counts by node and facility, including frontend state, current router, reject status, retry count, and quorum rejects.
group Shows server and transaction concurrency on a partition basis.
ipc Shows counts of inter-process communication (IPC) activity in the RTR ACP and active RTR applications.
ipcrate Displays rate information on IPC messages, byte counts, and IO primitive usage.
jcalls Displays counts of successful (success), failed (fail) and total journal calls for local and remote journals.
journal Displays the current journal usage on a node. Local node journal statistics are provided, along with data for non-local journals accessed from the local node. Includes statistics covering total number of entries and records written, the number of records read, and how many bytes were involved. Bar graphs showing current usage of journal blocks (as a percentage of the total) are also provided.
link Displays a number of per link data items. The /LINK=link-name qualifier can be used if the values for one specific link are to be displayed, otherwise the total values for all links are displayed.
netbytes Displays a list of the links to other nodes. For each link, the total number of bytes received and sent on that link and the number of bytes received and sent per second are displayed.
netstat For each link, displays the connection status in detail, with the link state (up or down), and architecture type of remote node (such as VAX, I386, Alpha, and so on).
partit Displays the status of server partitions. Shows the partition identifiers, key ranges and key segments, and the status of the servers (active, recovering and so on).
queues Shows transaction queues on a partition basis.
quorum Tracks (by facility) the configuration, reachability and quorum status of one or more nodes.
rdm Displays memory used by each RTR subsystem.
recovery Displays the status of server recovery procedures, such as waiting for quorum, catching up transactions, and so on.
rejects Displays the last rtr_mt_rejected message received by each running process.
rejhist Displays the last ten rtr_mt_rejected messages received by the selected process.
response Displays the elapsed time that a transaction has been active on the opened channels of a process.
rfb Displays router failback operations, including both a summary and detail by facility.
rolequor A detailed picture of the various data items displayed in the QUORUM picture, separated by roles. If a quorum problem is encountered, this picture may be useful for problem diagnosis.
routers Displays information on a router node. It gives an indication of the utilization of the router in terms of transactions and broadcasts routed through this node. Useful to monitor performance, or locate problems.
routing Displays statistics of transaction and broadcast traffic by facility.
rscbe Displays the most recent calls history for the RSC subsystem on a backend node.
rtr Displays various per node data items.
stalls Displays in real time any network links that are currently stalling in their outbound traffic, and provides a history of the stalls that the various links encountered during their lifetime.
system Displays the state of critical resources within the RTR environment. If a resource has exceeded a predefined threshold, a warning indicator is displayed.
tps Displays the rate of transaction commits carried out by each process using RTR.
tpslo Displays low end of the rate of transaction commits carried out by each process using RTR.
traffic Displays a list of the links to other nodes. Shown for each link are: byte rate, packet rate, message rate and congestion, in both directions. Average packets per second is also shown.
trans Displays transactions for a frontend, router and backend.
v2calls Shows RTR Version 2 verb usage through the interoperability subsystem. The screen layout is identical to the RTR Version 2 monitor calls picture.
xa Displays XA counter information including success and failure as well as call and readonly counters.

