2    Using CAA for Single-Instance Application Availability

The cluster application availability (CAA) subsystem tracks the state of members and resources (such as networks, applications, tape drives, and media changers) in a cluster. CAA monitors the resources that application resources require, and ensures that applications run on members that meet their needs.

This chapter covers the following topics:

2.1    When to Use CAA

CAA is designed to work with applications that run on one cluster member at a time. If the cluster member on which an application is running fails, or if a particular required resource fails, CAA relocates or "fails over" the application to another member that either has the required resources available or on which the required resource can be started.

For multi-instance applications, a cluster alias is often more useful for providing transparent application failover. Typically, multi-instance applications achieve high availability to clients by using the cluster alias, as discussed in Chapter 3. However, CAA is often useful for multi-instance applications as well, because it provides simplified, central management (start and stop) of the applications and restarts them when they fail. CAA also starts and stops applications automatically at boot time and shutdown time, without requiring additional rc3 scripts.

See the TruCluster Server Cluster Administration manual for a general discussion of the differences between the cluster alias subsystem and CAA. Also, see Chapter 3 for examples of how to use the default cluster alias with multi-instance applications for high availability.

2.2    Resource Profiles

A resource profile is a file containing attributes that describe how a resource is started, managed, and monitored by CAA. Profiles designate resource dependencies and determine what happens to an application when it loses access to another resource on which it depends.

There are four resource profile types: application, network, tape, and changer.

Some of the attributes that you can specify in a resource profile are:

Complete lists of profile attributes according to resource type are located in Section 2.2.2, Section 2.2.3, Section 2.2.4, and Section 2.2.5.

All resource profiles are located in the clusterwide directory /var/cluster/caa/profile. Resource profile file names take the form resource_name.cap. CAA commands refer to a resource only by its resource name, resource_name.

Each resource type (application, network, tape, and changer) has its own kind of resource profile. The examples and tables in the following sections show each type of resource profile and its entries.

There are required and optional profile attributes for each type of profile. Optional attributes may be left unspecified in the profile. At registration time, optional attributes that have default values are merged with the values stored in the template for that resource type and in the generic template. Each resource type has a template file, named TYPE_resource_type.cap and stored in /var/cluster/caa/template, that holds default attribute values. A generic template file for values that apply to all resource types is stored in /var/cluster/caa/template/TYPE_generic.cap.

The examples in the following sections show the syntax of a resource profile. Lines starting with a pound sign (#) are treated as comment lines and are not processed as part of the resource profile. A backslash (\) at the end of a line indicates that the next line is a continuation of the previous line. For a more detailed description of profile syntax, see caa(4).
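The two syntax rules above (comment lines and backslash continuation) can be illustrated with a small shell function. This is a minimal sketch, not part of CAA: the function name print_profile_entries is hypothetical, and it handles only comments and continuation lines, not the full syntax described in caa(4).

```shell
#!/bin/sh
# Hypothetical helper (not a CAA command): print the entries of a resource
# profile after applying two syntax rules:
#   - a line that starts with '#' is a comment and is skipped
#   - a trailing backslash joins a line with the line that follows it
print_profile_entries() {
    awk '
        /^#/ { next }                  # comment line: ignore it
        {
            entry = entry $0           # append this physical line
            if (entry ~ /\\$/) {       # ends in a backslash: continuation
                sub(/\\$/, "", entry)  # drop the backslash, read next line
                next
            }
            if (entry != "") print entry
            entry = ""                 # start a new logical entry
        }
    ' "$1"
}
```

For example, print_profile_entries /var/cluster/caa/profile/clock.cap would print one NAME=VALUE entry per line.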

2.2.1    Creating a Resource Profile

The first step to making an application highly available is to create a resource profile. You can use any of the following methods to do this:

You can use any of these methods together. For example, you can use the caa_profile command to make a resource profile and then use a text editor to manually edit the profile.

You can find several example profiles in the /var/cluster/caa/examples directory.

After you create a resource profile, you must register it with CAA before a resource can be managed or monitored. See Section 2.4 for a description of how to register an application.

2.2.2    Application Resource Profiles

Table 2-1 lists the application profile attributes. For each attribute, the table indicates whether the attribute is required, its default value, and a description.

Table 2-1:  Application Profile Attributes

Attribute | Required | Default | Description
TYPE | Yes | None | The type of the resource. The type application is for application resources.
NAME | Yes | None | The name of the resource. The resource name is a string that contains a combination of letters a-z or A-Z, digits 0-9, or the underscore (_) or period (.). The resource name may not start with a period.
DESCRIPTION | No | Name of the resource | A description of the resource.
FAILURE_THRESHOLD | No | 0 | The number of failures detected within FAILURE_INTERVAL before CAA marks the resource as unavailable and no longer monitors it. If the check entry point of an application's action script fails this number of times, the application resource is stopped and set offline. Tracking of failures is disabled if the value is zero (0).
FAILURE_INTERVAL | No | 0 | The interval, in seconds, during which CAA applies the failure threshold. Tracking of failures is disabled if the value is zero (0).
REQUIRED_RESOURCES | No | None | A white-space separated, ordered list of resource names that this resource depends on. Each resource listed as a required resource must already be registered with CAA; otherwise, registration of this profile fails. For a more detailed explanation, see Section 2.2.2.1.
OPTIONAL_RESOURCES | No | None | A white-space separated, ordered list of optional resources that this resource uses during placement decisions. Up to 58 optional resources can be listed. For a more complete explanation, see Section 2.2.2.3.
PLACEMENT | No | balanced | The placement policy (balanced, favored, or restricted), which specifies how CAA chooses the cluster member on which to start the resource.
HOSTING_MEMBERS | No | None | An ordered, white-space separated list of cluster members that can host the resource. This attribute is required only if PLACEMENT is favored or restricted, and it must be empty if PLACEMENT is balanced.
RESTART_ATTEMPTS | No | 1 | The number of times CAA attempts to restart the resource on a single cluster member before attempting to relocate it. A value of 1 means that CAA attempts to restart the application only once on a member; a second failure causes an attempt to relocate the application.
FAILOVER_DELAY | No | 0 | The amount of time, in seconds, that CAA waits before attempting to restart or fail over the resource.
AUTO_START | No | 0 | A flag that indicates whether CAA should automatically start the resource after a cluster reboot, regardless of whether the resource was running before the reboot. When set to 0, CAA starts the application resource only if it had been running before the reboot. When set to 1, CAA always starts the application after a reboot.
ACTION_SCRIPT | Yes | None | The resource-specific script for starting, stopping, and checking a resource. You may specify a full path for the action script file; otherwise, the path /var/cluster/caa/script is assumed.
ACTIVE_PLACEMENT | No | 0 | When set to 1, CAA reevaluates the placement of an application when a cluster member is added or restarts.
SCRIPT_TIMEOUT | No | 60 | The maximum time, in seconds, that an action script may take to complete execution before an error is returned.
CHECK_INTERVAL | No | 60 | The time interval, in seconds, between repeated executions of the check entry point of the resource's action script.
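The interaction between RESTART_ATTEMPTS and relocation described in Table 2-1 can be modeled as follows. This is a minimal sketch of the documented rule only; the function name is hypothetical, and it assumes that the restart count starts again at zero after the application relocates to a new member, which the table does not state.

```shell
#!/bin/sh
# Hypothetical model (not CAA code) of the RESTART_ATTEMPTS rule:
# CAA restarts a failed application on the same member up to
# RESTART_ATTEMPTS times; the next failure triggers a relocation attempt.
# Assumption: the restart count is reset after a relocation.
# Usage: simulate_failures RESTART_ATTEMPTS NUMBER_OF_FAILURES
simulate_failures() {
    attempts=$1
    failures=$2
    count=0          # restarts already made on the current member
    i=1
    while [ "$i" -le "$failures" ]; do
        if [ "$count" -lt "$attempts" ]; then
            count=$((count + 1))
            echo "failure $i: restart on current member"
        else
            count=0
            echo "failure $i: relocate to another member"
        fi
        i=$((i + 1))
    done
}
```

With the default RESTART_ATTEMPTS of 1, simulate_failures 1 2 reports a restart for the first failure and a relocation for the second. With a value of 0, the first failure leads directly to relocation, consistent with the description of required-resource failures in Section 2.2.2.1.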

The following example uses caa_profile to create an application resource profile:

# /usr/sbin/caa_profile -create clock -t application -B /usr/bin/X11/xclock \
-d "Clock Application" -r network1 -l application2 \
-a clock.scr -o ci=5,ft=2,fi=12,ra=2
 

The contents of the resource profile file that was created by the previous example are as follows:

NAME=clock
TYPE=application
ACTION_SCRIPT=clock.scr
ACTIVE_PLACEMENT=0
AUTO_START=0
CHECK_INTERVAL=5
DESCRIPTION=Clock Application
FAILOVER_DELAY=0
FAILURE_INTERVAL=12
FAILURE_THRESHOLD=2
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=application2
PLACEMENT=balanced
REQUIRED_RESOURCES=network1
RESTART_ATTEMPTS=2
SCRIPT_TIMEOUT=60
 

For more information on the application resource profile syntax, see caa_profile(8) and caa(4).
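Comparing the caa_profile command with the profile it produced shows how the abbreviations in the -o option map to profile attributes: ci to CHECK_INTERVAL, ft to FAILURE_THRESHOLD, fi to FAILURE_INTERVAL, and ra to RESTART_ATTEMPTS. The following sketch expands an -o string into profile lines using only those four mappings, which are inferred from the example above. caa_profile(8) remains the authoritative list, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of CAA): expand a caa_profile -o string
# such as "ci=5,ft=2,fi=12,ra=2" into profile attribute lines.
# Only the four abbreviations used in the example above are handled.
expand_o_options() {
    old_ifs=$IFS
    IFS=,                         # split the option string on commas
    for pair in $1; do
        key=${pair%%=*}           # text before the first '='
        value=${pair#*=}          # text after the first '='
        case $key in
            "ci") echo "CHECK_INTERVAL=$value" ;;
            "ft") echo "FAILURE_THRESHOLD=$value" ;;
            "fi") echo "FAILURE_INTERVAL=$value" ;;
            "ra") echo "RESTART_ATTEMPTS=$value" ;;
            *)
                echo "unknown abbreviation: $key" >&2
                IFS=$old_ifs
                return 1
                ;;
        esac
    done
    IFS=$old_ifs
}
```

Running expand_o_options "ci=5,ft=2,fi=12,ra=2" prints the four corresponding lines from the clock profile shown above.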

2.2.2.1    Required Resources

CAA uses the required resources list, in conjunction with the placement policy and hosting members list, to determine which members are eligible to host the application resource. Required resources must be ONLINE on any member on which the application is running or started. Only application resources can have required resources, but any type of resource can be defined as a required resource for an application resource.

A failure of a required resource on the hosting member causes CAA to initiate failover of the application or, if RESTART_ATTEMPTS is not 0, to attempt to restart it on the current member. This can result in CAA failing the application resource over to another member that provides the required resources, or stopping the application if no suitable member exists. In the latter case, CAA continues to monitor the required resources and restarts the application when the resource is again available on a suitable cluster member.

Required resource lists are also useful for starting, stopping, and relocating a group of interdependent application resources together when you run the caa_start, caa_stop, or caa_relocate command with the -f option.

2.2.2.2    Application Resource Placement Policies

The placement policy specifies how CAA selects a cluster member on which to start a resource, and on which to relocate it after a failure.

Note

Only cluster members that have all the required resources available (as listed in an application resource's profile) are eligible to be considered in any placement decision involving that application.
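The rule in the preceding note acts as a filter that is applied before any placement policy. The sketch below models it for illustration only; the input format (one member:res1,res2 argument per member, listing the resources available on that member) and the function name are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical model (not CAA code): print the members that have every
# required resource available. Each member is passed as
# "member:resource1,resource2,..." listing the resources available on it.
# Usage: eligible_members "REQUIRED_RESOURCES" member_spec...
eligible_members() {
    required=$1
    shift
    for spec in "$@"; do
        available=",${spec#*:},"        # comma-delimited for exact matching
        ok=yes
        for r in $required; do
            case $available in
                *",$r,"*) ;;             # resource available on this member
                *) ok=no; break ;;       # missing one: member not eligible
            esac
        done
        if [ "$ok" = yes ]; then
            printf '%s\n' "${spec%%:*}"
        fi
    done
}
```

An application with no required resources leaves every member eligible; the placement policy then chooses among them.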

The following placement policies are supported:

You must specify hosting members in the HOSTING_MEMBERS attribute to use a favored or restricted placement policy. You must not specify hosting members in the HOSTING_MEMBERS attribute with a balanced placement policy.
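The HOSTING_MEMBERS rules above can be checked before a profile is registered. The following function is a hypothetical pre-check written for illustration; it is not a CAA command, and its name is an assumption.

```shell
#!/bin/sh
# Hypothetical pre-check (not a CAA command) for the PLACEMENT and
# HOSTING_MEMBERS rules. Returns 0 if the combination is valid.
# Usage: check_placement PLACEMENT "HOSTING_MEMBERS"
check_placement() {
    case $1 in
        balanced)
            if [ -n "$2" ]; then
                echo "balanced placement: HOSTING_MEMBERS must be empty" >&2
                return 1
            fi
            ;;
        favored|restricted)
            if [ -z "$2" ]; then
                echo "$1 placement: HOSTING_MEMBERS must list members" >&2
                return 1
            fi
            ;;
        *)
            echo "unknown placement policy: $1" >&2
            return 1
            ;;
    esac
    return 0
}
```

For example, check_placement favored "provolone polishham" succeeds, while check_placement balanced "provolone" fails.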

If ACTIVE_PLACEMENT is set to 1, the placement of the application resource is reevaluated whenever a cluster member is either added to the cluster or it restarts. This allows applications to be relocated to a preferred member of a cluster after the member recovers from a failure.

2.2.2.3    Optional Resources in Placement Decisions

Optional resources are used to choose a hosting member based on the number of optional resources that are in the ONLINE state on each hosting member. If each member has an equal number of optional resources in the ONLINE state, CAA considers the order of the optional resources, as follows.

CAA compares the state of the optional resources on each member, starting at the first resource in the list and proceeding successively through it. For each resource in turn, if the resource is ONLINE on at least one member still under consideration, any member on which it is not ONLINE is removed from consideration. Each resource on the list is evaluated in this manner until only one member remains available to host the resource. The maximum number of optional resources is 58.

If more than one member remains after this evaluation, the application is placed on one of these members, chosen according to its placement policy.
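The optional-resource evaluation described above can be sketched as a two-step filter. This is a minimal model under stated assumptions, not CAA's implementation: it assumes that members with more ONLINE optional resources are preferred, the input format (member:res1,res2 listing the optional resources ONLINE on each eligible member) and the function name are hypothetical, and any remaining tie is left for the placement policy to resolve.

```shell
#!/bin/sh
# Hypothetical model (not CAA code) of optional-resource ranking.
# Usage: rank_by_optional "OPTIONAL_RESOURCES" member_spec...
# Each member_spec is "member:res1,res2,..." listing the optional
# resources that are ONLINE on that member.
rank_by_optional() {
    optional=$1
    shift
    # Step 1: keep the members with the most ONLINE optional resources.
    best=-1
    candidates=""
    for spec in "$@"; do
        online=",${spec#*:},"
        count=0
        for r in $optional; do
            case $online in *",$r,"*) count=$((count + 1)) ;; esac
        done
        if [ "$count" -gt "$best" ]; then
            best=$count
            candidates=$spec
        elif [ "$count" -eq "$best" ]; then
            candidates="$candidates $spec"
        fi
    done
    # Step 2: break ties by walking the list in order. A resource that is
    # ONLINE on at least one remaining member removes the members on which
    # it is not ONLINE.
    for r in $optional; do
        set -- $candidates
        [ "$#" -le 1 ] && break
        kept=""
        for spec in "$@"; do
            case ",${spec#*:}," in *",$r,"*) kept="$kept $spec" ;; esac
        done
        if [ -n "$kept" ]; then
            candidates=$kept
        fi
    done
    # Any members still tied are resolved by the placement policy.
    for spec in $candidates; do
        printf '%s\n' "${spec%%:*}"
    done
}
```

For example, with OPTIONAL_RESOURCES set to "net1 tape1", a member with only tape1 ONLINE and a member with only net1 ONLINE tie on count, and the member with net1 ONLINE is kept because net1 comes first in the list.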

2.2.3    Network Resource Profiles

Table 2-2 describes the network profile attributes. For each attribute, the table indicates whether the attribute is required, its default value, and a description.

Table 2-2:  Network Profile Attributes

Attribute | Required | Default | Description
TYPE | Yes | None | The type of the resource. The type network is for network resources.
NAME | Yes | None | The name of the resource. The resource name is a string that contains a combination of letters a-z or A-Z, digits 0-9, or the underscore (_) or period (.). The resource name may not start with a period.
DESCRIPTION | No | None | A description of the resource.
SUBNET | Yes | None | The subnet address of the network resource in nnn.nnn.nnn.nnn format (for example, 16.140.112.0). The SUBNET value is the bitwise AND of the IP address and the netmask. For example, an IP address of 16.69.225.12 with a netmask of 255.255.255.0 gives a subnet of 16.69.225.0.
FAILURE_THRESHOLD | No | 0 | The number of failures detected within FAILURE_INTERVAL before CAA sets the resource's target to OFFLINE and no longer monitors it. Tracking of failures is disabled if the value is zero (0).
FAILURE_INTERVAL | No | 0 | The interval, in seconds, during which CAA applies the failure threshold. Tracking of failures is disabled if the value is zero (0).
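The SUBNET value in Table 2-2 is the bitwise AND of an IP address and its netmask, computed octet by octet. The following POSIX shell function is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes well-formed dotted-decimal input without leading zeros (which shell arithmetic would read as octal).

```shell
#!/bin/sh
# Hypothetical helper: print the subnet address (IP AND netmask) for a
# dotted-decimal IP address and netmask, for use as a SUBNET value.
# Usage: subnet_of IP_ADDRESS NETMASK
subnet_of() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2          # $1-$4: address octets, $5-$8: netmask octets
    IFS=$old_ifs
    printf '%d.%d.%d.%d\n' \
        $(($1 & $5)) $(($2 & $6)) $(($3 & $7)) $(($4 & $8))
}
```

For example, subnet_of 16.69.225.12 255.255.255.0 prints 16.69.225.0, matching the example in the table; the result is the value you would pass to caa_profile with the -s option.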

The following example creates a network resource profile:

# /usr/sbin/caa_profile -create network1 -t network -s "16.69.244.0" \
-d "Network1"
 

The contents of the profile in file /var/cluster/caa/profile/network1.cap created by the preceding command are as follows:

NAME=network1
TYPE=network
DESCRIPTION=Network1
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
SUBNET=16.69.244.0
 

For more information on the network resource profile syntax, see caa_profile(8) and caa(4).

Through routing, all members in a cluster can indirectly access any network that is attached to any member. Nevertheless, an application may require the improved performance that comes from running on a member with direct connectivity to a network. For that reason, an application resource may define an optional or required dependency on a network resource. CAA optimizes the placement of that application resource based on the location of the network resource.

When you make a network resource an optional resource (OPTIONAL_RESOURCES) for an application, the application may start on a member that is directly connected to the subnet, depending on the required resources, placement policy, and cluster state. If the network adapter fails, the application may still access the subnet remotely through routing.

If you specify a network resource as a required resource (REQUIRED_RESOURCES) and the network adapter fails, CAA relocates or stops the application. If the network fails on all eligible hosting members, CAA will stop the application.

2.2.4    Tape Resource Profiles

Table 2-3 describes the tape profile attributes. For each attribute, the table indicates whether the attribute is required, its default value, and a description.

Table 2-3:  Tape Profile Attributes

Attribute | Required | Default | Description
TYPE | Yes | None | The type of the resource. The type tape is for tape resources.
NAME | Yes | None | The name of the resource. The resource name is a string that contains a combination of letters a-z or A-Z, digits 0-9, or the underscore (_) or period (.). The resource name may not start with a period.
DESCRIPTION | No | None | A description of the resource.
DEVICE_NAME | Yes | None | The device name of the tape resource. Use the full path to the device special file (for example, /dev/tape/tape1).
FAILURE_THRESHOLD | No | 0 | The number of failures detected within FAILURE_INTERVAL before CAA sets the resource's target to OFFLINE and no longer monitors it. Tracking of failures is disabled if the value is zero (0).
FAILURE_INTERVAL | No | 0 | The interval, in seconds, during which CAA applies the failure threshold. Tracking of failures is disabled if the value is zero (0).

Through the device request dispatcher, all cluster members can indirectly access any tape device that is attached to any cluster member. Nevertheless, an application may require the improved performance that comes from running on a member with direct connectivity to the tape. For that reason, an application resource may define an optional or required dependency on a tape resource. CAA optimizes the placement of that application based on the location of the tape resource.

The following example creates a tape resource profile. After a tape resource has been defined in a resource profile, an application resource profile can designate it as a required or optional resource.

# /usr/sbin/caa_profile -create tape1 -t tape -n /dev/tape/tape1 -d "Tape Drive"
 

The contents of the profile that was created in the file /var/cluster/caa/profile/tape1.cap by the preceding command are as follows:

NAME=tape1
TYPE=tape
DESCRIPTION=Tape Drive
DEVICE_NAME=/dev/tape/tape1
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
 

2.2.5    Media Changer Resource Profiles

Table 2-4 describes the media changer profile attributes. For each attribute, the table indicates whether the attribute is required, its default value, and a description.

Table 2-4:  Media Changer Attributes

Attribute | Required | Default | Description
TYPE | Yes | None | The type of the resource. The type changer is for media changer resources.
NAME | Yes | None | The name of the resource. The resource name is a string that contains a combination of letters a-z or A-Z, digits 0-9, or the underscore (_) or period (.). The resource name may not start with a period.
DESCRIPTION | No | None | A description of the resource.
DEVICE_NAME | Yes | None | The device name of the media changer resource. Use the full path to the device special file (for example, /dev/changer/mc1).
FAILURE_THRESHOLD | No | 0 | The number of failures detected within FAILURE_INTERVAL before CAA sets the resource's target to OFFLINE and no longer monitors it. Tracking of failures is disabled if the value is zero (0).
FAILURE_INTERVAL | No | 0 | The interval, in seconds, during which CAA applies the failure threshold. Tracking of failures is disabled if the value is zero (0).

Through the device request dispatcher, all cluster members can indirectly access any media changer that is attached to any member. Nevertheless, an application may require the improved performance that comes from running on a member with direct connectivity to the media changer. For that reason, an application resource may define an optional or required dependency on a media changer resource. CAA optimizes the placement of that application based on the location of the media changer resource.

The following example creates a media changer resource profile. After a media changer resource has been defined in a resource profile, an application resource profile can designate it as a dependency.

# /usr/sbin/caa_profile -create mchanger1 -t changer -n /dev/changer/mc1 \
-d "Media Changer Drive"
 

The contents of the profile that was created in the file /var/cluster/caa/profile/mchanger1.cap by the preceding command are as follows:

NAME=mchanger1
TYPE=changer
DESCRIPTION=Media Changer Drive
DEVICE_NAME=/dev/changer/mc1
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
 

2.3    Writing Action Scripts

An application resource requires an action script, which CAA uses to start, stop, and relocate the application that it manages and monitors.

You use action scripts to specify the following:

Action scripts are located by default in the clusterwide /var/cluster/caa/script directory. The file names of action scripts take the form name.scr.

The easiest way to create an action script is to have the caa_profile command automatically create one for you when you create the resource profile. Do this by using the -B option. For example:

# caa_profile -create resource_name -t application -B application_path
 

Use the -B option in the caa_profile command to specify the full pathname of an application executable; for example, /usr/bin/X11/xterm. When you use the -B option, the caa_profile command creates an action script named /var/cluster/caa/script/resource_name.scr. To specify a different action script name, use the -a option.
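Together with the ACTION_SCRIPT description in Table 2-1, these rules determine where a resource's action script lives: a full path is used as given, a bare file name is looked up in /var/cluster/caa/script, and with -B but no -a the file name is resource_name.scr. The hypothetical helper below encodes those rules for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not part of CAA): print the action script path
# for a resource, following the naming rules described in this chapter.
# Usage: action_script_path RESOURCE_NAME [ACTION_SCRIPT]
action_script_path() {
    script=${2:-$1.scr}                  # default name: resource_name.scr
    case $script in
        /*) printf '%s\n' "$script" ;;   # full path: used as given
        *)  printf '%s\n' "/var/cluster/caa/script/$script" ;;
    esac
}
```

For example, action_script_path clock prints /var/cluster/caa/script/clock.scr.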

Depending on the application, you might need to edit the action script to set up the environment for the application correctly. For example, for an X application such as xclock, you must set the DISPLAY environment variable in the action script, using the syntax appropriate for the script's shell. It might look something like this:

    DISPLAY=`hostname`:0
    export DISPLAY
 

Because an action script is required for an application resource, when you use the caa_profile -create command to create an application resource profile, one of the following conditions must be true:

2.3.1    Guidelines for Writing Application Resource Action Scripts

When writing an action script for an application resource, note the following:

For any X Window System applications that you run under CAA, you must also consider the following:

2.3.2    Action Script Examples

The action script template is located in /var/cluster/caa/template/template.scr. It is the basis for action scripts that are created by the caa_profile command, and it is a good example of the elements of an action script.
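For orientation, the following is a hypothetical skeleton of an application action script; it is not the contents of template.scr, which remains the recommended starting point. It assumes that CAA passes the entry point name (start, stop, or check) as the first argument, and that an exit status of zero means success while a nonzero status means failure. The APP_CMD and PIDFILE variables are assumptions for the example, and tracking the process through a PID file is one simple approach, not a CAA requirement.

```shell
#!/bin/sh
# Hypothetical action script skeleton (not template.scr).
# Assumptions: the entry point name arrives as $1, exit status 0 means
# success, and a nonzero status means failure.
APP_CMD=${APP_CMD:-/usr/bin/X11/xclock}    # application to run (example)
PIDFILE=${PIDFILE:-/var/run/clock.pid}     # hypothetical PID file location

action_entry() {
    case $1 in
        start)
            # X applications need DISPLAY set; uname -n prints the node name.
            DISPLAY=${DISPLAY:-$(uname -n):0}
            export DISPLAY
            $APP_CMD > /dev/null 2>&1 &
            echo $! > "$PIDFILE" || return 1
            return 0
            ;;
        stop)
            if [ -f "$PIDFILE" ]; then
                kill "$(cat "$PIDFILE")" 2> /dev/null
                rm -f "$PIDFILE"
            fi
            return 0               # already stopped also counts as success
            ;;
        check)
            # Succeed only if the recorded process still exists.
            if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2> /dev/null; then
                return 0
            fi
            return 1
            ;;
        *)
            echo "usage: $0 {start|stop|check}" >&2
            return 2
            ;;
    esac
}

# Dispatch on the entry point passed by the caller.
if [ "$#" -gt 0 ]; then
    action_entry "$1"
    exit $?
fi
```

Whatever structure you choose, start CAA resources only through caa_start and caa_stop rather than by running the script by hand, as noted in Section 2.5.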

The following action scripts for application resources can be used as examples and are found in the /var/cluster/caa/script directory:

The scripts shown in Section 2.12 are also good examples of action scripts. These example scripts and others, covering several applications that are commonly administered using CAA, can be found in the /var/cluster/caa/examples directory. The sysres_templ.scr script in this directory contains additional code for examining system load, swap space usage, and available disk space. If you want to use these features in your scripts, set the associated variables to values appropriate for your system.

2.4    Registering Resources (caa_register)

Every resource must have a profile and must be registered with CAA. Use the caa_register command to register your resources. For example, to register the clock application, enter the following command:

# /usr/sbin/caa_register clock
 

A resource must be registered before CAA can manage it. When a resource is registered, the information in its profile is stored in the database /var/cluster/caa/registry/caa.reg. If you later modify the profile, you must update the database with caa_register -u.

See caa_register(8) for more information.

2.5    Starting Application Resources (caa_start)

To start an application that is registered with CAA, use the caa_start command with the name of the application resource, which need not be the same as the name of the application. For example:

# /usr/sbin/caa_start clock
 

The following text is an example of the command output:

Attempting to start `clock` on member `polishham`
Start of `clock` on member `polishham` succeeded.
 

The application is now running on the system named polishham.

Each time the action script is called, caa_start waits up to SCRIPT_TIMEOUT seconds to receive notification of success or failure from the script.

Application resources can be started and non-application resources can be restarted if they have stopped due to exceeding their failure threshold values. (See the TruCluster Server Cluster Administration manual for more information on restarting non-application resources.) You must register a resource (caa_register) before you can start it.

Note

Always use caa_start and caa_stop, or the equivalent SysMan feature, to start and stop resources. Do not start or stop the applications manually at the command line or by executing the action scripts.

See caa_start(8) for more information.

When you try to start a resource that has required resources that are ONLINE on another cluster member, the start will fail. All required resources must either be OFFLINE or ONLINE on the member where the resource will be started.

If you use the command caa_start -f resource_name on a resource that has required resources that are OFFLINE, the resource starts and all required resources that are not currently ONLINE start as well.

Running the caa_start command on an application resource only sets the resource's target to ONLINE. CAA then attempts to make the state match the target by running the start entry point of the action script. When an application is running, both its target and its state are ONLINE.

The TruCluster Server Cluster Administration manual has a more detailed description of how target and state fields describe resources.
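The relationship between target and state can be summarized as a small decision function. This is a simplified model for illustration only (the function name is hypothetical), and it covers only the ONLINE and OFFLINE values used in this chapter.

```shell
#!/bin/sh
# Hypothetical model (not CAA code): the action CAA takes to make a
# resource's state match its target.
# Usage: reconcile TARGET STATE
reconcile() {
    case "$1:$2" in
        ONLINE:OFFLINE) echo start ;;   # caa_start sets TARGET=ONLINE
        OFFLINE:ONLINE) echo stop ;;    # caa_stop sets TARGET=OFFLINE
        *)              echo none ;;    # already matching, or not modeled
    esac
}
```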

2.6    Relocating Application Resources

Use the caa_relocate command to relocate application resources. You cannot relocate network, tape, or changer resources.

To relocate an application resource to an available cluster member, or to the cluster member specified, use the caa_relocate command. For example, to relocate the clock application to member provolone, enter the following command:

# /usr/sbin/caa_relocate clock -c provolone
 

The following text is an example of the command output:

Attempting to stop `clock` on member `polishham`
Stop of `clock` on member `polishham` succeeded.
Attempting to start `clock` on member `provolone`
Start of `clock` on member `provolone` succeeded.
 

To relocate the clock application to another member using the placement policy that is defined in the application resource's profile, enter the following command:

# /usr/sbin/caa_relocate clock
 

The following text is an example of the command output:

Attempting to stop `clock` on member `pepicelli`
Stop of `clock` on member `pepicelli` succeeded.
Attempting to start `clock` on member `polishham`
Start of `clock` on member `polishham` succeeded.
 

The following text is an example of the command output if the application cannot be relocated successfully due to a script returning a nonzero value or a script timeout:

Attempting to stop `clock` on member `pepicelli`
Stop of `clock` on member `pepicelli` succeeded.
Attempting to start `clock` on member `provolone`
Start of `clock` on member `provolone` failed.
No more members to consider
Attempting to restart `clock` on member `pepicelli`
Could not relocate resource clock.
 

Each time that the action script is called, the caa_relocate command will wait up to the SCRIPT_TIMEOUT value to receive notification of success or failure from the action script.

A relocate attempt will fail if:

If you use the caa_relocate -f resource_name command on a resource whose required resources are ONLINE, or on which other ONLINE resources depend, the resource is relocated together with all ONLINE resources that require it. All resources that the specified resource requires are relocated or started, regardless of their state.

See caa_relocate(8) for more information.

2.7    Stopping Application Resources (caa_stop)

To stop applications that are running in a cluster environment, use the caa_stop command. Immediately after the caa_stop command is executed, the target is set to OFFLINE. Because CAA always attempts to match a resource's state to its target, the CAA subsystem stops the application. Only application resources can be stopped. Network, tape, and media changer resources cannot be stopped.

In the following example, the clock application resource is stopped:

# /usr/sbin/caa_stop clock
 

The following text is an example of the command output:

Attempting to stop `clock` on member `polishham`
Stop of `clock` on member `polishham` succeeded.
 

If any resource that requires the resource you are stopping is ONLINE, the stop fails.

If you use the command caa_stop -f resource_name on such a resource, the resource is stopped together with all ONLINE resources that require it.

See caa_stop(8) for more information.
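The effect of the -f option on caa_stop can be modeled as shown below. This is a hypothetical sketch rather than CAA's implementation: it considers only resources that directly require the one being stopped, and it assumes that those dependents are stopped before the resource itself, an ordering that this section does not specify.

```shell
#!/bin/sh
# Hypothetical model (not CAA code) of caa_stop with and without -f.
# Usage: plan_stop RESOURCE FORCE "dependent:STATE ..."
#   FORCE is "yes" for caa_stop -f, anything else otherwise.
# Prints the resources to stop, one per line, or fails.
plan_stop() {
    resource=$1
    force=$2
    online_deps=""
    for dep in $3; do
        case $dep in
            *:ONLINE) online_deps="$online_deps ${dep%%:*}" ;;
        esac
    done
    if [ -n "$online_deps" ] && [ "$force" != yes ]; then
        echo "cannot stop $resource: required by$online_deps" >&2
        return 1
    fi
    for dep in $online_deps; do
        echo "$dep"             # assumption: dependents are stopped first
    done
    echo "$resource"
}
```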

2.8    Unregistering Application Resources

To unregister an application resource, use the caa_unregister command.

In the following example, the clock application is unregistered:

# /usr/sbin/caa_unregister clock
 

You cannot unregister an application that is ONLINE or required by another resource.

See caa_unregister(8) for more information.

2.9    CAA Status Information (caa_stat)

This section describes how to display status information on CAA resources.

To display status information on resources on cluster members, use the caa_stat command.

In the following example the status information for the clock resource is displayed:

# /usr/bin/caa_stat clock
 
NAME=clock
TYPE=application
TARGET=ONLINE
STATE=ONLINE on provolone
 

To view information on all resources, enter the following command:

# /usr/bin/caa_stat
 
NAME=clock
TYPE=application
TARGET=ONLINE
STATE=ONLINE on provolone
NAME=dhcp
TYPE=application
TARGET=ONLINE
STATE=ONLINE on polishham
 
NAME=named
TYPE=application
TARGET=ONLINE
STATE=ONLINE on polishham
 
NAME=network1
TYPE=network
TARGET=ONLINE on provolone
TARGET=ONLINE on polishham
STATE=ONLINE on provolone
STATE=ONLINE on polishham
 

To view information on all resources in a tabular form, enter the following command:

# /usr/bin/caa_stat -t
 
Name             Type           Target      State     Host
-------------------------------------------------------------------
cluster_lockd    application    ONLINE      ONLINE    provolone
dhcp             application    OFFLINE     OFFLINE
engine_server    application    OFFLINE     OFFLINE
network1         network        ONLINE      ONLINE    provolone
network1         network        ONLINE      ONLINE    polishham
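Because the tabular output puts one resource instance per line, it is easy to post-process. For example, the following hypothetical filter reads caa_stat -t output on standard input and reports instances whose State differs from their Target, such as an application whose target is ONLINE but which is currently OFFLINE. It assumes the column layout shown above (Name, Type, Target, State, Host).

```shell
#!/bin/sh
# Hypothetical filter (not part of CAA): read `caa_stat -t` output on
# standard input and print "name target state" for every row whose
# State column differs from its Target column.
list_mismatched() {
    awk '
        $1 == "Name" || $1 ~ /^-/ { next }   # skip header and rule lines
        NF >= 4 && $3 != $4 { print $1, $3, $4 }
    '
}
```

For example: /usr/bin/caa_stat -t | list_mismatched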
 

To display, in addition to the normal status information, how many times a resource has been restarted or has failed within its failure interval, the maximum number of times that it can be restarted or fail, and its target state, enter the following command:

# /usr/bin/caa_stat -v
 
NAME=cluster_lockd
TYPE=application
RESTART_ATTEMPTS=30
RESTART_COUNT=0
FAILURE_THRESHOLD=0
FAILURE_COUNT=0
TARGET=ONLINE
STATE=ONLINE on provolone
 
NAME=dhcp
TYPE=application
RESTART_ATTEMPTS=1
RESTART_COUNT=0
FAILURE_THRESHOLD=3
FAILURE_COUNT=1
TARGET=ONLINE
STATE=OFFLINE
 
NAME=network1
TYPE=network
FAILURE_THRESHOLD=0
FAILURE_COUNT=0 on provolone
FAILURE_COUNT=0 on polishham
TARGET=ONLINE on provolone
TARGET=ONLINE on polishham
STATE=ONLINE on provolone
STATE=OFFLINE on polishham
 

To view verbose content in a tabular form, enter the following command:

# /usr/bin/caa_stat -v -t
 
Name           Type          R/RA   F/FT   Target    State     Host
---------------------------------------------------------------------------
cluster_lockd  application   0/30   0/0    ONLINE    ONLINE    provolone
dhcp           application   0/1    0/0    OFFLINE   OFFLINE
named          application   0/1    0/0    OFFLINE   OFFLINE
network1       network              0/5    ONLINE    ONLINE    provolone
network1       network              0/5    ONLINE    ONLINE    polishham
 

To view the profile information that is stored in the database, enter the following command:

# /usr/bin/caa_stat -p
 
NAME=cluster_lockd
TYPE=application
ACTION_SCRIPT=cluster_lockd.scr
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=5
DESCRIPTION=Cluster lockd/statd
FAILOVER_DELAY=30
FAILURE_INTERVAL=60
FAILURE_THRESHOLD=1
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=2
SCRIPT_TIMEOUT=60
 
NAME=ln0
TYPE=network
DESCRIPTION=
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
SUBNET=16.69.224.0
 

See the TruCluster Server Cluster Administration manual and caa_stat(1) for more information.

2.10    Graphical User Interfaces

The following sections discuss how to use the SysMan and SysMan Station graphical user interfaces (GUIs) to manage CAA.

2.10.1    Using SysMan Menu to Manage CAA

You can start SysMan Menu from the command line with /usr/sbin/sysman. To access the CAA tools, select the Cluster Application Availability (CAA) Management task under the TruCluster Specific branch.


    .
    .
    .
    + TruCluster Specific
        | Cluster Application Availability (CAA) Management

To start only the Cluster Application Availability (CAA) Management task, use /usr/sbin/sysman caa.

See the Tru64 UNIX System Administration manual for more information on accessing SysMan Menu.

Using the SysMan Menu you can:

The CAA GUI provides graphical assistance for cluster administration based on event reports from the Event Manager (EVM) and CAA daemon.

2.10.2    Using SysMan Station to Manage and Monitor CAA

SysMan Station gives users a comprehensive graphical view of their cluster. SysMan Station lets you view the current status of CAA resources on a whole cluster, and manage those resources. SysMan Station also contains the management tool SysMan Menu to manage individual CAA resources. See the Tru64 UNIX System Administration manual for further information on accessing the SysMan Station.

To access the CAA SysMan Menu tools in the SysMan Station, follow these steps:

  1. Select one of the views under Views, for example, CAA_Applications_(active) or CAA_Applications_(all).

  2. Select the cluster name under the Views window, for example, CAA_Applications_(active) View or CAA_Applications_(all) View.

  3. From the Tools menu, select SysMan Menu. The Cluster Application Availability (CAA) Management task is located under the TruCluster Specific branch.

For more detailed descriptions of the SysMan Menu and SysMan Station, see the online help or the Tru64 UNIX System Administration manual.

2.11    CAA Tutorial

This tutorial provides the basic instructions for quickly making an application highly available with CAA. For in-depth details on specific commands, see the reference pages for the individual CAA commands.

2.11.1    Preconditions

You must have root access to a two-member TruCluster Server cluster.

In this tutorial you use CAA to make the Tru64 UNIX application dtcalc highly available. Make sure that the test application /usr/dt/bin/dtcalc exists.

This example uses an X-based application only for demonstration purposes, because it lets you see the results of starts, stops, and relocations immediately. In practice, you are unlikely to need highly available applications of this sort.

2.11.2    Miscellaneous Setup

If you are making an application with a graphical interface highly available using CAA, make sure that you set your DISPLAY variable correctly in the ActionScript.scr file. Modify the DISPLAY variable, and copy the file ActionScript.scr into the scripts directory /var/cluster/caa/script.

Verify that the host on which you want to display the application is able to display X applications from the cluster. If you need to modify the access, execute a command similar to the following on the machine that is displaying the application:

# xhost + clustername
 

If you are not sure of the actual names of the members, look in the /etc/hosts file on your system. You can also use the clu_get_info command to get information on each cluster member, including its host name.

The following example shows the output of the clu_get_info command:

# clu_get_info
 
 
		   Cluster information for cluster deli
 
 
                Number of members configured in this cluster = 3
                memberid for this member = 3
                Quorum disk = dsk10h
                Quorum disk votes = 1
 
                   Information on each cluster member
 
                Cluster memberid = 1
                Hostname = polishham.zk4.com
                Cluster interconnect IP name = polishham-mc0
                Member state = UP
 
                Cluster memberid = 2
                Hostname = provolone.zk4.com
                Cluster interconnect IP name = provolone-mc0
                Member state = UP
 
                Cluster memberid = 3
                Hostname = pepicelli.zk4.com
                Cluster interconnect IP name = pepicelli-mc0
                Member state = UP
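If you script the tutorial steps, you can extract the short member names from clu_get_info output, for example to choose a target for caa_relocate -c. The sketch below assumes the "Hostname = name.domain" line format shown above; member_hostnames is a hypothetical helper name.

```shell
# member_hostnames   (hypothetical helper)
# Reads clu_get_info output on standard input and prints the short
# host name of each member, one per line.
member_hostnames() {
    awk -F'=' '
        /^[ \t]*Hostname[ \t]*=/ {
            name = $2
            gsub(/^[ \t]+|[ \t\r]+$/, "", name)   # trim surrounding blanks
            sub(/\..*/, "", name)                 # drop the domain part
            print name
        }
    '
}
```

For the output above, clu_get_info | member_hostnames would print polishham, provolone, and pepicelli, one per line.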
 

2.11.3    Example of an Action Script for dtcalc

The following example shows an action script that you can use for the dtcalc tutorial. Alternatively, you can use the more complex action script that the caa_profile command creates:

#!/usr/bin/ksh -p
#
# This action script will be used to launch dtcalc.
#
export DISPLAY=`hostname`:0
PATH=/sbin:/usr/sbin:/usr/bin
export PATH
CAATMPDIR=/tmp
 
CMDPATH=/usr/dt/bin
 
APPLICATION=${CMDPATH}/dtcalc
 
CMD=`basename $APPLICATION`
 
case $1 in
 
'start')   # [1]
        if [ -f $APPLICATION ]; then
                $APPLICATION &
                exit 0
        else
                echo "Error: $APPLICATION not found." >/dev/console
                exit 1
        fi
        ;;
'stop')    # [2]
        PIDLIST=`ps ax | grep $APPLICATION | grep -v 'caa_' \
                       | grep -v 'grep' | awk '{print $1}'`
        if [ -n "$PIDLIST" ]; then
                kill -9 $PIDLIST
        fi
        exit 0
        ;;
'check')   # [3]
        PIDLIST=`ps ax | grep $CMDPATH | grep -v 'grep' | awk '{print $1}'`
        if [ -z "$PIDLIST" ]; then
                PIDLIST=`ps ax | grep $CMD | grep -v 'grep' \
                        | awk '{print $1}'`
        fi
        if [ -n "$PIDLIST" ]; then
                exit 0
        else
                echo "Error: CAA could not find $CMD." >/dev/console
                exit 1
        fi
        ;;
esac
 

  1. The start entry point is executed when an application is started.

  2. The stop entry point is executed when an application is stopped.

  3. The check entry point is executed every CHECK_INTERVAL seconds.
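The ps | grep | grep -v pipelines in the example can match unrelated processes whose arguments merely contain the search string, such as the command line of the action script itself, whose file name also contains dtcalc. A stricter alternative, sketched below, compares only the command-name column. It assumes that ps supports the POSIX -e and -o comm= options (not verified here against Tru64 UNIX), and on some systems the reported command name is truncated (for example, to 15 characters on Linux). The function name is_running is hypothetical.

```shell
# is_running NAME   (hypothetical helper)
# Returns 0 if some process has the command name NAME exactly, 1 otherwise.
# Only the command-name column is compared, so a grep or an action script
# whose arguments contain NAME does not produce a false match.
is_running() {
    ps -e -o comm= | awk -v want="$1" '
        {
            n = $0
            sub(/^[ \t]+/, "", n)       # trim leading blanks
            sub(/[ \t]+$/, "", n)       # trim trailing blanks
            sub(/.*\//, "", n)          # some systems report a full path
            if (n == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

In a check entry point, the PIDLIST logic could then be replaced by is_running dtcalc && exit 0, followed by the existing error message and exit 1.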

2.11.4    Step 1: Creating the Application Resource Profile

Create the resource profile for dtcalc with the following options to the caa_profile command:

# /usr/sbin/caa_profile -create dtcalc -t application -B /usr/dt/bin/dtcalc \
-d "dtcalc application" -p balanced
 

When you examine the dtcalc.cap file that is located in /var/cluster/caa/profile/, you will see the following:

# cat dtcalc.cap
 
NAME=dtcalc
TYPE=application
ACTION_SCRIPT=dtcalc.scr
ACTIVE_PLACEMENT=0
AUTO_START=0
CHECK_INTERVAL=60
DESCRIPTION=dtcalc application
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=60

2.11.5    Step 2: Validating the Application Resource Profile

To validate the resource profile syntax, enter the following command:

# caa_profile -validate dtcalc
 

If there are syntax errors in the profile, caa_profile displays messages indicating that the profile did not pass validation.

2.11.6    Step 3: Registering the Application

To register the application, enter the following command:

# /usr/sbin/caa_register dtcalc
 

If the profile cannot be registered, messages are displayed explaining why.

To check that the application is registered, enter the following command:

# /usr/bin/caa_stat dtcalc
 
NAME=dtcalc
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
 

2.11.7    Step 4: Starting the Application

To start the application, enter the following command:

# /usr/bin/caa_start dtcalc
 

The following messages are displayed:

Attempting to start `dtcalc` on member `provolone`
Start of `dtcalc` on member `provolone` succeeded.
 

You can execute the /usr/bin/caa_stat dtcalc command to check that the dtcalc action script start entry point executed successfully and dtcalc is started. For example:

# /usr/bin/caa_stat dtcalc
 
NAME=dtcalc
TYPE=application
TARGET=ONLINE
STATE=ONLINE on provolone
 

If the DISPLAY variable is set correctly in the script, dtcalc appears on your display.

2.11.8    Step 5: Relocating the Application

To relocate the application, enter the following command:

# /usr/bin/caa_relocate dtcalc -c polishham
 

Execute the /usr/bin/caa_stat dtcalc command to verify that dtcalc is now running on polishham. For example:

# /usr/bin/caa_stat dtcalc
 
NAME=dtcalc
TYPE=application
TARGET=ONLINE
STATE=ONLINE on polishham
 

The STATE attribute shows the cluster member on which the application is now running.
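When you script a relocation instead of running it interactively, it can help to wait until caa_stat reports the expected state before continuing. The following is a minimal polling sketch. It assumes a single STATE line, as in the application output above; network resources report one STATE line per member and would need different handling. The function wait_for_state and the CAA_STAT override variable are hypothetical names.

```shell
# wait_for_state RESOURCE PATTERN [TRIES [DELAY]]   (hypothetical helper)
# Runs "caa_stat RESOURCE" up to TRIES times (default 10), DELAY seconds
# apart (default 3), until its STATE value matches the shell pattern
# PATTERN, for example 'ONLINE on polishham'. Returns 1 if it never does.
# Set CAA_STAT to use another command; it defaults to /usr/bin/caa_stat.
wait_for_state() {
    res=$1 want=$2 tries=${3:-10} delay=${4:-3}
    while [ "$tries" -gt 0 ]; do
        state=$(${CAA_STAT:-/usr/bin/caa_stat} "$res" 2>/dev/null |
                sed -n 's/^STATE=//p')
        case $state in
            $want) return 0 ;;          # PATTERN may contain * or ?
        esac
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example: /usr/bin/caa_relocate dtcalc -c polishham && wait_for_state dtcalc 'ONLINE on polishham'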

2.11.9    Step 6: Stopping the Application

To stop the application, enter the following command:

# /usr/bin/caa_stop dtcalc
 

The following information is displayed:

Attempting to stop `dtcalc` on member `polishham`
Stop of `dtcalc` on member `polishham` succeeded.
 

You can execute the /usr/bin/caa_stat dtcalc command to verify that the stop entry point of the dtcalc action script executed successfully and dtcalc is stopped. For example:

# /usr/bin/caa_stat dtcalc
 
NAME=dtcalc
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
 

2.11.10    Step 7: Unregistering the Application

To unregister the application, enter the following command:

# /usr/sbin/caa_unregister dtcalc
 

2.12    Example Applications Managed by CAA

The following sections contain examples of highly available single-instance applications that are managed by CAA. You can follow the examples to set up the specific applications listed or review them as good examples of the process of setting up any application for use with CAA.

2.12.1    Creating a Single-Instance, Highly Available Netscape FastTrack Server Using CAA

To create a highly available Netscape FastTrack server in a TruCluster Server environment, follow these steps:

  1. If the Netscape FastTrack server is not installed, unpack and install the Netscape FastTrack kit.

  2. Configure the Netscape server to use the default cluster alias name (tcralias2 in this example). If you do not want all cluster members to handle requests for the service, create a new cluster alias. See the TruCluster Server Cluster Administration manual for information about creating additional cluster aliases.

  3. Add the following entry to the /etc/clua_services file:

    http            80/tcp          in_single
     
    

    Setting the in_single attribute means that the cluster alias subsystem will distribute connection requests that are directed to the default cluster alias to one member of the alias. If that member becomes unavailable, the cluster alias subsystem will select another member of the default cluster alias to receive all requests.

  4. To reload service definitions, enter the following command on all members:

    # cluamgr -f
     
    

  5. Netscape has separate start and stop scripts. Combine the start and stop scripts into one script that CAA will use to start, stop, and verify the application. The following example shows a combined script:

    #!/usr/bin/ksh -p
    #
    # TruCluster V5 sample CAA script for Netscape Webserver
    #
    # Some initial setup
    #
    # 8<--------------8<----------- Start Custom variables 8<----------8<----------
    #
    svcName="netscape"                      # Servicename
    CAA_ADMIN="root"                        # Account to receive CAA mail
    CAALOGDIR="/var/cluster/caa/log"        # Directory for logfiles
    ACTION=$1                               # Action (start, stop, or check)
    LOG="${CAALOGDIR}/${ACTION}_${svcName}.$$" # Destination for script output
    #LOG="/dev/console"
    #
    # Application specific stuff
    #
    PROBE_PROCS="ns-admin ns-httpd"         # Processes to probe
    START_APPCMD="/cludemo/netscape/start-admin" # Application startup cmd
    START_APPCMD2="/cludemo/netscape/https-pingpong/start" # Application startup cmd
    STOP_APPCMD="/cludemo/netscape/https-pingpong/stop"  # Application stop cmd
    STOP_APPCMD2="/cludemo/netscape/stop-admin"   # Application stop cmd
    APPDIR="/cludemo/netscape"              # Application home directory
    ADVFSDIRS=" "                           # Application directories to
    #
    FUSER="/usr/sbin/fuser"                 # Command to use for closing
    EVMPOST="/usr/bin/evmpost -p 650 -a"    # EVM command to post events
    #
    export START_APPCMD START_APPCMD2 STOP_APPCMD STOP_APPCMD2 APPDIR
    export ADVFSDIRS PROBE_PROCS
    #
      
    .
    .
    .
    # Main section
    #
    # Start section
    #
    case $1 in
    'start')
            echo "" >> ${LOG}
            echo "Start action script for service : ${svcName}" \
                `/bin/date +"%A %d %B %H:%M:%S"` >> ${LOG}
            #
            # Start Netscape
            #
            echo "Starting Netscape ... " >> ${LOG}
            echo "Starting Netscape Admin Server ... " >> ${LOG}
            cd $APPDIR
            $START_APPCMD >> ${LOG}
            if [ $? -ne 0 ]; then
                    postevent "Netscape Admin Server" start
                    exit 2
            fi
            echo "Started Netscape Admin Server" >> ${LOG}
            $START_APPCMD2 >> ${LOG}
            if [ $? -ne 0 ]; then
                    postevent "Netscape https Server" start
                    exit 2
            fi
            echo "Started Netscape https Server" >> ${LOG}
            echo "Started Netscape." >> ${LOG}
            #
            # All done ...
            #
            ${EVMPOST} "Start action script for service ${svcName} DONE"
            echo "Start action script for service ${svcName} DONE," \
                `/bin/date +"%A %d %B %H:%M:%S"` >> ${LOG}
            echo "" >> ${LOG}
            exit 0
            ;;
    #
    # Stop section
    #
    'stop')
            echo "" >> ${LOG}
            echo "Stop action script for service : ${svcName}" \
                `/bin/date +"%A %d %B %H:%M:%S"` >> ${LOG}
            #
            # Stop Netscape: the https server first, then the Admin Server
            #
            echo "Stopping Netscape ... " >> ${LOG}
            echo "Stopping Netscape https Server ... " >> ${LOG}
            $STOP_APPCMD >> ${LOG}
            if [ $? -ne 0 ]; then
                    postevent "Netscape https Server" stop
                    exit 2
            fi
            echo "Netscape https Server shutdown done." >> ${LOG}
            echo "Stopping Netscape Admin Server ... " >> ${LOG}
            $STOP_APPCMD2 >> ${LOG}
            if [ $? -ne 0 ]; then
                    postevent "Netscape Admin Server" stop
                    exit 2
            fi
            echo "Netscape Admin Server shutdown done." >> ${LOG}
            echo "Netscape shutdown done." >> ${LOG}
            ${EVMPOST} "Stop action script for service ${svcName} DONE"
            echo "Stop action script for service ${svcName} DONE," \
                `/bin/date +"%A %d %B %H:%M:%S"` >> ${LOG}
            echo "" >> ${LOG}
            exit 0
            ;;
    #
    # Probe if application is still alive
    #
    'check')
            echo "Probing Netscape daemons at" \
                `/bin/date +"%A %d %B %H:%M:%S"` >> ${LOG}
            for i in ${PROBE_PROCS}
            do
                    probeapp ${i} >> ${LOG}
            done
            echo "Probing Netscape daemons DONE at" \
                `/bin/date +"%A %d %B %H:%M:%S"` >> ${LOG}
            exit 0
            ;;
    *)
            echo "usage: $0 {start|stop|check}"
            exit 1
            ;;
    esac

  6. Copy the Netscape CAA script to /var/cluster/caa/script/netscape.scr.

  7. Create a CAA application resource profile:

    # caa_profile -create netscape -t application
     
    

    Make sure that your Netscape CAA resource profile looks like the example profile in /var/cluster/caa/examples/NetScape/ns-httpd.cap.

  8. Register Netscape FastTrack server with CAA:

    # caa_register netscape
    # caa_stat netscape
     
    NAME=netscape
    TYPE=application
    STATE=OFFLINE
     
    

  9. Start the Netscape FastTrack server:

    # caa_start netscape
    # caa_stat netscape
     
    NAME=netscape
    TYPE=application
    STATE=ONLINE on provolone
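Steps 3 and 4 edit /etc/clua_services by hand. If you automate this setup, for example to reuse it for the Oracle8i and Informix examples, a guard against duplicate entries lets you rerun the step safely. This is a sketch: add_clua_service is a hypothetical helper, and it checks only whether the service name already appears in the first column.

```shell
# add_clua_service FILE NAME PORT/PROTOCOL ATTRIBUTE   (hypothetical helper)
# Appends "NAME<TAB>PORT/PROTOCOL<TAB>ATTRIBUTE" to FILE unless FILE
# already contains an entry whose first field is NAME.
add_clua_service() {
    file=$1 name=$2 portproto=$3 attr=$4
    if [ -f "$file" ] &&
       awk -v n="$name" '$1 == n { found = 1 } END { exit(found ? 0 : 1) }' "$file"
    then
        echo "$name is already defined in $file; not changed" >&2
        return 0
    fi
    printf '%s\t%s\t%s\n' "$name" "$portproto" "$attr" >> "$file"
}
```

For the Netscape example, this would be add_clua_service /etc/clua_services http 80/tcp in_single, followed by cluamgr -f on every member as in step 4.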
     
    

2.12.2    Creating a Single-Instance, Highly Available Apache HTTP Server Using CAA

To create a single-instance Apache HTTP server with failover capabilities, follow these steps:

  1. Download the latest, standard Apache distribution from the www.apache.org Web site to the cluster and follow the site's instructions for building and installing Apache in the /usr/local/apache directory.

  2. Create a default CAA application resource profile and action script with the following command:

    # caa_profile -create httpd -t application -B /usr/local/apache/bin/httpd
     
    

    The default profile adopts a failover policy that causes the httpd service to fail over to another member when the member on which it is running leaves the cluster. It also allows the httpd service to be placed on any active cluster member. You can edit the profile to employ other failover and placement policies and resource dependencies.

    The default action script contains a start entry point that starts the httpd service and a stop entry point that stops the httpd service.

  3. Register the profile with CAA by entering the following command on one member:

    # caa_register httpd
     
    

  4. Start the httpd service through CAA by entering the following command on one member:

    # caa_start httpd
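The steps above describe the start and stop entry points of the default action script. If you also want an entry point that tests whether Apache itself is still running, one common approach is to check the process recorded in Apache's PID file. The sketch below assumes the PID file is /usr/local/apache/logs/httpd.pid (Apache's usual logs/httpd.pid under this installation directory; confirm the PidFile directive in your httpd.conf). pid_alive is a hypothetical helper, and a stale PID file whose number has been reused by another process would give a false positive.

```shell
# pid_alive PIDFILE   (hypothetical helper)
# Returns 0 if PIDFILE is readable and names a process that still exists,
# 1 otherwise. "kill -0" sends no signal; it only tests for the process.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case $pid in
        ''|*[!0-9]*) return 1 ;;    # empty or not a plain number
    esac
    kill -0 "$pid" 2>/dev/null
}

# Possible use in the action script (PID file path is an assumption):
#   'check')
#           pid_alive /usr/local/apache/logs/httpd.pid && exit 0
#           exit 1
#           ;;
```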
     
    

2.12.3    Creating a Single-Instance Oracle8i Server Using CAA

To create a single-instance Oracle8i Version 8.1.6 or 8.1.7 database server with failover capabilities, follow these steps:

  1. Install and configure Oracle8i 8.1.6 or 8.1.7 using the instructions in the Oracle8i documentation.

    Oracle requires that certain kernel attributes be set to specific values, that specific UNIX groups (dba, oinstall) be created, and that special environment variables be initialized.

  2. Before proceeding to set up the CAA service for the Oracle8i single server, you must decide how client applications will reach the service. You can use either the cluster alias feature of the TruCluster Server product or use an interface (IP) alias. If you choose to use a cluster alias, create a new cluster alias for each Oracle8i single server because you can tune the routing and scheduling attributes of each alias independently. (For information on how to create a cluster alias, see cluamgr(8).)

    If you want to use a cluster alias, add the IP address and name of the cluster alias to the /etc/hosts file.

    Add the following line to the /etc/clua_services file to set up the properties of the port that the Oracle8i listener uses:

    listener   1521/tcp     in_single
     
    

    Setting the in_single attribute means that the cluster alias subsystem will distribute connection requests directed to the cluster alias to one member of the alias. If that member becomes unavailable, the cluster alias subsystem will select another member of that cluster alias to receive all requests.

    To reload service definitions, enter the following command on all members:

    # cluamgr -f
     
    

  3. If you choose to use an interface address as the target of client requests to the Oracle8i service, add the IP address and name of the interface alias to the /etc/hosts file.

  4. In the listener.ora and tnsnames.ora files, edit the HOST field so that it contains the alias that clients will use to reach the service. For example:

       .
       .
       .
       (ADDRESS = (PROTOCOL = TCP) (HOST = alias1) (PORT = 1521))
       .
       .
       .
     
    

  5. An example Oracle CAA script is located in /var/cluster/caa/examples/DataBase/oracle.scr. Copy the script to /var/cluster/caa/script/oracle.scr, and edit it to meet your environment needs such as e-mail accounts, log file destinations, alias preference, and so on. Do not include any file system references in the script.

  6. Perform some initial testing of the scripts by first executing the start and stop entry points outside of CAA. For example:

    # cd /var/cluster/caa/script
    # ./oracle.scr start
     
    

  7. Create a CAA application resource profile using the SysMan Station or by entering the following command:

    # caa_profile -create oracle -t application \
    -d "ORACLE Single-Instance Service" -p restricted -h "provolone polishham"
     
    

    Make sure that your Oracle CAA resource profile looks like the example profile in /var/cluster/caa/examples/DataBase/oracle.cap.

  8. Register the oracle profile with CAA using the SysMan Station or by entering the following command on one member:

    # caa_register oracle
     
    

  9. Start the oracle service using the SysMan Station or by entering the following command on one member:

    # caa_start oracle
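Step 4 changes the HOST field by hand. When several Oracle Net configuration files need the same change, a sed substitution can make it. This is a sketch: set_ora_host is a hypothetical helper that rewrites every (HOST = ...) clause in the file, matches only the uppercase keyword, and assumes the alias contains no characters that are special to sed, such as / or &. Review the result before restarting the listener.

```shell
# set_ora_host FILE HOSTNAME   (hypothetical helper)
# Rewrites each "(HOST = ...)" clause in FILE so that it names HOSTNAME,
# keeping the original file as FILE.orig.
set_ora_host() {
    file=$1 host=$2
    cp -p "$file" "$file.orig" || return 1
    # In a sed basic regular expression, "(" and ")" are literal characters.
    sed "s/(HOST *= *[^)]*)/(HOST = $host)/g" "$file.orig" > "$file"
}
```

Assuming the default Oracle Net file location, a call might look like set_ora_host $ORACLE_HOME/network/admin/tnsnames.ora alias1, repeated for listener.ora.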
     
    

2.12.4    Creating a Single-Instance Informix Server Using CAA

To create a single-instance Informix server with failover capabilities, follow these steps:

  1. Install and configure Informix using the instructions in the Informix documentation. Edit the clusterwide /etc/passwd and /etc/group files to contain entries for informix and dba, respectively.

  2. Before proceeding to set up the CAA service for the Informix single server, you must decide how client applications will reach the service. You can use either the cluster alias feature of the TruCluster Server product or use an interface (IP) alias. If you choose to use a cluster alias, create a new cluster alias for each Informix single server because you can tune the routing and scheduling attributes of each alias independently. (For information on how to create a cluster alias, see cluamgr(8).)

    If you want to use a cluster alias, add the IP address and name of the cluster alias to the /etc/hosts file.

    Add the following line to the /etc/clua_services file to set up the properties of the port that the Informix listener uses:

    informix   8888/tcp     in_single
     
    

    Setting the in_single attribute means that the cluster alias subsystem will distribute connection requests directed to the cluster alias to one member of the alias. If that member becomes unavailable, the cluster alias subsystem will select another member of that cluster alias to receive all requests.

    To reload service definitions, enter the following command on all members:

    # cluamgr -f
     
    

  3. If you choose to use an interface address as the target of client requests to the Informix service, add the IP address and name of the interface alias to the /etc/hosts file.

  4. An example Informix CAA script is located in /var/cluster/caa/examples/DataBase/informix.scr. Copy the script to /var/cluster/caa/script/informix.scr, and edit it to meet your environment needs such as e-mail accounts, log file destinations, and so on. Do not include any file system references in the script.

  5. Perform some initial testing of the scripts by first executing the start and stop entry points outside of CAA. For example:

    # cd /var/cluster/caa/script
    # ./informix.scr start
     
    

  6. Create a CAA application resource profile using the SysMan Station or by entering the following command:

    # caa_profile -create informix -t application \
    -d "INFORMIX Single-Instance Service" -p restricted -h "provolone polishham"
     
    

    Make sure that your Informix CAA resource profile looks like the example profile in /var/cluster/caa/examples/DataBase/informix.cap.

  7. Register the informix profile with CAA using the SysMan Station or by entering the following command on one member:

    # caa_register informix
     
    

  8. Start the informix service using the SysMan Station or by entering the following command on one member:

    # caa_start informix
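Step 5 here, like step 6 of the Oracle8i example, runs the action script by hand before CAA manages it. The sketch below, a hypothetical helper named try_action_script, runs the start, check, and stop entry points in that order, waits a configurable number of seconds after start, and then confirms that check reports failure once the application is stopped. That last test matters because a check entry point that always exits 0 cannot report a failed application (the dtcalc example's check exits with status 1 when the application is not running).

```shell
# try_action_script SCRIPT [SETTLE_SECONDS]   (hypothetical helper)
# Exercises an action script outside CAA: start, check, stop, then check
# again, which is expected to fail because the service is down.
# Returns 0 only if every step behaves as expected.
try_action_script() {
    script=$1 settle=${2:-5}
    for ep in start check stop; do
        "$script" "$ep"
        rc=$?
        echo "$script $ep: exit status $rc"
        [ "$rc" -eq 0 ] || return 1
        # Give the application time to come up before the first check.
        if [ "$ep" = start ]; then
            sleep "$settle"
        fi
    done
    if "$script" check; then
        echo "$script check: reports success after stop" >&2
        return 1
    fi
    echo "$script check after stop: failed as expected"
    return 0
}
```

For example: try_action_script /var/cluster/caa/script/informix.scr 30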