This chapter provides the following information:
A general overview of the cluster application availability (CAA) subsystem (Section 5.1)
A discussion of the CAA architecture (Section 5.2)
An introduction to CAA resources (Section 5.3)
A description of resource profiles and their use (Section 5.4)
A description of the action scripts used by CAA commands to manage applications and other resources (Section 5.5)
The cluster application availability (CAA) subsystem provides high availability for single-instance applications and the capability to monitor applications and the state of other types of resources, such as network interfaces, tape devices, and media changer devices. (A single-instance application runs on a single member of a cluster, and cannot be run on more than one member at a time.) A single instance of any application that can run on Tru64 UNIX can be made highly available in a cluster with CAA. For example, in a cluster, the daemons for BIND (named), DHCP (joind), and network locking (rpc.lockd and rpc.statd) are managed by CAA.
Each application under CAA control has a resource profile, which describes that application's resource requirements and the circumstances under which it can be relocated to another cluster member. CAA monitors the state of cluster members and resources to ensure that each application runs on a member that meets its resource requirements. Resource profiles can be created and managed through either a command-line interface or a graphical user interface (GUI).
CAA can automatically relocate an application to another cluster member if a required resource, or the current member itself, becomes unavailable. This feature requires no changes to the application itself, and can be used with any single-instance application. CAA also monitors resources so that it can restart application resources that have gone off line due to a resource failure.
Note
CAA's resource monitoring and application restart capabilities are enhancements to the type of application availability provided by available server environment (ASE) for user-defined services in previous TruCluster products.
Figure 5-1 shows how the failure of one member results in the failover of an application to the second member. If clients access the application through a cluster alias, the cluster alias subsystem automatically forwards connection requests to the second member.
Figure 5-1: Application Failover with CAA
The CAA subsystem consists of the following components:
A resource is a cluster software or hardware component that provides a service to end users or to other software components. Resources are the building blocks that CAA uses to make services highly available to clients. CAA supports the following types of resources: applications, network interfaces, tape drives, and media changers.
The resource manager communicates with all the components of the CAA subsystem, as well as the connection manager and the event manager (EVM).
The resource manager consists of all the CAA daemons running on cluster members. Each CAA daemon (caad) starts, stops, relocates, and restarts application resources when a required resource, the application itself, or a cluster member fails. Each cluster member runs a CAA daemon. These daemons are independent, but they communicate with each other, sharing information about the status of the resources.
The resource manager also uses resource monitors, each of which monitors the status of a particular type of resource.
A resource monitor is a shared library located in /var/cluster/caa/monitors, which is loaded by the resource manager, caad, at boot time. There is one resource monitor for each type of resource (application, network, tape, and media changer).
Resource profiles contain the information needed by the resource manager and monitors to control application relocation and monitor resources.
A resource profile contains keyword/value pairs that define a resource, its dependencies (for application resources), and how the resource is managed by CAA. Once the resource is registered with caa_register, the resource manager can use the resource profile. The caa_profile command and SysMan can create resource profiles, or they can be created in any text editor. Profiles that are created or modified using a text editor should be validated using caa_profile -validate to ensure correct syntax. Errors other than syntactical errors are detected at the time of registration. This two-stage validation allows for profiles to be created with dependencies on resources that are currently off line or yet to be created.
Resource profiles are located in the /var/cluster/caa/profile directory. The file names of resource profiles take the form resource_name.cap.
An action script is a set of commands used by CAA to start, stop, and check an application. The name of an application's action script is defined in that application's resource profile.
You can create or update an action script using the command-line interface, SysMan, or a text editor.
Action scripts are located in the /var/cluster/caa/script directory. The file names of action scripts take the form resource_name.scr.
The CAA subsystem provides the caa_profile, caa_register, caa_unregister, caa_start, caa_stop, caa_relocate, and caa_stat commands to manage and monitor resources. See caa(4) for a list of all CAA reference pages.
The command-line interface interacts with resource profiles, action scripts, and the resource manager.
SysMan Menu and SysMan Station provide graphical user interfaces (GUIs) to perform system management tasks for the cluster, cluster members, and CAA applications. For more information on using the GUIs for performing system management tasks for CAA applications, see sysman(8) and the online help for the SysMan Menu and SysMan Station.
The CAA GUI calls the command-line interface to interact with resource profiles, action scripts, and the resource manager.
Although the connection manager and event manager are not part of the CAA subsystem, the subsystem makes extensive use of these facilities.
Figure 5-2 shows a graphical representation of the CAA architecture.
Figure 5-2: CAA Architecture
A resource is a cluster software or hardware component that provides a service to end users or to other software components. Resources are the building blocks that CAA uses to make services highly available to clients. CAA supports the following types of resources:
application: an executable program. An application resource can have dependencies on other resources, including another application resource. In the resource profile that defines an application resource, these dependencies are defined as either required (REQUIRED_RESOURCES) or optional (OPTIONAL_RESOURCES).
If you define a resource as a required resource and the required resource becomes unavailable, CAA stops the application. CAA then attempts to restart the application on another member that has the required resources. If CAA cannot restart the application on another member because the other member is down or the placement policy forbids starting the application on that member, the application remains stopped. CAA will not restart the application until all required resources are available.
You can use optional resources in conjunction with required resources and the placement policy to help determine the optimal system on which to start an application. If an optional resource becomes unavailable, the application does not fail over.
network: a network interface.
All cluster members can indirectly access any network attached to any member. An application that makes extensive use of a network connection available on another cluster member can add traffic to the cluster interconnect, and slow down performance of both the application and the cluster. Defining a network resource as a required resource for an application is useful when you want an application to run on a member with direct connectivity to a specific network.
If you define a network resource as a required resource for an application and the network interface adapter fails, CAA relocates the application, or stops the application if relocation is not possible.
If you define a network resource as an optional resource for an application, CAA will start the application on a member that is directly connected to the network. If the subnet adapter fails, the application reverts to accessing the network indirectly.
tape or changer: a tape drive or media changer.
If you define a tape or media changer resource as a required resource for an application, the application always runs on a cluster member with direct connectivity to the tape device or changer. If the device fails, CAA attempts to relocate the application, or stops the application if relocation is not possible.
If you define a tape or media changer resource as an optional resource for an application, CAA attempts to start the application on a member with direct connectivity, but will also run the application on a member that does not have direct connectivity to the device. Running on a member with direct connectivity to a tape device is desirable to maximize performance.
Each resource has a resource profile, which defines the resource, lists any dependencies, and provides instructions for how CAA should manage the resource. A resource profile is a simple text file containing a list of keyword/value pairs described in caa(4). By default, all resource profiles are located in the /var/cluster/caa/profile directory. A resource profile must be registered through the caa_register command in order for CAA to monitor and manage the resource.
The following sections describe the two types of resource profiles:
Application resource profiles (Section 5.4.1)
Nonapplication resource profiles (Section 5.4.2)
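The life cycle of a profile described in this chapter (validate the syntax, register the resource, start it, and check its status) can be sketched as a short shell function. This is a minimal sketch, not a procedure taken from the CAA reference pages: the function name bring_up and the RUN variable are hypothetical, and passing the resource name as the only argument to each command is an assumption to confirm against caa_profile(8) and the other CAA reference pages. By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch: RUN defaults to "echo", so each command is printed, not executed.
# On a cluster member, set RUN to the empty string to run the commands for real.
RUN=${RUN-echo}

# bring_up NAME: validate, register, start, then report on one resource.
# Assumption: each CAA command accepts the resource name as its argument.
# The chain stops at the first command that fails.
bring_up() {
    $RUN caa_profile -validate "$1" &&
    $RUN caa_register "$1" &&
    $RUN caa_start "$1" &&
    $RUN caa_stat "$1"
}

bring_up named
```

Chaining the commands with && reflects the two-stage validation: a profile that fails the syntax check is never passed to caa_register, and a resource that fails registration is never started.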
5.4.1 Application Resource Profiles
For an application resource, a resource profile can contain the application's type, name, check interval, monitoring thresholds, resource dependencies (required resources), optional resources, hosting member list, placement policy, restart attempts, failover delay, auto start value, active placement value, and name of the resource's action script. Some keywords are optional. For example, the following sample named.cap resource profile does not set an active placement value, which means that the placement of the application will not be reevaluated when a member boots into the cluster.
# cat named.cap
TYPE = application
NAME = named
DESCRIPTION = BIND Server
CHECK_INTERVAL =
FAILURE_THRESHOLD = 0
FAILURE_INTERVAL = 0
REQUIRED_RESOURCES =
OPTIONAL_RESOURCES =
HOSTING_MEMBERS =
PLACEMENT = balanced
RESTART_ATTEMPTS =
FAILOVER_DELAY =
AUTO_START =
ACTION_SCRIPT = named.scr
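Because a profile is a plain list of keyword/value pairs, a script can read a single keyword from it with standard tools. The following is a minimal sketch, assuming one "KEYWORD = value" pair per line with optional white space around the equal sign; the function name profile_value is hypothetical and is not part of CAA.

```shell
#!/bin/sh
# profile_value FILE KEYWORD
# Prints the value of KEYWORD from a profile file (an empty line if the
# keyword is present but has no value). Returns 1 if the keyword is absent.
profile_value() {
    awk -v key="$2" '
        {
            eq = index($0, "=")            # position of the first "="
            if (eq == 0) next              # skip lines without a pair
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k) # trim white space around the keyword
            gsub(/^[ \t]+|[ \t]+$/, "", v) # trim white space around the value
            if (k == key) { print v; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

For example, profile_value /var/cluster/caa/profile/named.cap PLACEMENT would print the placement policy recorded in that profile, provided the file follows the layout assumed here.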
The caa(4) reference page provides detailed descriptions of each type of profile and keyword. In addition, see the TruCluster Server Highly Available Applications manual and caa_profile(8) for more information on the contents and creation of application resource profiles.
The remainder of this section takes a brief look at placement policies, hosting members, active placement, and failure threshold and failure interval. Action scripts are described in Section 5.5.
An application's placement policy determines where the application is started. Supported policies are balanced, favored, and restricted.
balanced
CAA favors starting or restarting the application resource on the member currently running the fewest application resources. Placement due to optional resources is considered first. Next, the host with the fewest application resources running is chosen. If no cluster member is favored by these criteria, any available member is chosen.
favored
CAA refers to the list of members in the HOSTING_MEMBERS attribute of the resource profile. Only cluster members that are both in this list and satisfy the required resources are eligible for placement consideration. Placement due to optional resources is considered first. If no member can be chosen based on optional resources, the order of the hosting members decides which member will run the application resource. If none of the members in the hosting member list are available, CAA favors placing the application resource on the member running the fewest application resources.
You must specify a hosting members list when you select a favored placement policy.
restricted
Similar to the favored placement policy, except that if none of the members on the hosting members list are available, CAA will not start or restart the application resource. A restricted placement policy ensures that the resource will never run on a member that is not on the list, unless you manually relocate it to that member.
You must specify a hosting members list when you select a restricted placement policy.
The hosting member list names, in order of preference, the members to consider when the application is started or relocated. A hosting member list is used only with the favored or restricted placement policies.
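The three placement policies can be compared side by side with a small model. The following is a simplified sketch of the selection order described above, not CAA's implementation: it ignores optional resources, assumes the candidate list already contains only available members that satisfy the required resources, and breaks ties in favor of the first member listed. The function name choose_member and the "member:count" notation are hypothetical.

```shell
#!/bin/sh
# choose_member POLICY "HOSTING_MEMBERS" "member:count member:count ..."
# Prints the chosen member. Returns 1 if no member may run the resource
# and 2 if the policy name is not recognized.
# The body runs in a subshell, so its variables and exits stay local.
choose_member() (
    policy=$1
    hosting=$2
    avail=$3

    # Print the candidate running the fewest application resources.
    least_loaded() {
        best=""
        best_n=""
        for entry in $avail; do
            m=${entry%%:*}
            n=${entry##*:}
            if [ -z "$best" ] || [ "$n" -lt "$best_n" ]; then
                best=$m
                best_n=$n
            fi
        done
        [ -n "$best" ] && echo "$best"
    }

    case $policy in
        balanced)
            least_loaded
            ;;
        favored|restricted)
            # The first hosting member that is also a candidate wins.
            for h in $hosting; do
                for entry in $avail; do
                    if [ "${entry%%:*}" = "$h" ]; then
                        echo "$h"
                        exit 0
                    fi
                done
            done
            # No hosting member is available: favored falls back to the
            # least-loaded member, restricted refuses to place the resource.
            if [ "$policy" = favored ]; then
                least_loaded
            else
                exit 1
            fi
            ;;
        *)
            exit 2
            ;;
    esac
)
```

The only difference between favored and restricted in this model is the fallback branch, which mirrors the distinction drawn in the policy descriptions.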
Active placement causes CAA to reevaluate the placement of an application when a member is added to the cluster or a member reboots. If a more highly favored cluster member joins the cluster and active placement is on, the application stops on its current member and restarts on the more favored member.
Failure threshold and failure interval values are used together to stop an application that repeatedly fails. If an application fails too many times during the failure interval time, the application is not started again. These values are considered only when a check of the application fails, and not at initial start attempts.
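The interaction of the two values can be modeled as a sliding window over recent failure times. The following is a minimal sketch under stated assumptions, not CAA's implementation: times are given in seconds, reaching the threshold (rather than exceeding it) counts as too many failures, and a threshold of 0 is assumed to turn failure tracking off. Check caa(4) for the exact semantics. The function name should_give_up is hypothetical.

```shell
#!/bin/sh
# should_give_up THRESHOLD INTERVAL NOW FAILURE_TIME...
# Returns 0 if the number of failures within the last INTERVAL seconds
# has reached THRESHOLD, and 1 otherwise.
# The body runs in a subshell, so its variables and exits stay local.
should_give_up() (
    threshold=$1
    interval=$2
    now=$3
    shift 3

    # Assumption: a threshold of 0 disables failure tracking.
    [ "$threshold" -gt 0 ] || exit 1

    recent=0
    for t in "$@"; do
        # Count only the failures that fall inside the window.
        if [ $((now - t)) -le "$interval" ]; then
            recent=$((recent + 1))
        fi
    done
    [ "$recent" -ge "$threshold" ]
)
```

With a threshold of 3 and an interval of 60, failures at times 50, 70, and 90 observed at time 100 all fall inside the window, so the function returns 0; if the first failure had occurred at time 10 instead, only two would count and it would return 1.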
The restart attempts value defines the maximum number of times that an application start or restart is attempted on one cluster member before that attempt is considered failed.
5.4.2 Nonapplication Resource Profiles
All other types of currently supported resources (network, tape, and media changer) have resource profiles that define which resource to monitor and specify the failure threshold and failure interval values. If a nonapplication resource fails too many times during the failure interval time, monitoring of the resource is stopped.
For tape and media changer resources, you identify the device to monitor by its device name; for a network resource, you must define a subnet.
See the TruCluster Server Highly Available Applications manual, caa_profile(8), and caa(4) for detailed descriptions of the contents and creation of resource profiles.
5.5 Action Scripts
An action script is a set of commands used by CAA to start, stop, and check an application. Only application resources have action scripts. The name of an action script is specified as the ACTION_SCRIPT value in the application's resource profile. By default, action scripts are located in the /var/cluster/caa/script directory, although they can be placed anywhere. The file names of action scripts take the form resource_name.scr.
The TruCluster Server Highly Available Applications manual provides examples of action scripts.
In function, an action script is similar to an available server environment (ASE) script and to the system initialization scripts located in the /sbin/init.d directory.
An action script has multiple entry points that are executed by the CAA commands when an application resource needs to be started or stopped. The start entry point is used by caa_start and caa_relocate to start an application, and the stop entry point is used by caa_stop and caa_relocate to stop an application. The check entry point is used by the resource manager to validate that an application is still running.
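The three entry points can be illustrated with a skeleton in the style of an /sbin/init.d script. This is a minimal sketch, not a script generated by caa_profile or SysMan: the PID file path, the APP_CMD stand-in (a sleep command in place of a real daemon), and the function names are all hypothetical, and the convention that a zero status means success and a nonzero status means failure is an assumption to verify against the TruCluster Server Highly Available Applications manual.

```shell
#!/bin/sh
# Hypothetical settings: replace with the real daemon and PID file location.
PIDFILE=${PIDFILE:-/tmp/myapp.pid}
APP_CMD=${APP_CMD:-"sleep 60"}   # stand-in for the application's start command

# start: launch the application in the background and record its PID.
app_start() {
    $APP_CMD </dev/null >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

# check: succeed only if the recorded process still exists.
app_check() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# stop: terminate the application if it is running and remove the PID file.
app_stop() {
    if app_check; then
        kill "$(cat "$PIDFILE")"
    fi
    rm -f "$PIDFILE"
}

# Dispatch on the entry point name passed as the first argument.
run_entry_point() {
    case $1 in
        start) app_start ;;
        stop)  app_stop ;;
        check) app_check ;;
        *)     echo "usage: start|stop|check" >&2; return 2 ;;
    esac
}

# A standalone script would end with:  run_entry_point "$1"; exit $?
```

Keeping each entry point in its own function makes it straightforward to customize one procedure, for example replacing the check with an application-level probe, without touching the others.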
Each action script has an associated timeout value defined in the application resource profile. If the action script does not finish executing within this time, CAA considers the start attempt a failure and will either attempt to start the application on another member or fail completely.
Both the caa_profile command and the SysMan suite of applications can be used to create simple action scripts when creating resource profiles. You may need to edit these action scripts to customize the start, stop, and check procedures for an application.