6    Cluster Alias

A cluster alias is an IP address that makes some or all of the systems in a cluster look like a single system to TCP and UDP applications. Figure 6-1 shows how a network client views the systems in a cluster with and without a cluster alias.

Figure 6-1:  Client's View of a Cluster With and Without Cluster Alias

Think of a cluster alias as a distributed virtual clusterwide network interface. In that sense, a cluster alias is conceptually similar to an ifconfig alias, where a single physical network interface responds to more than one IP address.

Note

Before TruCluster Server Version 5.0, TruCluster products used the asemgr command to control application failover. The asemgr command ran the ifconfig command to create IP aliases as needed. Because the cluster alias subsystem creates and manages aliases on a clusterwide basis, there is no longer any need to explicitly establish and remove IP aliases with ifconfig when an application fails over.

Each system in a cluster can receive packets addressed to a cluster alias. The receiving system silently redirects packets to a cluster member running the requested application or service.

A cluster can have more than one cluster alias. One alias, the default cluster alias, is created during cluster installation and all members can receive packets addressed to this alias. A cluster administrator can create additional aliases as needed. One suggestion is to use the default alias for a while, and then decide whether your site can benefit from additional aliases. In many cases, the default alias is sufficient; the TruCluster Server Cluster Administration manual describes a situation where a site uses two aliases for load balancing.

The cluster alias subsystem has the following main components:

Figure 6-2 provides a functional overview of the cluster alias subsystem components.

Figure 6-2:  Cluster Alias Functional Overview

The remainder of this chapter discusses the following topics:

6.1    Overview

Cluster aliases free clients from having to connect to specific cluster members for services. Just as clients can request a variety of services from a single host, clients can request a variety of services from a cluster alias. For example, you can telnet or rlogin to a cluster alias as you would to a single host.

Each system in a cluster explicitly joins the aliases to which it wants to belong. Once a system joins an alias, it is a member of that alias. Using the analogy that a cluster alias is similar to an address on a virtual network interface, joining an alias is similar to issuing an ifconfig up command for that alias interface. The member can now receive packets addressed to the alias.

Clients send Transmission Control Protocol (TCP) connection requests or User Datagram Protocol (UDP) messages to the IP address representing an alias. The cluster transparently routes the request or message to a cluster node that is a current member of that alias. The hop within the cluster uses the cluster interconnect, not network routing.

If a member of an alias is unavailable, the cluster stops sending packets to that member and routes packets to active members of that alias. As long as one member of an alias is active, the alias is available.
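The availability rule above can be pictured with a small model. This is only an illustrative sketch in Python, not part of the cluster alias software; the class name, method names, and member names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterAlias:
    """Toy model of one alias: which members joined it and which are down."""
    name: str
    joined: set = field(default_factory=set)  # members that joined the alias
    down: set = field(default_factory=set)    # members currently unavailable

    def join(self, member: str) -> None:
        self.joined.add(member)

    def mark_down(self, member: str) -> None:
        self.down.add(member)

    def mark_up(self, member: str) -> None:
        self.down.discard(member)

    def active_members(self) -> list:
        # Packets are routed only to members that joined and are still up.
        return sorted(self.joined - self.down)

    def is_available(self) -> bool:
        # The alias stays available while at least one member is active.
        return bool(self.active_members())


alias = ClusterAlias("deli")
for member in ("member1", "member2"):
    alias.join(member)

alias.mark_down("member1")
print(alias.active_members(), alias.is_available())

alias.mark_down("member2")
print(alias.active_members(), alias.is_available())
```

With one member down, the remaining member still receives alias traffic; only when every member of the alias is down does the alias itself become unavailable.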

6.2    The Default Cluster Alias

There is one special alias, called the default cluster alias. During installation, the cluster is given a name, which is stored in /etc/sysconfigtab as the value of the cluster_name attribute. The installation procedure adds an entry to /etc/hosts, which associates this cluster name with a user-specified default cluster alias IP address. For example, for a cluster named deli whose alias IP address is 16.140.112.209, the installation procedure adds the following entry to /etc/hosts:

16.140.112.209   deli.zk3.dec.com   deli
 

Each cluster member is a member of the default cluster alias. The command that makes a cluster member a member of the default cluster alias is in each member's /etc/clu_alias.config file. All cluster members automatically join the default cluster alias at boot time.

Figure 6-3 shows a three-node cluster with two cluster aliases. All members belong to alias A, the default cluster alias, but only two members belong to alias B.

Figure 6-3:  Cluster Using Two Aliases

Several standard Internet services use the IP address of the default cluster alias as the source address for outgoing packets. Therefore, all cluster alias IP addresses, including that of the default cluster alias, must be on a network accessible to cluster clients; that is, clients must be able to route to this subnet. For this reason, cluster alias IP addresses cannot be on the Memory Channel subnet used by the cluster for internal communication.

6.3    NFS and the Default Cluster Alias

To preserve single-system semantics and avoid NFS locking problems, when a cluster is configured as an NFS server, the default cluster alias is the IP address through which clients must request NFS services. NFS mount requests directed at individual cluster members are rejected.

UDP NFS packets are redirected to the cluster member that is serving the file system. All UDP NFS traffic for that file system will be handled on that member. If another cluster member becomes the CFS server for the file system, UDP packets are tunneled to the new server. UDP packets always follow the CFS server for the file system.

Note

Some clients, for example PCs, broadcast UDP requests when trying to find an NFS server. The cluster responds to these requests by returning the IP address of the default cluster alias. This ensures that later NFS client requests are sent to the default cluster alias.

TCP connection requests are assigned to a member based on the alias round-robin algorithm and alias selection weights. (Because there is no file system information in the connection request, the cluster cannot route the request to the member that is currently the CFS server for the file system.) All subsequent TCP NFS packets for that file system from that client are handled by the same member that was assigned the connection, regardless of file-system relocations.
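The difference between UDP and TCP handling of NFS traffic can be summarized in a short model. The following Python sketch is illustrative only; the class, its methods, and the member and file system names are hypothetical, and the member that accepts a TCP connection is simply passed in rather than chosen by the real round-robin algorithm.

```python
class NfsAliasRouting:
    """Toy model: UDP follows the CFS server, TCP stays with its first member."""

    def __init__(self, cfs_server):
        self.cfs_server = dict(cfs_server)  # file system -> member serving it
        self.tcp_owner = {}                 # client -> member that took the connection

    def relocate(self, filesystem, member):
        # Another member becomes the CFS server for the file system.
        self.cfs_server[filesystem] = member

    def route_udp(self, filesystem):
        # UDP packets always go to the current CFS server.
        return self.cfs_server[filesystem]

    def accept_tcp(self, client, member):
        # The member is assumed to have been chosen by round-robin and weights.
        self.tcp_owner[client] = member

    def route_tcp(self, client):
        # Later TCP packets stay on the assigned member, whatever CFS does.
        return self.tcp_owner[client]


routing = NfsAliasRouting({"/projects": "member1"})
routing.accept_tcp("clientA", "member2")
routing.relocate("/projects", "member3")

print(routing.route_udp("/projects"))  # follows the relocation
print(routing.route_tcp("clientA"))    # unchanged by the relocation
```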

6.4    Number of Aliases

After cluster installation, a site can define as many aliases as it needs. The cluster administrator determines which cluster members join which aliases.

For many clusters, the default cluster alias provides sufficient access for cluster clients. Whether or not a cluster will benefit from having additional aliases depends on the symmetry of the cluster, and whether you want all members to handle client requests for all services. Additional aliases are useful in the following situations:

6.5    Location of Alias IP Addresses

A cluster alias address can be in one of two types of subnets:

common subnet

A subnet connected to a physical network interface.

Using a common subnet for cluster aliases works well when the cluster is connected to only a single local area network, and that network is managed as a single IP address domain.

Cluster alias routing in a common subnet is based on proxy ARP support. For each alias, one cluster member acts as the proxy ARP master for that alias.

virtual subnet

A cluster alias resides in a virtual subnet if its address is in a subnet that is not associated with any physical interfaces. A virtual subnet is registered as a normal subnet with local routers.

If the cluamgr virtual option is assigned to an alias address, a cluster member will advertise a host route and a network route to the alias.

Note that multiple clusters on the same local area network (LAN) can use the same virtual subnet.

Note

A virtual subnet must not have any real systems in it.

The choice of subnet type depends mainly on whether the existing subnet (that is, the common subnet) has enough addresses available for cluster aliases. If addresses are not easily available on an existing subnet, consider creating a virtual subnet. A lesser consideration is that if a cluster is connected to multiple subnets, configuring a virtual subnet has the advantage of being 'uniformly reachable' from all of the connected subnets. However, this advantage is more a matter of style than substance. It does not make much practical difference which type of subnet you use for cluster alias addresses; do whatever makes the most sense at your site.

Regardless of the type of subnet, it must be configured so that packets from clients can be routed to alias addresses. Services that use cluster aliases will not be accessible to clients if those alias addresses are on a virtual or a common subnet that clients cannot reach.

A cluster alias address should not be a broadcast address or a multicast address, nor should it reside in the Memory Channel subnet.
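Before assigning an alias address, you can check a candidate against these restrictions. The following Python sketch uses the standard ipaddress module. The two subnet values are assumptions made for illustration only (this chapter does not state the Memory Channel subnet or the netmask of the example network); substitute the values used at your site.

```python
import ipaddress

# Hypothetical values for illustration; replace them with your site's subnets.
MEMORY_CHANNEL_SUBNET = ipaddress.ip_network("10.0.0.0/24")
COMMON_SUBNET = ipaddress.ip_network("16.140.112.0/24")
LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")


def alias_address_problems(addr: str) -> list:
    """Return the reasons why addr is unsuitable as a cluster alias address."""
    ip = ipaddress.ip_address(addr)
    problems = []
    if ip.is_multicast:
        problems.append("multicast address")
    if ip == LIMITED_BROADCAST or ip == COMMON_SUBNET.broadcast_address:
        problems.append("broadcast address")
    if ip in MEMORY_CHANNEL_SUBNET:
        problems.append("inside the Memory Channel subnet")
    return problems


for candidate in ("16.140.112.209", "16.140.112.255", "224.0.0.5", "10.0.0.3"):
    print(candidate, alias_address_problems(candidate) or "ok")
```

An empty list means the address passes these three checks; it does not confirm that clients can actually route to the address.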

6.6    Routing for Alias Addresses

An alias router is a cluster member that makes a cluster alias address known to the network and receives incoming packets for that alias. By default, all cluster members are configured as alias routers at boot time.

A cluster member does not have to be a member of an alias in order to route for that alias. However, the cluster member must have an entry for that alias in its /etc/clu_alias.config file. In the following example, this cluster member routes for alias1 and alias2, but will receive requests or packets only for alias1:

/usr/sbin/cluamgr -a alias=alias1,join
/usr/sbin/cluamgr -a alias=alias2
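To see at a glance which aliases a member only routes for and which it also joins, you can scan entries of the form shown above. The following Python sketch is an illustration, not a supported tool: it recognizes only the simple alias=name[,option...] form used in this example, and the function name is hypothetical.

```python
import shlex


def parse_cluamgr_line(line: str):
    """Return (alias_name, joins) for a 'cluamgr -a alias=NAME[,opt...]' line.

    Returns None for lines that do not match this simple form.
    """
    words = shlex.split(line, comments=True)
    if len(words) < 3 or not words[0].endswith("cluamgr") or "-a" not in words:
        return None
    idx = words.index("-a")
    if idx + 1 >= len(words):
        return None
    options = words[idx + 1].split(",")
    name = None
    for opt in options:
        if opt.startswith("alias="):
            name = opt[len("alias="):]
    if not name:
        return None
    # A member receives alias traffic only if the entry includes "join".
    return name, "join" in options


config = """/usr/sbin/cluamgr -a alias=alias1,join
/usr/sbin/cluamgr -a alias=alias2"""

for entry in config.splitlines():
    parsed = parse_cluamgr_line(entry)
    if parsed:
        name, joins = parsed
        print(name, "routes and receives" if joins else "routes only")
```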
 

Which cluster members route for which aliases in common subnets is determined by the router priority assigned to each alias on each cluster member. Section 6.9 describes router priority.

The cluster alias daemon, aliasd, transparently handles the routing configuration for cluster aliases, automatically adding any needed host routes for cluster aliases to that member's /etc/gated.conf.membern file. The daemon starts gated using this file as gated's configuration file rather than the member's /cluster/members/{memb}/etc/gated.conf file.

Note

The aliasd daemon supports only the Routing Information Protocol (RIP).

6.6.1    Common Subnet Routing

Cluster alias routing in a common subnet is based on proxy ARP support. For each alias, one cluster member acts as the proxy ARP master for that alias.

The node elected to advertise the alias configures the alias's IP address to be advertised using proxy ARP over a network interface in the same subnet. Other cluster members that join the alias, and that are configured to be routing members, become capable of taking over the proxy ARP function if necessary. When a designated alias router or network interface fails, other potential routers are notified. They then elect a new router for the alias.
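The election and takeover behavior can be sketched as follows, using the router priority (rpri) described in Section 6.9 as the selection criterion. This Python example is a simplified model, not the actual election protocol; the function name and member names are hypothetical, and breaking ties by member name is an assumption of the sketch.

```python
def elect_alias_router(rpri_by_member: dict, failed: set):
    """Pick the potential router with the highest router priority (rpri).

    rpri_by_member maps member name -> rpri for one alias.
    Ties are broken by member name, which is an assumption of this sketch.
    Returns None when no potential router is left.
    """
    candidates = {m: p for m, p in rpri_by_member.items() if m not in failed}
    if not candidates:
        return None
    return max(candidates, key=lambda m: (candidates[m], m))


rpri = {"member1": 3, "member2": 1, "member3": 2}

print(elect_alias_router(rpri, failed=set()))        # initial proxy ARP master
print(elect_alias_router(rpri, failed={"member1"}))  # takeover after a failure
```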

If a cluster is connected to two external common subnets, and if an alias address resides on one of those common subnets, interfaces directly connected to that subnet will use proxy ARP to advertise the alias. Interfaces connected to all subnets will also use host routes for the alias.

6.6.2    Virtual Subnet Routing

The alias daemon, aliasd, creates a /etc/gated.conf.membern file for each cluster member. The alias configuration process modifies this configuration file to advertise each alias address in a virtual subnet as a host route. No manual modification is required; each member's alias daemon automatically modifies that member's /etc/gated.conf.membern file to advertise a route to each cluster alias host address through each network interface on that member.

If the cluamgr virtual option is assigned to an alias address, the cluster member will advertise a network route to the virtual subnet.

6.6.3    Routing Example

Figure 6-4 shows a cluster with interfaces on three networks, two public common networks and one private virtual network. The default cluster alias IP address is on a virtual subnet. Because clients on the Red and Green networks are not on the same subnet as the alias, a host route to the address is advertised on both networks.

Figure 6-4:  Alias Routing Example

Note that although all alias addresses on virtual subnets are advertised through host routes, using a host route does not necessarily mean that an address resides on a virtual subnet.

6.7    Cluster Alias vMAC Support

When a cluster alias IP address is configured in a common subnet, one cluster member in that subnet will, based on its router priority (rpri) value for that alias, act as the alias's proxy ARP master. This member will respond to local ARP requests addressed to the alias, and will broadcast a gratuitous ARP packet to let other systems know the hardware media access control (MAC) address associated with the alias's IP address. The other local systems then update their ARP tables to reflect this cluster-alias-to-MAC association.

However, this broadcast packet is a problem for systems that do not understand gratuitous ARP packets. These systems will not become aware of changes in the cluster alias-to-MAC association until the normal timeout interval for their ARP tables has elapsed. A solution is to provide a virtual hardware address (vMAC address) for each cluster alias.

A virtual MAC address is a unique hardware address that can be automatically created for each alias IP address. An alias vMAC address follows the cluster alias proxy ARP master from node to node as needed. Regardless of which cluster member is serving as the proxy ARP master for an alias, the alias's vMAC address does not change.

The TruCluster Server Cluster Administration manual describes how to enable vMAC support for a cluster alias.

6.8    in_single and in_multi Services

Service ports accessed through a cluster alias are defined as either in_single or in_multi. These service port attributes determine the routing of network requests to applications, not whether an application can run on more than one member at the same time. From the point of view of the cluster alias subsystem:

By default, the cluster alias subsystem treats all services as in_single. For the cluster alias subsystem to treat a service's port as in_multi, the port must be registered as in_multi, either in /etc/clua_services or through a call to clua_registerservice(). See Section 6.10 for more information on service attributes.

A service whose port is designated as in_multi can take advantage of cluster aliasing to distribute incoming TCP connection requests and UDP packets among members of the alias. The alias subsystem provides load balancing through a weighted round-robin algorithm that distributes requests and packets among alias members. If one member of an alias cannot respond to client requests, the cluster alias software transparently distributes requests and packets among the remaining alias members.

Note

Cluster alias and CAA are separate subsystems with complementary but different functions. CAA is an application-control tool; cluster alias is a routing tool. CAA decides where an application will run; cluster alias decides how to get there. You cannot use CAA to control routing within the cluster; you cannot use cluster aliases to control where an application is running in the cluster. The TruCluster Server Cluster Administration manual provides more information on the differences between cluster alias and CAA.

The following two figures show how the alias subsystem distributes client requests for in_single and in_multi services. For the in_single service (Figure 6-5), all requests are sent to the alias member currently running the service. For the in_multi service (Figure 6-6), requests are distributed among all alias members.

Figure 6-5:  in_single Service Accessed Through Default Cluster Alias

Figure 6-6:  in_multi Service Accessed Through Default Cluster Alias
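For an in_single service, Section 6.10 explains that the first instance to bind to the port is flagged as active for the alias and that an inactive instance takes over if the active one fails. The following Python sketch models that bookkeeping. It is illustrative only; the class is hypothetical, and promoting the oldest remaining instance is an assumption, because this chapter does not say which inactive instance is chosen.

```python
class InSingleService:
    """Tracks which bound instance of an in_single service is active for the alias."""

    def __init__(self):
        self.bound = []     # members, in the order their instances bound the port
        self.active = None  # member whose instance receives alias traffic

    def bind(self, member):
        self.bound.append(member)
        if self.active is None:
            self.active = member  # the first instance to bind is flagged active

    def fail(self, member):
        self.bound.remove(member)
        if self.active == member:
            # Promote an inactive instance; choosing the oldest remaining
            # binder is an assumption of this sketch.
            self.active = self.bound[0] if self.bound else None

    def target(self):
        # All alias requests go to the single active instance.
        return self.active


svc = InSingleService()
for member in ("member1", "member2", "member3"):
    svc.bind(member)

print(svc.target())
svc.fail("member1")
print(svc.target())
```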

6.9    Alias Attributes

Alias attributes are member-specific. Each cluster member has its own view of an alias. For example, one cluster member could route for an alias but not be a member of that alias, but another cluster member could both route for that alias and be an end recipient for requests or messages addressed to that alias.

Aliases and their attributes are managed through the cluamgr command and the SysMan Menu. The SysMan Menu calls cluamgr as needed.

The following attributes control the routing and distribution of connection requests and packets among members of an alias. The descriptions are paraphrased from those in cluamgr(8), which describes these and other alias attributes.

router priority

The router priority (rpri) controls the proxy ARP router selection for an alias on a common subnet. For each alias in a common subnet, the cluster member with the highest router priority for that alias will route for that alias. (This option is not valid for an alias whose address is in a virtual subnet.)

When a cluster has more than one cluster alias, you can use router priority to spread the routing overhead for aliases among cluster members.

selection priority

The selection priority (selp) determines the order in which members of an alias receive new connection requests. The selection priority establishes a hierarchy within the members of an alias. Connection requests are distributed among those members sharing the highest selection priority value. If an alias has three members, two with selp=10 and one with selp=5, no connection requests or messages are given to the selp=5 member as long as either of the selp=10 members is available.

You can use selection priority values to set up a failover order for members of a particular cluster alias.

selection weight

The selection weight (selw) indicates the number of connections (on average) this member is given before connections are given to the next alias member with the same selp value. (The selp value determines the order in which members are eligible to receive requests or messages; the selw value determines how many requests or messages a member gets once it is eligible.)

Selection weight applies only to applications that are registered as in_multi services. (All traffic for an in_single service must go to the cluster member running that service.)

Selection weight and router priority address two different load balancing issues; selection weight is a way to balance application overhead within a cluster, and router priority is a way to balance alias-routing overhead within a cluster.

In general, the default router priority provides acceptable performance. The selection weight is probably more useful when balancing application loads within a heterogeneous cluster consisting of both large and small systems.
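The interaction between selection priority (selp) and selection weight (selw) can be sketched in a few lines. The following Python example is a simplified model under stated assumptions: it takes a fixed snapshot of available members, gives each eligible member exactly selw consecutive connections per turn, and uses hypothetical member names and values. The real subsystem distributes connections on average and reacts to membership changes.

```python
from itertools import cycle, islice


def eligible_members(members: dict, available: set) -> list:
    """members maps name -> (selp, selw). Keep available members with the top selp."""
    up = {m: v for m, v in members.items() if m in available}
    if not up:
        return []
    top = max(selp for selp, _ in up.values())
    return sorted(m for m, (selp, _) in up.items() if selp == top)


def connection_schedule(members: dict, available: set):
    """Return an endless iterator of members: selw connections per member per turn."""
    turn = [m for m in eligible_members(members, available)
            for _ in range(members[m][1])]
    return cycle(turn)


# Hypothetical alias: two large-priority members and one lower-priority spare.
members = {"big": (10, 3), "small": (10, 1), "spare": (5, 1)}

everyone = {"big", "small", "spare"}
print(list(islice(connection_schedule(members, everyone), 8)))

# The selp=5 member is used only when no selp=10 member is available.
print(list(islice(connection_schedule(members, {"spare"}), 3)))
```

In this model, selp decides who is eligible at all, and selw decides how the eligible members share new connections.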

6.10    Service Attributes

The /etc/clua_services file is similar in concept and syntax to the /etc/services file. The clua_services file provides a method for associating alias-related attributes with the port numbers used by services. (When application source code is available, the clua_registerservice() function serves the same purpose.) Any service with a fixed port assignment can have an entry in /etc/clua_services.

With the exception of the out_alias attribute, these attributes apply to services accessed through any cluster alias. The out_alias attribute, which applies only to connections originating from the cluster, is specific to the default cluster alias.

You can associate the following attributes with a service's port:

in_single

A service that, from the cluster alias point of view, runs on only one cluster member at a time, but can fail over to another instance of the service on another member should the active service go away. (Active, in this context, relates only to messages addressed to the cluster alias. All instances of a service are always active for their node's local IP address(es) unless the in_nolocal attribute is also set.) As each service binds to the application's port, the first is flagged as active for the alias, and the others flagged as inactive. If the active service fails, one of the inactive service daemons is marked as active.

Any port not explicitly listed in clua_services as in_multi, or registered as in_multi through a call to the clua_registerservice() function, is treated as in_single.

in_multi

Indicates a service that can run concurrently on two or more cluster members. For a service using UDP, each packet might go to a different alias member. For a service using TCP, each connection is bound to a single alias member, but different connections to the service from the same client might be established on different alias members.

An in_multi service must be explicitly registered, either in the /etc/clua_services file or through the clua_registerservice() function.

in_noalias

Indicates that the port will not honor connection requests to alias addresses.

in_nolocal

Indicates that the port will not honor connection requests to nonalias addresses. For TCP, the port will not accept connections; for UDP, the port will drop messages unless addressed to a cluster alias.

out_alias

Indicates that the default cluster alias is used as the source address whenever this port is used as a destination. Normally, outbound connections (or UDP messages) use the local IP address of the cluster member on which the client is running. It is often beneficial to use the cluster alias address as the source address for outbound traffic from the cluster (for example, to simplify authentication).

It is important to remember that the out_alias attribute applies only when the connection (assuming TCP, not UDP) is originated from the cluster; that is, the cluster is the client. If a process running on a cluster member initiates an outbound connection, and the destination port (the port representing that half of the connection that is not in the cluster) is flagged in the cluster's /etc/clua_services file as out_alias, the connection will use the default alias as its source address.

The same logic holds true when the outbound traffic is a UDP send, because each send can be viewed as a microconnection.

static

Indicates that the port cannot be assigned as a dynamic port. Any well-known port between 512 and 1024 (> 512 and < 1024) that is either assigned to a specific network service in /etc/services or is actively listened to by an application should be declared as static in /etc/clua_services.

The in_multi, in_single, and in_noalias attributes are mutually exclusive. The in_nolocal and in_noalias attributes are mutually exclusive. See clua_services(4) and clua_registerservice(3) for more information about the use of these attributes.
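The exclusivity rules above are easy to check mechanically before you register a port. The following Python sketch is illustrative only: it works on a plain list of attribute names rather than on the /etc/clua_services file itself, and the function name is hypothetical.

```python
KNOWN_ATTRIBUTES = {
    "in_single", "in_multi", "in_noalias", "in_nolocal", "out_alias", "static",
}

# Each set lists attributes that must not be combined on the same port.
MUTUALLY_EXCLUSIVE = [
    {"in_multi", "in_single", "in_noalias"},
    {"in_nolocal", "in_noalias"},
]


def attribute_conflicts(attrs) -> list:
    """Return a list of problems found in one port's attribute list."""
    attrs = set(attrs)
    problems = [f"unknown attribute: {a}" for a in sorted(attrs - KNOWN_ATTRIBUTES)]
    for group in MUTUALLY_EXCLUSIVE:
        clash = sorted(attrs & group)
        if len(clash) > 1:
            problems.append("mutually exclusive: " + ", ".join(clash))
    return problems


print(attribute_conflicts(["in_multi", "out_alias"]))
print(attribute_conflicts(["in_multi", "in_single"]))
print(attribute_conflicts(["in_nolocal", "in_noalias"]))
```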

6.11    RPC Services and Cluster Alias

RPC services can call either the clusvc_getcommport() function or the clusvc_getresvcommport() function to bind to a port. (Use clusvc_getresvcommport() when binding to a reserved (privileged) port, a port number in the range 0-1023.) Both functions call clua_registerservice() to automatically set the CLUASRV_MULTI (in_multi) attribute on the port.

Use the clusvc_getcommport() and clusvc_getresvcommport() functions in the following circumstances:

These two functions make it possible to run an RPC application on multiple cluster members, all accessible via a cluster alias. In addition to ensuring that each instance of an RPC application uses the same common port, the functions also inform the portmapper that the application is a multi-instance, alias application.

If you do not use one of these functions to bind to the port, you can still run multiple instances of the application, but only one instance will receive requests directed to a cluster alias.

6.12    Redirecting Packets Within a Cluster

A packet addressed to a cluster alias can arrive at any cluster member. This member must determine which cluster member should receive and process the packet. The alias subsystem monitors calls to bind() and listen(), and bases its decision on which members of an alias are available and the type of packet received. The following table shows how packets are redirected within a cluster:

New TCP/IP connection

Look at the packet and make a list of eligible members for the target port. Look for active listens on the port. Use the weighted round-robin algorithm to select a member from the list of active listening members. Forward the packet to the selected member.

Existing TCP/IP connection

Determine which alias member owns this connection. Forward the packet to the member.

UDP

Look at the packet and make a list of eligible members for the alias. Use the weighted round-robin algorithm to select the member that should get this packet. Forward the packet to the member.

ICMP (some ICMP packets must be handled in cluster-alias context)

Look at the packet and determine whether to handle it or forward it to another member. If needed, forward the packet to the member.
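The redirection rules for TCP and UDP can be sketched as a small dispatcher. This Python example is a simplified model, not the kernel implementation: the class and its arguments are hypothetical, selection weights are omitted (plain round-robin is used instead of weighted round-robin), and ICMP handling is left out.

```python
from itertools import cycle


class AliasDispatcher:
    """Toy model of how a receiving member picks the target for an alias packet."""

    def __init__(self, listeners, alias_members):
        self.listeners = listeners          # port -> members with an active listen
        self.alias_members = alias_members  # members eligible for UDP packets
        self.connections = {}               # (source, port) -> owning member
        self._rr = {}                       # round-robin iterators, one per key

    def _next(self, key, members):
        if not members:
            return None
        if key not in self._rr:
            self._rr[key] = cycle(members)
        return next(self._rr[key])

    def redirect(self, proto, src, dst_port, syn=False):
        if proto == "tcp":
            conn = (src, dst_port)
            if syn and conn not in self.connections:
                # New connection: choose among members listening on the port.
                member = self._next(("tcp", dst_port),
                                    self.listeners.get(dst_port, []))
                if member is not None:
                    self.connections[conn] = member
                return member
            # Existing connection: forward to the member that owns it.
            return self.connections.get(conn)
        if proto == "udp":
            # Each UDP packet is distributed independently.
            return self._next("udp", self.alias_members)
        raise ValueError("protocol not modeled in this sketch: " + proto)


d = AliasDispatcher({23: ["member1", "member2"]},
                    ["member1", "member2", "member3"])
print(d.redirect("tcp", ("clientA", 40000), 23, syn=True))
print(d.redirect("tcp", ("clientB", 40001), 23, syn=True))
print(d.redirect("tcp", ("clientA", 40000), 23))
```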