3    Connection Manager

Clustered systems share various data and system resources, such as access to disks and files. To achieve the coordination that is necessary to maintain resource integrity, the cluster must have clear criteria for membership and must disallow participation in the cluster by systems that do not meet those criteria.

The connection manager is a distributed kernel component that monitors whether cluster members can communicate with each other and enforces the rules of cluster membership.

An instance of the connection manager runs on each cluster member. These instances maintain contact with each other, sharing information such as the cluster's membership list. The connection manager uses a three-phase commit protocol to ensure that all members have a consistent view of the cluster.

This chapter describes quorum and votes (Section 3.1), how the connection manager calculates cluster quorum (Section 3.2), and how to use a quorum disk (Section 3.3).

3.1    Quorum and Votes

The connection manager ensures data integrity in the face of communication failures by using a voting mechanism. It allows processing and I/O to occur in a cluster only when a majority of votes are present. When the majority of votes are present, the cluster is said to have quorum.

The mechanism by which the connection manager calculates quorum and allows systems to become and remain cluster members depends on a number of factors, including expected votes, current votes, node votes, and quorum disk votes. This section describes these concepts.

3.1.1    What Is a Cluster Member?

The connection manager is the sole arbiter of cluster membership. A node that has been configured to become a cluster member, either through the clu_create or clu_add_member command, does not become a cluster member until it has rebooted with a clusterized kernel and is allowed to form or join a cluster by the connection manager. The distinction between a cluster member and a node that is merely configured to become one is important in any discussion of quorum and votes.

Once a node has formed or joined a cluster, the connection manager considers it a cluster member until someone uses clu_delete_member to remove it from the cluster. A disruption of communications in a cluster (such as that caused by broken or disconnected hardware) might cause an existing cluster to divide into two or more clusters. Such a division is known as a cluster partition, and after one occurs, nodes may consider themselves to be members of one cluster or another. However, as discussed in Section 3.2, the connection manager allows at most one of these clusters to function.

3.1.2    Node Votes

Node votes are the fixed number of votes that a given member contributes towards quorum. Cluster members can have either 1 or 0 (zero) node votes. Each member with a vote is considered to be a voting member of the cluster. A member with 0 (zero) votes is considered to be a nonvoting member.

Voting members can form a cluster. Nonvoting members can only join or maintain an existing cluster.

A member's votes are initially determined by the cluster_node_votes kernel attribute in the clubase subsystem of its member-specific /etc/sysconfigtab file.
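
For example, the clubase stanza of a voting member's /etc/sysconfigtab file might contain the following entry (the value shown is illustrative):

clubase:
        cluster_node_votes = 1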

3.1.3    Quorum Disk Votes

In certain cluster configurations, described in Section 3.3, you may enhance cluster availability by configuring a quorum disk. Quorum disk votes are the fixed number of votes that a quorum disk contributes towards quorum. A quorum disk can have either 1 or 0 (zero) votes.

Quorum disk votes are initialized from the cluster_qdisk_votes kernel attribute in the clubase subsystem of each member's /etc/sysconfigtab file.
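
For example, in a cluster with a one-vote quorum disk, each member's clubase stanza might contain the following entry (again, an illustrative value):

clubase:
        cluster_qdisk_votes = 1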

When configured, a quorum disk's vote plays a unique role in cluster formation because of the following rules enforced by the connection manager:

  -  A booting node cannot form a cluster unless it has quorum.

  -  A node cannot claim the quorum disk, and count its vote, until it is a cluster member.

In the situation where the booting node needs the quorum disk vote to achieve quorum, these rules create an impasse: the booting node would never be able to form a cluster.

The connection manager resolves this dilemma by allowing booting members to provisionally apply the quorum disk vote towards quorum. This allows a booting member to achieve quorum and form the cluster. Once it has formed the cluster, it claims the quorum disk. At that point, the quorum disk's vote is no longer provisional; it is real.
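
A minimal C sketch of this boot-time sequence follows. The structure and names are hypothetical and stand in for the real connection manager logic:

struct boot_state {
    int node_votes;      /* votes this booting node contributes        */
    int qdisk_votes;     /* configured quorum disk votes (0 or 1)      */
    int qdisk_claimed;   /* set only once the node is a cluster member */
};

/* Returns 1 if the booting node can form the cluster. */
int try_form_cluster(struct boot_state *s, int quorum)
{
    /* Count the quorum disk vote provisionally, even though the
     * disk cannot be claimed until the node is a member. */
    if (s->node_votes + s->qdisk_votes < quorum)
        return 0;           /* keep waiting for other voting members */
    s->qdisk_claimed = 1;   /* now a member: claim the disk; the
                             * provisional vote becomes real */
    return 1;
}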

3.1.4    Expected Votes

Expected votes are the number of votes the connection manager should expect when all configured votes are available. In other words, expected votes should be the sum of all node votes (see Section 3.1.2) configured in the cluster, plus the vote of the quorum disk, if one is configured (see Section 3.1.3). Each member brings its own notion of expected votes to the cluster; it is important that all members agree on the same number of expected votes.

The connection manager refers to the node expected votes settings of booting cluster members to establish its own internal clusterwide notion of expected votes, referred to as cluster expected votes. The connection manager uses its cluster expected votes value when determining the number of votes the cluster requires to maintain quorum, as explained in Section 3.2.

Use the clu_quorum command or the clu_get_info -full command to display the current value of cluster expected votes.

The clu_create and clu_add_member scripts automatically adjust each member's expected votes as a new voting member or quorum disk is configured in the cluster. The clu_delete_member command automatically lowers expected votes when a member is deleted. Similarly, the clu_quorum command adjusts each member's expected votes as a quorum disk is added or deleted, or node votes are assigned to or removed from a member. These commands ensure that the member-specific expected votes value is the same on each cluster member, and that it is the sum of all node votes and the quorum disk vote, if a quorum disk is configured.

A member's expected votes are initialized by the cluster_expected_votes kernel attribute in the clubase subsystem of its member-specific /etc/sysconfigtab file. Use the clu_quorum command to display a member's expected votes.
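
Because expected votes is a clubase kernel attribute, you can also query the running value directly with the sysconfig command; the following invocation and output are illustrative:

# sysconfig -q clubase cluster_expected_votes
clubase:
cluster_expected_votes = 4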

3.1.5    Current Votes

Current votes are the actual number of votes that are visible within the cluster. If expected votes are the number of configured votes in a cluster, current votes are the number of votes contributed by current members and any configured quorum disk that is on line.
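
For example, in a cluster configured with three voting members and no quorum disk, expected votes is 3. If one member is shut down, current votes drops to 2, while expected votes, which reflects configuration rather than which members are up, remains 3.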

3.2    Calculating Cluster Quorum

The quorum algorithm is the method by which the connection manager determines the circumstances under which a given member can participate in a cluster, safely access clusterwide resources, and perform useful work. The algorithm operates dynamically: that is, cluster events trigger its calculations, and the results of its calculations can change over the lifetime of a cluster. This section describes how the connection manager's quorum algorithm works.

The quorum algorithm operates as follows:

  1. The connection manager selects a set of cluster members upon which it bases its calculations. This set includes all members with which it can communicate. For example, it does not include configured nodes that have not yet booted, members that are down, or members that it cannot reach due to a hardware failure (for example, a detached cluster interconnect cable or a bad Memory Channel adapter).

  2. When a cluster is formed, and each time a node boots and joins the cluster, the connection manager calculates a value for cluster expected votes using the largest of the following values:

     -  The maximum of the member-specific expected votes settings of all current cluster members

     -  The sum of the node votes of all current cluster members, plus the quorum disk vote, if a quorum disk is configured

     -  The current value of cluster expected votes

    Consider a three-member cluster with no quorum disk. Each member has one vote and has its member-specific expected votes set to 3. The value of cluster expected votes is currently 3.

    A fourth member is then added to the cluster. When the new member boots, the connection manager calculates the new cluster expected votes as 4, the sum of node votes in the cluster.

    Use the clu_quorum or clu_get_info -full command to display the current value of cluster expected votes.

  3. Whenever the connection manager recalculates cluster expected votes (or resets cluster expected votes as the result of a clu_quorum -e command), it calculates a value for quorum votes.

    Quorum votes is a dynamically calculated clusterwide value, based on the value of cluster expected votes, that determines whether a given node can form, join, or continue to participate in a cluster. The connection manager computes the clusterwide quorum votes value using the following formula:

    quorum votes = round_down((cluster_expected_votes+2)/2)

    For example, consider the three-member cluster described in the previous step. With cluster expected votes set to 3, quorum votes is calculated as round_down((3+2)/2), or 2. When the fourth member is added successfully, quorum votes is calculated as round_down((4+2)/2), or 3. (A code sketch at the end of this section works through these calculations.)

    Note

    Expected votes (and, hence, quorum votes) are based on cluster configuration, rather than on which nodes are up or down. When a member is shut down, or goes down for any other reason, the connection manager does not decrease the value of quorum votes. Only member deletion and the clu_quorum -e command can lower the quorum votes value of a running cluster.

  4. Whenever a cluster member determines that the number of votes it can see has changed (a node has joined the cluster, an existing member has been deleted from the cluster, or a communications error is reported), it compares current votes to quorum votes.

    The action the member takes is based on the following conditions:

     -  If current votes is greater than or equal to quorum votes, the member continues to run and to participate in the cluster.

     -  If current votes is less than quorum votes, the member suspends all process activity and all I/O to cluster-accessible storage until current votes again reaches quorum votes. This state is known as a quorum hang.

Note that the comparison of current votes to quorum votes occurs on a member-by-member basis, although events may make it appear that quorum loss is a clusterwide event.

Depending upon how the member lost quorum, you may be able to remedy the situation by booting a member with enough votes for the member in quorum hang to achieve quorum. If all cluster members have lost quorum, your options are limited to booting a new member with enough votes for the members in quorum hang to achieve quorum, rebooting the entire cluster, or using the troubleshooting procedures discussed in the Cluster Administration manual.
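
The following short C program is a sketch under the assumptions of this section, not the actual connection manager code: it chooses cluster expected votes as the largest of the three candidate values from step 2, derives quorum votes with the round_down formula from step 3, and performs the comparison from step 4.

#include <stdio.h>

/* Largest of the three candidate values for cluster expected votes. */
static int cluster_expected_votes(int max_member_setting,
                                  int sum_of_votes, int previous)
{
    int v = max_member_setting;
    if (sum_of_votes > v)
        v = sum_of_votes;
    if (previous > v)
        v = previous;
    return v;
}

/* quorum votes = round_down((cluster expected votes + 2) / 2) */
static int quorum_votes(int expected)
{
    return (expected + 2) / 2;  /* C integer division rounds down */
}

int main(void)
{
    /* Three members, one vote each, member settings all 3, no quorum disk. */
    int expected = cluster_expected_votes(3, 3, 0);
    int quorum = quorum_votes(expected);
    printf("expected=%d quorum=%d\n", expected, quorum);

    /* A fourth voting member boots: the sum of node votes is now 4. */
    expected = cluster_expected_votes(3, 4, expected);
    quorum = quorum_votes(expected);
    printf("expected=%d quorum=%d\n", expected, quorum);

    /* Two members go down: current votes (2) < quorum votes (3). */
    int current = 2;
    printf("%s\n", current >= quorum ? "cluster has quorum"
                                     : "quorum hang");
    return 0;
}

Compiled and run, this prints expected=3 quorum=2, then expected=4 quorum=3, and finally quorum hang, matching the arithmetic in the steps above.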

3.3    Using a Quorum Disk

In a two-member cluster configuration where each member has one vote and expected votes is 2, the loss of a single member causes the cluster to lose quorum, suspending all applications. This type of configuration is not highly available.

A more realistic (but not substantially better) two-member configuration would assign one member 1 vote and the second member 0 (zero) votes. Expected votes would be 1. This cluster could lose its second member (the one with no votes) and remain up. However, it cannot afford to lose the first member (the voting one).

To foster better availability in such a configuration, you can designate a disk on a shared bus as a quorum disk. The quorum disk acts as a virtual cluster member whose purpose is to add one vote to the total number of expected votes. When a quorum disk is configured in a two-member cluster, the cluster can survive the failure of either the quorum disk or one member and continue operating.

For example, consider the two-member deli cluster without a quorum disk as shown in Figure 3-1.

Figure 3-1:  Two-Member deli Cluster Without a Quorum Disk

One member contributes 1 node vote and the other contributes 0, so cluster expected votes is 1. The connection manager calculates quorum votes as follows:

quorum votes = round_down((cluster_expected_votes+2)/2)
             = round_down((1+2)/2)
             = 1

The failure or shutdown of member salami causes member polishham to lose quorum. Cluster operations are suspended.

However, if the cluster includes a quorum disk (adding one vote to the total of cluster expected votes), and member polishham is also given a vote, expected votes will become 3 and quorum votes will become 2:

quorum votes = round_down((cluster_expected_votes+2)/2)
             = round_down((3+2)/2)
             = 2

Now, if either member or the quorum disk leaves the cluster, sufficient current votes remain to keep the cluster from losing quorum. The cluster shown in Figure 3-2 can continue operation.

Figure 3-2:  Two-Member deli Cluster with Quorum Disk Survives Member Loss

The clu_create utility allows you to specify a quorum disk at cluster creation and assign it a vote. You can also use the clu_quorum utility to add a quorum disk later in the life of a cluster; for example, when a clu_delete_member operation leaves a two-member cluster with compromised availability.

To configure a quorum disk, use the clu_quorum -d add command. For example, the following command defines /dev/disk/dsk11 as a quorum disk with one vote:

# clu_quorum -d add dsk11 1

Collecting quorum data for Member(s): 1 2
 
  Initializing cnx partition on quorum disk : dsk11h
 
  Successful quorum disk creation
# clu_quorum

Collecting quorum data for Member(s): 1 2
 
Quorum Data for Cluster: deli as of Thu  Mar  9 09:59:18 EDT 2000
 
Cluster Common Quorum Data
Quorum disk: dsk11h
        .
        .
        .
 

The following restrictions apply to the use of a quorum disk: