Clustered systems share various data and system resources, such as access to disks and files. To achieve the coordination that is necessary to maintain resource integrity, the cluster must have clear criteria for membership and must disallow participation in the cluster by systems that do not meet those criteria.
The connection manager is a distributed kernel component that monitors whether cluster members can communicate with each other, and enforces the rules of cluster membership. The connection manager:
Forms a cluster, adds members to a cluster, and removes members from a cluster
Tracks which members in a cluster are active
Maintains a cluster membership list that is consistent on all cluster members
Provides timely notification of membership changes using Event Manager (EVM) events
Detects and handles possible cluster partitions
An instance of the connection manager runs on each cluster member. These instances maintain contact with each other, sharing information such as the cluster's membership list. The connection manager uses a three-phase commit protocol to ensure that all members have a consistent view of the cluster.
This chapter provides the following information:
A discussion of quorum, votes, and cluster membership (Section 3.1)
A discussion of how the connection manager calculates quorum (Section 3.2)
When and how to use a quorum disk (Section 3.3)
3.1 Quorum and Votes
The connection manager ensures data integrity in the face of communication failures by using a voting mechanism. It allows processing and I/O to occur in a cluster only when a majority of votes are present. When the majority of votes are present, the cluster is said to have quorum.
The mechanism by which the connection manager calculates quorum and allows systems to become and remain cluster members depends on a number of factors, including expected votes, current votes, node votes, and quorum disk votes. This section describes these concepts.
3.1.1 What is a Cluster Member?
The connection manager is the sole arbiter of cluster membership.
A node that has been configured to become a cluster member, through either the clu_create or clu_add_member command, does not become a cluster member until it has rebooted with a clusterized kernel and is allowed to form or join a cluster by the connection manager. The difference between a cluster member and a node configured to become a cluster member is important in any discussion of quorum and votes.
Once a node has formed or joined a cluster, the connection manager considers it to be a cluster member until someone uses clu_delete_member to remove it from the cluster.
A disruption of communications in a cluster (such as that caused by broken or disconnected hardware) might cause an existing cluster to divide into two or more clusters. If the cluster divides, a condition known as a cluster partition, nodes may consider themselves to be members of one cluster or another. However, as discussed in Section 3.2, the connection manager allows at most one of these clusters to function.
3.1.2 Node Votes
Node votes are the fixed number of votes that a given member contributes towards quorum. Cluster members can have either 1 or 0 (zero) node votes. Each member with a vote is considered to be a voting member of the cluster. A member with 0 (zero) votes is considered to be a nonvoting member.
Voting members can form a cluster. Nonvoting members can only join or maintain an existing cluster.
A member's votes are initially determined by the cluster_node_votes kernel attribute in the clubase subsystem of its member-specific /etc/sysconfigtab file.
3.1.3 Quorum Disk Votes
In certain cluster configurations, described in Section 3.3, you may enhance cluster availability by configuring a quorum disk. Quorum disk votes are the fixed number of votes that a quorum disk contributes towards quorum. A quorum disk can have either 1 or 0 (zero) votes.
Quorum disk votes are initialized from the cluster_qdisk_votes kernel attribute in the clubase subsystem of each member's /etc/sysconfigtab file.
When configured, a quorum disk's vote plays a unique role in cluster formation. This is because of the following rules enforced by the connection manager:
A booting node cannot form a cluster unless it has quorum.
Before the node can claim the quorum disk and its vote, it must be a cluster member.
In the situation where the booting node needs the quorum disk vote to achieve quorum, these rules create an impasse: the booting node would never be able to form a cluster.
The connection manager resolves this dilemma by allowing booting members to provisionally apply the quorum disk vote towards quorum. This allows a booting member to achieve quorum and form the cluster. Once it has formed the cluster, it claims the quorum disk. At that point, the quorum disk's vote is no longer provisional; it is real.
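As a rough sketch of this bootstrap arithmetic (a simplified Python model, not the connection manager's actual implementation), consider a two-member cluster in which each member and the quorum disk contribute one vote; the quorum calculation itself is described in Section 3.2:

# Simplified model of the bootstrap arithmetic; the values are illustrative.

def quorum_votes(expected_votes):
    # Quorum formula from Section 3.2: round_down((expected votes + 2) / 2)
    return (expected_votes + 2) // 2

expected = 3                       # 2 node votes + 1 quorum disk vote
needed = quorum_votes(expected)    # 2

booting_member_votes = 1           # the first member to boot sees only its own vote

# Without the provisional rule, the booting node cannot claim the quorum disk
# vote until it is a member, so it never reaches quorum and never forms the cluster.
print(booting_member_votes >= needed)         # False

# With the provisional rule, the quorum disk vote is counted provisionally, the
# node achieves quorum and forms the cluster, then claims the disk, at which
# point the vote becomes real.
print(booting_member_votes + 1 >= needed)     # True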
3.1.4 Expected Votes
Expected votes are the number of votes the connection manager should expect when all configured votes are available. In other words, expected votes should be the sum of all node votes (see Section 3.1.2) configured in the cluster, plus the vote of the quorum disk, if one is configured (see Section 3.1.3). Each member brings its own notion of expected votes to the cluster; it is important that all members agree on the same number of expected votes.
The connection manager refers to the node expected votes settings of booting cluster members to establish its own internal clusterwide notion of expected votes, referred to as cluster expected votes. The connection manager uses its cluster expected votes value when determining the number of votes the cluster requires to maintain quorum, as explained in Section 3.2.
Use the clu_quorum command or the clu_get_info -full command to display the current value of cluster expected votes.
The clu_create and clu_add_member scripts automatically adjust each member's expected votes as a new voting member or quorum disk is configured in the cluster. The clu_delete_member command automatically lowers expected votes when a member is deleted. Similarly, the clu_quorum command adjusts each member's expected votes as a quorum disk is added or deleted, or node votes are assigned to or removed from a member. These commands ensure that the member-specific expected votes value is the same on each cluster member, and that it is the sum of all node votes and the quorum disk vote, if a quorum disk is configured.
A member's expected votes are initialized by the cluster_expected_votes kernel attribute in the clubase subsystem of its member-specific /etc/sysconfigtab file. Use the clu_quorum command to display a member's expected votes.
3.1.5 Current Votes
Current votes
are the actual number of votes
that are visible within the cluster.
If expected votes are the number
of configured votes in a cluster, current votes are the number of
votes contributed by current members and any configured quorum disk
that is on line.
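The distinction between configured votes and visible votes can be summarized in a short Python sketch (illustrative only; the member names and vote assignments are hypothetical):

# Hypothetical three-member cluster with a quorum disk.
node_votes = {"member1": 1, "member2": 1, "member3": 0}    # per-member node votes
quorum_disk_vote = 1

# Expected votes: the sum of all configured node votes plus the quorum disk
# vote, whether or not those contributors are currently available.
expected_votes = sum(node_votes.values()) + quorum_disk_vote     # 3

# Current votes: only the votes the cluster can actually see right now.
members_up = {"member1", "member3"}        # member2 is down
quorum_disk_online = True
current_votes = sum(votes for member, votes in node_votes.items()
                    if member in members_up)
if quorum_disk_online:
    current_votes += quorum_disk_vote      # current_votes is now 2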
3.2 Calculating Cluster Quorum
The quorum algorithm is the method by which the connection manager determines the circumstances under which a given member can participate in a cluster, safely access clusterwide resources, and perform useful work. The algorithm operates dynamically: that is, cluster events trigger its calculations, and the results of its calculations can change over the lifetime of a cluster. This section describes how the connection manager's quorum algorithm works.
The quorum algorithm operates as follows:
1. The connection manager selects a set of cluster members upon which it bases its calculations. This set includes all members with which it can communicate. For example, it does not include configured nodes that have not yet booted, members that are down, or members that it cannot reach due to a hardware failure (for example, a detached cluster interconnect cable or a bad Memory Channel adapter).
2. When a cluster is formed, and each time a node boots and joins the cluster, the connection manager calculates a value for cluster expected votes using the largest of the following values:
The maximum member-specific expected votes value from the set of proposed members selected in step 1.
The sum of the node votes from the set of proposed members selected in step 1, plus the quorum disk vote if a quorum disk is configured.
The previous cluster expected votes value.
Consider a three-member cluster with no quorum disk. Each member has one vote and has its member-specific expected votes set to 3. The value of cluster expected votes is currently 3.
A fourth member is then added to the cluster. When the new member boots, the connection manager calculates the new cluster expected votes as 4, the sum of node votes in the cluster.
Use the clu_quorum or clu_get_info -full command to display the current value of cluster expected votes.
3. Whenever the connection manager recalculates cluster expected votes (or resets cluster expected votes as the result of a clu_quorum -e command), it calculates a value for quorum votes.
Quorum votes is a dynamically calculated clusterwide value, based on the value of cluster expected votes, that determines whether a given node can form, join, or continue to participate in a cluster. The connection manager computes the clusterwide quorum votes value using the following formula:
quorum votes = round_down((cluster_expected_votes+2)/2)
For example, consider the three-member cluster described in the previous step. With cluster expected votes set to 3, quorum votes would be calculated as round_down((3+2)/2), or 2. In the case where the fourth member was added successfully, quorum votes would be calculated as 3 (round_down((4+2)/2)).
Note
Expected votes (and, hence, quorum votes) are based on cluster configuration, rather than on which nodes are up or down. When a member is shut down, or goes down for any other reason, the connection manager does not decrease the value of quorum votes. Only member deletion and the clu_quorum -e command can lower the quorum votes value of a running cluster.
4. Whenever a cluster member determines that the number of votes it can see has changed (a node has joined the cluster, an existing member has been deleted from the cluster, or a communications error is reported), it compares current votes to quorum votes.
The action the member takes is based on the following conditions:
If the value of current votes is greater than or equal to quorum votes, the member continues running or resumes (if it had been in a suspended state).
If the value of current votes is less than quorum votes, the member suspends all of its I/O and turns off all network interfaces except the Memory Channel interfaces. No commands that access a clusterwide resource work on that member. The member may appear to be hung.
This state is maintained until sufficient votes are added (that is, enough members have joined the cluster or the communications problem is mended) to bring current votes to a value greater than or equal to quorum votes.
Note that the comparison of current votes to quorum votes occurs on a member-by-member basis, although events may make it appear that quorum loss is a clusterwide event.
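The following Python sketch pulls the preceding steps together. It is a simplified, one-shot model of the arithmetic only; the real algorithm runs in the kernel and is recalculated as cluster events occur.

# Simplified model of the quorum arithmetic described in this section.

def cluster_expected_votes(member_expected, member_node_votes,
                           quorum_disk_vote, previous_value):
    # Step 2: take the largest of the maximum member-specific expected votes,
    # the sum of node votes plus any quorum disk vote, and the previous value.
    return max(max(member_expected),
               sum(member_node_votes) + quorum_disk_vote,
               previous_value)

def quorum_votes(expected):
    # Step 3: quorum votes = round_down((cluster expected votes + 2) / 2)
    return (expected + 2) // 2

def member_state(current_votes, quorum):
    # Step 4: a member runs only while the votes it can see meet quorum votes.
    return "running" if current_votes >= quorum else "suspended"

# Three-member cluster, one vote each, no quorum disk (as in the example above).
expected = cluster_expected_votes(member_expected=[3, 3, 3],
                                  member_node_votes=[1, 1, 1],
                                  quorum_disk_vote=0,
                                  previous_value=3)        # 3
quorum = quorum_votes(expected)                            # 2

print(member_state(current_votes=3, quorum=quorum))        # running
print(member_state(current_votes=1, quorum=quorum))        # suspended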
Depending upon how the member lost quorum, you may be able to remedy the situation by booting a member with enough votes for the member in quorum hang to achieve quorum. If all cluster members have lost quorum, your options are limited to booting a new member with enough votes for the members in quorum hang to achieve quorum, rebooting the entire cluster, or using the troubleshooting procedures discussed in the Cluster Administration manual.
3.3 Using a Quorum Disk
In a two-member cluster configuration, where each member has one node vote and expected votes has the value of 2, the loss of either member causes the cluster to lose quorum and suspends all applications. This type of configuration is not highly available.
A more realistic (but not substantially better) two-member configuration would assign one member 1 vote and the second member 0 (zero) votes. Expected votes would be 1. This cluster could lose its second member (the one with no votes) and remain up. However, it cannot afford to lose the first member (the voting one).
To foster better availability in such a configuration, you can designate a disk on a shared bus as a quorum disk. The quorum disk acts as a virtual cluster member whose purpose is to add one vote to the total number of expected votes. When a quorum disk is configured in a two-member cluster, the cluster can survive the failure of either the quorum disk or one member and continue operating.
For example, consider the two-member deli cluster without a quorum disk, as shown in Figure 3-1.
Figure 3-1: Two-Member deli Cluster Without a Quorum Disk
One member contributes 1 node vote and the other contributes 0, so cluster expected votes is 1. The connection manager calculates quorum votes as follows:
quorum votes = round_down((cluster_expected_votes+2)/2) = round_down((1+2)/2) = 1
The failure or shutdown of member salami causes member polishham to lose quorum. Cluster operations are suspended.
However, if the cluster includes a quorum disk (adding one vote to the total of cluster expected votes), and member polishham is also given a vote, expected votes will become 3 and quorum votes will become 2:
quorum votes = round_down((cluster_expected_votes+2)/2) = round_down((3+2)/2) = 2
Now, if either member or the quorum disk leaves the cluster, sufficient current votes remain to keep the cluster from losing quorum. The cluster shown in Figure 3-2 can continue operation.
Figure 3-2: Two-Member deli Cluster with Quorum Disk Survives Member Loss
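The arithmetic behind Figures 3-1 and 3-2 can be checked with a short Python sketch (illustrative only; the values mirror the text above):

# Quorum arithmetic for the deli examples.

def quorum_votes(expected):
    return (expected + 2) // 2

# Figure 3-1: salami has 1 node vote, polishham has 0, and there is no quorum disk.
print(quorum_votes(1))       # 1 -- losing salami leaves 0 current votes: quorum lost

# Figure 3-2: both members have 1 node vote and the quorum disk adds a third.
quorum = quorum_votes(3)     # 2
# Losing any one contributor (either member or the quorum disk) still leaves
# 2 current votes, so the cluster keeps quorum and continues operating.
print(2 >= quorum)           # True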
The clu_create utility allows you to specify a quorum disk at cluster creation and assign it a vote. You can also use the clu_quorum utility to add a quorum disk at some other point in the life of a cluster; for example, when the result of a clu_delete_member is a two-member cluster with compromised availability.
To configure a quorum disk, use the clu_quorum -d add command. For example, the following command defines /dev/disk/dsk11 as a quorum disk with one vote:

# clu_quorum -d add dsk11 1
Collecting quorum data for Member(s): 1 2
Initializing cnx partition on quorum disk : dsk11h
Successful quorum disk creation

# clu_quorum
Collecting quorum data for Member(s): 1 2
Quorum Data for Cluster: deli as of Thu Mar 9 09:59:18 EDT 2000
Cluster Common Quorum Data
Quorum disk: dsk11h
  .
  .
  .
The following restrictions apply to the use of a quorum disk:
A cluster can have only one quorum disk.
The quorum disk should be on a shared bus to which all cluster members are directly connected. If it is not, members that do not have a direct connection to the quorum disk may lose quorum before members that do have a direct connection to it.
The quorum disk must not contain any data.
The clu_quorum command will overwrite existing data when initializing the quorum disk. The integrity of data (or file system metadata) placed on the quorum disk from a running cluster is not guaranteed across member failures. This means that the member boot disks and the disk holding the clusterwide root (/) cannot be used as quorum disks.
The quorum disk can be quite small. The cluster subsystems use only 1 MB of the disk.
A quorum disk can have either 1 vote or no votes. In general, a quorum disk should always be assigned a vote. You might assign an existing quorum disk no votes in certain testing or transitory configurations, such as a one-member cluster (in which a voting quorum disk introduces a second point of failure).
You cannot use the Logical Storage Manager (LSM) on the quorum disk.