Chapter 8

Cluster Membership Manager

For information about how the cluster membership is managed and configured, and how the presence of peer nodes is monitored, see the following sections:

  • Introduction to the Cluster Membership Manager

  • Configuring the Cluster Membership

  • Monitoring the Presence of Peer Nodes

  • Using the Direct Link to Prevent Split Brain Errors

  • Multicast Transmission of Heartbeats

Introduction to the Cluster Membership Manager

The Cluster Membership Manager (CMM) is implemented by the nhcmmd daemon. There is a nhcmmd daemon on each peer node.

The nhcmmd daemon on the master node has the current view of the cluster configuration. It communicates its view to the nhcmmd daemons on the other peer nodes. The nhcmmd daemon on the master node determines which nodes are members of the cluster, and assigns roles and attributes to the nodes. It detects the failure of nodes and configures routes for reliable transport.

The nhcmmd daemon on the vice-master node monitors the status of the master node. If the master node fails, the vice-master node is able to take over as the master node.

The nhcmmd daemons on the master-ineligible nodes do not communicate with one another. Each nhcmmd daemon exports an API to do the following:

  • Notify clients of changes to the cluster

  • Notify services and applications when the cluster membership or master changes

Notification messages describe the change and the nodeid of the affected node. Clients can use notifications to maintain an accurate view of the peer nodes in the cluster.
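The way a client could consume such notifications can be sketched as follows. This is a minimal illustration, not the actual CMM API: the `Notification` shape and the change names (`JOINED`, `LEFT`, `MASTER_CHANGED`) are assumptions; the document only states that a notification describes the change and the nodeid of the affected node.

```python
from dataclasses import dataclass

# Hypothetical notification shape: the document says a notification
# carries the kind of change and the nodeid of the affected node.
@dataclass
class Notification:
    change: str   # e.g. "JOINED", "LEFT", "MASTER_CHANGED" (illustrative names)
    nodeid: int

class ClusterView:
    """Keeps a client-side picture of the peer nodes, updated per notification."""
    def __init__(self):
        self.members = set()
        self.master = None

    def apply(self, n: Notification):
        if n.change == "JOINED":
            self.members.add(n.nodeid)
        elif n.change == "LEFT":
            self.members.discard(n.nodeid)
        elif n.change == "MASTER_CHANGED":
            self.master = n.nodeid
```

Applying each notification as it arrives is what lets a client keep an accurate view of the cluster without polling.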

For further information about the nhcmmd daemon, see the nhcmmd(1M) man page.

You can use the CMM API to write applications that manage peer nodes or that register clients to receive notifications. For further information about writing applications that use the CMM API, see the Netra High Availability Suite Foundation Services 2.1 6/03 CMM Programming Guide.

Configuring the Cluster Membership

Cluster membership information is stored in the configuration files, cluster_nodes_table and nhfs.conf.
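As an illustration only, a fragment of nhfs.conf might contain per-node parameters such as the following. The parameter names Node.DomainId, Node.NIC0, and Node.NIC1 are the ones described later in this chapter; the values shown here are invented examples.

```
# Illustrative nhfs.conf fragment -- values are examples only
Node.DomainId=1
Node.NIC0=hme0
Node.NIC1=hme1
```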

At cluster startup, the cluster membership is configured as follows:

  1. Both of the master-eligible nodes retrieve the list of peer nodes and their attributes from the cluster_nodes_table, and configuration information from nhfs.conf. All other peer nodes retrieve configuration information from nhfs.conf.

  2. The nhcmmd daemon on the master node uses the list of nodes and their attributes to generate its view of the cluster configuration. It communicates this view to the nhcmmd daemons on the other peer nodes, including the vice-master node.

  3. Using the master node view of the cluster, the nhcmmd daemon on the vice-master node updates its local cluster_nodes_table.

The nhcmmd daemon on the master node updates its cluster_nodes_table and its view of the cluster configuration when a peer node is added, removed, or disqualified. The nhcmmd daemon on the master node communicates the updated view to the nhcmmd daemons on the other peer nodes. The vice-master node uses this view to update its local cluster_nodes_table. In this way, the master node and vice-master node always have an up-to-date view of the cluster.
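This propagation scheme can be sketched in a few lines. The class and method names below are invented for illustration; the point is only the flow the text describes: the master owns the authoritative view, pushes every change to the other peer daemons, and the vice-master mirrors the view into its local cluster_nodes_table copy.

```python
class MasterCmm:
    """Sketch of the master nhcmmd: owns the authoritative cluster view
    and pushes every change to the other peer daemons."""
    def __init__(self, nodes):
        self.view = set(nodes)     # stands in for cluster_nodes_table
        self.peers = []            # other nhcmmd daemons, vice-master included

    def attach(self, peer):
        self.peers.append(peer)
        peer.receive_view(self.view)

    def add_node(self, nodeid):
        self.view.add(nodeid)
        self._broadcast()

    def remove_node(self, nodeid):
        self.view.discard(nodeid)
        self._broadcast()

    def _broadcast(self):
        for p in self.peers:
            p.receive_view(self.view)

class ViceMasterCmm:
    """Sketch of the vice-master nhcmmd: mirrors the master's view
    into its local cluster_nodes_table copy."""
    def __init__(self):
        self.local_table = set()

    def receive_view(self, view):
        self.local_table = set(view)
```

Because every change is broadcast at the moment it happens, the vice-master's local table never lags behind the master's view by more than one update.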

Monitoring the Presence of Peer Nodes

Each peer node runs a daemon called nhprobed that periodically sends a heartbeat in the form of an IP packet. Heartbeats are sent through each of the two physical interfaces of each peer node. A heartbeat detected through a physical interface indicates that the node is reachable and that the physical interface is alive. If no heartbeat is detected for longer than the detection delay, the physical interface is considered to have failed. If both of a node's physical interfaces fail, the node itself is considered to have failed. The detection delay is 900 milliseconds: at least one heartbeat must be detected on an interface every 900 milliseconds for that interface to be considered alive.
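The detection rule above can be sketched as follows. This is a simplified model, not the nhprobed implementation: it tracks the last heartbeat time per interface and applies the 900-millisecond delay.

```python
DETECTION_DELAY = 0.9  # seconds; the documented detection delay is 900 ms

class HeartbeatMonitor:
    """Per-node failure detection over two physical interfaces.
    An interface has failed if no heartbeat arrived within the delay;
    the node has failed only when both interfaces have failed."""
    def __init__(self, now):
        self.last_seen = {"NIC0": now, "NIC1": now}

    def heartbeat(self, nic, now):
        self.last_seen[nic] = now

    def nic_alive(self, nic, now):
        return (now - self.last_seen[nic]) <= DETECTION_DELAY

    def node_alive(self, now):
        # One live interface is enough to consider the node reachable.
        return any(self.nic_alive(nic, now) for nic in self.last_seen)
```

Note that a single interface failure does not remove the node: the node is declared failed only when heartbeats stop on both interfaces.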

For more information about the nhprobed daemon, see the nhprobed(1M) man page.

Interaction Between the nhprobed Daemon and the nhcmmd Daemon

On the master-eligible nodes, the nhprobed daemon receives a list of nodes from the nhcmmd daemon. The nhprobed daemon monitors the heartbeats of the nodes on the list. On the master node, the list contains all of the master-ineligible nodes and the vice-master node. On the vice-master node, the list contains the master node only.

On the master-eligible nodes, the nhprobed daemon notifies the nhcmmd daemon when, for any node on its list, any of the following events occur:

  • One link becomes available, indicating that the node is accessible through the link.

  • One link becomes unavailable, indicating that the node is not accessible through the link.

  • The node becomes available, indicating that the first link to the node becomes available.

  • The node becomes unavailable, indicating that the last available link to the node becomes unavailable.
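The four events in the list above can be derived mechanically from per-link state, as the following sketch shows. The event names are invented for illustration; the logic is the one the list describes: a node becomes available when its first link comes up, and unavailable when its last link goes down.

```python
class LinkEventTracker:
    """Derives node-level events from per-link events: node available
    on the first live link, node unavailable when the last link drops."""
    def __init__(self):
        self.links_up = set()

    def link_up(self, link):
        first = not self.links_up          # no link was up before this one
        self.links_up.add(link)
        return "NODE_AVAILABLE" if first else "LINK_AVAILABLE"

    def link_down(self, link):
        self.links_up.discard(link)
        return "NODE_UNAVAILABLE" if not self.links_up else "LINK_UNAVAILABLE"
```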

When a node other than the master node becomes unavailable, the master node eliminates the node from the cluster. The master node uses the TCP abort facility to close communication to the node. When the master node becomes unavailable, a failover is triggered.

Using the Direct Link to Prevent Split Brain Errors

Split brain is an error scenario in which the cluster has two master nodes. A direct communication link between the master-eligible nodes prevents the occurrence of split brain when the communication between the master node and vice-master node fails.

As described in Monitoring the Presence of Peer Nodes, the nhprobed daemon on the vice-master node monitors the presence of the master node. If the nhprobed daemon on the vice-master node fails to detect the master node, the master node itself or the communication to the master node has failed. If this happens, the vice-master node uses the direct link to try to contact the master node.

  • If the vice-master node does not receive a reply from the master node by using the direct link, it is assumed that the master node has failed. The vice-master node becomes the master node.

  • If the vice-master node receives a reply from the master node by using the direct link, it is assumed that the communication to the master node has failed but the master node is alive. The vice-master node is rebooted.
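The decision rule in the two cases above can be summarized in a short sketch. The function and return value names are invented; the logic follows the text: the direct link is consulted only when heartbeats from the master stop.

```python
def vice_master_decision(master_heartbeat_seen, direct_link_reply):
    """Sketch of the vice-master's split-brain avoidance rule."""
    if master_heartbeat_seen:
        return "STAY_VICE_MASTER"
    if direct_link_reply:
        # The master is alive but unreachable over the cluster network:
        # rebooting the vice-master avoids ending up with two masters.
        return "REBOOT_SELF"
    # No heartbeat and no direct-link reply: the master is presumed dead.
    return "TAKE_OVER_AS_MASTER"
```

The key property is that the vice-master never takes over while the master can still answer on any channel, which is exactly what rules out a second master.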

The Node Management Agent can monitor the following statistics on the direct link:

  • The number of times that the vice-master node has requested to become the master node.

  • The state of the direct communication link. The state can be up or down.

For information about how to connect the direct link between the master-eligible nodes, see the Netra High Availability Suite Foundation Services 2.1 6/03 Hardware Guide.

Multicast Transmission of Heartbeats

Probe heartbeats are multicast. Each cluster on a local area network (LAN) is assigned to a different multicast group, and each network interface card (NIC) on a node is assigned to a different multicast group. For example, NICs connected to an hme0 Ethernet network are assigned to one multicast group, and NICs connected to an hme1 Ethernet network are assigned to another multicast group.

A heartbeat sent from one multicast group cannot be detected by another multicast group. Therefore, heartbeats sent from one cluster cannot be detected by another cluster on the same LAN. Similarly, for a cross-switched topology, heartbeats sent from one Ethernet network cannot be detected on another Ethernet network.

Multicast addresses are 32 bits wide. The lower 28 bits of the multicast address represent the multicast group. The multicast address is divided into the following parts:

  • Bits 28 to 31 are fixed.

  • Bits 23 to 27 identify the Foundation Services. For the Foundation Services, bits 23 to 27 are always set to 10100.

  • Bits 8 to 22 identify the cluster. The value for a given cluster is specified in the nhfs.conf file by the Node.DomainId parameter.

  • Bits 0 to 7 identify the NIC. The value for a given NIC is specified in the nhfs.conf file by the Node.NIC0 and Node.NIC1 parameters.
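The bit layout above can be turned into a concrete address with a few shifts. One assumption in this sketch: the document says bits 28 to 31 are fixed but not what they are fixed to, so the code assumes they hold 1110, the standard IPv4 multicast (class D) prefix. The DomainId and NIC values are examples.

```python
FS_ID = 0b10100  # bits 23-27, always set to 10100 for the Foundation Services

def fs_multicast_address(domain_id, nic_id):
    """Builds a heartbeat multicast address from the documented bit layout.
    Assumes the fixed bits 28-31 are 1110, the IPv4 multicast prefix."""
    assert 0 <= domain_id < (1 << 15)   # bits 8-22: cluster (Node.DomainId)
    assert 0 <= nic_id < (1 << 8)       # bits 0-7: NIC (Node.NIC0 / Node.NIC1)
    addr = (0b1110 << 28) | (FS_ID << 23) | (domain_id << 8) | nic_id
    # Render as dotted-quad notation.
    return ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

With this layout, two NICs on the same cluster differ only in the low byte of the address (for example, NIC 0 and NIC 1 of cluster 1 map to 234.0.1.0 and 234.0.1.1), so each NIC lands in its own multicast group, and two clusters differ in the DomainId bits, keeping their heartbeats invisible to each other.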
