Chapter 6
Recovering From Failover and Switchover Problems

For information about how to recover from problems associated with failover and switchover, see the following sections:

Two Master Nodes Are Elected at Runtime

During runtime, one master-eligible node should be the master node, and the other master-eligible node should be the vice-master node. When both master-eligible nodes act as master nodes, you have an error scenario called split brain. For information about split brain and the use of a direct link, see Two Master Nodes Are Elected at Startup.

If a split brain error occurs during runtime on a cluster with a direct link, perform the procedure in To Investigate Split Brain on Clusters With a Direct Link.

If a split brain error occurs during runtime on a cluster without a direct link, perform the procedure in To Investigate Split Brain During Runtime on Clusters Without a Direct Link.
# nhcmmstat -c all
Each master node should see itself as master, and see the other master as being out of the cluster.
Test the communication between the master nodes.
On the console of each master-eligible node, run:
# nhadm check starting
When this command is run on a node, it pings all of the other nodes in the cluster. If one master node cannot ping the other master node, the nodes are not communicating.
If the nodes are able to communicate with each other, go to Step 4.
If the nodes are not able to communicate with each other, examine the network interface values of the nodes.
For information, see "Examining the Cluster Networking Configuration" in the Netra High Availability Suite Foundation Services 2.1 6/03 Cluster Administration Guide.
When the problem is resolved, Reliable NFS should automatically detect the split-brain situation. Reliable NFS reboots the master-eligible nodes, so that there is a master node and a vice-master node.
Determine whether the spanning tree protocol is disabled, as described in Step 1 of To Investigate Why the Solaris Operating System Does Not Start on a Diskless Node.
If you cannot resolve this problem, contact your customer support center.
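The decision flow of the procedure above can be summarized in a small shell helper. This is only a sketch under stated assumptions: the function name `split_brain_triage`, the role label `MASTER`, and the result strings are all hypothetical, and the helper does not parse the output of `nhcmmstat` or `nhadm`. The operator reads that output and passes in what was observed. It also assumes that Step 4 of the procedure is the spanning tree check.

```shell
#!/bin/sh
# Hypothetical triage helper; it does NOT run nhcmmstat or nhadm itself.
#   $1: role that the first master-eligible node reports for itself
#   $2: role that the second master-eligible node reports for itself
#   $3: "yes" if the two nodes can ping each other, anything else if not
# The label MASTER and the printed result strings are illustrative only.
split_brain_triage() {
    role_a=$1
    role_b=$2
    can_ping=$3

    # Split brain means both master-eligible nodes act as master.
    if [ "$role_a" != "MASTER" ] || [ "$role_b" != "MASTER" ]; then
        echo "no-split-brain"
        return 0
    fi

    if [ "$can_ping" = "yes" ]; then
        # Nodes communicate: assumed to correspond to the spanning tree check.
        echo "check-spanning-tree"
    else
        # Nodes do not communicate: examine the network interface values.
        echo "check-network-interfaces"
    fi
}
```

For example, `split_brain_triage MASTER MASTER no` prints `check-network-interfaces`, which corresponds to examining the network interface values of the nodes.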
If a failover occurs during the boot or reboot of a diskless node, the DHCP files can be corrupted. If this problem occurs, perform the following procedure.
Confirm that the cluster has recovered from the failover.
For information, see "Reacting to a Failover" in the Netra High Availability Suite Foundation Services 2.1 6/03 Cluster Administration Guide.
Reconfigure the boot policy for the diskless node.
For information, see "Configuring DHCP for a Diskless Node" in the Netra High Availability Suite Foundation Services 2.1 6/03 Custom Installation Guide.
Reload the DHCP table on the master node and the vice-master node:
# pkill -HUP in.dhcpd
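Because the reload must be done on both the master node and the vice-master node, it can be convenient to wrap the command in a loop. The following is a minimal sketch, not part of the product: the function name `reload_dhcp_tables` and the `REMOTE_SH` variable are hypothetical, and it assumes that a remote shell such as `ssh` is configured between the administration host and the nodes. If no remote shell is available, run the `pkill` command on each console as shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: send SIGHUP to in.dhcpd on each node named on the
# command line. REMOTE_SH is an assumption; it defaults to ssh and must
# accept "<node> <command...>" as its arguments.
REMOTE_SH=${REMOTE_SH:-ssh}

reload_dhcp_tables() {
    failed=0
    for node in "$@"; do
        # pkill exits non-zero when no in.dhcpd process was signalled.
        if ! "$REMOTE_SH" "$node" pkill -HUP in.dhcpd; then
            echo "reload failed on $node" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A call might look like `reload_dhcp_tables master-node vice-master-node`, where both host names are placeholders for the actual node names in the cluster. The function returns a non-zero status if the reload failed on any node.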
If replication does not resume after failover or switchover, examine the replication between the master-eligible nodes, as described in The Vice-Master Node Remains Unsynchronized After Startup.