
Shutting Down and Restarting a Cluster

This section describes how to shut down and restart a cluster.

To Shut Down a Cluster

  1. Log in to a peer node as superuser.

  2. Identify the role of each peer node:

    # nhcmmstat -c all

    Record the role of each node.

  3. Shut down each diskless and dataless node:

    # init 5

  4. Verify that the vice-master node is synchronized with the master node:

    # /usr/opt/SUNWesm/sbin/scmadm -S -M

    If the vice-master node is not synchronized with the master node, synchronize it:

    # nhcrfsadm -f all

  5. Log in to the vice-master node and shut it down:

    # init 5

  6. Log in to the master node and shut it down:

    # init 5

For further information about the init command, see the init(1M) man page.
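
The shutdown sequence above can also be driven from a single administration host. The following sketch is illustrative only: it assumes remote root ssh access to every node, and the hostnames are placeholders that you must replace with the names and roles you recorded in Step 2.

    #!/bin/sh
    # Sketch of the documented shutdown order. Hypothetical hostnames;
    # adapt to your cluster before use.
    MASTER=master-node                 # placeholder
    VICEMASTER=vice-master-node        # placeholder
    OTHERS="diskless1 dataless1"       # placeholder diskless/dataless nodes

    ssh root@$MASTER nhcmmstat -c all  # record the role of each node

    for n in $OTHERS; do               # diskless and dataless nodes first
        ssh root@$n init 5
    done

    # the vice-master must be synchronized before it is stopped;
    # inspect the output, and run nhcrfsadm -f all if it is not
    ssh root@$MASTER /usr/opt/SUNWesm/sbin/scmadm -S -M

    ssh root@$VICEMASTER init 5        # vice-master next
    ssh root@$MASTER init 5            # master last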

To Restart a Cluster

This procedure describes how to restart a cluster that has been shut down as described in To Shut Down a Cluster.


Caution - To restart a cluster, you boot each peer node. The order in which you boot the nodes is important. Restart the nodes so that they have the same roles as they had before the cluster was shut down. If you do not maintain the roles of the nodes, you might lose data.


  1. Access the console of the master node and type:

    ok> boot

  2. When the node has finished booting, verify that the master node is correctly configured:

    # nhadm check configuration

  3. Access the console of the vice-master node and type:

    ok> boot

  4. When the node has finished booting, verify that the vice-master node is correctly configured:

    # nhadm check configuration

  5. Access the console of each diskless node or dataless node and type:

    ok> boot

  6. When the nodes have finished booting, verify that each node is correctly configured:

    # nhadm check configuration

  7. From any node in the cluster, verify that the cluster has started up successfully:

    # nhadm check starting

  8. Confirm that each node has the same role as it had before it was shut down.


Caution - After an emergency shutdown, the order in which the nodes are rebooted is important if availability or data integrity is a priority on your cluster. The order in which the nodes are restarted depends on the Data Management Policy that you selected in your initial cluster configuration. For more information, see the nhfs.conf(4) and cluster_definition.conf(4) man pages.

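Because each node is booted from its console, the boot steps themselves cannot be scripted from one host, but the post-boot checks can. A minimal sketch, again assuming root ssh access and placeholder hostnames:

    #!/bin/sh
    # Post-restart verification sketch. Hypothetical hostnames; adapt.
    NODES="master-node vice-master-node diskless1 dataless1"

    for n in $NODES; do
        echo "Checking configuration on $n"
        ssh root@$n nhadm check configuration || echo "$n: check failed" >&2
    done

    # any peer node can confirm that the cluster started successfully
    ssh root@master-node nhadm check starting
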

Triggering a Switchover

Before you perform a switchover, verify that the master and vice-master disks are synchronized, as described in To Verify That the Master Node and Vice-Master Node Are Synchronized. To trigger a switchover, perform the following procedure.

To Trigger a Switchover With nhcmmstat

  1. Log in to the master node as superuser.

  2. Trigger a switchover:

    # nhcmmstat -c so

    • If there is a vice-master node qualified to become master, this node is elected as the master node. The old master node becomes the vice-master node.

    • If there is no potential master node, nhcmmstat does not perform the switchover.

  3. Verify the cluster configuration:

    # nhadm check

    If the switchover was successful, the current node is the vice-master node.

  4. Verify that the current node is synchronized with the new master node:

    # nhcmmstat -c vice

For more information, see the nhcmmstat(1M) man page.
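
The switchover procedure condenses into a few commands run on the master node. A sketch, assuming the synchronization check is done by inspecting the scmadm output as in To Shut Down a Cluster:

    #!/bin/sh
    # Switchover sketch: run as superuser on the current master node.
    # Disks must be synchronized first; inspect this output before
    # continuing.
    /usr/opt/SUNWesm/sbin/scmadm -S -M

    nhcmmstat -c so      # trigger the switchover
    nhadm check          # on success, this node is now the vice-master
    nhcmmstat -c vice    # confirm synchronization with the new master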

Recovering a Cluster

This section describes how to recover when a cluster fails.

If the master node and the vice-master node both act as master nodes, this error is called split brain. For information about how to recover from split brain at startup and at runtime, see the Netra High Availability Suite Foundation Services 2.1 6/03 Troubleshooting Guide.

To Recover a Cluster After Failure

  1. Stop all of the nodes in the cluster by typing the following command on each node:

    # init 5

  2. Boot both of the master-eligible nodes in single-user mode.

    ok> boot -s

  3. Confirm that the master-eligible nodes are configured correctly.

    For each master-eligible node, do the following:

    1. Confirm that the following files exist and are not empty:

      • cluster_nodes_table

      • target.conf

    2. Reset the replication configuration:

      # /usr/opt/SUNWscm/sbin/dscfg -i

      Type y at the (Type Y for YES) prompt.

    3. Re-create an empty replicated configuration file:

      # /usr/opt/SUNWscm/sbin/dscfg -i -p /etc/opt/SUNWesm/pconfig

      Type y at the (Type Y for YES) prompt.

    4. Synchronize the file system by using /sbin/sync.

    5. Stop the master-eligible node.

  4. Boot the nodes in the following order:

    1. Boot first the master-eligible node that has the most up-to-date set of data.


      Caution - The node that becomes the vice-master node will have its recent file system data erased.


    2. Confirm that the first master-eligible node has become the master node.

    3. Boot the second master-eligible node.

    4. Confirm that the second master-eligible node has become the vice-master node.

    5. Wait until the master node and vice-master node are synchronized.

      This is a full resynchronization and might take some time.

    6. Boot the diskless and dataless nodes if there are any.

      Diskless and dataless nodes can be booted in any order.
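
The per-node reset in Step 3 can be gathered into a short script that you run on each master-eligible node while it is in single-user mode. The directory that holds cluster_nodes_table and target.conf is not specified in this procedure, so it is a placeholder here:

    #!/bin/sh
    # Recovery reset sketch: run on each master-eligible node in
    # single-user mode. CONF_DIR is an assumption; point it at the
    # directory that contains your cluster_nodes_table and target.conf.
    CONF_DIR=/etc/opt/SUNWcgha

    for f in cluster_nodes_table target.conf; do
        if [ ! -s "$CONF_DIR/$f" ]; then
            echo "$CONF_DIR/$f is missing or empty" >&2
            exit 1
        fi
    done

    # reset the replication configuration; type y at the prompt
    /usr/opt/SUNWscm/sbin/dscfg -i

    # re-create an empty replicated configuration file; type y at the prompt
    /usr/opt/SUNWscm/sbin/dscfg -i -p /etc/opt/SUNWesm/pconfig

    /sbin/sync     # flush file system buffers to disk
    init 5         # stop the node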
