Sun Microsystems

Chapter 10

Daemon Monitor

This chapter describes how the Daemon Monitor surveys other daemons and how the Daemon Monitor itself is monitored. It also describes how to change the maximum number of restart attempts and how to reset the current retry count.

This chapter includes the following sections:

  • The nhpmd Daemon

  • Using the Node Management Agent With the Daemon Monitor

The nhpmd Daemon

The nhpmd daemon provides the Daemon Monitor service and runs at the multiuser level on all nodes in the cluster. It surveys other Foundation Services daemons, many Solaris operating system daemons, and some companion product daemons. If a daemon that provides a critical service fails, the nhpmd daemon detects the failure and triggers a recovery response. The recovery response is specific to the daemon that has failed. For a list of monitored daemons and their recovery responses, see the nhpmd(1M) man page.

The nhpmd daemon operates at a higher priority than the other Foundation Services daemons.

The Daemon Monitor is itself surveyed by a kernel module. When the kernel module detects an abnormal exit of the Daemon Monitor, it triggers a panic, which causes the node to crash and reboot.
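The kernel module works inside the Solaris kernel, and its interface is not described here. The following user-space analogy only illustrates the idea of reacting to an abnormal exit. It is a minimal sketch, assuming that "abnormal" means the process exited with a non-zero status or was killed by a signal. The functions `classify_exit` and `supervise` are hypothetical. The string "panic" stands in for the node panic, and nothing in the sketch actually crashes the machine.

```python
import subprocess
import sys


def classify_exit(returncode: int) -> str:
    """Describe how a supervised process ended.

    On POSIX systems, subprocess reports a negative return code when the
    child was terminated by a signal (for example, -9 for SIGKILL).
    """
    if returncode == 0:
        return "normal"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def supervise(cmd: list[str]) -> str:
    """Run cmd, wait for it to end, and choose a response.

    Hypothetical: "panic" is a placeholder for the node panic that the
    kernel module triggers when the Daemon Monitor exits abnormally.
    """
    proc = subprocess.run(cmd)
    return "ok" if classify_exit(proc.returncode) == "normal" else "panic"


if __name__ == "__main__":
    # A child that fails with status 3 is treated as an abnormal exit.
    print(supervise([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

The real mechanism lives in the kernel, so that the monitor of the monitor cannot itself be killed the way a user process can.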

Foundation Services daemons and Solaris operating system daemons are launched by startup scripts. A nametag is assigned to the daemon or group of daemons that is launched by each startup script. In some cases, such as for syslogd, a nametag is assigned to only one daemon. In other cases, such as for nfs_client, a nametag is assigned to a group of daemons. If one of the daemons covered by a nametag fails, the recovery response is performed on all of the daemons covered by that nametag. If the recovery response is to restart the failed daemon, all of the daemons grouped under that nametag are killed and then restarted.
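The nametag grouping rule can be modelled as a small lookup plus a restart sequence. This is a minimal sketch, not the nhpmd implementation. The daemons listed under `nfs_client` (statd and lockd) are an assumption made for illustration. The helper names `nametag_of` and `restart_actions` are hypothetical. The actual groupings are defined by the startup scripts and listed in the nhpmd(1M) man page.

```python
# Assumed nametag membership, for illustration only; check nhpmd(1M)
# for the real groupings defined by the startup scripts.
NAMETAGS: dict[str, list[str]] = {
    "syslogd": ["syslogd"],            # a nametag covering a single daemon
    "nfs_client": ["statd", "lockd"],  # a nametag covering a group (assumed members)
}


def nametag_of(daemon: str, nametags: dict[str, list[str]]) -> str:
    """Return the nametag that covers the given daemon."""
    for tag, members in nametags.items():
        if daemon in members:
            return tag
    raise KeyError(f"{daemon} is not monitored")


def restart_actions(failed: str, nametags: dict[str, list[str]]) -> list[str]:
    """Ordered actions for a 'restart' recovery response.

    Every daemon under the failed daemon's nametag is killed first,
    and then every daemon in that group is started again.
    """
    members = nametags[nametag_of(failed, nametags)]
    return [f"kill {d}" for d in members] + [f"start {d}" for d in members]
```

For example, a failure of `lockd` restarts the whole assumed `nfs_client` group, including `statd`, which did not fail. A failure of `syslogd` touches only that one daemon.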

Information about monitored daemons can be collected using the nhpmdadm command, as described in the nhpmdadm(1M) man page.

Information about the actions taken by the nhpmd daemon can be gathered from the system log files. For information on how to configure the system log files, see the Netra High Availability Suite Foundation Services 2.1 6/03 Cluster Administration Guide.

Using the Node Management Agent With the Daemon Monitor

The Node Management Agent (NMA) can be used to collect the following information from the Daemon Monitor:

  • Which daemons are monitored

  • Which monitored daemons have failed

  • The number of times a failed daemon has been restarted

  • The maximum number of times a failed daemon is allowed to be restarted
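The four items above can be pictured as one status record per monitored entry. This is a minimal sketch of how such a snapshot might be queried. The class `DaemonStatus`, its field names, and the sample values are hypothetical and do not reflect the NMA's actual attributes. Consult the NMA Programming Guide for the real interface.

```python
from dataclasses import dataclass


@dataclass
class DaemonStatus:
    """One monitored entry (hypothetical field names, not the NMA API)."""
    nametag: str        # which daemon or group is monitored
    failed: bool        # whether it is currently reported as failed
    restart_count: int  # how many times it has been restarted
    max_restarts: int   # how many restarts are allowed


def failed_nametags(statuses: list[DaemonStatus]) -> list[str]:
    """Entries currently reported as failed."""
    return [s.nametag for s in statuses if s.failed]


def exhausted(statuses: list[DaemonStatus]) -> list[str]:
    """Entries whose restart count has reached the allowed maximum."""
    return [s.nametag for s in statuses if s.restart_count >= s.max_restarts]


# Hypothetical sample snapshot.
SNAPSHOT = [
    DaemonStatus("syslogd", False, 0, 3),
    DaemonStatus("nfs_client", True, 3, 3),
    DaemonStatus("example_tag", True, 1, 3),
]
```

Comparing the restart count with the maximum shows which entries have no restart attempts left.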

The NMA can be used to perform the following operations on the Daemon Monitor:

  • Change the maximum number of times that the Daemon Monitor attempts to restart a daemon or group of daemons

  • Reset the current retry count for a monitored daemon
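The interaction between the retry count, its maximum, and a reset can be sketched as a small counter. This is a minimal sketch under stated assumptions. The class `RetryPolicy` and its methods are hypothetical. The value "escalate" is a placeholder, because this chapter does not say what the Daemon Monitor does once the maximum is reached. That behaviour is daemon-specific and documented in nhpmd(1M).

```python
class RetryPolicy:
    """Hypothetical per-nametag retry counter, not the nhpmd implementation."""

    def __init__(self, max_restarts: int) -> None:
        self.set_max_restarts(max_restarts)
        self.count = 0  # current retry count

    def set_max_restarts(self, value: int) -> None:
        """Change the maximum number of restart attempts."""
        if value < 0:
            raise ValueError("max_restarts must be non-negative")
        self.max_restarts = value

    def reset(self) -> None:
        """Reset the current retry count to zero."""
        self.count = 0

    def on_failure(self) -> str:
        """Decide the response to one more failure."""
        if self.count < self.max_restarts:
            self.count += 1
            return "restart"
        # Assumption: placeholder for the daemon-specific response
        # applied when no restart attempts remain.
        return "escalate"
```

With a maximum of 2, the first two failures lead to restarts and the third does not. After a reset, restarts are allowed again.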

For information about the NMA, see Chapter 11, Node Management Agent, and the Netra High Availability Suite Foundation Services 2.1 6/03 NMA Programming Guide.
