Change in behaviour for monitor_rpcbind.

As a result of bugid 4263858, the way in which rpcbind is monitored
and the way detected problems are handled have been extensively changed.

As a result of these changes there are now a number of tunable
values, set in the CDB for the cluster, that control how tolerant
of failures the cluster will be. These have been introduced to allow
customers running very heavily loaded systems to tune the number of
probes made and the interval between them, so that the system has time
to schedule rpcbind to run before the cluster decides that it has
not responded and aborts.

To modify a tunable parameter, take these steps:

	1. stop the cluster on all nodes.
	2. on all nodes, edit the CDB file. This is the file : 

	/etc/opt/SUNWcluster/conf/{clustername}.cdb

	3. edit the relevant line for the desired tunable.
	4. start the cluster software on all nodes.

You must not edit the CDB file on only one node. The file must be identical
on all nodes in the cluster, and it must only be changed while the cluster
is stopped.

The tunable values that can be adjusted are listed below, with their
descriptions and default settings. You must not change any other
values in the CDB file without the specific instruction of a Sun
Microsystems support engineer. Changing other values could severely impact
the operation of the cluster, or render it inoperable. It is strongly
advised that you make a copy of the CDB file before making any changes to it.
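
The backup copy, and the check that the file is the same on every node,
could be scripted along the following lines. This is only a sketch in
Python: the helper names backup_cdb and cdb_digest and the timestamped
backup suffix are illustrative choices and are not part of the cluster
software. It is meant to be run on each node while the cluster is stopped.

```python
import hashlib
import shutil
import time


def backup_cdb(path):
    """Copy the CDB file to <path>.<timestamp>.bak and return the backup path."""
    # The ".bak" naming is an illustrative convention, not a cluster requirement.
    backup = "%s.%s.bak" % (path, time.strftime("%Y%m%d%H%M%S"))
    shutil.copy2(path, backup)  # copy2 also preserves permissions and timestamps
    return backup


def cdb_digest(path):
    """Return the SHA-256 hex digest of the file, for comparison across nodes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```

Call backup_cdb() on the CDB file before editing it, then call cdb_digest()
on every node after editing. If the digests differ between nodes, the files
are not identical and the cluster should not be started.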

List of tunables.

Name :		Default :	Description :

rpcmon.retries		1	The number of times rpcbind will be tested 
				and allowed not to respond before a fault
				is declared. 

rpcmon.ival		30	The number of seconds between tests.
				This may be decreased to test more often
				or increased if the system is very busy
				and you want fewer probes running. Do not
				set this too high, however, or the system
				will take too long to detect a genuine fault.

rpcmon.noresponse	5	The number of times rpcbind is allowed to
				fail to respond while remaining in the
				process table before it is declared as 
				failed.

rpcmon.loops		0	The number of times to re-try a SIGTERM
				on rpcbind if it appears to be hung. 
				Note - it will always be tried once - this
				value is the number of extra times to 
				attempt this.

rpcmon.sleep		2	The number of seconds to pause after 
				sending a SIGTERM to rpcbind before
				attempting to restart it.
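
To see which of these tunables have been changed on a node, the rpcmon
entries can be picked out of the CDB file and compared with the defaults
in the table above. The Python sketch below ASSUMES that each entry is on
its own line in the form "name : value". That line format is an assumption
and should be checked against your own CDB file. The function names are
illustrative only.

```python
# Defaults as documented in the list of tunables.
DEFAULTS = {
    "rpcmon.retries": 1,
    "rpcmon.ival": 30,
    "rpcmon.noresponse": 5,
    "rpcmon.loops": 0,
    "rpcmon.sleep": 2,
}


def read_rpcmon_tunables(lines):
    """Return a dict of the rpcmon.* entries found in an iterable of lines.

    ASSUMPTION: entries look like "rpcmon.ival : 30" (name, colon, integer).
    Lines that do not match this shape are ignored.
    """
    found = {}
    for line in lines:
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and name in DEFAULTS and value.isdigit():
            found[name] = int(value)
    return found


def changed_from_default(found):
    """Return only the entries whose value differs from the documented default."""
    return {k: v for k, v in found.items() if v != DEFAULTS[k]}
```

For example, open the CDB file read-only and pass the file object to
read_rpcmon_tunables(), then pass the result to changed_from_default().
The sketch only reads the file; it never writes to it.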

In general use, these defaults seem to provide a good balance between
the need to find a fault in good time and the need to minimise the
overhead of the cluster framework on the system. Exercise caution when
changing them and avoid extreme values, as this balance may be upset.
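
As a rough illustration of why rpcmon.ival should not be set too high: if
probes are ival seconds apart, then n consecutive failed probes span
(n - 1) * ival seconds from the first failure to the last. The Python
sketch below does only that arithmetic. It ignores the time each probe
itself takes, and it does not model how rpcmon.retries and
rpcmon.noresponse combine to trigger a fault, which is not specified here.
The number of failed probes is therefore an assumption you supply.

```python
def failure_span_seconds(ival, failed_probes):
    """Seconds between the first and last of `failed_probes` consecutive
    probes spaced `ival` seconds apart (probe run time is ignored)."""
    if ival < 0 or failed_probes < 1:
        raise ValueError("ival must be >= 0 and failed_probes >= 1")
    return (failed_probes - 1) * ival


# Illustrative only: 5 failed probes at the default 30 second interval,
# then the same count with the interval raised to 120 seconds.
print(failure_span_seconds(30, 5))   # 120
print(failure_span_seconds(120, 5))  # 480
```

Raising the interval by a factor of four stretches the same run of failed
probes by the same factor, which is the delay in detecting a genuine fault
that the rpcmon.ival description warns about.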
