This chapter discusses the following topics:
Managing configuration variables (Section 5.1)
Managing kernel attributes (Section 5.2)
Managing remote access to the cluster (Section 5.3)
Shutting down the cluster (Section 5.4)
Shutting down and starting one cluster member (Section 5.5)
Shutting down a cluster member to single-user mode (Section 5.6)
Deleting a cluster member (Section 5.7)
Removing a member and restoring it as a standalone system (Section 5.8)
Changing the cluster name or IP address (Section 5.9)
Changing the member name, IP address, or cluster interconnect address (Section 5.10)
Managing software licenses (Section 5.11)
Installing and deleting layered applications (Section 5.12)
Managing accounting services (Section 5.13)
For information on other topics that are related to managing cluster members, see the TruCluster Server Cluster Installation manual.
For information about configuring and managing your Tru64 UNIX and TruCluster Server systems for availability and serviceability, see Managing Online Addition and Removal. This manual provides users with guidelines for configuring and managing any system for higher availability, with an emphasis on those capable of Online Addition and Replacement (OLAR) management of system components.
Note

As described in Managing Online Addition and Removal, the /etc/olar.config file is used to define system-specific policies and the /etc/olar.config.common file is used to define clusterwide policies. Any settings in a system's /etc/olar.config file override clusterwide policies in the /etc/olar.config.common file for that system only.
5.1 Managing Configuration Variables
The hierarchy of the /etc/rc.config* files lets you define configuration variables consistently over all systems within a local area network (LAN) and within a cluster. Table 5-1 presents the uses of the configuration files.
Table 5-1: /etc/rc.config* Files

/etc/rc.config
    Member-specific variables. Configuration variables in this file override corresponding variables in /etc/rc.config.common and /etc/rc.config.site.

/etc/rc.config.common
    Clusterwide variables. These configuration variables apply to all members. Configuration variables in this file override corresponding variables in /etc/rc.config.site.

/etc/rc.config.site
    Sitewide variables, which are the same for all machines on the LAN. Values in this file are overridden by any corresponding values in /etc/rc.config.common and /etc/rc.config. By default, there is no /etc/rc.config.site file; you must create it and then edit it to add the sitewide variables. For more information, see rcmgr(8).
The rcmgr command accesses these variables in a standard search order (first /etc/rc.config, then /etc/rc.config.common, and finally /etc/rc.config.site) until it finds or sets the specified configuration variable.

Use the -h option to get or set the run-time configuration variables for a specific member. The command then acts on /etc/rc.config, the member-specific CDSL configuration file. To make the command act clusterwide, use the -c option. The command then acts on /etc/rc.config.common, which is the clusterwide configuration file. If you specify neither -h nor -c, then the member-specific values in /etc/rc.config are used.
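For example, the following commands set a run-time variable for member 2 only, set it clusterwide, and then retrieve it using the standard search order. The ACCOUNTING variable is borrowed from the accounting example in Section 5.13; any run-time configuration variable can be used in its place:

# rcmgr -h 2 set ACCOUNTING YES
# rcmgr -c set ACCOUNTING YES
# rcmgr get ACCOUNTING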
For information about member-specific configuration variables, see Appendix B.
5.2 Managing Kernel Attributes
Each member of a cluster runs its own kernel and therefore has its own /etc/sysconfigtab file. This file contains static member-specific attribute settings. Although a clusterwide /etc/sysconfigtab.cluster file exists, its purpose is different from that of /etc/rc.config.common, and it is reserved for utilities that are shipped in the TruCluster Server product.
This section presents a partial list of those kernel attributes that are provided by each TruCluster Server subsystem.
Use the following command to display the current settings of these attributes for a given subsystem:
# sysconfig -q subsystem-name attribute-list
To get a list and the status of all the subsystems, use the following command:
# sysconfig -s
In addition to the cluster-related kernel attributes presented here, two kernel attributes are set during cluster installation. Table 5-2 lists these kernel attributes. You can increase the values for these attributes, but do not decrease them.
Table 5-2: Kernel Attributes Not to Decrease

Attribute                Value (Do Not Decrease)
vm_page_free_min         30
vm_page_free_reserved    20
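To display the current values of these attributes, you can query the vm subsystem (these attributes are assumed here to belong to the standard Tru64 UNIX vm subsystem):

# sysconfig -q vm vm_page_free_min vm_page_free_reserved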
Table 5-3 lists the subsystem names that are associated with each TruCluster Server component.
Table 5-3: Configurable TruCluster Server Subsystems

Subsystem Name   Component                                                 For More Information
cfs              Cluster File System (CFS)                                 sys_attrs_cfs(5)
clua             Cluster alias                                             sys_attrs_clua(5)
clubase          Cluster base                                              sys_attrs_clubase(5)
cms              Cluster mount service                                     sys_attrs_cms(5)
cnx              Connection manager                                        sys_attrs_cnx(5)
dlm              Distributed lock manager                                  sys_attrs_dlm(5)
drd              Device request dispatcher                                 sys_attrs_drd(5)
hwcc             Hardware components cluster                               sys_attrs_hwcc(5)
icsnet           Internode communications service's network service        sys_attrs_icsnet(5)
ics_hl           Internode communications service (ICS) high level         sys_attrs_ics_hl(5)
mcs              Memory Channel application programming interface (API)    sys_attrs_mcs(5)
rm               Memory Channel                                            sys_attrs_rm(5)
token            CFS token subsystem                                       sys_attrs_token(5)
To tune the performance of a kernel subsystem, use one of the following methods to set one or more attributes in the /etc/sysconfigtab file:

Add or edit a subsystem name stanza entry in the /etc/sysconfigtab file to change an attribute's value and have the new value take effect at the next system boot. (A sample stanza entry follows this list.)
Use the following command to change the value of an attribute that can be reset so that its new value takes effect immediately at run time:
# sysconfig -r subsystem-name attribute-list
To allow the change to be preserved over the next system boot, you must also edit the /etc/sysconfigtab file.

For example, to change the value of the drd-print-info attribute to 1, enter the following command:

# sysconfig -r drd drd-print-info=1
drd-print-info: reconfigured
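To have the same setting take effect at the next boot (the first method in the preceding list), the corresponding stanza entry in /etc/sysconfigtab would look like the following; the layout assumes the usual stanza format of a subsystem name followed by indented attribute assignments:

drd:
        drd-print-info = 1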
You can also use the configuration manager framework, as described in the Tru64 UNIX System Administration manual, to change attributes and otherwise administer a cluster kernel subsystem on another host. To do this, set up the host names in the /etc/cfgmgr.auth file on the remote client system and then specify the -h option to the /sbin/sysconfig command, as in the following example:

# sysconfig -h fcbra13 -r drd drd-do-local-io=0
drd-do-local-io: reconfigured
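For remote management to be permitted, the /etc/cfgmgr.auth file on the managed system (fcbra13 in this example) must list the hosts that are allowed to manage it. A minimal sketch, assuming that the file takes one fully qualified host name per line and using a hypothetical name for the administering member, might look like this:

member1.example.com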
5.3 Managing Remote Access Within and From the Cluster
An rlogin, rsh, or rcp command from the cluster uses the default cluster alias as the source address. Therefore, if a noncluster host must allow remote host access from any account in the cluster, the .rhosts file on the noncluster host must include the cluster alias name in one of the forms by which it is listed in the /etc/hosts file or one resolvable through Network Information Service (NIS) or the Domain Name System (DNS). The same requirement holds for rlogin, rsh, or rcp to work between cluster members.
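For example, if the default cluster alias is deli (the alias name used in the examples in Section 5.9) and root access from the cluster is required, the noncluster host's /.rhosts file would need an entry similar to the following; the standard .rhosts format of a host name optionally followed by a user name is assumed:

deli root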
At cluster creation, the clu_create utility prompts for all required host names and puts them in the correct locations in the proper format. The clu_add_member command does the same when a new member is added to the cluster.
You do not need to edit /.rhosts to enable /bin/rsh commands from a cluster member to the cluster alias or between individual members. Do not change the generated name entries in /etc/hosts and /.rhosts.
If the /etc/hosts and /.rhosts files are configured incorrectly, many applications will not function properly. For example, the Advanced File System (AdvFS) rmvol and addvol commands use rsh when the member where the commands are executed is not the server of the domain. These commands fail if /etc/hosts or /.rhosts is configured incorrectly.
The following error indicates that the /etc/hosts or /.rhosts file has been configured incorrectly:

rsh cluster-alias date
Permission denied.
5.4 Shutting Down the Cluster

To halt all members of a cluster, use the -c option to the shutdown command. For example, to shut down the cluster in 5 minutes, enter the following command:

# shutdown -c +5 Cluster going down in 5 minutes
For information on shutting down a single cluster member, see Section 5.5.
During the shutdown grace period, which is the time between when the cluster shutdown command is entered and when actual shutdown occurs, the clu_add_member command is disabled and new members cannot be added to the cluster.
To cancel a cluster shutdown during the grace period, kill the processes that are associated with the shutdown command as follows:

Get the process identifiers (PIDs) that are associated with the shutdown command. For example:

# ps ax | grep -v grep | grep shutdown
14680 ttyp5 I <    0:00.01 /usr/sbin/shutdown +20 going down
Depending on how far along shutdown is in the grace period, ps might show either /usr/sbin/shutdown or /usr/sbin/clu_shutdown.
Terminate all shutdown processes by specifying their PIDs in a kill command from any member. For example:

# kill 14680
If you kill the shutdown processes during the grace period, the shutdown is canceled.
The shutdown -c command fails if a clu_quorum, clu_add_member, clu_delete_member, or clu_upgrade operation is in progress.
There is no clusterwide reboot. The shutdown -r command, the reboot command, and the halt command act only on the member on which they are executed. The halt, reboot, and init commands have been modified to leave file systems in a cluster mounted, so the cluster continues functioning when one of its members is halted or rebooted, as long as it retains quorum.

For more information, see shutdown(8).
5.5 Shutting Down and Starting One Cluster Member
When booting a member, you must boot from the boot disk that was created by the clu_add_member command. You cannot boot from a copy of the boot disk.

Shutting down a single cluster member is more complex than shutting down a standalone server. If you halt a cluster member whose vote is required for quorum (referred to as a critical voting member), the cluster will lose quorum and hang. As a result, you will be unable to enter commands from any cluster member until you reboot the halted member. Therefore, before you shut down a cluster member, you must first determine whether that member's vote is required for quorum.
5.5.1 Identifying a Critical Voting Member
A cluster that contains a critical voting member is either operating in a degraded mode (for example, one or more voting members or a quorum disk is down) or was not configured for availability to begin with (for example, it is a two-member configuration with each member assigned a vote). Removing a critical voting member from a cluster causes the cluster to hang and compromises availability. Before halting or deleting a cluster member, ensure that it is not supplying a critical vote.
To determine whether a member is a critical voting member, follow these steps:
If possible, make sure that all voting cluster members are up.
Enter the clu_quorum command and note the running values of current votes, quorum votes, and the node votes of the member in question.

Subtract the member's node votes from the current votes. If the result is less than the quorum votes, the member is a critical voting member and you cannot shut it down without causing the cluster to lose quorum and hang. (A worked example follows these steps.)
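For example, consider a hypothetical three-member cluster in which each member contributes one vote, so expected votes is 3 and quorum votes is 2. If one voting member is already down, current votes is 2. For either of the two remaining members, subtracting its node vote from the current votes gives 2 - 1 = 1, which is less than the quorum votes of 2; each remaining member is therefore a critical voting member, and halting it would cause the cluster to lose quorum. If all three members were up (current votes 3), the result would be 3 - 1 = 2, which is not less than 2, and any single member could be halted safely.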
5.5.2 Preparing to Halt or Delete a Critical Voting Member
Before halting or deleting a critical voting member, ensure that its votes are no longer critical to the cluster retaining quorum. The best way to do this involves restoring node votes or a quorum disk vote to the cluster without increasing expected votes. Some ways to accomplish this are:
Booting a voting member that is currently down.
Removing the vote of a down member (using the clu_quorum -f -m command) and configuring a quorum disk with a vote (using the clu_quorum -f -d add command). This has the effect of not increasing expected votes or changing the value of quorum votes, but brings an additional current vote to the cluster. (A command sketch follows this list.)
If the cluster has an even number of votes, adding a new voting member or configuring a quorum disk can also make a critical voting member noncritical. In these cases, expected votes is incremented, but quorum votes remains the same.
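The following is a minimal sketch of that approach; it assumes that the down member has member ID 2, that the quorum disk is dsk10, and that each command takes the member ID or disk name followed by the vote count. Verify the exact argument syntax in clu_quorum(8) before using these commands:

# clu_quorum -f -m 2 0
# clu_quorum -f -d add dsk10 1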
5.5.3 Halting a Noncritical Member
A noncritical member, one with no vote or whose vote is not required to maintain quorum, can be shut down, halted, or rebooted like a standalone system.
Execute the shutdown command on the member to be shut down.
To halt a member, enter the following command:
# shutdown -h time
To reboot a member, enter the following command:
# shutdown -r time
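For example, to halt a noncritical member immediately, enter:

# shutdown -h now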
For information on identifying critical voting members, see Section 5.5.1.
5.5.4 Shutting Down a Hosting Member
The cluster application availability (CAA) profile for an application allows you to specify an ordered list of members, separated by white space, that can host the application resource. The hosting members list is used in conjunction with the application resource's failover policy (favored or restricted), as discussed in caa(4).
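For illustration, the hosting-member and placement entries in an application resource profile might look like the following; the member name member1 is hypothetical, and HOSTING_MEMBERS and PLACEMENT are the profile attributes described in caa(4):

HOSTING_MEMBERS=member1
PLACEMENT=restricted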
If the cluster member that you are shutting down is the only hosting member for one or more applications with a restricted placement policy, you need to specify another hosting member or the application cannot run while the member is down. You can add an additional hosting member, or replace the existing hosting member with another.
To do this, perform these steps:
Verify the current hosting members and placement policy.
# caa_profile -print resource-name
If the cluster member that you are shutting down is the only hosting member, you can add an additional hosting member to the hosting members list, or replace the existing member.
# caa_profile -update resource-name -h hosting-member another-hosting-member
# caa_profile -update resource-name -h hosting-member
Update the CAA registry entry with the latest resource profile.
# caa_register -u resource-name
Relocate the application to the other member.
# caa_relocate resource-name -c member-name
5.6 Shutting Down a Cluster Member to Single-User Mode
If you need to shut down a cluster member to single-user mode, you must first halt the member and then boot it to single-user mode. Shutting down the member in this manner assures that the member provides the minimal set of services to the cluster and that the running cluster has a minimal reliance on the member running in single-user mode. In particular, halting the member satisfies services that require the cluster member to have a status of DOWN before completing a service failover. If you do not first halt the cluster member, the services do not fail over as expected.
To take a cluster member to single-user mode, use the shutdown -h command to halt the member, and then boot the member to single-user mode. When the system reaches single-user mode, run the init s, bcheckrc, and lmf reset commands.
Note

Before halting a cluster member, make sure that the cluster can maintain quorum without the member's vote.

For example:

# /sbin/shutdown -h now
>>> boot -fl s
# /sbin/init s
# /sbin/bcheckrc
# /usr/sbin/lmf reset
A cluster member that is shut down to single-user mode (that is, not shut down to a halt and then booted to single-user mode as recommended) continues to have a status of UP. Shutting down a cluster member to single-user mode in this manner does not affect the voting status of the member: a member contributing a vote before being shut down to single-user mode continues contributing the vote in single-user mode.
5.7 Deleting a Cluster Member
The clu_delete_member command permanently removes a member from the cluster.
Caution
If you are reinstalling TruCluster Server, see the TruCluster Server Cluster Installation manual. Do not delete a member from an existing cluster and then create a new single-member cluster from the member that you just deleted. If the new cluster has the same name as the old cluster, the newly installed system might join the old cluster. This can cause data corruption.
The clu_delete_member command has the following syntax:

/usr/sbin/clu_delete_member [-f] [-m memberid]
If you do not supply a member ID, the command prompts you for the member ID of the member to delete.
The clu_delete_member command does the following:
Mounts the member's boot partition and deletes all files in the boot partition. The system can no longer boot from this disk.
Caution

The clu_delete_member command will delete a member, even when the member's boot disk is inaccessible. This lets you delete a member whose boot disk has failed. If the command cannot access the disk, you must make sure that no cluster member can inadvertently boot from that disk. Remove the disk from the cluster, reformat it, or use the disklabel command to make it a nonbootable disk.
If the member has votes, adjusts the value of cluster expected votes throughout the cluster.
Deletes all member-specific directories and files in the clusterwide file systems.
Note

The clu_delete_member command deletes member-specific files from the /cluster, /usr/cluster, and /var/cluster directories. However, an application or an administrator can create member-specific files in other directories, such as /usr/local. You must manually remove those files after running clu_delete_member. Otherwise, if you add a new member and reuse the same member ID, the new member will have access to these (outdated and perhaps erroneous) files.
Removes the deleted member's host name for its Memory Channel interface from the /.rhosts and /etc/hosts.equiv files.

Writes a log file of the deletion to /cluster/admin/clu_delete_member.log. Appendix C contains a sample clu_delete_member log file.
To delete a member from the cluster, follow these steps:
Determine whether or not the member is a critical voting member of the cluster. If the member supplies a critical vote to the cluster, halting it will cause the cluster to lose quorum and suspend operations. Before halting the member, use the procedure in Section 5.5 to determine whether it is safe to do so.
Halt the member to be deleted.
If possible, make sure that all voting cluster members are up.
Use the clu_delete_member command from another member to remove the member from the cluster. For example, to delete a halted member whose member ID is 3, enter the following command:

# clu_delete_member -m 3
When you run clu_delete_member and the boot disk for the member is inaccessible, the command displays a message to that effect. If the member being deleted is a voting member, after the member is deleted you must manually lower the expected votes for the cluster by one vote. Do this with the following command (an example follows the note below):

# clu_quorum -e expected-votes
Note

This step applies only when the member boot disk cannot be accessed by clu_delete_member and the member that is being deleted is a voting member.
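For example, if the cluster's expected votes was 3 before the member was deleted, you would lower it to 2 (the value shown is illustrative; use your cluster's previous expected votes minus one):

# clu_quorum -e 2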
For an example of the /cluster/admin/clu_delete_member.log file that results when a member is deleted, see Appendix C.
5.8 Removing a Cluster Member and Restoring It as a Standalone System
To restore a cluster member as a standalone system, follow these steps:
Halt and delete the member by following the procedures in Section 5.5 and Section 5.7.
Physically disconnect the halted member from the cluster, disconnecting the Memory Channel and storage.
On the halted member, select a disk that is local to the member and install Tru64 UNIX. See the Tru64 UNIX Installation Guide for information on installing system software.
For information about moving clusterized Logical Storage Manager (LSM) volumes to a noncluster system, see Section 10.5.
5.9 Changing the Cluster Name or IP Address
Changing the name of a cluster requires a shutdown and reboot of the entire cluster. Changing the IP address of a cluster requires that you shut down and reboot each member individually.
To change the cluster name, follow these steps carefully. Any mistake can prevent the cluster from booting.
Create a file with the new cluster_name attribute for the clubase subsystem stanza entry. For example, to change the cluster name to deli, add the following clubase subsystem stanza entry:

clubase:
        cluster_name=deli
Notes

Ensure that you include a line-feed at the end of each line in the file that you create. If you do not, when the sysconfigtab file is modified, you will have two attributes on the same line. This may prevent your system from booting.

If you create the file in the cluster root directory, you can use it on every system in the cluster without a need to copy the file.
On each cluster member, use the sysconfigdb -m -f file clubase command to merge the new clubase subsystem attributes from the file that you created with the clubase subsystem attributes in the /etc/sysconfigtab file.

For example, assume that the file cluster-name-change contains the information shown in the example in step 1. To use the file cluster-name-change to change the cluster name from poach to deli, use the following command:

# sysconfigdb -m -f cluster-name-change clubase
Warning: duplicate attribute in clubase: was cluster_name = poach, now cluster_name = deli
Caution

Do not use the sysconfigdb -u command with a file with only one or two attributes to be changed. The -u flag causes the subsystem entry in the input file to replace a subsystem entry (for instance, clubase). If you specify only the cluster_name attribute for the clubase subsystem, the new clubase subsystem will contain only the cluster_name attribute and none of the other required attributes.
Change the cluster name in each of the following files:
/etc/hosts
/etc/hosts.equiv
There is only one copy of these files in a cluster.
Add the new cluster name to the /.rhosts file (which is common to all cluster members). Leave the current cluster name in the file. The current name is needed for the shutdown -c command in the next step to function.

Change any client .rhosts file as appropriate.
Shut down the entire cluster with the shutdown -c command and reboot each system in the cluster.
Remove the previous cluster name from the /.rhosts file.
To verify that the cluster name has changed, run the /usr/sbin/clu_get_info command:

# /usr/sbin/clu_get_info
Cluster information for cluster deli
.
.
.
5.9.1 Changing the Cluster IP Address
To change the cluster IP address, follow these steps:
Edit the /etc/hosts file, and change the IP address for the cluster.
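For example, if the cluster alias is deli and the new cluster IP address is 16.160.160.160 (the name and address used in other examples in this chapter), the updated entry in /etc/hosts would read similar to the following:

16.160.160.160  deli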
One at a time (to keep quorum), shut down and reboot each cluster member system.
To verify that the cluster IP address has changed, run the /usr/sbin/ping command from a system that is not in the cluster to ensure that the cluster provides the echo response when you use the cluster address:

# /usr/sbin/ping -c 3 16.160.160.160
PING 16.160.160.160 (16.160.160.160): 56 data bytes
64 bytes from 16.160.160.160: icmp_seq=0 ttl=64 time=26 ms
64 bytes from 16.160.160.160: icmp_seq=1 ttl=64 time=0 ms
64 bytes from 16.160.160.160: icmp_seq=2 ttl=64 time=0 ms
----16.160.160.160 PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 0/9/26 ms
5.10 Changing the Member Name, IP Address, or Cluster Interconnect Address
To change the member name, member IP address, or member cluster interconnect address, you must remove the member from the cluster and then add it back in with the desired member name or address. Do this as follows:
Halt the member. See Section 5.5 for information on shutting down a single cluster member.
On an active member of the cluster, delete the member that you just shut down. Do this by running the clu_delete_member command:

# clu_delete_member -m memberid

To learn the member ID of the member to be deleted, use the clu_get_info command. See Section 5.7 for details on using clu_delete_member.
Use the clu_add_member command to add the system back into the cluster, specifying the desired member name, member IP address, and cluster interconnect address.
For details on adding a member to the cluster, see the TruCluster Server Cluster Installation manual.
5.11 Managing Software Licenses
When you add a new member to a cluster, you must register application licenses on that member for those applications that may run on that member.
For information about adding new cluster members and Tru64 UNIX licenses, see the chapter on adding members in the TruCluster Server Cluster Installation manual.
5.12 Installing and Deleting Layered Applications
The procedure to install or delete an application is usually the same for both a cluster and a standalone system. In general, you install an application only once for the entire cluster. However, some applications require additional steps.
Installing an application
If an application has member-specific configuration requirements, you might need to log on to each member where the application will run and configure the application. For more information, see the configuration documentation for the application.
Deleting an application
Before using setld to delete an application, make sure that the application is not running. This may require you to stop the application on several members. For example, for a multi-instance application, stopping the application may involve killing daemons running on multiple cluster members.
For applications that are managed by CAA, use the following command to find out the status of the highly available applications:
# caa_stat
If the application to be deleted is running (STATE=ONLINE), stop it and remove it from the CAA registry with the following commands:

# caa_stop application_name
# caa_unregister application_name
After the application is stopped, delete it with the setld command. Follow any application-specific directions in the documentation for the application. If the application is installed on a member that is not currently available, the application is automatically removed from the unavailable member when that member rejoins the cluster.
5.13 Managing Accounting Services
The system accounting services are not cluster-aware. The services rely on files and databases that are member-specific. Because of this, to use accounting services in a cluster, you must set up and administer the services on a member-by-member basis.
The /usr/sbin/acct directory is a CDSL. The accounting services files in /usr/sbin/acct are specific to each cluster member.
To set up accounting services on a cluster, use the following modifications to the directions in the chapter on administering system accounting services in the Tru64 UNIX System Administration manual:
You must enable accounting services on each cluster member where you want accounting to run. To enable accounting on all cluster members, enter the following command (because the -c option sets the variable clusterwide, you need to enter it only once):

# rcmgr -c set ACCOUNTING YES
If you want to enable accounting on only certain members, use the -h option to the rcmgr command. For example, to enable accounting on members 2, 3, and 6, enter the following commands:

# rcmgr -h 2 set ACCOUNTING YES
# rcmgr -h 3 set ACCOUNTING YES
# rcmgr -h 6 set ACCOUNTING YES
You must start accounting on each member. Log in to each member where you want to start accounting, and enter the following command:
# /usr/sbin/acct/startup
To stop accounting on a member, you must log in to that member and run the /usr/sbin/acct/shutacct command.
The directory /usr/spool/cron is a CDSL; the files in this directory are member-specific, and you can use them to tailor accounting on a per-member basis. To do so, log in to each member where accounting is to run. Use the crontab command to modify the crontab files as desired.
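For example, assuming that the accounting entries are kept in the adm user's crontab, as is conventional for UNIX accounting jobs, you might review and edit them on a member as follows; see crontab(1) for the exact option syntax on your system:

# crontab -l adm
# crontab -e adm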
For more information, see the chapter on administering the system accounting services in the Tru64 UNIX System Administration manual.
The file /usr/sbin/acct/holidays is a CDSL. Because of this, you set accounting service holidays on a per-member basis.
For more information on accounting services, see acct(8).