11    Troubleshooting Clusters

This chapter presents the following topics:

11.1    Resolving Problems

This section describes solutions to problems that can arise during the day-to-day operation of a cluster.

11.1.1    Booting Systems Without a License

You can boot a system that does not have a TruCluster Server license. The system joins the cluster and boots to multiuser mode, but only root can log in (with a maximum of two users). The cluster application availability (CAA) daemon, caad, is not started. The system displays a license error message reminding you to load the license. This policy enforces license checks while making it possible to boot, license, and repair a system during an emergency.
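For example, after such a system boots, you can load the license from the root login. The following is a minimal sketch using the lmf license management utility; the PAK name and registration data come from your license paperwork:

# lmf list          # check whether the TruCluster Server PAK is registered
# lmf register      # register the PAK data (interactive)
# lmf reset         # copy the registered license data into the kernel cache

You may then need to reboot the member so that the login restriction is lifted and caad is started.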

11.1.2    Shutdown Leaves Members Running

A cluster shutdown (shutdown -c) can leave one or more members running. In this situation, you must complete the cluster shutdown by shutting down all members.

Imagine a three-member cluster where each member has one vote and no quorum disk is configured. During cluster shutdown, quorum is lost when the second-to-last member goes down. If quorum checking is on, the last member running suspends all operations and cluster shutdown never completes.

To avoid an impasse in situations like this, quorum checking is disabled at the start of the cluster shutdown process. If a member fails to shut down during cluster shutdown, it might appear to be a normally functioning cluster member, but it is not, because quorum checking is disabled. You must manually complete the shutdown process.
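For example, assuming the clusterwide shutdown was started with shutdown -c and one member is left running but still responsive at its console, a minimal sketch of completing the shutdown from that member is:

# shutdown -h now

Because quorum checking was disabled at the start of the cluster shutdown, the member can be halted normally even though cluster quorum has been lost.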

The shutdown procedure depends on the state of the systems that are still running:

11.1.3    Dealing with CFS Errors at Boot

During system boot when the clusterwide root (/) is mounted for the first time, CFS can generate the following warning message:

"WARNING:cfs_read_advfs_quorum_data: cnx_disk_read failed with error-number
 

Usually error-number is the EIO value.

This message is accompanied by the following message:

"WARNING: Magic number on ADVFS portion of CNX partition on quorum disk \
is not valid"
 

These messages indicate that the booting member is having problems accessing data on the CNX partition of the quorum disk, which contains the device information for the cluster_root domain. This can occur if the booting member does not have access to the quorum disk, either because the cluster is deliberately configured this way or because of a path failure. In the former case, the messages can be considered informational. In the latter case, you need to address the cause of the path failure.
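If you are unsure whether the booting member is expected to have a path to the quorum disk, you can check the configuration from a member that is already up. This is a minimal sketch using utilities referenced elsewhere in this chapter; the output depends on your configuration:

# clu_quorum                # display the quorum disk and vote configuration
# hwmgr -view devices       # confirm that the quorum disk is visible from this member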

The messages can also indicate problems with the quorum disk itself. If hardware errors are also being reported for the quorum disk, then replace it. For information on replacing a quorum disk, see Section 4.5.1.

For a description of error numbers, see errno(5). For a description of EIO, see errno(2).

11.1.4    Backing Up and Repairing a Member's Boot Disk

A member's boot disk contains three partitions. Table 11-1 presents some details about these partitions.

Table 11-1:  File Systems and Storage Differences

Partition    Content
a            Advanced File System (AdvFS) boot partition, member root file system (128 MB)
b            Swap partition (all space between the a and h partitions)
h            CNX binary partition (1 MB)

AdvFS and Logical Storage Manager (LSM) store information critical to their functioning on the h partition. This information includes whether the disk is a member or quorum disk, and the name of the device where the cluster root file system is located.

If a member's boot disk is damaged or becomes unavailable, you need the h partition information to restore the member to the cluster. The clu_bdmgr command enables you to configure a member boot disk, and to save and restore data on a member boot disk.

The clu_bdmgr command can do the following tasks:

For specifics on the command, see clu_bdmgr(8).

Whenever a member boots, clu_bdmgr automatically saves a copy of the h partition of that member's boot disk. The data is saved in /cluster/members/memberID/boot_partition/etc/clu_bdmgr.conf.
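For example, to confirm that member3's h partition data has been saved recently, list the saved file:

# ls -l /cluster/members/member3/boot_partition/etc/clu_bdmgr.conf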

As a rule, the h partitions on all member boot disks contain the same data. There are two exceptions to this rule:

If a member's boot disk is damaged, you can use clu_bdmgr to repair or replace it. You can use clu_bdmgr even when the cluster is not up, as long as you can boot the clusterized kernel on at least one cluster member.

For a description of how to add a new disk to the cluster, see Section 9.2.3.

To repair a member's boot disk, you must first have backed up the boot partition. One method is to allocate disk space in the shared /var file system for a dump image of each member's boot partition.

To save a dump image for member3's boot partition in the member-specific file /var/cluster/members/member3/boot_part_vdump, enter the following command:

# vdump -0Df /var/cluster/members/member3/boot_part_vdump \
/cluster/members/member3/boot_partition
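If you take this approach for every member, you can refresh all of the dump images from any one member with a small loop. This sketch assumes member IDs 1 through 3 and that each member is up, so that its boot partition is mounted at the usual /cluster/members/memberN/boot_partition path:

# for m in 1 2 3 ; do vdump -0Df /var/cluster/members/member$m/boot_part_vdump \
  /cluster/members/member$m/boot_partition ; done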

11.1.4.1    Example of Recovering a Member's Boot Disk

The following sequence of steps shows how to use the file saved by vdump to replace a boot disk. The sequence makes the following assumptions:

Note

A member's boot disk should always be on a bus shared by all cluster members. This arrangement permits you to make repairs to any member's boot disk as long as you can boot at least one cluster member.

  1. Use clu_get_info to determine whether member3 is down:

    # clu_get_info -m 3
     
    Cluster memberid = 3
    Hostname = member3.zk3.dec.com
    Cluster interconnect IP name = member3-mc0
    Member state = DOWN
    

  2. Select a new disk (in this example, dsk5) as the replacement boot disk for member3. Because the original boot disk for member3 was dsk3, a different device name, clu_bdmgr instructs you to edit member3's /etc/sysconfigtab so that dsk5 is used as the new boot disk for member3.

    To configure dsk5 as the boot disk for member3, enter the following command:

    # /usr/sbin/clu_bdmgr -c dsk5 3
     
    The new member's disk, dsk5, is not the same name as the original disk
    configured for domain root3_domain.  If you continue the following
    changes will be required in member3's /etc/sysconfigtab file:
            vm:
            swapdevice=/dev/disk/dsk5b
            clubase:
            cluster_seqdisk_major=19
            cluster_seqdisk_minor=175
     
    

  3. Mount member3's root domain (now on dsk5) so you can edit member3's /etc/sysconfigtab and restore the boot partitions:

    # mount root3_domain#root /mnt
     
    

  4. Restore the boot partition:

    # vrestore -xf /var/cluster/members/member3/boot_part_vdump -D /mnt
    

  5. Edit member3's /etc/sysconfigtab:

    # cd /mnt/etc
    # cp sysconfigtab sysconfigtab-bu
     
    

    As indicated in the output from the clu_bdmgr command, change the values of the swapdevice attribute in the vm stanza and the cluster_seqdisk_major and cluster_seqdisk_minor attributes in the clubase stanza:

            vm:
            swapdevice=/dev/disk/dsk5b
            clubase:
            cluster_seqdisk_major=19
            cluster_seqdisk_minor=175
     
    

  6. Restore the h partition CNX information:

    # /usr/sbin/clu_bdmgr -h  dsk5
    

    The h partition information is copied from the cluster member where you run the clu_bdmgr command to the h partition on dsk5.

    If the entire cluster is down, you need to boot one of the members from the clusterized kernel. After you have a single-member cluster running, you can restore the CNX h partition information to member3's new boot disk, dsk5, from /mnt/etc/clu_bdmgr.conf. Enter the following command:

    # /usr/sbin/clu_bdmgr -h  dsk5 /mnt/etc/clu_bdmgr.conf
    

  7. Unmount the root domain for member3:

    # umount root3_domain#root
    

  8. Boot member3 into the cluster.

  9. Optionally, use the consvar -s bootdef_dev disk_name command on member3 to set the bootdef_dev variable to the new disk.

11.1.5    Specifying cluster_root at Boot Time

At boot time you can specify the device that the cluster uses for mounting cluster_root, the cluster root file system. Use this feature only for disaster recovery, when you need to boot with a new cluster root.

The Cluster File System (CFS) kernel subsystem supports six attributes for designating the major and minor numbers of up to three cluster_root devices. Because the cluster_root domain that is being used for disaster recovery may consist of multiple volumes, you can specify one, two, or three cluster_root devices:

    cluster_root_dev1_maj and cluster_root_dev1_min
    cluster_root_dev2_maj and cluster_root_dev2_min
    cluster_root_dev3_maj and cluster_root_dev3_min

To use these attributes, shut down the cluster and boot one member interactively, specifying the appropriate cluster_root_dev major and minor numbers. When the member boots, the CNX partition (h partition) of the member's boot disk is updated with the location of the cluster_root devices. If the cluster has a quorum disk, its CNX partition is also updated. As other nodes boot into the cluster, their member boot disk information is also updated.

For example, assume that you want to use a cluster_root that is a two-volume file system that comprises dsk6b and dsk8g. Assume that the major/minor numbers of dsk6b are 19/227, and the major/minor numbers of dsk8g are 19/221. You boot the cluster as follows:

  1. Boot one member interactively:

    >>> boot -fl "ia"
     (boot dkb200.2.0.7.0 -flags ia)
     block 0 of dkb200.2.0.7.0 is a valid boot block
     reading 18 blocks from dkb200.2.0.7.0
     bootstrap code read in
     base = 200000, image_start = 0, image_bytes = 2400
     initializing HWRPB at 2000
     initializing page table at fff0000
     initializing machine state
     setting affinity to the primary CPU
     jumping to bootstrap code
     
     
    .
    .
    .
    Enter kernel_name [option_1 ... option_n]
    Press Return to boot default kernel 'vmunix': vmunix \
      cfs:cluster_root_dev1_maj=19 cfs:cluster_root_dev1_min=227 \
      cfs:cluster_root_dev2_maj=19 cfs:cluster_root_dev2_min=221 [Return]

  2. Boot the other cluster members.

For information about using these attributes to recover the cluster root file system, see Section 11.1.6 and Section 11.1.7.

11.1.6    Recovering the Cluster Root File System to a Disk Known to the Cluster

Use the recovery procedure described here when all of the following are true:

This procedure is based on the following assumptions:

To restore the cluster root, do the following:

  1. Boot the system with the base Tru64 UNIX disk.

    For the purposes of this procedure, we assume this system to be member 1.

  2. If this system's name for the device that will be the new cluster root differs from the name that the cluster had for that device, use the dsfmgr -m command to rename the device so that it matches the cluster's name for it.

    For example, if the cluster's name for the device that will be the new cluster root is dsk6b and the system's name for it is dsk4b, rename the device with the following command:

    # dsfmgr -m dsk4 dsk6
    

  3. If necessary, partition the disk so that the partition sizes and file system types will be appropriate after the disk is the cluster root.

  4. Create a new domain for the new cluster root:

    # mkfdmn /dev/disk/dsk6d cluster_root
    

  5. Make a root fileset in the domain:

    # mkfset cluster_root root
    

  6. This restoration procedure allows cluster_root to have up to three volumes. After restoration is complete, you can add more volumes to the cluster root. For this example, we add only one volume, dsk6b:

    # addvol /dev/disk/dsk6b cluster_root
     
    

  7. Mount the domain that will become the new cluster root:

    # mount cluster_root#root /mnt
    

  8. Restore cluster root from the backup media. (If you used a backup tool other than vdump, use the appropriate restore tool in place of vrestore.)

    # vrestore -xf /dev/tape/tape0 -D /mnt
    

  9. Change /etc/fdmns/cluster_root in the newly restored file system so that it references the new device:

    # cd /mnt/etc/fdmns/cluster_root
    # rm *
    # ln -s /dev/disk/dsk6b
     
    

  10. Use the file command to get the major/minor numbers of the new cluster_root device. Make note of these major/minor numbers.

    For example:

    # file /dev/disk/dsk6b
    /dev/disk/dsk6b:        block special (19/221)
     
    

  11. Shut down the system and reboot interactively, specifying the device major and minor numbers of the new cluster root:

    >>> boot -fl "ia"
     (boot dkb200.2.0.7.0 -flags ia)
     block 0 of dkb200.2.0.7.0 is a valid boot block
     reading 18 blocks from dkb200.2.0.7.0
     bootstrap code read in
     base = 200000, image_start = 0, image_bytes = 2400
     initializing HWRPB at 2000
     initializing page table at fff0000
     initializing machine state
     setting affinity to the primary CPU
     jumping to bootstrap code
     
     
    .
    .
    .
    Enter kernel_name [option_1 ... option_n]
    Press Return to boot default kernel 'vmunix': vmunix \
      cfs:cluster_root_dev1_maj=19 cfs:cluster_root_dev1_min=221 [Return]

  12. Boot the other cluster members.

11.1.7    Recovering the Cluster Root File System to a New Disk

The process of recovering cluster_root to a disk that was previously unknown to the cluster is complicated. Before you attempt it, try to find a disk that was already installed on the cluster to serve as the new cluster boot disk, and follow the procedure in Section 11.1.6.

Use the recovery procedure described here when:

This procedure is based on the following assumptions:

To restore the cluster root, do the following:

  1. Boot the system with the base Tru64 UNIX disk.

    For the purposes of this procedure, we assume this system to be member 1.

  2. If necessary, partition the new disk so that the partition sizes and file system types will be appropriate after the disk is the cluster root.

  3. Create a new domain for the new cluster root:

    # mkfdmn /dev/disk/dsk5b new_root
    

    As described in the TruCluster Server Cluster Installation guide, the cluster_root file system is often put on a b partition. In this case, /dev/disk/dsk5b is used for example purposes.

  4. Make a root fileset in the domain:

    # mkfset new_root root
    

  5. This restoration procedure allows new_root to have up to three volumes. After restoration is complete, you can add more volumes to the cluster root. For this example, we add one volume, dsk8e:

    # addvol /dev/disk/dsk8e new_root
    

  6. Mount the domain that will become the new cluster root:

    # mount new_root#root /mnt
    

  7. Restore cluster root from the backup media. (If you used a backup tool other than vdump, use the appropriate restore tool in place of vrestore.)

    # vrestore -xf /dev/tape/tape0 -D /mnt
    

  8. Copy the restored cluster databases to the /etc directory of the base Tru64 UNIX system:

    # cd /mnt/etc
    # cp dec_unid_db dec_hwc_cdb dfsc.dat /etc
    

  9. Copy the restored databases from the member-specific area of the current member to the /etc directory of the base Tru64 UNIX system:

    # cd /mnt/cluster/members/member1/etc
    # cp dfsl.dat /etc
    

  10. If one does not already exist, create a domain for the member boot disk:

    # cd /etc/fdmns
    # ls
    # mkdir root1_domain
    # cd root1_domain
    # ln -s /dev/disk/dsk2a
    

  11. Mount the member boot partition:

    # cd /
    # umount /mnt
    # mount root1_domain#root /mnt
    

  12. Copy the databases from the member boot partition to the /etc directory of the base Tru64 UNIX system:

    # cd /mnt/etc
    # cp dec_devsw_db dec_hw_db dec_hwc_ldb dec_scsi_db /etc
    

  13. Unmount the member boot disk:

    # cd /
    # umount /mnt
    

  14. Update the database .bak backup files:

    # cd /etc
    # for f in dec_*db ; do cp $f $f.bak ; done
    

  15. Reboot the system into single-user mode using the same base Tru64 UNIX disk so that it will use the databases that you copied to /etc.

  16. After booting to single-user mode, scan the devices on the bus:

    # hwmgr -scan scsi
    

  17. Remount the root as writable:

    # mount -u /
    

  18. Verify and update the device database:

    # dsfmgr -v -F
    

  19. Use hwmgr to determine the current device names:

    # hwmgr -view devices
    

  20. If necessary, update the local domains to reflect the device naming (especially usr_domain, new_root, and root1_domain).

    Do this by going to the appropriate /etc/fdmns directory, deleting the existing links, and creating new links to the current device names (which you determined in the previous step). For example:

    # cd /etc/fdmns/root_domain
    # rm *
    # ln -s /dev/disk/dsk1a
    # cd /etc/fdmns/usr_domain
    # rm *
    # ln -s /dev/disk/dsk1g
    # cd /etc/fdmns/root1_domain
    # rm *
    # ln -s /dev/disk/dsk2a
    # cd /etc/fdmns/new_root
    # rm *
    # ln -s /dev/disk/dsk5b
    # ln -s /dev/disk/dsk8e
     
    

  21. Run the bcheckrc command to mount local file systems, particularly /usr:

    #  bcheckrc
    

  22. Copy the updated cluster database files onto the cluster root:

    # mount new_root#root /mnt
    # cd /etc
    # cp dec_unid_db* dec_hwc_cdb* dfsc.dat /mnt/etc
    # cp dfsl.dat /mnt/cluster/members/member1/etc
     
    

  23. Update the cluster_root domain on the new cluster root:

    # rm /mnt/etc/fdmns/cluster_root/*
    # cd /etc/fdmns/new_root
    # tar cf - * | (cd /mnt/etc/fdmns/cluster_root && tar xf -)
     
    

  24. Copy the updated cluster database files to the member boot disk:

    # umount /mnt
    # mount root1_domain#root /mnt
    # cd /etc
    # cp dec_devsw_db* dec_hw_db* dec_hwc_ldb* dec_scsi_db* /mnt/etc
     
    

  25. Use the file command to get the major/minor numbers of the cluster_root devices. Write down these major/minor numbers for use in the next step.

    For example:

    # file /dev/disk/dsk5b
    /dev/disk/dsk5b:        block special (19/227)
    # file /dev/disk/dsk8e
    /dev/disk/dsk8e:        block special (19/221)
    

  26. Halt the system and reboot interactively, specifying the device major and minor numbers of the new cluster root:

    >>> boot -fl "ia"
     (boot dkb200.2.0.7.0 -flags ia)
     block 0 of dkb200.2.0.7.0 is a valid boot block
     reading 18 blocks from dkb200.2.0.7.0
     bootstrap code read in
     base = 200000, image_start = 0, image_bytes = 2400
     initializing HWRPB at 2000
     initializing page table at fff0000
     initializing machine state
     setting affinity to the primary CPU
     jumping to bootstrap code
     
     
    .
    .
    .
    Enter kernel_name [option_1 ... option_n]
    Press Return to boot default kernel 'vmunix': vmunix \
      cfs:cluster_root_dev1_maj=19 cfs:cluster_root_dev1_min=227 \
      cfs:cluster_root_dev2_maj=19 cfs:cluster_root_dev2_min=221 [Return]

  27. Boot the other cluster members.

    If you encounter errors with device files during boot, run the dsfmgr -v -F command.

11.1.8    Dealing with AdvFS Problems

This section describes some problems that can arise when you use AdvFS.

11.1.8.1    Responding to Warning Messages from addvol or rmvol

Under some circumstances, using addvol or rmvol on the cluster_root domain can generate the following warning message:

WARNING:cfs_write_advfs_root_data: cnx_disk_write failed for quorum disk with error-number

Usually error-number is the EIO value.

This message indicates that the member where the addvol or rmvol executed cannot write to the CNX partition of the quorum disk. The CNX partition contains device information for the cluster_root domain.

The warning can occur if the member does not have access to the quorum disk, either because the cluster is deliberately configured this way or because of a path failure. In the former case, the message can be considered informational. In the latter case, you need to address the cause of the path failure.

The message can also indicate problems with the quorum disk itself. If hardware errors are also being reported for the quorum disk, then replace the disk. For information on replacing a quorum disk, see Section 4.5.1.

For a description of error numbers, see errno(5). For a description of EIO, see errno(2).

11.1.8.2    Resolving AdvFS Domain Panics Due to Loss of Device Connectivity

AdvFS can domain panic if one or more storage elements containing a domain or fileset become unavailable. The most likely cause of this problem is that a cluster member with private storage used in an AdvFS domain leaves the cluster. Another possible cause is a hardware problem that makes a storage device unavailable. In either case, because no cluster member has a path to the storage, the storage is unavailable and the domain panics.

Your first indication of a domain panic is likely to be I/O errors from the device, or panic messages written to the system console. Because the domain might be served by a cluster member that is still up, CFS commands such as cfsmgr -e might return a status of OK and not immediately reflect the problem condition. For example:

# ls -l /mnt/mytst
/mnt/mytst: I/O error
 
# cfsmgr -e
Domain or filesystem name = mytest_dmn#mytst
Mounted On = /mnt/mytst
Server Name = deli
Server Status : OK
 

If you are able to restore connectivity to the device and return it to service, use the cfsmgr command to relocate the affected filesets in the domain to the member that served them before the panic (or to another member), and then continue using the domain. For example:

# cfsmgr -a SERVER=provolone -d mytest_dmn
 
# cfsmgr -e
Domain or filesystem name = mytest_dmn#mytst
Mounted On = /mnt/mytst
Server Name = provolone
Server Status : OK
 

11.1.8.3    Forcibly Unmounting an AdvFS File System or Domain

If you cannot restore connectivity to the device and return it to service, you can use the cfsmgr -u command, introduced in TruCluster Server Version 5.1A, to forcibly unmount an AdvFS file system or domain that is not being served by any cluster member. The unmount is not performed if the file system or domain is being served.

How you invoke this command depends on how the Cluster File System (CFS) currently views the domain:

If there are nested mounts on the file system being unmounted, the forced unmount is not performed. Similarly, if there are nested mounts on any fileset when the entire domain is being forcibly unmounted, and the nested mount is not in the same domain, the forced unmount is not performed.
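For example, to forcibly unmount the entire mytest_dmn domain from the earlier example, an invocation along the following lines can be used. The argument form shown for cfsmgr -u is an assumption; verify it against cfsmgr(8) before relying on it:

# cfsmgr -u -d mytest_dmn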

For detailed information on the cfsmgr command, see cfsmgr(8).

11.1.8.4    Avoiding Domain Panics

The AdvFS graphical user interface (GUI) agent, advfsd, periodically scans the system disks. If a metadata write error occurs, or if corruption is detected in a single AdvFS file domain, the advfsd daemon initiates a domain panic (rather than a system panic) on the file domain. This isolates the failed domain and allows a system to continue to serve all other domains.

From the viewpoint of the advfsd daemon running on a member of a cluster, any disk that contains an AdvFS domain and becomes inaccessible can trigger a domain panic. In normal circumstances, this is expected behavior. To diagnose such a panic, follow the instructions in the chapter on troubleshooting in the Tru64 UNIX AdvFS Administration manual. However, if a cluster member receives a domain panic because another member's private disk becomes unavailable (for instance, when that member goes down), the domain panic is an unnecessary distraction.

To avoid this type of domain panic, edit each member's /usr/var/advfs/daemon/disks.ignore file so that it lists the names of disks on other members' private storage that contain AdvFS domains. This will stop the advfsd daemon on the local member from scanning these devices.

To identify private devices, use the sms command to invoke the graphical interface for the SysMan Station, and then select Hardware from the Views menu.

11.1.9    Accessing Boot Partitions on Down Systems

When a member leaves the cluster, either cleanly through a shutdown or in an unplanned fashion, such as a panic, that member's boot partition is unmounted. If the boot partition is on the shared bus, any other member can gain access to the boot partition by mounting it.

Suppose the system provolone is down and you want to edit provolone's /etc/sysconfigtab. You can enter the following commands:

# mkdir /mnt
# mount root2_domain#root /mnt
 

Before rebooting provolone, you must unmount root2_domain#root. For example:

# umount root2_domain#root
 

11.1.10    Booting a Member While Its Boot Disk Is Already Mounted

Whenever the number of expected quorum votes or the quorum disk device is changed, the /etc/sysconfigtab file for each member is updated. In the case where a cluster member is down, the cluster utilities that affect quorum (clu_add_member, clu_quorum, clu_delete_member, and so forth) mount the down member's boot disk and make the update. If the down member tries to boot while its boot disk is mounted, it receives the following panic:

cfs_mountroot: CFS server already exists for this nodes boot partition

The cluster utilities unmount the down member's boot disk after they complete the update.

In general, attempting to boot a member while another member has the first member's boot disk mounted causes the panic. For example, if you mount a down member's boot disk in order to make repairs, you generate the panic if you forget to unmount the boot disk before booting the repaired member.
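Before you boot the repaired member, confirm from the member that mounted the boot disk that the disk has been unmounted. A minimal sketch, using the root2_domain example from Section 11.1.9:

# mount | grep root2_domain     # any output means the boot disk is still mounted
# umount root2_domain#root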

11.1.11    Generating Crash Dumps

If a serious cluster problem occurs, crash dumps might be needed from all cluster members. To get crash dumps from functioning members, use the dumpsys command, which saves a snapshot of the system memory to a dump file.

To generate the crash dumps, log in to each running cluster member and run dumpsys. By default, dumpsys writes the dump to the member-specific directory /var/adm/crash.
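For example, to save a dump on the local member and confirm that it was written to the default directory (a minimal sketch):

# dumpsys
# ls /var/adm/crash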

For more information, see dumpsys(8).

11.1.12    Fixing Network Problems

This section describes potential networking problems in a cluster and how to resolve them.

Symptoms

Things to Verify

After you have made sure that the entries in /etc/rc.config and /etc/hosts are correct and have fixed any other problems, try stopping and then restarting the gateway and inet daemons. Do this by entering the following commands on each cluster member:

# /sbin/init.d/gateway stop
# /sbin/init.d/gateway start
 

11.1.13    Running routed in a Cluster

Although it is technically possible to run routed in a cluster, doing so can cause the loss of failover support in the event of a cluster member failure. Running routed is considered a misconfiguration of the cluster and generates console and Event Manager (EVM) warning messages.

The only supported router is gated.
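To verify which routing daemon a member is configured to start at boot, you can query its rc.config variables. This sketch assumes the standard GATED and ROUTED variables are used on your members:

# rcmgr get GATED       # should report that gated is enabled
# rcmgr get ROUTED      # should not report yes in a cluster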

11.2    Hints for Managing Clusters

This section contains hints and suggestions for configuring and managing clusters.

11.2.1    Moving /tmp

By default, member-specific /tmp areas are in the same file system, but they can be moved to separate file systems. In some cases, you may want to move each member's /tmp area to a disk local to the member in order to reduce traffic on the shared SCSI bus.

If you want a cluster member to have its own /tmp directory on a private bus, you can create an AdvFS domain on a disk on the bus local to that cluster member and add an entry in /etc/fstab for that domain with a mountpoint of /tmp.
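For example, a minimal sketch of creating such a domain for member tcr58, assuming a private disk named dsk10 (the disk and partition names are illustrative only):

# mkfdmn /dev/disk/dsk10c tcr58_tmp
# mkfset tcr58_tmp tmp

Then add the corresponding /etc/fstab entry, as shown in the example that follows.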

For example, the following /etc/fstab entries are for the /tmp directories for two cluster members, tcr58 and tcr59, with member IDs of 58 and 59, respectively.

    tcr58_tmp#tmp   /cluster/members/member58/tmp   advfs rw 0 0
    tcr59_tmp#tmp   /cluster/members/member59/tmp   advfs rw 0 0

The tcr58_tmp domain is on a bus that only member tcr58 has connectivity to. The tcr59_tmp domain is on a disk that only member tcr59 has connectivity to.

When each member boots, it attempts to mount all file systems in /etc/fstab but it can mount only those domains that are not already mounted and for which a path to the device exists. In this example, only tcr58 can mount tcr58_tmp#tmp and only tcr59 can mount tcr59_tmp#tmp.

You could have put the following in /etc/fstab:

    tcr58_tmp#tmp   /tmp    advfs rw 0 0
   tcr59_tmp#tmp   /tmp    advfs rw 0 0

Because /tmp is a context-dependent symbolic link (CDSL), it resolves to /cluster/members/memberN/tmp, where N is the member ID. However, putting the full pathname in /etc/fstab is clearer and less likely to cause confusion.
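You can display the CDSL itself with ls, much like the /var/adm/syslog example in Section 11.2.3:

# ls -l /tmp        # the link target contains the {memb} CDSL component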

11.2.2    Running the MC_CABLE Console Command

All members must be shut down to the console prompt before you run the MC_CABLE Memory Channel diagnostic command on any member. This is normal operation.

Running the MC_CABLE command from the console of a down cluster member when other members are up crashes the cluster.

11.2.3    Korn Shell Does Not Record True Path to Member-Specific Directories

The Korn shell (ksh) remembers the path that you used to get to a directory and returns that pathname when you enter a pwd command. This is true even if you are in some other location because of a symbolic link somewhere in the path. Because TruCluster Server uses CDSLs to maintain member-specific directories in a clusterwide namespace, the Korn shell does not return the true path when the working directory is a CDSL.

If you depend on the shell interpreting symbolic links when returning a pathname, use a shell other than the Korn shell. For example:

# ksh
# ls -l /var/adm/syslog
lrwxrwxrwx   1 root system  36 Nov 11 16:17 /var/adm/syslog
->../cluster/members/{memb}/adm/syslog
# cd /var/adm/syslog
# pwd
/var/adm/syslog
# sh
# pwd
/var/cluster/members/member1/adm/syslog