This chapter explains some of the procedures you can follow to recover from errors.
There are several steps that you can take to prevent loss of data and to make it easier to recover your system in case of failure:
Backups are necessary, in case all copies of a volume are lost or corrupted in some way. For example, a power surge could damage several (or all) disks on your system. See Section 7.3.5 for information on how you can use the volassist command to reduce backup downtime.
The volassist utility locates the plexes such that the loss of one disk will not result in a loss of data. Note that you can edit the file /etc/default/volassist to set the default number of plexes for newly created volumes to two.
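For example, an entry similar to the following in /etc/default/volassist might set the default number of plexes to two; the attribute name shown here is an assumption, so verify it against the volassist(8) reference page before relying on it:
nmirror=2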
LSM provides the volwatch, volnotify, and voltrace commands to monitor LSM events and configuration changes.
The volwatch shell script is started automatically when you install LSM. This script sends mail to the root login when certain LSM configuration events occur, such as a plex detach caused by a disk failure.
The volwatch script sends mail to root by default. You can specify another login as the mail recipient.
If you need to restart volwatch, use the following command:
# volwatch root
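For example, to have volwatch send mail to a different login, start the script with that login as its argument. The login name shown here is only an illustration:
# volwatch lsmadmin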
The volnotify command is useful for monitoring disk and configuration changes and for creating customized scripts similar to /usr/sbin/volwatch.
The voltrace command provides a trace of physical or logical I/O events or error events.
For further information, refer to the volnotify(8), volwatch(8), and voltrace(8) reference pages.
The following sections describe some of the more common problems that LSM users might encounter and suggest corrective actions.
When an LSM command fails to execute, LSM may display the following message:
Volume daemon is not accessible
This message often means that the volume daemon vold is not running.
To correct the problem, try to restart vold. Refer to Section 14.4 for detailed instructions.
If the vold daemon fails to restart (either during system reboot or from the command line), the following message may be displayed:
lsm:vold: Error: enable failed: Error in disk group configuration copies
No valid disk found containing disk group; transactions are disabled.
This message could mean that the /etc/vol/volboot file contains no valid disks that are in the rootdg disk group.
To correct the problem, update the /etc/vol/volboot file by adding disks that belong to the rootdg disk group and have a configuration copy. Then, restart vold. For example:
# voldctl add disk rz8h
# voldctl add disk rz9
# vold -k
If I/O to an LSM volume or mirroring of an LSM volume does not complete, check whether the LSM error daemon, voliod, is running on the system. Refer to Section 14.5 for details.
When creating a new volume or adding a disk, the operation may fail with the following message:
No more space in disk group configuration
This often means that you are out of room in the disk group's configuration database. Refer to Section 6.3.8 and Section 6.3.9 for more information.
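If your version of LSM supports the voldg list operation, a command similar to the following shows detailed information about a disk group, including its configuration database records; the exact output depends on your LSM version:
# voldg list rootdg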
If a file system cannot be mounted or an open function on an LSM volume fails, check if errno is set to EBADF. This could mean that the LSM volume is not started.
Use the volinfo command to determine whether or not the volume is started. For example:
# volinfo -g rootdg
vol1 fsgen Startable
vol-rz3h fsgen Started
vol2 fsgen Started
swapvol1 gen Started
rootvol root Started
swapvol swap Started
To start volume vol1, you would enter the following command:
# volume -g rootdg start vol1
Refer to Section 7.6.4 and Section 14.9 for further information.
Before any LSM operations can be performed, the vold daemon must be running. Typically, the vold daemon is configured to start automatically during the reboot procedure. Perform the following steps to determine the state of the volume daemon:
# voldctl mode
If...                                            Then...
The vold daemon is both running and enabled      The following message displays:
                                                 mode: enabled
The vold daemon is running, but is not enabled   The following message displays:
                                                 mode: disabled
The vold daemon is not running                   The following message displays:
                                                 mode: not-running
If the vold daemon is running but is not enabled, enable it by entering:
# voldctl enable
If the vold daemon is not running, start it by entering:
# vold
For additional information about the vold daemon, refer to the vold(8) reference page.
Volume log I/O voliod kernel threads are started by the vold daemon (if block-change logging is enabled) and are killed by the kernel when these threads are no longer needed. Volume error kernel threads are started automatically by LSM startup procedures. Rebooting after your initial installation should start the voliod error daemon automatically.
Note
Digital recommends that there be at least as many voliod error daemons as the number of processors on the system.
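For example, on a four-processor system you might start four error daemons; the number shown here is only an illustration:
# voliod set 4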
You can perform these steps to determine the state of the error daemon:
# voliod
If...                                            Then...
Any voliod processes are running                 The following message displays:
                                                 n volume I/O daemons running
                                                 where n indicates the number of voliod
                                                 daemons running.
There are no voliod daemons currently running    Start some daemons by entering the
                                                 following command:
                                                 # voliod set 2
For more detailed information about the voliod daemon, refer to the voliod(8) reference page.
The following sections describe two recovery procedures you can try if problems occur during the encapsulation procedure described in Section 5.2.
If something goes wrong during the conversion from the root partition to the root LSM volume, the encapsulation procedure tries to back out all changes made, and restores the use of partitions for the root file system. Under some circumstances, you might need to manually undo the changes made as a result of encapsulating the root disk.
The following steps describe how to manually reset the changes made during root encapsulation:
# voldctl -z
# mount -u /dev/rzXa /
# mount -u /
# cd /etc/fdmns/root_domain
# rm rootvol
# ln -s /dev/rzxa rzxa
Change the primary swap device from /dev/vol/rootdg/swapvol to the block-device file of the swap partition.
Change the following attribute settings from:
lsm_rootdev_is_volume = 1
lsm_swapdev_is_volume = 1
to:
lsm_rootdev_is_volume = 0
lsm_swapdev_is_volume = 0
# mv /sbin/swapdefault /sbin/swapdefault.swapvol
# ln -s /dev/rz8b /sbin/swapdefault
# rm -rf /etc/vol/reconfig.d/disk.d/*
# rm -rf /etc/vol/reconfig.d/disks-cap-part
If you encounter problems in which booting to multiuser mode is impossible, you can use the following steps to allow booting from the physical disk partition, so that you can perform maintenance to fix the problem:
When the boot disk is mirrored, failures occurring on the original boot disk are transparent to all users. However, during a failure, the system might do one or both of the following:
To reboot the system before the original boot disk is repaired, you can boot from any disk that contains a valid root and swap volume plex. Chapter 5 shows how to set an alternate boot device from your system console.
If all copies of rootvol are corrupted, and you cannot boot the system, you must reinstall the system. Refer to Section 14.11 for details.
Normally, replacing a failed disk is as simple as putting a new disk somewhere on the controller and running the LSM disk replacement commands. It is even possible to move the data areas from that disk to available space on other disks, or to use a "hot spare" disk already on the controller to replace the failed disk. For data that is not critical for booting the system, it does not matter where the data is located. All data that is not boot critical is accessed by LSM only after the system is fully operational, and LSM can find this data for you. On the other hand, boot-critical data must be placed in specific areas on specific disks in order for the boot process to find it.
When a disk fails, there are two possible ways to correct the problem. If the errors are transient or correctable, then the same disk can be reused; this is known as re-adding a disk. On the other hand, if the disk has truly failed, then it should be completely replaced.
Re-adding a disk is the same procedure as replacing a disk, except that the same physical disk is used. Usually, a disk that needs to be re-added has been detached, meaning that LSM has noticed that the disk has failed and has ceased to access it.
If the boot disk has a transient failure, its plexes can be recovered using the following steps. The rootvol and swapvol volumes can have two or three LSM disks per physical disk, depending on the layout of the original root disk.
# voldisk list
DEVICE       TYPE      DISK         GROUP        STATUS
rz10         sliced    -            -            error
rz10b        nopriv    -            -            error
rz10f        nopriv    -            -            error
rz21         sliced    rz21         rootdg       online
rz21b        nopriv    rz21b        rootdg       online
-            -         rz10         rootdg       removed  was:rz10
-            -         rz10b        rootdg       removed  was:rz10b
-            -         rz10f        rootdg       removed  was:rz10f
In this example, if rz10 was the failed boot disk, then you can assume that rz10, rz10b, and rz10f are the LSM disks associated with the physical disk rz10.
# voldisk online rz10 rz10b rz10f
# voldg -k adddisk rz10=rz10
# voldg -k adddisk rz10b=rz10b
# voldg -k adddisk rootrz10=rz10f
# volrecover -sb rootvol swapvol
If a boot disk that is under LSM control fails and you are replacing it with a new disk, perform the following steps:
If a disk is unavailable when the system is running, any plexes of volumes that reside on that disk will become stale, meaning the data on that disk is out of date relative to the other plexes of the volume.
During the boot process, the system accesses only one copy of the root and swap volumes (the copies on the boot disk) until a complete configuration for those volumes can be obtained. If it turns out that the plex of one of these volumes that was used for booting is stale, the system must be rebooted from a backup boot disk that contains nonstale plexes. This problem can occur, for example, if the boot disk was replaced and restarted without adding the disk back into the LSM configuration. The system will boot normally, but the plexes that reside on the newly powered disk will be stale.
Another possible problem can occur if errors in the LSM headers on the boot disk prevent LSM from properly identifying the disk. In this case, LSM cannot determine the name of that disk. This is a problem because plexes are associated with disk names, so any plexes on that disk are unusable.
If either of these situations occurs, the LSM daemon vold notices it when it is configuring the system as part of the init processing of the boot sequence. It outputs a message describing the error and what can be done about it, and then halts the system. For example, if the plex rootvol-01 of the root volume rootvol on disk disk01 was stale, vold would display the following message:
lsm:vold: Warning: Plex rootvol-01 for root volume is stale or unusable.
lsm:vold: Error: System boot disk does not have a valid root plex
          Please boot from one of the following disks:

          Disk: disk02     Device: rz2

lsm:vold: Error: System startup failed
This informs the administrator that the disk disk02 contains usable copies of the root and swap plexes and should be used for booting. This is the name of the system backup disk. When this message appears, the administrator should reboot the system from a backup boot disk.
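For example, on an Alpha console you might boot from the backup disk named in the message. Assuming that the device rz2 corresponds to the console device dka200 on your system (this mapping is hypothetical and depends on your hardware configuration), the console command would be similar to:
>>> boot dka200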
Once the system has booted, the exact problem needs to be determined. If the plexes on the boot disk were simply stale, they will be caught up automatically as the system comes up. If, on the other hand, there was a problem with the private area on the disk, the administrator will need to re-add or replace the disk.
If the plexes on the boot disk were unavailable, the administrator should get mail from the LSM volwatch utility describing the problem. Another way to discover the problem is to list the disks with the voldisk utility. In the previous example, if the problem is a failure in the private area of disk01 (for example, due to media failures or to accidentally overwriting the LSM private region on the disk), enter the following command:
# voldisk list
This command produces the following output:
DEVICE       TYPE      DISK         GROUP        STATUS
-            -         disk01       rootdg       failed  was: rz1
rz2          sliced    disk02       rootdg       online
If a system failure occurs, the system console writes a crash dump to the boot disk. However, if the original boot disk has had a problem such that the corresponding plex in the root or swap volumes has been disabled, then the crash dump is written to the first available plex in the swap volume. The system reports the name of the disk that has the crash dump by printing a message on the system console.
For example, the following messages are printed to the console along with other dump information:
WARNING: LSM: Original dump device not found
LSM attempting to dump to SCSI device unit number rz1
To obtain the crash dump when the system reboots, you must boot the system from the disk that contains the crash dump.
The following sections describe recovery procedures for problems related to LSM disks.
If one plex of a volume encounters a disk I/O failure (for example, because the disk has an uncorrectable format error), one of the following may happen:
If a plex is detached, I/O stops on that plex but continues on the remaining plexes of the volume.
If a disk is detached, all plexes on the disk are disabled. If there are any unmirrored volumes on a disk when it is detached, those volumes are disabled as well.
If a volume, a plex, or a disk is detached by failures, the volwatch(8) utility sends mail to root indicating the failed objects. For example, if a disk containing two mirrored volumes fails, you might receive a mail message similar to the following:
To: root
Subject: Logical Storage Manager failures on mobius.lsm.com
Failures have been detected by LSM on host mobius.lsm.com:
failed plexes: home-02 src-02
No data appears to have been lost. However, you should replace the drives that have failed.
To determine which disks are causing the failures in this message, enter the following command:
# volstat -sff home-02 src-02
This produces output such as the following:
                        FAILED
TYP NAME                READS     WRITES
sd  disk01-04           0         0
sd  disk01-06           0         0
sd  disk02-03           1         0
sd  disk02-04           1         0
This display indicates that the failures are on disk02 (the basename for the displayed subdisks).
Sometimes these errors are caused by cabling failures. You should look at the cables connecting your disks to your system. If there are any obvious problems, correct them and recover the plexes with the following command:
# volrecover -b home src
This command starts a recovery of the failed plexes in the background (the command returns before the operation is done). If an error message appears later, or if the plexes become detached again, replace the disk.
If you do not see any obvious cabling failures, then the disk probably needs to be replaced.
If a disk fails completely, the mail message lists the disk that has failed, all plexes that use the disk, and all volumes defined on the disk that were disabled because they were not mirrored. For example:
To: root
Subject: Logical Storage Manager failures on mobius.lsm.com
Failures have been detected by LSM on host mobius.lsm.com:
failed disks: disk02
failed plexes: home-02 src-02 mkting-01
failed volumes: mkting
The contents of failed volumes may be corrupted, and should be restored from any available backups. To restart one of these volumes so that you can restore it from backup, replace disks as appropriate then use the command:
volume -f start <volume-name>
You can then restore or recreate the volume.
This message indicates that disk02 was detached by a failure; that plexes home-02, src-02, and mkting-01 were also detached (probably because of the failure of the disk); and that the volume mkting was disabled.
Again, the problem may be a cabling error. If the problem is not a cabling error, then you must replace the disk.
Disks that have failed completely, and that have been detached by failure, can be replaced by running the voldiskadm menu utility and selecting item 5, Replace a failed or removed disk, from the main menu. If you have any disks that are initialized for LSM but have never been added to a disk group, you can select one of those disks as a replacement. Do not choose the old disk drive as a replacement even though it may appear in the selection list. If there are no suitable initialized disks, you can choose to initialize a new disk.
If a disk failure caused a volume to be disabled, then the volume must be restored from backup after the disk is replaced. To identify volumes that wholly reside on disks that were disabled by a disk failure, use the volinfo command.
Any volumes that are listed as Unstartable must be restored from backup. For example, the volinfo command might display:
home         fsgen        Started
mkting       fsgen        Unstartable
src          fsgen        Started
To restart volume mkting so that it can be restored from backup, use the following command:
# volume -obg -f start mkting
The -obg option causes any plexes to be recovered in a background task.
Often a disk has recoverable (soft) errors before it fails completely. If a disk is getting an unusual number of soft errors, replace it. This involves two steps:
To detach the disk, run voldiskadm and select item 4, Remove a disk for replacement, from the main menu. If there are initialized disks available as replacements, you can specify the disk as part of this operation. Otherwise, you must specify the replacement disk later by selecting item 5, Replace a failed or removed disk, from the main menu.
When you select a disk to remove for replacement, all volumes that will be affected by the operation are displayed. For example, the following output might be displayed:
The following volumes will lose mirrors as a result of this operation:
lhome src
No data on these volumes will be lost.
The following volumes are in use, and will be disabled as a result of this operation:
mkting
Any applications using these volumes will fail future accesses. These volumes will require restoration from backup.
Are you sure you want to do this? [y,n,q,?] (default: n)
If any volumes would be disabled, quit from voldiskadm and save the volume. Either back up the volume or move the volume off the disk. To move the volume mkting to a disk other than disk02, use the command:
# volassist move mkting !disk02
After the volume is backed up or moved, run voldiskadm again and continue to remove the disk for replacement.
After the disk has been removed for replacement, specify a replacement disk by selecting item 5, Replace a failed or removed disk, from the main menu in voldiskadm.
Refer to Section C.10 for examples of how to replace disks.
In LSM Version 1.0, disks added to LSM skip physical block 0 and start at block 1 because block 0 contains the disk label and is write-protected.
Starting with LSM Version 1.1, disks added to LSM start at physical block 16 for performance reasons with certain disks. To start a disk at physical block 1 instead of block 16, use the disklabel command to modify the partition start offset and length accordingly before adding the disk to LSM.
For example:
# disklabel -e /dev/rrz16c
# voldisk init rz16 type=sliced
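As an illustration only (the partition letter, size, and offset shown here are hypothetical), the edit made in the disklabel session might change a partition entry such as:
g: 819200 16 unused 0 0
to:
g: 819215 1 unused 0 0
so that the partition starts at block 1 and still ends at the same block.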
Refer to the disklabel(8) reference page for details.
The following sections describe recovery procedures for problems relating to LSM volumes.
A volume that is unstartable is likely to be incorrectly configured or to have other errors or conditions that prevent it from being started. To display unstartable volumes, use the volinfo command, which displays information about the accessibility and usability of one or more volumes:
# volinfo -g diskgroup [volname]
If a system crash or an I/O error corrupts one or more plexes of a volume and no plex is CLEAN or ACTIVE, mark one of the plexes CLEAN and instruct the system to use that plex as the source for reviving the others. To place a plex in a CLEAN state, use the following command:
# volmend fix clean plex_name
For example, the command line to place one plex labeled vol01-02 in the CLEAN state looks like this:
# volmend fix clean vol01-02
Refer to the volmend(8) reference page for more information.
If you used the volsave command to save a copy of your configuration, you can use the volrestore command to restore the configuration. This section describes problems that may arise in restoring a configuration.
See Section 7.4 and Section 7.5 for information on volsave and volrestore. See Appendix C for examples of handling restore failures.
When volrestore executes, it can encounter conflicts in the LSM configuration, for example, if another volume uses the same plex name or subdisk name, or the same location on a disk. When volrestore finds a conflict, it displays error messages and the configuration of the volume, as found in the saved LSM description set. In addition, it removes all volumes created in that disk group during the restoration. The disk group that had the conflict remains imported, and volrestore continues to restore other disk groups.
If volrestore fails because of a conflict, you can use the -b option to do the "best possible" restoration in a disk group. You will then have to resolve the conflicts and restore the volumes in the affected disk group.
See Section C.26 for further information and examples.
The restoration of volumes fails if one or more disks associated with the volumes are unavailable, for example due to disk failure. This, in turn, can cause the restoration of a disk group to fail. You can use a command like the following to restore the LSM configuration of a disk group:
# volrestore -b -g diskgroup
The volumes associated with the failed disks can then be restored by editing the volmake description file to remove the plexes that use the failed disks. Note that editing the description file will affect the checksum of the files in the backup directory, so you will have to override the checksum validation by using the -f option.
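For example, after editing the description file, you might rerun the restoration with both options; the exact combination of options to use is described in the volrestore(8) reference page:
# volrestore -b -f -g diskgroup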
See Section C.26 for further information and examples.
Occasionally, your system may need to be reinstalled after some types of failures. Reinstallation is necessary if all copies of your root (boot) disk are damaged, or if certain critical files are lost due to file system damage. When a failure of either of these types occurs, you must reinstall the entire system.
If these types of failures occur, attempt to preserve as much of the original LSM configuration as possible. Any volumes not directly involved in the failure may be saved. You do not have to reconfigure any volumes that are preserved.
The following sections describe the procedures used to reinstall LSM and preserve as much of the original configuration as possible after a failure.
A system reinstallation completely destroys the contents of any disks that are reinstalled. Any LSM related information, such as data in the LSM private areas on reinstalled disks (containing the disk identifier and copies of the LSM configuration), is removed during reinstallation. The removal of this information makes the disk unusable as an LSM disk.
If a disk was placed under LSM control (either during the LSM installation or by later encapsulation), that disk and any volumes on it are lost during reinstallation. If a disk was not under LSM control before the failure, no volumes are lost at reinstallation. You can replace any other disks by following the procedures in Section 9.2.6.
When reinstallation is necessary, the only volumes saved are those that reside on, or have copies on, disks that are not directly involved with the failure, the reinstallation, or both; volumes on disks involved with the failure or reinstallation are lost during reinstallation. If backup copies of these volumes are available, you can restore them after reinstallation. The system root disk is always involved in reinstallation. Other disks may also be involved.
If the root disk was placed under LSM control by encapsulation, that disk and any volumes or volume plexes on it are lost during reinstallation. In addition, any other disks that are involved in the reinstallation (or that are removed and replaced), also lose any LSM data (including volumes and plexes).
If a disk (including the root disk) is not under LSM control prior to the failure, no volumes are lost at reinstallation. Although having the root disk under LSM control simplifies the recovery process after reinstallation, not having the root disk under LSM control increases the likelihood of a reinstallation being necessary. Having the root disk under LSM control, and creating plexes of the root disk contents, eliminates many of the problems that require system reinstallation.
To reinstall the system and recover the LSM configuration you need to perform the following procedures:
Each of these procedures is described in detail in the sections that follow.
To prevent the loss of data on disks not involved in the reinstallation, you should only involve the root disk in the reinstallation procedure. It is recommended that any other disks (that contain volumes) be disconnected from the system before you start the reinstallation procedure. Disconnecting the other disks ensures that they are unaffected by the reinstallation. For example, if the operating system was originally installed with a file system on the second drive, the file system may still be recoverable. Removing the second drive ensures that the file system remains intact.
Once any failed or failing disks have been replaced and disks uninvolved with the reinstallation have been detached, reinstall the operating system as described in the Installation Guide.
While the operating system installation progresses, make sure no disks other than the root disk are accessed in any way. If anything is written on a disk other than the root disk, the LSM configuration on that disk could be destroyed.
Once the LSM subsets have been loaded, recover the LSM configuration by doing the following:
# volinstall
# shutdown now
# rm -rf /etc/vol/reconfig.d/state.d/install-db
# /sbin/voliod set 2
# /sbin/vold -m disable
If a saved copy of /etc/vol/volboot does not exist, initialize /etc/vol/volboot by entering:
# voldctl init
Add one or more disks that have configuration databases to the /etc/vol/volboot file. You must do this; otherwise, LSM cannot restart after a reboot.
To reenable the previous LSM configuration, you need to determine the name of one of the disks that was in the rootdg disk group. If you do not know the name of one of the disks, you can scan the disk label on the disks available on the system for LSM disk label tags such as LSMpubl or LSMsimp. If you find the LSMpubl disk label tag on a disk, add the disk as an LSM sliced disk. If you find the LSMsimp disk label tag, add the partition as an LSM simple disk.
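For example, one way to look for these disk label tags is to read each disk's label and search for LSM entries; the disk name rz3 shown here is only an illustration:
# disklabel -r rz3 | grep LSM
After identifying a suitable disk, add it to the /etc/vol/volboot file, enable vold, and recover the volumes: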
# voldctl add disk rz3
# voldctl enable
# volrecover -sb
The configuration preserved on the disks not involved with the reinstallation has now been recovered. However, because the root disk has been reinstalled, it appears to LSM as a non-LSM disk. Therefore, the configuration of the preserved disks does not include the root disk as part of the LSM configuration.
Note
If the root disk of your system and any other disk involved in the reinstallation were not under LSM control at the time of failure and reinstallation, then the reconfiguration is complete at this point. If any other disks containing volumes or volume plexes are to be replaced, follow the replacement procedures in Chapter 6. There are several methods available to replace a disk. Choose the method that you prefer.
If the root disk (or another disk) was involved with the reinstallation, any volumes or volume plexes on that disk (or other disks no longer attached to the system) are now inaccessible. If a volume had only one plex (contained on a disk that was reinstalled, removed, or replaced), then the data on that volume is lost and must be restored from backup. In addition, the system's root file system and swap area are no longer located on volumes. To correct these problems, follow the instructions in Section 14.11.6.
The following sections describe how to clean up the configuration of your system after reinstallation of LSM.
To clean up the LSM configuration, remove any volumes associated with rootability, and their associated disks. This must be done if the root disk was under LSM control prior to installation. The volumes to remove are:
Follow these steps:
# volume stop rootvol
# voledit -r rm rootvol
For example, if disk rz3 was associated with rootvol and disk rz3b was associated with swapvol, you would enter the following commands:
# voldg rmdisk rz3 rz3b
# voldisk rm rz3 rz3b
If /usr and /var were on LSM volumes prior to the reinstallation, clean up the volumes using the voledit command similar to the previous example shown for rootvol. Remove the LSM disks associated with the volumes used for /usr and /var.
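For example, assuming /usr had been encapsulated as a volume named vol-rz3g on an LSM disk named rz3g (both names are hypothetical and follow the naming convention used for encapsulated partitions), the cleanup would be similar to:
# volume stop vol-rz3g
# voledit -r rm vol-rz3g
# voldg rmdisk rz3g
# voldisk rm rz3g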
After completing the rootability cleanup, you must determine which volumes need to be restored from backup. The volumes to be restored include any volumes that had all plexes residing on disks that were removed or reinstalled. These volumes are invalid and must be removed, recreated, and restored from backup. If only some plexes of a volume exist on reinitialized or removed disks, those plexes must be removed. The plexes can be re-added later.
To restore the volumes, do the following:
# voldisk list
LSM displays a list of the system's disk devices and their status. For example, for a reinstalled system with three disks and a reinstalled root disk, the voldisk list command produces output similar to the following:
DEVICE       TYPE      DISK         GROUP        STATUS
rz0          sliced    -            -            error
rz1          sliced    disk02       rootdg       online
rz2          sliced    disk03       rootdg       online
-            -         disk01       rootdg       failed  was: rz0
The previous display shows that the reinstalled root device, rz0, is not recognized as an LSM disk and is marked with a status of error. The disks disk02 and disk03 were not involved in the reinstallation and are recognized by LSM and associated with their devices (rz1 and rz2). The former disk01, the LSM disk that had been associated with the replaced disk device, is no longer associated with the device (rz0).
If there had been other disks (with volumes or volume plexes on them) removed or replaced during reinstallation, these disks would also have a disk device in error state and an LSM disk listed as not associated with a device.
# volprint -sF "%vname" -e 'sd_disk = "<disk>"'
In this command, the variable <disk> is the name of a disk with a failed status.
Note
Be sure to enclose the disk name in quotes in the command. Otherwise, the command will return an error message.
The volprint command returns a list of volumes that have plexes on the failed disk. Repeat this command for every disk with a failed status.
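For instance, to list the volumes with plexes on the failed disk disk01 shown in the previous voldisk list output, you would enter:
# volprint -sF "%vname" -e 'sd_disk = "disk01"'
After you have identified the affected volumes, examine each one with the volprint -th command: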
# volprint -th <volume_name>
In this command, volume_name is the name of the volume to be examined.
The volprint command displays the status of the volume, its plexes, and the portions of disks that make up those plexes. For example, a volume named fnah with only one plex resides on the reinstalled disk named disk01. The volprint -th command, applied to the volume fnah, produces the following display:
V  NAME         USETYPE      KSTATE   STATE    LENGTH READPOL  PREFPLEX
PL NAME         VOLUME       KSTATE   STATE    LENGTH LAYOUT   ST-WIDTH MODE
SD NAME         PLEX         PLOFFS   DISKOFFS LENGTH DISK-MEDIA ACCESS

v  fnah         fsgen        DISABLED ACTIVE   24000  SELECT   -
pl fnah-01      fnah         DISABLED NODEVICE 24000  CONCAT   -
sd disk01-06    fnah-01      0        519940   24000  disk01   -
To remove the volume, use the voledit command. To remove fnah, enter the command:
# voledit -r rm fnah
It is possible that only part of a plex is located on the failed disk. If the volume has a striped plex associated with it, the volume is divided between several disks. For example, the volume named woof has one striped plex, striped across three disks, one of which is the reinstalled disk disk01. The volprint -th command for woof returns the following output:
V  NAME         USETYPE      KSTATE   STATE    LENGTH READPOL  PREFPLEX
PL NAME         VOLUME       KSTATE   STATE    LENGTH LAYOUT   ST-WIDTH MODE
SD NAME         PLEX         PLOFFS   DISKOFFS LENGTH DISK-MEDIA ACCESS

v  woof         fsgen        DISABLED ACTIVE   4224   SELECT   -
pl woof-01      woof         DISABLED NODEVICE 4224   STRIPE   128      RW
sd disk02-02    woof-01      0        14336    1408   disk02   rz1
sd disk01-05    woof-01      1408     517632   1408   disk01   -
sd disk03-01    woof-01      2816     14336    1408   disk03   rz2
The display shows three disks, across which the plex woof-01 is striped (the lines starting with sd represent the stripes). The second stripe area is located on LSM disk01. This disk is no longer valid, so the plex named woof-01 has a state of NODEVICE. Since this is the only plex of the volume, the volume is invalid and must be removed. If a copy of woof exists on the backup media, it can be restored later.
Note
Keep a record of the volume name and length of any volumes you intend to restore from backup.
Use the voledit command to remove the volume, as described earlier.
A volume that has one plex on a failed disk may also have other plexes on disks that are still valid. In this case, the volume does not need to be restored from backup, since the data is still valid on the valid disks. The output of the volprint -th command for a volume with one plex on a failed disk (disk01) and another plex on a valid disk (disk02) would look like this:
V  NAME         USETYPE      KSTATE   STATE    LENGTH READPOL  PREFPLEX
PL NAME         VOLUME       KSTATE   STATE    LENGTH LAYOUT   ST-WIDTH MODE
SD NAME         PLEX         PLOFFS   DISKOFFS LENGTH DISK-MEDIA ACCESS

v  foo          fsgen        DISABLED ACTIVE   10240  SELECT   -
pl foo-01       foo          DISABLED ACTIVE   10240  CONCAT   -        RW
sd disk02-01    foo-01       0        0        10240  disk02   rz1
pl foo-02       foo          DISABLED NODEVICE 10240  CONCAT   -        RW
sd disk01-04    foo-02       0        507394   10240  disk01   -
This volume has two plexes, foo-01 and foo-02. The first plex, foo-01, does not use any space on the invalid disk, so it can still be used. The second plex, foo-02, uses space on the invalid disk, disk01, and has a state of NODEVICE. The plex foo-02 must be removed. However, the volume still has one valid plex containing valid data. If the volume needs to be mirrored, another plex can be added later. Note the name of the volume if you want to create another plex later.
To remove an invalid plex, the plex must be dissociated from the volume and then removed. This is done with the volplex command. To remove the plex foo-02, enter the following command:
# volplex -o rm dis foo-02
Once all invalid volumes and volume plexes have been removed, the disk configuration can be cleaned up. Each disk that was removed, reinstalled, or replaced (as determined from the output of the voldisk list command) must be removed from the configuration.
To remove the disk, use the voldg command. To remove the failed disk01, enter:
# voldg rmdisk disk01
If the voldg command returns an error message, some invalid volume plexes exist. Repeat the processes described in "Volume Cleanup" until all invalid volumes and volume plexes are removed.
Once all the invalid disks have been removed, the replacement or reinstalled disks can be added to LSM control. If the root disk was originally under LSM control (the root file system and the swap area were on volumes), or you now want to put the root disk under LSM control, add this disk first.
To add the root disk to LSM control, enter the following command:
# /usr/sbin/volencap <boot_disk>
For more information see Chapter 5.
When the encapsulation is complete, reboot the system to multi-user mode.
Once the root disk is encapsulated, any other disks that were replaced should be added using voldiskadm. If the disks were reinstalled during the operating system reinstallation, they should be encapsulated; otherwise, simply add them. See Chapter 6.
Once all the disks have been added to the system, any volumes that were completely removed as part of the configuration cleanup can be recreated and their contents restored from backup. The volume recreation can be done using either volassist or the Logical Storage Visual Administrator (dxlsm) interface.
To recreate the volumes fnah and woof using the volassist command, enter:
# volassist make fnah 24000
# volassist make woof 4224 layout=stripe nstripe=3
Once the volumes are created, they can be restored from backup using normal backup/restore procedures.
Any volumes that had plexes removed as part of the volume cleanup can have those plexes recreated by following the instructions for mirroring a volume with the interface you choose (volassist, voldiskadm, or dxlsm).
To replace the plex removed from the volume foo using volassist, enter:
# volassist mirror foo
Once you have restored the volumes and plexes lost during reinstallation, the recovery is complete and your system should be configured as it was prior to the failure.