This chapter explains some of the procedures you can follow to recover from errors.
There are several steps that you can take to prevent loss of data and to make it easier to recover your system in case of a failure:

- Back up your volumes regularly. Backups are necessary in case all copies of a volume are lost or corrupted in some way; for example, a power surge could damage several (or all) disks on your system. See Section 7.3.5 for information on how you can use the volassist command to reduce backup downtime.

- Mirror your volumes. The volassist utility locates the plexes such that the loss of one disk will not result in a loss of data. Note that you can edit the /etc/default/volassist file to set the default number of plexes for newly created volumes to two.

- Use the volsave command to save copies of your LSM configuration files, in case you need to recreate the configuration.
LSM provides the volwatch, volnotify, and voltrace commands to monitor LSM events and configuration changes.

The volwatch shell script is started automatically when you install LSM. This script sends mail to the root login when certain LSM configuration events occur, such as a plex detach caused by a disk failure. The volwatch script sends mail to root by default; you can specify another login as the mail recipient. If you need to restart volwatch, use the following command:

# volwatch root
The volnotify command is useful for monitoring disk and configuration changes and for creating customized scripts similar to /usr/sbin/volwatch. The voltrace command provides a trace of physical or logical I/O events or error events.

For further information, refer to the volnotify(8), volwatch(8), and voltrace(8) reference pages.
The following sections describe some of the more common problems that LSM users might encounter and suggest corrective actions.
When an LSM command fails to execute, LSM may display the following message:

Volume daemon is not accessible

This message often means that the volume daemon, vold, is not running. To correct the problem, try to restart vold. Refer to Section 14.4 for detailed instructions.
If the vold daemon fails to restart (either during system reboot or from the command line), the following message may be displayed:

lsm:vold: Error: enable failed: Error in disk group configuration copies
No valid disk found containing disk group; transactions are disabled.

This message can mean that the /etc/vol/volboot file contains no valid disks that are in the rootdg disk group. To correct the problem, update the /etc/vol/volboot file by adding disks that belong to the rootdg disk group and have a configuration copy, then restart vold. For example:

# voldctl add disk rz8h
# voldctl add disk rz9
# vold -k
If I/O to an LSM volume or mirroring of an LSM volume does not complete, check whether the LSM error daemon, voliod, is running on the system. Refer to Section 14.5 for details.
When creating a new volume or adding a disk, the operation may fail with the following message:

No more space in disk group configuration

This often means that you are out of room in the disk group's configuration database. Refer to Section 6.3.8 and Section 6.3.9 for more information.
If a file system cannot be mounted or an open function on an LSM volume fails, check whether errno is set to EBADF. This can mean that the LSM volume is not started. Use the volinfo command to determine whether or not the volume is started. For example:

# volinfo -g rootdg
vol1          fsgen    Startable
vol-rz3h      fsgen    Started
vol2          fsgen    Started
swapvol1      gen      Started
rootvol       root     Started
swapvol       swap     Started

To start volume vol1, enter the following command:

# volume -g rootdg start vol1
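As an illustrative aid (not an LSM feature), a volinfo display in the three-column form above can be scanned mechanically for volumes that are not in the Started state. The sample data is taken from the display above; the assumption about the column layout is ours:

```shell
# Sample three-column volinfo output (volume, usage type, state).
volinfo_output='vol1 fsgen Startable
vol-rz3h fsgen Started
vol2 fsgen Started
swapvol1 gen Started
rootvol root Started
swapvol swap Started'

# Print every volume whose state is not Started.
not_started=$(echo "$volinfo_output" | awk '$3 != "Started" { print $1 }')
echo "$not_started"
```

With the sample data, this reports vol1, the only volume that still needs to be started.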
Refer to Section 7.6.4 and Section 14.9 for further information.
Before any LSM operations can be performed, the vold daemon must be running. Typically, the vold daemon is configured to start automatically during the reboot procedure. To determine the state of the volume daemon, enter the voldctl mode command as follows:

# voldctl mode
The output indicates one of three states:

- If the vold daemon is both running and enabled, the output reports an enabled state; no action is needed.

- If the vold daemon is running but not enabled, the output reports a disabled state. Enable the daemon by entering the voldctl enable command:

# voldctl enable

- If the vold daemon is not running, the output reports that the daemon is not running. Start the daemon by entering the vold command:

# vold
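The decision logic above can be sketched as a small shell function. The exact message strings that voldctl mode prints are not reproduced here; keying on the words enabled, disabled, and not-running is an assumption about the output format:

```shell
# Hedged sketch: map a `voldctl mode` report to the corrective action.
# The matched substrings are assumptions about the daemon's output.
vold_action() {
    case "$1" in
        *not-running*) echo "start the daemon: vold" ;;
        *disabled*)    echo "enable the daemon: voldctl enable" ;;
        *enabled*)     echo "no action needed" ;;
        *)             echo "unrecognized state: $1" ;;
    esac
}

vold_action "mode: enabled"
vold_action "mode: not-running"
```

The not-running pattern is tested first so that it is not shadowed by the broader patterns.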
For additional information about the vold daemon, refer to the vold(8) reference page.
Volume log I/O (voliod) kernel threads are started by the vold daemon (if block-change logging is enabled) and are killed by the kernel when these threads are no longer needed. Volume error kernel threads are started automatically by LSM startup procedures. Rebooting after your initial installation should start the voliod error daemon automatically.

Note
Digital recommends that there be at least as many voliod error daemons as the number of processors on the system.
To determine the state of the error daemon, enter the voliod command with no arguments:

# voliod

If any voliod processes are running, the output reports the number (n) of volume I/O daemons currently running, and no action is needed. If no voliod daemons are currently running, start some daemons by entering the following command, which starts two daemons:

# voliod set 2
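Following the recommendation above (at least as many error daemons as processors), the check can be sketched as below. Counting the running daemons from the voliod report is left out; the running count and processor count are passed in as parameters:

```shell
# Hedged sketch: given the number of voliod daemons running and the
# number of processors, print the command that brings the count up to
# the recommended minimum, or report that nothing is needed.
voliod_check() {
    running=$1
    processors=$2
    if [ "$running" -lt "$processors" ]; then
        echo "voliod set $processors"
    else
        echo "enough daemons running"
    fi
}

voliod_check 0 2
voliod_check 4 2
```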
For more detailed information about the voliod daemon, refer to the voliod(8) reference page.
The following sections describe two recovery procedures you can try if problems occur during the encapsulation procedure described in Section 5.2.
If something goes wrong during the conversion from the root partition to the root LSM volume, the encapsulation procedure tries to back out all changes made, and restores the use of partitions for the root file system. Under some circumstances, you might need to manually undo the changes made as a result of encapsulating the root disk.
The following steps describe how to manually reset the changes made during root encapsulation:

1. Enter the following commands:

# voldctl -z
# mount -u /dev/rzXa /
# mount -u /

2. Edit the /etc/fstab file as follows: change the root file system entry from /dev/vol/rootdg/rootvol to the a partition of the boot disk, and change the primary swap device from /dev/vol/rootdg/swapvol to the block-device file of the swap partition.

3. If the root file system is an AdvFS file system, reset the root_domain link to point at the disk partition. For example:

# cd /etc/fdmns/root_domain
# rm rootvol
# ln -s /dev/rzxa rzxa

4. Edit the /etc/sysconfigtab file and change the LSM entries from:

lsm_rootdev_is_volume = 1
lsm_swapdev_is_volume = 1

to:

lsm_rootdev_is_volume = 0
lsm_swapdev_is_volume = 0

5. Change the /sbin/swapdefault file (if it exists) to be a link to the swap partition's device-special file. For example, if rz8b is the swap partition, enter the following commands:

# mv /sbin/swapdefault /sbin/swapdefault.swapvol
# ln -s /dev/rz8b /sbin/swapdefault

6. Remove the LSM reconfiguration state files:

# rm -rf /etc/vol/reconfig.d/disk.d/*
# rm -rf /etc/vol/reconfig.d/disks-cap-part
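To illustrate the /etc/fstab change described above, the fragment below shows a before-and-after pair. The device name rz3a and the remaining fields are hypothetical; substitute the a partition of your own boot disk and keep your existing mount options:

```
# Before: root file system on the LSM root volume
/dev/vol/rootdg/rootvol  /  ufs  rw  1  1

# After: root file system on the boot disk's a partition (rz3a assumed)
/dev/rz3a                /  ufs  rw  1  1
```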
If you encounter problems in which booting to multiuser mode is impossible, you can use the following steps to allow booting from the physical disk partition, so that you can perform maintenance to fix the problem. Use the volmend utility to set the good plex in your rootvol volume to ACTIVE. Refer to the volmend(8) reference page for information about fixing the volume.
When the boot disk is mirrored, failures occurring on the original boot disk are transparent to all users. However, during a failure, the system might detach one or more failed plexes (in the root or swap volumes). To reboot the system before the original boot disk is repaired, you can boot from any disk that contains a valid root and swap volume plex.
Chapter 5 shows how to set an alternate boot device from your system console. If all copies of rootvol are corrupted and you cannot boot the system, you must reinstall the system. Refer to Section 14.11 for details.
Normally, replacing a failed disk is as simple as putting a new disk somewhere on the controller and running the LSM disk replacement commands. It is even possible to move the data areas from that disk to available space on other disks, or to use a "hot spare" disk already on the controller to replace the failure. For data that is not critical for booting the system, it does not matter where the data is located: all data that is not boot-critical is accessed by LSM only after the system is fully operational, and LSM can find this data for you. Boot-critical data, on the other hand, must be placed in specific areas on specific disks in order for the boot process to find it.

When a disk fails, there are two possible ways to correct the problem. If the errors are transient or correctable, the same disk can be reused; this is known as re-adding a disk. If the disk has truly failed, it must be completely replaced.

Re-adding a disk is the same procedure as replacing a disk, except that the same physical disk is used. Usually, a disk that needs to be re-added has been detached, meaning that LSM has noticed that the disk has failed and has ceased to access it.
If the boot disk has a transient failure, its plexes can be recovered using the following steps. The rootvol and swapvol volumes can have two or three LSM disks per physical disk, depending on the layout of the original root disk.

First, use the voldisk command to list the LSM disks that are associated with the failed physical disk. For example:

# voldisk list
DEVICE       TYPE      DISK     GROUP     STATUS
rz10         sliced    -        -         error
rz10b        nopriv    -        -         error
rz10f        nopriv    -        -         error
rz21         sliced    rz21     rootdg    online
rz21b        nopriv    rz21b    rootdg    online
-            -         rz10     rootdg    removed was:rz10
-            -         rz10b    rootdg    removed was:rz10b
-            -         rz10f    rootdg    removed was:rz10f
In this example, if rz10 was the failed boot disk, you can assume that rz10, rz10b, and rz10f are the LSM disks associated with the physical disk rz10.

Next, put the LSM disks back online and add them back into the rootdg disk group:

# voldisk online rz10 rz10b rz10f
# voldg -k adddisk rz10=rz10
# voldg -k adddisk rz10b=rz10b
# voldg -k adddisk rz10f=rz10f
After the disks are added back into the rootdg disk group, enter the volrecover command to resynchronize the plexes in the rootvol and swapvol volumes. For example:

# volrecover -sb rootvol swapvol
If a boot disk that is under LSM control fails and you are replacing it with a new disk, perform the following steps. First, remove the plexes on the failed disk from the rootvol and swapvol volumes; refer to the volplex(8), voldg(8), and voldisk(8) reference pages for more information about how to accomplish this. Then mirror the rootvol and swapvol volumes onto the new disk, as described in Section 5.3.1. The replacement disk should have at least as much storage capacity as was in use on the old disk.
If a disk is unavailable when the system is running, any plexes of volumes that reside on that disk will become stale, meaning the data on that disk is out of date relative to the other plexes of the volume.
During the boot process, the system accesses only one copy of the root and swap volumes (the copies on the boot disk) until a complete configuration for those volumes can be obtained. If it turns out that the plex of one of these volumes that was used for booting is stale, the system must be rebooted from a backup boot disk that contains nonstale plexes. This problem can occur, for example, if the boot disk was replaced and restarted without adding the disk back into the LSM configuration. The system will boot normally, but the plexes that reside on the newly powered disk will be stale.
Another possible problem can occur if errors in the LSM headers on the boot disk prevent LSM from properly identifying the disk. In this case, LSM cannot determine the name of that disk. This is a problem because plexes are associated with disk names, so any plexes on that disk are unusable.
If either of these situations occurs, the LSM daemon vold notices it while configuring the system as part of the init processing of the boot sequence. It outputs a message describing the error, describes what can be done about it, and halts the system.
For example, if the plex rootvol-01 of the root volume rootvol on disk disk01 of the system was stale, vold would print the following message:

lsm:vold: Warning: Plex rootvol-01 for root volume is stale or unusable.
lsm:vold: Error: System boot disk does not have a valid root plex
Please boot from one of the following disks:

Disk: disk02     Device: rz2

lsm:vold: Error: System startup failed
This informs the administrator that the disk disk02 contains usable copies of the root and swap plexes and should be used for booting. This is the name of the system backup disk. When this message appears, the administrator should reboot the system from a backup boot disk.
Once the system has booted, the exact problem needs to be determined. If the plexes on the boot disk were simply stale, they will be caught up automatically as the system comes up. If, on the other hand, there was a problem with the private area on the disk, the administrator will need to re-add or replace the disk.
If the plexes on the boot disk were unavailable, the administrator should get mail from the LSM volwatch utility describing the problem.
Another way to discover the problem is by listing the disks with the voldisk utility. In the previous example, if the problem is a failure in the private area of disk01 (for example, due to media failures or accidental overwriting of the LSM private region on the disk), enter the following command:

# voldisk list

This command produces the following output:

DEVICE       TYPE      DISK     GROUP     STATUS
-            -         disk01   rootdg    failed was: rz1
rz2          sliced    disk02   rootdg    online
If a system failure occurs, the system console writes a crash dump to the boot disk. However, if the original boot disk has had a problem such that the corresponding plex in the root or swap volumes has been disabled, then the crash dump is written to the first available plex in the swap volume. The system reports the name of the disk that has the crash dump by printing a message on the system console.
For example, the following messages are printed to the console along with other dump information:
WARNING: LSM: Original dump device not found
LSM attempting to dump to SCSI device unit number rz1
To obtain the crash dump when the system reboots, you must boot the system from the disk that contains the crash dump.
The following sections describe recovery procedures for problems related to LSM disks.
If one plex of a volume encounters a disk I/O failure (for example, because the disk has an uncorrectable format error), one of the following may happen:
If a plex is detached, I/O stops on that plex but continues on the remaining plexes of the volume.
If a disk is detached, all plexes on the disk are disabled. If there are any unmirrored volumes on a disk when it is detached, those volumes are disabled as well.
If a volume, a plex, or a disk is detached by failures, the volwatch(8) utility sends mail to root indicating the failed objects. For example, if a disk containing two mirrored volumes fails, you might receive a mail message similar to the following:

To: root
Subject: Logical Storage Manager failures on mobius.lsm.com

Failures have been detected by LSM on host mobius.lsm.com:

failed plexes:
  home-02
  src-02

No data appears to have been lost. However, you should replace the drives that have failed.

To determine which disks are causing the failures in this message, enter the following command:

# volstat -sff home-02 src-02

This produces output such as the following:

                      FAILED
TYP NAME         READS    WRITES
sd  disk01-04        0         0
sd  disk01-06        0         0
sd  disk02-03        1         0
sd  disk02-04        1         0

This display indicates that the failures are on disk02 (the basename for the displayed subdisks).
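As an illustration (the parsing is not an LSM feature), output in the form above can be reduced to the set of failing disks by stripping the subdisk suffix. The sample rows are the ones shown above; the assumption that subdisk names have the form disk-NN is ours:

```shell
# Sample `volstat -sff` data rows (type, subdisk, failed reads, writes).
volstat_rows='sd disk01-04 0 0
sd disk01-06 0 0
sd disk02-03 1 0
sd disk02-04 1 0'

# Keep rows with any failed I/O, then strip the -NN suffix to get the
# disk that the subdisk lives on.
failing_disks=$(echo "$volstat_rows" | awk '
$1 == "sd" && ($3 > 0 || $4 > 0) {
    sub(/-[0-9]+$/, "", $2)
    disks[$2] = 1
}
END { for (d in disks) print d }')
echo "$failing_disks"
```

With the sample data, only disk02 is reported, matching the conclusion in the text.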
Sometimes these errors are caused by cabling failures. You should look at the cables connecting your disks to your system. If there are any obvious problems, correct them and recover the plexes with the following command:
# volrecover -b home src
This command starts a recovery of the failed plexes in the background (the command returns before the operation is done). If an error message appears later, or if the plexes become detached again, replace the disk.
If you do not see any obvious cabling failures, then the disk probably needs to be replaced.
If a disk fails completely, the mail message lists the disks that have failed, all plexes that use the disk, and all volumes defined on the disk that were disabled because the volumes were not mirrored. For example:

To: root
Subject: Logical Storage Manager failures on mobius.lsm.com

Failures have been detected by LSM on host mobius.lsm.com:

failed disks:
  disk02

failed plexes:
  home-02
  src-02
  mkting-01

failed volumes:
  mkting

The contents of failed volumes may be corrupted, and should be restored from any available backups. To restart one of these volumes so that you can restore it from backup, replace disks as appropriate then use the command:

volume -f start <volume-name>

You can then restore or recreate the volume.
This message indicates that disk02 was detached by a failure; that plexes home-02, src-02, and mkting-01 were also detached (probably because of the failure of the disk); and that the volume mkting was disabled.

Again, the problem may be a cabling error. If the problem is not a cabling error, then you must replace the disk.
Disks that have failed completely, and that have been detached by failure, can be replaced by running the voldiskadm menu utility and selecting item 5, Replace a failed or removed disk, from the main menu. If you have any disks that are initialized for LSM but have never been added to a disk group, you can select one of those disks as a replacement. Do not choose the old disk drive as a replacement even though it may appear in the selection list. If there are no suitable initialized disks, you can choose to initialize a new disk.
If a disk failure caused a volume to be disabled, then the volume must be restored from backup after the disk is replaced. To identify volumes that wholly reside on disks that were disabled by a disk failure, use the volinfo command. Any volumes that are listed as Unstartable must be restored from backup. For example, the volinfo command might display:

home          fsgen    Started
mkting        fsgen    Unstartable
src           fsgen    Started
To restart volume mkting so that it can be restored from backup, use the following command:

# volume -obg -f start mkting

The -obg option causes any plexes to be recovered in a background task.
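Building on the example above, the Unstartable volumes in a volinfo display can be turned into the corresponding restart command lines mechanically. The data is the sample shown above; the parsing is illustrative only, not part of LSM:

```shell
# Sample volinfo display (volume, usage type, state).
volinfo_output='home fsgen Started
mkting fsgen Unstartable
src fsgen Started'

# Emit a `volume -obg -f start` command for each Unstartable volume.
restart_cmds=$(echo "$volinfo_output" | awk '
$3 == "Unstartable" { print "volume -obg -f start " $1 }')
echo "$restart_cmds"
```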
Often a disk has recoverable (soft) errors before it fails completely. If a disk is getting an unusual number of soft errors, replace it. This involves two steps: detaching the disk and replacing it.

To detach the disk, run voldiskadm and select item 4, Remove a disk for replacement, from the main menu. If there are initialized disks available as replacements, you can specify the disk as part of this operation. Otherwise, you must specify the replacement disk later by selecting item 5, Replace a failed or removed disk, from the main menu.
When you select a disk to remove for replacement, all volumes that will be affected by the operation are displayed. For example, the following output might be displayed:
The following volumes will lose mirrors as a result of this
operation:
lhome src
No data on these volumes will be lost.
The following volumes are in use, and will be disabled as a
result of this operation:
mkting
Any applications using these volumes will fail future accesses.
These volumes will require restoration from backup.
Are you sure you want to do this? [y,n,q,?] (default: n)
If any volumes would be disabled, quit from voldiskadm and save the volume. Either back up the volume or move the volume off the disk. To move the volume mkting to a disk other than disk02, use the command:

# volassist move mkting !disk02

After the volume is backed up or moved, run voldiskadm again and continue to remove the disk for replacement.
After the disk has been removed for replacement, specify a replacement disk by selecting item 5, Replace a failed or removed disk, from the main menu in voldiskadm. Refer to Section C.10 for examples of how to replace disks.
In LSM Version 1.0, disks added to LSM skip physical block 0 and start at block 1 because block 0 contains the disk label and is write-protected.
Starting with LSM Version 1.1, disks added to LSM start at physical block 16, for performance reasons on certain disks. To start a disk at physical block 1 instead of block 16, use the disklabel command to modify the partition start offset and length accordingly before adding the disk to LSM.
For example:

# disklabel -e /dev/rrz16c
# voldisk init rz16 type=sliced

Refer to the disklabel(8) reference page for details.
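When you move a partition's start from block 16 to block 1 as described above, the length grows by the same 15 blocks so that the partition still ends at the same place. The values below are made-up examples, not taken from a real disk label:

```shell
# Illustrative arithmetic for re-basing a partition's start offset.
offset=16          # original start offset (assumed)
length=2050348     # original length in blocks (assumed)
new_offset=1       # desired start offset

# The partition must end at the same block, so the length grows by
# (offset - new_offset) blocks.
new_length=$(expr $length + $offset - $new_offset)
echo "$new_offset $new_length"
```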
The following sections describe recovery procedures for problems relating to LSM volumes.
An unstartable volume is likely to be incorrectly configured or to have other errors or conditions that prevent it from being started. To display unstartable volumes, use the volinfo command, which displays information on the accessibility and usability of one or more volumes:

# volinfo -g diskgroup [volname]
If a system crash or an I/O error corrupts one or more plexes of a volume and no plex is CLEAN or ACTIVE, mark one of the plexes CLEAN and instruct the system to use that plex as the source for reviving the others. To place a plex in a CLEAN state, use the following command:

# volmend fix clean plex_name

For example, the command line to place the plex labeled vol01-02 in the CLEAN state looks like this:

# volmend fix clean vol01-02

Refer to the volmend(8) reference page for more information.
If you used the volsave command to save a copy of your configuration, you can use the volrestore command to restore the configuration. This section describes problems that may arise in restoring a configuration. See Section 7.4 and Section 7.5 for information on volsave and volrestore. See Appendix C for examples of handling restore failures.
When volrestore executes, it can encounter conflicts in the LSM configuration; for example, another volume may use the same plex name or subdisk name, or the same location on a disk. When volrestore finds a conflict, it displays error messages and the configuration of the volume, as found in the saved LSM description set. In addition, it removes all volumes created in that disk group during the restoration. The disk group that had the conflict remains imported, and volrestore continues to restore other disk groups.

If volrestore fails because of a conflict, you can use the -b option to do the "best possible" restoration in a disk group. You will then have to resolve the conflicts and restore the volumes in the affected disk group. See Section C.26 for further information and examples.
The restoration of volumes fails if one or more disks associated with the volumes are unavailable, for example due to disk failure. This, in turn, can cause the restoration of a disk group to fail. You can use a command like the following to restore the LSM configuration of a disk group:

# volrestore -b -g diskgroup

The volumes associated with the failed disks can then be restored by editing the volmake description file to remove the plexes that use the failed disks. Note that editing the description file will affect the checksum of the files in the backup directory, so you will have to override the checksum validation by using the -f option. See Section C.26 for further information and examples.
Occasionally, your system may need to be reinstalled after some types of failures. Reinstallation is necessary if all copies of your root (boot) disk are damaged, or if certain critical files are lost due to file system damage. When a failure of either of these types occurs, you must reinstall the entire system.
If these types of failures occur, attempt to preserve as much of the original LSM configuration as possible. Any volumes not directly involved in the failure may be saved. You do not have to reconfigure any volumes that are preserved.
The following sections describe the procedures used to reinstall LSM and preserve as much of the original configuration as possible after a failure.
A system reinstallation completely destroys the contents of any disks that are reinstalled. Any LSM related information, such as data in the LSM private areas on reinstalled disks (containing the disk identifier and copies of the LSM configuration), is removed during reinstallation. The removal of this information makes the disk unusable as an LSM disk.
If a disk was placed under LSM control (either during the LSM installation or by later encapsulation), that disk and any volumes on it are lost during reinstallation. If a disk was not under LSM control before the failure, no volumes are lost at reinstallation. You can replace any other disks by following the procedures in Section 9.2.6.
When reinstallation is necessary, the only volumes saved are those that reside on, or have copies on, disks that are not directly involved with the failure, the reinstallation, or both; volumes on disks involved with the failure or reinstallation are lost during reinstallation. If backup copies of these volumes are available, you can restore them after reinstallation. The system root disk is always involved in reinstallation. Other disks may also be involved.
If the root disk was placed under LSM control by encapsulation, that disk and any volumes or volume plexes on it are lost during reinstallation. In addition, any other disks that are involved in the reinstallation (or that are removed and replaced), also lose any LSM data (including volumes and plexes).
If a disk (including the root disk) is not under LSM control prior to the failure, no volumes are lost at reinstallation. Although having the root disk under LSM control simplifies the recovery process after reinstallation, not having the root disk under LSM control increases the likelihood of a reinstallation being necessary. Having the root disk under LSM control, and creating plexes of the root disk contents, eliminates many of the problems that require system reinstallation.
To reinstall the system and recover the LSM configuration, you need to prepare the system, reinstall the operating system, and then recover the LSM configuration, including the /etc/vol/volboot file. Each of these procedures is described in detail in the sections that follow.
To prevent the loss of data on disks not involved in the reinstallation, you should only involve the root disk in the reinstallation procedure. It is recommended that any other disks (that contain volumes) be disconnected from the system before you start the reinstallation procedure. Disconnecting the other disks ensures that they are unaffected by the reinstallation. For example, if the operating system was originally installed with a file system on the second drive, the file system may still be recoverable. Removing the second drive ensures that the file system remains intact.
Once any failed or failing disks have been replaced and disks uninvolved with the reinstallation have been detached, reinstall the operating system as described in the Installation Guide.
While the operating system installation progresses, make sure no disks other than the root disk are accessed in any way. If anything is written on a disk other than the root disk, the LSM configuration on that disk could be destroyed.
Once the LSM subsets have been loaded, recover the LSM configuration by doing the following. First, run the volinstall script to create LSM special device files and to add LSM entries to the /etc/inittab file; then shut down to single-user mode, remove the install-db file so that LSM startup is enabled, and start the error daemons:

# volinstall
# shutdown now
# rm -rf /etc/vol/reconfig.d/state.d/install-db
# /sbin/voliod set 2
Next, start the volume daemon, vold, in disabled mode by entering the command:

# /sbin/vold -m disable

If a saved copy of /etc/vol/volboot exists on backup media, restore it and go to the next step. If a saved copy of /etc/vol/volboot does not exist, initialize /etc/vol/volboot by entering:

# voldctl init

Then add one or more disks that have configuration databases to the /etc/vol/volboot file. You must do this; otherwise, LSM cannot restart after a reboot.
To reenable the previous LSM configuration, you need to determine the name of one of the disks that was in the rootdg disk group. If you do not know the name of one of the disks, you can scan the disk labels of the disks available on the system for LSM disk label tags such as LSMpubl or LSMsimp. If you find the LSMpubl disk label tag on a disk, add the disk as an LSM sliced disk. If you find the LSMsimp disk label tag, add the partition as an LSM simple disk. For example:

# voldctl add disk rz3

Then enable vold by entering:

# voldctl enable

Finally, recover the volumes by entering:

# volrecover -sb
The configuration preserved on the disks not involved with the reinstallation has now been recovered. However, because the root disk has been reinstalled, it appears to LSM as a non-LSM disk. Therefore, the configuration of the preserved disks does not include the root disk as part of the LSM configuration.
Note
If the root disk of your system and any other disk involved in the reinstallation were not under LSM control at the time of failure and reinstallation, then the reconfiguration is complete at this point. If any other disks containing volumes or volume plexes are to be replaced, follow the replacement procedures in Chapter 6. There are several methods available to replace a disk. Choose the method that you prefer.
If the root disk (or another disk) was involved with the reinstallation, any volumes or volume plexes on that disk (or other disks no longer attached to the system) are now inaccessible. If a volume had only one plex (contained on a disk that was reinstalled, removed, or replaced), then the data on that volume is lost and must be restored from backup. In addition, the system's root file system and swap area are no longer located on volumes. To correct these problems, follow the instructions in Section 14.11.6.
The following sections describe how to clean up the configuration of your system after reinstallation of LSM.
To clean up the LSM configuration, remove any volumes associated with rootability, and their associated disks. This must be done if the root disk was under LSM control prior to installation. The volumes to remove are rootvol and swapvol, plus the volumes for /usr and /var if those file systems were under LSM control. Follow these steps:
1. Stop and remove the root volume by using the voledit command, as follows:

# volume stop rootvol
# voledit -r rm rootvol

2. Repeat the previous step, using swapvol in place of rootvol, to remove the swap volume.

3. Remove the LSM disks associated with rootvol and swapvol. For example, if disk rz3 was associated with rootvol and disk rz3b was associated with swapvol, you would enter the following commands:

# voldg rmdisk rz3 rz3b
# voldisk rm rz3 rz3b
If /usr and /var were on LSM volumes prior to the reinstallation, clean up those volumes by using the voledit command, as in the previous example for rootvol, and remove the LSM disks associated with the volumes used for /usr and /var.
After completing the rootability cleanup, you must determine which volumes need to be restored from backup. The volumes to be restored include any volumes that had all plexes residing on disks that were removed or reinstalled. These volumes are invalid and must be removed, recreated, and restored from backup. If only some plexes of a volume exist on reinitialized or removed disks, those plexes must be removed; the plexes can be re-added later.
To restore the volumes, first establish which LSM disks have been removed or reinstalled by entering the voldisk list command:

# voldisk list

LSM displays a list of system disk devices and the status of these devices. For example, for a reinstalled system with three disks and a reinstalled root disk, the voldisk list command produces output similar to the following:

DEVICE       TYPE      DISK      GROUP     STATUS
rz0          sliced    -         -         error
rz1          sliced    disk02    rootdg    online
rz2          sliced    disk03    rootdg    online
-            -         disk01    rootdg    failed was: rz0
The previous display shows that the reinstalled root device, rz0, is not recognized as an LSM disk and is marked with a status of error. The disks disk02 and disk03 were not involved in the reinstallation and are recognized by LSM and associated with their devices (rz1 and rz2). The former disk01, the LSM disk that had been associated with the replaced disk device, is no longer associated with the device (rz0). If there had been other disks (with volumes or volume plexes on them) removed or replaced during reinstallation, those disks would also have a disk device in error state and an LSM disk listed as not associated with a device.
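As an aside, a display like the one above can be scanned for LSM disks in the failed state and devices in the error state. The awk parsing below is illustrative only and assumes the column layout shown above:

```shell
# Sample `voldisk list` rows (device, type, disk, group, status...).
voldisk_rows='rz0 sliced - - error
rz1 sliced disk02 rootdg online
rz2 sliced disk03 rootdg online
- - disk01 rootdg failed was: rz0'

# LSM disk names whose status column reads "failed".
failed_disks=$(echo "$voldisk_rows" | awk '$5 == "failed" { print $3 }')
# Device names whose status column reads "error".
error_devices=$(echo "$voldisk_rows" | awk '$5 == "error" { print $1 }')
echo "$failed_disks $error_devices"
```

With the sample data, disk01 is the failed LSM disk and rz0 is the device in error state, matching the explanation above.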
Next, the volumes and plexes that reside on disks with a status of
failed
must be located. Enter the command:
#
volprint -sF "%vname" -e 'sd_disk = "<disk>"'
In this command, the variable
<disk>
is the name of a disk with a
failed
status.
Note
Be sure to enclose the disk name in quotes in the command. Otherwise, the command will return an error message.
The
volprint
command returns a list of volumes that have plexes on the failed disk.
Repeat this command for every disk with a
failed
status.
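Repeating this query for each failed disk can be scripted. The sketch below only prints the commands so you can review them before running them; failed-disks.txt (one disk-media name per line) is an assumed file you prepared from the voldisk list output:

```shell
# Print one volprint command per failed disk; review, then run by hand.
# failed-disks.txt is an assumed record of failed disk-media names.
while read -r disk; do
    echo "volprint -sF \"%vname\" -e 'sd_disk = \"$disk\"'"
done < failed-disks.txt
```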
Next, examine the state of each volume that has a plex on a failed disk. Enter the following command:
#
volprint -th <volume_name>
In this command,
volume_name
is the name of the volume to be examined.
The
volprint
command displays the status of the volume, its
plexes, and the portions of disks that make up those plexes.
For example, consider a volume named
fnah
whose only plex resides on
the reinstalled disk named
disk01.
The
volprint
-th
command, applied to the volume
fnah,
produces the following display:
V  NAME       USETYPE   KSTATE    STATE     LENGTH  READPOL     PREFPLEX
PL NAME       VOLUME    KSTATE    STATE     LENGTH  LAYOUT      ST-WIDTH  MODE
SD NAME       PLEX      PLOFFS    DISKOFFS  LENGTH  DISK-MEDIA  ACCESS

v  fnah       fsgen     DISABLED  ACTIVE    24000   SELECT      -
pl fnah-01    fnah      DISABLED  NODEVICE  24000   CONCAT      -
sd disk01-06  fnah-01   0         519940    24000   disk01      -
The only plex of the volume is shown in the line beginning with
pl.
The
STATE
field for the plex named
fnah-01
is
NODEVICE.
The plex has space on a disk that has been
replaced, removed, or reinstalled. Therefore, the plex is no longer
valid and must be removed. Since
fnah-01
was the only plex of the volume, the volume contents are irrecoverable
except by restoring the volume from a backup. The volume must also be
removed. If a backup copy of the volume exists, you can restore the volume
later. Keep a record of the volume name and its length; you will need them
when you recreate and restore the volume.
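The name and length can be pulled from the saved volprint -th output rather than copied by hand. A minimal sketch, assuming the output was saved to volprint-fnah.out (an assumed file name) and follows the column layout shown above:

```shell
# Print "<volume-name> <length>" from the volume (v) line of saved
# `volprint -th` output. Capture the input first with:
#   volprint -th fnah > volprint-fnah.out
awk '$1 == "v" { print $2, $6 }' volprint-fnah.out
```

Appending each such line to a record file gives you the list needed for the volassist recreation step later.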
To remove the volume, use the
voledit
command. To remove
fnah,
enter the command:
#
voledit -r rm fnah
It is possible that only part of a plex is located on the failed
disk. If the volume has a striped plex associated with it, the
volume is divided between several disks. For example, the volume
named
woof
has one striped plex, striped across three disks,
one of which is the reinstalled disk
disk01.
The output of the
volprint
-th
command for
woof
returns:
V  NAME       USETYPE   KSTATE    STATE     LENGTH  READPOL     PREFPLEX
PL NAME       VOLUME    KSTATE    STATE     LENGTH  LAYOUT      ST-WIDTH  MODE
SD NAME       PLEX      PLOFFS    DISKOFFS  LENGTH  DISK-MEDIA  ACCESS

v  woof       fsgen     DISABLED  ACTIVE    4224    SELECT      -
pl woof-01    woof      DISABLED  NODEVICE  4224    STRIPE      128       RW
sd disk02-02  woof-01   0         14336     1408    disk02      rz1
sd disk01-05  woof-01   1408      517632    1408    disk01      -
sd disk03-01  woof-01   2816      14336     1408    disk03      rz2
The display shows three disks, across which the plex
woof-01
is striped (the lines starting with
sd
represent the stripes). The second stripe area is located on LSM
disk01.
This disk is no
longer valid, so the plex named
woof-01
has a state of
NODEVICE.
Since this is the only plex of the volume, the
volume is invalid and must be removed. If a copy of
woof
exists on the backup media, it can be restored later.
Note
Keep a record of the volume name and length of any volumes you intend to restore from backup.
Use the
voledit
command to remove the volume, as described earlier.
A volume that has one plex on a failed disk may also have other
plexes on disks that are still valid. In this case, the volume does
not need to be restored from backup, since the data is still valid on
the valid disks. The output of the
volprint
-th
command for a
volume with one plex on a failed disk
(disk01)
and another plex on a valid disk
(disk02)
would look like this:
V  NAME       USETYPE   KSTATE    STATE     LENGTH  READPOL     PREFPLEX
PL NAME       VOLUME    KSTATE    STATE     LENGTH  LAYOUT      ST-WIDTH  MODE
SD NAME       PLEX      PLOFFS    DISKOFFS  LENGTH  DISK-MEDIA  ACCESS

v  foo        fsgen     DISABLED  ACTIVE    10240   SELECT      -
pl foo-01     foo       DISABLED  ACTIVE    10240   CONCAT      -         RW
sd disk02-01  foo-01    0         0         10240   disk02      rz1
pl foo-02     foo       DISABLED  NODEVICE  10240   CONCAT      -         RW
sd disk01-04  foo-02    0         507394    10240   disk01      -
This volume has two plexes,
foo-01
and
foo-02.
The first
plex,
foo-01,
does not use any space on the invalid disk, so
it can still be used. The second plex,
foo-02,
uses space on
the invalid disk,
disk01,
and has a state of
NODEVICE.
The plex
foo-02
must be removed. However, the volume still has one valid plex containing
valid data. If the volume needs to be mirrored, another plex can be added
later. Note the name of the volume if you want to create another plex
later.
To remove an invalid plex, the plex must be dissociated from the
volume and then removed. This is done with the
volplex
command.
To remove the plex
foo-02,
enter the following command:
#
volplex -o rm dis foo-02
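Before moving on to the disk cleanup, it can help to confirm that no NODEVICE plexes remain. A sketch, assuming volprint -th output for all volumes was saved to volprint-all.out (an assumed file name):

```shell
# List every plex (pl line) whose STATE column is NODEVICE. Capture
# the input first with: volprint -th > volprint-all.out
# Each plex printed here still needs `volplex -o rm dis <plex>`.
awk '$1 == "pl" && $5 == "NODEVICE" { print $2 }' volprint-all.out
```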
Once all invalid volumes and volume plexes have been removed, the
disk configuration can be cleaned up. Each disk that was removed,
reinstalled, or replaced (as determined from the output of the
voldisk
list
command) must be removed from the configuration.
To remove the disk, use the
voldg
command. To remove the failed
disk01,
enter:
#
voldg rmdisk disk01
If the
voldg
command returns an error message, some invalid
volume plexes exist. Repeat the processes described in "Volume
Cleanup" until all invalid volumes and volume plexes are removed.
Once all the invalid disks have been removed, the replacement or reinstalled disks can be added to LSM control. If the root disk was originally under LSM control (the root file system and the swap area were on volumes), or you now want to put the root disk under LSM control, add this disk first.
To add the root disk to LSM control, enter the following command:
#
/usr/sbin/volencap <boot_disk>
For more information see Chapter 5.
When the encapsulation is complete, reboot the system to multi-user mode.
Once the root disk is encapsulated, any other disks that were replaced
should be added using
voldiskadm.
If the disks were reinstalled
during the operating system reinstallation, they should be
encapsulated; otherwise, simply add them. See
Chapter 6.
Once all the disks have been added to the system, any volumes that
were completely removed as part of the configuration cleanup can be
recreated and their contents restored from backup. The volume
recreation can be done using either
volassist
or the Logical Storage Visual Administrator (dxlsm) interface.
To recreate the volumes
fnah
and
woof
using the
volassist
command, enter:
#
volassist make fnah 24000
#
volassist make woof 4224 layout=stripe nstripe=3
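If you recorded the removed volumes in a file during cleanup, the volassist commands can be generated from it. In this sketch, volumes.txt and its "name length [attributes]" line format are assumptions; the commands are echoed rather than executed so you can review them first:

```shell
# Generate one `volassist make` command per recorded volume.
# Assumed volumes.txt format: <name> <length> [layout=... nstripe=...]
while read -r name length attrs; do
    echo "volassist make $name $length${attrs:+ $attrs}"
done < volumes.txt
```

Once you have checked the output, run the commands by hand or pipe them to sh.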
Once the volumes are created, they can be restored from backup using normal backup/restore procedures.
Any volumes that had plexes removed as part of the volume cleanup can
have those plexes recreated by following the mirroring instructions for
the interface you choose
(volassist,
voldiskadm,
or dxlsm).
To replace the plex removed from the volume
foo
using
volassist,
enter:
#
volassist mirror foo
Once you have restored the volumes and plexes lost during reinstallation, the recovery is complete and your system should be configured as it was prior to the failure.