This chapter examines problems that, while universal for file systems, might have solutions unique to AdvFS. See System Configuration and Tuning for related information about diagnosing performance problems.
This chapter covers the following:
Section 5.1 details the commands that you can use to check disk usage.
Section 5.2 suggests preventative maintenance strategies.
Section 5.3 explains how to increase the size of the root domain.
Section 5.4 identifies operating system incompatibilities.
Section 5.5 explains I/O method incompatibilities.
Section 5.6 explains what to do about an invalid or corrupt saveset format.
Section 5.7 lists ways that you can improve poor performance.
Section 5.8 suggests ways to fix disk problems.
Section 5.9 explains how to restore the /etc/fdmns directory.
Section 5.10 examines how to recover from volume failure.
Section 5.11 explains how to recover from the failure of the root domain.
Section 5.12 details how to restore a multivolume domain.
Section 5.13 suggests methods of crash recovery.
5.1 Checking Free Space and Disk Usage
You can look at the way space is allocated on a disk by file, fileset, or domain. Table 5-1 describes commands that you can use to examine disk space usage.
Table 5-1: Disk Space Usage Commands
Command | Description |
df | Displays disk space usage by fileset. Available space for a fileset is limited by the fileset quota if it is set. |
du | Displays information about block allocation for files. Use the -a option to display information for individual files. |
ls | Displays the space used by files. The -l option shows the space spanned by a sparse file. The -s option shows the actual block usage and might be more useful for use with sparse files. |
showfdmn | Displays the attributes and block usage for each volume in an active domain. For multivolume domains, additional volume information is displayed. |
showfile | Displays block usage and volume information for a file or for the contents of a directory. |
showfsets | Displays information about the filesets in a domain. Use to display fileset quota limits. |
vdf | Displays used and available disk space for a fileset or a domain. |
See the reference pages for the commands for more complete information.
Under certain conditions, the disk usage information for AdvFS can become corrupt. Run the quotacheck -v command to correct the disk usage information.
5.2 Preventative Maintenance
This section describes a number of things you can do to prevent problems with your AdvFS file system.
5.2.1 Failing Disks
Back up your data regularly and frequently, and watch for signs of impending disk failure. Try to remove files from a problem disk before it fails. See the Event Management information in System Administration for more information about examining disk activity.
5.2.2 Verifying File System Consistency
To ensure that metadata is consistent, run the verify command to verify the file system structure. The verify utility checks disk structures such as the bitfile metadata table (BMT), the storage bitmaps, the tag directory, and the frag file for each fileset. It verifies that the directory structure is correct, that all directory entries reference a valid file, and that all files have a directory entry. You must be the root user to run this command.
It is a good idea to run the verify command:
When problems are evident (corruptions, domain panic, lost data, I/O errors)
Before an update installation
If your files have not been accessed in three to six months or longer, and you plan to run utilities such as balance, defragment, migrate, quotacheck, repquota, rmfset, rmvol, or vdump that access every file in a domain
Use the SysMan Manage an AdvFS Domain utility, or enter the verify command from the command line:
verify domain_name
The verify command mounts filesets in special directories. If the verify command is unable to mount a fileset due to the failure of a domain, as a last resort you can run the verify -F command. The -F option mounts the fileset using the -d option of the mount command, which means that AdvFS initializes the transaction log file for the domain without recovering the domain.
Caution
Because no domain recovery occurs for previously incomplete operations, using the verify -F command could cause data corruption.
Under some circumstances the verify command might fail to unmount the filesets. If this occurs, you must unmount the affected filesets manually and delete the mount points that were created in the /etc/fdmns/<domain_name> directory.
On machines with millions of files, sufficient swap space must be allocated for the verify utility to run to completion. If the amount of memory required by the verify utility exceeds the value of the proc/max_per_proc_data_size kernel variable, the utility does not complete. To overcome this problem, allocate up to 10% of the domain size in swap space for running the verify command.
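The 10% guideline can be turned into a quick calculation. The following is only a sketch: verify_swap_upper_bound is a hypothetical helper, not an AdvFS utility, and it assumes that you supply the domain size yourself (for example, from the showfdmn output) as a whole number in whatever unit you want the answer in.

```shell
#!/bin/sh
# Hypothetical helper (not part of AdvFS): upper bound on the swap space
# to set aside for verify, taken as 10% of the domain size.
# $1 = domain size as a whole number, in any unit (blocks, KB, MB).
verify_swap_upper_bound() {
    echo $(( $1 / 10 ))
}

# Example: a domain of 4110480 blocks
verify_swap_upper_bound 4110480    # prints 411048
```

The result is in the same unit as the input, and integer division rounds down.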
The following example verifies the domainx domain, which contains the filesets setx and sety:
# verify domainx
+++Domain verification+++
Domain Id 2f03b70a.000f1db0
Checking disks ...
Checking storage allocated on disk /dev/disk/dsk10g
Checking storage allocated on disk /dev/disk/dsk10a
Checking mcell list ...
Checking mcell position field ...
Checking tag directories ...
+++ Fileset verification +++
+++ Fileset setx +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ... 1100
Scanning tags ... 1100
Searching for lost files ... 1100
+++ Fileset sety +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ... 5100
Scanning tags ... 5100
Searching for lost files ... 5100
In this example, the verify utility finds no problems with the domain. See verify(8) for more information.
5.3 Increasing the Size of an AdvFS root Domain
The AdvFS root domain is limited to one volume (partition) unless you are running a cluster configuration. If you want to increase the size of the root domain, you must recreate the root domain on a larger volume. This section explains how to recreate the root domain on a different device. It does not cover the case of repartitioning the current root volume and restoring root to it. If you are moving the root domain to another disk already installed in the system, you can skip the section on installing the disk and begin at Section 5.3.2.
You need the following to move your root domain:
Current operating system CD-ROM
You can use the operating system CD-ROM that is packaged with the distribution media to recreate your root domain.
If your local site provides a Remote Installation Service (RIS) server, you can boot your system across the network. If you choose RIS services, follow your site-specific procedures and consult the Installation Guide.
Backup device
You will need either a backup tape or an unused disk partition to back up the root domain.
Information about console commands
You will use Alpha System Reference Manual (SRM) console commands at the system console prompt (>>>) to perform some tasks. These commands are documented in the hardware manual for your Alpha system. If you cannot find the printed document, it is usually shipped as a printable file on a CD-ROM supplied with the system.
This section explains increasing the size of a root domain on a non-clustered system. For other configurations, see System Administration and Cluster Administration. If your root volume is an LSM volume, see Logical Storage Manager.
5.3.1 Installing a New Disk for the root Domain
To move your root domain to a new disk, you must first install the disk and have it recognized.
Log in as root and shut down the system.
# shutdown -h now
Add the new disk device. For more information see your hardware manuals.
Verify that the SRM console recognizes the newly added disk.
In this example, DKB300 (an RZ1BB-CS) was added.
>>> show device
polling ncr0(NCR 53C810) slot 1, bus 0 PCI, hose 1 A SCSI Bus ID 7
dka500.5.0.1.1    DKA500    RRD45       1645
polling isp0(QLogic ISP1020) slot 4, bus 0 PCI, hose 1 SCSI Bus ID 7
dkb0.0.0.4.1      DKB0      RZ1DB-CA    LYJ0
dkb100.1.0.4.1    DKB100    RZ1CB-CA    LYJ0
dkb200.2.0.4.1    DKB200    RZ1CB-CA    LYJ0
dkb300.3.0.4.1    DKB300    RZ1BB-CS    0656
mkb400.4.0.4.1    MKB400    TLZ10       02ab
...
Boot the original system disk to update the device information databases with the new device. In this example, the default boot device, dkb0, is booted.
>>> show bootdef_dev
bootdef_dev dkb0.0.0.4.1
>>> boot
During the boot process, the operating system recognizes the new device and updates the device information databases accordingly.
...
dsfmgr: NOTE: updating kernel basenames for system at /
scp kevm tty00 tty01 lp0 random urandom dmapi dsk3 dsk4 dsk5 floppy0 cdrom0
-dsk6a +dsk6a -dsk6a +dsk6a -dsk6b +dsk6b -dsk6b +dsk6b
-dsk6c +dsk6c -dsk6c +dsk6c -dsk6d +dsk6d -dsk6d +dsk6d
-dsk6e +dsk6e -dsk6e +dsk6e -dsk6f +dsk6f -dsk6f +dsk6f
-dsk6g +dsk6g -dsk6g +dsk6g -dsk6h +dsk6h -dsk6h +dsk6h
...
In this example, the operating system's device name for the added disk is dsk6.
5.3.2 Configuring a Device for Use as the root Volume
To make the device available for use as the root volume, you must configure it; that is, label and partition it. You must be the root user to perform this operation. For methods of labeling your disk, see System Administration, "Partitioning Disks Using diskconfig" and "Manually Partitioning Disks."
Be sure to specify AdvFS for the Boot Block. Use the disklabel command with the -t advfs option or, if you are using the diskconfig utility, choose AdvFS from the Boot Block: list.
Caution
Modifying a disk's partition layout destroys some or all of the data on disk. Be certain that you do not need any data on the disk that you choose for the new root domain.
For example, if you have expanded the a partition to 500 MB (1024000 512-byte sectors) and allocated the remaining space on the disk to the b partition as swap, your disk label might look like the following:
# disklabel dsk6
#        size    offset  fstype  fsize  bsize  cpg  # ~Cyl values
  a:  1024000         0  unused      0      0       #    0 - 744*
  b:  3086480   1024000    swap      0      0       #  744*- 2987*
  c:  4110480         0  unused      0      0       #    0 - 2987*
  d:        0         0  unused      0      0       #    0 - 0
  e:        0         0  unused      0      0       #    0 - 0
  f:        0         0  unused      0      0       #    0 - 0
  g:  1858632    393216  unused      0      0       #  285*- 1636*
  h:  1858632   2251848  unused      0      0       #  636*- 2987*
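The sizes and offsets in the label can be checked with a little arithmetic. The sketch below is illustrative only; sectors_to_mb and partition_end are hypothetical helper names, and the numbers are taken from the example label. It confirms that 1024000 sectors is 500 MB, that the b partition starts where a ends, and that b ends at the size of the whole disk (partition c).

```shell
#!/bin/sh
# Hypothetical helpers for checking disk label arithmetic.

# Convert a count of 512-byte sectors to whole megabytes.
# 1 MB = 1048576 bytes = 2048 sectors of 512 bytes.
sectors_to_mb() {
    echo $(( $1 / 2048 ))
}

# Print the first sector after a partition.
# $1 = partition size in sectors, $2 = partition offset in sectors.
partition_end() {
    echo $(( $1 + $2 ))
}

sectors_to_mb 1024000            # a partition: prints 500
partition_end 1024000 0          # end of a, start of b: prints 1024000
partition_end 3086480 1024000    # end of b: prints 4110480, the size of c
```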
5.3.3 Backing up the Current root Domain
The first step in moving a root domain is to make a full backup of the domain. Use a backup tape or an unused disk partition.
For example, to back up the root domain to tape /dev/tape/tape0_d1:
# vdump -0 -f /dev/tape/tape0_d1 /
To back up the root domain to an unused partition, create a temporary domain, fileset, and mount-point directory. Back up to a file in that fileset. For example, for the domain TMP_BACKUP, the fileset tmp_backup, the mount point /tmp_backup, and the file containing the dump, root_backup.vdump:
# mkfdmn /dev/disk/dsk5c TMP_BACKUP
# mkfset TMP_BACKUP tmp_backup
# mkdir /tmp_backup
# mount TMP_BACKUP#tmp_backup /tmp_backup
# vdump -0 -f /tmp_backup/root_backup.vdump /
path : /
dev/fset : root_domain#root
type : advfs
advfs id : 0x3b000fb0.000919cc.1
vdump: Dumping directories
vdump: Dumping 96402959 bytes, 117 directories, 2024 files
vdump: Dumping regular files
vdump: Status at Thu May 17 12:52:38 2001
vdump: Dumped 96525730 of 96402959 bytes; 100.0% completed
vdump: Dumped 117 of 117 directories; 100.0% completed
vdump: Dumped 2024 of 2024 files; 100.0% completed
vdump: Dump completed at Thu May 17 12:52:38 2001
5.3.4 Recreating the root Domain on a Different Volume
To recreate the root domain on the new volume, you must restore the backup of the root domain to the new volume. This example also moves the swap partition from dsk3b to dsk6b.
Shut down the system booted from your old root domain.
# shutdown -h now
Boot from the current operating system CD-ROM or Remote Installation Service (RIS) server. For example, from the CD-ROM:
>>> boot dka500
From the RIS server:
>>> boot ewa0
Exit the installation.
If you have a VGA graphics console, choose to exit the installation, or from the File menu of the Installation and Configuration Welcome dialog box, choose shell window.
If you have a serial console terminal, select option 3) Exit Installation. You will get a shell (#) prompt.
If you have backed up your root domain to tape, install the tape device.
# dn_setup -install_tape
For more information see System Administration "Using dn_setup to Perform Generic Operations."
Verify that the new device is recognized properly by the operating system and that the backup device is properly installed.
# hwmgr -view devices
HWID: Device Name            Mfg  Model            Location
------------------------------------------------------------
   4: (unknown)
   6: (unknown)
  38: /dev/disk/floppy0c          3.5in floppy     fdi0-unit-0
  41: /dev/disk/cdrom0c      DEC  RRD45 (C) DEC    bus-0-targ-5-lun-0
  42: /dev/disk/dsk3c        DEC  RZ1DB-CA (C) DEC bus-1-targ-0-lun-0
  43: /dev/disk/dsk4c        DEC  RZ1CB-CA (C) DEC bus-1-targ-1-lun-0
  44: /dev/disk/dsk5c        DEC  RZ1CB-CA (C) DEC bus-1-targ-2-lun-0
  45: /dev/disk/dsk6c        DEC  RZ1BB-CS (C) DEC bus-1-targ-3-lun-0
  46: /dev/ntape/tape0       DEC  TLZ10 (C) DEC    bus-1-targ-4-lun-0
If the new root disk Device Name is listed as (unknown), check for proper hardware installation and configuration. In this example the root domain will be moved to dsk6. The tape backup device is tape0 and the original root domain resides on dsk3.
Create a new root domain and root fileset on the new root device and mount it at /mnt.
# mkfdmn -r /dev/disk/dsk6a root_domain
# mkfset root_domain root
# mount root_domain#root /mnt
Restore the root domain from backup.
If your backup is on tape:
# vrestore -x -f /dev/tape/tape0_d1 -D /mnt
If your backup is on disk:
First create a directory entry for the backup domain in the /etc/fdmns directory. This new directory will only exist in the UNIX installation environment.
# mkdir /etc/fdmns/TMP_BACKUP
Then create a soft link in the new directory pointing to the volume used for the backup domain.
# ln -s /dev/disk/dsk5c /etc/fdmns/TMP_BACKUP/dsk5c
Mount the domain and fileset containing the backup. The new directory is created in /var because the installation root file system is mounted read-only.
# mkdir /var/tmp_backup
# mount TMP_BACKUP#tmp_backup /var/tmp_backup
Restore the files from the TMP_BACKUP domain to the new root domain.
# vrestore -x -f /var/tmp_backup/root_backup.vdump -D /mnt
vrestore: Date of the vdump save-set:Fri May 11 2001
vrestore: Save-set source directory : /
vrestore: informational: [13] posting event: sys.unix.fs.advfs.fset.backup.lock
If running in single user mode, EVM is not running.
Please ignore this posting.
vrestore: informational: [13] posting event: sys.unix.fs.advfs.fset.backup.unlock
If running in single user mode, EVM is not running.
Please ignore this posting.
The new root domain is now created and populated with files from the original root domain.
To finish the process, you must update system bookkeeping to point to the new root volume. In this example, the root domain and the swap partition were moved from dsk3 to dsk6. Nothing else was changed.
Update the /etc/fdmns directory to identify the new root domain. Here dsk6a is the volume containing the new root domain and dsk3a is the volume containing the original root domain.
# cd /mnt/etc/fdmns/root_domain
# ln -s /dev/disk/dsk6a dsk6a
# rm dsk3a
Change the swap partition in sysconfigtab in the new root domain using the editor of your choice. This example uses the vi editor.
# vi /mnt/etc/sysconfigtab
In the vm: section (stanza), change the swap device line from swapdevice=/dev/disk/dsk3b to swapdevice=/dev/disk/dsk6b. This change reflects the new location of the swap partition. Save the changes and exit the editor.
Halt the system and change the default boot device.
# halt
.
.
.
>>> set bootdef_dev dkb300
Boot the new root domain.
>>> boot
Retain the original root domain until you are certain that the data in the original root domain was successfully transferred to the new root domain, then remove the original domain with the rmfdmn command.
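The swap device change made with vi in the procedure above can also be expressed as a text substitution. The following is a minimal sketch, not part of the documented procedure: retarget_swap is a hypothetical filter, and it assumes the entry appears exactly as swapdevice=/dev/disk/<partition> with nothing after the partition name. It rewrites matching lines in any stanza, so inspect the result before using it.

```shell
#!/bin/sh
# Hypothetical filter: rewrite the swapdevice entry from one disk partition
# to another. Reads sysconfigtab text on stdin and writes the result to stdout.
# $1 = old partition name (for example, dsk3b)
# $2 = new partition name (for example, dsk6b)
retarget_swap() {
    # The trailing \$ anchors the match at the end of the line so that
    # dsk3b does not also match a longer name that begins with dsk3b.
    sed "s|swapdevice=/dev/disk/$1\$|swapdevice=/dev/disk/$2|"
}

# Example using the names from this section
printf 'vm:\n\tswapdevice=/dev/disk/dsk3b\n' | retarget_swap dsk3b dsk6b
```

To apply it, pass the contents of /mnt/etc/sysconfigtab through the filter into a temporary file, compare the two files, and only then replace the original.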
5.4 Disk File Structure Incompatibility
Domains created on operating system software Version 5.0 and later have a new on-disk format that is incompatible with earlier versions (see Section 1.6.3). The newer operating system recognizes the older disk structure, but older operating systems do not recognize the newer disk structure. If you install your new operating system software as an update to your Version 4 operating system software (not a full installation), your /root, /usr, and /var files retain a domain version number (DVN) of 3 (see Section 1.6.3.1). If you fully install your Version 5 operating system, the /root, /usr, and /var files have a DVN of 4.
To access a DVN4 fileset from an older operating system, NFS mount the fileset from a server running Version 5.0 or later operating system software, or upgrade your operating system to Version 5.0 or later.
If you try to mount a fileset belonging to a DVN4 domain when you are running a version of the operating system earlier than Version 5.0, you get an error message.
There is no tool that automatically upgrades DVN3 domains to DVN4. To upgrade a domain to DVN4, use the procedure in Section 1.6.3.2.
5.4.1 Utility Incompatibility
Because of the new on-disk file formats in Version 5.0 and later of the operating system, some AdvFS utilities from earlier releases have the potential to corrupt domains created using the new on-disk formats. Statically linked AdvFS-specific utilities from earlier operating system versions do not run on Version 5.0 and later. These utilities are usually from operating system versions prior to Version 4.0. In addition, the following dynamically linked AdvFS utilities from earlier releases of Tru64 UNIX do not run on Version 5.0 and later:
advfsstat
balance
chvol
defragment
rmvol
showfdmn
verify
5.4.2 Avoiding Metadata Incompatibility
If a system crashes or goes down unexpectedly, after reboot, AdvFS performs recovery when the filesets that were mounted at the time of the crash are remounted. This recovery keeps the AdvFS metadata consistent and makes use of the AdvFS transaction log file.
Different versions of the operating system use different AdvFS log record types. Therefore, it is important that AdvFS recovery operations be done on the same version of the operating system as was running at the time of the crash.
To reboot without error using a different version of the operating system, cleanly unmount all filesets before rebooting. If the system failed due to a system panic or an AdvFS domain panic, it is best to reboot using the original version of the operating system and then run the verify command to ensure that the domain is not corrupted. If it is not corrupted, you can reboot your system using a different version of the operating system. If the verify utility indicates that the domain is corrupt, see Section 5.8.4.
5.5 Memory Mapping, Direct I/O, and Data Logging Incompatibility
Unless you have turned on atomic-write data logging by using the mount -o adl command, memory mapping, atomic-write data logging, and direct I/O are mutually exclusive. If a file is open in one of these modes, attempting to open the same file in a conflicting mode fails. For more information see Section 4.4, Section 4.6, and mmap(2).
5.6 Invalid or Corrupt Saveset Format
If you are restoring a saveset that has been written to disk and get an error message that its format is invalid or corrupt, check that you have not backed the saveset up to partition a or c, which include block 0 of the disk. Block 0, the disk label block, is protected from accidental writes to it. To dump to a partition that starts at block 0 of a disk, you must first clear the disk label. If you do not, the output from the vdump command might appear to contain valid savesets, but when the vrestore command attempts to interpret the disk label as part of the saveset, it returns an error (see Section 3.2.6).
5.7 Improving Poor Performance
The performance of a disk depends upon the I/O demands upon it. If you structure your domain so that heavy access is focused on one volume, it is likely that system performance will degrade. After you determine the load balance, there are a number of ways that you can equalize the activity and increase throughput. See System Configuration and Tuning, command reference pages, and Chapter 4 for more complete information.
To discover the causes of poor performance, first check system activity (see Section 4.1). There are a number of ways to improve performance:
Upgrade domains (Section 1.6.3.2)
DVN4 domains are indexed when a directory grows beyond a page, that is, about 200 files. Directories with more than 5000 files show the most benefit.
Eliminate disk access incompatibility (Section 4.6)
If you initiate direct I/O (which turns off caching) to read and write data to a file, any application that accesses the same file also has direct I/O. This might prove inefficient (see Section 4.6).
Defragment domains (Section 4.8)
As files grow, contiguous space on disk often is not available to accommodate new data, so files become fragmented. File fragmentation can reduce system performance because more I/O is required to read or write a file.
Move filesets to different volumes (Section 4.11)
You can move a domain to a volume that is larger or less congested. You can create another domain on another volume and move a fileset to it.
If you have AdvFS Utilities, you can also:
Balance a multivolume domain (Section 4.10)
Files that are distributed unevenly can degrade system performance. Use the balance command to redistribute the files evenly over all your volumes.
Stripe individual files (Section 4.13)
AdvFS allows you to stripe individual files across multiple volumes. Use AdvFS striping only on directly attached storage that is not otherwise striped. Combining AdvFS striping with other striping might degrade performance.
Migrate individual files (Section 4.12)
You can use the migrate utility to move a heavily accessed file or selected pages of a file to another volume in the domain. You can move the file to a specific volume, or you can let the system choose where to move the file.
Change AdvFS resources
You can change your file system size in the following ways:
Increase the size of a domain by adding a volume using the addvol command (Section 1.6.6)
For optimum performance, each volume you add should consist of the entire disk (typically, partition c). Do not add a volume that contains data you want to keep. When you run the addvol command, data on the added device is destroyed.
Shrink a domain by removing a volume using the rmvol command (Section 1.6.7)
You can interrupt the rmvol process by pressing Ctrl/c without damaging your domain. Files already removed from the volume remain in their new location. Files that had not been moved at the time of the interrupt remain in their original location.
If the volume from which the files have been removed does not allow new file allocations after an aborted rmvol operation, use the chvol -A command to reactivate the volume.
Striped file segments are moved to a volume that does not contain a stripe. If this is not possible, the system requests confirmation before doubling up on stripes (see Section 4.13).
Change the size of a domain by changing volumes (Section 4.12)
Add a new volume, move your files to it, then remove the old volume.
See System Limits for the number of volumes, domains, and so forth that the AdvFS file system can handle.
5.8 Fixing Disk Problems
There are a number of problems that may be directly related to the way you are using storage.
5.8.1 Reusing Space
If you want to add storage from an existing domain (the /etc/fdmns directory entry exists) to another domain, you can remove the volume by using the rmvol command then add it to the other domain. For example, if your volume is /dev/disk/dsk5c, your original domain is old_domain, and the domain you want to add the volume to is new_domain, mount all the filesets in old_domain, then enter:
# rmvol /dev/disk/dsk5c old_domain
# addvol /dev/disk/dsk5c new_domain
If the disk or disk partition you want to add is not part of an existing domain but is giving you a warning message because it is labeled, reset the disk label. If you answer yes to the prompt on the addvol or mkfdmn command, the disk label is reset. All information that is on the disk or disk partition that you are adding is lost.
5.8.2 Limiting Disk Space Usage
If your system is running without any limits on resource usage, you can add quotas to limit the amount of disk space your users can access. AdvFS quotas provide a layer of control beyond that available with UFS.
User and group quotas limit the amount of space a user or group can allocate for a fileset. Fileset quotas restrain a fileset from using all of the available space in a domain.
You can set two types of quota limits: hard limits that cannot be exceeded, and soft limits that can be exceeded for a period of time called the grace period. You can turn quota enforcement on and off. See Chapter 2 for complete information.
If you are working in an editor and realize that the information you need to save will exceed your quota limit, do not abort the editor or write the file because data might be lost. Instead, remove files to make room for the edited file before writing it. You can also write the file to another fileset, such as tmp, remove files from the fileset whose quota you exceeded, and then move the file back to that fileset.
AdvFS imposes quota limits in the rare case that you are 8 KB below the user, group, or fileset quota and are attempting to use some or all of the remaining space. This is because AdvFS allocates storage in units of 8 KB. If adding 8 KB to a file exceeds the quota limit, then that file is not extended.
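The 8 KB rule can be illustrated with a small model. This is a sketch under stated assumptions, not a description of the AdvFS implementation: extend_result is a hypothetical function, all values are in KB, and it applies only the last sentence above (the file is not extended if adding 8 KB exceeds the limit).

```shell
#!/bin/sh
# Hypothetical model of the 8 KB allocation rule; all values are in KB.
# $1 = space already charged against the quota
# $2 = quota limit
# Prints "extended" if one more 8 KB unit stays within the limit,
# otherwise prints "not extended".
extend_result() {
    if [ $(( $1 + 8 )) -le "$2" ]; then
        echo "extended"
    else
        echo "not extended"
    fi
}

# 100 KB used with a 104 KB limit: only 4 KB remain, so a write of even
# 1 KB fails because the next 8 KB unit would reach 108 KB.
extend_result 100 104    # prints "not extended"
extend_result 96 112     # prints "extended"
```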
5.8.3 Fixing On-Disk Metadata Corruptions
If you have a domain that cannot be mounted without a domain panic, or if the verify command detects on-disk corruption and is unable to fix it, run the fixfdmn utility. The fixfdmn utility is designed primarily to put a domain into a usable (mountable) state. In the process, as much data as possible is retrieved. However, if recovering data from a file is your priority, use the salvage utility (see Section 5.8.4).
The fixfdmn utility runs on unmounted filesets. It scans on-disk metadata looking for corruptions and, if enough viable data is intact, it attempts to correct the corrupt metadata. If not enough viable metadata is available, the fixfdmn utility attempts to bypass the corruption by moving or deleting the corrupt metadata and deleting files as necessary.
You can run the fixfdmn -n command to check the domain and not do any repairs.
The utility saves a message log file and two undo files. The utility can use the undo files to restore the domain to the configuration it had before you ran the fixfdmn command. See fixfdmn(8) for more information.
5.8.4 Recovering File Data from a Corrupted Domain
The way you recover the contents of a corrupted domain depends on the nature of the corruption. Follow the recovery path for as many steps as needed. The following procedure assumes that you are only experiencing file system corruption, not hardware failure.
Run the verify command to try to repair the domain (see Section 5.2.2 and verify(8)). The verify command fixes only a limited set of problems.
If the verify command detects on-disk corruption, run the fixfdmn command (see Section 5.8.3 and fixfdmn(8)).
If running the fixfdmn command does not solve the problem, determine the date of the most recent backup.
Run the salvage command to recover as many of the recent file changes as possible. The salvage command extracts salvageable files from the corrupted domain and places copies of them in filesets created to hold the recovered files. Depending on the nature of the corruption, you may be able to extract all or some of the data in the corrupted domain.
You can use the salvage -d command to extract files modified after a specified date and time. If you have no backups, you can run the salvage utility without the -d option to recover all the files in the domain.
Recreate the domain from the latest backups then copy any files recovered with the salvage command into the recreated domain.
Use the SysMan Manage an AdvFS Domain utility, or enter the salvage command from the command line. You can recover data to disk or to tape. The amount of data you can recover depends upon the nature of the corruption to your domain. See salvage(8) for more information.
Running the salvage command does not guarantee that you will recover all files in your domain. You might be missing files, directories, file names, or parts of files. The utility generates a log file that contains the status of files that were recovered. Use the -l option to list in the log file the status of all files that are encountered.
The salvage command places the recovered files in directories named after the filesets. You can move the recovered files to new filesets. The utility creates a lost+found directory for each fileset where it puts files that have no parent directory. You can specify the pathname of the directory that is to contain the recovered fileset directories. If you do not specify a directory, the utility writes recovered filesets under the current working directory.
You can also recover data from a damaged domain to tape in a tar format.
5.8.4.1 Salvaging Data to Disk
You can recover data from a corrupted domain to another local unused disk. In this example the corrupted domain is called PERSONNEL and contains the fileset personnel_fset mounted at /personnel. The original domain is on volume /dev/disk/dsk12c and the salvage command places output on /dev/disk/dsk3c.
Unmount all the filesets in the corrupted domain.
Create a domain and a fileset to hold the recovered information and mount the fileset. For example, to create the fileset recover_fset in the domain RECOVER and mount it at /recover:
# mkfdmn /dev/disk/dsk3c RECOVER
# mkfset RECOVER recover_fset
# mkdir /recover
# mount RECOVER#recover_fset /recover
Run the salvage command. In this example, files from the PERSONNEL domain that were modified after 1:30 PM on December 7, 2000 are extracted from the damaged domain.
# /sbin/advfs/salvage -d 200012071330 -D /recover PERSONNEL
salvage: Domain to be recovered 'PERSONNEL'
salvage: Volume(s) to be used '/dev/disk/dsk12c'
salvage: Files will be restored to '/recover'
salvage: Logfile will be placed in './salvage.log'
salvage: Starting search of all filesets: 09-May-2001
salvage: Starting search of all volumes: 09-May-2001
salvage: Loading file names for all filesets: 09-May-2001
salvage: Starting recovery of all filesets: 09-May-2001
View the salvage.log file to ensure that all necessary files were recovered.
Recreate the domain. Here the domain is recreated on the original volume.
Caution
If you recreate a domain on the same volume as your original domain, you destroy all the data in the original domain. To save your corrupted domain, recreate the domain on a different volume.
# rmfdmn PERSONNEL
rmfdmn: remove domain PERSONNEL? [y/n] y
rmfdmn: informational:[13]posting event: sys.unix.fs.advfs.fdmn.rm
If running in single user mode, EVM is not running
Please ignore this posting.
rmfdmn: domain PERSONNEL removed.
# mkfdmn /dev/disk/dsk12c PERSONNEL
# mkfset PERSONNEL personnel_fset
If you are restoring some of the domain from backup, do this now. This procedure is specific to your site.
Copy the salvaged files from the temporary location to the restored domain and remove the recovery domain.
# mkdir /personnel
# mount PERSONNEL#personnel_fset /personnel
# cp -Rp /recover/personnel_fset/* /personnel
# umount /recover
# rmfdmn RECOVER
rmfdmn: remove domain RECOVER [y/n] y
rmfdmn: domain RECOVER removed.
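The argument to the -d option in the salvage example above, 200012071330, encodes 1:30 PM on December 7, 2000 as year, month, day, hour, and minute on a 24-hour clock. That layout is inferred from the example; see salvage(8) for the authoritative format. The sketch below uses a hypothetical helper, salvage_date, to build such a string.

```shell
#!/bin/sh
# Hypothetical helper: build a date string in the layout used by the
# salvage -d example in this section (year, month, day, hour, minute).
# Pass plain decimal numbers without leading zeros; the hour uses a
# 24-hour clock, so 1:30 PM is hour 13, minute 30.
salvage_date() {
    printf '%04d%02d%02d%02d%02d\n' "$1" "$2" "$3" "$4" "$5"
}

salvage_date 2000 12 7 13 30    # prints 200012071330
```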
5.8.4.2 Salvaging Data to Tape
If your system does not have enough space to hold the information recovered by the salvage utility, you can recover data to tape and then write it back onto your original disk location.
To recover data from a corrupted domain called PERSONNEL on volume /dev/disk/dsk12c containing the personnel_fset fileset mounted at /personnel to tape:
Unmount all filesets in the corrupted domain.
Install a tape on the local tape drive.
Run the salvage command using the -F and -f options to specify tar format and the tape drive. In this example, files from the PERSONNEL domain that were modified after 1:30 PM on December 7, 2000 are extracted and stored on tape.
# /sbin/advfs/salvage -d 200012071330 -F tar \
-f /dev/tape/tape0_d1 PERSONNEL
salvage: Domain to be recovered 'PERSONNEL'
salvage: Volume(s) to be used '/dev/disk/dsk12c'
salvage: Files archived to '/dev/tape/tape0_d1' in TAR format
salvage: Logfile will be placed in './salvage.log'
salvage: Starting search of all filesets: 09-May-2001
salvage: Starting search of all volumes: 09-May-2001
salvage: Loading file names for all filesets: 09-May-2001
salvage: Starting recovery of all filesets: 09-May-2001
View the salvage.log file to ensure that all necessary files were recovered.
Recreate the domain.
Caution
If you recreate a domain on the same volume as your original domain, you destroy all the data in the original domain. To save your corrupted domain, recreate the domain on a new volume.
# rmfdmn PERSONNEL
rmfdmn: remove domain PERSONNEL? [y/n] y
rmfdmn: informational:[13]posting event: sys.unix.fs.advfs.fdmn.rm
If running in single user mode, EVM is not running
Please ignore this posting.
rmfdmn: domain PERSONNEL removed.
# mkfdmn /dev/disk/dsk12c PERSONNEL
# mkfset PERSONNEL personnel_fset
If you are restoring some of the domain from backup, do this now.
Copy the salvaged files from tape to the restored domain.
# cd /
# mkdir /personnel
# mount PERSONNEL#personnel_fset /personnel
# tar -xpvf /dev/tape/tape0_d1
5.8.4.3 Salvaging Data from a Corrupted root Domain
If your system is not bootable because the root domain is corrupt, you can boot your system from the installation CD-ROM and run the /sbin/advfs/salvage command. Follow the steps in Section 5.11 to boot your system and exit the installation. Depending on the nature and extent of the root domain corruption, successful file recovery may not be possible.
If you are booting from the installation CD-ROM, device name assignments may differ from the assignments made on the installed operating system. Use the hwmgr -view devices command to view a table of special device names mapped to hardware identification. Be certain you are referencing the intended devices before issuing commands that destroy data.
To recover data from a corrupted root domain on volume /dev/disk/dsk0a to another local, unused disk, /dev/disk/dsk3c:
Create a domain and filesets to hold the recovered information and mount the filesets.
# mkfdmn /dev/disk/dsk3c RECOVER
# mkfset RECOVER recover_fset
# mkdir /recover
# mount RECOVER#recover_fset /recover
Run the salvage command. You must use the -V option to specify the volume that the command will operate on. In this example, files from the corrupted root domain that were modified after 1:30 PM on December 7, 2000 are extracted and stored in the fileset mounted at /recover.
# /sbin/advfs/salvage -d 200012071330 -D /recover \
-V /dev/disk/dsk0a
salvage: Volume(s) to be used '/dev/disk/dsk0a'
salvage: Files will be restored to '/recover'
salvage: Logfile will be placed in './salvage.log'
salvage: Starting search of all filesets: 09-May-2001
salvage: Loading file names for all filesets: 09-May-2001
salvage: Starting recovery of all filesets: 09-May-2001
View the salvage.log file to ensure that all necessary files were recovered.
Recreate the root domain as described in Section 5.11. Mount the root domain again at /mnt. If you intend to recover your root domain from backup, do so now.
Copy the salvaged files from the recovery location to the root domain and remove the recovery domain.
# cd /recover
# cp -RP * /mnt
# cd /
# umount /mnt /recover
# rmfdmn RECOVER
rmfdmn: remove domain RECOVER [y/n] y
rmfdmn: domain RECOVER removed.
The root domain is restored.
5.8.4.4 Salvaging Data Block by Block
If you ran the salvage utility and were unable to recover a large number of files, run the salvage -S command. This process is very slow because the utility reads every disk block at least once. If you are recovering to tape and have already created a new domain on the disks containing the corrupted domain, you cannot use the -S option because your original information is lost.
Note
If you have accidentally used the mkfdmn command on a good domain, running the salvage -S command is the only way to recover files.
Caution
The salvage utility opens and reads block devices directly, which can present a security problem. With the -S option it might be possible to access data from older, deleted AdvFS domains while attempting to recover data from the current AdvFS domain.
The following example recovers data block by block.
# /sbin/advfs/salvage -S PERSONNEL
salvage: Domain to be recovered 'PERSONNEL'
salvage: Volume(s) to be used '/dev/disk/dsk12c'
salvage: Files will be restored to '.'
salvage: Logfile will be placed in './salvage.log'
salvage: Starting sequential search of all volumes: 09-May-2001
salvage: Loading file names for all filesets: 09-May-2001
salvage: Starting recovery of all filesets: 09-May-2001
5.8.5 "Can't Clear a Bit Twice" Error Message
If you receive a "Can't clear a bit twice" error message, your domain is damaged. To repair it:
Set the AdvfsFixUpSBM kernel variable to allow access to the damaged domain. This flag is off by default. To turn it on:
# dbx -k /vmunix /dev/mem
dbx> assign AdvfsFixUpSBM = 1
dbx> quit
Mount and back up the filesets in the damaged domain.
Turn AdvfsFixUpSBM off:
# dbx -k /vmunix /dev/mem
dbx> assign AdvfsFixUpSBM = 0
dbx> quit
Unmount the filesets in the domain.
Run the verify -f utility. If there are errors, continue with the next two steps: recreate the domain and filesets, then restore from the backup.
Recreate the domain and filesets.
Restore from the backup.
Note
The
AdvfsFixUpSBM
variable is global. Turn it off so that the error message is again available for all domains.
5.8.6 Recovering from a Domain Panic
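When a metadata write error occurs, or if corruption is detected in a single AdvFS domain, the system initiates a domain panic.
When a domain panic occurs, an EVM event is logged (see EVM(5)) and the following message is printed to the system log and the console:
AdvFS Domain Panic; Domain name Id domain_Id
For example:
AdvFS Domain Panic; Domain staffb_domain Id 2dad7c28.0000dfbb
An AdvFS domain panic has occurred due to either a
metadata write error or an internal inconsistency.
This domain is being rendered inaccessible.
By default, a domain panic on an active domain causes a live dump to be created and placed in the /var/adm/crash directory. Some AdvFS-related errors might also be recorded in /var/adm/binary.errlog. Please file a problem report with your software support organization and include the dump file and a copy of the running kernel.
To recover from a domain panic, perform the following steps:
Run the mount -t command and identify all mounted filesets in the affected domain.
Unmount all the filesets in the affected domain.
Examine the /etc/fdmns directory to obtain a list of the AdvFS volumes in the domain that panicked.
Run the savemeta command (see savemeta(8)) to collect information about the metadata files for each volume in the domain. Technical support needs this information.
If the problem is a hardware problem, fix it before continuing.
Run the verify utility on the domain (see Section 5.2.2).
If there are no errors, mount all the filesets you unmounted and resume normal operations.
If the verify command runs but shows errors, mount the filesets, do a backup, and recreate the domain. Note that the backup might be incomplete and that earlier backup resources might be needed.
If the failure prevents complete recovery, recreate the domain on new volumes by using the mkfdmn command and restore the domain's data from backup. If the backup does not provide enough information, you might need to run the salvage utility (see Section 5.8.4).
For example:
# mount -t advfs
staffb_dmn#staff3_fs on /usr/staff3 type advfs (rw)
staffb_dmn#staff4_fs on /usr/staff4 type advfs (rw)
# umount /usr/staff3
# umount /usr/staff4
# ls -l /etc/fdmns/staffb_dmn
lrwxr-xr-x 1 root system 10 Nov 04 16:46 dsk35c->/dev/disk/dsk3c
lrwxr-xr-x 1 root system 10 Nov 04 16:50 dsk36c->/dev/disk/dsk6c
lrwxr-xr-x 1 root system 10 Nov 04 17:00 dsk37c->/dev/disk/dsk1c
# savemeta staffb_dmn /tmp/saved_dmn
# verify staffb_dmn
You do not need to reboot after a domain panic.
If you have recurring domain panics, you might try adjusting the AdvfsDomainPanicLevel attribute (see Section 4.14) in order to facilitate debugging.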
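5.8.7 Recovering from Filesets That are Mounted Read-Only
When a fileset is mounted, AdvFS verifies that all volumes in a domain can be accessed. The size recorded in the domain's metadata for each volume must match the size of the volume. If the sizes match, the mount proceeds. If a volume is smaller than the recorded size, AdvFS attempts to read the last block marked in use for the fileset. If this block can be read, the mount succeeds, but the fileset is marked as read-only. If the last in-use block for any volume in the domain cannot be read, the mount fails. See mount(8) for more information.
If a fileset is mounted read-only, check the labels of the flagged volumes in the error message. There are two common errors:
A disk is mislabeled on a RAID array.
An LSM volume upon which an AdvFS domain resides was shrunk from its original size (see Section 1.10).
If you have AdvFS Utilities, and if the domain consists of multiple volumes with enough free space to remove the offending volume, you do not need to remove your filesets. However, you should back them up before proceeding.
Remove the volume from the domain by using the rmvol command. (This automatically migrates the data to the remaining volumes.)
Correct the disk label of the volume by using the disklabel command.
Add the corrected volume back to the domain by using the addvol command.
Run the balance command to distribute the data across the new volumes.
For example, if /dev/disk/dsk2c (on a device here called <disk>) within the data5 domain is mislabeled, you can migrate your files on that volume (automatic with the rmvol command), then move them back after you restore the volume.
# rmvol /dev/disk/dsk2c data5
# disklabel -z dsk2
# disklabel -rw dsk2 <disk>
# addvol /dev/disk/dsk2c data5
# balance data5
If you do not have AdvFS Utilities, or if there is not enough free space in the domain to transfer the data from the offending volume:
Back up all filesets in the domain.
Remove the domain by using the rmfdmn command.
Correct the disk label of the volume by using the disklabel command.
Make the new domain.
If you have AdvFS Utilities and if the original domain was multivolume, add the corrected volume back to the domain by using the addvol command.
Restore the filesets from the backup.
For example, if /dev/disk/dsk1c (on a device here called <disk>) containing the data3 domain is mislabeled:
# vdump -0f -u /data3
# rmfdmn data3
# disklabel -z dsk1 <disk>
# disklabel -w dsk1 <disk>
# mkfdmn /dev/disk/dsk1c data3
If you are recreating a multivolume domain, include the necessary addvol commands to add the additional volumes. For example, to add /dev/disk/dsk5c to the domain:
# addvol /dev/disk/dsk5c data3
# mkfset data3 data3fset
# mount data3#data3fset /data3
# vrestore -xf - /data3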
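5.9 Restoring the /etc/fdmns Directory
AdvFS must have a current /etc/fdmns directory in order to mount filesets (see Section 1.6.2). A missing or damaged /etc/fdmns directory prevents access to a domain, but the data within the domain remains intact. You can restore the /etc/fdmns directory from backup or you can recreate it.
It is preferable to restore the /etc/fdmns directory from backup if you have a current backup copy. You can use any standard backup facility (vdump, tar, or cpio) to back up the /etc/fdmns directory. To restore the directory, use the recovery procedure that is compatible with your backup process.
If you cannot restore the /etc/fdmns directory, you can reconstruct it manually (see Section 5.9.1) or with the advscan command (see Section 5.9.2). The procedure for reconstructing the /etc/fdmns directory is similar for both single-volume and multivolume domains. You can construct the directory for a missing domain, missing links, or the whole directory.
If you choose to reconstruct the directory manually, you must know the name of each domain and its associated volumes.
5.9.1 Reconstructing the /etc/fdmns Directory Manually
If you accidentally lose all or part of your /etc/fdmns directory, and you know which domains and links are missing, you can reconstruct it manually.
The following example reconstructs the /etc/fdmns directory and two domains. In this example the domains exist and their names are known. Each domain contains a single volume (or special device). Note that the order of creating the links in these examples does not matter. The domains are:
domain1 on /dev/disk/dsk1c
domain2 on /dev/disk/dsk2c
To reconstruct the two single-volume domains, enter:
# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c dsk1c
# mkdir /etc/fdmns/domain2
# cd /etc/fdmns/domain2
# ln -s /dev/disk/dsk2c dsk2c
The following example reconstructs one multivolume domain. The domain1 domain contains the following three volumes:
/dev/disk/dsk1c
/dev/disk/dsk2c
/dev/disk/dsk3c
To reconstruct the multivolume domain, enter:
# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c dsk1c
# ln -s /dev/disk/dsk2c dsk2c
# ln -s /dev/disk/dsk3c dsk3c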
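5.9.2 Reconstructing the /etc/fdmns Directory Using advscan
You can use the advscan command to determine which partitions on a disk or which Logical Storage Manager (LSM) volumes are part of an AdvFS domain. Then you can use the command to rebuild all or part of your /etc/fdmns directory. This command is useful:
If you moved disks to a new system, if device numbers have changed, or if you lost track of a domain location
For repair, if you delete the /etc/fdmns directory, delete a domain from the /etc/fdmns directory, or delete links from a domain's subdirectory in the /etc/fdmns directory
The advscan command can:
Determine if a partition is an AdvFS partition.
List partitions in the order they are found on disk.
Read the disk label to determine which partitions are in the domain and if any are overlapping.
Scan all disks found in any /etc/fdmns domain.
Recreate missing domain directories. The domain name is created from the device name.
Fix the domain count and links for a domain.
For each domain there are three numbers that must match for the AdvFS file system to operate properly:
The number of physical partitions found by the advscan command that have the same domain ID
The domain volume count (the number stored in the AdvFS metadata that specifies the number of partitions in the domain)
The number of /etc/fdmns links to the partitions, because each partition must be represented by a link
See advscan(8) for more information.
Inconsistencies can occur in these numbers for several reasons. In general, the advscan command treats the domain volume count as more reliable than the number of partitions or the /etc/fdmns links. The following tables list anomalies, possible causes, and corrective actions that the advscan utility can take. In the tables, the letter N represents the value that is expected to be consistent for the number of partitions, the domain volume count, and the number of links.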
Table 5-2 shows possible causes and corrective actions if the expected value, N, for the number of partitions and for the domain volume count does not equal the number of links in the /etc/fdmns/<dmn> directory.
Table 5-3 shows possible causes and corrective actions if the expected value, N, for the number of partitions and for the number of links in the /etc/fdmns/<dmn> directory does not equal the domain volume count.
Table 5-4 shows possible causes and corrective actions if the expected value, N, for the domain volume count and for the number of links in the /etc/fdmns/<dmn> directory does not equal the number of partitions.
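In the following example no domains are missing. The advscan command scans devices dsk0 and dsk5 for AdvFS partitions and finds nothing amiss. Two partitions are found, dsk0c and dsk5c, the domain volume count reports two, and two links are entered in the /etc/fdmns directory.
# advscan dsk0 dsk5
Scanning disks dsk0 dsk5
Found domains:
usr_domain
Domain Id 2e09be37.0002eb40
Created Thu Feb 24 09:54:15 2000
Domain volumes 2
/etc/fdmns links 2
Actual partitions found:
dsk0c
dsk5c
In the following example, directories that define the domains that include dsk6 were removed from the /etc/fdmns directory. This means that the number of /etc/fdmns links, the number of partitions, and the domain volume counts are no longer equal. In this example the advscan command scans device dsk6 and recreates the missing domains as follows:
A partition is found containing an AdvFS domain. The domain volume count reports one, but there is no domain directory in the /etc/fdmns directory that contains this partition.
Another partition is found containing a different AdvFS domain. The domain volume count is also one. There is no domain directory that contains this partition.
No other AdvFS partitions are found. The domain volume counts and the number of partitions found match for the two discovered domains.
The advscan command creates directories for the two domains in the /etc/fdmns directory.
The advscan command creates symbolic links for the devices in the /etc/fdmns domain directories.
The command and output are as follows:
# advscan -r dsk6
Scanning disks dsk6
Found domains:
*unknown*
Domain Id 2f2421ba.0008c1c0
Created Thu Jan 20 13:38:02 2000
Domain volumes 1
/etc/fdmns links 0
Actual partitions found:
dsk6a*
*unknown*
Domain Id 2f535f8c.000b6860
Created Fri Feb 25 09:38:20 2000
Domain volumes 1
/etc/fdmns links 0
Actual partitions found:
dsk6b*
Creating /etc/fdmns/domain_dsk6a/
linking dsk6a
Creating /etc/fdmns/domain_dsk6b/
linking dsk6b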
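5.10 Recovering from Corruption of a Domain
Some problems occur in AdvFS because of hardware errors. For example, if a write to the file system fails due to a hardware fault, it might appear as metadata corruption. Hardware problems cannot be repaired by your file system. If you experience unexplained errors on a volume in a multivolume domain, do the following:
As root user, examine the /var/adm/messages file for AdvFS I/O error messages. For example:
Dec 05 15:39:16 systemname vmunix: AdvFS I/O error:
Dec 05 15:39:16 systemname vmunix: Domain#Fileset:test1#tstfs
Dec 05 15:39:16 systemname vmunix: Mounted on: /test1
Dec 05 15:39:17 systemname vmunix: Volume: /dev/rz11c
Dec 05 15:39:17 systemname vmunix: Tag: 0x00000006.8001
Dec 05 15:39:17 systemname vmunix: Page: 76926
Dec 05 15:39:17 systemname vmunix: Block: 5164080
Dec 05 15:39:17 systemname vmunix: Block count: 256
Dec 05 15:39:17 systemname vmunix: Type of operation: Read
Dec 05 15:39:17 systemname vmunix: Error: 5
Dec 05 15:39:17 systemname vmunix: To obtain the name of
Dec 05 15:39:17 systemname vmunix: the file on which the
Dec 05 15:39:17 systemname vmunix: error occurred, type the
Dec 05 15:39:17 systemname vmunix: command
Dec 05 15:39:17 systemname vmunix: /sbin/advfs/tag2name
Dec 05 15:39:17 systemname vmunix: /test1/.tags/6
This error message describes the domain, fileset, and volume on which the error occurred. It also describes how to find out which file was affected by the I/O error. If you have no AdvFS I/O error messages but still have unexplained behavior on the file system, unmount the domain as soon as possible and run the verify utility (see Section 5.2.2) to check the consistency of the domain's metadata.
Check for device driver error messages for the volume described in the AdvFS I/O error message. If you have no error messages, unmount the domain as soon as possible and run the verify utility to check the integrity of the domain's metadata. If there are no device driver I/O error messages that correspond to the AdvFS I/O error messages, then the file system is being affected by problems with the underlying hardware.
Try to remove the faulty volume by using the rmvol utility (see Section 1.6.7). If this succeeds, the file system problems should not recur.
If rmvol fails due to more I/O errors, you must recreate the domain.
If you have a recent backup, recreate the domain and restore it from backup. If you have no backup, or if it is too old, use the salvage utility (see Section 5.8.4) to extract the contents of the corrupted domain.
Remove the faulty domain by using the rmfdmn command.
Recreate the domain by using the mkfdmn command. Remember that if you are recreating your domain, it will have a DVN of 4 by default (see Section 1.6.3). Add volumes as needed if you have the AdvFS Utilities license. Do not include the faulty volume in the new domain.
Restore the contents of the recreated domain using the information obtained in the backup step.
Remount the filesets in the domain.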
Catastrophic corruption of your AdvFS root domain typically requires
that you recreate your root file system in order to have a bootable system.
This section explains recovering a corrupted root domain on a non-clustered
system.
For other configurations, see
System Administration
"Duplicating
or Recovering a System (Root) Disk" and
Cluster Administration.
If your root
volume is an LSM volume, see
Logical Storage Manager.
Follow this procedure if the root domain is corrupt.
This procedure
assumes that the hardware disk device containing the corrupted root domain
is functioning properly, that the disklabel is correct, and that the problem
is due to data corruption.
You must be root user to reconstruct the root domain.
Depending on your system configuration, you might need the following:
Information about console commands
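You will use Alpha System Reference Manual (SRM) console commands at the system console prompt (>>>) to perform some tasks. These commands are documented in the hardware manual for your Alpha system. If you cannot find the printed document, it is usually shipped as a printable file on a CD-ROM supplied with the system.
A current operating system CD-ROM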
You can use the operating system CD-ROM that is packaged with the distribution
media to boot your system and perform maintenance activities on various utilities.
If your local site provides a Remote Installation Service (RIS) server,
you can boot your system across the network.
If you choose RIS services, follow
your site-specific procedures and consult the
Installation Guide.
Recent root domain backup media (full and recent incremental
backups)
You will need to recreate the root domain on the boot device.
You are
best prepared if you have a full and recent backup of the root domain.
If
you do not have adequate backup, depending on the nature and extent of the
root domain corruption, you may be able to recover root files using the
You need to identify the following hardware resources to complete the
restoration of your root disk.
If you plan to boot your system from the operating system CD-ROM, determine
the name of your CD-ROM drive.
One method of identifying your CD-ROM drive
is by issuing the
In this example, the CD-ROM device name is
If you plan to boot your system from a RIS server, determine the name
of your network interface device.
One method of identifying your network interface
device is by issuing the
In this example, the network interface device name is
For additional information, see the hardware manual for your system.
For information about RIS servers, see the
Installation Guide -- Advanced Topics.
In previous versions of the operating system, device names were assigned
based on the physical location of the drive on an I/O bus.
In Version 5.0 and later
operating system software, device names are assigned logically and stored
in a database.
These names are independent of the device's physical location.
You must determine the boot device name according to the SRM console.
If your boot device is the default boot device, you can identify this device
using the
If your boot device is not the default boot device, use the
For example, if
If the root domain is mountable when you boot from the installation
media, the installation procedure attempts to read the existing device database
from the installed root domain.
If this read succeeds, the following message
appears on the console:
If the hardware database read fails, messages similar to the following
appear on the console:
If the hardware database read fails, you must translate the UNIX device
name assignments to the proper hardware device by identifying the device by
its bus/target/LUN (see
Section 5.11.2).
The following steps recover your failed root domain.
Boot the system using one of the following methods:
Insert and boot your installation CD-ROM using the device
name that you determined previously.
For example:
Boot from your local RIS server.
For example:
Exit the installation as follows:
If you have a VGA graphics console, choose to exit the installation,
or from the File menu of the Installation and Configuration Welcome dialog
box, choose shell window.
If you have a serial console terminal, select option
You will get a shell (#) prompt.
Identify both the bus/target/LUN of the target disk that will
be used as the restored root disk and the status of backup device by using
the
In this example, the SRM console is identified
To visually confirm that you have identified the correct device, use
the
If you plan to recover from a local tape device, identify the device
in the list displayed by the
If you have a tape backup device, install it.
For more information see
System Administration
"Using dn_setup to Perform
Generic Operations."
To verify the installation, repeat the
If necessary, recover files with the
Create the new root domain and root fileset.
Mount the fileset
at
Use the
If necessary, copy files recovered with the
Halt the system.
Boot the system.
Verify success by checking the boot process for error messages.
It is a good idea to use the
If the procedure was not successful and hardware failures are not present,
your only recourse is to reinstall the operating system from the distribution
media and recreate your customized environment from backup media.
Before you restore a multivolume
First create a one volume usr domain and restore the
LMF has two parts.
A utility is stored in
The following example shows how to restore a multivolume domain where
the
Mount the root fileset as read/write.
Remove the links for the old
Create and mount the
Create a soft link in
Insert the
Reset the license database.
Add the extra volumes to
Do a full restore of the
The following example shows how to restore a multivolume domain where
the
Mount the root fileset as read/write.
Remove the links for the old
Create and mount the
Insert the
Insert the
Reset the license database.
Add the extra volumes to
Do a full restore of
Insert the
When each domain is mounted after a crash, the system automatically
runs recovery code that checks the transaction log file to ensure that file
system operations that were occurring when the system crashed are either completed
or backed out.
This ensures that AdvFS metadata is in a consistent state after
a crash.
If you are recovering your system by using an operating system other
than the one that crashed, see
Section 5.4.
If it appears that a domain is corrupted or it is otherwise causing
problems, run the
If a machine has failed, you can move disks containing AdvFS domains
to another computer running the AdvFS software.
Connect the disk(s) to the
new machine and modify the
You cannot move DVN4 domains to systems running Version 4 of the operating
system software.
Doing so generates an error message (see
Section 5.4).
You can move DVN3 domains from a Version 4 machine to a machine running Version
5.
The newer operating system recognizes the domains created earlier.
Do not use either the
If you do not know which partitions your domains were on, you can add
the disks on the new machine and run the
If the motherboard of your machine fails, you must move the disks to
another system.
You might need to reassign the disk SCSI IDs to avoid conflicts.
(See your disk manufacturer instructions for more information.)
For example, assume the IDs are assigned to disk 6 and disk 8.
Assume
also that the system has a domain,
Shut down the working machine to which you are moving the
disks.
Connect the disks from the bad machine to the good one.
Reboot.
You do not need to reboot to single-user mode; multiuser
mode works because you can complete the following steps while the system is
running.
Determine the device nodes created for the new disks.
The output is a detailed list of information about all
the disks on your machine.
The DEVICE FILE column shows the name that the
system uses to refer to each disk.
Find the listings for the disks that you
just added, for example,
Modify your
Edit the
Mount the volumes.
Note that if you run the
When a system crashes, AdvFS performs recovery at reboot.
Filesets
that were mounted at the time of the crash are recovered when they are remounted.
This recovery keeps the AdvFS metadata consistent and makes use of the AdvFS
transaction log file.
Since different versions of the operating system use different transaction
log file structures, it is important that you recover your filesets on the
version of the operating system that was running at the time of the crash.
If you do not, you risk corrupting the domain metadata and/or panicking the
domain.
If the system crashed because you set the
If the removal process is interrupted (see
Section 1.6.7),
under some circumstances the volume can be left in an inaccessible state where
you cannot write to it.
These volumes are marked as "data unavailable"
in the output of the
EVM
event is logged
(see
EVM
(5)) and the following message is printed to the system log and
the console:
AdvFS Domain Panic; Domain
name
Id
domain_Id
AdvFS Domain Panic; Domain staffb_domain Id 2dad7c28.0000dfbb
An AdvFS domain panic has occurred due to either a
metadata write error or an internal inconsistency.
This domain is being rendered inaccessible.
/var/adm/crash
directory.
Some AdvFS-related errors might also be recorded in
/var/adm/binary.errlog
.
Please file a problem report with your software support organization
and include the dump file and a copy of the running kernel.
mount -t
command and identify all
mounted filesets in the affected domain.
/etc/fdmns
directory to obtain
a list of the AdvFS volumes in the domain that panicked.
savemeta
command (see
savemeta
(8))
to collect information about the metadata files for each volume in the domain.
Technical support needs this information.
verify
utility on the domain (see
Section 5.2.2).
verify
command runs but shows errors,
mount the filesets, do a backup, and recreate the domain.
Note that the backup
might be incomplete and that earlier backup resources might be needed.
mkfdmn
command and restore
the domain's data from backup.
If the backup does not provide enough information,
you might need to run the
salvage
utility (see
Section 5.8.4).
# mount -t advfs
staffb_dmn#staff3_fs on /usr/staff3 type advfs (rw)
staffb_dmn#staff4_fs on /usr/staff4 type advfs (rw)
# umount /usr/staff3
# umount /usr/staff4
# ls -l /etc/fdmns/staffb_dmn
lrwxr-xr-x 1 root system 10 Nov 04 16:46
dsk35c->/dev/disk/dsk3c
lrwxr-xr-x 1 root system 10 Nov 04 16:50
dsk36c->/dev/disk/dsk6c
lrwxr-xr-x 1 root system 10 Nov 04 17:00
dsk37c->/dev/disk/dsk1c
# savemeta staffb_dmn /tmp/saved_dmn
# verify staffb_dmn
AdvfsDomainPanicLevel
attribute (see
Section 4.14)
in order to facilitate debugging.
5.8.7 Recovering from Filesets That are Mounted Read-Only
mount
(8)
for more information.
rmvol
command.
(This automatically migrates the data to the remaining
volumes.)
disklabel
command.
addvol
command.
balance
command to distribute the
data across the new volumes.
/dev/disk/dsk2c
(on a device here
called <disk>) within the
data5
domain is mislabeled,
you can migrate your files on that volume (automatic with the
rmvol
command), then move them back after you restore the volume.
# rmvol /dev/disk/dsk2c data5
# disklabel -z dsk2
# disklabel -rw dsk2 <disk>
# addvol /dev/disk/dsk2c data5
# balance data5
rmfdmn
command.
disklabel
command.
addvol
command.
/dev/disk/dsk1c
(on a device here
called <disk>) containing the
data3
domain is mislabeled:
# vdump -0f -u /data3
# rmfdmn data3
# disklabel -z dsk1 <disk>
# disklabel -w dsk1 <disk>
# mkfdmn /dev/disk/dsk1c data3
addvol
commands to add the additional volumes.
For example to add
/dev/disk/dsk5c
to the domain:
# addvol /dev/disk/dsk5c data3
# mkfset data3 data3fset
# mount data3#data3fset /data3
# vrestore -xf - /data3
5.9 Restoring the /etc/fdmns Directory
/etc/fdmns
directory in
order to mount filesets (see
Section 1.6.2).
A missing or damaged
/etc/fdmns
directory prevents access to a domain, but the data within
the domain remains intact.
You can restore the
/etc/fdmns
directory from backup or you can recreate it.
/etc/fdmns
directory
from backup if you have a current backup copy.
You can use any standard backup
facility (vdump
,
tar
, or
cpio
) to back up the
/etc/fdmns
directory.
To restore
the directory, use the recovery procedure that is compatible with your backup
process.
/etc/fdmns
directory, you
can reconstruct it manually (see
Section 5.9.1) or with
the
advscan
command (see
Section 5.9.2).
The procedure for reconstructing the
/etc/fdmns
directory
is similar for both single-volume and multivolume domains.
You can construct
the directory for a missing domain, missing links, or the whole directory.
5.9.1 Reconstructing the /etc/fdmns Directory Manually
/etc/fdmns
directory, and you know which domains and links are missing, you can reconstruct
it manually.
/etc/fdmns
directory and two domains. In this example the domains exist and their names
are known.
Each domain contains a single volume (or special device).
Note
that the order of creating the links in these examples does not matter.
The
domains are:
domain1
on
/dev/disk/dsk1c
domain2
on
/dev/disk/dsk2c
# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c dsk1c
# mkdir /etc/fdmns/domain2
# cd /etc/fdmns/domain2
# ln -s /dev/disk/dsk2c dsk2c
domain1
domain contains the following three volumes:
/dev/disk/dsk1c
/dev/disk/dsk2c
/dev/disk/dsk3c
# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c dsk1c
# ln -s /dev/disk/dsk2c dsk2c
# ln -s /dev/disk/dsk3c dsk3c
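The manual reconstruction shown in the examples above repeats the same mkdir and ln -s pattern for every domain, so it can be wrapped in a small function. The following is a minimal sketch, not a supported utility. The name rebuild_fdmns_domain and the first argument (the directory that plays the role of /etc/fdmns) are assumptions made here so that the function can be tried in a scratch directory first.

```sh
# Sketch: recreate one domain subdirectory and its volume links.
# Usage: rebuild_fdmns_domain FDMNS_ROOT DOMAIN VOLUME [VOLUME ...]
# FDMNS_ROOT is a parameter (an assumption of this sketch) so the
# function can be exercised outside /etc/fdmns.
rebuild_fdmns_domain() {
    [ "$#" -ge 3 ] || { echo "usage: rebuild_fdmns_domain root domain volume..." >&2; return 1; }
    fdmns_root=$1
    domain=$2
    shift 2
    mkdir -p "$fdmns_root/$domain" || return 1
    for volume in "$@"; do
        # The link name is the last path component, as in the examples (dsk1c).
        ln -s "$volume" "$fdmns_root/$domain/${volume##*/}" || return 1
    done
}
```

For instance, `rebuild_fdmns_domain /etc/fdmns domain1 /dev/disk/dsk1c /dev/disk/dsk2c /dev/disk/dsk3c` would create the same three links as the multivolume example. As in the manual procedure, you must already know the domain name and its volumes; the function does not verify that the volumes belong to the domain.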
5.9.2 Reconstructing the /etc/fdmns Directory Using advscan
advscan
command to determine which
partitions on a disk or which Logical Storage Manager (LSM) volumes are part
of an AdvFS domain.
Then you can use the command to rebuild all or part of
your
/etc/fdmns
directory.
This command is useful:
/etc/fdmns
directory, delete a domain from the
/etc/fdmns
directory,
or delete links from a domain's subdirectory in the
/etc/fdmns
directory
advscan
command can:
/etc/fdmns
domain.
advscan
command that have the same domain ID
/etc/fdmns
links to the partitions,
because each partition must be represented by a link
advscan
(8)
for more information.
advscan
command treats the domain volume count as more
reliable than the number of partitions or the
/etc/fdmns
links.
The following tables list anomalies, possible causes, and corrective
actions that the
advscan
utility can take.
In the table,
the letter N represents the value that is expected to be consistent for the
number of partitions, the domain volume count, and the number of links.
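The comparison of the three counts can be sketched as a small function that reports which of Tables 5-2, 5-3, and 5-4 applies. This is an illustration only: the function name advscan_mismatch is hypothetical, and the three numbers must be read by hand from the advscan output ("Actual partitions found", "Domain volumes", and "/etc/fdmns links").

```sh
# Hypothetical helper: decide which table describes a mismatch.
# Arguments: partitions found, domain volume count, /etc/fdmns links.
advscan_mismatch() {
    parts=$1
    volcount=$2
    links=$3
    if [ "$parts" -eq "$volcount" ] && [ "$volcount" -eq "$links" ]; then
        echo "consistent"
    elif [ "$parts" -eq "$volcount" ]; then
        # Partitions and volume count agree on N; the links differ.
        echo "links differ (Table 5-2)"
    elif [ "$parts" -eq "$links" ]; then
        # Partitions and links agree on N; the volume count differs.
        echo "domain volume count differs (Table 5-3)"
    elif [ "$volcount" -eq "$links" ]; then
        # Volume count and links agree on N; the partitions differ.
        echo "partitions differ (Table 5-4)"
    else
        echo "no two counts agree"
    fi
}
```

For instance, a domain with one partition found, a domain volume count of one, and no links gives `advscan_mismatch 1 1 0`, which prints "links differ (Table 5-2)".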
/etc/fdmns/<dmn>
directory.
Table 5-2: Fileset Anomalies and Corrections - Links Not Equal
Number of Links in /etc/fdmns/<dmn> | Possible Cause | Corrective Action |
<N | addvol terminated early or a link in /etc/fdmns/<dmn> was manually removed. | If the domain is activated before running the advscan -f command and the cause of the mismatch is an interrupted addvol command, the situation is corrected automatically. Otherwise, the advscan utility adds the partition to the /etc/fdmns/<dmn> directory. |
>N | rmvol terminated early or a link in /etc/fdmns/<dmn> was manually added. | If the domain is activated and the cause of the mismatch is an interrupted rmvol command, the situation is corrected automatically. If the cause is a manually added link in /etc/fdmns/<dmn>, systematically try removing different links in the /etc/fdmns/<dmn> directory and activating the domain. The number of links to remove is the number of links in the /etc/fdmns/<dmn> directory minus the domain volume count displayed by advscan. |
/etc/fdmns/<dmn> directory do not equal the domain volume count.
Table 5-3: Fileset Anomalies and Corrections - Domain Volume Count Not Equal
Domain Volume Count | Possible Cause | Corrective Action |
<N | Cause unknown. | Cannot correct; run the salvage utility to recover as much data as possible from the domain. |
>N | The addvol command terminated early and the partition being added is missing or was reused. | Cannot correct; run the salvage utility to recover as much data as possible from the remaining volumes in the domain. |
/etc/fdmns/<dmn> directory do not equal the number of partitions.
Table 5-4: Fileset Anomalies and Corrections - Number of Partitions Not Equal
Number of Partitions | Possible Cause | Corrective Action |
<N | Partition missing. | Cannot correct; run the salvage utility to recover as much data as possible from the remaining volumes in the domain. |
>N | The addvol command terminated early. | None; domain mounts with N volumes; rerun the addvol command. |
advscan
command scans devices
dsk0
and
dsk5
for AdvFS partitions and finds nothing amiss.
Two partitions are
found,
dsk0c
and
dsk5c
, the domain volume
count reports two, and two links are entered in the
/etc/fdmns
directory.
# advscan dsk0 dsk5
Scanning disks dsk0 dsk5
Found domains:
usr_domain
Domain Id 2e09be37.0002eb40
Created Thu Feb 24 09:54:15 2000
Domain volumes 2
/etc/fdmns links 2
Actual partitions found:
dsk0c
dsk5c
dsk6
were removed from the
/etc/fdmns
directory.
This means that the number of
/etc/fdmns
links, the number
of partitions, and the domain volume counts are no longer equal.
In this example
the
advscan
command scans device
dsk6
and recreates the missing domains as follows:
/etc/fdmns
directory that contains this partition.
advscan
command creates directories
for the two domains in the
/etc/fdmns
directory.
advscan
command creates symbolic links
for the devices in the
/etc/fdmns
domain directories.
# advscan -r dsk6
Scanning disks dsk6
Found domains:
*unknown*
Domain Id 2f2421ba.0008c1c0
Created Thu Jan 20 13:38:02 2000
Domain volumes 1
/etc/fdmns links 0
Actual partitions found:
dsk6a*
*unknown*
Domain Id 2f535f8c.000b6860
Created Fri Feb 25 09:38:20 2000
Domain volumes 1
/etc/fdmns links 0
Actual partitions found:
dsk6b*
Creating /etc/fdmns/domain_dsk6a/
linking dsk6a
Creating /etc/fdmns/domain_dsk6b/
linking dsk6b
5.10 Recovering from Corruption of a Domain
/var/adm/messages
file for AdvFS I/O error messages.
For example:
Dec 05 15:39:16 systemname vmunix: AdvFS I/O error:
Dec 05 15:39:16 systemname vmunix: Domain#Fileset:test1#tstfs
Dec 05 15:39:16 systemname vmunix: Mounted on: /test1
Dec 05 15:39:17 systemname vmunix: Volume: /dev/rz11c
Dec 05 15:39:17 systemname vmunix: Tag: 0x00000006.8001
Dec 05 15:39:17 systemname vmunix: Page: 76926
Dec 05 15:39:17 systemname vmunix: Block: 5164080
Dec 05 15:39:17 systemname vmunix: Block count: 256
Dec 05 15:39:17 systemname vmunix: Type of operation: Read
Dec 05 15:39:17 systemname vmunix: Error: 5
Dec 05 15:39:17 systemname vmunix: To obtain the name of
Dec 05 15:39:17 systemname vmunix: the file on which the
Dec 05 15:39:17 systemname vmunix: error occurred, type the
Dec 05 15:39:17 systemname vmunix: command
Dec 05 15:39:17 systemname vmunix: /sbin/advfs/tag2name
Dec 05 15:39:17 systemname vmunix: /test1/.tags/6
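When the messages file is long, it can help to list only the volumes named in AdvFS I/O error reports. The following is a minimal sketch that assumes the "vmunix: Volume:" line layout shown in the sample above; the function name advfs_io_error_volumes is hypothetical, and the log path is passed as an argument rather than assumed.

```sh
# Hypothetical helper: print each distinct volume named on a
# "vmunix: Volume:" line in the given log file.
advfs_io_error_volumes() {
    # sed prints only matching lines, with everything up to the
    # volume name removed; sort -u drops repeated volumes.
    sed -n 's/^.*vmunix: Volume: *//p' "$1" | sort -u
}
```

For instance, `advfs_io_error_volumes /var/adm/messages` run against the sample above would print /dev/rz11c. Compare the result with the device driver messages for the same volume before deciding whether to remove it.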
verify
utility (see
Section 5.2.2)
to check the consistency of the domain's metadata.
verify
utility to
check the integrity of the domain's metadata.
If there are no device driver
I/O error messages that correspond to the AdvFS I/O error messages, then the
file system is being affected by problems with the underlying hardware.
rmvol
utility (see
Section 1.6.7).
If this succeeds, the
file system problems should not recur.
rmvol
fails due to more I/O errors, you must recreate
the domain.
salvage
utility (see
Section 5.8.4) to
extract the contents of the corrupted domain.
rmfdmn
command.
mkfdmn
command.
Remember that if you are recreating your domain, it will have a DVN
of 4 by default (see
Section 1.6.3).
Add volumes as needed if
you have the AdvFS Utilities license.
Do not include the faulty volume
in the new domain.
5.11 Recovering from Corruption of an AdvFS root Domain
>>>
) to perform
some tasks.
These commands are documented in the hardware manual for your
Alpha system.
If you cannot find the printed document, it is usually shipped
as a printable file on a CD-ROM supplied with the system.
salvage
utility.
The salvage utility can also recover files that were modified or created after the most recent backup.
5.11.1 Identifying the Hardware Resources
5.11.1.1 SRM Console Names for CD-ROM Drive or Network Interface Device
To find the console name of the CD-ROM drive, enter the show device command at the SRM console prompt.
>>> show device | grep -E 'RR|CD'
DKA400 RRD47 1206 dka400.4.0.5.0
In this example, the CD-ROM drive is named DKA400 according to the SRM console firmware.
To find the console name of the network interface device, enter the show device command at the SRM console prompt.
>>> show device | more
....
ewa0.0.0.8.0 EWA0 08-00-2B-C3-E3-DC
...
In this example, the network interface device is named EWA0 according to the SRM console firmware.
5.11.1.2 SRM Console Boot Device Name
To display the default boot device, enter the show bootdef_dev command at the SRM console prompt.
>>> show bootdef_dev
bootdef_dev dkb400.4.0.5.1
Use the show device command from the SRM console prompt to identify your boot device from the list.
In this example, dkb400 is the boot device: dk indicates that the device is a SCSI disk, b indicates that the device is connected to SCSI bus b (bus number 1), and 400 indicates that the device's SCSI target ID is 4 and its logical unit number (LUN) is 00. Thus, in this example, the bus/target/LUN information is 1/4/00. This information identifies the device when you restore your domain.
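The decoding of a console device name into bus/target/LUN can be sketched in a few lines of Python. This is an illustration only: `decode_srm_name` is a hypothetical helper, and it assumes the pattern seen in the example, namely a two-letter driver prefix, a bus letter counted from a = 0, and a unit number equal to the target ID times 100 plus the LUN.

```python
import re

# Two-letter driver prefix, one bus letter, then the numeric unit (for example dkb400).
SRM_NAME_RE = re.compile(r"^(?P<driver>[a-z]{2})(?P<bus>[a-z])(?P<unit>\d+)$")


def decode_srm_name(name):
    """Split an SRM console device name into driver, bus, target, and LUN.

    Accepts either the short name (dkb400) or the full form printed by
    show bootdef_dev (dkb400.4.0.5.1); only the part before the first
    dot is used.
    """
    short_name = name.split(".")[0].lower()
    match = SRM_NAME_RE.match(short_name)
    if match is None:
        raise ValueError("unrecognized SRM device name: {!r}".format(name))
    unit = int(match.group("unit"))
    return {
        "driver": match.group("driver"),
        "bus": ord(match.group("bus")) - ord("a"),  # assumption: a = 0, b = 1, ...
        "target": unit // 100,                      # assumption: unit = target * 100 + LUN
        "lun": unit % 100,
    }
```

For the boot device in this example, `decode_srm_name("dkb400.4.0.5.1")` gives bus 1, target 4, LUN 0, which matches the 1/4/00 derived by hand.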
5.11.1.3 UNIX Device Names
Attempting to mount previous root file system disk
to save hardware configuration information...
done
Attempting to mount previous root file system disk
to save hardware configuration information...
FAILED
Unable to retain old hardware configuration from
SCSI 1 4 0 0 0 6000 10201 077
Unable to save existing hardware configuration.
New configuration will be used.
5.11.2 Applying the Procedure
>>> boot dka400
>>> boot ewa0
3) Exit Installation.
List the devices known to the system with the hwmgr -view devices command.
# hwmgr -view devices
HWID: Device Name       Mfg  Model              Location
------------------------------------------------------------
38:/dev/disk/floppy0c        3.5in floppy       fdi0-unit-0
41:/dev/disk/dsk0c      DEC  RZ1DB-CA (C) DEC   bus-1-targ-4-lun-0
42:/dev/disk/dsk1c      DEC  RZ1CB-CA (C) DEC   bus-1-targ-5-lun-0
43:/dev/disk/dsk2c      DEC  RZ1CB-CA (C) DEC   bus-1-targ-6-lun-0
44:/dev/disk/cdrom0     DEC  RRD47 (C) DEC      bus-0-targ-5-lun-0
47:(unknown)            DEC  TLZ10 (C) DEC      bus-1-targ-4-lun-0
The SRM console identified the boot device as DKB400, the disk located at bus b, target 4, LUN 0. According to the hardware database, this same disk is identified as dsk0 (see Section 5.11.1.2). In this procedure, /dev/disk/dsk0a will be used as the volume containing the corrupted root domain. A new root domain will be created on /dev/disk/dsk0a and files from the old root domain will be restored on it.
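Matching the bus/target/LUN values against the hwmgr -view devices listing can also be done mechanically. The sketch below is a hypothetical helper, not part of hwmgr. It assumes that each device row starts with a numeric HWID and a colon, that the device name is the next token, and that the location is the last token on the line, as in the listing above.

```python
import re

# HWID, colon, device name, free-form vendor/model text, then the location token.
ROW_RE = re.compile(r"^\s*(?P<hwid>\d+):\s*(?P<name>\S+)\s+.*?(?P<location>\S+)\s*$")


def devices_at(hwmgr_output, bus, target, lun, prefix="/dev/disk/"):
    """Return device names whose location matches the given bus, target, and LUN.

    Only names that start with `prefix` are kept, so entries listed as
    (unknown) are skipped.
    """
    wanted = "bus-{}-targ-{}-lun-{}".format(bus, target, lun)
    names = []
    for line in hwmgr_output.splitlines():
        row = ROW_RE.match(line)
        if row and row.group("location") == wanted and row.group("name").startswith(prefix):
            names.append(row.group("name"))
    return names
```

Called with the listing above and bus 1, target 4, LUN 0, the function returns only `/dev/disk/dsk0c`, the disk that this procedure refers to as dsk0.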
Use the hwmgr -flash command to cause the disk's light to flash for thirty seconds.
# /sbin/hwmgr -flash light -dsf /dev/disk/dsk0a
Check that the tape device is visible with the hwmgr utility. If you do not see the tape device, check for proper installation and hardware configuration.
# dn_setup -install_tape
hwmgr
command.
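Recover any files you need from the corrupted root domain with the salvage command and save them to a temporary domain (see Section 5.8.4).
Create a new root domain and fileset, and mount the fileset on /var/mnt.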
# mkfdmn -r /dev/disk/dsk0a root_domain
Warning: /dev/disk/dsk0a is marked in use for AdvFS.
If you continue with the operation you can
possibly destroy existing data.
CONTINUE? [y/n] y
# mkfset root_domain root
# mkdir /var/mnt
# mount root_domain#root /var/mnt
Use the vrestore command to restore the files from the backup device you installed earlier.
# vrestore -xf /dev/tape/tape0 -D /var/mnt
Copy the files recovered with the salvage command into the newly created root domain (see Section 5.8.4).
# halt
>>> boot
Use the dsfmgr command to verify and fix the device databases and device special file names. For example:
# dsfmgr -v
5.12 Restoring a Multivolume usr Domain
To restore a multivolume /usr file system, you must first reconstruct the usr_domain domain with all of its volumes. However, restoring a multivolume domain requires the License Management Facility (LMF). LMF controls AdvFS Utilities, which includes the addvol command needed for creating multivolume domains.
First create a one-volume usr domain and restore the addvol command. Then restore LMF and use it to enable the addvol command. When this is complete, you can add volumes to the usr domain and restore the complete multivolume domain.
LMF has two parts: a utility is stored in /usr/sbin/lmf and a database is stored in /var/adm/lmf. On some systems /var is a link to /usr and both directories are located in the usr fileset. If your system has this configuration, recover the addvol command and recover both parts of the LMF. On systems where the /usr and /var directories are located in separate filesets in usr_domain, recover the addvol command and the LMF utility into the usr fileset and recover the LMF database into the var fileset.
The following procedure applies when the /var directory and the /usr directory are both in the usr fileset in usr_domain. The domain consists of the dsk1g, dsk2c, and dsk3c volumes. The procedure assumes that the root file system has already been restored. If it has not, see Section 5.11.
# mount -u /
Remove the old usr_domain and create a new usr_domain using the initial volume.
# rm -rf /etc/fdmns/usr_domain
# mkfdmn /dev/disk/dsk1g usr_domain
Create and mount the /usr and /var filesets.
# mkfset usr_domain usr
# mount -t advfs usr_domain#usr /usr
Create a symbolic link in /usr because that is where the lmf command looks for its database.
# ln -s /var /usr/var
Mount the /usr backup tape.
# cd /usr
# vrestore -vi
(/) add sbin/addvol
(/) add sbin/lmf
(/) add var/adm/lmf
(/) extract
(/) quit
# /usr/sbin/lmf reset
Add the remaining volumes to usr_domain.
# /usr/sbin/addvol /dev/disk/dsk2c usr_domain
# /usr/sbin/addvol /dev/disk/dsk3c usr_domain
Do a full restore of the /usr backup.
# cd /usr
# vrestore -xv
The following procedure applies when the /usr and /var directories are in separate filesets in the same multivolume domain, usr_domain. The domain consists of the dsk1g, dsk2c, and dsk3c volumes. In this case you must mount both the /var and the /usr backup tapes. The procedure assumes that the root file system has already been restored. If it has not, see Section 5.11.
# mount -u /
Remove the old usr_domain and create a new usr_domain using the initial volume.
# rm -rf /etc/fdmns/usr_domain
# mkfdmn /dev/disk/dsk1g usr_domain
Create and mount the /usr and /var filesets.
# mkfset usr_domain usr
# mkfset usr_domain var
# mount -t advfs usr_domain#usr /usr
# mount -t advfs usr_domain#var /var
Mount the /var backup tape and restore from it.
# cd /var
# vrestore -vi
(/) add adm/lmf
(/) extract
(/) quit
Mount the /usr backup tape.
# cd /usr
# vrestore -vi
(/) add sbin/addvol
(/) add sbin/lmf
(/) extract
(/) quit
# /usr/sbin/lmf reset
Add the remaining volumes to usr_domain.
# /usr/sbin/addvol /dev/disk/dsk2c usr_domain
# /usr/sbin/addvol /dev/disk/dsk3c usr_domain
Do a full restore of the /usr backup.
# cd /usr
# vrestore -xv
Mount the /var backup tape and do a full restore of the /var backup.
# cd /var
# vrestore -xv
5.13 Recovering from a System Crash
5.13.1 Saving Copies of System Metadata
Use the savemeta command to save a copy of the domain's metadata for examination by support personnel. You must be root user to run this command (see savemeta(8)).
5.13.2 Physically Moving an AdvFS Disk
After you move an AdvFS disk to another system, update the /etc/fdmns directory so the new system recognizes the transferred volumes. You must be root user to complete this process.
Caution
Do not use the addvol command or the mkfdmn command to add the volumes to the new machine. Doing so will delete all data on the disk you are moving. See Section 5.8.4 if you have already done so.
If you do not know which partitions belonged to the domain, use the advscan utility, which might be able to recreate this information. You can also look at the disk label on the disk to see which partitions were previously made into AdvFS partitions. The disk labels do not tell you which partitions belong to which domains.
The following example moves a domain, testing_domain, on two disks, dsk3 and dsk4. This domain contains two filesets: sample1_fset and sample2_fset. These filesets are mounted on /data/sample1 and /data/sample2. Assume you know that the domain that you are moving had partitions dsk3c, dsk4a, dsk4b, and dsk4g. Take the following steps to move the disks:
# /sbin/hwmgr -show scsi -full
Assume the new system names the transferred disks dsk6 and dsk8. Use these names to set up symbolic links in step 5.
Modify the /etc/fdmns directory to include the information from the transferred domains.
# mkdir -p /etc/fdmns/testing_domain
# cd /etc/fdmns/testing_domain
# ln -s /dev/disk/dsk6c dsk6c
# ln -s /dev/disk/dsk8a dsk8a
# ln -s /dev/disk/dsk8b dsk8b
# ln -s /dev/disk/dsk8g dsk8g
# mkdir /data/sample1
# mkdir /data/sample2
Edit the /etc/fstab file to add the fileset mount-point information.
testing_domain#sample1_fset /data/sample1 advfs rw 1 0
testing_domain#sample2_fset /data/sample2 advfs rw 1 0
# mount /data/sample1
# mount /data/sample2
If you run the mkfdmn command or the addvol command on partition dsk6c, dsk8a, dsk8b, dsk8g, or an overlapping partition, you will destroy the data on the disk. See Section 5.8.4 if you have accidentally done so.
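Because a mistyped partition name in this procedure can be costly, it may be worth generating the /etc/fdmns commands from the old partition list and reviewing them before you type anything. The following Python sketch only builds the command strings; it does not run them. The helper name `fdmns_link_commands` and the `disk_map` argument are hypothetical, and the sketch assumes that partition letters stay the same when a disk is renamed on the new system.

```python
import re

# A partition name is a disk name (letters plus a number) followed by a letter a-h.
PARTITION_RE = re.compile(r"^(?P<disk>[a-z]+\d+)(?P<part>[a-h])$")


def fdmns_link_commands(domain, old_partitions, disk_map):
    """Return the shell commands that recreate /etc/fdmns/<domain>.

    old_partitions lists the partitions the domain used on the old system;
    disk_map maps each old disk name to its name on the new system.
    """
    commands = [
        "mkdir -p /etc/fdmns/{}".format(domain),
        "cd /etc/fdmns/{}".format(domain),
    ]
    for old in old_partitions:
        match = PARTITION_RE.match(old)
        if match is None:
            raise ValueError("unrecognized partition name: {!r}".format(old))
        # Assumption: the partition letter is unchanged; only the disk name differs.
        new = disk_map[match.group("disk")] + match.group("part")
        commands.append("ln -s /dev/disk/{0} {0}".format(new))
    return commands
```

For this example, `fdmns_link_commands("testing_domain", ["dsk3c", "dsk4a", "dsk4b", "dsk4g"], {"dsk3": "dsk6", "dsk4": "dsk8"})` produces the same mkdir, cd, and ln -s commands shown in the procedure above.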
If the system crashed because you set the AdvfsDomainPanicLevel attribute (see Section 4.7) to promote a domain panic to a system panic, run the verify command on the panicked domain to ensure that it is not damaged. If your filesets were unmounted at the time of the crash, or if you remounted them successfully and ran the verify command (if needed), you can mount the filesets on a different version of the operating system, if appropriate.
5.13.4 Recovering from Problems Removing Volumes
showfdmn
command.
If the volume does not allow writes after an aborted rmvol operation, use the chvol -A command to reactivate the volume.