5 Troubleshooting

This chapter examines problems that, while universal for file systems, might have solutions unique to AdvFS. See System Configuration and Tuning for related information about diagnosing performance problems.

This chapter covers the following:

Section 5.1 details the commands that you can use to check disk usage.

Section 5.2 suggests preventative maintenance strategies.

Section 5.3 explains how to increase the size of the root domain.

Section 5.4 identifies operating system incompatibilities.

Section 5.5 explains I/O method incompatibilities.

Section 5.7 lists ways that you can improve poor performance.

Section 5.8 suggests ways to fix disk problems.

Section 5.9 explains how to restore the /etc/fdmns directory.

Section 5.10 examines how to recover from volume failure.

Section 5.11 explains how to recover from the failure of the root domain.

Section 5.12 details how to restore a multivolume domain.

Section 5.13 suggests methods of crash recovery.

5.1 Checking Free Space and Disk Usage

You can look at the way space is allocated on a disk by file, fileset, or domain. Table 5-1 describes command-line commands that you can use to examine disk space usage.

Table 5-1: Disk Space Usage Commands

Command	Description
`df`	Displays disk space usage by fileset. Available space for a fileset is limited by the fileset quota if it is set.
`du`	Displays information about block allocation for files. Use the `-a` option to display information for individual files.
`ls`	Displays the space used by files. The `-l` option shows the space spanned by a sparse file. The `-s` option shows the actual block usage and might be more useful for use with sparse files.
`showfdmn`	Displays the attributes and block usage for each volume in an active domain. For multivolume domains, additional volume information is displayed.
`showfile`	Displays block usage and volume information for a file or for the contents of a directory.
`showfsets`	Displays information about the filesets in a domain. Use to display fileset quota limits.
`vdf`	Displays used and available disk space for a fileset or a domain.

See the reference pages for the commands for more complete information.

Under certain conditions, the disk usage information for AdvFS can become corrupt. Run the quotacheck -v command to correct the disk usage information.

5.2 Preventative Maintenance

This section describes a number of things you can do to prevent problems with your AdvFS file system.

5.2.1 Failing Disks

Back up your data regularly and frequently, and watch for signs of impending disk failure. Try to remove files from a problem disk before it fails. See the Event Management information in System Administration for more information about examining disk activity.

5.2.2 Verifying File System Consistency

To ensure that metadata is consistent, run the verify command to verify the file system structure. The verify utility checks disk structures such as the bitfile metadata table (BMT), the storage bitmaps, the tag directory, and the frag file for each fileset. It verifies that the directory structure is correct, that all directory entries reference a valid file, and that all files have a directory entry. You must be the root user to run this command.

It is a good idea to run the verify command:

When problems are evident (corruptions, domain panic, lost data, I/O errors)

Before an update installation

If your files have not been accessed in three to six months or longer, and you plan to run utilities such as balance, defragment, migrate, quotacheck, repquota, rmfset, rmvol, or vdump that access every file in a domain

Use the SysMan Manage an AdvFS Domain utility, or enter the verify command from the command line:

verify domain_name

The verify command mounts filesets in special directories. If the verify command is unable to mount a fileset due to the failure of a domain, as a last resort you can run the verify -F command. The -F option mounts the fileset using the -d option of the mount command, which means that AdvFS initializes the transaction log file for the domain without recovering the domain.

Caution

Because no domain recovery occurs for previously incomplete operations, using the verify -F command could cause data corruption.

Under some circumstances the verify command might fail to unmount the filesets. If this occurs, you must unmount the affected filesets manually and delete the mount points that were created in the /etc/fdmns/<domain_name> file.

On machines with millions of files, sufficient swap space must be allocated for the verify utility to run to completion. If the amount of memory required by the verify utility exceeds the kernel variable proc/max_per_proc_data_size process variable, the utility does not complete. To overcome this problem, allocate up to 10% of the domain size in swap space for running the verify command.

The following example verifies the domainx domain, which contains the filesets setx and sety:

# verify domainx
+++Domain verification+++
Domain Id 2f03b70a.000f1db0
Checking disks ...
Checking storage allocated on disk /dev/disk/dsk10g
Checking storage allocated on disk /dev/disk/dsk10a
Checking mcell list ...
Checking mcell position field ...
Checking tag directories ...
 
+++ Fileset verification +++
+++ Fileset setx +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
     1100
Scanning tags ...
     1100
Searching for lost files ...
     1100
 
+++ Fileset sety +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
     5100
Scanning tags ...
     5100
Searching for lost files ...
     5100

In this example, the verify utility finds no problems with the domain. See verify(8) for more information.

5.3 Increasing the Size of an AdvFS root Domain

The AdvFS root domain is limited to one volume (partition) unless you are running a cluster configuration. If you want to increase the size of the root domain, you must recreate the root domain on a larger volume. This section explains how to recreate the root domain on a different device. It does not cover the case of repartitioning the current root volume and restoring root to it. If you are moving the root domain to another disk already installed in the system, you can skip the section on installing the disk and begin at Section 5.3.2.

You need the following to move your root domain:

Current operating system CD-ROM
You can use the operating system CD-ROM that is packaged with the distribution media to recreate your root domain.
If your local site provides a Remote Installation Service (RIS) server, you can boot your system across the network. If you choose RIS services, follow your site-specific procedures and consult the Installation Guide.

Backup device
You will need either a backup tape or an unused disk partition to backup the root domain.

Information about console commands
You will use Alpha System Reference Manual (SRM) console commands at the system console prompt (>>>) to perform some tasks. These commands are documented in the hardware manual for your Alpha system. If you cannot find the printed document, it is usually shipped as a printable file on a CD-ROM supplied with the system.

This section explains increasing the size of a root domain on a non-clustered system. For other configurations, see System Administration and Cluster Administration. If your root volume is an LSM volume, see Logical Storage Manager.

5.3.1 Installing a New Disk for the root Domain

To move your root domain to a new disk, you must first install the disk and have it recognized.

Log in as root and shut down the system.
```
# shutdown -h now
```

Add the new disk device. For more information see your hardware manuals.

Verify that the SRM console recognizes the newly added disk. In this example, DKB300 (an RZ1BB-CS) was added.

>>> show device
polling ncr0(NCR 53C810) slot 1, bus 0 PCI, 
     hose 1 A SCSI Bus ID 7
dka500.5.0.1.1     DKA500                   RRD45     1645
polling isp0(QLogic ISP1020) slot 4, bus 0 PCI, 
     hose 1 SCSI Bus ID 7
dkb0.0.0.4.1       DKB0                     RZ1DB-CA  LYJ0
dkb100.1.0.4.1     DKB100                   RZ1CB-CA  LYJ0
dkb200.2.0.4.1     DKB200                   RZ1CB-CA  LYJ0
dkb300.3.0.4.1     DKB300                   RZ1BB-CS  0656 
mkb400.4.0.4.1     MKB400                   TLZ10     02ab
...

Boot the original system disk to update the device information databases with the new device. In this example, the default boot device, dkb0, is booted.

>>> show bootdef_dev 
bootdef_dev             dkb0.0.0.4.1
>>> boot

During the boot process, the operating system recognizes the new device and updates the device information databases accordingly.

...
dsfmgr: NOTE: updating kernel basenames for system at /
 scp kevm tty00 tty01 lp0 random urandom dmapi dsk3 dsk4
 dsk5 floppy0 cdrom0 -dsk6a +dsk6a -dsk6a +dsk6a -dsk6b
+dsk6b -dsk6b +dsk6b -dsk6c +dsk6c -dsk6c +dsk6c -dsk6d
+dsk6d -dsk6d +dsk6d -dsk6e +dsk6e -dsk6e +dsk6e -dsk6f
+dsk6f -dsk6f +dsk6f -dsk6g +dsk6g -dsk6g +dsk6g -dsk6h
+dsk6h -dsk6h +dsk6h
...

In this example, the operating system's device name for the added disk is dsk6.

5.3.2 Configuring a Device for Use as the root Volume

To make the device available to be used as the root volume, you must configure, that is, label and partition, it. You must be the root user to perform this operation. For methods of labeling your disk, see System Administration, "Partitioning Disks Using diskconfig" and "Manually Partitioning Disks."

Be sure to specify AdvFS for the Boot Block. Use the disklabel command with the -t advfs option or, if you are using the disconfig utility, choose AdvFS from the Boot Block: list.

Caution

Modifying a disk's partition layout destroys some or all of the data on disk. Be certain that you do not need any data on the disk that you choose for the new root domain.

For example, if you have expanded the a partition to 500 MB (1024000 512-byte sectors) and allocated the remaining space on the disk to the b partition as swap, your disk label might look like the following:

# disklabel dsk6
 #      size  offset fstype fsize bsize cpg # ~Cyl values 
a:   1024000       0 unused     0     0     #   0 - 744*
b:   3086480 1024000 swap       0     0     # 744*- 2987*
c:   4110480       0 unused     0     0     #   0 - 2987*
d:         0       0 unused     0     0     #   0 - 0
e:         0       0 unused     0     0     #   0 - 0
f:         0       0 unused     0     0     #   0 - 0
g:   1858632  393216 unused     0     0     # 285*- 1636*
h:   1858632 2251848 unused     0     0     # 636*- 2987*

5.3.3 Backing up the Current root Domain

The first step in moving a root domain is to make a full backup of the domain. Use a backup tape or an unused disk partition.

For example, to back up the root domain to tape /dev/tape/tape0_d1:

# vdump -0 -f /dev/tape/tape0_d1 /

To back up the root domain to an unused partition, create a temporary domain, fileset and mount-point directory. Back up to a file in that fileset. For example, for the domain TMP_BACKUP, the fileset tmp_backup, the mount point /tmp_backup, and the file containing the dump, root_backup.vdump:

# mkfdmn /dev/disk/dsk5c  TMP_BACKUP
# mkfset TMP_BACKUP tmp_backup 
# mkdir /tmp_backup
# mount TMP_BACKUP#tmp_backup /tmp_backup
# vdump -0 -f /tmp_backup/root_backup.vdump /
path     : /
dev/fset : root_domain#root
type     : advfs
advfs id : 0x3b000fb0.000919cc.1
vdump: Dumping directories
vdump: Dumping 96402959 bytes, 117 directories, 2024 files
vdump: Dumping regular files
vdump: Status at Thu May 17 12:52:38 2001
vdump: Dumped  96525730 of 96402959 bytes; 100.0% completed
vdump: Dumped  117 of 117 directories; 100.0% completed
vdump: Dumped  2024 of 2024 files; 100.0% completed
vdump: Dump completed at Thu May 17 12:52:38 2001

5.3.4 Recreating the root Domain on a Different Volume

To recreate the root domain on the new volume, you must restore the backup of the root domain to the new volume. This example also moves the swap partition from dsk3b to dsk6b.

Shut down the system booted from your old root domain.
```
# shutdown -h now
```

Boot from the current operating system CD-ROM or Remote Installation Service (RIS) server. For example, from the CD-ROM:
```
>>> boot dka500
```
From the RIS server:
```
>>> boot ewa0
```

Exit the installation.
- If you have a VGA graphics console, choose to exit the installation, or from the File menu of the Installation and Configuration Welcome dialog box, choose shell window.
- If you have a serial console terminal, select option 3) Exit Installation.
You will get a shell (#) prompt.

If you have backed your root domain to tape, install the tape device.
```
# dn_setup -install_tape 
```
For more information see System Administration "Using dn_setup to Perform Generic Operations."

Verify that the new device is recognized properly by the Operating System and that the backup device is properly installed.

#  hwmgr -view devices
HWID: Device Name     Mfg    Model          Location
------------------------------------------------------------
 4:(unknown)       
 6:(unknown)       
38:/dev/disk/floppy0c    3.5in floppy     fdi0-unit-0       
41:/dev/disk/cdrom0c DEC RRD45    (C) DEC bus-0-targ-5-lun-0
42:/dev/disk/dsk3c   DEC RZ1DB-CA (C) DEC bus-1-targ-0-lun-0
43:/dev/disk/dsk4c   DEC RZ1CB-CA (C) DEC bus-1-targ-1-lun-0
44:/dev/disk/dsk5c   DEC RZ1CB-CA (C) DEC bus-1-targ-2-lun-0
45:/dev/disk/dsk6c   DEC RZ1BB-CS (C) DEC bus-1-targ-3-lun-0
46:/dev/ntape/tape0  DEC TLZ10    (C) DEC bus-1-targ-4-lun-0

If the new root disk Device Name is listed as (unknown), check for proper hardware installation and configuration. In this example the root domain will be moved to dsk6. The tape backup device is tape0 and the original root domain resides on dsk3.

Create a new root domain and root fileset on the new root device and mount it at /mnt.

# mkfdmn -r /dev/disk/dsk6a root_domain
# mkfset root_domain root 
# mount root_domain#root /mnt

Restore the root domain from backup.

If your backup is on tape:

# vrestore -x -f /dev/tape/tape0_d1 -D /mnt

If your backup is on disk:

First create a directory entry for the backup domain in the /etc/fdmns directory. This new directory will only exist in the UNIX installation environment.

# mkdir /etc/fdmns/TMP_BACKUP

Then create a soft link in the new directory pointing to the volume used for the backup domain.

# ln -s /dev/disk/dsk5c /etc/fdmns/TMP_BACKUP/dsk5c

Mount the domain and fileset containing the backup. The new directory is created in /var because the installation root file system is mounted read-only.

# mkdir /var/tmp_backup 
# mount TMP_BACKUP#tmp_backup /var/tmp_backup

Restore the files from the TMP_BACKUP domain to the new root domain.

# vrestore -x -f /var/tmp_backup/root_backup.vdump -D /mnt
vrestore: Date of the vdump save-set:Fri May 11 2001
vrestore: Save-set source directory : / 
vrestore: informational: [13] posting event:
sys.unix.fs.advfs.fset.backup.lock 
       If running in single user mode, EVM is not running.
       Please ignore this posting.
vrestore: informational: [13] posting event: 
sys.unix.fs.advfs.fset.backup.unlock
      If running in single user mode, EVM is not running.
      Please ignore this posting.

The new root domain is now created and populated with files from the original root domain.

To finish the process, you must update system bookkeeping to point to the new root volume. In this example, the root domain and the swap partition were moved from dsk3 to dsk6. Nothing else was changed.
Update the /etc/fdmns directory to identify the new root domain. Here dsk6a is the volume containing the new root domain and dsk3a is the volume containing the original root domain.
```
# cd /mnt/etc/fdmns/root_domain
# ln -s /dev/disk/dsk6a dsk6a
# rm dsk3a 
```

Change the swap partition in sysconfigtab in the new root domain using the editor of your choice. This example uses the vi editor.
```
# vi /mnt/etc/sysconfigtab
```
1. In the vm: section (stanza), change the swap device line from swapdevice=/dev/disk/dsk3b to swapdevice=/dev/disk/dsk6b. This change reflects the new location of the swap partition.
2. Save the changes and exit the editor.

Halt the system and change the default boot device.

# halt
. . .  
>>> set bootdef_dev dkb300

Boot the new root domain.
```
 >>> boot 
 
```

Retain the original root domain until you are certain that the data in the original root domain was successfully transferred to the new root domain, then remove the original domain with the rmfdmn command.

5.4 Disk File Structure Incompatibility

Domains created on operating system software Version 5.0 and later have a new on-disk format that is incompatible with earlier versions (see Section 1.6.3). The newer operating system recognizes the older disk structure, but older operating systems do not recognize the newer disk structure. If you install your new operating system software as an update to your Version 4 operating system software (not a full installation), your /root, /usr, and /var files retain a domain version number (DVN) of 3 (see Section 1.6.3.1). If you fully install your Version 5 operating system, the /root, /usr, and /var files have a DVN of 4.

To access a DVN4 fileset from an older operating system, NFS mount the fileset from a server running Version 5.0 or later operating system software, or upgrade your operating system to Version 5.0 or later.

If you try to mount a fileset belonging to a DVN4 domain when you are running a version of the operating system earlier than Version 5.0, you get an error message.

There is no tool that automatically upgrades DVN3 domains to DVN4. To upgrade a domain to DVN4, use the procedure in Section 1.6.3.2.

5.4.1 Utility Incompatibility

Because of the new on-disk file formats in Version 5.0 and later of the operating system, some AdvFS utilities from earlier releases have the potential to corrupt domains created using the new on-disk formats. All statically-linked AdvFS-specific utilities from earlier operating system versions do not run on Version 5.0 and later. These utilities are usually from operating system versions prior to Version 4.0. In addition, the following dynamically-linked AdvFS utilities from earlier releases of Tru64 UNIX do not run on Version 5.0 and later:

advfsstat

balance

chvol

defragment

rmvol

showfdmn

verify

5.4.2 Avoiding Metadata Incompatibility

If a system crashes or goes down unexpectedly, after reboot, AdvFS performs recovery when the filesets that were mounted at the time of the crash are remounted. This recovery keeps the AdvFS metadata consistent and makes use of the AdvFS transaction log file.

Different versions of the operating system use different AdvFS log record types. Therefore, it is important that AdvFS recovery operations be done on the same version of the operating system as was running at the time of the crash.

To reboot without error using a different version of the operating system, cleanly unmount all filesets before rebooting. If the system failed due to a system panic or an AdvFS domain panic, it is best to reboot using the original version of the operating system and then run the verify command to ensure that the domain is not corrupted. If it is not corrupted, you can reboot your system using a different version of the operating system. If the verify utility indicates that the domain is corrupt, see Section 5.8.4.

5.5 Memory Mapping, Direct I/O, and Data Logging Incompatibility

Unless you have turned on atomic-write data logging by using the mount -o adl command, memory mapping, atomic-write data logging, and direct I/O are mutually exclusive. If a file is open in one of these modes, attempting to open the same file in a conflicting mode fails. For more information see Section 4.4, Section 4.6, and mmap(2).

5.6 Invalid or Corrupt Saveset Format

If you are restoring a saveset that has been written to disk and get an error message that its format is invalid or corrupt, check that you have not backed the saveset up to partition a or c, which include block 0 of the disk. Block 0, the disk label block, is protected from accidental writes to it. To dump to a partition that starts at block 0 of a disk, you must first clear the disk label. If you do not, the output from the vdump command might appear to contain valid savesets, but when the vrestore command attempts to interpret the disk label as part of the saveset, it returns an error (see Section 3.2.6).

5.7 Improving Poor Performance

The performance of a disk depends upon the I/O demands upon it. If you structure your domain so that heavy access is focused on one volume, it is likely that system performance will degrade. After you determine the load balance, there are a number of ways that you can equalize the activity and increase throughput. See System Configuration and Tuning, command reference pages, and Chapter 4 for more complete information.

To discover the causes of poor performance, first check system activity (see Section 4.1). There are a number of ways to improve performance:

Upgrade domains (Section 1.6.3.2)
DVN4 domains are indexed when a directory grows beyond a page, that is, about 200 files. Directories with more than 5000 files show the most benefit.

Eliminate disk access incompatibility (Section 4.6)
If you initiate direct I/O (which turns off caching) to read and write data to a file, any application that accesses the same file also has direct I/O. This might prove inefficient (see Section 4.6).

Defragment domains (Section 4.8)
As files grow, contiguous space on disk often is not available to accommodate new data, so files become fragmented. File fragmentation can reduce system performance because more I/O is required to read or write a file.

Move filesets to different volumes (Section 4.11)
You can move a domain to a volume that is larger or less congested. You can create another domain on another volume and move a fileset to it.

If you have AdvFS Utilities, you can also:

Balance a multivolume domain (Section 4.10)
Files that are distributed unevenly can degrade system performance. Use the balance command to redistribute the files evenly over all your volumes.

Stripe individual files (Section 4.13)
AdvFS allows you to stripe individual files across multiple volumes. Use AdvFS striping only on directly attached storage that is not otherwise striped. Combining AdvFS striping with other striping might degrade performance.

Migrate individual files (Section 4.12)
You can use the migrate utility to move a heavily accessed file or selected pages of a file to another volume in the domain. You can move the file to a specific volume, or you can let the system choose where to move the file.

Change AdvFS resources
You can change your file system size in the following ways:
- Increase the size of a domain by adding a volume using the addvol command (Section 1.6.6)
  For optimum performance, each volume you add should consist of the entire disk (typically, partition c). Do not add a volume that contains data you want to keep. When you run the addvol command, data on the added device is destroyed.
- Shrink a domain by removing a volume using the rmvol command (Section 1.6.7)
  You can interrupt the rmvol process by pressing Ctrl/c without damaging your domain. Files already removed from the volume remain in their new location. Files that had not been moved at the time of the interrupt remain in their original location.
  If the volume from which the files have been removed does not allow new file allocations after an aborted rmvol operation, use the chvol -A command to reactivate the volume.
  Striped file segments are moved to a volume that does not contain a stripe. If this is not possible, the system requests confirmation before doubling up on stripes (see Section 4.13.
- Change the size of a domain by changing volumes (Section 4.12)
  Add a new volume, move your files to it, then remove the old volume.
See System Limits for the number of volumes, domains, and so forth that the AdvFS file system can handle.

5.8 Fixing Disk Problems

There are a number of problems that may be directly related to the way you are using storage.

5.8.1 Reusing Space

If you want to add storage from an existing domain (the /etc/fdmns directory entry exists) to another domain, you can remove the volume by using the rmvol command then add it to the other domain.

For example, if your volume is /dev/disk/dsk5c, your original domain is old_domain, and the domain you want to add the volume to is new_domain, mount all the filesets in old_domain, then enter:

# rmvol /dev/disk/dsk5c old_domain
# addvol /dev/disk/dsk5c new_domain

If the disk or disk partition you want to add is not part of an existing domain but is giving you a warning message because it is labeled, reset the disk label. If you answer yes to the prompt on the addvol or mkfdmn command, the disk label is reset. All information that is on the disk or disk partition that you are adding is lost.

5.8.2 Limiting Disk Space Usage

If your system is running without any limits on resource usage, you can add quotas to limit the amount of disk space your users can access. AdvFS quotas provide a layer of control beyond that available with UFS.

User and group quotas limit the amount of space a user or group can allocate for a fileset. Fileset quotas restrain a fileset from using all of the available space in a domain.

You can set two types of quota limits: hard limits that cannot be exceeded, and soft limits that can be exceeded for a period of time called the grace period. You can turn quota enforcement on and off. See Chapter 2 for complete information.

If you are working in an editor and realize that the information you need to save will exceed your quota limit, do not abort the editor or write the file because data might be lost. Instead, remove files to make room for the edited file before writing it. You can also write the file to another fileset, such as tmp, remove files from the fileset whose quota you exceeded, and then move the file back to that fileset.

AdvFS imposes quota limits in the rare case that you are 8 KB below the user, group, or fileset quota and are attempting to use some or all of the remaining space. This is because AdvFS allocates storage in units of 8 KB. If adding 8 KB to a file exceeds the quota limit, then that file is not extended.

5.8.3 Fixing On-Disk Metadata Corruptions

If you have a domain that cannot be mounted without a domain panic, or if the verify command detects on-disk corruption and is unable to fix it, run the fixfdmn utility. The fixfdmn utility is designed primarily to put a domain into a usable (mountable) state. In the process, as much data as possible is retrieved. However, if recovering data from a file is your priority, use the salvage utility (see Section 5.8.4).

The fixfdmn utility runs on unmounted filesets. It scans on-disk metadata looking for corruptions and, if enough viable data is intact, it attempts to correct the corrupt metadata. If not enough viable metadata is available, the fixfdmn utility attempts to bypass the corruption by moving or deleting the corrupt metadata and deleting files as necessary.

You can run the fixfdmn -n command to check the domain and not do any repairs.

The utility saves a message log file and two undo files. The utility can use the undo files to restore the domain to the configuration it had before you ran the fixfdmn command.

See fixfdmn(8) for more information.

5.8.4 Recovering File Data from a Corrupted Domain

The way you recover the contents of a corrupted domain depends on the nature of the corruption. Follow the recovery path for as many steps as needed. The following procedure assumes that you are only experiencing file system corruption, not hardware failure.

Run the verify command to try to repair the domain (see Section 5.2.2 and verify(8)). The verify command fixes only a limited set of problems.

If the verify command detects on-disk corruption, run the fixfdmn command (see Section 5.8.3 and fixfdmn(8)).

If running the fixfdmn command does not solve the problem, determine the date of the most recent backup. Run the salvage command to recover as many of the recent file changes as possible. The salvage command extracts salvageable files from the corrupted domain and places copies of them in filesets created to hold the recovered files. Depending on the nature of the corruption, you may be able to extract all or some of the data in the corrupted domain.
You can use the salvage -d command to extract files modified after a specified date and time. If you have no backups, you can run the salvage utility without the -d option to recover all the files in the domain.

Recreate the domain from the latest backups then copy any files recovered with the salvage command into the recreated domain.

Use the SysMan Manage an AdvFS Domain utility, or enter the salvage command from the command line. You can recover data to disk or to tape. The amount of data you can recover depends upon the nature of the corruption to your domain. See salvage(8) for more information.

Running the salvage command does not guarantee that you will recover all files in your domain. You might be missing files, directories, file names, or parts of files. The utility generates a log file that contains the status of files that were recovered. Use the -l option to list in the log file the status of all files that are encountered.

The salvage command places the recovered files in directories named after the filesets. You can move the recovered files to new filesets. The utility creates a lost+found directory for each fileset where it puts files that have no parent directory. You can specify the pathname of the directory that is to contain the recovered fileset directories. If you do not specify a directory, the utility writes recovered filesets under the current working directory.

You can also recover data from a damaged domain to tape in a tar format.

5.8.4.1 Salvaging Data to Disk

You can recover data from a corrupted domain to another local unused disk. In this example the corrupted domain is called PERSONNEL and contains the fileset personnel_fset mounted at /personnel. The original domain is on volume /dev/disk/dsk12c and the salvage command places output on /dev/disk/dsk3c.

Unmount all the filesets in the corrupted domain.

Create a domain and a fileset to hold the recovered information and mount the fileset. For example, to mount the fileset recover_fset in the domain RECOVER mounted at /recover:
```
# mkfdmn /dev/disk/dsk3c RECOVER
# mkfset RECOVER recover_fset
# mkdir /recover
# mount RECOVER#recover_fset /recover
```

Run the salvage command. In this example, files from the PERSONNEL domain that were modified after 1:30 PM on December 7, 2000 are extracted from the damaged domain.

# /sbin/advfs/salvage -d 200012071330 -D /recover PERSONNEL
salvage: Domain to be recovered 'PERSONNEL' 
salvage: Volume(s) to be used '/dev/disk/dsk12c'
salvage: Files will be restored to '/recover' 
salvage: Logfile will be placed in './salvage.log' 
salvage: tarting search of all filesets: 09-May-2001
salvage: tarting search of all volumes: 09-May-2001
salvage: Loading file names for all filesets: 09-May-2001
salvage: tarting recovery of all filesets: 09-May-2001

View the salvage.log file to ensure that all necessary files were recovered.

Recreate the domain. Here the domain is recreated on the original volume.

Caution

If you recreate a domain on the same volume as your original domain, you destroy all the data in the original domain. To save your corrupted domain, recreate the domain on a different volume.

# rmfdmn PERSONNEL
rmfdmn: remove domain PERSONNEL? [y/n] y
rmfdmn: informational:[13]posting event:
     sys.unix.fs.advfs.fdmn.rm
       If running in single user mode, EVM is not running
       Please ignore this posting.
rmfdmn: domain PERSONNEL removed.
# mkfdmn /dev/disk/dsk12c PERSONNEL
# mkfset PERSONNEL personnel_fset

If you are restoring some of the domain from backup, do this now. This procedure is specific to your site.

Copy the salvaged files from the temporary location to the restored domain and remove the recovery domain.

# mkdir /personnel
# mount PERSONNEL#personnel_fset /personnel
# cp -Rp /RECOVER/personnel_fset/* /personnel
# umount /recover
# rmfdmn RECOVER
rmfdmn: remove domain RECOVER [y/n] y
rmfdmn: domain RECOVER removed.

5.8.4.2 Salvaging Data to Tape

If your system does not have enough space to hold the information recovered by the salvage utility, you can recover data to tape and then write it back on to your original disk location.

To recover data from a corrupted domain called PERSONNEL on volume /dev/disk/dsk12c containing the personnel_fset fileset mounted at /personnel to tape:

Unmount all filesets in the corrupted domain.

Install a tape on the local tape drive.

Run the salvage command using the -F and -f options to specify tar format and the tape drive.

In this example, files from the PERSONNEL domain that were modified after 1:30 PM on December 7, 2000 are extracted and stored on tape.

# /sbin/advfs/salvage -d 200012071330 -F tar \
 -f /dev/tape/tape0_d1 PERSONNEL
salvage: Domain to be recovered 'PERSONNEL' 
salvage: Volume(s) to be used '/dev/disk/dsk12c'
salvage: Files archived to '/dev/tape/tape0_d1' in TAR format
salvage: Logfile will be placed in './salvage.log' 
salvage: Starting search of all filesets: 09-May-2001
salvage: Starting search of all volumes: 09-May-2001
salvage: Loading file names for all filesets: 09-May-2001
salvage: Starting recovery of all filesets: 09-May-2001

View the salvage.log file to ensure that all necessary files were recovered.

Recreate the domain.

Caution

If you recreate a domain on the same volume as your original domain, you destroy all the data in the original domain. To save your corrupted domain, recreate the domain on a new volume.

# rmfdmn PERSONNEL
rmfdmn: remove domain PERSONNEL? [y/n] y
rmfdmn: informational:[13]posting event:
     sys.unix.fs.advfs.fdmn.rm
       If running in single user mode, EVM is not running
       Please ignore this posting.
rmfdmn: domain PERSONNEL removed.
# mkfdmn /dev/disk/dsk12c PERSONNEL
# mkfset PERSONNEL personnel_fset

If you are restoring some of the domain from backup, do this now.

Copy the salvaged files from tape to the restored domain and remove the recovery domain.

# cd /
# mkdir /personnel
# mount PERSONNEL#personnel_fset /personnel
# tar -xpvf /dev/tape/tape0_d1

5.8.4.3 Salvaging Data from a Corrupted root Domain

If your system is not bootable because the root domain is corrupt, you can boot your system from the installation CD-ROM and run the /sbin/advfs/salvage command. Follow the steps in Section 5.11 to boot your system and exit the installation. Depending on the nature and extent of the root domain corruption, successful file recovery may not be possible.

If you are booting from the installation CD-ROM, device name assignments may differ from the assignments made on the installed operating system. Use the hwmgr -view devices command to view a table of special device names mapped to hardware identification. Be certain you are referencing the intended devices before issuing commands that destroy data.

To recover data from a corrupted root domain on volume /dev/disk/dsk0a to another local, unused disk, /dev/disk/dsk3c:

Create a domain and filesets to hold the recovered information and mount the filesets.

# mkfdmn /dev/disk/dsk3c RECOVER
# mkfset RECOVER recover_fset
# mkdir /recover
# mount RECOVER#recover_fset /recover

Run the salvage command. You must use the -V option to specify the volume that the command will operate on.

In this example, files from the PERSONNEL domain that were modified after 1:30 PM on December 7, 2000 are extracted and stored in filesets mounted at /recover.

# /sbin/advfs/salvage -d 200012071330 -D /recover \
-V /dev/disk/dsk0a
salvage: Volume(s) to be used '/dev/disk/dsk0a'
salvage: Files will be restored to '/recover' 
salvage: Logfile will be placed in './salvage.log' 
salvage: Starting search of all filesets: 09-May-2001
salvage: Loading file names for all filesets: 09-May-2001
salvage: Starting recovery of all filesets: 09-May-2001

View the salvage.log file to ensure that all necessary files were recovered.

Recreate the root domain as described in Section 5.11. Mount the root domain again at /mnt. If you intend to recover your root domain from backup, do so now.

Copy the salvaged files from the recovery location to the root domain and remove the recovery domain.

# cd /recover
# cp -RP * /mnt
# cd /
# umount /mnt /recover
# rmfdmn RECOVER
rmfdmn: remove domain RECOVER [y/n] y
rmfdmn: domain RECOVER removed.

The root domain is restored.

5.8.4.4 Salvaging Data Block by Block

If you ran the salvage utility and were unable to recover a large number of files, run the salvage -S command. This process is very slow because the utility reads every disk block at least once. If you are recovering to tape and have already created a new domain on the disks containing the corrupted domain, you cannot use the -S option because your original information is lost.

Note

If you have accidentally used the mkfdmn command on a good domain, running the salvage -S utility is the only way to recover files.

Caution

The salvage utility opens and reads block devices directly, which can present a security problem. With the -S option it might be possible to access data from older, deleted AdvFS domains while attempting to recover data from the current AdvFS domain.

The following example recovers data block by block.

# /sbin/advfs/salvage -S PERSONNEL
salvage: Domain to be recovered 'PERSONNEL'  
salvage: Volume(s) to be used '/dev/disk/dsk12c'
salvage: Files will be restored to '.'  
salvage: Logfile will be placed in './salvage.log'  
salvage: Starting sequential search of all volumes: 09-May-2001
salvage: Loading file names for all filesets: 09-May-2001
salvage: Starting recovery of all filesets: 09-May-2001

5.8.5 "Can't Clear a Bit Twice" Error Message

If you receive a "Cannot clear a bit twice" error message, your domain is damaged. To repair it:

Set the AdvfsFixUpSBM kernel variable to allow access to the damaged domain. This flag is off by default. To turn it on:
```
# dbx -k /vmunix /dev/mem
dbx> assign AdvfsFixUpSBM = 1
dbx> quit
 
 
```

Mount and back up the filesets in the damaged domain.

Turn AdvfsFixUpSBM off:

# dbx -k /vmunix /dev/mem
dbx> assign AdvfsFixUpSBM = 0
dbx>  quit

Unmount the filesets in the domain. Run the verify -f utility. If there are errors, continue through steps 5 and 6.

Recreate the domain and filesets.

Restore from the backup.

Note

The AdvfsFixUpSBM variable is global. Turn it off so that the error message is again available for all domains.

5.8.6 Recovering from a Domain Panic

When a metadata write error occurs, or if corruption is detected in a single AdvFS domain, the system initiates a domain panic (rather than a system panic) on the domain. This isolates the failed domain and allows a system to continue to serve all other domains. After a domain panic, AdvFS no longer issues I/O requests to the disk controller for the affected domain. Although the domain cannot be accessed, the filesets in the domain can be unmounted.

When a domain panic occurs, an EVM event is logged (see EVM(5)) and the following message is printed to the system log and the console:

AdvFS Domain Panic; Domain name Id domain_Id

For example:

AdvFS Domain Panic; Domain staffb_domain Id 2dad7c28.0000dfbb
An AdvFS domain panic has occurred due to either a
 metadata write error or an internal inconsistency.
This domain is being rendered inaccessible.

By default, a domain panic on an active domain causes a live dump to be created and placed in the /var/adm/crash directory. Some AdvFS-related errors might also be recorded in /var/adm/binary.errlog. Please file a problem report with your software support organization and include the dump file and a copy of the running kernel.

To recover from a domain panic, perform the following steps:

Run the mount -t command and identify all mounted filesets in the affected domain.

Unmount all the filesets in the affected domain.

Examine the /etc/fdmns directory to obtain a list of the AdvFS volumes in the domain that panicked.

Run the savemeta command (see savemeta(8)) to collect information about the metadata files for each volume in the domain. Technical support needs this information.

If the problem is a hardware problem, fix it before continuing.

Run the verify utility on the domain (see Section 5.2.2).
- If there are no errors, mount all the filesets you unmounted and resume normal operations.
- If the verify command runs but shows errors, mount the filesets, do a backup, and recreate the domain. Note that the backup might be incomplete and that earlier backup resources might be needed.

If the failure prevents complete recovery, recreate the domain on new volumes by using the mkfdmn command and restore the domain's data from backup. If the backup does not provide enough information, you might need to run the salvage utility (see Section 5.8.4).

For example:

# mount -t advfs
staffb_dmn#staff3_fs on /usr/staff3 type advfs (rw)
staffb_dmn#staff4_fs on /usr/staff4 type advfs (rw) 
# umount /usr/staff3
# umount /usr/staff4
# ls -l /etc/fdmns/staffb_dmn
lrwxr-xr-x 1 root system 10 Nov 04 16:46
    dsk35c->/dev/disk/dsk3c
lrwxr-xr-x 1 root system 10 Nov 04 16:50
    dsk36c->/dev/disk/dsk6c
lrwxr-xr-x 1 root system 10 Nov 04 17:00
    dsk37c->/dev/disk/dsk1c
# savemeta staffb_dmn /tmp/saved_dmn
# verify staffb_dmn

You do not need to reboot after a domain panic.

If you have recurring domain panics, you might try adjusting the AdvfsDomainPanicLevel attribute (see Section 4.14) in order to facilitate debugging.

5.8.7 Recovering from Filesets That are Mounted Read-Only

When a fileset is mounted, AdvFS verifies that all volumes in a domain can be accessed. The size recorded in the domain's metadata for each volume must match the size of the volume. If the sizes match, the mount proceeds. If a volume is smaller than the recorded size, AdvFS attempts to read the last block marked in use for the fileset. If this block can be read, the mount succeeds, but the fileset is marked as read-only. If the last in-use block for any volume in the domain cannot be read, the mount fails. See mount(8) for more information.

If a fileset is mounted read-only, check the labels of the flagged volumes in the error message. There are two common errors:

A disk is mislabeled on a RAID array.

An LSM volume upon which an AdvFS domain resides was shrunk from its original size (see Section 1.10).

If you have AdvFS Utilities, and if the domain consists of multiple volumes with enough free space to remove the offending volume, you do not need to remove your filesets. However, you should back them up before proceeding.

Remove the volume from the domain by using the rmvol command. (This automatically migrates the data to the remaining volumes.)

Correct the disk label of the volume by using the disklabel command.

Add the corrected volume back to the domain by using the addvol command.

Run the balance command to distribute the data across the new volumes.

For example, if /dev/disk/dsk2c (on a device here called <disk>) within the data5 domain is mislabeled, you can migrate your files on that volume (automatic with the rmvol command), then move them back after you restore the volume.

# rmvol /dev/disk/dsk2c data5
# disklabel -z dsk2
# disklabel -rw dsk2 <disk>
# addvol /dev/disk/dsk2c data5
# balance data5

If you do not have AdvFS Utilities, or if there is not enough free space in the domain to transfer the data from the offending volume:

Back up all filesets in the domain.

Remove the domain by using the rmfdmn command.

Correct the disk label of the volume by using the disklabel command.

Make the new domain.

If you have AdvFS Utilities and if the original domain was multivolume, add the corrected volume back to the domain by using the addvol command.

Restore the filesets from the backup.

For example, if /dev/disk/dsk1c (on a device here called <disk>) containing the data3 domain is mislabeled:

# vdump -0f -u /data3
# rmfdmn data3
# disklabel -z dsk1 <disk>
# disklabel -w dsk1 <disk>
# mkfdmn data3

If you are recreating a multivolume domain, include the necessary addvol commands to add the additional volumes. For example to add /dev/disk/dsk5c to the domain:

# addvol /dev/disk/dsk5c data3
# mkfset data3 data3fset
# mount data3#data3fset /data3
# vrestore -xf - /data3

5.9 Restoring the /etc/fdmns Directory

AdvFS must have a current /etc/fdmns directory in order to mount filesets (see Section 1.6.2). A missing or damaged /etc/fdmns directory prevents access to a domain, but the data within the domain remains intact. You can restore the /etc/fdmns directory from backup or you can recreate it.

It is preferable to restore the /etc/fdmns directory from backup if you have a current backup copy. You can use any standard backup facility (vdump, tar, or cpio) to back up the /etc/fdmns directory. To restore the directory, use the recovery procedure that is compatible with your backup process.

If you cannot restore the /etc/fdmns directory, you can reconstruct it manually (see Section 5.9.1) or with the advscan command (see Section 5.9.2). The procedure for reconstructing the /etc/fdmns directory is similar for both single-volume and multivolume domains. You can construct the directory for a missing domain, missing links, or the whole directory.

If you choose to reconstruct the directory manually, you must know the name of each domain and its associated volumes.

5.9.1 Reconstructing the /etc/fdmns Directory Manually

If you accidentally lose all or part of your /etc/fdmns directory, and you know which domains and links are missing, you can reconstruct it manually.

The following example reconstructs the /etc/fdmns directory and two domains, In this example the domains exist and their names are known. Each domain contains a single volume (or special device). Note that the order of creating the links in these examples does not matter. The domains are:

domain1 on /dev/disk/dsk1c

domain2 on /dev/disk/dsk2c

To reconstruct the two single-volume domains, enter:

# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c dsk1c
# mkdir /etc/fdmns/domain2
# cd /etc/fdmns/domain2
# ln -s /dev/disk/dsk2c dsk2c

The following example reconstructs one multivolume domain. The domain1 domain contains the following three volumes:

/dev/disk/dsk1c

/dev/disk/dsk2c

/dev/disk/dsk3c

To reconstruct the multivolume domain, enter:

# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c dsk1c
# ln -s /dev/disk/dsk2c dsk2c
# ln -s /dev/disk/dsk3c dsk3c

5.9.2 Reconstructing the /etc/fdmns Directory Using advscan

You can use the advscan command to determine which partitions on a disk or which Logical Storage Manager (LSM) volumes are part of an AdvFS domain. Then you can use the command to rebuild all or part of your /etc/fdmns directory. This command is useful:

If you moved disks to a new system, if device numbers have changed, or if you lost track of a domain location

For repair, if you delete the /etc/fdmns directory, delete a domain from the /etc/fdmns directory, or delete links from a domain's subdirectory in the /etc/fdmns directory

The advscan command can:

Determine if a partition is an AdvFS partition.

List partitions in the order they are found on disk.

Read the disk label to determine which partitions are in the domain and if any are overlapping.

Scan all disks found in any /etc/fdmns domain.

Recreate missing domain directories. The domain name is created from the device name.

Fix the domain count and links for a domain.

For each domain there are three numbers that must match for the AdvFS file system to operate properly:

The number of physical partitions found by the advscan command that have the same domain ID

The domain volume count (the number stored in the AdvFS metadata that specifies the number of partitions in the domain)

The number of /etc/fdmns links to the partitions, because each partition must be represented by a link

See advscan(8) for more information.

Inconsistencies can occur in these numbers for several reasons. In general, the advscan command treats the domain volume count as more reliable than the number of partitions or the /etc/fdmns links. The following tables list anomalies, possible causes, and corrective actions that the advscan utility can take. In the table, the letter N represents the value that is expected to be consistent for the number of partitions, the domain volume count, and the number of links.

Table 5-2 shows possible causes and corrective actions if the expected value, N, for the number of partitions and for the domain value count do not equal the number of links in the /etc/fdmns/<dmn> directory.

Table 5-2: Fileset Anomalies and Corrections - Links Not Equal

Number of Links in /etc/fdmns/ <dmn>	Possible Cause	Corrective Action
<N	`addvol` terminated early or a link in `/etc/fdmns/<dmn>` was manually removed.	If the domain is activated before running the `advscan -f` command and the cause of the mismatch is an interrupted `addvol` command, the situation is corrected automatically. Otherwise, `advscan` utility adds the partition to the `/etc/fdmns/<dmn>` directory.
>N	`rmvol` terminated early or a link in `/etc/fdmns/<dmn>` was manually added.	If the domain is activated and the cause of the mismatch is an interrupted `rmvol` command, the situation is corrected automatically. If the cause Is a manually added link in `/etc/fdmns/<dmn>`, systematically try removing different links in the `/etc/fdmns/<dmn>` directory and activating the domain. The number of links to remove is the number of links in the `/etc/fdmns/<dmn>` directory minus the domain volume count displayed by `advscan`.

Table 5-3 shows possible causes and corrective actions if the expected value, N, for the number of partitions and for the number of links in the /etc/fdmns/<dmn> directory do not equal the domain volume count.

Table 5-3: Fileset Anomalies and Corrections - Domain Volume Count Not Equal

Domain Volume Count	Possible Cause	Corrective Action
<N	Cause unknown.	Cannot correct; run the `salvage` utility to recover as much data as possible from the domain.
>N	The `addvol` command terminated early and the partition being added is missing or was reused.	Cannot correct; run the `salvage` utility to recover as much data as possible from the remaining volumes in the domain.

Table 5-4 shows possible causes and corrective actions if the expected value, N, for the domain volume count and for the number of links in the /etc/fdmns/<dmn> directory do not equal the number of partitions.

Table 5-4: Fileset Anomalies and Corrections - Number of Partitions Not Equal

Number of Partitions	Possible Cause	Corrective Action
<N	Partition missing.	Cannot correct; run the `salvage` utility to recover as much data as possible from the remaining volumes in the domain.
>N	The `addvol` command terminated early.	None; domain mounts with N volumes; rerun the `addvol` command.

In the following example no domains are missing. The advscan command scans devices dsk0 and dsk5 for AdvFS partitions and finds nothing amiss. Two partitions are found, dsk0c and dsk5c, the domain volume count reports two, and two links are entered in the /etc/fdmns directory.

# advscan dsk0 dsk5
Scanning disks  dsk0 dsk5
Found domains:
usr_domain
                Domain Id       2e09be37.0002eb40
                Created         Thu Feb 24 09:54:15 2000
                Domain volumes          2
                /etc/fdmns links        2
                Actual partitions found:
                                        dsk0c
                                        dsk5c

In the following example, directories that define the domains that include dsk6 were removed from the /etc/fdmns directory. This means that the number of /etc/fdmns links, the number of partitions, and the domain volume counts are no longer equal. In this example the advscan command scans device dsk6 and recreates the missing domains as follows:

A partition is found containing an AdvFS domain. The domain volume count reports one, but there is no domain directory in the /etc/fdmns directory that contains this partition.

Another partition is found containing a different AdvFS domain. The domain volume count is also one. There is no domain directory that contains this partition.

No other AdvFS partitions are found. The domain volume counts and the number of partitions found match for the two discovered domains.

The advscan command creates directories for the two domains in the /etc/fdmns directory.

The advscan command creates symbolic links for the devices in the /etc/fdmns domain directories.

The command and output are as follows:

# advscan -r dsk6
Scanning disks  dsk6
Found domains:
*unknown*
                Domain Id       2f2421ba.0008c1c0
                Created         Thu Jan 20 13:38:02 2000
 
                Domain volumes          1
                /etc/fdmns links        0
 
                Actual partitions found:
                                        dsk6a*

*unknown*
                Domain Id       2f535f8c.000b6860
                Created         Fri Feb 25 09:38:20 2000
 
               Domain volumes          1
                /etc/fdmns links       0
 
                Actual partitions found:
                                        dsk6b*
Creating /etc/fdmns/domain_dsk6a/
        linking dsk6a
 
Creating /etc/fdmns/domain_dsk6b/
        linking dsk6b

5.10 Recovering from Corruption of a Domain

Some problems occur in AdvFS because of hardware errors. For example, if a write to the file system fails due to a hardware fault, it might appear as metadata corruption. Hardware problems cannot be repaired by your file system.

If unexplained errors on a volume in a multivolume domain, do the following:

As root user, examine the /var/adm/messages file for AdvFS I/O error messages. For example:

Dec 05 15:39:16 systemname vmunix: AdvFS I/O error: 
Dec 05 15:39:16 systemname vmunix: Domain#Fileset:test1#tstfs
Dec 05 15:39:16 systemname vmunix: Mounted on: /test1 
Dec 05 15:39:17 systemname vmunix: Volume: /dev/rz11c 
Dec 05 15:39:17 systemname vmunix: Tag: 0x00000006.8001 
Dec 05 15:39:17 systemname vmunix: Page: 76926 
Dec 05 15:39:17 systemname vmunix: Block: 5164080 
Dec 05 15:39:17 systemname vmunix: Block count: 256 
Dec 05 15:39:17 systemname vmunix: Type of operation: Read 
Dec 05 15:39:17 systemname vmunix: Error: 5 
Dec 05 15:39:17 systemname vmunix: To obtain the name of 
Dec 05 15:39:17 systemname vmunix: the file on which the 
Dec 05 15:39:17 systemname vmunix: error occurred, type the 
Dec 05 15:39:17 systemname vmunix: command 
Dec 05 15:39:17 systemname vmunix: /sbin/advfs/tag2name 
Dec 05 15:39:17 systemname vmunix: /test1/.tags/6

This error message describes the domain, fileset, and volume on which the error occurred. It also describes how to find out which file was affected by the I/O error. If you have no AdvFS I/O error messages but still have unexplained behavior on the file system, unmount the domain as soon as possible and run the verify utility (see Section 5.2.2) to check the consistency of the domain's metadata.

Check for device driver error messages for the volume described in the AdvFS I/O error message. If you have no error messages, unmount the domain as soon as possible and run the verify utility to check the integrity of the domain's metadata. If there are no device driver I/O error messages that correspond to the AdvFS I/O error messages, then the file system is being affected by problems with the underlying hardware.

Try to remove the faulty volume by using the rmvol utility (see Section 1.6.7). If this succeeds, the file system problems should not recur.
If rmvol fails due to more I/O errors, you must recreate the domain.
1. If you have a recent backup, recreate the domain and restore it from backup. If you have no backup, or if it is too old, use the salvage utility (see Section 5.8.4) to extract the contents of the corrupted domain.
2. Remove the faulty domain by using the rmfdmn command.
3. Recreate the domain by using the mkfdmn command. Remember that if you are recreating your domain, it will have a DVN of 4 by default (see Section 1.6.3). Add volumes as needed if you have the AdvFS Utilities license. Do not to include the faulty volume in the new domain.
4. Restore the contents of the recreated domain using the information obtained in the backup step.
5. Remount the filesets in the domain.

5.11 Recovering from Corruption of an AdvFS root Domain

Catastrophic corruption of your AdvFS root domain typically requires that you recreate your root file system in order to have a bootable system. This section explains recovering a corrupted root domain on a non-clustered system. For other configurations, see System Administration "Duplicating or Recovering a System (Root) Disk" and Cluster Administration. If your root volume is an LSM volume, see Logical Storage Manager.

Follow this procedure if the root domain is corrupt. This procedure assumes that the hardware disk device containing the corrupted root domain is functioning properly, that the disklabel is correct, and that the problem is due to data corruption. You must be root user to reconstruct the root domain.

Depending on your system configuration, you might need the following:

Information about console commands
You will use Alpha System Reference Manual (SRM) console commands at the system console prompt (>>>) to perform some tasks. These commands are documented in the hardware manual for your Alpha system. If you cannot find the printed document, it is usually shipped as a printable file on a CD-ROM supplied with the system.

A current operating system CD-ROM
You can use the operating system CD-ROM that is packaged with the distribution media to boot your system and perform maintenance activities on various utilities.
If your local site provides a Remote Installation Service (RIS) server, you can boot your system across the network. If you choose RIS services, follow your site-specific procedures and consult the Installation Guide.

Recent root domain backup media (full and recent incremental backups)
You will need to recreate the root domain on the boot device. You are best prepared if you have a full and recent backup of the root domain. If you do not have adequate backup, depending on the nature and extent of the root domain corruption, you may be able to recover root files using the salvage utility. The salvage utility may also be used to recover files that were modified or created following the most recent backup.

5.11.1 Identifying the Hardware Resources

You need to identify the following hardware resources to complete the restoration of your root disk.

5.11.1.1 SRM Console Names for CD-ROM Drive or Network Interface Device

If you plan to boot your system from the operating system CD-ROM, determine the name of your CD-ROM drive. One method of identifying your CD-ROM drive is by issuing the show device command at the SRM console prompt.

  >>> show device | grep -E 'RR|CD' 
DKA400        RRD47   1206   dka400.4.0.5.0

In this example, the CD-ROM device name is DKA400 according to the SRM console firmware.

If you plan to boot your system from a RIS server, determine the name of your network interface device. One method of identifying your network interface device is by issuing the show device command at the SRM console prompt.

>>> show device | more
....
ewa0.0.0.8.0    EWA0     08-00-2B-C3-E3-DC
...

In this example, the network interface device name is EWA0 according to the SRM console firmware.

For additional information, see the hardware manual for your system. For information about RIS servers, see the Installation Guide -- Advanced Topics.

5.11.1.2 SRM Console Boot Device Name

In previous versions of the operating system, device names were assigned based on the physical location of the drive on an I/O bus. In Version 5.0 and later operating system software, device names are assigned logically and stored in a database. These names are independent of the device's physical location.

You must determine the boot device name according to the SRM console. If your boot device is the default boot device, you can identify this device using the show bootdef_dev command at the SRM console prompt.

>>> show bootdef_dev
bootdef_dev      dkb400.4.0.5.1

If your boot device is not the default boot device, use the show device command from the SRM console prompt to identify your boot device from the list.

For example, if dkb400 is the boot device, dk indicates that the device is a SCSI disk, the b indicates that the device is connected to SCSI bus b, and the 400 indicates that the device's SCSI target ID is 4 and its logical unit number (LUN) is 00. Thus, in this example, the bus/target/LUN information is 1/4/00. This information identifies the device when you restore your domain.

5.11.1.3 UNIX Device Names

If the root domain is mountable when you boot from the installation media, the installation procedure attempts to read the existing device database from the installed root domain. If this read succeeds, the following message appears on the console:

Attempting to mount previous root file system disk 
to save hardware configuration information...
done

If the hardware database read fails, messages similar to the following appears on the console:

Attempting to mount previous root file system disk 
to save hardware configuration information...
FAILED
 
Unable to retain old hardware configuration from
     SCSI 1 4 0 0 0 6000 10201 077
 
Unable to save existing hardware configuration. 
New configuration will be used.

If the hardware database read fails, you must translate the UNIX device name assignments to the proper hardware device by identifying the device by its bus/target/LUN (see Section 5.11.2).

5.11.2 Applying the Procedure

The following steps recover your failed root domain.

Boot the system using one of the following methods:
- Insert and boot your installation CD-ROM using the device name that you determined previously. For example:
```
>>> boot dka400
```
- Boot from your local RIS server. For example:
```
>>> boot ewa0
```

Exit the installation as follows:
- If you have a VGA graphics console, choose to exit the installation, or from the File menu of the Installation and Configuration Welcome dialog box, choose shell window.
- If you have a serial console terminal, select option 3) Exit Installation.
You will get a shell (#) prompt.

Identify both the bus/target/LUN of the target disk that will be used as the restored root disk and the status of backup device by using the hwmgr -view devices command.
```
# hwmgr -view devices
HWID: Device Name     Mfg Model         Location
------------------------------------------------------------
38:/dev/disk/floppy0c    3.5in floppy     fdi0-unit-0    
41:/dev/disk/dsk0c   DEC RZ1DB-CA (C) DEC bus-1-targ-4-lun-0
42:/dev/disk/dsk1c   DEC RZ1CB-CA (C) DEC bus-1-targ-5-lun-0
43:/dev/disk/dsk2c   DEC RZ1CB-CA (C) DEC bus-1-targ-6-lun-0
44:/dev/disk/cdrom0  DEC RRD47    (C) DEC bus-0-targ-5-lun-0
47:(unknown)         DEC TLZ10    (C) DEC bus-1-targ-4-lun-0     
 
```
In this example, the SRM console is identified DKB400 and the disk located at bus b, target 4, LUN 0. According to the hardware database, this same disk is identified as dsk0 (see Section 5.11.1.2). In this procedure, /dev/disk/dsk0a will be used as the volume containing the corrupted root domain. A new root domain will be created on /dev/disk/dsk0a and files from the old root domain will be restored on it.
To visually confirm that you have identified the correct device, use the hwmgr --flash command to cause the disk's light to flash for thirty seconds.
```
# /sbin/hwmgr -flash light -dsf /dev/disk/dsk0a
```
If you plan to recover from a local tape device, identify the device in the list displayed by the hwmgr utility. If you do not see the tape device, check for proper installation and hardware configuration.

If you have a tape backup device, install it.
```
# dn_setup -install_tape 
```
For more information see System Administration "Using dn_setup to Perform Generic Operations."
To verify the installation, repeat the hwmgr command.

If necessary, recover files with the salvage command and save them to a temporary domain (see Section 5.8.4).

Create the new root domain and root fileset. Mount the fileset at /var/mnt.

# mkfdmn -r /dev/disk/dsk0a root_domain
Warning: /dev/disk/dsk0a is marked in use for AdvFS.
If you continue with the operation you can
possibly destroy existing data.
CONTINUE? [y/n] y
# mkfset root_domain root
# mkdir /var/mnt
# mount root_domain#root /var/mnt

Use the vrestore command to restore the files from backup device you installed earlier.
```
 # vrestore -xf  /dev/tape/tape0 -D /var/mnt
```

If necessary, copy files recovered with the salvage command into the newly created root domain (see Section 5.8.4).

Halt the system.
```
# halt 
 
```

Boot the system.
```
>>> boot   
 
```

Verify success by checking the boot process for error messages.

It is a good idea to use the dsfmgr command to verify and fix the device databases and device special file names. For example:
```
# dsfmgr -v
```

If the procedure was not successful and hardware failures are not present, your only recourse is to reinstall the operating system from the distribution media and recreate your customized environment from backup media.

5.12 Restoring a Multivolume usr Domain

Before you restore a multivolume /usr file system, you must first reconstruct the usr_domain domain with all of its volumes. However, restoring a multivolume domain requires the License Management Facility (LMF). LMF controls AdvFS Utilities, which includes the addvol command needed for creating multivolume domains.

First create a one volume usr domain and restore the addvol command. Then restore LMF and use it to enable the addvol command. When this is complete, you can add volumes to the usr domain and restore the complete multivolume domain.

LMF has two parts. A utility is stored in /usr/sbin/lmf and a database is stored in /var/adm/lmf. On some systems /var is a link to /usr and both directories are located in the usr fileset. If your system has this configuration, recover the addvol command and recover both parts of the LMF. On systems where the /usr and /var directories are located in separate filesets in usr_domain, recover the addvol command and the LMF utility into the usr fileset and recover the LMF database into the var fileset.

The following example shows how to restore a multivolume domain where the /var directory and the /usr directory are both in the usr fileset in usr_domain. The domain consists of the dsk1g, dsk2c, and dsk3c volumes. The procedure assumes that the root file system has already been restored. If it has not, see Section 5.11.

Mount the root fileset as read/write.
```
# mount -u /
```

Remove the links for the old usr_domain and create a new usr_domain using the initial volume.
```
# rm -rf /etc/fdmns/usr_domain
# mkfdmn /dev/disk/dsk1g usr_domain
```

Create and mount the /usr and /var filesets.

# mkfset usr_domain usr
# mount -t advfs usr_domain#usr /usr

Create a soft link in /usr because that is where the lmf command looks for its database.
```
# ln -s /var /usr/var
```

Insert the /usr backup tape.

# cd /usr
# vrestore -vi
(/) add sbin/addvol 
(/) add sbin/lmf
(/) add var/adm/lmf
(/) extract
(/) quit

Reset the license database.
```
# /usr/sbin/lmf reset
```

Add the extra volumes to usr_domain.

# /usr/sbin/addvol /dev/disk/dsk2c usr_domain
# /usr/sbin/addvol /dev/disk/dsk3c usr_domain

Do a full restore of the /usr backup.
```
# cd /usr
# vrestore -xv
```

The following example shows how to restore a multivolume domain where the /usr and /var directories are in separate filesets in the same multivolume domain, usr_domain. The domain consists of the dsk1g, dsk2c, and dsk3c volumes. In this case you must mount both the /var and the /usr backup tapes. The procedure assumes that the root file system has already been restored. If it has not, see Section 5.11.

Mount the root fileset as read/write.
# mount -u /

Remove the links for the old usr_domain and create a new usr_domain using the initial volume.
```
# rm -rf /etc/fdmns/usr_domain
# mkfdmn /dev/disk/dsk1g usr_domain
```

Create and mount the /usr and /var filesets.

# mkfset usr_domain usr
# mkfset usr_domain var
# mount -t advfs usr_domain#usr /usr
# mount -t advfs usr_domain#var /var

Insert the /var backup tape and restore from it.

# cd /var
# vrestore -vi
(/) add adm/lmf
(/) extract
(/) quit

Insert the /usr backup tape.

# cd /usr
# vrestore -vi
(/) add sbin/addvol
(/) add sbin/lmf
(/) extract
(/) quit

Reset the license database.
```
# /usr/sbin/lmf reset
```

Add the extra volumes to usr_domain.

# /usr/sbin/addvol /dev/disk/dsk2c usr_domain
# /usr/sbin/addvol /dev/disk/dsk3c usr_domain

Do a full restore of /usr backup.
```
# cd /usr
# vrestore -xv
```

Insert the /var backup tape and do a full restore of /var backup.
```
# cd /var
# vrestore -xv
```

5.13 Recovering from a System Crash

When each domain is mounted after a crash, the system automatically runs recovery code that checks the transaction log file to ensure that file system operations that were occurring when the system crashed are either completed or backed out. This ensures that AdvFS metadata is in a consistent state after a crash. If you are recovering your system by using an operating system other than the one that crashed, see Section 5.4.

5.13.1 Saving Copies of System Metadata

If it appears that a domain is corrupted or it is otherwise causing problems, run the savemeta command to save a copy of the domain's metadata for examination by support personnel. You must be root user to run this command (see savemeta(8)).

5.13.2 Physically Moving an AdvFS Disk

If a machine has failed, you can move disks containing AdvFS domains to another computer running the AdvFS software. Connect the disk(s) to the new machine and modify the /etc/fdmns directory so the new system recognizes the transferred volume(s). You must be root user to complete this process.

You cannot move DVN4 domains to systems running Version 4 of the operating system software. Doing so generates an error message (see Section 5.4). You can move DVN3 domains from a Version 4 machine to a machine running Version 5. The newer operating system recognizes the domains created earlier.

Caution

Do not use either the addvol command or the mkfdmn command to add the volumes to the new machine. Doing so will delete all data on the disk you are moving. See Section 5.8.4 if you have already done so.

If you do not know which partitions your domains were on, you can add the disks on the new machine and run the advscan utility, which might be able to recreate this information. You can also look at the disk label on the disk to see which partitions in the past were made into AdvFS partitions. The disk labels do not tell you which partitions belong to which domains.

If the motherboard of your machine fails, you must move the disks to another system. You might need to reassign the disk SCSI IDs to avoid conflicts. (See your disk manufacturer instructions for more information.)

For example, assume the IDs are assigned to disks 6 and disk 8. Assume also that the system has a domain, testing_domain, on two disks, dsk3 and dsk4. This domain contains two filesets: sample1_fset and sample2_fset. These filesets are mounted on /data/sample1 and /data/sample2. Assume you know that the domain that you are moving had partitions dsk3c, dsk4a, dsk4b, and dsk4g. Take the following steps to move the disks:

Shut down the working machine to which you are moving the disks.

Connect the disks from the bad machine to the good one.

Reboot. You do not need to reboot to single-user mode; multiuser mode works because you can complete the following steps while the system is running.

Determine the device nodes created for the new disks.
```
# /sbin/hwmgr -show scsi -full
```
The output is a detailed list of information about all the disks on your machine. The DEVICE FILE column shows the name that the system uses to refer to each disk. Find the listings for the disks that you just added, for example, disk6 anddisk8. Use these names to set up symbolic links in step 5.

Modify your /etc/fdmns directory to include the information from the transferred domains.

# mkdir -p /etc/fdmns/testing_domain
# cd /etc/fdmns/testing_domain
# ln -s /dev/disk/dsk6c dsk6c
# ln -s /dev/disk/dsk8a dsk8a
# ln -s /dev/disk/dsk8b dsk8b
# ln -s /dev/disk/dsk8g dsk8g
# mkdir /data/sample1
# mkdir /data/sample2

Edit the /etc/fstab file to add the fileset mount-point information.

testing_domain#sample1_fset /data/sample1 advfs rw 1 0
testing_domain#sample2_fset /data/sample2 advfs rw 1 0

Mount the volumes.
```
# mount /data/sample1
# mount /data/sample2
```
Note that if you run the mkfdmn command or the addvol command on partition dsk6c, dsk8a, dsk8b, dsk8g, or an overlapping partition, you will destroy the data on the disk. See Section 5.8.4 if you have accidentally done so.

5.13.3 Log File Inconsistency

When a system crashes, AdvFS performs recovery at reboot. Filesets that were mounted at the time of the crash are recovered when they are remounted. This recovery keeps the AdvFS metadata consistent and makes use of the AdvFS transaction log file.

Since different versions of the operating system use different transaction log file structures, it is important that you recover your filesets on the version of the operating system that was running at the time of the crash. If you do not, you risk corrupting the domain metadata and/or panicking the domain.

If the system crashed because you set the AdvfsDomainPanicLevel attribute (see Section 4.7) to promote a domain panic to a system panic, run the verify command on the panicked domain to ensure that it is not damaged. If your filesets were unmounted at the time of the crash, or if you remounted them successfully and ran the verify command (if needed), you can mount the filesets on a different version of the operating system, if appropriate.

5.13.4 Recovering from Problems Removing Volumes

If the removal process is interrupted (see Section 1.6.7), under some circumstances the volume can be left in an inaccessible state where you cannot write to it. These volumes are marked as "data unavailable" in the output of the showfdmn command. If the volume does not allow writes after an aborted rmvol operation, use the chvol -A command to reactivate the volume.