7    Troubleshooting

This chapter examines problems that, while universal for file systems, may have unique solutions for AdvFS. See System Configuration and Tuning for related information about diagnosing performance problems.

7.1    Managing Disk Space

The first step in managing excessive disk space consumption is to request that users delete unnecessary files. There are a number of utilities that look at file usage. You can also limit disk space consumption by imposing quotas on users and groups or on the filesets in a file domain.

7.1.1    Checking Free Space and Disk Usage

You can look at the way space is allocated on a disk by file, fileset, or file domain. The AdvFS GUI (see Chapter 6) displays a hierarchical view of disk objects and the space they use. Table 7-1 shows the commands that examine disk space usage.

Table 7-1:  Disk Space Usage Information Commands

Command Description
du Displays information about block allocation for files; use the -a option to display information for individual files.
df Displays disk space usage by fileset; available space for a fileset is limited by the fileset quota if it is set.
showfdmn Displays the attributes and block usage for each volume in an active file domain; for multivolume file domains, additional volume information is displayed.
showfile Displays block usage and volume information for a file or for the contents of a directory.
showfsets Displays information about the filesets in a file domain; use to display fileset quota limits.
vdf Displays disk space used and available disk space for a fileset or a file domain.

See the reference pages for the commands for more complete information.

Under certain conditions, the disk usage information for AdvFS may become corrupt. To correct this, change the entry in the /etc/fstab file to enable the quotacheck command to run. The quotacheck command only checks filesets that have the userquota and groupquota options specified. For example, for the fileset usr_domain#usr:

usr_domain#usr /usr advfs rw,userquota,groupquota 0 2

Then run the quotacheck command for the fileset:

# quotacheck usr_domain#usr

This should correct the disk usage information.
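Before running quotacheck, it can help to confirm that the /etc/fstab entry really carries both options. The following is a minimal sketch, not part of AdvFS: the function name has_quota_opts is hypothetical, and it assumes the standard whitespace-separated fstab layout shown above.

```shell
# Hypothetical helper (not an AdvFS command): succeed only if the fstab
# entry for FILESET is an advfs entry listing both quota options.
# Usage: has_quota_opts FILESET [FSTAB_PATH]
has_quota_opts() {
    fileset=$1
    fstab=${2:-/etc/fstab}
    awk -v name="$fileset" '
        $1 == name && $3 == "advfs" {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] == "userquota")  u = 1
                if (opts[i] == "groupquota") g = 1
            }
        }
        END { exit !(u && g) }
    ' "$fstab"
}
```

For example, `has_quota_opts 'usr_domain#usr' && quotacheck 'usr_domain#usr'` runs quotacheck only when both options are present.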

7.1.2    Limiting Disk Space Usage

If your system has been running without any limits on resource usage, you can add quotas to limit the amount of disk space your users can access. AdvFS quotas provide a layer of control beyond that available with UFS. You can limit the number of files or blocks used by a fileset as well as the number of files or blocks used by individual users and by groups.

You can set two types of quota limits: hard limits that cannot be exceeded and soft limits that can be exceeded for a period of time called the grace period. You can turn quota enforcement on and off. See Chapter 3 for complete information.

7.1.2.1    Setting User and Group Quotas

User and group quotas limit the amount of space a user or group can allocate for a fileset (see Section 3.2). Table 7-2 shows the commands that operate on user and group quotas.

Table 7-2:  User and Group Quotas Commands

Command Description
edquota Edits quotas and grace periods.
ncheck Displays a list of pairs (tag and path name) for all files in a specified fileset. Use the sorted output as input for the quot command.
quot Displays the number of blocks in the named filesets currently owned by each user.
quota For files or filesets that have quotas enabled, displays disk space usage and limits for users and groups.
quotacheck Checks file system quota consistency and corrects it if necessary.
quotaon, quotaoff Turn quota enforcement on and off.
repquota Prints a summary of the disk usage and quotas by user, group, or fileset.

See the command reference pages for more complete information.

7.1.2.2    Setting Fileset Quotas

Fileset quotas prevent a single fileset from consuming all of the available space in a file domain (see Section 3.3). Without them, any fileset can use all of the space in its domain. The AdvFS GUI (see Chapter 6) displays a hierarchical view of disk objects from which you can view fileset quota information. Table 7-3 shows the commands that display and manage fileset quotas.

Table 7-3:  Fileset Quota Commands

Command Description
chfsets Sets limits (quotas) for block usage and number of files in a fileset.
df Displays the limits and actual number of blocks used in a fileset.
showfdmn Displays disk space usage for a file domain.
showfile Displays block usage and volume location of a file.
showfsets Displays files and block usage for a fileset.
vdf Displays disk space used and available disk space for a fileset or a file domain.

See the command reference pages for more complete information.

7.1.2.3    Running into Quota Limits

If you are working in an editor and realize that the information you need to save will put you over your quota limit, do not abort the editor or write the file because data may be lost. Instead, remove files to make room for the edited file prior to writing it. You can also write the file to another fileset, such as tmp, remove files from the fileset whose quota you exceeded, and then move the file back to that fileset.

AdvFS will impose quota limits in the rare case that you are within 8 kilobytes of the user, group, or fileset quota and attempt to use some or all of the space you have left. This is because AdvFS allocates storage in units of 8 kilobytes. If adding 8 kilobytes to a file would exceed the quota limit, then that file cannot be extended.
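The arithmetic behind this rule can be sketched as follows. This is an illustration only: the function name can_extend is hypothetical, and it assumes that both the current usage and the quota are expressed in kilobytes.

```shell
# Hypothetical illustration of the 8-kilobyte allocation rule.
# Succeeds if one more 8 KB allocation unit still fits under the quota.
# Usage: can_extend USED_KB QUOTA_KB
can_extend() {
    used_kb=$1
    quota_kb=$2
    unit_kb=8    # AdvFS allocates storage in units of 8 kilobytes
    [ $((used_kb + unit_kb)) -le "$quota_kb" ]
}
```

With a 1000 KB quota, a user at 992 KB can still extend a file, but a user at 996 KB cannot, even though 4 KB of quota remains.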

7.2    Disk File Structure Incompatibility

If you install your Version 5 operating system as an update to your Version 4 system (not a full installation), your /root, /usr, and /var files will retain a DVN of 3 (see Section 2.3.3.1).

By default, file domains created on Version 5.0 and later have a new format that is incompatible with earlier versions (see Section 2.3.3). The newer operating system recognizes the older disk structure, but the older does not recognize the newer. To access a fileset with the new format (a DVN of 4) from an older operating system, NFS mount the fileset from a Version 5.0 system or upgrade your operating system to Version 5.

If you try to mount a fileset belonging to a file domain with a DVN of 4 when you are running a version of the operating system earlier than Version 5.0, you will get an error message.

There is no tool that upgrades all file domains with a DVN of 3 to domains with DVN of 4. You must upgrade each file domain (see Section 2.3.3.2).

7.2.1    Utility Incompatibility

Do not run older versions of the AdvFS utilities on Version 5.0. Utilities from earlier operating system releases may appear to run on file domains created in Version 5.0 or later, but they can corrupt the domains.

7.2.2    Transaction Log Incompatibility

The structure of the transaction log has changed across releases of the operating system. This is normally not a problem for users. However, after a system crash, it is important to recover any domains that had filesets mounted at the time of the crash on the same version of the operating system that crashed (see Section 7.7.4).

7.3    Memory Mapping and Data Logging Incompatibility

Attempts to memory map an AdvFS file using the mmap() system call will fail if the file has atomic write data logging activated. Enter the chfile command to determine the type of logging in effect:

chfile file_name

If the I/O mode indicates atomic write data logging and you want to turn it off to allow memory mapping to occur, use the following command format:

chfile -L off file_name

See Section 2.7 for more information.

7.4    Handling Poor Performance

The performance of a disk depends upon the I/O demands upon it. If your file domain is structured so that heavy access is focused on one volume, it is likely that system performance will degrade. Once you have determined the load balance on your system, there are a number of ways to equalize the activity and increase throughput. See System Configuration and Tuning, command reference pages, and Chapter 5 for more complete information.

To discover the causes of poor performance:

If you have AdvFS Utilities, you can also:

7.5    Handling Disk Problems

Back up your data regularly and frequently and watch for signs of impending disk failure. Removing files from a problem disk before it fails can prevent a lot of trouble. See the Event Management information in System Administration for more information.

7.5.1    Recovering from Disk Failure

There is no particular message that will tell you that your disk is about to fail, but some warning messages may indicate potential problems. Run the uerf utility, the event report formatter, to print out the hardware-detected events. This report provides information that may help you identify some hardware-related problems.

Hardware problems cannot be repaired by your file system. If you start seeing unexplained errors for a volume, remove that volume from the file domain as soon as possible:

  1. If you can read data from your disk, you can remove the volume with the rmvol utility (see Section 2.3.7).

  2. If you cannot remove the volume, try to back it up.

    1. If you are successful, remove the file domain containing the bad disk, recreate the file domain and filesets on another volume, and restore the data from the backup.

      Remember that if you are recreating your file domain under Version 5.0, your file domains will have a DVN of 4 by default (see Section 2.3.3).

    2. If a disk error prevents you from performing the backup, use the salvage command to extract information from the file domain and send the retrieved files to a new file domain (see Section 7.5.8).

7.5.2    Errors Restoring from Disk

If you have used the vdump or rvdump command to write to a disk partition that contains a valid disk label, the device driver has not written over the label. The vrestore or rvrestore command will interpret the disk label as part of the saveset and return an error message (see Section 4.1.4).

7.5.3    Recovering from a Domain Panic

When a metadata write error occurs, or if corruption is detected in a single AdvFS file domain, the system initiates a domain panic (rather than a system panic) on the file domain. This isolates the failed domain and allows a system to continue to serve all other domains. After a domain panic AdvFS no longer issues I/O requests to the disk controller for the affected domain. Although the file domain cannot be accessed, the filesets in the file domain can be unmounted.

When a domain panic occurs, an EVM event is logged (see EVM(5)) and the following message is printed to the system log and the console:

AdvFS Domain Panic; Domain name Id domain_Id

For example:

AdvFS Domain Panic; Domain staffb_dmn Id 2dad7c28.0000dfbb
An AdvFS domain panic has occurred due to either a
 metadata write error or an internal inconsistency.
This domain is being rendered inaccessible.

To recover from a domain panic, perform the following steps. If you cannot successfully complete steps 1 through 8, go to step 10.

  1. Run the mount command with the -t option and look for all mounted filesets in the file domain. For example:

    # mount -t advfs
    staffb_dmn#staff3_fs on /usr/staff3 type advfs (rw)
    staffb_dmn#staff4_fs on /usr/staff4 type advfs (rw)
    

  2. Use the umount command to unmount all filesets in the file domain affected by the domain panic. For example:

    # umount /usr/staff3
    # umount /usr/staff4
    

  3. Use the ls command with the -l option to examine the /etc/fdmns directory to obtain a list of the AdvFS volumes in the domain that panicked. For example:

    # ls -l /etc/fdmns/staffb_dmn
    lrwxr-xr-x 1 root system 10 Aug 25 16:46 dsk35c -> /dev/disk/dsk3c
    lrwxr-xr-x 1 root system 10 Aug 25 16:50 dsk36c -> /dev/disk/dsk6c
    lrwxr-xr-x 1 root system 10 Aug 25 17:00 dsk37c -> /dev/disk/dsk1c
    

  4. Use the savemeta command (see savemeta(8)) to collect information about the metadata files for each volume in the domain. The savemeta command saves information about the BMT and the storage bitmap for each volume in the domain, along with the transaction log and the root tag file for the domain. The saved files are written to the directory you specify. For example, to save the metadata for the domain staffb_dmn in the directory /tmp/saved_dmn:

    # /sbin/advfs/savemeta staffb_dmn /tmp/saved_dmn
     
    

  5. Use the dia utility, the DECevent report formatter, to extract information about the domain panic from the binary error log. See dia(8) for more information.

  6. If the problem is a hardware problem, fix it before continuing (see Section 7.5.1).

  7. Run the verify utility on the domain (see Section 7.7.1). For example:

    # verify staffb_dmn

  8. If there are no errors, mount all the filesets you had unmounted and resume normal operations.

  9. If the verify command was able to run but showed errors, mount the filesets, do a backup, and recreate the file domain. Note that the backup may be incomplete and that earlier backup resources may be needed.

  10. If the failure prevents complete recovery, recreate the file domain with the mkfdmn command and restore the domain's data from backup. If this does not provide enough information, you may need to run the salvage utility (see Section 7.5.8). Please file a problem report containing the information you have collected with your software support organization.

You do not need to reboot after a domain panic.

If you have recurring domain panics, it may be helpful to adjust the AdvfsDomainPanicLevel attribute (see Section 5.2.7) in order to facilitate debugging.
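The first two recovery steps lend themselves to a little scripting. The following sketch assumes only the message format and the mount output format shown in this section; the function names panic_domains and domain_mount_points are hypothetical, and both read their input from standard input so that no particular log file location is assumed.

```shell
# Hypothetical helpers; neither is part of AdvFS.

# Print "domain_name domain_Id" for each domain panic message on stdin.
panic_domains() {
    sed -n 's/.*AdvFS Domain Panic; Domain \([^ ]*\) Id \([^ ]*\).*/\1 \2/p'
}

# Print the mount point of every fileset that belongs to DOMAIN,
# given "mount -t advfs" output on stdin.
# Usage: mount -t advfs | domain_mount_points DOMAIN
domain_mount_points() {
    awk -v dmn="$1" 'index($1, dmn "#") == 1 && $2 == "on" { print $3 }'
}
```

For example, `mount -t advfs | domain_mount_points staffb_dmn` lists the mount points to pass to the umount command in step 2.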

7.5.4    Recovering from Filesets Mounted Read-Only

If there is a problem with a volume, AdvFS may mount a fileset read-only when you did not specify this option. When a fileset is mounted, AdvFS verifies that all the data in all volumes in a file domain can be accessed. The size recorded in the domain's metadata for each volume must match the size of the volume. If the sizes match, the mount proceeds. If a volume is smaller than the recorded size, AdvFS attempts to read the last block marked in use for the fileset. If this block can be read, the mount will succeed, but the fileset will be marked as read-only. If the last in-use block for any volume in the domain cannot be read, the mount will fail. See mount(8) for more information.

If you find your fileset is mounted read-only, check the labels of the flagged volumes in the error message. There are two common reasons the mount will fail:

If you have AdvFS Utilities and if the domain consists of multiple volumes and has enough free space to remove the offending volume, you do not need to remove your filesets. However, it is a good idea to back them up before proceeding:

  1. Remove the volume from the domain using the rmvol command. (This will automatically migrate the data to the remaining volumes.)

  2. Correct the disk label of the volume with the disklabel command.

  3. Add the corrected volume back to the domain with the addvol command.

  4. Run the balance command to distribute the data across the new volumes.

For example, if /dev/disk/dsk2c (an rz29 disk) within the data5 file domain is mislabeled, you can migrate your files on that volume (automatic with the rmvol command), then move them back when you have restored the volume:

# rmvol /dev/disk/dsk2c data5
# disklabel -z dsk2
# disklabel -rw dsk2 rz29
# addvol /dev/disk/dsk2c data5
# balance data5
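If you repeat this procedure for several volumes, it may be convenient to generate the command sequence for review before running it. The following sketch only prints the commands shown above and executes nothing; the function name relabel_plan and its argument order are hypothetical.

```shell
# Hypothetical helper: print (do not run) the relabel sequence for one volume.
# Usage: relabel_plan VOLUME DOMAIN DISK DISK_TYPE
relabel_plan() {
    vol=$1
    domain=$2
    disk=$3
    dtype=$4
    printf '%s\n' \
        "rmvol $vol $domain" \
        "disklabel -z $disk" \
        "disklabel -rw $disk $dtype" \
        "addvol $vol $domain" \
        "balance $domain"
}
```

For example, `relabel_plan /dev/disk/dsk2c data5 dsk2 rz29` prints the five commands from the example above, in order.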

If you do not have AdvFS Utilities or if there is not enough free space in the domain to transfer the data from the offending volume:

  1. Back up all filesets in the domain.

  2. Remove the domain with the rmfdmn command.

  3. Correct the disk label of the volume with the disklabel command.

  4. Make the new domain.

  5. If you have AdvFS Utilities and if the original domain was multivolume, add the corrected volume back to the domain with the addvol command.

  6. Restore the filesets from the backup.

For example, if /dev/disk/dsk1c (an rz28 disk) containing the data3 file domain is mislabeled:

# vdump -0f -u /data3
# rmfdmn data3
# disklabel -z dsk1
# disklabel -w dsk1 rz28
# mkfdmn /dev/disk/dsk1c data3

If you are recreating a multivolume file domain, include the necessary addvol commands to add the additional volumes. For example to add /dev/disk/dsk5c to the file domain:

# addvol /dev/disk/dsk5c data3
# mkfset data3 data3fset
# mount data3#data3fset /data3
# vrestore -xf - /data3 

7.5.5    Possible Data Problems Prior to Version 4.0D

In operating systems prior to Version 4.0D, under some circumstances AdvFS stored two different versions of a particular page (an 8-kilobyte segment) in a file. One version of the page was hidden and not readable while the other was read when the file was read. The readable version of the page was not necessarily the most recent or complete version.

This defect is now fixed, but file domains created using older versions may still contain corrupted files. It is a good idea to locate and correct these files. You do not need to recreate the file domain. To locate and fix files, do the following:

  1. Run the verify utility to identify the corrupted files (see Section 7.7.1).

  2. Run the verify utility with the -f option set to capture the two versions of the page for each corrupted file.

  3. Mount the affected fileset and, if necessary, edit the pages of the file to create a single page with the correct data.

  4. Merge the correct page into the file.

The following example detects the corrupted file 526.file.4 in the fileset test_fileset in the file domain test_domain and fixes it:

  1. Run the verify utility on the file domain:

    # verify test_domain
    +++ Domain verification
    +++ Domain Id 32d3e638.000a46a0   
    Checking disks ...   
    Checking storage allocated on disk /dev/disk/dsk1a
    Checking mcell list ...   
    Checking that all in-use mcells are
    attached to a file's mcell chain...   
    Checking tag directories ...   
    +++ Fileset verification +++  
    +++ Fileset test_fileset +++   
    Checking frag file headers ... 
    Checking frag file type lists ...   
    Scanning directories and files ...  
    Overlapping frag data corruption detected in:
    File:  <mount point>/526.file.4
    Page: 1  
    Run verify -f on this domain to enable recovery of this data.
    Scanning tags ...   
    Searching for lost files ...
    

    The verify utility has detected a corrupted file in fileset test_fileset. The name of the file is 526.file.4 and it is located in the top-level directory of the fileset when it is mounted. The page that is corrupted is page 1.

  2. Run the verify command with the -f option set. This captures the readable page in a file with the .frag extension and the hidden page in a file with the .ext extension.

    #  verify -f test_domain
    +++ Domain verification 
    +++ Domain Id 32d3e638.000a46a0
    Checking disks ...   
    Checking storage allocated on disk /dev/disk/dsk1a   
    Checking mcell list ...   
    Checking that all in-use mcells are 
    attached to a file's mcell chain...
    Checking tag directories ...  
    +++ Fileset verification +++
    +++ Fileset test_fileset +++
    Checking frag file headers ...
    Checking frag file type lists ...
    Scanning directories and files ... 
    Overlapping frag data corruption detected in:
    File: <mount point>/526.file.4
    Page: 1
    Temporary files created representing the two versions of
    page 1 of file <mount point>/526.file.4 
    The temporary file with the .frag extension contains the 
    hidden page. Refer to the AdvFS documentation for a 
    description of how to use these temporary files to recover 
    from this overlapping frag corruption problem.
    Scanning tags ...   
    Searching for lost files ...   
    

  3. Mount the fileset containing the corrupted file. The .ext and .frag files contain information from the corrupted pages.

    # mount test_domain#test_fileset /test 
    # ls -l /test
    total 169
    drwx------ 2 root system    8192 Jan  3 13:23 .tags
    -rw-r--r-- 1 root system   24576 Jan  4 12:27 526.file.1
    -rw-r--r-- 1 root system   40960 Jan  4 12:27 526.file.2
    -rw-r--r-- 1 root system   32768 Jan  4 12:27 526.file.3
    -rw-r--r-- 1 root system   24576 Jan  9 12:27 526.file.4
    -rw------- 1 root system    8192 Jan  9 14:32 526.file.4.page_1.ext
    -rw------- 1 root system    8192 Jan  3 14:32 526.file.4.page_1.frag
    -rw-r----- 1 root operator  8192 Jan  8 13:23 quota.group
    -rw-r----- 1 root operator  8192 Jan  8 13:23 quota.user
    

    In the example above 526.file.4 is the original corrupted file. The file containing the hidden page 1 is 526.file.4.page_1.ext. The file containing the page of the same data as the original file is 526.file.4.page_1.frag.

    To fix the corrupted file, view the .ext and .frag files to decide what to do:

  4. Create a new, fixed version of the corrupted file from the corrupted file and the file that holds the correct page 1 (the_page_1 in this example).

    1. Copy page 0 from the corrupted file into a new file:

      # dd if=526.file.4 of=newfile bs=8192 count=1 > /dev/null 2>&1
      

    2. Append the desired page 1 to the new file:

      # dd if=the_page_1 of=newfile bs=8192 count=1 seek=1 > /dev/null 2>&1
      

    3. Append the remainder of the original file to the end of the new file:

      # dd if=526.file.4 of=newfile bs=8192 seek=2 skip=2 > /dev/null 2>&1
      

    4. Run the diff command on the new and the original file to confirm that only page 1 has changed and to confirm that the difference is what is desired:

      # diff 526.file.4 newfile
      

    5. Rename the new file and remove the temporary files:

      # mv newfile 526.file.4
      # rm 526.file.4.page_1.ext 526.file.4.page_1.frag the_page_1
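The page replacement in step 4 can also be wrapped in a small function. This sketch is a variation on the procedure above rather than the same commands: it copies the whole original file and then overwrites a single 8192-byte page in place with the standard dd conv=notrunc option, instead of assembling the new file from three dd invocations. The function name replace_page is hypothetical.

```shell
# Hypothetical helper: build NEW_FILE as a copy of ORIGINAL in which
# page PAGE_NUMBER (counted from 0, 8192 bytes per page) is taken
# from PAGE_FILE.
# Usage: replace_page ORIGINAL PAGE_FILE PAGE_NUMBER NEW_FILE
replace_page() {
    orig=$1
    page=$2
    n=$3
    new=$4
    bs=8192
    # Start from a full copy so every other page is preserved unchanged.
    cp "$orig" "$new" || return 1
    # Overwrite only page n; conv=notrunc keeps the pages that follow it.
    dd if="$page" of="$new" bs="$bs" count=1 seek="$n" conv=notrunc 2>/dev/null
}
```

For the example above, `replace_page 526.file.4 the_page_1 1 newfile` produces newfile, which you can then compare against the original before renaming it.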
      

7.5.6    Reusing AdvFS Volumes

All volumes (disks, disk partitions, LSM volumes, etc.) are labeled either unused or with the file system for which they were last used. You can only add a volume labeled unused to your file domain (see Section 2.2).

If the volume you wish to add is part of an existing file domain (the /etc/fdmns directory entry exists), the easiest way to return the volume label to unused status is to remove the volume with the rmvol command or to remove the file domain with the rmfdmn command (which labels all volumes that were in the file domain unused).

For example, if your volume is /dev/disk/dsk5c, your original file domain is old_domain, and the file domain you want to add the volume to is new_domain:

# rmvol /dev/disk/dsk5c old_domain
# addvol /dev/disk/dsk5c new_domain

If the volume you want to add is not part of an existing file domain but is giving you a warning message because it is labeled, reset the disk label. If you answer yes to the prompt on the addvol or mkfdmn command, the disk label will be reset. You will lose all information that was on the volume that you are adding.

7.5.7    Checking AdvFS Disk Structure

The verify command checks the AdvFS metadata structure. It is a good idea to run this command:

  1. When problems are evident (corruptions, domain panic, lost data, I/O errors).

  2. Before an update installation.

  3. If your files have not been accessed in three to six months or longer and you plan to run utilities such as balance, defragment, migrate, quotacheck, repquota, rmfset, rmvol, or vdump that access every file in a domain.

7.5.8    Salvaging File Data from a Damaged AdvFS File Domain

How you recover file data from a damaged file domain depends on the severity of the damage. Pick the simplest recovery path for the information you have.

  1. Run the verify utility to try to repair the domain (see Section 7.7.1 and verify(8)). The verify utility can only fix a limited set of problems.

  2. Recreate the domain from your most recent backup.

  3. If your backup is not recent enough, use your most recent backup with the salvage utility to obtain more current copies of files.

    The amount of data you are able to recover will depend upon the damage to your domain. You must be root user to run the salvage utility. See salvage(8) for more information.

Running the salvage utility does not guarantee that you will recover all of your domain. You may be missing files, directories, file names, or parts of files. The utility generates a log file that contains the status of files that were recovered. Use the -l option to list in the log file the status of all files that are encountered.

The salvage utility places recovered files in directories named after the filesets. There is a lost+found directory for each fileset that contains files for which no parent directory can be found. You can specify the path name of the directory that is to contain the fileset directories. If you do not specify a directory, the utility writes recovered filesets under the current working directory. You cannot mount the directories in which the files are recovered. You must move the recovered files to new filesets.

The best way to recover your domain is to use your daily backup tapes. If files have changed since the last backup, you can use the tapes along with the salvage utility as follows:

  1. Create a new file domain and filesets to hold the recovered information. Mount the filesets.

  2. Restore from your backup tape(s) to the new domain.

  3. Run the salvage utility with the -d option set to recover files that have changed since the backup. If you have no backups, you can run the salvage utility without the -d option to recover all the files in the domain.

The fastest salvage process is to recover file information to another location on disk. The following example recovers data to disk:

# /sbin/advfs/salvage -d 199812071330 corrupt3_domain
salvage: Domain to be recovered 'corrupt3_domain' 
salvage: Volume(s) to be used '/dev/disk/dsk12a'
             '/dev/disk/dsk12g' '/dev/disk/dsk12h' 
salvage: Files will be restored to '.' 
salvage: Logfile will be placed in './salvage.log' 
salvage: Starting search of all filesets: 08-Dec-1998 11:53:40 
salvage: Starting search of all volumes: 08-Dec-1998 11:55:41 
salvage: Loading file names for all filesets: 08-Dec-1998 11:56:42 
salvage: Starting recovery of all filesets: 
             08-Dec-1998 11:57:02

If not enough room is available on disk for the recovered information, you can recover data to tape and then write it back onto your original disk location. However, because this process destroys the original damaged data on disk, once you have created a new file domain there is no way to rerun the salvage command if problems arise.

  1. Run the salvage command with the -d option set and use the -F and -f options to specify tar format and tape drive. If you have no backups, you can run the salvage utility without the -d option to recover all the files in the domain.

  2. Remove the corrupt domain.

  3. Create a new file domain and filesets to hold the recovered information. Mount the filesets.

  4. Restore from your backup tape(s) to the new domain.

  5. Extract the tar archive from the tape that the salvage utility created (see tar(1)) to the new filesets.

Caution

Writing over the corrupt data on the disk is an irreversible process. If there is an error, you can no longer recover any more data from the corrupt domain. Therefore, look at the salvage log file or the files on the tar tape to make sure you have gotten all the files you need. If you have not recovered a significant number of files, you can use the salvage command with the -S option described below.

The following example recovers data to tape and restores the data to a newly created domain:

# /sbin/advfs/salvage -F tar -d 9810280930 corrupt_domain
salvage: Domain to be recovered 'corrupt_domain' 
salvage: Volume(s) to be used '/dev/disk/dsk8c' '/dev/disk/dsk5c' 
salvage: Files will archived to '/dev/tape/tape0_d1' 
              in TAR format 
salvage: Logfile will be placed in './salvage.log' 
salvage: Starting search of all filesets:  08-Dec-1998 10:28:13 
salvage: Starting search of all volumes:  08-Dec-1998 10:31:41

# rmfdmn corrupt_domain
# mkfdmn /dev/disk/dsk5c good_domain
# addvol /dev/disk/dsk8c good_domain
# mkfset good_domain fset1
# mkfset good_domain fset2
# mount good_domain#fset1 /fset1
# mount good_domain#fset2 /fset2

Then restore filesets from tape(s) created by the salvage command.

# cd /fset1 
# tar -xpf /dev/tape/tape0_d1 fset1
# cd /fset2 
# tar -xpf /dev/tape/tape0_d1 fset2

If you have run the salvage utility and have been unable to recover a large number of files, run salvage with the -S option set. This process is very slow because the utility reads every disk block at least once.

Caution

The salvage utility with the -S option set opens and reads block devices directly. This could present a security problem. It may be possible to recover data from older, deleted AdvFS file domains while attempting to recover data from current AdvFS file domains.

Note that if you have chosen recovery to tape and have already created a new file domain on the disks containing the corrupted domain, you cannot use the -S option because your original information has been lost.

Note

If you have accidentally used the mkfdmn command on a good domain, running the salvage utility with the -S option set is the only way to recover files.

For example:

# salvage -S corrupt3_domain
salvage: Domain to be recovered 'corrupt3_domain'  
salvage: Volume(s) to be used '/dev/disk/dsk2a' '/dev/disk/dsk2g'
              '/dev/disk/dsk2h'  
salvage: Files will be restored to '.'  
salvage: Logfile will be placed in './salvage.log'  
salvage: Starting sequential search of all volumes:
              08-Sep-1998 14:45:39  
salvage: Loading file names for all filesets: 
              08-Sep-1998 15:00:38
salvage: Starting recovery of all filesets: 08-Sep-1998 15:00:40

7.6    Restoring an AdvFS File System

Use the vrestore command to restore your AdvFS files that have been backed up with the vdump command.

7.6.1    Restoring the /etc/fdmns Directory

AdvFS must have a current /etc/fdmns directory in order to mount filesets (see Section 2.3.2). A missing or damaged /etc/fdmns directory prevents access to a file domain, but the data within the file domain remains intact. You can restore the /etc/fdmns directory from backup or you can recreate it.

If you have a current backup copy of the directory, it is preferable to restore the /etc/fdmns directory from backup. Any standard backup facility (vdump, tar, or cpio) can back up the /etc/fdmns directory. To restore the directory, use the recovery procedure that is compatible with your backup process.

You can reconstruct the /etc/fdmns directory manually or with the advscan command. The procedure for reconstructing the /etc/fdmns directory is similar for both single-volume and multivolume file domains. You can construct the directory for a missing file domain, missing links, or the whole directory.

If you choose to reconstruct the directory manually, you must know the name of each file domain on your system and its associated volumes.

7.6.1.1    Reconstructing the /etc/fdmns Directory Manually

If you accidentally lose all or part of your /etc/fdmns directory, and you know which file domains and links are missing, you can reconstruct it manually.

The following example reconstructs the /etc/fdmns directory and two file domains where the file domains exist and their names are known. Each contains a single volume (or special device). Note that the order of creating the links in these examples does not matter. The file domains are:

domain1 on /dev/disk/dsk1c

domain2 on /dev/disk/dsk2c

To reconstruct the two single-volume file domains, enter:

# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c
# mkdir /etc/fdmns/domain2
# cd /etc/fdmns/domain2
# ln -s /dev/disk/dsk2c

The following example reconstructs one multivolume file domain. The domain1 file domain contains the following three volumes:

/dev/disk/dsk1c

/dev/disk/dsk2c

/dev/disk/dsk3c

To reconstruct the multivolume file domain, enter the following:

# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c
# ln -s /dev/disk/dsk2c
# ln -s /dev/disk/dsk3c
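
The manual steps above follow the same pattern for every file domain, so they can be wrapped in a small shell function. The following is a minimal sketch and is not part of AdvFS: the function name rebuild_fdmns_domain and its first argument (the directory to rebuild in) are hypothetical, and the sketch assumes that every volume is a block device under /dev/disk. Passing a scratch directory instead of /etc/fdmns lets you inspect the result before touching the real directory.

```shell
#!/bin/sh
# Hypothetical helper (not an AdvFS command): recreate one file domain
# directory and one symbolic link per volume.
# Usage: rebuild_fdmns_domain FDMNS_ROOT DOMAIN VOLUME...
rebuild_fdmns_domain() {
    # Require the root directory, the domain name, and at least one volume.
    if [ "$#" -lt 3 ]; then
        echo "usage: rebuild_fdmns_domain fdmns_root domain volume..." >&2
        return 1
    fi
    fdmns_root=$1
    domain=$2
    shift 2
    mkdir -p "$fdmns_root/$domain" || return 1
    for vol in "$@"; do
        # The link name is the partition name; the target is assumed
        # to be the block device under /dev/disk.
        ln -s "/dev/disk/$vol" "$fdmns_root/$domain/$vol" || return 1
    done
}
```

For example, rebuild_fdmns_domain /etc/fdmns domain1 dsk1c dsk2c dsk3c would create the same three links as the multivolume example. As in the manual procedure, the order of the volumes does not matter.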

7.6.1.2    Reconstructing the /etc/fdmns Directory Using advscan

You can use the advscan command to determine which partitions on a disk or Logical Storage Manager (LSM) disk group are part of an AdvFS file domain. Then you can use the command to rebuild all or part of your /etc/fdmns directory.

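For each domain there are three numbers that must match for the AdvFS file system to operate properly: the number of partitions that belong to the domain, the domain volume count, and the number of links in the /etc/fdmns/<dmn> directory.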

See advscan(8) for more information.

These numbers can become inconsistent in several ways and for several reasons. In general, the advscan command treats the domain volume count as more reliable than the number of partitions or /etc/fdmns links. The following tables list anomalies, possible causes, and corrective actions that advscan can take. In the tables, the letter N represents the value that is expected to be consistent for the number of partitions, domain volume count, and number of links.

Table 7-4 shows the possible causes and corrective actions if the number of partitions and the domain volume count both equal the expected value, N, but the number of links in /etc/fdmns/<dmn> does not.

Table 7-4:  Fileset Anomalies and Corrections

Number of Links in /etc/fdmns/<dmn> | Possible Cause | Corrective Action
<N | addvol terminated early or a link in /etc/fdmns/<dmn> was manually removed. | If the domain is activated before running advscan with the -f option and the cause of the mismatch was an interrupted addvol, the situation will be corrected automatically. Otherwise, advscan will add the partition to the /etc/fdmns/<dmn> directory.
>N | rmvol terminated early or a link in /etc/fdmns/<dmn> was manually added. | If the domain is activated and the cause of the mismatch was an interrupted rmvol, the situation will be corrected automatically. Otherwise, if the cause was a manually added link in /etc/fdmns/<dmn>, systematically try removing different links in the /etc/fdmns/<dmn> directory and try activating the domain. The number of links to remove is the number of links in the /etc/fdmns/<dmn> directory minus the domain volume count displayed by advscan.

Table 7-5 shows the possible causes and corrective actions if the number of partitions and the number of links in /etc/fdmns/<dmn> both equal the expected value, N, but the domain volume count does not:

Table 7-5:  Fileset Anomalies and Corrections

Domain Volume Count | Possible Cause | Corrective Action
<N | Cause unknown. | Cannot correct; run salvage to recover as much data as possible from the domain.
>N | addvol terminated early and partition being added is missing or has been reused. | Cannot correct; run salvage to recover as much data as possible from the remaining volumes in the domain.

Table 7-6 shows the possible causes and corrective actions if the domain volume count and the number of links in /etc/fdmns/<dmn> both equal the expected value, N, but the number of partitions does not:

Table 7-6:  Fileset Anomalies and Corrections

Number of Partitions | Possible Cause | Corrective Action
<N | Partition missing. | Cannot correct; run salvage to recover as much data as possible from the remaining volumes in the domain.
>N | addvol terminated early. | None; domain will mount with N volumes; rerun addvol.
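
Taken together, Tables 7-4 through 7-6 describe a decision procedure: find the value N on which two of the three numbers agree, then look up the odd one out. The following sketch illustrates that lookup only. The function name classify_counts and its output strings are hypothetical, it does not call advscan, and the case where all three numbers differ is not covered by the tables, so the sketch simply reports it.

```shell
#!/bin/sh
# Hypothetical helper: decide which anomaly table applies.
# Usage: classify_counts PARTITIONS DOMAIN_VOLUME_COUNT LINKS
classify_counts() {
    parts=$1 vols=$2 links=$3
    if [ "$parts" -eq "$vols" ] && [ "$vols" -eq "$links" ]; then
        echo "consistent"
    elif [ "$parts" -eq "$vols" ]; then
        # N = partitions = volume count; the links differ (Table 7-4).
        if [ "$links" -lt "$parts" ]; then echo "links<N"; else echo "links>N"; fi
    elif [ "$parts" -eq "$links" ]; then
        # N = partitions = links; the volume count differs (Table 7-5).
        if [ "$vols" -lt "$parts" ]; then echo "volume-count<N"; else echo "volume-count>N"; fi
    elif [ "$vols" -eq "$links" ]; then
        # N = volume count = links; the partitions differ (Table 7-6).
        if [ "$parts" -lt "$vols" ]; then echo "partitions<N"; else echo "partitions>N"; fi
    else
        # All three values differ; the tables do not cover this case.
        echo "no-majority"
    fi
}
```

For instance, one partition found, a domain volume count of one, and no links corresponds to the first row of Table 7-4.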

To locate AdvFS partitions, enter the advscan command:

advscan [options] disks

In the following example there are no missing file domains. The advscan command scans devices dsk0 and dsk5 for AdvFS partitions and finds nothing amiss. There are two partitions found (dsk0c and dsk5c), the domain volume count reports two, and there are two links entered in the /etc/fdmns directory.

# advscan dsk0 dsk5
Scanning disks  dsk0 dsk5
Found domains:
usr_domain
                Domain Id       2e09be37.0002eb40
                Created         Thu Jun 24 09:54:15 1999
                Domain volumes          2
                /etc/fdmns links        2
                Actual partitions found:
                                        dsk0c
                                        dsk5c

In the following example, directories that define the file domains that include dsk6 were removed from the /etc/fdmns directory. This means that the number of /etc/fdmns links, the number of partitions, and the domain volume counts are no longer equal.

The advscan command scans device dsk6 and recreates the missing file domains as follows:

  1. A partition is found containing an AdvFS file domain. The domain volume count reports one, but there is no file domain directory in the /etc/fdmns directory that contains this partition.

  2. Another partition is found containing a different AdvFS file domain. The file domain volume count is also one. There is no file domain directory that contains this partition.

  3. No other AdvFS partitions are found. The domain volume counts and the number of partitions found match for the two discovered domains.

  4. The advscan command creates directories for the two file domains in the /etc/fdmns directory.

  5. The advscan command creates symbolic links for the devices in the /etc/fdmns file domain directories.

The command and output are as follows:

# advscan -r dsk6
Scanning disks  dsk6
Found domains:
*unknown*
                Domain Id       2f2421ba.0008c1c0
                Created         Wed Jan 20 13:38:02 1999
 
                Domain volumes          1
                /etc/fdmns links        0
 
                Actual partitions found:
                                        dsk6a*
*unknown*
                Domain Id       2f535f8c.000b6860
                Created         Thu Feb 25 09:38:20 1999
 
                Domain volumes          1
                /etc/fdmns links        0
 
                Actual partitions found:
                                        dsk6b*
Creating /etc/fdmns/domain_dsk6a/
        linking dsk6a
 
Creating /etc/fdmns/domain_dsk6b/
        linking dsk6b
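
When many disks are scanned, the domain volume count and the /etc/fdmns link count can be compared mechanically instead of by eye. The following is a sketch that assumes the advscan output layout shown in the examples above (lines beginning with "Domain Id", "Domain volumes", and "/etc/fdmns links", in that order for each domain); the function name advscan_mismatches is hypothetical. It reads advscan output on standard input and prints one line for each domain whose two counts differ.

```shell
#!/bin/sh
# Hypothetical filter: report domains whose volume count and
# /etc/fdmns link count differ, based on the output layout shown above.
# Usage: advscan dsk0 dsk5 | advscan_mismatches
advscan_mismatches() {
    awk '
        # Remember the most recent domain ID and volume count.
        $1 == "Domain" && $2 == "Id"      { id = $3 }
        $1 == "Domain" && $2 == "volumes" { vols = $3 }
        # The link count closes the group; compare it numerically.
        $1 == "/etc/fdmns" && $2 == "links" {
            if ($3 + 0 != vols + 0)
                printf "%s volumes=%s links=%s\n", id, vols, $3
        }
    '
}
```

Run against the first example the filter prints nothing, because both counts are two. The filter does not compare the number of partitions found, which would still need to be checked against the listing.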

7.6.2    Recovering from Failure of the root Domain

A catastrophic failure of the disk containing your AdvFS root file domain requires that you recreate your root file domain and then restore the root file domain contents from your backup media.

The following example assumes that you are booting from the CD-ROM device DKA500, which is the installation Stand Alone System (SAS). The tape drive is /dev/tape/tape0. The root is being restored to device /dev/disk/dsk1, which is an rz28 disk.

  1. Boot your system as stand-alone:

    >>> b DKA500

  2. Pick option:

    3) UNIX Shell
    

    You will now be at the default root user prompt (#) in single-user mode.

  3. Examine the devices available:

    # ls /dev/disk
    # ls /dev/tape/tape0
    

  4. Make the disk label:

    # disklabel -rw -t advfs /dev/rdisk/dsk1 rz28

  5. Create the root file domain and fileset. Note that if you have changed the root file domain name or fileset name, use the new name:

    # mkfdmn -r /dev/disk/dsk1a root_domain
    # mkfset root_domain root
    

  6. Mount the newly created root domain and restore from tape using a restore utility compatible with your dump utility:

    # mount root_domain#root /mnt
    # cd /mnt
    # vrestore -x -D .
    

You can now boot your restored root domain.

7.6.3    Restoring a Multivolume usr Domain

To restore a multivolume /usr file system, the usr_domain file domain must first be reconstructed with all of its volumes before you restore the files. However, creating a multivolume file domain requires the addvol utility, which resides in the /usr/sbin directory, and the addvol command will not run unless the License Management Facility (LMF) database is available. See lmf(8) for information.

On some systems the /var directory, where the LMF database resides, and the /usr directory are both located in the usr fileset. In that case, the directory containing the license database must be recovered from the usr fileset before the addvol command can be run. On other systems the /var directory is in a separate fileset. In that case, recover the license database from the var fileset and the addvol command from the usr fileset, and then use addvol to add the volumes.

The following example restores a multivolume file domain where the /var directory and the /usr directory are both in the usr fileset in the usr_domain file domain consisting of the dsk1g, dsk2c, and dsk3c volumes. The procedure assumes that the root file system has already been restored.

  1. Mount the root fileset as read/write:

    # mount -u /
    

  2. Remove the links for the old usr_domain and create a new usr_domain using the initial volume:

    # rm -rf /etc/fdmns/usr_domain
    # mkfdmn /dev/disk/dsk1g usr_domain
    

  3. Create and mount the usr fileset, which in this configuration holds both /usr and /var:

    # mkfset usr_domain usr
    # mount -t advfs usr_domain#usr /usr
    

  4. Create a soft link in /usr because that is where the lmf command looks for its database:

    # ln -s /var /usr/var
    

  5. Insert the /usr backup tape and restore the addvol command, the lmf command, and the license database:

    # cd /usr
    # vrestore -vi
    (/) add sbin/addvol 
    (/) add sbin/lmf
    (/) add var/adm/lmf
    (/) extract
    (/) quit
    

  6. Reset the license database:

    # /usr/sbin/lmf reset

  7. Add the extra volumes to usr_domain:

    # /usr/sbin/addvol /dev/disk/dsk2c usr_domain
    # /usr/sbin/addvol /dev/disk/dsk3c usr_domain
    

  8. Do a full restore of the /usr backup:

    # cd /usr
    # vrestore -xv
    

The following example restores a multivolume file domain where the /usr and /var directories are in separate filesets in the same multivolume domain, usr_domain, consisting of dsk1g, dsk2c, and dsk3c. This means that you must mount both the /var and the /usr backup tapes. The procedure assumes that the root file system has already been restored.

  1. Mount the root fileset as read/write:

    # mount -u /

  2. Remove the links for the old usr_domain and create a new usr_domain using the initial volume:

    # rm -rf /etc/fdmns/usr_domain
    # mkfdmn /dev/disk/dsk1g usr_domain
    

  3. Create and mount the /usr and /var filesets:

    # mkfset usr_domain usr
    # mkfset usr_domain var
    # mount -t advfs usr_domain#usr /usr
    # mount -t advfs usr_domain#var /var
    

  4. Insert the /var backup tape and restore from it:

    # cd /var
    # vrestore -vi
    (/) add adm/lmf
    (/) extract
    (/) quit
    

  5. Insert the /usr backup tape and restore the addvol and lmf commands:

    # cd /usr
    # vrestore -vi
    (/) add sbin/addvol
    (/) add sbin/lmf
    (/) extract
    (/) quit
    

  6. Reset the license database:

    # /usr/sbin/lmf reset
    

  7. Add the extra volumes to usr_domain:

    # /usr/sbin/addvol /dev/disk/dsk2c usr_domain
    # /usr/sbin/addvol /dev/disk/dsk3c usr_domain
    

  8. Do a full restore of the /usr backup:

    # cd /usr
    # vrestore -xv
    

  9. Insert the /var backup tape and do a full restore of the /var backup:

    # cd /var
    # vrestore -xv
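
Both procedures depend on the addvol command, the lmf command, and the license database being in place before the extra volumes are added. The following sketch shows one way to confirm that after the selective restore. The function name missing_prereqs is hypothetical, and the paths passed to it are taken from the examples above; adjust them if your license database is elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: list any of the given paths that do not exist.
# Returns nonzero if at least one path is missing.
# Usage: missing_prereqs /usr/sbin/addvol /usr/sbin/lmf /var/adm/lmf
missing_prereqs() {
    rc=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            rc=1
        fi
    done
    return "$rc"
}
```

If the function prints nothing and returns zero, the files restored in the selective restore steps are present and you can continue with the lmf reset and addvol steps.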
    

7.7    Recovering from a System Crash

As each domain is mounted after a crash, AdvFS automatically runs recovery code that checks the transaction log to ensure that any file system operations that were occurring when the system crashed are either completed or backed out. This ensures that AdvFS metadata is in a consistent state after a crash.

7.7.1    Verifying File System Consistency

If you want to be sure that the metadata is consistent, you can run the verify command to verify the file system structure. This utility checks disk structures such as the bitfile metadata table (BMT), the storage bitmaps, the tag directory, and the frag file for each fileset. It verifies that the directory structure is correct and that all directory entries reference a valid file and that all files have a directory entry. See verify(8) for a full description of command capabilities and Section 7.5.7 for suggestions on when to run the command.

The verify command mounts filesets in special directories as it proceeds. If the command is unable to mount a fileset due to the failure of a file domain, as a last resort run the command with the -F option. This will cause the fileset to be mounted using the -d option of the mount command, which mounts the fileset without running recovery on the file domain. This will cause your file domain to be inconsistent because the disk structure will not have been checked and made consistent. Under some circumstances the verify command may fail to unmount the filesets. If this occurs, you must unmount the affected filesets manually.

The following example verifies the domainx file domain, which contains the filesets setx and sety:

# verify domainx
+++Domain verification+++
Domain Id 2f03b70a.000f1db0
Checking disks ...
Checking storage allocated on disk /dev/disk/dsk10g
Checking storage allocated on disk /dev/disk/dsk10a
Checking mcell list ...
Checking mcell position field ...
Checking tag directories ...
 
+++ Fileset verification +++
+++ Fileset setx +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
     1100
Scanning tags ...
     1100
Searching for lost files ...
     1100
 
+++ Fileset sety +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
     5100
Scanning tags ...
     5100
Searching for lost files ...
     5100

In this example, the verify command finds no problems with the file domain. For an example of output where the verify command has detected a corrupted file, see Section 7.5.5.

7.7.2    Displaying Disk Structures

Table 7-7 lists the disk structure dumping utilities that enable you to examine a file domain with suspected metadata corruption. The commands display raw data from the disk in a number of formats.

Table 7-7:  Disk Structure Dumping Utilities

Command Description
nvbmtpg Displays a formatted page of the bitfile metadata table (BMT)
nvfragpg Displays file fragment information
nvlogpg Displays a formatted page of the log
nvtagpg Displays a formatted page of the tag directory
savemeta Saves on-disk metadata
shblock Displays unformatted disk blocks
vfilepg Displays a page of an AdvFS file
vsbmpg Displays a page of the storage bitmap

See the command reference pages for more information.

7.7.3    Moving an AdvFS Disk to Another Machine

If a machine has failed, it is possible to move disks containing AdvFS file domains to another computer running AdvFS. Connect the disk(s) to the new machine and modify the /etc/fdmns directory so the new system will recognize the transferred volume(s). You must be root user to complete this process.

You cannot move file domains that have a DVN of 4 to systems running a Version 4 operating system. Doing so will generate an error message (see Section 7.2). You can move file domains with a DVN of 3 to a machine running Version 5. The newer operating system will recognize the file domains created earlier.

Caution

Do not use either the addvol command or the mkfdmn command to add the volumes to the new machine. Doing so will delete all data on the disk you are moving. See Section 7.5.8 if you have already done so.

If you do not know what partitions your domains were on, you can add the disks on the new machine and run the advscan command, which may be able to recreate this information. You can also look at the disk label on the disk to see which partitions in the past have been made into AdvFS partitions. This will not tell you which partitions belong to which file domains.

For example, if the motherboard of your machine fails, you need to move the disks to another system. You may need to reassign the disk SCSI IDs to avoid conflicts. (See your disk manufacturer instructions for more information.) For this example, assume the IDs are assigned to disks 6 and 8. Assume also that the system has a file domain, testing_domain, on two disks, dsk3 and dsk4. This domain contains two filesets: sample1_fset and sample2_fset. These filesets are mounted on /data/sample1 and /data/sample2.

Assume you know that the file domain that you are moving had partitions dsk3c, dsk4a, dsk4b, and dsk4g. The moving process would take the following steps:

  1. Shut down the working machine to which you are moving the disks.

  2. Connect the disks from the bad machine to the good one.

  3. Reboot. You do not need to reboot to SAS; multiuser mode works because you can complete the following steps while the system is running.

  4. Determine the device nodes created for the new disks:

    # /sbin/hwmgr -show scsi -full
    

    The output is a detailed list of information about all the disks on your machine. The DEVICE FILE column shows the name that the system uses to refer to each disk. Determine the listing for the disk you just added, for example, dsk6. Use this name to set up the symbolic links in step 5.

  5. Modify your /etc/fdmns directory to include the information from the transferred domains:

    # mkdir -p /etc/fdmns/testing_domain
    # cd /etc/fdmns/testing_domain
    # ln -s /dev/disk/dsk6c dsk6c
    # ln -s /dev/disk/dsk8a dsk8a
    # ln -s /dev/disk/dsk8b dsk8b
    # ln -s /dev/disk/dsk8g dsk8g
    # mkdir /data/sample1
    # mkdir /data/sample2
    

  6. Edit the /etc/fstab file to add the fileset mount-point information:

    testing_domain#sample1_fset /data/sample1 advfs rw 1 0
    testing_domain#sample2_fset /data/sample2 advfs rw 1 0
    

  7. Mount the filesets:

    # mount /data/sample1
    # mount /data/sample2
    

    Note that if you run the mkfdmn command or the addvol command on partition dsk6c, dsk8a, dsk8b, or dsk8g, or an overlapping partition, you will destroy the data on the disk. See Section 7.5.8 if you have accidentally done so.
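
In the example, the partition letters stay the same and only the disk names change (dsk3 becomes dsk6 and dsk4 becomes dsk8). The following sketch translates an old partition name into its new name from such a mapping, which can help when a domain has many volumes. The function name remap_partition and the OLD=NEW argument form are hypothetical, and the sketch assumes partition letters a through h.

```shell
#!/bin/sh
# Hypothetical helper: translate an old partition name to its new name.
# Usage: remap_partition OLD_PARTITION OLD_DISK=NEW_DISK...
remap_partition() {
    old=$1
    shift
    for pair in "$@"; do
        from=${pair%%=*}    # disk name on the failed machine
        to=${pair#*=}       # disk name on the new machine
        case $old in
            "$from"[a-h])
                # Keep the partition letter, replace the disk name.
                suffix=${old#"$from"}
                echo "$to$suffix"
                return 0
                ;;
        esac
    done
    echo "no mapping for $old" >&2
    return 1
}
```

For the example above, remap_partition dsk4g dsk3=dsk6 dsk4=dsk8 prints dsk8g, which is the link name used in step 5.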

7.7.4    Changing Operating Systems

If a system crashes, AdvFS will perform recovery at reboot. Filesets that were mounted at the time of the crash will be recovered when they are remounted. This recovery keeps the AdvFS metadata consistent and makes use of the AdvFS transaction log.

Since different versions of the operating system use different transaction log structures, it is important that you recover your filesets on the version of the operating system that was running at the time of the crash. If you do not, you risk corrupting the domain metadata and/or panicking the domain.

If the system crash has occurred because you have set the AdvfsDomainPanicLevel attribute (see Section 5.2.6) to promote a domain panic to a system panic, it is also a good idea to run the verify command on the panicked file domain to ensure that it is not damaged. If your filesets were unmounted at the time of the crash, or if you have remounted them successfully and have run the verify command (if needed), you can mount the filesets on a different version of the operating system, if appropriate.