5    Troubleshooting

This chapter examines problems that, while universal for file systems, may have unique solutions for AdvFS. See System Configuration and Tuning for related information about diagnosing performance problems.

5.1    Disk File Structure Incompatibility

If you install your Version 5 operating system as an update to your Version 4 system (not a full installation), your root, /usr, and /var file systems will retain a DVN of 3 (see Section 1.4.3.1).

By default, domains created on Version 5.0 and later have a new format that is incompatible with earlier versions (see Section 1.4.3). The newer operating system recognizes the older disk structure, but the older does not recognize the newer. To access a fileset with the new format (a DVN of 4) from an older operating system, NFS mount the fileset from a Version 5 system or upgrade your operating system to Version 5. There is the potential for problems when files created on one operating system are moved to another.

If you try to mount a fileset belonging to a domain with a DVN of 4 when you are running a version of the operating system earlier than Version 5.0, you will get an error message.

There is no tool that upgrades all domains with a DVN of 3 to a DVN of 4 at once. You must upgrade each domain individually (see Section 1.4.3.2).

5.1.1    Utility Incompatibility

Because of the new on-disk file formats, some AdvFS-specific utilities from earlier releases have the potential to corrupt domains created using the new on-disk formats. Statically linked AdvFS-specific utilities from earlier operating system versions, usually versions prior to Version 4.0, will not run on Version 5.0 and later. In addition, the following dynamically linked utilities from earlier releases of Tru64 UNIX do not run on Version 5.0 and later:

5.1.2    Avoiding Metadata Incompatibility

If a system crashes or goes down unexpectedly (for example, due to loss of power), AdvFS performs recovery after reboot, when the filesets that were mounted at the time of the crash are remounted. This recovery uses the AdvFS transaction log file to keep the AdvFS metadata consistent.

Different versions of the operating system use different AdvFS log record types. Therefore, it is important that AdvFS recovery be done on the same version of the operating system as was running at the time of the crash. For example, if your system was running Version 5.1 when it crashed, do not reboot using Version 3.2G because the log records may be formatted differently from those saved by the Version 5.1 system.

To reboot without error using a different version of the operating system, cleanly unmount all filesets before rebooting. Note that if the system failed due to a system panic or an AdvFS domain panic, it is best to reboot using the original version of the operating system and then run the verify command to ensure that the domain is not corrupted. If the domain is not corrupted, it is then safe to reboot using a different version of the operating system. If the verify command indicates that the domain has been corrupted, see Section 5.4.6.

5.2    Memory Mapping, Direct I/O and Data Logging Incompatibility

Memory mapping, atomic-write data logging and direct I/O are mutually exclusive. If a file is open in one of these modes, attempting to open the same file in one of the conflicting modes will fail. For more information see Section 4.1.4 and Section 4.1.5 and the mmap(2) reference page.

5.3    Handling Poor Performance

The performance of a disk depends on the I/O demands placed on it. If your domain is structured so that heavy access is focused on one volume, system performance is likely to degrade. Once you have determined the load balance, there are a number of ways to equalize the activity and increase throughput. See System Configuration and Tuning, the command reference pages, and Chapter 4 for more complete information.

To discover the causes of poor performance, first check system activity (see Section 4.2). There are a number of ways to improve performance:

If you have AdvFS Utilities, you can also:

5.4    Handling Disk Problems

Back up your data regularly and frequently and watch for signs of impending disk failure. Removing files from a problem disk before it fails can prevent a lot of trouble. See the Event Management information in System Administration for more information.

5.4.1    Checking Free Space and Disk Usage

You can look at the way space is allocated on a disk by file, fileset, or domain. The AdvFS GUI (see Chapter 6) displays a hierarchical view of disk objects and the space they use. Table 5-1 lists the commands that examine disk space usage.

Table 5-1:  Disk Space Usage Information Commands

Command Description
du Displays information about block allocation for files; use the -a option to display information for individual files.
df Displays disk space usage by fileset; available space for a fileset is limited by the fileset quota if it is set.
showfdmn Displays the attributes and block usage for each volume in an active domain; for multivolume domains, additional volume information is displayed.
showfile Displays block usage and volume information for a file or for the contents of a directory.
showfsets Displays information about the filesets in a domain; use to display fileset quota limits.
vdf Displays used and available disk space for a fileset or a domain.

See the reference pages for the commands for more complete information.
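As a quick, portable illustration of the first two commands in Table 5-1, the following sketch allocates a small file in a scratch directory and examines it with du and df. All paths here are generated on the fly and are hypothetical, not from this manual; the AdvFS-specific commands in the table report the same kind of information at the fileset and domain level.

```shell
# Allocate a 16 KB file in a scratch directory, then examine its usage.
# The scratch paths are hypothetical stand-ins for real filesets.
dir=$(mktemp -d)
dd if=/dev/urandom of="$dir/sample" bs=1024 count=16 2>/dev/null
sync                                   # make sure blocks are allocated
kb=$(du -k "$dir/sample" | cut -f1)    # kilobytes allocated to the file
echo "allocated: ${kb} KB"
df -k "$dir" >/dev/null                # usage for the containing file system
rm -r "$dir"
```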

Under certain conditions, the disk usage information for AdvFS may become corrupt. To correct this, change the entry in the /etc/fstab file to enable the quotacheck command to run. The quotacheck command only checks filesets that have the userquota and groupquota options specified. For example, for the fileset usr_domain#usr:

usr_domain#usr /usr advfs rw,userquota,groupquota 0 2

Then run the quotacheck command for the fileset:

# quotacheck usr_domain#usr

This should correct the disk usage information.

5.4.2    Reusing AdvFS Volumes

All volumes (disks, disk partitions, LSM volumes, etc.) are labeled either unused or with the file system for which they were last used. You can only add a volume labeled unused to your domain (see Section 1.3).

If the volume you wish to add is part of an existing domain (the /etc/fdmns directory entry exists), the easiest way to return the volume label to unused status is to remove the volume with the rmvol command or to remove the domain with the rmfdmn command (which labels all volumes that were in the domain unused).

For example, if your volume is /dev/disk/dsk5c, your original domain is old_domain, and the domain you want to add the volume to is new_domain, mount all the filesets in old_domain then enter:

# rmvol /dev/disk/dsk5c old_domain
# addvol /dev/disk/dsk5c new_domain

If the volume you want to add is not part of an existing domain but is giving you a warning message because it is labeled, reset the disk label. If you answer yes to the prompt on the addvol or mkfdmn command, the disk label will be reset. You will lose all information that was on the volume that you are adding.

5.4.3    Dumping to Block 0

To dump to a partition that starts at block 0 of a disk, you must first clear the disk label. If you do not, the saveset written by the vdump command may appear valid, but when the vrestore command attempts to interpret the disk label as part of the saveset, it will return an error (see Section 3.1.5).

5.4.4    Disk Space Usage Limits

If your system has been running without any limits on resource usage, you can add quotas to limit the amount of disk space your users can access. AdvFS quotas provide a layer of control beyond that available with UFS.

User and group quotas limit the amount of space a user or group can allocate in a fileset. Fileset quotas prevent a single fileset from consuming all of the available space in a domain.

You can set two types of quota limits: hard limits that cannot be exceeded and soft limits that can be exceeded for a period of time called the grace period. You can turn quota enforcement on and off. See Chapter 2 for complete information.
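The interaction of the two limits can be sketched as a simple check; all variable names and numeric values below are hypothetical and are only meant to illustrate the hard/soft/grace-period behavior described above:

```shell
# Illustrative hard/soft quota check; values are hypothetical,
# not AdvFS internals.
hard_kb=1000       # hard limit: can never be exceeded
soft_kb=800        # soft limit: may be exceeded during the grace period
used_kb=850        # current usage
grace_expired=0    # becomes 1 once the grace period has run out

if [ "$used_kb" -gt "$hard_kb" ]; then
    verdict="write denied: hard limit reached"
elif [ "$used_kb" -gt "$soft_kb" ] && [ "$grace_expired" -eq 1 ]; then
    verdict="write denied: grace period expired"
elif [ "$used_kb" -gt "$soft_kb" ]; then
    verdict="write allowed: grace period running"
else
    verdict="write allowed"
fi
echo "$verdict"
```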

If you are working in an editor and realize that the information you need to save will put you over your quota limit, do not abort the editor or write the file because data may be lost. Instead, remove files to make room for the edited file prior to writing it. You can also write the file to another fileset, such as tmp, remove files from the fileset whose quota you exceeded, and then move the file back to that fileset.
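The write-elsewhere workaround can be simulated with generic commands; the directories below are scratch stand-ins for real filesets, and all file names are hypothetical:

```shell
# Simulate the over-quota workaround with scratch directories
# standing in for filesets; all paths and names are hypothetical.
full_fs=$(mktemp -d)   # stands in for the fileset whose quota is exceeded
tmp_fs=$(mktemp -d)    # stands in for a fileset with free space, e.g. /tmp
echo "old data" > "$full_fs/report.bak"

echo "edited data" > "$tmp_fs/report.txt"        # 1. write to the other fileset
rm "$full_fs/report.bak"                         # 2. free space in the full fileset
mv "$tmp_fs/report.txt" "$full_fs/report.txt"    # 3. move the edited file back
cat "$full_fs/report.txt"
```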

AdvFS may impose quota limits in the rare case that you are less than 8 kilobytes below the user, group, or fileset quota and are attempting to use some or all of the space you have left. This is because AdvFS allocates storage in units of 8 kilobytes: if adding 8 kilobytes to a file would exceed the quota limit, that file cannot be extended.
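The allocation-unit effect reduces to simple arithmetic; the quota and usage figures below are hypothetical:

```shell
# AdvFS allocates storage in 8 KB units, so an extend fails when usage
# plus one allocation unit would pass the quota (values hypothetical).
quota_kb=1024
used_kb=1020        # only 4 KB below the quota
alloc_unit_kb=8

if [ $((used_kb + alloc_unit_kb)) -gt "$quota_kb" ]; then
    verdict="extend refused"
else
    verdict="extend allowed"
fi
echo "$verdict"
```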

5.4.5    Verifying File System Consistency

To ensure that metadata is consistent, run the verify command to verify the file system structure. This utility checks disk structures such as the bitfile metadata table (BMT), the storage bitmaps, the tag directory, and the frag file for each fileset. It verifies that the directory structure is correct, that all directory entries reference a valid file, and that all files have a directory entry. You must be the root user to run this command.

It is a good idea to run the verify command:

Use the SysMan "Repair an AdvFS Domain" task or, from the command line, enter:

verify domain_name

The verify command mounts filesets in special directories as it proceeds. If the command is unable to mount a fileset due to the failure of a domain, as a last resort run the command with the -F option. This option mounts the fileset using the -d option of the mount command, which means that AdvFS initializes the transaction log for the domain without recovery. As no domain recovery will occur for previously incomplete operations, this could cause data corruption.

Under some circumstances the verify command may fail to unmount the filesets. If this occurs, you must unmount the affected filesets manually.

On machines with many millions of files, sufficient swap space must be allocated for the verify utility to run to completion. If the amount of memory required by verify exceeds the max_per_proc_data_size attribute of the proc kernel subsystem, the utility will not complete. To overcome this problem, allocate up to 10% of the domain size in swap for running the verify command.
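The 10% sizing guideline translates to simple arithmetic; the 500 GB domain size below is a hypothetical example:

```shell
# Allow up to 10% of the domain size in swap for verify
# (the 500 GB domain size is a hypothetical example).
domain_gb=500
swap_gb=$((domain_gb / 10))
echo "allow up to ${swap_gb} GB of swap for verify"
```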

The following example verifies the domainx domain, which contains the filesets setx and sety:

# verify domainx
+++Domain verification+++
Domain Id 2f03b70a.000f1db0
Checking disks ...
Checking storage allocated on disk /dev/disk/dsk10g
Checking storage allocated on disk /dev/disk/dsk10a
Checking mcell list ...
Checking mcell position field ...
Checking tag directories ...
 
+++ Fileset verification +++
+++ Fileset setx +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
     1100
Scanning tags ...
     1100
Searching for lost files ...
     1100
 
+++ Fileset sety +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
     5100
Scanning tags ...
     5100
Searching for lost files ...
     5100

In this example, the verify command finds no problems with the domain.

5.4.6    Salvaging File Data from a Damaged Domain

How you recover file data from a damaged domain depends on the severity of the damage. Pick the simplest recovery path for the information you have.

  1. Run the verify utility to try to repair the domain (see Section 5.4.5 and verify(8)). The verify utility can only fix a limited set of problems.

  2. Recreate the domain from your most recent backup.

  3. If your backup is not recent enough, use your most recent backup with the salvage utility to obtain more current copies of files.

The amount of data you are able to recover will depend upon the damage to your domain. You must be root user to run the salvage utility. See salvage(8) for more information.

Use the SysMan "Recover Files from an AdvFS Domain" task or, from the command line, enter:

salvage domain_name

Running the salvage utility does not guarantee that you will recover all of your domain. You may be missing files, directories, file names, or parts of files. The utility generates a log file that contains the status of files that were recovered. Use the -l option to list in the log file the status of all files that are encountered.

The salvage utility places recovered files in directories named after the filesets. There is a lost+found directory for each fileset that contains files for which no parent directory can be found. You can specify the path name of the directory that is to contain the fileset directories. If you do not specify a directory, the utility writes recovered filesets under the current working directory. You cannot mount the directories in which the files are recovered. You must move the recovered files to new filesets.

The best way to recover your domain is to use your daily backup tapes. If files have changed since the last backup, you can use the tapes along with the salvage utility as follows:

  1. Create a new domain and filesets to hold the recovered information. Mount the filesets.

  2. Restore from your backup tape(s) to the new domain.

  3. Run the salvage utility with the -d option set to recover files that have changed since the backup. If you have no backups, you can run the salvage utility without the -d option to recover all the files in the domain.

The fastest salvage process is to recover file information to another location on disk. The following example recovers data to disk:

# /sbin/advfs/salvage -d 199812071330 corrupt3_domain
salvage: Domain to be recovered 'corrupt3_domain' 
salvage: Volume(s) to be used '/dev/disk/dsk12a'
             '/dev/disk/dsk12g' '/dev/disk/dsk12h' 
salvage: Files will be restored to '.' 
salvage: Logfile will be placed in './salvage.log' 
salvage: Starting search of all filesets: 
             09-Mar-2000 11:53:40 
salvage: Starting search of all volumes: 
             09-Mar-2000 11:55:41 
salvage: Loading file names for all filesets:
             09-Mar-2000 11:56:42 
salvage: Starting recovery of all filesets: 
             09-Mar-2000 11:57:02

If not enough room is available on disk for the recovered information, you can recover data to tape and then write it back on to your original disk location. However, since this process destroys the original damaged data on disk, once you have created a new domain, there is no way to rerun the salvage command if problems arise.

  1. Run the salvage command with the -d option set and use the -F and -f options to specify tar format and tape drive. If you have no backups, you can run the salvage utility without the -d option to recover all the files in the domain.

  2. Remove the corrupt domain.

  3. Create a new domain and filesets to hold the recovered information. Mount the filesets.

  4. Restore from your backup tape(s) to the new domain.

  5. Extract the tar archive from the tape that the salvage utility created (see tar(1)) to the new filesets.

Caution

Writing over the corrupt data on the disk is an irreversible process. If there is an error, you can no longer recover any more data from the corrupt domain. Therefore, look at the salvage log file or the files on the tar tape to make sure you have gotten all the files you need. If you have not recovered a significant number of files, you can use the salvage command with the -S option described below.

The following example recovers data to tape and restores the data to a newly created domain:

# /sbin/advfs/salvage -F tar -d 9810280930 corrupt_domain
salvage: Domain to be recovered 'corrupt_domain' 
salvage: Volume(s) to be used '/dev/disk/dsk8c'
              '/dev/disk/dsk5c' 
salvage: Files will archived to '/dev/tape/tape0_d1' 
              in TAR format 
salvage: Logfile will be placed in './salvage.log' 
salvage: Starting search of all filesets:  
              09-Mar-2000 10:28:13 
salvage: Starting search of all volumes:  
              09-Mar-2000 10:31:41
# rmfdmn corrupt_domain
# mkfdmn /dev/disk/dsk5c good_domain
# addvol /dev/disk/dsk8c good_domain
# mkfset good_domain fset1
# mkfset good_domain fset2
# mount good_domain#fset1 /fset1
# mount good_domain#fset2 /fset2

Then restore filesets from tape(s) created by the salvage command.

# cd /fset1 
# tar -xpf /dev/tape/tape0_d1 fset1
# cd /fset2 
# tar -xpf /dev/tape/tape0_d1 fset2

If you have run the salvage utility and have been unable to recover a large number of files, run salvage with the -S option set. This process is very slow because the utility reads every disk block at least once.

Caution

The salvage utility with the -S option set opens and reads block devices directly. This could present a security problem. It may be possible to recover data from older, deleted AdvFS domains while attempting to recover data from current AdvFS domains.

Note that if you have chosen recovery to tape and have already created a new domain on the disks containing the corrupted domain, you cannot use the -S option because your original information has been lost.

Note

If you have accidentally used the mkfdmn command on a good domain, running the salvage utility with the -S option set is the only way to recover files.

For example:

# salvage -S corrupt3_domain
salvage: Domain to be recovered 'corrupt3_domain'  
salvage: Volume(s) to be used '/dev/disk/dsk2a'
              '/dev/disk/dsk2g' '/dev/disk/dsk2h'  
salvage: Files will be restored to '.'  
salvage: Logfile will be placed in './salvage.log'  
salvage: Starting sequential search of all volumes:
              08-May-2000 14:45:39  
salvage: Loading file names for all filesets:
              08-May-2000 15:00:38
salvage: Starting recovery of all filesets:
              08-May-2000 15:00:40

5.4.7    "Can't Clear a Bit Twice" Error Message

If you receive a "Cannot clear a bit twice" error message, your domain is damaged. To repair it:

  1. Set the AdvfsFixUpSBM kernel variable to allow access to the damaged domain. This flag is off by default.

  2. Mount and back up the filesets in the damaged domain.

  3. Turn AdvfsFixUpSBM off.

  4. Unmount the filesets in the domain and run the verify utility with the -f option. If there are errors, continue with steps 5 and 6.

  5. Recreate the domain and filesets.

  6. Restore from the backup.

To turn AdvfsFixUpSBM on:

# dbx -k /vmunix /dev/mem
dbx> assign AdvfsFixUpSBM = 1
dbx> quit
 
 

To turn AdvfsFixUpSBM off:

# dbx -k /vmunix /dev/mem
dbx> assign AdvfsFixUpSBM = 0
dbx>  quit
 
 

Note

The AdvfsFixUpSBM variable is global. Turn it off after the backup so that the error is again reported for all domains.

5.4.8    Recovering from a Domain Panic

When a metadata write error occurs, or if corruption is detected in a single AdvFS domain, the system initiates a domain panic (rather than a system panic) on the domain. This isolates the failed domain and allows a system to continue to serve all other domains. After a domain panic AdvFS no longer issues I/O requests to the disk controller for the affected domain. Although the domain cannot be accessed, the filesets in the domain can be unmounted.

When a domain panic occurs, an EVM event is logged (see EVM(5)) and the following message is printed to the system log and the console:

AdvFS Domain Panic; Domain name Id domain_Id

For example:

AdvFS Domain Panic; Domain staffb_domain Id 2dad7c28.0000dfbb
An AdvFS domain panic has occurred due to either a
 metadata write error or an internal inconsistency.
This domain is being rendered inaccessible.

By default, a domain panic on an active domain will cause a live dump to be created and placed in the /var/adm/crash directory. Please file a problem report with your software support organization and include the dump file and a copy of the running kernel.

To recover from a domain panic, perform the following steps:

  1. Run the mount command with the -t advfs option to identify all mounted filesets in the affected domain.

  2. Unmount all these filesets.

  3. Examine the /etc/fdmns directory to obtain a list of the AdvFS volumes in the domain that panicked.

  4. Run the savemeta command (see savemeta(8)) to collect the metadata files for each volume in the domain for Compaq support personnel. The saved files are written to the directory you specify and contain information that technical support needs.

  5. If the problem is a hardware problem, fix it before continuing.

  6. Run the verify utility on the domain (see Section 5.4.5).

  7. If the failure prevents complete recovery, recreate the domain with the mkfdmn command and restore the domain's data from backup. If this does not provide enough information, you may need to run the salvage utility (see Section 5.4.6).

For example:

# mount -t advfs
staffb_dmn#staff3_fs on /usr/staff3 type advfs (rw)
staffb_dmn#staff4_fs on /usr/staff4 type advfs (rw) 
# umount /usr/staff3
# umount /usr/staff4
# ls -l /etc/fdmns/staffb_dmn
lrwxr-xr-x 1 root system 10 Aug 25 16:46
    dsk35c->/dev/disk/dsk3c
lrwxr-xr-x 1 root system 10 Aug 25 16:50
    dsk36c->/dev/disk/dsk6c
lrwxr-xr-x 1 root system 10 Aug 25 17:00
    dsk37c->/dev/disk/dsk1c
# /sbin/advfs/savemeta staffb_dmn /tmp/saved_dmn
# verify staffb_dmn

You do not need to reboot after a domain panic.

If you have recurring domain panics, it may be helpful to adjust the AdvfsDomainPanicLevel attribute (see Section 4.3.7) in order to facilitate debugging.

5.4.9    Recovering from Filesets Mounted Read-Only

When a fileset is mounted, AdvFS verifies that all volumes in a domain can be accessed. The size recorded in the domain's metadata for each volume must match the size of the volume. If the sizes match, the mount proceeds. If a volume is smaller than the recorded size, AdvFS attempts to read the last block marked in use for the fileset. If this block can be read, the mount will succeed, but the fileset will be marked as read-only. If the last in-use block for any volume in the domain cannot be read, the mount will fail. See mount(8) for more information.
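The mount-time decision described above can be sketched as follows. The block counts and the readable flag are hypothetical stand-ins for values AdvFS reads from the domain metadata and the disk label; this is an illustration of the logic, not the actual implementation:

```shell
# Sketch of the AdvFS mount-time volume size check (values hypothetical).
recorded_blocks=832527      # volume size recorded in the domain metadata
actual_blocks=818000        # size reported by the current disk label
last_in_use_readable=1      # 1 if the last in-use block can still be read

if [ "$actual_blocks" -ge "$recorded_blocks" ]; then
    result="mount read-write"
elif [ "$last_in_use_readable" -eq 1 ]; then
    result="mount read-only"
else
    result="mount fails"
fi
echo "$result"
```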

If a fileset is mounted read-only, check the labels of the flagged volumes in the error message. There are two common errors:

If you have AdvFS Utilities and if the domain consists of multiple volumes and has enough free space to remove the offending volume, you do not need to remove your filesets. However, it is a good idea to back them up before proceeding:

  1. Remove the volume from the domain using the rmvol command. (This will automatically migrate the data to the remaining volumes.)

  2. Correct the disk label of the volume with the disklabel command.

  3. Add the corrected volume back to the domain with the addvol command.

  4. Run the balance command to distribute the data across the new volumes.

For example, if /dev/disk/dsk2c (on a device here called <disk>) within the data5 domain is mislabeled, you can migrate your files on that volume (automatic with the rmvol command), then move them back when you have restored the volume:

# rmvol /dev/disk/dsk2c data5
# disklabel -z dsk2
# disklabel -rw dsk2 <disk>
# addvol /dev/disk/dsk2c data5
# balance data5

If you do not have AdvFS Utilities or if there is not enough free space in the domain to transfer the data from the offending volume:

  1. Back up all filesets in the domain.

  2. Remove the domain with the rmfdmn command.

  3. Correct the disk label of the volume with the disklabel command.

  4. Make the new domain.

  5. If you have AdvFS Utilities and if the original domain was multivolume, add the corrected volume back to the domain with the addvol command.

  6. Restore the filesets from the backup.

For example, if /dev/disk/dsk1c (on a device here called <disk>) containing the data3 domain is mislabeled:

# vdump -0uf /dev/tape/tape0_d1 /data3
# rmfdmn data3
# disklabel -z dsk1
# disklabel -rw dsk1 <disk>
# mkfdmn /dev/disk/dsk1c data3

If you are recreating a multivolume domain, include the necessary addvol commands to add the additional volumes. For example to add /dev/disk/dsk5c to the domain:

# addvol /dev/disk/dsk5c data3
# mkfset data3 data3fset
# mount data3#data3fset /data3
# vrestore -xf /dev/tape/tape0_d1 -D /data3

5.5    Restoring an AdvFS File System

Use the vrestore command to restore your AdvFS files that have been backed up with the vdump command.

5.5.1    Restoring the /etc/fdmns Directory

AdvFS must have a current /etc/fdmns directory in order to mount filesets (see Section 1.4.2). A missing or damaged /etc/fdmns directory prevents access to a domain, but the data within the domain remains intact. You can restore the /etc/fdmns directory from backup or you can recreate it.

If you have a current backup copy of the directory, it is preferable to restore the /etc/fdmns directory from backup. Any standard backup facility (vdump, tar, or cpio) can back up the /etc/fdmns directory. To restore the directory, use the recovery procedure that is compatible with your backup process.

You can reconstruct the /etc/fdmns directory manually or with the advscan command. The procedure for reconstructing the /etc/fdmns directory is similar for both single-volume and multivolume domains. You can construct the directory for a missing domain, missing links, or the whole directory.

If you choose to reconstruct the directory manually, you must know the name of each domain and its associated volumes.

5.5.1.1    Reconstructing the /etc/fdmns Directory Manually

If you accidentally lose all or part of your /etc/fdmns directory, and you know which domains and links are missing, you can reconstruct it manually.

The following example reconstructs the /etc/fdmns directory and two domains where the domains exist and their names are known. Each contains a single volume (or special device). Note that the order of creating the links in these examples does not matter. The domains are:

domain1 on /dev/disk/dsk1c

domain2 on /dev/disk/dsk2c

To reconstruct the two single-volume domains, enter:

# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c dsk1c
# mkdir /etc/fdmns/domain2
# cd /etc/fdmns/domain2
# ln -s /dev/disk/dsk2c dsk2c

The following example reconstructs one multivolume domain. The domain1 domain contains the following three volumes:

/dev/disk/dsk1c

/dev/disk/dsk2c

/dev/disk/dsk3c

To reconstruct the multivolume domain, enter the following:

# mkdir /etc/fdmns
# mkdir /etc/fdmns/domain1
# cd /etc/fdmns/domain1
# ln -s /dev/disk/dsk1c dsk1c
# ln -s /dev/disk/dsk2c dsk2c
# ln -s /dev/disk/dsk3c dsk3c

5.5.1.2    Reconstructing the /etc/fdmns Directory Using advscan

You can use the advscan command to determine which partitions on a disk or Logical Storage Manager (LSM) disk group are part of an AdvFS domain. Then you can use the command to rebuild all or part of your /etc/fdmns directory. This command is useful:

The advscan command can:

For each domain there are three numbers that must match for the AdvFS file system to operate properly:

See advscan(8) for more information.

Inconsistencies among these numbers can arise in several ways. In general, the advscan command treats the domain volume count as more reliable than the number of partitions or the number of /etc/fdmns links. The following tables list anomalies, possible causes, and corrective actions that advscan can take. In the tables, the letter N represents the value that is expected to be consistent across the number of partitions, the domain volume count, and the number of links.

Table 5-2 shows the possible cause and corrective action if the expected value, N, for the number of partitions and for the domain volume count does not equal the number of links in /etc/fdmns/<dmn>.

Table 5-2:  Fileset Anomalies and Corrections

Number of Links in /etc/fdmns/<dmn> Possible Cause Corrective Action
<N addvol terminated early or a link in /etc/fdmns/<dmn> was manually removed. If the domain is activated before running advscan with the -f option and the cause of the mismatch was an interrupted addvol, the situation will be corrected automatically. Otherwise, advscan will add the partition to the /etc/fdmns/<dmn> directory.
>N rmvol terminated early or a link in /etc/fdmns/<dmn> was manually added. If the domain is activated and the cause of the mismatch was an interrupted rmvol, the situation will be corrected automatically. Otherwise, if the cause was a manually added link in /etc/fdmns/<dmn>, systematically try removing different links in the /etc/fdmns/<dmn> directory and try activating the domain. The number of links to remove is the number of links in the /etc/fdmns/<dmn> directory minus the domain volume count displayed by advscan.
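For the >N case in Table 5-2, the number of links to remove can be computed directly. The sketch below builds a stand-in domain directory under a scratch path rather than touching the real /etc/fdmns; the device names and the volume count are hypothetical:

```shell
# Count links in a stand-in /etc/fdmns/<dmn> directory (scratch path,
# hypothetical device names) and compare with the domain volume count.
fdmns=$(mktemp -d)
mkdir "$fdmns/dmn"
ln -s /dev/disk/dsk1c "$fdmns/dmn/dsk1c"
ln -s /dev/disk/dsk2c "$fdmns/dmn/dsk2c"
ln -s /dev/disk/dsk3c "$fdmns/dmn/dsk3c"
links=$(ls "$fdmns/dmn" | wc -l)
volumes=2                     # domain volume count as advscan would report it
echo "links to remove: $((links - volumes))"
rm -r "$fdmns"
```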

Table 5-3 shows possible cause and corrective action if the expected value, N, for the number of partitions and for the number of links in /etc/fdmns/<dmn> do not equal the domain volume count:

Table 5-3:  Fileset Anomalies and Corrections

Domain Volume Count Possible Cause Corrective Action
<N Cause unknown Cannot correct; run salvage to recover as much data as possible from the domain.
>N addvol terminated early and partition being added is missing or has been reused. Cannot correct; run salvage to recover as much data as possible from the remaining volumes in the domain.

Table 5-4 shows possible cause and corrective action if the expected value, N, for the domain volume count and for the number of links in /etc/fdmns/<dmn> do not equal the number of partitions:

Table 5-4:  Fileset Anomalies and Corrections

Number of Partitions Possible Cause Corrective Action
<N Partition missing. Cannot correct; run salvage to recover as much data as possible from the remaining volumes in the domain.
>N addvol terminated early. None; the domain will mount with N volumes; rerun addvol.

To locate AdvFS partitions, enter the advscan command:

advscan [options] disks

In the following example there are no missing domains. The advscan command scans devices dsk0 and dsk5 for AdvFS partitions and finds nothing amiss. There are two partitions found (dsk0c and dsk5c), the domain volume count reports two, and there are two links entered in the /etc/fdmns directory.

# advscan dsk0 dsk5
Scanning disks  dsk0 dsk5
Found domains:
usr_domain
                Domain Id       2e09be37.0002eb40
                Created         Thu Feb 24 09:54:15 2000
                Domain volumes          2
                /etc/fdmns links        2
                Actual partitions found:
                                        dsk0c
                                        dsk5c

In the following example, directories that define the domains that include dsk6 were removed from the /etc/fdmns directory. This means that the number of /etc/fdmns links, the number of partitions, and the domain volume counts are no longer equal.

The advscan command scans device dsk6 and recreates the missing domains as follows:

  1. A partition is found containing an AdvFS domain. The domain volume count reports one, but there is no domain directory in the /etc/fdmns directory that contains this partition.

  2. Another partition is found containing a different AdvFS domain. The domain volume count is also one. There is no domain directory that contains this partition.

  3. No other AdvFS partitions are found. The domain volume counts and the number of partitions found match for the two discovered domains.

  4. The advscan command creates directories for the two domains in the /etc/fdmns directory.

  5. The advscan command creates symbolic links for the devices in the /etc/fdmns domain directories.

The command and output are as follows:

# advscan -r dsk6
Scanning disks  dsk6
Found domains:
*unknown*
                Domain Id       2f2421ba.0008c1c0
                Created         Thu Jan 20 13:38:02 2000
 
                Domain volumes          1
                /etc/fdmns links        0
 
                Actual partitions found:
                                        dsk6a*


*unknown*
                Domain Id       2f535f8c.000b6860
                Created         Fri Feb 25 09:38:20 2000
 
                Domain volumes          1
                /etc/fdmns links        0
 
                Actual partitions found:
                                        dsk6b*
Creating /etc/fdmns/domain_dsk6a/
        linking dsk6a
 
Creating /etc/fdmns/domain_dsk6b/
        linking dsk6b
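The directories and links that advscan recreates are ordinary file system objects: one directory per domain under /etc/fdmns, containing one symbolic link per member partition. The following sketch reproduces that layout in a scratch directory with dummy device files (all /tmp paths are hypothetical stand-ins; the real layout uses links into /dev/disk):

```shell
# Sketch of the /etc/fdmns layout that advscan -r recreates.
# All /tmp paths are stand-ins; the real layout lives under
# /etc/fdmns, with symbolic links into /dev/disk.
FDMNS=/tmp/fdmns_demo.$$
mkdir -p "$FDMNS/domain_dsk6a" "$FDMNS/domain_dsk6b"

# Dummy files standing in for /dev/disk/dsk6a and /dev/disk/dsk6b
touch /tmp/dsk6a.$$ /tmp/dsk6b.$$

# One symbolic link per member partition, named after the device
ln -s /tmp/dsk6a.$$ "$FDMNS/domain_dsk6a/dsk6a"
ln -s /tmp/dsk6b.$$ "$FDMNS/domain_dsk6b/dsk6b"

# The "/etc/fdmns links" count that advscan reports is simply
# the number of entries in the domain directory.
LINKS=`ls "$FDMNS/domain_dsk6a" | wc -l | tr -d ' '`
echo "domain_dsk6a links: $LINKS"

rm -rf "$FDMNS" /tmp/dsk6a.$$ /tmp/dsk6b.$$
```

Because a domain is defined entirely by these links, deleting a domain directory (as happened in the example above) loses no on-disk data; advscan -r can rebuild the directory from the partitions themselves.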

5.5.2    Recovering from Volume Failure

Some AdvFS problems are caused by hardware errors. For example, a write to the file system that fails because of a hardware fault might appear as metadata corruption. The file system cannot repair hardware problems. If you start seeing unexplained errors from a file system, do the following:

  1. As root user, examine the /var/adm/messages file for AdvFS I/O error messages. For example:

    Sep 28 15:39:16 systemname vmunix: AdvFS I/O error: 
    Sep 28 15:39:16 systemname vmunix: Domain#Fileset:test1#test1
    Sep 28 15:39:16 systemname vmunix: Mounted on: /test1 
    Sep 28 15:39:17 systemname vmunix: Volume: /dev/rz11c 
    Sep 28 15:39:17 systemname vmunix: Tag: 0x00000006.8001 
    Sep 28 15:39:17 systemname vmunix: Page: 76926 
    Sep 28 15:39:17 systemname vmunix: Block: 5164080 
    Sep 28 15:39:17 systemname vmunix: Block count: 256 
    Sep 28 15:39:17 systemname vmunix: Type of operation: Read 
    Sep 28 15:39:17 systemname vmunix: Error: 5 
    Sep 28 15:39:17 systemname vmunix: To obtain the name of 
    Sep 28 15:39:17 systemname vmunix: the file on which the 
    Sep 28 15:39:17 systemname vmunix: error occurred, type the 
    Sep 28 15:39:17 systemname vmunix: command 
    Sep 28 15:39:17 systemname vmunix: /sbin/advfs/tag2name 
    Sep 28 15:39:17 systemname vmunix: /test1/.tags/6
    

    This error message describes the domain, fileset, and volume on which the error occurred. It also describes how to find out what file was affected by the I/O error. If you do not find any AdvFS I/O error messages but are still seeing unexplained behavior on the file system, unmount the domain as soon as possible and run the verify utility to check the consistency of the domain's metadata.

  2. Check for device driver error messages for the volume described in the AdvFS I/O error message. If you do not find any error messages, unmount the domain as soon as possible and run the verify utility to check the integrity of the domain's metadata. If you do find device driver I/O error messages that correspond to the AdvFS I/O error messages, then the file system is being affected by problems with the underlying hardware.

  3. Try to remove the faulty volume using the rmvol utility (see Section 1.4.7). If this succeeds, the file system problems should not recur. If rmvol fails due to more I/O errors, it will be necessary to recreate the domain.

  4. If you have a recent backup, recreate the domain and restore it from backup. If you have no backup or it is too old, use the salvage utility (see Section 5.4.6) to extract the contents of the corrupted domain.

  5. Remove the faulty domain using the rmfdmn command.

  6. Recreate the domain using the mkfdmn command. Remember that if you recreate your domain under Version 5.0 or later, it will have a DVN of 4 by default (see Section 1.4.3). Add volumes as needed if you have the AdvFS Advanced Utilities package installed. Do not include the faulty volume in the new domain.

  7. Restore the contents of the recreated domain using the information obtained in step 4.

  8. Remount the filesets in the domain.
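The console messages in step 1 contain everything needed to build the tag2name command line: the mount point from the Mounted on: line and the file tag from the first (hexadecimal) field of the Tag: line. A minimal sketch, using two lines copied from the sample output above (the message format is assumed to match that sample):

```shell
# Sketch: build the tag2name command line from an AdvFS I/O
# error message. The two LOG lines are taken from the sample
# console output in step 1.
LOG='Mounted on: /test1
Tag: 0x00000006.8001'

# Mount point of the affected fileset
MNT=`echo "$LOG" | sed -n 's/.*Mounted on: *//p'`

# The first field of the tag is the file tag in hexadecimal
HEX=`echo "$LOG" | sed -n 's/.*Tag: *0x\([0-9a-fA-F]*\)\..*/\1/p'`
TAG=`printf '%d' "0x$HEX"`

# The affected file is reachable through the fileset's .tags directory
CMD="/sbin/advfs/tag2name $MNT/.tags/$TAG"
echo "$CMD"
```

Running the resulting command on the mounted fileset prints the pathname of the file on which the I/O error occurred.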

5.5.3    Recovering from Failure of the root Domain

A catastrophic failure of the disk containing your AdvFS root domain requires that you recreate your root domain in order to have a domain to boot from. Before you recreate your domain, it is a good idea to satisfy yourself that the failure is not due to hardware problems. Check the console, look for cable or power problems, etc.

If you have files in the root domain that were not backed up, run the salvage utility with the -d option to obtain more recent information from your domain. Make sure that regularly scheduled jobs are disabled. Then boot from your installation CD-ROM.

To recover from the failure of the root domain:

  1. Run the salvage utility if necessary and save the files at another location.

  2. Boot your system as stand-alone.

  3. Transfer to single-user mode.

  4. Examine the devices available.

  5. Label the disk you have chosen.

  6. Create the root domain and fileset. Note that if you have changed the root domain name or fileset name, use the new name.

  7. Mount the newly created root domain and restore from backup.

  8. If necessary, move any files recovered from the salvage process into the root domain.

  9. If necessary, move your /usr file to this disk.

The following example assumes that you are booting from the CD-ROM device DKA500, which is the installation Stand Alone System (SAS). The tape drive is /dev/tape/tape0. The root is being restored to device /dev/disk/dsk1, referred to in the example as <disk>. The example boots in single-user mode, creates a new root domain, and restores its contents from backup.

>>> b DKA500
3) UNIX Shell
# ls /dev/disk
# ls /dev/tape/tape0
# disklabel -rw -t advfs /dev/rdisk/dsk1a <disk>
# mkfdmn -r /dev/disk/dsk1a root_domain
# mkfset root_domain root
# mount root_domain#root /mnt
# cd /mnt
# vrestore -x -D .
# mkfdmn /dev/disk/dsk1a usr_domain
# mkfset usr_domain usr
# mount usr_domain#usr /usr
# cd /usr
# vrestore -x -D .

You can now boot your restored root domain.

5.5.4    Restoring a Multivolume usr Domain

To restore a multivolume /usr file system, the usr_domain domain must first be reconstructed with all of its volumes before you restore the files. However, creating a multivolume domain requires the addvol utility, which resides in the /usr/sbin directory, and addvol will not run unless the License Management Facility (LMF) database is available. See lmf(8) for information.

On some systems the /var directory, where the LMF database resides, and the /usr directory are both located in the usr fileset. In this case, the directory containing the license database must be recovered from the usr fileset before the addvol command can be run. On other systems the /var directory is in a separate fileset. In this case, the addvol command can be recovered first and then used to add the volumes.
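Which of the two procedures below applies depends on whether /var is mounted from its own fileset. A small sketch of the check, run here against sample lines standing in for real mount output (the domain#fileset line format is an assumption based on how AdvFS filesets are named):

```shell
# Sketch: decide which restore procedure applies by checking
# whether /var is a separate fileset. MOUNTS stands in for the
# output of the mount command; the line format is assumed.
MOUNTS='usr_domain#usr on /usr type advfs (rw)
usr_domain#var on /var type advfs (rw)'

if echo "$MOUNTS" | grep '#var on /var ' > /dev/null; then
    CASE="separate var fileset"
else
    CASE="var inside usr fileset"
fi
echo "$CASE"
```

With the sample lines above, the check reports a separate var fileset, so the second procedure would apply.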

The following example restores a multivolume domain where the /var directory and the /usr directory are both in the usr fileset in the usr_domain domain consisting of the dsk1g, dsk2c, and dsk3c volumes. The procedure assumes that the root file system has already been restored.

  1. Mount the root fileset as read/write:

    # mount -u /
    

  2. Remove the links for the old usr_domain and create a new usr_domain using the initial volume:

    # rm -rf /etc/fdmns/usr_domain
    # mkfdmn /dev/disk/dsk1g usr_domain
    

  3. Create and mount the /usr fileset:

    # mkfset usr_domain usr
    # mount -t advfs usr_domain#usr /usr
    

  4. Create a soft link in /usr because that is where the lmf command looks for its database:

    # ln -s /var /usr/var
    

  5. Insert the /usr backup tape:

    # cd /usr
    # vrestore -vi
    (/) add sbin/addvol 
    (/) add sbin/lmf
    (/) add var/adm/lmf
    (/) extract
    (/) quit
    

  6. Reset the license database:

    # /usr/sbin/lmf reset

  7. Add the extra volumes to usr_domain:

    # /usr/sbin/addvol /dev/disk/dsk2c usr_domain
    # /usr/sbin/addvol /dev/disk/dsk3c usr_domain
    

  8. Do a full restore of the /usr backup:

    # cd /usr
    # vrestore -xv
    

The following example restores a multivolume domain where the /usr and /var directories are in separate filesets in the same multivolume domain, usr_domain, consisting of dsk1g, dsk2c, and dsk3c. This means that you must mount both the /var and the /usr backup tapes. The procedure assumes that the root file system has already been restored.

  1. Mount the root fileset as read/write:

    # mount -u /

  2. Remove the links for the old usr_domain and create a new usr_domain using the initial volume:

    # rm -rf /etc/fdmns/usr_domain
    # mkfdmn /dev/disk/dsk1g usr_domain
    

  3. Create and mount the /usr and /var filesets:

    # mkfset usr_domain usr
    # mkfset usr_domain var
    # mount -t advfs usr_domain#usr /usr
    # mount -t advfs usr_domain#var /var
    

  4. Insert the /var backup tape and restore from it:

    # cd /var
    # vrestore -vi
    (/) add adm/lmf
    (/) extract
    (/) quit
    

  5. Insert the /usr backup tape:

    # cd /usr
    # vrestore -vi
    (/) add sbin/addvol
    (/) add sbin/lmf
    (/) extract
    (/) quit
    

  6. Reset the license database:

    # /usr/sbin/lmf reset
    

  7. Add the extra volumes to usr_domain:

    # /usr/sbin/addvol /dev/disk/dsk2c usr_domain
    # /usr/sbin/addvol /dev/disk/dsk3c usr_domain
    

  8. Do a full restore of the /usr backup:

    # cd /usr
    # vrestore -xv
    

  9. Insert the /var backup tape and do a full restore of the /var backup:

    # cd /var
    # vrestore -xv
    
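In both procedures, every volume except the one passed to mkfdmn must be added back with addvol before the full restore. A sketch that emits one addvol command per remaining volume (device names are those used in the examples above):

```shell
# Sketch: emit an addvol command for each usr_domain volume
# except the initial one, which mkfdmn already consumed.
# Device names are taken from the examples above.
VOLS="dsk1g dsk2c dsk3c"

set -- $VOLS
shift                          # dsk1g was used by mkfdmn
N=0
for v in "$@"; do
    echo "/usr/sbin/addvol /dev/disk/$v usr_domain"
    N=`expr $N + 1`
done
echo "$N volume(s) to add"
```

The order matters: the domain must exist (mkfdmn), the LMF database must be restored and reset, and only then can the remaining volumes be added.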

5.6    Recovering from a System Crash

As each domain is mounted after a crash, it automatically runs recovery code that checks through the transaction log to ensure that any file system operations that were occurring when the system crashed are either completed or backed out. This ensures that AdvFS metadata is in a consistent state after a crash.

5.6.1    Saving Copies of System Metadata

If you believe that a domain is corrupted or otherwise causing problems, run the savemeta command to save a copy of the domain's metadata for examination by Compaq support personnel. You must be root user to run this command (see savemeta(8)).

5.6.2    Physically Moving an AdvFS Disk

If a machine has failed, it is possible to move disks containing AdvFS domains to another computer running AdvFS. Connect the disk(s) to the new machine and modify the /etc/fdmns directory so the new system will recognize the transferred volume(s). You must be root user to complete this process.

You cannot move domains that have a DVN of 4 to systems running a Version 4 operating system. Doing so will generate an error message (see Section 5.1). You can move domains with a DVN of 3 to a machine running Version 5. The newer operating system will recognize the domains created earlier.

Caution

Do not use either the addvol command or the mkfdmn command to add the volumes to the new machine. Doing so will delete all data on the disk you are moving. See Section 5.4.6 if you have already done so.

If you do not know which partitions your domains were on, you can attach the disks to the new machine and run the advscan command, which may be able to recreate this information. You can also examine the disk label to see which partitions have previously been used as AdvFS partitions; however, the disk label does not tell you which partitions belong to which domains.

For example, if the motherboard of your machine fails, you need to move the disks to another system. You may need to reassign the disk SCSI IDs to avoid conflicts. (See your disk manufacturer's instructions for more information.) For this example, assume the IDs are assigned to disks 6 and 8. Assume also that the system has a domain, testing_domain, on two disks, dsk3 and dsk4. This domain contains two filesets: sample1_fset and sample2_fset. These filesets are mounted on /data/sample1 and /data/sample2.

Assume you know that the domain that you are moving had partitions dsk3c, dsk4a, dsk4b, and dsk4g. The moving process takes the following steps:

  1. Shut down the working machine to which you are moving the disks.

  2. Connect the disks from the bad machine to the good one.

  3. Reboot. You do not need to boot to single-user mode; you can complete the following steps in multiuser mode while the system is running.

  4. Figure out the device nodes created for the new disks:

    # /sbin/hwmgr -show scsi -full
    

    The output is a detailed list of information about all the disks on your machine. The DEVICE FILE column shows the name that the system uses to refer to each disk. Determine the listing for the disk you just added, for example, dsk6. Use this name to set up the symbolic links in step 5 below.

  5. Modify your /etc/fdmns directory to include the information from the transferred domains:

    # mkdir -p /etc/fdmns/testing_domain
    # cd /etc/fdmns/testing_domain
    # ln -s /dev/disk/dsk6c dsk6c
    # ln -s /dev/disk/dsk8a dsk8a
    # ln -s /dev/disk/dsk8b dsk8b
    # ln -s /dev/disk/dsk8g dsk8g
    # mkdir /data/sample1
    # mkdir /data/sample2
    

  6. Edit the /etc/fstab file to add the fileset mount-point information:

    testing_domain#sample1_fset /data/sample1 advfs rw 1 0
    testing_domain#sample2_fset /data/sample2 advfs rw 1 0
    

  7. Mount the volumes:

    # mount /data/sample1
    # mount /data/sample2
    

    Note that if you run the mkfdmn command or the addvol command on partition dsk6c, dsk8a, dsk8b, or dsk8g, or an overlapping partition, you will destroy the data on the disk. See Section 5.4.6 if you have accidentally done so.
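Because the moved domain is defined only by the symbolic links created in step 5, a quick sanity check is to confirm that every link in the domain directory resolves to an existing device node. The following sketch runs against a scratch directory with one good and one deliberately dangling link (on a live system, FDMNS would point at the real /etc/fdmns/testing_domain):

```shell
# Sketch: verify that every symbolic link in a domain directory
# resolves. Uses a scratch directory and one deliberately
# dangling link; FDMNS would be /etc/fdmns/testing_domain on a
# live system.
FDMNS=/tmp/fdmns_check.$$
mkdir -p "$FDMNS"
touch /tmp/dsk6c.$$                        # stand-in device node
ln -s /tmp/dsk6c.$$ "$FDMNS/dsk6c"         # resolvable link
ln -s /tmp/no_such_dev.$$ "$FDMNS/dsk8a"   # dangling link

BROKEN=0
for link in "$FDMNS"/*; do
    # -e follows the link, so a dangling link fails this test
    if [ ! -e "$link" ]; then
        echo "dangling link: $link"
        BROKEN=`expr $BROKEN + 1`
    fi
done
echo "$BROKEN broken link(s)"
rm -rf "$FDMNS" /tmp/dsk6c.$$
```

A dangling link usually means a device was renamed on the new machine; recheck the device names reported by hwmgr in step 4 before mounting.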

5.6.3    Log File Inconsistency

If a system crashes, AdvFS will perform recovery at reboot. Filesets that were mounted at the time of the crash will be recovered when they are remounted. This recovery keeps the AdvFS metadata consistent and makes use of the AdvFS transaction log.

Since different versions of the operating system use different transaction log structures, it is important that you recover your filesets on the version of the operating system that was running at the time of the crash. If you do not, you risk corrupting the domain metadata and/or panicking the domain.

If the system crash occurred because you set the AdvfsDomainPanicLevel attribute (see Section 4.3.6) to promote a domain panic to a system panic, it is also a good idea to run the verify command on the panicked domain to ensure that it is not damaged. If your filesets were unmounted at the time of the crash, or if you have remounted them successfully and run the verify command (if needed), you can mount the filesets on a different version of the operating system, if appropriate.