4 Optimizing and Tuning the AdvFS File System

There are a number of ways to configure and tune your AdvFS file system. Some of the tuning functions are available through a graphical user interface (see Appendix D). The System Configuration and Tuning manual provides detailed information on tunable parameters for AdvFS.

This chapter covers the following:

Section 4.1 explains ways to monitor system performance.

Section 4.2 discusses improving performance by eliminating frag files.

Section 4.3 suggests ways to improve transaction log performance.

Section 4.4 details methods of data logging that might improve performance.

Section 4.5 explains data cache tuning.

Section 4.6 discusses direct I/O as a method of increasing performance.

Section 4.7 suggests system attributes that might be changed.

Section 4.8 details how and when to defragment a file domain.

Section 4.9 explains how to defragment a single file.

Section 4.10 discusses balancing the distribution of data.

Section 4.11 explains how to move filesets to reduce the strain on system resources.

Section 4.12 explains how to migrate a file to another volume to improve performance.

Section 4.13 explains AdvFS file system striping.

Section 4.14 describes how to control the level of a domain panic.

See System Configuration and Tuning and Chapter 1 for more detailed information about allocating domains and filesets effectively.

4.1 Monitoring Performance

There are a number of ways to gather performance information:

The iostat utility reports I/O statistics for terminals, disks, and the CPU. It displays the number of transfers per second (tps) and bytes per second (bps) in kilobytes. From this you can determine where I/O bottlenecks are occurring. That is, if one device shows sustained high throughput, this device is being utilized more than others. Then you can decide the action that might increase throughput: moving files, obtaining faster volumes, striping files, and so on. You can view I/O statistics with the SysMan Monitoring and Tuning - View Input/Output (I/O) Statistics utility or from the command line (see iostat(1)).

The advfsstat utility displays detailed information about the activity of filesets and domains over time. You can examine, for example, the activity of the buffer cache, volume reads/writes, and the bitfile metadata table (BMT) record. See advfsstat(8) for more information.

Collect for Tru64 UNIX gathers and displays information for subsystems such as memory, disk, tape, network or file systems. Collect runs on all supported releases of Tru64 UNIX. For more information, contact collect_support@compaq.com.

4.2 Improving Performance by Disabling the Frag File

You can control the allocation of space for files that waste more than 5% of their allocated storage. The page size for AdvFS is 8 KB. Files or parts of files that are less than 8 KB are stored in the frag file for the fileset (see Section 1.3.3). Fragging minimizes wasted space in the fileset. If fragging is turned off, I/O is more efficient, but storage requirements increase.

Filesets have frags turned on by default. You can disable them:

At fileset creation by using the following command format:
mkfset -o nofrag domain_name fileset_name

For an existing fileset by using the following command format:
chfsets -o nofrag domain_name fileset_name

Note that frags that already exist continue to exist.

The showfsets command displays the fragmentation status of a fileset. For example:

# showfsets dom_1 fset_3
 
    Id           : 3a3a47cb.000b52a5.2.8006
    Files        :        5,  SLim=     0,  HLim=     0
    Blocks (512) :        0,  SLim=     0,  HLim=     0
    Quota Status : user=off group=off      
    Object Safety: off      
    Fragging     : on
    DMAPI        : off

Disabling or enabling frags in a fileset does not affect existing files. If you want to change the frag status of an existing file, you must:

Change the frag status for the fileset to the one you want for your file by using the chfsets command.

Copy the file to a new location.

Delete the original file and rename the new file to the original name.

Optionally, change the frag status back to the original for the fileset by using the chfsets command.

For example, to disable fragging in the arizona fileset in the states domain in order to initiate permanent atomic-write data logging for the existing taxes file (Section 4.4):

# chfsets -o nofrag states arizona
# cp taxes tmptaxes
# mv -f tmptaxes taxes

4.3 Improving Transaction Log File Performance

Each domain has a transaction log file that keeps track of fileset activity for all filesets in the domain. This requires a high volume of read/write activity to the log file. If the log file resides on a congested disk or bus, or if the domain contains many filesets, system performance can degrade. You can shift the balance of I/O activity so that the log file activity does not use up the bandwidth of the device where you have stored your files.

Monitor performance of the volume with the SysMan Monitoring and Tuning - View Input/Output (I/O) Statistics utility or with the iostat utility. If you have AdvFS Utilities, do one of the following if the volume containing the log file appears to be overloaded:

Divide the domain into several smaller domains. Because each domain has its own transaction log, each log then handles transactions for fewer filesets.

Move the transaction log file to a faster or less congested volume.

Isolate the transaction log file on its own volume.

Moving the transaction log file can also be useful when you are using LSM storage and want to increase reliability by placing your transaction log file on a mirrored volume. For example, if the disk containing the transaction log file crashes, the mirrored log can be accessed.

To move the transaction log file to another volume:

Use the showfdmn command to determine the location of the log file. The letter L after the volume number indicates the volume on which the log file resides.

Use the switchlog command to move the log file to another volume.

For example, to move the transaction log file for the domain region1:

# showfdmn region1 
     Id              Date Created     LogPgs Version Domain Name
31bf51ba.0001be10 Wed Feb  9 16:24 2000  512       4 region1
 
Vol  512-Blks    Free % Used Cmode Rblks Wblks Vol Name
  1L  1787904  885168    52%    on   128   128 /dev/disk/dsk0g
  2   1790096 1403872    22%    on   128   128 /dev/disk/dsk0h
     -------------------------
      3578000 2259040    37%

# switchlog region1 2
# showfdmn region1 
     Id              Date Created     LogPgs Version Domain Name
31bf51ba.0001be10 Wed Feb  9 16:24 2000  512       4 region1
 
Vol  512-Blks    Free % Used Cmode Rblks Wblks Vol Name
  1   1787904  885168    52%    on   128   128 /dev/disk/dsk0g
  2L  1790096 1395680    22%    on   128   128 /dev/disk/dsk0h
     -------------------------
      3578000 2250848    37%

Isolating the transaction log file allows all log I/O to be separate from other domain reads and writes. As there is no other activity on the log volume, the log file I/O is not slowed down and does not slow down other domain I/O.

To isolate the transaction log file on its own volume:

Add a small partition (volume) to the domain for which you are going to isolate the log file.
Remember that the I/O load of other partition(s) on the device affects the performance of the entire disk including the log file partition.
If the remaining partitions are allocated to other domains, there might be more than one transaction log file on the same device. This might not be a problem on a solid state disk but might negate the value of isolating the log file on slower devices.

Use the switchlog command to move the log file to another volume.

Use the showfdmn command to determine the number of free blocks on the volume with the log file.

With the information from the showfdmn command, use the dd command to build a dummy file of the right size.

Migrate the dummy file to the volume that contains the log file. This fills the volume completely and leaves no space for other files. Because you never access this file, only the transaction log file is active on the volume.

For example, to isolate the transaction log file for the domain sales:

# addvol /dev/disk/dsk9a sales 
# switchlog sales 2 
# showfdmn sales 
     Id               Date Created     LogPgs Version Domain Name
312387a9.000b049f Thu Mar 16 14:24 2000  512       4 sales
 
Vol  512-Blks    Free % Used Cmode Rblks Wblks Vol Name
  1   2050860 1908016     7%    on   128   128 /dev/disk/dsk10c
  2L   131072  122752     6%    on   128   128 /dev/disk/dsk9a
     -------------------------
      2181932 2030768     7%

Allocate all the free blocks on the volume containing the log file to a dummy file, /adv1/foo, then move the data to the log file volume:

# dd if=/dev/zero of=/adv1/foo count=122752 
122752+0 records in
122752+0 records out
# migrate -d 2 /adv1/foo

4.4 Improving Data Consistency

The method you choose to write data to a file can affect what is saved if a machine fails. You can synchronize I/O for writing cached metadata and data to disk and you can turn atomic-write data logging on and off.

You can use the fcntl() command to turn synchronous writes and atomic-write data logging on and off. See fcntl(2) and the Programmer's Guide for more information. The following sections describe other ways to do this.

4.4.1 Asynchronous I/O

Write requests, by default, are cached; that is, data is written to the buffer cache, not immediately to disk. This method, asynchronous I/O, generally gives the highest throughput, in part because multiple writes to the same page can be combined into one physical write to disk. This decreases disk traffic and increases the concurrent access of common data by multiple threads and processes. In addition, delaying the write to disk increases the likelihood that a page is combined with contiguous pages into a single, larger physical write, saving seek time and delays caused by rotational latency.

If a crash occurs, the next time a fileset in the domain is mounted, the completed log transactions are replayed to disk and incomplete transactions are backed out so that the original metadata on disk is restored. These log transactions, by default, save only metadata, not the data written to the file. This means that file sizes and locations on disk are consistent but, depending on when the crash occurred, the user data from recent writes might be out of date. This is a trade-off for the increased throughput gained using this method.

4.4.2 Asynchronous Atomic-Write Data Logging I/O

Asynchronous atomic-write data logging I/O is similar to asynchronous I/O except that the data written to the buffer cache is also written to the log file for each write request. This atomic-write data logging is done in 8 KB increments. Eventually the data is also written to the file, meaning that the data is written to disk twice: once to the log file and then to the file. The extra write of the data to the log file can degrade throughput compared with using asynchronous I/O.

If a crash occurs, the data is recovered from the log file when the fileset is remounted. As in asynchronous I/O, all completed log transactions are replayed and incomplete transactions are backed out. Unlike asynchronous I/O, however, the user's data has been written to the log, so both the metadata and the data intended for the file can be restored. This guarantees that each 8 KB increment of a write is either completely written to disk or is not written to disk. Because only completed write requests are processed, obsolete, possibly sensitive data located where the system was about to write at the time of the crash can never be accessed. Disk writes in the wrong order, which might cause inconsistencies in the event of a crash, can never occur.

Another way to prevent access to obsolete data is to use the chfsets -o objectsafety command (see Section 1.7.10). Choosing object safety prevents an application from reusing old data, but it does not guarantee complete data recovery.

There are two types of atomic-write data logging: persistent and temporary. Persistent logging sets an on-disk flag for a file so atomic-write data logging persists across mounts and unmounts. Temporary data logging sets an in-memory flag activating atomic-write data logging for all files in the fileset for the duration of the mount.

4.4.2.1 Persistent Atomic-Write Data Logging

To turn persistent atomic-write data logging I/O on and off, use the fcntl() function or enter the chfile -L command:

chfile -L on file_name

chfile -L off file_name

If a file has a frag, persistent atomic-write data logging cannot be activated. To activate data logging on a file that has a frag, do one of the following:

Activate temporary atomic-write data logging, which operates on files with frags.

Choose the nofrag option for the fileset when you access it by using the mkfset or chfsets command.
If you disable frag files by using the chfsets -o nofrag command, files with existing frags do not change. You can, however, copy a file with frags, delete the original file, and rename the copy to the original file name. The file is then free of frag files (see Section 4.2).

Append enough bytes to the file to bring it up to the next 8 KB boundary. For example, if fileb has 6803 bytes, it is stored in one 7 KB frag. To activate data logging, you can add 1389 bytes so the file terminates on an 8 KB boundary:
```
# dd if=/dev/zero of=fileb bs=1 seek=6803 \
    count=1389  conv=notrunc
```
Files that use persistent atomic-write data logging cannot be memory mapped through the mmap system call. See Section 5.5 for information on conflicting file usage.

4.4.2.2 Temporary Atomic-Write Data Logging

Use the mount -o adl command to set an in-memory flag that activates temporary atomic-write data logging in a fileset for the duration of the mount. Persistent atomic-write data logging commands take precedence over temporary commands while the file is open.

Any application that has the file open can call the fcntl() function to turn off temporary atomic-write data logging, or use the chfile -L off command to turn off persistent atomic-write data logging while the file is open. All applications that have the file open are affected. After all applications close the file, temporary atomic-write data logging is restored for the file. You can check atomic-write data logging status for a file by using the chfile command.

Files using temporary atomic-write data logging can be memory mapped. Temporary atomic-write data logging is suspended until the last thread using the memory-mapped file unmaps it. In addition, files that have frags can use temporary data logging feature. NFS mounting does not affect logging behavior.

4.4.3 Synchronous I/O

Synchronous I/O is similar to asynchronous I/O, but the data is written both to the cache and to the disk before the write request returns to the calling application. This means that if a write is successful, the data is guaranteed to be on disk. Synchronous I/O reduces throughput because the write does not return until after the I/O is complete. In addition, since the application, not the file system, determines when the data needs to be flushed to disk, the likelihood of consolidating I/Os might be reduced if synchronous write requests are small.

To turn synchronous I/O off and on, use the O_SYNC or O_DSYNC flag to the open() system call (see the Programmer's Guide). To force all applications to use synchronous I/O even if files are not opened in that mode, enter the chfile -l command:

chfile -l on file_name

chfile -l off file_name

4.4.4 Synchronous Atomic-Write Data Logging I/O

You cannot use both the -l and -L options of the chfile command to set synchronous atomic-write data logging. However, if you activate persistent atomic-write data logging on a file by using the chfile -L on command, you can open the file for synchronous I/O by using the O_SYNC or O_DSYNC flag to the open() system call (see the Programmer's Guide).

You can activate temporary atomic-write data logging for the fileset by using the mount -o adl, sync command.

4.5 Data Cache Tuning

Caching improves performance when data is reused frequently. AdvFS uses a dynamic memory cache called the Unified Buffer Cache (UBC) to manage file metadata and user data.

By using the UBC for caching, AdvFS can maintain file data in memory as long as memory is available. If other system resources require some of the memory in use by the file system cache, the UBC can reclaim the memory that is needed.

Because AdvFS uses the UBC to control caching, the cache is tuned with the UBC tunable parameters. These include:

Variables that modify the maximum percentage of physical memory that the UBC can use at one time

The percentage of pages that must be dirty before the UBC starts writing them to disk

The maximum amount of memory allocated to the UBC that can be used to cache a single file

See System Configuration and Tuning for guidelines for modifying these parameters.

Although caching data is the default and generally improves file system performance, in some situations an application can increase throughput by bypassing the data cache (see Section 4.6).

4.6 Improving Data Transfer Rate with Direct I/O

Direct I/O mode bypasses caching and synchronously reads and writes data from a file without copying the data into a cache (the normal AdvFS process). That is, when direct I/O is enabled for a file, read and write requests on it are executed to and from disk storage through direct memory access (similar to raw I/O), bypassing AdvFS caching. This can improve the speed of the I/O process for applications that access data only once.

Although direct I/O handles requests of any byte size, you get the best performance when the requested transfer size is aligned on a disk sector boundary and the transfer size is an even multiple of the underlying sector size (currently 512 bytes).

Direct I/O is particularly suited for files that are used exclusively by a database. However, if an application tends to access data multiple times, direct I/O can adversely impact performance because caching does not occur. When you specify direct I/O, it takes precedence and any data already in the buffer cache for that file is automatically flushed to disk.

To open a file for direct I/O, use the open() function and specify the O_DIRECTIO flag. For example, for file_x enter:

open (file_x, O_DIRECTIO|O_RDWR, 0644)

Regardless of the previous mode, the new mode is direct I/O and remains so until the last close of the file. Note that direct I/O, atomic-write data logging, and memory mapping are mutually exclusive modes. Therefore, if the file is already open for atomic-write data logging or is memory mapped, then calling the open function to initiate direct I/O fails.

The fcntl() function can be used to determine whether the file is open in cached or in direct I/O mode. See fcntl(2) and open(2), or the Programmer's Guide for more information.

4.7 Changing Attributes to Improve System Performance

You can change a number of attributes to improve system performance. System Configuration and Tuning details the significance of each attribute and the trade-offs engendered when they are changed. See sysconfig(8) for more information. You can modify attributes to:

Increase the dirty-data caching threshold
Dirty or modified data is data that was written by an application and cached but is not yet been written to disk. You can modify the amount of dirty data that AdvFS caches for each volume in a domain by using the chvol -t command or, for all new volumes of a file system, by using the AdvfsReadyQLim attribute (see chvol(8)).
Modifying this variable by using the chvol command is most effective if smooth sync is disabled. If your system is using smooth sync (the default), then the rate at which the dirty data is flushed to disk is best tuned by using the smoothsync_age attribute.

Promote continuous I/O by using the smoothsync_age attribute
The smoothsync_age attribute specifies the number of seconds that a modified page stays in the buffer cache before being flushed to disk. This allows the file system to balance the need to flush modified pages to disk in a timely manner with the benefits of keeping the page in memory while all modifications are being made.

Change the I/O transfer size
AdvFS reads and writes data by 8 KB pages. The maximum transfer size is determined by the device driver and depends on the underlying storage configuration but is typically 128 or 256 blocks. LSM might assign a larger maximum transfer size. You can adjust the maximum transfer size by using the chvol command with the -r (read) or -w (write) options (see chvol(8)).

Flush modified memory-mapped pages
The AdvfsSyncMmapPages attribute controls whether modified memory-mapped pages are flushed to disk during a sync system call.

Increase the memory available for access structures
AdvFS allocates file access structures until the percentage of pageable memory used for the access structures reaches AdvfsAccessMaxPercent. Increasing the value of the AdvfsAccessMaxPercent attribute might improve AdvFS performance on systems that open and reuse many files, but this decreases the memory available for the virtual memory subsystem and the Unified Buffer Cache (UBC). Decreasing the value of the attribute frees pageable memory but might degrade AdvFS performance on systems that open and reuse many files.

4.8 Defragmenting a Domain

The AdvFS file system attempts to store file data in contiguous blocks on a disk. This collection of contiguous blocks is called a file extent. If all data in a file is stored in contiguous blocks, that file has one file extent. However, as files grow, contiguous blocks on the disk might not be available to accommodate the new data, and the system spreads the file over discontiguous blocks. As a result, the file is fragmented on the disk and consists of multiple file extents. File fragmentation degrades the read/write performance because many disk addresses must be examined to access a file.

The defragment utility reduces the amount of file fragmentation in a domain by attempting to make the files more contiguous so that the number of file extents is reduced. In addition, defragmenting a domain often makes the free space on a disk more contiguous so files that are created later are also less fragmented.

Files might be moved to other volumes in the defragmentation of a multivolume domain. You cannot control the placement of files during defragmentation, but you can use the showfile to identify where a file is stored. If you want to move a file, use the migrate command (see Section 4.12).

You can improve the efficiency of the defragment process by deleting any unneeded files in the domain before running the defragment utility. Aborting the defragment process does not damage the file system. Files that have been defragmented remain in their new locations.

It is difficult to specify the load that defragmenting places on a system. The time it takes to defragment a domain depends on:

The size of the volume(s)

The amount of free space available

The activity of the system

The configuration of your domain
Because the defragment utility creates a thread per volume (up to a maximum of 20 threads), a domain consisting of several small volumes is faster to defragment than one consisting of a large volume. Multiple threads might exact a severe performance penalty for ongoing I/O. If you want to limit defragmentation to a single thread (similar to Version 4 operating system software behavior), use the defragment -N 1 command.

To defragment a domain, use the SysMan Manage an AdvFS Domain utility, a graphical user interface (see Appendix D), or enter the defragment command from the command line:

defragment domain_name

The following restrictions apply to running the defragment command:

You must have root user privileges.

All filesets in the domain must be mounted. If you try to defragment an active domain that includes unmounted filesets, an error message is displayed.

A minimum free space of 1% of the total space or 5 MB per volume (whichever is less) must be available to defragment each volume.

The defragment utility cannot be run while the addvol, rmvol, balance, or rmfset command is running in the same domain.

See defragment(8) for more information.

4.8.1 Choosing to Defragment

Run the defragment utility on your domain when you experience performance degradation and then only when file system activity is low.

To determine the amount of fragmentation in your domain without starting the utility, run the defragment -v -n command. If the average number of extents or the number of extents per file with extents is high or the aggregate I/O performance is low, defragmentation might be helpful.

The level of fragmentation you should allow in your file system before running the utility depends on the size of the files and the number of extents. This is largely application dependent, so monitor the number of extents to see if elevated extent counts correlate with decreased application performance. In many cases, even a large, fairly fragmented file does not show a noticeable decrease in performance because of fragmentation. It is not necessary to run the defragment command on a system that is not experiencing performance-related problems because of excessive file fragmentation.

If your file system has been untouched for a month or two, that is, if you did not run full periodic backups nor regularly reference your whole file system, it is a good idea to run the verify command before you run the defragment command. Run the verify command when there is low file system activity.

Running the balance utility before you run defragment might speed up the defragmentation process.

If you have a system, such as a mail server, that contains files that are mostly smaller than 8 KB, run the defragment command only when the output of the showfile -x /mntpt/.tags/1 command indicates that the frag file is highly fragmented. Here /mntpt is the mount point for the fileset and .tags/1 is the frag file.

If you find that one file shows high fragmentation, you can defragment that file (see Section 4.9).

4.8.2 Defragmenting Example

If you have the hardware resources and AdvFS Utilities, you can add a volume by using the addvol command then remove the previous volume by using the rmvol command. Removing the volume migrates the domain to the new volume, and the files in it are defragmented as part of the migration.

The following example looks at the fragmentation of the accounts_domain domain and at the number of extents in the orig_file_1 file, and then defragments the domain for a maximum of 15 minutes. Verbose mode is used to display the fragmentation data at the beginning of each pass through the domain and at the end of the defragmentation process.

# defragment -vn accounts_domain
defragment: Gathering data for 'accounts_domain'
Current domain data:
   Extents:                 263675
   Files w/ extents:        152693
   Avg exts per file w/exts:  1.73
   Aggregate I/O perf:         70%
   Free space fragments:     85574
                <100K   <1M   <10M   >10M
   Free space:   34%   45%    19%     2%
   Fragments:  76197  8930    440      7
# showfile -x orig_file_1
    Id Vol PgSz Pages XtntType Segs SegSz   I/O Perf  File
6.8002   2   16    71   simple   **    ** async  82%  orig_file_1
        \\: 1
            pageOff    pageCnt    vol    volBlock    blockCnt
                  0          5      2       40720          80
                  5         12      2       41856         192
                 17         16      2       40992         256
                 33          7      2       42048         112
                 40         12      2       41360         192
                 52         15      2       42160         240
                 67          4      2       41792          64
            extentCnt: 7
# defragment -v -t 15 accounts_domain
defragment:  Defragmenting domain 'accounts_domain'
 
Pass 1; 
  Volume 2: area at block      144 (  130800 blocks): 0% full
  Volume 1: area at block   468064 (  539008 blocks): 49% full
  Domain data as of the start of this pass:
    Extents:                   7717
    Files w/extents:           6436
    Avg exts per file w/exts:  1.20
    Aggregate I/O perf:         78%
    Free space fragments:       904
                    <100K    <1M    <10M    >10M
     Free space:      4%     5%     12%     79%
     Fragments:      825     60      13       6
Pass 2;
  Volume 1: area at block   924288 (  547504 blocks): 69% full
  Volume 2: area at block      144 (  130800 blocks):  0% full
  Domain data as of the start of this pass:
    Extents:                   6507
    Files w/extents:           6436
    Avg exts per file w/exts:  1.01
    Aggregate I/O perf:         86%
    Free space fragments:      1752
                    <100K    <1M    <10M    >10M
     Free space:      8%     13%     11%     67%
     Fragments:     1574     157      15       6
 
Pass 3;
  Domain data as of the start of this pass:
    Extents:                   6485
    Files w/extents:           6436
    Avg exts per file w/exts:  1.01
    Aggregate I/O perf:         99%
    Free space fragments:       710
                    <100K    <1M    <10M    >10M
     Free space:      3%    11%     21%     65%
     Fragments:      546    126      32       6
 
Defragment: Defragmented domain 'accounts_domain'

Information displayed before each pass and at the conclusion of the defragmentation process indicates the amount of improvement made to the domain. A decrease in the Extents and Avg exts per file w/extents values indicates a reduction in file fragmentation. An increase in the Aggregate I/O perf value indicates improvement in the overall efficiency of file-extent allocation.

4.9 Defragmenting a File

To determine the amount of fragmentation level of a single file, run the showfile -x command to show the extents in the file. (You might already have this information from examining the output of the defragment -v -n command.)

To reduce the fragmentation of a file:

Use the migrate utility to move the file to the same or a different volume containing adequate contiguous free space.

Back up and restore a file. This tends to defragment it.
1. Back up the file by using the vdump command.
2. Delete or rename the file.
3. Restore the data by using the vrestore command.

4.10 Balancing a Multivolume Domain

The balance utility distributes the percentage of used space evenly between volumes in a multivolume domain created with the optional AdvFS Utilities. This improves performance and evens the distribution of future file allocations.

Figure 4-1: Balancing a Domain

Files are moved from one volume to another, as illustrated in Figure 4-1, until the percentage of used space on each volume in the domain is as equal as possible. Because the balance utility does not generally split files, domains with very large files might not balance as evenly as domains with smaller files.

To redistribute files across volumes, use the SysMan Manage an AdvFS Domain utility, a graphical user interface (see Appendix D), or enter the balance command from the command line:

balance domain_name

If you interrupt the balance process, all relocated files remain at their new locations. The rest of the files remain in their original locations.

The following restrictions apply to running the balance utility:

You must have root user privileges.

All filesets in the domain must be mounted. If you try to balance an active domain that includes unmounted filesets, an error message is displayed.

The balance utility cannot run while the addvol, rmvol, defragment, or rmfset command is running in the same domain.

See balance(8) for more information.

4.10.1 Choosing to Balance

Use the showfdmn command to display domain information. Look at the % Used field to determine if the files are evenly distributed.

Use the balance utility to even file distribution after you have added a volume by using the addvol command or removed a volume by using the rmvol command (if there are multiple volumes remaining).

4.10.2 Balance Example

In the following example, the multivolume domain usr_domain is not balanced. Volume 1 has 63% used space while volume 2, a smaller volume, has 0% used space (it has just been added). After balancing, both volumes have approximately the same percentage of used space.

# showfdmn usr_domain
            Id       Date Created      LogPgs Version Domain Name
3437d34d.000ca710 Mon Apr 3 10:50:05 2000 512       4 usr_domain
 
 Vol  512-Blks   Free % Used  Cmode Rblks  Wblks  Vol Name 
  1L   1488716 549232    63%     on   128    128  /dev/disk/dsk0g
  2     262144 262000     0%     on   128    128  /dev/disk/dsk4a
     --------- -------  ------
       1750860 811232    54%
# balance usr_domain
 balance: Balancing domain 'usr_domain' 
 balance: Balanced domain 'usr_domain'
# showfdmn usr_domain
            Id       Date Created      LogPgs Version Domain Name
3437d34d.000ca710 Mon Apr 3 10:50:05 2000 512       4 usr_domain
 
 Vol  512-Blks   Free % Used  Cmode Rblks  Wblks  Vol Name 
  1L   1488716 689152    54%     on   128    128  /dev/disk/dsk0g
  2     262144 122064    53%     on   128    128  /dev/disk/dsk4a
     --------- -------  ------
       1750860 811216    54%

4.11 Moving Filesets to Different Volumes

If you suspect that a fileset or domain is straining system resources, run the iostat utility either from the SysMan Monitoring and Tuning - View Input/Output (I/O) Statistics utility, or from the command line (see iostat(1)). If the filesets or domains are located on devices that appear to be a bottleneck, you can migrate files or pages of files to equalize the load. If a high-performance device is available, you can move a file that is I/O-intensive to the more efficient volume.

To move a domain and its fileset to a new volume:

Make a new domain on the new device. It must have a temporary new name.

Create a fileset with the same name as the old.

Create a temporary mount-point directory for the fileset.

Mount the new fileset on the temporary mount point.

Use the vdump command to copy the fileset from the old device. Use the vrestore command to restore it to the newly mounted fileset.

Unmount the old and new filesets.

Rename the new domain to the old name. Since you have not changed the domain and fileset names, it is not necessary to edit the /etc/fstab file.

Mount the new fileset using the mount point of the old fileset. The directory tree is then unchanged. Delete the temporary mount-point directory.

If you have more than one fileset in your domain, follow steps two through eight for each fileset.

If you are running operating system software Version 5.0 or later, the new domain is created with the new DVN of 4 (see Section 1.6.3). However, if you must retain the DVN of 3 in order to use earlier versions of the operating system, see mkfdmn(8). The vdump and vrestore utilities are not affected by the change of DVN.

The following example moves the domain accounts with the fileset technical to volume dsk3c. The domain new_accounts is the temporary domain and is mounted initially at /tmp-mnt. Assume the fileset accounts#technical is mounted on /technical. Assume that the /etc/fstab file has an entry instructing the system to mount accounts#technical on /technical.

# mkfdmn /dev/disk/dsk3c new_accounts
# mkfset new_accounts technical
# mkdir /tmp_mnt
# mount new_accounts#technical /tmp_mnt
# vdump -dxf - /technical|vrestore -xf - -D /tmp_mnt
# umount /technical
# umount /tmp_mnt
# rmfdmn accounts
# rmdir /tmp_mnt
# mv /etc/fdmns/new_accounts/ /etc/fdmns/accounts/
# mount accounts#technical /technical

4.12 Migrating Files to Different Volumes

If you have the optional AdvFS Utilities, you can use the migrate utility to move heavily accessed or large files to a different volume in the domain. The balance and defragment utilities also migrate files but are not under user control. You can specify the volume where a file is to be moved or allow the system to pick the best space in the domain by using the migrate command. You can migrate either an entire file or specific pages to the same or a different volume. Figure 4-2 illustrates the migration process.

Figure 4-2: Migrating Files

To move an entire file to a specific volume, use the migrate -d command:

migrate -d destination_vol_index file_name

A file that is migrated is defragmented in the process if possible. This means that you can use the migrate command to defragment selected files.

The following restrictions apply to the migrate utility:

You must have root user privilege.

You can perform only one migrate operation at a time on the same file.

When you migrate a striped file, you can migrate from only one volume at a time.

The migrate utility does not evaluate your migration decisions. For example, you can move more than one striped file segment to the same disk, which defeats the purpose of striping the file.

4.12.1 Choosing to Migrate

Choose the migrate utility over the balance utility when you want to control the movement of individual files. The balance utility moves files only to optimize distribution. For example, it might move many small files when moving a single larger file would be a better solution for your system.

Choose the migrate utility over the defragment utility when you want to defragment an individual file. If you have a large enough contiguous area on disk, you can migrate the file to that area to defragment it.

You can use the showfile -x command to look at the extent map and the performance of a file. A low performance rate (less than 80%) indicates that the file is fragmented on the disk. The extent map shows whether the entire file or a portion of the file is fragmented.

4.12.2 Migrate Example

The following example displays the extent map of a file called src and migrates the file. The file, which resides in a two-volume domain, shows a change from 11 file extents to one and a performance efficiency improvement from 18% to 100%.

# showfile -x src
    Id Vol PgSz Pages XtntType  Segs  SegSz  I/O  Perf  File
8.8002   1   16    11   simple    **     ** async  18%  src
             extentMap: 1
        pageOff    pageCnt     vol    volBlock    blockCnt
              0          1       1      187296          16
              1          1       1      187328          16
              2          1       1      187264          16
              3          1       1      187184          16
              4          1       1      187216          16
              5          1       1      187312          16
              6          1       1      187280          16
              7          1       1      187248          16
              8          1       1      187344          16
              9          1       1      187200          16
             10          1       1      187232          16
        extentCnt: 11

# migrate -d 2 src
# showfile -x src
    Id Vol PgSz Pages XtntType Segs SegSz  I/O  Perf  File
8.8002   1   16    11   simple   **    ** async 100%  src
   extentMap: 1
      pageOff    pageCnt     vol    volBlock    blockCnt
            0         11       2       45536         176
      extentCnt: 1

The src file now resides on volume 2 of the domain, consists of one file extent, and has a 100% performance efficiency. Note that in the command output, the first data line of the display lists the metadata. The metadata does not migrate to the new volume. It remains in the original location. The extentMap portion of the display lists the file's migrated pages.

4.13 Striping Files

You can stripe, that is, distribute, files across a number of volumes. This increases the sequential read/write performance because I/O requests to the different disk drives can be overlapped. Virtual storage solutions, such as LSM, RAID, and storage area networks (SAN), stripe all files and are usually configured at system setup. AdvFS striping is applied to single files and can be executed at any time.

Note

Use AdvFS striping only on directly attached storage that does not include LSM, RAID, or a SAN. Combining AdvFS striping with system striping might interfere with optimal placement and cause system degradation.

The AdvFS stripe utility distributes stripe segments across specific disks (or volumes) of a domain. You must have the AdvFS Utilities to run this command. The stripe width is fixed at 64 KB, but you can specify the number of volumes over which to stripe the file.

The form of the stripe command is:

stripe -n volume_count
filename

To stripe a file:

Create a new, empty file.
If you do not create an empty file, the following error message is displayed:
```
stripe: advfs_set_bf_attributes failed-ENOT_SUPPORTED (-1041)
```

Stripe it across the number of volumes desired.

Copy the data from the original file to the striped file.

Delete the original file and rename the striped file, if desired.

As the file is appended, AdvFS determines the number of pages per stripe segment; the segments alternate among the disks in a sequential pattern. For example, the file system allocates the first segment of a two-disk striped file on the first disk and the next segment on the second disk. This completes one sequence, or stripe. The next stripe starts on the first disk, and so on. Because AdvFS spreads the I/O of the striped file across the specified disks, the sequential read/write performance of the file increases.

You cannot use the stripe utility to modify the number of disks that an already striped file crosses or to restripe a file that is already striped. To change the configuration of a striped file, you must create a new file, stripe it, then copy the original file data to it.

You cannot use AdvFS stripe utility on the /etc/fstab file.

4.13.1 Choosing to Use AdvFS Striping on a File

Before you use the stripe utility, run the iostat utility either from the SysMan Monitoring and Tuning - View Input/Output (I/O) Statistics utility, or from the command line to determine if disk I/O is causing the bottleneck. See iostat(1) for more information. Cross check the blocks per second and transactions per second with the drive's sustained transfer rate. Maximum stripe performance can be achieved if each stripe disk is on its own disk controller.

Using AdvFS striping when system-wide striping is in effect might degrade performance.

4.13.2 AdvFS Stripe Example

The following example creates an empty file, file_1, stripes it, copies data from the original file, orig_file_1, into the striped file, then displays the extents of the striped file:

# touch file_1
# ls -l file_1
-rw-r--r-- 1 root  system 0 Oct 07 11:06 file_1
# stripe -n 3 file_1
# cp orig_file_1 file_1
# showfile -x file_1

     Id Vol PgSz Pages XtntType Segs SegSz I/O   Perf File
7.8001   1   16    71   stripe    3     8 async 100% file_1
  extentMap: 1
     pageOff   pageCnt   volIndex  volBlock   blockCnt
           0         8          2     42400        384
          24         8
          48         8
     extentCnt: 1
  extentMap: 2
     pageOff   pageCnt   volIndex   volBlock   blockCnt
           8         8          3      10896        384
          32         8
          56         8
     extentCnt: 1
 
  extentMap: 3
     pageOff   pageCnt   volIndex   volBlock  blockCnt
          16         8          1     186784       368
          40         8
          64         7
     extentCnt: 1

4.13.3 Removing AdvFS Striping

You can alter the pattern of striping in your domain by:

Removing striping from a file
Copy the striped file to a file that is not striped. Delete the original.

Removing a striped volume
If you remove a volume that contains an AdvFS stripe segment, the rmvol utility moves the segment to another volume that does not already contain a stripe segment of the same file. If all remaining volumes contain stripe segments, the system requests confirmation before the segment is moved to a volume that already contains a stripe segment of the file. To retain the full benefit of striping when this occurs, stripe a new file across existing volumes and copy the file with the doubled-up segments to the new file.

4.14 Controlling Domain Panic Information

The AdvfsDomainPanicLevel attribute allows you to choose whether to have crash dumps created when a domain panic occurs. Values of the attribute are:

0 - Create crash dumps for no domains.

1 - Create crash dumps only for domains with mounted filesets (default).

2 - Create crash dumps for all domains.

3 - Promote the domain panic to a system panic. The system crashes.

See sysconfig(8) for information on changing attributes. See Section 5.8.6 for information about recovering from a domain panic.